When Autopilot Learned to Think
The story of adding 40+ AI models to a workflow automation platform that started with zero AI capabilities.

July 2023. Autopilot was two years old. It had 168 integrations, a visual flow builder, three types of BullMQ workers, and zero AI capabilities. It could connect Slack to Google Sheets, trigger workflows on a schedule, and chain actions across dozens of services. But every step in every workflow did exactly what it was told. No interpretation. No generation. No intelligence.
Then we added one OpenAI plugin. And then another. And within eighteen months, the dumb-pipe automation platform had 40+ AI models, a flow builder that could create workflows from natural language, per-step token analytics, and structured output parsing. This is the story of that transformation.
The first AI: OpenAI and DALL-E
The first AI commit in Autopilot landed in July 2023. It was a standard integration plugin – the same pattern we’d used for Slack, Twilio, and 165 other services. An authentication handler for the OpenAI API key. A set of actions: text completion, chat completion, image generation via DALL-E. Nothing architecturally special.
But operationally, everything was different.
Every other integration in Autopilot had predictable execution times. An API call to Slack takes 200-500 milliseconds. Google Sheets operations finish in under a second. Even Salesforce queries, notoriously slow, complete in two to three seconds.
OpenAI completions could take 30 seconds. DALL-E image generation could take a minute. Our BullMQ workers were configured for sub-second task execution with concurrency settings of 50 for flow workers, 50 for trigger workers, and 100 for action workers. Long-running AI tasks didn’t crash the system, but they consumed worker slots and slowed down the entire queue.
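For reference, the worker setup looked roughly like this in BullMQ terms. The queue names, Redis connection, and processor stubs are illustrative; the concurrency numbers are the ones from our config:

```ts
import { Worker, Job } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };

// Placeholder processors; the real ones dispatch into plugin code.
const processFlow = async (job: Job) => { /* ... */ };
const processTrigger = async (job: Job) => { /* ... */ };
const processAction = async (job: Job) => { /* ... */ };

// `concurrency` caps how many jobs one worker process runs at a time.
// A 60-second DALL-E job holds one of these slots for its entire duration,
// which is how long-running AI steps slowed down the whole queue.
new Worker('flows', processFlow, { connection, concurrency: 50 });
new Worker('triggers', processTrigger, { connection, concurrency: 50 });
new Worker('actions', processAction, { connection, concurrency: 100 });
```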
We didn’t fix this immediately. The first version worked, and working was enough for July 2023. But the worker concurrency problem would come back every time we added a new AI model.
September to December 2023: The AI avalanche
Once the pattern existed, AI integrations came fast.
September 2023 brought Claude integration and an AI summarizer action. Claude’s API was similar enough to OpenAI’s that the plugin took less than a day to build. The summarizer was more interesting – it was our first AI action that wasn’t just “send text to a model and return the response.” It had logic: split long documents into chunks, summarize each chunk, then summarize the summaries. A simple map-reduce pattern, but it was the first time an Autopilot action did multiple AI calls in sequence.
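The shape of that pattern, as a sketch: the chunk size and prompts are illustrative, and `complete` stands in for whatever chat-completion call the plugin wraps.

```ts
// Map-reduce summarization: summarize each chunk, then summarize the summaries.
async function summarize(
  text: string,
  complete: (prompt: string) => Promise<string>,
): Promise<string> {
  const CHUNK_SIZE = 8000; // characters; sized to fit the model's context window

  if (text.length <= CHUNK_SIZE) {
    return complete(`Summarize the following text:\n\n${text}`);
  }

  // Map: summarize each chunk in sequence.
  const partials: string[] = [];
  for (let i = 0; i < text.length; i += CHUNK_SIZE) {
    const chunk = text.slice(i, i + CHUNK_SIZE);
    partials.push(await complete(`Summarize the following text:\n\n${chunk}`));
  }

  // Reduce: summarize the summaries, recursing if they are still too long.
  return summarize(partials.join('\n\n'), complete);
}
```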
October 2023 was the Replicate integration, and it changed the scale of what Autopilot could do. Replicate hosts thousands of open-source models across 25+ categories: text generation, image generation, image-to-text, speech synthesis, video generation, and more. One integration plugin, dozens of model categories. We also added Langfuse analytics in the same month – the first step toward understanding what all these AI calls were actually costing.
November 2023 brought the cross-product moment. We connected Autopilot to the AIWIZE Dashboard apps: art generation, document composition, and knowledge base queries. A workflow could now trigger an image generation in the dashboard’s art studio, use the result in a subsequent step, and post it to Slack. The products were talking to each other.
December 2023 was the Wizard system. We’d built a chat interface for configuring workflows, and it used AI to understand user intent. Instead of dragging and dropping triggers and actions, users could describe what they wanted in natural language. The Wizard wasn’t building complete workflows yet – that would come later – but it was interpreting configuration requests and applying them to the flow editor.
The flow builder assistant
In May 2024, commit feat: flow builder assistant (#352) landed. This was the moment Autopilot stopped being a tool that used AI and became a tool that was built by AI.
The flow builder assistant takes a natural language description of an automation and generates a working workflow. “When a new row is added to my Google Sheet, check if the email column contains a valid address, then send a welcome email via SendGrid and log the result to Slack.” The assistant parses that description, identifies four steps (Sheet trigger, validation logic, SendGrid action, Slack action), selects the right integration plugins, configures the parameters, and produces a workflow the user can review and activate.
This wasn’t simple prompt engineering. The assistant needed to understand Autopilot’s plugin catalog – which integrations existed, what triggers and actions each provided, what parameters each action required, and how data flowed between steps. The context window included a compressed representation of the available integrations and their capabilities.
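The exact encoding isn't in the commit history, but conceptually each integration compressed down to something like this hand-written sketch: enough for the model to pick plugins and wire parameters, small enough that 168 entries fit in context.

```ts
// One compressed catalog entry per integration plugin.
interface CatalogEntry {
  plugin: string;
  triggers: { id: string; params: string[] }[];
  actions: { id: string; params: string[] }[];
}

const catalog: CatalogEntry[] = [
  {
    plugin: 'google-sheets',
    triggers: [{ id: 'new_row', params: ['spreadsheet_id', 'sheet_name'] }],
    actions: [{ id: 'append_row', params: ['spreadsheet_id', 'values'] }],
  },
  {
    plugin: 'sendgrid',
    triggers: [],
    actions: [{ id: 'send_email', params: ['to', 'subject', 'body'] }],
  },
  // ...166 more entries, serialized into the assistant's system prompt
];
```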
The magic prompter shipped alongside it – a tool that helped users write better descriptions for the flow builder assistant. Meta-AI: an AI that helped users talk to another AI.
The OpenRouter expansion
From July through September 2024, we added 15+ AI models through OpenRouter integrations. OpenRouter acts as an aggregation layer – one API key, access to models from OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers.
Each OpenRouter model integration followed the same plugin pattern, but the model parameters varied. Temperature ranges, token limits, system prompt handling, and response formatting all differed between providers. We built a configuration layer that normalized these differences, so a workflow could swap from GPT-4 to Claude to Gemini by changing a single parameter.
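In spirit, the layer mapped a provider-neutral request onto each provider's quirks. A simplified sketch, with illustrative field names (the real temperature ranges are per the providers' docs: OpenAI accepts 0–2, Anthropic 0–1, and Gemini names its token limit maxOutputTokens):

```ts
interface ModelRequest {
  model: string;       // e.g. 'gpt-4', 'claude-3-opus', 'gemini-pro'
  prompt: string;
  temperature: number; // normalized to 0..1 across all providers
  maxTokens: number;
}

// Per-provider quirks: temperature scale and token-limit parameter name.
const PROVIDER_CONFIG: Record<string, { tempScale: number; maxTokensKey: string }> = {
  openai:    { tempScale: 2, maxTokensKey: 'max_tokens' },
  anthropic: { tempScale: 1, maxTokensKey: 'max_tokens' },
  google:    { tempScale: 1, maxTokensKey: 'maxOutputTokens' },
};

function toProviderParams(req: ModelRequest, provider: string): Record<string, unknown> {
  const cfg = PROVIDER_CONFIG[provider];
  return {
    model: req.model,
    temperature: req.temperature * cfg.tempScale,
    [cfg.maxTokensKey]: req.maxTokens,
  };
}
```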
November 2024 added Claude model integrations specifically and introduced structured output via OpenAI’s API. The structured output feature – commit feat(formatter): ai parser via openai's structured output (#1164) – was significant. It meant AI actions could return typed, validated data instead of free-form text. A workflow could call an AI model and get back a JSON object matching a specific schema, which subsequent steps could consume without fragile string parsing or defensive error handling.
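Mechanically, structured output means attaching a JSON Schema to the completion request. A minimal sketch against OpenAI's chat completions API; the invoice schema is a made-up example:

```ts
import OpenAI from 'openai';

const client = new OpenAI();

// The model is constrained to emit JSON matching this schema, so the
// downstream workflow step can consume the result directly.
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Extract the invoice details from: ...' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'invoice',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          vendor: { type: 'string' },
          total: { type: 'number' },
          currency: { type: 'string' },
        },
        required: ['vendor', 'total', 'currency'],
        additionalProperties: false,
      },
    },
  },
});

const invoice = JSON.parse(response.choices[0].message.content!);
```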
December 2024 brought the Browser Agent plugin. An Autopilot workflow could now control a browser – navigate to pages, extract content, fill forms, click buttons – as an automation step. Combined with AI models for decision-making, this gave workflows the ability to interact with websites that didn’t have APIs.
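The commit history doesn't name the library behind the plugin, but a Playwright sketch captures the shape of a browser step (the URL and selectors are made up):

```ts
import { chromium } from 'playwright';

// One browser-automation step: navigate, fill a form, click, extract content.
const browser = await chromium.launch();
const page = await browser.newPage();

await page.goto('https://example.com/login');          // navigate to pages
await page.fill('#email', 'user@example.com');         // fill forms
await page.click('button[type=submit]');               // click buttons
const content = await page.textContent('.dashboard');  // extract content

await browser.close();
// `content` flows into the next workflow step, e.g. an AI model
// that decides what to do with it.
```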
Making AI costs visible
The Langfuse integration evolved in three phases.
Phase 1 (October 2023): Basic analytics. We tracked which models were called and how many tokens were consumed. Aggregate data. Useful for billing, not for optimization.
Phase 2 (October 2024): Per-step tracking. Commit Track execution step action to langfuse (#840) made every workflow execution step a traced event. Each AI call was tagged with the workflow ID, step ID, model name, input tokens, output tokens, and cost. Now we could answer: “This workflow costs $0.47 per run, and $0.41 of that is the GPT-4 summarization step.”
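With the Langfuse JS SDK, per-step tracing boils down to one trace per execution and one generation event per AI step. A sketch with made-up IDs and token counts (the exact usage field names vary between SDK versions):

```ts
import { Langfuse } from 'langfuse';

const langfuse = new Langfuse(); // reads LANGFUSE_* keys from the environment

// One trace per workflow execution, one generation event per AI step.
const trace = langfuse.trace({
  name: 'workflow-execution',
  metadata: { workflowId: 'wf_123', executionId: 'exec_456' },
});

trace.generation({
  name: 'summarize-step',
  model: 'gpt-4',
  input: 'Summarize the following text: ...',
  output: 'The document describes ...',
  usage: { input: 1890, output: 240 }, // token counts; cost is derived from model pricing
  metadata: { stepId: 'step_3' },
});

await langfuse.flushAsync(); // events are batched; flush before the worker moves on
```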
Phase 3: Token analytics dashboard. Commit Token Analytics implementation (#802) exposed this data to users. Charts showing cost per workflow, cost per model, token usage over time. Users could see which workflows were expensive and why. One user discovered that a workflow they ran hourly was spending $280/month on GPT-4 calls that could have been handled by GPT-3.5 for $12/month.
Per-step cost visibility changed how users designed workflows. When you can see that the AI summarization step costs ten times more than every other step combined, you start thinking about whether a cheaper model would work. Cost became a design parameter, not a surprise.
The weekly Replicate updates
By 2025, Replicate’s model catalog was growing faster than we could manually track. New models appeared weekly. Existing models got updated. Some were deprecated. Our integration plugin needed to reflect the current state of the catalog without manual intervention.
The solution was an automated GitHub Actions workflow that ran weekly. It queried Replicate’s API for the current model catalog, compared it against our plugin’s model definitions, generated the diffs, and opened a pull request with the updates. GPT-5 and GPT-5-mini support, HeyGen AI video integration, and dozens of model updates all flowed through this automated pipeline.
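The Actions side is just a weekly cron that runs a sync script and opens a PR from the resulting diff. The script itself boils down to something like this sketch; the file path and diff format are illustrative, and the Replicate endpoint is its public paginated model listing:

```ts
import { readFileSync, writeFileSync } from 'node:fs';

// Fetch the current public model catalog from Replicate (paginated).
async function fetchCatalog(): Promise<Set<string>> {
  const models = new Set<string>();
  let url: string | null = 'https://api.replicate.com/v1/models';
  while (url) {
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${process.env.REPLICATE_API_TOKEN}` },
    });
    const page = await res.json();
    for (const m of page.results) models.add(`${m.owner}/${m.name}`);
    url = page.next; // null on the last page
  }
  return models;
}

// Compare against the plugin's checked-in model list and write the update.
const current: string[] = JSON.parse(readFileSync('plugin/models.json', 'utf8'));
const latest = await fetchCatalog();

const added = [...latest].filter((m) => !current.includes(m));
const removed = current.filter((m) => !latest.has(m));

if (added.length || removed.length) {
  writeFileSync('plugin/models.json', JSON.stringify([...latest].sort(), null, 2));
  console.log(`+${added.length} models, -${removed.length} models`);
  // The surrounding GitHub Actions job commits this change and opens the PR.
}
```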
No human touched the Replicate plugin for routine updates. The automation platform that helped users automate their work had automated its own maintenance.
From dumb pipe to thinking machine
The transformation took 30 months. From zero AI in July 2023 to 40+ models, natural language workflow creation, per-step cost analytics, structured output parsing, browser automation, and self-updating model catalogs in early 2026.
The technical lesson is that AI doesn’t replace automation – it makes automation smarter. A workflow that fetches data from an API, processes it through an AI model, validates the output with structured schemas, and routes the result to a downstream service is fundamentally more capable than any of those pieces alone.
The product lesson is that the moment users can describe what they want instead of building it step by step, the product changes category. Autopilot went from a tool for technical users who understood triggers and actions to a platform where anyone could describe an automation in plain language.
We didn’t plan that transition. We just kept adding AI plugins, one at a time, until the whole was greater than the sum of its parts. The fork that started as a Zapier clone had learned to think.
Alexey Suvorov
CTO, AIWAYZ
10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.
LinkedIn