Three Databases, Three Problems

PostgreSQL for workflows, MongoDB for AI, Redis for speed -- how three databases serve three products.

By Alexey Suvorov · 6 min read

85 PostgreSQL migrations. 31+ MongoDB collections. Three Redis deployments handling cache, job queues, and streaming state. We didn’t choose three databases because we enjoy operational complexity. We chose them because each product’s data model demanded something different.

The story of our database choices is really the story of three products with three fundamentally different relationships to data. Workflow automation is relational. AI conversations are documents. Job queues need raw speed. Trying to force all three into one database would have created problems worse than running three.

PostgreSQL: Where workflows live

Autopilot runs on PostgreSQL. The choice was inherited from the Automatisch fork, and it was the right one.

Workflow automation data is inherently relational. A flow has steps. Steps have connections to other steps. Each step references an integration plugin, which has triggers and actions. Executions belong to flows. Execution logs belong to execution steps. Users belong to organizations. Organizations have roles. Roles have permissions.

Every one of those relationships is a foreign key in PostgreSQL. When we delete a flow, cascading deletes clean up the steps, connections, and execution history. When we query “show me all executions of flows that use the Slack integration,” that’s a three-table join that returns in milliseconds because the indexes are right.
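With Knex as the query builder, that query is a few chained calls. This is a hedged sketch: the table and column names (executions, flows, steps, app_key) are assumptions, not the actual Autopilot schema, and `any` stands in for the Knex type.

```typescript
// Hypothetical table/column names; `knex` is a configured Knex instance.
function executionsUsingSlack(knex: any) {
  return knex('executions')
    .select('executions.*')
    .join('flows', 'flows.id', 'executions.flow_id')
    .join('steps', 'steps.flow_id', 'flows.id')
    .where('steps.app_key', 'slack');
}
```

The three-table join the article describes maps one-to-one onto the two `.join` calls plus the `.where` filter.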

Objection.js as the ORM and Knex as the query builder give us two layers of abstraction. Objection handles model relationships and validation. Knex handles migrations and raw query building for cases where the ORM gets in the way. It’s not a trendy stack – Prisma gets more attention these days – but it’s stable, well-documented, and doesn’t try to hide SQL from you.

85 migrations over four years. That number sounds high until you do the math: it works out to roughly one migration every two to three weeks. Add a column here, create a new table there, add an index, change a default value. Migrations are the heartbeat of a PostgreSQL application. They tell you how fast the schema is evolving. 85 migrations means 85 times the data model needed to change, and every change was tracked, versioned, and reversible.

The early migrations were structural: creating the core tables for flows, steps, connections, users, and permissions. The middle period added features: datastore tables for persistent state between workflow runs, template tables for flow blueprints, OAuth credential storage. The recent migrations reflect the AI evolution: columns for model parameters, token usage tracking, Langfuse trace identifiers.
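A migration in this stack is a small file exporting up and down functions. Here is a hedged sketch in the shape of the datastore addition described above; the table and column names are invented, not Autopilot's actual schema, and `any` stands in for the Knex types.

```typescript
// Illustrative only: table/column names are invented; `any` replaces the Knex type.
export async function up(knex: any): Promise<void> {
  await knex.schema.createTable('datastore_entries', (table: any) => {
    table.uuid('id').primary();
    table
      .uuid('flow_id')
      .notNullable()
      .references('id')
      .inTable('flows')
      .onDelete('CASCADE'); // deleting a flow cleans up its persisted state
    table.string('key').notNullable();
    table.jsonb('value');
    table.timestamps(true, true); // created_at / updated_at with defaults
  });
}

export async function down(knex: any): Promise<void> {
  await knex.schema.dropTable('datastore_entries');
}
```

The `onDelete('CASCADE')` line is where the cascading cleanup from earlier in the article lives: it is a property of the schema, enforced by PostgreSQL, not application code.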

PostgreSQL’s strength for Autopilot is predictability. The schema is strict. If a column expects an integer, you can’t accidentally store a string. If a foreign key references a flow, that flow must exist. These constraints catch bugs at the database level instead of the application level. For a workflow engine where a corrupted execution could cascade into downstream failures, that strictness matters.

MongoDB: Where AI conversations live

Dashboard v1 and v2 run on MongoDB. This wasn’t the inherited choice – we made it deliberately, and we’d make it again.

AI conversation data is schema-flexible by nature. A chat message might contain plain text, or it might contain a tool call with nested parameters, a citation list with URLs and excerpts, file attachments with metadata, or model-specific fields that only apply to certain providers. The shape of a message changes based on which model generated it, which tools were available, and what the user was doing.

In PostgreSQL, every variation of message shape requires a schema decision. Do you add nullable columns for optional fields? Do you use JSONB columns and lose type safety? Do you create separate tables for tool calls, citations, and attachments, then join them on every query? Every approach has costs, and every schema change requires a migration that might touch millions of rows.

In MongoDB, a message document contains exactly the fields it needs. A message with a tool call has a toolCalls array. A message without one doesn’t have the field at all. When we add support for a new model that returns a reasoning_content field, we add it to new documents and existing documents don’t need to change.
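As a TypeScript sketch, using the field names the article mentions (toolCalls, reasoning_content) and inventing the rest, the "fields only when needed" shape looks something like this:

```typescript
// Field names follow the article; the exact document layout is an assumption.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
  toolCalls?: { name: string; arguments: Record<string, unknown> }[];
  citations?: { url: string; excerpt: string }[];
  reasoning_content?: string; // only set for models that return it
}

// A plain message carries only the fields it needs...
const plain: ChatMessage = { role: 'user', content: 'Summarize this page' };

// ...while a tool-calling message adds its array, with no migration required.
const withTool: ChatMessage = {
  role: 'assistant',
  content: '',
  toolCalls: [{ name: 'web_search', arguments: { query: 'latest release notes' } }],
};
```

The optional fields are the whole point: adding `reasoning_content` to the interface changes nothing about documents already in the collection.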

The collection list tells the story of Dashboard’s evolution: users, documents, templates, chat messages, compose history, bulk chat, image generations, knowledge base, chatbots, threads, AI studio, models, folders, agents, sessions, and more. 31+ collections, each representing a feature that was added, iterated on, and sometimes restructured without the overhead of migration files.

The Mongoose experiment

When we started Dashboard v1, we tried Mongoose. It’s the most popular MongoDB ODM for Node.js, and for good reason – schema validation, middleware hooks, virtual properties, and a query builder that feels familiar to anyone who’s used an ORM.

We removed it within a week when building Dashboard v2.

The problem wasn’t Mongoose itself. The problem was the impedance mismatch between what Mongoose wants to do and what we needed. Mongoose wants to enforce schemas on a schemaless database. That’s the point of it. But we’d chosen MongoDB specifically because we didn’t want rigid schemas. Using Mongoose on MongoDB is like buying a convertible and leaving the top up.

The specific breaking points were aggregation pipelines and bulk operations. MongoDB’s aggregation framework is powerful – $lookup for joins, $unwind for array expansion, $group for analytics, $facet for multi-pipeline queries. Mongoose wraps the aggregation framework, but the wrapper adds overhead and sometimes doesn’t expose the latest pipeline stages.
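With the native driver, a pipeline is just an array of plain stage objects, nothing wrapped. A hedged example using the stages named above, with invented collection and field names:

```typescript
// Invented collection/field names; shows $lookup, $unwind, $group as plain data.
const pipeline = [
  // join each thread with its messages
  { $lookup: { from: 'messages', localField: '_id', foreignField: 'threadId', as: 'messages' } },
  // expand the joined array into one document per message
  { $unwind: '$messages' },
  // sum token usage per model
  { $group: { _id: '$messages.model', totalTokens: { $sum: '$messages.tokens' } } },
];
// Runs directly against the driver: db.collection('threads').aggregate(pipeline)
```

Because the pipeline is plain data, any stage MongoDB ships is immediately usable; there is no wrapper waiting to catch up.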

Bulk operations hit similar friction. When you’re inserting 10,000 chat messages from an import or updating 500 agent configurations in a batch, you want bulkWrite with ordered: false for maximum throughput. Mongoose’s model layer adds per-document validation and middleware that slows bulk operations significantly.
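A sketch of that unordered bulk insert; the message documents and collection name are illustrative, not the actual Dashboard schema.

```typescript
// Illustrative documents; in real code these come from an import file.
const imported = [
  { threadId: 't1', role: 'user', content: 'hello' },
  { threadId: 't1', role: 'assistant', content: 'hi there' },
];

// One insertOne operation per document, in the shape bulkWrite expects.
const ops = imported.map((doc) => ({ insertOne: { document: doc } }));

// With the native driver:
//   db.collection('chat_messages').bulkWrite(ops, { ordered: false })
// ordered: false lets MongoDB execute writes in parallel and continue past
// individual failures, which is what you want for a 10,000-message import.
```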

We moved schema validation to the application layer. TypeScript interfaces define the shape of documents. Validation happens before the data hits MongoDB, not in the database driver. The native MongoDB driver gives us direct access to every feature without an abstraction layer translating our intent.
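One way to do that application-layer validation is a hand-rolled type guard; the Agent shape below is invented for illustration, and real code might use a schema library instead.

```typescript
// Invented document shape; stands in for any collection's expected fields.
interface Agent {
  name: string;
  model: string;
  temperature?: number;
}

// Type guard: TypeScript narrows `unknown` to Agent only if the checks pass.
function isAgent(doc: unknown): doc is Agent {
  if (typeof doc !== 'object' || doc === null) return false;
  const d = doc as Agent;
  return (
    typeof d.name === 'string' &&
    typeof d.model === 'string' &&
    (d.temperature === undefined || typeof d.temperature === 'number')
  );
}

// Before the data hits MongoDB: if (!isAgent(body)) reject the request.
```

The guard runs once at the boundary, so bulk paths stay fast: validated documents go straight to the driver with no per-document middleware.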

Testing was the one place where dropping the ODM demanded extra tooling. Integration tests need a real database, and spinning up a full MongoDB instance for every test run is slow. MongoMemoryServer runs an in-memory MongoDB instance that starts in under a second. Tests run against a real MongoDB engine, not a mock, which catches driver-level issues that mocks would miss.

Redis: The speed layer

Redis serves three roles across our platform, and none of them are traditional data storage.

In Autopilot, Redis powers BullMQ job queues. Three worker types – flow, trigger, and action – with concurrency settings of 50, 50, and 100 respectively. Every workflow execution starts as a job in a Redis queue. BullMQ handles retries, dead letter queues, priority scheduling, and concurrency limits. Redis’s single-threaded execution model guarantees that job dequeuing is atomic – no two workers will ever pick up the same job.

The concurrency numbers reflect four years of tuning. Early versions started at 10/10/20. We increased them as the infrastructure scaled and as we learned where bottlenecks actually were. Action workers got the highest concurrency because actions are typically short-lived API calls. Flow workers got lower concurrency because flow orchestration involves more state management. The 50/50/100 split has held stable for over a year.
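Expressed as configuration (the queue names are an assumption), the split looks like this:

```typescript
// Queue names invented; each entry corresponds to a BullMQ worker created
// roughly as: new Worker(queueName, processor, { connection, concurrency }).
const workerConcurrency = {
  flow: 50,    // orchestration holds more state per job, so lower parallelism
  trigger: 50,
  action: 100, // short-lived API calls tolerate higher parallelism
} as const;
```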

In Dashboard v2, Redis handles caching, streaming state, and real-time data. AI model responses are cached by input hash so that identical queries don’t cost duplicate tokens. Streaming state – which connections are active, which responses are in progress – lives in Redis because it needs sub-millisecond reads and writes. Session storage for user authentication uses Redis with TTL-based expiration.
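A minimal sketch of caching by input hash. The key format is an assumption, and an in-process Map stands in for Redis here; the real service would store the value with a TTL (e.g. SET key value EX ttl).

```typescript
import { createHash } from 'node:crypto';

// Derive a stable cache key from the model and prompt; identical inputs
// always hash to the same key, so identical queries hit the cache.
function cacheKey(model: string, prompt: string): string {
  const digest = createHash('sha256').update(`${model}\n${prompt}`).digest('hex');
  return `llm:response:${digest}`;
}

// Map stands in for Redis in this sketch.
const cache = new Map<string, string>();

function getOrCompute(model: string, prompt: string, compute: () => string): string {
  const key = cacheKey(model, prompt);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // identical query: no duplicate tokens spent
  const value = compute(); // in production, the actual model call
  cache.set(key, value);
  return value;
}
```

Hashing the full input rather than truncating it is what makes the cache safe: two prompts that differ by one character get different keys.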

Across both products, Redis is the ephemeral layer. If Redis restarts, nothing is permanently lost. Job queues replay from their checkpoint. Caches rebuild on demand. Sessions expire and users re-authenticate. This ephemerality is a feature, not a limitation. It’s fast because it can afford to lose data.

Why three is better than one

The operational overhead of three databases is real. Three backup strategies. Three monitoring dashboards. Three failure modes to diagnose.

But one database for everything creates a different kind of overhead. PostgreSQL with JSONB columns for AI data works until you need aggregation pipelines that JSONB can’t express. MongoDB for workflows works until you need cascading deletes. Redis for everything works until you need durability.

Each database does one thing well, and we let it do that thing. PostgreSQL enforces structure on data that benefits from structure. MongoDB accommodates flexibility in data that needs to evolve. Redis provides speed for data that doesn’t need to persist.

The Mongoose lesson generalizes beyond MongoDB. Every database has an intended use pattern. When you fight that pattern – adding an ORM to a schemaless database, or using JSONB to avoid schema changes in a relational database – you get the worst of both worlds.

85 migrations isn’t a problem. It’s PostgreSQL working as designed. 31+ collections without migration files isn’t chaos. It’s MongoDB working as designed. Three Redis deployments with no backup strategy isn’t careless. It’s Redis working as designed.

Three databases. Three problems. Three correct answers.

Alexey Suvorov

CTO, AIWAYZ

10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.
