
The Infrastructure Nobody Sees

Auth services, message queues, three deployment regions, and the shared layer that keeps three products running.

By Alexey Suvorov · 5 min read

Three products. One login. One billing system. One message queue connecting them all.

From the outside, Autopilot, Dashboard v1, and Dashboard v2 look like independent applications. Different interfaces, different use cases, different technology stacks. But underneath, they share a layer of infrastructure that took longer to build than any single feature we’ve shipped. This is the story of that invisible layer – the services, contracts, and deployment patterns that keep three products running as one platform.

The shared services problem

When we built Autopilot in 2021, infrastructure was simple. PostgreSQL for data, BullMQ for job processing, Stripe for billing, JWT for auth. Everything lived in one codebase. One deployment. One set of credentials.

Then we built Dashboard v1 in February 2023. Suddenly we had a second product with its own users, its own data store (MongoDB instead of PostgreSQL), and its own deployment pipeline. The question hit immediately: do users create separate accounts for each product?

The answer was obviously no. But “obviously no” turned into three months of extracting auth into its own service.

auth-api became our first shared service. A centralized JWT authentication layer that issues and validates tokens across all three products. When a user logs into Autopilot, the token works on Dashboard v1. When they sign up through Dashboard v2, their identity exists everywhere. One service, one user table, one source of truth.

It sounds clean when described in a paragraph. In practice, it meant rewriting auth flows in two existing products, migrating user records, handling edge cases where the same email existed in both systems with different passwords, and building a token refresh mechanism that worked across different frontend architectures (React with Apollo in Autopilot, React with TanStack Query in Dashboard v2).
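For a sense of what "one source of truth" means in practice, here is a minimal sketch of the product-side check, assuming auth-api signs tokens with an RS256 key pair and each product verifies them locally against the shared public key. The middleware shape, claim names, and environment variables are illustrative, not our actual implementation.

```typescript
// Minimal sketch: an Express-style middleware any product could mount to
// accept tokens issued by a central auth service. Assumes RS256 signing,
// with the auth service holding the private key and each product holding
// the public key. Names and claims are illustrative.
import { Request, Response, NextFunction } from "express";
import jwt from "jsonwebtoken";

const AUTH_PUBLIC_KEY = process.env.AUTH_PUBLIC_KEY!; // distributed to every product

export interface AuthenticatedRequest extends Request {
  user?: { id: string; email: string };
}

export function requireAuth(req: AuthenticatedRequest, res: Response, next: NextFunction) {
  const header = req.headers.authorization ?? "";
  const token = header.startsWith("Bearer ") ? header.slice(7) : null;
  if (!token) return res.status(401).json({ error: "missing token" });

  try {
    // The same token verifies in Autopilot, Dashboard v1, and Dashboard v2,
    // because all three trust the same issuer and public key.
    const payload = jwt.verify(token, AUTH_PUBLIC_KEY, {
      algorithms: ["RS256"],
      issuer: "auth-api", // illustrative issuer claim
    }) as { sub: string; email: string };
    req.user = { id: payload.sub, email: payload.email };
    next();
  } catch {
    res.status(401).json({ error: "invalid or expired token" });
  }
}
```

The same guard can be mounted in any of the three products; only the public key and the issuer claim need to be shared.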

Billing as a shared concern

billing-api followed the same trajectory. Direct Stripe integration in each product was the fast path. Shared billing through a centralized service was the correct path.

The billing service handles subscriptions, credits, and payment method management for all three products. It communicates with individual services through RabbitMQ – when a user purchases credits in Dashboard v2, a message flows to the billing service, which processes the Stripe charge, updates the credit balance, and publishes a confirmation event. The originating service picks up that confirmation and updates its local state.

This architecture means no product directly touches Stripe. They all go through billing-api. That indirection cost us velocity in the early days – adding a new pricing tier meant coordinating changes across three repositories. But it paid off when we migrated from subscriptions to credits. One service changed. Every product got the new billing model.
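The shape of that message exchange, seen from a product's point of view, looks roughly like the sketch below, using amqplib. Exchange names, routing keys, and the JSON payload are illustrative; the real message contracts are Protobuf-defined, as described further down.

```typescript
// Sketch of the credit-purchase flow from a product's point of view, using
// amqplib. Exchange names, routing keys, and the payload shape are illustrative.
import * as amqp from "amqplib";

// Hypothetical local-state helper standing in for the product's own persistence.
function updateLocalCreditBalance(userId: string, newBalance: number): void {
  console.log(`credits for ${userId} -> ${newBalance}`);
}

async function start() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL!);
  const ch = await conn.createChannel();
  await ch.assertExchange("billing", "topic", { durable: true });

  // 1. Publish a purchase request; the product never talks to Stripe directly.
  ch.publish(
    "billing",
    "billing.credits.purchase",
    Buffer.from(JSON.stringify({ userId: "user-123", amount: 500 })),
    { persistent: true }
  );

  // 2. Consume confirmation events that billing-api publishes after the
  //    Stripe charge succeeds, and update local state.
  const { queue } = await ch.assertQueue("dashboard-v2.billing-events", { durable: true });
  await ch.bindQueue(queue, "billing", "billing.credits.confirmed");
  await ch.consume(queue, (msg) => {
    if (!msg) return;
    const event = JSON.parse(msg.content.toString());
    updateLocalCreditBalance(event.userId, event.newBalance);
    ch.ack(msg); // acknowledge only after local state is updated
  });
}

start().catch(console.error);
```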

The LikeClaw variant broke this pattern intentionally. For a standalone deployment that doesn’t depend on external services, we built local Stripe billing with inline price_data instead of pre-created Stripe products. That was a deliberate trade-off: simplicity for a single-product deployment versus consistency across the multi-product platform.
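For reference, inline price_data with the Stripe Node SDK looks roughly like this; the amounts, product names, and URLs are placeholders rather than our actual configuration.

```typescript
// Sketch of a Checkout Session built with inline price_data, so the
// standalone deployment needs no pre-created Stripe Products or Prices.
// All values are placeholders.
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

async function createCreditCheckout(customerId: string, credits: number) {
  return stripe.checkout.sessions.create({
    mode: "payment",
    customer: customerId,
    line_items: [
      {
        quantity: 1,
        price_data: {
          currency: "usd",
          unit_amount: credits * 100, // placeholder: 1 credit = $1.00
          product_data: { name: `${credits} credits` },
        },
      },
    ],
    success_url: "https://example.com/billing/success",
    cancel_url: "https://example.com/billing/cancel",
  });
}
```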

Why RabbitMQ, not HTTP

Early in Dashboard v1 development, we tried direct HTTP calls between services. auth-api called billing-api to verify subscription status. billing-api called auth-api to look up user details. It worked until it didn’t.

The first outage taught us the lesson. billing-api went down during a deployment. auth-api’s requests to verify subscriptions started timing out. Because auth depended on billing responses to complete login flows, users couldn’t log in. A billing deployment took down authentication. That’s temporal coupling, and it’s poison.

RabbitMQ solved this by decoupling communication into asynchronous messages. We run five handler types across the platform:

  • wizard: Handles multi-step onboarding flows that span services
  • approval: Manages skill and agent approval workflows
  • users: Synchronizes user state changes across products
  • organizations: Propagates organization membership and role changes
  • management: Administrative operations like bulk credit grants

Each handler type has its own exchange and queue configuration. Messages are durable – they survive broker restarts. Consumers acknowledge messages only after successful processing. Failed messages get retried with exponential backoff before hitting a dead letter queue for manual inspection.
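The wiring for one handler type looks roughly like the sketch below, again with amqplib. Queue names and the event shape are illustrative, and the retry-with-backoff step is omitted for brevity.

```typescript
// Sketch of one handler type's topology: durable exchange and queue, manual
// acks, and a dead letter exchange for messages that exhaust their retries.
// Names and the event shape are illustrative.
import * as amqp from "amqplib";

// Hypothetical handler for a user state change event.
async function handleUserEvent(event: { userId: string; change: string }) {
  console.log("applying user change", event);
}

async function setupUsersHandler() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL!);
  const ch = await conn.createChannel();

  await ch.assertExchange("users", "topic", { durable: true });
  await ch.assertExchange("users.dlx", "topic", { durable: true });

  // Rejected messages route to the dead letter exchange for manual inspection.
  await ch.assertQueue("users.sync", {
    durable: true,
    deadLetterExchange: "users.dlx",
  });
  await ch.assertQueue("users.sync.dead", { durable: true });
  await ch.bindQueue("users.sync", "users", "users.#");
  await ch.bindQueue("users.sync.dead", "users.dlx", "#");

  await ch.consume("users.sync", async (msg) => {
    if (!msg) return;
    try {
      await handleUserEvent(JSON.parse(msg.content.toString()));
      ch.ack(msg); // acknowledge only after successful processing
    } catch {
      // requeue: false sends the message to the dead letter exchange;
      // the exponential backoff retries before this point are omitted here.
      ch.nack(msg, false, false);
    }
  });
}

setupUsersHandler().catch(console.error);
```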

The overhead is real. RabbitMQ adds operational complexity: broker management, queue monitoring, message format versioning. But the alternative – a web of synchronous HTTP calls between services where any single failure cascades – is worse.

Protobuf contracts and bounded contexts

With three products, two databases (PostgreSQL and MongoDB), and five shared services, we needed a way to keep everyone speaking the same language. Protobuf gave us that.

We define four bounded contexts in our Protobuf contracts: auth, billing, dashboard, and workers. Each context has its own .proto files that specify the exact shape of messages flowing between services. When auth-api publishes a user-created event, the schema is defined in Protobuf. When billing-api expects a credit-purchase request, the fields and types are defined in Protobuf.

gRPC handles the inter-service calls that do need to be synchronous (real-time balance checks, token validation). Protobuf handles the serialization for both gRPC calls and RabbitMQ messages. One schema definition, two communication patterns.
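As a rough illustration of that split, the sketch below loads a contract from the auth context with protobufjs and uses it to encode a RabbitMQ payload; the same .proto also backs the gRPC services. The auth.UserCreated type, its fields, and the file path are illustrative, not our actual contract.

```typescript
// Sketch: load a message type from the auth bounded context and use it to
// encode a RabbitMQ payload. "contracts/auth/auth.proto" and the UserCreated
// fields are illustrative.
import * as protobuf from "protobufjs";
import * as amqp from "amqplib";

async function publishUserCreated(userId: string, email: string) {
  const root = await protobuf.load("contracts/auth/auth.proto");
  const UserCreated = root.lookupType("auth.UserCreated");

  const payload = { userId, email };
  const error = UserCreated.verify(payload);
  if (error) throw new Error(`contract violation: ${error}`);

  // Consumers decode with the same .proto and reject anything that does
  // not match the schema.
  const bytes = UserCreated.encode(UserCreated.create(payload)).finish();

  const conn = await amqp.connect(process.env.RABBITMQ_URL!);
  const ch = await conn.createChannel();
  await ch.assertExchange("users", "topic", { durable: true });
  ch.publish("users", "users.created", Buffer.from(bytes), { persistent: true });
}
```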

The discipline this imposes is valuable. You can’t quietly add a field to a message and hope all consumers handle it. The contract is explicit. Breaking changes require version bumps. Backward compatibility isn’t a suggestion – it’s enforced by the compiler.

The Russia requirement

The deployment architecture was multi-region from day one, and not by choice.

Russian data residency regulations require that certain categories of personal data for Russian citizens be stored on servers physically located in Russia. We had Russian users from the start. Compliance wasn’t optional.

So we built three deployment environments per product: staging for testing, production-global for most users, and production-Russia for users subject to Russian data residency requirements. That’s nine deployment targets across the three products, all managed through Pulumi infrastructure-as-code on Google Kubernetes Engine.

The application code is identical across regions. Same Docker multi-stage builds, same Kubernetes manifests, same Helm charts. What differs is the infrastructure configuration: database connection strings, storage bucket locations, RabbitMQ broker endpoints. Pulumi parameterizes these per environment, so a single pulumi up deploys the right configuration to the right region.
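In Pulumi terms, the parameterization looks roughly like this; the config keys, secrets, and cluster settings are illustrative, and each stack (staging, production-global, production-Russia) carries its own values.

```typescript
// Sketch of per-environment parameterization in Pulumi (TypeScript).
// Config keys and values are illustrative; each stack supplies its own.
import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const config = new pulumi.Config();
const region = config.require("gcpRegion"); // e.g. a European region vs. a Russian one
const databaseUrl = config.requireSecret("databaseUrl");
const rabbitmqUrl = config.requireSecret("rabbitmqUrl");
// databaseUrl and rabbitmqUrl would flow into Kubernetes secrets and Helm
// values for the services (omitted here).

// The cluster definition is identical everywhere; only the location differs.
const cluster = new gcp.container.Cluster("platform", {
  location: region,
  initialNodeCount: 3,
});

export const kubeEndpoint = cluster.endpoint;
```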

GKE handles container orchestration. Each service runs as a Kubernetes deployment with horizontal pod autoscaling. Pino handles structured logging. OpenTracing handles distributed tracing so we can follow a request from the frontend through auth-api, into RabbitMQ, through billing-api, and back.
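On the logging side, a small sketch of the idea, assuming Pino with per-service child loggers so individual requests can be correlated across services; the bound fields are illustrative.

```typescript
// Sketch of structured logging with Pino: one base logger per process, and
// child loggers that bind service and region so log lines from different
// services can be filtered and correlated. Field names are illustrative.
import pino from "pino";

const baseLogger = pino({ level: process.env.LOG_LEVEL ?? "info" });
const logger = baseLogger.child({ service: "auth-api", region: process.env.REGION });

export function logRequest(requestId: string, route: string, durationMs: number) {
  // Every line is JSON, so log aggregation can filter by service, region,
  // or requestId without regex parsing.
  logger.info({ requestId, route, durationMs }, "request completed");
}
```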

Observability: Langfuse and the cost of AI

When your product runs 30+ AI models, token costs become a first-class infrastructure concern. Langfuse gives us that visibility.

Every AI request – whether it’s a chat message routed through the LLM proxy, a background task running in an E2B sandbox, or an evaluation run scored by Claude Sonnet 4.5 – gets traced in Langfuse. We track input tokens, output tokens, model used, latency, and cost per request.
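The instrumentation looks roughly like the sketch below, using the Langfuse TypeScript SDK; exact field names vary between SDK versions, and the proxy client here is hypothetical.

```typescript
// Rough sketch of per-request cost tracing with the Langfuse TypeScript SDK.
// Exact usage field names may differ between SDK versions; treat this as the
// shape of what gets recorded (model, tokens, output), not a drop-in snippet.
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
  secretKey: process.env.LANGFUSE_SECRET_KEY!,
});

// Hypothetical LLM proxy client returning text plus token counts.
async function callLlmProxy(prompt: string) {
  return { text: `echo: ${prompt}`, inputTokens: 12, outputTokens: 5 };
}

async function tracedChatCompletion(userId: string, prompt: string) {
  const trace = langfuse.trace({ name: "chat-message", userId });
  const generation = trace.generation({
    name: "llm-proxy-call",
    model: "claude-sonnet-4.5", // illustrative model identifier
    input: prompt,
  });

  const { text, inputTokens, outputTokens } = await callLlmProxy(prompt);

  generation.end({
    output: text,
    usage: { promptTokens: inputTokens, completionTokens: outputTokens },
  });
  await langfuse.shutdownAsync(); // flush buffered events before exit
  return text;
}
```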

This isn’t just accounting. It’s product insight. We discovered that certain agents consistently generated 3x more output tokens than others because their system prompts encouraged verbose responses. We found that DeepSeek R1’s reasoning tokens were inflating costs for simple queries. We identified that background task chains could accumulate significant token costs when loop detection didn’t catch recursive patterns early enough.

Without per-request cost tracking, we’d be flying blind on the economics of our own product. Langfuse turned token usage from a monthly surprise into a real-time signal.

The lesson nobody teaches

Infrastructure work doesn’t demo well. You can’t show a stakeholder a RabbitMQ exchange configuration and get the same reaction as a new chat feature. Protobuf contracts don’t screenshot nicely. Kubernetes manifests aren’t exciting.

But here’s what we’ve learned after four years: the invisible layer determines the ceiling of everything visible. Every feature we’ve shipped – 40+ agents, sandboxed execution, skills marketplace, multi-model chat – runs on infrastructure that was built without fanfare, without demos, without anyone outside the team knowing it existed.

When we shipped Dashboard v2 in 88 days, the speed wasn’t just about writing NestJS controllers and React components. It was about having auth, billing, messaging, storage, and observability already solved. The rewrite was fast because the foundation was already there.

If we were starting over, we’d build the shared layer first. Not because it’s exciting. Because everything else depends on it. The infrastructure nobody sees is the infrastructure that makes everything else possible.

Alexey Suvorov

CTO, AIWAYZ

10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.

