Thirty Models, One Interface
How we went from GPT-3.5 to 30+ AI models across four providers without building 30 different integrations.

February 3, 2023. Our first AI API call was to OpenAI’s GPT-3.5. One model, one provider, one endpoint. We hardcoded the model name in the request body and called it done.
Three years later, we support 30+ models from four providers, stream responses via server-sent events, count tokens with model-specific tokenizers, and price each model differently in a pay-as-you-go credit system. None of this was in the original plan.
This is the story of how multi-model support went from an afterthought to the core of our platform.
One model was enough – for about two months
The first version of our dashboard was a news aggregator that happened to use GPT-3.5 for text processing. By April 2023, we’d added a chat feature. Still GPT-3.5. Still one endpoint. The integration code was maybe 40 lines.
Then OpenAI released GPT-4. Users immediately asked for it. Fair enough – we added a second model option. The code was still simple: an if/else that swapped the model name in the API payload. Two models, same provider, same auth token, same response format.
That’s when we made the architectural decision that would either save us or haunt us: we wrapped the OpenAI client in an abstraction layer. Not because we had grand multi-provider ambitions. Because we didn’t want model names scattered across the codebase. A small act of hygiene that turned out to be the most consequential refactor of the year.
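In spirit, that first wrapper was nothing more than a tiny interface with a single implementation behind it. Here's a minimal sketch of the idea, written against today's openai SDK for illustration – the names are hypothetical, not our actual code:

```typescript
// Hypothetical sketch of the early wrapper; not the actual production code.
import OpenAI from "openai";

// The abstraction everything else depends on: send a prompt to a named model, get text back.
interface ChatProvider {
  complete(model: string, prompt: string): Promise<string>;
}

class OpenAIProvider implements ChatProvider {
  private client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  async complete(model: string, prompt: string): Promise<string> {
    const response = await this.client.chat.completions.create({
      // "gpt-3.5-turbo" or "gpt-4": callers pass the model in, never hardcode it
      model,
      messages: [{ role: "user", content: prompt }],
    });
    return response.choices[0]?.message?.content ?? "";
  }
}
```

Everything above the wrapper depends on ChatProvider, not on OpenAI. That's the whole trick.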
The OpenRouter unlock
March 2024. A team member suggested we try OpenRouter. The pitch was straightforward: one API key, one endpoint, access to models from Anthropic, Meta, Mistral, Google, and dozens of smaller providers. The 5% markup on every request seemed reasonable compared to maintaining separate billing accounts and API integrations for each provider.
We integrated OpenRouter in a single afternoon. Because we’d already abstracted the model layer, adding a new provider meant implementing one interface: send a prompt, get a response, count the tokens. The OpenRouter API is intentionally compatible with OpenAI’s format, so even the response parsing was reusable.
Overnight, we went from 3 models to 20+. Mixtral, Llama, DeepSeek, Qwen, Perplexity – all available through the same code path. We didn’t have to negotiate API access or manage separate rate limits. We just added model identifiers to a configuration file.
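For a sense of what that configuration looks like, a catalog entry can be as simple as the sketch below. The field names and identifiers are illustrative, not our actual config:

```typescript
// Hypothetical model catalog entries; field names and identifiers are illustrative.
type ProviderName = "openai" | "anthropic" | "vertex" | "openrouter";

interface ModelEntry {
  id: string;              // the identifier users pick in the dashboard
  provider: ProviderName;  // which integration handles the request
  upstreamId: string;      // the model name that provider's API expects
}

const MODEL_CATALOG: ModelEntry[] = [
  { id: "gpt-4", provider: "openai", upstreamId: "gpt-4" },
  { id: "mixtral-8x7b", provider: "openrouter", upstreamId: "mistralai/mixtral-8x7b-instruct" },
  { id: "llama-3-70b", provider: "openrouter", upstreamId: "meta-llama/llama-3-70b-instruct" },
  { id: "deepseek-r1", provider: "openrouter", upstreamId: "deepseek/deepseek-r1" },
];
```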
The lesson was clear: don’t build provider integrations. Build provider abstractions. The difference is that an integration couples you to one API’s quirks. An abstraction lets you swap providers without touching business logic.
Direct integrations still matter
OpenRouter solved the breadth problem, but not the depth problem. Some providers offer capabilities through their direct APIs that proxy services can’t replicate.
In June 2024, we added Anthropic’s Claude API directly. Claude 2.0, 2.1, then Claude 3 Opus, 3 Sonnet, and 3.5 Sonnet as they launched. By July, we had Claude Vision support – image analysis that required Anthropic’s native multimodal API format, which OpenRouter didn’t fully support at the time.
In April 2025, we added Google’s Vertex AI for direct Gemini Pro and Flash access. Google’s API has its own authentication model (service accounts, not API keys) and its own response format. But because our abstraction layer was solid by this point, the Vertex integration took a day and a half.
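Here's a rough sketch of how a direct integration slots in behind the same interface. It assumes the @google-cloud/vertexai SDK and Application Default Credentials (a service-account key file referenced by GOOGLE_APPLICATION_CREDENTIALS); the adapter shape itself is illustrative:

```typescript
// Illustrative Vertex AI adapter; assumes Application Default Credentials are configured.
import { VertexAI } from "@google-cloud/vertexai";

class VertexProvider {
  private vertex = new VertexAI({
    project: process.env.GCP_PROJECT_ID ?? "",
    location: "us-central1",
  });

  async complete(model: string, prompt: string): Promise<string> {
    const generativeModel = this.vertex.getGenerativeModel({ model });
    const result = await generativeModel.generateContent(prompt);
    // Google's response shape differs from OpenAI's: text lives under candidates/parts.
    return result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
  }
}
```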
The pattern became: use OpenRouter for breadth, use direct integrations for depth. If a provider offers something unique – vision, extended context, specific safety controls – we connect directly. For everything else, OpenRouter handles the routing.
The streaming problem nobody warns you about
Multi-model support sounds clean until you add streaming. We use server-sent events (SSE) for real-time response delivery, so users see tokens appear as they’re generated rather than waiting for the entire response.
Every provider streams differently. OpenAI sends data: {"choices": [{"delta": {"content": "token"}}]}. Anthropic sends event: content_block_delta with a different JSON structure. Google’s streaming format is different again. OpenRouter normalizes most of this, but edge cases leak through – stop reasons, usage statistics, and error formats all vary.
We ended up building a streaming normalizer that sits between the provider response and our SSE output. It accepts any provider’s stream format and emits a consistent event shape to the client. Adding a new provider now means writing one stream adapter, typically under 100 lines.
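A simplified sketch of that normalizer, with abridged versions of the OpenAI and Anthropic chunk shapes and a unified event type invented for the example:

```typescript
// Simplified streaming normalizer sketch; the unified event type is illustrative
// and the provider chunk shapes are abridged.
interface UnifiedStreamEvent {
  type: "token" | "done";
  content?: string;     // text delta, when type === "token"
  stopReason?: string;  // normalized stop reason, when type === "done"
}

// OpenAI-style chunk: {"choices":[{"delta":{"content":"..."},"finish_reason":null}]}
function fromOpenAIChunk(chunk: any): UnifiedStreamEvent {
  const choice = chunk.choices?.[0];
  if (choice?.finish_reason) {
    return { type: "done", stopReason: choice.finish_reason };
  }
  return { type: "token", content: choice?.delta?.content ?? "" };
}

// Anthropic-style events: content_block_delta carries {"delta":{"text":"..."}},
// message_delta carries {"delta":{"stop_reason":"end_turn"}}
function fromAnthropicEvent(event: any): UnifiedStreamEvent {
  if (event.type === "content_block_delta") {
    return { type: "token", content: event.delta?.text ?? "" };
  }
  if (event.type === "message_delta" && event.delta?.stop_reason) {
    return { type: "done", stopReason: event.delta.stop_reason };
  }
  // Other event types (message_start, ping, ...) carry no text for the client.
  return { type: "token", content: "" };
}
```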
Token counting across model families
Here’s a detail that doesn’t come up until you’re building a billing system: the same sentence produces different token counts depending on which model processes it.
GPT-4 uses the cl100k_base tokenizer. Claude uses its own tokenizer. Llama 2 uses a SentencePiece-based tokenizer, while Llama 3 moved to a tiktoken-style BPE. The sentence “Summarize this document in three bullet points” might be 8 tokens on one model and 11 on another.
We use js-tiktoken for OpenAI-compatible counting and gpt-tokens for broader coverage. At the billing layer, we calculate credits based on the actual token count for the specific model used. This means the same prompt costs different amounts depending on the model, which is exactly how the underlying providers charge.
Getting this right took three iterations. The first version estimated tokens from character count (wildly inaccurate). The second used OpenAI’s tokenizer for everything (wrong for non-OpenAI models). The third uses model-specific tokenizers with a fallback to cl100k_base when we don’t have an exact match.
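A condensed sketch of that third iteration using just js-tiktoken (the real version also leans on gpt-tokens); the exact-match table below is a stand-in for the real one:

```typescript
// Sketch of model-specific token counting with a cl100k_base fallback.
import { encodingForModel, getEncoding, type TiktokenModel } from "js-tiktoken";

// Models we can count exactly; the production table is much longer.
// In practice the encodings should be cached rather than rebuilt per call.
const EXACT_MATCH: Record<string, TiktokenModel> = {
  "gpt-3.5-turbo": "gpt-3.5-turbo",
  "gpt-4": "gpt-4",
};

function countTokens(model: string, text: string): number {
  const exact = EXACT_MATCH[model];
  if (exact) {
    // Exact tokenizer for models js-tiktoken knows about.
    return encodingForModel(exact).encode(text).length;
  }
  // Approximation for everything else; counts can drift slightly from
  // what the provider actually bills.
  return getEncoding("cl100k_base").encode(text).length;
}
```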
Subscription tiers were the wrong abstraction
Our first approach to model access was subscription-based gating. Free users got GPT-4o-mini, Perplexity, and Gemini Flash. Starter tier unlocked GPT-4-turbo. Boss Mode unlocked everything – Claude, GPT-4, Gemini Pro, the full OpenRouter catalog.
This created perverse incentives. Users on the free tier who needed Claude for one specific task had to upgrade their entire subscription. Users on Boss Mode who only used GPT-4o-mini were overpaying. The model access tiers didn’t map to actual usage patterns.
When we built Dashboard v2, we replaced subscription gating with per-model credit pricing. Every model has a credit cost per 1,000 tokens, set proportionally to what the underlying provider charges us. Users buy credits and spend them on whichever models they want. GPT-4o-mini costs a fraction of Claude 3.5 Sonnet, which costs a fraction of GPT-5.1.
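For illustration, the billing arithmetic reduces to something like the sketch below; the credit rates are placeholders, not our actual price list:

```typescript
// Simplified per-model credit pricing sketch; the rates are made-up placeholders.
interface ModelPricing {
  inputCreditsPer1k: number;   // credits per 1,000 prompt tokens
  outputCreditsPer1k: number;  // credits per 1,000 completion tokens
}

// Rates are set roughly proportional to what the underlying provider charges us.
const PRICING: Record<string, ModelPricing> = {
  "gpt-4o-mini": { inputCreditsPer1k: 1, outputCreditsPer1k: 4 },
  "claude-3.5-sonnet": { inputCreditsPer1k: 20, outputCreditsPer1k: 100 },
};

function creditsForRequest(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing configured for model: ${model}`);
  return (inputTokens / 1000) * p.inputCreditsPer1k
       + (outputTokens / 1000) * p.outputCreditsPer1k;
}
```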
Per-model credit pricing was the right approach from the start. We just needed two years of subscription headaches to realize it.
The current state: 30+ models, 4 providers, one interface
As of February 2026, our model catalog includes GPT-4, GPT-4o, GPT-5, GPT-5.1, Claude 3.5 Sonnet, Gemini Pro, Gemini Flash, DeepSeek R1, Mixtral, Llama variants, Qwen, and a rotating set of models through OpenRouter. Dashboard v2 runs a native LLM proxy with per-model credit pricing.
Adding a new model takes hours, not weeks. Adding a new provider takes a day or two. The abstraction layer that started as a cleanup refactor in 2023 now handles:
- Provider routing – directing requests to the correct API based on model identifier (sketched after this list)
- Authentication – managing API keys, service accounts, and OAuth tokens per provider
- Streaming normalization – converting provider-specific SSE formats to a unified output
- Token counting – model-specific tokenization for accurate billing
- Error handling – translating provider error codes to consistent user-facing messages
- Rate limiting – respecting per-provider and per-model rate limits
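Pulling a few of those responsibilities together, the routing step looks roughly like the sketch below; the interfaces, registry, and error handling are illustrative assumptions, not production code:

```typescript
// Illustrative provider routing sketch built on the catalog idea shown earlier.
interface ProviderAdapter {
  complete(upstreamModelId: string, prompt: string): Promise<string>;
}

interface CatalogEntry {
  provider: string;    // key into the adapter registry
  upstreamId: string;  // model name the provider's API expects
}

class ModelRouter {
  constructor(
    private adapters: Record<string, ProviderAdapter>,
    private catalog: Record<string, CatalogEntry>,
  ) {}

  async complete(modelId: string, prompt: string): Promise<string> {
    const entry = this.catalog[modelId];
    if (!entry) throw new Error(`Unknown model: ${modelId}`);

    const adapter = this.adapters[entry.provider];
    if (!adapter) throw new Error(`No adapter registered for provider: ${entry.provider}`);

    try {
      return await adapter.complete(entry.upstreamId, prompt);
    } catch {
      // Translate provider-specific failures into one consistent
      // user-facing message here; the real mapping is much longer.
      throw new Error(`Request to ${entry.provider} failed for model ${modelId}`);
    }
  }
}
```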
What multi-model taught us about product strategy
The multi-model strategy wasn’t a product vision. It was a response to user behavior. People who work with AI professionally don’t have loyalty to a single model. They have preferences per task. They want to test a new model the week it launches without waiting for us to build a dedicated integration.
The technical lesson is that abstractions compound. That first wrapper around the OpenAI client, written when we had exactly one provider, made every subsequent provider integration cheaper. By the time we added Vertex AI as our fourth direct integration, the pattern was so well-established that most of the work was reading Google’s API documentation, not writing application code.
The business lesson is that model access is a commodity. Nobody will pay a premium for “we have GPT-4” when OpenAI sells it directly. What they’ll pay for is the layer on top: the agents, the knowledge bases, the workflow automation, the sandboxed execution. The models are the engine. The product is the car.
We started with one model and one API call. We’ll end up wherever the model ecosystem goes next. The abstraction layer doesn’t care.
Alexey Suvorov
CTO, AIWAYZ
10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.