Thirty Models, One Interface
How we went from GPT-3.5 to 30+ AI models across four providers without building 30 different integrations.

February 3, 2023. Our first AI API call was to OpenAI’s GPT-3.5. One model, one provider, one endpoint. We hardcoded the model name in the request body and called it done.
Three years later, we support 30+ models from four providers, stream responses via server-sent events, count tokens with model-specific tokenizers, and price each model differently in a pay-as-you-go credit system. None of this was in the original plan.
This is the story of how multi-model support went from an afterthought to the core of our platform.
One model was enough – for about two months
The first version of our dashboard was a news aggregator that happened to use GPT-3.5 for text processing. By April 2023, we’d added a chat feature. Still GPT-3.5. Still one endpoint. The integration code was maybe 40 lines.
Then OpenAI released GPT-4. Users immediately asked for it. Fair enough – we added a second model option. The code was still simple: an if/else that swapped the model name in the API payload. Two models, same provider, same auth token, same response format.
That’s when we made the architectural decision that would either save us or haunt us: we wrapped the OpenAI client in an abstraction layer. Not because we had grand multi-provider ambitions. Because we didn’t want model names scattered across the codebase. A small act of hygiene that turned out to be the most consequential refactor of the year.
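In spirit, that first wrapper was nothing more than a tiny interface with a single implementation behind it. Here's a minimal sketch of the idea, written against today's openai SDK for illustration – the names are hypothetical, not our actual code:

```typescript
// Hypothetical sketch of the early wrapper; not the actual production code.
import OpenAI from "openai";

// The abstraction everything else depends on: send a prompt to a named model, get text back.
interface ChatProvider {
  complete(model: string, prompt: string): Promise<string>;
}

class OpenAIProvider implements ChatProvider {
  private client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  async complete(model: string, prompt: string): Promise<string> {
    const response = await this.client.chat.completions.create({
      // "gpt-3.5-turbo" or "gpt-4": callers pass the model in, never hardcode it
      model,
      messages: [{ role: "user", content: prompt }],
    });
    return response.choices[0]?.message?.content ?? "";
  }
}
```

Everything above the wrapper depends on ChatProvider, not on OpenAI. That's the whole trick.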
The OpenRouter unlock
March 2024. A team member suggested we try OpenRouter. The pitch was straightforward: one API key, one endpoint, access to models from Anthropic, Meta, Mistral, Google, and dozens of smaller providers. The 5% markup on every request seemed reasonable compared to maintaining separate billing accounts and API integrations for each provider.
We integrated OpenRouter in a single afternoon. Because we’d already abstracted the model layer, adding a new provider meant implementing one interface: send a prompt, get a response, count the tokens. The OpenRouter API is intentionally compatible with OpenAI’s format, so even the response parsing was reusable.
Overnight, we went from 3 models to 20+. Mixtral, Llama, DeepSeek, Qwen, Perplexity – all available through the same code path. We didn’t have to negotiate API access or manage separate rate limits. We just added model identifiers to a configuration file.
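For a sense of what that configuration looks like, a catalog entry can be as simple as the sketch below. The field names and identifiers are illustrative, not our actual config:

```typescript
// Hypothetical model catalog entries; field names and identifiers are illustrative.
type ProviderName = "openai" | "anthropic" | "vertex" | "openrouter";

interface ModelEntry {
  id: string;              // the identifier users pick in the dashboard
  provider: ProviderName;  // which integration handles the request
  upstreamId: string;      // the model name that provider's API expects
}

const MODEL_CATALOG: ModelEntry[] = [
  { id: "gpt-4", provider: "openai", upstreamId: "gpt-4" },
  { id: "mixtral-8x7b", provider: "openrouter", upstreamId: "mistralai/mixtral-8x7b-instruct" },
  { id: "llama-3-70b", provider: "openrouter", upstreamId: "meta-llama/llama-3-70b-instruct" },
  { id: "deepseek-r1", provider: "openrouter", upstreamId: "deepseek/deepseek-r1" },
];
```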
The lesson was clear: don’t build provider integrations. Build provider abstractions. The difference is that an integration couples you to one API’s quirks. An abstraction lets you swap providers without touching business logic.
Direct integrations still matter
OpenRouter solved the breadth problem, but not the depth problem. Some providers offer capabilities through their direct APIs that proxy services can’t replicate.
In June 2024, we added Anthropic’s Claude API directly. Claude 2.0, 2.1, then Claude 3 Opus, 3 Sonnet, and 3.5 Sonnet as they launched. By July, we had Claude Vision support – image analysis that required Anthropic’s native multimodal API format, which OpenRouter didn’t fully support at the time.
In April 2025, we added Google’s Vertex AI for direct Gemini Pro and Flash access. Google’s API has its own authentication model (service accounts, not API keys) and its own response format. But because our abstraction layer was solid by this point, the Vertex integration took a day and a half.
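Here's a rough sketch of how a direct integration slots in behind the same interface. It assumes the @google-cloud/vertexai SDK and Application Default Credentials (a service-account key file referenced by GOOGLE_APPLICATION_CREDENTIALS); the adapter shape itself is illustrative:

```typescript
// Illustrative Vertex AI adapter; assumes Application Default Credentials are configured.
import { VertexAI } from "@google-cloud/vertexai";

class VertexProvider {
  private vertex = new VertexAI({
    project: process.env.GCP_PROJECT_ID ?? "",
    location: "us-central1",
  });

  async complete(model: string, prompt: string): Promise<string> {
    const generativeModel = this.vertex.getGenerativeModel({ model });
    const result = await generativeModel.generateContent(prompt);
    // Google's response shape differs from OpenAI's: text lives under candidates/parts.
    return result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
  }
}
```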
The pattern became: use OpenRouter for breadth, use direct integrations for depth. If a provider offers something unique – vision, extended context, specific safety controls – we connect directly. For everything else, OpenRouter handles the routing.
The streaming problem nobody warns you about
Multi-model support sounds clean until you add streaming. We use server-sent events (SSE) for real-time response delivery, so users see tokens appear as they’re generated rather than waiting for the entire response.
Every provider streams differently. OpenAI sends data: {"choices": [{"delta": {"content": "token"}}]}. Anthropic sends event: content_block_delta with a different JSON structure. Google’s streaming format is different again. OpenRouter normalizes most of this, but edge cases leak through – stop reasons, usage statistics, and error formats all vary.
We ended up building a streaming normalizer that sits between the provider response and our SSE output. It accepts any provider’s stream format and emits a consistent event shape to the client. Adding a new provider now means writing one stream adapter, typically under 100 lines.
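A simplified sketch of that normalizer, with abridged versions of the OpenAI and Anthropic chunk shapes and a unified event type invented for the example:

```typescript
// Simplified streaming normalizer sketch; the unified event type is illustrative
// and the provider chunk shapes are abridged.
interface UnifiedStreamEvent {
  type: "token" | "done";
  content?: string;     // text delta, when type === "token"
  stopReason?: string;  // normalized stop reason, when type === "done"
}

// OpenAI-style chunk: {"choices":[{"delta":{"content":"..."},"finish_reason":null}]}
function fromOpenAIChunk(chunk: any): UnifiedStreamEvent {
  const choice = chunk.choices?.[0];
  if (choice?.finish_reason) {
    return { type: "done", stopReason: choice.finish_reason };
  }
  return { type: "token", content: choice?.delta?.content ?? "" };
}

// Anthropic-style events: content_block_delta carries {"delta":{"text":"..."}},
// message_delta carries {"delta":{"stop_reason":"end_turn"}}
function fromAnthropicEvent(event: any): UnifiedStreamEvent {
  if (event.type === "content_block_delta") {
    return { type: "token", content: event.delta?.text ?? "" };
  }
  if (event.type === "message_delta" && event.delta?.stop_reason) {
    return { type: "done", stopReason: event.delta.stop_reason };
  }
  // Other event types (message_start, ping, ...) carry no text for the client.
  return { type: "token", content: "" };
}
```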
Token counting across model families
Here’s a detail that doesn’t come up until you’re building a billing system: the same sentence produces different token counts depending on which model processes it.
GPT-4 uses the cl100k_base tokenizer. Claude uses its own tokenizer. Llama 2 uses a SentencePiece-based tokenizer, while Llama 3 moved to a tiktoken-style BPE. The sentence “Summarize this document in three bullet points” might be 8 tokens on one model and 11 on another.
We use js-tiktoken for OpenAI-compatible counting and gpt-tokens for broader coverage. At the billing layer, we calculate credits based on the actual token count for the specific model used. This means the same prompt costs different amounts depending on the model, which is exactly how the underlying providers charge.
Getting this right took three iterations. The first version estimated tokens from character count (wildly inaccurate). The second used OpenAI’s tokenizer for everything (wrong for non-OpenAI models). The third uses model-specific tokenizers with a fallback to cl100k_base when we don’t have an exact match.
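A condensed sketch of that third iteration using just js-tiktoken (the real version also leans on gpt-tokens); the exact-match table below is a stand-in for the real one:

```typescript
// Sketch of model-specific token counting with a cl100k_base fallback.
import { encodingForModel, getEncoding, type TiktokenModel } from "js-tiktoken";

// Models we can count exactly; the production table is much longer.
// In practice the encodings should be cached rather than rebuilt per call.
const EXACT_MATCH: Record<string, TiktokenModel> = {
  "gpt-3.5-turbo": "gpt-3.5-turbo",
  "gpt-4": "gpt-4",
};

function countTokens(model: string, text: string): number {
  const exact = EXACT_MATCH[model];
  if (exact) {
    // Exact tokenizer for models js-tiktoken knows about.
    return encodingForModel(exact).encode(text).length;
  }
  // Approximation for everything else; counts can drift slightly from
  // what the provider actually bills.
  return getEncoding("cl100k_base").encode(text).length;
}
```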
Subscription tiers were the wrong abstraction
Our first approach to model access was subscription-based gating. Free users got GPT-4o-mini, Perplexity, and Gemini Flash. Starter tier unlocked GPT-4-turbo. Boss Mode unlocked everything – Claude, GPT-4, Gemini Pro, the full OpenRouter catalog.
This created perverse incentives. Users on the free tier who needed Claude for one specific task had to upgrade their entire subscription. Users on Boss Mode who only used GPT-4o-mini were overpaying. The model access tiers didn’t map to actual usage patterns.
When we built Dashboard v2, we replaced subscription gating with per-model credit pricing. Every model has a credit cost per 1,000 tokens, set proportionally to what the underlying provider charges us. Users buy credits and spend them on whichever models they want. GPT-4o-mini costs a fraction of Claude 3.5 Sonnet, which costs a fraction of GPT-5.1.
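For illustration, the billing arithmetic reduces to something like the sketch below; the credit rates are placeholders, not our actual price list:

```typescript
// Simplified per-model credit pricing sketch; the rates are made-up placeholders.
interface ModelPricing {
  inputCreditsPer1k: number;   // credits per 1,000 prompt tokens
  outputCreditsPer1k: number;  // credits per 1,000 completion tokens
}

// Rates are set roughly proportional to what the underlying provider charges us.
const PRICING: Record<string, ModelPricing> = {
  "gpt-4o-mini": { inputCreditsPer1k: 1, outputCreditsPer1k: 4 },
  "claude-3.5-sonnet": { inputCreditsPer1k: 20, outputCreditsPer1k: 100 },
};

function creditsForRequest(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing configured for model: ${model}`);
  return (inputTokens / 1000) * p.inputCreditsPer1k
       + (outputTokens / 1000) * p.outputCreditsPer1k;
}
```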
Per-model credit pricing was the right approach from the start. We just needed two years of subscription headaches to realize it.
The current state: 30+ models, 4 providers, one interface
As of February 2026, our model catalog includes GPT-4, GPT-4o, GPT-5, GPT-5.1, Claude 3.5 Sonnet, Gemini Pro, Gemini Flash, DeepSeek R1, Mixtral, Llama variants, Qwen, and a rotating set of models through OpenRouter. Dashboard v2 runs a native LLM proxy with per-model credit pricing.
Adding a new model takes hours, not weeks. Adding a new provider takes a day or two. The abstraction layer that started as a cleanup refactor in 2023 now handles:
- Provider routing – directing requests to the correct API based on model identifier (sketched after this list)
- Authentication – managing API keys, service accounts, and OAuth tokens per provider
- Streaming normalization – converting provider-specific SSE formats to a unified output
- Token counting – model-specific tokenization for accurate billing
- Error handling – translating provider error codes to consistent user-facing messages
- Rate limiting – respecting per-provider and per-model rate limits
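Pulling a few of those responsibilities together, the routing step looks roughly like the sketch below; the interfaces, registry, and error handling are illustrative assumptions, not production code:

```typescript
// Illustrative provider routing sketch built on the catalog idea shown earlier.
interface ProviderAdapter {
  complete(upstreamModelId: string, prompt: string): Promise<string>;
}

interface CatalogEntry {
  provider: string;    // key into the adapter registry
  upstreamId: string;  // model name the provider's API expects
}

class ModelRouter {
  constructor(
    private adapters: Record<string, ProviderAdapter>,
    private catalog: Record<string, CatalogEntry>,
  ) {}

  async complete(modelId: string, prompt: string): Promise<string> {
    const entry = this.catalog[modelId];
    if (!entry) throw new Error(`Unknown model: ${modelId}`);

    const adapter = this.adapters[entry.provider];
    if (!adapter) throw new Error(`No adapter registered for provider: ${entry.provider}`);

    try {
      return await adapter.complete(entry.upstreamId, prompt);
    } catch {
      // Translate provider-specific failures into one consistent
      // user-facing message here; the real mapping is much longer.
      throw new Error(`Request to ${entry.provider} failed for model ${modelId}`);
    }
  }
}
```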
What multi-model taught us about product strategy
The multi-model strategy wasn’t a product vision. It was a response to user behavior. People who work with AI professionally don’t have loyalty to a single model. They have preferences per task. They want to test a new model the week it launches without waiting for us to build a dedicated integration.
The technical lesson is that abstractions compound. That first wrapper around the OpenAI client, written when we had exactly one provider, made every subsequent provider integration cheaper. By the time we added Vertex AI as our fourth direct integration, the pattern was so well-established that most of the work was reading Google’s API documentation, not writing application code.
The business lesson is that model access is a commodity. Nobody will pay a premium for “we have GPT-4” when OpenAI sells it directly. What they’ll pay for is the layer on top: the agents, the knowledge bases, the workflow automation, the sandboxed execution. The models are the engine. The product is the car.
We started with one model and one API call. We’ll end up wherever the model ecosystem goes next. The abstraction layer doesn’t care.
Alexey Suvorov
CTO, AIWAYZ
10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.