The Knowledge Base We Rebuilt Twice
From Pinecone vectors to a SmartFAQ proxy to a VFS with OCR – two and a half years of evolving our RAG system.

August 2023. We shipped our first knowledge base feature with Pinecone as the vector store, LangChain for embeddings, and a document processing pipeline that could ingest PDFs, DOCX files, and scraped web pages. It worked. Users uploaded documents, asked questions, and got answers grounded in their data.
Two and a half years later, we’re on our third architecture. Pinecone is gone. LangChain is gone. The vector database approach itself is gone. Each rebuild simplified the system while adding capabilities the previous version couldn’t support.
This is the story of three knowledge base architectures and what each one taught us about retrieval-augmented generation.
Architecture 1: Pinecone and LangChain (August 2023)
The first implementation followed the 2023 RAG playbook almost exactly. Everyone building knowledge bases in mid-2023 used roughly the same stack, and we were no exception.
Document ingestion. Users could create knowledge base entries from three sources: raw text input, file uploads (PDF via pdf-parse, DOCX via mammoth), or web scraping. Each document went through a processing pipeline that extracted text and prepared it for vectorization.
Text splitting. LangChain’s text splitter chunked documents into overlapping segments. The chunk size and overlap were configurable, but we settled on 1,000 tokens with 200-token overlap as reasonable defaults. Smaller chunks improved precision but lost context. Larger chunks preserved context but retrieved too much irrelevant text.
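For reference, a minimal sketch of that chunking step, assuming LangChain's RecursiveCharacterTextSplitter in TypeScript. The function name is illustrative, and note that this splitter counts characters by default, whereas our production config measured tokens:

```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // target segment size (characters here; we measured tokens in production)
  chunkOverlap: 200, // shared context between adjacent segments
});

export async function chunkDocument(text: string): Promise<string[]> {
  // Returns overlapping text segments ready for embedding.
  return splitter.splitText(text);
}
```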
Embeddings and storage. OpenAI’s embedding model (ada-002 at the time) converted each chunk into a 1,536-dimensional vector. Those vectors went to Pinecone’s managed index. Each vector stored the original text as metadata alongside its embedding.
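A hedged sketch of the embed-and-store step, assuming the official openai and @pinecone-database/pinecone Node clients; the index name and helper are illustrative, not our production code:

```typescript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("knowledge-base"); // hypothetical index name

export async function storeChunk(docId: string, chunkNo: number, text: string) {
  const res = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: text,
  });

  await index.upsert([
    {
      id: `${docId}-${chunkNo}`,
      values: res.data[0].embedding, // the 1,536-dimensional vector
      metadata: { text },            // original chunk text kept alongside the vector
    },
  ]);
}
```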
Retrieval. When a user asked a question, we embedded the query, searched Pinecone for the k most similar chunks (typically k=5), and injected those chunks into the prompt as context. The AI then answered based on the provided context rather than its training data.
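And the matching retrieval step, reusing the clients from the sketch above, with k=5 as described and the retrieved chunks concatenated into a context block for the prompt:

```typescript
// Reuses the `openai` client and Pinecone `index` defined in the previous sketch.
export async function retrieveContext(question: string, k = 5): Promise<string> {
  const embedded = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: question,
  });

  const results = await index.query({
    vector: embedded.data[0].embedding,
    topK: k,
    includeMetadata: true,
  });

  // Concatenate the retrieved chunks into a single context block.
  return results.matches
    .map((m) => m.metadata?.text as string)
    .join("\n\n---\n\n");
}
```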
It worked. For straightforward Q&A over document collections, the retrieval quality was acceptable. Users could upload a company handbook and ask “What’s the vacation policy?” and get the right answer.
But the problems accumulated.
What went wrong with v1
Cost scaling. Pinecone charges per vector stored and per query. Small knowledge bases were cheap. Enterprise users with thousands of documents generated bills that were hard to predict and harder to justify. One client uploaded a 500-page technical manual, which split into roughly 2,500 chunks, each stored as a vector. The storage cost was manageable, but the query costs during heavy usage periods were not.
Vendor lock-in. Our self-hosted deployment promise meant that every external dependency was a liability. Pinecone is a cloud service. You can’t run it on-premise. For clients who chose our platform specifically because they wanted data sovereignty, sending their document embeddings to a third-party vector database undermined the value proposition.
Retrieval quality. Semantic search sounds good in theory. In practice, embedding-based retrieval has well-documented failure modes. It struggles with negation (“What are the policies that DON’T apply to contractors?”), precise numerical queries (“What was revenue in Q3 2022?”), and multi-hop reasoning (“Compare the vacation policy for full-time employees in the US office with part-time employees in the London office”).
Operational complexity. Pinecone added another service to monitor, another authentication system to manage, another failure mode to handle. When Pinecone had an outage, our knowledge base was down even if everything else was healthy.
By early 2024, the cost and complexity arguments were strong enough to motivate a change.
Architecture 2: SmartFAQ proxy (May 2024)
The second architecture replaced Pinecone with SmartFAQ, an internal proxy service that handled vectorization and retrieval without the external dependency.
SmartFAQ was simpler by design. Instead of a managed vector database with its own indexing, scaling, and query optimization, we built a lightweight service that:
- Accepted documents and generated embeddings using the same OpenAI embedding model
- Stored vectors alongside document metadata in a format we controlled
- Performed similarity search using a straightforward cosine similarity calculation
- Returned ranked results with relevance scores
The retrieval quality was comparable to Pinecone for our use cases. Our largest knowledge bases had tens of thousands of chunks, not billions. At that scale, brute-force cosine similarity is fast enough.
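For illustration, a brute-force search along those lines is only a few dozen lines of TypeScript; the types and names below are assumptions for the sketch, not SmartFAQ's actual interface:

```typescript
interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
  tags: string[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and return the top k.
export function search(query: number[], chunks: StoredChunk[], k = 5): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```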
The migration added features Pinecone didn’t support natively. Tag filtering for restricting searches to specific categories. Pagination for browsing contents rather than only searching. Organization-level KBs for shared knowledge bases with proper access controls.
SmartFAQ removed the vendor dependency, reduced costs, and gave us flexibility. But it was still a vector search system with the same retrieval quality limitations.
The problem with thinking in vectors
Both v1 and v2 shared a conceptual flaw: they treated knowledge management as a search problem. Upload documents, chunk them, embed them, search them. The user’s interaction model was “ask a question, get relevant chunks.”
Real knowledge management is more nuanced. Users don’t just search their documents. They organize them into folders. They update documents and want the old version available. They share specific documents with specific colleagues. They need to know when a document was last modified and by whom. They want to browse, not just query.
Vector databases are excellent at semantic search. They’re terrible at everything else. You can’t version a vector. You can’t organize embeddings into folders. You can’t share a specific chunk with a colleague while restricting access to the rest of the document.
The realization that we were solving the wrong problem drove the third architecture.
Architecture 3: Virtual File System (November 2025)
Dashboard v2’s knowledge management doesn’t use vectors at all. It uses a virtual file system backed by MongoDB.
The VFS treats documents as files. They live in folders. They have names, creation dates, modification timestamps, and access permissions. Users interact with their knowledge base the way they interact with a file system – because that’s the mental model everyone already has.
Document storage. Files are stored in MongoDB with their full content, metadata, and organizational hierarchy. No chunking. No embedding. The raw document is preserved exactly as uploaded.
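Roughly, a VFS record looks like the following; the field names are illustrative assumptions rather than the production schema:

```typescript
// Illustrative shape of a VFS file record as stored in MongoDB.
interface VfsFile {
  _id: string;
  orgId: string;           // organization-level access scoping
  folderId: string | null; // parent folder in the hierarchy; null = root
  name: string;
  mimeType: string;
  content: string;         // full extracted text, no chunking
  createdAt: Date;
  updatedAt: Date;
  permissions: {
    ownerId: string;
    sharedWith: string[];  // user ids with read access
  };
}
```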
Retrieval. When an AI agent needs context from the knowledge base, it uses document content directly. With modern context windows – Claude handles 200,000 tokens, Gemini over a million – the “chunk and embed” approach from 2023 is often unnecessary in 2026. For knowledge bases exceeding the context window, we use metadata filtering and AI-driven relevance assessment. The model reads document summaries and selects which full documents to load.
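A simplified sketch of that summary-first selection, with the helper signature and prompt wording as assumptions:

```typescript
interface FileSummary {
  id: string;
  name: string;
  summary: string;
}

// The model sees only names and summaries, returns the ids it wants,
// and the caller then loads those documents in full.
async function selectRelevantFiles(
  question: string,
  summaries: FileSummary[],
  askModel: (prompt: string) => Promise<string>, // any LLM call returning JSON text
): Promise<string[]> {
  const catalog = summaries
    .map((s) => `- ${s.id}: ${s.name}: ${s.summary}`)
    .join("\n");

  const answer = await askModel(
    `Question: ${question}\n\nAvailable documents:\n${catalog}\n\n` +
      `Return a JSON array of the ids of the documents needed to answer the question.`,
  );

  return JSON.parse(answer) as string[];
}
```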
Versioning. Every mutation to a file creates a snapshot. The VFS maintains version history with 30-day TTL on snapshots. Users can view previous versions, restore old content, and understand how a document has changed over time. This was impossible in both the Pinecone and SmartFAQ architectures.
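The 30-day expiry maps naturally onto a MongoDB TTL index; a minimal sketch, with collection and field names as assumptions:

```typescript
import { MongoClient } from "mongodb";

async function ensureSnapshotTtl(client: MongoClient) {
  const snapshots = client.db("vfs").collection("file_snapshots");

  // MongoDB deletes a snapshot automatically once `createdAt` is older than 30 days.
  await snapshots.createIndex(
    { createdAt: 1 },
    { expireAfterSeconds: 30 * 24 * 60 * 60 },
  );
}
```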
PDF OCR via Gemini 2.5 Flash. In January 2026, we added OCR for PDF documents using Gemini 2.5 Flash. Rather than traditional OCR that extracts text character-by-character and loses layout context, we send the PDF pages to Gemini as images. The model understands table structure, column layouts, headers, footers, and reading order. The extracted text preserves the document’s logical structure in a way that pdf-parse never could.
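A hedged sketch of that OCR call, assuming the @google/generative-ai Node SDK; this variant passes the PDF to the model directly rather than rendering pages to images first, and the prompt wording is illustrative:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";
import { readFile } from "node:fs/promises";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

export async function extractPdfText(path: string): Promise<string> {
  const pdf = await readFile(path);

  const result = await model.generateContent([
    // Send the document itself so the model can use layout, not just characters.
    { inlineData: { mimeType: "application/pdf", data: pdf.toString("base64") } },
    "Extract the full text of this document, preserving headings, table structure, and reading order.",
  ]);

  return result.response.text();
}
```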
The pattern across three architectures
Each rebuild followed the same pattern: simplify the architecture while expanding the capabilities.
V1 (Pinecone + LangChain): Complex infrastructure, basic features. External vector database, external embedding library, chunking pipeline. Users could upload and search. That’s it.
V2 (SmartFAQ): Simpler infrastructure, more features. Internal service, same embedding approach, but with tag filtering, pagination, and organization-level access. Removed vendor dependency.
V3 (VFS): Simplest infrastructure, most features. MongoDB storage, no vector search, but with versioning, folder organization, file-level permissions, and AI-driven OCR. Removed the entire vector paradigm.
The complexity decreased at each step while the capability increased. That’s the pattern you want in system evolution. Each rebuild should do more with less.
What we’d tell teams starting their RAG journey
Start simpler than you think you need. In 2023, we reached for Pinecone because that’s what the tutorials recommended. A PostgreSQL database with pgvector would have served us just as well.
Question the chunking paradigm. In 2026, with 200K+ token context windows becoming standard, many knowledge bases fit entirely in context. The “chunk, embed, retrieve” pipeline may not be necessary for your scale.
Version everything. Knowledge isn’t static. A knowledge base without versioning can’t answer “What did this say last month?”
Treat knowledge management as a file system problem, not a search problem. Users understand files and folders. They don’t understand vector embeddings and cosine similarity.
We rebuilt our knowledge base twice. Each rebuild was driven by a clearer understanding of what users actually need from their documents. The first version asked “How do we search documents with AI?” The second asked “How do we search documents cheaper?” The third asked the right question: “How do we help users manage their knowledge?”
The architecture that answers the right question tends to be the simplest one. It just takes a few wrong questions to figure out what the right one is.
Alexey Suvorov
CTO, AIWAYZ
10+ years in software engineering. CTO at Bewize and Fulldive. Master's in IT Security from ITMO University. Builds AI systems that run 100+ microservices with small teams.
LinkedIn