I see an interesting challenge for engineering managers: making old-vs-new technology decisions. There’s always a flexible, well-known way of solving virtually any problem that senior engineers are used to. At the same time, there’s a newer way that is often riskier but makes developers’ lives easier. Over the years, I’ve seen plenty of examples of this: AWS cloud vs. your own VPS, which isn’t even a question nowadays; React vs. vanilla JS or jQuery; whether or not to use TypeScript. The list goes on and on, because every new technology goes through this adoption phase.
OpenAI’s Knowledge Retrieval
On November 6, 2023, Sam Altman, the CEO of OpenAI, introduced significant updates to the company’s products. One of the most interesting was the ability to create custom assistants that can automatically retrieve knowledge you upload manually or programmatically and use it as context.
Knowledge retrieval built into the OpenAI API sounded exciting during the presentation, as it’s one of the biggest challenges when building AI-powered assistants. While the GPT-4 Turbo knowledge cutoff date is now April 2023, it is limited to public data only. Working with proprietary data is a must for any decent application, and it presents a wide range of challenges.
Pinecone and Vector Search
Unfortunately, despite all the hype on Twitter (or X; I’m still confused about this), OpenAI’s document retrieval functionality is not very useful for developers right now. There are several reasons for this. First, knowledge retrieval works only with custom assistants; you can’t use it outside their scope. It’s also limited to 20 documents, which is insufficient for a decent knowledge base. And finally, the assistants leak the uploaded knowledge files to end users, which is a serious privacy issue. As a result, developers still have to build their own document retrieval.
One of the most popular ways to retrieve documents that can be provided as context to LLMs is vector search. Pinecone, a vector-embedding database, raised $100M in a Series B round, which is seen as part of the AI gold rush.
However, plenty of critics argue that Pinecone has no unique advantage and faces competition from free alternatives like Faiss or Weaviate. The main concern, though, is that conventional SQL and NoSQL databases can easily add functionality similar to what Pinecone offers, and those databases already have wide distribution.
The Role of RAG
Retrieval Augmented Generation (RAG) is a clever way of making AI smarter. It’s like giving a language model a library card, allowing it to pull in information you need outside its training data. This is especially useful when you need up-to-date domain knowledge.
RAG converts documents and user queries into a compatible format, a process known as embedding. It then compares the user’s prompt to the documents it has embeddings for, pulling in relevant information to enhance its responses.
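The retrieval step above can be sketched in a few lines. This is a minimal illustration using a toy bag-of-words “embedding” and cosine similarity; a real RAG pipeline would call an actual embedding model, and the sample documents and names here are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words frequency vector.
    # A real pipeline would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "Postgres supports vector similarity search via pgvector",
    "Pinecone is a managed vector database",
    "React is a JavaScript library for user interfaces",
]

query = "vector database similarity search"
q = embed(query)
# Rank documents by similarity to the query; the top ones would be
# injected into the model's context alongside the user's prompt.
ranked = sorted(documents, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)
```

The mechanics are the same with real embeddings: both documents and queries live in one vector space, and retrieval is a nearest-neighbor lookup in that space.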
While RAG works very well for many use cases, it’s important to remember that you can retrieve documents using any method, even a simple SQL query. And, of course, RAG comes with its own challenges.
The Reality of Implementing RAG
Implementing RAG isn’t just about indexing and retrieving documents. It might work magically in simple cases, but if you’ve ever built search, you know how hard it is to satisfy often contradictory requirements.
One of the main challenges is choosing the right chunks of documents that will get sent as a context to the model. You need to select chunks relevant to the prompt, which is a complex engineering problem, just like a regular search. Another challenge is fine-tuning your LLM to improve its performance.
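Even the first step, splitting documents into chunks, involves trade-offs. Here is one minimal sketch: fixed-size character windows with overlap, so a relevant sentence is less likely to be cut in half at a boundary. The sizes are arbitrary assumptions; production systems often chunk along sentence or section boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into overlapping character windows. The overlap
    # reduces the chance that a relevant passage straddles a
    # chunk boundary and gets missed at retrieval time.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger chunks carry more context but dilute relevance scoring and eat into the model’s context window; smaller chunks score precisely but may lack the surrounding context the model needs.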
If not implemented correctly, RAG can negatively impact responses by injecting irrelevant information or surfacing sensitive information that should have been kept confidential.
Pinecone vs. Conventional Databases
There’s a debate brewing between using vector databases like Pinecone and conventional databases like Postgres for document retrieval. Pinecone is designed for high-performance AI applications, offering optimized storage and querying for embeddings. However, it’s not open-source and can be pricey because it’s a product, not just a database.
On the flip side, Postgres is a mature relational database that covers most of the needs of different kinds of applications, including LLM-powered assistants. It has JSONB fields for storing object-like data and the pgvector extension for vector similarity search, which provides the same core functionality Pinecone does.
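To make that concrete, here is roughly what a pgvector setup looks like, expressed as the SQL a Python app would execute. The table and column names are hypothetical, and the 1536 dimension is just an example matching OpenAI’s embedding models; the `<=>` operator (cosine distance) is part of pgvector itself.

```python
# Hypothetical schema for storing document embeddings in Postgres
# with the pgvector extension enabled.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)
);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means closer,
# so ordering ascending returns the most similar documents first.
QUERY_SQL = """
SELECT content
FROM documents
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""

# With a driver like psycopg, retrieval would look roughly like:
#   cur.execute(QUERY_SQL, {"query_embedding": query_embedding})
```

The point is that the retrieval half of RAG fits into a database many teams already run, alongside the rest of their relational data.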
We chose Postgres because we already use it in our apps and because developers are already familiar with it. It’s flexible, scalable, fast, and cost-effective. YMMV: the choice between Pinecone and conventional databases depends on your company, project, team, and needs.
Pinecone is easier for developers to set up, and I can see it working well for small teams, startups, or other greenfield projects. There’s no one-size-fits-all answer here. Each tool has its place, depending on the task at hand.
The Future of Databases in GenAI
The future of databases in GenAI looks promising, with new tools like Pinecone gaining traction, especially among younger developers. Just like at the dawn of the internet, they are the ones adopting the latest technology. They’re leading the charge in building GPTs and OpenAI wrappers and using tools like GitHub Copilot, and they’re increasingly moving away from conventional tools like traditional databases.
Despite some skepticism, particularly from senior developers, the GenAI movement is rapidly moving forward. Chances are the new generation of engineers won’t need conventional databases for a long time, if ever. And as GenAI continues to evolve, so will the tools and databases supporting it.
Originally published on Medium.com