I write a newsletter called Above Average, where I talk about the second-order insights behind everything that is happening in big tech. If you are in tech and don’t want to be average, subscribe to it.
What is Retrieval in the context of building RAGs?
In RAG (Retrieval Augmented Generation) applications, retrieval refers to the process of extracting the most relevant data chunks/splits from a vector database based on the question received from the user. If your retrieval technique is weak, it affects the quality of the answer you can give the user. The data chunks retrieved from the vector DB are sent to the LLM as context to generate the final answer that is returned to the user.
Different types of retrieval techniques:
- Basic Semantic Similarity: In this approach, you retrieve the data chunks from the vector DB that are semantically closest to the question asked by the user. For example, suppose the user's question is “Tell me about all-white mushrooms with large fruiting bodies”. A simple semantic similarity search will return chunks that describe such mushrooms, but it may not return the information that they might also be poisonous. Some edge cases you would see with semantic similarity:
- If we load duplicate files during data loading and the answer exists in both, the results will include both copies. So we need a way to get results that are both relevant and distinct when we query the vector embeddings.
- If we ask a question whose answer should come only from doc-2, we may still get chunks from doc-1, because this is a semantic search and we haven’t explicitly controlled which documents to search and which to skip.
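To make the duplicate edge case concrete, here is a minimal sketch of a similarity search in plain Python. The 3-dimensional embeddings, the `source` labels, and the helper names (`cosine_similarity`, `similarity_search`) are all made up for illustration; a real application would get embeddings from an embedding model and query a vector DB instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity_search(query_vec, chunks, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings (hypothetical): dim 0 ~ appearance, dim 1 ~ trees, dim 2 ~ toxicity.
chunks = [
    {"text": "All-white mushroom with a large fruiting body.", "source": "doc-1", "embedding": [1.0, 0.0, 0.0]},
    {"text": "All-white mushroom with a large fruiting body.", "source": "doc-1-copy", "embedding": [1.0, 0.0, 0.0]},
    {"text": "This white mushroom is highly poisonous.", "source": "doc-2", "embedding": [0.6, 0.0, 0.8]},
    {"text": "Oak trees shed their leaves in autumn.", "source": "doc-3", "embedding": [0.0, 1.0, 0.0]},
]
query_vec = [1.0, 0.0, 0.2]

top = similarity_search(query_vec, chunks, k=2)
print([c["source"] for c in top])  # ['doc-1', 'doc-1-copy']
```

Both slots are taken by the duplicated chunk, and the chunk about toxicity never reaches the LLM.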
- Maximal Marginal Relevance (MMR): We may not always want only similarity in retrieval; sometimes we also want diversity. In the example above, “Tell me about all-white mushrooms with large fruiting bodies”, a pure similarity search would not surface the information about how poisonous they are. We can instead use MMR to introduce some diversity into the retrieved chunks. Here is how MMR picks the relevant data chunks:
- Query the vector store
- Choose the fetch_K most similar responses
- From those responses, choose the K most diverse
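The three steps above can be sketched in plain Python as follows. This is a simplified illustration, not any particular library's implementation: the toy embeddings are invented, and `mmr_search` and its `lambda_mult` parameter (the trade-off between relevance and diversity) are names assumed for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_search(query_vec, chunks, k=2, fetch_k=3, lambda_mult=0.5):
    """Pick k chunks that are relevant to the query but not redundant with each other."""
    # Steps 1 and 2: query the store and keep the fetch_k most similar chunks.
    candidates = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )[:fetch_k]

    # Step 3: greedily pick the chunk with the best relevance/redundancy balance.
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(c):
            relevance = cosine_similarity(query_vec, c["embedding"])
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine_similarity(c["embedding"], s["embedding"]) for s in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy embeddings (hypothetical): dim 0 ~ appearance, dim 1 ~ trees, dim 2 ~ toxicity.
chunks = [
    {"source": "doc-1", "embedding": [1.0, 0.0, 0.0]},       # describes the mushroom
    {"source": "doc-1-copy", "embedding": [1.0, 0.0, 0.0]},  # duplicate of doc-1
    {"source": "doc-2", "embedding": [0.6, 0.0, 0.8]},       # mentions it is poisonous
    {"source": "doc-3", "embedding": [0.0, 1.0, 0.0]},       # unrelated
]
query_vec = [1.0, 0.0, 0.2]

picked = mmr_search(query_vec, chunks, k=2, fetch_k=3)
print([c["source"] for c in picked])  # ['doc-1', 'doc-2']
```

With `lambda_mult=1.0` the score reduces to plain similarity, so the duplicate would be picked again; lower values push the selection toward diversity.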
- LLM-Aided Retrieval: We can also use an LLM to help with retrieval by splitting the question into a filter and a search term. Once the LLM has split the query, we pass the filter to the vector DB as a metadata filter, which most vector DBs support.
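Here is a rough sketch of that flow. To keep it runnable without an API key, the LLM call is replaced by a stand-in function, `split_question`, that uses a regular expression; in a real application an LLM would be prompted to produce the same structure. The word-overlap ranking is also a placeholder for the vector DB's similarity search, and every name and chunk below is hypothetical.

```python
import re

def split_question(question):
    """Stand-in for the LLM: split a question into a search term and a metadata filter."""
    match = re.search(r"\bin (doc-\d+)\b", question)
    if match:
        search_term = (question[:match.start()] + question[match.end():]).strip()
        return {"search_term": search_term, "filter": {"source": match.group(1)}}
    return {"search_term": question, "filter": {}}

def matches(metadata, metadata_filter):
    """True if the chunk's metadata satisfies every key/value in the filter."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())

def word_overlap(a, b):
    """Placeholder relevance score: number of shared lowercase words."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    return len(words_a & words_b)

def retrieve(question, chunks, k=2):
    parsed = split_question(question)
    # Apply the metadata filter first, then rank only the chunks that passed it.
    allowed = [c for c in chunks if matches(c["metadata"], parsed["filter"])]
    ranked = sorted(
        allowed,
        key=lambda c: word_overlap(parsed["search_term"], c["text"]),
        reverse=True,
    )
    return ranked[:k]

chunks = [
    {"text": "Regression fits a line to data.", "metadata": {"source": "doc-1"}},
    {"text": "Regression is revisited with regularization.", "metadata": {"source": "doc-2"}},
    {"text": "Neural networks stack layers.", "metadata": {"source": "doc-2"}},
]

results = retrieve("What is said about regression in doc-2?", chunks, k=1)
print(results[0]["text"])  # Regression is revisited with regularization.
```

Because the filter is applied before ranking, the doc-1 chunk can never appear in the results, which addresses the second edge case of plain semantic similarity.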
Note that there are also retrieval techniques that do not use vector databases, such as SVM and TF-IDF.
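As a small illustration of retrieval without a vector database, here is a sketch of TF-IDF ranking in plain Python. It uses one common variant of the formula (term frequency times a log-based inverse document frequency, plus 1); the function names and sample documents are invented for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def tfidf_rank(query, documents):
    """Return document indices ordered from most to least relevant to the query."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Adding 1 keeps terms that appear in every document from scoring zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

documents = [
    "All-white mushrooms can have large fruiting bodies.",
    "Some white mushrooms are poisonous.",
    "Oak trees shed their leaves in autumn.",
]

ranking = tfidf_rank("poisonous mushrooms", documents)
print(ranking)  # [1, 0, 2]
```

No embeddings are involved here: the ranking depends only on which words the query and the documents share, and on how rare those words are across the collection.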
Retrieval is where a lot of innovation is currently happening, and it is changing rapidly. I will be using these retrieval techniques in an upcoming blog post to build a chat-with-your-data application. Keep an eye out for it.
That’s it for Day 6 of 100 Days of AI.