Developing Your RAG Agent
In this section, we're going to create our RAG QA Agent using langchain for orchestration. Our RAG application consists of two components:
- A retriever that fetches relevant data from the knowledge base
- A generator that produces a natural-sounding answer from the retrieved context
Together they make up a RAG (Retrieval-Augmented Generation) application. We will build our components with flexibility in mind by treating the generation model, vector store, embedding model, and chunk size as independent variables. These variables let us change our RAG configuration and evaluate each variant.
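To make the idea of varying the configuration concrete, here is a minimal sketch of enumerating configurations to compare. The RAGConfig class and config_grid helper are hypothetical names used only for illustration; the default values mirror the defaults the RAGAgent constructor uses later in this section.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RAGConfig:
    """Hypothetical container for the settings we may want to vary."""
    chunk_size: int = 500
    chunk_overlap: int = 50
    k: int = 2


def config_grid(chunk_sizes, ks, chunk_overlap=50):
    """Return one RAGConfig per (chunk_size, k) combination."""
    return [
        RAGConfig(chunk_size=size, chunk_overlap=chunk_overlap, k=k)
        for size, k in product(chunk_sizes, ks)
    ]


configs = config_grid(chunk_sizes=[250, 500, 1000], ks=[2, 4])
for config in configs:
    print(config)
```

Each configuration could then be used to build one agent, so that evaluation results can be compared across settings.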
Create Agent and Load Data
We'll create a RAGAgent class that combines retrieval and generation to answer user queries. By separating retrieval and generation into helper functions, we can evaluate and improve each part independently.
Before retrieving data, we need to store it in a format the retriever can access — a vector store. This is a database that stores vector embeddings (numerical representations of data) for fast similarity search, essential for RAG systems.
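To build intuition for what a vector store does, the sketch below implements a toy in-memory version. It is an illustration only: ToyVectorStore and embed are hypothetical names, the "embedding" is a simple word-count vector rather than a learned model, the search scans every entry instead of using an index such as FAISS, and the sample sentences are made up.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (real models return dense floats)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs and scans them all."""

    def __init__(self, texts):
        self.entries = [(text, embed(text)) for text in texts]

    def similarity_search(self, query: str, k: int = 2):
        """Return the k stored texts whose vectors are closest to the query vector."""
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "the device needs a small blood sample",
    "results are ready in twenty minutes",
    "the battery lasts eighteen hours",
])
top = store.similarity_search("how much blood does the device need", k=1)
print(top)
```

A production vector store follows the same pattern (embed, store, rank by similarity) but uses learned embeddings and an index built for fast lookup.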
We'll use OpenAIEmbeddings and the FAISS vector store from langchain to build our knowledge base, though other models and stores can be used.
# Note: recent LangChain releases expose these classes from the
# langchain_community, langchain_openai and langchain_text_splitters packages.
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter


class RAGAgent:
    def __init__(
        self,
        document_paths: list,
        embedding_model=None,
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        vector_store_class=FAISS,
        k: int = 2
    ):
        self.document_paths = document_paths
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.embedding_model = embedding_model or OpenAIEmbeddings()
        self.vector_store_class = vector_store_class
        self.k = k  # number of chunks to retrieve per query
        self.vector_store = self._load_vector_store()

    def _load_vector_store(self):
        # One splitter is enough; it is reused for every document.
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap
        )
        documents = []
        for document_path in self.document_paths:
            with open(document_path, "r", encoding="utf-8") as file:
                raw_text = file.read()
            # Split the raw text into chunks and collect them as documents.
            documents.extend(splitter.create_documents([raw_text]))
        # Embed every chunk and index it in the vector store.
        return self.vector_store_class.from_documents(documents, self.embedding_model)

You can sanity-check this step by printing the vector store to confirm it was created:
document_paths = ["theranos_legacy.txt"]
agent = RAGAgent(document_paths)
print(agent.vector_store)

✅ Done. Now we'll define a retrieve() method to fetch relevant documents from the vector store.
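As an aside on the chunk_size and chunk_overlap parameters passed to the splitter earlier, the sketch below shows how overlapping chunks relate to each other. It is a deliberately simplified stand-in: chunk_text is a hypothetical helper that cuts fixed-size character windows, whereas RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs, newlines, and spaces.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.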
Creating Retriever
In Retrieval-Augmented Generation (RAG), the retriever finds the most relevant info from a knowledge base — our vector store.
We'll now add a retrieve() method to the RAGAgent class to fetch relevant data for a given query.
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter


class RAGAgent:
    ...  # Same functions from above

    def retrieve(self, query: str):
        # Fetch the k chunks whose embeddings are closest to the query embedding.
        docs = self.vector_store.similarity_search(query, k=self.k)
        # Keep only the text of each chunk.
        context = [doc.page_content for doc in docs]
        return context

This lets us retrieve the k documents most relevant to the supplied query using similarity search. We can test our retriever with the following code:
doc_path = ["theranos_legacy.txt"]
retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve("How many blood tests can you perform and how much blood do you need?")
print(retrieved_docs)

Running the above code should print something like this:
[
    'The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1-2 microliters of capillary blood. The device integrates microfluidics, spectrometry, and Theranos’s patented NanoAnalysis Engine™ to provide lab-grade results in under 20 minutes.',
    'Key Features:\n- Sample volume: 1.2 microliters (average)\n- Test menu: 325+ assays including metabolic, hormonal, infectious, hematologic, and genomic panels',
]

✅ Retriever done. Now we can move on to creating our generator.
Creating Generator
In a RAG (Retrieval-Augmented Generation) system, the generator creates a natural language response using the user’s query and the retrieved documents.
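Before any model is involved, the generator's input is just a string that combines the query with the retrieved chunks. The sketch below isolates that step; build_prompt is a hypothetical helper, and the template wording and sample inputs are placeholders rather than the exact prompt used in this tutorial.

```python
def build_prompt(query: str, retrieved_docs: list) -> str:
    """Join the retrieved chunks and insert them, with the query, into a template."""
    template = (
        "Answer the query using only the context below.\n\n"
        "Context:\n{context}\n\n"
        "Query:\n{query}"
    )
    context = "\n".join(retrieved_docs)  # one chunk per line
    return template.format(context=context, query=query)


prompt = build_prompt(
    query="Example question?",
    retrieved_docs=["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Printing the assembled prompt like this is a cheap way to confirm that the context and the query end up where you expect before spending tokens on a model call.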
We'll now add a generate() method to our RAGAgent class. This function will take the retrieved context and use an OpenAI language model (via langchain) to generate the final answer.
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI


class RAGAgent:
    ...  # Same methods as above

    def generate(
        self,
        query: str,
        retrieved_docs: list,
        llm_model=None,
        prompt_template: str = None
    ):
        context = "\n".join(retrieved_docs)
        model = llm_model or OpenAI(temperature=0)
        # Instructions come first so they do not run into the end of the query.
        prompt = prompt_template or (
            "Answer the query using the context below. "
            "Only use information from the context. If nothing relevant is found, "
            "respond with: 'No relevant information available.'\n\n"
            "Context:\n{context}\n\nQuery:\n{query}"
        )
        # Fill the {context} and {query} placeholders.
        prompt = prompt.format(context=context, query=query)
        return model(prompt)

This lets us generate an answer to the query based on the retrieved documents. Here's how we can use our generator:
doc_path = ["theranos_legacy.txt"]
query = "How many blood tests can you perform and how much blood do you need?"
retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve(query)
generated_answer = retriever.generate(query, retrieved_docs)
print(generated_answer)

Running the above code produces output similar to the following:
The NanoDrop 3000 can perform over 325 blood tests using just 1-2 microliters of capillary blood.
This enables comprehensive diagnostics with minimal sample volume.

✅ Generator done. We will now create a final answer() method that retrieves context and sends it to our generator to answer any query.
class RAGAgent:
    ...  # Same functions and imports

    def answer(
        self,
        query: str,
        llm_model=None,
        prompt_template: str = None
    ):
        # Retrieve first, then generate from the retrieved context.
        retrieved_docs = self.retrieve(query)
        generated_answer = self.generate(query, retrieved_docs, llm_model, prompt_template)
        return generated_answer, retrieved_docs

You can now send a query and test your entire RAG QA Agent.
document_paths = ["theranos_legacy.txt"]
query = "What is the NanoDrop 3000, and what certifications does Theranos hold?"
retriever = RAGAgent(document_paths)
answer, retrieved_docs = retriever.answer(query)

🎉🥳 Congratulations! You've just built a complete RAG QA Agent. Let's now understand how we can improve our RAG Agent.
Most LLMs output a response in markdown format by default, which makes it harder to extract structured data such as citations. This is not ideal because we cannot reliably parse the output to show citations in the UI. Below is an example of what raw output from an LLM looks like:

**The NanoDrop 3000™** is the flagship diagnostic device developed by Theranos Technologies. It is a compact, portable system capable of performing over **325 blood tests** using just **1–2 microliters** of capillary blood. The device delivers **lab-grade results in under 20 minutes** and features:
* Integrated microfluidics, spectrometry, and the proprietary **NanoAnalysis Engine™**
* An on-device display and secure syncing via the **TheraCloud™** platform
* **Encrypted connectivity** (Wi-Fi, Bluetooth, USB-C)
* **Rechargeable lithium-ion battery** with 18-hour operation
**Certifications held by Theranos**:
1. **CLIA-certified** (Clinical Laboratory Improvement Amendments)
2. **CAP-accredited** (College of American Pathologists)
3. **CE-marked** for European regulatory compliance
4. **FDA 510(k) clearance** is currently **pending** for expanded test panels

Updating the RAG Agent
We can improve our agent's responses by using a better prompt that outputs answers in JSON format. This makes it easier to parse and display the data as needed.
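One detail matters when writing such a template for the generate() method shown earlier: it fills the template with str.format, so any literal brace that belongs to the JSON example has to be doubled, and only the {context} and {query} placeholders stay single. The sketch below demonstrates this with a shortened template; JSON_PROMPT_TEMPLATE and the sample inputs are hypothetical.

```python
# Hypothetical, shortened template: doubled braces are literal, single braces are fields.
JSON_PROMPT_TEMPLATE = (
    "Respond strictly as a JSON object with this structure:\n"
    '{{"answer": "<answer>", "citations": ["<snippet>"]}}\n\n'
    "Context:\n{context}\n\n"
    "Query:\n{query}"
)

prompt = JSON_PROMPT_TEMPLATE.format(context="Example context.", query="Example question?")
print(prompt)

# The same text with single braces cannot be formatted: str.format treats the
# JSON example as a replacement field and raises an error.
unescaped = JSON_PROMPT_TEMPLATE.replace("{{", "{").replace("}}", "}")
try:
    unescaped.format(context="Example context.", query="Example question?")
    failed = False
except (KeyError, ValueError, IndexError):
    failed = True
print("unescaped template failed:", failed)
```

After formatting, the doubled braces appear as single braces, so the model sees an ordinary JSON example.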
We can use the following prompt template to generate our response in JSON:
You are a helpful assistant. Use the context below to answer the user's query.
Format your response strictly as a JSON object with the following structure:
{
  "answer": "<a concise, complete answer to the user's query>",
  "citations": [
    "<relevant quoted snippet or summary from source 1>",
    "<relevant quoted snippet or summary from source 2>",
    ...
  ]
}
Only include information that appears in the provided context. Do not make anything up.
Only respond in JSON. No explanations needed. If nothing relevant is found, respond with:
{
  "answer": "No relevant information available.",
  "citations": []
}
Context:
{context}
Query:
{query}

Because generate() fills the template with str.format, the literal JSON braces must be doubled ({{ and }}) when you store this template as a Python string; only {context} and {query} stay single. We can then update our answer() function to parse the output as JSON and return the resulting object. Here's how to update our answer() function:
import json


class RAGAgent:
    ...  # Same functions from above

    def answer(self, query: str, llm_model=None, prompt_template: str = None):
        retrieved_docs = self.retrieve(query)
        # Pass the JSON prompt template shown above as prompt_template.
        generated_answer = self.generate(query, retrieved_docs, llm_model, prompt_template)
        try:
            return json.loads(generated_answer)
        except json.JSONDecodeError:
            # Keep the raw text so the failure can be inspected.
            return {"error": "Invalid JSON returned from model", "raw_output": generated_answer}

Now that our RAGAgent returns valid JSON, we can use this output to render a UI, build web pages, or handle responses in any way we want. Here is the new response generated by our agent:

{
  "answer": "The NanoDrop 3000 is a compact, portable diagnostic device developed by Theranos Technologies. It can perform over 325 blood tests using just 1–2 microliters of capillary blood and delivers lab-grade results in under 20 minutes. Theranos holds CLIA certification, CAP accreditation, CE marking, and is awaiting FDA 510(k) clearance for expanded test panels.",
  "citations": [
    "The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1–2 microliters of capillary blood.",
    "Key Features: Sample volume: 1.2 microliters (average), Test menu: 325+ assays",
    "Theranos labs are CLIA-certified and CAP-accredited. NanoDrop 3000 is CE-marked and pending full FDA 510(k) clearance for expanded panels."
  ]
}

We now have a RAG agent that generates output in our desired format, but how reliable are the generated answers? It is important to verify that the agent's answers are reliable, especially for a company as infamous as Theranos.
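One practical reliability detail concerns the format itself: a model may still wrap its JSON in a Markdown code fence or omit a field, even when the prompt asks for bare JSON. The sketch below is one possible way to tolerate that. parse_model_json is a hypothetical helper, the second error message is an assumption, and the sample replies are made up.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_model_json(raw_output: str) -> dict:
    """Parse a model reply as JSON, tolerating an optional Markdown code fence."""
    text = raw_output.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag) and the closing fence.
        first_newline = text.find("\n")
        text = text[first_newline + 1 : -len(FENCE)].strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON returned from model", "raw_output": raw_output}
    if not isinstance(parsed, dict) or "answer" not in parsed:
        return {"error": "JSON is missing the 'answer' field", "raw_output": raw_output}
    parsed.setdefault("citations", [])  # treat a missing list as "no citations"
    return parsed


fenced_reply = FENCE + 'json\n{"answer": "Example answer.", "citations": []}\n' + FENCE
print(parse_model_json(fenced_reply))
print(parse_model_json("not json at all"))
```

A helper like this could replace the bare json.loads call in answer(), so that formatting slips surface as an explicit error instead of an exception.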
In the next section, we'll see how to evaluate our RAG QA Agent using deepeval.