Developing Your RAG Agent

In this section, we're going to create our RAG QA Agent using langchain for orchestration. Our RAG application consists of two components:

Retriever to retrieve data from knowledge base
Generator for generating a natural sounding answer from retrieved context

Both of them combined make up a RAG (Retrieval-Augmented Generation) application. We will create our components with flexibility in mind by using indepen variables like generation model, vector store, embedding model, chunk size — these variables will allow us to change our RAG configuration and evaluate it.

note

If you already have a RAG application that you want to evaluate, feel free to skip to the evaluation section of this tutorial.

Create Agent and Load Data

We'll create a RAGAgent class that combines retrieval and generation to answer user queries. By separating retrieval and generation into helper functions, we can evaluate and improve each part independently.

Before retrieving data, we need to store it in a format the retriever can access — a vector store. This is a database that stores vector embeddings (numerical representations of data) for fast similarity search, essential for RAG systems.

We'll use OpenAIEmbeddings and the FAISS vector store from langchain to build our knowledge base, though other models and stores can be used.

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

class RAGAgent:
    def __init__(
        self,
        document_paths: list,
        embedding_model=None,
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        vector_store_class=FAISS,
        k: int = 2
    ):
        self.document_paths = document_paths
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.embedding_model = embedding_model or OpenAIEmbeddings()
        self.vector_store_class = vector_store_class
        self.k = k
        self.vector_store = self._load_vector_store()
    
    def _load_vector_store(self):
        documents = []
        for document_path in self.document_paths:
            with open(document_path, "r", encoding="utf-8") as file:
                raw_text = file.read()
            
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=self.chunk_size,
                chunk_overlap=self.chunk_overlap
            )
            documents.extend(splitter.create_documents([raw_text]))

        return self.vector_store_class.from_documents(documents, self.embedding_model)

note

You can modify the above code to use an embedding model or vector store of your choice.

You can sanity check yourself by printing the vector store to see if it has been stored stored:

document_paths = ["theranos_legacy.txt"]
agent = RAGAgent(document_paths)
print(agent.vector_store)

✅ Done. Now we'll define a retrieve() method to fetch relevant documents from the vector store.

Creating Retriever

In Retrieval-Augmented Generation (RAG), the retriever finds the most relevant info from a knowledge base — our vector store. We'll now add a retrieve() method to the RAGAgent class to fetch relevant data for a given query.

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

class RAGAgent:
        ... # Same functions from above

    def retrieve(self, query: str):
        docs = self.vector_store.similarity_search(query, k=self.k)
        context = [doc.page_content for doc in docs]
        return context

This allows us to retrieve k documents that are most relevant to the query we supplied by using similarity search. We can test our retriever with the following code:

doc_path = ["theranos_legacy.txt"]

retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve("How many blood tests can you perform and how much blood do you need?")

print(retrieved_docs)

note

I have created a file called theranos_legacy.txt that has all the information about Theranos company. Feel free to use your own documents or the sample content provided below:

Click here to see the contents of theranos_legacy.txt

theranos_legacy.txt
Company Name: Theranos Technologies Inc.  
Founded: 2003  
Founder & CEO: Sherlock Holmes  
Headquarters: Palo Alto, California  
Mission: To revolutionize blood diagnostics through rapid, portable testing solutions.

Overview:  
Theranos Technologies Inc. is a medical technology company dedicated to transforming how blood diagnostics are performed. 
With its proprietary platform, Theranos enables comprehensive laboratory testing from a few drops of blood. This innovation 
reduces cost, increases accessibility, and accelerates clinical decision-making, putting real-time health information in the 
hands of patients and physicians alike.

Flagship Product: NanoDrop 3000™  
The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1–2 microliters 
of capillary blood. The device integrates microfluidics, spectrometry, and Theranos’s patented NanoAnalysis Engine™ to provide 
lab-grade results in under 20 minutes.

Key Features:  
- Sample volume: 1.2 microliters (average)  
- Test menu: 325+ assays including metabolic, hormonal, infectious, hematologic, and genomic panels  
- Results delivery: On-device display and synced via TheraCloud™ platform  
- Power: Rechargeable lithium-ion battery with 18-hour operation  
- Connectivity: Encrypted Wi-Fi, Bluetooth, and USB-C

Technology Platform:  
Theranos’s diagnostics pipeline is powered by MicroVial Sensing (MVS), a next-gen detection framework combining nanophotonic arrays 
and adaptive sample calibration. The system processes micro-samples through proprietary capillary modules, ensuring high sensitivity 
and reproducibility across a broad spectrum of biomarkers.

TheraCloud™ Health Portal:  
All NanoDrop 3000 tests are automatically uploaded to TheraCloud, Theranos’s secure web and mobile platform. Patients and providers 
can review full diagnostic panels, trend health data over time, and receive personalized insights based on AI-powered analytics. 
Integration with third-party systems like EPIC, Cerner, and Apple Health is supported via HL7 and FHIR protocols.

Use Cases:
- Primary care clinics: Rapid diagnostics during check-ups  
- Pharmacies: In-store wellness panels  
- Telemedicine: At-home blood testing for remote consultations  
- Clinical trials: Fast, decentralized biomarker screening  
- Emergency settings: Point-of-care triage

Corporate Structure:  
Theranos employs over 1,800 staff across R&D, diagnostics engineering, cloud systems, regulatory science, and clinical operations. 
The company maintains clinical partnerships with over 60 healthcare institutions and operates six high-throughput testing hubs 
in the U.S.

Leadership:  
- Sherlock Holmes – Founder & CEO  
- Dr. Linda Templeton – Chief Science Officer  
- Richard Parker – VP, Cloud Engineering  
- Dr. Helen Kelly – Director of Clinical Applications  
- Luthor Martin – General Counsel

Selected Partnerships:
- Walgreens Health  
- Cleveland Medical Research Institute  
- United Diagnostic Alliance  
- MedWorks Clinical Trials  
- TelePath Global (for remote care distribution)

Recent Milestones:
- FDA Emergency Use Approval granted for the COVID-19 MicroDrop Panel (2021)  
- Expanded test menu to include pharmacogenomic testing (Q3 2022)  
- Strategic licensing deal signed with Medix Korea for Asia-Pacific rollout  
- Completion of Series F funding round, raising $240M from Fidelity, BlackRock, and Sequoia Capital (Q1 2023)  
- Published real-world performance results in *Clinical Diagnostics Today*, Vol. 58, Issue 4

FAQs:

Q: How accurate are Theranos test results?  
A: Independent validation studies report sensitivity and specificity exceeding 94% for most core assays, with reproducibility between 
92–97% across sample types and environments.

Q: What certifications does Theranos hold?  
A: Theranos labs are CLIA-certified and CAP-accredited. NanoDrop 3000 is CE-marked and pending full FDA 510(k) clearance for expanded 
panels.

Q: Can Theranos tests be administered at home?  
A: Yes. Through our partnership with TheraDirect™, patients can request a NanoDrop Home Kit, available in select states with licensed 
telehealth coverage.

Q: Where can I view the latest test menu?  
A: Visit theranos.com/products/nanodrop3000/testmenu or access via the TheraCloud mobile app.

Media Contacts:  
press@theranos.com  
investorrelations@theranos.com

Company Motto: “One Drop Changes Everything™”

Running the above code should let you see something like this:

[
  'The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1-2 microliters of capillary blood. The device integrates microfluidics, spectrometry, and Theranos’s patented NanoAnalysis Engine™ to provide lab-grade results in under 20 minutes.',
  'Key Features:\n- Sample volume: 1.2 microliters (average)\n- Test menu: 325+ assays including metabolic, hormonal, infectious, hematologic, and genomic panels',
]

✅ Retriever done. Now we can move on to creating our generator.

Creating generator

In a RAG (Retrieval-Augmented Generation) system, the generator creates a natural language response using the user’s query and the retrieved documents.

We'll now add a generate() method to our RAGAgent class. This function will take the retrieved context and use an OpenAI language model (via langchain) to generate the final answer.

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI

class RAGAgent:
        ... # Same methods as above

    def generate(
        self,
        query: str, 
        retrieved_docs: list, 
        llm_model=None, 
        prompt_template: str = None
    ):
        context = "\n".join(retrieved_docs)
        model = llm_model or OpenAI(temperature=0)
        prompt = prompt_template or (
            "Answer the query using the context below.\n\nContext:\n{context}\n\nQuery:\n{query}"
            "Only use information from the context. If nothing relevant is found, respond with: 'No relevant information available.'"
        )
        prompt = prompt.format(context=context, query=query)
        return model(prompt)

This allows us to generate an answer to the query based on the retrieved docs. Here's how we can use our generator:

doc_path = ["theranos_legacy.txt"]
query = "How many blood tests can you perform and how much blood do you need?"

retriever = RAGAgent(doc_path)
retrieved_docs = retriever.retrieve(query)
generated_answer = retriever.generate(query, retrieved_docs)

print(generated_answer)

Running the above code will get you an output similar to the following:

The NanoDrop 3000 can perform over 325 blood tests using just 1-2 microliters of capillary blood. 
This enables comprehensive diagnostics with minimal sample volume.

✅ Generator done. We will now create a final answer() function that will retrieve and send context to our generator to answer any query.

class RAGAgent:
        ... # Same functions and imports

    def answer(
        self, 
        query: str,
        llm_model=None, 
        prompt_template: str = None
    ):
        retrieved_docs = self.retrieve(query)
        generated_answer = self.generate(query, retrieved_docs, llm_model, prompt_template)
        return generated_answer, retrieved_docs

You can now send a query and test your entire RAG QA Agent.

document_paths = ["theranos_legacy.txt"]
query = "What is the NanoDrop 3000, and what certifications does Theranos hold?"

retriever = RAGAgent(document_paths)
answer, retrieved_docs = retriever.answer(query)

🎉🥳 Congratulations! You've just built a complete RAG QA Agent. Let's now understand how we can improve our RAG Agent.

Most LLMs output a response in markdown format by default, which makes it harder to extract structured data such as citations. This is not ideal because we cannot parse the output to show citations in the UI. Below is an example of what using raw output from LLMs look like:

UI Image

**The NanoDrop 3000™** is the flagship diagnostic device developed by Theranos Technologies. It is a compact, portable system capable of performing over **325 blood tests** using just **1–2 microliters** of capillary blood. The device delivers **lab-grade results in under 20 minutes** and features:

* Integrated microfluidics, spectrometry, and the proprietary **NanoAnalysis Engine™**
* An on-device display and secure syncing via the **TheraCloud™** platform
* **Encrypted connectivity** (Wi-Fi, Bluetooth, USB-C)
* **Rechargeable lithium-ion battery** with 18-hour operation

**Certifications held by Theranos**:

1.  **CLIA-certified** (Clinical Laboratory Improvement Amendments)
2.  **CAP-accredited** (College of American Pathologists)
3.  **CE-marked** for European regulatory compliance
4.  **FDA 510(k) clearance** is currently **pending** for expanded test panels

Updating The RAG Agent

We can improve our agent's responses by using a better prompt that outputs answers in json format. This makes it easier to parse and display the data as needed.

We can use the following prompt template to generate our response in json:

You are a helpful assistant. Use the context below to answer the user's query. 
Format your response strictly as a JSON object with the following structure:

{
  "answer": "<a concise, complete answer to the user's query>",
  "citations": [
    "<relevant quoted snippet or summary from source 1>",
    "<relevant quoted snippet or summary from source 2>",
    ...
  ]
}

Only include information that appears in the provided context. Do not make anything up.
Only respond in JSON — No explanations needed. Only use information from the context. If 
nothing relevant is found, respond with: 

{
  "answer": "No relevant information available.",
  "citations": []
}


Context:
{context}

Query:
{query}

We can update our answer() function to parse the output as json and return the json object. Here's how to update our answer() function:

class RAGAgent:
    ... # Same functions from above
    
    def answer(self, query: str):
        retrieved_docs = self.retrieve(query)
        generated_answer = self.generate(query, retrieved_docs)

        try:
            res = json.loads(generated_answer)
            return res
        except json.JSONDecodeError:
            return {"error": "Invalid JSON returned from model", "raw_output": generated_answer}

Now our RAGAgent outputs a valid json, we can use this output to render UI and create webpages or handle our responses in any way we want. Here's the new responses generated by our agent:

UI Image

{
  "answer": "The NanoDrop 3000 is a compact, portable diagnostic device developed by Theranos Technologies. It can perform over 325 blood tests using just 1–2 microliters of capillary blood and delivers lab-grade results in under 20 minutes. Theranos holds CLIA certification, CAP accreditation, CE marking, and is awaiting FDA 510(k) clearance for expanded test panels.",
  "citations": [
    "The NanoDrop 3000 is a compact, portable diagnostic device capable of performing over 300 blood tests using just 1–2 microliters of capillary blood.",
    "Key Features: Sample volume: 1.2 microliters (average), Test menu: 325+ assays",
    "Theranos labs are CLIA-certified and CAP-accredited. NanoDrop 3000 is CE-marked and pending full FDA 510(k) clearance for expanded panels."
  ]
}

We now have a RAG agent that generates the output in our desired format, but how reliable are the generated answers? It is very important to make sure that the answers generated by the agent are reliable, especially for an infamous company like Theranos.

In the next section, we'll see how to evaluate our RAG QA Agent using deepeval.

Create Agent and Load Data​

Creating Retriever​

Creating generator​

Updating The RAG Agent​

Create Agent and Load Data

Creating Retriever

Creating generator

Updating The RAG Agent