# RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline is the core intelligence layer of DualLens Analytics: it grounds LLM answers in company-specific PDF documents.
## Pipeline Stages
### 1. Document Ingestion
```mermaid
flowchart LR
    A["ZIP archive"] --> B["extract_zip()"] --> C["PDF directory"]
```
The `data_loader` module extracts `ai_initiative_reports.zip` into `data/ai_initiatives/`.
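A minimal sketch of the extraction step; `extract_zip`'s real signature in `data_loader` may differ:

```python
import zipfile
from pathlib import Path

def extract_zip(archive: str, dest: str) -> Path:
    """Unpack a ZIP archive into dest, creating the directory if needed."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
    return dest_path

extract_zip("ai_initiative_reports.zip", "data/ai_initiatives/")
```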
### 2. Chunking
```python
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter  # import path varies by LangChain version

# Measure chunk length in tokens using the cl100k_base encoding
enc = tiktoken.get_encoding("cl100k_base")

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=lambda text: len(enc.encode(text)),
)
```
Each PDF is split into overlapping chunks to preserve context across boundaries.
### 3. Embedding & Indexing
Chunks are embedded with OpenAI `text-embedding-ada-002` and stored in a ChromaDB collection (`ai_initiatives`). The collection is persisted to `data/chroma_db/`, so subsequent runs skip re-indexing.
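A minimal indexing sketch, assuming the `chunks` produced by the splitter above and an `OPENAI_API_KEY` in the environment; the collection name and persist path are taken from this page:

```python
import os
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

client = chromadb.PersistentClient(path="data/chroma_db")  # persisted on disk
collection = client.get_or_create_collection(
    name="ai_initiatives",
    embedding_function=OpenAIEmbeddingFunction(
        api_key=os.environ["OPENAI_API_KEY"],
        model_name="text-embedding-ada-002",
    ),
)
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=[c.page_content for c in chunks],  # chunks from step 2
)
```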
### 4. Retrieval
The user's question is embedded with the same model and compared against the stored chunks via cosine similarity; the top-k most similar chunks are returned.
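Continuing the sketch above, Chroma embeds the query with the collection's embedding function and returns the nearest chunks (`n_results` here matches the default `retriever.k` of 5):

```python
results = collection.query(query_texts=[question], n_results=5)
passages = results["documents"][0]  # top-k chunk texts for the single query
```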
### 5. Prompt Assembly
The system prompt instructs the LLM to answer only from the provided context; a user template injects the retrieved passages and the question.
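The exact prompt wording is project-specific; a minimal sketch, assuming OpenAI-style chat messages and the `passages` and `question` from step 4:

```python
SYSTEM_PROMPT = (
    "Answer using ONLY the context provided. "
    "If the context does not contain the answer, say so."
)
USER_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_TEMPLATE.format(
        context="\n\n".join(passages), question=question)},
]
```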
### 6. Generation
The assembled messages are sent to `gpt-4o-mini`. The response and the source excerpts are returned to the frontend.
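A sketch of the call, assuming the OpenAI Python SDK v1 client and the defaults from the configuration table below:

```python
from openai import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
response = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,   # assembled in step 5
    temperature=0.3,     # llm.temperature default
    max_tokens=1024,     # llm.max_tokens default
)
answer = response.choices[0].message.content
```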
## Evaluation Layer
After generation, answers can be evaluated by the LLM-as-Judge module:
- Groundedness – Is the answer faithful to the retrieved context?
- Relevance – Does the answer address the user's question?
Each evaluation returns a score (1-5) and a justification.
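A minimal sketch of one judge call, reusing the `llm` client from step 6; the real module's prompt wording and JSON schema are assumptions here:

```python
import json

JUDGE_TEMPLATE = (
    "Rate the ANSWER for {criterion} on a 1-5 scale and justify briefly.\n"
    'Return JSON: {{"score": <int 1-5>, "justification": "<one sentence>"}}\n\n'
    "CONTEXT: {context}\nQUESTION: {question}\nANSWER: {answer}"
)

def judge(criterion: str, context: str, question: str, answer: str) -> dict:
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(
            criterion=criterion, context=context,
            question=question, answer=answer)}],
        response_format={"type": "json_object"},  # force parseable output
        temperature=0.0,
    )
    return json.loads(resp.choices[0].message.content)

groundedness = judge("groundedness", "\n\n".join(passages), question, answer)
```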
## Configuration Knobs
| Parameter | Config Key | Default | Effect |
|---|---|---|---|
| Chunk size | `chunking.chunk_size` | 1000 | Larger = more context per chunk |
| Chunk overlap | `chunking.chunk_overlap` | 200 | Higher = less info loss at boundaries |
| Top-k results | `retriever.k` | 5 | More docs = richer context but noisier |
| Temperature | `llm.temperature` | 0.3 | Lower = more deterministic answers |
| Max tokens | `llm.max_tokens` | 1024 | Cap on answer length |
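Assuming these keys live in a YAML file (the filename `config.yaml` is hypothetical), they can be read like this:

```python
import yaml

with open("config.yaml") as f:  # hypothetical filename
    cfg = yaml.safe_load(f)

chunk_size = cfg["chunking"]["chunk_size"]   # 1000
top_k = cfg["retriever"]["k"]                # 5
temperature = cfg["llm"]["temperature"]      # 0.3
```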