duallens_analytics.rag

RAG pipeline – retrieval + LLM generation.

This module implements the core Retrieval-Augmented Generation loop:

  1. Accept a user question.
  2. Retrieve the top-k relevant document chunks from the ChromaDB vector store.
  3. Assemble a prompt that injects the retrieved context.
  4. Invoke the Chat LLM and return the generated answer together with the source excerpts.
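
A minimal end-to-end usage sketch of that loop. It assumes Settings() can be constructed without arguments (e.g. from environment variables) and that vector_store.get_retriever accepts the settings object; both are assumptions about the neighbouring modules, not guarantees.

# Hypothetical usage; Settings() construction and the get_retriever(settings)
# signature are assumptions about duallens_analytics.config / vector_store.
from duallens_analytics.config import Settings
from duallens_analytics.rag import query_rag
from duallens_analytics.vector_store import get_retriever

settings = Settings()                # assumed: defaults / environment-driven
retriever = get_retriever(settings)  # assumed signature
answer, sources = query_rag(
    "Which AI initiatives has the company announced?", retriever, settings
)
print(answer)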

QNA_SYSTEM_MESSAGE (module attribute) = "You are an assistant specialised in reviewing AI initiatives of companies and providing accurate answers based on the provided context.\n\nUser input will include all the context you need to answer their question.\nThis context will always begin with the token: ###Context.\nThe context contains references to specific AI initiatives, projects, or programmes of companies relevant to the user's question.\n\nInstructions:\n- Answer ONLY based on the context provided.\n- If the context is insufficient, clearly state that.\n- Your response should be well-structured and concise.\n"

System message instructing the LLM to answer only from provided context.

QNA_USER_TEMPLATE (module attribute) = '###Context\nHere are some documents that are relevant to the question mentioned below.\n{context}\n\n###Question\n{question}\n'

User-message template with {context} and {question} placeholders.
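
For illustration, the two constants are combined roughly as follows; the example context and question are placeholders, and the [INST] wrapping mirrors the assembly step shown in query_rag below.

# Sketch of prompt assembly from the two module-level templates.
from duallens_analytics.rag import QNA_SYSTEM_MESSAGE, QNA_USER_TEMPLATE

context = "Chunk one text. Chunk two text."      # retrieved page_content strings, joined
question = "What is the company's AI strategy?"  # illustrative question
user_message = QNA_USER_TEMPLATE.format(context=context, question=question)
prompt = f"[INST]{QNA_SYSTEM_MESSAGE}\nuser: {user_message}[/INST]"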

get_llm(settings)

Return a configured langchain_openai.ChatOpenAI instance.

All LLM hyper-parameters (model, temperature, max_tokens, etc.) are read from the supplied duallens_analytics.config.Settings.

Parameters:

    settings (Settings): Application settings. Required.

Returns:

    ChatOpenAI: A ready-to-invoke ChatOpenAI object.

Source code in src/duallens_analytics/rag.py
def get_llm(settings: Settings) -> ChatOpenAI:
    """Return a configured :class:`~langchain_openai.ChatOpenAI` instance.

    All LLM hyper-parameters (model, temperature, max_tokens, etc.) are
    read from the supplied :class:`~duallens_analytics.config.Settings`.

    Args:
        settings: Application settings.

    Returns:
        A ready-to-invoke ``ChatOpenAI`` object.
    """
    logger.debug(
        "Creating ChatOpenAI: model=%s, temp=%.2f, max_tokens=%d",
        settings.model,
        settings.temperature,
        settings.max_tokens,
    )
    return ChatOpenAI(
        model=settings.model,
        temperature=settings.temperature,
        max_tokens=settings.max_tokens,
        top_p=settings.top_p,
        frequency_penalty=settings.frequency_penalty,
    )
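
A short calling sketch, assuming Settings() can be constructed without arguments and supplies the fields read above (model, temperature, max_tokens, top_p, frequency_penalty), and that an OpenAI API key is available in the environment:

from duallens_analytics.config import Settings
from duallens_analytics.rag import get_llm

settings = Settings()  # assumed: provides model, temperature, max_tokens, top_p, frequency_penalty
llm = get_llm(settings)
reply = llm.invoke("Summarise retrieval-augmented generation in one sentence.")
print(reply.content)   # ChatOpenAI returns a message object with a .content string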

query_rag(question, retriever, settings)

Execute the full RAG loop: retrieve context then generate an answer.

Parameters:

    question (str): Natural-language question from the user. Required.
    retriever (VectorStoreRetriever): A LangChain retriever backed by the ChromaDB
        vector store (see duallens_analytics.vector_store.get_retriever). Required.
    settings (Settings): Application settings forwarded to get_llm. Required.

Returns:

    tuple[str, list[str]]: A tuple of (answer_text, source_excerpts), where
        source_excerpts is a list of the raw page_content strings from the
        retrieved document chunks.

Source code in src/duallens_analytics/rag.py
def query_rag(
    question: str,
    retriever: VectorStoreRetriever,
    settings: Settings,
) -> tuple[str, list[str]]:
    """Execute the full RAG loop: retrieve context then generate an answer.

    Args:
        question: Natural-language question from the user.
        retriever: A LangChain retriever backed by the ChromaDB vector
            store (see :func:`~duallens_analytics.vector_store.get_retriever`).
        settings: Application settings forwarded to :func:`get_llm`.

    Returns:
        A tuple of ``(answer_text, source_excerpts)`` where
        *source_excerpts* is a list of the raw ``page_content`` strings
        from the retrieved document chunks.
    """
    logger.info("RAG query: %s", question[:120])
    docs = retriever.invoke(question)
    context_parts = [d.page_content for d in docs]
    logger.debug("Retrieved %d context chunks", len(context_parts))
    context = ". ".join(context_parts)

    prompt = (
        f"[INST]{QNA_SYSTEM_MESSAGE}\n"
        f"user: {QNA_USER_TEMPLATE.format(context=context, question=question)}"
        f"[/INST]"
    )

    llm = get_llm(settings)
    logger.debug("Invoking LLM for RAG answer")
    resp = llm.invoke(prompt)
    logger.info("RAG answer generated (%d chars)", len(resp.content))
    return resp.content, context_parts
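
A small sketch of consuming the returned tuple, continuing from the usage example at the top of the page; the display format is illustrative only.

answer, sources = query_rag(question, retriever, settings)
print(answer)
for i, excerpt in enumerate(sources, start=1):
    # Each entry is the raw page_content of one retrieved chunk.
    print(f"[source {i}] {excerpt[:200]}")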