config

duallens_analytics.config

Application configuration – loads .env and Hydra YAML.

Hydra manages all tuneable knobs (model parameters, chunking, retriever, etc.), while secrets (API keys) come from .env via python-dotenv.

This dual approach showcases Hydra's structured-config capability while keeping credentials out of version control.
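
As a rough illustration, the structured config is expected to expose roughly the groups shown below. This is a hypothetical sketch: the section and key names are inferred from the accesses made in load_settings (documented further down), and the values are just the defaults from the Settings dataclass, not the actual contents of conf/config.yaml.

from omegaconf import OmegaConf

# Hypothetical sketch of the config tree that load_settings() reads.
# Only the keys accessed in this module are shown; values are examples.
cfg = OmegaConf.create(
    {
        "llm": {
            "model": "gpt-4o-mini",
            "temperature": 0.0,
            "max_tokens": 5000,
            "top_p": 0.95,
            "frequency_penalty": 1.2,
        },
        "embedding": {"model": "text-embedding-ada-002"},
        "chunking": {"chunk_size": 1000, "chunk_overlap": 200, "encoding_name": "cl100k_base"},
        "retriever": {"k": 10, "search_type": "similarity"},
        "vector_store": {"collection_name": "AI_Initiatives"},
        "companies": ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"],
        "stock": {"period": "3y"},
        "financial_metrics": ["Market Cap", "P/E Ratio", "Dividend Yield", "Beta", "Total Revenue"],
    }
)
print(OmegaConf.to_yaml(cfg))  # same shape as the YAML file, rendered as YAML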

CHROMA_DIR = DATA_DIR / 'chroma_db' module-attribute

Persisted ChromaDB vector-store directory.

CONF_DIR = PROJECT_ROOT / 'conf' module-attribute

Directory containing the Hydra YAML configuration files.

DATA_DIR = PROJECT_ROOT / 'data' module-attribute

Top-level runtime data directory (created on first run).

PDF_DIR = DATA_DIR / 'pdfs' module-attribute

Directory where extracted PDF documents are stored.

PROJECT_ROOT = Path(__file__).resolve().parents[2] module-attribute

Absolute path to the repository root (project_1_DualLens_Analytics/).

ZIP_PATH = PROJECT_ROOT / 'notebooks' / 'Companies-AI-Initiatives.zip' module-attribute

Path to the ZIP archive containing the AI-initiative PDF reports.
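
Taken together, these constants imply roughly the following repository layout. This tree is reconstructed from the paths above, not copied from the repository:

project_1_DualLens_Analytics/          # PROJECT_ROOT
├── conf/                              # CONF_DIR (Hydra YAML files)
├── data/                              # DATA_DIR (created at runtime)
│   ├── chroma_db/                     # CHROMA_DIR
│   └── pdfs/                          # PDF_DIR
├── notebooks/
│   └── Companies-AI-Initiatives.zip   # ZIP_PATH
└── src/duallens_analytics/config.py   # this module (PROJECT_ROOT is parents[2] of this file)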

Settings dataclass

Runtime settings assembled from Hydra config + environment secrets.

This dataclass is the single source of truth for every tuneable parameter consumed at runtime. It is populated by load_settings(), which merges .env secrets with the Hydra YAML file.

Attributes:

Name               Type       Description
api_key            str        OpenAI-compatible API key (sourced from .env).
api_base           str        Base URL for the LLM endpoint (sourced from .env).
model              str        Chat-completion model identifier (e.g. "gpt-4o-mini").
embedding_model    str        Embedding model identifier (e.g. "text-embedding-ada-002").
temperature        float      Sampling temperature for the LLM (0 = deterministic).
max_tokens         int        Maximum number of tokens in the LLM response.
top_p              float      Nucleus-sampling probability mass.
frequency_penalty  float      Penalises repeated token sequences.
chunk_size         int        Target token count per document chunk.
chunk_overlap      int        Overlap in tokens between consecutive chunks.
encoding_name      str        Tiktoken encoding used for chunking (e.g. "cl100k_base").
retriever_k        int        Number of top-k documents returned by the retriever.
search_type        str        ChromaDB search strategy ("similarity" or "mmr").
collection_name    str        Name of the ChromaDB collection.
companies          list[str]  Ticker symbols of companies to analyse.
stock_period       str        Yahoo-Finance period string (e.g. "3y").
financial_metrics  list[str]  Column labels shown on dashboards and reports.

Source code in src/duallens_analytics/config.py
@dataclass
class Settings:
    """Runtime settings assembled from Hydra config + environment secrets.

    This dataclass is the single source of truth for every tuneable
    parameter consumed at runtime.  It is populated by :func:`load_settings`,
    which merges ``.env`` secrets with the Hydra YAML file.

    Attributes:
        api_key: OpenAI-compatible API key (sourced from ``.env``).
        api_base: Base URL for the LLM endpoint (sourced from ``.env``).
        model: Chat-completion model identifier (e.g. ``"gpt-4o-mini"``).
        embedding_model: Embedding model identifier
            (e.g. ``"text-embedding-ada-002"``).
        temperature: Sampling temperature for the LLM (0 = deterministic).
        max_tokens: Maximum number of tokens in the LLM response.
        top_p: Nucleus-sampling probability mass.
        frequency_penalty: Penalises repeated token sequences.
        chunk_size: Target token count per document chunk.
        chunk_overlap: Overlap in tokens between consecutive chunks.
        encoding_name: Tiktoken encoding used for chunking
            (e.g. ``"cl100k_base"``).
        retriever_k: Number of top-*k* documents returned by the retriever.
        search_type: ChromaDB search strategy (``"similarity"`` or
            ``"mmr"``).
        collection_name: Name of the ChromaDB collection.
        companies: Ticker symbols of companies to analyse.
        stock_period: Yahoo-Finance period string (e.g. ``"3y"``).
        financial_metrics: Column labels shown on dashboards and reports.
    """

    # secrets (from .env)
    api_key: str = ""
    api_base: str = ""

    # LLM (from Hydra)
    model: str = DEFAULT_MODEL
    embedding_model: str = DEFAULT_EMBEDDING_MODEL
    temperature: float = 0.0
    max_tokens: int = 5000
    top_p: float = 0.95
    frequency_penalty: float = 1.2

    # chunking (from Hydra)
    chunk_size: int = 1000
    chunk_overlap: int = 200
    encoding_name: str = "cl100k_base"

    # retriever (from Hydra)
    retriever_k: int = 10
    search_type: str = "similarity"

    # vector store
    collection_name: str = "AI_Initiatives"

    # companies
    companies: list[str] = field(default_factory=lambda: ["GOOGL", "MSFT", "IBM", "NVDA", "AMZN"])

    # stock
    stock_period: str = "3y"

    # financial metrics
    financial_metrics: list[str] = field(
        default_factory=lambda: [
            "Market Cap",
            "P/E Ratio",
            "Dividend Yield",
            "Beta",
            "Total Revenue",
        ]
    )

    # --- helpers -----------------------------------------------------------
    def apply_env(self) -> None:
        """Push ``api_key`` and ``api_base`` into ``os.environ``.

        LangChain reads ``OPENAI_API_KEY`` and ``OPENAI_BASE_URL``
        from the environment, so this method bridges our dataclass
        with the library's expectations.
        """
        logger.info("Applying API credentials to os.environ")
        os.environ["OPENAI_API_KEY"] = self.api_key
        os.environ["OPENAI_BASE_URL"] = self.api_base
        logger.debug("OPENAI_BASE_URL=%s", self.api_base)
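
Because Settings is a plain dataclass, individual fields can also be overridden directly when experimenting outside of load_settings. A minimal sketch (the override values are arbitrary placeholders):

from duallens_analytics.config import Settings

# Construct with defaults, overriding a couple of knobs for a quick experiment.
settings = Settings(model="gpt-4o-mini", retriever_k=5, temperature=0.2)
print(settings.chunk_size, settings.search_type)  # 1000 "similarity"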

apply_env()

Push api_key and api_base into os.environ.

LangChain reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment, so this method bridges our dataclass with the library's expectations.

Source code in src/duallens_analytics/config.py
def apply_env(self) -> None:
    """Push ``api_key`` and ``api_base`` into ``os.environ``.

    LangChain reads ``OPENAI_API_KEY`` and ``OPENAI_BASE_URL``
    from the environment, so this method bridges our dataclass
    with the library's expectations.
    """
    logger.info("Applying API credentials to os.environ")
    os.environ["OPENAI_API_KEY"] = self.api_key
    os.environ["OPENAI_BASE_URL"] = self.api_base
    logger.debug("OPENAI_BASE_URL=%s", self.api_base)
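
A minimal usage sketch (the key and base URL below are placeholders, not real credentials):

import os
from duallens_analytics.config import Settings

settings = Settings(api_key="sk-...", api_base="https://example-llm-gateway.invalid/v1")
settings.apply_env()
assert os.environ["OPENAI_API_KEY"] == settings.api_key
assert os.environ["OPENAI_BASE_URL"] == settings.api_base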

get_hydra_cfg()

Load conf/config.yaml via OmegaConf (Hydra-compatible).

We use OmegaConf directly so the Streamlit app (which has its own entry-point) can still benefit from structured YAML config without requiring @hydra.main.

Source code in src/duallens_analytics/config.py
def get_hydra_cfg() -> DictConfig:
    """Load ``conf/config.yaml`` via OmegaConf (Hydra-compatible).

    We use OmegaConf directly so the Streamlit app (which has its own
    entry-point) can still benefit from structured YAML config without
    requiring ``@hydra.main``.
    """
    global _hydra_cfg
    if _hydra_cfg is None:
        yaml_path = CONF_DIR / "config.yaml"
        logger.info("Loading Hydra config from %s", yaml_path)
        _hydra_cfg = OmegaConf.load(yaml_path)  # type: ignore[assignment]
        logger.debug("Hydra config loaded: %s", OmegaConf.to_yaml(_hydra_cfg))
    return _hydra_cfg  # type: ignore[return-value]
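
The parsed config is cached in the module-level _hydra_cfg, so repeated calls reuse the same object. Typical access looks like this (the key names match those read by load_settings below; the printed values are examples):

from duallens_analytics.config import get_hydra_cfg

cfg = get_hydra_cfg()           # first call parses conf/config.yaml, later calls hit the cache
print(cfg.llm.model)            # e.g. "gpt-4o-mini"
print(cfg.retriever.k)          # e.g. 10
print(cfg.chunking.chunk_size)  # e.g. 1000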

load_settings()

Build a Settings instance from .env secrets and Hydra YAML.

The function performs two steps:

  1. Loads .env via python-dotenv to make API_KEY and OPENAI_API_BASE available in os.environ.
  2. Reads conf/config.yaml through get_hydra_cfg() and maps every section to the corresponding Settings field.

Returns:

Type      Description
Settings  A fully populated Settings dataclass.

Source code in src/duallens_analytics/config.py
def load_settings() -> Settings:
    """Build a :class:`Settings` instance from ``.env`` secrets and Hydra YAML.

    The function performs two steps:

    1. Loads ``.env`` via ``python-dotenv`` to make ``API_KEY`` and
       ``OPENAI_API_BASE`` available in ``os.environ``.
    2. Reads ``conf/config.yaml`` through :func:`get_hydra_cfg` and maps
       every section to the corresponding :class:`Settings` field.

    Returns:
        A fully populated :class:`Settings` dataclass.
    """
    # 1. secrets from .env
    logger.info("Loading .env from %s", PROJECT_ROOT / ".env")
    load_dotenv(PROJECT_ROOT / ".env")

    # 2. structured config from Hydra YAML
    cfg = get_hydra_cfg()

    logger.info(
        "Building Settings: model=%s, chunk_size=%d, retriever_k=%d",
        cfg.llm.model,
        cfg.chunking.chunk_size,
        cfg.retriever.k,
    )
    return Settings(
        # secrets
        api_key=os.getenv("API_KEY", ""),
        api_base=os.getenv("OPENAI_API_BASE", ""),
        # LLM
        model=cfg.llm.model,
        temperature=cfg.llm.temperature,
        max_tokens=cfg.llm.max_tokens,
        top_p=cfg.llm.top_p,
        frequency_penalty=cfg.llm.frequency_penalty,
        embedding_model=cfg.embedding.model,
        # chunking
        chunk_size=cfg.chunking.chunk_size,
        chunk_overlap=cfg.chunking.chunk_overlap,
        encoding_name=cfg.chunking.encoding_name,
        # retriever
        retriever_k=cfg.retriever.k,
        search_type=cfg.retriever.search_type,
        # vector store
        collection_name=cfg.vector_store.collection_name,
        # companies & stock
        companies=OmegaConf.to_container(cfg.companies, resolve=True),  # type: ignore[arg-type]
        stock_period=cfg.stock.period,
        # financial metrics
        financial_metrics=OmegaConf.to_container(cfg.financial_metrics, resolve=True),  # type: ignore[arg-type]
    )
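
End-to-end, the typical call sequence is a load followed by apply_env(). A sketch, where the .env contents are placeholders using the variable names read above:

# .env (at PROJECT_ROOT), placeholder values:
#   API_KEY=sk-...
#   OPENAI_API_BASE=https://example-llm-gateway.invalid/v1

from duallens_analytics.config import load_settings

settings = load_settings()   # merge .env secrets with conf/config.yaml
settings.apply_env()         # export OPENAI_API_KEY / OPENAI_BASE_URL for LangChain
print(settings.model, settings.companies)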