config
duallens_analytics.config
Application configuration – loads .env and Hydra YAML.
Hydra manages all tuneable knobs (model params, chunking, retriever, etc.)
while secrets (API keys) come from .env via python-dotenv.
This dual approach showcases Hydra's structured-config capability while keeping credentials out of version control.
CHROMA_DIR = DATA_DIR / 'chroma_db'
module-attribute
Persisted ChromaDB vector-store directory.
CONF_DIR = PROJECT_ROOT / 'conf'
module-attribute
Directory containing the Hydra YAML configuration files.
DATA_DIR = PROJECT_ROOT / 'data'
module-attribute
Top-level runtime data directory (created at first run).
PDF_DIR = DATA_DIR / 'pdfs'
module-attribute
Directory where extracted PDF documents are stored.
PROJECT_ROOT = Path(__file__).resolve().parents[2]
module-attribute
Absolute path to the repository root (project_1_DualLens_Analytics/).
ZIP_PATH = PROJECT_ROOT / 'notebooks' / 'Companies-AI-Initiatives.zip'
module-attribute
Path to the ZIP archive containing the AI-initiative PDF reports.
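As a minimal sketch of how these constants resolve, the snippet below rebuilds them from a hypothetical checkout location. The `/home/user/...` prefix and the lowercase variable names are assumptions for illustration; only the relative layout comes from the definitions above.

```python
from pathlib import PurePosixPath

# Hypothetical location of config.py; the /home/user prefix is an assumption.
module_file = PurePosixPath(
    "/home/user/project_1_DualLens_Analytics/src/duallens_analytics/config.py"
)

# parents[0] is the package directory, parents[1] is src/,
# and parents[2] is the repository root.
project_root = module_file.parents[2]

# The remaining constants are plain joins on the root.
data_dir = project_root / "data"
chroma_dir = data_dir / "chroma_db"
pdf_dir = data_dir / "pdfs"
conf_dir = project_root / "conf"
zip_path = project_root / "notebooks" / "Companies-AI-Initiatives.zip"

print(project_root)
print(chroma_dir)
```

The real module calls `Path(__file__).resolve()` first, so the root is absolute regardless of the working directory the app is launched from.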
Settings
dataclass
Runtime settings assembled from Hydra config + environment secrets.
This dataclass is the single source of truth for every tuneable
parameter consumed at runtime. It is populated by :func:`load_settings`,
which merges `.env` secrets with the Hydra YAML file.
Attributes:
| Name | Type | Description |
|---|---|---|
| `api_key` | `str` | OpenAI-compatible API key (sourced from |
| `api_base` | `str` | Base URL for the LLM endpoint (sourced from |
| `model` | `str` | Chat-completion model identifier (e.g. |
| `embedding_model` | `str` | Embedding model identifier (e.g. |
| `temperature` | `float` | Sampling temperature for the LLM (0 = deterministic). |
| `max_tokens` | `int` | Maximum number of tokens in the LLM response. |
| `top_p` | `float` | Nucleus-sampling probability mass. |
| `frequency_penalty` | `float` | Penalises repeated token sequences. |
| `chunk_size` | `int` | Target token count per document chunk. |
| `chunk_overlap` | `int` | Overlap in tokens between consecutive chunks. |
| `encoding_name` | `str` | Tiktoken encoding used for chunking (e.g. |
| `retriever_k` | `int` | Number of top-k documents returned by the retriever. |
| `search_type` | `str` | ChromaDB search strategy ( |
| `collection_name` | `str` | Name of the ChromaDB collection. |
| `companies` | `list[str]` | Ticker symbols of companies to analyse. |
| `stock_period` | `str` | Yahoo-Finance period string (e.g. |
| `financial_metrics` | `list[str]` | Column labels shown on dashboards and reports. |
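To illustrate the shape of such a dataclass, here is a minimal sketch with a subset of the attributes listed above. The class name `SettingsSketch` and every default value are assumptions made for the example; the actual defaults live in the project's YAML and are not shown in this page.

```python
from dataclasses import dataclass, field


@dataclass
class SettingsSketch:
    """Illustrative stand-in for Settings; not the project's implementation."""

    # Secrets: empty placeholders here, filled from the environment in practice.
    api_key: str = ""
    api_base: str = ""

    # A few tuneable fields; the default values are hypothetical.
    temperature: float = 0.0
    max_tokens: int = 1024
    chunk_size: int = 1000
    chunk_overlap: int = 200
    retriever_k: int = 5

    # Mutable defaults need default_factory so instances do not share one list.
    companies: list[str] = field(default_factory=list)


default_settings = SettingsSketch()
custom_settings = SettingsSketch(temperature=0.2, companies=["AAA", "BBB"])
print(custom_settings)
```

Using `field(default_factory=list)` for the list-typed attributes avoids the shared-mutable-default pitfall, which `dataclass` would otherwise reject with a `ValueError`.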
Source code in src/duallens_analytics/config.py
apply_env()
Push api_key and api_base into os.environ.
LangChain reads OPENAI_API_KEY and OPENAI_BASE_URL
from the environment, so this method bridges our dataclass
with the library's expectations.
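The bridging step can be sketched roughly as follows. The class name `EnvSettingsSketch` and the placeholder credentials are assumptions; the environment variable names are the ones stated above.

```python
import os
from dataclasses import dataclass


@dataclass
class EnvSettingsSketch:
    """Illustrative stand-in carrying only the two secret fields."""

    api_key: str
    api_base: str

    def apply_env(self) -> None:
        # Copy the dataclass values into the process environment so that
        # libraries reading these variables can pick them up.
        os.environ["OPENAI_API_KEY"] = self.api_key
        os.environ["OPENAI_BASE_URL"] = self.api_base


# Placeholder values, not real credentials.
settings = EnvSettingsSketch(
    api_key="sk-placeholder",
    api_base="https://example.invalid/v1",
)
settings.apply_env()
print(os.environ["OPENAI_BASE_URL"])
```

Note that `os.environ` only accepts strings, so this approach presumes both fields are already populated when `apply_env()` is called.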
get_hydra_cfg()
Load conf/config.yaml via OmegaConf (Hydra-compatible).
We use OmegaConf directly so the Streamlit app (which has its own
entry-point) can still benefit from structured YAML config without
requiring @hydra.main.
load_settings()
Build a :class:`Settings` instance from `.env` secrets and Hydra YAML.
The function performs two steps:
- Loads `.env` via `python-dotenv` to make `API_KEY` and `OPENAI_API_BASE` available in `os.environ`.
- Reads `conf/config.yaml` through :func:`get_hydra_cfg` and maps every section to the corresponding :class:`Settings` field.
Returns:
| Type | Description |
|---|---|
| `Settings` | A fully populated :class: |
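The two steps can be approximated with the standard library alone. In this sketch a plain mapping stands in for the environment after `.env` has been loaded, and a nested dict stands in for the parsed YAML. The section and key names (`model.name`, `chunking.chunk_size`, `retriever.k`) and the helper `load_settings_sketch` are assumptions; the real layout of `conf/config.yaml` is not shown in this page.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class LoadedSettingsSketch:
    """Illustrative subset of the Settings fields."""

    api_key: str
    api_base: str
    model: str
    temperature: float
    chunk_size: int
    retriever_k: int


def load_settings_sketch(
    cfg: Mapping, env: Mapping = os.environ
) -> LoadedSettingsSketch:
    # Step 1: secrets come from the environment (populated from .env).
    # Step 2: tuneable values come from the config sections.
    return LoadedSettingsSketch(
        api_key=env.get("API_KEY", ""),
        api_base=env.get("OPENAI_API_BASE", ""),
        model=cfg["model"]["name"],
        temperature=float(cfg["model"]["temperature"]),
        chunk_size=int(cfg["chunking"]["chunk_size"]),
        retriever_k=int(cfg["retriever"]["k"]),
    )


# Hypothetical stand-ins for the parsed YAML and the loaded environment.
fake_cfg = {
    "model": {"name": "example-model", "temperature": 0},
    "chunking": {"chunk_size": 1000},
    "retriever": {"k": 5},
}
fake_env = {
    "API_KEY": "sk-placeholder",
    "OPENAI_API_BASE": "https://example.invalid/v1",
}

settings = load_settings_sketch(fake_cfg, fake_env)
print(settings)
```

Passing the environment in as a parameter keeps the sketch testable without touching the real process environment.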