vector_store
duallens_analytics.vector_store
ChromaDB vector-store management.
This module provides functions to build, load, and query a ChromaDB vector store backed by OpenAI embeddings. The store is persisted to disk so subsequent application runs skip re-embedding.
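Typical usage:

```python
from duallens_analytics.vector_store import get_or_create_vector_store, get_retriever

# `settings` is the application's Settings instance (embedding model, chunking params, collection name).
store = get_or_create_vector_store(settings)
retriever = get_retriever(store, k=5)
docs = retriever.invoke("What AI projects is Google working on?")
```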
build_vector_store(chunks, settings, persist_dir=CHROMA_DIR, collection_name='AI_Initiatives')
Create a ChromaDB collection from pre-chunked documents.
Each document chunk is embedded via the configured OpenAI embedding model and stored in a local ChromaDB directory so that subsequent runs can load the collection without re-embedding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `list[Document]` | Pre-split `Document` chunks to embed and store. | required |
| `settings` | `Settings` | Application settings (used to select the embedding model). | required |
| `persist_dir` | `Path` | Filesystem path where ChromaDB persists its data. | `CHROMA_DIR` |
| `collection_name` | `str` | Logical name of the collection inside ChromaDB. | `'AI_Initiatives'` |
Returns:
| Type | Description |
|---|---|
| `Chroma` | A `Chroma` vector store backed by the newly created collection. |
Source code in src/duallens_analytics/vector_store.py
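The embed-once, reload-later behavior described above can be illustrated without ChromaDB or OpenAI. The sketch below is a toy stand-in, not the module's implementation: `CountingEmbedder`, `toy_build`, `toy_load`, and the JSON file layout are all hypothetical, and the "embedding" is a trivial two-number feature vector rather than a real model. It only shows why persisting vectors lets a second run skip embedding calls.

```python
import json
import tempfile
from pathlib import Path


class CountingEmbedder:
    """Hypothetical stand-in for an embedding model; counts how many texts it embeds."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, text: str) -> list[float]:
        self.calls += 1
        # Toy 2-d "embedding": text length and vowel count (not a real model).
        return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


def toy_build(chunks: list[str], embedder: CountingEmbedder, persist_dir: Path) -> None:
    """Hypothetical: embed every chunk exactly once and persist text + vector to disk."""
    persist_dir.mkdir(parents=True, exist_ok=True)
    records = [{"text": chunk, "vector": embedder.embed(chunk)} for chunk in chunks]
    (persist_dir / "store.json").write_text(json.dumps(records))


def toy_load(persist_dir: Path) -> list[dict]:
    """Hypothetical: read the persisted records back without any embedding calls."""
    return json.loads((persist_dir / "store.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    store_dir = Path(tmp) / "toy_store"
    embedder = CountingEmbedder()
    toy_build(["chunk one", "chunk two"], embedder, store_dir)
    first_run_calls = embedder.calls                      # one call per chunk
    records = toy_load(store_dir)                         # simulate a later run
    second_run_calls = embedder.calls - first_run_calls   # nothing re-embedded
```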
collection_exists(persist_dir=CHROMA_DIR, collection_name='AI_Initiatives')
Check whether a persisted ChromaDB collection already exists.
The heuristic checks only for the presence of `chroma.sqlite3` inside `persist_dir`; it does not verify that the named collection exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `persist_dir` | `Path` | Directory that ChromaDB uses for persistence. | `CHROMA_DIR` |
| `collection_name` | `str` | Unused at present but kept for future multi-collection support. | `'AI_Initiatives'` |
Returns:
| Type | Description |
|---|---|
| `bool` | `True` if `chroma.sqlite3` exists in `persist_dir`, otherwise `False`. |
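The documented heuristic amounts to a single file check. A minimal sketch, assuming `collection_exists` does nothing beyond testing for `chroma.sqlite3`; the function name `collection_exists_sketch` is hypothetical. The last call shows the consequence of `collection_name` being unused: any name reports `True` once the file exists.

```python
import tempfile
from pathlib import Path


def collection_exists_sketch(persist_dir: Path, collection_name: str = "AI_Initiatives") -> bool:
    """Hypothetical re-implementation of the documented heuristic.

    Only checks for ChromaDB's SQLite file; ``collection_name`` is accepted
    but ignored, mirroring the documented behavior.
    """
    return (persist_dir / "chroma.sqlite3").exists()


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = Path(tmp)
    before = collection_exists_sketch(persist_dir)               # no database file yet
    (persist_dir / "chroma.sqlite3").touch()                     # simulate a persisted store
    after = collection_exists_sketch(persist_dir)                # file now present
    other_name = collection_exists_sketch(persist_dir, "Other")  # name is not checked
```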
get_or_create_vector_store(settings, persist_dir=CHROMA_DIR, collection_name=None)
Load an existing vector store, or build one by ingesting PDFs if none is found.
This is the recommended entry point for application code. It encapsulates the load-or-build logic so callers do not need to manage ingestion themselves.
If a persisted ChromaDB store already exists on disk, the documents are not re-embedded, which saves time and API credits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `settings` | `Settings` | Application settings (embedding model, chunking params, collection name). | required |
| `persist_dir` | `Path` | Filesystem path for ChromaDB persistence. | `CHROMA_DIR` |
| `collection_name` | `str \| None` | Override for the collection name. When `None`, the collection name configured in `settings` is used. | `None` |
Returns:
| Type | Description |
|---|---|
| `Chroma` | A `Chroma` vector store ready for retrieval. |
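The load-or-build decision can be sketched with the expensive steps injected as callables. This assumes the existence check is the same `chroma.sqlite3` test that `collection_exists` documents; `get_or_create_sketch`, `fake_load`, and `fake_build` are hypothetical names, and the real function would ingest PDFs, chunk, and embed inside the build branch.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

Store = TypeVar("Store")


def get_or_create_sketch(
    persist_dir: Path,
    load: Callable[[Path], Store],
    build: Callable[[Path], Store],
) -> Store:
    """Hypothetical: load a persisted store if its database file exists, else build one."""
    if (persist_dir / "chroma.sqlite3").exists():
        return load(persist_dir)   # cheap path: no document embedding
    return build(persist_dir)      # expensive path: ingest, chunk, embed, persist


calls: list[str] = []


def fake_load(persist_dir: Path) -> str:
    calls.append("load")
    return "loaded store"


def fake_build(persist_dir: Path) -> str:
    calls.append("build")
    (persist_dir / "chroma.sqlite3").touch()  # building leaves a persisted database behind
    return "built store"


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = Path(tmp)
    first = get_or_create_sketch(persist_dir, fake_load, fake_build)   # nothing on disk yet
    second = get_or_create_sketch(persist_dir, fake_load, fake_build)  # reuses the persisted store
```

Injecting `load` and `build` keeps the control flow testable without network access or API credits.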
get_retriever(store, k=10, search_type='similarity')
Create a LangChain retriever over the given vector store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `store` | `Chroma` | A ChromaDB-backed vector store. | required |
| `k` | `int` | Number of top-matching documents to return per query. | `10` |
| `search_type` | `str` | Retrieval strategy, e.g. `'similarity'`. | `'similarity'` |
Returns:
| Type | Description |
|---|---|
| `VectorStoreRetriever` | A `VectorStoreRetriever` that can be queried with `.invoke(query)`. |
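To illustrate what `k` controls under `search_type='similarity'`, here is a brute-force top-k search over toy vectors. It is an illustration only: the cosine similarity, the in-memory index, and the names `cosine` and `top_k` are assumptions, whereas ChromaDB uses an approximate nearest-neighbour index and whatever distance metric the collection is configured with.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 10) -> list[str]:
    """Hypothetical: return the texts of the k stored vectors most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy index of (text, vector) pairs standing in for embedded document chunks.
index = [
    ("doc A", [1.0, 0.0]),
    ("doc B", [0.7, 0.7]),
    ("doc C", [0.0, 1.0]),
]
result = top_k([1.0, 0.1], index, k=2)
all_docs = top_k([1.0, 0.1], index)  # default k=10 exceeds the index size
```

When `k` exceeds the number of stored vectors, this sketch simply returns all of them.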
load_vector_store(settings, persist_dir=CHROMA_DIR, collection_name='AI_Initiatives')
Load an already-persisted ChromaDB collection from disk.
This skips re-embedding the documents by reading the existing SQLite database that ChromaDB maintains inside `persist_dir`; only queries are embedded, at retrieval time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `settings` | `Settings` | Application settings (needed for the embedding function so that query-time embeddings match the stored ones). | required |
| `persist_dir` | `Path` | Filesystem path where ChromaDB persists its data. | `CHROMA_DIR` |
| `collection_name` | `str` | Logical name of the collection inside ChromaDB. | `'AI_Initiatives'` |
Returns:
| Name | Type | Description |
|---|---|---|
A |
Chroma
|
class: |