Ports (Interfaces)
Ports are the abstract contracts that define what the system needs from the outside world — without specifying how those needs are met.
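They are defined with Python's typing.Protocol and decorated with @runtime_checkable. Protocol provides structural subtyping, and @runtime_checkable additionally lets isinstance() check an object against a port at runtime. This means: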
- Any class that implements the required methods is a valid adapter
- No inheritance required — duck typing at its best
- Mock adapters in tests satisfy the protocol without importing any real adapter
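A minimal sketch of the mechanism. The Greeter protocol and the classes below are purely illustrative and not part of this project: neither class inherits from the protocol, yet isinstance() accepts both. Note that @runtime_checkable only checks that the required method names exist; it does not verify parameter lists or return types, which remains the job of a static type checker.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Greeter(Protocol):
    """Illustrative port: anything with a greet(name) -> str method."""

    def greet(self, name: str) -> str: ...


class EnglishGreeter:
    # No inheritance from Greeter: structural subtyping is enough.
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


class MockGreeter:
    # A test double that records calls instead of doing real work.
    def __init__(self) -> None:
        self.calls: list[str] = []

    def greet(self, name: str) -> str:
        self.calls.append(name)
        return "mocked"


class NotAGreeter:
    # Lacks greet(), so it does not satisfy the protocol.
    def wave(self) -> None: ...


assert isinstance(EnglishGreeter(), Greeter)
assert isinstance(MockGreeter(), Greeter)
assert not isinstance(NotAGreeter(), Greeter)
```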
Swapping a provider
To replace any provider (e.g. swap Gemini for GPT-4o):
- Write a new adapter class with the methods defined by the Port
- Change one import line in services/container.py
- Done: no other file changes are needed
EmbeddingPort
Defines the contract for turning text into dense float vectors.
embedding_port
ports/embedding_port.py: abstract interface for text embedding providers.
Any class that implements these methods (structural subtyping via Protocol) is a valid EmbeddingPort — no inheritance required.
Current implementation: VertexEmbeddingAdapter (Vertex AI text-embedding-005).
To swap: write a new adapter implementing this Protocol and change container.py.
EmbeddingPort
Bases: Protocol
Contract for a text embedding provider.
Source code in prod/ports/embedding_port.py
embed_query
embed_query(text: str) -> list[float]
Embed a search query.
Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector is optimised for matching against document embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Natural-language query string. | required |

Returns:

| Type | Description |
|---|---|
| list[float] | Dense vector of floats. |

Raises:

| Type | Description |
|---|---|
| EmbeddingError | On API failure or empty response. |
embed_document
embed_document(text: str, title: str = '') -> list[float]
Embed a document for storage.
Uses the RETRIEVAL_DOCUMENT task type.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Document text. | required |
| title | str | Optional document title (improves embedding quality). | '' |

Returns:

| Type | Description |
|---|---|
| list[float] | Dense vector of floats. |
embed_documents_batch
embed_documents_batch(texts: list[str], titles: list[str] | None = None) -> list[list[float] | None]
Embed multiple documents in a single API call.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | List of document strings. | required |
| titles | list[str] \| None | Optional parallel list of document titles. | None |

Returns:

| Type | Description |
|---|---|
| list[list[float] \| None] | List of vectors in the same order as input. Individual elements may be None if that item failed. |
LLMPort
Defines the contract for generating a JSON string from a prompt pair.
llm_port
ports/llm_port.py: abstract interface for LLM (large language model) providers.
Current implementation: GeminiLLMAdapter (Vertex AI Gemini).
To swap to OpenAI GPT: write an OpenAILLMAdapter implementing this Protocol, then change ONE line in services/container.py.
LLMPort
Bases: Protocol
Contract for a JSON-generating LLM provider.
Source code in prod/ports/llm_port.py
generate_json
generate_json(system_prompt: str, user_message: str) -> str | None
Send a prompt to the LLM and return its JSON response as a string.
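The caller is responsible for parsing the returned string. The adapter should request JSON-mode output from the underlying model so that the response is valid JSON. JSON mode constrains the output format but cannot rule out a truncated response, so callers should still handle parse errors.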
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| system_prompt | str | System-level instruction. | required |
| user_message | str | User-turn content. | required |

Returns:

| Type | Description |
|---|---|
| str \| None | Raw JSON string, or None if the call failed. |

Raises:

| Type | Description |
|---|---|
| LLMError | On unrecoverable API failure. |
DatabasePort
Defines the contract for the hybrid search database backend.
Note that Reciprocal Rank Fusion (RRF) is deliberately not part of this port: it lives in
services/retriever.py as pure Python, so it can be unit-tested without a
database connection.
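To show why no database is needed, here is a minimal sketch of standard Reciprocal Rank Fusion over the (anzsic_code, rank) lists that vector_search and fts_search return. Each code scores the sum of 1 / (k + rank) across the lists it appears in. The function name and k = 60 (a common default in the RRF literature) are assumptions; the actual function and constant in services/retriever.py are not shown on this page.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    *rankings: list[tuple[str, int]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse (code, rank) lists where rank 1 is best; return (code, score), best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for code, rank in ranking:
            scores[code] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by code for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = [("A", 1), ("B", 2), ("C", 3)]
fts_hits = [("B", 1), ("D", 2)]
fused = reciprocal_rank_fusion(vector_hits, fts_hits)
print([code for code, _ in fused])  # ['B', 'A', 'D', 'C']
```

"B" wins because it appears in both lists, even though it is top-ranked in only one of them; this rewarding of agreement between retrievers is the usual motivation for RRF.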
database_port
ports/database_port.py: abstract interface for the vector + FTS database.
The port deliberately separates the three database concerns:
- vector_search: ANN (approximate nearest-neighbour) similarity search
- fts_search: keyword / full-text search
- fetch_by_codes: bulk record retrieval by primary key
This separation means:
- RRF fusion is done in pure Python (services/retriever.py), making it trivially unit-testable with no database dependency.
- Each search method can be mocked or replaced independently.
Current implementation: PostgresDatabaseAdapter (psycopg2 + pgvector).
To swap: write a new adapter (e.g. WeaviateDatabaseAdapter) implementing this Protocol and change ONE line in services/container.py.
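As an illustration of replacing the backend for tests, the hypothetical in-memory adapter below satisfies the same three methods without Postgres. Brute-force cosine similarity stands in for the ANN index, naive token overlap stands in for tsvector ranking, and the "description" record field and sample codes are illustrative assumptions rather than the project's schema.

```python
import math


class InMemoryDatabaseAdapter:
    """Hypothetical in-memory DatabasePort for unit tests (no Postgres needed)."""

    def __init__(self, records: dict[str, dict], embeddings: dict[str, list[float]]) -> None:
        self.records = records        # anzsic_code -> record dict
        self.embeddings = embeddings  # anzsic_code -> stored vector

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        if len(a) != len(b):
            raise ValueError("query vector dimension does not match stored vectors")
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def vector_search(self, embedding: list[float], limit: int) -> list[tuple[str, int]]:
        # Exact brute-force search stands in for the ANN index.
        ordered = sorted(
            self.embeddings, key=lambda c: -self._cosine(embedding, self.embeddings[c])
        )
        return [(code, rank) for rank, code in enumerate(ordered[:limit], start=1)]

    def fts_search(self, query_text: str, limit: int) -> list[tuple[str, int]]:
        # Token overlap stands in for full-text ranking; non-matches are dropped.
        terms = set(query_text.lower().split())
        scored: list[tuple[int, str]] = []
        for code, record in self.records.items():
            words = set(str(record.get("description", "")).lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, code))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [(code, rank) for rank, (_, code) in enumerate(scored[:limit], start=1)]

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
        # Unknown codes are silently omitted, as the port specifies.
        return {c: self.records[c] for c in codes if c in self.records}


db = InMemoryDatabaseAdapter(
    records={
        "0160": {"description": "Dairy cattle farming"},
        "1131": {"description": "Milk and cream processing"},
    },
    embeddings={"0160": [1.0, 0.0], "1131": [0.0, 1.0]},
)
print(db.vector_search([0.9, 0.1], limit=2))  # [('0160', 1), ('1131', 2)]
print(db.fts_search("dairy farming", limit=5))  # [('0160', 1)]
print(db.fetch_by_codes(["1131", "9999"]))  # {'1131': {'description': 'Milk and cream processing'}}
```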
DatabasePort
Bases: Protocol
Contract for the hybrid search database backend.
Source code in prod/ports/database_port.py
vector_search
vector_search(embedding: list[float], limit: int) -> list[tuple[str, int]]
Run approximate nearest-neighbour search over stored embeddings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| embedding | list[float] | Query vector (must match the stored dimension). | required |
| limit | int | Maximum number of results. | required |

Returns:

| Type | Description |
|---|---|
| list[tuple[str, int]] | List of (anzsic_code, rank) tuples ordered by similarity (rank 1 = most similar). |

Raises:

| Type | Description |
|---|---|
| DatabaseError | On connection or query failure. |
fts_search
fts_search(query_text: str, limit: int) -> list[tuple[str, int]]
Run full-text search using the stored tsvector index.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query_text | str | Natural-language search string. | required |
| limit | int | Maximum number of results. | required |

Returns:

| Type | Description |
|---|---|
| list[tuple[str, int]] | List of (anzsic_code, rank) tuples ordered by FTS score (rank 1 = highest relevance). |

Raises:

| Type | Description |
|---|---|
| DatabaseError | On connection or query failure. |
fetch_by_codes
fetch_by_codes(codes: list[str]) -> dict[str, dict]
Fetch full ANZSIC records for a list of codes.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| codes | list[str] | List of anzsic_code primary keys to retrieve. | required |

Returns:

| Type | Description |
|---|---|
| dict[str, dict] | Dict mapping anzsic_code → record dict. Codes not found in the database are silently omitted. |

Raises:

| Type | Description |
|---|---|
| DatabaseError | On connection or query failure. |
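Because fetch_by_codes returns a dict and the port promises no ordering for it, a caller that has already fused rankings needs to re-impose that order and skip omitted codes. A minimal sketch; records_in_rank_order is a hypothetical helper, not a documented function of this project.

```python
def records_in_rank_order(
    ranked_codes: list[str], records: dict[str, dict]
) -> list[tuple[str, dict]]:
    """Pair each ranked code with its record, keeping rank order.

    Codes missing from `records` (silently omitted by fetch_by_codes) are skipped.
    """
    return [(code, records[code]) for code in ranked_codes if code in records]


fetched = {"0160": {"title": "Dairy Cattle Farming"}}
print(records_in_rank_order(["1131", "0160"], fetched))  # [('0160', {'title': 'Dairy Cattle Farming'})]
```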