
Ports (Interfaces)

Ports are the abstract contracts that define what the system needs from the outside world — without specifying how those needs are met.

They use Python's typing.Protocol with @runtime_checkable, which means:

  • Any class that implements the required methods is a valid adapter
  • No inheritance required — duck typing at its best
  • Mock adapters in tests satisfy the protocol without importing any real adapter
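
The mechanism can be sketched with a toy port (GreeterPort and ConsoleGreeter are invented for illustration, not part of the codebase):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GreeterPort(Protocol):
    """Toy port: any object with a matching greet() method qualifies."""

    def greet(self, name: str) -> str: ...


class ConsoleGreeter:
    """Note: no inheritance from GreeterPort — structural subtyping only."""

    def greet(self, name: str) -> str:
        return f"hello {name}"


adapter = ConsoleGreeter()
# isinstance works because the Protocol is @runtime_checkable;
# it checks for the *presence* of members, not inheritance.
print(isinstance(adapter, GreeterPort))  # True
print(adapter.greet("world"))  # hello world
```

One caveat worth knowing: a runtime_checkable isinstance check only verifies that the named members exist, not that their signatures match.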

Swapping a provider

To replace any provider (e.g. swap Gemini for GPT-4o):

  1. Write a new adapter class with the methods defined by the Port
  2. Change one import line in services/container.py
  3. Done — no other file changes

EmbeddingPort

Defines the contract for turning text into dense float vectors.

embedding_port

ports/embedding_port.py: Abstract interface for text embedding providers.

Any class that implements these methods (structural subtyping via Protocol) is a valid EmbeddingPort — no inheritance required.

Current implementation: VertexEmbeddingAdapter (Vertex AI text-embedding-005).

To swap: write a new adapter implementing this Protocol and change container.py.

EmbeddingPort

Bases: Protocol

Contract for a text embedding provider.

Source code in prod/ports/embedding_port.py
@runtime_checkable
class EmbeddingPort(Protocol):
    """Contract for a text embedding provider."""

    @property
    def model_name(self) -> str:
        """Identifier of the underlying embedding model."""
        ...

    @property
    def dimensions(self) -> int:
        """Number of dimensions in the output vectors."""
        ...

    def embed_query(self, text: str) -> list[float]:
        """Embed a search query.

        Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector
        is optimised for matching against document embeddings.

        Args:
            text: Natural-language query string.

        Returns:
            Dense vector of floats.

        Raises:
            EmbeddingError: On API failure or empty response.
        """
        ...

    def embed_document(self, text: str, title: str = "") -> list[float]:
        """Embed a document for storage.

        Uses RETRIEVAL_DOCUMENT task type.

        Args:
            text:  Document text.
            title: Optional document title (improves quality).

        Returns:
            Dense vector of floats.
        """
        ...

    def embed_documents_batch(
        self,
        texts: list[str],
        titles: list[str] | None = None,
    ) -> list[list[float] | None]:
        """Embed multiple documents in a single API call.

        Args:
            texts:  List of document strings.
            titles: Optional parallel list of document titles.

        Returns:
            List of vectors in the same order as input.
            Individual elements may be None if that item failed.
        """
        ...

model_name property

model_name: str

Identifier of the underlying embedding model.

dimensions property

dimensions: int

Number of dimensions in the output vectors.

embed_query

embed_query(text: str) -> list[float]

Embed a search query.

Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector is optimised for matching against document embeddings.

Parameters:

  text (str, required): Natural-language query string.

Returns:

  list[float]: Dense vector of floats.

Raises:

  EmbeddingError: On API failure or empty response.

Source code in prod/ports/embedding_port.py
def embed_query(self, text: str) -> list[float]:
    """Embed a search query.

    Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector
    is optimised for matching against document embeddings.

    Args:
        text: Natural-language query string.

    Returns:
        Dense vector of floats.

    Raises:
        EmbeddingError: On API failure or empty response.
    """
    ...

embed_document

embed_document(text: str, title: str = '') -> list[float]

Embed a document for storage.

Uses RETRIEVAL_DOCUMENT task type.

Parameters:

  text (str, required): Document text.
  title (str, default ''): Optional document title (improves quality).

Returns:

  list[float]: Dense vector of floats.

Source code in prod/ports/embedding_port.py
def embed_document(self, text: str, title: str = "") -> list[float]:
    """Embed a document for storage.

    Uses RETRIEVAL_DOCUMENT task type.

    Args:
        text:  Document text.
        title: Optional document title (improves quality).

    Returns:
        Dense vector of floats.
    """
    ...

embed_documents_batch

embed_documents_batch(texts: list[str], titles: list[str] | None = None) -> list[list[float] | None]

Embed multiple documents in a single API call.

Parameters:

  texts (list[str], required): List of document strings.
  titles (list[str] | None, default None): Optional parallel list of document titles.

Returns:

  list[list[float] | None]: List of vectors in the same order as input. Individual elements may be None if that item failed.

Source code in prod/ports/embedding_port.py
def embed_documents_batch(
    self,
    texts: list[str],
    titles: list[str] | None = None,
) -> list[list[float] | None]:
    """Embed multiple documents in a single API call.

    Args:
        texts:  List of document strings.
        titles: Optional parallel list of document titles.

    Returns:
        List of vectors in the same order as input.
        Individual elements may be None if that item failed.
    """
    ...

LLMPort

Defines the contract for generating a JSON string from a prompt pair.

llm_port

ports/llm_port.py: Abstract interface for LLM (large language model) providers.

Current implementation: GeminiLLMAdapter (Vertex AI Gemini).

To swap to OpenAI GPT: write OpenAILLMAdapter implementing this Protocol, then change ONE line in services/container.py.

LLMPort

Bases: Protocol

Contract for a JSON-generating LLM provider.

Source code in prod/ports/llm_port.py
@runtime_checkable
class LLMPort(Protocol):
    """Contract for a JSON-generating LLM provider."""

    @property
    def model_name(self) -> str:
        """Identifier of the underlying LLM."""
        ...

    def generate_json(
        self,
        system_prompt: str,
        user_message: str,
    ) -> str | None:
        """Send a prompt to the LLM and return its JSON response as a string.

        The caller is responsible for parsing the returned string.  The
        adapter should request JSON-mode output from the underlying model
        so the response is guaranteed to be valid JSON.

        Args:
            system_prompt: System-level instruction.
            user_message:  User-turn content.

        Returns:
            Raw JSON string, or None if the call failed.

        Raises:
            LLMError: On unrecoverable API failure.
        """
        ...

model_name property

model_name: str

Identifier of the underlying LLM.

generate_json

generate_json(system_prompt: str, user_message: str) -> str | None

Send a prompt to the LLM and return its JSON response as a string.

The caller is responsible for parsing the returned string. The adapter should request JSON-mode output from the underlying model so the response is guaranteed to be valid JSON.

Parameters:

  system_prompt (str, required): System-level instruction.
  user_message (str, required): User-turn content.

Returns:

  str | None: Raw JSON string, or None if the call failed.

Raises:

  LLMError: On unrecoverable API failure.

Source code in prod/ports/llm_port.py
def generate_json(
    self,
    system_prompt: str,
    user_message: str,
) -> str | None:
    """Send a prompt to the LLM and return its JSON response as a string.

    The caller is responsible for parsing the returned string.  The
    adapter should request JSON-mode output from the underlying model
    so the response is guaranteed to be valid JSON.

    Args:
        system_prompt: System-level instruction.
        user_message:  User-turn content.

    Returns:
        Raw JSON string, or None if the call failed.

    Raises:
        LLMError: On unrecoverable API failure.
    """
    ...

DatabasePort

Defines the contract for the hybrid search database backend.

Note that RRF fusion is deliberately not part of this port — it lives in services/retriever.py as pure Python so it can be unit-tested without a database connection.

database_port

ports/database_port.py: Abstract interface for the vector + FTS database.

The port deliberately separates the three database concerns:

  1. vector_search — ANN similarity search
  2. fts_search — keyword / full-text search
  3. fetch_by_codes — bulk record retrieval by primary key

This separation means:

  • RRF fusion is done in pure Python (services/retriever.py), making it trivially unit-testable with no database dependency.
  • Each search method can be mocked or replaced independently.

Current implementation: PostgresDatabaseAdapter (psycopg2 + pgvector).

To swap: write a new adapter (e.g. WeaviateDatabaseAdapter) implementing this Protocol and change ONE line in services/container.py.
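
The fusion step that consumes these (code, rank) lists can be sketched as plain Python. This is illustrative, not the actual services/retriever.py; k = 60 is the conventional RRF constant and is an assumption here:

```python
def rrf_fuse(
    vector_ranks: list[tuple[str, int]],
    fts_ranks: list[tuple[str, int]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over two (code, rank) lists, rank 1 = best."""
    scores: dict[str, float] = {}
    for ranked in (vector_ranks, fts_ranks):
        for code, rank in ranked:
            # Each list contributes 1 / (k + rank) to the code's fused score.
            scores[code] = scores.get(code, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(
    vector_ranks=[("0111", 1), ("0112", 2)],
    fts_ranks=[("0112", 1), ("0139", 2)],
)
# "0112" appears in both lists, so it ranks first.
```

Because the function takes plain lists and returns plain tuples, it can be unit-tested with literal data, exactly as the port design intends.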

DatabasePort

Bases: Protocol

Contract for the hybrid search database backend.

Source code in prod/ports/database_port.py
@runtime_checkable
class DatabasePort(Protocol):
    """Contract for the hybrid search database backend."""

    def vector_search(
        self,
        embedding: list[float],
        limit: int,
    ) -> list[tuple[str, int]]:
        """Run approximate nearest-neighbour search over stored embeddings.

        Args:
            embedding: Query vector (must match the stored dimension).
            limit:     Maximum number of results.

        Returns:
            List of (anzsic_code, rank) tuples ordered by similarity
            (rank 1 = most similar).

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...

    def fts_search(
        self,
        query_text: str,
        limit: int,
    ) -> list[tuple[str, int]]:
        """Run full-text search using the stored tsvector index.

        Args:
            query_text: Natural-language search string.
            limit:      Maximum number of results.

        Returns:
            List of (anzsic_code, rank) tuples ordered by FTS score
            (rank 1 = highest relevance).

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
        """Fetch full ANZSIC records for a list of codes.

        Args:
            codes: List of anzsic_code primary keys to retrieve.

        Returns:
            Dict mapping anzsic_code → record dict.
            Codes not found in the database are silently omitted.

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...
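
A dict-backed test double satisfying this contract might look like the sketch below. The protocol is re-declared inline (condensed from above); InMemoryDatabaseAdapter and its sample record are invented for illustration:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DatabasePort(Protocol):
    """Condensed copy of the contract shown above."""

    def vector_search(self, embedding: list[float], limit: int) -> list[tuple[str, int]]: ...

    def fts_search(self, query_text: str, limit: int) -> list[tuple[str, int]]: ...

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]: ...


class InMemoryDatabaseAdapter:
    """Dict-backed test double — satisfies the port with no Postgres."""

    def __init__(self, records: dict[str, dict]) -> None:
        self._records = records

    def vector_search(self, embedding: list[float], limit: int) -> list[tuple[str, int]]:
        # Fixed insertion order stands in for ANN similarity in tests.
        return [(code, i + 1) for i, code in enumerate(self._records)][:limit]

    def fts_search(self, query_text: str, limit: int) -> list[tuple[str, int]]:
        hits = [c for c, r in self._records.items() if query_text in r.get("title", "")]
        return [(code, i + 1) for i, code in enumerate(hits)][:limit]

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
        # Unknown codes are silently omitted, matching the contract.
        return {c: self._records[c] for c in codes if c in self._records}


db = InMemoryDatabaseAdapter({"0111": {"title": "Nursery Production"}})
assert isinstance(db, DatabasePort)
```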

vector_search

vector_search(embedding: list[float], limit: int) -> list[tuple[str, int]]

Run approximate nearest-neighbour search over stored embeddings.

Parameters:

  embedding (list[float], required): Query vector (must match the stored dimension).
  limit (int, required): Maximum number of results.

Returns:

  list[tuple[str, int]]: List of (anzsic_code, rank) tuples ordered by similarity (rank 1 = most similar).

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def vector_search(
    self,
    embedding: list[float],
    limit: int,
) -> list[tuple[str, int]]:
    """Run approximate nearest-neighbour search over stored embeddings.

    Args:
        embedding: Query vector (must match the stored dimension).
        limit:     Maximum number of results.

    Returns:
        List of (anzsic_code, rank) tuples ordered by similarity
        (rank 1 = most similar).

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...

fts_search

fts_search(query_text: str, limit: int) -> list[tuple[str, int]]

Run full-text search using the stored tsvector index.

Parameters:

  query_text (str, required): Natural-language search string.
  limit (int, required): Maximum number of results.

Returns:

  list[tuple[str, int]]: List of (anzsic_code, rank) tuples ordered by FTS score (rank 1 = highest relevance).

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def fts_search(
    self,
    query_text: str,
    limit: int,
) -> list[tuple[str, int]]:
    """Run full-text search using the stored tsvector index.

    Args:
        query_text: Natural-language search string.
        limit:      Maximum number of results.

    Returns:
        List of (anzsic_code, rank) tuples ordered by FTS score
        (rank 1 = highest relevance).

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...

fetch_by_codes

fetch_by_codes(codes: list[str]) -> dict[str, dict]

Fetch full ANZSIC records for a list of codes.

Parameters:

  codes (list[str], required): List of anzsic_code primary keys to retrieve.

Returns:

  dict[str, dict]: Dict mapping anzsic_code → record dict. Codes not found in the database are silently omitted.

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
    """Fetch full ANZSIC records for a list of codes.

    Args:
        codes: List of anzsic_code primary keys to retrieve.

    Returns:
        Dict mapping anzsic_code → record dict.
        Codes not found in the database are silently omitted.

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...