
Ports (Interfaces)

Ports are the abstract contracts that define what the system needs from the outside world — without specifying how those needs are met.

They use Python's typing.Protocol with @runtime_checkable, which means:

  • Any class that implements the required methods is a valid adapter
  • No inheritance required — duck typing at its best
  • Mock adapters in tests satisfy the protocol without importing any real adapter
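
The mechanism can be sketched with a toy port (GreeterPort and ConsoleGreeter are invented for illustration, not part of the codebase):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class GreeterPort(Protocol):
    """Toy port: any object with a matching greet() method qualifies."""

    def greet(self, name: str) -> str: ...


class ConsoleGreeter:
    """Note: no inheritance from GreeterPort — structural subtyping only."""

    def greet(self, name: str) -> str:
        return f"hello {name}"


adapter = ConsoleGreeter()
# isinstance works because the Protocol is @runtime_checkable;
# it checks for the *presence* of members, not inheritance.
print(isinstance(adapter, GreeterPort))  # True
print(adapter.greet("world"))  # hello world
```

One caveat worth knowing: a runtime_checkable isinstance check only verifies that the named members exist, not that their signatures match.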

Swapping a provider

To replace any provider (e.g. swap Gemini for GPT-4o):

  1. Write a new adapter class with the methods defined by the Port
  2. Change one import line in services/container.py
  3. Done — no other file changes

EmbeddingPort

Defines the contract for turning text into dense float vectors.

embedding_port

ports/embedding_port.py: Abstract interface for text embedding providers.

Any class that implements these methods (structural subtyping via Protocol) is a valid EmbeddingPort — no inheritance required.

Current implementation: VertexEmbeddingAdapter (Vertex AI text-embedding-005).

To swap: write a new adapter implementing this Protocol and change container.py.

EmbeddingPort

Bases: Protocol

Contract for a text embedding provider.

Source code in prod/ports/embedding_port.py
@runtime_checkable
class EmbeddingPort(Protocol):
    """Contract for a text embedding provider."""

    @property
    def model_name(self) -> str:
        """Identifier of the underlying embedding model."""
        ...

    @property
    def dimensions(self) -> int:
        """Number of dimensions in the output vectors."""
        ...

    def embed_query(self, text: str) -> list[float]:
        """Embed a search query.

        Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector
        is optimised for matching against document embeddings.

        Args:
            text: Natural-language query string.

        Returns:
            Dense vector of floats.

        Raises:
            EmbeddingError: On API failure or empty response.
        """
        ...

    def embed_document(self, text: str, title: str = "") -> list[float]:
        """Embed a document for storage.

        Uses RETRIEVAL_DOCUMENT task type.

        Args:
            text:  Document text.
            title: Optional document title (improves quality).

        Returns:
            Dense vector of floats.
        """
        ...

    def embed_documents_batch(
        self,
        texts: list[str],
        titles: list[str] | None = None,
    ) -> list[list[float] | None]:
        """Embed multiple documents in a single API call.

        Args:
            texts:  List of document strings.
            titles: Optional parallel list of document titles.

        Returns:
            List of vectors in the same order as input.
            Individual elements may be None if that item failed.
        """
        ...

model_name property

model_name: str

Identifier of the underlying embedding model.

dimensions property

dimensions: int

Number of dimensions in the output vectors.

embed_query

embed_query(text: str) -> list[float]

Embed a search query.

Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector is optimised for matching against document embeddings.

Parameters:

  text (str, required): Natural-language query string.

Returns:

  list[float]: Dense vector of floats.

Raises:

  EmbeddingError: On API failure or empty response.

Source code in prod/ports/embedding_port.py
def embed_query(self, text: str) -> list[float]:
    """Embed a search query.

    Uses asymmetric retrieval task type (RETRIEVAL_QUERY) so the vector
    is optimised for matching against document embeddings.

    Args:
        text: Natural-language query string.

    Returns:
        Dense vector of floats.

    Raises:
        EmbeddingError: On API failure or empty response.
    """
    ...

embed_document

embed_document(text: str, title: str = '') -> list[float]

Embed a document for storage.

Uses RETRIEVAL_DOCUMENT task type.

Parameters:

  text (str, required): Document text.
  title (str, default ''): Optional document title (improves quality).

Returns:

  list[float]: Dense vector of floats.

Source code in prod/ports/embedding_port.py
def embed_document(self, text: str, title: str = "") -> list[float]:
    """Embed a document for storage.

    Uses RETRIEVAL_DOCUMENT task type.

    Args:
        text:  Document text.
        title: Optional document title (improves quality).

    Returns:
        Dense vector of floats.
    """
    ...

embed_documents_batch

embed_documents_batch(texts: list[str], titles: list[str] | None = None) -> list[list[float] | None]

Embed multiple documents in a single API call.

Parameters:

  texts (list[str], required): List of document strings.
  titles (list[str] | None, default None): Optional parallel list of document titles.

Returns:

  list[list[float] | None]: List of vectors in the same order as input. Individual elements may be None if that item failed.

Source code in prod/ports/embedding_port.py
def embed_documents_batch(
    self,
    texts: list[str],
    titles: list[str] | None = None,
) -> list[list[float] | None]:
    """Embed multiple documents in a single API call.

    Args:
        texts:  List of document strings.
        titles: Optional parallel list of document titles.

    Returns:
        List of vectors in the same order as input.
        Individual elements may be None if that item failed.
    """
    ...

LLMPort

Defines the contract for generating a JSON string from a prompt pair.

llm_port

ports/llm_port.py: Abstract interface for LLM (large language model) providers.

Current implementation: GeminiLLMAdapter (Vertex AI Gemini).

To swap to OpenAI GPT: write OpenAILLMAdapter implementing this Protocol, then change ONE line in services/container.py.

LLMPort

Bases: Protocol

Contract for a JSON-generating LLM provider.

Source code in prod/ports/llm_port.py
@runtime_checkable
class LLMPort(Protocol):
    """Contract for a JSON-generating LLM provider."""

    @property
    def model_name(self) -> str:
        """Identifier of the underlying LLM."""
        ...

    def generate_json(
        self,
        system_prompt: str,
        user_message: str,
    ) -> str | None:
        """Send a prompt to the LLM and return its JSON response as a string.

        The caller is responsible for parsing the returned string.  The
        adapter should request JSON-mode output from the underlying model
        so the response is guaranteed to be valid JSON.

        Args:
            system_prompt: System-level instruction.
            user_message:  User-turn content.

        Returns:
            Raw JSON string, or None if the call failed.

        Raises:
            LLMError: On unrecoverable API failure.
        """
        ...

model_name property

model_name: str

Identifier of the underlying LLM.

generate_json

generate_json(system_prompt: str, user_message: str) -> str | None

Send a prompt to the LLM and return its JSON response as a string.

The caller is responsible for parsing the returned string. The adapter should request JSON-mode output from the underlying model so the response is guaranteed to be valid JSON.

Parameters:

  system_prompt (str, required): System-level instruction.
  user_message (str, required): User-turn content.

Returns:

  str | None: Raw JSON string, or None if the call failed.

Raises:

  LLMError: On unrecoverable API failure.

Source code in prod/ports/llm_port.py
def generate_json(
    self,
    system_prompt: str,
    user_message: str,
) -> str | None:
    """Send a prompt to the LLM and return its JSON response as a string.

    The caller is responsible for parsing the returned string.  The
    adapter should request JSON-mode output from the underlying model
    so the response is guaranteed to be valid JSON.

    Args:
        system_prompt: System-level instruction.
        user_message:  User-turn content.

    Returns:
        Raw JSON string, or None if the call failed.

    Raises:
        LLMError: On unrecoverable API failure.
    """
    ...

DatabasePort

Defines the contract for the hybrid search database backend.

Note that RRF fusion is deliberately not part of this port — it lives in services/retriever.py as pure Python so it can be unit-tested without a database connection.

database_port

ports/database_port.py: Abstract interface for the vector + FTS database.

The port deliberately separates the three database concerns:

  1. vector_search — ANN similarity search
  2. fts_search — keyword / full-text search
  3. fetch_by_codes — bulk record retrieval by primary key

This separation means:

  • RRF fusion is done in pure Python (services/retriever.py), making it trivially unit-testable with no database dependency.
  • Each search method can be mocked or replaced independently.

Current implementation: PostgresDatabaseAdapter (psycopg2 + pgvector).

To swap: write a new adapter (e.g. WeaviateDatabaseAdapter) implementing this Protocol and change ONE line in services/container.py.
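
The fusion step that consumes these (code, rank) lists can be sketched as plain Python. This is illustrative, not the actual services/retriever.py; k = 60 is the conventional RRF constant and is an assumption here:

```python
def rrf_fuse(
    vector_ranks: list[tuple[str, int]],
    fts_ranks: list[tuple[str, int]],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over two (code, rank) lists, rank 1 = best."""
    scores: dict[str, float] = {}
    for ranked in (vector_ranks, fts_ranks):
        for code, rank in ranked:
            # Each list contributes 1 / (k + rank) to the code's fused score.
            scores[code] = scores.get(code, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(
    vector_ranks=[("0111", 1), ("0112", 2)],
    fts_ranks=[("0112", 1), ("0139", 2)],
)
# "0112" appears in both lists, so it ranks first.
```

Because the function takes plain lists and returns plain tuples, it can be unit-tested with literal data, exactly as the port design intends.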

DatabasePort

Bases: Protocol

Contract for the hybrid search database backend.

Source code in prod/ports/database_port.py
@runtime_checkable
class DatabasePort(Protocol):
    """Contract for the hybrid search database backend."""

    def vector_search(
        self,
        embedding: list[float],
        limit: int,
    ) -> list[tuple[str, int]]:
        """Run approximate nearest-neighbour search over stored embeddings.

        Args:
            embedding: Query vector (must match the stored dimension).
            limit:     Maximum number of results.

        Returns:
            List of (anzsic_code, rank) tuples ordered by similarity
            (rank 1 = most similar).

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...

    def fts_search(
        self,
        query_text: str,
        limit: int,
    ) -> list[tuple[str, int]]:
        """Run full-text search using the stored tsvector index.

        Args:
            query_text: Natural-language search string.
            limit:      Maximum number of results.

        Returns:
            List of (anzsic_code, rank) tuples ordered by FTS score
            (rank 1 = highest relevance).

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
        """Fetch full ANZSIC records for a list of codes.

        Args:
            codes: List of anzsic_code primary keys to retrieve.

        Returns:
            Dict mapping anzsic_code → record dict.
            Codes not found in the database are silently omitted.

        Raises:
            DatabaseError: On connection or query failure.
        """
        ...
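
A dict-backed test double satisfying this contract might look like the sketch below. The protocol is re-declared inline (condensed from above); InMemoryDatabaseAdapter and its sample record are invented for illustration:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DatabasePort(Protocol):
    """Condensed copy of the contract shown above."""

    def vector_search(self, embedding: list[float], limit: int) -> list[tuple[str, int]]: ...

    def fts_search(self, query_text: str, limit: int) -> list[tuple[str, int]]: ...

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]: ...


class InMemoryDatabaseAdapter:
    """Dict-backed test double — satisfies the port with no Postgres."""

    def __init__(self, records: dict[str, dict]) -> None:
        self._records = records

    def vector_search(self, embedding: list[float], limit: int) -> list[tuple[str, int]]:
        # Fixed insertion order stands in for ANN similarity in tests.
        return [(code, i + 1) for i, code in enumerate(self._records)][:limit]

    def fts_search(self, query_text: str, limit: int) -> list[tuple[str, int]]:
        hits = [c for c, r in self._records.items() if query_text in r.get("title", "")]
        return [(code, i + 1) for i, code in enumerate(hits)][:limit]

    def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
        # Unknown codes are silently omitted, matching the contract.
        return {c: self._records[c] for c in codes if c in self._records}


db = InMemoryDatabaseAdapter({"0111": {"title": "Nursery Production"}})
assert isinstance(db, DatabasePort)
```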

vector_search

vector_search(embedding: list[float], limit: int) -> list[tuple[str, int]]

Run approximate nearest-neighbour search over stored embeddings.

Parameters:

  embedding (list[float], required): Query vector (must match the stored dimension).
  limit (int, required): Maximum number of results.

Returns:

  list[tuple[str, int]]: List of (anzsic_code, rank) tuples ordered by similarity (rank 1 = most similar).

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def vector_search(
    self,
    embedding: list[float],
    limit: int,
) -> list[tuple[str, int]]:
    """Run approximate nearest-neighbour search over stored embeddings.

    Args:
        embedding: Query vector (must match the stored dimension).
        limit:     Maximum number of results.

    Returns:
        List of (anzsic_code, rank) tuples ordered by similarity
        (rank 1 = most similar).

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...

fts_search

fts_search(query_text: str, limit: int) -> list[tuple[str, int]]

Run full-text search using the stored tsvector index.

Parameters:

  query_text (str, required): Natural-language search string.
  limit (int, required): Maximum number of results.

Returns:

  list[tuple[str, int]]: List of (anzsic_code, rank) tuples ordered by FTS score (rank 1 = highest relevance).

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def fts_search(
    self,
    query_text: str,
    limit: int,
) -> list[tuple[str, int]]:
    """Run full-text search using the stored tsvector index.

    Args:
        query_text: Natural-language search string.
        limit:      Maximum number of results.

    Returns:
        List of (anzsic_code, rank) tuples ordered by FTS score
        (rank 1 = highest relevance).

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...

fetch_by_codes

fetch_by_codes(codes: list[str]) -> dict[str, dict]

Fetch full ANZSIC records for a list of codes.

Parameters:

  codes (list[str], required): List of anzsic_code primary keys to retrieve.

Returns:

  dict[str, dict]: Dict mapping anzsic_code → record dict. Codes not found in the database are silently omitted.

Raises:

  DatabaseError: On connection or query failure.

Source code in prod/ports/database_port.py
def fetch_by_codes(self, codes: list[str]) -> dict[str, dict]:
    """Fetch full ANZSIC records for a list of codes.

    Args:
        codes: List of anzsic_code primary keys to retrieve.

    Returns:
        Dict mapping anzsic_code → record dict.
        Codes not found in the database are silently omitted.

    Raises:
        DatabaseError: On connection or query failure.
    """
    ...