Developer Journey — From Python Script to Production AI System¶
Audience: A Python developer who can write and read Python code comfortably, but has not yet built a production-grade system.
Goal: Walk through every architectural decision, coding pattern, and terminal command used to build this system — so you can reason about why things are structured the way they are, not just what is there.
1. Where we started — the "script problem"¶
The first version of this classifier lived in two files:
anzsic_agent.py ← 407 lines: database calls, API calls, retry logic, prompts — all mixed together
app.py ← Streamlit UI calling functions from anzsic_agent.py
This works fine as a prototype. It becomes a problem when:
- You want to run tests — to test the RRF logic you also need a live database running
- You want to swap the LLM — changing from Gemini to GPT-4o means hunting across 400 lines
- A second person joins — there's no structure telling them where to put new code
- You need a REST API — you have to rewrite everything around a Flask/FastAPI app
The prod/ folder solves all four problems by imposing a structure before they
become painful. Every decision from here on is motivated by one of these four goals.
2. Setting up the Python environment¶
Before writing any production code, you need an isolated Python environment. This is non-negotiable — never install project packages into your system Python.
# Create a virtual environment in a folder called .venv
python3 -m venv .venv
# Activate it (your terminal prompt will change to show (.venv))
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows
# Confirm you are in the venv — should print a path inside .venv/
which python
# Deactivate when you leave the project
deactivate
Why a venv? Every project has different dependency versions.
psycopg2>=2.9 here might conflict with psycopg2==2.7 in another project on
the same machine. A venv gives each project its own isolated Python with its
own packages.
Installing dependencies¶
# Install from a requirements list
pip install psycopg2-binary pgvector requests python-dotenv pydantic streamlit
# Or, if the project has pyproject.toml (our case):
pip install -e prod/ # -e = editable install (code changes take effect immediately)
pip install -e "prod/[dev]" # include dev tools: pytest, ruff, mypy
The pyproject.toml file (prod/pyproject.toml)
is the modern replacement for requirements.txt. It also defines the package
name, Python version requirement, and entry-point commands (anzsic-classify).
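If you have not seen one before, the shape of such a file looks roughly like this (a sketch; the package name, Python version, and module path are illustrative, not a copy of the real prod/pyproject.toml):

```toml
[project]
name = "anzsic-classifier"          # illustrative package name
requires-python = ">=3.10"          # assumption
dependencies = [
    "psycopg2-binary", "pgvector", "requests", "python-dotenv", "pydantic",
]

[project.optional-dependencies]
dev = ["pytest", "ruff", "mypy"]    # what pip install -e "prod/[dev]" pulls in

[project.scripts]
anzsic-classify = "prod.interfaces.cli:main"  # entry-point command; module path assumed
```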
3. The folder structure — one concept per layer¶
prod/
├── config/ ← What the app is configured with (settings, prompts)
├── domain/ ← What the app works on (data shapes, error types)
├── ports/ ← What the app needs (abstract interface contracts)
├── adapters/ ← How those needs are met (concrete API/DB code)
├── services/ ← What the app does (business logic: RRF, reranking)
├── interfaces/ ← How humans talk to the app (CLI, Streamlit)
└── tests/ ← Proof that each layer does what it claims
This layout is called Hexagonal Architecture (also "Ports & Adapters"). The rule is: dependencies only point inward.
- services know about ports and domain — nothing else
- adapters know about ports and domain — nothing else
- interfaces know about services and domain — nothing else
- No layer imports from the layer outside it
This rule is what lets you swap Gemini for GPT-4o by changing one line.
Why __init__.py in every folder?¶
# Every subfolder needs this (can be empty) to be a Python package
touch prod/__init__.py
touch prod/config/__init__.py
touch prod/domain/__init__.py
# ... and so on
Without __init__.py, Python won't let you write from prod.domain.models import Candidate.
The file signals "this folder is a Python package, not just a folder".
4. OOP concept: Dataclasses — typed data containers¶
Before Pydantic, Python's standard library offers dataclasses for structured
data. We use @dataclass(frozen=True) in settings:
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen=True → fields are read-only after creation
class Settings:
    db_dsn: str = field(default_factory=lambda: os.getenv("DB_DSN", "dbname=anzsic_db"))
    rrf_k: int = field(default_factory=lambda: int(os.getenv("RRF_K", "60")))
frozen=True means settings.rrf_k = 99 will raise a FrozenInstanceError
at runtime. This prevents accidental mutation of configuration — settings should
be read-only after the app starts.
field(default_factory=...) is used instead of a plain default value when
the default is computed (e.g. reading an env var). Plain defaults like rrf_k: int = 60
are evaluated once at class definition time, not at instance creation.
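A sketch of what frozen=True buys you at runtime:

```python
import dataclasses

settings = Settings()
try:
    settings.rrf_k = 99  # attempted mutation of a frozen dataclass
except dataclasses.FrozenInstanceError:
    print("Settings are read-only")  # this branch runs
```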
OOP concept: @lru_cache — singleton via function¶
from functools import lru_cache

@lru_cache(maxsize=1)
def get_settings() -> Settings:
    return Settings()
@lru_cache(maxsize=1) memoises the function — it runs once, caches the result,
and returns the same object on every subsequent call. This gives us a singleton
Settings object without writing a Singleton class. Import get_settings anywhere
in the codebase and you always get the same instance.
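The caching is easy to verify:

```python
a = get_settings()
b = get_settings()
assert a is b  # the same object, not merely an equal one: the cache returned its memo
```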
5. OOP concept: Pydantic models — validated data¶
The domain models (prod/domain/models.py) use Pydantic:
from pydantic import BaseModel, Field, field_validator

class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=2000)
    top_k: int = Field(5, ge=1, le=20)  # ge = greater-or-equal, le = less-or-equal

    @field_validator("query")
    @classmethod
    def strip_query(cls, v: str) -> str:
        return v.strip()  # runs automatically on every instantiation
What Pydantic adds over a plain dataclass:
| Feature | Plain dataclass | Pydantic |
|---|---|---|
| Type hints enforced at runtime | ❌ | ✅ |
| Built-in validation (min/max) | ❌ | ✅ |
| Auto-strip whitespace | Manual | @field_validator |
| Serialise to JSON / dict | Manual | .model_dump() |
| Generate JSON Schema | ❌ | ✅ (free FastAPI docs later) |
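A quick way to see that runtime validation in action (a sketch using the model above):

```python
from pydantic import ValidationError

ok = SearchRequest(query="  barista  ")          # validator strips whitespace
assert ok.query == "barista" and ok.top_k == 5   # default top_k applied

try:
    SearchRequest(query="", top_k=50)            # violates min_length=1 and le=20
except ValidationError as exc:
    assert len(exc.errors()) == 2                # both failures reported at once
```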
OOP concept: Enums — named constants with type safety¶
SearchMode.FAST is better than passing the string "fast" around the codebase: your IDE can autocomplete it, and a misspelling like SearchMode.FSAT raises an AttributeError immediately, whereas the typo "fsat" would silently do the wrong thing at runtime.
str, Enum (inheriting from both str and Enum) means instances compare
equal to their string value: SearchMode.FAST == "fast" is True. This
makes them work transparently with JSON serialisation.
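The enum definition itself is not reproduced in this section; the pattern looks like this (the second member is illustrative):

```python
from enum import Enum

class SearchMode(str, Enum):
    FAST = "fast"
    ACCURATE = "accurate"  # illustrative member

assert SearchMode.FAST == "fast"               # str subclass compares equal to its value
assert SearchMode("fast") is SearchMode.FAST   # round-trips from a plain string
```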
6. OOP concept: Protocols — interface contracts without inheritance¶
This is the most important concept in the entire codebase.
from typing import Protocol, runtime_checkable

@runtime_checkable
class EmbeddingPort(Protocol):
    """Any class that has these methods is a valid EmbeddingPort."""

    @property
    def model_name(self) -> str: ...

    def embed_query(self, text: str) -> list[float]: ...
    def embed_document(self, text: str, title: str = "") -> list[float]: ...
The key insight: HybridRetriever accepts an EmbeddingPort in its
constructor. It never imports VertexEmbeddingAdapter or OpenAIEmbeddingAdapter.
It only knows the contract (the Port).
This is called structural subtyping (also "duck typing with types").
Any class that implements model_name, embed_query, and embed_document
satisfies EmbeddingPort — even if it never inherits from it.
Compare to traditional OOP inheritance:
# Traditional (Java-style) — tightly coupled
class VertexEmbeddingAdapter(BaseEmbedder):  # inherits from a base class
    ...

# Protocol-style — loosely coupled
class VertexEmbeddingAdapter:  # no inheritance at all
    def embed_query(self, text: str) -> list[float]:
        ...  # just implement the methods
The Protocol approach means swapping adapters has zero inheritance hierarchy to maintain. Your mock test adapter is also just a plain class with the right methods.
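Because EmbeddingPort is decorated with @runtime_checkable, conformance can even be checked with isinstance. A sketch with a throwaway class:

```python
class FakeEmbedder:
    """No inheritance, no registration, just the right attributes and methods."""

    model_name = "fake-embedder"

    def embed_query(self, text: str) -> list[float]:
        return [0.0] * 8

    def embed_document(self, text: str, title: str = "") -> list[float]:
        return [0.0] * 8

# Passes because the members exist, not because of any inheritance relationship.
assert isinstance(FakeEmbedder(), EmbeddingPort)
```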
7. Setting up PostgreSQL and pgvector¶
# Install PostgreSQL (macOS with Homebrew)
brew install postgresql@15
brew services start postgresql@15
# Create the database
createdb anzsic_db
# Connect to it
psql anzsic_db
# Inside psql — install the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
\q # quit psql
Why pgvector inside PostgreSQL?¶
The vector(768) column type stores a 768-dimensional float array. The HNSW
index makes approximate nearest-neighbour search fast:
-- Create the table with both a vector column and a full-text column
CREATE TABLE anzsic_codes (
anzsic_code TEXT PRIMARY KEY,
anzsic_desc TEXT,
embedding vector(768), -- pgvector
fts_vector tsvector -- PostgreSQL built-in full-text
);
-- HNSW index for vector search (cosine distance)
CREATE INDEX ON anzsic_codes
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- GIN index for full-text search
CREATE INDEX ON anzsic_codes USING gin (fts_vector);
HNSW (Hierarchical Navigable Small World) is a graph-based ANN algorithm.
m=16 is the number of bidirectional links per node; ef_construction=64 is
the search width during index build. Larger values = better recall but slower
build time.
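To see what the retriever's vector arm does, here is an illustrative nearest-neighbour query using pgvector's cosine-distance operator <=> (the literal is shortened for readability; real vectors in this schema have 768 dimensions and are passed as bound parameters from Python):

```sql
SELECT anzsic_code, anzsic_desc
FROM anzsic_codes
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector  -- shortened literal, for illustration
LIMIT 5;
```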
Useful psql commands for debugging¶
psql anzsic_db # connect
\dt # list all tables
\d anzsic_codes # describe table structure + indexes
SELECT COUNT(*) FROM anzsic_codes; # count rows
\timing on # show query execution time
EXPLAIN ANALYZE SELECT ...; # show query plan + actual timing
\q # quit
8. OOP concept: Dependency Injection — wiring without coupling¶
Look at HybridRetriever's constructor:
class HybridRetriever:
    def __init__(
        self,
        db: DatabasePort,
        embedder: EmbeddingPort,
        settings: Settings,
    ) -> None:
        self._db = db
        self._embedder = embedder
        self._settings = settings
HybridRetriever doesn't know or care which database or embedder it gets.
It receives them as constructor arguments — this is Dependency Injection (DI).
The caller who creates HybridRetriever decides which concrete implementations
to inject. That caller is services/container.py — the only file allowed to
name concrete adapter classes:
# container.py — the single wiring point
def _build_embedder(settings) -> EmbeddingPort:
    if settings.embed_provider == "openai":
        from prod.adapters.openai_embedding import OpenAIEmbeddingAdapter
        return OpenAIEmbeddingAdapter(settings)
    else:
        from prod.adapters.vertex_embedding import VertexEmbeddingAdapter
        auth = GCPAuthManager(settings)
        return VertexEmbeddingAdapter(auth, settings)

@lru_cache(maxsize=1)
def get_pipeline() -> ClassifierPipeline:
    settings = get_settings()
    embedder = _build_embedder(settings)  # EmbeddingPort
    llm = _build_llm(settings)            # LLMPort
    db = PostgresDatabaseAdapter(settings)
    retriever = HybridRetriever(db=db, embedder=embedder, settings=settings)
    reranker = LLMReranker(llm=llm, settings=settings)
    return ClassifierPipeline(retriever=retriever, reranker=reranker, settings=settings)
The layering rule enforced by container.py:
services/container.py ← ONLY file that imports adapter classes
↓
services/* ← import only Ports (EmbeddingPort, LLMPort, DatabasePort)
↓
adapters/* ← import only Ports + domain models
If you ever find yourself importing VertexEmbeddingAdapter inside
services/retriever.py — that's a rule violation.
9. Writing an adapter — the pattern¶
Every adapter follows the same pattern. Here is OpenAILLMAdapter stripped to its skeleton:
class OpenAILLMAdapter:
    """Implements LLMPort. Injected via container.py."""

    def __init__(self, settings: Settings) -> None:
        if not settings.openai_api_key:
            raise AuthenticationError("OPENAI_API_KEY is not set.")
        self._settings = settings
        self._url = "https://api.openai.com/v1/chat/completions"  # chat-completions endpoint
        self._headers = {"Authorization": f"Bearer {settings.openai_api_key}"}

    # ── LLMPort implementation (these method signatures must match the Protocol)
    @property
    def model_name(self) -> str:
        return self._settings.openai_llm_model

    def generate_json(self, system_prompt: str, user_message: str) -> str | None:
        payload = self._build_payload(system_prompt, user_message)
        return self._post_with_retry(payload)

    # ── Private helpers (named with _ prefix = "internal, don't call these directly")
    def _build_payload(self, system_prompt: str, user_message: str) -> dict:
        return {
            "model": self._settings.openai_llm_model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message},
            ],
            "response_format": {"type": "json_object"},
        }

    def _post_with_retry(self, payload: dict, retries: int = 3) -> str | None:
        delay = 2.0
        for attempt in range(1, retries + 1):
            resp = requests.post(self._url, headers=self._headers, json=payload)
            if resp.status_code == 429:  # rate limited — wait and retry
                time.sleep(delay)
                delay *= 2  # exponential back-off
                continue
            if resp.ok:
                return self._extract_text(resp.json())
            return None  # non-retryable error
        return None  # all retries hit 429
Naming conventions used throughout:
| Pattern | Meaning |
|---|---|
| `_method()` | Private — internal to this class, don't call from outside |
| `__method()` | Name-mangled — strongly private (used rarely) |
| `UPPER_CASE` | Module-level constants |
| `_UPPER_CASE` | Module-level private constants |
Custom exceptions — the hierarchy¶
# domain/exceptions.py
class ANZSICError(Exception):  # root — catch everything with: except ANZSICError
    ...

class AuthenticationError(ANZSICError): ...  # bad API key
class EmbeddingError(ANZSICError): ...       # API call failed
class DatabaseError(ANZSICError): ...        # DB query failed
Raising a specific exception (raise EmbeddingError(...)) instead of a
generic Exception lets callers decide how broadly or narrowly to catch:
try:
    result = embedder.embed_query(text)
except EmbeddingError:
    ...  # handle specifically
except ANZSICError:
    ...  # catch anything from our app
except Exception:
    ...  # catch truly unexpected failures
10. The from __future__ import annotations line¶
You will see this at the top of nearly every file:
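```python
from __future__ import annotations
```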
This enables postponed evaluation of type annotations (PEP 563), available since Python 3.7, so newer annotation syntax works even on older interpreters. It lets you write:
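```python
def search(self, request: SearchRequest) -> list[Candidate]:  # illustrative signature
    ...
```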
...even in a file where SearchRequest or Candidate might be defined later
in the same file, or where the type is a forward reference. Without this import,
you'd need to wrap the type in quotes: -> "list[Candidate]". With it, all
annotations are strings at import time and only evaluated when needed.
11. The two-stage pipeline — RRF fusion¶
The core algorithm is in services/retriever.py:
def compute_rrf(
    vector_results: list[tuple[str, int]],  # [(code, rank), ...]
    fts_results: list[tuple[str, int]],
    k: int = 60,
) -> list[_RRFResult]:
    scores: dict[str, float] = {}
    for code, rank in vector_results:
        scores[code] = scores.get(code, 0.0) + 1.0 / (k + rank)
    for code, rank in fts_results:
        scores[code] = scores.get(code, 0.0) + 1.0 / (k + rank)
    # _RRFResult is a small (anzsic_code, rrf_score) record; sort best-first
    return [
        _RRFResult(code, score)
        for code, score in sorted(scores.items(), key=lambda x: x[1], reverse=True)
    ]
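For intuition: with k=60, a code ranked 1 in both lists scores 1/61 + 1/61 ≈ 0.0328, while a code ranked 1 in only one list scores 1/61 ≈ 0.0164. Appearing in both result sets is what pushes a candidate to the top.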
compute_rrf is a pure function: same inputs always produce same outputs,
no database calls, no API calls, no side effects. This makes it trivially
testable:
def test_code_in_both_systems_has_highest_score():
    vec_results = [("CODE_A", 1), ("CODE_B", 2)]
    fts_results = [("CODE_A", 1), ("CODE_C", 2)]
    results = compute_rrf(vec_results, fts_results, k=60)
    assert results[0].anzsic_code == "CODE_A"  # appears in both → wins
Design principle: extract pure logic into standalone functions. They are the easiest to test and the safest to refactor.
12. Environment variables and .env files¶
Never hardcode secrets or configuration values in source code. Use environment variables instead:
# Check what an env var is set to
echo $OPENAI_API_KEY
# Set one in the current shell session (lost when you close the terminal)
export OPENAI_API_KEY="sk-..."
# Or put it in .env (loaded by python-dotenv at startup)
echo "OPENAI_API_KEY=sk-..." >> .env
The .env file must never be committed to git:
# .gitignore — create this at the repo root if it doesn't exist
echo ".env" >> .gitignore
echo ".venv/" >> .gitignore
echo "site/" >> .gitignore
echo "__pycache__/" >> .gitignore
echo "*.pyc" >> .gitignore
prod/.env.example is the
safe version — it shows the shape of the configuration with placeholder values.
New developers copy it and fill in real values:
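```bash
# prod/.env.example: placeholder values only (illustrative shape; see the real file)
DB_DSN=dbname=anzsic_db
OPENAI_API_KEY=sk-your-key-here
```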
How python-dotenv loads it:
from pathlib import Path

from dotenv import load_dotenv

load_dotenv(Path(__file__).parent.parent.parent / ".env")
# This runs at import time in settings.py — before any Settings() is created
13. Testing — the three layers¶
prod/tests/
├── unit/ ← Tests one class/function in complete isolation (no I/O)
├── integration/ ← Tests one adapter against a real external system
└── e2e/ ← Tests the full pipeline end-to-end
Running tests¶
# Run everything
pytest prod/tests/unit prod/tests/integration -v
# Run only unit tests (fast — no network, no DB)
pytest prod/tests/unit -v
# Run a single test file
pytest prod/tests/unit/test_rrf_fusion.py -v
# Run a single test by name
pytest prod/tests/unit/test_rrf_fusion.py::TestComputeRRF::test_rrf_scores_are_positive -v
# Run with coverage report
pytest prod/tests/unit --cov=prod --cov-report=term-missing
How mock adapters work¶
The key to fast unit tests is mock adapters in tests/conftest.py.
A mock adapter is a plain Python class that satisfies a Port Protocol
without doing any real I/O:
class MockEmbeddingAdapter:
    """Satisfies EmbeddingPort — returns a fixed fake vector instantly."""

    model_name = "mock-embedding"
    dimensions = 8

    def embed_query(self, text: str) -> list[float]:
        return [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]  # always the same

    def embed_document(self, text: str, title: str = "") -> list[float]:
        return [0.2, 0.1, 0.4, 0.3, 0.2, 0.1, 0.4, 0.3]
When HybridRetriever gets a MockEmbeddingAdapter injected instead of a
VertexEmbeddingAdapter, it cannot tell the difference — both satisfy the
EmbeddingPort Protocol. This is dependency injection + Protocols making
tests effortless.
unittest.mock.patch — intercepting HTTP calls¶
For the OpenAI adapter tests, we don't want real API calls. patch replaces
requests.post with a MagicMock for the duration of one test:
from unittest.mock import patch, MagicMock

def test_embed_query_calls_correct_endpoint(openai_settings):
    vec = [0.1] * 8
    mock_resp = MagicMock()
    mock_resp.ok = True
    mock_resp.json.return_value = {"data": [{"index": 0, "embedding": vec}]}

    with patch("prod.adapters.openai_embedding.requests.post",
               return_value=mock_resp) as mock_post:
        adapter = OpenAIEmbeddingAdapter(openai_settings)
        result = adapter.embed_query("café owner")

    assert result == vec
    mock_post.assert_called_once()  # exactly one HTTP request was attempted
with patch(...) as mock_post: — everything inside the with block runs
with the real requests.post replaced by the mock. After the with block,
the real requests.post is restored automatically.
pytest fixtures — reusable test setup¶
# conftest.py — shared across all test files in the same folder
import pytest

from prod.config.settings import Settings

@pytest.fixture
def openai_settings():
    """Return a Settings instance configured for OpenAI."""
    return Settings(openai_api_key="sk-test", embed_provider="openai", ...)

# In a test file — pytest injects it automatically by name
def test_something(openai_settings):  # ← pytest sees the name, finds the fixture
    adapter = OpenAIEmbeddingAdapter(openai_settings)
    ...
Fixtures in conftest.py are automatically available to all test files in
the same directory and all subdirectories — no import needed.
14. Logging — the right way to print in production¶
Never use print() in production code. Use Python's logging module:
import logging
logger = logging.getLogger(__name__) # __name__ = "prod.adapters.openai_llm"
logger.debug("Sending request | url=%s model=%s", url, model) # dev detail
logger.info("OpenAI adapter ready | model=%s", model_name) # normal ops
logger.warning("Rate limited (429) — back-off %.1fs", delay) # worth noting
logger.error("API failed after %d retries", retries) # needs attention
logging.getLogger(__name__) uses the module's dotted path as the logger name.
This lets operators filter logs by module:
# Show only warnings and above from the whole app
python -m prod.interfaces.cli "barista" --log-level WARNING
# Show debug logs only from the embedding adapter
# (configured in logging config — not covered here)
Why not print()?
- print() always goes to stdout, always
- logging can be directed to files, cloud log sinks, suppressed in tests
- Log levels let you turn up detail in production without code changes
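The --log-level flag shown above only works because the entry point translates it into a root-logger threshold. A minimal sketch of that wiring (the real CLI code may differ):

```python
import logging

def configure_logging(level_name: str = "INFO") -> None:
    # Map a flag value like "WARNING" onto the root logger's threshold.
    logging.basicConfig(
        level=getattr(logging, level_name.upper(), logging.INFO),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
```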
15. Building the documentation¶
# Install MkDocs and plugins
pip install mkdocs-material "mkdocstrings[python]" mkdocs-mermaid2-plugin ruff
# Serve locally with live-reload (best for writing docs)
.venv/bin/mkdocs serve
# Open http://127.0.0.1:8000 in your browser
# Build static HTML into site/
.venv/bin/mkdocs build
# Build with strict mode — treats any warning as an error
.venv/bin/mkdocs build --strict
How mkdocstrings works:
Any docstring you write in the code is automatically rendered into the docs
by placing a ::: directive in a markdown file:
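For example (the module path here is illustrative):

```markdown
::: prod.services.retriever.HybridRetriever
```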
This renders the class docstring, all method docstrings, and their type signatures — no duplication between code and docs.
What makes a good docstring:
def embed_query(self, text: str) -> list[float]:
    """Embed a search query.

    Uses RETRIEVAL_QUERY task type so the vector is optimised for
    matching against document embeddings.

    Args:
        text: Natural-language query string.

    Returns:
        Dense vector of floats with length ``settings.embed_dim``.

    Raises:
        EmbeddingError: On API failure or empty response.
    """
Google-style docstring format (Args / Returns / Raises sections) is what
mkdocstrings expects and renders beautifully.
16. Pivotal commands — quick reference¶
Environment¶
python3 -m venv .venv # create venv
source .venv/bin/activate # activate
pip install -e "prod/[dev]" # install project + dev tools
Database¶
createdb anzsic_db # create the database
psql anzsic_db # connect
CREATE EXTENSION IF NOT EXISTS vector; # enable pgvector (inside psql)
\d anzsic_codes # inspect table schema
SELECT COUNT(*) FROM anzsic_codes; # count rows
Tests¶
pytest prod/tests/unit -v # unit tests (fast, no I/O)
pytest prod/tests/unit --cov=prod # with coverage
pytest -k "test_rrf" # run tests matching a name pattern
pytest --tb=short # shorter tracebacks on failure
Docs¶
mkdocs serve # live preview at localhost:8000
mkdocs build --strict # production build (zero warnings)
mkdocs gh-deploy --force # deploy to GitHub Pages
Linting & formatting¶
ruff check prod/ # lint check (find code style issues)
ruff format prod/ # auto-format all Python files
ruff check --fix prod/ # auto-fix fixable issues
Git essentials for a new project¶
git init # initialise a new repo
git add . # stage all changes
git status # see what's staged / changed
git commit -m "feat: add OpenAI adapters" # commit
git push origin main # push to GitHub
17. The prod/ layering — a visual map¶
┌──────────────────────────────────┐
│ INTERFACES │
│ cli.py streamlit_app.py │
└─────────────┬────────────────────┘
│ calls
┌─────────────▼────────────────────┐
│ SERVICES │
│ ClassifierPipeline │
│ ├── HybridRetriever │
│ │ └── compute_rrf() │
│ └── LLMReranker │
└──────┬──────────────┬────────────┘
│ uses │ uses
┌───────────▼──┐ ┌──────▼───────────┐
│ PORTS │ │ PORTS │
│ EmbeddingPort│ │ LLMPort │
│ DatabasePort │ │ │
└───────┬──────┘ └──────┬────────────┘
│ implemented by │ implemented by
┌────────────▼──────────────────▼────────────┐
│ ADAPTERS │
│ VertexEmbeddingAdapter OpenAIEmbedding │
│ GeminiLLMAdapter OpenAILLMAdapter │
│ PostgresDatabaseAdapter │
│ GCPAuthManager │
└────────────────────────────────────────────┘
↑ wired together by
┌───────────────────┐
│ container.py │ ← only file that names concrete adapters
└───────────────────┘
Everything rests on:
┌─────────────────────────────────────────────────────┐
│ DOMAIN │
│ SearchRequest Candidate ClassifyResult │
│ SearchMode ANZSICError hierarchy │
│ (pure Python — no external imports) │
└─────────────────────────────────────────────────────┘
18. What to learn next¶
| Topic | Where to look |
|---|---|
| FastAPI — add a REST API layer | Add prod/interfaces/api.py; domain models already produce JSON Schema |
| async Python | Replace requests with httpx + async def; adapters become async |
| Docker — containerise the app | Write a Dockerfile, docker-compose.yml for the DB |
| CI/CD — auto-run tests on push | .github/workflows/test.yml with pytest + mkdocs gh-deploy |
| Connection pooling — scale the DB | Swap psycopg2 for psycopg2.pool.ThreadedConnectionPool in postgres_db.py |
| Weaviate / Qdrant — dedicated vector DB | Write a WeaviateDatabaseAdapter implementing DatabasePort |
Every one of these is a one-file change or addition — the hexagonal structure was deliberately designed so that the scope of any future change is as small as possible.