Streamlit UI¶

The ANZSIC Classifier has a full browser-based interface built with Streamlit. It supports both single-query and batch classification, with results in card, table, or JSON format and CSV download.

Starting the UI¶

cd /Users/s748779/CEP_AI/anzsic_mapping
streamlit run prod/interfaces/streamlit_app.py

The browser opens automatically at http://localhost:8501.

First run

The first request takes a few extra seconds while the pipeline loads and establishes the database connection. Subsequent requests are fast because the pipeline is cached with @st.cache_resource, which keeps a single shared instance alive for as long as the Streamlit server process is running.
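
The effect of this caching can be illustrated outside Streamlit. The sketch below uses functools.lru_cache from the standard library as a stand-in for @st.cache_resource. The name load_pipeline and the object it returns are placeholders, not the project's actual loader.

```python
import functools

load_count = 0  # counts how many times the expensive setup actually runs


@functools.lru_cache(maxsize=1)
def load_pipeline():
    """Placeholder for an expensive setup step (model load, DB connection)."""
    global load_count
    load_count += 1
    return {"ready": True}


first = load_pipeline()   # slow path: the setup runs
second = load_pipeline()  # fast path: the cached object is returned
print(first is second, load_count)
```

Both calls return the same object and the setup body runs once, which is why only the first request pays the start-up cost.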

The left sidebar contains all configuration options.

Search Type¶

Option	Description
Single Query	Type one description and get results immediately
Batch (file upload)	Upload a `.txt` file with multiple queries

Search Mode¶

Mode	Pipeline stages	Typical speed	Best for
High Fidelity (+ Gemini)	Retrieval + LLM re-ranking	2–5 seconds	Final classification, needs explanations
Fast (retrieval only)	Retrieval only (RRF)	< 500 ms	Exploration, large batches

ANZSIC Type¶

Currently locked to 6-digit (unit groups), the most granular level of the ANZSIC hierarchy. Future releases may support 4-digit (class) or 2-digit (subdivision) classification.

Max Results¶

Number of ANZSIC codes returned per query.

In High Fidelity mode: Gemini selects the best matches up to this limit. Setting this lower does not speed up the call (all candidates are still sent to Gemini).
In Fast mode: the top-N by RRF score are returned directly.
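
As a rough illustration of how a top-N cut by RRF score might work, here is a minimal sketch of Reciprocal Rank Fusion. This page does not say which retrievers feed the fusion or which constant the pipeline uses, so the two input lists, the placeholder codes, and k=60 (a commonly used value) are all assumptions.

```python
from collections import defaultdict


def rrf_top_n(rankings, n, k=60):
    """Fuse several ranked lists of codes with Reciprocal Rank Fusion.

    Each code scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns the n best (code, score) pairs.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, code in enumerate(ranking, start=1):
            scores[code] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:n]


# Two hypothetical retrievers returning placeholder codes, best first.
vector_hits = ["A", "B", "C"]
keyword_hits = ["B", "D", "A"]
top = rrf_top_n([vector_hits, keyword_hits], n=2)
print([code for code, _ in top])
```

Codes that rank well in both lists rise to the top, so here "B" and "A" are kept and the single-list codes are dropped.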

Retrieval Pool¶

Controls how many candidates Stage 1 returns before re-ranking. A larger pool gives Gemini more to choose from, at the cost of a slightly longer prompt. The default of 20 works well for most queries; increase it to 30–50 for rare or highly specific occupations.

Single query mode¶

1. Type your description in the text box
2. Press Classify
3. Results appear in three tabs:

🃏 Cards tab¶

Each result is shown as a card with:

Rank badge — #1 is highlighted in green
ANZSIC code (monospace, e.g. S9419_03)
ANZSIC description — the official occupation title
Class and Division — hierarchical context
Reason — Gemini's plain-English explanation (High Fidelity mode only)

📋 Table tab¶

All results in a sortable, filterable data table. Includes a Download CSV button to export the current results.
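
For readers who want a similar export in their own code, the following is a minimal sketch of turning result rows into CSV text with the standard library. The column names and the sample row are illustrative assumptions based on the card fields described above, not the app's actual export schema.

```python
import csv
import io


def results_to_csv(rows):
    """Serialise a list of result dicts to CSV text with a header row."""
    fieldnames = ["rank", "code", "description", "reason"]  # assumed columns
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"rank": 1, "code": "S9419_03", "description": "placeholder title", "reason": ""},
]
csv_text = results_to_csv(rows)
print(csv_text)
```

In a Streamlit page, text produced this way could be handed to st.download_button as the file contents.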

{ } JSON tab¶

The raw ClassifyResponse object rendered as interactive, collapsible JSON. Useful for developers who want to inspect all metadata fields (RRF score, source systems, model versions, timestamp).

Batch mode¶

1. Select Batch (file upload) in the sidebar
2. Upload a .txt file with one description per line; lines starting with # are ignored
3. A preview shows the first 10 queries
4. Press Run Batch
5. A progress bar tracks each query as it is processed
6. Results are aggregated into a single table with a Query column

Batch file format¶

queries.txt

# ANZSIC classification batch — Feb 2026
mobile mechanic
café owner
registered nurse
software engineer
primary school teacher
delivery driver
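
The format above is simple enough to parse in a few lines. This is a minimal sketch, not the app's own parser; the function name is hypothetical, and skipping blank lines is an assumption, since this page only specifies that lines starting with # are ignored.

```python
def parse_batch_file(text):
    """Return one query per non-empty line, skipping lines that start with '#'."""
    queries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines (assumed) and comment lines
        queries.append(stripped)
    return queries


sample = """# ANZSIC classification batch
mobile mechanic
café owner

registered nurse
"""
queries = parse_batch_file(sample)
print(queries)
```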

Batch and rate limits

Each query in a batch makes one embedding API call and (in High Fidelity mode) one Gemini API call. For batches larger than 50 queries, consider using Fast mode in the UI, or use the CLI batch mode, which gives you more control over rate limiting.
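
If you script a batch yourself, one simple way to stay under an API rate limit is to pause between calls. The sketch below is an assumption-laden illustration, not the CLI's implementation: classify_batch, max_per_second, and the stand-in classifier are all hypothetical names.

```python
import time


def classify_batch(queries, classify, max_per_second=2.0, sleep=time.sleep):
    """Call classify(query) for each query, pausing between calls.

    The pause keeps the call rate at or below max_per_second.
    """
    interval = 1.0 / max_per_second
    results = []
    for index, query in enumerate(queries):
        if index > 0:
            sleep(interval)  # wait before every call except the first
        results.append((query, classify(query)))
    return results


# Stand-in classifier and a recording sleep so the example runs instantly.
pauses = []
out = classify_batch(["mobile mechanic", "café owner"], str.upper,
                     max_per_second=4.0, sleep=pauses.append)
print(out, pauses)
```

Passing the sleep function as a parameter keeps the throttle easy to test; in real use the default time.sleep applies.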

Metrics row¶

After each single query, a row of metric tiles appears:

Metric	What it shows
Results	Number of ANZSIC codes returned
Candidates	Stage 1 retrieval pool size used
Latency	Wall-clock time for the full classify() call
Mode	`Fast` or `High Fidelity`
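
A wall-clock latency figure like the one in the Latency tile can be measured by timing the call directly. This is a minimal sketch under the assumption that the timing wraps a single function call; the helper name timed is hypothetical, and sorted stands in for pipeline.classify so the example runs on its own.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in for pipeline.classify so the example runs on its own.
result, latency_ms = timed(sorted, [3, 1, 2])
print(result, f"{latency_ms:.3f} ms")
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.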

Customising the UI¶

The Streamlit app is a single file at prod/interfaces/streamlit_app.py. It imports only from prod.services.container and prod.domain.models, so it needs no knowledge of the underlying infrastructure.

To embed the classifier in another Streamlit app:

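import streamlit as st

from prod.services.container import get_pipeline
from prod.domain.models import SearchRequest, SearchMode

# In your Streamlit page:
user_input = st.text_input("Description to classify")
if user_input:  # skip the call until the user has typed something
    pipeline = get_pipeline()  # cached automatically
    result = pipeline.classify(SearchRequest(query=user_input))
    st.write(result)  # render the response; swap in your own layout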