Streamlit UI¶
The ANZSIC Classifier has a full browser-based interface built with Streamlit. It supports both single-query and batch classification, with results in card, table, or JSON format and CSV download.
Starting the UI¶
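Launch the app with Streamlit's CLI, for example streamlit run prod/interfaces/streamlit_app.py from the repository root (the exact command may differ in your installation). The browser opens automatically at http://localhost:8501.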
First run
The first request takes a few extra seconds while the pipeline loads and
establishes the database connection. Subsequent requests are fast because
the pipeline is cached for the lifetime of the Streamlit session
(@st.cache_resource).
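The load-once behaviour can be illustrated without Streamlit. The sketch below uses functools.lru_cache as a stand-in for @st.cache_resource; load_pipeline and the dictionary it returns are hypothetical placeholders, not the project's actual loader.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def load_pipeline() -> dict:
    """Hypothetical loader standing in for pipeline and database start-up."""
    time.sleep(0.2)  # simulate the slow first-run work
    return {"ready": True}


def timed_load() -> tuple:
    """Return the pipeline and the seconds spent obtaining it."""
    start = time.perf_counter()
    pipeline = load_pipeline()
    return pipeline, time.perf_counter() - start


first, first_secs = timed_load()    # pays the start-up cost
second, second_secs = timed_load()  # served from the cache
```

The second call returns the same object without repeating the start-up work, which is the effect described above for requests after the first.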
Sidebar controls¶
The left sidebar contains all configuration options.
Search Type¶
| Option | Description |
|---|---|
| Single Query | Type one description and get results immediately |
| Batch (file upload) | Upload a .txt file with multiple queries |
Search Mode¶
| Mode | Pipeline stages | Typical speed | Best for |
|---|---|---|---|
| High Fidelity (+ Gemini) | Retrieval + LLM re-ranking | 2–5 seconds | Final classification where explanations are needed |
| Fast (retrieval only) | Retrieval only (RRF) | < 500 ms | Exploration, large batches |
ANZSIC Type¶
Currently locked to 6-digit (unit groups) — the most granular level of the ANZSIC hierarchy. Future releases may support 4-digit (class) or 2-digit (division) classification.
Max Results¶
Number of ANZSIC codes returned per query.
- In High Fidelity mode: Gemini selects the best matches up to this limit. Setting this lower does not speed up the call (all candidates are still sent to Gemini).
- In Fast mode: the top-N by RRF score are returned directly.
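For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below shows the standard formula, in which each code scores the sum of 1 / (k + rank) over the ranked lists it appears in. The two input lists, the placeholder codes, and k = 60 (a commonly used constant) are assumptions for illustration; the classifier's actual retrievers and parameters are not documented here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse ranked lists of codes with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, code in enumerate(ranking, start=1):
            scores[code] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical outputs from two retrievers (placeholder codes).
retriever_a = ["CODE_A", "CODE_B", "CODE_C"]
retriever_b = ["CODE_B", "CODE_C", "CODE_A"]

top = rrf_fuse([retriever_a, retriever_b], top_n=2)
```

Here top_n plays the role of Max Results in Fast mode: the fused list is simply truncated to the requested length.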
Retrieval Pool¶
Controls how many candidates Stage 1 returns before re-ranking. A larger pool gives Gemini more to choose from, at the cost of a slightly longer prompt. The default of 20 works well for most queries. Increase it to 30–50 for rare or highly specific occupations.
Single query mode¶
- Type your description in the text box
- Press Classify
- Results appear in three tabs:
🃏 Cards tab¶
Each result is shown as a card with:
- Rank badge: #1 is highlighted in green
- ANZSIC code (monospace, e.g. S9419_03)
- ANZSIC description: the official occupation title
- Class and Division: hierarchical context
- Reason: Gemini's plain-English explanation (High Fidelity mode only)
📋 Table tab¶
All results in a sortable, filterable data table. Includes a Download CSV button to export the current results.
{ } JSON tab¶
The raw ClassifyResponse object rendered as interactive, collapsible JSON.
Useful for developers who want to inspect all metadata fields (RRF score,
source systems, model versions, timestamp).
Batch mode¶
- Select Batch (file upload) in the sidebar
- Upload a .txt file: one description per line; lines starting with # are ignored
- A preview shows the first 10 queries
- Press Run Batch
- A progress bar tracks each query as it processes
- Results aggregate into a single table with a Query column
Batch file format¶
# ANZSIC classification batch — Feb 2026
mobile mechanic
café owner
registered nurse
software engineer
primary school teacher
delivery driver
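A file in this format can be pre-checked before upload. The following is a minimal sketch of the parsing rules described above (one description per line, lines starting with # ignored); skipping blank lines and trimming whitespace are assumptions, and parse_batch is a hypothetical helper rather than the app's own function.

```python
def parse_batch(text: str) -> list:
    """Return the queries in a batch file, skipping comments and blank lines."""
    queries = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines are skipped along with "#" comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        queries.append(stripped)
    return queries


sample = """# ANZSIC classification batch
mobile mechanic
café owner

registered nurse
"""

queries = parse_batch(sample)
preview = queries[:10]  # mirrors the 10-query preview in the UI
```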
Batch and rate limits
Each query in a batch makes one embedding API call and (in High Fidelity mode) one Gemini API call. For batches larger than 50 queries, consider using Fast mode in the UI, or switch to the CLI batch mode, which gives you more control over rate limiting.
Metrics row¶
After each single query, a row of metric tiles appears:
| Metric | What it shows |
|---|---|
| Results | Number of ANZSIC codes returned |
| Candidates | Stage 1 retrieval pool size used |
| Latency | Wall-clock time for the full classify() call |
| Mode | Fast or High Fidelity |
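The tiles can be thought of as a small dictionary built around a timed call. In the sketch below, classify is a hypothetical stand-in with an assumed signature, and run_with_metrics is an illustrative helper; only the tile names come from the table above.

```python
import time


def classify(query: str, max_results: int = 5) -> list:
    """Hypothetical stand-in for the real classify() call."""
    time.sleep(0.05)  # simulate retrieval work
    return [f"result-{i}" for i in range(1, max_results + 1)]


def run_with_metrics(query, mode="Fast", pool_size=20, max_results=5):
    """Time one classify() call and build the values shown in the tiles."""
    start = time.perf_counter()
    results = classify(query, max_results=max_results)
    latency = time.perf_counter() - start  # wall-clock seconds
    return {
        "Results": len(results),
        "Candidates": pool_size,
        "Latency": f"{latency * 1000:.0f} ms",
        "Mode": mode,
    }


metrics = run_with_metrics("mobile mechanic", max_results=3)
```

time.perf_counter is used because it is a monotonic clock, so the measured latency is not affected by system clock adjustments.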
Customising the UI¶
The Streamlit app is a single file at prod/interfaces/streamlit_app.py.
It imports only from prod.services.container and prod.domain.models,
so it needs no knowledge of the underlying infrastructure.
To embed the classifier in another Streamlit app: