# ADR 003 — Vertex AI Embeddings (text-embedding-005)

- Status: Accepted
- Date: 2025
- Deciders: CEP AI Team

## Context
High-quality semantic embeddings are the backbone of Stage 1 retrieval. The embedding model determines:
- Whether colloquial queries ("barista", "runs a café") match correctly
- The vector dimension (affects index size and memory)
- Latency and cost per query
We evaluated:
| Model | Dimensions | Task types | Auth | Latency |
|---|---|---|---|---|
| Vertex AI text-embedding-005 | 768 | Asymmetric (QUERY / DOCUMENT) | gcloud ADC | ~150–200 ms |
| OpenAI text-embedding-3-large | 3072 | Symmetric | API key | ~80–120 ms |
| OpenAI text-embedding-3-small | 1536 | Symmetric | API key | ~50–80 ms |
| Cohere embed-v3 | 1024 | Asymmetric | API key | ~100–150 ms |
| Local (sentence-transformers) | 384–768 | Symmetric | None | ~5–50 ms |
## Decision
We chose Vertex AI text-embedding-005.
Reasons:
- **Existing GCP auth infrastructure** — The organisation already uses GCP (`top-arc-65ca`, `australia-southeast1`). gcloud Application Default Credentials were already configured. No new API keys to manage.
- **Asymmetric task types** — `RETRIEVAL_QUERY` vs `RETRIEVAL_DOCUMENT` task types optimise the embedding space for retrieval asymmetry. A short query embedding and a long document embedding are in compatible spaces. OpenAI's text-embedding-3 models do not support this distinction.
- **Data residency** — `australia-southeast1` keeps embedding generation within Australian GCP infrastructure, satisfying corporate data policy.
- **768 dimensions** — Smaller than OpenAI's 3072-dim model, reducing HNSW index memory and cosine distance computation time, with negligible quality difference at this dataset size (see the back-of-envelope sketch below).
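A back-of-envelope sketch of that memory claim (the corpus size below is illustrative, not a figure from this project):

```python
# Raw float32 vector storage, before HNSW graph overhead.
n_vectors = 100_000  # illustrative corpus size, not a project figure

def mib(n_bytes: int) -> float:
    return n_bytes / 2**20

print(f"768-dim  (text-embedding-005):     {mib(n_vectors * 768 * 4):.0f} MiB")
print(f"3072-dim (text-embedding-3-large): {mib(n_vectors * 3072 * 4):.0f} MiB")
# ~293 MiB vs ~1172 MiB: the 4x factor also scales the per-query
# cosine distance work.
```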
## Asymmetric retrieval explained

```
Query:     "mobile mechanic"            →  RETRIEVAL_QUERY embedding

Documents: "Automotive Repair and       →  RETRIEVAL_DOCUMENT embeddings
            Maintenance (own account)",     (pre-computed at ingest time)
           "Motor Vehicle Parts..."
```
The model is trained so that RETRIEVAL_QUERY vectors lie close to RETRIEVAL_DOCUMENT vectors for semantically related text, even when the surface forms are completely different. This is why "barista" finds "Cafes and Restaurants" rather than just keyword matches.
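A minimal sketch of how the two task types are requested through the Vertex AI Python SDK (project and location are the values named in this ADR; error handling omitted):

```python
import vertexai
from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

# Authenticates via gcloud Application Default Credentials; no API key.
vertexai.init(project="top-arc-65ca", location="australia-southeast1")
model = TextEmbeddingModel.from_pretrained("text-embedding-005")

# Short queries and long documents get different task types, so the model
# places them in compatible regions of the same 768-dim space.
query_vec = model.get_embeddings(
    [TextEmbeddingInput("mobile mechanic", task_type="RETRIEVAL_QUERY")]
)[0].values

doc_vec = model.get_embeddings(
    [
        TextEmbeddingInput(
            "Automotive Repair and Maintenance (own account)",
            task_type="RETRIEVAL_DOCUMENT",
        )
    ]
)[0].values

assert len(query_vec) == 768
```

Document embeddings are computed once at ingest time; only the query embedding happens per request.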
## Consequences

Positive:

- Works within existing GCP auth (`gcloud auth application-default login`)
- Embeddings remain within Australian infrastructure
- `GCPAuthManager` is shared with Gemini — one token serves both adapters
Negative / trade-offs:

- Requires network access (or proxy) to `australia-southeast1-aiplatform.googleapis.com`
- Cannot run fully offline (unlike local sentence-transformers)
- gcloud subprocess for auth adds ~30 ms to the first call per session (see the sketch below)
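The ~30 ms cost is paid only on the first call because the token is reused for the session. A minimal sketch of that pattern (the class and its interface are illustrative; the real `GCPAuthManager` is not reproduced here):

```python
import subprocess
import time

class ADCTokenCache:
    """Illustrative stand-in for GCPAuthManager's token handling."""

    def __init__(self, ttl_seconds: int = 3000):
        self._token: str | None = None
        self._fetched_at = 0.0
        self._ttl = ttl_seconds

    def token(self) -> str:
        # The gcloud subprocess costs ~30 ms, so run it once per session
        # (and again only when the cached token nears expiry).
        if self._token is None or time.monotonic() - self._fetched_at > self._ttl:
            self._token = subprocess.run(
                ["gcloud", "auth", "application-default", "print-access-token"],
                capture_output=True, text=True, check=True,
            ).stdout.strip()
            self._fetched_at = time.monotonic()
        return self._token
```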
When to revisit:

If offline operation becomes a requirement, swap in a local embedding model
via a new adapter. The `EmbeddingPort` abstraction makes this a one-file change.
`EMBED_DIM` in settings handles the dimension change transparently.
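What that one-file change might look like (a sketch: `EmbeddingPort` and `EMBED_DIM` are named above, but the port's method signatures and the model choice here are assumptions):

```python
from typing import Protocol

from sentence_transformers import SentenceTransformer  # pip install sentence-transformers


class EmbeddingPort(Protocol):
    """Assumed shape of the port; the real interface may differ."""

    def embed_query(self, text: str) -> list[float]: ...
    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...


class LocalEmbeddingAdapter:
    """Offline adapter. all-MiniLM-L6-v2 is 384-dim, so EMBED_DIM must change.

    Note: sentence-transformers models are symmetric, i.e. queries and
    documents share one encoding, so the asymmetric task-type benefit is lost.
    """

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self._model = SentenceTransformer(model_name)

    def embed_query(self, text: str) -> list[float]:
        return self._model.encode(text).tolist()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self._model.encode(texts).tolist()
```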