# ADR 006 — Multi-Provider Architecture (Vertex AI / OpenAI)

Status: Accepted · Date: 2025 · Deciders: CEP AI Team
## Context
The initial implementation used Vertex AI for both embeddings and LLM re-ranking. This restricted the tool to users who have:
- A GCP project with Vertex AI APIs enabled
- `gcloud` installed and authenticated
- Network access to `australia-southeast1-aiplatform.googleapis.com`
This covered the AI team developers but excluded the broader organisation — business analysts, data engineers, and other teams who have an OpenAI API key but no GCP access.
The Hexagonal Architecture already provided `EmbeddingPort` and `LLMPort` abstractions. The question was: how do we let users choose their provider without modifying any code?
## Decision
We added environment-variable-driven provider selection to `container.py`:

```bash
# GCP users (default — no changes required)
EMBED_PROVIDER=vertex
LLM_PROVIDER=vertex

# OpenAI users
EMBED_PROVIDER=openai
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Mix-and-match is valid
EMBED_PROVIDER=openai
LLM_PROVIDER=vertex
```
Two new adapters implement the existing Ports:
| New file | Port implemented | Technology |
|---|---|---|
| `adapters/openai_embedding.py` | `EmbeddingPort` | OpenAI `/v1/embeddings` |
| `adapters/openai_llm.py` | `LLMPort` | OpenAI `/v1/chat/completions` |
`container.py` was refactored into two private factory functions (`_build_embedder`, `_build_llm`) that read the provider env vars and return the appropriate adapter. `GCPAuthManager` is only instantiated when at least one GCP adapter is active — OpenAI-only users incur zero GCP import overhead.
No other file changed.
## Key design choices
### No OpenAI SDK dependency
Both adapters use `requests` directly, consistent with the existing Vertex AI adapters. This keeps the dependency footprint minimal — the `openai` Python package would add roughly 80 MB of transitive dependencies for functionality we don't need (streaming, the Assistants API, etc.).
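A raw-`requests` call to the embeddings endpoint looks roughly like this. It is a sketch of the approach, not the real adapter: the model name and dimension defaults are assumptions, and the actual `adapters/openai_embedding.py` also carries retry logic.

```python
import os
import requests

def embed_texts(texts, model="text-embedding-3-small", dim=768):
    """Call OpenAI /v1/embeddings directly with requests (no SDK)."""
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": model, "input": texts, "dimensions": dim},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of failing silently
    return [item["embedding"] for item in resp.json()["data"]]
```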
### JSON mode parity
| Provider | JSON guarantee mechanism |
|---|---|
| Gemini (Vertex) | `responseMimeType: application/json` |
| GPT-4o (OpenAI) | `response_format: {"type": "json_object"}` |
Both mechanisms guarantee syntactically valid JSON without post-processing.
The `LLMReranker._parse_response()` function is unchanged — it already handles the None / empty / invalid-JSON cases defensively.
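The defensive behaviour described above amounts to something like the following sketch (the real method is `LLMReranker._parse_response()`; its exact signature and return values may differ):

```python
import json

def parse_llm_json(raw):
    """Parse an LLM response defensively, degrading to None on bad input."""
    if not raw:          # handles None and the empty string
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None      # invalid JSON degrades gracefully, no exception
```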
### Dimension flexibility
OpenAI's `text-embedding-3` models accept a `dimensions` parameter. `OpenAIEmbeddingAdapter` always passes `dimensions=settings.embed_dim`, making its output compatible with whatever pgvector column the database was initialised with. If the database was set up with Vertex AI's 768-dim vectors, set `EMBED_DIM=768` — OpenAI will reduce its output accordingly.
### No prompt changes
The existing `build_system_prompt()` in `config/prompts.py` contains the word "JSON" — satisfying OpenAI JSON mode's requirement that the system prompt explicitly mention JSON. No prompt modifications were needed.
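The requirement can be made concrete with a small sketch. The prompt text here is a hypothetical stand-in, not the real prompt from `config/prompts.py`:

```python
# OpenAI's json_object mode rejects requests unless the word "JSON" appears
# somewhere in the messages; a guard like this makes the requirement explicit.
def build_system_prompt() -> str:
    # Hypothetical prompt text; the real one lives in config/prompts.py.
    return "You re-rank candidate records. Respond only with a JSON object."
```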
## Consequences
Positive:
- Any user with an `OPENAI_API_KEY` can now run the full pipeline
- Zero code changes are needed to switch providers — only `.env` changes
- Mix-and-match allows hybrid setups (e.g. OpenAI embeddings + Gemini LLM for users migrating incrementally)
- GCP credentials are never required for OpenAI-only deployments
- 35 new unit tests cover both adapters, retry logic, and provider routing
Negative / trade-offs:
- OpenAI and Vertex AI embeddings live in different vector spaces — a database populated with Vertex AI embeddings cannot be queried with OpenAI embeddings (the cosine similarities would be meaningless)
- A user switching providers must re-embed and re-ingest all 5,236 records with the new embedding model
- Two providers means two sets of API credentials to manage in production
Neutral:
- `gpt-4o` is not hosted in Australian infrastructure; if data residency for the LLM prompt is required, keep `LLM_PROVIDER=vertex`
When to add a third provider:
The `_build_embedder` / `_build_llm` factory functions in `container.py` use a simple if/elif chain. Adding Cohere, Azure OpenAI, or any other provider requires only:
1. A new adapter file implementing the relevant Port
2. An additional `elif` branch in the factory function