RAG in Production: A Stack EU SaaS Teams Can Audit
By DataDiwan · 2026-06-18 · 9 min read
Short answer: Production RAG is not "embed PDFs and pray." It is chunking, locale-aware ingest, hybrid retrieval, cited answers, and retention policies your DPO can read — especially if you serve EU customers from a SaaS product.
When RAG beats fine-tuning for SaaS
| Fine-tuning | RAG |
|---|---|
| Expensive retraining per doc update | Update documents, re-ingest |
| Hard to cite sources | Natural citations from chunks |
| Risk of memorising PII | Retrieve only what you index |
| Slow compliance reviews | Auditors see retrieval logs |
For policy-heavy SaaS (legal, health, fintech, HR), grounded retrieval is usually the right default. Fine-tune later for tone, not facts.
Reference stack (vendor-agnostic)
- Ingest: markdown, PDF, tickets; detect locale from the filename (doc.ar.md) or frontmatter
- Chunk: ~500–800 tokens with overlap; keep title + source metadata
- Embed: Voyage, open-source, or hosted; record the embedding dimension and keep it consistent across the index
- Store: Postgres + pgvector (or equivalent) with a tenant_id column
- Retrieve: vector + full-text fallback; filter by locale and tenant
- Generate: system prompt that forbids uncited claims
- Log: optional query log with retention cap (GDPR)
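As a rough illustration of the Ingest and Chunk steps above, here is a minimal Python sketch. The names `detect_locale`, `chunk_document`, `Chunk`, and `SUPPORTED_LOCALES` are hypothetical, and whitespace splitting stands in for a real tokenizer, so treat the sizes as approximate rather than as the exact token counts your embedding model sees.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical locale list for illustration; adjust to your product.
SUPPORTED_LOCALES = {"en", "ar", "fi"}


@dataclass
class Chunk:
    text: str
    title: str
    source: str
    locale: str


def detect_locale(filename: str, frontmatter: Optional[dict] = None,
                  default: str = "en") -> str:
    """Prefer an explicit frontmatter locale, then a `name.<locale>.ext` suffix."""
    if frontmatter and frontmatter.get("locale") in SUPPORTED_LOCALES:
        return frontmatter["locale"]
    match = re.search(r"\.([a-z]{2})\.[^.]+$", filename)
    if match and match.group(1) in SUPPORTED_LOCALES:
        return match.group(1)
    return default


def chunk_document(text: str, title: str, source: str, locale: str,
                   size: int = 600, overlap: int = 80) -> List[Chunk]:
    """Split into overlapping windows; every chunk keeps title and source."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens = text.split()  # assumption: whitespace tokens approximate model tokens
    step = size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        chunks.append(Chunk(" ".join(window), title, source, locale))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```

Carrying `title`, `source`, and `locale` on every chunk is what later makes citations and locale filtering possible without a second lookup.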
We use this pattern on datadiwan.com and in client deployments.
EU compliance hooks that matter
- Data minimisation: do not embed what you do not need in answers
- Retention: TTL on query logs; document it in the DPIA
- Sub-processors: list embedding and LLM providers in the privacy policy
- Arabic + Finnish: tag each document's locale in the filename or frontmatter rather than running an English-only index
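The retention hook above can be sketched as a small purge job. This is an illustration only: it uses SQLite from the Python standard library so it runs anywhere, and the `query_log` table, its columns, and the 30-day `RETENTION_DAYS` value are assumptions, not figures from any regulation. Use the retention period your DPIA actually states.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # hypothetical cap; take the real value from your DPIA


def create_query_log(conn: sqlite3.Connection) -> None:
    """Create an illustrative query log table (schema is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS query_log ("
        "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, "
        "query TEXT NOT NULL, created_at INTEGER NOT NULL)"
    )


def record_query(conn: sqlite3.Connection, tenant_id: str, query: str,
                 when: datetime) -> None:
    """Store the query with its creation time as epoch seconds."""
    conn.execute(
        "INSERT INTO query_log (tenant_id, query, created_at) VALUES (?, ?, ?)",
        (tenant_id, query, int(when.timestamp())),
    )


def purge_expired(conn: sqlite3.Connection, now: datetime = None,
                  retention_days: int = RETENTION_DAYS) -> int:
    """Delete log rows older than the retention cap; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - timedelta(days=retention_days)).timestamp())
    cur = conn.execute("DELETE FROM query_log WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Running `purge_expired` from a daily scheduled job, and logging the returned count, gives you a simple record that the retention cap is being enforced.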
Our EU AI Act scorecard includes documentation prompts for RAG deployments.
Evaluation: the step most teams skip
Before launch, build 30–50 real questions from:
- Customer support macros
- Onboarding docs
- Compliance FAQs
Score:
| Metric | Target |
|---|---|
| Answer uses correct doc | >85% |
| Citation matches source | >90% |
| "I don't know" when missing | allowed and encouraged |
| Latency p95 | <5s for internal tools |
Regression-test after every ingest change.
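One way to turn the score table above into a repeatable check is a small harness like the sketch below. The `EvalResult` fields and the function names are hypothetical, and two definitions are assumptions of this sketch: "citation matches source" means the cited document equals the expected one, and p95 uses the nearest-rank method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EvalResult:
    expected_doc: Optional[str]   # None means the answer is not in the index
    retrieved_doc: Optional[str]  # top document the retriever returned
    cited_doc: Optional[str]      # document the answer cited
    abstained: bool               # True if the model said "I don't know"
    latency_s: float


def _rate(hits: int, total: int) -> Optional[float]:
    """Return hits/total, or None when there is nothing to measure."""
    return hits / total if total else None


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def summarise(results: List[EvalResult]) -> Dict[str, Optional[float]]:
    answerable = [r for r in results if r.expected_doc is not None]
    missing = [r for r in results if r.expected_doc is None]
    return {
        "correct_doc_rate": _rate(
            sum(r.retrieved_doc == r.expected_doc for r in answerable), len(answerable)),
        "citation_match_rate": _rate(
            sum(r.cited_doc == r.expected_doc for r in answerable), len(answerable)),
        "abstain_when_missing_rate": _rate(
            sum(r.abstained for r in missing), len(missing)),
        "latency_p95_s": p95([r.latency_s for r in results]) if results else None,
    }


def meets_targets(summary: Dict[str, Optional[float]]) -> bool:
    """Compare against the targets in the table: >85%, >90%, p95 under 5 s."""
    return (
        (summary["correct_doc_rate"] or 0.0) > 0.85
        and (summary["citation_match_rate"] or 0.0) > 0.90
        and summary["latency_p95_s"] is not None
        and summary["latency_p95_s"] < 5.0
    )
```

Calling `meets_targets(summarise(results))` in CI after each ingest change gives a pass/fail signal for the regression test.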
Common failures we fix
- English-only ingest for trilingual customers
- Giant chunks that dilute retrieval precision
- No full-text search (FTS) fallback when embeddings drift
- Blog posts outranking policy docs on compliance questions
- Shared index across tenants in multi-tenant SaaS
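Two of the failures above, the missing full-text fallback and the shared tenant index, can be illustrated with a small in-memory sketch. It is not a pgvector implementation: `IndexedChunk`, `retrieve`, the `min_similarity` threshold of 0.3, and the word-overlap scoring are all assumptions chosen to keep the example self-contained. In a real deployment the same tenant and locale filter would live in the database query.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedChunk:
    chunk_id: str
    tenant_id: str
    locale: str
    text: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Crude stand-in for full-text search: share of query terms found in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def retrieve(index: List[IndexedChunk], query: str, query_embedding: List[float],
             tenant_id: str, locale: str, k: int = 3,
             min_similarity: float = 0.3) -> List[IndexedChunk]:
    # Filter first, so another tenant's chunks can never be scored or returned.
    candidates = [c for c in index if c.tenant_id == tenant_id and c.locale == locale]
    hits = [(cosine(query_embedding, c.embedding), c) for c in candidates]
    hits = [(score, c) for score, c in hits if score >= min_similarity]
    if not hits:
        # Fallback: no vector hit cleared the threshold, so try keyword overlap.
        hits = [(keyword_score(query, c.text), c) for c in candidates]
        hits = [(score, c) for score, c in hits if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hits[:k]]
```

Applying the tenant filter before any scoring, rather than after ranking, is the design choice that matters here: a bug in ranking can then lower answer quality but cannot leak another tenant's content.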
From prototype to product
| Phase | Duration | Output |
|---|---|---|
| Readiness sprint | 1–3 weeks | Architecture, risk tier, eval set |
| Build & deploy | 4–12 weeks | Ingest CLI, API, admin UI, monitoring |
| Handover | — | Runbooks, DPIA inputs, model cards |
Resources
- Knowledge Intelligence service
- RAG starter kit (coming to Gumroad)
- Case study: public health RAG assistant
Next step
Book a free AI readiness call or download the EU AI Act scorecard.
DataDiwan — data science, generative AI, RAG, and automation for teams shipping SaaS in Europe and the Arab world.