Why the client insisted on private cloud
The group runs three clinics and a diagnostic lab. They handle PHI daily, and their legal team forbids sending identifiers to third-party LLM APIs. They also wanted a full audit trail for every patient interaction and the ability to shut the system off instantly if anything drifted. The mandate to Zyphh was clear: keep data local, deliver sub-5-second responses, and integrate with existing workflows without retraining staff.
Discovery: mapping intents and redaction rules
We interviewed care coordinators and pulled 60 days of Zendesk tickets. 71% of tickets clustered around five intents: insurance verification, location hours, appointment scheduling, test prep instructions, and follow-up timelines. We also cataloged every field that could contain PHI—names, MRNs, phone numbers, emails, appointment IDs—and wrote redaction patterns to strip them before any query left the edge gateway.
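The redaction pass can be sketched as an ordered list of patterns applied before a query leaves the gateway. This is a minimal illustration: the MRN shape and the exact field list are assumptions here, not the client's real formats.

```python
import re

# Illustrative patterns only; the MRN format below is a hypothetical
# stand-in for the client's actual identifier scheme.
PATTERNS = [
    (re.compile(r"\b[A-Z]{3}\d{7}\b"), "[MRN]"),                    # e.g. ABC1234567
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),  # US-style numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with a stable mask."""
    for pattern, mask in PATTERNS:
        text = pattern.sub(mask, text)
    return text
```

Deterministic masks (rather than random tokens) keep redacted logs diffable and let the same identifier map to the same placeholder across a conversation.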
Architecture at a glance
- Ingress: Web widget and WhatsApp webhook hitting a FastAPI gateway in the client VPC.
- Redaction layer: Regex + deterministic masking for names, MRNs, and contact fields; runs before storing messages.
- Retrieval: RAG pipeline with pgvector in Postgres; embeddings generated with a local model.
- Model: Self-hosted 13B LLM fine-tuned on synthetic dialogue; quantized for GPU efficiency.
- Cache: Redis for hot answers; cut p95 latency by 2.1s after rollout.
- Observability: Loki + Grafana dashboards with PHI-safe logs and alerting to Slack/Email.
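The retrieval step above amounts to ranking stored chunks by vector similarity to the query embedding; in production that is a pgvector `ORDER BY` over the embedding column, but the ranking logic can be sketched in plain Python. Names and shapes here are illustrative.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=3):
    """chunks: list of (chunk_text, embedding) pairs; return the k best texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```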
Data preparation and governance
We ingested SOPs, insurance partner documents, lab prep PDFs, and clinic-specific FAQs. Each document went through a cleansing pass to remove embedded PHI, then was chunked into 500-token segments with overlap to preserve context. A governance table tracks document owners, expiry dates, and when content must be revalidated—critical for medical information that changes quarterly.
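The overlapping-window chunking can be sketched as a sliding window over a token list; the real pipeline counts model tokens, but whitespace tokens illustrate the mechanics. Window and overlap sizes below are the ones named above.

```python
def chunk(tokens, size=500, overlap=50):
    """Slide a window of `size` tokens, stepping size - overlap each time,
    so adjacent chunks share `overlap` tokens of context."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```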
Conversation design
The assistant was trained to disclose it is not a clinician, never provides diagnoses, and routes anything symptomatic to a human within one turn. We added citation snippets to every answer and a “was this helpful?” collector that feeds back into content prioritization. When patients request appointments, the bot gathers only minimal data (preferred time, location, reason), then hands off to the scheduling system via an internal API without storing PHI in logs.
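The "route anything symptomatic to a human within one turn" rule reduces, at its simplest, to a gate ahead of the model. The production system used intent classification; the keyword list below is a deliberately crude, illustrative stand-in.

```python
# Illustrative trigger terms only; the real system classified intents,
# it did not keyword-match.
SYMPTOM_TERMS = {"pain", "fever", "bleeding", "dizzy", "rash"}

def route(message: str) -> str:
    """Escalate any symptomatic message to a human within one turn."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    if words & SYMPTOM_TERMS:
        return "human_agent"
    return "assistant"
```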
Security & compliance guardrails
- All traffic terminates inside the client’s VPC with mutual TLS.
- Role-based access: only care coordinators and compliance can view redacted logs; engineers see masked fields only.
- PII redaction before vectorization; embeddings never contain PHI.
- Automatic transcript purge after 30 days in line with their retention policy.
- Weekly red-team prompts to probe for data leakage; none observed in eight rounds.
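The 30-day transcript purge above is a straightforward retention sweep; a minimal sketch, assuming transcripts are stored as (id, created-at) pairs with timezone-aware timestamps:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def expired(transcripts, now=None):
    """Return IDs of transcripts past the 30-day retention window."""
    now = now or datetime.now(timezone.utc)
    return [tid for tid, created in transcripts if now - created > RETENTION]
```

A scheduled job deleting `expired(...)` rows keeps the store aligned with the retention policy without manual review.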
Deployment timeline
1. Intake, ticket analysis, redaction-policy approval, and infra sizing.
2. RAG pipeline built, documents cleaned, embeddings generated locally.
3. Web-only pilot at 15% of traffic, with daily evals and clinician review.
4. WhatsApp channel added, Redis cache enabled, guardrails tightened, alerts live.
5. Training, runbooks, handoff, and a live-fire compliance test.
Measured outcomes after 90 days
The chatbot handled 318 tickets per month, deflecting 42% of what previously reached agents. Median response time stayed under five seconds, with p95 at 7.3 seconds. First-contact resolution for benefits questions hit 88%. Human agents were freed for complex cases, saving ~28 hours weekly. Most importantly, audits confirmed zero PHI left the environment, and security teams could see every interaction with redaction context.
Operational playbook that kept risk low
- Daily sampling of 25 conversations for accuracy and tone, with a feedback loop to retrain intents.
- Automated content staleness alerts when SOPs exceed 90 days without review.
- Shadow mode for new intents—log only until a clinician approves the response set.
- Blue/green deploys for the model so rollbacks take under 2 minutes.
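The shadow-mode item above boils down to a gate on a per-intent approval set: unapproved intents are recorded but never answered. A minimal sketch, with all names hypothetical:

```python
def handle(intent, response, approved_intents, shadow_log):
    """Shadow mode: unapproved intents are logged only, never sent to patients."""
    if intent in approved_intents:
        return response               # live intent: answer goes out
    shadow_log.append((intent, response))  # shadow intent: record for review
    return None
```

Once a clinician approves the logged response set, the intent is added to `approved_intents` and goes live without a deploy.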
Lessons for other regulated teams
Private doesn’t have to mean slow. Running everything in the client VPC cut vendor risk while keeping latency competitive. Redaction before embeddings is non-negotiable. And governance beats novelty—consistent document owners and review cadences did more for accuracy than any prompt trick.
FAQ
Can this fit into our existing security and compliance stack?
Yes. We align to existing controls, keep data in your cloud, and integrate with your SIEM for centralized logging. We also ship data flow diagrams for auditors.
Does the assistant support multiple languages?
We deployed English and Spanish from day one. The same RAG pipeline serves both, and we log language detection so you can see channel mix over time.
Can patients escalate to a human?
Yes. The bot can route to a live agent queue with transcript context, preserving the redaction state. Patients never have to repeat themselves.
How do you measure answer quality?
We run daily evals on held-out question sets, track citation coverage, and monitor sentiment on a five-point scale. Quality scores and drift are visible in Grafana.