Why the client insisted on private cloud
The group runs three clinics and a diagnostic lab. They handle PHI daily, and their legal team forbids sending identifiers to third-party LLM APIs. They also wanted a full audit trail for every patient interaction and the ability to shut the system off instantly if anything drifted. The mandate to Zyphh was clear: keep data local, deliver sub-5-second responses, and integrate with existing workflows without retraining staff.
Discovery: mapping intents and redaction rules
We interviewed care coordinators and pulled 60 days of Zendesk tickets. 71% of tickets clustered around five intents: insurance verification, location hours, appointment scheduling, test prep instructions, and follow-up timelines. We also cataloged every field that could contain PHI—names, MRNs, phone numbers, emails, appointment IDs—and wrote redaction patterns to strip them before any query left the edge gateway.
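The redaction pass can be sketched as an ordered list of patterns applied before a query leaves the gateway. This is a minimal illustration: the MRN shape and the exact field list are assumptions here, not the client's real formats.

```python
import re

# Illustrative patterns only; the MRN format below is a hypothetical
# stand-in for the client's actual identifier scheme.
PATTERNS = [
    (re.compile(r"\b[A-Z]{3}\d{7}\b"), "[MRN]"),                    # e.g. ABC1234567
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),  # US-style numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Apply each pattern in order, replacing matches with a stable mask."""
    for pattern, mask in PATTERNS:
        text = pattern.sub(mask, text)
    return text
```

Deterministic masks (rather than random tokens) keep redacted logs diffable and let the same identifier map to the same placeholder across a conversation.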
Architecture at a glance
- Ingress: Web widget and WhatsApp webhook hitting a FastAPI gateway in the client VPC.
- Redaction layer: Regex + deterministic masking for names, MRNs, and contact fields; runs before storing messages.
- Retrieval: RAG pipeline with pgvector in Postgres; embeddings generated with a local model.
- Model: Self-hosted 13B LLM fine-tuned on synthetic dialogue; quantized for GPU efficiency.
- Cache: Redis for hot answers; cut p95 latency by 2.1s after rollout.
- Observability: Loki + Grafana dashboards with PHI-safe logs and alerting to Slack/Email.
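The retrieval step above amounts to ranking stored chunks by vector similarity to the query embedding; in production that is a pgvector `ORDER BY` over the embedding column, but the ranking logic can be sketched in plain Python. Names and shapes here are illustrative.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=3):
    """chunks: list of (chunk_text, embedding) pairs; return the k best texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```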
Data preparation and governance
We ingested SOPs, insurance partner documents, lab prep PDFs, and clinic-specific FAQs. Each document went through a cleansing pass to remove embedded PHI, then was chunked into 500-token segments with overlap to preserve context. A governance table tracks document owners, expiry dates, and when content must be revalidated—critical for medical information that changes quarterly.
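The overlapping-window chunking can be sketched as a sliding window over a token list; the real pipeline counts model tokens, but whitespace tokens illustrate the mechanics. Window and overlap sizes below are the ones named above.

```python
def chunk(tokens, size=500, overlap=50):
    """Slide a window of `size` tokens, stepping size - overlap each time,
    so adjacent chunks share `overlap` tokens of context."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```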
Conversation design
The assistant was trained to disclose it is not a clinician, never provides diagnoses, and routes anything symptomatic to a human within one turn. We added citation snippets to every answer and a “was this helpful?” collector that feeds back into content prioritization. When patients request appointments, the bot gathers only minimal data (preferred time, location, reason), then hands off to the scheduling system via an internal API without storing PHI in logs.
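The "route anything symptomatic to a human within one turn" rule reduces, at its simplest, to a gate ahead of the model. The production system used intent classification; the keyword list below is a deliberately crude, illustrative stand-in.

```python
# Illustrative trigger terms only; the real system classified intents,
# it did not keyword-match.
SYMPTOM_TERMS = {"pain", "fever", "bleeding", "dizzy", "rash"}

def route(message: str) -> str:
    """Escalate any symptomatic message to a human within one turn."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    if words & SYMPTOM_TERMS:
        return "human_agent"
    return "assistant"
```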
Security & compliance guardrails
- All traffic terminates inside the client’s VPC with mutual TLS.
- Role-based access: only care coordinators and compliance can view redacted logs; engineers see masked fields only.
- PII redaction before vectorization; embeddings never contain PHI.
- Automatic transcript purge after 30 days in line with their retention policy.
- Weekly red-team prompts to probe for data leakage; none observed in eight rounds.
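The 30-day transcript purge above is a straightforward retention sweep; a minimal sketch, assuming transcripts are stored as (id, created-at) pairs with timezone-aware timestamps:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def expired(transcripts, now=None):
    """Return IDs of transcripts past the 30-day retention window."""
    now = now or datetime.now(timezone.utc)
    return [tid for tid, created in transcripts if now - created > RETENTION]
```

A scheduled job deleting `expired(...)` rows keeps the store aligned with the retention policy without manual review.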
Deployment timeline
1. Intake, ticket analysis, redaction-policy approval, and infra sizing.
2. RAG pipeline built, documents cleaned, embeddings generated locally.
3. Web-only pilot at 15% of traffic, with daily evals and clinician review.
4. WhatsApp channel added, Redis cache enabled, guardrails tightened, alerts live.
5. Training, runbooks, handoff, and a live-fire compliance test.
Measured outcomes after 90 days
The chatbot handled 318 tickets per month, deflecting 42% of what previously reached agents. Median response time stayed under five seconds, with p95 at 7.3 seconds. First-contact resolution for benefits questions hit 88%. Human agents were freed for complex cases, saving ~28 hours weekly. Most importantly, audits confirmed zero PHI left the environment, and security teams could see every interaction with redaction context.
Operational playbook that kept risk low
- Daily sampling of 25 conversations for accuracy and tone, with a feedback loop to retrain intents.
- Automated content staleness alerts when SOPs exceed 90 days without review.
- Shadow mode for new intents—log only until a clinician approves the response set.
- Blue/green deploys for the model so rollbacks take under 2 minutes.
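The shadow-mode item above boils down to a gate on a per-intent approval set: unapproved intents are recorded but never answered. A minimal sketch, with all names hypothetical:

```python
def handle(intent, response, approved_intents, shadow_log):
    """Shadow mode: unapproved intents are logged only, never sent to patients."""
    if intent in approved_intents:
        return response               # live intent: answer goes out
    shadow_log.append((intent, response))  # shadow intent: record for review
    return None
```

Once a clinician approves the logged response set, the intent is added to `approved_intents` and goes live without a deploy.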
Lessons for other regulated teams
Private doesn’t have to mean slow. Running everything in the client VPC cut vendor risk while keeping latency competitive. Redaction before embeddings is non-negotiable. And governance beats novelty—consistent document owners and review cadences did more for accuracy than any prompt trick.
FAQ
Can this fit into our existing security and compliance stack?
Yes. We align to existing controls, keep data in your cloud, and integrate with your SIEM for centralized logging. We also ship data flow diagrams for auditors.
Does the assistant support multiple languages?
We deployed English and Spanish from day one. The same RAG pipeline serves both, and we log language detection so you can see channel mix over time.
Can patients escalate to a human?
Yes. The bot can route to a live agent queue with transcript context, preserving the redaction state. Patients never have to repeat themselves.
How do you measure answer quality?
We run daily evals on held-out question sets, track citation coverage, and monitor sentiment on a five-point scale. Quality scores and drift are visible in Grafana.