
How We Deployed a Private AI Chatbot for a Healthcare Business (Zero Data Sent to OpenAI)

A multi-location healthcare group needed 24/7 patient support without exposing PHI. Zyphh shipped a private, RAG-powered chatbot that answers intake, benefits, and location questions with under five-second responses—keeping every byte inside their VPC.

Stack: Postgres · open-source LLM · Redis cache · HIPAA-aligned logging
318 tickets resolved per month
<5 s median response time
42% deflection of human tickets
Zero data sent to OpenAI

Why the client insisted on private cloud

The group runs three clinics and a diagnostic lab. They handle PHI daily, and their legal team forbids sending identifiers to third-party LLM APIs. They also wanted a full audit trail for every patient interaction and the ability to shut the system off instantly if anything drifted. The mandate to Zyphh was clear: keep data local, deliver sub-5-second responses, and integrate with existing workflows without retraining staff.

Discovery: mapping intents and redaction rules

We interviewed care coordinators and pulled 60 days of Zendesk tickets. 71% of tickets clustered around five intents: insurance verification, location hours, appointment scheduling, test prep instructions, and follow-up timelines. We also cataloged every field that could contain PHI—names, MRNs, phone numbers, emails, appointment IDs—and wrote redaction patterns to strip them before any query left the edge gateway.
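The production redaction rules are the client's, but the shape of the edge-gateway pass is simple: compiled patterns applied before anything is embedded or sent to the model. The MRN and appointment-ID formats below are illustrative, and name redaction needs more than regexes, so it is omitted from this sketch.

```python
import re

# Illustrative redaction patterns; the client's production rules are broader,
# and the MRN / appointment-ID formats shown here are hypothetical.
REDACTION_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[-\s]?\d{6,10}\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\bAPPT[-\s]?\d{4,8}\b", re.IGNORECASE), "[APPT_ID]"),
]

def redact(text: str) -> tuple[str, list[str]]:
    """Strip PHI-like spans before a query leaves the edge gateway.

    Returns the redacted text plus the labels that fired, which become the
    "redaction context" attached to the audit trail.
    """
    fired = []
    for pattern, label in REDACTION_PATTERNS:
        text, count = pattern.subn(label, text)
        if count:
            fired.append(label)
    return text, fired
```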

Architecture at a glance

Everything runs inside the client's VPC. An edge gateway redacts PHI before a query touches anything else, retrieval runs against Postgres (which holds the cleaned documents and their locally generated embeddings), an open-source LLM generates the answer, and a Redis cache serves repeat questions to keep the median response under five seconds. Web and WhatsApp ride the same pipeline, and every interaction is logged with its redaction context for audit.

Data preparation and governance

We ingested SOPs, insurance partner documents, lab prep PDFs, and clinic-specific FAQs. Each document went through a cleansing pass to remove embedded PHI, then was chunked into 500-token segments with overlap to preserve context. A governance table tracks document owners, expiry dates, and when content must be revalidated, which is critical for medical information that changes quarterly.
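The chunking step, in miniature (the 50-token overlap is illustrative, since the write-up only says "with overlap", and the whitespace split stands in for the embedding model's tokenizer):

```python
def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split a token sequence into fixed-size chunks that share an overlap,
    so context spanning a boundary is preserved in the neighboring chunk."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

def chunk_document(text: str) -> list[str]:
    # Whitespace tokens stand in for a real tokenizer; the 500-token target
    # matches the pipeline described above.
    tokens = text.split()
    return [" ".join(chunk) for chunk in chunk_tokens(tokens)]
```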

Conversation design

The assistant was trained to disclose it is not a clinician, never provides diagnoses, and routes anything symptomatic to a human within one turn. We added citation snippets to every answer and a “was this helpful?” collector that feeds back into content prioritization. When patients request appointments, the bot gathers only minimal data (preferred time, location, reason), then hands off to the scheduling system via an internal API without storing PHI in logs.
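Two of those rules are easy to show in miniature: the symptomatic-escalation check and the minimal-field scheduling handoff. The keyword list, endpoint, and response field below are placeholders, not the client's classifier or API.

```python
import requests

# Illustrative trigger list; the production routing was broader than keyword matching.
SYMPTOM_TRIGGERS = ("pain", "bleeding", "fever", "dizzy", "shortness of breath", "chest")

def should_escalate(message: str) -> bool:
    """Route anything symptomatic to a human within one turn."""
    lowered = message.lower()
    return any(trigger in lowered for trigger in SYMPTOM_TRIGGERS)

def request_appointment(preferred_time: str, location: str, reason: str) -> str:
    """Hand off to the internal scheduling API with only the minimal fields.

    None of this is written to chat logs, so no PHI lands in them. The URL
    and the confirmation field are placeholders, not the client's real API.
    """
    response = requests.post(
        "https://scheduling.internal.example/api/appointments",
        json={"preferred_time": preferred_time, "location": location, "reason": reason},
        timeout=5,
    )
    response.raise_for_status()
    return response.json()["confirmation_id"]
```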

Security & compliance guardrails

The guardrails follow directly from the mandate: PHI is stripped at the edge gateway before a query reaches retrieval or generation, redaction runs again before any embedding is created, and appointment handoffs never write identifiers to logs. Every interaction lands in HIPAA-aligned audit logs with its redaction context and flows into the client's SIEM, and a kill switch lets the team take the assistant offline instantly if behavior drifts.
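Concretely, an audit entry might look like the sketch below, which reuses the redact() helper from the gateway sketch; the table name and columns are illustrative rather than the client's real schema.

```python
import json
from datetime import datetime, timezone

def audit_record(session_id: str, raw_message: str, answer: str) -> dict:
    """Build one audit entry: only the redacted message and its redaction
    context are persisted, never the raw text."""
    redacted, fired = redact(raw_message)   # redact() from the gateway sketch above
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "redacted_message": redacted,
        "redaction_context": fired,          # which PHI patterns fired, for auditors
        "answer": answer,
    }

def write_audit(record: dict, cursor) -> None:
    # Illustrative table and columns; the real schema and the SIEM forwarding
    # path are client-specific.
    cursor.execute(
        "INSERT INTO chat_audit (ts, session_id, payload) VALUES (%s, %s, %s)",
        (record["ts"], record["session_id"], json.dumps(record)),
    )
```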

Deployment timeline

Days 1–5: Intake, ticket analysis, redaction policy approval, and infra sizing.
Days 6–10: RAG pipeline built, documents cleaned, embeddings generated locally.
Days 11–14: Pilot on web only at 15% of traffic, with daily evals and clinician review.
Days 15–24: WhatsApp channel added, Redis cache enabled, guardrails tightened, alerts live.
Days 25–30: Training, runbooks, handoff, and a live-fire compliance test.

Measured outcomes after 90 days

The chatbot handled 318 tickets per month, deflecting 42% of what previously reached agents. Median response time stayed under five seconds, with p95 at 7.3 seconds. First-contact resolution for benefits questions hit 88%. Human agents were freed for complex cases, saving ~28 hours weekly. Most importantly, audits confirmed zero PHI left the environment, and security teams could see every interaction with redaction context.

“Legal signed off because nothing leaves our walls. Patients get answers faster, and my team finally has nights back.” — Director of Patient Experience

Operational playbook that kept risk low

Daily evals on held-out question sets and clinician review of flagged answers started during the pilot and never stopped. Document owners work to the governance table's review cadence, quality scores and drift are watched in Grafana, alerts and runbooks cover the known failure modes, and the kill switch keeps the option to take the assistant offline instantly.
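The "shut it off instantly" requirement stays cheap to honor when every request passes a single check. A minimal sketch, assuming a Redis-backed flag with a made-up key name:

```python
import redis

# Hostname and key name are placeholders for the client's environment.
r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)
KILL_SWITCH_KEY = "chatbot:kill_switch"

def assistant_enabled() -> bool:
    """Checked at the top of every request; ops can disable the bot in one
    command from the runbook (e.g. `redis-cli SET chatbot:kill_switch on`)."""
    return r.get(KILL_SWITCH_KEY) != "on"
```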

Lessons for other regulated teams

Private doesn’t have to mean slow. Running everything in the client VPC cut vendor risk while keeping latency competitive. Redaction before embeddings is non-negotiable. And governance beats novelty—consistent document owners and review cadences did more for accuracy than any prompt trick.


FAQ

Do you support SOC 2 or ISO 27001 environments?

Yes. We align to existing controls, keep data in your cloud, and integrate with your SIEM for centralized logging. We also ship data flow diagrams for auditors.

What about languages beyond English?

We deployed English and Spanish from day one. The same RAG pipeline serves both, and we log language detection so you can see channel mix over time.

Can it hand off to a human mid-conversation?

Yes. The bot can route to a live agent queue with transcript context, preserving the redaction state. Patients never have to repeat themselves.

How do you measure quality?

We run daily evals on held-out question sets, track citation coverage, and monitor sentiment on a five-point scale. Quality scores and drift are visible in Grafana.
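For teams wondering what a daily eval job amounts to, here is a minimal sketch: replay a held-out question set through the pipeline and summarize citation coverage and latency. The answer_question entry point is a placeholder for the production pipeline, and the metrics shown are only the subset described above.

```python
import time
import statistics

def run_daily_eval(questions: list[str], answer_question) -> dict:
    """Replay a held-out question set and summarize quality signals.

    `answer_question` is a placeholder for the production pipeline entry
    point and is assumed to return (answer_text, citations).
    """
    latencies, cited = [], 0
    for q in questions:
        start = time.monotonic()
        answer, citations = answer_question(q)
        latencies.append(time.monotonic() - start)
        cited += bool(citations)          # counts answers that carry at least one citation
    return {
        "citation_coverage": cited / len(questions),
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": statistics.quantiles(latencies, n=20)[18],  # 95th percentile
    }
```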