Ruby WhatsApp Assistant — Design
Date: 2026-06-02
Status: Approved for planning
Test tenant: ssh & Associates (b6d3a3f3-…) only; default OFF everywhere else.
1. Summary
An autopilot AI assistant (“Ruby”) that replies to patients’ free-text WhatsApp messages inside the 24-hour service window. It answers general clinic questions, looks up the messaging patient’s own next appointment, captures booking requests (which flow into the existing request→allot pipeline), escalates clinical / sensitive conversations to staff, and durably hands a conversation to a human the moment staff step in. Built on the existing AI spine: callAIJson() → DeepSeek v4-flash + Langfuse
HIPAA tracing, the Ruby agents pattern, and the WhatsApp inbound webhook. No DB
migration. The canonical prompt lives in Langfuse (edited in the dashboard, no
redeploy); code carries only the standard minimal fallback per existing convention.
2. Goals / Non-goals
Goals
- Answer general clinic info (hours, address, services, price ranges, “do you do X?”, how to book).
- Answer the messaging patient’s own next-appointment question (Ring 2; patient already phone-matched by webhook).
- Capture booking requests conversationally → create a requested appointment (matched patient) or a booking lead_submission (unmatched number) → reception allots via the shipped request→allot flow.
- Hard-stop on clinical/medical questions and escalate; escalate on frustration/urgency/explicit “talk to a human”.
- Durable human takeover: once staff reply (or escalation fires), Ruby goes silent for that conversation until staff click “Let Ruby resume”.
- Per-clinic on/off + editable knowledge blob + a safe preview (“ask Ruby” dry-run, no send).
- Distinct ruby-accent bubble in the staff inbox.
Non-goals
- Ring 3 (Ruby directly booking/cancelling/moving a slot): out of scope; request capture only.
- A multi-call router architecture (single structured agent now; documented upgrade path).
- Outbound-initiated AI conversations (Ruby only responds to inbound free-text).
- Voice (covered by the separate voice-agent feasibility report).
3. Architecture
One Ruby agent, server/src/lib/ai/agents/whatsapp-assistant.ts, calling
callAIJson() once per inbound message. The prompt is internally “router-shaped”:
it first determines intent, then returns the matching action + reply in a single
structured response. Rationale: DeepSeek flash is cheap/fast with 1M context; one
call holds knowledge + patient context + instructions and avoids the 2× latency/cost
of a router→specialist split. Splitting into a real router + specialists is a clean
later upgrade if tuning shows the single prompt is overloaded — not built now.
Hook: server/src/routes/whatsapp-webhook.ts, in handleIncomingMessage, after
the inbound message is logged and after the existing cancel/reschedule intent
early-returns (so Ruby never competes with those automations). Entirely best-effort —
any Ruby error is caught and never blocks normal inbound logging.
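The best-effort placement of the hook could be sketched roughly as follows. This is an illustration under assumptions, not the contents of whatsapp-webhook.ts: InboundMessage, handleInbound, and the injected logInbound / runRuby callbacks are hypothetical names.

```typescript
// Hypothetical names throughout; a sketch of the best-effort wrapper only.
interface InboundMessage {
  conversationId: string;
  type: string;
  text: string;
}

type InboundStep = (msg: InboundMessage) => Promise<void>;

// Logs the inbound message first, then tries Ruby. A Ruby failure is
// caught and reported, so it can never undo or block the logging step.
async function handleInbound(
  msg: InboundMessage,
  logInbound: InboundStep,
  runRuby: InboundStep,
): Promise<{ logged: boolean; rubyError: string | null }> {
  await logInbound(msg);
  try {
    await runRuby(msg);
    return { logged: true, rubyError: null };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    console.warn(`Ruby skipped for ${msg.conversationId}: ${reason}`);
    return { logged: true, rubyError: reason };
  }
}
```

Returning the error instead of rethrowing keeps the webhook's response path identical whether or not Ruby ran.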
4. Engage conditions (all must hold)
- Inbound message type === 'text' (free-text). Button/list replies are owned by existing automations.
- Clinic’s whatsapp_assistant module is enabled (new toggle; default off; on for ssh & Associates).
- WhatsApp module is active (kill-switch off); this is automatic, since we send via sendTextMessage.
- Conversation is not in handed-off state (no ruby-paused label) and no staff message in the last 15 min (soft backoff layered under the durable label).
- Rate cap: ≤ 5 Ruby replies per conversation per rolling hour; exactly one reply per inbound (loop/cost guard). Counted from whatsapp_events metadata.
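The engage conditions above can be read as one pure guard function. The sketch below assumes the caller has already loaded the relevant state; EngageInput, shouldEngage, and the reason strings are hypothetical names, while the thresholds (15 min, 5 replies per rolling hour) come from the list above.

```typescript
// Illustrative only: field and function names are assumptions.
interface EngageInput {
  messageType: string;             // inbound WhatsApp type, e.g. 'text'
  assistantEnabled: boolean;       // whatsapp_assistant module toggle
  whatsappActive: boolean;         // true when the kill-switch is off
  labels: string[];                // conversation labels
  lastStaffMessageAt: Date | null; // most recent staff send, if any
  rubyReplyTimes: Date[];          // timestamps of earlier Ruby replies
  now: Date;
}

type EngageDecision = { engage: true } | { engage: false; reason: string };

const STAFF_BACKOFF_MS = 15 * 60 * 1000;
const RATE_WINDOW_MS = 60 * 60 * 1000;
const MAX_REPLIES_PER_WINDOW = 5;

function shouldEngage(input: EngageInput): EngageDecision {
  if (input.messageType !== 'text') return { engage: false, reason: 'not_free_text' };
  if (!input.assistantEnabled) return { engage: false, reason: 'assistant_disabled' };
  if (!input.whatsappActive) return { engage: false, reason: 'whatsapp_inactive' };
  if (input.labels.includes('ruby-paused')) return { engage: false, reason: 'handed_off' };

  const nowMs = input.now.getTime();
  const staffAt = input.lastStaffMessageAt;
  if (staffAt !== null && nowMs - staffAt.getTime() < STAFF_BACKOFF_MS) {
    return { engage: false, reason: 'staff_backoff' };
  }

  // Rolling hour: only replies newer than one hour count toward the cap.
  const recent = input.rubyReplyTimes.filter((t) => nowMs - t.getTime() < RATE_WINDOW_MS);
  if (recent.length >= MAX_REPLIES_PER_WINDOW) return { engage: false, reason: 'rate_capped' };

  return { engage: true };
}
```

Keeping the guard free of I/O makes each condition unit-testable in isolation, which matches the dispatch tests listed later in this document.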
5. Knowledge sources (all injected into the prompt)
- Structured (auto, always fresh): clinic name, address, operating hours (clinic_operating_hours / clinics.operatingHours), services/procedures (procedures), doctors (users role=doctor + doctor_schedules).
- Freeform blob: clinic_modules row, key whatsapp_assistant, config.knowledge (string). Holds parking, insurance accepted, payment methods, policies, promos, custom FAQ. Edited in Settings → WhatsApp.
- Patient context (Ring 2): if the webhook matched exactly one patient by phone, include that patient’s first name + next upcoming appointment (date/time/doctor/type). 0 or >1 matches → general-only; personal asks are deferred to staff.
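A minimal sketch of how the three sources might be assembled into one prompt block. ClinicFacts, PatientMatch, and buildKnowledgeBlock are hypothetical names, and the text layout is an assumption; only the "exactly one match" rule is taken from the list above.

```typescript
// Hypothetical shapes; the real tables named above are not modelled here.
interface ClinicFacts {
  name: string;
  address: string;
  hours: string;
  services: string[];
  doctors: string[];
}

interface PatientMatch {
  firstName: string;
  nextAppointment: string | null; // preformatted date/time/doctor/type
}

function buildKnowledgeBlock(
  facts: ClinicFacts,
  blob: string,
  matches: PatientMatch[],
): string {
  const lines = [
    `Clinic: ${facts.name}`,
    `Address: ${facts.address}`,
    `Hours: ${facts.hours}`,
    `Services: ${facts.services.join(', ')}`,
    `Doctors: ${facts.doctors.join(', ')}`,
  ];
  if (blob.trim() !== '') lines.push('Clinic notes:', blob.trim());

  // Personal context is included only for exactly one phone match.
  const only = matches.length === 1 ? matches[0] : undefined;
  if (only) {
    lines.push(`Patient first name: ${only.firstName}`);
    lines.push(`Next appointment: ${only.nextAppointment ?? 'none on file'}`);
  } else {
    lines.push('Patient: not identified; defer personal questions to staff.');
  }
  return lines.join('\n');
}
```

With zero or several matches no patient name reaches the prompt at all, so the model cannot leak one patient's appointment to another person sharing the number.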
6. Agent contract (structured output)
callAIJson returns:
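A sketch of what this contract could look like, inferred from the fields other sections reference (action, reply, intent, category, booking.missing, booking.complete). The 'answer' action, the exact nesting, and the parseRubyResult helper are assumptions, not the confirmed schema.

```typescript
// Inferred shape; treat every name not cited elsewhere in this design
// as hypothetical.
type RubyAction = 'answer' | 'collect' | 'create_request' | 'handoff';

interface RubyBooking {
  complete: boolean;
  missing: string[]; // fields still needed, next one first
}

interface RubyResult {
  intent: string;
  category: string;
  action: RubyAction;
  reply: string;
  booking?: RubyBooking;
}

const RUBY_ACTIONS: readonly string[] = ['answer', 'collect', 'create_request', 'handoff'];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Returns a typed result, or null when the model output is unusable,
// so the caller can skip sending instead of forwarding malformed text.
function parseRubyResult(raw: unknown): RubyResult | null {
  if (!isRecord(raw)) return null;
  const { intent, category, action, reply, booking } = raw;
  if (typeof intent !== 'string' || typeof category !== 'string') return null;
  if (typeof action !== 'string' || !RUBY_ACTIONS.includes(action)) return null;
  if (typeof reply !== 'string' || reply.trim() === '') return null;

  const result: RubyResult = { intent, category, action: action as RubyAction, reply };
  if (isRecord(booking)) {
    const missing = Array.isArray(booking.missing)
      ? booking.missing.filter((m): m is string => typeof m === 'string')
      : [];
    result.booking = { complete: booking.complete === true, missing };
  }
  return result;
}
```

Validating at the boundary means the dispatch code can switch on action without re-checking types.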
7. Booking request capture
- action: 'collect' → Ruby asks for the next missing field (booking.missing[0]). We just send reply.
- action: 'create_request' (booking.complete) → server creates the request, then sends Ruby’s confirmation reply:
  - Matched patient: insert a requested appointment (patientId, clinicId, preferredDate/Time, appointmentType=reason, doctorId=null) → appears in reception’s Pending Requests → allotted via the shipped flow. Reuses the same path the patient-portal booking uses (status forced ‘requested’).
  - Unmatched number: insert a booking lead_submission (name, email, phone, preferred time, reason, source=‘whatsapp_ruby’) → reception’s Pending Bookings. No patient/account auto-created.
- Ruby never selects the final slot (Ring 3). It confirms: “I’ve sent your request to the team — they’ll confirm your time shortly.”
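The matched/unmatched branching above might be expressed as a pure planning step that decides which record to create before any database write. BookingFields, RequestContext, BookingPlan, and planBookingRequest are hypothetical names; the field lists mirror the bullets above, and joining date and time into one preferredTime string for leads is an assumption.

```typescript
// Hypothetical shapes for illustration; the real insert helpers are
// not shown in this design.
interface BookingFields {
  name: string;
  email: string;
  preferredDate: string;
  preferredTime: string;
  reason: string;
}

interface RequestContext {
  clinicId: string;
  phone: string;
  matchedPatientId: string | null; // null when 0 or >1 patients matched
}

type BookingPlan =
  | {
      kind: 'requested_appointment';
      patientId: string;
      clinicId: string;
      preferredDate: string;
      preferredTime: string;
      appointmentType: string;
      doctorId: null;      // Ruby never picks the doctor or slot
      status: 'requested'; // forced, as in the patient-portal path
    }
  | {
      kind: 'lead_submission';
      clinicId: string;
      name: string;
      email: string;
      phone: string;
      preferredTime: string;
      reason: string;
      source: 'whatsapp_ruby';
    };

function planBookingRequest(ctx: RequestContext, f: BookingFields): BookingPlan {
  if (ctx.matchedPatientId !== null) {
    return {
      kind: 'requested_appointment',
      patientId: ctx.matchedPatientId,
      clinicId: ctx.clinicId,
      preferredDate: f.preferredDate,
      preferredTime: f.preferredTime,
      appointmentType: f.reason,
      doctorId: null,
      status: 'requested',
    };
  }
  return {
    kind: 'lead_submission',
    clinicId: ctx.clinicId,
    name: f.name,
    email: f.email,
    phone: ctx.phone,
    preferredTime: `${f.preferredDate} ${f.preferredTime}`,
    reason: f.reason,
    source: 'whatsapp_ruby',
  };
}
```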
8. Guardrails
- Clinical/medical (symptoms, diagnosis, medication, “is this infected”, pain management) → action: 'handoff', never advises.
- No hallucination: only use provided knowledge; unknown price/policy → “let me check with the team” + (optional) handoff.
- Disclosure: on Ruby’s first reply in a conversation, append a light line: “(You’re chatting with our automated assistant — a team member can jump in anytime.)” Configurable per clinic (config.disclosure, default on).
- Tone: short, warm, the clinic’s voice. To the patient Ruby speaks as the clinic, never names itself “Ruby”/“AI” beyond the disclosure line.
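The disclosure rule is small enough to sketch as a single helper. withDisclosure and its option names are hypothetical, and the disclosure text is passed in rather than hard-coded; separating the line from the reply with a blank line is an assumption about formatting.

```typescript
// Hypothetical helper: appends the disclosure line only on Ruby's first
// reply in a conversation, and only when the clinic has it enabled.
interface DisclosureOptions {
  disclosureEnabled: boolean; // config.disclosure
  priorRubyReplies: number;   // Ruby replies already sent in this conversation
  disclosureLine: string;     // clinic-facing copy, supplied by the caller
}

function withDisclosure(reply: string, opts: DisclosureOptions): string {
  if (!opts.disclosureEnabled || opts.priorRubyReplies > 0) return reply;
  return `${reply}\n\n${opts.disclosureLine}`;
}
```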
9. Escalation + HITL takeover
Escalation triggers (model intent='escalate' or rule-detected): explicit “talk
to a person/receptionist”, frustration/complaint/anger, refund/billing dispute,
urgency/emergency words (bleeding, swelling, severe pain, emergency), or 2 handoffs
in a row / repeated re-asks.
On escalation/handoff:
- Send one brief holding line (“Let me get a team member to help with that — they’ll reply shortly”).
- Raise a high-priority staff notification + flag the conversation (unread).
- Add the ruby-paused label to the conversation → Ruby goes silent there.
Staff takeover: any staff reply in the conversation also adds ruby-paused (hook the staff send path). Ruby stays off until staff click “Let
Ruby resume” (removes the label). This is the durable layer; the 15-min backoff
(§4.4) only covers the brief window before the label is set.
State storage: conversations.labels (text array) — no migration.
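The rule-detected side of escalation and the label-based pause could be sketched as below. The urgency words and the "2 handoffs in a row" rule come from this section; the regular expression, the reason strings, and the function names are illustrative assumptions, and frustration or billing-dispute detection is left to the model here.

```typescript
// Hypothetical helpers; only the label name and trigger list are from
// the design text.
const PAUSE_LABEL = 'ruby-paused';
const URGENCY_WORDS = ['bleeding', 'swelling', 'severe pain', 'emergency'];
const HUMAN_REQUEST = /\b(talk|speak) to (a |the )?(person|human|receptionist)\b/i;

// Returns a reason string when a rule fires, or null to defer to the model.
function detectRuleEscalation(text: string, recentRubyActions: string[]): string | null {
  if (HUMAN_REQUEST.test(text)) return 'explicit_human_request';

  const lower = text.toLowerCase();
  const urgent = URGENCY_WORDS.find((w) => lower.includes(w));
  if (urgent !== undefined) return `urgency:${urgent}`;

  // "2 handoffs in a row": the last two Ruby actions were both handoffs.
  const lastTwo = recentRubyActions.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((a) => a === 'handoff')) {
    return 'repeated_handoff';
  }
  return null;
}

// Label helpers operate on conversations.labels and return new arrays.
function pauseRuby(labels: string[]): string[] {
  return labels.includes(PAUSE_LABEL) ? labels : [...labels, PAUSE_LABEL];
}

function resumeRuby(labels: string[]): string[] {
  return labels.filter((l) => l !== PAUSE_LABEL);
}
```

Substring matching on urgency words is deliberately crude and will over-trigger (for example on "no bleeding"); for a safety escalation that bias toward handing off seems acceptable.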
10. Reply send path + inbox rendering
- Send via sendTextMessage(...) (respects kill-switch + 24h window; the inbound just opened the window).
- Log outbound with metadata { aiAssistant: true, category, intent } (no enum migration).
- Inbox bubble (staff-only): keep the normal outbound bubble; add a ruby accent: ui/public/ruby-icon.webp + “Ruby · auto-reply” label chip on top, a 3px ruby-red left border, faint ruby tint. Distinct from amber (internal notes) and rose (failed). Detected via the aiAssistant metadata flag on the message DTO.
11. Settings UI (Settings → WhatsApp, admin/superadmin)
A “Ruby auto-reply” card:
- On/off toggle (writes whatsapp_assistant module isEnabled).
- Disclosure on/off.
- Knowledge textarea (config.knowledge).
- Preview box: type a question → calls a preview endpoint → shows Ruby’s drafted reply + detected intent, without sending to any patient. Primary tuning tool for ssh & Associates.
12. Config storage (no migration)
clinic_modules row, moduleKey = 'whatsapp_assistant':
- isEnabled → master toggle.
- config (JSON string): { knowledge: string, disclosure: boolean }.
whatsapp_assistant.isEnabled.
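Because config is stored as a JSON string, reads need to tolerate a missing, malformed, or partial value. A minimal sketch, assuming a hypothetical parseAssistantConfig helper and defaults of an empty knowledge blob with disclosure on (the latter matching the "default on" stated in §8):

```typescript
// Hypothetical helper for reading the clinic_modules.config JSON string.
interface AssistantConfig {
  knowledge: string;
  disclosure: boolean;
}

const DEFAULT_ASSISTANT_CONFIG: AssistantConfig = { knowledge: '', disclosure: true };

function parseAssistantConfig(raw: string | null | undefined): AssistantConfig {
  if (!raw) return { ...DEFAULT_ASSISTANT_CONFIG };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== 'object' || parsed === null) {
      return { ...DEFAULT_ASSISTANT_CONFIG };
    }
    const obj = parsed as Record<string, unknown>;
    // Fall back field by field so one bad value does not discard the rest.
    return {
      knowledge:
        typeof obj.knowledge === 'string' ? obj.knowledge : DEFAULT_ASSISTANT_CONFIG.knowledge,
      disclosure:
        typeof obj.disclosure === 'boolean' ? obj.disclosure : DEFAULT_ASSISTANT_CONFIG.disclosure,
    };
  } catch {
    // Malformed JSON: behave as if no config had been saved.
    return { ...DEFAULT_ASSISTANT_CONFIG };
  }
}
```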
13. Prompt (Langfuse-managed)
- Prompt name e.g. whatsapp-assistant; fetched via langfuse.getPrompt(...) like the other agents; minimal fallback constant in prompts.ts per existing convention (never hard-fails if Langfuse is unreachable).
- Three-tier structure (System role/scope → JSON schema → Rules), schema-first, “json” keyword, stable prefix for KV-cache. Full draft authored during implementation and uploaded to Langfuse; covers: intent routing, the §6 schema, all §8 guardrails, §7 booking slot-filling, §9 escalation, disclosure, tone, and the injected knowledge/patient blocks.
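The "never hard-fails" behaviour could be sketched like this. The fetcher is injected so the example does not depend on any SDK signature: PromptFetcher stands in for whatever wraps langfuse.getPrompt(...), and loadPrompt and the fallback text are hypothetical placeholders rather than the real constant in prompts.ts.

```typescript
// fetchPrompt is an injected stand-in; its signature is an assumption,
// not the Langfuse SDK's.
type PromptFetcher = (name: string) => Promise<string>;

interface LoadedPrompt {
  text: string;
  source: 'langfuse' | 'fallback';
}

// Placeholder wording only; the real minimal fallback lives in prompts.ts.
const FALLBACK_PROMPT =
  'You are a clinic assistant. Respond in json. If unsure, hand off to staff.';

async function loadPrompt(name: string, fetchPrompt: PromptFetcher): Promise<LoadedPrompt> {
  try {
    const text = await fetchPrompt(name);
    if (text.trim() === '') throw new Error('empty prompt');
    return { text, source: 'langfuse' };
  } catch {
    // Unreachable or empty: degrade to the in-code fallback.
    return { text: FALLBACK_PROMPT, source: 'fallback' };
  }
}
```

Returning the source alongside the text lets a trace or log record when the fallback was used, which is useful for noticing a silent outage.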
14. Endpoints
- Reuse the inbound webhook (no new public route) for live replies.
- POST /whatsapp-module/assistant/preview (admin) → { question } → runs the agent against the clinic’s knowledge, returns { reply, intent }, never sends.
- GET/PUT /whatsapp-module/assistant/config (admin) → toggle + knowledge + disclosure.
- POST /whatsapp-module/conversations/:id/ruby-resume (staff) → removes ruby-paused.
15. Observability & safety
- Langfuse trace per call (standard in callAI).
- Each Ruby send → whatsapp_event (type outbound_message, metadata { aiAssistant, intent, category }); handoffs counted for the “2 in a row” rule.
- Rate cap + one-reply-per-inbound + ruby-paused + clinic toggle + kill-switch = layered loop/cost/runaway guards.
- Conversation history capped to last ~8 turns for cost/latency.
16. Files
- server/src/lib/ai/agents/whatsapp-assistant.ts (new agent)
- server/src/lib/ai/prompts.ts (+ name + fallback)
- server/src/lib/ai/whatsapp-knowledge.ts (new: assemble structured + blob + patient context)
- server/src/routes/whatsapp-webhook.ts (engage hook; staff-send → ruby-paused)
- server/src/routes/whatsapp.ts (assistant config get/put, preview, ruby-resume)
- server/src/lib/whatsapp-assistant-dispatch.ts (new: guards, rate limit, request/lead creation, send)
- ui/src/components/settings/WhatsAppSettings.tsx (Ruby auto-reply card + preview)
- ui/src/components/chat-v2/thread/MessageBubble.tsx (+ ruby-accent treatment)
- shared/src/chat-types.ts (surface aiAssistant flag on MessageDto if needed)
- Tests: agent classification/guardrails; dispatch guards (backoff, rate cap, label); request/lead creation.
17. Testing
- Unit: agent hands off clinical; answers hours from knowledge; doesn’t invent prices; booking slot-fill asks for missing fields then sets create_request; escalation triggers.
- Dispatch: respects ruby-paused, 15-min backoff, rate cap; matched→requested, unmatched→lead_submission.
ruby-paused, 15-min backoff, rate cap; matched→requested, unmatched→lead_submission. - Preview endpoint dry-run on ssh & Associates (no patient send).
- Live: message the ssh & Associates WhatsApp number from a test phone.
18. Rollout
Default OFF. Enable whatsapp_assistant for ssh & Associates only. Tune via preview,
then live-test. The kill-switch disables Ruby automatically (no sends when paused).
