
Ruby WhatsApp Assistant — Design

Date: 2026-06-02
Status: Approved for planning
Test tenant: ssh & Associates (b6d3a3f3-…) only; default OFF everywhere else.

1. Summary

An autopilot AI assistant (“Ruby”) that replies to patients’ free-text WhatsApp messages inside the 24-hour service window. It answers general clinic questions, looks up the messaging patient’s own next appointment, captures booking requests (which flow into the existing request→allot pipeline), escalates clinical / sensitive conversations to staff, and durably hands a conversation to a human the moment staff step in. Built on the existing AI spine: callAIJson() → DeepSeek v4-flash + Langfuse HIPAA tracing, the Ruby agents pattern, and the WhatsApp inbound webhook. No DB migration. The canonical prompt lives in Langfuse (edited in the dashboard, no redeploy); code carries only the standard minimal fallback per existing convention.

2. Goals / Non-goals

Goals
  • Answer general clinic info (hours, address, services, price ranges, “do you do X?”, how to book).
  • Answer the messaging patient’s own next-appointment question (Ring 2; patient already phone-matched by webhook).
  • Capture booking requests conversationally → create a requested appointment (matched patient) or a booking lead_submission (unmatched number) → reception allots via the shipped request→allot flow.
  • Hard-stop on clinical/medical questions and escalate; escalate on frustration/urgency/explicit “talk to a human”.
  • Durable human takeover: once staff reply (or escalation fires), Ruby goes silent for that conversation until staff click “Let Ruby resume”.
  • Per-clinic on/off + editable knowledge blob + a safe preview (“ask Ruby” dry-run, no send).
  • Distinct ruby-accent bubble in the staff inbox.
Non-goals (v1)
  • Ring 3 (Ruby directly booking/cancelling/moving a slot) — request capture only.
  • A multi-call router architecture (single structured agent now; documented upgrade path).
  • Outbound-initiated AI conversations (Ruby only responds to inbound free-text).
  • Voice (covered by the separate voice-agent feasibility report).

3. Architecture

One Ruby agent, server/src/lib/ai/agents/whatsapp-assistant.ts, calling callAIJson() once per inbound message. The prompt is internally “router-shaped”: it first determines intent, then returns the matching action + reply in a single structured response.

Rationale: DeepSeek flash is cheap/fast with 1M context; one call holds knowledge + patient context + instructions and avoids the 2× latency/cost of a router→specialist split. Splitting into a real router + specialists is a clean later upgrade if tuning shows the single prompt is overloaded; it is not built now.

Hook: server/src/routes/whatsapp-webhook.ts, in handleIncomingMessage, after the inbound message is logged and after the existing cancel/reschedule intent early-returns (so Ruby never competes with those automations). The hook is entirely best-effort: any Ruby error is caught and never blocks normal inbound logging.
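A minimal sketch of the hook ordering, assuming the webhook's collaborators can be passed in as functions. `InboundMessage`, `InboundDeps`, `logInbound`, `tryCancelRescheduleAutomation`, `runRuby` and `handleIncomingMessageSketch` are hypothetical names for illustration, not the existing code in whatsapp-webhook.ts.

```typescript
// Hypothetical shapes; the real types live in server/src/routes/whatsapp-webhook.ts.
interface InboundMessage {
  conversationId: string;
  type: "text" | "button" | "list";
  text: string;
}

interface InboundDeps {
  logInbound: (msg: InboundMessage) => Promise<void>;
  // Resolves true when an existing cancel/reschedule automation handled the message.
  tryCancelRescheduleAutomation: (msg: InboundMessage) => Promise<boolean>;
  runRuby: (msg: InboundMessage) => Promise<void>;
  warn: (message: string, err: unknown) => void;
}

type InboundOutcome = "automation" | "ruby" | "ruby_failed";

async function handleIncomingMessageSketch(
  msg: InboundMessage,
  deps: InboundDeps,
): Promise<InboundOutcome> {
  // 1. Log first, so the inbound record exists no matter what Ruby does.
  await deps.logInbound(msg);

  // 2. Existing cancel/reschedule automations early-return; Ruby never competes.
  if (await deps.tryCancelRescheduleAutomation(msg)) return "automation";

  // 3. Best-effort: any Ruby error is caught, reported and swallowed.
  try {
    await deps.runRuby(msg);
    return "ruby";
  } catch (err) {
    deps.warn("Ruby auto-reply failed; inbound message already logged", err);
    return "ruby_failed";
  }
}
```

Because logging runs before Ruby and the Ruby call sits in its own try/catch, a failing agent cannot lose the inbound record or surface an error to the webhook caller.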

4. Engage conditions (all must hold)

  1. Inbound message type === 'text' (free-text). Button/list replies are owned by existing automations.
  2. Clinic’s whatsapp_assistant module is enabled (new toggle; default off; on for ssh & Associates).
  3. WhatsApp module is active (kill-switch off) — automatic, since we send via sendTextMessage.
  4. Conversation is not in handed-off state (no ruby-paused label) and no staff message in the last 15 min (soft backoff layered under the durable label).
  5. Rate cap: ≤ 5 Ruby replies per conversation per rolling hour; exactly one reply per inbound (loop/cost guard). Counted from whatsapp_events metadata.
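The engage conditions map onto one guard evaluated before any AI call. This is a sketch under assumptions: the input shape, `shouldEngage` and the reason strings are hypothetical, and the caller is assumed to have already loaded the module flags, the conversation labels, the last staff send time and the timestamps of Ruby sends read from whatsapp_events metadata.

```typescript
// Hypothetical inputs gathered by the dispatcher before calling the agent.
interface EngageInput {
  messageType: string;            // e.g. "text", "button", "list"
  assistantEnabled: boolean;      // whatsapp_assistant module isEnabled
  whatsappActive: boolean;        // WhatsApp module active (kill-switch off)
  labels: string[];               // conversations.labels
  lastStaffMessageAt: Date | null;
  rubyReplyTimes: Date[];         // Ruby sends in this conversation
  now: Date;
}

const RUBY_PAUSED_LABEL = "ruby-paused";
const STAFF_BACKOFF_MS = 15 * 60 * 1000;   // 15-minute soft backoff
const RATE_WINDOW_MS = 60 * 60 * 1000;     // rolling hour
const MAX_REPLIES_PER_WINDOW = 5;

type EngageDecision = { engage: true } | { engage: false; reason: string };

function shouldEngage(input: EngageInput): EngageDecision {
  if (input.messageType !== "text") return { engage: false, reason: "not_free_text" };
  if (!input.assistantEnabled) return { engage: false, reason: "assistant_disabled" };
  if (!input.whatsappActive) return { engage: false, reason: "whatsapp_inactive" };
  if (input.labels.includes(RUBY_PAUSED_LABEL)) return { engage: false, reason: "handed_off" };

  const nowMs = input.now.getTime();
  if (input.lastStaffMessageAt && nowMs - input.lastStaffMessageAt.getTime() < STAFF_BACKOFF_MS) {
    return { engage: false, reason: "staff_backoff" };
  }

  // Only Ruby sends inside the rolling window count toward the cap.
  const recent = input.rubyReplyTimes.filter((t) => nowMs - t.getTime() < RATE_WINDOW_MS);
  if (recent.length >= MAX_REPLIES_PER_WINDOW) return { engage: false, reason: "rate_cap" };

  return { engage: true };
}
```

The kill-switch check duplicates what sendTextMessage already enforces, but short-circuiting here avoids paying for an agent call whose reply could not be sent. The one-reply-per-inbound rule is not in this guard; it comes from invoking the agent once per inbound message.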

5. Knowledge sources (all injected into the prompt)

  • Structured (auto, always fresh): clinic name, address, operating hours (clinic_operating_hours / clinics.operatingHours), services/procedures (procedures), doctors (users role=doctor + doctor_schedules).
  • Freeform blob: clinic_modules row key whatsapp_assistant, config.knowledge (string). Holds parking, insurance accepted, payment methods, policies, promos, custom FAQ. Edited in Settings → WhatsApp.
  • Patient context (Ring 2): if the webhook matched exactly one patient by phone, include that patient’s first name + next upcoming appointment (date/time/doctor/type). 0 or >1 matches → general answers only; personal questions are deferred to staff.
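One way to assemble these sources into the prompt's knowledge block, assuming the structured facts have already been queried and flattened into strings. `ClinicFacts`, `MatchedPatient`, `buildPatientBlock`, `assembleKnowledge` and the section headings are hypothetical; the real module is the planned server/src/lib/ai/whatsapp-knowledge.ts.

```typescript
// Hypothetical, pre-flattened views of clinics, clinic_operating_hours,
// procedures and users/doctor_schedules.
interface ClinicFacts {
  clinicName: string;
  address: string;
  hours: string[];      // e.g. "Mon-Fri 09:00-18:00"
  services: string[];
  doctors: string[];
}

interface MatchedPatient {
  firstName: string;
  nextAppointment?: { date: string; time: string; doctor: string; type: string };
}

// Ring 2: only an exact single phone match gets personal context.
function buildPatientBlock(matches: MatchedPatient[]): string | null {
  const p = matches.length === 1 ? matches[0] : undefined;
  if (!p) return null; // 0 or >1 matches: general-only, personal questions go to staff
  const a = p.nextAppointment;
  const next = a ? `${a.date} ${a.time} with ${a.doctor} (${a.type})` : "none scheduled";
  return `PATIENT\nFirst name: ${p.firstName}\nNext appointment: ${next}`;
}

// Structured facts first, then the freeform config.knowledge blob, then the patient block.
function assembleKnowledge(facts: ClinicFacts, blob: string, patientBlock: string | null): string {
  const parts = [
    `CLINIC\n${facts.clinicName}\n${facts.address}`,
    `HOURS\n${facts.hours.join("\n")}`,
    `SERVICES\n${facts.services.join("\n")}`,
    `DOCTORS\n${facts.doctors.join("\n")}`,
  ];
  if (blob.trim()) parts.push(`CLINIC NOTES\n${blob.trim()}`);
  if (patientBlock) parts.push(patientBlock);
  return parts.join("\n\n");
}
```

Putting the per-patient block last keeps the clinic-wide part identical across patients of the same clinic.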

6. Agent contract (structured output)

callAIJson returns:
{
  intent: 'general' | 'appointment_lookup' | 'booking_request' | 'clinical' | 'escalate' | 'smalltalk' | 'unknown',
  action: 'reply' | 'collect' | 'create_request' | 'handoff',
  reply: string,                  // patient-facing message; ALWAYS present (for action='handoff' it's the holding line — Ruby never goes silent on the patient)
  category: string,               // for analytics/logging
  // booking_request slot-filling (transcript-driven; model re-derives each turn):
  booking?: {
    complete: boolean,
    preferredDate?: string,       // yyyy-mm-dd if given
    preferredTime?: string,       // HH:mm if given
    reason?: string,              // service / chief concern
    name?: string,                // for unmatched numbers
    email?: string,               // for unmatched numbers
    missing?: string[],           // fields still needed → drives the follow-up question
  },
  escalate?: { reason: string },  // present when action='handoff'/intent='escalate'
}
Slot-filling is stateless/transcript-driven: Ruby is given the last ~8 messages of the thread and re-derives what’s already collected and what’s missing — no state column needed.
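Because the model's JSON is untrusted input, a runtime check before acting on it is worthwhile. The sketch below validates the contract above by hand, assuming the agent hands back already-parsed JSON; `parseAssistantOutput` and `isOneOf` are hypothetical helpers, and a schema-validation library could do the same job. Returning `null` lets the dispatcher skip the reply, in keeping with the best-effort rule in §3.

```typescript
const INTENTS = ["general", "appointment_lookup", "booking_request", "clinical",
  "escalate", "smalltalk", "unknown"] as const;
const ACTIONS = ["reply", "collect", "create_request", "handoff"] as const;
type AssistantIntent = (typeof INTENTS)[number];
type AssistantAction = (typeof ACTIONS)[number];

interface BookingSlots {
  complete: boolean;
  preferredDate?: string;
  preferredTime?: string;
  reason?: string;
  name?: string;
  email?: string;
  missing?: string[];
}

interface AssistantOutput {
  intent: AssistantIntent;
  action: AssistantAction;
  reply: string;
  category: string;
  booking?: BookingSlots;
  escalate?: { reason: string };
}

function isOneOf<T extends string>(values: readonly T[], v: unknown): v is T {
  return typeof v === "string" && (values as readonly string[]).includes(v);
}

function parseAssistantOutput(raw: unknown): AssistantOutput | null {
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const intent = o.intent;
  const action = o.action;
  if (!isOneOf(INTENTS, intent) || !isOneOf(ACTIONS, action)) return null;

  // reply is mandatory for every action, including handoff (the holding line).
  const reply = o.reply;
  if (typeof reply !== "string" || reply.trim() === "") return null;

  const out: AssistantOutput = {
    intent,
    action,
    reply,
    category: typeof o.category === "string" ? o.category : "uncategorized",
  };
  // Shallow checks only; per-field validation of booking is omitted for brevity.
  const b = o.booking;
  if (typeof b === "object" && b !== null && typeof (b as { complete?: unknown }).complete === "boolean") {
    out.booking = b as BookingSlots;
  }
  const e = o.escalate;
  if (typeof e === "object" && e !== null && typeof (e as { reason?: unknown }).reason === "string") {
    out.escalate = { reason: (e as { reason: string }).reason };
  }
  return out;
}
```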

7. Booking request capture

  • action: 'collect' → Ruby asks for the next missing field (booking.missing[0]); the server simply sends reply as-is.
  • action: 'create_request' (booking.complete) → server creates the request, then sends Ruby’s confirmation reply:
    • Matched patient: insert a requested appointment (patientId, clinicId, preferredDate/Time, appointmentType=reason, doctorId=null) → appears in reception’s Pending Requests → allotted via the shipped flow. Reuses the same path the patient-portal booking uses (status forced ‘requested’).
    • Unmatched number: insert a booking lead_submission (name, email, phone, preferred time, reason, source=‘whatsapp_ruby’) → reception’s Pending Bookings. No patient/account auto-created.
  • Ruby never selects the final slot (Ring 3). It confirms: “I’ve sent your request to the team — they’ll confirm your time shortly.”
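The matched/unmatched split can be kept as a pure function that decides what to insert, leaving the actual database write to the dispatcher. Table and column names below mirror the prose above but are assumptions, as are `planBookingInsert`, `CollectedBooking` and the `kind: "booking"` marker on the lead row.

```typescript
// Slots collected by Ruby (see booking in the §6 contract).
interface CollectedBooking {
  preferredDate?: string; // yyyy-mm-dd
  preferredTime?: string; // HH:mm
  reason?: string;
  name?: string;
  email?: string;
}

// Hypothetical description of the row the dispatcher should insert.
type BookingInsertPlan =
  | {
      table: "appointments";
      values: {
        patientId: string;
        clinicId: string;
        preferredDate?: string;
        preferredTime?: string;
        appointmentType?: string;
        doctorId: null;          // Ruby never picks the doctor or final slot
        status: "requested";     // forced, same as patient-portal bookings
      };
    }
  | {
      table: "lead_submissions";
      values: {
        clinicId: string;
        kind: "booking";
        name?: string;
        email?: string;
        phone: string;
        preferredDate?: string;
        preferredTime?: string;
        reason?: string;
        source: "whatsapp_ruby";
      };
    };

// patientId is null when the number is unmatched; no patient is auto-created.
function planBookingInsert(
  clinicId: string,
  patientId: string | null,
  phone: string,
  b: CollectedBooking,
): BookingInsertPlan {
  if (patientId) {
    return {
      table: "appointments",
      values: {
        patientId, clinicId,
        preferredDate: b.preferredDate, preferredTime: b.preferredTime,
        appointmentType: b.reason, doctorId: null, status: "requested",
      },
    };
  }
  return {
    table: "lead_submissions",
    values: {
      clinicId, kind: "booking", name: b.name, email: b.email, phone,
      preferredDate: b.preferredDate, preferredTime: b.preferredTime,
      reason: b.reason, source: "whatsapp_ruby",
    },
  };
}
```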

8. Guardrails

  • Clinical/medical (symptoms, diagnosis, medication, “is this infected”, pain management) → action: 'handoff', never advises.
  • No hallucination: only use provided knowledge; unknown price/policy → “let me check with the team” + (optional) handoff.
  • Disclosure: on Ruby’s first reply in a conversation, append a light line: “(You’re chatting with our automated assistant — a team member can jump in anytime.)” Configurable per clinic (config.disclosure, default on).
  • Tone: short, warm, in the clinic’s voice. Ruby speaks to the patient as the clinic and never calls itself “Ruby” or “AI” beyond the disclosure line.
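The disclosure rule needs only two facts: whether the clinic has it enabled and whether Ruby has already replied in this conversation. A sketch, assuming the prior-reply count comes from whatsapp_events metadata; `finalizeReply` and `DisclosureOptions` are hypothetical, and the disclosure copy is passed in rather than hard-coded.

```typescript
interface DisclosureOptions {
  disclosureEnabled: boolean;  // config.disclosure (default on)
  priorRubyReplies: number;    // Ruby sends already logged in this conversation
  disclosureLine: string;      // the clinic's disclosure copy
}

// Appends the disclosure only to Ruby's first reply in a conversation.
function finalizeReply(reply: string, opts: DisclosureOptions): string {
  const trimmed = reply.trim();
  if (!opts.disclosureEnabled || opts.priorRubyReplies > 0) return trimmed;
  return `${trimmed}\n\n${opts.disclosureLine}`;
}
```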

9. Escalation + HITL takeover

Escalation triggers (model intent='escalate' or rule-detected): explicit “talk to a person/receptionist”, frustration/complaint/anger, refund/billing dispute, urgency/emergency words (bleeding, swelling, severe pain, emergency), or 2 handoffs in a row / repeated re-asks. On escalation/handoff:
  1. Send one brief holding line (“Let me get a team member to help with that — they’ll reply shortly”).
  2. Raise a high-priority staff notification + flag the conversation (unread).
  3. Add the ruby-paused label to the conversation → Ruby goes silent there.
Durable takeover: any staff outbound message in a conversation also adds ruby-paused (hook the staff send path). Ruby stays off until staff click “Let Ruby resume” (removes the label). This is the durable layer; the 15-min backoff (§4.4) only covers the brief window before the label is set. State storage: conversations.labels (text array) — no migration.
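The rule-detected half of escalation and the label bookkeeping on conversations.labels might look like this. The keyword patterns are illustrative assumptions and deliberately narrow (frustration, anger and repeated re-asks are left to the model's intent='escalate'); `ruleEscalationReason`, `pauseRuby`, `resumeRuby` and `isRubyPaused` are hypothetical names.

```typescript
const RUBY_PAUSED = "ruby-paused";

// Illustrative keyword rules; no "g" flag, so RegExp.test keeps no lastIndex state.
const ESCALATION_RULES: Array<{ reason: string; pattern: RegExp }> = [
  {
    reason: "human_requested",
    pattern: /\b(?:talk|speak)\s+(?:to|with)\s+(?:a\s+|the\s+)?(?:person|human|receptionist|someone)\b/i,
  },
  { reason: "urgent", pattern: /\b(?:bleeding|swelling|severe pain|emergency)\b/i },
  { reason: "billing_dispute", pattern: /\b(?:refund|dispute|overcharged)\b/i },
];

function ruleEscalationReason(text: string): string | null {
  const hit = ESCALATION_RULES.find((r) => r.pattern.test(text));
  return hit ? hit.reason : null;
}

// Label helpers never mutate the input array and are idempotent.
function pauseRuby(labels: string[]): string[] {
  return labels.includes(RUBY_PAUSED) ? labels : [...labels, RUBY_PAUSED];
}

function resumeRuby(labels: string[]): string[] {
  return labels.filter((l) => l !== RUBY_PAUSED);
}

function isRubyPaused(labels: string[]): boolean {
  return labels.includes(RUBY_PAUSED);
}
```

Staff sends and escalations would both call pauseRuby, and the ruby-resume endpoint would call resumeRuby, so the one label stays the single source of truth for takeover.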

10. Reply send path + inbox rendering

  • Send via sendTextMessage(...) (respects kill-switch + 24h window; the inbound just opened the window).
  • Log outbound with metadata { aiAssistant: true, category, intent } (no enum migration).
  • Inbox bubble (staff-only): keep the normal outbound bubble; add a ruby accent — ui/public/ruby-icon.webp + “Ruby · auto-reply” label chip on top, a 3px ruby-red left border, faint ruby tint. Distinct from amber (internal notes) and rose (failed). Detected via the aiAssistant metadata flag on the message DTO.
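A sketch of the metadata written on each Ruby send and the check the inbox could use to pick the ruby-accent bubble. `RubyMessageMetadata`, `rubyMetadata`, `MessageDtoSlice` and `isRubyMessage` are hypothetical; the real flag would surface on MessageDto in shared/src/chat-types.ts.

```typescript
// Metadata logged with every Ruby outbound message (no enum migration needed).
type RubyMessageMetadata = { aiAssistant: true; intent: string; category: string };

function rubyMetadata(intent: string, category: string): RubyMessageMetadata {
  return { aiAssistant: true, intent, category };
}

// Hypothetical slice of MessageDto.
interface MessageDtoSlice {
  direction: "inbound" | "outbound";
  metadata?: Record<string, unknown> | null;
}

// Staff inbox: only outbound messages flagged by Ruby get the ruby accent.
function isRubyMessage(m: MessageDtoSlice): boolean {
  return m.direction === "outbound" && m.metadata?.aiAssistant === true;
}
```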

11. Settings UI (Settings → WhatsApp, admin/superadmin)

A “Ruby auto-reply” card:
  • On/off toggle (writes whatsapp_assistant module isEnabled).
  • Disclosure on/off.
  • Knowledge textarea (config.knowledge).
  • Preview box: type a question → calls a preview endpoint → shows Ruby’s drafted reply + detected intent, without sending to any patient. Primary tuning tool for ssh & Associates.

12. Config storage (no migration)

clinic_modules row, moduleKey = 'whatsapp_assistant':
  • isEnabled → master toggle.
  • config (JSON string): { knowledge: string, disclosure: boolean }.
Assistant “active” = whatsapp module active (sends possible) AND whatsapp_assistant.isEnabled.
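Since config is stored as a JSON string, reading it defensively with defaults keeps a malformed row from breaking replies. A minimal sketch; `parseAssistantConfig`, `isAssistantActive` and `AssistantConfig` are hypothetical, and the defaults (empty knowledge, disclosure on) follow §8 and §12.

```typescript
interface AssistantConfig {
  knowledge: string;
  disclosure: boolean;
}

// Tolerates a missing, non-JSON or partially filled config string.
function parseAssistantConfig(raw: string | null | undefined): AssistantConfig {
  const defaults: AssistantConfig = { knowledge: "", disclosure: true };
  if (!raw) return defaults;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return defaults;
    const o = parsed as Record<string, unknown>;
    return {
      knowledge: typeof o.knowledge === "string" ? o.knowledge : defaults.knowledge,
      disclosure: typeof o.disclosure === "boolean" ? o.disclosure : defaults.disclosure,
    };
  } catch {
    return defaults;
  }
}

// Active = sends possible AND the whatsapp_assistant row exists and is enabled.
function isAssistantActive(whatsappActive: boolean, assistantRow: { isEnabled: boolean } | null): boolean {
  return whatsappActive && assistantRow?.isEnabled === true;
}
```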

13. Prompt (Langfuse-managed)

  • Prompt name (e.g. whatsapp-assistant), fetched via langfuse.getPrompt(...) like the other agents, with a minimal fallback constant in prompts.ts per existing convention, so the agent never hard-fails if Langfuse is unreachable.
  • Three-tier structure (System role/scope → JSON schema → Rules), schema-first, “json” keyword, stable prefix for KV-cache. Full draft authored during implementation and uploaded to Langfuse; covers: intent routing, the §6 schema, all §8 guardrails, §7 booking slot-filling, §9 escalation, disclosure, tone, and the injected knowledge/patient blocks.
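A sketch of two prompt-handling rules: fall back to the bundled constant when Langfuse is unreachable, and order sections from most to least stable so consecutive calls share a cacheable prefix. The fetcher is injected so the example does not assume any Langfuse SDK signature; `PromptFetcher`, `loadPromptWithFallback` and `orderPromptSections` are hypothetical.

```typescript
// Hypothetical fetcher; in the server this would wrap the existing
// langfuse.getPrompt(...) call used by the other agents.
type PromptFetcher = (name: string) => Promise<string>;

const WHATSAPP_ASSISTANT_PROMPT = "whatsapp-assistant";

async function loadPromptWithFallback(fetchPrompt: PromptFetcher, fallback: string): Promise<string> {
  try {
    const text = await fetchPrompt(WHATSAPP_ASSISTANT_PROMPT);
    return text.trim() !== "" ? text : fallback;
  } catch {
    // Never hard-fail when Langfuse is unreachable; use the minimal fallback.
    return fallback;
  }
}

// Most stable content first (system, then clinic knowledge), per-message content
// last, so repeated calls share the longest possible prefix.
function orderPromptSections(
  system: string,
  knowledge: string,
  patient: string | null,
  transcript: string,
): string {
  return [system, knowledge, patient, transcript]
    .filter((s): s is string => s !== null && s !== "")
    .join("\n\n");
}
```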

14. Endpoints

  • Reuse the inbound webhook (no new public route) for live replies.
  • POST /whatsapp-module/assistant/preview (admin) → { question } → runs the agent against the clinic’s knowledge, returns { reply, intent }, never sends.
  • GET/PUT /whatsapp-module/assistant/config (admin) → toggle + knowledge + disclosure.
  • POST /whatsapp-module/conversations/:id/ruby-resume (staff) → removes ruby-paused.
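The preview endpoint's core can be written independently of the HTTP framework. In this sketch `handlePreview`, `PreviewDeps` and the status/body result shape are hypothetical; the design point is that the dependencies contain no send function at all, so a dry run has no path to a patient.

```typescript
interface PreviewDeps {
  loadKnowledge: (clinicId: string) => Promise<string>;
  runAgent: (input: { question: string; knowledge: string }) => Promise<{ reply: string; intent: string }>;
}

type PreviewResult =
  | { status: 200; body: { reply: string; intent: string } }
  | { status: 400; body: { error: string } };

async function handlePreview(clinicId: string, body: unknown, deps: PreviewDeps): Promise<PreviewResult> {
  const question =
    typeof body === "object" && body !== null ? (body as { question?: unknown }).question : undefined;
  if (typeof question !== "string" || question.trim() === "") {
    return { status: 400, body: { error: "question is required" } };
  }
  const knowledge = await deps.loadKnowledge(clinicId);
  const { reply, intent } = await deps.runAgent({ question: question.trim(), knowledge });
  // Dry run: the draft is returned to the admin only; nothing is sent.
  return { status: 200, body: { reply, intent } };
}
```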

15. Observability & safety

  • Langfuse trace per call (standard in callAI).
  • Each Ruby send → whatsapp_event (type outbound_message, metadata { aiAssistant, intent, category }); handoffs counted for the “2 in a row” rule.
  • Rate cap + one-reply-per-inbound + ruby-paused + clinic toggle + kill-switch = layered loop/cost/runaway guards.
  • Conversation history capped to last ~8 turns for cost/latency.
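The ~8-turn history cap, sketched as a transcript builder. `ThreadMessage`, `buildTranscript`, `MAX_HISTORY_TURNS` and the Patient/Assistant/Staff speaker labels are assumptions for illustration; labeling staff turns separately is one way to let the model see that a human has already replied.

```typescript
// Hypothetical thread row; the real rows come from the conversation's message log.
interface ThreadMessage {
  direction: "inbound" | "outbound";
  text: string;
  sentAt: Date;
  aiAssistant?: boolean; // from outbound metadata
}

const MAX_HISTORY_TURNS = 8;

// Oldest-first transcript of the last maxTurns messages, with speaker labels.
function buildTranscript(messages: ThreadMessage[], maxTurns: number = MAX_HISTORY_TURNS): string {
  const sorted = [...messages].sort((a, b) => a.sentAt.getTime() - b.sentAt.getTime());
  return sorted
    .slice(Math.max(sorted.length - maxTurns, 0))
    .map((m) => {
      const speaker = m.direction === "inbound" ? "Patient" : m.aiAssistant ? "Assistant" : "Staff";
      return `${speaker}: ${m.text}`;
    })
    .join("\n");
}
```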

16. Files

  • server/src/lib/ai/agents/whatsapp-assistant.ts (new agent)
  • server/src/lib/ai/prompts.ts (+ name + fallback)
  • server/src/lib/ai/whatsapp-knowledge.ts (new: assemble structured + blob + patient context)
  • server/src/routes/whatsapp-webhook.ts (engage hook; staff-send → ruby-paused)
  • server/src/routes/whatsapp.ts (assistant config get/put, preview, ruby-resume)
  • server/src/lib/whatsapp-assistant-dispatch.ts (new: guards, rate limit, request/lead creation, send)
  • ui/src/components/settings/WhatsAppSettings.tsx (Ruby auto-reply card + preview)
  • ui/src/components/chat-v2/thread/MessageBubble.tsx (+ ruby-accent treatment)
  • shared/src/chat-types.ts (surface aiAssistant flag on MessageDto if needed)
  • Tests: agent classification/guardrails; dispatch guards (backoff, rate cap, label); request/lead creation.

17. Testing

  • Unit: agent hands off clinical; answers hours from knowledge; doesn’t invent prices; booking slot-fill asks for missing fields then sets create_request; escalation triggers.
  • Dispatch: respects ruby-paused, 15-min backoff, rate cap; matched→requested, unmatched→lead_submission.
  • Preview endpoint dry-run on ssh & Associates (no patient send).
  • Live: message the ssh & Associates WhatsApp number from a test phone.

18. Rollout

Default OFF. Enable whatsapp_assistant for ssh & Associates only. Tune via preview, then live-test. The kill-switch disables Ruby automatically (no sends when paused).