OdontoX App Performance — Neon Recommended Driver Split + TanStack Query
Date: 2026-05-03 Owner: sshssn Status: Spec, awaiting user review
Problem
The app is “terribly slow” across the board: first load, navigation between modules, lists, saves, and, notably, re-entering a module the user just left. Investigation found two root causes:
- Server: wrong Neon driver everywhere. getDatabase() in server/src/lib/db.ts is used by ~50+ routes. On Cloudflare Workers it always opens a fresh @neondatabase/serverless Pool (WebSocket) to Neon in ap-southeast-1 (Singapore), per request. Neon’s HTTP driver (getDatabaseHttp) exists but is consumed by exactly one route (user-devices.ts). Every read pays a TLS+WebSocket handshake to Singapore that a single HTTP fetch could replace.
- Client: no persistent data cache. No @tanstack/react-query, no swr, no QueryClient. There is a hand-rolled in-memory cache.ts with 90s/5min stale-while-revalidate, but it is allowlist-scoped (~10 endpoints), wiped on reload, not shared across tabs, and does not power most pages. Returning to a module re-fetches everything.
Goals
- Server roundtrip latency: drop ~70%+ on read-heavy endpoints by removing the WebSocket handshake from the hot path.
- Client perceived latency: re-entering a module renders cached data instantly, with background revalidation.
- Zero auth/security regressions. Zero cross-tenant data leakage in the persisted cache.
- Migration is gradual and per-route revertible. No big-bang.
Non-goals
- Cloudflare Hyperdrive (deferred — separate spec if revisited).
- Edge KV response caching (deferred — Phase 3 was dropped from current scope).
- Database schema changes, index tuning, or N+1 query fixes (separate work, may follow).
- Mobile app (Expo) — uses native fetch, separate stack.
- Durable Object (CLINIC_HUB) SSE refactor.
- Auth flow (Firebase, JWT refresh, WebAuthn) changes.
Architecture overview
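Three independent phases, each shippable on its own: Phase 1 (Neon driver split, server), Phase 2 (TanStack Query, client), and Phase 4 (bundle/CSS audit). Phase 3 (edge KV response caching) was dropped from the current scope.
Phase 1: Neon driver split (server)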
Driver decision matrix (from Neon’s current docs, May 2026)
| Use case | Driver | Why |
|---|---|---|
| One-shot read (most GETs) | neon(url) HTTP | Single fetch round-trip, no WS handshake, lowest cold-path latency |
| One-shot write (single INSERT/UPDATE/DELETE) | neon(url) HTTP | Same — works for non-transactional writes |
| Multi-statement atomic batch (no branching logic) | neon(url).transaction([...]) | One HTTP round-trip, atomic, no interactive control |
| Interactive transaction (BEGIN, conditional, COMMIT) | Pool (WebSocket) | Only path that supports client.query('BEGIN') flows |
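
The matrix above can be read as a small decision function. The sketch below is only an illustration: the Workload and Driver types and the chooseDriver name are hypothetical and do not exist in the codebase.

```typescript
// Hypothetical classification of a route's database work; names are illustrative.
type Workload =
  | { kind: "read" }
  | { kind: "single-write" }
  | { kind: "batch"; needsBranching: boolean }
  | { kind: "interactive-tx" };

type Driver = "http" | "http-transaction" | "pool-websocket";

// Mirrors the decision matrix row by row.
function chooseDriver(w: Workload): Driver {
  switch (w.kind) {
    case "read":
    case "single-write":
      return "http"; // neon(url): one fetch, no WebSocket handshake
    case "batch":
      // A batch that must inspect intermediate results is really interactive.
      return w.needsBranching ? "pool-websocket" : "http-transaction";
    case "interactive-tx":
      return "pool-websocket"; // BEGIN, conditional logic, COMMIT
  }
}
```

A route reviewer could use the same questions as a checklist: does the handler issue more than one statement, and if so, does any statement depend on the result of an earlier one?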
Changes to server/src/lib/db.ts
Replace the current getDatabase() + getDatabaseHttp() shape with three explicit accessors:
- getReadDb(connStr?) → returns drizzle(neon(url)) (HTTP). Default for all GET handlers and any single-statement mutation.
- getWriteDb(connStr?) → for now, an alias of getReadDb (HTTP works for single-statement writes too). The distinct name documents intent and gives us a hook if we ever need a write-only knob.
- getTxDb(connStr?, ctx) → returns a Pool-backed Drizzle instance plus an end() helper the route is required to register with ctx.waitUntil(). Used only by routes that need interactive transactions.
getDatabase() is kept as a thin alias of getReadDb() during migration so unmigrated routes keep working. Removed once all callsites are migrated.
The AsyncLocalStorage per-request connection cache is removed — HTTP driver has no connection state to cache, and a Pool can’t safely outlive a single request handler on Workers anyway.
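A minimal sketch of the accessor shape described above, under stated assumptions. The real drivers are replaced by injected factories so the example stands alone; Db, PoolLike, DriverFactories, createAccessors and handleTx are hypothetical names, and the ctx parameter of getTxDb is left out because here the route itself registers end().

```typescript
// Minimal stand-ins so the sketch compiles on its own. In the real module
// these would come from @neondatabase/serverless and drizzle-orm.
interface Db { driver: "http" | "pool" }
interface PoolLike { end(): Promise<void> }
interface WaitUntilCtx { waitUntil(p: Promise<unknown>): void }

interface DriverFactories {
  http(url: string): Db;                         // e.g. drizzle(neon(url))
  pool(url: string): { db: Db; pool: PoolLike }; // e.g. a Pool-backed drizzle instance
}

function createAccessors(f: DriverFactories, defaultUrl: string) {
  const getReadDb = (connStr?: string): Db => f.http(connStr ?? defaultUrl);
  const getWriteDb = getReadDb; // alias for now; the separate name documents intent
  const getTxDb = (connStr?: string) => {
    const { db, pool } = f.pool(connStr ?? defaultUrl);
    let ended: Promise<void> | undefined;
    // Idempotent: calling end() twice closes the pool only once.
    const end = () => (ended ??= pool.end());
    return { db, end };
  };
  // getDatabase stays as a thin alias of getReadDb during migration.
  return { getReadDb, getWriteDb, getTxDb, getDatabase: getReadDb };
}

// Route-side pattern: always hand end() to ctx.waitUntil, even if the work throws.
async function handleTx(
  ctx: WaitUntilCtx,
  accessors: ReturnType<typeof createAccessors>,
  work: (db: Db) => Promise<string>,
): Promise<string> {
  const { db, end } = accessors.getTxDb();
  try {
    return await work(db);
  } finally {
    ctx.waitUntil(end()); // pool teardown is tracked by the Worker runtime
  }
}
```

Making end() idempotent is a design choice of this sketch, not something the spec requires; it keeps a double registration from closing the pool twice.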
Migration order
- Add new accessors to db.ts.
- Migrate read-heavy routes first (highest impact, lowest risk):
  - routes/clinic.ts (settings, stats)
  - routes/patients.ts (list, get, search)
  - routes/appointments.ts (list, day view)
  - routes/billing.ts, routes/invoices.ts (read paths)
  - routes/staff.ts, routes/services.ts, routes/inventory.ts
- Migrate single-statement writes module by module.
- Audit every db.transaction( callsite and every raw SQL BEGIN. Migrate those to getTxDb() + ctx.waitUntil(end()). Likely candidates: payment processing, appointment-conflict booking, invoice creation, inventory stock adjustments. Each gets its own commit so a regression rollback is one revert.
What this won’t break (auth)
Auth endpoints are mostly single statements: login reads user + password hash; refresh reads a refresh-token row and writes a new one; logout deletes one row; session check is a single SELECT. All migrate cleanly to HTTP. JWT signing, cookie handling, Firebase Admin, and WebAuthn challenges do not touch the driver layer.
Risks
| Risk | Mitigation |
|---|---|
| HTTP driver used where a transaction was needed → split-write inconsistency | Pre-migration grep for db.transaction(, BEGIN, FOR UPDATE. Each match audited and routed to getTxDb if interactive |
| Pool routes leak (no pool.end()) → Worker hangs / billing | getTxDb() returns { db, end }. Route MUST call ctx.waitUntil(end()). Reviewer checks for this in every Tx-route diff |
| Drizzle behavior subtly different across neon-http vs neon-serverless adapters | Spot-test each migrated route in staging before promoting. Both adapters share the same Drizzle query builder API; differences are at the driver edge only |
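
The pre-migration grep in the first risk row could be scripted roughly as follows. This is a sketch, not existing tooling: TX_PATTERNS, AuditHit and auditSource are hypothetical names, and the regular expressions are one plausible reading of the three patterns listed.

```typescript
// Patterns named in the risk table; anything matching needs a human audit.
const TX_PATTERNS: { label: string; re: RegExp }[] = [
  { label: "db.transaction(", re: /\bdb\.transaction\s*\(/ },
  { label: "BEGIN", re: /\bBEGIN\b/ }, // case-sensitive to avoid matching ordinary code
  { label: "FOR UPDATE", re: /\bFOR\s+UPDATE\b/i },
];

interface AuditHit { line: number; label: string; text: string }

// Scans one file's source text and reports 1-based line numbers of matches.
function auditSource(source: string): AuditHit[] {
  const hits: AuditHit[] = [];
  source.split("\n").forEach((text, i) => {
    for (const { label, re } of TX_PATTERNS) {
      if (re.test(text)) hits.push({ line: i + 1, label, text: text.trim() });
    }
  });
  return hits;
}
```

A hit only flags a line for review; whether the route needs getTxDb or can use an HTTP batch still depends on whether the statements branch on intermediate results.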
Phase 2 — TanStack Query (client)
Packages
- @tanstack/react-query (v5)
- @tanstack/react-query-persist-client
- @tanstack/query-sync-storage-persister
QueryClient defaults
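The concrete defaults are not given in this section, so the following is only a sketch of what they might look like. The 24h gcTime comes from the Phase 2 risk table; staleTime and retry are assumptions, and queryDefaults and isPersistable are hypothetical names. The object is kept as plain data so the example stands alone; it has the shape of the defaultOptions argument to new QueryClient({ defaultOptions }).

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Shape of the `defaultOptions` object accepted by
// `new QueryClient({ defaultOptions })` in @tanstack/react-query v5.
const queryDefaults = {
  queries: {
    gcTime: 24 * HOUR_MS,       // from the risk table: gcTime and maxAge both 24h
    staleTime: 90 * 1000,       // ASSUMPTION: reuses the 90s window of cache.ts
    refetchOnWindowFocus: true, // library default, spelled out for clarity
    retry: 1,                   // ASSUMPTION: placeholder value
  },
} as const;

// Persisted entries are garbage-collected once gcTime elapses, so gcTime
// should be at least as long as the persister's maxAge.
function isPersistable(gcTimeMs: number, maxAgeMs: number): boolean {
  return gcTimeMs >= maxAgeMs;
}
```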
Persister
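The persister configuration itself is not shown here. Based on details stated elsewhere in this spec (a per-clinic localStorage key, 24h maxAge, buster equal to the build hash, keys matching odontox-rq-*), it might be assembled along these lines. persistKey and buildPersistConfig are hypothetical helpers, and the exact key format is an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const PERSIST_PREFIX = "odontox-rq-"; // matches the 'odontox-rq-*' keys cleared on logout

// ASSUMPTION: one localStorage bucket per clinic, named prefix + clinicId.
function persistKey(clinicId: string): string {
  if (!clinicId) throw new Error("persistKey requires a clinicId");
  return `${PERSIST_PREFIX}${clinicId}`;
}

// Values intended for createSyncStoragePersister({ storage, key }) and for the
// persistOptions passed to <PersistQueryClientProvider>.
function buildPersistConfig(clinicId: string, buildHash: string) {
  return {
    persister: { key: persistKey(clinicId) },
    persistOptions: {
      maxAge: DAY_MS,    // 24h, same as gcTime
      buster: buildHash, // a new deploy invalidates every persisted cache
    },
  };
}
```

Keeping the clinic id in the storage key means a clinic switch reads a different bucket even before any query key is compared.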
Cache key conventions
Every query key is a tuple of the form [resource, clinicId, ...params].
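A small key factory is one way to enforce that tuple shape. The resource names and parameters below are illustrative assumptions, not the app's actual keys.

```typescript
type QueryKey = readonly [resource: string, clinicId: string, ...params: unknown[]];

// Hypothetical key factory; resource names and params are illustrative.
const keys = {
  patientList: (clinicId: string, search = ""): QueryKey =>
    ["patients", clinicId, "list", { search }],
  patient: (clinicId: string, id: string): QueryKey =>
    ["patients", clinicId, "detail", id],
  appointmentsForDay: (clinicId: string, isoDate: string): QueryKey =>
    ["appointments", clinicId, "day", isoDate],
};
```

Because clinicId is a required positional argument, a key without a tenant scope does not type-check.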
clinicId is read from localStorage['odontox-active-clinic-id'] via a small hook. Querying without a clinicId is a programming error; the hook throws in dev, no-ops in prod.
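The dev/prod behavior described above could look like this when pulled out of the hook into a plain function. readActiveClinicId and ReadableStorage are hypothetical names; only the storage key comes from the spec.

```typescript
const ACTIVE_CLINIC_KEY = "odontox-active-clinic-id";

// Only the one Storage method the helper needs, so it can run without a browser.
interface ReadableStorage { getItem(key: string): string | null }

// Returns the active clinic id, or null when it is missing in production.
// In development a missing id throws so the bug is caught early.
function readActiveClinicId(storage: ReadableStorage, isDev: boolean): string | null {
  const id = storage.getItem(ACTIVE_CLINIC_KEY);
  if (id) return id;
  if (isDev) throw new Error("Query issued without an active clinicId");
  return null; // callers treat null as "skip the query", e.g. enabled: false
}
```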
Auth-safety lifecycle
| Event | Action |
|---|---|
| Login success | queryClient.clear() — wipe any leftover cache from a previous session |
| Logout | queryClient.clear() + remove every localStorage key with the odontox-rq- prefix (removeItem takes an exact key, not a glob, so the keys must be enumerated); privacy on shared devices |
| Clinic switch | Full reload happens already (intentional). Persister key includes clinicId, so the new clinic loads its own bucket. On reload, also remove stale buckets older than 24h to keep localStorage tidy |
| 401 from any query | Global onError handler on the QueryCache (v5 no longer accepts onError on useQuery) → existing redirect-to-login flow (unchanged) |
| Mutation success | invalidateQueries({ queryKey: [resource, clinicId] }) — refetch tenant-scoped views |
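
The logout and clinic-switch cleanup rows can be sketched as two helpers over the Web Storage API. The function names are hypothetical, and the stale-bucket check assumes each bucket holds the persister's JSON with a numeric timestamp field.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyedStorage {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

const PREFIX = "odontox-rq-";

// Collect keys first: removing while iterating by index would skip entries.
function bucketKeys(storage: KeyedStorage): string[] {
  const out: string[] = [];
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i);
    if (k !== null && k.startsWith(PREFIX)) out.push(k);
  }
  return out;
}

// Logout: drop every persisted bucket.
function clearAllBuckets(storage: KeyedStorage): number {
  const ks = bucketKeys(storage);
  ks.forEach((k) => storage.removeItem(k));
  return ks.length;
}

// Reload after clinic switch: drop buckets older than maxAgeMs.
// ASSUMPTION: each bucket is JSON with a numeric `timestamp` field.
function clearStaleBuckets(storage: KeyedStorage, now: number, maxAgeMs: number): number {
  let removed = 0;
  for (const k of bucketKeys(storage)) {
    let ts = NaN;
    try {
      ts = Number(JSON.parse(storage.getItem(k) ?? "null")?.timestamp);
    } catch { /* unparsable bucket: treat as stale */ }
    if (!(now - ts <= maxAgeMs)) { storage.removeItem(k); removed++; }
  }
  return removed;
}
```

Buckets that cannot be parsed are removed rather than kept, on the assumption that an unreadable cache is never worth preserving.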
Coexistence with existing cache.ts
TanStack Query is added alongside the existing hand-rolled cache. No big-bang migration. Per-module rollout:
- Patients module → migrate first (highest traffic, simplest CRUD shape)
- Appointments
- Billing / invoices
- Inventory
- Settings / staff / services
- Remaining modules
Migrated modules use useQuery / useMutation. The hand-rolled cache.ts allowlist shrinks as routes migrate. After full migration, cache.ts is deleted.
What this won’t break
- Auth endpoints are not wrapped in useQuery; they use direct fetch (signIn, signOut, refreshToken). No change.
- Server-Sent Events (CLINIC_HUB Durable Object stream) do not go through Query. No change.
- File uploads (R2, DICOM): useMutation wraps these but the underlying upload code is untouched.
- Existing cache.ts allowlist routes coexist until migrated.
Risks
| Risk | Mitigation |
|---|---|
| Persisted cache shows previous clinic’s data | clinicId in every query key + per-clinic localStorage key |
| Persisted cache shows previous user’s data on shared devices | queryClient.clear() on login AND logout; clear localStorage on logout |
| Deploy ships breaking change while users have stale persisted cache | buster = build hash → auto-invalidates on every deploy |
| localStorage quota exceeded on long sessions | gcTime and maxAge both 24h; persister configured with a retry strategy (for example removeOldestQuery) so the oldest queries are dropped when a write fails |
| User sees stale data right after a write | useMutation.onSuccess invalidates relevant query keys |
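
Invalidating with a [resource, clinicId] key relies on prefix matching: every cached key that starts with those two elements is refetched, and other tenants' keys are left alone. The following is a simplified model of that behavior, not the library's implementation; matchesPrefix and keysToInvalidate are hypothetical names.

```typescript
type Key = readonly unknown[];

// Simplified model of prefix matching: a cached key is selected when its
// leading elements equal the filter's elements. (The library also matches
// partially inside object elements, which this sketch ignores.)
function matchesPrefix(cached: Key, filter: Key): boolean {
  return (
    filter.length <= cached.length &&
    filter.every((part, i) => JSON.stringify(part) === JSON.stringify(cached[i]))
  );
}

// Keys that invalidateQueries({ queryKey: [resource, clinicId] }) would target.
function keysToInvalidate(cache: Key[], resource: string, clinicId: string): Key[] {
  return cache.filter((k) => matchesPrefix(k, [resource, clinicId]));
}
```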
Phase 4 — Bundle/CSS audit
Smaller win, separate scope. Three targets:
- CSS: 490KB CSS bundle is suspicious for a Tailwind project. Verify the tailwind.config.ts content glob actually purges. Spot-check dist/assets/*.css for unused class prefixes.
- Lazy-load heavy chunks: confirm @react-pdf/renderer, dicom-parser, ChartJS (if present) are all behind React.lazy() or dynamic imports. Anything bundled into the entry chunk that’s only used on one page is a target.
- Icon imports: spot-check pages for import { ... } from 'lucide-react' patterns that pull the whole pack. Tree-shaking should handle this, but a misconfigured import can defeat it.
Testing strategy
- Phase 1: existing route tests pass. Add a per-route smoke test that confirms it returns 200 with expected shape after migration. Manual staging test for transaction routes.
- Phase 2: snapshot test for QueryClient config; integration test for clinic-switch cache isolation (login as user with 2 clinics, populate cache, switch, assert no leak); manual test of login/logout cache clear.
- Phase 4: bundle-size diff before/after. Lighthouse score before/after on /dashboard, /patients, /appointments.
Rollback
- Phase 1: revert single route’s commit. getDatabase() alias still points to the HTTP path; old behavior is recoverable by flipping the alias back to the Pool path.
- Phase 2: feature flag the <PersistQueryClientProvider> wrapper. Disabling reverts to the hand-rolled cache for migrated routes (they fall back to direct fetch via the same serverComm wrappers Query is built on).
- Phase 4: standard build-config revert.
Out-of-scope follow-ups
- Hyperdrive evaluation (separate spec when prioritized).
- KV edge response caching (Phase 3 deferred).
- N+1 query audit, especially in tenant-scoped list endpoints.
- DB index audit on hot columns (patients.clinic_id, appointments.scheduled_at, etc.).
- Mobile app caching parity.

