
OdontoX App Performance — Neon Recommended Driver Split + TanStack Query

Date: 2026-05-03
Owner: sshssn
Status: Spec, awaiting user review

Problem

The app is “terribly slow” across the board: first load, navigation between modules, lists, saves, and — notably — re-entering a module the user just left. Investigation found two root causes:
  1. Server: wrong Neon driver everywhere. getDatabase() in server/src/lib/db.ts is used by 50+ routes. On Cloudflare Workers it opens a fresh @neondatabase/serverless Pool (WebSocket) to Neon in ap-southeast-1 (Singapore) on every request. Neon’s HTTP driver (getDatabaseHttp) exists but is used by exactly one route (user-devices.ts). Every read pays a TLS + WebSocket handshake to Singapore that a single HTTP fetch could replace.
  2. Client: no persistent data cache. No @tanstack/react-query, no swr, no QueryClient. There is a hand-rolled in-memory cache.ts with 90s/5min stale-while-revalidate, but it is allowlist-scoped (~10 endpoints), wiped on reload, not shared across tabs, and does not power most pages. Returning to a module re-fetches everything.
Cloudflare Hyperdrive was considered and explicitly deferred — the user prefers Neon’s recommended path first.

Goals

  • Server round-trip latency: target a reduction of roughly 70% or more on read-heavy endpoints by removing the WebSocket handshake from the hot path.
  • Client perceived latency: re-entering a module renders cached data instantly, with background revalidation.
  • Zero auth/security regressions. Zero cross-tenant data leakage in the persisted cache.
  • Migration is gradual and per-route revertible. No big-bang.

Non-goals

  • Cloudflare Hyperdrive (deferred — separate spec if revisited).
  • Edge KV response caching (deferred — Phase 3 was dropped from current scope).
  • Database schema changes, index tuning, or N+1 query fixes (separate work, may follow).
  • Mobile app (Expo) — uses native fetch, separate stack.
  • Durable Object (CLINIC_HUB) SSE refactor.
  • Auth flow (Firebase, JWT refresh, WebAuthn) changes.

Architecture overview

Three independent phases, each shippable on its own:
Phase 1 ─ Server: Neon driver split
   getDatabase()     →  getReadDb()  (HTTP, default for GETs)
                     →  getWriteDb() (HTTP, one-shot writes; alias of getReadDb for now)
                     →  getTxDb()    (Pool, only for interactive transactions, with waitUntil cleanup)

Phase 2 ─ Client: TanStack Query + persistent cache
   <PersistQueryClientProvider> wraps <App />
   QueryClient { staleTime: 30s, gcTime: 24h }
   localStorage persister { maxAge: 24h, buster: build-hash }, clinicId in every query key

Phase 4 ─ Bundle/CSS audit
   Tailwind purge correctness, lazy-load heavy chunks, dead-icon scrub
(Phase 3 — KV edge cache — deferred. Numbering preserved to match the brainstorming discussion.)

Phase 1 — Neon driver split (server)

Driver decision matrix (from Neon’s current docs, May 2026)

| Use case | Driver | Why |
| --- | --- | --- |
| One-shot read (most GETs) | neon(url) HTTP | Single fetch round-trip, no WS handshake, lowest cold-path latency |
| One-shot write (single INSERT/UPDATE/DELETE) | neon(url) HTTP | Same single round-trip; works for non-transactional writes |
| Multi-statement atomic batch (no branching logic) | neon(url).transaction([...]) | One HTTP round-trip, atomic, no interactive control |
| Interactive transaction (BEGIN, conditional, COMMIT) | Pool (WebSocket) | Only path that supports client.query('BEGIN') flows |
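The matrix can also be read as a small decision function, which may help reviewers classify a route during migration. This is only an illustrative sketch: the RouteDbProfile type and the pickDriver name are hypothetical and do not exist in the codebase.

```typescript
// Hypothetical description of how one route talks to the database.
interface RouteDbProfile {
  statements: number;              // SQL statements issued per request
  needsAtomicity: boolean;         // must all statements commit or roll back together?
  branchesMidTransaction: boolean; // does app logic run between BEGIN and COMMIT?
}

type DriverChoice = "http" | "http-transaction" | "pool";

// Encodes the four rows of the decision matrix, most restrictive case first.
function pickDriver(profile: RouteDbProfile): DriverChoice {
  // Conditional logic between statements needs an interactive session.
  if (profile.branchesMidTransaction) return "pool";
  // A fixed list of statements that must be atomic fits one HTTP batch.
  if (profile.statements > 1 && profile.needsAtomicity) return "http-transaction";
  // One-shot reads and writes use the plain HTTP driver.
  return "http";
}
```

A route with several independent, non-atomic statements also lands on "http" here; each statement is then its own fetch.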

Changes to server/src/lib/db.ts

Replace the current getDatabase() + getDatabaseHttp() shape with three explicit accessors:
  • getReadDb(connStr?) → returns drizzle(neon(url)) (HTTP). Default for all GET handlers and any single-statement mutation.
  • getWriteDb(connStr?) → for now, alias of getReadDb (HTTP works for single-statement writes too). Distinct name documents intent and gives us a hook if we ever need a write-only knob.
  • getTxDb(connStr?, ctx) → returns a Pool-backed Drizzle instance plus an end() helper the route is required to register with ctx.waitUntil(). Used only by routes that need interactive transactions.
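The existing getDatabase() is kept as a thin alias of getReadDb() during migration so unmigrated routes keep working, and is removed once all callsites are migrated. Because the alias moves every unmigrated route onto the HTTP driver at once, any route that still runs an interactive transaction through getDatabase() has to move to getTxDb() before the alias flips. The AsyncLocalStorage per-request connection cache is removed: the HTTP driver has no connection state to cache, and a Pool can’t safely outlive a single request handler on Workers anyway.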
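The { db, end } contract of getTxDb() could look roughly like the sketch below. To keep it dependency-free, the Pool and the Workers execution context are replaced by minimal structural types, and the connection is opened through an injected open() callback; those types, the callback, and the withTx helper are all assumptions made for illustration, not the planned signatures.

```typescript
// Stand-in for the Workers ExecutionContext (only waitUntil is needed here).
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// What the injected opener returns. In the real module this would likely be
// a Drizzle instance over a @neondatabase/serverless Pool plus pool.end().
interface TxResources<D> {
  db: D;
  close: () => Promise<void>;
}

interface TxHandle<D> {
  db: D;
  end: () => Promise<void>;
}

// Hypothetical getTxDb: returns the db plus an idempotent end() helper.
function getTxDb<D>(open: () => TxResources<D>): TxHandle<D> {
  const { db, close } = open();
  let ended: Promise<void> | undefined;
  const end = (): Promise<void> => {
    // A second call reuses the first close promise instead of closing twice.
    if (ended === undefined) ended = close();
    return ended;
  };
  return { db, end };
}

// Hypothetical wrapper that makes the waitUntil registration hard to forget:
// cleanup is scheduled whether the transaction body resolves or throws.
async function withTx<D, T>(
  ctx: ExecutionContextLike,
  open: () => TxResources<D>,
  run: (db: D) => Promise<T>,
): Promise<T> {
  const { db, end } = getTxDb(open);
  try {
    return await run(db);
  } finally {
    ctx.waitUntil(end());
  }
}
```

Routing every Tx route through a wrapper like withTx would turn the "route MUST call ctx.waitUntil(end())" rule into something the code enforces rather than something a reviewer has to spot.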

Migration order

  1. Add new accessors to db.ts.
  2. Migrate read-heavy routes first (highest impact, lowest risk):
    • routes/clinic.ts (settings, stats)
    • routes/patients.ts (list, get, search)
    • routes/appointments.ts (list, day view)
    • routes/billing.ts, routes/invoices.ts (read paths)
    • routes/staff.ts, routes/services.ts, routes/inventory.ts
  3. Migrate single-statement writes module by module.
  4. Audit every db.transaction( callsite and every BEGIN raw SQL. Migrate those to getTxDb() + ctx.waitUntil(end()). Likely candidates: payment processing, appointment-conflict booking, invoice creation, inventory stock adjustments. Each gets its own commit so a regression rollback is one revert.
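The audit in step 4 can start from a mechanical scan. The sketch below flags lines that match the three patterns named in this spec (db.transaction(, BEGIN, FOR UPDATE); the function and type names are hypothetical, and the matching is deliberately loose, so false positives such as the word "begin" in a comment are expected and left for the human audit.

```typescript
interface TxHit {
  line: number;    // 1-based line number in the scanned source
  pattern: string; // which audit pattern matched
  text: string;    // trimmed source line, for the audit report
}

// Patterns taken from the migration plan; none use the g flag, so
// RegExp.test() stays stateless across lines.
const TX_PATTERNS: { name: string; regex: RegExp }[] = [
  { name: "db.transaction(", regex: /\bdb\.transaction\s*\(/ },
  { name: "BEGIN", regex: /\bBEGIN\b/i },
  { name: "FOR UPDATE", regex: /\bFOR\s+UPDATE\b/i },
];

// Returns one hit per (line, pattern) match in the given file contents.
function findTxCallsites(source: string): TxHit[] {
  const hits: TxHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { name, regex } of TX_PATTERNS) {
      if (regex.test(text)) {
        hits.push({ line: index + 1, pattern: name, text: text.trim() });
      }
    }
  });
  return hits;
}
```

Running this over each file under the routes directory gives a checklist of callsites; each entry is then classified by hand as an interactive transaction (getTxDb) or a fixed batch.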

What this won’t break (auth)

Auth endpoints are mostly single statements: login reads user + password hash; refresh reads a refresh-token row and writes a new one; logout deletes one row; session check is a single SELECT. All migrate cleanly to HTTP. JWT signing, cookie handling, Firebase Admin, WebAuthn challenges — none touch the driver layer.

Risks

| Risk | Mitigation |
| --- | --- |
| HTTP driver used where a transaction was needed → split-write inconsistency | Pre-migration grep for db.transaction(, BEGIN, FOR UPDATE. Each match audited and routed to getTxDb if interactive |
| Pool routes leak (no pool.end()) → Worker hangs / billing | getTxDb() returns { db, end }. Route MUST call ctx.waitUntil(end()). Reviewer checks for this in every Tx-route diff |
| Drizzle behavior subtly different across neon-http vs neon-serverless adapters | Spot-test each migrated route in staging before promoting. Both adapters share the same Drizzle query builder API; differences are at the driver edge only |

Phase 2 — TanStack Query (client)

Packages

  • @tanstack/react-query (v5)
  • @tanstack/react-query-persist-client
  • @tanstack/query-sync-storage-persister

QueryClient defaults

new QueryClient({ defaultOptions: { queries: {
  staleTime: 30_000,             // 30s: reads within 30s of the last fetch are served from cache, no refetch
  gcTime:    24 * 60 * 60_000,   // 24h: match maxAge so persisted queries aren't garbage-collected early
  retry:     1,
  refetchOnWindowFocus: false,   // disabled: too noisy for clinical workflows
} } })

Persister

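createSyncStoragePersister({
  storage: window.localStorage,
  key: `odontox-rq-${activeClinicId}`,    // tenant-scoped storage key
  retry: removeOldestQuery,               // from @tanstack/react-query-persist-client: on a failed write, drop the oldest query and retry
})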

persistOptions: {
  persister,
  maxAge: 24 * 60 * 60_000,                // 24h — match gcTime
  buster: __BUILD_HASH__,                  // injected at build time → auto-invalidate on deploy
}
The build-hash buster is the deploy-safety net: any production deploy invalidates every user’s persisted cache, eliminating the “stale data after release” class of bug.
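The restore decision that maxAge and buster drive can be pictured as the check below. It is a simplified illustration of the behavior described above, not the library's source; the type and function names are hypothetical, and only the timestamp and buster fields of the persisted envelope are modeled.

```typescript
// The parts of the persisted cache envelope that the decision depends on.
interface PersistedEnvelope {
  timestamp: number; // ms since epoch when the cache was last written
  buster: string;    // build hash recorded at write time
}

interface RestorePolicy {
  maxAge: number; // ms; 24h in this spec
  buster: string; // build hash of the currently running bundle
}

// A persisted cache is reused only if it is fresh AND was written by the
// same build. Otherwise it is discarded and the app starts with an empty cache.
function shouldRestore(
  persisted: PersistedEnvelope,
  policy: RestorePolicy,
  now: number,
): boolean {
  const expired = now - persisted.timestamp > policy.maxAge;
  const busted = persisted.buster !== policy.buster;
  return !expired && !busted;
}
```

Passing now as a parameter instead of calling Date.now() inside keeps the check deterministic in tests.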

Cache key conventions

Every query key is a tuple starting with [resource, clinicId, ...params]:
['patients', clinicId, { search, page }]
['appointments', clinicId, { date }]
['invoice', clinicId, invoiceId]
clinicId is read from localStorage['odontox-active-clinic-id'] via a small hook. Querying without a clinicId is a programming error; the hook throws in dev, no-ops in prod.
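One way to keep the key convention consistent is a small set of key factories plus a guard for the missing-clinicId case. Everything named here (resolveClinicId, queryKeys, keyHasPrefix) is hypothetical. The prefix check is a simplified stand-in that shows why resource and clinicId come first: a [resource, clinicId] prefix then selects every cached view for that tenant and resource. TanStack Query's own matching is more general than this strict-equality version.

```typescript
type QueryKey = readonly unknown[];

// Mirrors the hook's contract: throw in dev, return null in prod so the
// caller can skip the query (for example with enabled: false).
function resolveClinicId(raw: string | null, isDev: boolean): string | null {
  if (raw !== null && raw.trim() !== "") return raw;
  if (isDev) throw new Error("Query attempted without an active clinicId");
  return null;
}

// Key factories: resource first, clinicId second, params last.
const queryKeys = {
  patients: (clinicId: string, params: { search?: string; page?: number }) =>
    ["patients", clinicId, params] as const,
  appointments: (clinicId: string, params: { date: string }) =>
    ["appointments", clinicId, params] as const,
  invoice: (clinicId: string, invoiceId: string) =>
    ["invoice", clinicId, invoiceId] as const,
};

// Simplified prefix match using strict equality on each leading element.
function keyHasPrefix(key: QueryKey, prefix: QueryKey): boolean {
  return (
    prefix.length <= key.length &&
    prefix.every((part, index) => part === key[index])
  );
}
```

With factories like these, a component never hand-writes a key, so a key without clinicId cannot be produced by accident.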

Auth-safety lifecycle

| Event | Action |
| --- | --- |
| Login success | queryClient.clear() to wipe any leftover cache from a previous session |
| Logout | queryClient.clear(), then remove every localStorage key that starts with odontox-rq- (removeItem takes an exact key, not a wildcard, so the keys must be enumerated); privacy on shared devices |
| Clinic switch | A full reload happens already (intentional). The persister key includes clinicId, so the new clinic loads its own bucket. On reload, also remove buckets older than 24h to keep localStorage tidy |
| 401 from any query | Global onError handler on the QueryCache → existing redirect-to-login flow (unchanged) |
| Mutation success | invalidateQueries({ queryKey: [resource, clinicId] }) to refetch tenant-scoped views |
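The logout and clinic-switch cleanup both come down to enumerating localStorage keys by prefix. The sketch below shows one way to do that against a minimal storage interface, so it can be exercised without a browser. The helper names are hypothetical, and pruneStaleBuckets assumes each bucket is JSON with a top-level numeric timestamp field, which should be checked against what the persister actually writes.

```typescript
// Minimal subset of the Web Storage API used by the helpers.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  removeItem(key: string): void;
}

const BUCKET_PREFIX = "odontox-rq-";

// Collect matching keys first: removing while iterating shifts the indices.
function listBucketKeys(storage: StorageLike): string[] {
  const keys: string[] = [];
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i);
    if (k !== null && k.startsWith(BUCKET_PREFIX)) keys.push(k);
  }
  return keys;
}

// Logout: remove every tenant bucket. Returns how many were removed.
function clearAllBuckets(storage: StorageLike): number {
  const keys = listBucketKeys(storage);
  keys.forEach((k) => storage.removeItem(k));
  return keys.length;
}

// Reload after a clinic switch: drop buckets older than maxAgeMs.
// Buckets that cannot be parsed are treated as stale and removed too.
function pruneStaleBuckets(
  storage: StorageLike,
  maxAgeMs: number,
  now: number,
): number {
  let removed = 0;
  for (const k of listBucketKeys(storage)) {
    let timestamp: unknown;
    try {
      timestamp = JSON.parse(storage.getItem(k) ?? "null")?.timestamp;
    } catch {
      timestamp = undefined;
    }
    if (typeof timestamp !== "number" || now - timestamp > maxAgeMs) {
      storage.removeItem(k);
      removed++;
    }
  }
  return removed;
}
```

In the app these would be called with window.localStorage, which satisfies StorageLike; other keys such as odontox-active-clinic-id are left untouched because they do not share the bucket prefix.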

Coexistence with existing cache.ts

TanStack Query is added alongside the existing hand-rolled cache. No big-bang migration. Per-module rollout:
  1. Patients module → migrate first (highest traffic, simplest CRUD shape)
  2. Appointments
  3. Billing / invoices
  4. Inventory
  5. Settings / staff / services
  6. Remaining
For each module: existing fetch wrapper stays for non-migrated callsites; migrated components switch to useQuery / useMutation. The hand-rolled cache.ts allowlist shrinks as routes migrate. After full migration, cache.ts is deleted.

What this won’t break

  • Auth endpoints are not wrapped in useQuery — they use direct fetch (signIn, signOut, refreshToken). No change.
  • Server-Sent Events (CLINIC_HUB Durable Object stream) does not go through Query. No change.
  • File uploads (R2, DICOM) — useMutation wraps these but the underlying upload code is untouched.
  • Existing cache.ts allowlist routes — coexist until migrated.

Risks

| Risk | Mitigation |
| --- | --- |
| Persisted cache shows previous clinic’s data | clinicId in every query key + per-clinic localStorage key |
| Persisted cache shows previous user’s data on shared devices | queryClient.clear() on login AND logout; remove all odontox-rq- localStorage buckets on logout |
| Deploy ships breaking change while users have stale persisted cache | buster = build hash → auto-invalidates on every deploy |
| localStorage quota exceeded on long sessions | gcTime and maxAge both 24h bound how long entries live; the persister’s retry option set to removeOldestQuery drops the oldest queries when a write fails (without a retry handler, a write that exceeds quota is simply not saved) |
| User sees stale data right after a write | useMutation.onSuccess invalidates relevant query keys |

Phase 4 — Bundle/CSS audit

Smaller win, separate scope. Three targets:
  1. CSS: 490KB CSS bundle is suspicious for a Tailwind project. Verify tailwind.config.ts content glob actually purges. Spot-check dist/assets/*.css for unused class prefixes.
  2. Lazy-load heavy chunks: confirm @react-pdf/renderer, dicom-parser, ChartJS (if present) are all behind React.lazy() or dynamic imports. Anything bundled into the entry chunk that’s only used on one page is a target.
  3. Icon imports: spot-check pages for lucide-react imports that pull the whole pack. Named imports (import { ... } from 'lucide-react') should tree-shake, but a namespace import (import * as Icons) or a dynamic lookup of icons by name can defeat it.
Output: a short report of findings + targeted fixes. No spec sub-design needed.

Testing strategy

  • Phase 1: existing route tests pass. Add a per-route smoke test that confirms it returns 200 with expected shape after migration. Manual staging test for transaction routes.
  • Phase 2: snapshot test for QueryClient config; integration test for clinic-switch cache isolation (login as user with 2 clinics, populate cache, switch, assert no leak); manual test of login/logout cache clear.
  • Phase 4: bundle-size diff before/after. Lighthouse score before/after on /dashboard, /patients, /appointments.
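For the Phase 4 bundle-size diff, a helper along these lines could turn two build manifests into a short report. The names are hypothetical, and it assumes asset names have already been normalized to stable logical names (content hashes stripped), since hashed filenames differ between builds.

```typescript
type SizeMap = Record<string, number>; // asset name -> size in bytes

interface SizeDelta {
  asset: string;
  before: number; // 0 if the asset is new
  after: number;  // 0 if the asset was removed
  delta: number;  // after - before; negative means the asset shrank
}

// Returns only assets whose size changed, largest reduction first.
function diffBundleSizes(before: SizeMap, after: SizeMap): SizeDelta[] {
  const assets = Array.from(
    new Set(Object.keys(before).concat(Object.keys(after))),
  );
  return assets
    .map((asset) => {
      const b = before[asset] ?? 0;
      const a = after[asset] ?? 0;
      return { asset, before: b, after: a, delta: a - b };
    })
    .filter((entry) => entry.delta !== 0)
    .sort((x, y) => x.delta - y.delta);
}
```

Sorting by delta puts the biggest wins at the top and any regressions (new or grown chunks) at the bottom, which is the order a reviewer usually wants to read them in.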

Rollback

  • Phase 1: revert single route’s commit. getDatabase() alias still points to the HTTP path; old behavior is recoverable by flipping the alias back to the Pool path.
  • Phase 2: feature flag the <PersistQueryClientProvider> wrapper. Disabling reverts to the hand-rolled cache for migrated routes (they fall back to direct fetch via the same serverComm wrappers Query is built on).
  • Phase 4: standard build-config revert.

Out-of-scope follow-ups

  • Hyperdrive evaluation (separate spec when prioritized).
  • KV edge response caching (Phase 3 deferred).
  • N+1 query audit, especially in tenant-scoped list endpoints.
  • DB index audit on hot columns (patients.clinic_id, appointments.scheduled_at, etc.).
  • Mobile app caching parity.