
JWT Refresh Token Rotation — Design Spec

Date: 2026-04-28
Status: Approved, pending implementation

Problem

Access tokens currently live for 24 hours or 7 days, depending on the login path. The production live-feed SSE endpoint logs JWTExpired at level: error on every expiry, flooding the logs. There is no refresh-token rotation: a stolen refresh token remains valid for 90 days with no detection or revocation path.

Goals

  • 15-minute access tokens with seamless background refresh (no visible session interruption)
  • Refresh token rotation: every refresh issues a new token, invalidates the old
  • Reuse detection: a replayed revoked token triggers family-wide session revocation
  • Eliminate expired-token error log spam (demote to info)
  • All existing auth methods (email, passkey, TOTP, OTP) continue working without logic changes

Non-Goals

  • No new D1 tables — KV only for token revocation state
  • No changes to MFA, OTP, TOTP, or passkey login flow logic
  • No changes to bridge tokens or upgrade invite tokens
  • No changes to OTT cross-subdomain auth flow

Section 1: Token Lifecycle & KV Schema

Token lifetimes

Token                         Duration                          Change
Access token                  15 minutes                        Was 24h / 7d
Refresh token                 90 days                           Unchanged
Proactive refresh threshold   ≤3 min remaining (80% elapsed)    Was 24h

KV key: rt:{sha256hex(rawRefreshTokenJWT)}

One record per issued refresh token. TTL = 90 days from issuance. Revoked entries preserve their original TTL (not extended).
// Valid token
{
  "status": "valid",
  "userId": "user_123",
  "sessionId": "session_abc",
  "familyId": "rtf_xyz",
  "issuedAt": 1710000000,
  "expiresAt": 1717776000
}

// Revoked token (after rotation)
{
  "status": "revoked",
  "userId": "user_123",
  "sessionId": "session_abc",
  "familyId": "rtf_xyz",
  "issuedAt": 1710000000,
  "expiresAt": 1717776000,
  "revokedAt": 1710001000,
  "replacedBy": "sha256hex_of_new_token"
}
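
The two record shapes above can be modelled as a discriminated union, so that code reading rt: entries has to handle both states. This is a minimal sketch: the type names and parseRefreshTokenRecord are hypothetical, and the timestamps are assumed to be Unix seconds (the example values differ by exactly 90 days in seconds).

```typescript
// Hypothetical types mirroring the JSON examples above.
interface RefreshTokenBase {
  userId: string;
  sessionId: string;
  familyId: string;
  issuedAt: number;  // assumed Unix seconds
  expiresAt: number; // assumed Unix seconds
}

interface ValidRefreshToken extends RefreshTokenBase {
  status: "valid";
}

interface RevokedRefreshToken extends RefreshTokenBase {
  status: "revoked";
  revokedAt: number;
  replacedBy: string; // sha256hex of the successor token
}

type RefreshTokenRecord = ValidRefreshToken | RevokedRefreshToken;

// Parses a raw KV value; returns null for a missing key or a malformed record.
function parseRefreshTokenRecord(raw: string | null): RefreshTokenRecord | null {
  if (raw === null) return null;
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const r = value as Record<string, unknown>;
  const baseOk =
    typeof r.userId === "string" &&
    typeof r.sessionId === "string" &&
    typeof r.familyId === "string" &&
    typeof r.issuedAt === "number" &&
    typeof r.expiresAt === "number";
  if (!baseOk) return null;
  if (r.status === "valid") return value as ValidRefreshToken;
  if (
    r.status === "revoked" &&
    typeof r.revokedAt === "number" &&
    typeof r.replacedBy === "string"
  ) {
    return value as RevokedRefreshToken;
  }
  return null;
}
```

Treating a malformed record the same as a missing one keeps the caller's branching simple; whether that is the desired behaviour is a design choice this spec does not state.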

KV key: rtf:{familyId}

One record per token family (one login session = one family). Written once on first login; updated to revoked only on reuse detection.
{ "status": "active", "userId": "user_123" }
// or
{ "status": "revoked", "userId": "user_123", "revokedAt": 1710001000 }
TTL: 90 days.

KV key: rtlock:{sha256hex(rawRefreshTokenJWT)}

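Value: "1", TTL: 10 seconds. Best-effort mutex to reduce concurrent refresh races. KV has no compare-and-swap, so acquiring the lock is a non-atomic read followed by a write and is probabilistic only: it significantly narrows the race window but does not eliminate it. If the store is Cloudflare Workers KV, expirationTtl cannot be set below 60 seconds, so a 10-second window would have to be enforced by storing the acquisition time in the value rather than by the key TTL. The correct long-term fix would be a D1 transaction; this is the accepted KV mitigation.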
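
A minimal sketch of the best-effort lock, written against a small KvLike interface that mirrors the get/put subset of the Workers KV API. It deviates from the value "1" described above by storing the acquisition time, on the assumption that the store enforces a 60-second minimum expirationTtl; tryAcquireRefreshLock, KvLike, and MemoryKv are hypothetical names.

```typescript
// Minimal subset of a KV API used by this sketch.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const LOCK_WINDOW_MS = 10_000; // the 10-second window from the spec
const KV_MIN_TTL_SECONDS = 60; // assumption: minimum expirationTtl the store accepts

// Returns true if the caller may proceed with rotation, false if a fresh lock exists.
async function tryAcquireRefreshLock(
  kv: KvLike,
  tokenHash: string,
  nowMs: number = Date.now()
): Promise<boolean> {
  const key = `rtlock:${tokenHash}`;
  const existing = await kv.get(key);
  if (existing !== null) {
    const acquiredAt = Number(existing);
    // A lock younger than the window means another refresh is probably in flight.
    if (Number.isFinite(acquiredAt) && nowMs - acquiredAt < LOCK_WINDOW_MS) {
      return false;
    }
  }
  // Not atomic: two callers can both pass the check above before either write lands.
  await kv.put(key, String(nowMs), { expirationTtl: KV_MIN_TTL_SECONDS });
  return true;
}

// In-memory stand-in for local experiments only (ignores TTLs).
class MemoryKv implements KvLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}
```

A caller that receives false would return 409 REFRESH_IN_PROGRESS. The gap between the get and the put is exactly the race window the text above describes.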

Section 2: issueRefreshToken Helper

async function issueRefreshToken(
  env: Env,
  userId: string,
  sessionId: string,    // required — caller always controls session state
  familyId?: string     // undefined = first login, generates new family
): Promise<{ rawToken: string; familyId: string }>
Behaviour:
  1. Generate refresh token JWT via generateRefreshToken() (internals unchanged)
  2. hash = sha256hex(rawToken)
  3. resolvedFamilyId = familyId ?? 'rtf_' + crypto.randomUUID()
  4. KV write rt:{hash} → valid metadata JSON, TTL 90d
  5. If familyId was undefined (first login only): KV write rtf:{resolvedFamilyId} → { status: "active", userId }, TTL 90d
  6. Return { rawToken, familyId: resolvedFamilyId }
On rotation, the caller passes familyId from the old KV entry — the helper skips the rtf: write because the family record already exists.
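
The behaviour above could be sketched as follows. To keep the example self-contained it takes a deps object instead of env; that object, the KvPut interface, the generateRefreshToken signature, and the name issueRefreshTokenSketch are all assumptions. The sketch also derives expiresAt from the issue time, whereas a real implementation would presumably take it from the JWT's exp claim.

```typescript
// Hypothetical dependencies; the real helper receives env instead.
interface KvPut {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}
interface IssueDeps {
  kv: KvPut;
  generateRefreshToken: (userId: string) => Promise<string>; // signature assumed
  now?: () => number; // milliseconds, injectable for tests
}

const REFRESH_TTL_SECONDS = 90 * 24 * 60 * 60;

// Hex-encoded SHA-256 via Web Crypto.
async function sha256hex(input: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(input));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

async function issueRefreshTokenSketch(
  deps: IssueDeps,
  userId: string,
  sessionId: string,
  familyId?: string
): Promise<{ rawToken: string; familyId: string }> {
  const rawToken = await deps.generateRefreshToken(userId); // step 1
  const hash = await sha256hex(rawToken); // step 2
  const resolvedFamilyId = familyId ?? "rtf_" + crypto.randomUUID(); // step 3
  const issuedAt = Math.floor((deps.now?.() ?? Date.now()) / 1000);

  // Step 4: register the token as valid.
  await deps.kv.put(
    `rt:${hash}`,
    JSON.stringify({
      status: "valid",
      userId,
      sessionId,
      familyId: resolvedFamilyId,
      issuedAt,
      expiresAt: issuedAt + REFRESH_TTL_SECONDS,
    }),
    { expirationTtl: REFRESH_TTL_SECONDS }
  );

  // Step 5: only a first login creates the family record.
  if (familyId === undefined) {
    await deps.kv.put(
      `rtf:${resolvedFamilyId}`,
      JSON.stringify({ status: "active", userId }),
      { expirationTtl: REFRESH_TTL_SECONDS }
    );
  }

  return { rawToken, familyId: resolvedFamilyId }; // step 6
}
```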

Section 3: /auth/refresh Endpoint — Rotation Flow

1.  Verify JWT signature + tokenType === 'refresh'
2.  hash = sha256hex(rawRefreshToken)
3.  Read rt:{hash} from KV
4.  → Missing: return 401 INVALID_TOKEN

5.  Read rtf:{kvEntry.familyId} from KV
6.  → rtf status "revoked": return 401 SESSION_REVOKED

7.  → rt status "revoked" (reuse of already-rotated token):
      a. Read user from D1
      b. If user.lastSessionId === kvEntry.sessionId → clear lastSessionId in D1
      c. KV put rtf:{kvEntry.familyId} → { status: "revoked", userId, revokedAt: now }
      d. return 401 SESSION_REVOKED

8.  Acquire rtlock:{hash}: GET the key first, then PUT "1" TTL 10s if it is absent
    → If lock exists: return 409 REFRESH_IN_PROGRESS

9.  newSessionId = crypto.randomUUID()
10. Generate new refresh token JWT + compute newHash = sha256hex(newRt)
    (do not write to KV yet)

11. KV put rt:{hash} → { ...kvEntry, status: "revoked", revokedAt: now, replacedBy: newHash }
    TTL = kvEntry.expiresAt - now  (preserve original expiry, do not extend)
    ← REVOKE OLD BEFORE REGISTERING NEW

12. KV put rt:{newHash} → { status: "valid", userId, sessionId: newSessionId, familyId, issuedAt, expiresAt }
    TTL 90d
    rtf:{familyId} stays "active" (no KV write needed)
13. Update users.lastSessionId = newSessionId in D1
14. Issue 15-minute access token
15. Return { accessToken, refreshToken: newRt }
Crash safety: If the Worker crashes between step 11 and step 12, the old token is already revoked and the new one was never registered, so the user must re-authenticate. This is preferred over the opposite ordering, where a crash after registering the new token but before revoking the old one would leave two valid tokens. replacedBy in step 11 references newHash, which is computed in step 10 before any KV writes, so there is no forward-reference ambiguity.
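
The branching in steps 3 to 7 and the TTL rule in step 11 can be expressed as two small pure functions, which makes the ordering easy to unit-test. This is a sketch: decideRefresh and revokedEntryTtlSeconds are hypothetical names, a missing rtf: record is not modelled because the flow above does not specify it, and the 60-second floor assumes a store that rejects shorter TTLs (which would extend an entry by up to a minute in the token's final moments).

```typescript
type RtStatus = "valid" | "revoked";
type FamilyStatus = "active" | "revoked";

type RefreshDecision =
  | {
      kind: "reject";
      status: 401;
      error: "INVALID_TOKEN" | "SESSION_REVOKED";
      revokeFamily: boolean; // true only when reuse of a rotated token is detected
    }
  | { kind: "rotate" };

// rt is the parsed rt:{hash} record (null when missing).
// familyStatus is only consulted once rt exists, matching the read order in the flow.
function decideRefresh(
  rt: { status: RtStatus } | null,
  familyStatus: FamilyStatus
): RefreshDecision {
  if (rt === null) {
    return { kind: "reject", status: 401, error: "INVALID_TOKEN", revokeFamily: false }; // step 4
  }
  if (familyStatus === "revoked") {
    return { kind: "reject", status: 401, error: "SESSION_REVOKED", revokeFamily: false }; // step 6
  }
  if (rt.status === "revoked") {
    return { kind: "reject", status: 401, error: "SESSION_REVOKED", revokeFamily: true }; // step 7
  }
  return { kind: "rotate" }; // steps 8 to 15
}

const KV_MIN_TTL_SECONDS = 60; // assumption: minimum expirationTtl the store accepts

// Step 11: keep the original expiry for the revoked entry instead of a fresh 90 days.
function revokedEntryTtlSeconds(expiresAt: number, nowSeconds: number): number {
  return Math.max(expiresAt - nowSeconds, KV_MIN_TTL_SECONDS);
}
```

With the example record from Section 1 (expiresAt 1717776000, revoked at 1710001000), the revoked entry would be rewritten with a TTL of 7,775,000 seconds.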

Section 4: dualAuthMiddleware Changes

Add a named helper for expired-error detection:
function isJwtExpiredError(err: unknown): boolean {
  return (
    err instanceof Error &&
    (
      err.name === 'JWTExpired' ||
      err.message.includes('"exp" claim timestamp check failed') ||
      (err as any).code === 'ERR_JWT_EXPIRED'
    )
  );
}
Update the catch block in dualAuthMiddleware:
catch (err) {
  if (isJwtExpiredError(err)) {
    logger.info('Access token expired', { path: c.req.path });
    return c.json({ error: 'TOKEN_EXPIRED' }, 401);
  }
  logger.warn('Invalid token', { path: c.req.path });
  return c.json({ error: 'INVALID_TOKEN' }, 401);
}
  • Expired = info (normal auth state, not an error)
  • Invalid = warn (bad token, worth noting)
  • Bridge token and upgrade invite token catch blocks are unaffected

Section 5: Login Endpoint Changes

Every path that currently calls generateRefreshToken() directly must switch to issueRefreshToken(env, userId, sessionId). Paths to update:
  • Email + password signin
  • Passkey verification
  • TOTP post-verification
  • OTP post-verification
The sessionId passed must be the crypto.randomUUID() value that will be stored in users.lastSessionId for that login. No MFA, passkey, TOTP, or OTP logic changes beyond replacing the token issuance call.

Section 6: Frontend — serverComm.ts Changes

Three targeted changes only. The existing refresh mutex (refreshPromise) and the _retry guard are preserved.

1. Proactive refresh threshold

Change the threshold passed to refreshTokenIfNeeded() from 24h to 3 minutes:
refreshTokenIfNeeded(3 * 60 * 1000)  // 180_000 ms — ≤3 min remaining triggers refresh
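
For illustration, the threshold check reduces to comparing the remaining lifetime against 180,000 ms. The sketch below assumes the access token's exp claim is available in Unix seconds; shouldRefresh and the constant names are hypothetical and not part of serverComm.ts.

```typescript
const ACCESS_TOKEN_LIFETIME_MS = 15 * 60 * 1000; // 900_000 ms
const REFRESH_THRESHOLD_MS = 3 * 60 * 1000;      // 180_000 ms

// True when the token has thresholdMs or less remaining (including already expired).
function shouldRefresh(
  expSeconds: number,
  nowMs: number,
  thresholdMs: number = REFRESH_THRESHOLD_MS
): boolean {
  const remainingMs = expSeconds * 1000 - nowMs;
  return remainingMs <= thresholdMs;
}

// Fraction of the lifetime that has elapsed when the threshold is first reached.
const elapsedFractionAtThreshold =
  (ACCESS_TOKEN_LIFETIME_MS - REFRESH_THRESHOLD_MS) / ACCESS_TOKEN_LIFETIME_MS; // 0.8
```

For a token issued at time 0 with exp at 900 seconds, the check is false at 11 minutes and becomes true at 12 minutes, which is the 80% mark from the lifetimes table.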

2. fetchWithAuth() 401 branching

401 + body.error === 'TOKEN_EXPIRED'   → refresh once (mutex + _retry), retry request once
401 + body.error === 'SESSION_REVOKED' → logout() immediately, no retry, redirect to login
401 + anything else                    → propagate as-is, no retry
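
The three branches can be isolated in a pure function so the decision is testable without network calls. This is a sketch only: classify401 and the action names are hypothetical, and alreadyRetried stands in for the existing _retry guard.

```typescript
type AuthFailureAction = "refresh-and-retry" | "logout" | "propagate";

// Maps a response status and parsed JSON body to the action fetchWithAuth() should take.
function classify401(
  status: number,
  body: unknown,
  alreadyRetried: boolean
): AuthFailureAction {
  if (status !== 401) return "propagate";
  const error =
    typeof body === "object" && body !== null
      ? (body as { error?: unknown }).error
      : undefined;
  if (error === "SESSION_REVOKED") return "logout";
  // Refresh at most once per request; a second TOKEN_EXPIRED is propagated.
  if (error === "TOKEN_EXPIRED" && !alreadyRetried) return "refresh-and-retry";
  return "propagate";
}
```

Keeping SESSION_REVOKED ahead of the retry check means a revoked session never triggers a refresh attempt, even on the first failure.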

3. SSE live-feed (/notifications/live-feed)

The SSE connection passes the access token as a query parameter. With 15-minute tokens, connections dropping on token expiry are expected. Before opening the connection:
await refreshTokenIfNeeded(3 * 60 * 1000);
// then open SSE with latest token
On disconnect:
await refreshTokenIfNeeded(3 * 60 * 1000);
// reconnect with latest token
Optional (recommended): Schedule a proactive SSE reconnect shortly before the token used for the connection expires (about 12 minutes after opening when the token is freshly issued) to avoid a mid-stream drop at expiry.
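
One way to time the optional reconnect is to derive the delay from the token's exp claim rather than from a fixed interval, since a token that was not refreshed before connecting may have well under 15 minutes left. A minimal sketch, assuming exp is available in Unix seconds; sseReconnectDelayMs and the 3-minute margin default are hypothetical.

```typescript
// Milliseconds to wait before proactively reconnecting the SSE stream.
// Returns 0 when the token is already inside the safety margin.
function sseReconnectDelayMs(
  expSeconds: number,
  nowMs: number,
  marginMs: number = 3 * 60 * 1000
): number {
  return Math.max(0, expSeconds * 1000 - nowMs - marginMs);
}

// Usage idea (reconnect is a hypothetical function that refreshes and reopens the stream):
// const timer = setTimeout(reconnect, sseReconnectDelayMs(tokenExp, Date.now()));
```

A freshly issued 15-minute token yields a delay of 720,000 ms (12 minutes), while a token with 4 minutes left yields 60,000 ms.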

What Does Not Change

Area                                              Status
MFA / passkey / TOTP / OTP login flow logic       Unchanged
Bridge token handling                             Unchanged
Upgrade invite token handling                     Unchanged
generateRefreshToken() / verifyJWT() internals    Unchanged
Cookie storage for access tokens                  Unchanged
localStorage storage for refresh tokens           Unchanged
OTT cross-subdomain auth flow                     Unchanged
Refresh token JWT structure / signing algorithm   Unchanged