Notifications and email

Generated from a canonical source

This page is a read-only projection of docs/handoff-corpus/notifications-and-email.md. Edit the canonical file, then run npm --prefix tools/project-knowledge-derive run derive.

What notifications-and-email is for

The invariant you must not break: only allowlisted, subscriber-safe variables may ever reach a rendered email body. email-queue-handler.ts builds the variables map from an explicit allowlist (subscriber.*, charge.*, store.*, link.* only); FALLBACK_VARIABLE_ALLOWLIST in email-template-engine.ts enforces the same set a second time at render, substituting SAFE_PLACEHOLDER for anything outside it. The second layer exists because the first layer already missed once (GH #1329) — producer-side internal fields like failure_code, retry_attempt, stage_index must never reach a subscriber's inbox. (email-queue-handler.ts header comment; email-template-engine.ts FALLBACK_VARIABLE_ALLOWLIST + substituteUnallowedVariables.)

Every merchant-editable email routes through this allowlist twice before a customer ever sees it — that is the one thing an extension must never route around.

The capability breaks into twelve reader-facing features:

  • Transactional lifecycle emails — welcome, upcoming charge, charge succeeded/failed, paused/resumed/cancelled, shipment generated, sent automatically as each lifecycle event fires (US-23.1)
  • Optional SMS for time-sensitive events — charge-failed and shipment-shipped alerts via Twilio, consent-gated (US-23.2, P2)
  • Merchant-editable email templates — Markdown/MJML subject + body with variable placeholders and a live preview before save (US-23.3)
  • Custom sending domain — emails send from the merchant's own verified domain (DKIM/DMARC) instead of a generic platform address, with a 14-day fallback (US-23.9)
  • Deliverability monitoring — per-domain bounce/complaint-rate gate that throttles or pauses sending before a merchant's poor list hygiene damages shared reputation (US-23.10)
  • Suppression list, honored automatically — bounces, complaints, and marketing opt-outs stop future sends; transactional mail still gets through (US-23.11)
  • Passwordless magic-link access — single-use, time-boxed links for portal login (US-23.12)
  • Plain-language decline reasons in dunning email — issuer codes translated to subscriber-friendly copy, content contract only (producer lives in dunning, US-23.13)
  • A broken merchant template never blocks a billing email — automatic fallback to a structured plain-text message plus a merchant alert (US-23.14)
  • No duplicate emails on retry — every send is keyed so a workflow restart or queue redelivery can't double-send (US-23.15)
  • Delivery feedback loop — Resend's bounce/complaint/open/click webhooks close the loop into deliverability and suppression (US-23.16)
  • Localization-ready email pipeline — the renderer accepts a locale parameter today so Phase-2 translations slot in without a call-signature change; Phase-1 always renders en-US (US-23.17)

Dunning's per-stage retry emails and the auto-renewal legal reminder (US-23.18, counsel-gated) ride the same delivery mechanism but are covered in their own domain pages — dunning and ADR-0079, respectively.

The three decisions that carry the most weight:

  • Logical lifecycle events with consumer-side template mapping is the canonical wire-format pattern. Producers publish domain facts (subscription.activated, charge.failed, etc.) via logEvent; the consumer Worker owns the type → template_key mapping. Correct as ratified — but its own ship-status table is stale (see the canonical-framing attestation below). (ADR-0063.)
  • The transactional-outbox property only holds through logEvent. An events row plus EVENTS_QUEUE.send() in one Worker request, marked by queue_published_at, with cron/republish-unpublished-events.ts as the catch-up recovery path. A direct env.EVENTS_QUEUE.send() call skips both the D1 audit trail and the catch-up cron — several live sites still do this (see typed deltas). (ADR-0010 §1.)
  • email.requested is retained but demoted. ADR-0063 keeps the type valid for a future imperative-send use case (an admin "test send" button) but says it "is no longer emitted by lifecycle paths." Live producers still emit it from lifecycle paths — the opposite of what the ADR prescribes. (ADR-0063 "Out of scope" §.)
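
The outbox property in the second decision can be modeled in a few lines. This is a minimal in-memory sketch, not the real db.ts::logEvent: the EventRow shape, the QueueLike interface, and the names logEventSketch and republishUnpublished are assumptions made for illustration. Only the queue_published_at marker and the existence of a catch-up cron come from the text above.

```typescript
// Hypothetical shapes: the real events table and queue binding are not shown on this page.
interface EventRow {
  id: number;
  type: string;
  payload: Record<string, unknown>;
  queue_published_at: number | null; // null = row written, publish not confirmed
}

interface QueueLike {
  send(message: { event_id: number; type: string }): void; // may throw
}

class InMemoryEvents {
  rows: EventRow[] = [];
  insert(type: string, payload: Record<string, unknown>): EventRow {
    const row: EventRow = { id: this.rows.length + 1, type, payload, queue_published_at: null };
    this.rows.push(row);
    return row;
  }
}

// Write the audit row first, then publish; mark the row only if the publish succeeded.
function logEventSketch(
  db: InMemoryEvents,
  queue: QueueLike,
  type: string,
  payload: Record<string, unknown>,
  now: number,
): EventRow {
  const row = db.insert(type, payload);
  try {
    queue.send({ event_id: row.id, type });
    row.queue_published_at = now;
  } catch {
    // Leave queue_published_at null so the catch-up pass can retry.
  }
  return row;
}

// Catch-up pass: republish every row that was never marked as published.
function republishUnpublished(db: InMemoryEvents, queue: QueueLike, now: number): number {
  let republished = 0;
  for (const row of db.rows) {
    if (row.queue_published_at !== null) continue;
    try {
      queue.send({ event_id: row.id, type: row.type });
      row.queue_published_at = now;
      republished++;
    } catch {
      // Still unpublished; the next pass will try again.
    }
  }
  return republished;
}
```

In this model a direct queue send leaves no row behind, so the catch-up pass has nothing to find if the message is lost. That is the gap the second decision describes.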

Canonical-framing attestation (operator-ratified 2026-07-02). Two independent gaps, both traced to code, not inferred from an ADR's prose:

  1. Vendor: no ADR ratifies Resend. ADR-0017 (canonical: true) specifies Postmark as the primary vendor with an idempotent-send outbox table, email_send_log. Zero production code calls Postmark or writes to that table (schema.sql:115-144 defines the table; nothing references it outside schema.sql). Every real send path — email-queue-handler.ts, apps/email-consumer, notify.ts — calls the Resend API against a differently-shaped email_sends table (schema.sql:626-646, provider TEXT NOT NULL DEFAULT 'resend'). No ADR formally supersedes ADR-0017's vendor choice — the closest artifact is email-domain-provisioner.ts's own header comment, which miscites its coupling as "ADR-0017 (Resend as transactional email provider)." ADR-0017's own Decision section never mentions Resend. This is a missing-decision-record gap, not just leftover residue: file it, don't paper over it.
  2. The flagship example doesn't run. ADR-0063's own ship-status table says subscription.renewed → renewal_confirmation is "Shipped today. No change." No producer anywhere calls logEvent(type: 'subscription.renewed') — confirmed by grep; the string exists only inside apps/email-consumer's own mapping table. See the live-state attestation below for the full trace.

How it actually works

Read the pipeline as: a producer fires a domain fact → logEvent writes an events row and publishes to EVENTS_QUEUE in one step (the outbox) → apps/email-consumer's deployed Worker consumes subs-events, maps the event type to a template_key, checks the deliverability gate and the per-send idempotency key, renders (merchant override or built-in default), and POSTs to Resend.

That's the canonical shape. Two things break it in practice, both traced below: several live producers skip logEvent and call EVENTS_QUEUE.send() directly with an email.requested type the deployed consumer doesn't map — and a second, fully-built consumer path never got wired into any deployed Worker at all.

The deployed consumer is apps/email-consumer/src/index.ts, bound to the shared subs-events queue via an active [[queues.consumers]] block in apps/email-consumer/wrangler.toml (main = "src/index.ts"). Its TYPE_TO_TEMPLATE_KEY table (11 entries) maps logical event types to template keys — subscription.activated → welcome, charge.failed → charge_failed, subscription.renewed → renewal_confirmation, and so on (index.ts:65-77). An unmapped type returns 'skip' and the message is acked without sending anything (index.ts:456-460). For each mapped message, processMessage resolves the customer email by joining events → subscriptions → customers (index.ts:227-252), runs the deliverability gate (canSendForDomain, US-23.10 — bounce-rate >5% throttles, complaint-rate >0.1%/0.3% pause-warns or pause-hards, all against a trailing 7-day window with a 20-send minimum sample), looks up a merchant-customized template (email_templates table) or falls back to a per-template_key built-in default, then checks a per-send idempotency key before dispatch.
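
The gate thresholds in the paragraph above read naturally as a small decision function. The sketch below is not canSendForDomain or evaluateAndPersist; it assumes that 0.1% is the pause-warn threshold and 0.3% the pause-hard threshold, that complaint checks take precedence over bounce throttling, and that a domain under the 20-send minimum is left open. The state names and evaluateSendState are hypothetical.

```typescript
// Hypothetical state names; the page only says "throttles", "pause-warns", "pause-hards".
type SendState = "ok" | "throttled" | "paused_warn" | "paused_hard";

// Counts over the trailing 7-day window for one sending domain.
interface WindowStats {
  sends: number;
  bounces: number;
  complaints: number;
}

const MIN_SAMPLE = 20;          // below this, rates are too noisy to act on (assumed behavior)
const BOUNCE_THROTTLE = 0.05;   // bounce rate > 5% throttles
const COMPLAINT_WARN = 0.001;   // complaint rate > 0.1% pause-warns (assumed mapping)
const COMPLAINT_HARD = 0.003;   // complaint rate > 0.3% pause-hards (assumed mapping)

function evaluateSendState(stats: WindowStats): SendState {
  if (stats.sends < MIN_SAMPLE) return "ok";
  const bounceRate = stats.bounces / stats.sends;
  const complaintRate = stats.complaints / stats.sends;
  // Assumed precedence: the most severe condition wins.
  if (complaintRate > COMPLAINT_HARD) return "paused_hard";
  if (complaintRate > COMPLAINT_WARN) return "paused_warn";
  if (bounceRate > BOUNCE_THROTTLE) return "throttled";
  return "ok";
}
```

The minimum-sample guard matters most for new domains: with 10 sends, a single complaint is a 10% rate, which would otherwise pause sending on almost no evidence.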

The idempotency key is {template_key}:{recipient_lowercased}:{source_event_id} (computeIdempotencyKey, index.ts:306-312, US-23.15). A pre-send lookup against the email_idempotency table (migration 0057) short-circuits a resend on queue redelivery; INSERT OR IGNORE plus the table's unique index absorb the narrow race between concurrent invocations on the same key (index.ts:346-373). The same key is also passed as Resend's Idempotency-Key header, so Resend's own dedupe collapses anything that slips past the D1 check.
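
A minimal sketch of the key format and the claim-once behavior, with an in-memory Set standing in for the email_idempotency table and its unique index. The names idempotencyKeyFor, IdempotencyStore, and sendOnce are hypothetical, and claiming the key before dispatch is an assumption about ordering, not something this page confirms about index.ts.

```typescript
// Key format from the text: {template_key}:{recipient_lowercased}:{source_event_id}
function idempotencyKeyFor(templateKey: string, recipient: string, sourceEventId: string): string {
  return `${templateKey}:${recipient.toLowerCase()}:${sourceEventId}`;
}

class IdempotencyStore {
  private keys = new Set<string>();

  // Models INSERT OR IGNORE against a unique index: true only for the first claim of a key.
  claim(key: string): boolean {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// The same key is handed to the sender so a provider-side dedupe can use it too.
function sendOnce(
  store: IdempotencyStore,
  key: string,
  send: (idempotencyKey: string) => void,
): "sent" | "duplicate" {
  if (!store.claim(key)) return "duplicate";
  send(key);
  return "sent";
}
```

Lowercasing the recipient means a redelivery that carries A@Example.com and one that carries a@example.com collapse to the same key.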

A second consumer entry point exists in the same package but is not deployed. apps/email-consumer/src/worker.ts (v0.2) routes portal.magic_link_requested and charge.failed-shaped dunning messages to their own handlers before falling through to the v0.1 processMessage. Its own header says: "To activate: update wrangler.toml main from src/index.ts to src/worker.ts." wrangler.toml's main field is still src/index.ts; worker.ts has never been the deployed entry point. Concretely: the magic-link flow does everything right on the producer side — routes/portal/auth/request-link.ts calls logEvent(type: 'portal.magic_link_requested') and separately sends the same type directly to EVENTS_QUEUE (lines 121-138) — but the deployed index.ts consumer's TYPE_TO_TEMPLATE_KEY has no portal.magic_link_requested entry, so that message hits the same unmapped-type 'skip' path as everything else this section documents. Magic-link portal-login email is undeliverable via the live consumer today, for the identical root cause as the renewal-confirmation gap below.

A third, separate handler is real, tested, and unreachable for a different reason. apps/api/src/services/email-queue-handler.ts implements the email.requested-shaped consumer (suppression check with the transactional-bypass rule, template render, Resend send, email_sends insert) and is wired as apps/api/src/worker.ts's own queue() export, G4-tested by epic-23-delivery.scenario.ts. But infra/cloudflare/wrangler.toml — the subs-api Worker's own deploy config, the same file worker.ts belongs to — has no active [[queues.consumers]] block; it is commented out with the note "Until consumers exist, queue messages are discarded..." (lines 71-84). Cloudflare never invokes a Worker's queue() export without a consumer binding. handleEmailQueue cannot run in the deployed subs-api Worker today, regardless of its own test coverage.

Read the failure as: three producers publish email.requested directly via EVENTS_QUEUE.send(), bypassing logEvent — the renewal-confirmation path in scheduler.ts, the subscriber-initiated resume in routes/portal/resume.ts, and the system resume sweep in cron/pause-resume-sweep.ts. All three land on the same deployed queue, consumed by the same index.ts Worker whose mapping table has no email.requested entry at all — so every one of those messages hits 'skip' and is silently acked. The renewal-confirmation dead end is the flagship case both ADR-0063 and the epic-23 derived view cite as proof the pattern ships:

sequenceDiagram
    autonumber
    participant Proc as scheduler.ts::processCharge
    participant DB as D1
    participant Queue as EVENTS_QUEUE / email.requested
    participant Consumer as email-consumer/index.ts::processMessage

    Proc->>DB: markChargeAttempted(charge.id, dbResult) (scheduler.ts:1112)
    Proc->>DB: logEvent('charge.succeeded') (db.ts::logEvent → INSERT events)
    Proc->>Queue: EVENTS_QUEUE.send({type:'email.requested', template_key:'renewal_confirmation'})
    Note over Proc,Queue: Bypasses logEvent — no events row, no outbox marker,<br/>no republish-unpublished-events coverage (ADR-0010 §1).
    Queue->>Consumer: message.type = 'email.requested'
    Consumer-->>Queue: TYPE_TO_TEMPLATE_KEY['email.requested'] is undefined → return 'skip'
    Note over Consumer: message.ack() — dropped silently, no retry, no DLQ.

Diagram provenance. The EVENTS_QUEUE.send line and its surrounding participants are transcribed verbatim from § 1 "Renewal — end-to-end charge sequence" of the canonical, code-sourced docs/architecture/sequence-diagrams.md (sign_off: pending — accurate to the code, not yet human-attested). This is a focused excerpt: the full renewal sequence (order creation, the BC Payments three-call charge, dunning branch) is already transcluded verbatim in canonical-charge-rail.md and dunning.md — both share that one source file's provenance, not three independent diagrams. The Consumer participant and its two messages are added here (not present in the source diagram) to show what actually happens to the message on the receiving end, traced directly against email-consumer/index.ts:65-77 and :456-460 — not part of the transcluded mermaid block itself.
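
The dead end in the diagram comes down to a table lookup that misses. A minimal sketch, using only the three mappings this page names (the real TYPE_TO_TEMPLATE_KEY has 11 entries); the Map, the RouteResult type, and routeMessage are illustrative names, not the shapes used in index.ts.

```typescript
// Three of the mappings cited on this page; the remaining entries are not reproduced here.
const TYPE_TO_TEMPLATE_KEY_SKETCH = new Map<string, string>([
  ["subscription.activated", "welcome"],
  ["charge.failed", "charge_failed"],
  ["subscription.renewed", "renewal_confirmation"],
]);

type RouteResult = { action: "send"; templateKey: string } | { action: "skip" };

// An unmapped type is not an error: the message is acknowledged and nothing is sent.
function routeMessage(type: string): RouteResult {
  const templateKey = TYPE_TO_TEMPLATE_KEY_SKETCH.get(type);
  return templateKey === undefined ? { action: "skip" } : { action: "send", templateKey };
}

// What the three message types discussed in this section resolve to in this sketch.
const renewedRoute = routeMessage("subscription.renewed");          // mapped, but no producer emits it
const requestedRoute = routeMessage("email.requested");             // emitted, but unmapped
const magicLinkRoute = routeMessage("portal.magic_link_requested"); // emitted, but unmapped
```

Because 'skip' is indistinguishable from success at the queue level, nothing retries and nothing lands in a dead-letter queue, which is why these gaps stay silent.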

The one path that does deliver, redundantly. System-triggered resume (cron/pause-resume-sweep.ts:69-89) calls logEvent(type: 'subscription.resumed') and separately sends a direct email.requested message. subscription.resumed is in TYPE_TO_TEMPLATE_KEY (index.ts:73), so the logEvent path delivers the resume email correctly — the email.requested duplicate alongside it is simply dead weight, not a second delivery (Resend never receives two sends for one resume; only the mapped-type message reaches a template). Subscriber-initiated resume (routes/portal/resume.ts:83-93) has no such luck — it fires only email.requested, with no compensating logEvent call anywhere in the handler, so it has no path to delivery at all.

The invariant enforcement, concretely. email-queue-handler.ts (the unreachable subs-api consumer) builds its variables map from six allowlisted keys only — subscriber.first_name, subscriber.email, subscription.id, store.name, link.url, link.label, plus charge.amount / charge.currency when present (email-queue-handler.ts:288-297). renderTemplateWithFallback (email-template-engine.ts:318-379) enforces the same set a second time at render: any {{variable}} reference outside FALLBACK_VARIABLE_ALLOWLIST is substituted with [redacted] before the merchant template ever runs, and a template.render_failed event is emitted so the merchant is alerted their template referenced something it shouldn't (US-23.14). If the render throws outright, the same function falls back to a structured plain-text body built from four safe fields (buildFallbackBody) — the customer still gets an email, never a dropped notification and never a leaked internal field.
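
A minimal sketch of the render-time layer, assuming a plain {{name}} placeholder syntax and the variable keys listed in the paragraph above. This is not substituteUnallowedVariables or renderTemplateWithFallback; redactUnallowed, renderSketch, and the two constants are hypothetical names, and the real allowlist contents may differ.

```typescript
// Assumed allowlist, copied from the keys named in the paragraph above.
const ALLOWLIST_SKETCH = new Set<string>([
  "subscriber.first_name", "subscriber.email", "subscription.id", "store.name",
  "link.url", "link.label", "charge.amount", "charge.currency",
]);
const PLACEHOLDER_SKETCH = "[redacted]";

interface RedactionResult {
  template: string;   // template with unallowed references replaced
  redacted: string[]; // names that were replaced, e.g. to include in a merchant alert
}

// Layer two: rewrite the template itself before any variable values are applied.
function redactUnallowed(template: string): RedactionResult {
  const redacted: string[] = [];
  const out = template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, name: string) => {
    if (ALLOWLIST_SKETCH.has(name)) return match;
    redacted.push(name);
    return PLACEHOLDER_SKETCH;
  });
  return { template: out, redacted };
}

// Substitute values only into the already-redacted template.
function renderSketch(template: string, vars: Readonly<Record<string, string | undefined>>): string {
  const safe = redactUnallowed(template).template;
  return safe.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, name: string) => vars[name] ?? "");
}
```

Redacting the template rather than filtering the variables map is what makes the second layer independent of the first: even if an internal field such as failure_code slips into vars, no placeholder remains that could pull it into the body.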

Suppression is a two-tier gate, live in both consumer paths. email_suppressions.reason distinguishes hard blocks (bounced, complained, manual/DSAR-erasure — never sent, any template) from unsubscribed (marketing opt-out only — transactional templates listed in TRANSACTIONAL_TEMPLATE_KEYS bypass it, email-queue-handler.ts:49-73). The deployed index.ts consumer instead gates on domain-level deliverability state (canSendForDomain) rather than this per-recipient suppression table — the two consumers implement related but not identical policy, since only one of them (email-queue-handler.ts) is reachable from email.requested-shaped messages, and that path is itself unreachable per the wrangler binding gap above.
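
The two tiers reduce to one predicate. A minimal sketch under stated assumptions: the reason values follow the paragraph above, but the membership of TRANSACTIONAL_KEYS_SKETCH is invented for illustration (the real TRANSACTIONAL_TEMPLATE_KEYS set is not reproduced on this page), and shouldSend is a hypothetical name.

```typescript
type SuppressionReason = "bounced" | "complained" | "manual" | "unsubscribed";

// Hypothetical membership, for illustration only.
const TRANSACTIONAL_KEYS_SKETCH = new Set<string>(["charge_failed", "renewal_confirmation"]);

// reason === null means the recipient has no suppression row.
function shouldSend(reason: SuppressionReason | null, templateKey: string): boolean {
  if (reason === null) return true;
  if (reason === "unsubscribed") {
    // Marketing opt-out: only transactional templates bypass it.
    return TRANSACTIONAL_KEYS_SKETCH.has(templateKey);
  }
  // bounced, complained, manual (including DSAR erasure): never send, any template.
  return false;
}
```

Under these assumptions, an unsubscribed recipient still receives charge_failed, while a bounced recipient receives nothing regardless of template.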

Sending internal, non-subscriber emails skips the queue entirely. notify.ts is a thin, direct Resend wrapper used only for the processor-health-down alert to a merchant contact (US-2.5) — deliberately not routed through the outbox or email-queue-handler.ts's deliverability/suppression checks, since it is low-volume, best-effort, and operator-facing rather than subscriber-facing (file header, notify.ts:1-27).

Localization is architecture-only for email today. The renderer accepts no locale parameter anywhere in the call chain — grepping locale across email-template-engine.ts, email-queue-handler.ts, and email-consumer/index.ts returns nothing. Only the en-US catalog exists on disk (apps/i18n/catalogs/en-US/transactionalEmail.json). ADR-0006's four-part i18n strategy and ADR-0017 §5's subscriber.preferred_locale → store.default_locale → 'en-US' fallback chain are unimplemented for email specifically — every email renders en-US regardless of subscriber locale, which is the deliberate Phase-1 scope per BRD §US-23.17 AC4, not a bug.
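
For reference, the fallback chain that ADR-0017 §5 describes could look like the sketch below once it is implemented. Nothing like this exists in the render chain today; resolveLocale, its parameters, and the check against available catalogs are assumptions for illustration.

```typescript
const DEFAULT_LOCALE = "en-US";

// Fallback order from the text: subscriber.preferred_locale, then store.default_locale, then 'en-US'.
// Assumed extra rule: a candidate is used only if a catalog for it exists.
function resolveLocale(
  subscriberPreferred: string | null,
  storeDefault: string | null,
  availableCatalogs: ReadonlySet<string>,
): string {
  for (const candidate of [subscriberPreferred, storeDefault]) {
    if (candidate !== null && availableCatalogs.has(candidate)) return candidate;
  }
  return DEFAULT_LOCALE;
}

// With only the en-US catalog on disk, every input resolves to en-US,
// which matches the Phase-1 behavior described above.
const phaseOneLocale = resolveLocale("fr-FR", "de-DE", new Set(["en-US"]));
```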

Where intent and reality diverge

The derived coverage matrix (_coverage-matrix.json) reports all 18 of Epic-23's US-23.x rows at g4_status: pass. For this domain, that green is the most misleading kind of true — every scenario proving it constructs a MessageBatch and calls handleEmailQueue directly, never through the deployed queue wiring this page just traced. Eight typed deltas:

  • Superseded-framing residue — ADR-0017 (canonical: true) specifies Postmark as the primary vendor with an email_send_log idempotency table; zero code implements either. All real delivery code calls Resend against email_sends (schema.sql:626, provider defaults 'resend'). No ADR formally ratifies this switch — email-domain-provisioner.ts's header comment miscites ADR-0017 as the Resend decision, which it is not. Missing-decision-record gap, not merely leftover residue.
  • Superseded-framing residue — ADR-0063 names exactly two migration targets for the demoted email.requested pattern — term-nudge-sweep.ts and term-decline-sweep.ts — and both have since migrated to logEvent. But the pattern persists, unmigrated, at sites the ADR never enumerated: scheduler.ts (the flagship renewal-confirmation path), routes/portal/resume.ts (subscriber resume, no compensating logical event at all), and cron/pause-resume-sweep.ts (system resume — has a compensating logEvent('subscription.resumed'), so this one path delivers, redundantly, alongside its own dead email.requested duplicate).
  • Built-but-untrodden — TYPE_TO_TEMPLATE_KEY in the deployed email-consumer/index.ts is fully code-complete for its documented flagship case, subscription.renewed → renewal_confirmation, but no producer anywhere calls logEvent(type: 'subscription.renewed'). Renewal-confirmation — the example both ADR-0063 and the epic-23 derived view cite as proof the pattern ships — is not deliverable as currently wired.
  • Built-but-untrodden — apps/api/src/worker.ts exports a queue() handler wired to handleEmailQueue, real and G4-tested (epic-23-delivery.scenario.ts) — but infra/cloudflare/wrangler.toml has no active [[queues.consumers]] binding for subs-events; the block is a commented-out Phase-2 placeholder. Cloudflare does not invoke a Worker's queue() export without a consumer binding, so this handler cannot run in the deployed Worker today regardless of test coverage.
  • Built-but-untrodden — apps/email-consumer/src/worker.ts (v0.2 — adds portal.magic_link_requested and charge.failed-dunning routing atop the v0.1 mapping table) is real, tested code whose own header says how to activate it; wrangler.toml's main still points at src/index.ts. Not deployed — concretely, this is why magic-link portal-login email is also undeliverable via the live consumer today, traced directly above.
  • Verified-but-incomplete — US-23.17 (Localization framework) is G4-pass (epic-23-templating.scenario.ts), and honestly so: the scenario explicitly asserts the Phase-1 MVP no-op (any locale other than en-US returns the en-US body — BRD §US-23.17 AC4). No locale variable is read anywhere in the render chain. ADR-0006's four-part strategy and ADR-0017 §5's locale-resolution chain are unimplemented for email — the green is a correct proof of the intentional no-op, not evidence the framework exists.
  • Named-deferred — five US-23.x admin/portal surfaces are explicitly marked "NOT YET BUILT — forward-looking contract" in their own ui-states blocks (epic-23 derived view), with no matching route in the repo: the subscriber notification-preferences toggle (US-23.1), the admin suppression-list UI (US-23.11), the deliverability dashboard (US-23.10), merchant alerts/digest config (US-23.4), and per-locale template management (US-23.17 admin side).
  • Contract-verified, not live-verified — every US-23.x row is G4-pass via epic-23-delivery.scenario.ts / epic-23-templating.scenario.ts, both of which construct a MessageBatch and call handleEmailQueue (or the render functions) directly — never through the deployed queue wiring traced above. This is a stronger claim than dunning's own "G4 mocks the queue" note: here the routing itself, not just the send step, is unproven live. No G5/live-delivery evidence exists in this repo for Epic-23 email beyond dunning's already-attested mocked-queue G4 tier.

How to operate & extend

  • Add a new lifecycle email: the producer calls logEvent(type: '...') (never EVENTS_QUEUE.send() directly — that bypasses the D1 audit trail and the catch-up cron per ADR-0010 §1), and the deployed email-consumer/index.ts gets one new row in TYPE_TO_TEMPLATE_KEY plus a matching DEFAULT_TEMPLATES entry (index.ts:65-173). Two-file coordination by design (ADR-0063's stated tradeoff) — producer event and consumer mapping row, nothing else.
  • The invariant you must not break: the double-layer variable allowlist (above). Any new template variable must be added to both email-queue-handler.ts's explicit build step and FALLBACK_VARIABLE_ALLOWLIST in email-template-engine.ts — adding it to only one leaves either a missing-variable bug or a leak vector.
  • Fixing the flagship renewal-confirmation gap (the highest-leverage fix in this domain): replace scheduler.ts's direct EVENTS_QUEUE.send({type: 'email.requested', ...}) (around scheduler.ts:1226-1234) with logEvent(type: 'subscription.renewed', ...). The consumer-side mapping already exists and needs no change — this is a pure producer-side fix that turns a dead-on-arrival delta into a shipped feature.
  • Before wiring handleEmailQueue live: uncomment and configure [[queues.consumers]] in infra/cloudflare/wrangler.toml — but first decide whether subs-api's email.requested-shaped consumer or email-consumer's logical-event consumer is the one canonical path going forward; running both against the same subs-events queue would double-consume messages neither currently reads.
  • Activating v0.2 (magic-link, dunning-routed charge.failed): update apps/email-consumer/wrangler.toml's main from src/index.ts to src/worker.ts — the routing code and its handlers already exist and are tested; only the deploy config needs to change.
  • Where merchants customize templates: email_templates D1 table, looked up by (store_hash, template_key) in both consumer paths — a present row wins over DEFAULT_TEMPLATES / BUILT_IN_SUBJECTS/BUILT_IN_BODIES.
  • Where deliverability state lives: email_domains.send_state — read by canSendForDomain before every dispatch in the deployed consumer, written by evaluateAndPersist in deliverability-monitor.ts against a trailing 7-day window over email_sends.
  • Extension seams: new consumer Workers can subscribe to the same subs-events queue and read the same logical events without any producer-side change (ADR-0063's explicit multi-consumer goal — Epic 27's outbound-webhook Worker is the sibling example); a new lifecycle type needs only the two-file coordination above.
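
The template-precedence rule in the list above (a merchant row wins over the built-in default) can be sketched as a two-step lookup. The Maps stand in for the email_templates table and the built-in defaults; the Template shape, the composite key format, and resolveTemplate are hypothetical.

```typescript
interface Template {
  subject: string;
  body: string;
}

// overrides models email_templates, keyed here by "store_hash:template_key" (assumed encoding).
// defaults models the built-in per-template_key defaults.
function resolveTemplate(
  overrides: ReadonlyMap<string, Template>,
  defaults: ReadonlyMap<string, Template>,
  storeHash: string,
  templateKey: string,
): Template | undefined {
  return overrides.get(`${storeHash}:${templateKey}`) ?? defaults.get(templateKey);
}

// Example data: one store has customized the welcome email, another has not.
const defaultsSketch = new Map<string, Template>([
  ["welcome", { subject: "Welcome", body: "Thanks for subscribing." }],
]);
const overridesSketch = new Map<string, Template>([
  ["store_a:welcome", { subject: "Welcome to Store A", body: "Glad you are here." }],
]);
const customized = resolveTemplate(overridesSketch, defaultsSketch, "store_a", "welcome");
const builtIn = resolveTemplate(overridesSketch, defaultsSketch, "store_b", "welcome");
```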

Confidence notes

  • Magic-link undeliverability is my own trace, extending Input-B's typed delta rather than restating it verbatim. Input-B's "Built-but-untrodden" delta for v0.2 describes the file generically ("adds portal.magic_link_requested and charge.failed-dunning routing"). I additionally traced request-link.ts and confirmed it correctly calls logEvent and a direct EVENTS_QUEUE.send, and that the deployed index.ts consumer's TYPE_TO_TEMPLATE_KEY has no entry for portal.magic_link_requested — meaning even the logEvent-correct path dead-ends on the receiving side, for the same wrangler-main reason as the delta already names. I present this as a concrete instance of Input-B's existing delta, not a new, independently-typed one.
  • The redundant-vs-dead distinction for the two resume paths (system resume delivers redundantly; subscriber-initiated resume has no delivery path at all) is my own trace of pause-resume-sweep.ts:69-89 against resume.ts:83-93 and TYPE_TO_TEMPLATE_KEY, cross-checking Input-B's live_state attestation, which states the same conclusion in prose. I independently confirmed the code shape rather than only citing the attestation.
  • notify.ts and deliverability-monitor.ts are read for Move 2 context, not separately re-attested. Input-B's typed deltas don't mention either file; I traced them myself to explain the two-tier suppression/gate picture and the internal-alert exception. Neither changes any typed delta — they're mechanism detail supporting the deltas already ratified.
  • I did not independently re-run any G4 scenario or a live send — the "contract-verified, not live-verified" delta and the coverage-matrix numbers are relayed from Input-B's live_state attestation and the derived coverage matrix file, both already operator-ratified 2026-07-02.