Test strategy — bc-subscriptions

Generated from a canonical source

This page is a read-only projection of docs/methodology/test-strategy.md. Edit the canonical file, then run npm --prefix tools/project-knowledge-derive run derive.

Consolidated statement of what testing exists, what it proves, and how it maps to the Definition-of-Done verification ladder (ADR-0067). This document describes the real runtime — Cloudflare Workers + D1. If a claim here can't be traced to a config file, tool, or ADR in the repo, it doesn't belong.

The organizing frame: G1-G5

tools/state-derive and the coverage tooling exist because "done" is not one question — it's five, and conflating them produces false confidence. Per ADR-0067:

| Gate | Question | Oracle | What it does NOT prove |
|---|---|---|---|
| G1 spec | Is the AC written? | Traceability registry (BRD/PRD parse) | Nothing built yet |
| G2 prototype | Does a prototype screen exist? | Traceability prototypePages join | Production behavior |
| G3 presence | Are the expected code artifacts present? | tools/state-derive: static grep/file_exists/schema_has_* checks | Whether the artifact works |
| G4 behavior | Does a runnable scenario exist and pass? | Recorded scenario-results evidence (tools/scenario-results/) | Whether it works against the deployed app |
| G5 live | Green vs the deployed app + live environment? | Live E2E run or blocked-pending-external marker | |

G3 is a ceiling, not a verdict of correctness. Every one of state-derive's 1,166 check primitives is static (grep_present, file_exists, schema_has_column, etc.) — none executes code. COMPLIANT means present, never shipped or working (tools/state-derive/OWNER-SPEC.md). The canonical failure example: a storefront widget injector emitting one selector while the PDP renderer reads another — both files exist, both presence checks pass, the feature is broken. Presence checks structurally cannot catch seam bugs, because seam bugs live in the behavior between artifacts that each individually exist.
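
A minimal sketch of the seam-bug failure mode described above. The file names, selector strings, and helper functions are invented for illustration; they only mimic the spirit of state-derive's static primitives and are not its implementation.

```typescript
// Hypothetical file contents; the selector names are invented for illustration.
const files: Record<string, string> = {
  "widget-injector.ts": 'export const MOUNT_SELECTOR = "#subscription-widget";',
  "pdp-renderer.ts": 'export const READ_SELECTOR = "#subscribe-widget";',
};

// Static checks in the spirit of file_exists / grep_present: they read text, never run it.
function fileExists(path: string): boolean {
  return path in files;
}
function grepPresent(path: string, pattern: RegExp): boolean {
  return fileExists(path) && pattern.test(files[path]);
}

// G3-style verdict: both artifacts are present.
const presenceCompliant =
  grepPresent("widget-injector.ts", /MOUNT_SELECTOR/) &&
  grepPresent("pdp-renderer.ts", /READ_SELECTOR/);

// Behavioral question: do the two sides agree on the selector they use?
function extractSelector(path: string): string | undefined {
  return /"([^"]+)"/.exec(files[path])?.[1];
}
const seamAgrees =
  extractSelector("widget-injector.ts") === extractSelector("pdp-renderer.ts");

console.log({ presenceCompliant, seamAgrees });
```

Both presence checks pass while the seam comparison fails, which is the gap only a behavioral (G4) check can see.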

Monotone claims. A gate cannot be claimed without the gates below it. "G4 green" with no G3 evidence is a register bug. blocked-pending-external is first-class (e.g. the bc-payments charge rail, blocked on PI-5062 + partner-track beta) — distinguishable from "unbuilt," never miscounted as a gap or a pass.
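
The monotone rule can be sketched as a small resolver. This is an illustrative model, not the state-derive implementation: the Evidence values other than blocked-pending-external, the Verdict shape, and resolveLadder are all assumed names.

```typescript
type Gate = "G1" | "G2" | "G3" | "G4" | "G5";
type Evidence = "pass" | "fail" | "missing" | "blocked-pending-external";

const LADDER: Gate[] = ["G1", "G2", "G3", "G4", "G5"];

interface Verdict {
  highest: Gate | null;   // highest gate that may be claimed
  blockedAt: Gate | null; // gate where the ladder stops on an external blocker, if any
  registerBugs: Gate[];   // gates marked "pass" above a gate that is not "pass"
}

function resolveLadder(evidence: Record<Gate, Evidence>): Verdict {
  let highest: Gate | null = null;
  let blockedAt: Gate | null = null;
  let broken = false;
  const registerBugs: Gate[] = [];

  for (const gate of LADDER) {
    const e = evidence[gate];
    if (!broken && e === "pass") {
      highest = gate; // still climbing an unbroken chain
      continue;
    }
    if (!broken) {
      broken = true; // first gate without a pass ends the claimable chain
      if (e === "blocked-pending-external") blockedAt = gate;
      continue;
    }
    if (e === "pass") registerBugs.push(gate); // a pass above a hole is a register bug
  }
  return { highest, blockedAt, registerBugs };
}

const orphanClaim = resolveLadder({
  G1: "pass", G2: "pass", G3: "missing", G4: "pass", G5: "missing",
});
const blocked = resolveLadder({
  G1: "pass", G2: "pass", G3: "pass", G4: "pass", G5: "blocked-pending-external",
});
console.log(orphanClaim, blocked);
```

In this model a "G4 green" with no G3 evidence caps the claim at G2 and reports G4 as a register bug, while an externally blocked G5 is reported separately from both a gap and a pass.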

ADR-0076 (proof-obligation registry, tools/spec-obligation-registry/) generalizes this ladder below AC-grain to arbitrary "prove it" claims (telemetry events, data-contract fields, NFRs) with the same law: a proof's evidence must come from a source the claim does not control. The registry does not replace the ladder: the five gates remain in force as five of its obligations.

Test categories mapped to the real stack

Unit — apps/api/vitest.config.ts

Runs via @cloudflare/vitest-pool-workers, which executes every test inside an actual workerd instance via miniflare, with a real in-memory D1 binding — not a mocked D1 client. Migrations are loaded once at config time via readD1Migrations() against apps/api/migrations/schema/, exposed as a TEST_MIGRATIONS binding; tests call applySchema() in beforeEach, which delegates to applyD1Migrations(env.DB, env.TEST_MIGRATIONS). This means the test schema tracks production migrations automatically. Covers individual primitives — one route handler, one function, one query — in isolation.
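
A hedged sketch of what a config with this shape can look like, based on the @cloudflare/vitest-pool-workers config helpers. The real apps/api/vitest.config.ts may differ; the wrangler.toml location and the use of __dirname are assumptions.

```typescript
// Sketch only: mirrors the description above, not the repo's actual config file.
import path from "node:path";
import {
  defineWorkersConfig,
  readD1Migrations,
} from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig(async () => {
  // Read the production migration files once, at config time.
  const migrations = await readD1Migrations(
    path.join(__dirname, "migrations/schema"),
  );

  return {
    test: {
      poolOptions: {
        workers: {
          // Assumed location of the Worker config that declares the D1 binding.
          wrangler: { configPath: "./wrangler.toml" },
          miniflare: {
            // Exposed to tests as env.TEST_MIGRATIONS.
            bindings: { TEST_MIGRATIONS: migrations },
          },
        },
      },
    },
  };
});
```

Inside a test, the binding is then consumed by applyD1Migrations(env.DB, env.TEST_MIGRATIONS) from cloudflare:test, which is what applySchema() is described as delegating to.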

Run: npm test (root-level, fans out via Turborepo to every workspace member with a test script — currently apps/api + apps/i18n).

Scenarios / behavioral (G4) — apps/api/vitest.scenarios.config.mts

Separate config from the unit gate (Hive #583): extended timeout (testTimeout: 30_000) and separate reporter output, so a slow behavioral suite never throttles the unit gate. Uses the same @cloudflare/vitest-pool-workers pool as the unit config: real workerd, real in-memory D1, same applySchema() seeding contract. This is the load-bearing point: scenarios run against the actual Workers + D1 runtime, so a schema/code mismatch that an inline-schema unit test would silently miss (a CREATE TABLE fixture in the test file that doesn't match the real migration) surfaces as a real D1_ERROR / constraint failure. That is an honest RED, not a false GREEN (see "Inline-schema unit tests false-green" under Known test landmines).

Scenarios live at apps/api/test/scenarios/*.scenario.ts — one file per epic/feature area (epic-01-install-registration.scenario.ts through epic-27-*, plus cross-cutting files like charge-retry-backoff.scenario.ts). Per apps/api/test/scenarios/README.md: a scenario is added when a BRD AC describes a multi-step, multi-route flow; each scenario has an initial() DB seed and named steps with do/expect. Scenarios reference their AC via the acs: [...] tag field (per ADR-0067's D2 resolution) so results join back to the AC in the coverage matrix.
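
To make the scenario shape concrete, here is a toy model. Only acs, initial(), and steps with do/expect come from the description above; the type names, the runScenario helper, and the in-memory state are hypothetical. Real scenarios seed and query D1 and are presumably async, which this sketch does not attempt.

```typescript
interface Step<S> {
  name: string;
  do: (state: S) => void;
  expect: (state: S) => boolean;
}
interface Scenario<S> {
  title: string;
  acs: string[]; // AC tags that join results back to the coverage matrix
  initial: () => S; // stand-in for the DB seed
  steps: Step<S>[];
}
interface ScenarioResult {
  title: string;
  acs: string[];
  passed: boolean;
  failedStep?: string;
}

// Runs steps in order and stops at the first failed expectation.
function runScenario<S>(scenario: Scenario<S>): ScenarioResult {
  const state = scenario.initial();
  for (const step of scenario.steps) {
    step.do(state);
    if (!step.expect(state)) {
      return { title: scenario.title, acs: scenario.acs, passed: false, failedStep: step.name };
    }
  }
  return { title: scenario.title, acs: scenario.acs, passed: true };
}

// Hypothetical in-memory state standing in for D1 rows.
interface GiftState {
  gifts: Map<string, { claimedBy: string | null }>;
}

const giftScenario: Scenario<GiftState> = {
  title: "gift purchase then claim",
  acs: ["US-6.1"],
  initial: () => ({ gifts: new Map() }),
  steps: [
    {
      name: "purchase creates an unclaimed gift",
      do: (s) => { s.gifts.set("gift-1", { claimedBy: null }); },
      expect: (s) => s.gifts.get("gift-1")?.claimedBy === null,
    },
    {
      name: "claim assigns the gift to the recipient",
      do: (s) => {
        const gift = s.gifts.get("gift-1");
        if (gift) gift.claimedBy = "recipient@example.test";
      },
      expect: (s) => s.gifts.get("gift-1")?.claimedBy === "recipient@example.test",
    },
  ],
};

const giftResult = runScenario(giftScenario);
console.log(giftResult);
```

Carrying acs on the result is the design point: a pass or fail can be attributed to an AC without re-parsing the scenario file.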

Run: npm run test:scenarios (or npx vitest --config apps/api/vitest.scenarios.config.mts run).

In CI, setting SCENARIO_RESULTS_JSON makes the config additionally emit a Vitest json reporter file, which tools/scenario-results/ normalizes into _scenario-results.json — the artifact state-derive's scenario_passes check primitive reads to populate G4. This is how a scenario pass (not just a scenario file existing) becomes a mechanically-derived G4 verdict.
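
A rough sketch of the normalization step. The input interface covers only the subset of the Vitest JSON reporter output used here (testResults, name, assertionResults, status). The output shape, the pass rule, and the file names are assumptions; the real _scenario-results.json schema is not shown in this document.

```typescript
// Subset of the Vitest JSON reporter output that this sketch relies on.
interface VitestJsonReport {
  testResults: Array<{
    name: string; // test file path
    assertionResults: Array<{ fullName: string; status: string }>;
  }>;
}

// Hypothetical normalized record, one per scenario file.
interface NormalizedScenario {
  file: string;
  passed: boolean;
  failures: string[];
}

function normalize(report: VitestJsonReport): NormalizedScenario[] {
  return report.testResults.map((file) => {
    const failures = file.assertionResults
      .filter((a) => a.status !== "passed")
      .map((a) => `${a.fullName} (${a.status})`);
    return {
      file: file.name,
      // Assumed rule: a file with no assertions, or any non-passed one, has not passed.
      passed: file.assertionResults.length > 0 && failures.length === 0,
      failures,
    };
  });
}

// Invented sample data, not real results from the repo.
const sampleReport: VitestJsonReport = {
  testResults: [
    {
      name: "test/scenarios/example-a.scenario.ts",
      assertionResults: [
        { fullName: "example a > step one", status: "passed" },
        { fullName: "example a > step two", status: "passed" },
      ],
    },
    {
      name: "test/scenarios/example-b.scenario.ts",
      assertionResults: [
        { fullName: "example b > step one", status: "passed" },
        { fullName: "example b > step two", status: "skipped" },
      ],
    },
  ],
};

const normalized = normalize(sampleReport);
console.log(normalized);
```

Treating a skipped assertion as "not passed" keeps a file that exists but does not fully run from being counted as a G4 pass.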

E2E / Playwright

Three configs, each scoped to a different app surface, none of them a monolith:

  • apps/admin/playwright.config.ts — admin surface (React/BigDesign).
  • apps/storefront-svelte/playwright.config.ts — Svelte storefront widget/portal.
  • apps/storefront-catalyst/playwright.config.ts — Catalyst storefront host.

The e2e-tier2 gate (.github/workflows/e2e-tier2.yml) runs two jobs against deploy-affecting PRs (paths: apps/api/**, apps/admin/**, e2e/**, infra/cloudflare/**) plus every push to main:

  • contract: Vitest + fetch, verifies the deployed admin↔API contract.
  • behavior: playwright-bdd @runnable scenarios, verifies client-side routing against a locally-built admin.

Per the ADR-0067 rework note in the workflow header: the deployed admin sits behind Cloudflare Access (401), so "run against the deployed surface" could never produce a verdict. This gate is therefore G4 (build-behavior), not G5 (live). @skip-tagged scenarios remain documentation-of-intent pending the isolation surface from ADR-0065.

Consumer-facing flows have their own path-scoped gate: .github/workflows/consumer-flows-e2e.yml, triggered on PRs touching apps/storefront-catalyst/** or apps/storefront-svelte/**.

ADR-0065's validated isolation pattern (amended 2026-06-25 after implementation): not per-test D1-store provisioning against a real sandbox, but worker.fetch() + a minted HS256 JWT (signBcSignedPayload, verified by the real authenticateRequest) + an injected EVENTS_QUEUE capturing sink, against the existing applySchema'd ephemeral D1. This drives the REAL router (proving route-wiring, not just handler logic) without needing a deployed target or real BC sandbox. The exemplar is apps/api/test/scenarios/settings-store-full-stack.scenario.ts. Browser-render Playwright specs (e.g. apps/admin/tests/e2e/store-settings.spec.ts) stay stub-auth + page.route()-mocked — real for render/UI-state, but structurally cannot catch a route-orphan (handler exists and passes G4, but is never wired into worker.ts) because the API layer is mocked out. That gap is exactly what the worker.fetch scenario layer closes.
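
The dispatch half of this pattern can be sketched with a stand-in worker. Everything here is illustrative: the worker object, the /api/settings route, and the event shape are invented, and the HS256 JWT minting (signBcSignedPayload) and real authenticateRequest are replaced by a placeholder bearer check to keep the sketch short. It assumes a runtime with global Request and Response.

```typescript
interface QueueLike {
  send(message: unknown): Promise<void>;
}
interface Env {
  EVENTS_QUEUE: QueueLike;
}

// Capturing sink: records what the worker enqueues instead of delivering it.
function capturingQueue(): QueueLike & { sent: unknown[] } {
  const sent: unknown[] = [];
  return {
    sent,
    async send(message: unknown): Promise<void> {
      sent.push(message);
    },
  };
}

// Stand-in for the real worker; only /api/settings is wired into the router.
const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (request.method === "PUT" && pathname === "/api/settings") {
      const auth = request.headers.get("authorization") ?? "";
      // Placeholder for real JWT verification.
      if (!auth.startsWith("Bearer ")) return new Response("unauthorized", { status: 401 });
      await env.EVENTS_QUEUE.send({ type: "settings.updated", payload: await request.json() });
      return new Response(JSON.stringify({ ok: true }), {
        status: 200,
        headers: { "content-type": "application/json" },
      });
    }
    return new Response("not found", { status: 404 });
  },
};

async function demo(): Promise<{ status: number; orphanStatus: number; events: number }> {
  const queue = capturingQueue();
  const env: Env = { EVENTS_QUEUE: queue };
  const wired = await worker.fetch(
    new Request("https://example.test/api/settings", {
      method: "PUT",
      headers: { authorization: "Bearer test-token", "content-type": "application/json" },
      body: JSON.stringify({ currency: "USD" }),
    }),
    env,
  );
  // A handler that exists but was never routed shows up as a 404 at this layer.
  const orphan = await worker.fetch(
    new Request("https://example.test/api/gifts", { method: "POST" }),
    env,
  );
  return { status: wired.status, orphanStatus: orphan.status, events: queue.sent.length };
}

demo().then((result) => console.log(result));
```

Because the request enters through fetch(), an unrouted path fails here even if its handler would pass when called directly, and the sink makes the side effect assertable without a real queue.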

Route-reachability — .github/workflows/route-orphan-lint.yml

PR-triggered on changes to apps/api/src/routes/**, apps/api/src/worker.ts, apps/api/test/**. Closes the specific "G4 green scenario calls the handler directly, never proving it's wired into the real router" gap (see landmines below).

Security — .github/workflows/gitleaks.yml

Runs on every pull_request (Tier 1 per Hive #1036). Scans the PR diff for committed secrets. Push-side scanning was deliberately dropped (Hive #1371, 2026-05-20) — worktree-push events fired the scan before a PR existed, doubling run count with no added coverage; a future narrow push: branches: [main] trigger is reserved for a force-push-bypasses-PR scenario, not added speculatively.

Dependency scanning: .github/dependabot.yml — weekly (Monday 06:00) npm updates, opened against dev (not main — dependabot-to-main caused recurring drift), grouped by security vs patch update-type.

No Lighthouse-CI workflow exists in this repo (verified: no .github/workflows/*lighthouse* file). Performance/perf-budget gating is not currently part of this stack — do not cite it as present.

Tooling itself — .github/workflows/tools-tests.yml

PR-triggered on tools/** changes — the substrate tools (state-derive, coverage-matrix-derive, scenario-results, etc.) carry their own test suites and are gated the same as application code.

The two-tier coverage truth

tools/coverage-matrix-derive produces docs/audits/derived/_coverage-matrix.{json,md}, the per-AC × per-test-type instrument. It draws a hard line between two numbers that must never be conflated:

  • TAGGED (acs_with_any_test) — a test file carries an intentional tag linking it to an AC: acs: [...] in *.scenario.ts, @ac:US-X.Y in *.feature, or a BRD §US-X.Y prose comment. This is presence-of-claim, not presence-of-proof — a bare mention of an AC ID does not count; the tag itself is the unit of backfill work.
  • G4-VERIFIED (acs_g4_verified) — a scenario tagged to the AC actually passed, per the committed _state.json's scenario_passes verdict.
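
A sketch of how the three tag forms could be recognized while ignoring bare mentions. The regular expressions and the extractAcTags name are assumptions about the tag grammar, not the coverage-matrix-derive implementation.

```typescript
// Assumed AC id shape, e.g. US-6.1 or US-26.10.
const AC_ID = String.raw`US-\d+\.\d+`;

function extractAcTags(fileName: string, source: string): string[] {
  const found = new Set<string>();

  // acs: [...] arrays count only in scenario files.
  if (fileName.endsWith(".scenario.ts")) {
    for (const block of source.matchAll(/acs:\s*\[([^\]]*)\]/g)) {
      for (const id of block[1].matchAll(new RegExp(AC_ID, "g"))) found.add(id[0]);
    }
  }

  // @ac:US-X.Y tags count only in feature files.
  if (fileName.endsWith(".feature")) {
    for (const m of source.matchAll(new RegExp(`@ac:(${AC_ID})`, "g"))) found.add(m[1]);
  }

  // "BRD §US-X.Y" prose comments count in any test file.
  for (const m of source.matchAll(new RegExp(`BRD §(${AC_ID})`, "g"))) found.add(m[1]);

  return Array.from(found).sort();
}

const scenarioTags = extractAcTags(
  "gift.scenario.ts",
  'export default { acs: ["US-6.1", "US-6.2"] }; // also touches US-26.10',
);
const featureTags = extractAcTags("refund.feature", "@ac:US-12.5\nScenario: refund");
const commentTags = extractAcTags(
  "audit.test.ts",
  "// BRD §US-26.10 eligibility audit\n// US-6.1 is mentioned only in passing",
);
console.log({ scenarioTags, featureTags, commentTags });
```

In each sample the untagged AC id in the trailing comment is ignored, matching the rule that a bare mention does not count.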

Tagging-backfill lifts TAGGED, never VERIFIED. A high TAGGED % over a low VERIFIED % is a legibility win (we know what's untested), not a verification win. Per the coverage-matrix-derive README's authority boundary: this tool reads the G4 verdict from state-derive's output; it does not own the G4 gate itself.

The "dark" tier in the coverage matrix means no tagged test, not unbuilt — a capability can be fully shipped and tested but untagged (re-proven with US-12.5: refund read "dark" while fully shipped). Never estimate completion from tag/presence signals; only G4/G5 mean verified.
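
The two numbers and the dark tier can be modeled in a few lines. The row shape, tier labels, and sample data are hypothetical; only the acs_with_any_test and acs_g4_verified names come from the text.

```typescript
interface AcRow {
  ac: string;
  tags: string[];                // tests intentionally tagged to this AC
  taggedScenarioPassed: boolean; // a tagged scenario has a recorded pass
}

type Tier = "g4-verified" | "tagged-only" | "dark";

function tierOf(row: AcRow): Tier {
  if (row.tags.length === 0) return "dark"; // no tagged test; says nothing about whether it is built
  return row.taggedScenarioPassed ? "g4-verified" : "tagged-only";
}

function summarize(rows: AcRow[]) {
  return {
    total: rows.length,
    acs_with_any_test: rows.filter((r) => tierOf(r) !== "dark").length,
    acs_g4_verified: rows.filter((r) => tierOf(r) === "g4-verified").length,
  };
}

// Invented sample rows for illustration.
const rows: AcRow[] = [
  { ac: "US-6.1", tags: ["gift.scenario.ts"], taggedScenarioPassed: true },
  { ac: "US-6.2", tags: ["prepaid.feature"], taggedScenarioPassed: false },
  { ac: "US-12.5", tags: [], taggedScenarioPassed: false }, // may be shipped, still reads dark
];

const summary = summarize(rows);
console.log(summary, rows.map(tierOf));
```

Adding a tag to the US-12.5 row would raise acs_with_any_test and leave acs_g4_verified unchanged, which is the backfill behavior described above.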

Below AC-grain, the proof-obligation registry (ADR-0076, tools/spec-obligation-registry/) extends the same discipline to normative requirements inside a story body (telemetry events, data-contract fields, NFRs) that sit below the AC and were previously ungrounded by any lint.

CI topology

| Trigger | Workflow(s) | What runs |
|---|---|---|
| Every PR | gitleaks.yml | Secret scan on the diff |
| PR touching tools/** | tools-tests.yml | Substrate tool test suites |
| PR touching apps/api/src/routes/**, apps/api/src/worker.ts, apps/api/test/** | route-orphan-lint.yml | Route-reachability check |
| PR touching apps/api/test/scenarios/**, scenarios config | scenario-results.yml | Normalizes scenario JSON → _scenario-results.json |
| PR touching apps/api/**, apps/admin/**, e2e/**, infra/cloudflare/**; every push to main | e2e-tier2.yml | Contract + behavior (Playwright-bdd) jobs |
| PR touching apps/storefront-catalyst/**, apps/storefront-svelte/** | consumer-flows-e2e.yml | Consumer-facing Playwright flows |
| Push to main only + nightly cron (06:00 UTC) | test.yml | npm test (unit, workspace-wide via Turborepo), then npm run test:scenarios (G4 behavioral) |
| Push to main (path-scoped) | admin-playwright.yml | Admin Playwright suite |
| Weekly (Monday) | dependabot | npm dependency PRs against dev |
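
The topology can be read as a set of trigger rules. The sketch below is a simplified model for reasoning about which workflows fire: it uses prefix matching instead of real glob semantics, and it omits the cron and dependabot schedules and admin-playwright.yml (whose path scope is not listed here). The Rule and CiEvent types are hypothetical.

```typescript
interface CiEvent {
  kind: "pull_request" | "push";
  branch: string; // pushed branch; ignored by rules that declare no branches
  files: string[];
}
interface Rule {
  workflow: string;
  on: "pull_request" | "push";
  branches?: string[];
  pathPrefixes?: string[]; // simplified stand-in for GitHub Actions path globs
}

const RULES: Rule[] = [
  { workflow: "gitleaks.yml", on: "pull_request" },
  { workflow: "tools-tests.yml", on: "pull_request", pathPrefixes: ["tools/"] },
  {
    workflow: "route-orphan-lint.yml",
    on: "pull_request",
    pathPrefixes: ["apps/api/src/routes/", "apps/api/src/worker.ts", "apps/api/test/"],
  },
  {
    workflow: "scenario-results.yml",
    on: "pull_request",
    pathPrefixes: ["apps/api/test/scenarios/", "apps/api/vitest.scenarios.config.mts"],
  },
  {
    workflow: "e2e-tier2.yml",
    on: "pull_request",
    pathPrefixes: ["apps/api/", "apps/admin/", "e2e/", "infra/cloudflare/"],
  },
  { workflow: "e2e-tier2.yml", on: "push", branches: ["main"] },
  {
    workflow: "consumer-flows-e2e.yml",
    on: "pull_request",
    pathPrefixes: ["apps/storefront-catalyst/", "apps/storefront-svelte/"],
  },
  { workflow: "test.yml", on: "push", branches: ["main"] },
];

function workflowsFor(event: CiEvent): string[] {
  const hits = RULES.filter((rule) => {
    if (rule.on !== event.kind) return false;
    if (rule.branches && !rule.branches.includes(event.branch)) return false;
    const prefixes = rule.pathPrefixes;
    if (!prefixes) return true;
    return event.files.some((file) => prefixes.some((p) => file.startsWith(p)));
  }).map((rule) => rule.workflow);
  return Array.from(new Set(hits)); // de-duplicate workflows with several triggers
}

const changed = ["apps/api/src/routes/gifts.ts"];
const onPr = workflowsFor({ kind: "pull_request", branch: "feature/gifts", files: changed });
const onMain = workflowsFor({ kind: "push", branch: "main", files: changed });
const onDev = workflowsFor({ kind: "push", branch: "dev", files: changed });
console.log({ onPr, onMain, onDev });
```

In this model a direct push to dev matches no rule at all, so nothing in the table runs for it.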

The Pattern-1 landmine

test.yml, the workflow that runs both npm test and npm run test:scenarios, triggers only on push: branches: [main] and the nightly cron. It does not trigger on Pattern-1 direct-to-dev pushes (see dev-push-patterns in CLAUDE.md: local-integration → direct dev push is a preferred pattern for single-session coherent work). A Pattern-1 push to dev touching apps/api/ therefore skips both the unit gate and the G4 scenario gate until the next operator-driven dev → main fast-forward or the nightly run.

Mitigation: run the apps/api Vitest suites locally (npm test, plus npm run test:scenarios for G4) before syncing any apps/api work via Pattern-1. Don't rely on CI to catch it same-day.

Known test landmines

These are real, previously-hit failure modes — not hypothetical risks.

Inline-schema unit tests false-green. A unit test that hand-writes a CREATE TABLE fixture without the production CHECK constraints tests a fiction: it can pass while the real migration would reject the same write. G4 scenarios calling applySchema() (which applies the actual migrations/schema/*.sql files) catch real schema/code mismatches that inline fixtures cannot. A scenario going RED with a D1_ERROR / constraint violation is an honest RED — it found something real, not a broken test harness.
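
The false-green mechanism can be simulated without D1. This is a toy model: the table helper, the status values, and the CHECK rule are invented, and the thrown error only stands in for a real D1 constraint failure.

```typescript
type Row = { status: string };
type Check = (row: Row) => boolean;

interface Table {
  rows: Row[];
  insert(row: Row): void;
}

// Toy table: rejects any row that violates one of its CHECK-style predicates.
function makeTable(checks: Check[]): Table {
  const rows: Row[] = [];
  return {
    rows,
    insert(row: Row): void {
      for (const check of checks) {
        if (!check(row)) throw new Error("CHECK constraint failed (simulated)");
      }
      rows.push(row);
    },
  };
}

// Hypothetical production constraint, as the real migration would define it.
const productionChecks: Check[] = [
  (row) => ["active", "paused", "cancelled"].includes(row.status),
];

const inlineFixtureTable = makeTable([]);          // hand-written CREATE TABLE, no CHECK
const migratedTable = makeTable(productionChecks); // what applying real migrations yields

// Code under test with a misspelled status value.
function cancelSubscription(table: Table): void {
  table.insert({ status: "canceled" });
}

function passes(table: Table): boolean {
  try {
    cancelSubscription(table);
    return true;
  } catch {
    return false;
  }
}

const inlineGreen = passes(inlineFixtureTable);
const migratedGreen = passes(migratedTable);
console.log({ inlineGreen, migratedGreen });
```

The same write is green against the inline fixture and red against the migrated schema, so the red result reflects a real defect in the code and not in the harness.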

vitest-scenarios worktree quirk. npm run test:scenarios reliably fails inside .worktrees/* with Cannot find module '@cloudflare/vitest-pool-workers'. Root cause (documented in apps/api/test/scenarios/README.md): a fresh git worktree's workspace root has no installed node_modules/; the pool-worker binaries live at the workspace root, not inside apps/api/node_modules. Fix: run npm run test:scenarios from the main checkout root, or npm install inside the worktree first. The cacheDir setting in vitest.scenarios.config.mts pins the Vite cache to a worktree-local path once deps are installed, preventing stale cache bleed across worktrees.

G4 route-orphan gap. G4 scenarios historically called route handlers directly — bypassing apps/api/src/worker.ts's real router entirely. A green G4 scenario under that shape proves the handler logic works; it does not prove the route is reachable over HTTP. Confirmed unrouted at time of the 2026-06-23 audit (ADR-0065): gift purchase+claim (US-6.1), prepaid (US-6.2), eligibility-audit (US-26.10). The fix direction is the worker.fetch() + minted-JWT + queue-sink pattern (the settings-store-full-stack.scenario.ts exemplar) plus the standing route-orphan-lint.yml gate on route/worker.ts changes — not a presence check, an actual dispatch through the real router.

What this strategy deliberately excludes

  • No external-database test harness of any kind. The real runtime is Cloudflare Workers + D1. @cloudflare/vitest-pool-workers runs tests in an actual workerd instance with a real in-memory D1 binding. That gives strictly higher fidelity than mocking D1 or standing up a substitute relational database, and it is what both vitest.config.ts and vitest.scenarios.config.mts actually use.
  • No Lighthouse-CI. Not present in .github/workflows/; do not assume it exists when planning performance work.
  • No claim that G3 (state-derive COMPLIANT) means done. It means present. Only G4 (scenario passed) and G5 (live) mean verified.