Test strategy — bc-subscriptions¶
Generated from a canonical source
This page is a read-only projection of docs/methodology/test-strategy.md.
Edit the canonical file, then run npm --prefix tools/project-knowledge-derive run derive.
Consolidated statement of what testing exists, what it proves, and how it maps to the Definition-of-Done verification ladder (ADR-0067). This document describes the real runtime — Cloudflare Workers + D1. If a claim here can't be traced to a config file, tool, or ADR in the repo, it doesn't belong.
The organizing frame: G1-G5¶
tools/state-derive and the coverage tooling exist because "done" is not one
question — it's five, and conflating them produces false confidence. Per
ADR-0067:
| Gate | Question | Oracle | What it does NOT prove |
|---|---|---|---|
| G1 spec | Is the AC written? | Traceability registry (BRD/PRD parse) | Nothing built yet |
| G2 prototype | Does a prototype screen exist? | Traceability prototypePages join | Production behavior |
| G3 presence | Are the expected code artifacts present? | tools/state-derive: static grep/file_exists/schema_has_* checks | Whether the artifact works |
| G4 behavior | Does a runnable scenario exist and pass? | Recorded scenario-results evidence (tools/scenario-results/) | Whether it works against the deployed app |
| G5 live | Green vs the deployed app + live environment? | Live E2E run or blocked-pending-external marker | n/a |
G3 is a ceiling, not a verdict of correctness. Every one of state-derive's 1,166
check primitives is static (grep_present, file_exists, schema_has_column, etc.) —
none executes code. COMPLIANT means present, never shipped or working
(tools/state-derive/OWNER-SPEC.md). The
canonical failure example: a storefront widget injector emitting one selector while
the PDP renderer reads another — both files exist, both presence checks pass, the
feature is broken. Presence checks structurally cannot catch seam bugs, because seam
bugs live in the behavior between artifacts that each individually exist.
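A minimal sketch of why presence checks pass on a seam bug. The file names, selectors, and helper names below are hypothetical stand-ins, not the real state-derive primitives or the real storefront code; the point is only that two static checks can both be green while the two artifacts disagree.

```typescript
// Hypothetical in-memory stand-ins for two source files.
const files: Record<string, string> = {
  "widget-injector.ts": 'export const mount = "#subscription-widget";',
  "pdp-renderer.ts": 'export const target = "#subs-widget";',
};

// Static presence primitives in the spirit of file_exists / grep_present.
const fileExists = (path: string): boolean => path in files;
const grepPresent = (path: string, needle: string): boolean =>
  (files[path] ?? "").includes(needle);

// Both files exist and both contain the expected identifier.
const presenceOk: boolean =
  fileExists("widget-injector.ts") &&
  grepPresent("widget-injector.ts", "mount") &&
  fileExists("pdp-renderer.ts") &&
  grepPresent("pdp-renderer.ts", "target");

// A behavioral check compares what one side emits with what the other reads.
const selectorOf = (path: string): string | undefined =>
  /"(#[^"]+)"/.exec(files[path] ?? "")?.[1];

const seamOk: boolean =
  selectorOf("widget-injector.ts") === selectorOf("pdp-renderer.ts");
```

Here presenceOk is true and seamOk is false: the defect lives in the comparison, which no per-file check performs.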
Monotone claims. A gate cannot be claimed without the gates below it. "G4 green"
with no G3 evidence is a register bug. blocked-pending-external is first-class (e.g.
the bc-payments charge rail, blocked on PI-5062 + partner-track beta) — distinguishable
from "unbuilt," never miscounted as a gap or a pass.
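The monotone rule can be sketched as a small validator. The type and function names here are assumptions for illustration, not the register's actual schema: a gate counts as a violation when it is claimed as passing while any lower gate is not.

```typescript
// Gate names in ladder order, lowest first.
const GATES = ["G1", "G2", "G3", "G4", "G5"] as const;
type Gate = (typeof GATES)[number];

// Hypothetical evidence states. "blocked-pending-external" is first-class:
// it is neither a pass nor a gap.
type Evidence = "pass" | "missing" | "blocked-pending-external";

type Claim = Partial<Record<Gate, Evidence>>;

// Returns every gate claimed as passing without all lower gates passing.
// An empty result means the claim is monotone.
function monotoneViolations(claim: Claim): Gate[] {
  const violations: Gate[] = [];
  let lowerAllPass = true;
  for (const gate of GATES) {
    const state: Evidence = claim[gate] ?? "missing";
    if (state === "pass" && !lowerAllPass) violations.push(gate);
    if (state !== "pass") lowerAllPass = false;
  }
  return violations;
}
```

Under this sketch, a claim of G1, G2, and G4 passing with no G3 evidence reports G4 as a violation, while a claim that passes G1 through G4 and marks G5 as blocked-pending-external reports none.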
ADR-0076 (proof-obligation registry, tools/spec-obligation-registry/) generalizes
this ladder below AC-grain to arbitrary "prove it" claims (telemetry events, data-contract
fields, NFRs) with the same law: a proof's evidence must come from a source the claim
does not control. The ladder's five gates remain as five of the registry's obligations; the registry does not replace them.
Test categories mapped to the real stack¶
Unit — apps/api/vitest.config.ts¶
Runs via @cloudflare/vitest-pool-workers, which executes every test inside an
actual workerd instance via miniflare, with a real in-memory D1 binding — not a
mocked D1 client. Migrations are loaded once at config time via readD1Migrations()
against apps/api/migrations/schema/, exposed as a TEST_MIGRATIONS binding; tests
call applySchema() in beforeEach, which delegates to applyD1Migrations(env.DB,
env.TEST_MIGRATIONS). This means the test schema tracks production migrations
automatically. Covers individual primitives — one route handler, one function, one
query — in isolation.
Run: npm test (root-level, fans out via Turborepo to every workspace member with a
test script — currently apps/api + apps/i18n).
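The migration wiring described above can be sketched roughly as follows. This is not a copy of apps/api/vitest.config.ts: it follows the defineWorkersConfig / readD1Migrations shape from the pool's published D1 recipe, and the wrangler config path and the relative migrations path are assumptions.

```typescript
// Sketch of a unit-test config for the Workers pool (illustrative only).
import path from "node:path";
import {
  defineWorkersConfig,
  readD1Migrations,
} from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig(async () => {
  // Read the production migration files once, at config time.
  const migrations = await readD1Migrations(
    path.join(__dirname, "migrations/schema"),
  );

  return {
    test: {
      poolOptions: {
        workers: {
          // Assumption: the DB binding is declared in the wrangler config.
          wrangler: { configPath: "./wrangler.toml" },
          miniflare: {
            // Surfaced to tests as env.TEST_MIGRATIONS.
            bindings: { TEST_MIGRATIONS: migrations },
          },
        },
      },
    },
  };
});
```

Because the binding is built from the same directory the deploy uses, a new migration file changes the test schema without any edit to the tests.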
Scenarios / behavioral (G4) — apps/api/vitest.scenarios.config.mts¶
Separate config from the unit gate (Hive #583) — extended timeout (testTimeout:
30_000), separate reporter output, so a slow behavioral suite never throttles the
unit gate. Uses the same @cloudflare/vitest-pool-workers pool as the unit config
— real workerd, real in-memory D1, same applySchema() seeding contract. This is
the load-bearing point: scenarios run against the actual Workers + D1 runtime, so a
schema/code mismatch that an inline-schema unit test would silently miss (a CREATE
TABLE fixture in the test file that doesn't match the real migration) surfaces as a
real D1_ERROR / constraint failure — an honest RED, not a false GREEN
(inline-schema-unit-tests-false-green landmine below).
Scenarios live at apps/api/test/scenarios/*.scenario.ts — one file per epic/feature
area (epic-01-install-registration.scenario.ts through epic-27-*, plus
cross-cutting files like charge-retry-backoff.scenario.ts). Per
apps/api/test/scenarios/README.md: a scenario is added when a BRD AC describes a
multi-step, multi-route flow; each scenario has an initial() DB seed and named
steps with do/expect. Scenarios reference their AC via the acs: [...] tag field
(per ADR-0067's D2 resolution) so results join back to the AC in the coverage matrix.
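A rough sketch of the scenario shape described above, using a toy in-memory state in place of D1. The interface names, the runner, and the placeholder AC ID are hypothetical; the authoritative contract is apps/api/test/scenarios/README.md.

```typescript
// Hypothetical shapes modeled on the README description.
interface Step<Db> {
  name: string;
  do: (db: Db) => unknown; // perform the action
  expect: (result: unknown, db: Db) => boolean; // judge the outcome
}

interface Scenario<Db> {
  title: string;
  acs: string[]; // AC IDs the scenario claims to exercise
  initial: () => Db; // fresh seed for every run
  steps: Step<Db>[];
}

interface StepResult {
  step: string;
  passed: boolean;
}

// Runs steps in order against one seeded state, stopping at the first failure.
function runScenario<Db>(scenario: Scenario<Db>): StepResult[] {
  const db = scenario.initial();
  const results: StepResult[] = [];
  for (const step of scenario.steps) {
    const passed = step.expect(step.do(db), db);
    results.push({ step: step.name, passed });
    if (!passed) break;
  }
  return results;
}

// Illustrative scenario over toy state (not real D1).
type Status = "active" | "paused";
type ToyDb = { subscriptions: Map<string, Status> };

const pauseFlow: Scenario<ToyDb> = {
  title: "pause then resume",
  acs: ["US-X.Y"], // placeholder tag
  initial: () => ({
    subscriptions: new Map<string, Status>([["sub_1", "active"]]),
  }),
  steps: [
    {
      name: "pause",
      do: (db) => db.subscriptions.set("sub_1", "paused"),
      expect: (_result, db) => db.subscriptions.get("sub_1") === "paused",
    },
    {
      name: "resume",
      do: (db) => db.subscriptions.set("sub_1", "active"),
      expect: (_result, db) => db.subscriptions.get("sub_1") === "active",
    },
  ],
};
```

Keeping acs on the scenario object rather than in a comment is what lets a later tool join a pass or fail back to the AC without parsing prose.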
Run: npm run test:scenarios (or npx vitest --config apps/api/vitest.scenarios.config.mts run).
In CI, setting SCENARIO_RESULTS_JSON makes the config additionally emit a Vitest
json reporter file, which tools/scenario-results/ normalizes into
_scenario-results.json — the artifact state-derive's scenario_passes check
primitive reads to populate G4. This is how a scenario pass (not just a scenario
file existing) becomes a mechanically-derived G4 verdict.
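A sketch of the normalization step, under stated assumptions. The input type is a minimal subset of the Jest-compatible JSON that the Vitest json reporter writes; the acsByFile lookup, the verdict rule, and the output shape are hypothetical and do not describe the real _scenario-results.json format.

```typescript
// Minimal subset of the reporter output used by this sketch.
interface VitestJson {
  testResults: {
    name: string; // test file path
    assertionResults: { fullName: string; status: string }[];
  }[];
}

type Verdict = "pass" | "fail";

// Hypothetical rule: an AC passes only if every assertion in every scenario
// file tagged to it passed, and each of those files ran at least one assertion.
function normalize(
  report: VitestJson,
  acsByFile: Record<string, string[]>,
): Record<string, Verdict> {
  const verdicts: Record<string, Verdict> = {};
  for (const file of report.testResults) {
    const ran = file.assertionResults.length > 0;
    const allPassed =
      ran && file.assertionResults.every((a) => a.status === "passed");
    for (const ac of acsByFile[file.name] ?? []) {
      // One failing file taints the AC regardless of file order.
      if (verdicts[ac] === "fail") continue;
      verdicts[ac] = allPassed ? "pass" : "fail";
    }
  }
  return verdicts;
}
```

An AC with no tagged file never appears in the output, which keeps "no evidence" distinct from "fail".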
E2E / Playwright¶
Three configs, each scoped to a different app surface, none of them a monolith:
- apps/admin/playwright.config.ts: admin surface (React/BigDesign).
- apps/storefront-svelte/playwright.config.ts: Svelte storefront widget/portal.
- apps/storefront-catalyst/playwright.config.ts: Catalyst storefront host.
The e2e-tier2 gate (.github/workflows/e2e-tier2.yml) runs two jobs against
deploy-affecting PRs (paths: apps/api/**, apps/admin/**, e2e/**,
infra/cloudflare/**) plus every push to main:
- contract — Vitest + fetch, verifies the deployed admin↔API contract.
- behavior — playwright-bdd @runnable scenarios, verifies client-side routing
against a locally-built admin. Per the ADR-0067 rework note in the workflow header:
the deployed admin sits behind Cloudflare Access (401), so "run against the deployed
surface" could never produce a verdict — this is G4 (build-behavior), not G5 (live).
@skip-tagged scenarios remain documentation-of-intent pending the isolation surface
from ADR-0065.
Consumer-facing flows have their own path-scoped gate:
.github/workflows/consumer-flows-e2e.yml, triggered on PRs touching
apps/storefront-catalyst/** or apps/storefront-svelte/**.
ADR-0065's validated isolation pattern (amended 2026-06-25 after implementation):
not per-test D1-store provisioning against a real sandbox, but worker.fetch() +
a minted HS256 JWT (signBcSignedPayload, verified by the real authenticateRequest)
+ an injected EVENTS_QUEUE capturing sink, against the existing applySchema'd
ephemeral D1. This drives the REAL router (proving route-wiring, not just handler
logic) without needing a deployed target or real BC sandbox. The exemplar is
apps/api/test/scenarios/settings-store-full-stack.scenario.ts. Browser-render
Playwright specs (e.g. apps/admin/tests/e2e/store-settings.spec.ts) stay
stub-auth + page.route()-mocked — real for render/UI-state, but structurally
cannot catch a route-orphan (handler exists and passes G4, but is never wired
into worker.ts) because the API layer is mocked out. That gap is exactly what the
worker.fetch scenario layer closes.
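The difference between calling a handler directly and dispatching through the router can be shown with a simplified synchronous model. Everything here is hypothetical (handler names, route keys, the Env shape), and the JWT minting and verification step is left out; the real layer goes through worker.fetch() with real Request and Response objects.

```typescript
interface Queue {
  send(message: unknown): void;
}
interface Env {
  EVENTS_QUEUE: Queue;
}
type Handler = (body: unknown, env: Env) => number; // returns an HTTP status

// Capturing sink: records every message instead of delivering it.
class CapturingQueue implements Queue {
  readonly sent: unknown[] = [];
  send(message: unknown): void {
    this.sent.push(message);
  }
}

const saveSettings: Handler = (body, env) => {
  env.EVENTS_QUEUE.send({ type: "settings.saved", body });
  return 200;
};

// Works when called directly, but is never registered in the router below.
const orphaned: Handler = () => 200;

// The router is the only path a real request can take.
const routes = new Map<string, Handler>([["PUT /settings", saveSettings]]);

function dispatch(method: string, path: string, body: unknown, env: Env): number {
  const handler = routes.get(`${method} ${path}`);
  return handler ? handler(body, env) : 404;
}
```

A test that calls orphaned(...) directly sees 200; the same request sent through dispatch(...) sees 404. The capturing queue lets the routed test also assert on emitted events without a real queue.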
Route-reachability — .github/workflows/route-orphan-lint.yml¶
PR-triggered on changes to apps/api/src/routes/**, apps/api/src/worker.ts,
apps/api/test/**. Closes the specific "G4 green scenario calls the handler
directly, never proving it's wired into the real router" gap (see landmines below).
Security — .github/workflows/gitleaks.yml¶
Runs on every pull_request (Tier 1 per Hive #1036). Scans the PR diff for
committed secrets. Push-side scanning was deliberately dropped (Hive #1371,
2026-05-20) — worktree-push events fired the scan before a PR existed, doubling
run count with no added coverage; a future narrow push: branches: [main] trigger
is reserved for a force-push-bypasses-PR scenario, not added speculatively.
Dependency scanning: .github/dependabot.yml — weekly (Monday 06:00) npm updates,
opened against dev (not main — dependabot-to-main caused recurring drift),
grouped by security vs patch update-type.
No Lighthouse-CI workflow exists in this repo (verified: no .github/workflows/*lighthouse*
file). Performance/perf-budget gating is not currently part of this stack — do not
cite it as present.
Tooling itself — .github/workflows/tools-tests.yml¶
PR-triggered on tools/** changes — the substrate tools (state-derive,
coverage-matrix-derive, scenario-results, etc.) carry their own test suites and
are gated the same as application code.
The two-tier coverage truth¶
tools/coverage-matrix-derive produces docs/audits/derived/_coverage-matrix.{json,md},
the per-AC × per-test-type instrument. It draws a hard line between two numbers that
must never be conflated:
- TAGGED (acs_with_any_test): a test file carries an intentional tag linking it to an AC: acs: [...] in *.scenario.ts, @ac:US-X.Y in *.feature, or a BRD §US-X.Y prose comment. This is presence-of-claim, not presence-of-proof. A bare mention of an AC ID does not count; the tag itself is the unit of backfill work.
- G4-VERIFIED (acs_g4_verified): a scenario tagged to the AC actually passed, per the committed _state.json's scenario_passes verdict.
Tagging-backfill lifts TAGGED, never VERIFIED. A high TAGGED % over a low VERIFIED %
is a legibility win (we know what's untested), not a verification win. Per the
coverage-matrix-derive README's authority boundary: this tool reads the G4
verdict from state-derive's output; it does not own the G4 gate itself.
The "dark" tier in the coverage matrix means no tagged test, not unbuilt — a capability can be fully shipped and tested but untagged (re-proven with US-12.5: refund read "dark" while fully shipped). Never estimate completion from tag/presence signals; only G4/G5 mean verified.
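The two numbers can be sketched as a small summary function. The row shape and function name are assumptions for illustration and are not the real _coverage-matrix.json schema; only the two counter names come from the text above.

```typescript
// Hypothetical per-AC row.
interface AcRow {
  ac: string;
  taggedTests: string[]; // test files carrying an intentional tag for this AC
  scenarioPasses: boolean; // G4 verdict as read from state-derive output
}

interface CoverageSummary {
  total: number;
  acs_with_any_test: number; // TAGGED
  acs_g4_verified: number; // G4-VERIFIED
  dark: string[];
}

function summarize(rows: AcRow[]): CoverageSummary {
  const tagged = rows.filter((r) => r.taggedTests.length > 0);
  return {
    total: rows.length,
    acs_with_any_test: tagged.length,
    // VERIFIED needs both a tag and a passing scenario verdict.
    acs_g4_verified: tagged.filter((r) => r.scenarioPasses).length,
    // "dark" means no tagged test; it says nothing about whether the
    // capability is built.
    dark: rows.filter((r) => r.taggedTests.length === 0).map((r) => r.ac),
  };
}
```

Adding a tag to a dark row moves it into acs_with_any_test and leaves acs_g4_verified unchanged until a tagged scenario actually passes.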
Below AC-grain, the proof-obligation registry (ADR-0076,
tools/spec-obligation-registry/) extends the same discipline to normative
requirements inside a story body (telemetry events, data-contract fields, NFRs) that
sit below the AC and were previously ungrounded by any lint.
CI topology¶
| Trigger | Workflow(s) | What runs |
|---|---|---|
| Every PR | gitleaks.yml | Secret scan on the diff |
| PR touching tools/** | tools-tests.yml | Substrate tool test suites |
| PR touching apps/api/src/routes/**, apps/api/src/worker.ts, apps/api/test/** | route-orphan-lint.yml | Route-reachability check |
| PR touching apps/api/test/scenarios/**, scenarios config | scenario-results.yml | Normalizes scenario JSON → _scenario-results.json |
| PR touching apps/api/**, apps/admin/**, e2e/**, infra/cloudflare/**, plus every push to main | e2e-tier2.yml | Contract + behavior (Playwright-bdd) jobs |
| PR touching apps/storefront-catalyst/**, apps/storefront-svelte/** | consumer-flows-e2e.yml | Consumer-facing Playwright flows |
| Push to main only + nightly cron (06:00 UTC) | test.yml | npm test (unit, workspace-wide via Turborepo) THEN npm run test:scenarios (G4 behavioral) |
| Push to main (path-scoped) | admin-playwright.yml | Admin Playwright suite |
| Weekly (Monday) | dependabot | npm dependency PRs against dev |
The Pattern-1 landmine¶
test.yml — the workflow that runs both npm test and npm run test:scenarios
— triggers only on push: branches: [main] and the nightly cron. It does not
trigger on Pattern-1 direct-to-dev pushes (see dev-push-patterns
in CLAUDE.md — local-integration → direct dev push is a preferred pattern for
single-session coherent work). This means a Pattern-1 push to dev touching
apps/api/ skips both the unit gate and the G4 scenario gate until the next
operator-driven dev → main fast-forward or the nightly run.
Mitigation: run apps/api vitest locally before syncing any apps/api work via
Pattern-1. Don't rely on CI to catch it same-day.
Known test landmines¶
These are real, previously-hit failure modes — not hypothetical risks.
Inline-schema unit tests false-green. A unit test that hand-writes a CREATE
TABLE fixture without the production CHECK constraints tests a fiction: it can
pass while the real migration would reject the same write. G4 scenarios calling
applySchema() (which applies the actual migrations/schema/*.sql files) catch
real schema/code mismatches that inline fixtures cannot. A scenario going RED with
a D1_ERROR / constraint violation is an honest RED — it found something real,
not a broken test harness.
vitest-scenarios worktree quirk. npm run test:scenarios reliably fails
inside .worktrees/* with Cannot find module '@cloudflare/vitest-pool-workers'.
Root cause (documented in apps/api/test/scenarios/README.md): a fresh git
worktree's workspace root has no installed node_modules/; the pool-worker
binaries live at the workspace root, not inside apps/api/node_modules. Fix: run
npm run test:scenarios from the main checkout root, or npm install inside the
worktree first. The cacheDir setting in vitest.scenarios.config.mts pins the
Vite cache to a worktree-local path once deps are installed, preventing stale
cache bleed across worktrees.
G4 route-orphan gap. G4 scenarios historically called route handlers
directly — bypassing apps/api/src/worker.ts's real router entirely. A green
G4 scenario under that shape proves the handler logic works; it does not prove
the route is reachable over HTTP. Confirmed unrouted at time of the 2026-06-23
audit (ADR-0065): gift
purchase+claim (US-6.1), prepaid (US-6.2), eligibility-audit (US-26.10). The fix
direction is the worker.fetch() + minted-JWT + queue-sink pattern (the
settings-store-full-stack.scenario.ts exemplar) plus the standing
route-orphan-lint.yml gate on route/worker.ts changes — not a presence check,
an actual dispatch through the real router.
What this strategy deliberately excludes¶
- No external-database test harness of any kind. The real runtime is Cloudflare Workers + D1. @cloudflare/vitest-pool-workers runs tests in an actual workerd instance with a real in-memory D1 binding. This is strictly better fidelity than mocking D1 or standing up a substitute relational database, and it's what both vitest.config.ts and vitest.scenarios.config.mts actually use.
- No Lighthouse-CI. Not present in .github/workflows/; do not assume it exists when planning performance work.
- No claim that G3 (state-derive COMPLIANT) means done. It means present. Only G4 (scenario passed) and G5 (live) mean verified.