This is a build-verification ledger, not a savings claim.Every figure below traces to a named, re-runnable source — listed inMethodology. We do not compute a cost-per-output ratio and we do not estimate what a hired team would have cost — see what this report deliberately excludes for why.
Verified throughput
Designed via an adversarial plan-then-judge pass — four independent report designs, each scored against three skeptical lenses (does every stat trace to a real source; does it serve a resourcing decision; can it be computed today without inventing a number) — specifically to avoid shipping a vanity scorecard.
Behavioral proof — 202 verified, 0 failing
An automated, repeatable test actually passes for each one — the project's own bar for "proven," not just "code exists" (internally: G4). Shown as the full four-way partition, never a bare percentage, so the unknown/no-gate slice can't hide inside a rounded number.
Pass
202
Fail
0
Unknown
1
No gate
of 218 tracked acceptance criteria
See it yourself
A clickable artifact is stronger evidence than any statistic.
Judgment throughput
Ratified ADRs run through synthesis
79/81 (98%)
Each carries a traced Hive proposal (the project's internal decision-tracking system) → synthesis (a batch ratification review) → ratification path, not unilateral authorship — distinguishes "wrote a doc" from "ran a decision through the project's own review process."
Dossier-traceable execution units
58
Each dossier carries acceptance commands a third party can re-run — outcome-gated, not a raw chat-turn or commit count.
Commercial viability
This page intentionally computes no ROI or savings number — every available proxy (a list-price dollar figure, a hypothetical hired-team comparison) is the exact unsourced-claim shape this report exists to refuse. See the VP Product brieffor the qualitative commercial thesis (the Mustang migration-unlock framing).
What this report deliberately excludes
Raw token counts (billions of tokens, mostly cache-read)
Cache-read is mechanical context reload, not output — no defensible translation to shipped work.
API list-price cost equivalent and human-presence hours as headline stats
Source is local session telemetry, not a CI-derived repo artifact in the same sense as everything above — kept as a separately-labeled, point-in-time addendum below instead of a headline.
Cost-per-G4-verified-AC (e.g. "$X per AC")
Divides a partial-period, machine-local cost numerator by a cumulative, all-time, all-commit G4 denominator — different windows and scopes. Dividing them would fabricate precision a reader could not reproduce.
Counterfactual team-cost / "hours or dollars saved vs. hiring a team"
Not computable from any tool in this repo. Day rate, per-story effort, and team size are external, unsourced judgment calls — exactly the unsourced-savings-claim shape this report exists to refuse.
TAGGED test coverage (a test claims coverage) cited alone
TAGGED means "a test claims coverage," not "a test passes." Citing it without G4 is the presence-dressed-as-done trap the project’s own gate ladder exists to prevent.
G3 presence-check counts ("COMPLIANT" capabilities)
G3-COMPLIANT means present, not done, per this project’s own DoD methodology (ADR-0067).
Raw commit count or daily commit peaks as a "look how fast" claim
~31% of commits on this branch are CI/dependency-automation (github-actions[bot], dependabot[bot]), not human-or-agent feature work; commit-date peaks reflect when a long-lived integration branch was fast-forwarded, not a velocity signal.
Raw human-authored prompt count
Includes system-reminders, continuation messages, and interrupted-request artifacts — no noise-filtered substitute was computable from locally available tooling this session.
"Solo operator" framing
Several other human contributors appear in the commit history — "operator-led" is the accurate framing, not "100% solo."
For context, not as a quality signal: 2,948 commits (240 merges) on this branch, 2026-04-21 → 2026-07-01. ~31% (911) are CI/dependency-automation commits (github-actions[bot], dependabot[bot]), not human-or-agent feature work — activity, not output.
Investment snapshot addendum — not live, never ratio'd against the above
A point-in-time snapshot of local Claude Code session telemetry across 2 sources(2026-05-01 → 2026-06-30). This is API list-price equivalent — notional value consumed, not cash spent (a subscription plan does not bill per token). Local session retention is roughly 30 days per machine, so this span understates the project's full life unless a cross-machine import extends it. Deliberately never combined into a ratio with the verified-throughput numbers above — different time windows and machine scopes.
Notional cost
$5,797
Human presence
164h
Agent effort (parallel sum)
197h
Agent calendar (union)
129h
by model: claude-opus-4-8 $5,070 · claude-sonnet-4-6 $510 · claude-fable-5 $163 · claude-sonnet-5 $23 · claude-opus-4-7 $17 · claude-haiku-4-5-20251001 $13
Methodology — re-derive any number above
- G4 partition:
npx tsx tools/coverage-matrix-derive/index.ts→docs/audits/derived/_coverage-matrix.jsonsummary.g4 - ADR / dossier / commit counts:
npx tsx tools/build-ledger-derive/index.ts(reads the coverage-matrix output above; never recomputes G4) →docs/audits/derived/_build-ledger.json - Investment snapshot:
npx tsx tools/session-cost-derive/index.ts [import-root...]→docs/audits/derived/_session-cost-snapshot.json(manually run + manually committed — local session telemetry doesn't exist in CI)
G4/ADR/dossier/commit figures generated 2026-07-01 @ b083f095, regenerated on every push to main. Investment snapshot generated 2026-06-30 — a manual, point-in-time figure; re-run the tool above to refresh it.
For the commercial thesis built on these numbers, see the Product VP brief →