Raw architecture document, no interpretation. For a guided review entry point, see Code Review. For the full entity-relationship diagram (85 tables, mechanically generated from the live schema), see the data-model ERD, which renders natively on GitHub.
One term used below without inline definition: "Hive" is this project's internal proposal/decision/task coordination substrate (process tooling — issue tracking + an automation layer), not a runtime dependency of the shipped product. Where you see "Hive #N" or "Hive proposal", that's a reference to that internal tracker, not something the running application calls. The rendered document below also carries other internal shorthand without inline definition — ADR numbers and bare synthesis IDs (decision-record references), and BC-payments acronyms like MIT (Merchant-Initiated Transaction) and NTI (Network Transaction Identifier) — which this wrapper page doesn't gloss term-by-term since that content lives in the canonical doc itself, not this page.
Solution Architecture: BC-Native Subscription Platform
Companion to: PRD.md, BRD.md, APP-ADMIN-SPEC.md
Authoritative phasing: ADR-0030 (Cloudflare → GCP) + ADR-0021 (tier cohorts)
Status: Draft v0.2 — current state in §0; §5–§7, §10 are historical decision-time content
Read this first. §1–§4 + §0 below describe how bc-subscriptions is built today. §5–§7 and §10 are preserved as the decision-time comparison that produced ADR-0030; they read as forward-looking but the decision has shipped. §8 (cost model), §9 (operational), and §11 (open architectural decisions) remain load-bearing. §12–§14 (the Approach-A1 / Approach-B renewal sequences + the capability→service comparison table) are decision-time comparison content like §5–§7 — they depict rejected stacks and are kept only for the ADR-0030 decision trail. The authoritative as-built sequence, container, context, flow, and integration diagrams live in
`docs/architecture/` (see §0), render natively on GitHub, and are the ones to cite.
Table of Contents
- Current Architecture (as built) — start here
- Solution Context
- Shared Architectural Decisions
- Systems of Record
- Capability Map
- Approach A — Vercel-Native (historical, decision-time)
- Approach B — GCP-Only (historical, decision-time)
- Side-by-Side Comparison (historical, decision-time)
- Rough Cost Model
- Operational Considerations
- Recommendation (historical — superseded by ADR-0030)
- Open Architectural Decisions
- Appendix A — Renewal Sequence (Approach A1, Vercel) (historical, decision-time; see `docs/architecture/sequence-diagrams.md` for the as-built)
- Appendix B — Renewal Sequence (Approach B, GCP) (historical, decision-time)
- Summary Table — Capability to Service (historical, decision-time)
0. Current Architecture (as built)
Phase 1 (in production today): Cloudflare. All five deployables run on the Cloudflare platform.
| Component | Service |
|---|---|
| Admin UI (`apps/admin/`, React + BigDesign, BC App Extension iframe) | Cloudflare Pages |
| Subscriber portal (`apps/storefront-svelte/`) | Cloudflare Pages |
| Catalyst storefront integration (`apps/storefront-catalyst/`) | Cloudflare Pages |
| Docs site (`apps/docs-site/`, MkDocs Material) | Cloudflare Pages |
| API + webhooks (`apps/api/`, Hono) | Cloudflare Workers |
| Hive substrate MCP server | Cloudflare Workers (subs-hive-mcp.bigcommerce-testing-7727.workers.dev) |
| Archaeology / chat-island Worker | Cloudflare Workers (subs-archaeology.bigcommerce-testing-7727.workers.dev) |
| Primary OLTP | Cloudflare D1 |
| Cache + locks + rate limits | Cloudflare KV |
| Async jobs (charges, webhooks, reconciliation) | Cloudflare Queues (EVENTS_QUEUE → subs-events) |
| Cron / scheduled triggers | Cloudflare Cron Triggers |
| Secrets | Cloudflare Workers secrets (BC API tokens, encryption keys) + per-store D1 columns (encrypted at rest via HKDF-derived keys per Hive f80e037c) |
| Observability (Phase 1) | Workers Analytics + Sentry; sink-abstracted per tools/observability/ for Phase 2 cutover |
| CI/CD | GitHub Actions → wrangler deploy |
Phase 2 (planned): GCP. Migration shape ratified in ADR-0030 and aligned with the reference dms-self-serve GCP pattern.
| Component | Phase 2 service |
|---|---|
| All web surfaces | Cloud Run (containerized) |
| Primary OLTP | Cloud SQL (MySQL) |
| Cache + sessions | Memorystore (Redis) |
| Async jobs + events | Pub/Sub + Cloud Tasks |
| Object storage | Cloud Storage |
| Cron / orchestration | Cloud Scheduler + Cloud Workflows |
| Secrets | Secret Manager |
| Identity for CI | Workload Identity Federation (per-env pool, repo+environment attribute conditions) |
| Observability | Cloud Logging + Cloud Monitoring + Sentry; same sink abstraction |
| IaC | Terraform (hand-rolled google_*, not terraform-gcp-platform) per infra/terraform/ |
What is explicitly NOT in the architecture.
- Vercel is not a deployment target. It appears in §5–§7 below as decision-time comparison content and in `package.json` only as a peer-dependency footprint of certain `next` packages. No production surface runs on Vercel.
- Supabase is not in use. Earlier Hive substrate explorations (`ai-hive`, `hackathon-hive`) ran on Supabase + Vercel; the production substrate for bc-subscriptions is Cloudflare Workers + D1 with GitHub-derived proposal lifecycle (ADR-0055).
- Direct Stripe vault / Braintree SDK is fallback, not primary. The canonical charge rail is BigCommerce's stored-instruments vault per ADR-0037. Direct gateway SDKs are capability fallbacks for merchants whose gateways don't expose the BC stored-instruments path.
- No object-store (R2) binding in the product runtime. No product-runtime Worker (`apps/api`, `apps/storefront-svelte`, `apps/email-consumer`) declares an `r2_buckets` binding (verified against `infra/cloudflare/wrangler.toml` + the Worker `Env` interface by `tools/arch-derive/`). The only R2 bucket in the repo belongs to the archaeology substrate tool (process tooling, not a product deployable). Audit-log/export durability is served by the append-only `events` table in D1 (§2.5), not object storage.
Marketplace shape. Distributed as a BigCommerce Marketplace app per ADR-0029 — destination, not waypoint, with native-readiness preserved as a design constraint. See memory: marketplace-first-native-ready for the fork-revisit conditions.
Substrate (process side, not runtime). Hive proposal/decision/task substrate per METHODOLOGY.md §4b and WAYS-OF-WORKING.md. The substrate runtime is the Cloudflare Worker named above; the lifecycle is derived from GitHub issue state per ADR-0055.
1. Solution Context
From the PRD and BRD, the platform must:
- Install as a BC Marketplace app with OAuth + signed-JWT load flow
- Render merchant UI inside the BC admin iframe with partitioned cookies
- Host a storefront widget that works on Stencil and Catalyst
- Host a subscriber portal (both hosted and headless SDK)
- Run a durable billing engine with scheduling, retries, dunning, idempotency
- Integrate two processor adapters at MVP (BC Payments via Braintree, Stripe)
- Create native BC orders on every successful renewal
- Deliver outbound webhooks to merchant integrations
- Maintain PCI SAQ-A boundary (no card data in our systems)
- Support GDPR DSAR + erasure end-to-end
- Scale to 100M active subscriptions across ~10K stores
The architecture must favor platforms that minimize time-to-ship for this workload, not generic flexibility.
2. Shared Architectural Decisions
These decisions apply to both approaches. They are platform-agnostic and should not change based on hosting choice.
2.1 Component-Level Isolation
Five distinct deployables, each with independent scaling and failure domains:
- Admin UI: BC-iframe-embedded merchant app (React + BigDesign v2, `apps/admin/`, Cloudflare Pages)
- Subscriber Portal: hosted customer-facing portal (SvelteKit + Tailwind, `apps/storefront-svelte/`, Cloudflare Pages)
- Storefront Widget: vanilla-JS IIFE embeddable in any theme (~15KB gzipped)
- API Layer: Hono on Cloudflare Workers (`apps/api/`) for CRUD + webhooks + public REST API
- Billing Engine: the Worker `scheduled()` handler + Cloudflare Cron Triggers (`apps/api/src/scheduler.ts::runScheduledTick`) driving scheduling, executor, dunning, reconciliation
The admin UI and subscriber portal share the same API backend but deploy separately so a merchant-admin release doesn't couple to subscriber portal timing.
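The Billing Engine's cron-driven tick can be pictured as a pure selection step followed by an enqueue. The sketch below is illustrative only: `ScheduledCharge`, `pickDueCharges`, and the sample rows are hypothetical names, not the contents of `apps/api/src/scheduler.ts`, and the real tick presumably reads from D1 and writes to a queue rather than working on an in-memory array.

```typescript
// Hypothetical charge row; the real schema and the body of runScheduledTick
// are not shown in this document.
interface ScheduledCharge {
  id: string;
  storeHash: string;
  dueAt: number; // epoch milliseconds
  state: "scheduled" | "processing" | "succeeded" | "failed";
}

// Selection step of one cron tick: charges still scheduled and already due,
// oldest first, capped at `limit` so a single tick stays bounded.
function pickDueCharges(
  charges: ScheduledCharge[],
  nowMs: number,
  limit: number,
): ScheduledCharge[] {
  return charges
    .filter((c) => c.state === "scheduled" && c.dueAt <= nowMs)
    .sort((a, b) => a.dueAt - b.dueAt)
    .slice(0, limit);
}

const tickAt = Date.UTC(2025, 0, 1, 12, 0, 0);
const minute = 60 * 1000;
const sampleCharges: ScheduledCharge[] = [
  { id: "ch_1", storeHash: "abc123", dueAt: tickAt - 5 * minute, state: "scheduled" },
  { id: "ch_2", storeHash: "abc123", dueAt: tickAt - 20 * minute, state: "scheduled" },
  { id: "ch_3", storeHash: "xyz789", dueAt: tickAt + 10 * minute, state: "scheduled" },
  { id: "ch_4", storeHash: "xyz789", dueAt: tickAt - 1 * minute, state: "succeeded" },
];

const dueIds = pickDueCharges(sampleCharges, tickAt, 10).map((c) => c.id);
// dueIds → ["ch_2", "ch_1"]
```

Keeping the selection pure makes it easy to unit-test independently of the Worker runtime; the enqueue side effect would sit in a thin wrapper around it.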
2.2 Framework Choices
- Admin UI: React + BigDesign v2 on Cloudflare Pages. React is forced by `@bigcommerce/big-design` being React-only. Matches `aisles-admin` and `ask-bc` precedent in the workspace.
- Subscriber Portal: SvelteKit + Tailwind on Cloudflare Pages (`apps/storefront-svelte/`), the as-built choice. (The decision-time draft had proposed matching the admin's stack for team consistency; the shipped portal is SvelteKit.)
- Storefront Widget: vanilla TypeScript, with no framework dependency, so it drops into any Stencil theme or Catalyst build. Compiled to an IIFE bundle.
- Billing Engine: TypeScript on the Cloudflare Worker `scheduled()` handler + Cron Triggers (`apps/api/src/scheduler.ts`). Phase 2 maps this to Cloud Scheduler + Cloud Workflows per ADR-0030; the decision-time comparison of workflow runtimes is preserved in §12–§14 (historical).
For the always-current, drift-gated view of the deployables and their bindings, see the generated docs/architecture/c4-container.md.
2.3 Tenancy
- Multi-tenant, row-level isolated. Every D1 table carries `store_hash`; tenant isolation is enforced by mandatory `store_hash` scoping on every query (Cloudflare D1 is SQLite and has no row-level-security engine), not by database-level RLS policies. Phase 2's managed SQL tier can add RLS per ADR-0030.
- Per-store encryption: access tokens encrypted with AES-256-GCM using a per-store nonce derived from `store_hash` + platform master key.
- All BC-issued secret material is per-merchant identity, not per-app: the OAuth `access_token` (granted at install), the Storefront API token (minted per-store via `POST /stores/{store_hash}/v3/storefront/api-token`; see ADR-0059), and the webhook signing secret (the `client_secret` of the OAuth client that registered the webhook; see `BC_WEBHOOK_SIGNING_SECRET` env decoupling from `BC_CLIENT_SECRET`). The app-wide env-var pattern is sandbox-only legacy and breaks by construction for any multi-store deployment.
- Noisy-neighbor protection: per-store rate limits on BC API calls and processor charges.
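Because D1 has no row-level-security engine, the `store_hash` predicate has to be impossible to forget at the call site. One way to picture that is a query helper that always emits the tenant predicate first. This is a minimal sketch using plain SQL strings; `selectForStore` and `ScopedQuery` are hypothetical names, and the codebase itself uses Drizzle, whose API is not reproduced here.

```typescript
interface ScopedQuery {
  sql: string;
  params: unknown[];
}

const IDENTIFIER = /^[a-z_]+$/;

// Builds a parameterized SELECT whose first predicate is always store_hash.
// Identifiers are validated because they are interpolated, not bound.
function selectForStore(
  table: string,
  storeHash: string,
  where: Record<string, string | number> = {},
): ScopedQuery {
  if (!IDENTIFIER.test(table)) throw new Error(`unsafe table name: ${table}`);
  if (storeHash.length === 0) throw new Error("store_hash is required");

  const clauses = ["store_hash = ?"];
  const params: unknown[] = [storeHash];
  for (const [column, value] of Object.entries(where)) {
    if (!IDENTIFIER.test(column)) throw new Error(`unsafe column name: ${column}`);
    clauses.push(`${column} = ?`);
    params.push(value);
  }
  return {
    sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`,
    params,
  };
}

const scoped = selectForStore("subscriptions", "abc123", { status: "active" });
// scoped.sql    → "SELECT * FROM subscriptions WHERE store_hash = ? AND status = ?"
// scoped.params → ["abc123", "active"]
```

The design point is that callers cannot build an unscoped query through this path: the tenant key is a required argument rather than an optional filter.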
2.4 Data Boundaries
- PCI scope: zero. No PAN, CVV, or full card data ever enters our systems. At standard Stencil/Catalyst checkout, the shopper enters card data into BC's hosted checkout UI; BigPay vaults the card with the provider and returns a `bigpay_token`; our app receives only the opaque vault token via `GET /v3/orders/{id}/transactions` in the webhook handler (ADR-0037). For our portal add-payment-method flow (BRD US-19.1), shoppers add cards via an Instrument Access Token (IAT) issued by our backend, also with zero PAN exposure. No gateway-specific tokenization SDK (Stripe Elements, Braintree Hosted Fields) is required by our subscription engine for the standard recurring-charge rail; the Stripe-direct edge case uses `apps/api/src/adapters/stripe.ts`.
- PII minimization. We store subscriber email, name, and address snapshots on subscriptions (not copied continuously). We never store BC customer auth credentials.
- GDPR region affinity. EU merchants' data lives in EU-region databases (both approaches support this via region-pinning).
2.5 Observability
- Structured logs with `{store_hash, subscription_id, charge_id, correlation_id}` on every event
- Append-only `events` table per store (our application audit log)
- Platform metrics (latency, error rate, queue depth) exported to the platform's metrics stack
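To make the structured-log requirement concrete, the sketch below serializes one event with the four correlation fields always present. The function name `formatLogLine`, the `ts` / `level` / `event` fields, and the use of `null` for a not-applicable ID are assumptions for illustration; the actual sink abstraction lives in `tools/observability/` and is not shown here.

```typescript
// The four correlation fields named above; null marks "not applicable"
// so the key is still present on every event.
interface LogContext {
  store_hash: string;
  subscription_id: string | null;
  charge_id: string | null;
  correlation_id: string;
}

type LogLevel = "info" | "warn" | "error";

// Serializes one event as a single JSON line.
function formatLogLine(
  level: LogLevel,
  event: string,
  ctx: LogContext,
  at: Date,
): string {
  return JSON.stringify({ ts: at.toISOString(), level, event, ...ctx });
}

const sampleLine = formatLogLine(
  "info",
  "charge.succeeded",
  {
    store_hash: "abc123",
    subscription_id: "sub_42",
    charge_id: "ch_1",
    correlation_id: "req-7f3a",
  },
  new Date(Date.UTC(2025, 0, 1)),
);
const parsedLine = JSON.parse(sampleLine);
// parsedLine.ts → "2025-01-01T00:00:00.000Z"
```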
2.6 Idempotency
- Every processor charge uses `charge.id` as the idempotency key
- Every state-changing API accepts an optional `Idempotency-Key` header
- Inbound webhook delivery uses HMAC signatures; signing identity is per-registering-OAuth-client, not per-app (BC signs with the `client_secret` of whichever client registered the webhook). The verification secret is stored in `BC_WEBHOOK_SIGNING_SECRET`, decoupled from the app-wide `BC_CLIENT_SECRET`. Receiver is responsible for idempotent handling.
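The idempotency rule above amounts to "the same key never triggers the side effect twice". A minimal in-memory sketch follows; `IdempotencyStore` is a hypothetical name, and a production version would presumably persist keys in D1 or KV with an atomic insert instead of a `Map`. HMAC verification of inbound webhooks is deliberately left out of this sketch.

```typescript
// Remembers the result of the first execution per key and replays it
// for every later call with the same key.
class IdempotencyStore<T> {
  private readonly results = new Map<string, T>();

  run(key: string, handler: () => T): { result: T; replayed: boolean } {
    if (this.results.has(key)) {
      return { result: this.results.get(key) as T, replayed: true };
    }
    const result = handler();
    this.results.set(key, result);
    return { result, replayed: false };
  }
}

// Stand-in for a processor call; counts how often it really runs.
let processorCalls = 0;
const chargeOnce = (): string => {
  processorCalls += 1;
  return `receipt-${processorCalls}`;
};

const idempotency = new IdempotencyStore<string>();
const firstAttempt = idempotency.run("ch_123", chargeOnce); // charge.id as the key
const retryAttempt = idempotency.run("ch_123", chargeOnce); // retry with the same key
// Both attempts return "receipt-1"; processorCalls stays at 1.
```

A retry of the same charge, or a redelivered webhook carrying the same key, gets the stored result back without touching the processor again.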
2.7 Shared Stack Elements (Platform-Agnostic)
- Language: TypeScript 5 everywhere
- Data: Cloudflare D1 (SQLite) via the Workers `DB` binding; Drizzle for query building + migrations
- Validation: Zod v4
- Env: the typed Worker `Env` interface (`apps/api/src/types.ts`): bindings + secrets, no external env framework
- Auth libraries: `jose` for JWT
- Email: Resend (SendGrid as swap candidate)
- SMS: Twilio (P2)
- Testing: Vitest (including `@cloudflare/vitest-pool-workers` running the G4 behavioral scenarios over a real D1 instance) plus Playwright for e2e. See `docs/methodology/test-strategy.md` for the full G1–G5 verification ladder
3. Systems of Record
<!-- traceability:start:ARCH:3 --><!-- traceability:end:ARCH:3 -->Prototype: Membership · Event Timeline · Drift Sweep · Replay Tool · Structured Logs
Clear SoR boundaries prevent reconciliation nightmares. This table is authoritative.
| Domain | System of Record | Why | Our Copy? |
|---|---|---|---|
| Product catalog, variants | BigCommerce | BC owns the catalog merchant edits | Read-through; never cached beyond 5 min |
| Pricing (base) | BigCommerce | BC is the merchant's price truth | Read-through |
| Price Lists | BigCommerce | Subscription pricing often references these | Read-through |
| Customer profile | BigCommerce | BC owns customer records + auth | Read; mirror only what we need (name, email for notifications) |
| Customer group membership | BigCommerce | Used for plan scoping + Price List resolution | Read at runtime |
| Orders (including subscription renewals) | BigCommerce | Every renewal produces a BC order | We store a bc_order_id pointer only |
| Order status + fulfillment state | BigCommerce | BC's OMS/WMS integrations operate here | Read via webhook |
| Payment method tokens (vault) | Processor (BC Payments → Braintree, or Stripe) | PCI scope; processor owns tokens | We store payment_method_ref only |
| Payment settlement + payout | Processor + BC Money Dashboard (for BC Payments) | Money movement | Read for reconciliation |
| Subscription agreement | Us | BC has no subscription model | We are the SoR |
| Charge history (attempts, outcomes) | Us | Derived from processor events + our scheduler | We are the SoR |
| Plan configuration | Us | No BC equivalent | We are the SoR (pointer stored as BC metafield for discoverability) |
| Eligibility rules | Us | No BC equivalent | We are the SoR |
| Promotions (subscription-specific) | Us | BC's native promos don't handle subscription semantics | We are the SoR |
| Dunning policies | Us | No BC equivalent | We are the SoR |
| Event / audit log | Us | Platform-specific audit requirements | We are the SoR |
| Subscriber portal auth session | Us (magic link) | BC's customer auth not reusable in iframe-less context | We are the SoR |
| App install credentials | Us | OAuth result specific to our app | We are the SoR; encrypted at rest |
| Entitlement state (grant lifecycle) | Us | Engine owns grant/revoke state machine tied to subscription lifecycle (PRD-COMPANION D1) | We are the SoR; a pluggable provider adapter applies the external effect — v1: BC customer-group assignment; post-v1: SSO, feature-flag, license-server. The adapter's target system is its own SoR for whatever it controls, but our Entitlement row is the truth of whether access is granted. |
| AllotmentGrant (admin-granted recurring quota) | Us | Independent of paid-subscription Entitlement; admin-issued credit/quota with refresh cadence (PRD-COMPANION D18) | We are the SoR; debits link optionally to BC orders or our charges. Audit on granted_by + reason. |
| Multi-actor membership on subscription (actors[]) | Us | BC owns customer identity; role assignment (owner / payer / beneficiary / manager / org_admin) is our concept (PRD-COMPANION D19, ADR-0023) | We are the SoR; bc_customer_id is FK to BC's customer SoR. payer.role is unique-per-subscription via partial index. |
| DeliveryInstance (per-shipment row) | Us | Lazily populated; BC has no native concept; required when delivery cadence ≠ billing cadence (PRD-COMPANION D2, ADR-0024) | We are the SoR; n:1 charge → delivery fan-out permitted (Cintas weekly delivery / monthly billing). bc_order_id set when the per-instance order materializes. |
| CustomFieldDefinition (merchant-defined typed schema) | Us | Schema for extensible per-subscription / per-plan / per-grant data (PRD-COMPANION D20) | We are the SoR for the schema; field data lives on parent entity's metadata.custom_fields JSONB key. |
| Per-org processor connection routing | v2 — Us | RESERVED for marketplace MoR v2 (ADR-0022); single-tenant MoR in Phase 1 | Phase 1: store-level processor_connection_id is our SoR. v2: processor_connections keyed on (store_hash, payer_org_id) is our SoR; Actor.processor_connection_ref is the resolver column. |
| Funds Merchant-of-Record | Phase 1: platform tenant. v2: per-buyer-org configurable. | Determined by which processor connection settles the charge (ADR-0022). | Pointer; we do not hold funds. |
The SoR discipline dictates the architecture: we are a system of systems, not a commerce platform. We own the subscription, charge, and rule data models; everything else we read-through, webhook-consume, or pointer-reference.
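The "read-through; never cached beyond 5 min" rule for catalog data in the table above can be sketched as a small TTL cache. Everything here is illustrative: `ReadThroughCache` is a hypothetical name, the fetch callback is synchronous to keep the example short (a real BC API read would be async), and the clock is passed in explicitly so the behavior is testable.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Serves a cached copy only while it is younger than ttlMs; otherwise
// re-reads from the system of record and refreshes the entry.
class ReadThroughCache<V> {
  private readonly entries = new Map<string, { value: V; fetchedAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, nowMs: number, fetchFromSoR: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && nowMs - hit.fetchedAt < this.ttlMs) {
      return hit.value;
    }
    const value = fetchFromSoR();
    this.entries.set(key, { value, fetchedAt: nowMs });
    return value;
  }
}

// Stand-in for a BigCommerce catalog read; counts upstream calls.
let catalogReads = 0;
const readProduct = (): string => {
  catalogReads += 1;
  return `product-v${catalogReads}`;
};

const catalogCache = new ReadThroughCache<string>(FIVE_MINUTES_MS);
const atStart = catalogCache.get("product:77", 0, readProduct);
const atFourMin = catalogCache.get("product:77", 4 * 60 * 1000, readProduct);
const atSixMin = catalogCache.get("product:77", 6 * 60 * 1000, readProduct);
// atStart and atFourMin are "product-v1"; atSixMin is "product-v2".
```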
4. Capability Map
Regardless of hosting approach, these capabilities must exist. Each will map to concrete services in Approach A and B.
```mermaid
graph TD
    subgraph "Presentation"
        A1[Admin UI<br/>React + BigDesign<br/>BC iframe]
        A2[Subscriber Portal<br/>SvelteKit]
        A3[Storefront Widget<br/>vanilla TS]
        A4[Headless SDK<br/>TypeScript]
    end
    subgraph "API & Auth"
        B1[OAuth + BC JWT<br/>session mint]
        B2[Magic-link auth<br/>subscriber]
        B3[Public REST API<br/>v1]
        B4[GraphQL API<br/>Phase 2]
        B5[Inbound webhooks<br/>BC + processor]
        B6[Outbound webhooks<br/>signed + retried]
    end
    subgraph "Billing Engine"
        C1[Scheduler cron<br/>every 15 min]
        C2[Charge Executor<br/>durable workflow]
        C3[Dunning state<br/>machine]
        C4[Reconciliation<br/>daily sweep]
        C5[Replay tool]
    end
    subgraph "Processor Adapters"
        D1[BC Payments<br/>Braintree under]
        D2[Stripe Connect]
        D3[Future adapters<br/>Braintree, Adyen, ...]
    end
    subgraph "Integrations"
        E1[BigCommerce APIs<br/>V2/V3/GraphQL/Storefront]
        E2[Resend - email]
        E3[Twilio - SMS, P2]
        E4[Klaviyo / Gorgias / ... <br/>via webhooks]
    end
    subgraph "Data"
        F1[(D1 SQLite<br/>transactional)]
        F2[(KV + ratelimit<br/>cache, locks,<br/>rate limits)]
        F3[CSV exports,<br/>DSAR bundles<br/>P2+]
        F4[(Analytics warehouse<br/>P2+)]
    end
    A1 --> B1
    A2 --> B2
    A3 --> B3
    A4 --> B3
    B1 --> F1
    B3 --> F1
    B5 --> F1
    B5 --> C1
    C1 --> C2
    C2 --> D1
    C2 --> D2
    C2 --> E1
    C2 --> F1
    C2 --> F2
    C3 --> C2
    C4 --> F1
    C4 --> E1
    C4 --> D1
    C4 --> D2
    B6 --> E4
    B3 --> E2
    C2 --> E2
    F1 --> F4
```
5. Approach A — Vercel-Native
Historical, decision-time content. §5–§7 below preserve the original Vercel-vs-GCP comparison that produced ADR-0030. The ratified architecture is Cloudflare (Phase 1) → GCP (Phase 2). See §0 above for current state. Preserved here because the comparison framework + cost model + service mapping retain audit value when revisiting the migration shape.
5.1 Summary
Built on Vercel's platform with Neon for Postgres, Upstash for Redis, and Resend for email. Two sub-variants differ only in where the billing engine runs.
- A1 (Pure Vercel): billing engine on Vercel Workflow + Vercel Queues + Vercel Cron. Single platform, single deploy pipeline, single observability surface.
- A2 (Vercel + Cloudflare for billing engine): admin/portal/API on Vercel; billing engine on Cloudflare Workers + Durable Objects (following the `ask-bc` precedent). Trades operational complexity for per-store locality and sub-10ms hot paths.
Supabase is offered as an optional substitution for Neon if merchants want Realtime (live exception queue) and Storage (DSAR bundles) in the same SKU. Neon + Vercel Blob covers the same needs with simpler vendor count.
5.2 Component-to-Service Mapping
| Component | Service | Notes |
|---|---|---|
| Admin UI | Next.js 16 App Router on Vercel | Fluid Compute runtime; BigDesign v2; partitioned cookies for iframe |
| Subscriber Portal | Next.js 16 App Router on Vercel | Separate Vercel project; optional custom domain per merchant (wildcard cert) |
| Storefront Widget | Static IIFE served via Vercel's edge network | Bundled by Vite; deployed as a Vercel static project with long cache + versioned paths |
| API Layer | Next.js route handlers, Node.js runtime on Fluid Compute | Shared between admin + portal; same codebase |
| Inbound webhooks | Next.js route handlers, Node runtime | BC webhooks + processor webhooks |
| Scheduler cron | Vercel Cron (every 15 min) | Picks up due charges, enqueues to Vercel Queue |
| Charge executor | A1: Vercel Workflow. A2: Cloudflare Durable Object per store. | Durable state, checkpointed steps |
| Dunning state machine | Same runtime as executor | Re-enters executor with new retry_attempt |
| Reconciliation sweep | Vercel Cron (daily) + Vercel Queue | Long-running; fan out to worker invocations |
| Outbound webhook delivery | Vercel Queue consumer | Exponential backoff retry, dead-letter after 24h |
| Postgres | Neon Serverless (default) or Supabase Postgres (alt) | Branching for preview environments; per-region primary |
| Redis | Upstash Redis | Multi-region with per-store read replicas |
| Object storage | Vercel Blob (default) or Supabase Storage (alt) | DSAR bundles, CSV exports, migration files |
| Secrets | Vercel Environment Variables + `@t3-oss/env-nextjs` validation | OIDC-based integrations where possible |
| Email | Resend | Transactional only; MVP |
| SMS | Twilio (P2) | With opt-in consent tracking |
| Analytics warehouse | Neon Postgres read replica + ClickHouse (P3) | OR BigQuery via Vercel Integrations |
| Metrics / logs | Vercel's Observability (logs, traces, speed insights) + OpenTelemetry export to Datadog (P2) | |
| Error tracking | Sentry | Both Vercel-hosted and Cloudflare Workers (A2) supported |
| CDN / edge | Built into Vercel | Global anycast; widget served edge-cached |
| Rate limiting | `@upstash/ratelimit` in middleware | Per-store + per-IP |
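The outbound-webhook row above specifies exponential backoff with dead-lettering after 24 hours. A rough sketch of that schedule follows. The 24-hour cutoff comes from the table; the one-minute base delay, the one-hour cap, and the name `nextRetryDelayMs` are assumptions chosen for illustration.

```typescript
const DEAD_LETTER_AFTER_MS = 24 * 60 * 60 * 1000; // from the table: dead-letter after 24h
const BASE_DELAY_MS = 60 * 1000; // assumed base delay
const MAX_DELAY_MS = 60 * 60 * 1000; // assumed cap per retry

// Returns the delay before the next delivery attempt, or null when the
// next attempt would land past the 24h window and the message should be
// dead-lettered instead.
function nextRetryDelayMs(attempt: number, elapsedMs: number): number | null {
  const delay = Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
  if (elapsedMs + delay > DEAD_LETTER_AFTER_MS) {
    return null;
  }
  return delay;
}

const firstRetry = nextRetryDelayMs(0, 0); // 60000 (1 min)
const fourthRetry = nextRetryDelayMs(3, 0); // 480000 (8 min)
const cappedRetry = nextRetryDelayMs(10, 0); // 3600000 (capped at 1 h)
const exhausted = nextRetryDelayMs(5, DEAD_LETTER_AFTER_MS - 1000); // null
```

Capping the delay keeps late retries from spacing out so far that only one or two attempts fit inside the window.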
5.3 Topology (A1 — Pure Vercel)
```mermaid
graph LR
    subgraph Browser
        MA[Merchant Admin<br/>BC iframe]
        SP[Subscriber Portal]
        SF[Storefront PDP<br/>with widget]
    end
    subgraph "Vercel"
        direction TB
        UIs[Next.js<br/>Admin + Portal]
        API[Next.js API<br/>routes + webhooks]
        WIDGET[Widget IIFE<br/>static]
        CRON[Vercel Cron]
        Q[Vercel Queue]
        WF[Vercel Workflow<br/>charge executor]
    end
    subgraph "Data"
        NEON[(Neon<br/>Postgres)]
        UPS[(Upstash<br/>Redis)]
        BLOB[Vercel Blob]
    end
    subgraph "External"
        BC[BigCommerce]
        BCP[BC Payments<br/>Braintree]
        STRIPE[Stripe]
        RESEND[Resend]
        KLAVIYO[Klaviyo etc]
    end
    MA <--> UIs
    SP <--> UIs
    SF <--> WIDGET
    SF <--> API
    UIs --> API
    API <--> NEON
    API <--> UPS
    API --> BLOB
    API <--> BC
    BC -.webhooks.-> API
    BCP -.webhooks.-> API
    STRIPE -.webhooks.-> API
    CRON --> Q
    Q --> WF
    WF --> BCP
    WF --> STRIPE
    WF --> BC
    WF <--> NEON
    WF <--> UPS
    WF --> RESEND
    WF --> Q
    Q -.outbound webhooks.-> KLAVIYO
```
5.4 Topology (A2 — Vercel + Cloudflare)
A2 is identical to A1 except for the billing engine. The Cloudflare Worker hosts a Durable Object per store, giving us:
- Per-store locality — all that store's charge state lives in one DO with local SQLite storage
- Durable state — DO persists across restarts; no need for external workflow orchestrator
- Low latency — sub-10ms state access for hot paths
- Independent scaling — hot stores don't starve cold stores
```mermaid
graph LR
    subgraph Vercel
        CRON[Vercel Cron] --> DISPATCH[Dispatcher<br/>Vercel Function]
    end
    subgraph Cloudflare
        DO1[Durable Object<br/>store A<br/>+ SQLite]
        DO2[Durable Object<br/>store B<br/>+ SQLite]
        DO3[Durable Object<br/>store C<br/>+ SQLite]
    end
    DISPATCH -->|enqueue charge| DO1
    DISPATCH -->|enqueue charge| DO2
    DISPATCH -->|enqueue charge| DO3
    DO1 --> BC[BigCommerce]
    DO1 --> PROC[Processor Adapter]
    DO2 --> BC
    DO2 --> PROC
    DO3 --> BC
    DO3 --> PROC
```
When A2 wins over A1:
- More than 10K charges per store per day (per-store concurrency matters)
- Merchants who tolerate a second platform (no additional merchant-facing friction)
- Cost at scale (Durable Objects are cheaper per-invocation than Workflow for millions of charges)
When A1 wins over A2:
- MVP and early Phase 2 (scale not yet a concern)
- Single-platform operations is simpler for small teams
- Vercel Workflow's durable execution is enough at < 10K charges/day/store
Recommendation within Approach A: Start at A1 for MVP. Measure. Migrate the billing engine to A2 (Cloudflare) only if P95 latency or cost math requires it. The dispatcher-to-DO pattern is additive, not replacement — A1 can evolve into A2 without rewriting the API layer.
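The dispatcher-to-DO pattern in this historical comparison boils down to routing each due charge to the one Durable Object that owns its store. A minimal sketch of the grouping step is shown below; `PendingCharge` and `groupByStore` are hypothetical names, and the actual call into a Durable Object stub is omitted because no such code is described in this document.

```typescript
interface PendingCharge {
  id: string;
  storeHash: string;
}

// Groups due charges by store so the dispatcher can send one batch to
// the per-store object that owns that store's charge state.
function groupByStore(charges: PendingCharge[]): Map<string, string[]> {
  const batches = new Map<string, string[]>();
  for (const charge of charges) {
    const batch = batches.get(charge.storeHash) ?? [];
    batch.push(charge.id);
    batches.set(charge.storeHash, batch);
  }
  return batches;
}

const batches = groupByStore([
  { id: "ch_1", storeHash: "storeA" },
  { id: "ch_2", storeHash: "storeB" },
  { id: "ch_3", storeHash: "storeA" },
]);
// batches.get("storeA") → ["ch_1", "ch_3"]
// batches.get("storeB") → ["ch_2"]
```

Because the grouping key is the store, a hot store only lengthens its own batch, which is the "independent scaling" property claimed for A2.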
5.5 Approach A Strengths
- Workspace precedent. `aisles-admin` (Next.js + BigDesign + Neon + Upstash) and `ask-bc` (Next.js + Cloudflare Workers + DO) already establish both patterns. Team knows them.
- DevEx. Deploy is `git push`; previews are automatic; branch databases via Neon; environment management via `vercel env`.
- Framework leverage. Next.js middleware, Server Components, `vercel.ts` config, Fluid Compute concurrency reuse, Vercel Queues + Workflow all fit this workload natively.
- iframe hosting. Vercel's response-header and partitioned-cookie defaults match BC admin embed requirements (verified against `aisles-admin` and `ask-bc` in production).
- Time-to-ship. The `aisles-admin` skeleton + `ask-bc` agent-runtime patterns cut weeks off setup.
- Edge CDN for widget. Vercel's global anycast ensures < 50ms widget delivery worldwide without extra configuration.
- Integration ecosystem. Vercel Marketplace for Neon, Upstash, Resend auto-provisions keys via OIDC.
5.6 Approach A Weaknesses / Risks
- Vercel Queues is in public beta (per current platform state). Committing MVP to a beta product carries some risk; mitigate by designing abstractions so a swap to a mature queue (e.g., upstash-queue, RabbitMQ on a VM) is possible.
- Vercel Workflow durability guarantees must be verified for our billing use case before MVP commit (flagged in BRD open questions #4).
- Multi-cloud hybrid in A2 adds operational complexity: two deploy pipelines, two secrets stores, two observability surfaces. Acceptable for a billing engine, but worth naming.
- Single-vendor concentration risk. Vercel outages affect admin, portal, and API simultaneously. SLA monitoring plus a status-page subscription is essential.
6. Approach B — GCP-Only
6.1 Summary
All components hosted on Google Cloud Platform. Container-based services on Cloud Run, Cloud SQL for Postgres, Cloud Tasks + Cloud Workflows for orchestration, Firestore only considered for ephemeral data, Pub/Sub for internal eventing.
6.2 Component-to-Service Mapping
| Component | Service | Notes |
|---|---|---|
| Admin UI | Next.js 16 on Cloud Run (Gen 2, min instances > 0 for admin iframe to avoid cold starts) | Containerized; served behind Cloud Load Balancer |
| Subscriber Portal | Next.js 16 on Cloud Run | Separate service; separate domain |
| Storefront Widget | Cloud Storage static bucket + Cloud CDN | Versioned paths; immutable cache |
| API Layer | Next.js route handlers on Cloud Run | Same codebase as UIs; separate service for scaling independence |
| Inbound webhooks | Cloud Run service dedicated to webhook ingest | Fronted by Cloud Armor for DDoS / rate limiting |
| Scheduler cron | Cloud Scheduler (cron) → Pub/Sub topic | Every 15 min |
| Charge executor | Cloud Workflows (durable) OR Cloud Run Jobs | Workflow for step-by-step durability with checkpointing |
| Dunning state machine | Cloud Workflows | Re-enters executor with new state |
| Reconciliation sweep | Cloud Scheduler (daily) → Cloud Run Job | |
| Outbound webhook delivery | Cloud Tasks with retries + Cloud Run consumer | Native exponential backoff + DLQ |
| Postgres | Cloud SQL for Postgres with HA config, read replicas per region | |
| Redis | Memorystore for Redis (Standard HA tier) | Regional |
| Object storage | Cloud Storage | DSAR bundles, CSV exports, static assets |
| Secrets | Secret Manager + workload identity federation | |
| Email | Resend (external) or SendGrid via Marketplace | SendGrid has better GCP integration |
| SMS | Twilio (external) | Same as Approach A |
| Analytics warehouse | BigQuery + Dataflow ingestion from Cloud SQL via Datastream | Strong analytics story is a GCP pro |
| Metrics / logs | Cloud Logging + Cloud Monitoring | Integrated dashboards, SLO tooling |
| Error tracking | Sentry (external) or Error Reporting (GCP-native) | |
| CDN / edge | Cloud CDN via Cloud Load Balancer | Global; integrates with Cloud Armor |
| Rate limiting | Cloud Armor at edge + per-service Redis-backed limiting | |
| Identity (subscriber) | Identity Platform (magic-link flows) OR custom magic-link over Cloud SQL | Custom is simpler for our scope |
| IaC | Terraform + Google Cloud provider | Required; no equivalent to Vercel's deploy simplicity |
| CI/CD | Cloud Build or GitHub Actions → Artifact Registry → Cloud Run deploy | More configuration than Vercel |
6.3 Topology
```mermaid
graph LR
    subgraph "Browser"
        MA[Merchant Admin<br/>BC iframe]
        SP[Subscriber Portal]
        SF[Storefront PDP]
    end
    subgraph "GCP Edge"
        LB[Cloud Load Balancer<br/>+ Cloud Armor]
        CDN[Cloud CDN]
    end
    subgraph "GCP Services"
        CR_UI[Cloud Run<br/>Admin + Portal]
        CR_API[Cloud Run<br/>API]
        CR_WH[Cloud Run<br/>Webhook Ingest]
        SCHED[Cloud Scheduler]
        PS[Pub/Sub]
        WF[Cloud Workflows<br/>charge executor]
        CT[Cloud Tasks]
        CR_REC[Cloud Run Job<br/>Reconciliation]
    end
    subgraph "GCP Data"
        SQL[(Cloud SQL<br/>Postgres HA)]
        MEMO[(Memorystore<br/>Redis)]
        GCS[Cloud Storage]
        BQ[(BigQuery)]
    end
    subgraph "External"
        BC[BigCommerce]
        PROCS[Processors:<br/>BC Payments / Stripe]
        EMAIL[SendGrid / Resend]
        INT[Klaviyo / Gorgias]
    end
    MA --> LB
    SP --> LB
    SF --> CDN
    SF --> LB
    LB --> CR_UI
    LB --> CR_API
    LB --> CR_WH
    CR_UI --> CR_API
    CR_API --> SQL
    CR_API --> MEMO
    CR_API --> GCS
    CR_API --> BC
    CR_API --> CT
    BC -.webhooks.-> CR_WH
    PROCS -.webhooks.-> CR_WH
    CR_WH --> PS
    SCHED --> PS
    PS --> WF
    WF --> PROCS
    WF --> BC
    WF --> SQL
    WF --> MEMO
    WF --> EMAIL
    WF --> CT
    CT -.outbound webhooks.-> INT
    SCHED --> CR_REC
    CR_REC --> SQL
    CR_REC --> BC
    CR_REC --> PROCS
    SQL -->|Datastream| BQ
```
6.4 Approach B Strengths
- Single-cloud compliance story. Some BC enterprise merchants (healthcare, regulated finance) require single-vendor cloud footprint. GCP-only is a clean compliance argument.
- BigQuery native. Cohort retention, LTV modeling, subscriber analytics all benefit from BigQuery as an analytics warehouse with Dataflow streaming from Cloud SQL. Phase 2 analytics epics are easier here.
- Cloud Workflows is GA (since 2020) and has documented durable execution semantics — lower adoption risk than Vercel Workflow (currently beta).
- SLO / monitoring ecosystem. Cloud Monitoring has native SLO + error budget objects, which shortens the path to formal SLO tracking for SOC 2 (vs. wiring Vercel logs into an external SLO tool).
- Cloud Armor for edge security is a real advantage for webhook ingest + public API (Vercel has basic but less sophisticated WAF options).
- Cost predictability at scale. Cloud Run Gen 2 + committed-use discounts can undercut Vercel for large workloads once steady-state is known.
- Workload Identity Federation eliminates long-lived secrets for inter-service auth — stronger security posture than managing rotated secrets.
6.5 Approach B Weaknesses / Risks
- Cold starts. Cloud Run Gen 2 is improved but still has perceptible cold-start latency vs. Vercel Fluid Compute (which reuses function instances across concurrent requests). For admin iframe load, this is felt by the merchant; mitigate with `min-instances = 1` per service (ongoing cost).
- DevEx. No `git push` to deploy; requires Terraform, Cloud Build, Artifact Registry, environment management via Deployment Manager or config files. Preview environments per PR require custom Cloud Build triggers. Team needs GCP ops expertise.
- iframe hosting. Partitioned cookie handling, `SameSite=none` enforcement across Cloud Load Balancer, and iframe-safe redirects are all achievable but require explicit header configuration; Vercel ships these as defaults.
- No framework-native features. Next.js runs in container mode (fine), but loses some of Vercel's Next.js-specific optimizations (ISR coordination, streaming edge features, `vercel.ts` config).
- More moving parts. Admin + Portal + API + Webhook Ingest = 4 Cloud Run services plus 2 jobs plus Scheduler/Tasks/Workflows. Each has IAM, networking, Monitoring dashboards. Operational surface area is 3× Approach A.
- Slower iteration for a small team. If Phase 1 is 14 weeks (per PRD), Approach A probably ships in 14 weeks; Approach B probably needs 18–20 weeks to cover the GCP setup tax.
- Widget edge delivery. Cloud CDN is fine, but Vercel's global anycast for Next.js is noticeably lower-latency for dynamically rendered edge responses.
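The "explicit header configuration" in the iframe hosting point is largely a matter of cookie attributes. A minimal sketch, assuming a hypothetical `buildIframeSessionCookie` helper (the function and option names are illustrative, not taken from the codebase), of a session cookie that browsers will accept inside a third-party iframe:

```typescript
// Hypothetical helper: builds a Set-Cookie header value for a cookie that must
// survive inside the BigCommerce admin iframe (a third-party context).
interface IframeCookieOptions {
  name: string;
  value: string;
  maxAgeSeconds: number;
  path?: string;
}

function buildIframeSessionCookie(opts: IframeCookieOptions): string {
  const parts = [
    `${opts.name}=${encodeURIComponent(opts.value)}`,
    `Path=${opts.path ?? "/"}`,
    `Max-Age=${opts.maxAgeSeconds}`,
    "HttpOnly",
    "Secure",        // required whenever SameSite=None is used
    "SameSite=None", // allow the cookie in cross-site iframe requests
    "Partitioned",   // CHIPS: key the cookie to the embedding top-level site
  ];
  return parts.join("; ");
}
```

On a container platform this string would be set by the application or a proxy layer on every session response, which is the per-service work the bullet above refers to.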
6.6 Firestore Deliberately Not Used
Firestore's document data model does not fit the relational subscription / charge / event domain. Firestore would require extensive denormalization, lose referential integrity guarantees, and complicate the rule engine for eligibility/promotions. Cloud SQL Postgres is the right answer.
Exception: Firestore could be used for ephemeral presence data (like real-time exception queue live-updates to replace WebSocket infrastructure) — but that optimization can wait.
7. Side-by-Side Comparison
| Dimension | Approach A1 (Pure Vercel) | Approach A2 (Vercel + CF) | Approach B (GCP) |
|---|---|---|---|
| Admin UI hosting | Vercel Fluid Compute | Vercel Fluid Compute | Cloud Run Gen 2 |
| Subscriber Portal hosting | Vercel | Vercel | Cloud Run |
| Widget hosting | Vercel Edge | Vercel Edge | Cloud CDN + Storage |
| API hosting | Vercel Fluid Compute | Vercel Fluid Compute | Cloud Run |
| Billing engine runtime | Vercel Workflow | Cloudflare DO | Cloud Workflows |
| Scheduler | Vercel Cron | Vercel Cron | Cloud Scheduler |
| Queue | Vercel Queue (beta) | Vercel Queue + CF Queues | Pub/Sub + Cloud Tasks |
| Postgres | Neon (or Supabase) | Neon | Cloud SQL |
| Redis | Upstash | Upstash + per-DO SQLite | Memorystore |
| Object store | Vercel Blob (or Supabase) | Vercel Blob | Cloud Storage |
| Secrets | Vercel env vars | Vercel + CF secrets | Secret Manager |
| CDN / edge | Vercel native | Vercel + CF native | Cloud CDN |
| IaC | Vercel config (vercel.ts) | Vercel + Wrangler (CF) | Terraform (mandatory) |
| CI/CD | `git push` + Vercel | `git push` + Vercel + Wrangler | Cloud Build / GHA + Terraform |
| Preview envs | Automatic per PR | Automatic (Vercel); manual (CF) | Custom Cloud Build setup |
| Observability | Vercel native + Sentry | Vercel + CF + Sentry | Cloud Logging/Monitoring + Sentry |
| SLO tooling | Basic (via external) | Basic | Native, mature |
| WAF / DDoS | Vercel basic | CF advanced | Cloud Armor advanced |
| Single-cloud compliance | No | No | Yes |
| Analytics warehouse | Neon / ClickHouse P3 | Neon / ClickHouse P3 | BigQuery native |
| Cold start risk (admin) | Very low (Fluid Compute) | Very low | Low with min-instances |
| MVP time-to-ship | ✅ Fastest (14 wk target) | Slower (~15 wk) | Slowest (~18–20 wk) |
| Operational surface | Smallest | Medium | Largest |
| Cost at MVP scale | Low-mid | Low-mid | Mid |
| Cost at 100M subs scale | Mid-high | Mid | Predictable low-mid |
| PCI DSS SAQ-A | ✅ | ✅ | ✅ |
| GDPR EU residency | ✅ (Neon EU + Vercel EU) | ✅ | ✅ (regional Cloud SQL) |
| Team expertise fit | ✅ High (aisles-admin, ask-bc precedent) | ✅ High | Medium (GCP ops curve) |
8. Rough Cost Model
Monthly estimates for two scale points, steady-state, per store unless noted. Not a quote — a sanity check.
8.1 MVP scale (50 stores, 5K active subs, 500K total API calls/mo)
| Category | Approach A (Vercel) | Approach B (GCP) |
|---|---|---|
| Compute (UI + API + webhooks) | $200–$400 | $300–$500 (min-instances) |
| Postgres | $50 (Neon Scale) | $150 (Cloud SQL HA) |
| Redis | $30 (Upstash) | $100 (Memorystore Standard) |
| Object storage | < $10 | < $10 |
| Secrets / auth | included | included |
| Email (Resend) | $30 | $30 |
| CDN | included | $20 |
| Observability | $50 (Sentry) | $80 (Cloud Ops + Sentry) |
| Total / month | ~$370–$570 | ~$690–$890 |
8.2 Growth scale (1K stores, 500K active subs, 50M API calls/mo)
| Category | Approach A | Approach B |
|---|---|---|
| Compute | $2K–$4K | $1.5K–$3K (CUD discounts) |
| Postgres | $800 (Neon Business) | $1K (Cloud SQL HA + read replicas) |
| Redis | $400 | $500 |
| Object storage | $50 | $50 |
| Email | $500 | $500 |
| CDN + egress | included | $300 |
| Observability | $400 | $500 |
| BigQuery analytics | n/a | $300 |
| Total / month | ~$4.15K–$6.15K | ~$4.65K–$6.15K |
Conclusion: Vercel is roughly 40% cheaper at MVP scale. At growth scale the two approach parity: GCP edges ahead on pure compute (CUDs) but incurs storage, egress, and ops overhead.
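The MVP comparison can be re-derived from the 8.1 line items. A small sketch, assuming "< $10" counts as $10, "included" counts as $0, and the saving is measured between range midpoints (all three are assumptions of this example, not stated in the tables):

```typescript
// [low, high] monthly USD, copied row by row from the 8.1 MVP table.
type UsdRange = [number, number];

const mvpVercel: UsdRange[] = [
  [200, 400], // compute
  [50, 50],   // Postgres
  [30, 30],   // Redis
  [10, 10],   // object storage ("< $10" treated as 10)
  [0, 0],     // secrets / auth ("included" treated as 0)
  [30, 30],   // email
  [0, 0],     // CDN ("included")
  [50, 50],   // observability
];

const mvpGcp: UsdRange[] = [
  [300, 500], // compute
  [150, 150], // Postgres
  [100, 100], // Redis
  [10, 10],   // object storage
  [0, 0],     // secrets / auth
  [30, 30],   // email
  [20, 20],   // CDN
  [80, 80],   // observability
];

// Sum the low and high ends independently.
function total(items: UsdRange[]): UsdRange {
  let low = 0;
  let high = 0;
  for (const [lo, hi] of items) {
    low += lo;
    high += hi;
  }
  return [low, high];
}

function midpoint(range: UsdRange): number {
  return (range[0] + range[1]) / 2;
}

// Percentage saved by the cheaper option, compared at range midpoints.
function savingsPercent(cheaper: UsdRange, pricier: UsdRange): number {
  return (1 - midpoint(cheaper) / midpoint(pricier)) * 100;
}

console.log(total(mvpVercel), total(mvpGcp));
console.log(savingsPercent(total(mvpVercel), total(mvpGcp)).toFixed(1) + "%");
```

The same three functions apply unchanged to the 8.2 growth table if its rows are entered the same way.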
9. Operational Considerations
9.1 Deployment
- Approach A: `git push` deploys; preview URL per PR; production deploy is a promote in the Vercel dashboard or via `vercel deploy --prod`. A2 adds `wrangler deploy` for the Cloudflare piece.
- Approach B: Terraform plan/apply for infrastructure; Cloud Build triggers for app deploys; blue/green or canary via Cloud Run traffic splits. Requires discipline around state management and IAM.
9.2 Disaster Recovery
- RPO (data loss tolerance): 5 min for charges; 0 for events; 1 hour for other tables
- RTO (restore time): 1 hour for read-only operation; 4 hours for full write restoration
- Approach A: Neon point-in-time recovery to any second within 7d (Scale) / 30d (Business); Upstash snapshot restore; Vercel Blob versioning
- Approach B: Cloud SQL automated backups + PITR; Memorystore replica failover; Cloud Storage versioning + cross-region replication
Both meet RPO/RTO for MVP. Approach B's Cloud SQL offers stronger cross-region failover at higher cost.
9.3 Security Posture
- Approach A: Vercel SOC 2 Type II + HIPAA-eligible (Enterprise plan); Neon SOC 2; Upstash SOC 2. Stacked vendor compliance with narrow blast radius per vendor.
- Approach B: GCP's unified compliance (SOC 2, ISO 27001, HIPAA, PCI DSS shared responsibility). Single-vendor evidence collection simplifies SOC 2 audit.
Either approach meets PCI SAQ-A eligibility (we never see card data).
9.4 Incident Response
- Approach A: Vercel has fast status page updates and history; Cloudflare same. Incident response runbooks span 2–3 vendors.
- Approach B: GCP status + Cloud Monitoring alert policies; single incident surface; stronger diagnostic tooling via Cloud Profiler / Cloud Trace.
9.5 Multi-Region / Data Residency
- Approach A: Multi-region is opt-in per service. Neon multi-region is available; Vercel multi-region compute is the norm. EU-data-stays-in-EU requires Neon EU region + Vercel EU hosting + Upstash EU — achievable but vendor-by-vendor configuration.
- Approach B: Region-pinning is uniform across every service via a single regional deployment; EU data residency is one Terraform variable.
Approach B wins clearly on data residency for Phase 2 international expansion.
9.6 Vendor Concentration Risk
- Approach A1: Medium (Vercel outage affects all user-facing surfaces)
- Approach A2: Medium-low (Cloudflare billing engine keeps running during Vercel outages; admin UI degrades but subscribers' existing subscriptions still bill)
- Approach B: Medium (GCP outage affects all)
9.7 Talent Market
- Approach A: Next.js + Vercel experience is widely available; Cloudflare Workers more specialized but growing
- Approach B: GCP expertise is deep but narrower than AWS; Cloud Run + Workflows + Terraform combination requires stronger platform engineering
10. Recommendation
Superseded by ADR-0030. The recommendation below reflects the original April 2026 decision-time analysis (favoring Approach A1 / Vercel for MVP). The ratified architecture is Cloudflare (Phase 1) → GCP (Phase 2) — see §0 above. Preserved verbatim because the reasoning structure (time-to-ship, workspace precedent, cost-at-MVP, GCP-trigger conditions) remains useful when re-litigating platform choices.
Original (April 2026) recommendation: Choose Approach A1 (Pure Vercel) for MVP. Plan an evolution path to A2 (Vercel + Cloudflare for billing engine) if and when scale warrants.
Reasoning
- Time-to-ship is the primary risk. BC Payments launched March 2026; the strategic window to ship a BC-Payments-unified subscription app is measured in quarters, not years. Approach A1 ships in 14 weeks; Approach B takes 18–20. That 4–6 week gap is strategically material.
- Workspace precedent is a real multiplier. `aisles-admin`, `ask-bc`, and `aisles-storefront` have already paid the setup cost for Vercel + Next.js + Neon + Upstash + BigDesign. Approach B throws that away and adds a GCP learning curve the team doesn't currently have invested.
- Approach A1 → A2 evolution is clean. Migrating the billing engine from Vercel Workflow to Cloudflare Durable Objects is additive: introduce a dispatcher that routes charges to DOs for stores where the scale math justifies it, leave the Workflow path for others. This is the `ask-bc` worker pattern applied to billing. No rewrite.
- GCP's advantages don't fire at MVP scale. BigQuery analytics, Cloud Armor, Cloud Workflows maturity, single-cloud compliance: all real wins, but Phase 2/3 needs. Phase 1 doesn't touch any of them materially.
- The cost gap at MVP scale is meaningful (~40% cheaper). At growth scale they converge, and the cost-of-change to migrate to GCP later is non-trivial but not prohibitive.
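The "additive" A1 → A2 path described above can be pictured as a dispatcher in front of two interchangeable executors. This is a sketch only: `ChargeExecutor`, `ChargeDispatcher`, and both executor classes are hypothetical names, and the executors are stubs that merely report which path handled the charge.

```typescript
// Hypothetical seam: every billing runtime implements the same interface.
interface ChargeExecutor {
  readonly name: string;
  execute(chargeId: string): Promise<string>;
}

// Stub for the default path (Vercel Workflow in the A1 design).
class WorkflowExecutor implements ChargeExecutor {
  readonly name = "vercel-workflow";
  async execute(chargeId: string): Promise<string> {
    return `${this.name}:${chargeId}`;
  }
}

// Stub for the scaled path (Cloudflare Durable Objects in the A2 design).
class DurableObjectExecutor implements ChargeExecutor {
  readonly name = "cloudflare-do";
  async execute(chargeId: string): Promise<string> {
    return `${this.name}:${chargeId}`;
  }
}

class ChargeDispatcher {
  constructor(
    private readonly defaultExecutor: ChargeExecutor,
    private readonly scaledExecutor: ChargeExecutor,
    private readonly scaledStores: ReadonlySet<string>,
  ) {}

  // Stores on the allow-list go to the scaled path; everyone else is unchanged.
  executorFor(storeHash: string): ChargeExecutor {
    return this.scaledStores.has(storeHash) ? this.scaledExecutor : this.defaultExecutor;
  }

  dispatch(storeHash: string, chargeId: string): Promise<string> {
    return this.executorFor(storeHash).execute(chargeId);
  }
}

const dispatcher = new ChargeDispatcher(
  new WorkflowExecutor(),
  new DurableObjectExecutor(),
  new Set(["store_big"]),
);
```

Because callers only see `dispatch`, moving a store between paths is a change to the allow-list rather than to business logic, which is the sense in which the document calls the evolution "no rewrite".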
When to Reconsider Approach B
Revisit GCP if any of these become true:
- A merchant pipeline of enterprise accounts with GCP-single-cloud requirements materializes (regulated finance, healthcare). Revenue-weighted: > 30% of pipeline.
- Cost at steady-state exceeds $15K/month on Approach A with no clear optimization path.
- Team gains meaningful GCP operational depth (hires, acquisitions).
- BigQuery analytics becomes a competitive differentiator the Approach A warehouse pattern can't match.
- Phase 2 international expansion makes data residency setup-cost > payoff on Approach A's per-vendor configuration.
What to Decide Now
- A1 for MVP. Commit to Vercel + Neon + Upstash + Resend stack.
- Abstract the billing engine. Write the scheduler, executor, and dunning logic against interfaces that let us swap Vercel Workflow → Cloudflare DOs → Cloud Workflows without rewriting business logic.
- Abstract the queue. Vercel Queue is beta; wrap it behind a `JobQueue` interface. If it becomes a problem, swap to Upstash-backed queue or Cloudflare Queues without touching callers.
- Abstract object storage. Behind a `BlobStore` interface; Vercel Blob today, swap if needed.
- Monitor three specific metrics that would trigger A2 migration:
  - P95 charge execution latency > 5s sustained for 7 days
  - Charges per store per day > 10K on > 5% of stores
  - Cost per 1000 charges > $0.05 steady-state
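The three migration triggers are simple enough to evaluate mechanically. A minimal sketch, assuming a hypothetical `MetricsSnapshot` shape (daily P95 values, one charge count per store, and a cost figure) that the real monitoring pipeline would have to supply:

```typescript
// Hypothetical input shape; field names are illustrative.
interface MetricsSnapshot {
  dailyP95LatencySeconds: number[]; // one value per day, most recent last
  chargesPerStorePerDay: number[];  // one value per store
  costPer1000ChargesUsd: number;
}

// Returns the names of the A2 triggers that currently fire.
function firedTriggers(s: MetricsSnapshot): string[] {
  const fired: string[] = [];

  // P95 charge execution latency > 5s on each of the last 7 days.
  const lastWeek = s.dailyP95LatencySeconds.slice(-7);
  if (lastWeek.length === 7 && lastWeek.every((v) => v > 5)) {
    fired.push("latency");
  }

  // More than 5% of stores exceed 10K charges per day.
  const storeCount = s.chargesPerStorePerDay.length;
  const heavyStores = s.chargesPerStorePerDay.filter((c) => c > 10000).length;
  if (storeCount > 0 && heavyStores / storeCount > 0.05) {
    fired.push("volume");
  }

  // Cost per 1000 charges above $0.05.
  if (s.costPer1000ChargesUsd > 0.05) {
    fired.push("cost");
  }

  return fired;
}
```

Treating "sustained for 7 days" as seven consecutive daily P95 readings above the threshold is an interpretation made for this example; a rolling-window definition would need a different input.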
Supabase / Cloudflare Optional Components (within Approach A)
- Supabase as Neon substitute: Skip unless the team wants Realtime (live exception queue) and Storage in one SKU. Neon is the simpler choice and matches existing workspace pattern.
- Cloudflare Workers for widget only: Skip. Vercel Edge is sufficient for widget CDN.
- Cloudflare for billing engine (A2): Defer until scale metrics trigger. Pattern already exists in `ask-bc`.
11. Open Architectural Decisions
These require a decision before engineering kickoff. Most are flagged in PRD.md §13 and BRD-specs.md Appendix C; surfacing the architectural implications here.
- Vercel Workflow GA / durability commitment — confirm with Vercel that Workflow meets our durability requirements for billing, or we wrap it in abstractions that allow swap-out.
- Vercel Queue GA timing — if still beta at MVP ship, commit to the abstraction layer approach.
- BC Payments MIT surface: RESOLVED (synthesis `7574bb48`, decision `256591ec`; further reframed by ADR-0037 2026-05-15). Day-0 spike `eb94cf4a` confirmed BC's BigPay layer auto-classifies API-initiated stored-instrument charges as MIT (`MREC` flag for recurring); production proof at Wren Laboratories. The stored-instruments vault rail (payments.bigcommerce.com) is the canonical path for any BC-supported gateway with "Stored Payments" enabled; direct Braintree SDK is a fallback capability only. ADR-0037 reframes PI-5062: it is a Worldpay/Paymetric-specific NTI threading issue, not a blocker for the standard vault rail. Adapter-side NTI persistence bridges for Worldpay/Paymetric merchants until PI-5062 closes (PRD-COMPANION D17).
- Neon vs Supabase Postgres: recommend Neon for pattern consistency, but Supabase is viable.
- Subscriber portal custom domains — wildcard cert provisioning automation. Vercel supports this; confirm pricing at scale.
- Multi-region from day one vs. single-region with migration path — recommend single US region for MVP; EU expansion in Phase 2.
- Analytics warehouse — Neon read replica for MVP; ClickHouse or BigQuery for Phase 2 when cohort/LTV queries become expensive.
- Merchant-tier-aware infrastructure — do enterprise-tier merchants get dedicated compute pools? Deferred to Phase 3.
- Storefront extension on B2B-Edition stores: RESOLVED (synthesis `eb678099`, ADR-0023). When the BigCommerce B2B Edition Buyer Portal is mounted on a Catalyst storefront (per the Open Source Buyer Portal working with One Click Catalyst integration guide), our subscription widget detects Buyer Portal presence (`window.B2B` namespace probe) and degrades to a non-purchasable "contact your account manager" CTA. Subscription enrollment for B2B buyer-orgs runs through CS tools (Epic 22) rather than the standard storefront cart-capture flow. The cart-capture handler (Epic 9) is bypassed because the Buyer Portal owns its own quote-to-cart conversion flow and our `on-cart-created` listener never fires for Buyer Portal-routed checkouts. Cart-merge (where we own checkout for B2B-Edition stores by intercepting Buyer Portal's flow) is deferred to v3 behind spike #113 (Epic 24 unblocker). Composite PDP architecture: standard widget renders for non-B2B contexts; B2B variant renders when Buyer Portal SDK is loaded. Reference files: `core/b2b/loader.tsx`, `core/b2b/use-b2b-cart.ts`, `core/b2b/use-product-details.tsx` from the Catalyst integration guide.
- Marketplace Merchant-of-Record (per-buyer-org processor routing): RESOLVED (synthesis `eb678099`, ADR-0022). Out of scope for Phase 1. Phase 1 ships single-tenant MoR (one processor connection per BC store). The multi-actor primitive (PRD-COMPANION D19) carries the v2 seam via `Actor.processor_connection_ref` reserved column. v2 migration is additive: adds `processor_connections` keyed on `(store_hash, payer_org_id)`, backfills existing subscriptions to the store's primary connection, extends Epic 14 charge orchestration to resolve `payer.processor_connection_ref ?? store.primary_connection_ref`. No Phase-1 charge or subscription row is invalidated by the v2 migration.
- NTI freshness for long-cycle subscriptions: RESOLVED (synthesis `eb678099`, PRD-COMPANION D21). Phase 2 implementation slot. Workflow extension fires a `MUSE` verification charge before the renewal MIT when `now - subscriptions.last_nti_refreshed_at > merchant.nti_freshness_days` (default 365). Verification success persists fresh NTI; verification failure routes to dunning as PM-decline. Adapter contract (D17) is unchanged; only the workflow gains a verification stage before the existing MIT call.
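Two of the resolved decisions above reduce to small pure functions: the NTI freshness test and the processor-connection fallback. The sketch below is illustrative; the function names are hypothetical, and it assumes `nti_freshness_days` is compared in whole days against the elapsed time since the last refresh.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the stored NTI is older than the merchant's freshness window,
// i.e. a verification charge should run before the renewal MIT.
function needsNtiVerification(
  now: Date,
  lastNtiRefreshedAt: Date,
  ntiFreshnessDays: number = 365, // default taken from the decision text
): boolean {
  const elapsedMs = now.getTime() - lastNtiRefreshedAt.getTime();
  return elapsedMs > ntiFreshnessDays * MS_PER_DAY;
}

// v2 seam: prefer the payer's own connection, else the store's primary one.
function resolveProcessorConnection(
  payer: { processor_connection_ref: string | null },
  store: { primary_connection_ref: string },
): string {
  return payer.processor_connection_ref ?? store.primary_connection_ref;
}
```

Because Phase 1 leaves `processor_connection_ref` unset, the second function always resolves to the store's primary connection until the v2 migration populates the column, consistent with the claim that no Phase-1 row is invalidated.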
12. Appendix A — Sequence: End-to-End Renewal (Approach A1)
Historical: decision-time (rejected stack). This sequence depicts the Vercel-native Approach A1 (Vercel Cron/Queue/Workflow, Neon, Upstash) that was not built. It is retained only as part of the ADR-0030 decision trail. The as-built renewal / subscribe / dunning sequences (Cloudflare Cron → Worker `scheduled()` → `scheduler.ts` → BC Payments vault → D1 → Queues) are the authoritative ones: see `docs/architecture/sequence-diagrams.md`. Do not cite the diagram below as current.
sequenceDiagram
participant Cron as Vercel Cron
participant Disp as Dispatcher (Next.js API)
participant Q as Vercel Queue
participant WF as Vercel Workflow
participant DB as Neon Postgres
participant R as Upstash Redis
participant Proc as Processor (BC Payments)
participant BC as BigCommerce API
participant Email as Resend
Cron->>Disp: trigger (every 15m)
Disp->>DB: SELECT charges due
Disp->>Q: enqueue(charge_id × N)
Q->>WF: start workflow(charge_id)
WF->>R: acquire lock charge_id
WF->>DB: load subscription + plan
WF->>BC: GET variant stock
WF->>BC: POST checkout price quote (tax + shipping)
WF->>Proc: charge(amount, pm_ref, idempotency_key)
Proc-->>WF: settled ok (txn_id)
WF->>BC: POST /v2/orders (payment_status=captured)
BC-->>WF: order_id
WF->>DB: charges.status=succeeded, bc_order_id=...
WF->>DB: schedule next charge (anchor + interval)
WF->>Email: send renewal receipt
WF->>R: release lock
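Two steps in the sequence above, the lock around the workflow and the `idempotency_key` on the processor call, are presumably what keeps a retried workflow from charging twice. The following is a synchronous, in-memory simulation of that pairing, not the as-built code: `InMemoryLock`, `FakeProcessor`, `renewCharge`, and the key format are all hypothetical.

```typescript
// Stand-in for the Redis lock in the diagram.
class InMemoryLock {
  private readonly held = new Set<string>();

  acquire(key: string): boolean {
    if (this.held.has(key)) return false;
    this.held.add(key);
    return true;
  }

  release(key: string): void {
    this.held.delete(key);
  }
}

// Fake processor that deduplicates on the idempotency key.
class FakeProcessor {
  private readonly settled = new Map<string, string>();

  charge(amountCents: number, idempotencyKey: string): string {
    const existing = this.settled.get(idempotencyKey);
    if (existing !== undefined) return existing; // replay: no second charge
    const txnId = `txn_${this.settled.size + 1}_${amountCents}`;
    this.settled.set(idempotencyKey, txnId);
    return txnId;
  }

  settledCount(): number {
    return this.settled.size;
  }
}

// Returns the transaction id, or null when another worker holds the lock.
function renewCharge(
  chargeId: string,
  amountCents: number,
  lock: InMemoryLock,
  processor: FakeProcessor,
): string | null {
  if (!lock.acquire(chargeId)) return null;
  try {
    // Deriving the key from the charge id makes every retry reuse the same key.
    return processor.charge(amountCents, `renewal:${chargeId}`);
  } finally {
    lock.release(chargeId); // released even if the processor call throws
  }
}
```

The lock guards against concurrent execution of the same charge, while the idempotency key covers sequential retries after the lock has been released; the sketch keeps both because the diagram shows both.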
13. Appendix B — Sequence: End-to-End Renewal (Approach B)
Historical: decision-time (Phase-2 target, not yet built). This sequence depicts the GCP Approach B (Cloud Scheduler / Pub/Sub / Cloud Workflows / Cloud SQL / Memorystore), which is the ratified Phase-2 migration shape per ADR-0030, not the current runtime. The as-built Phase-1 sequences live in `docs/architecture/sequence-diagrams.md`.
sequenceDiagram
participant Sched as Cloud Scheduler
participant PS as Pub/Sub
participant WF as Cloud Workflows
participant DB as Cloud SQL
participant R as Memorystore
participant Proc as Processor
participant BC as BigCommerce API
participant Email as Email
Sched->>PS: publish(tick)
PS->>WF: trigger dispatcher
WF->>DB: SELECT charges due
loop per charge
WF->>WF: execute charge workflow
WF->>R: acquire lock
WF->>DB: load state
WF->>BC: stock + pricing
WF->>Proc: charge
Proc-->>WF: result
WF->>BC: create order
WF->>DB: finalize
WF->>Email: notify
end
14. Summary Table — Capability to Service (One Page)
Historical: decision-time comparison. This one-page table compares the three candidate stacks (Approach A1 Vercel-native, A2 Vercel+Cloudflare hybrid, B GCP) that fed ADR-0030. The shipped Phase-1 stack is §0's Cloudflare column (closest to A2's Cloudflare primitives); this table is retained for the decision trail, not as a description of what runs today. For the as-built container + binding view, see `docs/architecture/c4-container.md`.
| Capability | Approach A1 | Approach A2 | Approach B |
|---|---|---|---|
| Admin UI | Next.js on Vercel | Next.js on Vercel | Next.js on Cloud Run |
| Subscriber Portal | Next.js on Vercel | Next.js on Vercel | Next.js on Cloud Run |
| Storefront Widget | Vercel Edge static | Vercel Edge static | Cloud CDN + Cloud Storage |
| API + Webhooks | Next.js Routes on Vercel | Next.js Routes on Vercel | Cloud Run services |
| Scheduler | Vercel Cron | Vercel Cron | Cloud Scheduler |
| Job Queue | Vercel Queue | Vercel Queue + CF Queue | Cloud Tasks + Pub/Sub |
| Charge Executor | Vercel Workflow | Cloudflare Durable Object | Cloud Workflows |
| Dunning State | Vercel Workflow | Cloudflare DO | Cloud Workflows |
| Reconciliation | Vercel Cron + Queue | Vercel Cron + Queue | Cloud Scheduler + Cloud Run Job |
| Postgres | Neon | Neon | Cloud SQL |
| Redis | Upstash | Upstash (+ DO SQLite) | Memorystore |
| Object Storage | Vercel Blob | Vercel Blob | Cloud Storage |
| Secrets | Vercel Env + OIDC | Vercel + CF secrets | Secret Manager |
| Email | Resend | Resend | SendGrid / Resend |
| SMS | Twilio | Twilio | Twilio |
| CDN | Vercel native | Vercel + Cloudflare | Cloud CDN |
| Observability | Vercel + Sentry | Vercel + CF + Sentry | Cloud Ops + Sentry |
| Analytics | Neon replica (P3: ClickHouse) | Neon replica | BigQuery |
| IaC | `vercel.ts` | `vercel.ts` + `wrangler.toml` | Terraform |
| CI/CD | Vercel | Vercel + Wrangler | Cloud Build |