Lustr DevOps

Summary

This is the release and reliability system behind Lustr. I set up the dev-to-staging-to-main flow, hotfix backports, branch previews, migration deploys, and guardrails that block frontend promotion when backend migrations fail. It lets us move fast day to day without rolling broken changes into production.

Stack

What it's built with.

CI/CD

GitHub Actions
Vercel Preview Deployments
Docker
Migration Discipline

Security

Secret Management
Row-Level Security
Webhook Signature Verification
Auth Boundary Design

Testing

Playwright (E2E)
Vitest (Unit)
Idempotent Test Seeding
Local Supabase via Docker

Staging & Release

Per-Environment Supabase Projects
Vercel Deployment Checks
Hotfix Auto-Backport
Forward-Only Migrations

Details

How it works.

Release flow

Three tiers: dev/* → stg → main. Feature work branches off stg into a dev/* branch, opens a PR back to stg, and gets merged after Gate 1 passes. Promotion to production happens in a single stg → main PR that runs Gate 2 (the full regression suite) before merging.

There's a fourth lane for emergencies: hotfix/* branches are cut directly off main, target main, and go through the same Gate 2 suite. On merge, a workflow detects the hotfix/ prefix and automatically opens a backport PR (main → stg) with auto-merge enabled, so the fix flows back into stg and ships with the next promote instead of being clobbered the next time stg is merged into main.

main and stg are permanent branches, and the repo is configured never to auto-delete head branches. That setting matters because the backport PR's head branch is main: with auto-delete on, merging the backport would ask GitHub to delete main.
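The branch rules above reduce to a small table of allowed PR directions plus one post-merge decision. The sketch below models them as a hypothetical policy check; the rule table, function names, and return values are illustrative, not Lustr's actual workflow code.

```typescript
// Hypothetical model of the release-flow rules: which PR directions are allowed,
// which gate each one runs, and when a merged PR should trigger a backport.

type Gate = "gate1" | "gate2";

interface PrRule {
  headPattern: RegExp;
  base: string;
  gate: Gate;
}

// Allowed directions: dev/* → stg, stg → main, hotfix/* → main, main → stg (backport).
const RULES: PrRule[] = [
  { headPattern: /^dev\/.+$/, base: "stg", gate: "gate1" },
  { headPattern: /^stg$/, base: "main", gate: "gate2" },
  { headPattern: /^hotfix\/.+$/, base: "main", gate: "gate2" },
  // The automated backport targets stg, so it runs Gate 1 like any other PR into stg.
  { headPattern: /^main$/, base: "stg", gate: "gate1" },
];

/** Returns the gate a PR must pass, or null if the head/base pair is not an allowed direction. */
function gateForPr(head: string, base: string): Gate | null {
  const rule = RULES.find((r) => r.base === base && r.headPattern.test(head));
  return rule ? rule.gate : null;
}

/** After a merge into main, a hotfix/* head means a main → stg backport PR should be opened. */
function needsBackport(mergedHead: string, base: string): boolean {
  return base === "main" && /^hotfix\/.+$/.test(mergedHead);
}
```

In a workflow triggered on `pull_request` with type `closed`, the inputs would come from `github.event.pull_request.head.ref` and `github.event.pull_request.base.ref`, with `github.event.pull_request.merged` distinguishing a merge from a plain close.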

Two CI gates on purpose

Gate 1 runs on every PR into stg (`ci-pr-stg.yml`). It runs Vitest plus a smoke Playwright suite covering the six flows that hurt most when they regress: auth, the booking wizard, cart, checkout, membership, and the dashboard. Time budget: 5 min for Vitest and 15 min for the smoke suite. Before Playwright runs, the runner spins up a full local Supabase stack via the Supabase CLI, wrapped in a retry loop because `supabase start` occasionally flakes on cold runners.
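The retry loop around `supabase start` follows a standard retry-with-backoff pattern. A minimal sketch, assuming the flaky step is wrapped in an async function (for example one that shells out to the CLI); the attempt count and delays are illustrative, and the real loop lives in the workflow file.

```typescript
// Generic retry with linear backoff: run `op` up to `maxAttempts` times,
// waiting a little longer after each failure, and rethrow the last error.

async function retry<T>(
  op: (attempt: number) => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Back off before the next attempt (baseDelayMs, 2×, 3×, ...).
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  throw lastError;
}
```

Rethrowing the last error (rather than returning a sentinel) keeps the CI step red with the real failure message when every attempt fails.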

Gate 2 runs on every PR into main (`ci-pr-main.yml`): the same Vitest pass, then the full Playwright suite (35 spec files) across two browser projects, Chromium desktop for everything and Pixel-7 mobile-Chromium for the mobile-rendering specs. Budget: 5 min for Vitest plus 30 min for Playwright.

Gate 2 deliberately does not re-run on the merge commit. Provided the PR branch is up to date with main when it merges, the merge commit contains the same code that already passed Gate 2 on the PR head, so another half-hour of tests would only delay production for zero new information. The deploy workflow handles the merge commit instead.
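Put together, the two gates and the two deploy workflows partition GitHub events cleanly. The routing function below is a hypothetical model of that mapping (in practice each YAML file declares its own `on:` trigger); only the workflow file names come from this writeup.

```typescript
// Which workflow runs for which GitHub event. PRs run gates; pushes
// (including merge commits) only deploy, so Gate 2 is never repeated on main.

type GitHubEvent =
  | { kind: "pull_request"; base: string }
  | { kind: "push"; branch: string };

function workflowsFor(event: GitHubEvent): string[] {
  if (event.kind === "pull_request") {
    if (event.base === "stg") return ["ci-pr-stg.yml"]; // Gate 1
    if (event.base === "main") return ["ci-pr-main.yml"]; // Gate 2
    return [];
  }
  if (event.branch === "stg") return ["deploy-stg.yml"];
  if (event.branch === "main") return ["deploy-prod.yml"];
  return []; // e.g. pushes to dev/* only produce Vercel previews
}
```

The YAML equivalent is `on: pull_request: branches: [main]` for the gate (the `branches` filter matches the PR's base branch) and `on: push: branches: [main]` for the deploy.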

Preview environments that are actually useful

Vercel auto-builds a preview deployment for every push to dev/* and stg. Previews sit behind Vercel team auth, so reviewers (and I) get a real URL to click through that isn't publicly reachable. Preview deployments talk to a separate, persistent Supabase project (sandbox Stripe, separate auth users, separate data) that exists explicitly to be the staging backend. Production builds from main talk to the production Supabase project and live Stripe.

Environment binding is handled by Vercel's environment scopes (Preview vs Production), not by runtime branch detection. `VITE_SUPABASE_URL`, `VITE_SUPABASE_ANON_KEY`, and the publishable Stripe key are each defined only in the scope for their environment, so a preview build has no path to the production database.
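Scope separation can also be checked mechanically. The sketch below is a hypothetical CI sanity check, not something this writeup says exists: given the variables configured for each scope (how they are fetched is left out), it reports any preview variable whose value also appears in production. The sample values in the test are placeholders.

```typescript
// Hypothetical scope-separation check: flag preview variables whose value
// matches any production value (catches a prod URL pasted into Preview).

type EnvScope = Record<string, string>;

/** Returns the names of preview variables that reuse a production value. */
function leakedValues(preview: EnvScope, production: EnvScope): string[] {
  const prodValues = new Set(Object.values(production));
  return Object.entries(preview)
    .filter(([, value]) => prodValues.has(value))
    .map(([name]) => name);
}
```

This is best run only over the backend-binding variables (Supabase URL, anon key, Stripe key), since harmless settings such as feature flags can legitimately match across scopes.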

One thing I deliberately don't do: ephemeral per-PR databases. The staging backend is shared and persistent. The trade-off is real: shared state means one PR can pollute data another PR is reading, but per-PR provisioning would have been a lot of plumbing for a team this size. The Playwright global setup mitigates this by idempotently seeding deterministic test users before any spec runs.

Local Docker before push

The local loop is `supabase start`, which brings up Postgres 17, GoTrue (auth), PostgREST, the Edge Function runtime, Realtime, Studio, and Inbucket (mail capture) on fixed ports (API 55321, DB 55322, Studio 55323). The frontend runs against `http://127.0.0.1:55321`, `npm run test:e2e` runs the Playwright suite against that local stack, and Stripe webhooks can be forwarded with `stripe listen --forward-to ...` to exercise full test-card flows end-to-end without leaving the laptop.

This is the same stack CI uses. The Gate 1 and Gate 2 runners both call `supabase start` themselves and run tests against the local instance, so a passing local Playwright run is a strong signal the CI run will pass too. The handbook in `docs/development/` walks through the bootstrap (config.toml, env files, function secrets) in one place so new contributors don't have to reverse-engineer it.
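Because the E2E setup seeds users through the service role, it is worth refusing to run against anything but the local stack. The guard below is a hypothetical illustration, not part of the handbook; only the host and API port (`127.0.0.1:55321`) come from the text.

```typescript
// Hypothetical guard: refuse to seed or run E2E tests unless the Supabase URL
// points at the local Docker stack on its fixed API port.

const LOCAL_API = { host: "127.0.0.1", port: "55321" };

function assertLocalSupabase(url: string): void {
  const parsed = new URL(url);
  const isLocalHost = parsed.hostname === LOCAL_API.host || parsed.hostname === "localhost";
  // A remote https URL has an empty `port`, so it fails the port check too.
  if (!isLocalHost || parsed.port !== LOCAL_API.port) {
    throw new Error(`Refusing to run E2E against non-local Supabase: ${url}`);
  }
}
```

Calling this at the top of a global setup means a misconfigured env file fails loudly before any service-role write happens.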

Merge-to-deploy path

Push to stg fires `deploy-stg.yml`. The job links the Supabase CLI to the staging project using `SUPABASE_STG_PROJECT_REF` and `SUPABASE_STG_DB_PASSWORD`, runs `supabase db push` to apply any new migrations from `supabase/migrations/`, then runs `supabase functions deploy` to push every Edge Function. Per-function `verify_jwt` settings are read from `supabase/config.toml`, so JWT enforcement is pinned in version control rather than being an editable runtime knob.

Push to main fires `deploy-prod.yml`, which does the same thing against the production project, then posts a commit status named `prod-migrations` (success or failure). Vercel's Deployment Check is wired to wait on that status before promoting the new build to the production URL. If the migration job fails, Vercel holds the deploy and the old frontend stays live, which is the right default: shipping a UI against a half-migrated schema is much worse than waiting an hour to fix a migration.
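The `prod-migrations` signal is an ordinary GitHub commit status. A minimal sketch using GitHub's REST endpoint `POST /repos/{owner}/{repo}/statuses/{sha}`; owner, repo, and token handling are assumptions, and the real workflow may post it with `gh api` or an Action instead of raw `fetch`.

```typescript
// Build and send the commit status that Vercel's Deployment Check waits on.

type StatusState = "success" | "failure";

interface StatusRequest {
  url: string;
  body: { state: StatusState; context: string; description: string };
}

/** Builds the request that records the migration outcome on the merge commit. */
function buildMigrationStatus(
  owner: string,
  repo: string,
  sha: string,
  migrationsSucceeded: boolean,
): StatusRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/statuses/${sha}`,
    body: {
      state: migrationsSucceeded ? "success" : "failure",
      context: "prod-migrations", // must match the status name the Deployment Check expects
      description: migrationsSucceeded
        ? "Production migrations applied"
        : "Production migrations failed; frontend promotion blocked",
    },
  };
}

/** Sends the status; the token needs permission to write commit statuses. */
async function postStatus(req: StatusRequest, token: string): Promise<void> {
  const res = await fetch(req.url, {
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "X-GitHub-Api-Version": "2022-11-28",
    },
    body: JSON.stringify(req.body),
  });
  if (!res.ok) throw new Error(`Status post failed: ${res.status}`);
}
```

One design detail: the status step has to run even when the migration step fails (e.g. `if: always()` in Actions, with `statuses: write` permission on the token). Otherwise no `failure` is ever posted and the check simply stays unresolved.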

There's no manual approval gate. The Gate 2 PR review and the migration status check are the gates.

Database migrations

All schema changes live as timestamped SQL files in `supabase/migrations/` (24 files at present, including a 91-policy RLS baseline that locks down every user-facing table). The same files apply to staging on push to stg and to production on push to main, in lexicographic (timestamp) order, via the same `supabase db push` command. The CLI records applied migrations in the database, so re-runs skip anything already applied.
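The "apply in order, skip what's already applied" behaviour can be modelled in a few lines. This sketch assumes filenames of the form `<14-digit timestamp>_<name>.sql` and a set of versions already recorded as applied; it mirrors the behaviour described for `supabase db push` and is not the CLI's own code.

```typescript
// Decide which migrations still need to run, in filename (timestamp) order.

function versionOf(file: string): string {
  const match = /^(\d{14})_.+\.sql$/.exec(file);
  if (!match) throw new Error(`Unexpected migration filename: ${file}`);
  return match[1];
}

/** Returns files still to apply; fixed-width timestamps make lexicographic order chronological. */
function pendingMigrations(files: string[], appliedVersions: Set<string>): string[] {
  return [...files].sort().filter((file) => !appliedVersions.has(versionOf(file)));
}
```

Running this twice against the same applied set returns an empty list the second time, which is what makes the staging and production deploys safe to re-run.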

Convention: forward-only. Rollbacks are new migrations that revert the intent, not destructive `down` files. In a true emergency the Supabase dashboard can run SQL directly, but that's the break-glass option, and the change is captured in a migration immediately afterward so the file history stays the source of truth.
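A convention is easier to keep when CI enforces it. The lint below is hypothetical (the writeup describes forward-only as a convention, not a check): it rejects files that look like `down` migrations and any new file whose timestamp is not later than the newest existing one, since that file would sort before migrations environments have already applied. It assumes the same 14-digit timestamp prefix as above.

```typescript
// Hypothetical forward-only lint over the migrations folder.

function forwardOnlyViolations(existing: string[], added: string[]): string[] {
  const versions = existing.map((f) => f.slice(0, 14)).sort();
  const latest = versions.length > 0 ? versions[versions.length - 1] : "";
  const problems: string[] = [];
  for (const file of added) {
    // Matches e.g. `..._down.sql` or `....down.sql`, but not `..._download.sql`.
    if (/[._]down\.sql$/i.test(file)) {
      problems.push(`${file}: down files are not allowed; write a new forward migration`);
    }
    if (file.slice(0, 14) <= latest) {
      problems.push(`${file}: timestamp is not newer than ${latest}`);
    }
  }
  return problems;
}
```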

Production migrations are gated by the Vercel Deployment Check described above. If the migration step fails, the frontend doesn't promote; if it succeeds, the frontend promotes against the new schema. Either way the two never get out of sync in production.

Test infrastructure and seeding

35 Playwright spec files in `web/e2e/` covering auth, the full booking wizard, cart, checkout, membership credit checkout, scheduling, every admin dashboard surface, security boundary checks, and mobile rendering. 23 Vitest unit files in `web/src/__tests__/` and `web/src/services/__tests__/` covering services, schemas, and dashboard helpers.

The Playwright global setup is the part that pays off most. Before any spec runs, it idempotently provisions two admin accounts and one test customer via the Supabase service role: it creates them if they don't exist, and asserts and corrects their flags (TOS acceptance, email verification, is_admin) if they do. The test customer gets a deterministic vehicle, address, an active 60-day membership (the 60-day window deliberately avoids month-end edge cases that bit earlier calendar-month logic), and a Pit Stop pricing override so assertions don't drift with prod price changes. Seed IDs are written to `e2e/helpers/.seed-ids.json` for specs to import.
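The "create if missing, otherwise converge" step is what makes the seeding safe to run before every suite. The sketch below uses a hypothetical in-memory `UserStore` in place of the service-role client, and the field names are illustrative; writing the seed-ID file is left out.

```typescript
// Idempotent seeding: repeated runs converge on the same state instead of duplicating rows.

interface SeedUser {
  email: string;
  isAdmin: boolean;
  tosAccepted: boolean;
  emailVerified: boolean;
}

interface UserStore {
  findByEmail(email: string): SeedUser | undefined;
  insert(user: SeedUser): void;
  update(email: string, patch: Partial<SeedUser>): void;
}

/** Creates the user if missing; otherwise corrects any flag that has drifted. */
function ensureUser(store: UserStore, wanted: SeedUser): "created" | "updated" | "unchanged" {
  const existing = store.findByEmail(wanted.email);
  if (!existing) {
    store.insert({ ...wanted });
    return "created";
  }
  const patch: Partial<SeedUser> = {};
  for (const key of ["isAdmin", "tosAccepted", "emailVerified"] as const) {
    if (existing[key] !== wanted[key]) patch[key] = wanted[key];
  }
  if (Object.keys(patch).length === 0) return "unchanged";
  store.update(wanted.email, patch);
  return "updated";
}

/** A membership window measured in days, so month lengths never enter the calculation. */
function membershipWindow(now: Date, days = 60): { start: Date; end: Date } {
  return { start: now, end: new Date(now.getTime() + days * 24 * 60 * 60 * 1000) };
}
```

Starting the window on January 31 is exactly the case that trips "add one calendar month" logic; adding a fixed number of days sidesteps it.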

Playwright config: `retries: 2` in CI, `0` locally, single worker, Chromium desktop plus a Pixel-7 mobile-Chromium project for `mobile/` specs. Allowing two retries in CI acknowledges the flake risk on a stack that includes async Edge Function startup; a test that fails all three attempts surfaces as a real failure and blocks the PR.
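With `retries: 2`, each test gets up to three attempts, and Playwright reports a test that passes only on a retry as "flaky" rather than failing the run. The function below is an illustrative model of that classification, not Playwright's code; `maxRetries` stands in for the `retries` config value.

```typescript
// Classify one test's attempt history the way Playwright's report does.

type Outcome = "passed" | "flaky" | "failed";

/** `attempts` holds pass/fail for each run of a single test, in order. */
function outcome(attempts: boolean[], maxRetries: number): Outcome {
  const allowed = attempts.slice(0, maxRetries + 1); // first run plus retries
  const firstPass = allowed.indexOf(true);
  if (firstPass === -1) return "failed";
  return firstPass === 0 ? "passed" : "flaky";
}
```

Locally (`retries: 0`) the same fail-then-pass history counts as a failure, which keeps flakiness visible on the laptop even though CI tolerates it.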

Security posture

Secrets are scoped per environment in Vercel: production keys never appear in preview, preview keys never appear in production. Edge Function secrets live in Supabase runtime, not the repo, and CI uses obvious placeholder Stripe keys (`sk_test_placeholder_*`) so a leak from a CI log is immediately recognisable as not-a-real-secret. The committed `.env.example` and `.env.test` only contain local-Docker public values.
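The placeholder convention also makes leaks easy to scan for. The helper below is hypothetical: it picks out anything shaped like a Stripe secret or restricted key (`sk_`/`rk_` with `live` or `test`, Stripe's documented prefixes) and ignores the `sk_test_placeholder_*` values CI is expected to use. The exact placeholder suffix format is an assumption.

```typescript
// Hypothetical log scan: report Stripe-shaped secret keys that are not CI placeholders.

const SECRET_KEY = /\b(?:sk|rk)_(?:live|test)_[A-Za-z0-9_]+/g;
const PLACEHOLDER_PREFIX = "sk_test_placeholder_";

function suspiciousKeys(log: string): string[] {
  return (log.match(SECRET_KEY) ?? []).filter((key) => !key.startsWith(PLACEHOLDER_PREFIX));
}
```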

Stripe webhooks are signature-verified via `stripe.webhooks.constructEventAsync` against `STRIPE_WEBHOOK_SECRET`, using the raw body (not parsed JSON) and the `stripe-signature` header. Missing signature → 400. Verification failure → 400. The webhook handler also pulls FK values for appointment metadata from the database rather than from webhook metadata: a valid signature proves Stripe sent the event, but the metadata values were set by the client at checkout creation, so they can still have been tampered with.
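To make the verification step concrete, here is a hand-written illustration of the scheme `constructEventAsync` checks, using `node:crypto`: the header carries `t=<timestamp>,v1=<hex>`, and the signature is HMAC-SHA256 over `${t}.${rawBody}` keyed with the endpoint secret, with a timestamp tolerance (300 s here, matching the SDK default). Production code should keep using the Stripe SDK; this sketch only shows why the raw body matters.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300;

function computeSignature(secret: string, timestamp: number, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`, "utf8").digest("hex");
}

/** Returns the HTTP status a handler would send for this request. */
function verifyWebhook(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
  nowSeconds: number,
): 200 | 400 {
  if (!signatureHeader) return 400; // missing signature

  const parts = signatureHeader.split(",").map((p) => p.split("="));
  const timestamp = Number(parts.find(([k]) => k === "t")?.[1]);
  const candidates = parts.filter(([k, v]) => k === "v1" && v !== undefined).map(([, v]) => v);
  if (!Number.isFinite(timestamp) || candidates.length === 0) return 400;
  if (Math.abs(nowSeconds - timestamp) > TOLERANCE_SECONDS) return 400; // stale: possible replay

  const expected = Buffer.from(computeSignature(secret, timestamp, rawBody), "hex");
  const ok = candidates.some((sig) => {
    const given = Buffer.from(sig, "hex");
    // Constant-time comparison; lengths must match before timingSafeEqual.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
  return ok ? 200 : 400;
}
```

Parsing the body as JSON and re-serialising it can change whitespace or key order, which changes the HMAC input; that is why verification has to run on the raw bytes.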

Admin Edge Functions gate every call on an `is_admin` lookup against the caller's row in `profiles` before doing anything destructive. RLS is the second wall behind that: the 91-policy baseline restricts admin-only tables to admins regardless of which function reaches them.
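The admin gate is a fail-closed check that runs before any destructive work. In this sketch `lookupIsAdmin` is a hypothetical stand-in for the `profiles` query (with supabase-js that would typically be `from("profiles").select("is_admin").eq("id", callerId).maybeSingle()`), and the caller id is assumed to come from the verified JWT.

```typescript
// Fail-closed admin gate for an Edge Function.

type IsAdminLookup = (userId: string) => Promise<boolean | null>; // null = no profile row

async function requireAdmin(
  callerId: string | null,
  lookupIsAdmin: IsAdminLookup,
): Promise<{ status: 200 } | { status: 401 | 403; error: string }> {
  if (!callerId) return { status: 401, error: "not authenticated" };
  const isAdmin = await lookupIsAdmin(callerId);
  // A missing profile row is treated the same as a non-admin.
  if (isAdmin !== true) return { status: 403, error: "admin only" };
  return { status: 200 };
}
```

Checking `isAdmin !== true` rather than `isAdmin === false` is the fail-closed choice: a null, undefined, or unexpected value denies access instead of slipping through.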

Highlights

The things I'm proudest of.

▹Three-tier branching (dev/* → stg → main) with a hotfix/* lane that bypasses stg in emergencies and auto-opens a backport PR with auto-merge enabled, so a hotfix can never silently regress on the next stg → main promote.
▹Two PR gates. Gate 1 (PR into stg): Vitest plus a smoke Playwright suite covering the six critical flows (auth, booking wizard, cart, checkout, membership, dashboard) against a fresh local Supabase stack spun up via `supabase start` in the runner. Gate 2 (PR into main): the same Vitest pass plus the full Playwright suite (35 specs across desktop Chromium and Pixel-7 mobile-Chromium projects) against the same fresh local stack. Gate 2 deliberately does not re-run on the merge commit so we don't burn ~25 min of CI re-validating identical code.
▹Vercel previews per branch. dev/* and stg builds deploy to per-branch Vercel preview URLs gated behind Vercel team auth, scoped to the staging Supabase project and Stripe sandbox. Production builds (main) target the production Supabase project and live Stripe. Env scoping is handled by Vercel's preview vs production environment buckets, so reviewers click through real flows on the real code rather than a mock.
▹Continuous delivery on merge. `deploy-stg.yml` fires on push to stg: links the Supabase CLI to the staging project, runs `supabase db push` against the migrations folder, deploys all Edge Functions (per-function `verify_jwt` settings come from `supabase/config.toml`). `deploy-prod.yml` does the same on push to main against the production project, and posts a `prod-migrations` commit status. Vercel's Deployment Check waits on that status before promoting the new frontend, so a failed production migration leaves the old frontend live instead of shipping a UI against a half-migrated schema.
▹Migrations as the single source of truth. 24 timestamped SQL files in `supabase/migrations/`, including a 91-policy RLS baseline. The same migrations run against staging on push to stg and production on push to main, in order and idempotently (the Supabase CLI skips anything already applied). Forward-only by convention; rollbacks are new migrations, not destructive `down` files.
▹Local Docker loop before pushing. `supabase start` brings up Postgres 17, GoTrue, PostgREST, the Edge Function runtime, Realtime, Studio, and Inbucket on fixed ports. Devs point the frontend at `http://127.0.0.1:55321`, run `npm run test:e2e`, and reproduce the exact stack CI uses. Stripe webhook forwarding via `stripe listen --forward-to ...` makes real test-card flows runnable end-to-end locally before anything hits a remote.
▹Test infrastructure. 35 Playwright spec files (auth, booking wizard, cart, checkout, membership, scheduling, admin dashboards, security, mobile rendering) plus 23 Vitest unit files (services, schemas, dashboard helpers). Playwright runs Chromium desktop plus a Pixel-7 mobile project for `mobile/` specs, with a global setup that idempotently seeds two admin accounts and one test customer (vehicle, address, active membership, deterministic Pit Stop pricing override) via the service role before any spec executes. Retries 2 in CI / 0 locally, single worker, 30 min suite cap.
▹Security posture. Secrets are scoped per Vercel environment (no production keys in preview, no preview keys in production); Edge Function secrets land in Supabase runtime, not the repo; CI uses placeholder Stripe keys (`sk_test_placeholder_*`). Stripe webhooks verify signatures via `stripe.webhooks.constructEventAsync`, drop missing-signature requests with 400, and source FK values from the database rather than untrusted webhook metadata. Admin Edge Functions gate every call on an `is_admin` lookup against `profiles` before doing anything destructive, with RLS as the second wall behind that.