
CI/CD Pipeline & Production Infrastructure

The Ngwenya federation platform uses GitHub Actions for continuous integration and deployment. This document covers the pipeline architecture, runner hosting strategy, database backup infrastructure, and CORS production lockdown.


Pipeline Architecture

The platform has two repositories, each with its own CI pipeline:

Backend: `ngwenya-federation`

PR Quality Gate (`ci.yaml`)

Runs on every pull request targeting main. Three parallel jobs:

| Job | Purpose | Est. Time |
| --- | --- | --- |
| Lint & Types | pnpm lint + pnpm build (NestJS compile check) | ~3 min |
| Unit Tests | make test with MongoDB/Postgres/Redis/Meilisearch service containers | ~5 min |
| Security | vuln-regex-detector ReDoS scan | ~1 min |

The unit test job runs with full infrastructure services (MongoDB, Postgres, Redis, Meilisearch) via GitHub Actions service containers.

Merge Pipeline (`e2e.yaml`)

Runs on push to main (after PR merge). Executes the full E2E test suite with the same service containers:

make test-e2e  # Runs all NestJS subgraph E2E tests with --forceExit

Note: Rust services (auth, uchat, intelligence, tower) are not yet compiled in CI, so the E2E suite covers the NestJS subgraphs only. Rust cross-compilation will be added in a future iteration.

Frontend: `ngwenya-front`

PR Quality Gate (`ci.yaml`)

Runs on every pull request targeting main. Five jobs (E2E depends on build):

| Job | Purpose | Est. Time |
| --- | --- | --- |
| Lint | ESLint + Prettier format check | ~1 min |
| Type Check | svelte-kit sync + svelte-check | ~2 min |
| Build | vite build (validates production bundle) | ~3 min |
| Unit Tests | Vitest unit test suite | ~1 min |
| E2E Tests | 35 Playwright tests against dev server (Chromium) | ~5 min |

The E2E job installs Playwright Chromium and runs against the Vite dev server on port 4173. Tests use the established mockAuth pattern for backend isolation (see E2E Testing Infrastructure).

Documentation Portals: `ngwenya-dev` & `ngwenya-support`

Both Astro-powered portals have a build-check CI (ci.yaml) that runs astro build on every PR. This catches broken markdown, invalid frontmatter, remark plugin errors, and missing imports before merge.

Concurrency Controls

  • Backend PR: Cancels in-progress runs on the same PR (cancel-in-progress: true)
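  • Backend E2E: Serializes runs on main (cancel-in-progress: false) to prevent race conditions. GitHub keeps at most one pending run per concurrency group, so a newer push replaces an older run that is still waiting.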
  • Frontend PR: Cancels in-progress runs on the same PR
  • Doc Portals: Cancel in-progress runs on the same PR

Staging Environment

The platform includes a self-contained staging environment mirroring the full production topology across all 5 repositories: ngwenya-federation, ngwenya-front, ngwenya-dev, ngwenya-support, and mall-design.

Staging Topology

The staging environment is orchestrated via docker-compose-staging.yaml in the backend repository and includes 34 containers:

  • 19 NestJS Services: Containerized using multi-stage production builds.
  • 4 Rust Services: Containerized release binaries.
  • Frontend: SvelteKit containerized via @sveltejs/adapter-node.
  • Doc Portals: Both ngwenya-dev and ngwenya-support compiled to static HTML and served via nginx:alpine.
  • Infrastructure: Postgres, Redis, MongoDB, Meilisearch, MinIO, and Matrix Conduit.
  • Observability: GlitchTip error tracking, Prometheus time-series metrics, Grafana dashboards, and Tower health monitoring.

Mobile App Integration

The native mobile apps (ngwenya-front/mobile) are not containerized but include build-time configurations to target the staging gateway:

  • iOS (Swift): Uses an #if STAGING conditional in NgwenyaViewModel.swift.
  • Android (Kotlin): Uses a custom staging build flavor exposing BuildConfig.GATEWAY_URL.

Operational Commands

The staging lifecycle is managed via Makefile targets in ngwenya-federation:

  • make staging-up: Boots the full 34-container stack (~3.5 GB RAM).
  • make staging-up-lite: Boots the critical path only (~2.0 GB RAM).
  • make staging-health: Runs scripts/staging-healthcheck.sh to poll all services.
  • make staging-test: Runs the E2E suite against the staging gateway.
  • make staging-seed: Populates staging databases with test data.

Runner Hosting Strategy

Decision: GitHub-Hosted Runners (`ubuntu-latest`)

After evaluating GitHub's 2026 pricing changes, the platform uses GitHub-hosted runners for all CI/CD workflows.

Cost Analysis (as of January 2026)

| Factor | GitHub-Hosted | Self-Hosted |
| --- | --- | --- |
| Setup | Zero (instant) | Provision VM, install agent, maintain |
| Linux cost | $0.008/min (39% reduction) | $0.002/min platform charge + infra |
| Free tier | 2,000 min/month (Free), 3,000 (Pro) | Same minutes consumed |
| Maintenance | GitHub handles everything | Team handles OS, deps, patches |
| Scale | Auto-scales with demand | Manual capacity planning |
| Cold starts | ~15-30s VM spin-up per job | Persistent = faster starts |

Why GitHub-Hosted (For Now)

  1. Zero operational overhead: no servers to patch, monitor, or scale
  2. Free tier sufficient: at current commit frequency, monthly CI usage stays within the included 2,000-3,000 minutes
  3. Predictable environments: eliminates "works on my machine" issues in CI
  4. No specialized hardware needs: NestJS + Jest don't require GPU or ARM

When to Re-evaluate

Switch to self-hosted runners when any of these conditions are met:

  • Monthly CI minutes consistently exceed the free tier for 2+ months
  • CI jobs require specialized hardware (GPU, ARM Mac, high-memory)
  • Build times need to drop below what cold-start VMs allow
  • Security policy requires code to stay on private infrastructure

Action item: Review CI minute consumption after 3 months of pipeline data. The usage dashboard is at Settings → Billing → Actions in the GitHub repository.
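To make the free-tier check concrete, here is a rough estimate built from the job durations listed in the PR Quality Gate tables. It is only a sketch: the run counts are made-up inputs, the function names are illustrative, and the merge pipeline, doc portal builds, and backup workflow are left out because this document gives no durations for them. GitHub bills each job separately, so parallel jobs are summed rather than overlapped.

```typescript
// Estimated job durations in minutes, taken from the pipeline tables in this document.
const BACKEND_PR_JOBS = [3, 5, 1]; // Lint & Types, Unit Tests, Security
const FRONTEND_PR_JOBS = [1, 2, 3, 1, 5]; // Lint, Type Check, Build, Unit Tests, E2E

const sum = (values: number[]): number => values.reduce((a, b) => a + b, 0);

interface MonthlyActivity {
  backendPrRuns: number; // hypothetical input: backend PR workflow runs per month
  frontendPrRuns: number; // hypothetical input: frontend PR workflow runs per month
}

function estimateMonthlyMinutes(activity: MonthlyActivity): number {
  // Parallel jobs shorten wall-clock time but are billed per job, so durations add up.
  return (
    activity.backendPrRuns * sum(BACKEND_PR_JOBS) +
    activity.frontendPrRuns * sum(FRONTEND_PR_JOBS)
  );
}

function exceedsFreeTier(minutes: number, includedMinutes: number): boolean {
  return minutes > includedMinutes;
}

// Example with made-up numbers: 60 backend and 60 frontend PR runs in a month.
const estimate = estimateMonthlyMinutes({ backendPrRuns: 60, frontendPrRuns: 60 });
console.log(estimate, exceedsFreeTier(estimate, 2000)); // prints "1260 false"
```

At roughly 9 billed minutes per backend PR run and 12 per frontend PR run, the 2,000-minute tier covers about 95 runs of each per month before the excluded workflows are counted.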


Database Backup Strategy

The platform provides automated backup/restore for all databases via Makefile targets.

Quick Start

make backup-db           # Full backup (MongoDB + Postgres)
make backup-db-mongo     # MongoDB only
make backup-db-postgres  # Postgres only
make restore-db          # Restore from latest backup
make restore-db BACKUP=2026-05-07T15-20  # Restore specific backup

Storage Backends

Backups support pluggable storage backends via environment variables:

| Backend | BACKUP_STORAGE | Required Config | Use Case |
| --- | --- | --- | --- |
| Local | local (default) | BACKUP_LOCAL_DIR | Development, quick snapshots |
| S3 | s3 | BACKUP_S3_BUCKET, AWS credentials | AWS production backups |
| R2 | r2 | BACKUP_S3_BUCKET, BACKUP_S3_ENDPOINT | Cloudflare R2 offsite backups |

Adding a new storage backend requires adding an upload_*() function in scripts/backup-databases.sh and a case in the upload switch.
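The backend table amounts to a small lookup from BACKUP_STORAGE to the variables that must be set. The sketch below expresses that rule in TypeScript for illustration only: the real logic lives in scripts/backup-databases.sh, the function name is hypothetical, and the AWS credential variables for s3 are omitted because the table does not name them.

```typescript
type BackupEnv = Record<string, string | undefined>;

// Required variables per backend, as listed in the storage backend table.
// The s3 row also needs AWS credentials, which are not named there and so are not checked.
const REQUIRED_VARS: Record<string, readonly string[]> = {
  local: ["BACKUP_LOCAL_DIR"],
  s3: ["BACKUP_S3_BUCKET"],
  r2: ["BACKUP_S3_BUCKET", "BACKUP_S3_ENDPOINT"],
};

// Returns the names of required variables that are unset or empty.
function missingBackupConfig(env: BackupEnv): string[] {
  const backend = env["BACKUP_STORAGE"] ?? "local"; // local is the documented default
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_VARS, backend)) {
    throw new Error(`Unknown BACKUP_STORAGE value: ${backend}`);
  }
  return REQUIRED_VARS[backend].filter((name) => !env[name]);
}

// Example: R2 selected but the endpoint was forgotten.
console.log(missingBackupConfig({ BACKUP_STORAGE: "r2", BACKUP_S3_BUCKET: "backups" }));
```

A new backend would then be one more entry in the lookup, mirroring the extra upload_*() function and switch case on the shell side.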

Automated Schedule

Backups run automatically via GitHub Actions (backup.yaml):

| Setting | Value |
| --- | --- |
| Schedule | Daily at 3:00 AM UTC (0 3 * * *) |
| Manual trigger | workflow_dispatch (trigger on demand from the Actions tab) |
| Cloud storage | Configure via repository secrets: BACKUP_S3_BUCKET, BACKUP_AWS_ACCESS_KEY_ID, BACKUP_AWS_SECRET_ACCESS_KEY |
| Local fallback | If no cloud storage configured, backups are uploaded as GitHub Actions artifacts (30-day retention) |

To change the schedule, edit the cron expression in .github/workflows/backup.yaml.
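When editing the expression, it helps to remember that the five cron fields are minute, hour, day of month, month, and day of week, and that GitHub Actions evaluates them in UTC. The helper below is a hypothetical sanity check, not part of the repository: it handles only the simple daily form used here (fixed minute and hour, with `* * *` for the rest) and returns the next scheduled time after a given instant.

```typescript
// Computes the next run of a daily "M H * * *" cron expression, evaluated in UTC.
// Ranges, steps, and lists are deliberately not supported in this sketch.
function nextDailyRunUtc(expression: string, now: Date): Date {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5 || parts.slice(2).join(" ") !== "* * *") {
    throw new Error("Only daily expressions of the form 'M H * * *' are supported");
  }
  const minute = Number(parts[0]);
  const hour = Number(parts[1]);
  const isWholeNumber = (n: number): boolean => Math.floor(n) === n; // false for NaN
  if (!isWholeNumber(minute) || minute < 0 || minute > 59) {
    throw new Error("Minute must be a fixed integer between 0 and 59");
  }
  if (!isWholeNumber(hour) || hour < 0 || hour > 23) {
    throw new Error("Hour must be a fixed integer between 0 and 23");
  }
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hour, minute),
  );
  // If today's slot has already passed (or is exactly now), roll over to tomorrow.
  if (next.getTime() <= now.getTime()) {
    next.setUTCDate(next.getUTCDate() + 1);
  }
  return next;
}

// Example: the documented schedule, checked from mid-afternoon UTC.
const nextRun = nextDailyRunUtc("0 3 * * *", new Date("2026-05-07T15:20:00Z"));
console.log(nextRun.toISOString()); // prints "2026-05-08T03:00:00.000Z"
```

Scheduled workflows on GitHub can start later than the nominal time during periods of high load, so the computed instant is best read as the earliest possible start.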

What Gets Backed Up

| Database | Engine | Data |
| --- | --- | --- |
| ngwenya | MongoDB | All collections (malets, products, orders, blogs, etc.) |
| ngwenya_auth | Postgres | Users, sessions, OAuth tokens, passkeys |
| ngwenya_uchat | Postgres | E2EE messages, conversations, participants |
| ngwenya_scim | Postgres | SCIM provisioning tokens, IdP configs |

Retention

  • Local: Auto-prunes backups older than BACKUP_RETENTION_DAYS (default: 7 days)
  • Cloud: Managed by the S3/R2 bucket's lifecycle policy
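As an illustration of the local pruning rule, the sketch below selects backups older than the retention window. It assumes backup names follow the `2026-05-07T15-20` pattern shown in the restore example and that those timestamps are UTC; both the function names and the UTC assumption are hypothetical, and the actual pruning is done by the backup script rather than by this code.

```typescript
// Parses a backup name such as "2026-05-07T15-20" into a Date.
// Treating the timestamp as UTC is an assumption of this sketch.
function parseBackupName(name: string): Date | null {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2})-(\d{2})$/.exec(name);
  if (m === null) {
    return null;
  }
  return new Date(
    Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]), Number(m[4]), Number(m[5])),
  );
}

// Returns the backup names that are strictly older than the retention window.
function selectExpired(names: string[], now: Date, retentionDays: number): string[] {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return names.filter((name) => {
    const takenAt = parseBackupName(name);
    // Names that do not match the pattern are never selected for pruning.
    return takenAt !== null && takenAt.getTime() < cutoff;
  });
}

// Example: with the default 7-day retention, only the 1 May backup is expired on 10 May.
const expired = selectExpired(
  ["2026-05-07T15-20", "2026-05-01T03-00", "notes.txt"],
  new Date("2026-05-10T00:00:00Z"),
  7,
);
console.log(expired); // prints [ '2026-05-01T03-00' ]
```

Leaving unrecognized names untouched is a deliberately conservative choice, so that stray files in the backup directory are not deleted by accident.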

See environment-variables.md for all backup-related env vars.


CORS Production Lockdown

CORS origins are environment-aware, controlled by NODE_ENV:

Development Mode (`NODE_ENV != 'production'`)

The following localhost origins are allowed alongside the production origins:

  • http://localhost:5173 (Vite dev server)
  • http://localhost:4321 (Astro docs portals)
  • http://localhost:3000 (alt dev port)

Production Mode (`NODE_ENV=production`)

Only the Mallnline apex domain and its subdomains are allowed:

  • https://mallnline.com - The Lobby + Malets
  • https://uid.mallnline.com - uID (Universal Identity)
  • https://uchat.mallnline.com - uChat
  • https://umail.mallnline.com - uMail
  • https://ucart.mallnline.com - uCart Universal
  • https://deck.mallnline.com - The Deck (Malet Owner workspace)
  • https://studio.mallnline.com - The Studio (Developer workspace)
  • https://tower.mallnline.com - The Tower (Platform Admin workspace)

Adding Extra Origins

Use the CORS_EXTRA_ORIGINS env var to add staging or preview URLs without code changes:

CORS_EXTRA_ORIGINS=https://staging.mallnline.com,https://preview-123.vercel.app

CORS lockdown is applied in two locations:

  • Gateway (apps/ngwenya-gateway/src/main.ts)
  • Media service (apps/media/src/main.ts)

Internal subgraph-to-subgraph communication is not subject to CORS, which browsers enforce only on cross-origin requests; the gateway forwards headers to the subgraphs internally over TCP/HTTP.
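The origin rules in this section can be summarized as one small function. This is a minimal sketch of the described behavior, not the code in either `main.ts`: the function name, the constant names, and the de-duplication step are assumptions made for illustration.

```typescript
interface CorsEnv {
  NODE_ENV?: string;
  CORS_EXTRA_ORIGINS?: string;
}

const PRODUCTION_ORIGINS: readonly string[] = [
  "https://mallnline.com",
  "https://uid.mallnline.com",
  "https://uchat.mallnline.com",
  "https://umail.mallnline.com",
  "https://ucart.mallnline.com",
  "https://deck.mallnline.com",
  "https://studio.mallnline.com",
  "https://tower.mallnline.com",
];

const DEV_ORIGINS: readonly string[] = [
  "http://localhost:5173", // Vite dev server
  "http://localhost:4321", // Astro docs portals
  "http://localhost:3000", // alt dev port
];

// Hypothetical helper: builds the allowlist from NODE_ENV and CORS_EXTRA_ORIGINS.
function resolveCorsOrigins(env: CorsEnv): string[] {
  // CORS_EXTRA_ORIGINS is a comma-separated list; trim entries and drop empties.
  const extra = (env.CORS_EXTRA_ORIGINS ?? "")
    .split(",")
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);

  // Outside production, localhost origins are allowed alongside the production ones.
  const base =
    env.NODE_ENV === "production"
      ? PRODUCTION_ORIGINS
      : DEV_ORIGINS.concat(PRODUCTION_ORIGINS);

  // Keep the first occurrence of each origin so duplicates in the extras are harmless.
  const merged = base.concat(extra);
  return merged.filter((origin, index) => merged.indexOf(origin) === index);
}

// Example: production with the two extra origins from the snippet above.
const origins = resolveCorsOrigins({
  NODE_ENV: "production",
  CORS_EXTRA_ORIGINS: "https://staging.mallnline.com,https://preview-123.vercel.app",
});
console.log(origins.length); // prints 10
```

In a NestJS bootstrap, an array like this could be passed as the `origin` option of `app.enableCors(...)`. Whether the gateway and media service share a helper of this shape is not stated in this document.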