Tower Subgraph

The Tower subgraph is a Rust/Axum service with two core responsibilities: (1) proxying the self-hosted GlitchTip error tracking REST API into the federated GraphQL schema, and (2) running a dual-layer health monitor that continuously probes all 34 platform services for liveness and schema composition health. It powers The Tower admin dashboard — the platform-wide observability hub restricted to is_privileged users via Platform Admin Provisioning.

Port: 3027 · Federation: async-graphql v7 · Database: None (stateless proxy) · Pattern: Follows the Intelligence service architecture

Architecture

Tower has no database — it acts as a thin proxy layer between the federated gateway and the GlitchTip REST API:

┌──────────────┐     GraphQL      ┌──────────────┐     REST/JSON     ┌──────────────┐
│  Tower Admin  │ ───────────────→ │    Tower      │ ───────────────→ │  GlitchTip   │
│  (Frontend)   │                  │  Subgraph     │                  │  :8000       │
│  :5173        │ ←─────────────── │  :3027        │ ←─────────────── │  /api/0/     │
└──────────────┘                   └──────────────┘                   └──────────────┘

The frontend /tower route gates access behind the is_privileged flag, and the subgraph enforces authentication at the resolver level by inspecting the user header propagated by the Hive Gateway.

Stack

Component	Technology
Runtime	Rust + Tokio
HTTP Server	Axum 0.8
GraphQL	async-graphql 7.0.17 + Federation
HTTP Client	reqwest 0.12
Logging	tracing + tracing-subscriber

Cross-Service Dependencies

Tower is deliberately isolated — it depends only on:

Dependency	Direction	Mechanism	Purpose
GlitchTip `:8000`	Outbound HTTP	Bearer token REST	Error tracking data
Gateway `:30000`	Inbound GraphQL	Federation composition	Schema stitching
All 34 services	Outbound HTTP	Direct + federated probes	Health monitoring

No TCP events, no database, no inter-subgraph references beyond health probing.

GlitchTip API Mapping

Tower maps GlitchTip's Sentry-compatible REST endpoints to GraphQL queries:

GraphQL Query	GlitchTip Endpoint	Description
`errorProjects`	`GET /api/0/projects/`	List all projects in the organization
`errorProject(slug)`	`GET /api/0/projects/{org}/{slug}/`	Single project by slug
`errorIssues(projectSlug, ...)`	`GET /api/0/projects/{org}/{project}/issues/`	Paginated issues with status/level filters
`errorIssue(issueId)`	`GET /api/0/issues/{id}/`	Issue detail with metadata
`errorEvents(issueId, limit)`	`GET /api/0/issues/{id}/events/`	Individual occurrences with stack traces
`errorEventLatest(issueId)`	`GET /api/0/issues/{id}/events/latest/`	Most recent event for an issue
`errorOverview(projectSlug, topN)`	`GET /api/0/projects/{org}/{project}/issues/`	Pre-aggregated KPIs + top N issues (server-side)
`updateErrorIssueStatus(issueId, status)`	`PUT /api/0/issues/{id}/`	Resolve, ignore, or reopen an issue
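
The mapping in the table can be sketched as a small route builder. This is only an illustration using the standard library: the `GlitchTipRoute` enum, its variant names, and the `url` method are hypothetical and are not taken from the Tower source. Path segments are interpolated as-is, on the assumption that slugs and IDs need no percent-encoding.

```rust
/// GlitchTip REST routes that Tower's GraphQL operations map onto.
/// The enum and its variant names are hypothetical, chosen for this sketch.
pub enum GlitchTipRoute<'a> {
    Projects,
    Project { slug: &'a str },
    ProjectIssues { project: &'a str },
    Issue { id: &'a str },
    IssueEvents { id: &'a str },
    IssueLatestEvent { id: &'a str },
}

impl<'a> GlitchTipRoute<'a> {
    /// Builds the full URL from the GlitchTip base URL and organization slug.
    /// A trailing slash on `base_url` is tolerated.
    pub fn url(&self, base_url: &str, org: &str) -> String {
        let base = base_url.trim_end_matches('/');
        let path = match self {
            GlitchTipRoute::Projects => "/api/0/projects/".to_string(),
            GlitchTipRoute::Project { slug } => {
                format!("/api/0/projects/{}/{}/", org, slug)
            }
            GlitchTipRoute::ProjectIssues { project } => {
                format!("/api/0/projects/{}/{}/issues/", org, project)
            }
            GlitchTipRoute::Issue { id } => format!("/api/0/issues/{}/", id),
            GlitchTipRoute::IssueEvents { id } => format!("/api/0/issues/{}/events/", id),
            GlitchTipRoute::IssueLatestEvent { id } => {
                format!("/api/0/issues/{}/events/latest/", id)
            }
        };
        format!("{}{}", base, path)
    }
}
```

Both `errorIssues` and `errorOverview` would resolve to `ProjectIssues` in this sketch, since the table lists the same endpoint for them; `updateErrorIssueStatus` would reuse `Issue` with a `PUT` instead of a `GET`.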

Configuration

Variable	Required	Default	Description
`TOWER_SERVICE_PORT`	No	`3027`	HTTP listen port
`GLITCHTIP_API_URL`	No	`http://localhost:8000`	GlitchTip base URL
`GLITCHTIP_API_TOKEN`	Yes	—	Bearer token for GlitchTip API
`GLITCHTIP_ORG_SLUG`	No	`ngwenya`	GlitchTip organization slug
`RUST_LOG`	No	`tower_subgraph=debug`	Tracing filter
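
A minimal sketch of how these variables and defaults could be loaded at startup. The `TowerConfig` struct and the `load_config` functions are hypothetical names for illustration, not the service's actual code. The lookup is passed in as a closure so the defaults can be exercised without touching the process environment. Treating an empty token as missing is an assumption of this sketch. `RUST_LOG` is left out because it is normally consumed by the tracing filter rather than by application config.

```rust
use std::env;

/// Hypothetical config struct mirroring the table above.
#[derive(Debug, Clone, PartialEq)]
pub struct TowerConfig {
    pub port: u16,
    pub glitchtip_api_url: String,
    pub glitchtip_api_token: String,
    pub glitchtip_org_slug: String,
}

/// Resolves configuration through `lookup`, applying the documented defaults.
/// Only GLITCHTIP_API_TOKEN is required.
pub fn load_config<F>(lookup: F) -> Result<TowerConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    let port = match lookup("TOWER_SERVICE_PORT") {
        Some(raw) => raw
            .parse::<u16>()
            .map_err(|e| format!("invalid TOWER_SERVICE_PORT {:?}: {}", raw, e))?,
        None => 3027,
    };
    // Assumption: an empty token is treated the same as a missing one.
    let token = lookup("GLITCHTIP_API_TOKEN")
        .filter(|t| !t.is_empty())
        .ok_or_else(|| "GLITCHTIP_API_TOKEN is required".to_string())?;

    Ok(TowerConfig {
        port,
        glitchtip_api_url: lookup("GLITCHTIP_API_URL")
            .unwrap_or_else(|| "http://localhost:8000".to_string()),
        glitchtip_api_token: token,
        glitchtip_org_slug: lookup("GLITCHTIP_ORG_SLUG")
            .unwrap_or_else(|| "ngwenya".to_string()),
    })
}

/// Convenience wrapper that reads from the real process environment.
pub fn load_config_from_env() -> Result<TowerConfig, String> {
    load_config(|key| env::var(key).ok())
}
```

Failing fast on a missing token at startup surfaces a misconfigured `.env` immediately, rather than as opaque GlitchTip authorization errors on the first query.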

Generating a GlitchTip API Token

The token is auto-generated by make glitchtip-setup (called automatically by make glitchtip-up). The setup target:

Creates the ngwenya_glitchtip database if absent
Runs Django migrations
Creates the admin superuser (idempotent)
Generates an API token via Django ORM
Injects the token directly into apps/tower/.env

# Everything is automatic — just run:
make glitchtip-up
# Then restart tower to pick up the token:
make restart-tower

No manual browser steps needed. For full GlitchTip setup details including Docker infrastructure, frontend SDK integration, and Makefile targets, see the Error Tracking (GlitchTip) guide.

GraphQL Schema

Types

type ErrorProject {
  id: String!
  name: String!
  slug: String!
  platform: String
  dateCreated: String
}

type ErrorIssue {
  id: String!
  title: String!
  culprit: String
  shortId: String
  count: Int!
  userCount: Int!
  firstSeen: String
  lastSeen: String
  level: ErrorLevel!
  status: ErrorIssueStatus!
  metadata: ErrorIssueMetadata
  issueType: String
}

type ErrorEvent {
  id: String!
  eventId: String
  title: String
  message: String
  dateCreated: String
  platform: String
  tags: [ErrorTag!]!
  contexts: JSON
  entries: [ErrorEntry!]!
  user: ErrorEventUser
}

enum ErrorLevel { DEBUG, INFO, WARNING, ERROR, FATAL }
enum ErrorIssueStatus { UNRESOLVED, RESOLVED, IGNORED }

type ErrorOverview {
  totalIssues: Int!
  unresolvedIssues: Int!
  totalAffectedUsers: Int!
  approximateCrashFreeRate: Float!
  topIssues: [ErrorIssue!]!
}
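
To make the `ErrorOverview` fields more concrete, here is a hedged sketch of how the count-based KPIs could be derived from a list of issues. The `OverviewCounts` struct and `summarize` function are hypothetical. Two assumptions are made: affected users are summed per issue (which double-counts a user hit by several issues), and "top" issues are ranked by occurrence count. `approximateCrashFreeRate` is deliberately omitted because its formula is not described here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorIssueStatus {
    Unresolved,
    Resolved,
    Ignored,
}

/// Trimmed-down issue carrying only the fields the aggregation needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorIssue {
    pub id: String,
    pub count: i32,
    pub user_count: i32,
    pub status: ErrorIssueStatus,
}

/// Hypothetical subset of ErrorOverview (no crash-free rate).
#[derive(Debug, PartialEq)]
pub struct OverviewCounts {
    pub total_issues: i32,
    pub unresolved_issues: i32,
    pub total_affected_users: i32,
    pub top_issues: Vec<ErrorIssue>,
}

pub fn summarize(issues: &[ErrorIssue], top_n: usize) -> OverviewCounts {
    let unresolved = issues
        .iter()
        .filter(|i| i.status == ErrorIssueStatus::Unresolved)
        .count() as i32;
    // Assumption: per-issue user counts are simply added together.
    let total_affected_users: i32 = issues.iter().map(|i| i.user_count).sum();

    // Rank by occurrence count, highest first; the sort is stable, so ties
    // keep their input order.
    let mut ranked = issues.to_vec();
    ranked.sort_by(|a, b| b.count.cmp(&a.count));
    ranked.truncate(top_n);

    OverviewCounts {
        total_issues: issues.len() as i32,
        unresolved_issues: unresolved,
        total_affected_users,
        top_issues: ranked,
    }
}
```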

Example Queries

# List up to 10 error issues for the frontend project
query ErrorIssues {
  errorIssues(projectSlug: "ngwenya-front", limit: 10) {
    id
    title
    count
    lastSeen
    level
    status
    metadata {
      errorType
      filename
    }
  }
}

# Get the latest event for a specific issue
query LatestEvent {
  errorEventLatest(issueId: "42") {
    eventId
    title
    dateCreated
    tags { key value }
    entries { entryType data }
  }
}

# Resolve an error issue
mutation ResolveIssue {
  updateErrorIssueStatus(issueId: "42", status: RESOLVED) {
    id
    status
  }
}
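
For the mutation above, the GraphQL enum has to be translated into the value sent in the `PUT` body. The sketch below assumes GlitchTip accepts the lowercase Sentry-style status strings (`unresolved`, `resolved`, `ignored`) in a JSON object with a `status` key. The method and function names are hypothetical, and the JSON is built by hand only to keep the example free of dependencies.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorIssueStatus {
    Unresolved,
    Resolved,
    Ignored,
}

impl ErrorIssueStatus {
    /// Lowercase wire value (assumed to follow the Sentry-compatible API).
    pub fn as_api_str(self) -> &'static str {
        match self {
            ErrorIssueStatus::Unresolved => "unresolved",
            ErrorIssueStatus::Resolved => "resolved",
            ErrorIssueStatus::Ignored => "ignored",
        }
    }

    /// Parses a wire value back into the enum; unknown values yield None.
    pub fn from_api_str(raw: &str) -> Option<Self> {
        match raw {
            "unresolved" => Some(ErrorIssueStatus::Unresolved),
            "resolved" => Some(ErrorIssueStatus::Resolved),
            "ignored" => Some(ErrorIssueStatus::Ignored),
            _ => None,
        }
    }
}

/// JSON body for `PUT /api/0/issues/{id}/`. The values are fixed ASCII
/// strings, so no escaping is needed.
pub fn status_update_body(status: ErrorIssueStatus) -> String {
    format!("{{\"status\":\"{}\"}}", status.as_api_str())
}
```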

Development

Build & Run

# From apps/tower/
cargo build          # Compile
cargo run            # Start service on :3027

# Or via Makefile (from project root)
make restart-tower   # Rebuild + restart
make setup-tower     # Create .env from template

Testing

cd apps/tower
cargo test           # Run all 14 unit tests
cargo clippy         # Lint

Dual-Layer Health Monitoring

Tower runs a background tokio::spawn task (start_health_monitor) that probes all 34 platform services every 30 seconds (configurable via HEALTH_CHECK_INTERVAL). Each subgraph is probed twice:

Direct Probe — HTTP POST to the service's own port (e.g., localhost:3001/graphql) to verify the process is alive
Federated Probe — The same query routed through the Gateway at :30000/graphql to verify schema composition

This produces a diagnostic matrix with four states, written as direct/federated: UP/UP (fully operational), UP/DOWN (federation issue), DOWN/UP (stale cache), and DOWN/DOWN (service down).
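
The matrix reduces to a match on the two probe results. The following is a minimal sketch; the `Diagnosis` enum and `diagnose` function are hypothetical names, with the direct probe result first and the federated probe result second.

```rust
/// Outcome of combining the direct and federated probe for one service.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Diagnosis {
    /// UP/UP
    FullyOperational,
    /// UP/DOWN: the process answers directly but not through the Gateway.
    FederationIssue,
    /// DOWN/UP: the Gateway answers although the process does not.
    StaleCache,
    /// DOWN/DOWN
    ServiceDown,
}

pub fn diagnose(direct_up: bool, federated_up: bool) -> Diagnosis {
    match (direct_up, federated_up) {
        (true, true) => Diagnosis::FullyOperational,
        (true, false) => Diagnosis::FederationIssue,
        (false, true) => Diagnosis::StaleCache,
        (false, false) => Diagnosis::ServiceDown,
    }
}
```

Because the match is exhaustive over both booleans, the compiler guarantees that every cell of the matrix is handled.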

The health state is stored in Arc<RwLock<PlatformHealthOverview>> and exposed via the platformHealth query.
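
The shared-state pattern can be illustrated with the standard library alone. This sketch uses `std::sync::RwLock` and a synchronous probe closure in place of the service's Tokio task and HTTP probes. The fields of `PlatformHealthOverview` and the function names are hypothetical, since the real struct is not shown here.

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ServiceHealth {
    pub name: String,
    pub direct_up: bool,
    pub federated_up: bool,
}

/// Hypothetical shape; the real struct likely carries more detail.
#[derive(Clone, Debug, Default)]
pub struct PlatformHealthOverview {
    pub services: Vec<ServiceHealth>,
    pub completed_cycles: u64,
}

pub type SharedHealth = Arc<RwLock<PlatformHealthOverview>>;

pub fn new_shared_health() -> SharedHealth {
    Arc::new(RwLock::new(PlatformHealthOverview::default()))
}

/// Runs one probe cycle. `probe` returns (direct_up, federated_up) for a
/// service name. All probing happens before the write lock is taken, so
/// readers are blocked only for the final swap.
pub fn run_probe_cycle<P>(state: &SharedHealth, targets: &[&str], probe: P)
where
    P: Fn(&str) -> (bool, bool),
{
    let results: Vec<ServiceHealth> = targets
        .iter()
        .map(|name| {
            let (direct_up, federated_up) = probe(name);
            ServiceHealth { name: name.to_string(), direct_up, federated_up }
        })
        .collect();

    let mut guard = state.write().expect("health lock poisoned");
    guard.services = results;
    guard.completed_cycles += 1;
}

/// What a `platformHealth` resolver could do: clone a consistent snapshot.
pub fn snapshot(state: &SharedHealth) -> PlatformHealthOverview {
    state.read().expect("health lock poisoned").clone()
}
```

Collecting results first and swapping them in under a short write lock keeps the resolver's read path from stalling while slow probes are in flight.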

For the full architectural deep dive — diagnostic matrix, smoke query mapping, configuration, and frontend dashboard — see Tower Health Observability.

Dashboard Integration

The Tower admin dashboard (/tower) integrates data in three places:

Platform Health Widget — PlatformHealthWidget.svelte shows cluster uptime, grouped service grid with D/F status badges, and service detail modals with diagnostics and Developer Portal documentation links. Uses the platformHealth query.
Error Overview Widget — ErrorHealthWidget shows crash-free rate (color-coded), total/unresolved issues, affected users, and top 5 errors. Uses the errorOverview query.
Analytics → Errors Sub-Tab — Full ErrorTrackingAnalytics component with project selector, paginated issue list, status/level breakdowns, and issue detail drill-down with stack traces.

The Overview tab also includes an UptimeWidget that fetches from the gateway's GET /admin/health endpoint (which now returns startedAt and uptimeFormatted) and GET /admin/uptime/sessions for historical session tracking. Session history is persisted in Redis (gateway:sessions list key, capped at 100 entries).

Security

All GraphQL queries and mutations require the user header propagated by the Hive Gateway. This header contains the authenticated user's session context and is set by the auth service. Tower enforces authentication at the resolver level — requests without a valid user header receive an "Authentication required" error.
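
A resolver-level guard of this kind might look like the sketch below. It is not the actual `require_auth` implementation: headers are modeled as a plain `HashMap<String, String>` instead of the HTTP `HeaderMap`, the `AuthError` type is hypothetical, and "valid" is assumed to mean "present and non-empty", since the real validation rules are not described here.

```rust
use std::collections::HashMap;

/// Hypothetical error type carrying the message returned to the client.
#[derive(Debug, PartialEq)]
pub struct AuthError(pub &'static str);

/// Returns the raw `user` header value, or an "Authentication required"
/// error when the header is missing or blank. Header names are compared
/// case-insensitively, as HTTP requires.
pub fn require_auth(headers: &HashMap<String, String>) -> Result<&str, AuthError> {
    headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("user"))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
        .ok_or(AuthError("Authentication required"))
}
```

Each resolver would call the guard first and return early on `Err`, so an unauthenticated request never reaches the GlitchTip client.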

Header injection architecture: Tower uses a custom Axum handler (graphql_handler) that explicitly extracts HeaderMap from the HTTP request and injects it into the async-graphql context via request.data(headers). This is required because async_graphql_axum::GraphQL::new() as a service does not automatically inject HTTP headers. The gateway's propagateHeaders configuration sets the user header based on the authenticated session, and the custom handler makes it accessible to resolvers via ctx.data::<HeaderMap>().

The frontend additionally gates the /tower route behind the is_privileged flag, which is resolved via the Platform Admin Provisioning system (dedicated platform_admins Postgres table, separate from org membership).

Known Issues & Resolutions

Auth Propagation Failure (Resolved 2026-05-07)

The Error Tracking widget showed "Authentication required" even for authenticated admin users. The root cause was that main.rs used async_graphql_axum::GraphQL::new(schema) as a service (via .post_service()), which does not inject the HTTP HeaderMap into the async-graphql context. The require_auth() function checks ctx.data::<HeaderMap>() for the user header, but it was always empty.

Fix: Replaced the service-based handler with a custom graphql_handler function that explicitly extracts HeaderMap via Axum's extractor and injects it into the async-graphql request with request.data(headers). This matches the pattern used by the auth subgraph. Additionally, GlitchTipProject.id deserialization was changed from i64 to String to match the actual GlitchTip REST API response.

GlitchTip API Token Type Mismatch (Resolved 2026-05-07)

The GlitchTip UI truncates API tokens when displayed. The full 64-character token must be used. make glitchtip-setup now auto-generates and injects the token via Django ORM, eliminating this issue entirely.

Users Tab "Bad Request Exception" (Resolved 2026-05-05)

The Tower Users tab queries adminUsers which is resolved by the nodes subgraph (not the tower Rust subgraph). The nodes app configures ValidationPipe globally with forbidNonWhitelisted: true, which requires all @ArgsType() and @InputType() DTOs to include class-validator decorators (e.g. @IsOptional(), @IsString(), @IsEnum()).

Previously, the AdminUserPagingArgs and AdminUserFilter DTOs had only @Field() decorators from @nestjs/graphql, so the pipe rejected every property as "non-whitelisted". Additionally, filter was a separate @Args('filter') parameter on the resolver; when mixed with an @Args() spread, the forbidNonWhitelisted check treated filter as an unexpected property on the ArgsType.

Fix: Added class-validator decorators to both DTOs and merged filter into AdminUserPagingArgs as a @ValidateNested() property. See apps/nodes/src/actors/user/admin-users.types.ts.

Platform-wide implication: Any NestJS subgraph using forbidNonWhitelisted: true will reject every property on DTOs that lack class-validator decorators, and the request fails with a Bad Request error. Audit all ArgsType/InputType DTOs when adding new queries.

Admin Provisioning

Platform admin access is managed via the platform_admins Postgres table in the auth database. Bootstrap admins are seeded via SQL migration (20260419000000_platform_admins.sql). To add new admins:

# Via Makefile target
make grant-tower-admin EMAIL=user@example.com

# Or via new migration (preferred for persistent access)
# See: apps/auth/migrations/20260505000000_add_info_admin.sql

Important: Never modify already-applied migrations — the sqlx migration runner checksums each file and will reject modifications. Always create a new migration file.

Tower Subgraph Gateway Registration (Resolved 2026-05-07)

The tower Rust subgraph is now automatically started by scripts/dev/start_services.sh with health probes ensuring readiness before gateway initialization. make restart-tower provides a dedicated rebuild + restart target. make services uses the smart restart script to only restart modified services.

Related Documentation

Tower Health Observability — Dual-layer probing architecture, diagnostic matrix, smoke query mapping, and frontend dashboard details
Error Tracking (GlitchTip) — Self-hosted GlitchTip infrastructure: Docker setup, frontend SDK, DSN configuration, and Makefile targets
Workspaces & The Tower — Architecture of The Tower admin dashboard, The Deck, and workspace routing
Intelligence — Rust/Axum subgraph pattern reference (same tech stack: Axum + async-graphql + tracing)
Gateway (Hive Gateway) — Federation composition and service registry where tower is registered
Gateway Tracing & Observability — Backend observability: response caching, APQ, tracing, and Prometheus metrics
Monitoring & Alerting Infrastructure — Prometheus + Grafana observability stack
Platform Admin Provisioning — is_privileged access control system governing Tower access
Admin Dashboard & Audit Tools — User-facing support guide for The Tower
Platform Health Monitoring — User-facing guide for the health dashboard