Tower Subgraph
The Tower subgraph is a Rust/Axum service with two core responsibilities: (1) proxying the self-hosted GlitchTip error tracking REST API into the federated GraphQL schema, and (2) running a dual-layer health monitor that continuously probes all 34 platform services for liveness and schema composition health. It powers The Tower admin dashboard, the platform-wide observability hub restricted to is_privileged users via Platform Admin Provisioning.
Port: 3027 · Federation: async-graphql v7 · Database: None (stateless proxy) · Pattern: Follows the Intelligence service architecture
Architecture
Tower has no database; it acts as a thin proxy layer between the federated gateway and the GlitchTip REST API:
+--------------+    GraphQL     +--------------+   REST/JSON    +--------------+
| Tower Admin  | -------------> |    Tower     | -------------> |  GlitchTip   |
|  (Frontend)  |                |   Subgraph   |                |    :8000     |
|    :5173     | <------------- |    :3027     | <------------- |   /api/0/    |
+--------------+                +--------------+                +--------------+
The frontend /tower route gates access behind the is_privileged flag, and the subgraph enforces authentication at the resolver level by inspecting the user header propagated by the Hive Gateway.
Stack
| Component | Technology |
|---|---|
| Runtime | Rust + Tokio |
| HTTP Server | Axum 0.8 |
| GraphQL | async-graphql 7.0.17 + Federation |
| HTTP Client | reqwest 0.12 |
| Logging | tracing + tracing-subscriber |
Cross-Service Dependencies
Tower is deliberately isolated. It depends only on:
| Dependency | Direction | Mechanism | Purpose |
|---|---|---|---|
| GlitchTip :8000 | Outbound HTTP | Bearer token REST | Error tracking data |
| Gateway :30000 | Inbound GraphQL | Federation composition | Schema stitching |
| All 34 services | Outbound HTTP | Direct + federated probes | Health monitoring |
No TCP events, no database, no inter-subgraph references beyond health probing.
GlitchTip API Mapping
Tower maps GlitchTip's Sentry-compatible REST endpoints to GraphQL queries:
| GraphQL Query | GlitchTip Endpoint | Description |
|---|---|---|
| errorProjects | GET /api/0/projects/ | List all projects in the organization |
| errorProject(slug) | GET /api/0/projects/{org}/{slug}/ | Single project by slug |
| errorIssues(projectSlug, ...) | GET /api/0/projects/{org}/{project}/issues/ | Paginated issues with status/level filters |
| errorIssue(issueId) | GET /api/0/issues/{id}/ | Issue detail with metadata |
| errorEvents(issueId, limit) | GET /api/0/issues/{id}/events/ | Individual occurrences with stack traces |
| errorEventLatest(issueId) | GET /api/0/issues/{id}/events/latest/ | Most recent event for an issue |
| errorOverview(projectSlug, topN) | GET /api/0/projects/{org}/{project}/issues/ | Pre-aggregated KPIs + top N issues (server-side) |
| updateErrorIssueStatus(issueId, status) | PUT /api/0/issues/{id}/ | Resolve, ignore, or reopen an issue |
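The endpoint column above follows a small number of path templates. As a minimal sketch of how those paths could be assembled (the `GlitchTipUrls` type and its method names are hypothetical and are not taken from the Tower source), using only the standard library:

```rust
/// Hypothetical helper that builds the GlitchTip REST URLs listed in the
/// mapping table. Slugs and ids are inserted as-is (no percent-encoding).
pub struct GlitchTipUrls {
    base: String, // e.g. the value of GLITCHTIP_API_URL
    org: String,  // e.g. the value of GLITCHTIP_ORG_SLUG
}

impl GlitchTipUrls {
    pub fn new(base: &str, org: &str) -> Self {
        Self {
            // Drop trailing slashes so the joined paths never contain "//".
            base: base.trim_end_matches('/').to_string(),
            org: org.to_string(),
        }
    }

    /// GET /api/0/projects/
    pub fn projects(&self) -> String {
        format!("{}/api/0/projects/", self.base)
    }

    /// GET /api/0/projects/{org}/{slug}/
    pub fn project(&self, slug: &str) -> String {
        format!("{}/api/0/projects/{}/{}/", self.base, self.org, slug)
    }

    /// GET /api/0/projects/{org}/{project}/issues/
    pub fn project_issues(&self, project: &str) -> String {
        format!("{}issues/", self.project(project))
    }

    /// GET or PUT /api/0/issues/{id}/
    pub fn issue(&self, id: &str) -> String {
        format!("{}/api/0/issues/{}/", self.base, id)
    }

    /// GET /api/0/issues/{id}/events/
    pub fn issue_events(&self, id: &str) -> String {
        format!("{}events/", self.issue(id))
    }

    /// GET /api/0/issues/{id}/events/latest/
    pub fn issue_latest_event(&self, id: &str) -> String {
        format!("{}latest/", self.issue_events(id))
    }
}
```

Only the project-scoped routes need the organization slug; the issue-scoped routes are addressed by issue id alone, which is why errorIssue, errorEvents, and errorEventLatest take no project argument.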
Configuration
| Variable | Required | Default | Description |
|---|---|---|---|
| TOWER_SERVICE_PORT | No | 3027 | HTTP listen port |
| GLITCHTIP_API_URL | No | http://localhost:8000 | GlitchTip base URL |
| GLITCHTIP_API_TOKEN | Yes | (none) | Bearer token for GlitchTip API |
| GLITCHTIP_ORG_SLUG | No | ngwenya | GlitchTip organization slug |
| RUST_LOG | No | tower_subgraph=debug | Tracing filter |
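The defaults and the single required variable in this table can be expressed as a small loader. The following is a sketch only: `TowerConfig`, `from_lookup`, and `from_env` are hypothetical names, and the real service may load its configuration differently. RUST_LOG is left out because it is normally consumed by the tracing filter rather than by application code.

```rust
/// Hypothetical configuration struct mirroring the table above.
#[derive(Debug, Clone, PartialEq)]
pub struct TowerConfig {
    pub port: u16,
    pub glitchtip_api_url: String,
    pub glitchtip_api_token: String,
    pub glitchtip_org_slug: String,
}

impl TowerConfig {
    /// Builds the config from any key -> value source. Taking a closure keeps
    /// the logic testable without touching the process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // Optional, defaults to 3027; a set-but-invalid value is an error.
        let port = match lookup("TOWER_SERVICE_PORT") {
            Some(raw) => raw
                .parse::<u16>()
                .map_err(|e| format!("invalid TOWER_SERVICE_PORT {raw:?}: {e}"))?,
            None => 3027,
        };

        // The only required variable: fail fast when missing or empty.
        let glitchtip_api_token = lookup("GLITCHTIP_API_TOKEN")
            .filter(|token| !token.is_empty())
            .ok_or_else(|| "GLITCHTIP_API_TOKEN is required".to_string())?;

        Ok(Self {
            port,
            glitchtip_api_url: lookup("GLITCHTIP_API_URL")
                .unwrap_or_else(|| "http://localhost:8000".to_string()),
            glitchtip_api_token,
            glitchtip_org_slug: lookup("GLITCHTIP_ORG_SLUG")
                .unwrap_or_else(|| "ngwenya".to_string()),
        })
    }

    /// Convenience wrapper that reads from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Failing at startup when the token is absent surfaces a misconfigured .env immediately, rather than as authorization errors from GlitchTip on the first query.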
Generating a GlitchTip API Token
The token is auto-generated by make glitchtip-setup (called automatically by make glitchtip-up). The setup target:
- Creates the ngwenya_glitchtip database if absent
- Runs Django migrations
- Creates the admin superuser (idempotent)
- Generates an API token via Django ORM
- Injects the token directly into apps/tower/.env
# Everything is automatic; just run:
make glitchtip-up
# Then restart tower to pick up the token:
make restart-tower
No manual browser steps needed. For full GlitchTip setup details including Docker infrastructure, frontend SDK integration, and Makefile targets, see the Error Tracking (GlitchTip) guide.
GraphQL Schema
Types
type ErrorProject {
  id: String!
  name: String!
  slug: String!
  platform: String
  dateCreated: String
}

type ErrorIssue {
  id: String!
  title: String!
  culprit: String
  shortId: String
  count: Int!
  userCount: Int!
  firstSeen: String
  lastSeen: String
  level: ErrorLevel!
  status: ErrorIssueStatus!
  metadata: ErrorIssueMetadata
  issueType: String
}

type ErrorEvent {
  id: String!
  eventId: String
  title: String
  message: String
  dateCreated: String
  platform: String
  tags: [ErrorTag!]!
  contexts: JSON
  entries: [ErrorEntry!]!
  user: ErrorEventUser
}

enum ErrorLevel { DEBUG, INFO, WARNING, ERROR, FATAL }

enum ErrorIssueStatus { UNRESOLVED, RESOLVED, IGNORED }

type ErrorOverview {
  totalIssues: Int!
  unresolvedIssues: Int!
  totalAffectedUsers: Int!
  approximateCrashFreeRate: Float!
  topIssues: [ErrorIssue!]!
}
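ErrorOverview is aggregated server-side from the project issue list. The sketch below illustrates one way the counting and top-N selection could work. `IssueSummary`, `Overview`, and `build_overview` are hypothetical names, summing userCount across issues is an assumption (a user hitting two issues would be counted twice), and approximateCrashFreeRate is deliberately omitted because its formula is not described here.

```rust
/// Hypothetical, trimmed-down view of an issue for aggregation purposes.
#[derive(Debug, Clone, PartialEq)]
pub struct IssueSummary {
    pub id: String,
    pub count: u64,      // number of occurrences
    pub user_count: u64, // affected users for this issue
    pub unresolved: bool,
}

/// Hypothetical aggregate mirroring part of the ErrorOverview type.
#[derive(Debug, PartialEq)]
pub struct Overview {
    pub total_issues: usize,
    pub unresolved_issues: usize,
    pub total_affected_users: u64,
    pub top_issues: Vec<IssueSummary>,
}

pub fn build_overview(mut issues: Vec<IssueSummary>, top_n: usize) -> Overview {
    let total_issues = issues.len();
    let unresolved_issues = issues.iter().filter(|i| i.unresolved).count();
    // Assumption: plain sum of per-issue user counts, without deduplication.
    let total_affected_users: u64 = issues.iter().map(|i| i.user_count).sum();

    // Most frequent issues first; the stable sort keeps ties in input order.
    issues.sort_by(|a, b| b.count.cmp(&a.count));
    issues.truncate(top_n);

    Overview {
        total_issues,
        unresolved_issues,
        total_affected_users,
        top_issues: issues,
    }
}
```

Aggregating in the subgraph means the dashboard widget issues a single query and receives only the KPIs and the top N issues, not the full issue list.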
Example Queries
# List all error issues for the frontend project
query ErrorIssues {
  errorIssues(projectSlug: "ngwenya-front", limit: 10) {
    id
    title
    count
    lastSeen
    level
    status
    metadata {
      errorType
      filename
    }
  }
}

# Get the latest event for a specific issue
query LatestEvent {
  errorEventLatest(issueId: "42") {
    eventId
    title
    dateCreated
    tags { key value }
    entries { entryType data }
  }
}

# Resolve an error issue
mutation ResolveIssue {
  updateErrorIssueStatus(issueId: "42", status: RESOLVED) {
    id
    status
  }
}
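For the mutation, the GraphQL enum value has to be translated into the request body of PUT /api/0/issues/{id}/. The sketch below assumes GlitchTip accepts the lowercase status strings used by Sentry-compatible APIs ("unresolved", "resolved", "ignored"); the helper names are hypothetical, and the JSON is built by hand only to keep the example dependency-free.

```rust
/// Rust-side mirror of the ErrorIssueStatus GraphQL enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorIssueStatus {
    Unresolved,
    Resolved,
    Ignored,
}

impl ErrorIssueStatus {
    /// Assumed wire value for the Sentry-compatible REST API.
    pub fn as_api_str(self) -> &'static str {
        match self {
            ErrorIssueStatus::Unresolved => "unresolved",
            ErrorIssueStatus::Resolved => "resolved",
            ErrorIssueStatus::Ignored => "ignored",
        }
    }

    /// Parses a status string from an API response; unknown values yield None.
    pub fn from_api_str(raw: &str) -> Option<Self> {
        match raw {
            "unresolved" => Some(ErrorIssueStatus::Unresolved),
            "resolved" => Some(ErrorIssueStatus::Resolved),
            "ignored" => Some(ErrorIssueStatus::Ignored),
            _ => None,
        }
    }
}

/// Hypothetical helper: JSON body for PUT /api/0/issues/{id}/.
pub fn status_update_body(status: ErrorIssueStatus) -> String {
    format!(r#"{{"status":"{}"}}"#, status.as_api_str())
}
```

Under these assumptions, the ResolveIssue mutation above would send the body {"status":"resolved"}, and reopening an issue corresponds to sending UNRESOLVED.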
Development
Build & Run
# From apps/tower/
cargo build # Compile
cargo run # Start service on :3027
# Or via Makefile (from project root)
make restart-tower # Rebuild + restart
make setup-tower # Create .env from template
Testing
cd apps/tower
cargo test # Run all 14 unit tests
cargo clippy # Lint
Dual-Layer Health Monitoring
Tower runs a background tokio::spawn task (start_health_monitor) that probes all 34 platform services every 30 seconds (configurable via HEALTH_CHECK_INTERVAL). Each subgraph is probed twice:
- Direct Probe: HTTP POST to the service's own port (e.g., localhost:3001/graphql) to verify the process is alive
- Federated Probe: The same query routed through the Gateway at :30000/graphql to verify schema composition
This produces a diagnostic matrix with four states: UP/UP (fully operational), UP/DOWN (federation issue), DOWN/UP (stale cache), DOWN/DOWN (service down).
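The matrix is a pure function of the two probe results. A minimal sketch (the `Diagnosis` enum and `diagnose` function are illustrative names, not the actual Tower types):

```rust
/// The four diagnostic states, written as direct/federated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Diagnosis {
    FullyOperational, // UP/UP
    FederationIssue,  // UP/DOWN: process alive, not reachable via the gateway
    StaleCache,       // DOWN/UP: gateway still answers, process unreachable
    ServiceDown,      // DOWN/DOWN
}

/// Maps a (direct probe, federated probe) pair onto the diagnostic matrix.
pub fn diagnose(direct_up: bool, federated_up: bool) -> Diagnosis {
    match (direct_up, federated_up) {
        (true, true) => Diagnosis::FullyOperational,
        (true, false) => Diagnosis::FederationIssue,
        (false, true) => Diagnosis::StaleCache,
        (false, false) => Diagnosis::ServiceDown,
    }
}
```

Matching on the tuple makes the compiler check that all four combinations are handled.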
The health state is stored in Arc<RwLock<PlatformHealthOverview>> and exposed via the platformHealth query.
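The shared-state pattern can be sketched with the standard library alone. The field layout of `PlatformHealthOverview` below is invented for illustration, and `record_probe` and `snapshot` are hypothetical helpers. The real service may well use an async-aware lock such as tokio's RwLock; this sketch uses std::sync::RwLock to stay self-contained.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Illustrative stand-in: service name -> (direct_up, federated_up).
#[derive(Debug, Clone, Default)]
pub struct PlatformHealthOverview {
    pub services: BTreeMap<String, (bool, bool)>,
}

/// One handle is cloned into the background monitor, another into the schema.
pub type SharedHealth = Arc<RwLock<PlatformHealthOverview>>;

/// Called by the monitor task after each probe round (writer side).
pub fn record_probe(state: &SharedHealth, service: &str, direct_up: bool, federated_up: bool) {
    let mut guard = state.write().expect("health lock poisoned");
    guard
        .services
        .insert(service.to_string(), (direct_up, federated_up));
}

/// Called by the platformHealth resolver (reader side). Cloning releases the
/// read lock before the response is serialized.
pub fn snapshot(state: &SharedHealth) -> PlatformHealthOverview {
    state.read().expect("health lock poisoned").clone()
}
```

A read-write lock fits this workload because there is a single writer that updates on each probe interval, while any number of dashboard queries can read concurrently.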
For the full architectural deep dive (diagnostic matrix, smoke query mapping, configuration, and frontend dashboard), see Tower Health Observability.
Dashboard Integration
The Tower admin dashboard (/tower) integrates data in three places:
- Platform Health Widget: PlatformHealthWidget.svelte shows cluster uptime, grouped service grid with D/F status badges, and service detail modals with diagnostics and Developer Portal documentation links. Uses the platformHealth query.
- Error Overview Widget: ErrorHealthWidget shows crash-free rate (color-coded), total/unresolved issues, affected users, and top 5 errors. Uses the errorOverview query.
- Analytics → Errors Sub-Tab: Full ErrorTrackingAnalytics component with project selector, paginated issue list, status/level breakdowns, and issue detail drill-down with stack traces.
Gateway Uptime Widget
The Overview tab also includes an UptimeWidget that fetches from the gateway's GET /admin/health endpoint (which now returns startedAt and uptimeFormatted) and GET /admin/uptime/sessions for historical session tracking. Session history is persisted in Redis (gateway:sessions list key, capped at 100 entries).
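The capped session list behaves like a bounded, newest-first queue. The gateway itself is not a Rust service and its Redis commands are not shown here, so the following is only an in-memory model of the capping semantics (push to the front, then trim to the cap), with hypothetical names:

```rust
use std::collections::VecDeque;

/// In-memory model of a newest-first list capped at a fixed length.
pub struct SessionHistory {
    entries: VecDeque<String>,
    cap: usize,
}

impl SessionHistory {
    pub fn new(cap: usize) -> Self {
        Self {
            entries: VecDeque::new(),
            cap,
        }
    }

    /// Adds a session at the front and drops the oldest entries beyond the cap.
    pub fn push(&mut self, session: String) {
        self.entries.push_front(session);
        self.entries.truncate(self.cap);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Most recently recorded session, if any.
    pub fn newest(&self) -> Option<&str> {
        self.entries.front().map(|s| s.as_str())
    }

    /// Oldest session still retained, if any.
    pub fn oldest(&self) -> Option<&str> {
        self.entries.back().map(|s| s.as_str())
    }
}
```

With a cap of 100, recording a 101st session evicts the oldest one, so the history never grows beyond 100 entries.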
Security
All GraphQL queries and mutations require the user header propagated by the Hive Gateway. This header contains the authenticated user's session context and is set by the auth service. Tower enforces authentication at the resolver level: requests without a valid user header receive an "Authentication required" error.
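The resolver-level check amounts to "find a non-empty user header or fail". The sketch below models it with a plain HashMap standing in for the HTTP HeaderMap so that it compiles without external crates. The real require_auth() reads the header from the async-graphql context, and how the header value is validated beyond presence is not covered here.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the resolver guard: returns the raw `user` header
/// value, or the error message surfaced to the client.
pub fn require_auth(headers: &HashMap<String, String>) -> Result<&str, &'static str> {
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case("user"))
        .map(|(_, value)| value.as_str())
        // Treat an empty or whitespace-only header as missing.
        .filter(|value| !value.trim().is_empty())
        .ok_or("Authentication required")
}
```

Each resolver would call a guard like this first and return early on Err, so an unauthenticated request never reaches the GlitchTip client.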
Header injection architecture: Tower uses a custom Axum handler (graphql_handler) that explicitly extracts HeaderMap from the HTTP request and injects it into the async-graphql context via request.data(headers). This is required because async_graphql_axum::GraphQL::new() as a service does not automatically inject HTTP headers. The gateway's propagateHeaders configuration sets the user header based on the authenticated session, and the custom handler makes it accessible to resolvers via ctx.data::<HeaderMap>().
The frontend additionally gates the /tower route behind the is_privileged flag, which is resolved via the Platform Admin Provisioning system (dedicated platform_admins Postgres table, separate from org membership).
Known Issues & Resolutions
Auth Propagation Failure (Resolved 2026-05-07)
The Error Tracking widget showed "Authentication required" even for authenticated admin users. The root cause was that main.rs used async_graphql_axum::GraphQL::new(schema) as a service (via .post_service()), which does not inject the HTTP HeaderMap into the async-graphql context. The require_auth() function checks ctx.data::<HeaderMap>() for the user header, but it was always empty.
Fix: Replaced the service-based handler with a custom graphql_handler function that explicitly extracts HeaderMap via Axum's extractor and injects it into the async-graphql request with request.data(headers). This matches the pattern used by the auth subgraph. Additionally fixed GlitchTipProject.id deserialization from i64 to String to match the actual GlitchTip REST API response.
GlitchTip API Token Type Mismatch (Resolved 2026-05-07)
The GlitchTip UI truncates API tokens when displayed. The full 64-character token must be used. make glitchtip-setup now auto-generates and injects the token via Django ORM, eliminating this issue entirely.
Users Tab "Bad Request Exception" (Resolved 2026-05-05)
The Tower Users tab queries adminUsers which is resolved by the nodes subgraph (not the tower Rust subgraph). The nodes app configures ValidationPipe globally with forbidNonWhitelisted: true, which requires all @ArgsType() and @InputType() DTOs to include class-validator decorators (e.g. @IsOptional(), @IsString(), @IsEnum()).
Previously the AdminUserPagingArgs and AdminUserFilter DTOs only had @Field() decorators from @nestjs/graphql, causing the pipe to reject every property as "non-whitelisted". Additionally, filter was a separate @Args('filter') parameter on the resolver. When mixed with @Args() spread, the forbidNonWhitelisted check treats filter as an unexpected property on the ArgsType.
Fix: Added class-validator decorators to both DTOs and merged filter into AdminUserPagingArgs as a @ValidateNested() property. See apps/nodes/src/actors/user/admin-users.types.ts.
Platform-wide implication: Any NestJS subgraph using forbidNonWhitelisted: true will silently reject all properties on DTOs that lack class-validator decorators. Audit all ArgsType/InputType DTOs when adding new queries.
Admin Provisioning
Platform admin access is managed via the platform_admins Postgres table in the auth database. Bootstrap admins are seeded via SQL migration (20260419000000_platform_admins.sql). To add new admins:
# Via Makefile target
make grant-tower-admin EMAIL=user@example.com
# Or via new migration (preferred for persistent access)
# See: apps/auth/migrations/20260505000000_add_info_admin.sql
Important: Never modify already-applied migrations. The sqlx migration runner checksums each file and will reject modifications. Always create a new migration file.
Tower Subgraph Gateway Registration (Resolved 2026-05-07)
The tower Rust subgraph is now automatically started by scripts/dev/start_services.sh with health probes ensuring readiness before gateway initialization. make restart-tower provides a dedicated rebuild + restart target. make services uses the smart restart script to only restart modified services.
Related
- Tower Health Observability: Dual-layer probing architecture, diagnostic matrix, smoke query mapping, and frontend dashboard details
- Error Tracking (GlitchTip): Self-hosted GlitchTip infrastructure: Docker setup, frontend SDK, DSN configuration, and Makefile targets
- Workspaces & The Tower: Architecture of The Tower admin dashboard, The Deck, and workspace routing
- Intelligence: Rust/Axum subgraph pattern reference (same tech stack: Axum + async-graphql + tracing)
- Gateway (Hive Gateway): Federation composition and service registry where tower is registered
- Gateway Tracing & Observability: Backend observability: response caching, APQ, tracing, and Prometheus metrics
- Monitoring & Alerting Infrastructure: Prometheus + Grafana observability stack
- Platform Admin Provisioning: is_privileged access control system governing Tower access
- Admin Dashboard & Audit Tools: User-facing support guide for The Tower
- Platform Health Monitoring: User-facing guide for the health dashboard