Tower Subgraph

The Tower subgraph is a Rust/Axum service with two core responsibilities: (1) proxying the self-hosted GlitchTip error tracking REST API into the federated GraphQL schema, and (2) running a dual-layer health monitor that continuously probes all 34 platform services for liveness and schema composition health. It powers The Tower admin dashboard, the platform-wide observability hub restricted to is_privileged users via Platform Admin Provisioning.

Port: 3027 · Federation: async-graphql v7 · Database: None (stateless proxy) · Pattern: Follows the Intelligence service architecture


Architecture

Tower has no database. It acts as a thin proxy layer between the federated gateway and the GlitchTip REST API:

┌───────────────┐     GraphQL      ┌───────────────┐     REST/JSON    ┌───────────────┐
│  Tower Admin  │ ───────────────→ │    Tower      │ ───────────────→ │  GlitchTip    │
│  (Frontend)   │                  │  Subgraph     │                  │  :8000        │
│  :5173        │ ←─────────────── │  :3027        │ ←─────────────── │  /api/0/      │
└───────────────┘                  └───────────────┘                  └───────────────┘

The frontend /tower route gates access behind the is_privileged flag, and the subgraph enforces authentication at the resolver level by inspecting the user header propagated by the Hive Gateway.

Stack

Component   | Technology
Runtime     | Rust + Tokio
HTTP Server | Axum 0.8
GraphQL     | async-graphql 7.0.17 + Federation
HTTP Client | reqwest 0.12
Logging     | tracing + tracing-subscriber

Cross-Service Dependencies

Tower is deliberately isolated. It depends only on:

Dependency      | Direction       | Mechanism                 | Purpose
GlitchTip :8000 | Outbound HTTP   | Bearer token REST         | Error tracking data
Gateway :30000  | Inbound GraphQL | Federation composition    | Schema stitching
All 34 services | Outbound HTTP   | Direct + federated probes | Health monitoring

No TCP events, no database, no inter-subgraph references beyond health probing.


GlitchTip API Mapping

Tower maps GlitchTip's Sentry-compatible REST endpoints to GraphQL queries:

GraphQL Query                           | GlitchTip Endpoint                          | Description
errorProjects                           | GET /api/0/projects/                        | List all projects in the organization
errorProject(slug)                      | GET /api/0/projects/{org}/{slug}/           | Single project by slug
errorIssues(projectSlug, ...)           | GET /api/0/projects/{org}/{project}/issues/ | Paginated issues with status/level filters
errorIssue(issueId)                     | GET /api/0/issues/{id}/                     | Issue detail with metadata
errorEvents(issueId, limit)             | GET /api/0/issues/{id}/events/              | Individual occurrences with stack traces
errorEventLatest(issueId)               | GET /api/0/issues/{id}/events/latest/       | Most recent event for an issue
errorOverview(projectSlug, topN)        | GET /api/0/projects/{org}/{project}/issues/ | Pre-aggregated KPIs + top N issues (server-side)
updateErrorIssueStatus(issueId, status) | PUT /api/0/issues/{id}/                     | Resolve, ignore, or reopen an issue
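The path templates in the mapping above can be sketched as a small Rust helper. This is an illustration only: the `Endpoint` enum, `path`, and `full_url` are hypothetical names, not Tower's actual code, and the sketch covers URL construction only (no HTTP calls, no Bearer token handling).

```rust
/// Hypothetical helper: REST paths from the mapping table, relative to the
/// GlitchTip base URL. Names are illustrative, not taken from Tower's source.
pub enum Endpoint<'a> {
    Projects,
    Project { slug: &'a str },
    ProjectIssues { project: &'a str },
    Issue { id: &'a str },
    IssueEvents { id: &'a str },
    IssueLatestEvent { id: &'a str },
}

impl Endpoint<'_> {
    /// Render the path for the given organization slug.
    pub fn path(&self, org: &str) -> String {
        match self {
            Endpoint::Projects => "/api/0/projects/".to_string(),
            Endpoint::Project { slug } => format!("/api/0/projects/{org}/{slug}/"),
            Endpoint::ProjectIssues { project } => {
                format!("/api/0/projects/{org}/{project}/issues/")
            }
            Endpoint::Issue { id } => format!("/api/0/issues/{id}/"),
            Endpoint::IssueEvents { id } => format!("/api/0/issues/{id}/events/"),
            Endpoint::IssueLatestEvent { id } => format!("/api/0/issues/{id}/events/latest/"),
        }
    }
}

/// Join the base URL and a path without doubling the slash.
pub fn full_url(base: &str, path: &str) -> String {
    format!("{}{}", base.trim_end_matches('/'), path)
}
```

Both errorIssues and errorOverview would use the `ProjectIssues` variant, since the table maps them to the same endpoint.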

Configuration

Variable            | Required | Default               | Description
TOWER_SERVICE_PORT  | No       | 3027                  | HTTP listen port
GLITCHTIP_API_URL   | No       | http://localhost:8000 | GlitchTip base URL
GLITCHTIP_API_TOKEN | Yes      | (none)                | Bearer token for GlitchTip API
GLITCHTIP_ORG_SLUG  | No       | ngwenya               | GlitchTip organization slug
RUST_LOG            | No       | tower_subgraph=debug  | Tracing filter
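As a rough sketch of how these variables and defaults might be loaded, the following uses only the standard library. The `TowerConfig` struct and `load_config` function are hypothetical names chosen for illustration. RUST_LOG is left out because it is normally consumed by the tracing filter rather than by application config.

```rust
use std::env;

/// Hypothetical config struct mirroring the variables in the table above.
#[derive(Debug, Clone, PartialEq)]
pub struct TowerConfig {
    pub port: u16,
    pub glitchtip_api_url: String,
    pub glitchtip_api_token: String,
    pub glitchtip_org_slug: String,
}

/// Build the config from any lookup function, so it can be exercised
/// without touching the real process environment.
pub fn load_config<F>(lookup: F) -> Result<TowerConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    // Optional, defaults to 3027; a present but unparsable value is an error.
    let port = match lookup("TOWER_SERVICE_PORT") {
        Some(raw) => raw
            .parse::<u16>()
            .map_err(|e| format!("invalid TOWER_SERVICE_PORT {raw:?}: {e}"))?,
        None => 3027,
    };
    // The only required variable: fail fast when it is missing or empty.
    let token = lookup("GLITCHTIP_API_TOKEN")
        .filter(|t| !t.is_empty())
        .ok_or_else(|| "GLITCHTIP_API_TOKEN is required".to_string())?;
    Ok(TowerConfig {
        port,
        glitchtip_api_url: lookup("GLITCHTIP_API_URL")
            .unwrap_or_else(|| "http://localhost:8000".to_string()),
        glitchtip_api_token: token,
        glitchtip_org_slug: lookup("GLITCHTIP_ORG_SLUG")
            .unwrap_or_else(|| "ngwenya".to_string()),
    })
}

/// Convenience wrapper over the real environment.
pub fn load_config_from_env() -> Result<TowerConfig, String> {
    load_config(|key| env::var(key).ok())
}
```

Passing the lookup as a closure is a design choice for this sketch: it keeps the default-handling logic testable in isolation.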

Generating a GlitchTip API Token

The token is auto-generated by make glitchtip-setup (called automatically by make glitchtip-up). The setup target:

  1. Creates the ngwenya_glitchtip database if absent
  2. Runs Django migrations
  3. Creates the admin superuser (idempotent)
  4. Generates an API token via Django ORM
  5. Injects the token directly into apps/tower/.env
# Everything is automatic; just run:
make glitchtip-up
# Then restart tower to pick up the token:
make restart-tower

No manual browser steps needed. For full GlitchTip setup details including Docker infrastructure, frontend SDK integration, and Makefile targets, see the Error Tracking (GlitchTip) guide.


GraphQL Schema

Types

type ErrorProject {
  id: String!
  name: String!
  slug: String!
  platform: String
  dateCreated: String
}

type ErrorIssue {
  id: String!
  title: String!
  culprit: String
  shortId: String
  count: Int!
  userCount: Int!
  firstSeen: String
  lastSeen: String
  level: ErrorLevel!
  status: ErrorIssueStatus!
  metadata: ErrorIssueMetadata
  issueType: String
}

type ErrorEvent {
  id: String!
  eventId: String
  title: String
  message: String
  dateCreated: String
  platform: String
  tags: [ErrorTag!]!
  contexts: JSON
  entries: [ErrorEntry!]!
  user: ErrorEventUser
}

enum ErrorLevel { DEBUG, INFO, WARNING, ERROR, FATAL }
enum ErrorIssueStatus { UNRESOLVED, RESOLVED, IGNORED }

type ErrorOverview {
  totalIssues: Int!
  unresolvedIssues: Int!
  totalAffectedUsers: Int!
  approximateCrashFreeRate: Float!
  topIssues: [ErrorIssue!]!
}
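To illustrate what "pre-aggregated server-side" could mean for ErrorOverview, here is a minimal sketch that derives the count fields and the top N list from an issue list. It rests on assumptions not stated in this document: that topIssues is ranked by count, and that totalAffectedUsers is the sum of per-issue userCount (which may count the same user more than once). approximateCrashFreeRate is deliberately omitted because its formula is not given here. The `OverviewCounts` struct and `summarize` function are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorIssueStatus {
    Unresolved,
    Resolved,
    Ignored,
}

/// Trimmed-down issue carrying only the fields the aggregation needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorIssue {
    pub id: String,
    pub count: i32,
    pub user_count: i32,
    pub status: ErrorIssueStatus,
}

/// Hypothetical subset of ErrorOverview (no crash-free rate).
#[derive(Debug, PartialEq)]
pub struct OverviewCounts {
    pub total_issues: i32,
    pub unresolved_issues: i32,
    pub total_affected_users: i32,
    pub top_issues: Vec<ErrorIssue>,
}

pub fn summarize(issues: &[ErrorIssue], top_n: usize) -> OverviewCounts {
    // Assumption: "top" means highest occurrence count. sort_by is stable,
    // so issues with equal counts keep their input order.
    let mut ranked = issues.to_vec();
    ranked.sort_by(|a, b| b.count.cmp(&a.count));
    ranked.truncate(top_n);

    OverviewCounts {
        total_issues: issues.len() as i32,
        unresolved_issues: issues
            .iter()
            .filter(|i| i.status == ErrorIssueStatus::Unresolved)
            .count() as i32,
        // Assumption: a plain sum of per-issue user counts.
        total_affected_users: issues.iter().map(|i| i.user_count).sum(),
        top_issues: ranked,
    }
}
```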

Example Queries

# List all error issues for the frontend project
query ErrorIssues {
  errorIssues(projectSlug: "ngwenya-front", limit: 10) {
    id
    title
    count
    lastSeen
    level
    status
    metadata {
      errorType
      filename
    }
  }
}

# Get the latest event for a specific issue
query LatestEvent {
  errorEventLatest(issueId: "42") {
    eventId
    title
    dateCreated
    tags { key value }
    entries { entryType data }
  }
}

# Resolve an error issue
mutation ResolveIssue {
  updateErrorIssueStatus(issueId: "42", status: RESOLVED) {
    id
    status
  }
}
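On the proxy side, the mutation's status argument has to be turned into a body for PUT /api/0/issues/{id}/. The sketch below assumes the Sentry-style lowercase status strings ("unresolved", "resolved", "ignored") and a JSON body with a single status field. Treat both as assumptions to verify against the GlitchTip instance. The helper names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorIssueStatus {
    Unresolved,
    Resolved,
    Ignored,
}

impl ErrorIssueStatus {
    /// Assumed REST representation (Sentry-style lowercase strings).
    pub fn as_api_str(self) -> &'static str {
        match self {
            ErrorIssueStatus::Unresolved => "unresolved",
            ErrorIssueStatus::Resolved => "resolved",
            ErrorIssueStatus::Ignored => "ignored",
        }
    }

    /// Parse a REST status string; unknown values yield None.
    pub fn from_api_str(raw: &str) -> Option<Self> {
        match raw {
            "unresolved" => Some(ErrorIssueStatus::Unresolved),
            "resolved" => Some(ErrorIssueStatus::Resolved),
            "ignored" => Some(ErrorIssueStatus::Ignored),
            _ => None,
        }
    }
}

/// Build the JSON body for the PUT request by hand. The enum strings contain
/// no characters that need escaping, so plain formatting is safe here.
pub fn status_update_body(status: ErrorIssueStatus) -> String {
    format!(r#"{{"status":"{}"}}"#, status.as_api_str())
}
```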

Development

Build & Run

# From apps/tower/
cargo build          # Compile
cargo run            # Start service on :3027

# Or via Makefile (from project root)
make restart-tower   # Rebuild + restart
make setup-tower     # Create .env from template

Testing

cd apps/tower
cargo test           # Run all 14 unit tests
cargo clippy         # Lint

Dual-Layer Health Monitoring

Tower runs a background tokio::spawn task (start_health_monitor) that probes all 34 platform services every 30 seconds (configurable via HEALTH_CHECK_INTERVAL). Each subgraph is probed twice:

  1. Direct Probe: HTTP POST to the service's own port (e.g., localhost:3001/graphql) to verify the process is alive
  2. Federated Probe: The same query routed through the Gateway at :30000/graphql to verify schema composition

This produces a diagnostic matrix with four states: UP/UP (fully operational), UP/DOWN (federation issue), DOWN/UP (stale cache), DOWN/DOWN (service down).
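The four-state matrix maps naturally onto an exhaustive match. The following is a minimal sketch with hypothetical type names (`Probe`, `Diagnosis`); the labels are the ones given in the paragraph above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Probe {
    Up,
    Down,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Diagnosis {
    FullyOperational,
    FederationIssue,
    StaleCache,
    ServiceDown,
}

/// Combine the direct and federated probe results into one diagnosis.
/// The match is exhaustive, so the compiler flags any missing combination.
pub fn diagnose(direct: Probe, federated: Probe) -> Diagnosis {
    match (direct, federated) {
        (Probe::Up, Probe::Up) => Diagnosis::FullyOperational,
        (Probe::Up, Probe::Down) => Diagnosis::FederationIssue,
        (Probe::Down, Probe::Up) => Diagnosis::StaleCache,
        (Probe::Down, Probe::Down) => Diagnosis::ServiceDown,
    }
}

impl Diagnosis {
    /// Human-readable label matching the wording of the matrix.
    pub fn label(self) -> &'static str {
        match self {
            Diagnosis::FullyOperational => "fully operational",
            Diagnosis::FederationIssue => "federation issue",
            Diagnosis::StaleCache => "stale cache",
            Diagnosis::ServiceDown => "service down",
        }
    }
}
```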

The health state is stored in Arc<RwLock<PlatformHealthOverview>> and exposed via the platformHealth query.
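The shared-state pattern can be sketched with the standard library alone. To stay dependency-free, this example swaps tokio::spawn for a plain std::thread and uses std::sync::RwLock; whether Tower uses the std or tokio lock is not stated here. The fields of `PlatformHealthOverview` and the helper functions are hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shape: service name -> (direct probe up, federated probe up).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PlatformHealthOverview {
    pub services: BTreeMap<String, (bool, bool)>,
}

pub type SharedHealth = Arc<RwLock<PlatformHealthOverview>>;

/// Writer side: update one service's entry under a short-lived write lock.
pub fn record_probe(state: &SharedHealth, service: &str, direct: bool, federated: bool) {
    let mut guard = state.write().expect("health lock poisoned");
    guard.services.insert(service.to_string(), (direct, federated));
}

/// Reader side: clone a snapshot so the read lock is released before the
/// caller (for example a platformHealth resolver) serializes the data.
pub fn snapshot(state: &SharedHealth) -> PlatformHealthOverview {
    state.read().expect("health lock poisoned").clone()
}

/// Stand-in for the background monitor: applies one round of probe results
/// on a separate thread. The real task would loop on an interval instead.
pub fn run_one_round(
    state: SharedHealth,
    results: Vec<(String, bool, bool)>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for (name, direct, federated) in results {
            record_probe(&state, &name, direct, federated);
        }
    })
}
```

Cloning the snapshot trades a small copy for shorter lock hold times, which keeps the writer from stalling behind slow readers.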

For the full architectural deep dive (diagnostic matrix, smoke query mapping, configuration, and frontend dashboard), see Tower Health Observability.


Dashboard Integration

The Tower admin dashboard (/tower) integrates data in three places:

  1. Platform Health Widget: PlatformHealthWidget.svelte shows cluster uptime, grouped service grid with D/F status badges, and service detail modals with diagnostics and Developer Portal documentation links. Uses the platformHealth query.

  2. Error Overview Widget: ErrorHealthWidget shows crash-free rate (color-coded), total/unresolved issues, affected users, and top 5 errors. Uses the errorOverview query.

  3. Analytics → Errors Sub-Tab: Full ErrorTrackingAnalytics component with project selector, paginated issue list, status/level breakdowns, and issue detail drill-down with stack traces.

Gateway Uptime Widget

The Overview tab also includes an UptimeWidget that fetches from the gateway's GET /admin/health endpoint (which now returns startedAt and uptimeFormatted) and GET /admin/uptime/sessions for historical session tracking. Session history is persisted in Redis (gateway:sessions list key, capped at 100 entries).


Security

All GraphQL queries and mutations require the user header propagated by the Hive Gateway. This header contains the authenticated user's session context and is set by the auth service. Tower enforces authentication at the resolver level: requests without a valid user header receive an "Authentication required" error.
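The resolver-level check can be approximated as follows. This sketch is a simplification: it uses a HashMap as a stand-in for the HTTP header map so that it runs without external crates, and it treats any non-empty user header as valid. What Tower's require_auth() actually validates inside the header is not described here.

```rust
use std::collections::HashMap;

/// Stand-in for the HTTP header map, used only to keep the sketch std-only.
pub type Headers = HashMap<String, String>;

/// Return the raw user header value, or the error message that resolvers
/// surface when the header is missing or blank.
pub fn require_auth(headers: &Headers) -> Result<&str, &'static str> {
    headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case("user"))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
        .ok_or("Authentication required")
}
```

A resolver would call this first and return early on Err, before issuing any GlitchTip request.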

Header injection architecture: Tower uses a custom Axum handler (graphql_handler) that explicitly extracts HeaderMap from the HTTP request and injects it into the async-graphql context via request.data(headers). This is required because async_graphql_axum::GraphQL::new() as a service does not automatically inject HTTP headers. The gateway's propagateHeaders configuration sets the user header based on the authenticated session, and the custom handler makes it accessible to resolvers via ctx.data::<HeaderMap>().

The frontend additionally gates the /tower route behind the is_privileged flag, which is resolved via the Platform Admin Provisioning system (dedicated platform_admins Postgres table, separate from org membership).

Known Issues & Resolutions

Auth Propagation Failure (Resolved 2026-05-07)

The Error Tracking widget showed "Authentication required" even for authenticated admin users. The root cause was that main.rs used async_graphql_axum::GraphQL::new(schema) as a service (via .post_service()), which does not inject the HTTP HeaderMap into the async-graphql context. The require_auth() function checks ctx.data::<HeaderMap>() for the user header, but it was always empty.

Fix: Replaced the service-based handler with a custom graphql_handler function that explicitly extracts HeaderMap via Axum's extractor and injects it into the async-graphql request with request.data(headers). This matches the pattern used by the auth subgraph. Additionally fixed GlitchTipProject.id deserialization from i64 to String to match the actual GlitchTip REST API response.

GlitchTip API Token Type Mismatch (Resolved 2026-05-07)

The GlitchTip UI truncates API tokens when displayed. The full 64-character token must be used. make glitchtip-setup now auto-generates and injects the token via Django ORM, eliminating this issue entirely.

Users Tab "Bad Request Exception" (Resolved 2026-05-05)

The Tower Users tab queries adminUsers which is resolved by the nodes subgraph (not the tower Rust subgraph). The nodes app configures ValidationPipe globally with forbidNonWhitelisted: true, which requires all @ArgsType() and @InputType() DTOs to include class-validator decorators (e.g. @IsOptional(), @IsString(), @IsEnum()).

Previously the AdminUserPagingArgs and AdminUserFilter DTOs only had @Field() decorators from @nestjs/graphql, causing the pipe to reject every property as "non-whitelisted". Additionally, filter was a separate @Args('filter') parameter on the resolver. When mixed with @Args() spread, the forbidNonWhitelisted check treats filter as an unexpected property on the ArgsType.

Fix: Added class-validator decorators to both DTOs and merged filter into AdminUserPagingArgs as a @ValidateNested() property. See apps/nodes/src/actors/user/admin-users.types.ts.

Platform-wide implication: Any NestJS subgraph using forbidNonWhitelisted: true will silently reject all properties on DTOs that lack class-validator decorators. Audit all ArgsType/InputType DTOs when adding new queries.

Admin Provisioning

Platform admin access is managed via the platform_admins Postgres table in the auth database. Bootstrap admins are seeded via SQL migration (20260419000000_platform_admins.sql). To add new admins:

# Via Makefile target
make grant-tower-admin EMAIL=user@example.com

# Or via new migration (preferred for persistent access)
# See: apps/auth/migrations/20260505000000_add_info_admin.sql

Important: Never modify already-applied migrations. The sqlx migration runner checksums each file and will reject modifications. Always create a new migration file.

Tower Subgraph Gateway Registration (Resolved 2026-05-07)

The tower Rust subgraph is now automatically started by scripts/dev/start_services.sh with health probes ensuring readiness before gateway initialization. make restart-tower provides a dedicated rebuild + restart target. make services uses the smart restart script to only restart modified services.