Alerts Resilience & Delivery Tracking: Developer Guide

Overview

The Alerts service is the platform's notification engine: it dispatches emails, SMS, and push notifications triggered by events from other subgraphs (Murchases, Services, Auth). The Resilience layer protects against lost notifications during provider outages by introducing a Dead Letter Queue (DLQ) backed by MongoDB and a persistent AlertLog that tracks every delivery attempt.

| Component | Purpose | Backend |
| --- | --- | --- |
| AlertLog Entity | Persistent record of every notification attempt | MongoDB (alert_logs collection) |
| DlqService | Dead Letter Queue with exponential back-off retry | MongoDB + @nestjs/schedule cron |
| AlertDeliveryService | Single orchestration layer for all outbound sends | NestJS injectable |
| AlertLogResolver | Admin-only GraphQL queries for delivery inspection | Yoga Federation subgraph |

Architecture

Every notification in the platform, whether it's a Murchase confirmation email, an SMS to a Malet Owner, or a push notification about a Workroom action, flows through the AlertDeliveryService. This guarantees that every send is logged and retryable.

graph TD
    A["OrderAlertsController"] --> D["AlertDeliveryService"]
    B["GmailAlertsController"] --> D
    C["SmsAlertsController"] --> D
    E["WorkroomAlertsController"] --> D

    D --> F{"Channel Router"}
    F -->|EMAIL| G["GmailAlertsService (Resend)"]
    F -->|SMS| H["SmsAlertsService (Twilio)"]
    F -->|PUSH| I["PushAlertsService"]
    F -->|IN_APP| J["AlertLog (IN_APP)"]

    D --> K["DlqService"]
    K --> L[("MongoDB: alert_logs")]

    M["Cron: every 5 min"] --> D
    D -->|"retry FAILED logs"| K

    style L fill:#3b82f6,color:#fff
    style D fill:#22c55e,color:#fff
    style M fill:#f59e0b,color:#fff

Two-Tier Retry Strategy

The system uses two independent retry layers to handle both transient blips and prolonged outages:

| Layer | Scope | Strategy | Max Attempts | Base Delay |
| --- | --- | --- | --- | --- |
| RetryService (in-request) | Transient network errors | Exponential back-off with jitter | 3 | 1 second |
| DlqService (cross-request) | Full provider outages | Cron-based exponential back-off | 5 | 1 minute |

If the in-request RetryService exhausts its 3 attempts, the AlertDeliveryService catches the failure and marks the AlertLog as FAILED. The DLQ cron then picks it up on the next tick (every 5 minutes) and re-dispatches it.
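
The in-request tier can be sketched roughly as follows. This guide does not show the actual RetryService API, so `retryWithBackoff`, `jitteredDelayMs`, the options shape, and the jitter range (50% to 100% of the exponential delay) are all assumptions made for illustration. Only the 3 attempts and the 1-second base delay come from the table above.

```typescript
// Hypothetical sketch of an in-request retry helper; NOT the platform's RetryService.
interface RetryOptions {
  maxAttempts: number;                    // 3 in the table above
  baseDelayMs: number;                    // 1000 in the table above
  random?: () => number;                  // injectable for tests; defaults to Math.random
  sleep?: (ms: number) => Promise<void>;  // injectable for tests; defaults to setTimeout
}

// Delay before retry number `retry` (1-based): base * 2^(retry - 1),
// scaled by a jitter factor in [0.5, 1) so concurrent callers do not retry in lockstep.
function jitteredDelayMs(
  retry: number,
  baseDelayMs: number,
  random: () => number = Math.random
): number {
  const exponential = baseDelayMs * 2 ** (retry - 1);
  return Math.round(exponential * (0.5 + random() * 0.5));
}

// Calls `fn` up to maxAttempts times, sleeping between attempts.
// Rethrows the last error once attempts are exhausted, so the caller
// (here, the delivery orchestrator) can mark the log as FAILED.
async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const random = opts.random ?? Math.random;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        await sleep(jitteredDelayMs(attempt, opts.baseDelayMs, random));
      }
    }
  }
  throw lastError;
}
```

With these settings a call that keeps failing waits at most about 1 second and then 2 seconds before giving up, which keeps the request path short and leaves longer outages to the DLQ tier.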


AlertLog Entity

Every notification creates a single AlertLog document in the alert_logs collection. Retries update that document (attempts, status, error, nextRetryAt) rather than creating new ones.

Schema

@modelOptions({
	schemaOptions: { timestamps: true, collection: 'alert_logs' }
})
@index({ status: 1, nextRetryAt: 1 }) // DLQ retry lookup
@index({ channel: 1 })
@index({ userId: 1 })
@index({ createdAt: 1 }, { expireAfterSeconds: 90 * 24 * 60 * 60 }) // 90-day TTL
export class AlertLog {
	id: string; // nanoid
	channel: AlertChannel; // EMAIL | SMS | PUSH | IN_APP
	status: AlertStatus; // PENDING | DELIVERED | FAILED | DEAD
	recipient: string; // email address, phone number, or push token
	subject?: string; // email subject or push title
	eventType: string; // e.g. 'order_status_changed', 'notify_email'
	payload?: object; // frozen copy of the original event payload
	providerRef?: string; // Resend message ID, Twilio SID, etc.
	error?: string; // last error message
	attempts: number; // delivery attempt count
	userId?: string; // target user ID
	nextRetryAt?: Date; // when the DLQ should next retry
	createdAt: Date;
	updatedAt: Date;
}

Enums

enum AlertChannel {
	EMAIL = 'EMAIL',
	SMS = 'SMS',
	PUSH = 'PUSH',
	IN_APP = 'IN_APP' // Live: powers NotificationCenter.svelte via polling
}

enum AlertStatus {
	PENDING = 'PENDING', // Log created, send in progress
	DELIVERED = 'DELIVERED', // Provider confirmed delivery
	FAILED = 'FAILED', // Send failed, queued for retry
	DEAD = 'DEAD' // Max retries exhausted
}

Data Retention

The alert_logs collection uses a MongoDB TTL index set to 90 days. Documents are automatically pruned by MongoDB's background thread, so no application-level cleanup is needed.

Why 90 days? It is a common retention window for operational notification logs: long enough for debugging delivery issues and compliance audits, short enough to respect GDPR storage limitation principles.


Delivery Flow

Happy Path (Email Example)

// 1. Controller receives event
@EventPattern('notify_email')
async notifyEmail(data: NotifyEmailDto) {
  // 2. Check user preferences
  const shouldSend = await this.nodesClient.shouldSendEmail(data.userId);
  if (!shouldSend) return;

  // 3. Route through delivery orchestrator
  await this.deliveryService.sendEmail({
    email: data.email,
    subject: 'Your Murchase Confirmation',
    text: data.text,
    eventType: 'notify_email',
    userId: data.userId,
    payload: { ...data },
  });
}

Inside AlertDeliveryService

sendEmail(opts)
  ├── 1. dlqService.enqueue(PENDING)     → Creates AlertLog
  ├── 2. gmailService.notifyEmail()      → Calls Resend API (with RetryService)
  ├── 3a. Success → dlqService.markDelivered(logId, providerRef)
  └── 3b. Failure → dlqService.markFailed(logId, error)
                     ├── Sets nextRetryAt with exponential back-off
                     └── If attempts >= 5 → status = DEAD
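
The flow above can be sketched in TypeScript with an in-memory stand-in for the MongoDB-backed DlqService. `InMemoryDlq`, `EmailSender`, and `SendEmailOptions` are hypothetical names. The sketch also assumes that one pass through the in-request RetryService counts as a single DLQ attempt, and that `attempts` is incremented on success as well as on failure. The guide does not confirm either detail.

```typescript
type AlertStatus = 'PENDING' | 'DELIVERED' | 'FAILED' | 'DEAD';

interface LogEntry {
  id: string;
  status: AlertStatus;
  attempts: number;
  providerRef?: string;
  error?: string;
  nextRetryAt?: Date;
}

const MAX_ATTEMPTS = 5;            // DLQ_MAX_ATTEMPTS default
const BASE_DELAY_MS = 60 * 1000;   // 1 minute base delay

// Hypothetical in-memory stand-in for DlqService (the real one persists to alert_logs).
class InMemoryDlq {
  readonly logs = new Map<string, LogEntry>();
  private seq = 0;

  enqueue(): LogEntry {
    const log: LogEntry = { id: `log-${++this.seq}`, status: 'PENDING', attempts: 0 };
    this.logs.set(log.id, log);
    return log;
  }

  markDelivered(id: string, providerRef: string): void {
    const log = this.mustGet(id);
    log.attempts += 1;
    log.status = 'DELIVERED';
    log.providerRef = providerRef;
    log.nextRetryAt = undefined;
  }

  markFailed(id: string, error: string, now: Date = new Date()): void {
    const log = this.mustGet(id);
    log.attempts += 1;
    log.error = error;
    if (log.attempts >= MAX_ATTEMPTS) {
      log.status = 'DEAD';           // max retries exhausted
      log.nextRetryAt = undefined;
    } else {
      log.status = 'FAILED';         // queued for the next cron pass
      log.nextRetryAt = new Date(now.getTime() + BASE_DELAY_MS * 2 ** (log.attempts - 1));
    }
  }

  private mustGet(id: string): LogEntry {
    const log = this.logs.get(id);
    if (!log) throw new Error(`Unknown log ${id}`);
    return log;
  }
}

// Resolves to the provider's message ID, or rejects once in-request retries are exhausted.
type EmailSender = (to: string, subject: string, text: string) => Promise<string>;

interface SendEmailOptions {
  email: string;
  subject: string;
  text: string;
}

// Log first, then send, then record the outcome. The send error is swallowed
// here on purpose: the DLQ owns the retry from this point on.
async function sendEmail(dlq: InMemoryDlq, send: EmailSender, opts: SendEmailOptions): Promise<LogEntry> {
  const log = dlq.enqueue();
  try {
    const providerRef = await send(opts.email, opts.subject, opts.text);
    dlq.markDelivered(log.id, providerRef);
  } catch (err) {
    dlq.markFailed(log.id, err instanceof Error ? err.message : String(err));
  }
  return log;
}
```

Creating the log before calling the provider means a crash mid-send still leaves a PENDING record behind for inspection.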

DLQ Retry Processor

Every 5 minutes, the cron job picks up FAILED logs whose nextRetryAt has passed:

// Six-field cron (sec min hour day month weekday): fires at second 0 of every 5th minute.
@Cron('0 */5 * * * *')
async processRetryBatch(): Promise<void> {
  // Fetch up to 25 FAILED logs whose nextRetryAt has passed.
  const retryable = await this.dlqService.getRetryable(25);

  for (const log of retryable) {
    switch (log.channel) {
      case AlertChannel.EMAIL:
        await this.attemptEmail(log, { /* reconstructed from payload */ });
        break;
      case AlertChannel.SMS:
        await this.attemptSms(log, { /* reconstructed from payload */ });
        break;
      case AlertChannel.PUSH:
        await this.attemptPush(log, { /* reconstructed from payload */ });
        break;
    }
  }
}
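
The implementation of `getRetryable` is not shown in this guide. A minimal sketch of the selection it presumably performs, assuming it filters on status and nextRetryAt (the two fields covered by the compound index) and returns the oldest-due logs first, could look like this. The function names and the sort order are assumptions.

```typescript
type AlertStatus = 'PENDING' | 'DELIVERED' | 'FAILED' | 'DEAD';

interface RetryCandidate {
  id: string;
  status: AlertStatus;
  nextRetryAt?: Date;
}

// The same condition expressed as a MongoDB filter document ($lte is a standard
// query operator). It lines up with the { status: 1, nextRetryAt: 1 } index.
function retryableFilter(now: Date) {
  return { status: 'FAILED' as const, nextRetryAt: { $lte: now } };
}

// In-memory equivalent: FAILED logs that are due, oldest-due first, capped at `limit`.
// filter() returns a new array, so the caller's array is not reordered by sort().
function selectRetryable<T extends RetryCandidate>(logs: T[], now: Date, limit: number): T[] {
  return logs
    .filter(
      (log) =>
        log.status === 'FAILED' &&
        log.nextRetryAt !== undefined &&
        log.nextRetryAt.getTime() <= now.getTime()
    )
    .sort((a, b) => a.nextRetryAt!.getTime() - b.nextRetryAt!.getTime())
    .slice(0, limit);
}
```

DEAD and DELIVERED logs never match, which is what stops the cron from retrying a notification forever.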

Back-Off Schedule

| Attempt | Delay before next retry | Status if it fails |
| --- | --- | --- |
| 1 | 1 minute | FAILED |
| 2 | 2 minutes | FAILED |
| 3 | 4 minutes | FAILED |
| 4 | 8 minutes | FAILED |
| 5 | none | DEAD (no more retries) |
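
The schedule above follows a doubling rule: the delay after failed attempt n is the 1-minute base times 2^(n-1). The sketch below reproduces it and also shows when a due log would actually be picked up, given that the cron only runs every 5 minutes. `dlqDelayMs` and `nextCronTick` are hypothetical helper names, and the tick calculation assumes cron ticks fall on 5-minute boundaries of the clock.

```typescript
const BASE_DELAY_MS = 60 * 1000;         // 1 minute
const MAX_ATTEMPTS = 5;                  // DLQ_MAX_ATTEMPTS default
const CRON_INTERVAL_MS = 5 * 60 * 1000;  // cron runs every 5 minutes

// Delay to schedule after the given failed attempt (1-based),
// or null once the log has reached DEAD and will not be retried.
function dlqDelayMs(failedAttempt: number, maxAttempts: number = MAX_ATTEMPTS): number | null {
  if (failedAttempt >= maxAttempts) return null;
  return BASE_DELAY_MS * 2 ** (failedAttempt - 1);
}

// First cron tick at or after `dueAt`, assuming ticks land on multiples
// of the interval (00:00, 00:05, 00:10, ...).
function nextCronTick(dueAt: Date, intervalMs: number = CRON_INTERVAL_MS): Date {
  return new Date(Math.ceil(dueAt.getTime() / intervalMs) * intervalMs);
}
```

Because retries only happen on cron ticks, the delays in the table are lower bounds. Under these assumptions, a log that fails at 00:00 with a 1-minute delay becomes due at 00:01 but is re-dispatched at the 00:05 tick.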

GraphQL Queries (Admin)

The AlertLogResolver exposes read-only queries for admin/debugging. All queries are protected by GqlAuthGuard.

alertLogs

Paginated list with optional channel and status filters:

query {
	alertLogs(filter: { channel: EMAIL, status: FAILED, first: 20, after: "cursor-id" }) {
		id
		channel
		status
		recipient
		subject
		eventType
		attempts
		error
		providerRef
		createdAt
		nextRetryAt
	}
}
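
The `first` and `after` arguments suggest ID-based cursor pagination. The resolver's implementation is not shown here, so the following is only a sketch of the general technique over an in-memory list. `paginateById`, the `Page` shape, and the cap of 100 are assumptions. The test notes mention a "filter cap" without stating its value.

```typescript
interface Page<T> {
  items: T[];
  endCursor: string | null;  // pass as `after` to fetch the next page
  hasNextPage: boolean;
}

const MAX_PAGE_SIZE = 100;   // assumed cap; the real limit is not stated in this guide

// Returns up to `first` rows that follow the row whose id equals `after`.
// `rows` must already be in a stable order (for example, newest first).
function paginateById<T extends { id: string }>(rows: T[], first: number, after?: string): Page<T> {
  const size = Math.min(Math.max(first, 1), MAX_PAGE_SIZE);
  // An unknown cursor makes findIndex return -1, so paging restarts from the top.
  const start = after === undefined ? 0 : rows.findIndex((row) => row.id === after) + 1;
  const items = rows.slice(start, start + size);
  return {
    items,
    endCursor: items.length > 0 ? items[items.length - 1].id : null,
    hasNextPage: start + size < rows.length,
  };
}
```

Against MongoDB the same idea is usually expressed as a range condition on the sort key plus a limit, rather than as an array slice.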

alertLog

Single log lookup by ID:

query {
	alertLog(id: "abc123") {
		id
		channel
		status
		recipient
		payload
		attempts
		error
	}
}

dlqSummary

Aggregate counts by status, useful for dashboards:

query {
	dlqSummary {
		total
		statusCounts {
			status
			count
		}
	}
}

Example response:

{
	"data": {
		"dlqSummary": {
			"total": 1247,
			"statusCounts": [
				{ "status": "DELIVERED", "count": 1200 },
				{ "status": "FAILED", "count": 40 },
				{ "status": "DEAD", "count": 5 },
				{ "status": "PENDING", "count": 2 }
			]
		}
	}
}
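
The shape of that response can be reproduced with a simple count-by-status reduction. This is an in-memory sketch only: the real getSummary presumably aggregates in MongoDB, and the `summarize` name and the count-descending ordering are assumptions.

```typescript
type AlertStatus = 'PENDING' | 'DELIVERED' | 'FAILED' | 'DEAD';

interface DlqSummary {
  total: number;
  statusCounts: { status: AlertStatus; count: number }[];
}

// Counts logs per status and returns the buckets largest-first.
// Statuses with no logs are simply absent from statusCounts.
function summarize(statuses: AlertStatus[]): DlqSummary {
  const counts = new Map<AlertStatus, number>();
  for (const status of statuses) {
    counts.set(status, (counts.get(status) ?? 0) + 1);
  }
  const statusCounts = Array.from(counts, ([status, count]) => ({ status, count })).sort(
    (a, b) => b.count - a.count
  );
  return { total: statuses.length, statusCounts };
}
```

In the example response, the four buckets add up to the reported total (1200 + 40 + 5 + 2 = 1247), which is a quick consistency check for a dashboard.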

Channel Integration

Email (Resend)

The GmailAlertsService sends transactional emails via the Resend API. It supports vertical-specific templates (Restaurant, Tour, Photography) that adapt the layout and content to the Malet's business type.

SMS (Twilio)

The SmsAlertsService sends SMS via Twilio. Used for time-sensitive notifications like Murchase confirmations and Workroom action reminders.

Push (FCM)

The PushAlertsService sends push notifications to mobile/web clients. Push tokens are fetched from the nodes subgraph via the NodesClientService.

In-App (Live)

The IN_APP channel creates AlertLog entries with channel: IN_APP and publishes them through the in-process PubSub for instant WebSocket delivery via the federated gateway (port 30000). The frontend notificationStore.svelte.ts receives these in real time via a graphql-ws subscription, with automatic fallback to 30-second polling if the WS connection is unavailable. See Notification Connection Modes for the full transport architecture.

Used by Organization invitations (org_invite_created), Murchase confirmations (order_created), Community assignment events, and uChat messages (new_message). See Invite & Notification Pipeline for the full cross-subgraph flow.


Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| DLQ_MAX_ATTEMPTS | 5 | Max delivery attempts before moving to DEAD |
| ALERT_LOG_TTL_DAYS | 90 | Days before MongoDB auto-prunes alert logs |
| ALERTS_SERVICE_PORT_TCP | (none) | Required. TCP port for microservice communication |
| NODES_SERVICE_HOST | localhost | Nodes service host for preference lookups |
| NODES_SERVICE_PORT_TCP | 3011 | Nodes service TCP port |
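
One way to read these variables with their defaults is sketched below. `loadAlertsConfig`, `intFromEnv`, and the `AlertsConfig` shape are hypothetical and are not taken from the codebase. The variable names and defaults come from the table above.

```typescript
interface AlertsConfig {
  dlqMaxAttempts: number;
  alertLogTtlSeconds: number;  // ALERT_LOG_TTL_DAYS converted for a TTL index
  servicePortTcp: number;
  nodesHost: string;
  nodesPortTcp: number;
}

type Env = Record<string, string | undefined>;

// Reads a positive integer, falling back to `fallback` when the variable is unset.
// Throws when the variable is unset with no fallback, or is not a positive integer.
function intFromEnv(env: Env, key: string, fallback?: number): number {
  const raw = env[key];
  if (raw === undefined || raw === '') {
    if (fallback === undefined) throw new Error(`${key} is required`);
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadAlertsConfig(env: Env): AlertsConfig {
  return {
    dlqMaxAttempts: intFromEnv(env, 'DLQ_MAX_ATTEMPTS', 5),
    alertLogTtlSeconds: intFromEnv(env, 'ALERT_LOG_TTL_DAYS', 90) * 24 * 60 * 60,
    servicePortTcp: intFromEnv(env, 'ALERTS_SERVICE_PORT_TCP'), // required, no default
    nodesHost: env['NODES_SERVICE_HOST'] || 'localhost',
    nodesPortTcp: intFromEnv(env, 'NODES_SERVICE_PORT_TCP', 3011),
  };
}
```

In a Node process this would be called as `loadAlertsConfig(process.env)` at startup, so a missing ALERTS_SERVICE_PORT_TCP fails fast instead of surfacing later as a connection error.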

Module Structure

apps/alerts/src/alert-log/
├── alert-log.entity.ts                   # AlertLog + enums
├── alert-log.module.ts                   # NestJS module wiring
├── alert-log.resolver.ts                 # GraphQL admin queries (federation)
├── alert-log.resolver.spec.ts            # Resolver unit tests
├── alert-delivery.service.ts             # Orchestrator + PubSub publish
├── alert-delivery.service.spec.ts
├── dlq.service.ts                        # Dead Letter Queue
├── dlq.service.spec.ts
├── notification.pubsub.ts                # Shared PubSub instance
├── notification.pubsub.spec.ts           # PubSub + filter unit tests
├── notification-subscription.resolver.ts # WS subscription (Yoga federated)
├── subscription.module.ts                # Subscription module
├── dto/
│   ├── alert-log-filter.input.ts         # Query filter input
│   └── dlq-summary.type.ts               # Summary response type
└── index.ts                              # Barrel exports

Testing

Unit Tests

# Run all alerts unit tests (125 tests across 18 suites)
npm run test -- apps/alerts --no-coverage

Key test coverage:

  • DlqService: enqueue, markDelivered, markFailed (including DEAD transition), getRetryable, getSummary, cursor pagination
  • AlertDeliveryService: sendEmail/sendSms/sendPush success + failure paths, DLQ retry batch for all channels
  • AlertLogResolver: alertLogs pagination + filter cap, alertLog by ID, dlqSummary aggregation

E2E Tests

# Run alerts E2E tests (17 tests across 5 suites)
npx jest --config apps/alerts/test/jest-e2e.json --detectOpenHandles

Covers: email/SMS/push delivery tracking, GraphQL query responses, and auth guard enforcement.


Cross-Service Integration

How Other Subgraphs Trigger Notifications

Subgraphs emit events via TCP microservice transport. The alerts service listens with @EventPattern:

// From the murchases service: order status change
this.alertsClient.emit('order_status_changed', {
	orderId: order.id,
	buyerId: order.buyerId,
	newStatus: OrderStatus.SHIPPED,
	previousStatus: OrderStatus.PROCESSING,
	verticalSlug: order.verticalSlug
});

The alerts controller receives the event, checks user preferences via NodesClientService, and routes through AlertDeliveryService, creating an AlertLog for every notification attempt.

Dependency Map

| Alerts depends on | For |
| --- | --- |
| nodes (TCP) | User preferences, email, push tokens, quiet hours |
| Resend API | Email delivery |
| Twilio API | SMS delivery |
| MongoDB | AlertLog persistence |

| Other services depend on Alerts | Via |
| --- | --- |
| murchases | order_status_changed, order_created events |
| services | booking_confirmed events |
| malets | notify_email (contact form) |
| organizations | org_invite_created, org_invite_sms events |