Developer Docs

The Voice Messages feature extends the core uChat - E2EE Messenger to support native audio recording and playback. To preserve the zero-knowledge security guarantees of the platform, voice messages are treated as standard file attachments encrypted client-side using the Crypto - Encryption Microservice patterns.

Data Model

In the uchat_messages table, voice messages are stored with a message_type of voice. The frontend playback widget needs the total duration of the track before the audio buffer has been fetched and decrypted, so the duration is stored in a dedicated voice_duration_ms column. The GraphQL Message type exposes these columns as msgType and voiceDurationMs:

type Message {
  id: ID!
  msgType: MessageType! # VOICE
  voiceDurationMs: Int
  fileUrl: String
  fileMime: String # audio/webm
  encryptedBody: String!
}
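
To illustrate how a client might consume this type, the following is a minimal TypeScript sketch of a mirror interface and a type guard. Only the VOICE value of MessageType appears in this document, so the other enum members are placeholders, and the names VoiceMessage and isVoiceMessage are hypothetical rather than part of the uChat SDK.

```typescript
// Hypothetical client-side mirror of the GraphQL Message type.
// Only "VOICE" is named in the docs; "TEXT" and "FILE" are assumed placeholders.
type MessageType = "TEXT" | "FILE" | "VOICE";

interface Message {
  id: string;
  msgType: MessageType;
  voiceDurationMs?: number | null;
  fileUrl?: string | null;
  fileMime?: string | null;
  encryptedBody: string;
}

// A voice message must carry a duration and a file URL to be playable.
interface VoiceMessage extends Message {
  msgType: "VOICE";
  voiceDurationMs: number;
  fileUrl: string;
}

// Narrows a generic Message to a playable VoiceMessage.
function isVoiceMessage(msg: Message): msg is VoiceMessage {
  return (
    msg.msgType === "VOICE" &&
    typeof msg.voiceDurationMs === "number" &&
    msg.voiceDurationMs >= 0 &&
    typeof msg.fileUrl === "string" &&
    msg.fileUrl.length > 0
  );
}
```

A guard like this lets the UI fall back to the generic attachment card when a VOICE message arrives without a duration or URL, instead of mounting a widget that cannot render.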

Audio is recorded with the browser's MediaRecorder API using the audio/webm;codecs=opus MIME type (Opus audio in a WebM container).
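
As a rough sketch of the recording step, the TypeScript below selects a supported MIME type and collects MediaRecorder chunks into a single Blob. The helper names pickMimeType and startVoiceRecording are hypothetical, the audio/webm fallback is an assumption, and the duration is measured with a wall-clock timer, which only approximates the true audio length.

```typescript
// Preferred MIME types in order. The first is the one named in this document;
// the plain "audio/webm" fallback is an assumption.
const PREFERRED_MIME_TYPES = ["audio/webm;codecs=opus", "audio/webm"];

// Returns the first supported type, or undefined to let the browser choose.
function pickMimeType(isSupported: (mime: string) => boolean): string | undefined {
  for (const mime of PREFERRED_MIME_TYPES) {
    if (isSupported(mime)) return mime;
  }
  return undefined;
}

// Browser-only: starts recording from the microphone. Call stop() to finish;
// `finished` resolves with the recorded Blob and an approximate duration.
async function startVoiceRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((m) => MediaRecorder.isTypeSupported(m));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };

  const startedAt = performance.now();
  let stoppedAt = 0;

  const finished = new Promise<{ blob: Blob; durationMs: number }>((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop()); // release the microphone
      resolve({
        blob: new Blob(chunks, { type: recorder.mimeType }),
        durationMs: Math.round((stoppedAt || performance.now()) - startedAt),
      });
    };
  });

  recorder.start();
  return {
    stop: () => {
      stoppedAt = performance.now();
      recorder.stop();
    },
    finished,
  };
}
```

Passing the support check in as a function keeps pickMimeType testable outside a browser, where MediaRecorder is not defined.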

End-to-End Encryption Flow

The recording and encryption flow strictly adheres to the platform's E2EE standards:

  1. Recording: The uChat Client SDK & Interface captures audio chunks via the browser's native MediaRecorder.
  2. Encryption: Upon completion, the raw Blob is converted to an ArrayBuffer and encrypted symmetrically via AES-256-GCM.
  3. Upload: The resulting ciphertext is uploaded directly to the platform's CDN.
  4. Key Wrapping: The symmetric AES key used to encrypt the audio blob is itself encrypted using the recipient's Megolm session keys, producing the encryptedBody.
  5. Mutation: The message is dispatched to the backend with msgType: VOICE and the exact duration in milliseconds.
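
Steps 2 and 4 above can be sketched with the Web Crypto API as follows. This is an illustration under stated assumptions, not the platform's implementation: encryptVoiceBlob and buildKeyEnvelope are hypothetical names, and the envelope layout (a JSON object holding the key and IV, to be handed to the Megolm layer for wrapping) is assumed, since this document does not specify how the IV travels with the key. The CDN upload and the Megolm encryption itself are out of scope here.

```typescript
// Step 2: encrypt the recorded bytes with a fresh AES-256-GCM key.
// `plaintext` would typically come from `await blob.arrayBuffer()`.
async function encryptVoiceBlob(plaintext: ArrayBuffer) {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // extractable, so the raw key can be wrapped for the recipient
    ["encrypt", "decrypt"],
  );
  // 96-bit IV; it must never be reused with the same key.
  const iv = crypto.getRandomValues(new Uint8Array(12));
  // The result is the ciphertext with the 16-byte GCM tag appended.
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  const rawKey = new Uint8Array(await crypto.subtle.exportKey("raw", key));
  return { ciphertext, iv, rawKey };
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Step 4 (input only): serialize the key material that the Megolm layer
// would then encrypt to produce encryptedBody. The field names are assumed.
function buildKeyEnvelope(rawKey: Uint8Array, iv: Uint8Array): string {
  return JSON.stringify({ alg: "A256GCM", key: toBase64(rawKey), iv: toBase64(iv) });
}
```

Generating a new key per recording means IV reuse cannot occur across messages, and only the small envelope, not the audio itself, needs to pass through the Megolm session.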

Playback Widget

When the frontend detects msgType === 'VOICE', it skips the generic file attachment card and mounts a dedicated audio playback widget.

  • The voiceDurationMs field allows the widget to render the timeline scale immediately.
  • Once the user clicks play, the audio blob is fetched from the CDN, decrypted in memory, and passed to a native HTML <audio> element via URL.createObjectURL(decryptedBlob).
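
The two behaviours above might look like the following TypeScript sketch. The function names are hypothetical, and the code assumes the raw AES key and IV have already been recovered from encryptedBody by the Megolm layer, which is not shown.

```typescript
// Formats voiceDurationMs as m:ss for the timeline label, before any audio loads.
function formatDuration(durationMs: number): string {
  const totalSeconds = Math.max(0, Math.round(durationMs / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes + ":" + (seconds < 10 ? "0" : "") + seconds;
}

// Decrypts AES-256-GCM ciphertext in memory and wraps it in a Blob.
// Decryption throws if the ciphertext or its GCM tag has been tampered with.
async function decryptVoiceBlob(
  ciphertext: ArrayBuffer,
  rawKey: BufferSource,
  iv: BufferSource,
  mime: string = "audio/webm",
): Promise<Blob> {
  const key = await crypto.subtle.importKey("raw", rawKey, { name: "AES-GCM" }, false, ["decrypt"]);
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new Blob([plaintext], { type: mime });
}

// Browser-only: fetch, decrypt, and play. Returns a cleanup function that
// should be called when the widget unmounts to release the object URL.
async function playVoiceMessage(
  audio: HTMLAudioElement,
  fileUrl: string,
  rawKey: BufferSource,
  iv: BufferSource,
): Promise<() => void> {
  const response = await fetch(fileUrl);
  if (!response.ok) throw new Error("CDN fetch failed: " + response.status);
  const blob = await decryptVoiceBlob(await response.arrayBuffer(), rawKey, iv);
  const objectUrl = URL.createObjectURL(blob);
  audio.src = objectUrl;
  await audio.play();
  return () => URL.revokeObjectURL(objectUrl);
}
```

For example, formatDuration(65000) yields "1:05". Revoking the object URL on unmount matters because the decrypted audio otherwise stays referenced in memory for the lifetime of the page.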