VoiceBlender adds SIP, WebRTC, and WhatsApp call control in Go
VoiceBlender is an open-source Go service that bridges SIP and WebRTC voice calls and offers multi-party mixing, recording, TTS/STT, and integration with AI agents. It includes experimental support for Media-over-QUIC (MoQ) legs via WebTransport/HTTP/3 and exposes call control through a REST API. It also supports WhatsApp Business Calling, handling inbound and outbound calls over SIP-TLS with ICE/DTLS-SRTP media.
Key Takeaways
- The core service is written in Go and exposes call control through `/v1/legs`, `/v1/rooms`, and `/v1/vsi` endpoints.
- WhatsApp Business Calling is built in, using SIP-TLS on port 5061 with ICE/DTLS-SRTP and a CA-signed certificate.
- The platform supports multi-party rooms with mixed-minus-self audio, room bridging, and role-based audio routing for barge-in, whisper, and supervisor monitor use cases.
- TTS and STT integrations include ElevenLabs, Google Cloud TTS, AWS Polly, and Deepgram, while AI agents can be attached through ElevenLabs, VAPI, Pipecat, or Deepgram.
- MoQ support is experimental and disabled by default. It runs over WebTransport/HTTP/3 with Opus audio and is turned on with `MOQ_ENABLED=true` plus TLS certificate files.
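The "mixed-minus-self" room audio above is the classic mix-minus technique: each participant hears the sum of everyone else, never their own voice. The Go sketch below shows that general technique, not VoiceBlender's actual mixer. It assumes equal-length 16-bit PCM frames per participant, and the names `mixMinusSelf` and `clampInt16` are hypothetical.

```go
package main

import "fmt"

// clampInt16 saturates a 32-bit sample to the signed 16-bit PCM range.
func clampInt16(v int32) int16 {
	if v > 32767 {
		return 32767
	}
	if v < -32768 {
		return -32768
	}
	return int16(v)
}

// mixMinusSelf (hypothetical name) returns one output frame per participant:
// the sum of every OTHER participant's samples, excluding their own audio.
// All input frames are assumed to hold the same number of samples.
func mixMinusSelf(frames [][]int16) [][]int16 {
	if len(frames) == 0 {
		return nil
	}
	n := len(frames[0])
	// Accumulate the full mix once in 32-bit to avoid intermediate overflow.
	total := make([]int32, n)
	for _, f := range frames {
		for i := 0; i < n; i++ {
			total[i] += int32(f[i])
		}
	}
	out := make([][]int16, len(frames))
	for p, f := range frames {
		o := make([]int16, n)
		for i := 0; i < n; i++ {
			// Subtract this participant's own contribution, then saturate.
			o[i] = clampInt16(total[i] - int32(f[i]))
		}
		out[p] = o
	}
	return out
}

func main() {
	frames := [][]int16{{100, 200}, {10, 20}, {1, 2}}
	for p, o := range mixMinusSelf(frames) {
		fmt.Println(p, o)
	}
}
```

Computing the full mix once and subtracting each participant's own frame keeps the work linear in the number of participants, instead of re-summing N-1 streams for every listener.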
Why It Matters
VoiceBlender packages SIP, WebRTC, recording, transcription, and agent control behind a single API surface, so voice workflows can be built without stitching together separate call-handling services. The repo extends the same control plane to WhatsApp Business Calling and experimental MoQ legs, putting traditional telephony, browser voice, and QUIC-based media into one stack. The next concrete signal to watch is whether the MoQ path matures beyond its proof-of-concept state (an opt-in `MOQ_ENABLED=true` flag and draft-11 transport support) or whether usage stays centered on SIP-TLS, WebRTC, and room-based mixing.