Agora launches real-time speech-to-text with three-speaker diarization and LLM integration
Agora has unveiled a cloud-based, real-time speech-to-text and captioning service that supports multiple languages and up to three simultaneous speakers. The service integrates with large language models (LLMs), complies with enterprise-grade security standards such as ISO and SOC 2, and is available to developers as part of Agora's broader real-time engagement platform.
Key Takeaways
- Diarizes up to three simultaneous speakers, giving each speaker an individual transcription track.
- Built-in LLM integration lets developers export transcripts as .vtt files to GPT and other models to generate meeting notes automatically, without affecting real-time performance.
- Dual-language transcription lets a single channel process two languages at once.
- The infrastructure complies with ISO and SOC 1/SOC 2 standards, plus region-specific requirements such as HIPAA, GDPR, and CCPA.
- Integrates with Agora's SD-RTN™ global network to maintain transcription accuracy under poor network conditions.
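The .vtt-to-LLM hand-off mentioned above can be pictured as a small post-processing pipeline: parse the diarized transcript, merge consecutive cues from the same speaker into turns, and wrap the result in a summarization prompt. The sketch below is a minimal Python illustration under stated assumptions. It assumes the export labels speakers with standard WebVTT voice spans (for example `<v Speaker 1>`); Agora's actual export format and speaker labels may differ. The helper names `parse_vtt`, `to_turns`, and `build_notes_prompt` and the prompt wording are hypothetical, and no Agora SDK or LLM API call is shown.

```python
import re
from dataclasses import dataclass

# Matches a WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.500".
# The hours field is optional in WebVTT timestamps.
TIMING_RE = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
# Assumption: speakers are labeled with a leading WebVTT voice span, e.g. "<v Speaker 1>".
VOICE_RE = re.compile(r"^<v(?:\.[^\s>]+)*\s+([^>]+)>")
# Strips any remaining cue-text tags such as "</v>" or "<i>".
TAG_RE = re.compile(r"<[^>]+>")


@dataclass
class Cue:
    start: str
    end: str
    speaker: str
    text: str


def parse_vtt(vtt: str) -> list[Cue]:
    """Parse WebVTT cues; blocks without a timing line (header, NOTE) are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIMING_RE.match(line.strip())
            if m:
                # Lines before the timing line are an optional cue identifier.
                payload = " ".join(part.strip() for part in lines[i + 1:])
                voice = VOICE_RE.match(payload)
                speaker = voice.group(1).strip() if voice else "Unknown"
                text = TAG_RE.sub("", payload).strip()
                cues.append(Cue(m.group(1), m.group(2), speaker, text))
                break
    return cues


def to_turns(cues: list[Cue]) -> list[tuple[str, str]]:
    """Merge consecutive cues from the same speaker into a single turn."""
    turns: list[tuple[str, str]] = []
    for cue in cues:
        if turns and turns[-1][0] == cue.speaker:
            turns[-1] = (cue.speaker, turns[-1][1] + " " + cue.text)
        else:
            turns.append((cue.speaker, cue.text))
    return turns


def build_notes_prompt(vtt: str) -> str:
    """Build a plain-text meeting-notes prompt (hypothetical wording) for any LLM."""
    transcript = "\n".join(f"{s}: {t}" for s, t in to_turns(parse_vtt(vtt)))
    return (
        "Summarize the following meeting transcript into concise notes "
        "with decisions and action items.\n\n" + transcript
    )


SAMPLE = """WEBVTT

1
00:00:00.000 --> 00:00:03.200
<v Speaker 1>Let's review the launch plan.

2
00:00:03.200 --> 00:00:06.000
<v Speaker 1>Captions go live Friday.

3
00:00:06.000 --> 00:00:09.500
<v Speaker 2>I'll update the docs.
"""

if __name__ == "__main__":
    print(build_notes_prompt(SAMPLE))
```

Because this step operates on an exported file rather than the live audio stream, it sits outside the real-time path, which is one plausible way an LLM summary can avoid affecting live transcription latency.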
Why It Matters
This move marks Agora's shift from a pure connectivity provider to an intelligent orchestrator of live data. By enabling low-latency transcription that feeds directly into LLMs, Agora lowers the barrier for developers building agent-driven video applications in telehealth and education. Competitively, the launch positions Agora to win teams migrating off legacy stacks such as Twilio Video by offering a more mature AI extension ecosystem. One thing to watch is whether the high infrastructure costs of these AI features keep compressing Agora's gross margin, which recently dipped to 63.4%.
Additional Context
The launch coincides with Agora's sixth consecutive quarter of GAAP profitability: the company reported Q1 2026 revenue of $37.7 million, according to its financial filings from May 2026. That stability is a key differentiator in the real-time communication (RTC) space, especially as competitors undergo structural shifts. Twilio Programmable Video, for instance, is set to reach end-of-life on December 5, 2026, opening a global migration window in which performance and AI integration have become the primary criteria for vendor selection. Analysts at iotum and Fora Soft noted in late 2025 and early 2026 that Agora and LiveKit have emerged as top alternatives for teams seeking more than basic video transport.
Demand for embedded AI features is accelerating industry-wide. Market reports from early 2026 project that the real-time speech-to-text market will grow at a CAGR of 6.7% through 2034, driven primarily by contact centers and virtual meetings. In May 2026, Agora also launched 'Agent Studio,' a no-code platform for deploying AI voice agents, further signaling its intent to dominate the 'Conversational AI' category.
In quarterly earnings commentary, CEO Tony Zhao confirmed that although these newer AI products currently contribute sub-scale revenue, they are the cornerstone of the company's long-term growth strategy to provide 'reliable, high-performance solutions' as the market shifts from pilots to full-scale production.
Read full article at prod.agora.io
