AI & Video | Product Launch | June 8, 2026

Agora Launches Real-Time Speech-to-Text Translation with Sub-Second Latency, AI Integration

Agora has launched a real-time speech-to-text translation solution that supports over 30 languages with ultra-low latency and AI integration. This beta service aims to break down language barriers for global communication in virtual events, education, and live shopping within the streaming industry. Features include advanced speech recognition, translation transcripts, LLM integration, and sub-second end-to-start latency.

Key Takeaways

The beta service provides real-time speech-to-text translation across more than 30 languages.
It features ultra-low latency, achieving sub-second end-to-start and under 3 seconds average end-to-end latency.
The solution includes advanced speech recognition, translation transcripts, and LLM integration for enhanced functionality.
Use cases span virtual events, education, live shopping, and telehealth, promoting global communication in streaming.
Developers can manage multilingual interactions by translating up to two source languages into five target languages for audio, with more languages supported for text.
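For developers, the language limits in the last takeaway can be checked before a session starts. The sketch below is illustrative only: `TranslationRequest`, `validate_request`, and the two constants are hypothetical names, not part of Agora's API, and it assumes the limits apply per session (at most two source languages and five audio target languages).

```python
from dataclasses import dataclass

# Limits as reported in the takeaway above; treating them as per-session
# caps is an assumption, not something the announcement spells out.
MAX_SOURCE_LANGUAGES = 2
MAX_AUDIO_TARGET_LANGUAGES = 5


@dataclass(frozen=True)
class TranslationRequest:
    """Hypothetical request shape, for illustration only."""
    source_languages: tuple[str, ...]
    audio_target_languages: tuple[str, ...]


def validate_request(request: TranslationRequest) -> list[str]:
    """Return a list of human-readable problems; empty means the request fits."""
    problems = []
    if not 1 <= len(request.source_languages) <= MAX_SOURCE_LANGUAGES:
        problems.append(
            f"expected 1-{MAX_SOURCE_LANGUAGES} source languages, "
            f"got {len(request.source_languages)}"
        )
    if not 1 <= len(request.audio_target_languages) <= MAX_AUDIO_TARGET_LANGUAGES:
        problems.append(
            f"expected 1-{MAX_AUDIO_TARGET_LANGUAGES} audio target languages, "
            f"got {len(request.audio_target_languages)}"
        )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller show every limit that was exceeded at once.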

Why It Matters

Agora's sub-second latency real-time translation offers a significant advancement for global live streaming and interactive video platforms. This capability directly impacts user engagement and content accessibility by enabling immediate multilingual communication. The integration of LLMs suggests future pathways for AI-driven content analysis and dynamic localization. Moving forward, observe adoption rates in specified verticals and how this technology influences user retention in international markets, particularly for time-sensitive, interactive content.

Additional Context

The real-time translation market is seeing rapid innovation, with several approaches to low-latency, high-accuracy multilingual communication. According to a ForaSoft analysis from May 2026, the market broadly splits into cascaded pipelines (ASR + MT + TTS) and end-to-end speech-to-speech (S2S) models. S2S models such as OpenAI Realtime and Meta SeamlessM4T v2 can achieve lower latency (230-500 ms), while cascaded systems typically offer more control, such as glossary insertion, and a higher degree of auditability.

G2 rankings in November 2025 noted Agora’s position against competitors like IBM watsonx Orchestrate and Microsoft, highlighting the importance of pipeline customization and real-time inference in NLP platforms.

For specific enterprise needs, platforms like DeepL Voice (launched April 2024), KUDO AI Speech Translator, and Interprefy Aivia are prevalent, with varying benchmarks for accuracy, latency, and cost. For example, DeepL Voice is recognized for its strong text-translation quality and often has a lower per-minute cost than event-focused platforms such as KUDO or Interprefy.

Meanwhile, for internal enterprise use, Microsoft Teams, Zoom, and Google Meet have expanded their native real-time captioning capabilities, with some offering limited voice translation. For instance, Zoom supports translated captions in over 40 languages, and Google Meet has incrementally rolled out voice mode, often leveraging backend models like GPT-class systems, as noted by ForaSoft in May 2026.
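To make the cascaded approach described above concrete, here is a minimal sketch of an ASR, MT, TTS chain with a glossary step and a per-stage log. It does not represent Agora's or any vendor's implementation: `CascadedPipeline` and the three `fake_*` stages are hypothetical stand-ins, and the glossary is applied as a simple post-translation string substitution, which is only one of several ways such control could be added.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CascadedPipeline:
    """Hypothetical ASR -> MT -> glossary -> TTS chain, for illustration only."""
    asr: Callable[[bytes], str]   # audio in, source-language text out
    mt: Callable[[str], str]      # source text in, translated text out
    tts: Callable[[str], bytes]   # translated text in, audio out
    glossary: dict[str, str] = field(default_factory=dict)
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def run(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        self.audit_log.append(("asr", transcript))

        translated = self.mt(transcript)
        self.audit_log.append(("mt", translated))

        # Glossary control: force preferred terms after translation.
        for term, preferred in self.glossary.items():
            translated = translated.replace(term, preferred)
        self.audit_log.append(("glossary", translated))

        return self.tts(translated)


# Stand-in stages so the sketch runs without any speech or translation service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    return {"welcome to the stream": "bienvenido al stream"}.get(text, text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadedPipeline(fake_asr, fake_mt, fake_tts, glossary={"stream": "directo"})
output_audio = pipeline.run(b"welcome to the stream")
```

Because every stage hands off plain text, each intermediate result can be logged and inspected, which is the auditability advantage the analysis attributes to cascaded systems. An end-to-end S2S model has no comparable text handoff to intercept.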

Read full article at prod.agora.io

Related Articles

Runway launches Media Router to automate generative video model selection (Content+Technology)

vLLM v0.26.0 introduces tiered KV offloading and multimodal audio-video support (X)

Google morphs Flow Music Spaces into end-to-end AI production studio (WeRSM, We are Social Media)
