OpenAI details Realtime voice, translation, and transcription sessions

OpenAI has published API documentation for its "Realtime and audio" capabilities, covering low-latency voice agents, live translation, and transcription. The documentation describes three session types (voice agent, translation, and transcription), connection methods such as WebRTC, WebSocket, and SIP, and the models behind them, including gpt-realtime-2, gpt-realtime-translate, and gpt-realtime-whisper. It also gives guidance on safety identifiers and on migrating from the beta interfaces to GA.
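As a rough illustration of how session types, models, and transports fit together, here is a minimal Python sketch. The helper names (`SessionPlan`, `plan_session`) and the client-kind labels are hypothetical. The model names and endpoint paths are the ones OpenAI lists for each session type. No transcription path is filled in because the article does not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionPlan:
    model: str
    endpoint: Optional[str]  # None where the article does not name a path


# Hypothetical routing table built only from the models and paths named in the article.
SESSION_PLANS = {
    "voice_agent": SessionPlan("gpt-realtime-2", "/v1/realtime"),
    "translation": SessionPlan("gpt-realtime-translate", "/v1/realtime/translations"),
    # The article names the transcription model but not its endpoint path.
    "transcription": SessionPlan("gpt-realtime-whisper", None),
}

# Transport guidance as summarized: WebRTC for browser and mobile clients,
# WebSocket for server media pipelines, SIP for telephony.
TRANSPORTS = {
    "browser": "WebRTC",
    "mobile": "WebRTC",
    "server": "WebSocket",
    "telephony": "SIP",
}


def plan_session(session_type: str, client_kind: str) -> tuple[SessionPlan, str]:
    """Return the model/endpoint plan and the transport for a session (hypothetical helper)."""
    try:
        plan = SESSION_PLANS[session_type]
    except KeyError:
        raise ValueError(f"unknown session type: {session_type!r}") from None
    try:
        transport = TRANSPORTS[client_kind]
    except KeyError:
        raise ValueError(f"unknown client kind: {client_kind!r}") from None
    return plan, transport


if __name__ == "__main__":
    plan, transport = plan_session("translation", "browser")
    print(plan.model, plan.endpoint, transport)
    # gpt-realtime-translate /v1/realtime/translations WebRTC
```

Keeping the table as plain data makes it easy to update if OpenAI changes a model name or path. The lookup logic itself does not need to change.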

Key Takeaways

gpt-realtime-2 is the model OpenAI lists for low-latency voice-agent sessions on `/v1/realtime`.
gpt-realtime-translate is tied to continuous translation sessions on `/v1/realtime/translations`.
gpt-realtime-whisper powers realtime transcription with controllable latency and transcript deltas.
OpenAI says browser and mobile clients should use WebRTC, while server media pipelines can use WebSocket.
GA migration includes removing `OpenAI-Beta: realtime=v1` and using `POST /v1/realtime/client_secrets` for ephemeral credentials.
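The GA migration points above can be sketched with Python's standard library. This is a minimal illustration, not OpenAI's SDK. `drop_beta_header` and `client_secret_request` are hypothetical helper names, and the request is built but never sent. The `session_config` payload comes from the caller because its exact schema is an assumption to check against OpenAI's API reference. Ephemeral credentials like these are typically minted on a server and then handed to a browser or mobile client, so the standard API key never reaches the device.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.openai.com"


def drop_beta_header(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers without the beta-era `OpenAI-Beta: realtime=v1` opt-in."""
    return {
        name: value
        for name, value in headers.items()
        if not (name.lower() == "openai-beta" and value.strip() == "realtime=v1")
    }


def client_secret_request(
    api_key: str,
    session_config: dict,
    extra_headers: Optional[dict[str, str]] = None,
) -> urllib.request.Request:
    """Build, but do not send, a POST to /v1/realtime/client_secrets (hypothetical helper)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Headers carried over from an older client config may still hold the beta opt-in.
    headers.update(extra_headers or {})
    headers = drop_beta_header(headers)
    return urllib.request.Request(
        API_BASE + "/v1/realtime/client_secrets",
        data=json.dumps(session_config).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Assumed payload shape for illustration only; check the API reference.
    req = client_secret_request(
        "sk-placeholder",
        {"session": {"type": "realtime", "model": "gpt-realtime-2"}},
        extra_headers={"OpenAI-Beta": "realtime=v1"},
    )
    print(req.get_method(), req.full_url)
    print(req.has_header("Openai-beta"))  # False: the beta header was dropped
```

Stripping the header at the point where requests are assembled catches beta-era opt-ins that may still be buried in shared client configuration.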

Why It Matters

OpenAI has turned realtime audio into a documented production path rather than a loose beta pattern, with separate flows for voice agents, translation, and transcription. That matters for streaming and media teams because the API now spells out which model, endpoint, and transport fit each audio job, including WebRTC for browsers, WebSocket for server pipelines, and SIP for telephony. The clearest signal to watch is whether teams adopt the GA interface changes, especially `/v1/realtime/client_secrets`, `/v1/realtime/calls`, and the newer event names such as `response.output_text.delta` and `response.output_audio.delta`.
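A client that consumes the GA event stream has to switch on the new event names. The Python sketch below accumulates `response.output_text.delta` text and `response.output_audio.delta` audio from JSON server events. `ResponseAccumulator` is a hypothetical name. Several details are assumptions to verify against OpenAI's event reference: the `delta` field name, the base64 encoding of audio deltas, and the one-to-one mapping from the beta names `response.text.delta` and `response.audio.delta`.

```python
import base64
import json

# Assumed one-to-one renames from the beta event names; verify against
# OpenAI's GA migration guide before relying on this table.
LEGACY_EVENT_NAMES = {
    "response.text.delta": "response.output_text.delta",
    "response.audio.delta": "response.output_audio.delta",
}


class ResponseAccumulator:
    """Collects text and audio deltas from Realtime server events (hypothetical helper)."""

    def __init__(self) -> None:
        self.text_parts: list[str] = []
        self.audio = bytearray()

    def handle(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        # Normalize beta-era names so one code path handles both.
        event_type = LEGACY_EVENT_NAMES.get(event.get("type"), event.get("type"))
        if event_type == "response.output_text.delta":
            self.text_parts.append(event["delta"])
        elif event_type == "response.output_audio.delta":
            # Audio deltas are assumed to arrive as base64-encoded chunks.
            self.audio.extend(base64.b64decode(event["delta"]))
        # All other event types are ignored in this sketch.

    @property
    def text(self) -> str:
        return "".join(self.text_parts)


if __name__ == "__main__":
    acc = ResponseAccumulator()
    for raw in [
        '{"type": "response.output_text.delta", "delta": "Hel"}',
        '{"type": "response.text.delta", "delta": "lo"}',
        '{"type": "response.output_audio.delta", "delta": "AAEC"}',
    ]:
        acc.handle(raw)
    print(acc.text)        # Hello
    print(len(acc.audio))  # 3
```

Normalizing names at the entry point lets a team migrate to the GA event names gradually without duplicating handler logic.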

Read full article at platform.openai.com
