NVIDIA launches 600M-parameter streaming ASR for English transcription

NVIDIA has released Nemotron-ASR-Streaming, a 600M-parameter English streaming Automatic Speech Recognition (ASR) model. It uses a Cache-Aware FastConformer-RNNT architecture to provide transcription with native punctuation and capitalization support, and it is designed for both low-latency streaming and high-throughput batch workloads.

Key Takeaways

March 13, 2026 release: Nemotron-ASR-Streaming is available on Hugging Face, Build.nvidia.com, and NGC.
The model has 600M parameters and uses a Cache-Aware FastConformer-RNNT architecture with a 24-layer encoder.
NVIDIA lists four chunk sizes for inference: 80ms, 160ms, 560ms, and 1120ms.
Training data includes about 250,000 hours of US English speech from the NVIDIA Riva ASR training set and Granary.
On Hugging Face OpenASR leaderboard tests, NVIDIA reports a 6.93% average word error rate (WER) for the model at the 1120ms (1.12s) chunk size.
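
For readers unfamiliar with the WER figure above, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` is illustrative and not part of any NVIDIA tooling. Leaderboards typically normalize text (casing, punctuation) before scoring, which this sketch does not do, so it will not reproduce the reported 6.93% number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative helper only; no text normalization is applied.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A leaderboard "average WER" is an average of such scores across several test sets, so a single figure can hide large differences between datasets.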

Why It Matters

Nemotron-ASR-Streaming gives developers a single English ASR model for both live voice workloads and batch transcription, with built-in punctuation and capitalization. The cache-aware design is aimed at reducing redundant overlap computation in streaming, which NVIDIA says can improve throughput and lower GPU memory pressure compared with buffered approaches. The competitive signal is the model’s reported 6.93% average WER at the 1120ms (1.12s) chunk size, plus support for four chunk-size operating points without retraining. Next to watch: how the model performs in real deployments through the hosted NVIDIA NIM API and the NeMo stack.
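
To make the four operating points concrete, the following sketch works out how much audio each chunk covers and how many chunks a clip would need. The 16 kHz sample rate is an assumption for illustration (the source does not state the model's input rate), and the helper names are hypothetical; none of this calls the actual NeMo or NIM APIs.

```python
import math

# Chunk sizes listed by NVIDIA, in milliseconds.
CHUNK_SIZES_MS = (80, 160, 560, 1120)

# Assumption for illustration only; not stated in the source.
ASSUMED_SAMPLE_RATE_HZ = 16_000


def samples_per_chunk(chunk_ms: int, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> int:
    """Number of audio samples covered by one chunk."""
    return chunk_ms * sample_rate_hz // 1000


def chunks_needed(duration_ms: int, chunk_ms: int) -> int:
    """Number of chunks needed to cover a clip (last chunk may be partial)."""
    return math.ceil(duration_ms / chunk_ms)


if __name__ == "__main__":
    clip_ms = 10_000  # a hypothetical 10-second clip
    for chunk_ms in CHUNK_SIZES_MS:
        print(
            f"{chunk_ms:>5} ms chunk: "
            f"{samples_per_chunk(chunk_ms):>6} samples, "
            f"{chunks_needed(clip_ms, chunk_ms):>4} chunks per 10 s"
        )
```

Smaller chunks mean the model emits text sooner but is called more often per second of audio; larger chunks trade latency for fewer calls, which is the trade-off the four operating points expose.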

Read full article at huggingface.co

NVIDIA: NVIDIA’s Nemotron 3 Nano Omni targets multimodal agent reasoning

StockTitan: Quickplay bets on AI to shrink broadcast’s workflow tax

Broadcast: EVS Embeds AI for Deblurring, Player Tracking, and Vertical Reframing

Yahoo Finance: iQiyi bets on AI-native TV—and TikTokifies its core app
