Kwai-Keye ships 30B video model with 256K context and agents

Kwai-Keye has released Keye-VL-2.0-30B-A3B, a 30B-parameter multimodal large language model designed for long-video understanding and agent tasks. The model uses a sparse attention architecture to process hour-long video contexts efficiently and, according to the release, performs competitively against leading open-source and closed-source models on several video understanding benchmarks. It also includes built-in agent abilities for tasks such as code generation, tool use, and web-grounded search.

Key Takeaways

Keye-VL-2.0-30B-A3B is a 30B-class base model with built-in Code, Tool, and Search agent abilities.
The model uses DSA sparse attention and targets 256K ultra-long context for hour-long video inputs.
On TimeLens, it posted 58.4 mIoU on Charades-TimeLens, 58.5 on ActivityNet-TimeLens, and 70.1 on QVHighlights-TimeLens.
On VideoMME V2, accuracy rose from 35.3% at 64 frames to 42.4% at 512 frames, with non-linear reasoning score improving from 18.5 to 24.2.
On LongVideoBench, Keye-VL-2.0-30B-A3B scored 74.1, and the release says it outperformed Qwen3.5-35B-A3B and Qwen3-VL-235B-A22B.
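
The VideoMME V2 figures above are easier to compare as gains. The short Python sketch below works out the absolute and relative improvement from 64 to 512 frames, using only the numbers quoted in the takeaways. The names `REPORTED` and `gain` are illustrative and not from the release.

```python
# VideoMME V2 figures as quoted in the takeaways above.
# Each pair is (score at 64 frames, score at 512 frames).
REPORTED = {
    "accuracy_pct": (35.3, 42.4),
    "nonlinear_reasoning": (18.5, 24.2),
}


def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper output: metric name -> (absolute, relative) gain.
gains = {name: gain(*pair) for name, pair in REPORTED.items()}

if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in gains.items():
        print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

By this arithmetic, an eightfold increase in frame count corresponds to a 7.1-point accuracy gain and a 5.7-point gain in the non-linear reasoning score. It says nothing about how the scores behave between or beyond those two frame counts.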

Why It Matters

Keye-VL-2.0-30B-A3B matters because it packages long-video understanding and agent functions in a single 30B model, and the company claims nearly lossless reasoning over a 256K context. That puts a new open model into direct comparison with larger open-source systems and several closed-source baselines across video, coding, and agent benchmarks. The more concrete signal to watch is whether the model’s VideoMME V2 gains hold as frame count increases: the release reports accuracy improving from 35.3% at 64 frames to 42.4% at 512 frames.

Read full article at huggingface.co


