
TikTok’s MLT-Dedup cuts repetition 91% with 5x larger index

TikTok researchers have developed MLT-Dedup, a framework for efficient large-scale online video deduplication. The system uses multi-level video representations (ML-VE) for scalable candidate retrieval and a differential feature-enhanced similarity module (DiF-SiM) for precise spatial-temporal matching. In online A/B tests, MLT-Dedup reduced repetition rates by 91% at 90% precision, and its sparse retrieval design made the retrieval index five times larger.

Key Takeaways

MLT-Dedup uses ML-VE to generate both clip-level embeddings for retrieval and frame-level embeddings for matching.
Online A/B tests reported a 91% reduction in repetition rate at 90% precision for the full ML-VE + DiF-SiM stack.
The sparse retrieval design increased retrieval index size by 5x, allowing broader candidate coverage under fixed resources.
DiF-SiM adds differential features and learned similarity to localize duplicated temporal segments before making deduplication decisions.
On the VCSL benchmark, DiF-SiM reached a 74.31 F-score, ahead of RTR + pre-training at 70.73.
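
To make the retrieve-then-verify split concrete, here is a minimal sketch under loud assumptions. It is not TikTok's implementation: ML-VE and DiF-SiM are learned models, whereas this toy uses hand-written vectors, brute-force cosine similarity, and a simple matched-frame ratio in place of learned similarity. Every name (`retrieve_candidates`, `matched_frame_ratio`, `clip_index`, `frame_store`) and every threshold is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(query_clips, index, top_k=2):
    """Stage 1 (coarse): rank indexed videos by their best clip-to-clip similarity."""
    scored = []
    for video_id, clips in index.items():
        best = max(cosine(q, c) for q in query_clips for c in clips)
        scored.append((best, video_id))
    scored.sort(reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


def matched_frame_ratio(query_frames, cand_frames, sim_threshold=0.9):
    """Stage 2 (fine): share of query frames with a near-identical candidate frame."""
    hits = sum(
        1
        for q in query_frames
        if max(cosine(q, c) for c in cand_frames) >= sim_threshold
    )
    return hits / len(query_frames)


# Toy data (assumed): a few clip-level vectors per video for retrieval,
# and separate frame-level vectors per video for verification.
clip_index = {
    "video_a": [[1.0, 0.0, 0.0]],
    "video_b": [[0.0, 1.0, 0.0]],
    "video_c": [[0.9, 0.1, 0.0]],
}
frame_store = {
    "video_a": [[1, 0], [0, 1], [1, 1]],
    "video_b": [[-1, 0], [0, -1]],
    "video_c": [[1, 0], [-1, 1]],
}

query_clips = [[1.0, 0.05, 0.0]]
query_frames = [[1, 0], [0, 1], [1, 1], [1, -1]]

# Cheap clip-level search narrows the pool; frame-level matching runs only on survivors.
candidates = retrieve_candidates(query_clips, clip_index, top_k=2)
ratios = {vid: matched_frame_ratio(query_frames, frame_store[vid]) for vid in candidates}
print(candidates, ratios)
```

The design point the sketch illustrates is cost separation: the clip-level pass touches every indexed video with a small representation, while the more expensive frame-level comparison is reserved for the short candidate list.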

Why It Matters

The immediate effect is practical: MLT-Dedup lowers duplicate-video repetition while storing more content in the retrieval index, which matters when dedup systems operate under tight memory budgets. The broader point is architectural: TikTok is not relying on denser embeddings alone, but splitting retrieval and verification across clip-level and frame-level representations, then using temporal overlap thresholds to avoid false matches on partial copies. For streaming platforms, that’s a concrete template for large-scale content filtering. What to watch next is whether the same ML-VE and DiF-SiM split holds up as index TTL grows and candidate pools get larger in production.
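
The temporal overlap threshold mentioned above can be illustrated with a short sketch. It assumes a matcher has already localized copied segments as (start, end) times in seconds within the query video; the function names and the 0.8 threshold are hypothetical and are not taken from the paper.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) segments so no time span is counted twice."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


def copied_fraction(segments, duration):
    """Fraction of the query video's duration covered by localized copied segments."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    covered = sum(end - start for start, end in merge_segments(segments))
    return covered / duration


def is_duplicate(segments, duration, min_overlap=0.8):
    """Flag a duplicate only when enough of the video is copied (threshold is assumed)."""
    return copied_fraction(segments, duration) >= min_overlap


# A near-full copy: two overlapping matched segments covering 58 of 60 seconds.
near_full_copy = [(0.0, 20.0), (15.0, 58.0)]
# A partial copy: a 15-second excerpt reused inside a 60-second video.
partial_copy = [(10.0, 25.0)]

print(is_duplicate(near_full_copy, 60.0))  # coverage is about 0.97
print(is_duplicate(partial_copy, 60.0))    # coverage is 0.25
```

Under these assumptions, a video that merely quotes a short excerpt stays below the threshold and is not suppressed, while a near-complete re-upload is flagged.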

Read full article at openreview.net



