AI & VideoTechnical Development

Meta updates SAM 2.1 with faster video object segmentation

Meta (facebookresearch) has released SAM 2.1, an updated version of its Segment Anything Model, which now supports full model compilation for video object segmentation (VOS) for a significant speedup, along with independent per-object inference for improved multi-object tracking. The open-source repository provides code, trained model checkpoints, and example notebooks for both image and video prediction, building upon the previously released SAM 2.

Key Takeaways

The repo now supports `torch.compile` on the entire SAM 2 model for videos via `vos_optimized=True` in `build_sam2_video_predictor`.
Meta says the full model compilation provides a major speedup in inference FPS for video object segmentation.
`SAM2VideoPredictor` now handles each object independently while sharing backbone features.
The new prompting behavior removes the earlier assumption that non-prompted objects are absent in a frame.
Users can add new objects after tracking starts, which the repo says was previously restricted.

Why It Matters

The immediate effect is faster video object segmentation inference and a less restrictive multi-object workflow in SAM 2.1. That matters for teams building video annotation or tracking tools on top of Meta’s open-source stack, because the update changes both runtime performance and how prompts are interpreted across objects. The repo also raises the floor to PyTorch 2.5.1 for full support, so deployment environments need to match that version. Watch for adoption of the `--use_vos_optimized_video_predictor` flag in `tools/vos_inference.py` and whether users stick with the new default predictor or fall back to `sam2_video_predictor_legacy.py`.

Read full article at github.com

Get this in your inbox → Subscribe

Enjoy our coverage?

Add StreamingMeme as a preferred source on Google to see more of our streaming news at the top of your Search results.

Add as preferred source

Qiang Zhang: DeltaToken cuts video tokens from 180K to under 1,000

ayushchat: Whisper runs locally on Apple Silicon with no network access

Zenvanriel: Netflix open sources VOID, an AI model using physics for video removal

Medium: NVIDIA runs Cosmos 3 physical AI world model on desktop GPUs

Meta updates SAM 2.1 with faster video object segmentation

Key Takeaways

The repo now supports `torch.compile` on the entire SAM 2 model for videos via `vos_optimized=True` in `build_sam2_video_predictor`.
Meta says the full model compilation provides a major speedup in inference FPS for video object segmentation.
`SAM2VideoPredictor` now handles each object independently while sharing backbone features.
The new prompting behavior removes the earlier assumption that non-prompted objects are absent in a frame.
Users can add new objects after tracking starts, which the repo says was previously restricted.

Why It Matters

Read full article at github.com

Meta updates SAM 2.1 with faster video object segmentation

Key Takeaways

Why It Matters

Enjoy our coverage?

Related Articles

Meta updates SAM 2.1 with faster video object segmentation

Key Takeaways

Why It Matters

Enjoy our coverage?

Related Articles

Newest

Upcoming Events

Top Sources

Newest

Upcoming Events

Top Sources

Related Articles

DeltaToken cuts video tokens from 180K to under 1,000

Whisper runs locally on Apple Silicon with no network access

Netflix open sources VOID, an AI model using physics for video removal

NVIDIA runs Cosmos 3 physical AI world model on desktop GPUs