Meta updates SAM 2.1 with faster video object segmentation
Meta (facebookresearch) has updated SAM 2.1, the current release of its Segment Anything Model, with two changes. The first is full model compilation for video object segmentation (VOS), which Meta says gives a major speedup. The second is independent per-object inference for more flexible multi-object tracking. The open-source repository provides code, trained model checkpoints, and example notebooks for both image and video prediction, building on the previously released SAM 2.
Key Takeaways
- The repo now supports `torch.compile` on the entire SAM 2 model for videos via `vos_optimized=True` in `build_sam2_video_predictor`.
- Meta says the full model compilation provides a major speedup in inference FPS for video object segmentation.
- `SAM2VideoPredictor` now handles each object independently while sharing backbone features.
- The new prompting behavior drops the earlier assumption that, when a frame is prompted for only some objects, the non-prompted objects are absent from that frame.
- Users can add new objects after tracking starts, which the repo says was previously restricted.
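The per-object behavior in the list above can be illustrated with a small toy model. This is a hypothetical sketch, not SAM 2 code: `ToyPerObjectTracker`, `backbone`, and `track_frame` are invented names, and plain strings stand in for image features and masks. It shows two ideas the list describes. Each frame is encoded once and the result is shared by all objects, while each object is processed on its own, so a new object can be registered after tracking has started.

```python
from typing import Callable, Dict


class ToyPerObjectTracker:
    """Hypothetical toy tracker, NOT the SAM 2 implementation.

    `backbone` stands in for the shared image encoder; the per-object
    loop stands in for the steps that run once per tracked object.
    """

    def __init__(self, backbone: Callable[[int], str]) -> None:
        self.backbone = backbone
        self.feature_cache: Dict[int, str] = {}  # frame_idx -> shared features
        self.backbone_calls = 0
        self.start_frame: Dict[int, int] = {}  # obj_id -> first prompted frame

    def shared_features(self, frame_idx: int) -> str:
        # Encode each frame once and reuse the result for every object.
        if frame_idx not in self.feature_cache:
            self.feature_cache[frame_idx] = self.backbone(frame_idx)
            self.backbone_calls += 1
        return self.feature_cache[frame_idx]

    def add_object(self, obj_id: int, frame_idx: int) -> None:
        # New objects may be registered at any frame, even mid-tracking.
        self.start_frame[obj_id] = frame_idx

    def track_frame(self, frame_idx: int) -> Dict[int, str]:
        feats = self.shared_features(frame_idx)
        outputs: Dict[int, str] = {}
        for obj_id, start in self.start_frame.items():
            if frame_idx >= start:
                # Each object gets its own output, computed from the shared features.
                outputs[obj_id] = f"mask[obj={obj_id}|{feats}]"
        return outputs


if __name__ == "__main__":
    tracker = ToyPerObjectTracker(backbone=lambda i: f"feat{i}")
    tracker.add_object(obj_id=1, frame_idx=0)
    for t in range(3):
        if t == 2:
            tracker.add_object(obj_id=2, frame_idx=2)  # added after tracking started
        print(t, tracker.track_frame(t))
    print("backbone calls:", tracker.backbone_calls)
```

Run as a script, the toy makes three backbone calls for three frames even though frame 2 tracks two objects, because features are cached per frame rather than per object.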
Why It Matters
The immediate effect is faster video object segmentation inference and a less restrictive multi-object workflow in SAM 2.1. This matters for teams building video annotation or tracking tools on Meta’s open-source stack, because the update changes both runtime performance and how prompts are interpreted across objects. The repo also raises the minimum PyTorch version to 2.5.1 for full support, so deployment environments need to match. Open questions are how widely the `--use_vos_optimized_video_predictor` flag in `tools/vos_inference.py` is adopted, and whether users keep the new default predictor or fall back to the legacy one in `sam2_video_predictor_legacy.py`.
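Because deployment environments need PyTorch 2.5.1 or newer, a setup script might check the installed version before enabling the optimized predictor. The following is a minimal sketch under that assumption. `MIN_TORCH`, `parse_version`, and `meets_torch_floor` are hypothetical names introduced for illustration. Versions are compared as integer tuples because plain string comparison orders `"2.10.0"` before `"2.5.1"`.

```python
import re
from typing import Tuple

# Hypothetical constant: the PyTorch floor the article cites for full support.
MIN_TORCH: Tuple[int, int, int] = (2, 5, 1)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '2.5.1+cu124'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '2.6') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def meets_torch_floor(version: str) -> bool:
    """Hypothetical check: True if `version` is at least MIN_TORCH."""
    # Tuple comparison orders (2, 10, 0) after (2, 5, 1), unlike string comparison.
    return parse_version(version) >= MIN_TORCH


if __name__ == "__main__":
    for v in ["2.4.1", "2.5.1+cu124", "2.10.0"]:
        print(v, meets_torch_floor(v))
```

In practice the input would be `torch.__version__`, and a script could pass `vos_optimized=True` to `build_sam2_video_predictor` only when the check succeeds. For production use, `packaging.version.Version` handles the full version grammar more robustly than this regex.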
Read full article at github.com