Introducing Meta Segment Anything Model 3 (SAM 3) and Segment Anything Playground

While Meta AI has seen its share of experimental hits and misses, the newly unveiled Meta Segment Anything Model 3 (SAM 3) is undeniably superb. This powerhouse model represents a massive leap forward, capable of detecting, segmenting, and tracking objects across both static images and dynamic videos with unprecedented precision.

SAM 3 running detection example

Alongside the model, Meta has released a suite of innovative tools, including the Segment Anything Playground—a platform designed to democratize AI experimentation for creators and developers alike.


The Era of Promptable Concept Segmentation (PCS)

SAM 3 isn't just another segmentation tool; it is a promptable foundation model. Whether you provide a text description, a bounding box, a single point, or a mask, SAM 3 understands exactly what you want to isolate and track.

In the example below, the model identifies the subject from a simple text prompt, accurately generating both a bounding box and a precise segmentation mask in real time.

SAM 3 running detection example

Beyond "Person" and "Car": Understanding Nuance

The true breakthrough in SAM 3 is Promptable Concept Segmentation (PCS). While its predecessor (SAM 2) revolutionized general segmentation, SAM 3 masters "open-vocabulary" concepts. This means it can find all instances of a specific, highly detailed request.

  • Old Models: Identify a generic "person" or "umbrella."

  • SAM 3: Can pinpoint "the striped red umbrella" or "the player wearing a white jersey."

Furthermore, SAM 3 is fully interactive. If the initial result isn't perfect, you can refine the selection with additional prompts to resolve any ambiguity.
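
To make this concrete, here is a minimal sketch of fine-grained text prompting, reusing the image API from the developer quick start later in this post. The image path, the prompt strings, and the idea of issuing a second, more specific prompt against the same state are illustrative assumptions; consult the facebookresearch/sam3 repository for the authoritative interface.

from PIL import Image
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor

# Build the model once and keep the processor around for repeated prompting.
processor = Sam3Processor(build_sam3_image_model())
state = processor.set_image(Image.open("street_scene.jpg"))  # placeholder path

# A generic prompt finds every instance of the broad category...
broad = processor.set_text_prompt(state=state, prompt="umbrella")

# ...while a more specific open-vocabulary phrase narrows the selection
# when the first result is ambiguous.
narrow = processor.set_text_prompt(state=state, prompt="striped red umbrella")

print(len(broad["masks"]), len(narrow["masks"]))  # instance count per prompt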


Breakthroughs in Performance and Data

SAM 3’s intelligence is fueled by a massive new data engine. This engine has automatically annotated over 4 million unique concepts, creating the most comprehensive open-vocabulary segmentation dataset in existence.

Setting New Industry Standards

To measure its prowess, Meta released the Segment Anything with Concepts (SA-Co) benchmark. This benchmark is 50 times larger than previous ones, featuring 207,000 unique concepts across 120,000 images and 1,700 videos. The results speak for themselves:

  • Unmatched Accuracy: SAM 3 achieved a twofold increase in the cgF1 score (concept recognition/localization) over existing systems.

  • Near-Human Level: On the SA-Co benchmark, SAM 3 reached 75-80% of estimated human performance.

  • Beating the Giants: In zero-shot mask AP tests on LVIS, SAM 3 scored 48.8, dwarfing the previous best of 38.5 and outperforming general models like Gemini 2.5.

  • User Preference: In blind studies, users preferred SAM 3 outputs over strong models like OWLv2 by a 3-to-1 ratio.

  • Speed: SAM 3 delivers lightning-fast inference, processing a single image with 100+ objects in just 30 milliseconds on an H200 GPU.


Architecture & Technical Innovations

With roughly 850 million parameters, SAM 3 utilizes a vision encoder shared by both a detector and a tracker. Two key innovations make this possible, as sketched conceptually after the list below:

  1. The Presence Token: A specialized token that helps the model distinguish between similar prompts (e.g., “player in white” vs. “player in red”). It separates concept recognition from localization.

  2. Decoupled Detector–Tracker: This design prevents tasks from interfering with one another. The detector uses a DETR-based architecture, while the tracker is adapted from SAM 2 to support real-time video segmentation.
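
To make the decoupling easier to picture, below is a deliberately simplified, hypothetical PyTorch sketch of that layout: a shared vision encoder, a DETR-style detector with learned object queries, and an extra presence token whose output scores whether the prompted concept appears at all. This is illustrative only, not Meta's implementation, and the SAM 2-derived tracker that would consume the same encoder features for video is omitted.

import torch
import torch.nn as nn

class ConceptualSAM3Detector(nn.Module):
    # Toy illustration of shared encoder + DETR-style detector + presence token.
    # Not the actual SAM 3 architecture.

    def __init__(self, dim=256, num_queries=100):
        super().__init__()
        # Shared vision encoder (stand-in patch embedding); in SAM 3 the same
        # features would also feed the video tracker.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=16, stride=16), nn.Flatten(2))
        # DETR-style detector: learned object queries cross-attend to image tokens.
        self.object_queries = nn.Parameter(torch.randn(num_queries, dim))
        decoder_layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.detector = nn.TransformerDecoder(decoder_layer, num_layers=2)
        # Presence token: a dedicated query that answers "is the concept here at all?",
        # separating recognition from per-instance localization.
        self.presence_token = nn.Parameter(torch.randn(1, dim))
        self.presence_head = nn.Linear(dim, 1)
        self.box_head = nn.Linear(dim, 4)

    def forward(self, image):
        tokens = self.encoder(image).transpose(1, 2)                  # (B, HW, dim)
        queries = torch.cat([self.presence_token, self.object_queries])
        queries = queries.unsqueeze(0).expand(image.size(0), -1, -1)  # (B, 1+Q, dim)
        decoded = self.detector(queries, tokens)
        presence_logit = self.presence_head(decoded[:, 0])            # concept present?
        boxes = self.box_head(decoded[:, 1:]).sigmoid()               # per-query boxes
        return presence_logit, boxes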

Research Access: Meta is committed to open science. You can find the code and model checkpoints in the facebookresearch/sam3 GitHub repository.

Note: While powerful, SAM 3 can still struggle with ultra-fine-grained "out-of-domain" concepts, such as identifying specific cells in medical imagery. However, it adapts remarkably fast when fine-tuned on small datasets.


Real-World Applications in the Meta Ecosystem

SAM 3 is already being integrated into the apps millions of people use every day:

  • Facebook Marketplace: Powers the "View in Room" feature, allowing you to see exactly how a lamp or table looks in your home using AR.

  • Instagram Edits: Enables creators to apply complex filters or motion trails to specific objects in a video with a single tap.

  • Vibes: A new creation experience within Meta AI apps and meta.ai.

  • Wildlife Conservation: In partnership with Conservation X Labs, SAM 3 is being used to monitor wildlife via a new public video dataset.


Get Started: Explore the Segment Anything Playground

The Segment Anything Playground allows anyone to test these models. Whether you want to pixelate license plates, add a "spotlight" effect to a video, or track objects in motion, the playground makes it easy.

Developer Quick Start: SAM 3 Python Example

For developers looking to integrate SAM 3, here is a basic implementation for image segmentation:

import torch
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from PIL import Image
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor
from sam3.visualization_utils import plot_results

# 1. Initialize the Model
model = build_sam3_image_model()
processor = Sam3Processor(model)

# 2. Load and Prepare Image
image_path = "path/to/your/image.jpg"  # replace with a local image
image = Image.open(image_path)
inference_state = processor.set_image(image)

# 3. Prompt with Text (e.g., "people" or "shoe")
output = processor.set_text_prompt(state=inference_state, prompt="people")
plot_results(image, inference_state)

# 4. Extract Data
masks, boxes, scores = output["masks"], output["boxes"], output["scores"]

# 5. Define Overlay Function for Visualization
def overlay_masks(image, masks):
    image = image.convert("RGBA")
    masks = 255 * masks.cpu().numpy().astype(np.uint8)
    n_masks = masks.shape[0]
    cmap = matplotlib.colormaps.get_cmap("rainbow").resampled(n_masks)
    colors = [tuple(int(c * 255) for c in cmap(i)[:3]) for i in range(n_masks)]

    for mask, color in zip(masks, colors):
        mask = Image.fromarray(mask[0])
        overlay = Image.new("RGBA", image.size, color + (0,))
        alpha = mask.point(lambda v: int(v * 0.5))
        overlay.putalpha(alpha)
        image = Image.alpha_composite(image, overlay)
    return image

# 6. Display Result
final_image = overlay_masks(image, masks)
plt.imshow(final_image)
plt.axis('off')
plt.show()

SAM 3 segmentation result
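
The boxes and scores extracted in step 4 can be visualized as well. The snippet below continues the example above and is a minimal sketch that assumes boxes come back as an (N, 4) tensor of pixel coordinates in (x1, y1, x2, y2) order and scores as per-instance confidences; check the repository documentation for the exact output format.

from PIL import ImageDraw

def draw_boxes(image, boxes, scores):
    # Draw each detection box with its confidence score on a copy of the image.
    annotated = image.convert("RGB").copy()
    draw = ImageDraw.Draw(annotated)
    for box, score in zip(boxes.cpu().tolist(), scores.cpu().tolist()):
        x1, y1, x2, y2 = box
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
        draw.text((x1, max(0, y1 - 12)), f"{score:.2f}", fill="red")
    return annotated

plt.imshow(draw_boxes(image, boxes, scores))
plt.axis('off')
plt.show()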

The AI community is invited to join Meta in building the future of computer vision. Whether you are a researcher using the SA-Co benchmark or a creator using the Playground, SAM 3 provides the tools to turn vision into reality.
