This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
SOWLv2 is a command-line tool and Python library for text-prompted object segmentation that combines Google's OWLv2 (open-vocabulary object detector) with Meta's SAM 2 (Segment Anything Model V2) for precise pixel-level segmentation. The tool processes images, video frames, or videos based on natural language prompts.
README.md: 58 additions & 9 deletions
@@ -23,7 +23,7 @@ TL;DR: SOWLv2: Text-prompted object segmentation using OWLv2 and SAM 2 -->
<br>
</p>
-SOWLv2 (**S**egmented**OWLv2**) is a powerful command-line tool for **text-prompted object segmentation**. It seamlessly integrates Google’s [OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2) open-vocabulary object detector with Meta’s [SAM 2](https://github.com/facebookresearch/sam2) (Segment Anything Model V2) to precisely segment objects in images, image sequences (frames), or videos based on natural language descriptions.
+SOWLv2 (**S**egmented**OWLv2**) is a powerful command-line tool for **text-prompted object segmentation**. It seamlessly integrates Google's [OWLv2](https://huggingface.co/docs/transformers/en/model_doc/owlv2) open-vocabulary object detector with Meta's [SAM 2](https://github.com/facebookresearch/sam2) (Segment Anything Model V2) to precisely segment objects in images, image sequences (frames), or videos based on natural language descriptions.
Given one or more text prompts (e.g., `"a red bicycle"`, or `"cat" "dog"`) and an input source, SOWLv2 will:
1. Utilize **OWLv2** to detect bounding boxes for objects matching the text prompt(s), based on the principles from the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683).
@@ -88,8 +88,7 @@ Note: If a single prompt contains spaces, it should be enclosed in quotes (e.g.,
| `--no-merged` | (Optional) Disables merged mode. Merged mode (where all masks are combined into a single output [image/video] ) is enabled by default. | Enabled |
+| `--no-binary` | (Optional) Disables binary mask generation. Binary mask output is enabled by default. | Enabled |
+| `--no-overlay` | (Optional) Disables overlay image generation. Overlay image output (original image with masks) is enabled by default. | Enabled |
| `--config` | (Optional) Path to a YAML configuration file to specify arguments (see [Configuration](#configuration)). Prompts can also be a list in YAML. | `None` |
### Examples:
@@ -126,13 +128,60 @@ Note: If a single prompt contains spaces, it should be enclosed in quotes (e.g.,
### Output Structure:
-The tool saves results in the specified output directory. For each detected object instance (corresponding to any of the given prompts), SOWLv2 generates:
-* A **binary mask** image (e.g., `imagename_object0_mask.png`): Grayscale PNG where foreground pixels are white (255) and background pixels are black (0). The filename includes a sequential object ID.
-* An **overlay image** (e.g., `imagename_object0_overlay.png`): The original image with the segmentation mask overlaid (typically colored with transparency).
+The tool saves results in the specified output directory with the following structure:
-Objects are numbered sequentially (e.g., `object0`, `object1`) in the order they are detected by OWLv2, regardless of which text prompt they matched. For video inputs, output filenames will also include frame identifiers, and separate videos for each object's masks and overlays will be generated (e.g., `obj0_mask_video.mp4`, `obj0_overlay_video.mp4`).
+Objects are numbered sequentially (`obj1`, `obj2`, etc.) in the order they are detected by OWLv2, regardless of which text prompt they matched. Frame numbers use 6-digit zero-padding (`000001`, `000002`, etc.).
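
The naming scheme described in the added line can be illustrated with a minimal Python sketch. The helper names `object_label` and `frame_label` are hypothetical and are not taken from the SOWLv2 code base; the sketch only reproduces the two conventions stated above (1-based `objN` labels and 6-digit zero-padded frame numbers).

```python
def object_label(index: int) -> str:
    """Return the label for the index-th detected object (1-based), e.g. 1 -> 'obj1'.

    Hypothetical helper: the README states only that labels look like obj1, obj2, ...
    """
    if index < 1:
        raise ValueError("object indices start at 1")
    return f"obj{index}"


def frame_label(frame_number: int) -> str:
    """Return a frame number zero-padded to 6 digits, e.g. 1 -> '000001'."""
    return f"{frame_number:06d}"


# Labels for the first two objects and the first two frames.
print([object_label(i) for i in (1, 2)])
print([frame_label(n) for n in (1, 2)])
```

How these labels are combined into actual file or directory names is not shown in this diff, so the sketch deliberately stops at the labels themselves.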
-SOWLv2 automatically assigns a unique color to each detected OWLv2 label, making it easy to visually distinguish different object classesin the output overlays and merged results.
+SOWLv2 automatically assigns a unique color to each detected object class, making it easy to visually distinguish different object types in the output overlays and merged results.
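
One way to give every object class its own color is to space hues evenly around the color wheel. The sketch below is an assumption for illustration only: the diff does not show how SOWLv2 picks colors, and `assign_class_colors` is a hypothetical name.

```python
import colorsys


def assign_class_colors(labels):
    """Map each distinct class label to an (R, G, B) tuple with evenly spaced hues.

    Assumption: this is an illustrative scheme, not SOWLv2's actual palette.
    Labels are sorted so the mapping is deterministic for a given set of classes.
    """
    unique = sorted(set(labels))
    colors = {}
    for i, label in enumerate(unique):
        hue = i / len(unique)  # evenly spaced hues in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[label] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


# Two instances of "cat" share one color; "dog" gets a different one.
palette = assign_class_colors(["cat", "dog", "cat"])
print(palette)
```

Keying the palette by class rather than by object instance means every detection of the same prompt is drawn in the same color, which matches the per-class behavior the added line describes.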