```bash
hailo-seg
```

To close the application, press `Ctrl+C`.
Instance segmentation refers to the process of identifying and segmenting individual objects within a video frame. It combines object detection (locating objects and drawing bounding boxes) with semantic segmentation (assigning pixel-level labels to regions of the image). The goal is to not only detect objects but also to generate a mask for each detected object, distinguishing it from other objects and the background.
- The app uses a neural network to detect objects in the video frame.
- Each detected object is represented as a `HAILO_DETECTION` object, which includes metadata such as:
  - Label: the class of the object (e.g., "person").
  - Bounding Box: the coordinates of the object's location in the frame.
  - Confidence: the probability score of the detection.
- For each detected object, the app retrieves a mask (a `HAILO_CONF_CLASS_MASK` object).
- The mask is a pixel-level representation of the object's shape within the bounding box.
- Masks are resized and reshaped to match the frame's resolution and the object's bounding box dimensions.
- The masks are overlaid on the video frame using colors to visually distinguish different objects.
- The overlay is blended with the original frame to highlight the segmented objects.
- The app assigns a unique track ID to each detected object, allowing it to track the object across frames.
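The mask-overlay idea described above can be illustrated with a minimal NumPy sketch. This is not the app's actual implementation (which uses the Hailo APIs and OpenCV); `overlay_mask` and its parameters are hypothetical, chosen to mirror the steps: resize the per-object mask to the bounding box, color it, and blend it into the frame.

```python
import numpy as np

def overlay_mask(frame, mask, bbox, color, alpha=0.5, threshold=0.5):
    """Blend a colored object mask into `frame` (illustrative sketch).

    frame: (H, W, 3) uint8 RGB image.
    mask:  (h, w) float confidence mask for one object, values in [0, 1].
    bbox:  (x0, y0, x1, y1) pixel coordinates of the object's bounding box.
    color: (r, g, b) color used to distinguish this object.
    """
    x0, y0, x1, y1 = bbox
    bh, bw = y1 - y0, x1 - x0
    # Nearest-neighbour resize of the mask to the bounding-box size.
    rows = np.arange(bh) * mask.shape[0] // bh
    cols = np.arange(bw) * mask.shape[1] // bw
    resized = mask[rows[:, None], cols]
    region = frame[y0:y1, x0:x1].astype(np.float32)
    hit = resized > threshold  # pixels belonging to the object
    blended = region.copy()
    blended[hit] = (1 - alpha) * region[hit] + alpha * np.array(color, np.float32)
    out = frame.copy()
    out[y0:y1, x0:x1] = blended.astype(np.uint8)
    return out

# Toy example: a 4x4 all-ones mask overlaid in green on a black 8x8 frame.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
mask = np.ones((4, 4), dtype=np.float32)
result = overlay_mask(frame, mask, (2, 2, 6, 6), color=(0, 255, 0))
print(result[3, 3])  # pixel inside the box, blended toward green
```

The real pipeline performs the same resize and blend with OpenCV (`cv2.resize`, `cv2.addWeighted`); the NumPy version above just makes the arithmetic explicit.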
```bash
hailo-seg --input rpi
```

There are two ways to use a USB camera.

First way: specify the argument `--input` as `usb`:

```bash
hailo-seg --input usb
```

This will automatically detect the available USB camera (if multiple cameras are connected, the first one detected will be used).

Second way: detect the available camera using this script:

```bash
get-usb-camera
```

Then run the example using the device found by the previous script:

```bash
hailo-seg --input /dev/video<X>
```

For additional options, execute:

```bash
hailo-seg --help
```

For example:

```bash
python instance_segmentation.py --input usb
```

The basic idea is to utilize the pipeline's callback function. In simple terms, it can be thought of as a Python function that is invoked at the end of the pipeline, when frame processing is complete.
This is the recommended location to implement your logic.
The callback function processes instance segmentation metadata from the network output. Each instance is represented as a HAILO_DETECTION with a mask (HAILO_CONF_CLASS_MASK object). The function parses, resizes, and reshapes the masks according to the frame coordinates, and overlays the masks on the frame if the --use-frame flag is set. The function also prints the detection details, including the track ID, label, and confidence, to the terminal.
Detailed steps of the callback function:

1. Retrieve the Buffer:
   - The function starts by extracting the `GstBuffer` from the probe info (`info.get_buffer()`).
   - If the buffer is invalid (`None`), it returns immediately without further processing.

2. Frame Counting:
   - Frame counting is handled automatically by the framework wrapper before the callback is invoked.
   - The current frame count is retrieved using `user_data.get_count()` for debugging purposes.

3. Frame Skipping:
   - To reduce computational load, the function skips processing for frames based on the `frame_skip` value in `user_data`.
   - Only frames that satisfy the condition `user_data.get_count() % user_data.frame_skip == 0` are processed.

4. Extract Video Frame Properties:
   - The function retrieves the video format, width, and height from the pad using `get_caps_from_pad(pad)`.

5. Reduce Resolution:
   - The resolution of the video frame is reduced by a factor of 4 to optimize processing.

6. Extract Video Frame:
   - If `user_data.use_frame` is `True`, the function extracts the video frame as a NumPy array using `get_numpy_from_buffer(buffer, format, width, height)`.
   - The extracted frame is resized to the reduced resolution.

7. Object Detection:
   - The function retrieves the Region of Interest (ROI) from the buffer using `hailo.get_roi_from_buffer(buffer)`.
   - It extracts detections from the ROI using `roi.get_objects_typed(hailo.HAILO_DETECTION)`.

8. Parse Detections:
   - For each detection:
     - The label, bounding box, and confidence score are extracted.
     - If the label is `"person"`, the function retrieves the track ID for the detected object.
     - Detection information (ID, label, confidence) is appended to the debug string.

9. Instance Segmentation Mask:
   - If `user_data.use_frame` is `True` and masks are available:
     - The mask is reshaped to its original dimensions.
     - The mask is resized to match the bounding box dimensions.
     - The mask is overlaid on the reduced video frame using a color corresponding to the track ID.

10. Overlay Mask on Frame:
    - The mask overlay is blended with the reduced video frame using `cv2.addWeighted()`.

11. Print Debug Information:
    - The detection information (frame count, detection details) is printed to the console.

12. Convert Frame to BGR:
    - If `user_data.use_frame` is `True`, the reduced frame is converted from RGB to BGR format using OpenCV (`cv2.cvtColor()`).
    - The processed frame is stored in `user_data` using `user_data.set_frame()`.

13. Return Pad Probe Status:
    - The function returns `Gst.PadProbeReturn.OK` to indicate successful processing.
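The control flow of these steps can be condensed into a self-contained sketch. The GStreamer and Hailo objects are replaced here with minimal stand-ins invented for illustration (`MockDetection`, `MockUserData`, and the string `"OK"` in place of `Gst.PadProbeReturn.OK`); the real callback operates on `info.get_buffer()`, `hailo.get_roi_from_buffer()`, and the `HAILO_DETECTION` objects described above.

```python
class MockDetection:
    """Stand-in for a HAILO_DETECTION; get_track_id() abstracts the
    real HAILO_UNIQUE_ID lookup."""
    def __init__(self, label, confidence, track_id):
        self._label, self._conf, self._id = label, confidence, track_id
    def get_label(self): return self._label
    def get_confidence(self): return self._conf
    def get_track_id(self): return self._id

class MockUserData:
    """Stand-in for the callback's user_data (frame counter, frame_skip)."""
    def __init__(self, frame_skip=2):
        self.frame_skip = frame_skip
        self.last_debug = ""
        self._count = 0
    def increment(self): self._count += 1
    def get_count(self): return self._count

def app_callback(buffer, detections, user_data):
    # Step 1: bail out on an invalid buffer.
    if buffer is None:
        return "OK"
    # Steps 2-3: frame counting and frame skipping.
    if user_data.get_count() % user_data.frame_skip != 0:
        return "OK"
    # Steps 7-8 and 11: parse detections into a debug string and print it.
    lines = [f"Frame count: {user_data.get_count()}"]
    for det in detections:
        if det.get_label() == "person":
            lines.append(f"Detection: ID: {det.get_track_id()} "
                         f"Label: {det.get_label()} "
                         f"Confidence: {det.get_confidence():.2f}")
    user_data.last_debug = "\n".join(lines)
    print(user_data.last_debug)
    # Step 13: report successful processing.
    return "OK"

# Usage: frame 2 is processed (2 % frame_skip == 0), non-person detections
# are ignored when building the debug string.
ud = MockUserData(frame_skip=2)
ud.increment(); ud.increment()
app_callback(object(), [MockDetection("person", 0.91, 7),
                        MockDetection("car", 0.80, 3)], ud)
```

The mask-specific steps (5, 6, 9, 10, 12) are omitted here because they depend on the frame buffer and OpenCV; the sketch only shows the buffer check, skip logic, detection parsing, and return value.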
