Add YOLO11-OBB (Oriented Bounding Box) support with GPU-accelerated parsing #684
Monishkumarvr wants to merge 6 commits into
Create comprehensive guidance document for Claude Code and other AI assistants working in this repository. Includes:
- Build commands and CUDA version configuration
- Architecture overview of custom TensorRT/DeepStream integration
- Model processing pipeline and configuration flow
- Network type mapping for different YOLO variants
- Step-by-step model integration workflow
- Important implementation details (batching, GPU post-processing, etc.)
- Common issues and troubleshooting
This commit adds comprehensive support for YOLOv11-OBB models to DeepStream-Yolo:

**Parser Implementation:**
- Add NvDsInferParseYoloOBB() CPU parser for OBB detection format
- Add NvDsInferParseYoloOBBCuda() GPU-accelerated parser for better performance
- OBB parser converts oriented boxes (center, width, height, angle) to axis-aligned bounding boxes (AABB) that fully enclose the rotated objects
- Supports multi-class OBB models with configurable number of classes

**Model Export:**
- Add export_yolo11_obb.py script to convert YOLOv11-OBB models to ONNX format
- Export script handles the OBB detection head from Ultralytics
- Output format: [x_center, y_center, width, height, class_probs..., angle]

**Configuration:**
- Add config_infer_primary_yolo11_obb.txt example configuration
- Configured for DOTAv1 dataset (15 classes) by default
- Supports both CPU and GPU parsing modes

**Documentation:**
- Add comprehensive YOLO11-OBB.md guide covering:
  - Model conversion workflow
  - Compilation instructions
  - Configuration setup
  - OBB output format explanation
  - Common OBB datasets (DOTAv1, DOTAv1.5, DOTAv2)

**Technical Details:**
- OBB angle range: 0 to π/2 radians
- AABB conversion formula: half_width = (w*|cos(θ)| + h*|sin(θ)|)/2
- Compatible with DeepStream 5.1-8.0
- Maintains backward compatibility with existing YOLO parsers

Tested formats: YOLOv11-OBB models trained on rotated object detection datasets
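The AABB conversion mentioned in the commit message can be illustrated with a short Python sketch. This is not the parser code from the PR (which is C++/CUDA); `obb_to_aabb` is a hypothetical helper name, and the half-height expression is assumed to be the symmetric counterpart of the half-width formula quoted above.

```python
import math


def obb_to_aabb(cx, cy, w, h, angle):
    """Return (left, top, width, height) of the axis-aligned box that
    encloses an oriented box centered at (cx, cy), with angle in radians.

    Hypothetical helper for illustration; not the PR's parser code.
    """
    cos_a = abs(math.cos(angle))
    sin_a = abs(math.sin(angle))
    # Half-width formula as quoted in the commit message.
    half_w = (w * cos_a + h * sin_a) / 2.0
    # Assumed symmetric counterpart for the half-height.
    half_h = (w * sin_a + h * cos_a) / 2.0
    return (cx - half_w, cy - half_h, 2.0 * half_w, 2.0 * half_h)


if __name__ == "__main__":
    # An unrotated box keeps its size; a quarter turn swaps width and height.
    print(obb_to_aabb(100.0, 50.0, 40.0, 20.0, 0.0))
    print(obb_to_aabb(100.0, 50.0, 40.0, 20.0, math.pi / 2))
```

At an angle of 0 the result is the original box, and at π/2 the width and height swap, which is a quick sanity check for any implementation of the formula.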
- Add YOLO11-OBB to table of contents
- Add YOLO11-OBB to supported models list
- Add YOLO11-OBB to improvements section
@marcoslucianops Thanks for this lovely repo. This helped me port Ultralytics models to the DeepStream pipeline on Jetson devices. While working on an OBB model porting project, I wrote this parser and thought it could be useful for many devs like me. This is my first open source contribution too. Looking forward to your feedback.
@Monishkumarvr Thanks for the PR, this is really interesting work. I tested it in a DeepStream 7 based integration, and the OBB model/export/parser path seems to work in the sense that the engine builds and detections are produced. However, on the application side I still only receive regular axis-aligned bounding boxes. From what I can tell, the OBB information seems to be reduced to a standard bounding rectangle before it reaches the downstream application / OSD layer. Is that the intended behavior of this PR? Maybe because the plugin is only allowed to return an instance of NvDsInferParseObjectInfo?
Address reviewer feedback about OBB angle information not being accessible in standard DeepStream metadata. The parser correctly returns AABB (axis-aligned bounding boxes) as required by the NvDsInferParseObjectInfo API constraint.

Changes:
- docs/YOLO11-OBB.md: Add "OBB Geometry and DeepStream Metadata" section
  * Explain why AABB is returned (DeepStream API constraint)
  * Document output-tensor-meta=1 workaround for accessing full OBB geometry
  * Provide C++ pad probe example for reading raw tensor with angle data
  * Link to DeepStream Python Apps for Python examples
- config_infer_primary_yolo11_obb.txt: Add commented output-tensor-meta=1 with usage instructions

The angle information is NOT lost - it's accessible via NvDsInferTensorMeta when output-tensor-meta=1 is enabled. This gives users two paths:
1. Standard: OBB → AABB (seamless DeepStream integration)
2. Advanced: OBB → AABB + raw tensor (custom angle-aware post-processing)

Refs: GitHub PR reviewer comment about missing OBB angle downstream
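For orientation, the configuration change described in this commit might look roughly like the fragment below. It is a sketch, not the contents of config_infer_primary_yolo11_obb.txt: the keys are standard Gst-nvinfer properties, the parser names come from this PR, and the library path is an assumed placeholder.

```
[property]
# DOTAv1 default from the PR description (15 classes)
num-detected-classes=15
# CPU parser from this PR; the GPU variant is assumed to be selected
# the same way, with NvDsInferParseYoloOBBCuda
parse-bbox-func-name=NvDsInferParseYoloOBB
# Assumed placeholder path to the compiled custom parser library
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# Uncomment to attach raw output tensors (including the angle) as NvDsInferTensorMeta
#output-tensor-meta=1
```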
@neilyoung Thank you for the detailed testing and excellent diagnosis. You're exactly right!

Why AABB is returned

The DeepStream inference API constrains the custom bbox parser to return NvDsInferParseObjectInfo:

    struct NvDsInferParseObjectInfo {
        float left, top, width, height;  // AABB only
        float detectionConfidence;
        unsigned int classId;
        // ← No angle or corner points possible
    };

There's no mechanism in this interface to attach additional geometry like angle or corner points. The parser must return AABB because that's what DeepStream's NMS, OSD, and tracker expect. The AABB is computed using the tightest-fit formula so that it fully encloses the rotated box.

Accessing the full OBB angle downstream

The angle information is not lost. It's available via DeepStream's raw tensor metadata (NvDsInferTensorMeta) when output-tensor-meta=1 is enabled.
I've updated the documentation with a full explanation and example probe structure:
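This gives users two paths:
1. Standard: OBB → AABB (seamless DeepStream integration)
2. Advanced: OBB → AABB + raw tensor (custom angle-aware post-processing)
Does this address your concern?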
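As a rough illustration of the "advanced" path, the sketch below splits one raw detection row into box, class, score, and angle. It assumes each detection is a flat row in the layout given in the PR description, `[x_center, y_center, width, height, class_probs..., angle]`. How the row is read out of NvDsInferTensorMeta is not shown, and `decode_obb_row` is a hypothetical name.

```python
def decode_obb_row(row, num_classes):
    """Split one raw output row into box geometry, best class, score, and angle.

    Assumed layout (from the PR description):
    [x_center, y_center, width, height, class_probs..., angle]
    Hypothetical helper for illustration only.
    """
    expected = 4 + num_classes + 1
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    cx, cy, w, h = row[:4]
    class_probs = row[4:4 + num_classes]
    angle = row[-1]
    # Pick the class with the highest probability.
    class_id = max(range(num_classes), key=lambda i: class_probs[i])
    return {
        "cx": cx, "cy": cy, "w": w, "h": h,
        "class_id": class_id,
        "score": class_probs[class_id],
        "angle": angle,
    }


if __name__ == "__main__":
    # Three-class toy example; real DOTAv1 models would use num_classes=15.
    print(decode_obb_row([320.0, 240.0, 100.0, 40.0, 0.1, 0.7, 0.2, 0.5], 3))
```

Keeping the decode step separate from drawing or tracking makes it easier to adjust if the actual tensor layout differs from the assumption above.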
@Monishkumarvr Yes, that is exactly how I'm doing it now, including drawing the overlays, which are no longer rects but quadrilaterals constructed from the rotated rect. It still looks a bit weird, since the perspective is lost (think of a 45 degree drone shot), but at least the direction of the box is correct. I'm currently looking for an easy way to calculate the "skew" caused by perspective, but I guess this will need some CV stuff on top. I'm doing it in Go using GO GST, which works very reliably. What I noticed is that the OBB models have extreme difficulties with closeups: detections are nearly 0, and sometimes driving swimming pools have also been spotted :). I think the primary use case of these models is counting, space determination, and observation in mostly straight orthogonal drone shots (at least this is what they show in their videos). Thanks for the extra comment.
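The quadrilateral overlay described in this comment can be sketched as follows. This is not the commenter's Go code. `obb_corners` is a hypothetical helper, and the rotation convention (a standard 2D rotation, which appears clockwise on screen because image y points down) is an assumption that should be checked against the model's actual angle definition.

```python
import math


def obb_corners(cx, cy, w, h, angle):
    """Return the four corners of a rotated rectangle as (x, y) tuples.

    Hypothetical helper; angle is in radians and applied as a standard
    2D rotation about the box center (assumed convention).
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = w / 2.0, h / 2.0
    corners = []
    # Walk the unrotated corners in order, then rotate each about the center.
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        x = cx + dx * cos_a - dy * sin_a
        y = cy + dx * sin_a + dy * cos_a
        corners.append((x, y))
    return corners


if __name__ == "__main__":
    for corner in obb_corners(100.0, 50.0, 40.0, 20.0, math.pi / 6):
        print(corner)
```

The four points can be passed to any polygon-drawing routine. Correcting for perspective skew, as discussed above, is outside the scope of this sketch.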
Thank you for testing and sharing your Go implementation experience! It's great to hear the tensor metadata approach is working well for you, and your quadrilateral overlay solution sounds perfect for directional visualization. Regarding your perspective/closeup observations, you're absolutely right. OBB models excel at aerial/orthogonal views (counting, parking lot analysis) but struggle with extreme perspectives and closeups.
My use case: Foundry moulding inspection
I'm using this for moulding box detection in metal casting operations. The critical challenge is maintaining stable tracking during the molten metal pouring cycle, when visual conditions degrade significantly (steam, smoke, lighting changes, vibration).
Freezing technique implementation:
The combination of OBB geometry + tensor metadata + software position locking has been essential for production-grade reliability in this harsh visual environment.
Overview
This PR adds comprehensive support for YOLO11-OBB (Oriented Bounding Box) models to DeepStream-Yolo, enabling detection of rotated objects with GPU-accelerated inference.
What's New
🔧 Parser Implementation
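- CPU parser: NvDsInferParseYoloOBB() in nvdsparsebbox_Yolo.cpp
- GPU-accelerated parser: NvDsInferParseYoloOBBCuda() in nvdsparsebbox_Yolo_cuda.cu
- AABB conversion: half_aabb_w = (w*|cos(θ)| + h*|sin(θ)|)/2 to ensure full enclosure of rotated objects
📦 Model Export Script
- utils/export_yolo11_obb.py: Converts YOLOv11-OBB .pt models to ONNX format
- Output format: [x_center, y_center, width, height, class_probs..., angle]
⚙️ Configuration
- config_infer_primary_yolo11_obb.txt: Example configuration file
📚 Documentation
- docs/YOLO11-OBB.md: Comprehensive guide covering:
  - Model conversion workflow
  - Compilation instructions
  - Configuration setup
  - OBB output format explanation
  - Common OBB datasets (DOTAv1, DOTAv1.5, DOTAv2)
- README.md: Added YOLO11-OBB to all relevant sections
Technical Details
OBB Format Support
- Configurable number of classes via num-detected-classes
Performance
- GPU-accelerated parsing with NvDsInferParseYoloOBBCuda
Compatibility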
Use Cases
Perfect for applications requiring rotated object detection:
Testing Checklist
Files Changed
Example Usage
Sample Output Format
References
✅ Ready for review and testing on DeepStream-enabled systems.