Zero-Shot Object Detector
YOLOE (Real-Time Seeing Anything) is a zero-shot, promptable YOLO model designed for open-vocabulary detection and segmentation. Unlike earlier YOLO models, which are limited to a fixed set of categories, YOLOE accepts text prompts, image prompts, or an internal vocabulary, so it can detect arbitrary object classes in real time.
This document describes the functionality, parameters, and usage of the zero_shot_object_detector_node, a ROS 2 node that detects objects using a YOLO-based zero-shot detector.
Overview
The zero_shot_object_detector_node performs real-time 2D object detection from RGB and depth images, using a YOLO-based model. It also supports:
- Output of detection poses and 3D visualizations
- Publishing detection images
- Dynamic class configuration via a ROS service
- Use of ZED transform for spatial alignment
Node Name
Subscribed Topics
- CAMERA_TOPIC (sensor_msgs/Image): RGB camera stream
- DEPTH_IMAGE_TOPIC (sensor_msgs/Image): Depth image
- CAMERA_INFO_TOPIC (sensor_msgs/CameraInfo): Camera intrinsics
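The depth image and camera intrinsics are what allow a 2D detection to be lifted into 3D. This document does not show how the node does it, so the following is only a minimal sketch of the standard pinhole back-projection, assuming depth is given in metres and the intrinsics come from the row-major 3x3 K matrix of sensor_msgs/CameraInfo. The names Intrinsics, intrinsics_from_k, and deproject are hypothetical and are not taken from the node's source.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole intrinsics, as stored in the K matrix of sensor_msgs/CameraInfo."""
    fx: float
    fy: float
    cx: float
    cy: float


def intrinsics_from_k(k):
    """Build Intrinsics from the row-major 3x3 K matrix (9 values).

    K is laid out as [fx, 0, cx, 0, fy, cy, 0, 0, 1].
    """
    return Intrinsics(fx=k[0], fy=k[4], cx=k[2], cy=k[5])


def deproject(u, v, depth_m, intr):
    """Back-project pixel (u, v) with depth in metres to a camera-frame point.

    Assumption: an ideal pinhole camera with no lens distortion.
    """
    x = (u - intr.cx) * depth_m / intr.fx
    y = (v - intr.cy) * depth_m / intr.fy
    return (x, y, depth_m)
```

For example, the centre pixel of a detection's bounding box together with the depth value at that pixel would give one candidate position for the detection pose in the camera frame.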
Published Topics
- ZERO_SHOT_DETECTIONS_TOPIC (frida_interfaces/ObjectDetectionArray): 2D object detections
- ZERO_SHOT_DETECTIONS_POSES_TOPIC (geometry_msgs/PoseArray): Detected object poses
- ZERO_SHOT_DETECTIONS_3D_TOPIC (visualization_msgs/MarkerArray): RViz-compatible markers
- ZERO_SHOT_DETECTIONS_IMAGE_TOPIC (sensor_msgs/Image): Debug image with bounding boxes
Services
SET_DETECTOR_CLASSES_SERVICE
- Request:
- Response:
Parameters
These are defined in the ARGS dictionary and loaded using ROS 2 declare_parameter():
Parameter Name | Type | Default Value | Description |
---|---|---|---|
YOLO_MODEL_PATH | string | yoloe-11l-seg.pt | YOLOv5/YOLOE model to load |
CLASSES | string[] | List of default object classes | Object categories to detect |
USE_ACTIVE_FLAG | bool | False | Enables active detection gating |
DEPTH_ACTIVE | bool | True | Use depth image for 3D info |
MIN_SCORE_THRESH | float | 0.25 | Minimum confidence score |
USE_ZED_TRANSFORM | bool | True | Enables ZED-specific frame alignment |
FLIP_IMAGE | bool | False | Flip input image if required |
VERBOSE | bool | False | Enables debug logs |
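As an illustration of how a dictionary of defaults can be turned into ROS 2 parameters, here is a minimal sketch. The default values are copied from the table above, but the exact layout of the real ARGS dictionary is an assumption, and load_parameters is a hypothetical helper, not part of the node. CLASSES is left out for brevity. The helper relies only on the fact that rclpy's Node.declare_parameter(name, value) returns a parameter object exposing a .value attribute.

```python
# Defaults copied from the parameter table; the dictionary layout is assumed.
ARGS = {
    "YOLO_MODEL_PATH": "yoloe-11l-seg.pt",
    "USE_ACTIVE_FLAG": False,
    "DEPTH_ACTIVE": True,
    "MIN_SCORE_THRESH": 0.25,
    "USE_ZED_TRANSFORM": True,
    "FLIP_IMAGE": False,
    "VERBOSE": False,
}


def load_parameters(node, args):
    """Declare every entry of `args` on `node` and return the resolved values.

    `node` is expected to behave like rclpy.node.Node: calling
    declare_parameter(name, default) returns an object whose `.value`
    holds either the default or an override supplied at launch time.
    """
    return {
        name: node.declare_parameter(name, default).value
        for name, default in args.items()
    }
```

Inside a node's constructor this could be called as `self.params = load_parameters(self, ARGS)`, after which a value such as `self.params["MIN_SCORE_THRESH"]` reflects any override passed on the command line or in a launch file.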
Default Classes
ZERO_SHOT_DEFAULT_CLASSES = [
"bowl", "whiteBaseball", "apple", "cup", "plate",
"yellow_mustard_container", "blue_tuna_can", "soup_can",
"neon_ball", "squash", "banana", "soap", "rubikCube",
"coke_bottle", "yellow_bowl", "fanta_can", "orange"
]
Example Detection Output
Detected: 'apple' at (x=0.45, y=0.63), confidence=0.81
Detected: 'rubikCube' at (x=0.27, y=0.42), confidence=0.74
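The sketch below shows how the MIN_SCORE_THRESH parameter could gate which detections reach this kind of log output. It is illustrative only: the (label, x, y, score) tuple layout and the format_detections helper are hypothetical, and this document does not state whether x and y are normalized image coordinates or metres.

```python
def format_detections(detections, min_score=0.25):
    """Return one log line per detection whose score meets the threshold.

    `detections` is an iterable of (label, x, y, score) tuples; this layout
    is an assumption made for the example, not the node's message type.
    """
    lines = []
    for label, x, y, score in detections:
        if score < min_score:
            continue  # dropped, as MIN_SCORE_THRESH would do
        lines.append(
            f"Detected: '{label}' at (x={x:.2f}, y={y:.2f}), confidence={score:.2f}"
        )
    return lines
```

With the default threshold of 0.25, a detection scored 0.81 is kept and one scored 0.10 is discarded.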
Message Example: SetDetectorClasses
Request
Response
Visualization
The 2D detections and 3D markers can be visualized in RViz:
Launching the Node
Run the node as a standalone executable:
Make sure your topic remappings and camera inputs are correct.
Extending the Detector
This node is built atop the object_detector_node class and leverages YoloEObjectDetector. You can:
- Replace the detection backend
- Update the class list dynamically
- Customize frame transforms and camera parameters