An interactive computer vision application using Hailo's Vision Language Model (VLM) for real-time image analysis and question answering.
- Real-time video processing with Hailo AI acceleration
- Interactive Q&A mode - press Enter (in terminal) to ask questions about the current frame
- Single window display - continuous video feed that freezes during Q&A mode
- Custom prompt support - ask any question about the captured image
- Non-blocking interface - video continues while processing questions
- Hailo AI processor and SDK
- Python >=3.10
- OpenCV
- NumPy
- Hailo Platform libraries
Before running this example, ensure GenAI dependencies are installed:
# From the repository root directory
pip install -e ".[gen-ai]"Use a clean virtual environment before installing the dependencies:
py -m venv wind_venv
.\wind_venv\Scripts\Activate.ps1
pip install .\hailort-<version>-cp<python>-cp<python>-win_amd64.whl
# From the repository root directory
pip install -e ".[gen-ai]"vlm_chat.py- Main application with interactive video processingbackend.py- Hailo VLM backend with multiprocessing support
Run the application:
python -m hailo_apps.python.gen_ai_apps.vlm_chat.vlm_chat --input usb
Or for Raspberry Pi camera:
python -m hailo_apps.python.gen_ai_apps.vlm_chat.vlm_chat --input rpi
Note: This application requires a live camera input.
The application will show a video window:
- Video: Continuous live camera feed (or captured frame when in Q&A mode)
Interactive mode:
- Press Enter (in terminal) to capture the current frame and enter Q&A mode
- Type your question about the captured image (or press Enter for the default prompt)
- Press Enter to submit the question and get the VLM response
- Press Enter again to continue normal processing
- Press q (in the video window) to quit
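The Enter-driven flow above amounts to a small state machine. Here is a minimal sketch of those transitions; the function name and state labels are illustrative, not the app's actual API:

```python
# Sketch of the interactive Q&A state machine described above.
# States: "live" (streaming), "qa" (frame frozen, awaiting question),
# "answering" (VLM response shown), "stopped" (quit).

def next_state(state: str, event: str) -> str:
    """Advance the state machine on a terminal Enter press or 'q'."""
    if event == "quit":                 # 'q' pressed in the video window
        return "stopped"
    if state == "live" and event == "enter":
        return "qa"                     # freeze the frame, await a question
    if state == "qa" and event == "enter":
        return "answering"              # submit the question to the VLM
    if state == "answering" and event == "enter":
        return "live"                   # resume the live feed
    return state                        # ignore anything else
```

The video loop keeps rendering in every state, which is what makes the interface non-blocking: only the displayed frame changes, never the loop itself.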
You can modify the following constants in vlm_chat.py to customize the application behavior:
- MAX_TOKENS (Default: 200) - Maximum number of tokens to generate in the response.
- TEMPERATURE (Default: 0.1) - Sampling temperature for the model (lower means more deterministic).
- SEED (Default: 42) - Random seed for reproducibility.
- SYSTEM_PROMPT - The system prompt used to guide the VLM's behavior.
- INFERENCE_TIMEOUT (Default: 60) - Timeout in seconds for VLM inference.
- SAVE_FRAMES (Default: False) - Set to True to save captured frames to disk.
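In vlm_chat.py these constants might look roughly like the following (the values match the defaults listed above; the SYSTEM_PROMPT text here is an illustrative placeholder, not the app's actual prompt):

```python
# Tunable constants for vlm_chat.py (defaults as documented above).
MAX_TOKENS = 200        # cap on tokens generated per response
TEMPERATURE = 0.1       # low temperature -> near-deterministic sampling
SEED = 42               # fixed seed for reproducible outputs
SYSTEM_PROMPT = "You describe what is visible in the captured frame."  # placeholder
INFERENCE_TIMEOUT = 60  # seconds to wait for the VLM before giving up
SAVE_FRAMES = False     # set True to write each captured frame to disk
```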
The application uses a multiprocessing architecture to handle:
- Real-time video capture and display
- Hailo VLM inference in a separate process
- Non-blocking user input handling
- State management for interactive mode
The VLM can answer questions about objects, scenes, activities, and any visual content in the captured frames.