Hailo Model Zoo GenAI


The Hailo Model Zoo GenAI is a curated collection of pre-trained models and example applications optimized for Hailo's AI processors, designed to accelerate GenAI application development. It includes Hailo-Ollama, an Ollama-compatible API written in C++ on top of HailoRT, enabling seamless integration with various external tools and frameworks.

Ollama simplifies running large language models locally by managing model downloads, deployments, and interactions through a convenient REST API.

Models are specifically optimized for Hailo hardware, providing efficient, high-performance inference tailored for GenAI tasks.

Models

For a detailed list of supported models, including download links and relevant information, visit the models page.

Installation

Prerequisites

  • Hailo-10H module.
  • Ensure HailoRT is installed.
  • Supported OS: Linux, Windows.

Two installation methods are available:

  1. Pre-built Debian package (recommended for Ubuntu):
  • Download the latest Debian package from the Developer Zone.

  • Install it:

    sudo dpkg -i hailo_gen_ai_model_zoo_<ver>_<arch>.deb
  2. Build from source (alternative):
  • Linux: Clone the repository, build and install the Hailo-Ollama server:

    git clone https://github.com/hailo-ai/hailo_model_zoo_genai.git
    cd hailo_model_zoo_genai
    cmake -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release
    cmake --install build
    • Note: On Linux, sudo may be required for the install step.
  • Windows: Clone the repository and build using CMake (Ensure OpenSSL and HailoRT are available):

    git clone https://github.com/hailo-ai/hailo_model_zoo_genai.git
    cd hailo_model_zoo_genai
    cmake -B build -DCMAKE_PREFIX_PATH="C:/Path/To/HailoRT/cmake"
    cmake --build build --config Release
    cmake --install build --config Release
    

Basic Usage

  • Start the Hailo-Ollama server:

    hailo-ollama
  • List available models:

    curl --silent http://localhost:8000/hailo/v1/list
  • Pull a specific model. For example:

    curl --silent http://localhost:8000/api/pull \
         -H 'Content-Type: application/json' \
         -d '{ "model": "qwen2:1.5b", "stream" : true }'

    Windows (CMD): Use double quotes with escaping:

    curl --silent http://localhost:8000/api/pull ^
         -H "Content-Type: application/json" ^
         -d "{ \"model\": \"qwen2:1.5b\", \"stream\" : true }"
  • Pull all available models:

    curl --silent http://localhost:8000/hailo/v1/list \
    | jq -r '.models[]' \
    | while read model; do
        echo "Pulling $model..."
        curl --no-buffer --silent http://localhost:8000/api/pull \
            -H 'Content-Type: application/json' \
            -d "{\"model\": \"$model\", \"stream\": true}"
      done
  • Chat with the model:

    curl --silent http://localhost:8000/api/chat \
         -H 'Content-Type: application/json' \
         -d '{"model": "qwen2:1.5b", "messages": [{"role": "user", "content": "Tell me a joke"}]}'

    Windows (CMD): Use double quotes with escaping:

    curl --silent http://localhost:8000/api/chat ^
         -H "Content-Type: application/json" ^
         -d "{\"model\": \"qwen2:1.5b\", \"messages\": [{\"role\": \"user\", \"content\": \"Tell me a joke\"}]}"
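The chat response is JSON. Assuming the Ollama-style schema, where the assistant's reply appears under .message.content when "stream": false is set in the request, the text can be extracted with jq (which the pull-all example above already relies on). The sample response below is illustrative only:

```shell
# Illustrative only: parse a non-streaming chat response with jq.
# Assumes the Ollama /api/chat response format, where the assistant's
# reply sits under .message.content.
response='{"model":"qwen2:1.5b","message":{"role":"assistant","content":"Why did the scarecrow win an award?"},"done":true}'
echo "$response" | jq -r '.message.content'
```

In practice, $response would come from the curl call above with "stream": false added to the request body.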

Running Alongside Other Hailo Applications

The Hailo-Ollama server can share the Hailo device with other HailoRT-based applications (e.g., Whisper, vision models) by setting the HAILO_OLLAMA_VDEVICE_GROUP_ID environment variable:

HAILO_OLLAMA_VDEVICE_GROUP_ID=SHARED hailo-ollama

Other applications should use the same group ID to share the device. See USAGE for details.

Optional Open WebUI

Open WebUI is an optional, user-friendly web interface for chatting with Hailo LLMs. It offers a convenient alternative to command-line interaction, letting users work with AI models through an intuitive browser-based interface.

Benefits of using Open WebUI include:

  • Easy-to-use interface: No need to use curl commands or write custom scripts - simply access the web interface through your browser
  • Visual model management: Browse and select available LLM or VLM models from a user-friendly interface
  • Conversation history: Keep track of your chat sessions and conversation history
  • Multi-model support: Easily switch between different models without changing commands
  • Accessibility: Access your Hailo-Ollama server from any device with a web browser on the same network

Open WebUI can be installed and run as a separate Docker container alongside the Hailo Model Zoo GenAI installation. It connects to the Hailo-Ollama server and provides a web-based interface for interacting with GenAI models running on Hailo devices. Pull any of the available LLM or VLM models through the Hailo-Ollama server and start chatting with an AI through the web interface.

Example of running the Hailo-Ollama server with Open WebUI:

  • Start the Hailo-Ollama server:

    hailo-ollama
  • In a separate terminal, install and run Open WebUI using Docker:

    docker run -d --net=host \
        -e OLLAMA_BASE_URL=http://127.0.0.1:8000 \
        -v open-webui:/app/backend/data \
        --name open-webui --restart always \
        ghcr.io/open-webui/open-webui:main
  • Access the WebUI at http://localhost:8080.

Note

The --net=host option is required to allow the Open WebUI Docker container to access the Hailo-Ollama server running on the host machine at localhost:8000. Without this option, the container would have its own network namespace and wouldn't be able to reach services on the host via localhost.

The Open WebUI Docker command assumes the Hailo-Ollama server is running on port 8000 (the default). Adjust the OLLAMA_BASE_URL environment variable if your Hailo-Ollama server is running on a different port or host.
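For instance, if the server were listening on a non-default port (9000 here is purely hypothetical), only the base URL would change:

```shell
# Hypothetical: Hailo-Ollama listening on port 9000 instead of the
# default 8000; point OLLAMA_BASE_URL at that port.
docker run -d --net=host \
    -e OLLAMA_BASE_URL=http://127.0.0.1:9000 \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main
```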

For detailed usage instructions and advanced examples, see the USAGE page.

Changelog

See the CHANGELOG page for detailed release notes.

License

The Hailo Model Zoo GenAI is distributed under the MIT license. Refer to the LICENSE file for details.

Support

For support, please post your question on the Hailo community Forum or contact us directly via hailo.ai.

About Hailo

Hailo provides innovative AI Inference Accelerators and AI Vision Processors specifically engineered for efficient, high-performance embedded deep learning applications on edge devices.

Hailo's AI Inference Accelerators enable edge devices to execute deep learning applications at full scale, leveraging architectures optimized for neural network operations. The Hailo AI Vision Processors (SoC) integrate powerful AI inferencing with advanced computer vision, delivering superior image quality and sophisticated video analytics.

For more information, visit hailo.ai.
