[AAAI 2025] Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
Wenyi Xiao1*,
Ziwei Huang1*,
Leilei Gan1†,
Wanggui He2,
Haoyuan Li2,
Zhelun Yu2,
Fangxun Shu2,
Hao Jiang2,
Linchao Zhu1
1 Zhejiang University 2 Alibaba Group
*Equal contribution †Corresponding author
This repository contains the official implementation of the paper "Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback". For supplementary experiments and details, please refer to the Appendix.
git clone https://github.com/Mr-Loevan/HSA-DPO.git
cd HSA-DPO
# Install HSA-DPO and dependencies
conda create -n hsa_dpo python=3.9
conda activate hsa_dpo
pip install -e .
# (Optional) Install flash-attention for faster training
pip install -e ".[flash-attn]"pip install -U huggingface_hub
# Download all dataset files
huggingface-cli download --repo-type dataset WenyiXiao/HSA-DPO --local-dir ./datasets

For hallucination detection:
- Training data: hsa_dpo_detection.jsonl
- Images: from Visual Genome
For hallucination mitigation (HSA-DPO training):
- Preference data: hsa_dpo_preference_llava1dot5.jsonl
- Images: hsa_dpo_imgs.tar.gz
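As a quick sanity check after downloading, the sketch below (illustrative, not part of the repository) prints the keys of the first detection record; the file path assumes the --local-dir ./datasets layout from the download command above.

import json

# Peek at the schema of the detection training data.
with open("datasets/hsa_dpo_detection.jsonl") as f:
    first = json.loads(f.readline())
print(sorted(first.keys()))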
# 1. Create data directories
mkdir -p hsa_dpo/data
mkdir -p hsa_dpo/data/image
# 2. Copy preference dataset
cp datasets/hsa_dpo_preference_llava1dot5.jsonl hsa_dpo/data/
# 3. Extract images
tar -xzf datasets/hsa_dpo_imgs.tar.gz -C hsa_dpo/data/image/
# 4. Verify data structure
ls hsa_dpo/data/
# Should show: hsa_dpo_preference_llava1dot5.jsonl  image
ls hsa_dpo/data/image/ | head -5
# Should show: 0.jpg, 1.jpg, 2.jpg, 3.jpg, 4.jpg ...

Note: The images are named with sequential IDs (0.jpg, 1.jpg, ...) corresponding to the id field in the JSONL file.
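A minimal sketch (not part of the repository) that uses this naming convention to verify that every record's image was extracted; only the id field from the note above is assumed to exist in each record.

import json
import os

data_path = "hsa_dpo/data/hsa_dpo_preference_llava1dot5.jsonl"
image_dir = "hsa_dpo/data/image"

# Each record's id should match an image file named "<id>.jpg".
with open(data_path) as f:
    records = [json.loads(line) for line in f]
missing = [r["id"] for r in records
           if not os.path.exists(os.path.join(image_dir, f"{r['id']}.jpg"))]
print(f"{len(records)} records, {len(missing)} missing images")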
- Install the HSA-DPO package:
pip install -e .
- Prepare the dataset following the instructions above (see the Dataset section)
- Download the base LLaVA-v1.5 model:
# Download LLaVA-v1.5-13B model
huggingface-cli download liuhaotian/llava-v1.5-13b --local-dir ./models/llava-v1.5-13b
# The CLIP vision encoder will be auto-downloaded during training
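Equivalently, the base model can be fetched from Python with huggingface_hub (same repo and target directory as the CLI command above):

from huggingface_hub import snapshot_download

# Same download as the huggingface-cli command above.
snapshot_download(repo_id="liuhaotian/llava-v1.5-13b",
                  local_dir="./models/llava-v1.5-13b")

We provide a training script for HSA-DPO with LLaVA-v1.5: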
# Configure paths in hsa_dpo_train.sh
vim hsa_dpo_train.sh
# Update these paths according to your setup:
# DATA_PATH="./hsa_dpo/data/hsa_dpo_preference_llava1dot5.jsonl"
# IMAGE_FOLDER="./hsa_dpo/data/image"
# MODEL_PATH="path/to/llava-v1.5-13b"
# OUTPUT_DIR="./output/hsa_dpo_llava"
# Run training
bash hsa_dpo_train.sh

Key DPO training arguments (a loss sketch follows this list):
- use_chosen_score: Whether to use chosen scores in the DPO loss (default: False)
- use_rejected_score: Whether to use rejected scores in the DPO loss (default: True)
- beta: Temperature parameter for the DPO loss (default: 0.1)
- num_train_epochs: Number of training epochs (default: 2)
- per_device_train_batch_size: Batch size per GPU (default: 8)
- learning_rate: Learning rate (default: 2e-6)
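To make the score flags concrete, here is a minimal sketch of a DPO loss in which per-pair scores can rescale each side of the preference margin. This illustrates how such flags might act, it is not the repository's implementation, and the tensor names are ours.

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, chosen_score=None, rejected_score=None,
             use_chosen_score=False, use_rejected_score=True):
    # Log-ratios of the policy vs. the frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # (Assumption) the flags let per-pair scores rescale the log-ratios.
    if use_chosen_score and chosen_score is not None:
        chosen_ratio = chosen_score * chosen_ratio
    if use_rejected_score and rejected_score is not None:
        rejected_ratio = rejected_score * rejected_ratio
    # Standard DPO objective: -log sigmoid of the beta-scaled margin.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()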
The script supports multi-GPU training with DeepSpeed. Adjust NUM_GPUS in the script:
NUM_GPUS=2 # Use 2 GPUs
bash hsa_dpo_train.sh

Download the released HSA-DPO checkpoints from ModelScope:
pip install -U modelscope
modelscope download --model xiaowenyi/HSA-DPO --local-dir ./checkpoints

We provide a simple inference script to test the model:
# Run inference (LLaVA should already be installed from Installation step)
python inference/inference_example.py \
--model-base path/to/llava-v1.5-13b \
--lora-path ./checkpoints/HSA-DPO_llava_v1.5-13B-lora \
--image path/to/image.jpg \
--prompt "Describe this image in detail."If you find this work useful, we would appreciate it if you could cite our paper:
@article{xiao2025hsa_dpo,
title = {Detecting and Mitigating Hallucination in Large Vision Language Models
via Fine-Grained AI Feedback},
author = {Xiao, Wenyi and Huang, Ziwei and Gan, Leilei and He, Wanggui and
Li, Haoyuan and Yu, Zhelun and Shu, Fangxun and Jiang, Hao and
Zhu, Linchao},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {39},
number = {24},
pages = {25543--25551},
year = {2025},
month = {Apr},
url = {https://ojs.aaai.org/index.php/AAAI/article/view/34744},
doi = {10.1609/aaai.v39i24.34744}
}