
[AAAI 2025] Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Wenyi Xiao1* , Ziwei Huang1* , Leilei Gan1† , Wanggui He2
Haoyuan Li2 , Zhelun Yu2 , Fangxun Shu2 , Hao Jiang2 , Linchao Zhu1
1 Zhejiang University      2 Alibaba Group      
*Equal contribution        †Corresponding author


Overview

This repository contains the official implementation of the paper "Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback". For supplementary experiments and details, please refer to the Appendix.



Installation

git clone https://github.com/Mr-Loevan/HSA-DPO.git
cd HSA-DPO

# Install HSA-DPO and dependencies
conda create -n hsa_dpo python==3.9
conda activate hsa_dpo
pip install -e .

# (Optional) Install flash-attention for faster training
pip install -e ".[flash-attn]"

Dataset

Download Dataset

pip install -U huggingface_hub

# Download all dataset files
huggingface-cli download --repo-type dataset WenyiXiao/HSA-DPO --local-dir ./datasets

Dataset Organization

For hallucination detection:

  • Training data: hsa_dpo_detection.jsonl
  • Images: From Visual Genome

For hallucination mitigation (HSA-DPO training):

  • Preference data: hsa_dpo_preference_llava1dot5.jsonl
  • Images: hsa_dpo_imgs.tar.gz

Prepare Data for Training

# 1. Create data directories
mkdir -p hsa_dpo/data
mkdir -p hsa_dpo/data/image

# 2. Copy preference dataset
cp datasets/hsa_dpo_preference_llava1dot5.jsonl hsa_dpo/data/

# 3. Extract images
tar -xzf datasets/hsa_dpo_imgs.tar.gz -C hsa_dpo/data/image/

# 4. Verify data structure
ls hsa_dpo/data/
# Should show: hsa_dpo_preference_llava1dot5.jsonl  image

ls hsa_dpo/data/image/ | head -5
# Should show: 0.jpg, 1.jpg, 2.jpg, 3.jpg, 4.jpg ...

Note: The images are named with sequential IDs (0.jpg, 1.jpg, ...) corresponding to the id field in the JSONL file.
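Before launching training, it is worth confirming that every record in the JSONL has a matching image. The sketch below (our own helper, not part of the repository; it assumes each record carries an `id` field as described above) checks the id-to-filename mapping on a synthetic example:

```python
import json
import os
import tempfile

def check_image_coverage(jsonl_path, image_dir):
    """Return ids from the JSONL that have no matching <id>.jpg in image_dir."""
    missing = []
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            image_file = os.path.join(image_dir, f"{record['id']}.jpg")
            if not os.path.exists(image_file):
                missing.append(record["id"])
    return missing

# Demo on synthetic data mirroring the layout above.
tmp = tempfile.mkdtemp()
image_dir = os.path.join(tmp, "image")
os.makedirs(image_dir)
jsonl_path = os.path.join(tmp, "pref.jsonl")
with open(jsonl_path, "w") as f:
    for i in range(3):
        f.write(json.dumps({"id": i}) + "\n")
open(os.path.join(image_dir, "0.jpg"), "wb").close()
open(os.path.join(image_dir, "1.jpg"), "wb").close()

missing = check_image_coverage(jsonl_path, image_dir)
print(missing)  # record 2 has no image file
```

Run the same check against `hsa_dpo/data/hsa_dpo_preference_llava1dot5.jsonl` and `hsa_dpo/data/image/` after extraction; an empty result means the data is ready.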

Training

Prerequisites

  1. Install the HSA-DPO package:

pip install -e .

  2. Prepare the dataset following the instructions in the Dataset section above.

  3. Download the base LLaVA-v1.5 model:

# Download LLaVA-v1.5-13B model
huggingface-cli download liuhaotian/llava-v1.5-13b --local-dir ./models/llava-v1.5-13b

# The CLIP vision encoder will be auto-downloaded during training

Running Training

We provide a training script for HSA-DPO with LLaVA-v1.5:

# Configure paths in hsa_dpo_train.sh
vim hsa_dpo_train.sh

# Update these paths according to your setup:
# DATA_PATH="./hsa_dpo/data/hsa_dpo_preference_llava1dot5.jsonl"
# IMAGE_FOLDER="./hsa_dpo/data/image"
# MODEL_PATH="path/to/llava-v1.5-13b"
# OUTPUT_DIR="./output/hsa_dpo_llava"

# Run training
bash hsa_dpo_train.sh

Key Parameters

  • use_chosen_score: Whether to use chosen scores in DPO loss (default: False)
  • use_rejected_score: Whether to use rejected scores in DPO loss (default: True)
  • beta: Temperature parameter for DPO loss (default: 0.1)
  • num_train_epochs: Number of training epochs (default: 2)
  • per_device_train_batch_size: Batch size per GPU (default: 8)
  • learning_rate: Learning rate (default: 2e-6)
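For reference, `beta` enters the loss as the temperature of the standard DPO objective. The sketch below shows only the vanilla DPO term; the `use_chosen_score`/`use_rejected_score` flags additionally weight this objective with hallucination severity scores, which is specific to HSA-DPO and not reproduced here:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Vanilla DPO loss for one preference pair:
    -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# The loss shrinks as the policy prefers the chosen response more strongly
# than the reference model does, relative to the rejected response.
loss_neutral = dpo_loss(0.0, 0.0, 0.0, 0.0)  # log(2) at initialization
loss_better = dpo_loss(1.0, -1.0, 0.0, 0.0)  # policy favors chosen -> lower loss
print(loss_neutral, loss_better)
```

A larger `beta` sharpens the preference margin the policy is pushed toward; the default of 0.1 is the common choice in DPO-style training.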

Multi-GPU Training

The script supports multi-GPU training with DeepSpeed. Adjust NUM_GPUS in the script:

NUM_GPUS=2  # Use 2 GPUs
bash hsa_dpo_train.sh
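DeepSpeed reads its settings from a JSON config file referenced by the training script. A minimal ZeRO stage-2 sketch is shown below; the exact file name and values used by `hsa_dpo_train.sh` may differ, so treat this only as a starting point and match it to the paths in the script:

```json
{
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 1,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```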

Evaluation

Download Model Weights

pip install -U modelscope
modelscope download --model xiaowenyi/HSA-DPO --local-dir ./checkpoints

Run Inference

We provide a simple inference script to test the model:

# Run inference (LLaVA should already be installed from Installation step)
python inference/inference_example.py \
    --model-base path/to/llava-v1.5-13b \
    --lora-path ./checkpoints/HSA-DPO_llava_v1.5-13B-lora \
    --image path/to/image.jpg \
    --prompt "Describe this image in detail."

Citation

If you find this work useful, please consider citing our paper:

@article{xiao2025hsa_dpo,
  title     = {Detecting and Mitigating Hallucination in Large Vision Language Models 
               via Fine-Grained AI Feedback},
  author    = {Xiao, Wenyi and Huang, Ziwei and Gan, Leilei and He, Wanggui and 
               Li, Haoyuan and Yu, Zhelun and Shu, Fangxun and Jiang, Hao and 
               Zhu, Linchao},
  journal   = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {39},
  number    = {24},
  pages     = {25543--25551},
  year      = {2025},
  month     = {Apr},
  url       = {https://ojs.aaai.org/index.php/AAAI/article/view/34744},
  doi       = {10.1609/aaai.v39i24.34744}
}
