This repository provides the official implementation for few-shot learning-based bearing fault diagnosis using:
- Multimodal Large Language Models (MLLMs):
  - GPT-4o (OpenAI)
  - GPT-5.1 (OpenAI)
  - Claude Haiku 4.5 (Anthropic)
  - Claude Sonnet 4.5 (Anthropic)
  - LLaVA-1.5-7B (open-source, HuggingFace: `liuhaotian/llava-v1.5-7b`)
- Prototypical Networks (baseline):
  - ResNet-50 backbone (pretrained on ImageNet)
  - Swin Transformer V2-T backbone (pretrained on ImageNet)
- Task: 4-way classification of bearing health conditions:
  - H: Healthy machine
  - IR: Inner race fault
  - OR: Outer race fault
  - B: Rolling element (ball) fault
- Input: Continuous Wavelet Transform (CWT) images of vibration signals
  - Envelope analysis with a 1400-2800 Hz band-pass filter
  - Morse wavelet with 24 voices per octave
  - 300x300 pixel images
- Few-Shot Configurations: 1-shot, 5-shot, and 10-shot learning
- Evaluation: 10 repetitions with Student's t 95% confidence intervals
See QUICKSTART.md for detailed installation and usage instructions.
- Python 3.8+
- (Optional) CUDA-compatible GPU for LLaVA local inference
# Clone repository
git clone https://github.com/LGDiMaggio/few-shot-fault-diagnosis-multimodal-LLM.git
cd few-shot-fault-diagnosis-multimodal-LLM
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txt
Copy .env.example to .env:
cp .env.example .env
Add your API keys to .env:

OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
Prepare your dataset:
- You must provide your own bearing vibration data
- Generate CWT images (see Data Format section)
- Update `config.yaml` with your data path
# Evaluate MLLMs
python evaluate_models.py
# Evaluate Prototypical Networks (ResNet-50)
python evaluate_prototypical.py --model resnet50 --n-shots 1 5 10
# Evaluate Prototypical Networks (Swin Transformer)
python evaluate_prototypical.py --model swin_v2_t --n-shots 1 5 10

The code expects CWT images organized in a single directory with filenames following this convention:
{Condition}_{RPM}rpm_{FR}kN_{FA}kN_{Index}.png
Example:
H_607rpm_124.8kN_0kN_1.png
IR_607rpm_124.8kN_0kN_1.png
OR_1214rpm_124.8kN_0kN_5.png
B_1821rpm_124.8kN_0kN_12.png
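For reference, the convention can be parsed with a small helper like the following (a hypothetical sketch; `parse_cwt_filename` is not part of the repository):

```python
import re

# Regex for {Condition}_{RPM}rpm_{FR}kN_{FA}kN_{Index}.png
PATTERN = re.compile(
    r"^(?P<condition>H|IR|OR|B)_(?P<rpm>\d+)rpm_"
    r"(?P<fr>[\d.]+)kN_(?P<fa>[\d.]+)kN_(?P<index>\d+)\.png$"
)

def parse_cwt_filename(name):
    """Split a CWT image filename into condition label and load metadata."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    d = m.groupdict()
    return (d["condition"], int(d["rpm"]),
            float(d["fr"]), float(d["fa"]), int(d["index"]))

print(parse_cwt_filename("OR_1214rpm_124.8kN_0kN_5.png"))
# → ('OR', 1214, 124.8, 0.0, 5)
```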
- Envelope Analysis:
  - Band-pass filter: 1400-2800 Hz
  - Extract vibration envelope from raw signal
- Continuous Wavelet Transform:
  - Wavelet: Morse wavelet
  - Frequency resolution: 24 voices per octave
  - Output: Time-frequency representation (300x300 pixels)
- Image Encoding:
  - Save as PNG format
  - RGB or grayscale
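A minimal numpy sketch of the envelope step, assuming an ideal FFT-based band-pass and a Hilbert-transform envelope (the repository's actual filter implementation may differ):

```python
import numpy as np

def bandpass_envelope(x, fs, f_lo=1400.0, f_hi=2800.0):
    """Ideal FFT band-pass (1400-2800 Hz), then the Hilbert envelope."""
    n = len(x)
    freqs = np.abs(np.fft.fftfreq(n, d=1.0 / fs))
    X = np.fft.fft(x)
    X[(freqs < f_lo) | (freqs > f_hi)] = 0.0   # zero out-of-band bins
    h = np.zeros(n)                            # Hilbert-transform multiplier
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0                    # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0                        # keep Nyquist bin
    return np.abs(np.fft.ifft(X * h))          # |analytic signal| = envelope

fs = 20_000
t = np.arange(fs) / fs                                          # 1 s of signal
x = np.sin(2 * np.pi * 2000 * t) + np.sin(2 * np.pi * 500 * t)  # in-band + out-of-band
env = bandpass_envelope(x, fs)                 # 500 Hz tone is rejected
```

The CWT itself (Morse wavelet, 24 voices per octave) would then be applied to this envelope with a wavelet toolbox before rendering the 300x300 PNG.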
- Collect your own bearing vibration signals
- Apply the CWT preprocessing described above
- Place images in the `data/cwt_images/` directory
- Update `config.yaml` with appropriate file naming patterns
All MLLM implementations use vision-enabled models with few-shot prompting:
| Model | Provider | API Model ID | Type |
|---|---|---|---|
| GPT-4o | OpenAI | gpt-4o-2024-08-06 | Cloud API |
| GPT-5.1 | OpenAI | gpt-5.1-2025-11-13 | Cloud API |
| Claude Haiku 4.5 | Anthropic | claude-haiku-4-5-20251001 | Cloud API |
| Claude Sonnet 4.5 | Anthropic | claude-sonnet-4-5-20250929 | Cloud API |
| LLaVA-1.5-7B | Open | liuhaotian/llava-v1.5-7b | Local (HuggingFace) |
LLaVA Note: The first run downloads the ~13 GB model from HuggingFace; a GPU is required for practical inference.
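As an illustration of few-shot prompting with images, the sketch below builds an OpenAI-style chat payload in which each support image is followed by its class label and the query image comes last; the function names and prompt wording are assumptions, not the repository's exact prompts:

```python
import base64

def image_part(png_bytes):
    """Wrap PNG bytes as an OpenAI chat-completions image content part."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def build_messages(support, query_png):
    """support: list of (label, png_bytes) examples; query_png: bytes."""
    content = [{"type": "text",
                "text": "Classify bearing CWT images as H, IR, OR, or B."}]
    for label, png in support:                 # labeled few-shot examples
        content.append(image_part(png))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(query_png))      # unlabeled query image
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]

# Placeholder bytes stand in for real CWT PNGs
msgs = build_messages([("H", b"\x89PNG..."), ("IR", b"\x89PNG...")], b"\x89PNG...")
```

The same support-then-query ordering carries over to the Anthropic and LLaVA interfaces, with each provider's own message schema.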
Traditional few-shot learning baseline using:
- easyfsl library for episodic training
- Pretrained feature extractors (ImageNet):
- ResNet-50: Deep residual network
- Swin Transformer V2-T: Vision transformer
Method: Compute class prototypes as mean of support embeddings, classify via Euclidean distance.
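The method above can be sketched in a few lines of numpy (placeholder embeddings; the repository uses easyfsl with the deep backbones listed):

```python
import numpy as np

def prototypical_predict(support, support_labels, query, n_way=4):
    """Prototype = mean of each class's support embeddings; nearest wins."""
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in range(n_way)]
    )
    # Squared Euclidean distance from every query to every prototype
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
centers = 10.0 * np.eye(4, 8)                        # 4 well-separated "classes"
support = np.repeat(centers, 5, axis=0) + 0.1 * rng.standard_normal((20, 8))
labels = np.repeat(np.arange(4), 5)                  # a 4-way, 5-shot episode
query = centers + 0.1 * rng.standard_normal((4, 8))
print(prototypical_predict(support, labels, query))  # → [0 1 2 3]
```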
.
├── README.md                  # This file
├── QUICKSTART.md              # Quick start guide
├── LICENSE                    # MIT License
├── requirements.txt           # Python dependencies
├── config.yaml                # Experiment configuration
├── .env.example               # API key template
├── .gitignore                 # Git ignore rules
│
├── utils/                     # Core utilities
│   ├── __init__.py
│   ├── models.py              # MLLM interfaces (OpenAI, Anthropic, LLaVA)
│   ├── prompts.py             # Few-shot prompt construction
│   ├── data_loader.py         # CWT image loading
│   └── metrics.py             # Evaluation metrics
│
├── evaluate_models.py         # Main MLLM evaluation script
├── evaluate_prototypical.py   # Prototypical Networks evaluation
│
└── results/                   # Output directory (created on run)
    └── *.xlsx                 # Per-model results
Edit config.yaml to customize:
- `experiment.n_shot_configs`: Few-shot values (e.g., [1, 5, 10])
- `experiment.n_repetitions`: Number of repetitions (default: 10)
- `experiment.prompt_style`: "concise" or "detailed"
- `dataset.folder_path`: Path to your CWT images
- `models[].enabled`: Enable/disable specific models
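A hypothetical `config.yaml` fragment matching these options (any keys and values beyond the ones listed above, such as the model names, are assumptions):

```yaml
experiment:
  n_shot_configs: [1, 5, 10]   # few-shot settings to evaluate
  n_repetitions: 10            # repetitions per setting
  prompt_style: "detailed"     # or "concise"

dataset:
  folder_path: "data/cwt_images"

models:
  - name: "gpt-4o"             # illustrative entries
    enabled: true
  - name: "llava-1.5-7b"
    enabled: false
```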
Results are saved as Excel files in results/:
- Per-repetition metrics: Accuracy, Precision, Recall, F1, Time
- Summary statistics: Mean, Std, 95% CI (Student's t distribution)
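The reported intervals follow the standard formula mean ± t · s / √n; a minimal sketch with hypothetical accuracy scores (t critical value ≈ 2.262 for df = 9, i.e., 10 repetitions):

```python
import math
import statistics

def t_confidence_interval(scores, t_crit=2.262):
    """Mean and 95% half-width; t_crit = t_{0.975, df=9} for n = 10 runs."""
    n = len(scores)
    mean = statistics.fmean(scores)
    half_width = t_crit * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width

# Hypothetical per-repetition accuracies, not results from the paper
accuracies = [0.81, 0.79, 0.84, 0.80, 0.78, 0.83, 0.82, 0.80, 0.79, 0.81]
mean, half_width = t_confidence_interval(accuracies)
print(f"accuracy = {mean:.3f} ± {half_width:.3f}")
```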
Example output:
results/
├── gpt-4o_detailed_1-shot_all_speeds.xlsx
├── gpt-4o_detailed_5-shot_all_speeds.xlsx
├── claude-sonnet-4.5_detailed_10-shot_all_speeds.xlsx
└── ...
If you use this code in your research, please cite:
@software{dimaggio2026fewshot,
author = {Di Maggio, Luigi Gianpio},
title = {Few-Shot Bearing Fault Diagnosis with Multimodal
LLMs and Prototypical Networks},
month = jan,
year = 2026,
publisher = {Zenodo},
version = {v1.0.0},
doi = {10.5281/zenodo.18376905},
url = {https://doi.org/10.5281/zenodo.18376905}
}

This project is licensed under the MIT License - see LICENSE for details.
- LLaVA: Liu et al., "Visual Instruction Tuning" (NeurIPS 2023)
  - HuggingFace model: liuhaotian/llava-v1.5-7b
- Prototypical Networks: Snell et al., "Prototypical Networks for Few-shot Learning" (NeurIPS 2017)
- easyfsl: Few-Shot Learning library by Sicara
Contributions are welcome! Please open an issue or pull request.
This repository was prepared with AI support to facilitate academic reproducibility. All code derives from the original research implementation used in the paper.
Users must provide their own bearing vibration datasets. No original data is included in this repository.
For questions or collaboration:
- Luigi Gianpio Di Maggio: luigi.dimaggio@polito.it
Affiliation: Politecnico di Torino, Department of Mechanical and Aerospace Engineering