BDI-Kit is a library that assists users in performing data harmonization. It provides state-of-the-art tools to streamline the process of integrating and transforming disparate datasets (with a focus on biomedical data), and includes APIs for performing tasks such as:
- Schema matching
- Value matching
- Data transformation to a target schema or data model
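To make the three tasks concrete, here is a minimal pure-Python sketch. It does **not** use BDI-Kit or its API. It assumes plain string similarity (`difflib` from the standard library) as a stand-in for BDI-Kit's matching methods. The functions `toy_match_columns`, `toy_match_values` and `toy_transform`, and the column and value names, are hypothetical and chosen only for illustration. Consult the documentation for the library's actual functions.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical toy helpers: NOT part of BDI-Kit, only an illustration of the tasks.

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def toy_match_columns(source_cols, target_cols):
    """Schema matching: map each source column to the most similar target column name."""
    return {s: max(target_cols, key=lambda t: similarity(s, t)) for s in source_cols}

def toy_match_values(source_values, target_values, cutoff=0.6):
    """Value matching: map each source value to the closest target value, or None."""
    lowered = {t.lower(): t for t in target_values}  # lowercase key -> canonical value
    mapping = {}
    for v in source_values:
        hits = get_close_matches(v.lower(), list(lowered), n=1, cutoff=cutoff)
        mapping[v] = lowered[hits[0]] if hits else None
    return mapping

def toy_transform(rows, column_map, value_maps):
    """Transformation: rename columns and replace values using the matches found."""
    out = []
    for row in rows:
        new_row = {}
        for src_col, value in row.items():
            tgt_col = column_map[src_col]
            mapping = value_maps.get(tgt_col)
            # Columns without a value mapping keep their original values.
            new_row[tgt_col] = mapping.get(value, value) if mapping is not None else value
        out.append(new_row)
    return out

# Hypothetical source data and target schema.
source_rows = [
    {"Gender": "Female", "Age_Diagnosis": 54},
    {"Gender": "MALE", "Age_Diagnosis": 61},
    {"Gender": "femal", "Age_Diagnosis": 47},
    {"Gender": "n/a", "Age_Diagnosis": 70},
]
target_columns = ["gender", "age_at_diagnosis", "vital_status"]
target_gender_values = ["female", "male", "unknown"]

column_map = toy_match_columns(list(source_rows[0]), target_columns)
gender_values = sorted({row["Gender"] for row in source_rows})
value_maps = {"gender": toy_match_values(gender_values, target_gender_values)}
harmonized = toy_transform(source_rows, column_map, value_maps)

print(column_map)  # {'Gender': 'gender', 'Age_Diagnosis': 'age_at_diagnosis'}
print(harmonized)
```

Name similarity alone is a weak signal. Here it maps the misspelled `"femal"` to `"female"` but leaves `"n/a"` unmatched (`None`). Richer matchers, such as those BDI-Kit provides, exist to handle cases that this toy approach cannot.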
Documentation is available at https://bdi-kit.readthedocs.io/.
You can install the latest stable version of this library from PyPI:
```bash
pip install bdi-kit
```
To install the latest development version:
```bash
pip install git+https://github.com/VIDA-NYU/bdi-kit@devel
```
This video gives a brief overview of BDI-Kit, showcasing its functionality through both the Python API and the chatbot-style agent interface.
To learn more about making a contribution to BDI-Kit, please see our Contributing guide.
If you find BDI-Kit useful in your work, please consider citing:
```bibtex
@article{lopez2026bdikit,
  title={{BDI-Kit: An AI-Powered Toolkit for Biomedical Data Harmonization}},
  author={Lopez, Roque and Santos, Aecio and Koutras, Christos and Freire, Juliana},
  journal={{Patterns}},
  volume={7},
  year={2026}
}
```
You can also find our other papers related to the BDI-Kit library here.