Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond
📘 Documentation: https://cnag-biomedical-informatics.github.io/pheno-ranker
📖 Usage: https://cnag-biomedical-informatics.github.io/pheno-ranker/usage/
📓 Google Colab tutorial: https://colab.research.google.com/drive/1n3Etu4fnwuDWNveSMb1SzuN50O2a05Rg
📦 CPAN Distribution: https://metacpan.org/pod/Pheno::Ranker
🐳 Docker Hub Image: https://hub.docker.com/r/manuelrueda/pheno-ranker/tags
🌐 Web App UI: https://pheno-ranker.cnag.eu
Pheno-Ranker is a lightweight toolkit for semantic similarity analysis of phenotypic, clinical, and other categorical data serialized as JSON, YAML, or preprocessed CSV.
It supports GA4GH-oriented formats such as Beacon Friendly Format (BFF) and Phenotype Exchange Format (PXF), but it can also rank and compare generic JSON records beyond the biomedical domain.
Pheno-Ranker turns hierarchical records into comparable one-hot encoded binary vectors. It then computes pairwise similarity or distance metrics for cohort exploration, patient matching, clustering, multidimensional scaling, and graph analytics.
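The encoding step described above can be sketched in a few lines of Python. This is a minimal illustration of one-hot encoding nested records, not Pheno-Ranker's actual implementation: the `flatten` and `one_hot` helpers, the `path:value` term format, and the two sample records are all assumptions made for the example.

```python
def flatten(record, prefix=""):
    """Flatten a nested dict/list into a set of 'path:value' terms (hypothetical format)."""
    terms = set()
    if isinstance(record, dict):
        for key, value in record.items():
            terms |= flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        for item in record:
            terms |= flatten(item, prefix)  # list positions are ignored in this sketch
    else:
        terms.add(f"{prefix}:{record}")
    return terms


def one_hot(records):
    """Return a sorted vocabulary and one binary vector per record."""
    flat = [flatten(r) for r in records]
    vocabulary = sorted(set().union(*flat))
    vectors = [[1 if term in f else 0 for term in vocabulary] for f in flat]
    return vocabulary, vectors


# Two made-up records, loosely shaped like phenotypic data.
records = [
    {"sex": "female", "features": [{"id": "HP:0001250"}, {"id": "HP:0000252"}]},
    {"sex": "male", "features": [{"id": "HP:0001250"}]},
]

vocabulary, vectors = one_hot(records)
print(vocabulary)
print(vectors)  # [[1, 1, 1, 0], [0, 1, 0, 1]]
```

Every distinct `path:value` term becomes one column, so records with different shapes still map onto vectors of equal length that can be compared position by position.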
Main workflows:
- Cohort mode: compare every individual or record against every other record in one or more cohorts.
- Patient mode: rank records in a reference cohort against a target patient or object.
- Generic JSON mode: compare arbitrary categorical JSON data using a configuration file.
- Utility workflows: simulate BFF/PXF data, convert CSV data for ranking, plot summary statistics, and encode vectors as QR codes.
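The difference between cohort mode and patient mode comes down to the shape of the comparison: all-against-all versus all-against-one. The sketch below illustrates that with made-up binary vectors; the function names and the use of a simple mismatch count as the distance are assumptions for illustration, not the tool's internals.

```python
def mismatches(a, b):
    """Count positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cohort_matrix(vectors):
    """Cohort mode: compare every record against every other record."""
    ids = sorted(vectors)
    matrix = [[mismatches(vectors[i], vectors[j]) for j in ids] for i in ids]
    return ids, matrix


def rank_against_target(reference, target):
    """Patient mode: rank reference records by distance to a single target."""
    scored = [(rid, mismatches(vec, target)) for rid, vec in reference.items()]
    return sorted(scored, key=lambda pair: pair[1])  # closest first


# Hypothetical encoded cohort.
cohort = {"P1": [1, 1, 1, 0], "P2": [0, 1, 0, 1], "P3": [1, 1, 0, 1]}

ids, matrix = cohort_matrix(cohort)
print(ids)     # ['P1', 'P2', 'P3']
print(matrix)  # [[0, 3, 2], [3, 0, 1], [2, 1, 0]]

ranking = rank_against_target(cohort, [1, 1, 1, 1])
print(ranking)  # [('P1', 1), ('P3', 1), ('P2', 2)]
```

Cohort mode yields a symmetric square matrix suited to clustering or multidimensional scaling, while patient mode yields an ordered list that can be truncated to the best matches.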
Basic cohort comparison:
pheno-ranker -r individuals.json
Patient matching:
pheno-ranker -r individuals.json -t patient.json --max-out 10
Generic JSON with a custom configuration:
pheno-ranker -r movies.json --config movies_config.yaml --include-terms genre year
Cytoscape-compatible graph export:
pheno-ranker -r individuals.json --cytoscape-json graph.json
- Native support for BFF and PXF JSON/YAML inputs.
- Generic JSON support through YAML/JSON configuration files.
- Cohort and patient-ranking modes.
- Hamming distance and Jaccard similarity.
- Patient-mode Z-scores and p-values for match significance.
- Include/exclude term filters, optional variable weights, and HPO ascendant expansion.
- Export of binary vectors, intermediate hashes, alignments, and coverage statistics.
- Outputs suitable for clustering, multidimensional scaling, and graph analytics.
- QR-code utilities for compact encoded vector exchange.
- Companion utilities for CSV import, BFF/PXF simulation, and summary-statistics plotting.
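To make the metrics in the list above concrete, here is a small sketch of Jaccard similarity on binary vectors, followed by Z-scores over a set of match scores. Unlike Hamming distance, which counts every differing position, Jaccard similarity ignores positions where both vectors are 0. How Pheno-Ranker derives its Z-scores and p-values is not described here, so the population standard deviation and the normal-approximation p-value below are assumptions chosen for illustration.

```python
from math import erfc, sqrt
from statistics import mean, pstdev


def jaccard(a, b):
    """Shared 1s divided by positions where either vector has a 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # convention for two all-zero vectors


def z_scores(values):
    """Standardize scores against their own distribution (population std, an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def upper_tail_p(z):
    """One-sided p-value under a standard normal approximation (an assumption)."""
    return 0.5 * erfc(z / sqrt(2))


# Hypothetical target and reference vectors.
target = [1, 1, 0, 0]
reference = {"P1": [1, 1, 1, 0], "P2": [0, 1, 0, 1], "P3": [1, 1, 0, 0]}

scores = {rid: jaccard(vec, target) for rid, vec in reference.items()}
for rid, z in zip(scores, z_scores(list(scores.values()))):
    print(rid, round(scores[rid], 3), round(z, 3), round(upper_tail_p(z), 3))
```

A high Z-score marks a reference record whose similarity to the target stands out from the rest of the cohort, which is the intuition behind reporting match significance alongside the raw score.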
Common outputs include:
- matrix.txt: dense pairwise comparison matrix.
- rank.txt: patient-mode ranking output.
- graph.json: Cytoscape-compatible graph output.
- graph_stats.txt: graph summary statistics.
- export.*.json: intermediate files for inspection or precomputed workflows.
- matrix.mtx: optional sparse Matrix Market output for large matrix workflows.
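For readers unfamiliar with the Matrix Market format, the sketch below shows how a dense pairwise matrix maps onto the standard sparse coordinate layout, where only non-zero entries are stored. The exact contents of the matrix.mtx file written by Pheno-Ranker are not specified here, so the `to_matrix_market` helper and the sample matrix are assumptions that only illustrate the generic format.

```python
def to_matrix_market(matrix):
    """Serialize a dense integer matrix as Matrix Market coordinate text."""
    entries = [
        (i + 1, j + 1, value)  # Matrix Market indices are 1-based
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0          # zeros are left implicit in the sparse layout
    ]
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{n_rows} {n_cols} {len(entries)}",  # size line: rows, columns, stored entries
    ]
    lines += [f"{i} {j} {v}" for i, j, v in entries]
    return "\n".join(lines) + "\n"


# Hypothetical 3x3 distance matrix with a zero diagonal.
dense = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
mtx_text = to_matrix_market(dense)
print(mtx_text)
```

The saving grows with the share of zero entries: a matrix that is mostly zeros needs one line per non-zero value rather than one cell per pair of records.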
The Perl command-line interface is tested on Linux, macOS, and Windows via GitHub Actions. On Windows, use Docker, WSL, or a Perl environment such as Strawberry Perl; Python utilities under utils/ and external R plotting scripts are best used from Docker or a GitHub checkout.
For CPAN installation:
cpanm Pheno::Ranker
pheno-ranker --help
For repository-based development:
git clone https://github.com/cnag-biomedical-informatics/pheno-ranker.git
cd pheno-ranker
cpanm --notest --installdeps .
bin/pheno-ranker --help
Docker images are also available from Docker Hub:
docker pull manuelrueda/pheno-ranker:latest
Detailed installation instructions are available in the documentation:
- https://cnag-biomedical-informatics.github.io/pheno-ranker/download-and-installation/
- Non-containerized install: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker/blob/main/non-containerized/README.md
- Docker install: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker/blob/main/docker/README.md
Long-form documentation, tutorials, and use cases live on the documentation site: https://cnag-biomedical-informatics.github.io/pheno-ranker
The built-in CLI help remains available:
pheno-ranker --help
--man is deprecated and now points to the online usage documentation.
If you use Pheno-Ranker in published work, please cite:
Leist, I.C. et al. (2024). Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. BMC Bioinformatics. https://doi.org/10.1186/s12859-024-05993-2
Manuel Rueda, PhD. CNAG: https://www.cnag.eu
