Pheno-Ranker

Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond


📘 Documentation: https://cnag-biomedical-informatics.github.io/pheno-ranker

📖 Usage: https://cnag-biomedical-informatics.github.io/pheno-ranker/usage/

📓 Google Colab tutorial: https://colab.research.google.com/drive/1n3Etu4fnwuDWNveSMb1SzuN50O2a05Rg

📦 CPAN Distribution: https://metacpan.org/pod/Pheno::Ranker

🐳 Docker Hub Image: https://hub.docker.com/r/manuelrueda/pheno-ranker/tags

🌐 Web App UI: https://pheno-ranker.cnag.eu


Pheno-Ranker is a lightweight toolkit for semantic similarity analysis of phenotypic, clinical, and other categorical data serialized as JSON, YAML, or preprocessed CSV.

It supports GA4GH-oriented formats such as Beacon Friendly Format (BFF) and Phenotype Exchange Format (PXF), but it can also rank and compare generic JSON records beyond the biomedical domain.

What It Does

Pheno-Ranker turns hierarchical records into comparable one-hot encoded binary vectors. It then computes pairwise similarity or distance metrics for cohort exploration, patient matching, clustering, multidimensional scaling, and graph analytics.
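
As a rough illustration of the encoding idea, the following Python sketch flattens nested records into "path:value" terms and one-hot encodes them against a shared vocabulary. It is not Pheno-Ranker's implementation (which is written in Perl); the function names `flatten` and `one_hot`, the term format, and the example records are assumptions made for this sketch.

```python
def flatten(record, prefix=""):
    """Flatten a nested dict/list into a set of 'path:value' terms (hypothetical term format)."""
    terms = set()
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else key
            terms |= flatten(value, path)
    elif isinstance(record, list):
        # List items share the parent's path, so array position is ignored.
        for item in record:
            terms |= flatten(item, prefix)
    else:
        terms.add(f"{prefix}:{record}")
    return terms


def one_hot(records):
    """Return (vocabulary, vectors): one binary vector per record over the global vocabulary."""
    term_sets = [flatten(r) for r in records]
    vocabulary = sorted(set().union(*term_sets))
    vectors = [[1 if term in terms else 0 for term in vocabulary] for terms in term_sets]
    return vocabulary, vectors


# Illustrative records only; not taken from any real cohort.
records = [
    {"sex": "female", "phenotypes": [{"id": "HP:0000001"}, {"id": "HP:0000002"}]},
    {"sex": "male", "phenotypes": [{"id": "HP:0000001"}]},
]
vocabulary, vectors = one_hot(records)
for term in vocabulary:
    print(term)
print(vectors)
```

Every record is encoded against the same sorted vocabulary, so position i means the same term in every vector, which is what makes the vectors directly comparable.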

Main workflows:

  • Cohort mode: compare every individual or record against every other record in one or more cohorts.
  • Patient mode: rank records in a reference cohort against a target patient or object.
  • Generic JSON mode: compare arbitrary categorical JSON data using a configuration file.
  • Utility workflows: simulate BFF/PXF data, convert CSV data for ranking, plot summary statistics, and encode vectors as QR codes.
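
The difference between cohort mode and patient mode can be sketched in a few lines of Python, assuming records have already been encoded as binary vectors. The helper names (`cohort_matrix`, `rank_against_target`) and the identifiers `ind_1` to `ind_3` are hypothetical; `max_out` only mirrors the intent of the `--max-out` option shown in the Quick Start and does not reproduce the tool's actual behavior.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cohort_matrix(vectors):
    """Cohort mode (sketch): all-vs-all distance matrix, rows and columns in sorted id order."""
    ids = sorted(vectors)
    return ids, [[hamming(vectors[i], vectors[j]) for j in ids] for i in ids]


def rank_against_target(reference, target, max_out=None):
    """Patient mode (sketch): reference records ordered by distance to one target vector."""
    ranked = sorted((hamming(vec, target), rid) for rid, vec in reference.items())
    return [(rid, dist) for dist, rid in ranked][:max_out]


# Hypothetical pre-encoded cohort.
cohort = {"ind_1": [1, 1, 1, 0], "ind_2": [1, 0, 0, 1], "ind_3": [1, 1, 0, 1]}

ids, matrix = cohort_matrix(cohort)
ranking = rank_against_target(cohort, [1, 1, 1, 1], max_out=2)
print(ids, matrix)
print(ranking)
```

Cohort mode yields a symmetric matrix with a zero diagonal, whereas patient mode yields a single ordered list, truncated here to the two closest records.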

Quick Start

Basic cohort comparison:

pheno-ranker -r individuals.json

Patient matching:

pheno-ranker -r individuals.json -t patient.json --max-out 10

Generic JSON with a custom configuration:

pheno-ranker -r movies.json --config movies_config.yaml --include-terms genre year

Cytoscape-compatible graph export:

pheno-ranker -r individuals.json --cytoscape-json graph.json

Selected Features

  • Native support for BFF and PXF JSON/YAML inputs.
  • Generic JSON support through YAML/JSON configuration files.
  • Cohort and patient-ranking modes.
  • Hamming distance and Jaccard similarity.
  • Patient-mode Z-scores and p-values for match significance.
  • Include/exclude term filters, optional variable weights, and HPO ascendant expansion.
  • Export of binary vectors, intermediate hashes, alignments, and coverage statistics.
  • Outputs suitable for clustering, multidimensional scaling, and graph analytics.
  • QR-code utilities for compact encoded vector exchange.
  • Companion utilities for CSV import, BFF/PXF simulation, and summary-statistics plotting.
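
To make the Jaccard and Z-score features more concrete, here is a minimal Python sketch over binary vectors. It uses the textbook definitions; how Pheno-Ranker builds its background distribution for patient-mode Z-scores is not described in this README, so standardizing against the target-versus-reference scores is purely an assumption of this example, as are the function names and data.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Jaccard similarity: shared 1-bits divided by positions where either vector has a 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen for this sketch: two all-zero vectors have similarity 0.0.
    return both / either if either else 0.0


def z_scores(scores):
    """Standardize scores against their own mean and population standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical target and reference vectors.
target = [1, 1, 1, 1]
reference = {"ind_1": [1, 1, 1, 0], "ind_2": [1, 0, 0, 1], "ind_3": [1, 1, 0, 1]}

scores = [jaccard(vec, target) for vec in reference.values()]
z = z_scores(scores)
for rid, s, zs in zip(reference, scores, z):
    print(rid, s, round(zs, 3))
```

A Z-score far from zero flags a match whose similarity stands out from the rest of the reference cohort, which is the intuition behind reporting match significance.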

Output Formats

Common outputs include:

  • matrix.txt: dense pairwise comparison matrix.
  • rank.txt: patient-mode ranking output.
  • graph.json: Cytoscape-compatible graph output.
  • graph_stats.txt: graph summary statistics.
  • export.*.json: intermediate files for inspection or precomputed workflows.
  • matrix.mtx: optional sparse Matrix Market output for large matrix workflows.
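
For readers unfamiliar with Matrix Market, the sketch below parses a small coordinate-format file into a dense list of lists using only the Python standard library. The example content is invented, and whether Pheno-Ranker writes `general` or `symmetric` storage is not stated here, so the reader handles both; `read_matrix_market` is a hypothetical helper, not part of the toolkit.

```python
def read_matrix_market(text):
    """Parse a Matrix Market 'coordinate' file with numeric values into a dense matrix."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].lower().split()
    if len(header) != 5 or header[:3] != ["%%matrixmarket", "matrix", "coordinate"]:
        raise ValueError("only coordinate (sparse) Matrix Market files are handled")
    symmetric = header[4] == "symmetric"

    # Comment lines start with '%'; the first remaining line gives the dimensions.
    body = [ln for ln in lines[1:] if not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(tok) for tok in body[0].split())

    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for ln in body[1:1 + n_entries]:
        i, j, value = ln.split()  # 'pattern' files (no value column) are not handled
        r, c = int(i) - 1, int(j) - 1  # Matrix Market indices are 1-based
        dense[r][c] = float(value)
        if symmetric and r != c:
            dense[c][r] = float(value)  # symmetric files store only one triangle
    return dense


# Invented 3x3 example; not actual Pheno-Ranker output.
example = """%%MatrixMarket matrix coordinate real symmetric
% hypothetical pairwise distances
3 3 3
2 1 3
3 1 2
3 2 1
"""
dense = read_matrix_market(example)
print(dense)
```

In practice an established reader such as `scipy.io.mmread` is the more robust choice; the hand-written parser is shown only to expose the file layout.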

Installation

The Perl command-line interface is tested on Linux, macOS, and Windows via GitHub Actions. On Windows, use Docker, WSL, or a Perl environment such as Strawberry Perl. The Python utilities under utils/ and the external R plotting scripts are best used from Docker or a GitHub checkout.

For CPAN installation:

cpanm Pheno::Ranker
pheno-ranker --help

For repository-based development:

git clone https://github.com/cnag-biomedical-informatics/pheno-ranker.git
cd pheno-ranker
cpanm --notest --installdeps .
bin/pheno-ranker --help

Docker images are also available from Docker Hub:

docker pull manuelrueda/pheno-ranker:latest

Detailed installation instructions are available in the documentation: https://cnag-biomedical-informatics.github.io/pheno-ranker

Documentation

Long-form documentation, tutorials, and use cases are available on the documentation site: https://cnag-biomedical-informatics.github.io/pheno-ranker

The built-in CLI help remains available:

pheno-ranker --help

The --man option is deprecated and now points to the online usage documentation.

Citation

If you use Pheno-Ranker in published work, please cite:

Leist, I.C. et al. (2024). Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. BMC Bioinformatics. https://doi.org/10.1186/s12859-024-05993-2

Author

Manuel Rueda, PhD. CNAG: https://www.cnag.eu
