All failures encountered during pipeline development (Mar 2026), documented so they don't happen again.
- Failed: `bioinfochrustrasbourg/annotsv:3.4.4` — no such image on Docker Hub
- Fix: Use `getwilds/annotsv:3.4.4` instead (Fred Hutch maintained)
- Failed: `quay.io/biocontainers/snpsift:5.2--hdfd78af_1` — no such manifest
- Fix: Use `quay.io/biocontainers/snpeff:5.2--hdfd78af_1` — SnpSift is bundled inside the SnpEff package
- Failed: `snpEff download GRCh38.105` and all other database names — Azure blob storage returned 0-byte files for ALL URLs, including `GRCh38.105` and `GRCh38.mane.1.2.ensembl`
- Root cause: SnpEff databases are hosted on Azure blob storage, which was down
- Fix: Pivot to Ensembl VEP as the alternative functional annotation tool
- Failed: `quay.io/biocontainers/expansionhunter:5.0.0--hd03093a_1` — manifest not found. Also `mgibio/expansionhunter:latest` — not found
- Fix: Use `weisburd/expansionhunter:latest` — binary at `/ExpansionHunter/bin/ExpansionHunter`, variant catalogs at `/pathogenic_repeats/GRCh38/`
- Note: This is ExpansionHunter v2.5.5, which uses `--repeat-specs` (directory), not `--variant-catalog` (single JSON)
- Failed: Container exits immediately with "the option '--log' is required but missing"
- Fix: Always include `--log /output/sample_eh.log` in the command
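Putting the pieces above together, a dry-run sketch of a full v2.5.5 call (echoed rather than executed; the `/data` and `/output` mounts and sample filenames are examples — only the binary path, catalog path, and flags come from the notes above):

```shell
# Dry-run sketch of the v2.5.5 invocation. Mount paths and sample
# filenames are examples; the in-container binary and catalog locations
# are the ones documented above. Drop the `echo` to actually run it.
eh_cmd() {
  echo docker run --rm --user root \
    -v "$PWD/out:/output" -v "$PWD/data:/data" \
    weisburd/expansionhunter:latest \
    /ExpansionHunter/bin/ExpansionHunter \
    --bam /data/sample.bam \
    --ref-fasta /data/Homo_sapiens_assembly38.fasta \
    --repeat-specs /pathogenic_repeats/GRCh38/ \
    --vcf /output/sample_eh.vcf \
    --json /output/sample_eh.json \
    --log /output/sample_eh.log
}
eh_cmd
```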
- Failed: `twesigomwedavid/stellarpgx:latest` — image exists but contains no StellarPGx binaries
- Status: UNSOLVED. No working Docker image found for StellarPGx as of Mar 2026.
- Failed: `OSError: [Errno 13] Permission denied: '/output/sergio'` when writing output (the sample name in the error will vary)
- Fix: Add the `--user root` flag to `docker run`
- Failed: `pip install telomerehunter` on the host gives `UnicodeDecodeError` — Python environment issues
- Fix: Use the Docker image `lgalarno/telomerehunter:latest` instead of a native install
- Failed attempt 1: `zlskidmore/hla-la:latest` — graph at `src/additionalReferences/PRG_MHC_GRCh38_withIMGT/` exists but is NOT serialized. HLA-LA exits silently.
- Failed attempt 2: Copied graph files to `graphs/PRG_MHC_GRCh38_withIMGT/` — "graph not complete"
- Failed attempt 3: Ran `--action prepareGraph` in a detached container — the container exited during prep, graph still not serialized
- Root cause: HLA-LA requires a pre-serialized graph (~40GB RAM to prepare, takes hours). No standard Docker image includes it.
- Fix: Use `jiachenzdocker/hla-la:latest` (27.5GB image with a pre-built graph) OR switch to T1K
- Failed: `jiachenzdocker/hla-la:latest` — read extraction succeeds but the `HLA-LA` C++ binary crashes during graph alignment even with 32GB RAM and 8 threads. Error: "HLA-LA execution not successful."
- Root cause: Likely an incompatibility between the pre-built binary and the BAM data format, or an unmet memory requirement (graph deserialization may need >32GB)
- Status: UNSOLVED. HLA-LA from WGS BAMs is unreliable in Docker. Alternative: use Sanitas clinical HLA typing results, or use arcas-hla or T1K with partial coordinates
- Failed: `t1k-build.pl -d hla.dat -g reference.fasta.fai` produced a coordinate file with `chr19 -1 -1 +` for all HLA genes
- Root cause: The `-g` parameter expects the actual reference FASTA (3.1GB), not the FAI index (158KB). The `AddGeneCoord.pl` script needs to align HLA sequences against the genome to find coordinates.
- Fix: Use `-g Homo_sapiens_assembly38.fasta` (the full FASTA, not the `.fai`)
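A cheap guard in the wrapper script catches this mistake before `t1k-build.pl` produces garbage coordinates (the check is generic shell; the filenames are examples):

```shell
# Refuse to pass an index file where t1k-build.pl's -g expects the FASTA.
# A .fai is a few hundred KB; the real GRCh38 FASTA is ~3GB.
check_ref_fasta() {
  case "$1" in
    *.fai)
      echo "ERROR: $1 is a FAI index — pass the FASTA itself to -g" >&2
      return 1 ;;
  esac
  return 0
}
check_ref_fasta Homo_sapiens_assembly38.fasta && echo "ok"
```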
- Failed: `bam-extractor` runs but produces empty `_candidate_1.fq` and `_candidate_2.fq`
- Root cause: The coordinate file had `-1 -1` coordinates (see above), so no genomic region was extracted
- Fix: Regenerate the coordinate file with the proper reference FASTA
- Result: Cyrius returned `None` for CYP2D6 star alleles
- Root cause: CYP2D6 has extensive pseudogene homology (CYP2D7, CYP2D8), making short-read WGS unreliable
- Mitigation: Rely on Sanitas lab calls for CYP2D6; consider long-read sequencing in future
- Failed: `staphb/bcftools:1.21` does not include `bgzip` or `tabix` in `$PATH`
- Fix: Use `bcftools view -Oz -o output.vcf.gz` (native bgzip output) and `bcftools index -t` (native tabix index) instead of piping to `bgzip`/`tabix`
- Alternative: Use `quay.io/biocontainers/samtools:1.21`, which includes all htslib tools
- Failed: All 22 chr files in `mis_ready/` were 0 bytes
- Root cause: The script used `bgzip -c > output.vcf.gz`, which failed silently because bgzip wasn't available
- Fix: See above — use `bcftools view -Oz -o`
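Since the failure was silent, a size check after every compression step would have caught it immediately. A minimal guard (filenames are examples; the demo file is a throwaway):

```shell
# Fail loudly if a compression step produced a missing or 0-byte file.
check_nonempty() {
  # -s: true only if the file exists and has size > 0
  if [ ! -s "$1" ]; then
    echo "ERROR: $1 is missing or 0 bytes — compression step failed" >&2
    return 1
  fi
}
# Demo with a throwaway file; in the pipeline, call it right after
# `bcftools view -Oz -o chr22.vcf.gz ...`:
printf 'dummy\n' > demo.vcf.gz
check_nonempty demo.vcf.gz && echo "demo.vcf.gz ok"
rm -f demo.vcf.gz
```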
- Failed: `Cannot open Local file /opt/vep/.vep/tmp/homo_sapiens_vep_112_GRCh38.tar.gz`
- Fix: Run with `--user root` and pre-create the temp dir: `mkdir -p /opt/vep/.vep/tmp && chmod 777 /opt/vep/.vep/tmp`
- Failed: Container exits after "downloading..." with no cache files extracted
- Root cause: The 17GB download timed out or was interrupted; INSTALL.pl doesn't retry
- Fix: Download manually with `wget -c` (supports resume), then extract with `tar xzf`, then run VEP with `--cache --dir_cache /path/to/cache`
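A dry-run sketch of the manual route (echoed rather than executed; the Ensembl FTP path, release number, and cache directory are assumptions — verify the current path on ensembl.org before use):

```shell
# Dry-run sketch: resumable VEP cache download. The mirror URL, release
# number, and /genome/vep_cache path are assumptions — check ensembl.org
# for the current indexed_vep_cache location. Drop the `echo`s to run.
vep_cache_cmds() {
  RELEASE=112
  FILE="homo_sapiens_vep_${RELEASE}_GRCh38.tar.gz"
  echo wget -c "https://ftp.ensembl.org/pub/release-${RELEASE}/variation/indexed_vep_cache/${FILE}"
  echo tar xzf "$FILE" -C /genome/vep_cache
  echo vep --cache --dir_cache /genome/vep_cache --offline
}
vep_cache_cmds
```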
- Always set resource limits: `docker run --cpus 4 --memory 8g ...` — without limits, tools like DeepVariant or minimap2 will consume ALL available RAM and crash the host.
- Always set the platform: `docker build --platform linux/amd64 ...` — macOS is arm64; most Linux servers are amd64. Images built on a Mac without `--platform` won't run on amd64 servers.
- Always use `--rm` for analysis containers to avoid accumulating stopped containers. Use `-d` (detached) for long-running jobs.
- Most bioinformatics containers run as non-root users. If writing to bind-mounted volumes, add `--user root` to avoid permission issues.
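The run conventions above fold into one wrapper (a dry-run sketch: the wrapper name, mount, and 4-CPU/8GB limits are placeholders to tune per tool):

```shell
# Dry-run sketch of a standard tool invocation: resource limits, --rm,
# and --user root in one place. IMAGE, the mount, and the limits are
# placeholders. Drop the `echo` to run.
run_tool() {
  IMAGE="$1"; shift
  echo docker run --rm --user root \
    --cpus 4 --memory 8g \
    -v "$PWD:/output" \
    "$IMAGE" "$@"
}
run_tool staphb/bcftools:1.21 bcftools --version
```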
- Observed: `ludeeus/action-shellcheck@master` exits non-zero even when configured with `severity: warning`
- Impact: Once the check is required on `main`, "warning-only" findings still block merges
- Fix: Clear ShellCheck warnings before enabling required status checks, or explicitly relax the workflow instead of assuming warnings are advisory only
- Observed: Required status checks become a trap if you enable branch protection while the default branch or active PR branch is still red
- Fix: Get `Lint` and `Smoke Tests` green first, then enable required checks, block force-pushes, and block deletion
- Failed: `robertopreste/mtoolbox:latest` — "repository does not exist or may require docker login"
- Also checked: No image on quay.io/biocontainers, ghcr.io, or Docker Hub
- Root cause: MToolBox was never officially containerized. GitHub issue #107 (Mar 2022) confirms: "Not at the moment."
- Fix: Use GATK Mutect2 in mitochondrial mode instead (`broadinstitute/gatk:latest`). Mutect2 handles mitochondrial heteroplasmy detection natively and is well maintained.
- Failed: `quay.io/biocontainers/cnvnator:0.4.1--py312hc02a2a2_7` — manifest not found
- Fix: Use `quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11`. The hash suffix in biocontainer tags encodes the conda build and changes between builds. Always verify at quay.io/repository/biocontainers/cnvnator.
- Failed: `quay.io/biocontainers/delly:1.2.9--ha41ced6_0` — manifest not found
- Fix: Use `quay.io/biocontainers/delly:1.7.3--hd6466ae_0` (latest as of Mar 2026). Biocontainer tags are version-specific and change frequently — always verify at quay.io/repository/biocontainers/delly.
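Since "manifest not found" keeps recurring, one option is to check every pinned tag against the registry before a run; `docker manifest inspect` exits non-zero for exactly this case. A dry-run sketch (echoed; the image list is an excerpt from the fixes in this log):

```shell
# Dry-run sketch: verify pinned biocontainer tags before a pipeline run.
# A non-zero exit from `docker manifest inspect` means the tag is gone.
# Drop the `echo` to actually query the registry.
verify_tags() {
  for img in \
    quay.io/biocontainers/delly:1.7.3--hd6466ae_0 \
    quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
    quay.io/biocontainers/tiddit:3.9.5--py312h6e8b409_0
  do
    echo docker manifest inspect "$img"
  done
}
verify_tags
```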
- Observed: Delly's "SV annotation" step runs for 2-3 hours at 100% CPU on a 30X WGS genome. No new log output appears during this time, which can look like the process is stuck.
- This is normal. Delly genotypes every candidate SV site against the reference, which is CPU-intensive. Total runtime for 30X WGS: ~3-4 hours.
- Tip: Use `docker stats` to confirm the container is still using CPU. If CPU is at 0%, the process may actually be stuck.
- Gotcha: Delly writes BCF (binary VCF), not VCF. The output file has a `.bcf` extension.
- Fix: Convert with `bcftools view input.bcf -Oz -o output.vcf.gz` and index with `bcftools index -t output.vcf.gz`. The pipeline script handles this automatically.
- Observed: During the `-tree` step, the `.root` file stays at 266 bytes (just the ROOT header) until the entire BAM is parsed.
- This is normal. For a 30X WGS (~80GB BAM), the tree step takes ~5-10 minutes. The ROOT file grows to ~900MB-1.2GB only at the very end, when the tree is flushed to disk.
- If the container exits and the file is still 266 bytes: Check whether a corrupt `.root` file from a previous failed run is blocking it. Delete and retry.
- Failed: `cpsr --pcgr_dir /genome/pcgr_data/data` → "Data directory (/genome/pcgr_data/data/data) does not exist"
- Root cause: CPSR 1.x internally appends `/data` to whatever `--pcgr_dir` you pass. If you point to the `data/` directory inside the extracted bundle, it looks for `data/data/`.
- Fix (1.x): Point `--pcgr_dir` to the parent of the `data/` directory: `--pcgr_dir /genome/pcgr_data` (not `/genome/pcgr_data/data`)
- Superseded by PCGR 2.x — the `--pcgr_dir` flag no longer exists. See the migration notes below.
- Failed: `sigven/cpsr:2.0.0` does not exist on Docker Hub
- Fix: Use `sigven/pcgr:2.2.5`, which bundles both the `pcgr` and `cpsr` binaries at `/usr/local/bin/`
- CLI completely changed: The `--pcgr_dir` flag is gone, replaced by `--refdata_dir` (for the ref data bundle) and `--vep_dir` (for the VEP cache). These are separate mount points inside the container.
- Docker volumes changed: PCGR 1.x used a single `-v ${GENOME_DIR}:/genome` mount. PCGR 2.x requires four separate mounts: VEP cache (`:/mnt/.vep`), ref data bundle (`:/mnt/bundle`), input VCFs (`:/mnt/inputs`), and outputs (`:/mnt/outputs`).
- Data bundle is smaller and different: The old monolithic ~21 GB bundle (`pcgr.databundle.grch38.20220203.tgz`), which included the VEP cache, is replaced by a smaller ~5 GB ref data bundle (`pcgr_ref_data.20250314.grch38.tgz`). The VEP cache is now provided separately (reuse the same cache from step 13).
- Bundle extraction requires an extra step: After `tar xzf`, the extracted `data/` directory must be moved into a version-stamped directory: `mkdir -p 20250314 && mv data/ 20250314/`. The `--refdata_dir` mount points to this version directory.
- Docker tag: `sigven/pcgr:1.4.1` → `sigven/pcgr:2.2.5`. The image still bundles both the `pcgr` and `cpsr` binaries.
- Old data bundle is incompatible: If you have the 1.x bundle, you must download the 2.x bundle fresh. The directory structure and expected paths are completely different.
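A dry-run sketch of a 2.x call with the four mounts (echoed; host paths are examples, and flags beyond `--refdata_dir`/`--vep_dir` are assumptions — check `pcgr --help` inside the container):

```shell
# Dry-run sketch of a PCGR 2.x invocation with the four required mounts.
# Host paths are examples; --input_vcf/--output_dir are assumptions —
# only --refdata_dir and --vep_dir are documented above. Drop the `echo`
# to run.
pcgr2_cmd() {
  echo docker run --rm \
    -v /genome/vep_cache:/mnt/.vep \
    -v /genome/pcgr_ref/20250314:/mnt/bundle \
    -v "$PWD/vcfs:/mnt/inputs" \
    -v "$PWD/pcgr_out:/mnt/outputs" \
    sigven/pcgr:2.2.5 \
    pcgr --refdata_dir /mnt/bundle --vep_dir /mnt/.vep \
    --input_vcf /mnt/inputs/sample.vcf.gz --output_dir /mnt/outputs
}
pcgr2_cmd
```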
- MIS is designed for genotyping array data from cohort studies
- Single-sample WGS submissions may be rejected (20-sample minimum)
- For individual WGS: imputation adds minimal value (you already have 90%+ variant coverage)
- Main benefit for WGS would be phasing, not imputation
- Must create account at imputationserver.sph.umich.edu
- API tokens expire after 30 days
- Results auto-deleted after 7 days
- TOPMed Freeze 8 (r2): 132K samples, 705M variants, GRCh38 native
- Uses the `chr` prefix (which GRCh38 BAMs already have)
- HRC r1.1 (32K samples) is European-centric and hg19 only
- Failed: `quay.io/biocontainers/freebayes:1.3.7--h1870644_0` — `freebayes --version` works, but actual variant calling triggers `SIGILL` (illegal instruction, exit code 132)
- Tested on: Intel i5-14500
- Root cause: Likely a build-time CPU optimization mismatch in the 1.3.7 biocontainer binary
- Fix: Use `quay.io/biocontainers/freebayes:1.3.6--hbfe0e7f_2`, which works correctly
- Observed: FreeBayes memory usage grows unpredictably during full-genome runs: 463MB at 30 min, 6.4GB at 60 min, 12.7GB at 90 min, then stabilizes ~12GB
- Original limit: `--memory 16g` was too tight — the run would have been OOM-killed at 80% usage
- Fix: Use `--memory 32g` for full-genome runs. Peak observed was 12.8GB, but growth is non-linear and region-dependent.
- Observed: FreeBayes has no `-t` or `--threads` flag. A full 30X WGS takes 8-12 hours.
- Workaround: Use `--region chr22` (or `INTERVALS=chr22`) for quick testing (~20-40 min)
- For production: Consider GNU parallel with per-chromosome regions, then merge the VCFs
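A sketch of the per-chromosome approach: the region-list generation runs as-is, while the `parallel` and merge lines are commented because they need FreeBayes and a reference on hand (paths are examples):

```shell
# Generate the per-chromosome region list (runs as-is), then fan out
# with GNU parallel and merge. The freebayes/bcftools lines are
# commented — filenames and the -j 8 job count are examples.
regions() { printf 'chr%s\n' $(seq 1 22) X Y; }
regions
# regions | parallel -j 8 \
#   'freebayes -f ref.fa --region {} aln.bam > fb_{}.vcf'
# bcftools concat fb_chr*.vcf -Oz -o freebayes_merged.vcf.gz
```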
- Failed: `bcftools index -t` fails with "index file exists" after GATK has already created its own `.tbi`
- Fix: Use `bcftools index -ft` (with the `-f` force flag) to overwrite the GATK-generated index
- Failed: GATK HaplotypeCaller fails if `Homo_sapiens_assembly38.dict` is missing
- Fix: Generate it once with `gatk CreateSequenceDictionary -R /genome/reference/Homo_sapiens_assembly38.fasta`
- Failed: `bcftools isec -R chr22` treats `-R` (uppercase) as a BED file path and fails with "file not found"
- Fix: Use `-r chr22` (lowercase) for region strings. `-R` expects a file.
- Failed: `tiddit --sv` exits with "The reference must be indexed using bwa index; run bwa index, or skip local assembly (--skip_assembly)"
- Root cause: TIDDIT 3.9+ uses local assembly for breakpoint refinement, which requires BWA index files alongside the reference
- Fix: Use `--skip_assembly` when using minimap2 alignments (no BWA index available). If using BWA-MEM2 alignment, the index files are compatible.
- Failed: `quay.io/biocontainers/tiddit:3.7.0--py312h24f4cff_1` — manifest unknown
- Fix: Use `quay.io/biocontainers/tiddit:3.9.5--py312h6e8b409_0`. Always verify tags at quay.io/repository/biocontainers/tiddit.
- Failed: `--callRegions reference.fasta.fai` → "Can't find expected call-regions bed index file"
- Fix: Create a proper bgzipped BED file with a tabix index. Use the GATK container for bgzip/tabix (not present in the staphb bcftools or samtools images).
- Observed: Neither `staphb/samtools:1.20` nor `staphb/bcftools:1.21` includes `bgzip` or `tabix` in PATH
- Fix: Use `broadinstitute/gatk:4.6.1.0`, which has both at `/usr/bin/bgzip` and `/usr/bin/tabix`. Or use `bcftools view -Oz` as a bgzip alternative.
- Observed: FreeBayes calls ~247K variants on chr22 vs DeepVariant ~91K and GATK ~69K
- Interpretation: The ~200K FreeBayes-unique variants are mostly false positives. FreeBayes maximizes sensitivity at the cost of precision.
- Recommendation: Always quality-filter FreeBayes output with `bcftools filter` or `vcffilter` before use.
- What failed: `plink --23file` (1.9) to import + `plink2 --ref-from-fa force` to fix REF/ALT
- Why: For single-sample data, ALL homozygous positions are monomorphic. plink's `.bim` stores only one allele for these. `--ref-from-fa` cannot create a proper ALT because there is no second allele slot. Homozygous ALT genotypes silently become homozygous REF.
- Verified: rs9939609 (FTO), genotype=AA, REF=T. plink: `REF=A, ALT=., GT=0/0` (WRONG). bcftools: `REF=T, ALT=A, GT=1/1` (CORRECT). ~66K positions (11%) corrupted.
- Fix: Use `bcftools convert --tsv2vcf -f <reference.fa>`. A single command, no intermediate binary format.
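A dry-run sketch of the working conversion (echoed; filenames are examples, and the `-s` sample-name flag is my reading of the `--tsv2vcf` mode — verify against `bcftools convert --help` for your version):

```shell
# Dry-run sketch of the bcftools conversion path. Filenames are
# examples; -s names the single sample (assumption — check
# `bcftools convert --help`). Drop the `echo` to run.
tsv2vcf_cmd() {
  echo bcftools convert --tsv2vcf genotypes.tsv \
    -f reference.fa \
    -s SAMPLE1 -Oz -o genotypes.vcf.gz
}
tsv2vcf_cmd
```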
- `--allow-extra-chr` cannot be used with `--23file`
- Female samples with Y calls (MyHeritage GSA PAR region) error out with sex=F
- Sex inference defaults to male unless explicitly set
- Quoted CSV with `"RSID","CHROMOSOME","POSITION","RESULT"` columns
- Strip `##` comments, the header line, and quotes; convert commas to tabs
- hg19 reference uses numeric chromosomes; chain file expects chr prefix
- Fix: Run `bcftools annotate --rename-chrs` between conversion and liftover
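The rename map for `--rename-chrs` is just a two-column file; this builds it (runs as-is) with the annotate call shown commented. The `MT chrM` line follows the usual hg19→hg38 naming convention — verify it against your chain file:

```shell
# Build the numeric→chr-prefixed rename map used by
# `bcftools annotate --rename-chrs` before liftover.
MAP=chr_rename.txt
: > "$MAP"
for c in $(seq 1 22) X Y; do
  echo "$c chr$c" >> "$MAP"
done
echo "MT chrM" >> "$MAP"   # assumption: check your chain file's naming
wc -l < "$MAP"
# bcftools annotate --rename-chrs "$MAP" in.vcf.gz -Oz -o renamed.vcf.gz
```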
- Correct: CYP2B6 (*1/*6), CYP4F2 (*1/*6), DPYD (*5/*5), NUDT15 (*1/*2)
- Missed: CYP2C19 (25 missing), VKORC1 (1 missing)
- Wrong: CYP3A5 *1/*1 (should be *3/*3, 4 missing positions)
- Total: 888 missing PGx positions from the GSA chip
- ROH: `-G30` required (no FORMAT/PL in the chip VCF)
- PRS: `no-mean-imputation` required (a single sample lacks allele frequencies)
- PRS matching: chip ~12% of large scoring files vs WGS ~28%
- Old (v2.5.5): `--bam`, `--ref-fasta`, `--repeat-specs` (directory), `--vcf`, `--json`, `--log` (all required)
- New (v5.0.0): `--reads`, `--reference`, `--variant-catalog` (single JSON), `--output-prefix`, `--threads`
- The `--log` flag is gone in v5.0.0. Do NOT pass it or the command will fail.
- Biocontainer image: `quay.io/biocontainers/expansionhunter:5.0.0--hc26b3af_5`. The binary is `ExpansionHunter` (on PATH, not at `/ExpansionHunter/bin/`).
- Bundled catalog: `/usr/local/share/ExpansionHunter/variant_catalog/grch38/variant_catalog.json` (31 loci). No need to download it separately.
- Failed: GRIDSS exits with "BWA index not found" when using default minimap2 alignment
- Fix: Either (a) align with BWA-MEM2 first (`02a-alignment-bwamem2.sh`), or (b) generate the BWA index files separately. The `04b-gridss.sh` script validates this and prints instructions.
- Note: GRIDSS outputs ALL SVs as BND (breakend) notation. Standard DEL/DUP/INV types require post-processing conversion for SURVIVOR merge compatibility.
- Observed: GRIDSS assembly-based SV calling needs ~28 GB JVM heap for 30X WGS
- Fix: Run the container with `--memory 32g` and the `-Xmx28g` JVM argument. It will fail silently on machines with < 32 GB RAM.
- Observed: GRIDSS benefits from an ENCODE blacklist to suppress known artifact regions
- Fix: The script auto-downloads `ENCFF356LFX.bed.gz` (the hg38 blacklist) from ENCODE on first run. If the download fails (offline), GRIDSS runs without it (lower precision but still functional).
- Observed: fastp accepts `-w` (workers) up to 16; values above 16 are clamped to 16. The `-w` flag controls I/O worker threads; actual adapter detection is single-threaded.
- Recommendation: Use `-w` matching `THREADS` up to 16. For most WGS runs, `-w 4` is sufficient.
- Observed: fastp's `--detect_adapter_for_pe` works for Illumina, BGI, and MGI adapters without specifying adapter sequences. BGI/MGI adapters are compiled into fastp's `knownadapters.h`.
- No action needed: The `--detect_adapter_for_pe` flag handles all common sequencing platforms.
- Observed: `--fast-mode` uses a simpler, faster counting method and does NOT write the per-base `.per-base.bed.gz` file. This saves ~2 GB of output and cuts runtime by ~40%.
- Fix: Always use `--fast-mode` unless per-base resolution is specifically needed.
- Observed: MultiQC identifies fastp output by looking for `"before_filtering": {` in JSON files, not by filename pattern. Files must end in `.json`.
- Tip: Ensure fastp's `-j` output uses the `.json` extension.
- Docker image `dancooke/octopus:0.7.4` works out of the box for germline calling
- Supports `--threads` for parallelism (unlike FreeBayes)
- Typical memory: 8-12 GB for 30X WGS (much less than the FreeBayes peak of 13 GB)
- Old (2.15.5): `python3 /pharmcat/pharmcat_vcf_preprocessor.py`
- New (3.2.0): `python3 /pharmcat/pharmcat_vcf_preprocessor`
- Impact: The step 7 preprocessor command must drop the `.py` suffix, or the container exits with "No such file"
- Old (2.15.5): `-reporterJson` produced JSON; HTML was always generated by default
- New (3.2.0): If ANY format flag is specified, ONLY those formats are produced. To get both HTML and JSON, you must pass `-reporterJson -reporterHtml`
- Impact: Step 7 now passes both flags explicitly. Without `-reporterHtml`, the HTML report (used for manual review) would silently stop being generated.
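A dry-run sketch of the 3.2.0 report step with both flags (echoed; the `/pharmcat/pharmcat` wrapper path and the `-vcf`/`-o` input/output flags are assumptions about the image layout — only the two `-reporter*` flags are documented above):

```shell
# Dry-run sketch: request both report formats explicitly under 3.2.0.
# The wrapper path and -vcf/-o flags are assumptions — only
# -reporterJson/-reporterHtml come from the notes above. Drop the
# `echo` to run inside the container.
pharmcat_cmd() {
  echo /pharmcat/pharmcat \
    -vcf /output/sample.preprocessed.vcf.gz \
    -o /output/pharmcat \
    -reporterJson -reporterHtml
}
pharmcat_cmd
```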
- Old (2.15.5): `wildtypeAllele` property in gene result JSON objects
- New (3.2.0): Renamed to `referenceAllele`
- Impact: Step 27's JSON parser does not use this property directly, so no code change was needed. Any downstream scripts or notebooks that parse `wildtypeAllele` must be updated.
- NAT2 calling: PharmCAT 3.x includes improved NAT2 acetylator status calling
- BCF support: Preprocessor now accepts BCF input files directly (no conversion needed)
- Single-gene calling: The new `-g` flag allows running PharmCAT on a single gene (useful for targeted re-analysis)