fix: quote-aware VCF header parser preserves '=' in Description values (#80, #89) #93
Open
jmg421 wants to merge 7 commits into
Bioconductor#86)

When query has no overlap with the CDS, .localCoordinates() returns a
zero-length GRanges. Previously an early return on length(txlocal) == 0
caused REFAA and VARAA to be absent from mcols(), returning NULL instead
of empty AAStringSet objects. This breaks downstream operations like
reverse() and subseq() on the result columns.

Fix:
- Remove the early return so the full mcols-building code runs even when
  txlocal is empty, naturally producing zero-length AAStringSet columns
- Change GENEID=NA_character_ to rep(NA_character_, length(txlocal)) so
  DataFrame() construction works correctly at zero length

Test: extend test_predictCoding_empty to assert that REFAA and VARAA are
AAStringSet objects of length 0.
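A minimal base-R sketch of the zero-length GENEID issue described above. It uses a plain data.frame and a character vector as stand-ins for DataFrame() and AAStringSet (an assumption made so the snippet runs without Bioconductor); the details of the S4Vectors constructor may differ.

```r
n <- 0L                   # length(txlocal) when the query misses the CDS
refaa <- character(n)     # stand-in for a zero-length AAStringSet column

# A scalar NA has length 1, so it cannot sit next to zero-row columns.
bad <- tryCatch(
  data.frame(REFAA = refaa, GENEID = NA_character_),
  error = function(e) conditionMessage(e)
)

# rep() sizes the NA column to match, including the zero-length case.
good <- data.frame(REFAA = refaa, GENEID = rep(NA_character_, n))

print(bad)
print(dim(good))
```

The same rep() call is correct for any n, so no special case is needed for the empty result.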
…lassification
Multi-nucleotide variants (MNVs/DBS) can produce VARAA strings like 'P*'
or '*W' where %in% '*' fails to match. Switch to grepl('*', ..., fixed=TRUE)
so any VARAA containing a stop codon is correctly classified as 'nonsense'
rather than 'nonsynonymous'.
Fixes Bioconductor#86. Adds unit test test_predictCoding_nonsense_DBS covering
a DBS that introduces a stop at a codon boundary.
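To illustrate why exact matching misses these cases, here is a small sketch with made-up VARAA strings (the vector and the two-level classification are assumptions for illustration, not the package's code). With fixed = TRUE the pattern is a literal string, so the asterisk needs no escaping.

```r
# Hypothetical VARAA values: two MNV results, a plain stop, and a missense.
varaa <- c("P*", "*W", "*", "PW")

# Exact matching only catches the string that is exactly "*".
exact <- varaa %in% "*"

# Literal substring matching catches any string containing a stop.
contains_stop <- grepl("*", varaa, fixed = TRUE)

consequence <- ifelse(contains_stop, "nonsense", "nonsynonymous")
print(data.frame(varaa, exact, contains_stop, consequence))
```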
mcols(rdexp) <- NULL unconditionally erased all user-added metadata
columns from rowRanges during CollapsedVCF expansion.

Fix: compute the set of non-VCF-fixed columns (anything not in
REF/ALT/QUAL/FILTER/paramRangeID) and retain them in the expanded object;
the fixed columns are dropped as before since they are rebuilt from fexp.

Fixes Bioconductor#85.
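The column selection can be sketched in base R as follows. The data.frame stands in for mcols(rowRanges), and the myScore and note columns are hypothetical user additions; the actual change operates on the expanded VCF object.

```r
# Columns that expansion rebuilds itself, per the commit message.
fixed_cols <- c("REF", "ALT", "QUAL", "FILTER", "paramRangeID")

# Stand-in for the metadata columns, with two hypothetical user columns.
meta <- data.frame(
  REF = "A", ALT = "T", QUAL = 50, FILTER = "PASS", paramRangeID = NA,
  myScore = 0.9, note = "user", stringsAsFactors = FALSE
)

# Keep everything that is not a fixed column instead of wiping all columns.
keep <- setdiff(names(meta), fixed_cols)
retained <- meta[, keep, drop = FALSE]
print(names(retained))
```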
… all-NA seqinfo (Bioconductor#78)

- .contigsFromSeqinfo() now returns character(0) when all seqlengths and
  genome are NA, avoiding noisy '##contig=<ID=x>' placeholder lines
- .formatHeader() single-value branch no longer overwrites an existing
  fileDate with today's date; the original value is preserved
- META branch likewise only adds fileDate when absent
- Add regression test test_predictCoding_exon_intron_boundary
  (Bioconductor#83)
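The "only add fileDate when absent" rule amounts to a guard like the one below. This is a sketch on a plain named list; add_file_date() is a hypothetical helper and the date format is an assumption, neither taken from .formatHeader().

```r
# Hypothetical helper: stamp a date only if the header has none yet.
add_file_date <- function(meta) {
  if (!"fileDate" %in% names(meta)) {
    meta[["fileDate"]] <- format(Sys.Date(), "%Y%m%d")
  }
  meta
}

existing <- add_file_date(list(fileformat = "VCFv4.2", fileDate = "20200101"))
missing  <- add_file_date(list(fileformat = "VCFv4.2"))

print(existing$fileDate)          # the original value is kept
print("fileDate" %in% names(missing))
```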
Bioconductor#80, Bioconductor#89)

htslib/scanBcfHeader splits structured header fields on '=' without
respecting double-quoted values, silently truncating Description strings
that contain '=' (e.g. VRS version=2.0.1). Add .parseVcfHeaderBody() and
.parseRawVcfHeader() to re-parse raw header text and patch each DataFrame
back to the correct values.

Fixes Bioconductor#80, fixes Bioconductor#89
htslib/scanBcfHeader splits structured header fields on '=' without respecting double-quoted values, silently truncating Description strings that contain '=' (e.g. VRS version=2.0.1).

Adds .parseVcfHeaderBody() and .parseRawVcfHeader() to re-parse raw header text and patch each DataFrame back to the correct values. Includes regression test.

Fixes #80, fixes #89.
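For reviewers, a rough sketch of what quote-aware parsing of a structured header line can look like. parse_header_body() is a hypothetical helper written for this comment, not the .parseVcfHeaderBody() added in the PR, and the INFO line is a made-up example. It ignores backslash-escaped quotes inside values, which a complete parser would need to handle.

```r
# Hypothetical helper: split "K=V,K=V" on commas outside double quotes,
# then split each field on its FIRST '=' only.
parse_header_body <- function(body) {
  chars <- strsplit(body, "")[[1]]
  fields <- character(0)
  current <- ""
  in_quotes <- FALSE
  for (ch in chars) {
    if (ch == "\"") in_quotes <- !in_quotes
    if (ch == "," && !in_quotes) {
      fields <- c(fields, current)   # field boundary outside quotes
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  fields <- c(fields, current)
  eq <- as.integer(regexpr("=", fields, fixed = TRUE))
  keys <- substr(fields, 1, eq - 1)
  values <- substr(fields, eq + 1, nchar(fields))
  values <- gsub('^"|"$', "", values)  # drop the surrounding quotes
  setNames(values, keys)
}

line <- '##INFO=<ID=VRS,Number=1,Type=String,Description="VRS version=2.0.1, allele IDs">'
body <- sub("^##[^=]+=<(.*)>$", "\\1", line)
parsed <- parse_header_body(body)

# Splitting on every '=' keeps only the text up to the second '='.
naive <- strsplit('Description="VRS version=2.0.1"', "=", fixed = TRUE)[[1]][2]

print(parsed[["Description"]])
print(naive)
```

Splitting on the first '=' only and tracking quote state for commas are the two properties that keep the Description value intact in this sketch.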