Convert ExtendingGenomicRanges.Rnw to .Rmd #69
base: devel
Changes from 2 commits
7a4c48c
e1cb01f
c894b54
24992b2
1ecc930
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| --- | ||
| title: "**Extending Genomic Ranges**" | ||
| author: "Michael Lawrence, Bioconductor Team" | ||
| date: "`r format(Sys.time(), '%d %B, %Y')`" | ||
| package: GenomicRanges | ||
| vignette: > | ||
| %\VignetteIndexEntry{Extending Genomic Ranges} | ||
| %\VignetteEncoding{UTF-8} | ||
| %\VignetteEngine{knitr::rmarkdown} | ||
| output: | ||
| BiocStyle::html_document | ||
| editor_options: | ||
| markdown: | ||
| wrap: 72 | ||
| --- | ||
|
|
||
| # Introduction | ||
|
|
||
| The goal of *GenomicRanges* is to provide general containers for genomic | ||
| data. The central class, at least from the user perspective, is *GRanges*, which | ||
| formalizes the notion of ranges, while allowing for arbitrary "metadata | ||
| columns" to be attached to it. These columns offer the same flexibility | ||
| as the venerable *data.frame* and permit users to adapt *GRanges* to a wide variety | ||
| of *ad hoc* use-cases. | ||
|
|
||
| The more we encounter a particular problem, the better we understand it. | ||
| We eventually develop a systematic approach for solving the most | ||
| frequently encountered problems, and every systematic approach deserves | ||
| a systematic implementation. For example, we might want to formally | ||
| store genetic variants, with information on alleles and read depths. The | ||
| metadata columns, which were so useful during prototyping, are | ||
| inappropriate for extending the formal semantics of our data structure: | ||
| for the sake of data integrity, we need to ensure that the columns are | ||
| always present and that they meet certain constraints. | ||
|
|
||
| We might also find that our prototype does not scale well to the | ||
| increased data volume that often occurs when we advance past the | ||
| prototype stage. *GRanges* is meant mostly for prototyping and stores | ||
| its data in memory as simple R data structures. We may require something | ||
| more specialized when the data are large; for example, we might store | ||
| the data as a Tabix-indexed file, or in a database. | ||
|
|
||
| The *GenomicRanges* package does not directly solve either of these | ||
| problems, because there are no general solutions. However, it is | ||
| adaptable to specialized use cases. | ||
|
|
||
| # The *GenomicRanges* abstraction | ||
|
|
||
| Unbeknownst to many, most of the *GRanges* implementation is provided by | ||
| methods on the *GenomicRanges* class, the virtual parent class of *GRanges*. | ||
| *GenomicRanges* methods provide everything except for the actual data | ||
| storage and retrieval, which *GRanges* implements directly using slots. | ||
| For example, the ranges are retrieved like this: | ||
|
|
||
| ```{r message=FALSE} | ||
| library(GenomicRanges) | ||
| ``` | ||
|
|
||
| ```{r} | ||
| selectMethod(ranges, "GRanges") | ||
| ``` | ||
|
|
||
| An alternative implementation is *DelegatingGenomicRanges*, which stores | ||
| all of its data in a delegate object: | ||
|
|
||
| ```{r} | ||
| selectMethod(ranges, "DelegatingGenomicRanges") | ||
| ``` | ||
|
|
||
| This abstraction enables us to pursue more efficient implementations for | ||
| particular tasks. One example is *GNCList*, which is indexed for fast | ||
| range queries and keeps the underlying ranges in a `granges` slot: | ||
|
|
||
| ```{r} | ||
| getSlots("GNCList")["granges"] | ||
| ``` | ||
|
|
||
| The *MutableRanges* package in svn provides other, untested examples. | ||
|
MutableRanges is italicized as well as a hyperlink in the original vignette, so we can change this to MutableRanges (although, as a note to @jwokaty, it looks like this link goes to a removed package).
Contributor
@hpages Maybe this sentence should be removed since MutableRanges no longer exists?
Contributor
yes please, thanks! |
||
|
|
||
| # Formalizing `mcols`: Extra column slots | ||
|
|
||
| An orthogonal problem to data storage is adding semantics by the | ||
| formalization of metadata columns, and we solve it using the "extra | ||
| column slot" mechanism. Whenever *GenomicRanges* needs to operate on its | ||
| metadata columns, it also delegates to the internal `extraColumnSlotNames` generic, methods of | ||
| which should return a character vector, naming the slots in the | ||
| *GenomicRanges* subclass that correspond to columns (i.e., they have one | ||
| value per range). It extracts the slot values and manipulates them as it | ||
| would a metadata column -- except they are now formal slots, with formal | ||
| types. | ||
|
|
||
| An example is the *VRanges* class in *VariantAnnotation*. It stores | ||
| information on the variants by adding these column slots: | ||
|
|
||
| ```{r message=FALSE, warning=FALSE} | ||
| GenomicRanges:::extraColumnSlotNames(VariantAnnotation:::VRanges()) | ||
| ``` | ||
|
|
||
| Mostly for historical reasons, *VRanges* extends *GRanges*. However, | ||
| since the data storage mechanism and the set of extra column slots are | ||
| orthogonal, it is probably best practice to take a composition approach | ||
| by extending *DelegatingGenomicRanges*. | ||
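
For readers of this diff, the "extra column slot" mechanism described in the vignette can be sketched roughly as follows. This is only an illustration under assumptions: the class name `DepthRanges` and its `depth` slot are hypothetical and not part of *GenomicRanges* or *VariantAnnotation*, and the sketch extends *GRanges* directly (as *VRanges* does) rather than taking the composition approach via *DelegatingGenomicRanges* that the text recommends.

```r
library(GenomicRanges)

# Hypothetical subclass (not part of any package): a GRanges with one
# formal, typed per-range column called `depth`.
setClass("DepthRanges",
         contains = "GRanges",
         slots = c(depth = "integer"))

# Register `depth` as an extra column slot by adding a method on the
# internal generic that the vignette calls via GenomicRanges:::.
setMethod(GenomicRanges:::extraColumnSlotNames, "DepthRanges",
          function(x) "depth")

# Build an instance from an ordinary GRanges plus one depth value per range.
gr <- GRanges("chr1", IRanges(start = c(1, 10, 20), width = 5))
dr <- new("DepthRanges", gr, depth = c(12L, 30L, 7L))

# Ask which slots GenomicRanges should treat as per-range columns.
slot_names <- GenomicRanges:::extraColumnSlotNames(dr)
slot_names
```

According to the vignette text, operations that manipulate metadata columns should then treat `depth` the same way, with the difference that it is a formal slot of type integer; that behaviour is not asserted by the sketch itself.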
This file was deleted.