Rsolid is an R package for normalizing fluorescent intensity data from the ABI/SOLiD second-generation sequencing platform. It has been observed that the color calls provided by the factory software contain technical artifacts: the proportions of colors called are extremely variable across sequencing cycles. Under the random DNA fragmentation assumption, these proportions should be equal across sequencing cycles and proportional to the dinucleotide frequencies of the sample. Rsolid implements a version of the quantile normalization algorithm that transforms the intensity values before colors are called. Results show that after normalization, the total number of mappable reads increases by around 5%, and the number of perfectly mapped reads increases by 10%. Moreover, a 2-5% reduction in overall error rates is observed, with a 2-6% reduction in the rate of valid adjacent color mismatches. The latter is important because it leads to a decrease in false-positive SNP calls. The normalization algorithm is computationally efficient: in a test, we were able to process 300 million reads in 2 hours using 10 computer cluster nodes. The engine functions of the package are written in C for better performance.
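For readers unfamiliar with quantile normalization, the general idea can be sketched as follows. This is the textbook form of the algorithm written in plain Python, not the variant Rsolid implements (the description above does not specify it) and not the package's R interface. The function name `quantile_normalize` and the layout of the input (one list of intensities per column, for example one per sequencing cycle) are assumptions made for illustration.

```python
def quantile_normalize(columns):
    """Textbook quantile normalization over equal-length columns.

    Each value is replaced by the mean, taken across all columns, of the
    values that hold the same rank. Ties are broken by position.
    NOTE: illustrative sketch only; not Rsolid's actual algorithm.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Sort each column, then average across columns at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)
    ]

    normalized = []
    for col in columns:
        # order[r] is the index of the r-th smallest value in this column.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two columns of three values each.
result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After the call, every column contains the same set of values (here 1.5, 3.5, and 5.5), arranged according to the original ranks within that column, so the columns share one distribution.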
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Data for other R packages.
An R interface to V8: Google's open source JavaScript and WebAssembly engine.
Many approaches for both reading and creating XML (and HTML) documents (including DTDs), both local and accessible via HTTP or FTP. Also offers access to an XPath "interpreter".
Memory efficient S4 classes for storing sequences "externally" (behind an R external pointer, or on disk).
Combine multi-dimensional arrays. This is a generalization of cbind and rbind. Takes a sequence of vectors, matrices, or arrays and produces a single array of the same or higher dimension.
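To illustrate what "a generalization of cbind and rbind" means, here is a minimal sketch in plain Python using nested lists as arrays. The function name `bind` and its 0-based `along` argument are assumptions chosen for this example; they are not the package's R interface.

```python
def bind(a, b, along):
    """Join two nested-list arrays along axis `along` (0-based).

    Hypothetical helper for illustration: along=0 joins rows (like rbind
    on matrices) and along=1 joins columns (like cbind).
    """
    if along == 0:
        return a + b
    if len(a) != len(b):
        raise ValueError("shapes differ on an axis that is not being joined")
    # Descend one level and join the matching sub-arrays.
    return [bind(x, y, along - 1) for x, y in zip(a, b)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

by_rows = bind(a, b, along=0)  # 4 x 2 result
by_cols = bind(a, b, along=1)  # 2 x 4 result
stacked = [a, b]               # new leading axis: 2 x 2 x 2 result
```

The same recursion works for arrays of any depth, which is the sense in which joining along an arbitrary axis generalizes the two matrix-only operations. Stacking along a new axis corresponds to the "higher dimension" case in the description.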
ACE and AVAS (additivity and variance stabilization) are used to estimate transformations for regression.
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.