---
title: "Analysis of common intron properties"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{intron-properties}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

This vignette introduces functions for characterizing common properties of introns.

`MaxEntScan`
:   Splice-site strength scoring using the [MaxEntScan program](http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html).

`phastCons`
:   Conservation scoring using phastCons program outputs (a bigWig of scores and a BED of identified conserved regions).

`BranchPointScan`
:   Motif scanning to characterize the distance from the branchpoint to the 3' splice site (3'SS) using [universalmotif](https://bioconductor.org/packages/release/bioc/html/universalmotif.html).

We will use several randomly chosen introns and exons from the dm6 Bioconductor annotation packages in this vignette. For your own applications, read your introns into a `GenomicRanges::GRanges` object; the same workflow then applies unchanged. `BiocIO::import()` may be helpful if you have UCSC BED files at hand.

Randomly sample a set of exons and introns:

```{r sampling}
library(gsAnalysis)
requireNamespace("TxDb.Dmelanogaster.UCSC.dm6.ensGene", quietly = TRUE)
requireNamespace("BSgenome.Dmelanogaster.UCSC.dm6", quietly = TRUE)
```

## MaxEntScan: splice site strength

`MaxEntScan` reports splice-site strength. It is a thin wrapper around the original MaxEntScan perl scripts.

### Preparation

The package does not bundle MaxEntScan, so you need to obtain the MaxEntScan perl scripts yourself. `perl` must also be on your system `PATH`.

### Required inputs:

`path.zip.MES`
:   Path to the zipped MaxEntScan perl scripts.

`BSgenome`
:   `BSgenome` object used to obtain splice-site sequences. It must be the same genome build as your introns.

`GRange.intron`
:   `GRanges` of your introns.
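
MaxEntScan scores fixed-length sequences: `score5.pl` expects 9-mers (the last 3 exonic and first 6 intronic bases of the 5'SS) and `score3.pl` expects 23-mers (the last 20 intronic and first 3 exonic bases of the 3'SS). The base-R sketch below computes those windows from intron coordinates, taking strand into account. It is only an illustration, assuming `MaxEntScan` extracts equivalent windows internally; `splice_site_windows()` is a hypothetical helper, not part of gsAnalysis.

```r
# Hypothetical helper: genomic windows scored by MaxEntScan for each intron.
# Coordinates are 1-based and inclusive, as in GRanges.
splice_site_windows <- function(start, end, strand) {
  stopifnot(all(strand %in% c("+", "-")))
  plus <- strand == "+"
  data.frame(
    # 5'SS: 3 exonic + 6 intronic bases (9 nt); on "-" the exon lies above `end`
    ss5_start = ifelse(plus, start - 3, end - 5),
    ss5_end   = ifelse(plus, start + 5, end + 3),
    # 3'SS: 20 intronic + 3 exonic bases (23 nt); on "-" the exon lies below `start`
    ss3_start = ifelse(plus, end - 19, start - 3),
    ss3_end   = ifelse(plus, end + 3,  start + 19),
    strand    = strand
  )
}
```

The resulting coordinates can be wrapped in a `GRanges` and passed to `Biostrings::getSeq()` with the same `BSgenome`; for `-` strand ranges `getSeq()` returns the reverse complement, so both windows come out in transcript orientation.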

### Sample usage

```{r MaxEntScan}
# The package itself does not bundle the MaxEntScan program and
# the following is NOT run.
# Expected outputs are shown in comments at the end.
```

## phastCons: conservation scoring

`phastCons` reads the conservation scores (bigWig) and conserved elements (BED) reported by the phastCons program and computes conservation scores for your genomic ranges.

### Preparation

The phastCons score and conserved-element files must be obtained separately. One source is UCSC. To get the 124-way score and element files for the *Drosophila melanogaster* dm6 genome:

bigWig of scores
:   <https://hgdownload.soe.ucsc.edu/goldenPath/dm6/phastCons124way/dm6.phastCons124way.bw>

BED of elements
:   <https://hgdownload.soe.ucsc.edu/goldenPath/dm6/database/phastConsElements124way.txt.gz>
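
Fetching both files and the unzip step described below can be scripted in base R, as sketched here. `gunzip_to()` and `fetch_phastcons_dm6()` are hypothetical helper names; the URLs are the UCSC ones listed above, and `fetch_phastcons_dm6()` returns the two paths to hand to `phastCons`.

```r
# Hypothetical helper: decompress a .gz file with base R connections.
gunzip_to <- function(gz_path, dest = sub("\\.gz$", "", gz_path)) {
  con_in <- gzfile(gz_path, open = "rb")   # reading yields decompressed bytes
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)
  repeat {
    chunk <- readBin(con_in, what = "raw", n = 1e6)  # copy in 1 MB chunks
    if (length(chunk) == 0) break
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Hypothetical helper: download the dm6 124-way files and unzip the elements file.
fetch_phastcons_dm6 <- function(dest_dir = ".") {
  base_url <- "https://hgdownload.soe.ucsc.edu/goldenPath/dm6"
  bw <- file.path(dest_dir, "dm6.phastCons124way.bw")
  gz <- file.path(dest_dir, "phastConsElements124way.txt.gz")
  download.file(paste0(base_url, "/phastCons124way/dm6.phastCons124way.bw"), bw, mode = "wb")
  download.file(paste0(base_url, "/database/phastConsElements124way.txt.gz"), gz, mode = "wb")
  list(bw = bw, elements = gunzip_to(gz))
}
```

The bigWig file can be large, so you may need to raise the download timeout first, e.g. `options(timeout = 3600)`.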

Unzip the `.gz` file and provide the paths to the `.bw` and `.txt` files to the `phastCons` function.

### Required inputs:

`GRange.intron`
:   `GRanges` of your introns. Any other genomic ranges may be used instead, since this function is not specific to introns.

`bw.phastCons.path`
:   Path to the phastCons score bigWig file.

`bed.phastCons.path`
:   Path to the phastCons conserved-element BED file.
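
How `phastCons` turns the two files into per-range scores is not described here. As a hedged illustration only, the base-R sketch below computes two plausible summaries for one range: the mean per-base phastCons score and the fraction of bases inside conserved elements. `range_conservation()` and the toy inputs are hypothetical. With real data, per-base scores could be read from the bigWig with `rtracklayer::import()` (its `which` argument restricts the import to your ranges); note that BED `chromStart` is 0-based, so add 1 before using it here.

```r
# Hypothetical helper: summarise conservation over [start, end] (1-based, inclusive).
# `scores`:   per-base phastCons scores indexed by position (toy bigWig stand-in).
# `elements`: data.frame of conserved elements with 1-based inclusive start/end.
range_conservation <- function(start, end, scores, elements) {
  pos <- start:end
  # Mean over bases that have a score; positions beyond `scores` are NA and dropped
  mean_score <- mean(scores[pos], na.rm = TRUE)
  # TRUE for each base covered by at least one conserved element
  covered <- vapply(pos, function(p) any(elements$start <= p & p <= elements$end),
                    logical(1))
  c(mean_score = mean_score, frac_conserved = mean(covered))
}
```

Counting covered bases, rather than counting overlapping elements, keeps the fraction between 0 and 1 even when elements overlap each other.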

### Sample usage

```{r phastCons}
# The package itself does not bundle the score data and
# the following is NOT run.
# Expected outputs are shown in comments at the end.
```

## BranchPointScan: distance of branchpoint to the 3' splice site

`BranchPointScan` is a wrapper around the `universalmotif` package that identifies the position of the highest-scoring branchpoint motif relative to the 3' splice site (3'SS).

### Preparation

You need to provide a branchpoint motif matrix generated by a motif discovery tool, e.g., HOMER or the MEME Suite. The workflow below starts with a precomputed MEME motif for the fruit fly branchpoint. A tutorial on using the MEME Suite to generate a branchpoint motif will be released separately.

### Required inputs:

`GRanges.intron`
:   `GRanges` of your introns.

`branchpoint.motif`
:   Motif read in with the `universalmotif` package (e.g., with `universalmotif::read_meme()` for MEME output).

`BSgenome`
:   `BSgenome` object used to obtain intron sequences. It must be the same genome build as your introns.
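
The idea behind `BranchPointScan` can be illustrated with a plain log-odds position weight matrix (PWM) scan in base R: score every window of the sequence upstream of the 3'SS, keep the best one, and report its distance to the 3'SS. This is not the `universalmotif` code the package wraps (universalmotif offers scans such as `universalmotif::scan_sequences()`). The consensus `CTAAC` and its probabilities are a toy stand-in for the bundled fly motif, the helpers are hypothetical, and the distance definition (bases from the motif start to the last intronic base, inclusive) is an assumption.

```r
# Hypothetical helper: toy log-odds PWM from a consensus string.
# Consensus base gets probability p_match, the other three share the rest;
# scores are log2 ratios against a uniform 0.25 background.
make_toy_pwm <- function(consensus, p_match = 0.85) {
  alphabet <- c("A", "C", "G", "T")
  cols <- lapply(strsplit(consensus, "")[[1]], function(b) {
    p <- ifelse(alphabet == b, p_match, (1 - p_match) / 3)
    log2(p / 0.25)
  })
  pwm <- do.call(cbind, cols)
  rownames(pwm) <- alphabet
  pwm
}

# Hypothetical helper: `seq` is the intron's 3' end, so its last base abuts the 3'SS.
scan_pwm <- function(seq, pwm) {
  bases <- strsplit(toupper(seq), "")[[1]]
  w <- ncol(pwm)
  n <- length(bases) - w + 1
  if (n < 1) stop("sequence is shorter than the motif")
  scores <- vapply(seq_len(n), function(i) {
    idx <- cbind(match(bases[i:(i + w - 1)], rownames(pwm)), seq_len(w))
    sum(pwm[idx])  # NA if the window contains a non-ACGT base
  }, numeric(1))
  best <- which.max(scores)  # first (most upstream) maximum; NA windows are skipped
  list(start = best, score = scores[best],
       # assumed definition: bases from motif start to the last intronic base
       dist_to_3ss = length(bases) - best + 1)
}
```

Taking the single best window mirrors "highest-scored branchpoint motif position"; a threshold on `score` could be added to report no branchpoint when nothing scores well.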

### Sample usage

```{r BranchPointScan}
# The package itself bundles a MEME-generated branchpoint motif of flies.
# This example uses the bundled motif on randomly sampled introns.
```