---
title: "Analysis of common intron properties"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{intron-properties}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Introduction
This vignette introduces functions for characterizing common properties of
introns.
- MaxEntScan
- Splicing site strength using the
[MaxEntScan program](http://hollywood.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html).
- phastCons
- Conservation scoring using phastCons program outputs
(bigwig of scores and bed of identified conserved regions).
- BranchPointScan
- Motif scanning to characterize distance from branchpoint to 3'SS using
[universalmotif](https://bioconductor.org/packages/release/bioc/html/universalmotif.html).
We will use several random introns and exons from the dm6 Bioconductor
databases in this vignette. For your own applications, read in your introns
into a `GenomicRanges::GRanges` object and then the same workflow can be easily
adopted. The Bioconductor `BiocIO::import()` may be helpful if you have
UCSC BED files in hand.
Randomly sample a bunch of exons and introns.
```{r sampling}
library(gsAnalysis)
requireNamespace("TxDb.Dmelanogaster.UCSC.dm6.ensGene", quietly = TRUE)
requireNamespace("BSgenome.Dmelanogaster.UCSC.dm6", quietly = TRUE)
```
## MaxEntScan: splicing site strength
`MaxEntScan` reports splicing site strength. It is a simple wrapper around the
original MaxEntScan perl routine.
### Preparation
You need to gather MaxEntScan program on your own.
This can be done in one of the following ways.
Also, you need `perl` to be in your system paths.
-
Download the perl wrappers from the original [MaxEntScan site](http://hollywood.mit.edu/burgelab/maxent/download/). Then, compress all
files into a zip without any root folder.
-
Download prezipped archive of the MaxEntScan from my [Github archive repo](https://github.com/yeyuan98/archived_external_resources). The
`burgelab.maxent.zip` will be the MaxEntScan perl wrappers.
### Required inputs:
- path.zip.MES
- Path to the zipped MaxEntScan perl scripts
- BSgenome
- BSgenome for obtaining splice site sequences.
You must make sure that the same genome build is used.
- GRange.intron
- GRanges of your introns.
### Sample usage
```{r MaxEntScan}
# The package itself does not bundle the MaxEntScan program and
# the following is NOT run.
# Expected outputs are shown in comments at the end.
```
## phastCons: conservation scoring
`phastCons` reads conservation scores (bigwig) and conserved elements (bed)
reported by the phastCons program and compute convervation scores for your
genomic ranges.
### Preparation
Phastcons score and conserved element files must be provided independently.
One location to fetch these data is from UCSC.
To get 124-way score and element files for Drosophila melanogaster dm6 genome:
- bigWig of scores
- https://hgdownload.soe.ucsc.edu/goldenPath/dm6/phastCons124way/dm6.phastCons124way.bw
- BED of elements
- https://hgdownload.soe.ucsc.edu/goldenPath/dm6/database/phastConsElements124way.txt.gz
Unzip the `.gz` file. Provide `.bw` and `.txt` to the `phastCons` function.
### Required inputs:
- GRange.intron
- GRanges of your introns. Alternatively, any other genome ranges may
be used. This is a generally applicable function.
- bw.phastCons.path
- Path to the phastCons score bigWig file.
- bed.phastCons.path
- Path to the phastCons conserved element BED file.
### Sample usage
```{r phastCons}
# The package itself does not bundle the score data and
# the following is NOT run.
# Expected outputs are shown in comments at the end.
```
## BranchPointScan: distance of branchpoint to the 3' splice site
`BranchPointScan` is a wrapper of the `universalmotif` package for
identifying the highest-scored branchpoint motif position w.r.t. 3'-SS.
### Preparation
You will need to provide a branchpoint motif matrix generated from motif
discovery packages, e.g., HOMER or the MEME Suite. The workflow below
starts with a precomputed MEME motif for branchpoint of the fruit fly.
A tutorial of using the MEME suite to generate branchpoint motif will be
released elsewhere TODO.
### Required inputs:
- GRanges.intron
- GRanges of your introns.
- branchpoint.motif
- Motif read using the `universalmotif` package.
- BSgenome
- BSgenome for obtaining splice site sequences.
You must make sure that the same genome build is used.
### Sample usage
```{r BranchPointScan}
# The package itself bundles a MEME-generated branchpoint motif of flies.
# This example uses the bundled motif on randomly sampled introns.
```