matchPWM {Biostrings}    R Documentation
Position Weight Matrix (PWM) creation, matching, and related utilities for DNA data. (PWMs for amino acid sequences are not supported.)
    PWM(x, type = c("log2probratio", "prob"),
        prior.params = c("A"=0.25, "C"=0.25, "G"=0.25, "T"=0.25))
    matchPWM(pwm, subject, min.score="80%", ...)
    countPWM(pwm, subject, min.score="80%", ...)
    PWMscoreStartingAt(pwm, subject, starting.at=1)

    ## Utility functions for basic manipulation of the Position Weight Matrix
    maxWeights(x)
    minWeights(x)
    maxScore(x)
    minScore(x)
    unitScale(x)
    ## S4 method for signature 'matrix':
    reverseComplement(x, ...)
x: For PWM, a character vector or DNAStringSet whose elements all have the same number of characters. For maxWeights, minWeights, maxScore, minScore, unitScale, and reverseComplement, a numeric matrix representing a Position Weight Matrix.
type: The type of Position Weight Matrix, either "log2probratio" or "prob". See the Details section for more information.
prior.params: A positive numeric vector with names A, C, G, and T, giving the parameters of the Dirichlet conjugate prior. See the Details section for more information.
pwm: A numeric matrix with row names A, C, G, and T representing a Position Weight Matrix.
subject: For matchPWM and countPWM, a DNAString, XStringViews, or MaskedDNAString object. For PWMscoreStartingAt, a DNAString object containing the subject sequence.
min.score: The minimum score for counting a match. Can be given either as a character string containing a percentage (e.g. "85%") of the highest possible score or as a single number.
starting.at: An integer vector specifying the starting positions of the Position Weight Matrix relative to the subject.
...: Additional arguments for methods.
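To make the roles of pwm, subject, starting.at, and min.score concrete, here is a base-R sketch that does not call Biostrings. It assumes that a match score is the sum of the weights of the observed bases, one per PWM column, and that a percentage min.score is read as that fraction of the highest possible score (the sum of the column maxima). The helpers score_starting_at() and score_threshold() and the toy matrix are hypothetical, subject is a plain character string rather than a DNAString, and only the plus strand is scanned.

```r
## Base-R sketch, NOT the Biostrings implementation.
## Assumption: 'pwm' is a numeric matrix with rows named A, C, G, T and one
## column per motif position; a score is the sum of one weight per column.

## Hypothetical helper: score of 'pwm' placed at each start position.
score_starting_at <- function(pwm, subject, starting.at = 1) {
  bases <- strsplit(subject, "")[[1]]   # subject as a vector of single letters
  width <- ncol(pwm)
  vapply(starting.at, function(start) {
    window <- bases[start:(start + width - 1)]
    # Pick the weight of the observed base in each column and add them up.
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(width))])
  }, numeric(1))
}

## Hypothetical helper: turn "80%" (or a plain number) into a numeric cutoff.
## A percentage is taken relative to the highest possible score.
score_threshold <- function(pwm, min.score) {
  if (is.character(min.score) && grepl("%$", min.score)) {
    pct <- as.numeric(sub("%$", "", min.score))
    return(pct / 100 * sum(apply(pwm, 2, max)))
  }
  as.numeric(min.score)
}

## Toy 4 x 3 PWM that rewards the motif "ACG".
pwm <- matrix(c(1, 0, 0, 0,
                0, 1, 0, 0,
                0, 0, 1, 0),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
subject <- "TACGAACG"
starts <- 1:(nchar(subject) - ncol(pwm) + 1)   # every full-length window
scores <- score_starting_at(pwm, subject, starts)
cutoff <- score_threshold(pwm, "80%")
hits <- starts[scores >= cutoff]   # start positions that count as matches
```

Here scores is c(0, 3, 0, 0, 1, 3), the "80%" cutoff is 0.8 * 3 = 2.4, and hits is c(2, 6): the two occurrences of ACG. length(hits) plays the role of a countPWM result in this sketch.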
The PWM function uses a multinomial model with a Dirichlet conjugate prior to calculate the estimated probability of base b at position i. As mentioned in the Arguments section, prior.params supplies the parameters for the DNA bases A, C, G, and T in the Dirichlet prior. These values result in a position-independent initial estimate of the base probabilities,

    priorProbs = prior.params/sum(prior.params)

and a posterior (data-infused) estimate of the base probabilities at each position,

    postProbs = (consensusMatrix(x) + prior.params)/(length(x) + sum(prior.params))

When type = "log2probratio", PWM = unitScale(log2(postProbs/priorProbs)).
When type = "prob", PWM = unitScale(postProbs).
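The two formulas above can be traced by hand with a short base-R sketch. count_bases() is a hypothetical stand-in for consensusMatrix(x) restricted to the rows A, C, G, and T, and the four toy sites are made up for illustration; this is not the code PWM() runs, and it stops before the final unitScale() step.

```r
## Base-R sketch of the Dirichlet-prior estimate, NOT the Biostrings code.

## Hypothetical stand-in for consensusMatrix(x): a 4 x width matrix of base
## counts with rows A, C, G, T (assumes equal-length sequences of A/C/G/T).
count_bases <- function(seqs) {
  width <- nchar(seqs[1])
  bases <- c("A", "C", "G", "T")
  counts <- matrix(0, nrow = 4, ncol = width, dimnames = list(bases, NULL))
  for (s in seqs) {
    chars <- strsplit(s, "")[[1]]
    for (j in seq_len(width)) {
      counts[chars[j], j] <- counts[chars[j], j] + 1
    }
  }
  counts
}

prior.params <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
seqs <- c("ACG", "ACG", "ATG", "GCG")   # toy aligned sites, equal length

priorProbs <- prior.params / sum(prior.params)
## Adding the length-4 prior to the 4-row count matrix recycles it down each
## column, so row i gets prior.params[i] (both use the order A, C, G, T).
postProbs <- (count_bases(seqs) + prior.params) /
  (length(seqs) + sum(prior.params))
log2ratio <- log2(postProbs / priorProbs)   # still before unitScale()
```

With the default prior, priorProbs is 0.25 for every base and each column of postProbs sums to 1. At position 3 all four sites carry G, so postProbs["G", 3] is (4 + 0.25)/(4 + 1) = 0.85, while an unseen base such as T gets 0.25/5 = 0.05 and a negative log2 ratio.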
For PWM, a numeric matrix representing the Position Weight Matrix.
For PWMscoreStartingAt, a numeric vector containing the Position Weight Matrix-based scores.
For matchPWM, an XStringViews object.
For countPWM, a single integer.
For maxWeights, a vector containing the maximum weight for each position in x.
For minWeights, a vector containing the minimum weight for each position in x.
For maxScore, the highest possible score for a given Position Weight Matrix.
For minScore, the lowest possible score for a given Position Weight Matrix.
For unitScale, the modified numeric matrix given by (x - minScore(x)/ncol(x))/(maxScore(x) - minScore(x)).
For reverseComplement, a PWM obtained by reversing the column order of PWM x and reassigning each row to its complementary nucleotide.
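The utility functions can be mirrored in a few lines of base R. This sketch assumes that maxScore and minScore are the sums of the per-position maximum and minimum weights, which is what the unitScale() formula suggests; the helper names and the toy matrix are hypothetical and this is not Biostrings code. Under that assumption, subtracting minScore(x)/ncol(x) from every column removes exactly minScore(x) from any total, so unitScale() maps the lowest possible score to 0 and the highest to 1.

```r
## Base-R sketches, NOT the Biostrings implementations.
## Assumption: 'x' is a numeric matrix with rows A, C, G, T.
max_weights <- function(x) apply(x, 2, max)   # best weight per position
min_weights <- function(x) apply(x, 2, min)   # worst weight per position
max_score   <- function(x) sum(max_weights(x))
min_score   <- function(x) sum(min_weights(x))

## The unitScale() formula: shift every entry by min_score / ncol, then
## divide by the score range.
unit_scale <- function(x) {
  (x - min_score(x) / ncol(x)) / (max_score(x) - min_score(x))
}

## Reverse the column order and swap rows A<->T and C<->G.
reverse_complement_pwm <- function(x) {
  rc <- x[c("T", "G", "C", "A"), rev(seq_len(ncol(x))), drop = FALSE]
  rownames(rc) <- c("A", "C", "G", "T")   # old T becomes new A, and so on
  rc
}

## Toy PWM with negative weights, so that min_score is not zero.
pwm <- matrix(c(2, -1,  1, 0,
                0,  3, -2, 1,
                1,  1,  0, 2),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
scaled <- unit_scale(pwm)
bounds <- c(min_score(scaled), max_score(scaled))
rc <- reverse_complement_pwm(pwm)
```

For this matrix max_score(pwm) is 7 and min_score(pwm) is -3, so unit_scale() computes (pwm + 1)/10 and bounds comes out as c(0, 1). Applying reverse_complement_pwm() twice returns the original matrix, as expected for a strand flip.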
H. Pages and P. Aboyoun
Wasserman, W. W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet, 5(4):276-87.
matchPattern, reverseComplement, DNAString-class, XStringViews-class
    ## Data setup
    library(Biostrings)   # provides PWM(), matchPWM(), and the HNF4alpha data set
    data(HNF4alpha)
    library(BSgenome.Dmelanogaster.UCSC.dm3)
    chr3R <- Dmelanogaster$chr3R
    chr3R

    ## Create a PWM and perform some general routines
    pwm <- PWM(HNF4alpha)
    round(pwm, 2)
    maxWeights(pwm)
    maxScore(pwm)
    reverseComplement(pwm)

    ## Score the first 5 positions
    ## (unmasked() is used because PWMscoreStartingAt expects a DNAString)
    PWMscoreStartingAt(pwm, unmasked(chr3R), starting.at=1:5)

    ## Match the plus strand
    matchPWM(pwm, chr3R)
    countPWM(pwm, chr3R)

    ## Match the minus strand
    matchPWM(reverseComplement(pwm), chr3R)