export-tracks {rtracklayer} | R Documentation |
These functions output RangedData instances in various formats.
```r
export.gff(object, con, version = c("1", "2", "3"), source = "rtracklayer",
           append = FALSE, ...)
export.gff1(object, con, ...)
export.gff2(object, con, ...)
export.gff3(object, con, ...)
export.bed(object, con, variant = c("base", "bedGraph", "bed15"),
           color = NULL, append = FALSE, ...)
export.bed15(object, con, expNames = NULL, ...)
export.bedGraph(object, con, ...)
export.wig(object, con, dataFormat = c("auto", "variableStep", "fixedStep"),
           ...)
export.ucsc(object, con,
            subformat = c("auto", "gff1", "wig", "bed", "bed15", "bedGraph"),
            append = FALSE, ...)
## not yet supported on Windows
export.bw(object, con,
          dataFormat = c("auto", "variableStep", "fixedStep", "bedGraph"),
          seqlengths = GenomicRanges::seqlengths(object), compress = TRUE, ...)
```
| Argument | Description |
|---|---|
| object | The object to export, such as a |
| con | The connection to which the object is exported. |
| version | The GFF version, either "1", "2" or "3" (default is "1"). |
| source | The source of the GFF information (GFF only). |
| variant | Which variant of BED lines to output; for internal use, not for the user. |
| color | Recycled vector of colors, as interpreted by |
| dataFormat | The format of the data lines for WIG tracks; see the references. The "auto" format uses the most efficient format possible. |
| subformat | The format of the tracks within the UCSC container. If "auto", the type is determined from the track line. If |
| expNames | Names of the columns in |
| seqlengths | The lengths of each sequence in |
| append | Logical, whether to append the output to the connection. |
| compress | Logical, indicating whether to compress the bigWig output. |
| ... | For |
The following is some advice for choosing a file format.
The General Feature Format (GFF) is meant to represent any set of genomic features, with application-specific columns represented as “attributes”. There are three principal versions (1, 2, and 3). This is a good format for interoperating with other genomic tools. UCSC supports GFF1, but it needs to be encapsulated in the UCSC metaformat, i.e. export.ucsc(subformat = "gff1").
The Browser Extensible Data (BED) format is for displaying tracks in a genome browser, in particular UCSC. There are many options to control the appearance of the track; see GraphTrackLine. To output a track line when object is not a UCSCData, call export.ucsc(subformat = "bed").
An extension of BED with 15 columns, Bed15 is meant to represent data from microarray experiments. Multiple samples/columns are supported, and the data is displayed as a compact heatmap. With 15 columns per feature, this format is probably too verbose for e.g. ChIP-seq coverage (use multiple WIG tracks instead).
The bedGraph format is a variant of BED that represents experimental data more compactly than BED and especially Bed15, although only one sample is supported. The data is displayed as a bar or line graph. For dense data, WIG is preferred.
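To show why bedGraph is compact, here is a minimal base-R sketch of bedGraph data lines. Each line has only four fields: chrom, 0-based start, end, and value. The helper bedgraph_lines() is hypothetical and not part of rtracklayer. It assumes 1-based, closed input coordinates, as stored in a Ranges object.

```r
# Hypothetical helper (not rtracklayer API): format bedGraph data lines.
# Input starts/ends are 1-based and closed; bedGraph wants 0-based, half-open,
# so only the start is shifted.
bedgraph_lines <- function(chrom, start, end, score) {
  sprintf("%s\t%d\t%d\t%s", chrom, start - 1L, end, as.character(score))
}

bedgraph_lines("chr1", c(1L, 101L), c(100L, 200L), c(2.5, 0))
```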
The Wiggle format is meant for storing dense numerical data, such as the coverage from a ChIP-seq experiment. The data is displayed as a bar or line graph.
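The two WIG data formats differ in where the positions live. In fixedStep, a header gives start and step, and each line holds only a value. In variableStep, each line carries its own 1-based position. The sketch below uses a hypothetical helper, wig_lines(). Its rule for choosing fixedStep (evenly spaced starts and one shared width) is an assumption in the spirit of dataFormat = "auto". It is not rtracklayer's actual selection logic.

```r
# Hypothetical helper (not rtracklayer API): WIG text for one chromosome.
# Assumes every feature has the same width, which becomes the WIG "span".
# Picks fixedStep when starts are evenly spaced, variableStep otherwise.
wig_lines <- function(chrom, start, width, score) {
  stopifnot(length(unique(width)) == 1L)
  span <- width[1]
  steps <- unique(diff(start))
  if (length(start) > 1L && length(steps) == 1L) {
    header <- sprintf("fixedStep chrom=%s start=%d step=%d span=%d",
                      chrom, start[1], steps, span)
    c(header, as.character(score))   # positions implied by start/step
  } else {
    header <- sprintf("variableStep chrom=%s span=%d", chrom, span)
    c(header, paste(start, score))   # each line carries its own position
  }
}

wig_lines("chr1", c(1L, 101L, 201L), rep(100L, 3), c(1.5, 2, 3))
wig_lines("chr1", c(1L, 50L, 400L), rep(10L, 3), c(1, 2, 3))
```

For evenly spaced data, fixedStep drops the position column entirely. This is why it is the more compact choice when it applies.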
In summary, BED is usually best for displaying qualitative features or sparse quantitative features (like ChIP-seq peaks), while WIG is usually best for displaying dense data like coverage.
In general, columns in the RangedData are mapped to the column in the track format of the same name. For example, a column named “itemRgb” will be mapped to the corresponding column in BED-formatted output, while it is ignored for other formats. Missing values are mapped between NA in R and the format-specific missing value indicator, usually “.”. The following describes how the RangedData object is mapped to each track format. Default values for columns are given in parentheses.
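The NA-to-"." mapping on output amounts to a one-line substitution per column. This base-R sketch uses a hypothetical helper, to_field(), which is not part of rtracklayer. The marker is a parameter because it varies by format.

```r
# Hypothetical helper (not rtracklayer API): turn one column into text
# fields, writing R's NA as the format's missing-value marker.
to_field <- function(x, missing = ".") {
  out <- as.character(x)
  out[is.na(x)] <- missing
  out
}

to_field(c(1.5, NA, 3))
```

Import would apply the reverse step, replacing "." fields with NA before type conversion.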
GFF: Maps columns named “source” (“rtracklayer”), “feature” (“sequence”), “score” (“.”), “strand” (“.”), “frame” (“.”), and (version 1 only) “group” (seqname). In GFF versions 2 and 3, extra columns are mapped to attributes.
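As a rough illustration of "extra columns are mapped to attributes", this base-R sketch builds GFF3-style tag=value;tag=value strings from a data frame of extra columns, dropping NA values. The helper gff3_attributes() is hypothetical and not rtracklayer's implementation. It also simplifies two things: it skips the percent-encoding that GFF3 requires for reserved characters such as ";" and "=", and it does not produce the different attribute syntax used by GFF2.

```r
# Hypothetical helper (not rtracklayer API): one GFF3 attribute string per
# row of `extra`. NA values are omitted; no GFF3 escaping is performed.
gff3_attributes <- function(extra) {
  keys <- names(extra)
  vapply(seq_len(nrow(extra)), function(i) {
    vals <- vapply(extra, function(col) as.character(col[i]), character(1))
    keep <- !is.na(vals)
    paste(paste0(keys[keep], "=", vals[keep]), collapse = ";")
  }, character(1))
}

extra <- data.frame(ID = c("g1", "g2"), Note = c("first", NA),
                    stringsAsFactors = FALSE)
gff3_attributes(extra)
```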
BED: Maps columns named “name” (“.”), “score” (“.”), “strand” (“.”), “thickStart” (start), “thickEnd” (end), “itemRgb” (“0,0,0”), “blockSizes”, and “blockStarts”. Note that the BED field “blockCount” is derived automatically. The intervals specified by “thickStart”, “thickEnd” and “blockStarts” are 0-based, half-open as in BED. Note that this is different from the chromosome start/end stored in the Ranges object (1-based, closed). The “itemRgb” column should be specified in a format understood by col2rgb.
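Two of the conversions above can be sketched in base R. The first moves from the 1-based, closed Ranges convention to BED's 0-based, half-open convention. The second turns any color accepted by grDevices::col2rgb() into the "r,g,b" text of the itemRgb field. Both helpers, to_bed_coords() and item_rgb(), are hypothetical and not part of rtracklayer.

```r
# Hypothetical helpers (not rtracklayer API).
# Ranges store 1-based, closed intervals; BED stores 0-based, half-open ones,
# so only the start moves: [1, 100] becomes chromStart 0, chromEnd 100.
to_bed_coords <- function(start, end) {
  data.frame(chromStart = start - 1L, chromEnd = end)
}

# Convert colors understood by col2rgb() into BED "r,g,b" itemRgb strings.
item_rgb <- function(color) {
  rgb <- grDevices::col2rgb(color)   # 3-row integer matrix, one column per color
  unname(apply(rgb, 2, paste, collapse = ","))
}

to_bed_coords(c(1L, 501L), c(100L, 600L))
item_rgb(c("red", "#0000FF", "black"))
```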
Bed15: In addition to the behavior for BED above, encodes columns named by the expNames parameter into the fields “expCount”, “expIds” and “expScores”.
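This base-R sketch shows roughly how per-sample columns could collapse into the three Bed15 experiment fields. The helper bed15_exp_fields() is hypothetical. The choice of experiment ids as 0-based indices into expNames is an assumption modeled on UCSC microarray tracks, not confirmed rtracklayer behavior.

```r
# Hypothetical helper (not rtracklayer API): build Bed15 experiment fields
# from the columns named in expNames. Assumes 0-based experiment ids.
bed15_exp_fields <- function(df, expNames) {
  scores <- as.matrix(df[expNames])   # one row per feature, one column per sample
  n <- length(expNames)
  data.frame(expCount = rep(n, nrow(df)),
             expIds = paste(seq_len(n) - 1L, collapse = ","),
             expScores = apply(scores, 1, paste, collapse = ","),
             stringsAsFactors = FALSE)
}

df <- data.frame(a = c(0.5, 1), b = c(-2, 3))
bed15_exp_fields(df, c("a", "b"))
```

Every feature repeats all sample scores, which is why the advice above calls Bed15 verbose for dense data.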
bedGraph: The “score” column is used for the quantitative values.
WIG: The “score” column is used for the quantitative values.
The graph formats do not encode a strand. Thus, when targeting the UCSC format, if a track contains features from multiple strands, one track will be output for each strand. The string "m", "p" or "NA" is appended to the base track name for the minus, plus and NA/* strand, respectively.
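The per-strand splitting can be sketched in base R. The helper split_by_strand() is hypothetical and not rtracklayer's code. It assumes the suffix is appended directly to the base name with no separator, and it treats both NA and "*" as the unknown strand.

```r
# Hypothetical helper (not rtracklayer API): split one track's scores into
# per-strand tracks named base + "m"/"p"/"NA" (no separator assumed).
split_by_strand <- function(track_name, strand, score) {
  s <- as.character(strand)
  s[is.na(s)] <- "*"                           # NA treated like unknown strand
  suffix <- c("-" = "m", "+" = "p", "*" = "NA")[s]
  split(score, paste0(track_name, suffix))
}

tracks <- split_by_strand("coverage", c("+", "-", "+", "*"), c(1, 2, 3, 4))
tracks[["coveragep"]]
```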
If con is missing, a character vector containing the string output; otherwise nothing.
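The return-value convention follows a common R pattern: return the text when no connection is supplied, and write to the connection otherwise. The helper write_or_return() is hypothetical and only illustrates the pattern. It is not taken from rtracklayer's source.

```r
# Hypothetical helper (not rtracklayer API): return the lines when `con`
# is missing, otherwise write them to `con` and return nothing visible.
write_or_return <- function(lines, con) {
  if (missing(con))
    return(lines)
  writeLines(lines, con)
  invisible(NULL)
}

txt <- write_or_return(c("track type=wiggle_0", "variableStep chrom=chr1"))
tc <- textConnection("captured", "w")   # collects written lines in `captured`
write_or_return(txt, tc)
close(tc)
captured
```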
Michael Lawrence
http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED
See export for the high-level interface to these functions.
```r
library(rtracklayer)
library(IRanges)  # for the RangedDataList() constructor

dummy <- file()  # dummy file connection for demo
track <- import(system.file("tests", "bed.wig", package = "rtracklayer"))

## output a track as GFF2
export.gff(track, dummy, version = "2")
## equivalently
export.gff2(track, dummy)

## output as WIG string in variableStep format
## (con omitted, so a character vector is returned)
wig <- export.wig(track, dataFormat = "variableStep")

## output multiple tracks in UCSC meta-format, as WIG
track2 <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
export.ucsc(RangedDataList(track, track2), dummy, subformat = "wig")

close(dummy)
```