export-tracks {rtracklayer}    R Documentation

Export tracks

Description

These functions output RangedData instances in various formats.

Usage

export.gff(object, con, version = c("1", "2", "3"), source =
           "rtracklayer", append = FALSE, ...)
export.gff1(object, con, ...)
export.gff2(object, con, ...)
export.gff3(object, con, ...)
export.bed(object, con, variant = c("base", "bedGraph", "bed15"),
           color = NULL, append = FALSE, ...)
export.bed15(object, con, expNames = NULL, ...)
export.bedGraph(object, con, ...)
export.wig(object, con,
           dataFormat = c("auto", "variableStep", "fixedStep"), ...)
export.ucsc(object, con, subformat = c("auto", "gff1", "wig", "bed",
           "bed15", "bedGraph"), append = FALSE, ...)
## not yet supported on Windows
export.bw(object, con,
          dataFormat = c("auto", "variableStep", "fixedStep", "bedGraph"),
          seqlengths = GenomicRanges::seqlengths(object), compress = TRUE, ...)

Arguments

object

The object to export, such as a RangedData, or anything coercible to a RangedData. If a UCSCData, the track line information is output. In the case of export.bed15, export.bedGraph, export.wig, and export.ucsc, a RangedDataList object with possibly multiple tracks is supported.

con

The connection to which the object is exported.

version

The GFF version, either "1", "2" or "3" (default is "1").

source

The value written to the source column of the GFF output (default is "rtracklayer").

variant

Which variant of BED lines to output. This argument is for internal use and is not meant to be set by the user.

color

Recycled vector of colors, as interpreted by col2rgb for BED features. If NULL, the color column in the featureData is used, if any.

dataFormat

The format of the data lines for WIG tracks, see references. The "auto" format uses the most efficient format possible.

subformat

The format of the tracks within the UCSC container. If "auto", the type is determined from the trackline. If object is not a UCSCData, this essentially means "wig" or "bedGraph" (depending on the density) if there is a numeric score, else "bed".

expNames

Names of the columns in object that hold the experimental data. Defaults to all column names, unless object is a UCSCData, in which case the expNames field is taken from the track line, if it exists.

seqlengths

The lengths of each sequence in object. If seqinfo(object) is missing sequence lengths, an attempt is made to retrieve the sequence lengths from an installed BSgenome package or UCSC, as long as there is a matching genome identifier.

append

Logical, whether to append the output to the connection.

compress

Logical, indicating whether to compress the bigWig output.

...

For export.gff1, export.gff2 and export.gff3: arguments to pass to export.gff. For export.bed: arguments to pass to methods. For export.bed15, export.bedGraph and export.wig: arguments to pass to export.ucsc. For export.ucsc: arguments to pass to export.subformat or to set on the slots of the TrackLine subclass corresponding to subformat.

Details

The following is some advice for choosing a file format.

GFF

The General Feature Format is meant to represent any set of genomic features, with application-specific columns represented as “attributes”. There are three principal versions (1, 2, and 3). This is a good format for interoperating with other genomic tools. UCSC supports GFF1, but it needs to be encapsulated in the UCSC metaformat, i.e. export.ucsc(subformat = "gff1").

BED

The Browser Extensible Data format is for displaying tracks in a genome browser, in particular UCSC. There are many options to control the appearance of the track, see GraphTrackLine. To output a track line when object is not a UCSCData, call export.ucsc(subformat = "bed").

Bed15

An extension of BED with 15 columns, Bed15 is meant to represent data from microarray experiments. Multiple samples/columns are supported, and the data is displayed as a compact heatmap. With 15 columns per feature, this format is probably too verbose for e.g. ChIP-seq coverage (use multiple WIG tracks instead).

bedGraph

A variant of BED that represents experimental data more compactly than BED and especially Bed15, although only one sample is supported. The data is displayed as a bar or line graph. For dense data, WIG is preferred.

WIG

The Wiggle format is meant for storing dense numerical data, such as the coverage from a ChIP-seq experiment. The data is displayed as a bar or line graph.

In summary, BED is usually best for displaying qualitative features or sparse quantitative features (like ChIP-seq peaks), while WIG is usually best for displaying dense data like coverage.

In general, columns in the RangedData are mapped to the column in the track format of the same name. For example, a column named “itemRgb” will be mapped to the corresponding column in BED-formatted output, while it is ignored for other formats. Missing values are mapped between NA in R and the format-specific missing value indicator, usually “.”. The following describes how the RangedData object is mapped to each track format. Default values for columns are given in parentheses.

GFF

Maps columns named “source” (“rtracklayer”), “feature” (“sequence”), “score” (“.”), “strand” (“.”), “frame” (“.”), and (version 1 only) “group” (seqname). In GFF versions 2 and 3, extra columns are mapped to attributes.
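As an illustration of the mapping above, the following base R sketch assembles a single GFF3 line by hand. It is not how rtracklayer builds its output; the feature values and the extra columns (ID, Name) are hypothetical, and the helper dot_if_na is introduced only for this example.

```r
# Hypothetical feature; NA marks columns that fall back to the "." default
feature <- list(seqname = "chr1", source = "rtracklayer",
                feature = "sequence", start = 100L, end = 200L,
                score = NA, strand = NA, frame = NA)

# Hypothetical extra columns, which GFF3 stores as key=value attributes
extra <- list(ID = "peak1", Name = "myPeak")

# Map NA in R to the GFF missing value indicator
dot_if_na <- function(x) if (is.na(x)) "." else as.character(x)

# Join the extra columns into one semicolon-separated attribute string
attrs <- paste(names(extra), unlist(extra), sep = "=", collapse = ";")

# Nine tab-separated fields make up one GFF3 line
fields <- c(vapply(feature, dot_if_na, character(1)), attrs)
gff3_line <- paste(fields, collapse = "\t")
```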

BED

Maps columns named “name” (“.”), “score” (“.”), “strand” (“.”), “thickStart” (start), “thickEnd” (end), “itemRgb” (“0,0,0”), “blockSizes”, and “blockStarts”. Note that the BED field “blockCount” is derived automatically. The intervals specified by “thickStart”, “thickEnd” and “blockStarts” are 0-based, half-open as in BED. Note that this is different from the chromosome start/end stored in the Ranges object (1-based, closed). The “itemRgb” column should be specified in a format understood by col2rgb.
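The two coordinate conventions and the color encoding can be sketched in base R as follows. The example values and the helper to_item_rgb are hypothetical and only illustrate the conversion; they are not part of rtracklayer.

```r
# 1-based, closed ranges as stored in R (hypothetical example values)
start <- c(1L, 101L, 501L)
end   <- c(100L, 250L, 600L)

# BED is 0-based, half-open: shift the start down by one, keep the end
bed_start <- start - 1L
bed_end   <- end

# The width of each interval is the same under both conventions
stopifnot(all(bed_end - bed_start == end - start + 1L))

# Turn any color understood by col2rgb() into an "r,g,b" string
to_item_rgb <- function(color) {
  unname(apply(grDevices::col2rgb(color), 2, paste, collapse = ","))
}
item_rgb <- to_item_rgb(c("red", "#0000FF", "black"))
```

Here item_rgb holds one "r,g,b" string per input color, in the form used for the “itemRgb” default (“0,0,0”) above.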

Bed15

In addition to the behavior for BED above, encodes columns named by the expNames parameter into the fields “expCount”, “expIds” and “expScores”.
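A rough base R sketch of how experimental columns could be packed into the three Bed15 fields is given below. The sample names and scores are hypothetical, and the 0-based numbering of “expIds” is an assumption based on the UCSC microarray track conventions rather than something stated on this page.

```r
# Hypothetical experimental data: one row per feature, one column per sample
scores <- cbind(sampleA = c(0.5, -1.2), sampleB = c(1.1, 0.3))
exp_names <- colnames(scores)

# expCount: the number of experiments
exp_count <- length(exp_names)

# expIds: comma-separated experiment ids (assumed to be 0-based indices)
exp_ids <- paste(seq_len(exp_count) - 1L, collapse = ",")

# expScores: one comma-separated list of scores per feature
exp_scores <- apply(scores, 1, paste, collapse = ",")
```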

bedGraph

The “score” column is used for the quantitative values.

WIG

The “score” column is used for the quantitative values.
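To show why the choice of dataFormat affects file size, the sketch below writes the same hypothetical scores by hand in both WIG layouts using base R. It only illustrates the two layouts described in the WIG reference and does not reproduce the selection logic behind the "auto" setting.

```r
# Hypothetical scores for single-base positions on one chromosome
pos   <- c(101L, 102L, 103L, 104L)
score <- c(1.5, 2.0, 2.5, 1.0)

# variableStep: each data line carries its own position
variable_step <- c("variableStep chrom=chr1",
                   paste(pos, score))

# fixedStep: positions are implied by start and step, so only values are written
step <- unique(diff(pos))
stopifnot(length(step) == 1L)  # fixedStep requires regular spacing
fixed_step <- c(sprintf("fixedStep chrom=chr1 start=%d step=%d", pos[1], step),
                as.character(score))
```

For regularly spaced data the fixedStep form omits the position on every line, which is what makes it the more compact of the two.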

The graph formats do not encode a strand. Thus, when targeting the UCSC format, if a track contains features from multiple strands, one track will be output for each strand. The string "m", "p" or "NA" is appended to the base track name for the minus, plus and NA/* strand, respectively.
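The per-strand naming can be sketched in base R as below. The base name "coverage" and the strand values are hypothetical, and joining the suffix directly onto the name with no separator is an assumption; the actual track names produced by export.ucsc may differ in detail.

```r
# Hypothetical strand values for five features
strand <- c("+", "-", "*", "+", NA)

# "m" for minus, "p" for plus, "NA" for unstranded (NA or "*")
suffix <- ifelse(is.na(strand) | strand == "*", "NA",
                 ifelse(strand == "-", "m", "p"))

# Append the suffix to the base track name (direct concatenation assumed)
track_name <- paste0("coverage", suffix)

# Feature indices grouped by the track they would be written to
by_track <- split(seq_along(strand), track_name)
```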

Value

If con is missing, a character vector containing the output as text; otherwise nothing is returned.

Author(s)

Michael Lawrence

References

GFF1 and GFF2

http://www.sanger.ac.uk/Software/formats/GFF

GFF3

http://www.sequenceontology.org/gff3.shtml

BED

http://genome.ucsc.edu/goldenPath/help/customTrack.html#BED

WIG

http://genome.ucsc.edu/goldenPath/help/wiggle.html

UCSC

http://genome.ucsc.edu/goldenPath/help/customTrack.html

See Also

See export for the high-level interface to these functions.

Examples

  dummy <- file() # dummy file connection for demo
  track <- import(system.file("tests", "bed.wig", package = "rtracklayer"))
  ## output a track as GFF2
  export.gff(track, dummy, version = "2")
  ## equivalently
  export.gff2(track, dummy)
  ## output as WIG string in variableStep format
  wig <- export.wig(track, dummy, dataFormat = "variableStep")
  ## import a second track to output together with the first
  track2 <- import(system.file("tests", "v1.gff", package = "rtracklayer"))
  ## output multiple tracks in the UCSC meta-format, as WIG
  library(IRanges)  # for the RangedDataList() constructor
  export.ucsc(RangedDataList(track, track2), dummy, subformat = "wig")
  close(dummy)  # release the demo connection

[Package rtracklayer version 1.14.1 Index]