getBaseCoverage

Class: BioMap

Return base-by-base alignment coverage of reference sequence in BioMap object

Syntax

Cov = getBaseCoverage(BioObj, StartPos, EndPos)
Cov = getBaseCoverage(BioObj, StartPos, EndPos, R)
Cov = getBaseCoverage(..., Name,Value)
[Cov, BinStart] = getBaseCoverage(...)

Description

Cov = getBaseCoverage(BioObj, StartPos, EndPos) returns Cov, a row vector of nonnegative integers that gives the base-by-base alignment coverage of a range or set of ranges in the reference sequence in BioObj, a BioMap object. StartPos and EndPos define the range or set of ranges. They can be two nonnegative integers such that StartPos is less than EndPos and both integers are smaller than the length of the reference sequence. They can also be two column vectors representing a set of ranges (overlapping or segmented). When StartPos and EndPos specify a segmented range, Cov contains NaN values for base positions between segments.
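The segmented-range case can be illustrated with a minimal sketch. It assumes the ex1.sam file used in the examples on this page is on the MATLAB path. The ranges 1:10 and 30:40 and the variable names are chosen only for illustration.

```matlab
% Sketch only: ex1.sam and the two ranges below are illustrative assumptions
BMObj = BioMap('ex1.sam');

% Two non-overlapping segments, given as column vectors
startPos = [1; 30];
endPos   = [10; 40];
cov = getBaseCoverage(BMObj, startPos, endPos);

% Per the description, Cov spans min(StartPos):max(EndPos)
expectedLength = numel(min(startPos):max(endPos));
fprintf('Length of cov: %d (expected %d)\n', numel(cov), expectedLength);

% Positions between the segments (here 11:29) should hold NaN
gapIsNaN = all(isnan(cov(11:29)));

% Average coverage over the requested segments, skipping the NaN gap
meanCov = mean(cov, 'omitnan');
```

Because the gap is filled with NaN rather than zero, summary statistics over Cov need an option such as 'omitnan' to avoid returning NaN.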

Cov = getBaseCoverage(BioObj, StartPos, EndPos, R) calculates the coverage on the reference specified by R.

Cov = getBaseCoverage(..., Name,Value) returns alignment coverage information with additional options specified by one or more Name,Value pair arguments.

[Cov, BinStart] = getBaseCoverage(...) returns BinStart, a row vector of positive integers specifying the start position of each bin (when binning occurs).

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

R

Positive integer indexing the SequenceDictionary property of BioObj, or a character vector or string specifying the actual name of the reference.
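The two forms of R can be compared with a short sketch. It assumes ex1.sam is on the MATLAB path and that the first entry of SequenceDictionary is the reference of interest. The cellstr call is only a convenience for handling a single name or a list of names uniformly.

```matlab
% Sketch only: ex1.sam and the choice of the first reference are assumptions
BMObj = BioMap('ex1.sam');

% Reference names stored in the object
names = cellstr(BMObj.SequenceDictionary);

% Select the reference by its index into SequenceDictionary
covByIndex = getBaseCoverage(BMObj, 1, 12, 1);

% Select the same reference by its name
covByName = getBaseCoverage(BMObj, 1, 12, names{1});

% Both calls address the same reference, so the results should match
sameResult = isequal(covByIndex, covByName);
```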

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

binWidth

Positive integer specifying the bin width, in number of base pairs (bp). Bins are centered within min(StartPos) and max(EndPos). Thus, the first and last bins extend approximately equally beyond the range from min(StartPos) to max(EndPos).

Note

You cannot specify both binWidth and numberOfBins.

numberOfBins

Positive integer specifying the number of equal-width bins to use to span the requested region. Bins are centered within min(StartPos) and max(EndPos). Thus, the first and last bins extend approximately equally beyond the range from min(StartPos) to max(EndPos).

Note

You cannot specify both binWidth and numberOfBins.

binType

Character vector or string specifying the binning algorithm. Choices are:

  • 'max' — From the bin, getBaseCoverage selects the base position with the most reads aligned to it, then uses its alignment coverage value for the bin.

  • 'min' — From the bin, getBaseCoverage selects the base position with the fewest reads aligned to it, then uses its alignment coverage value for the bin.

  • 'mean' — Uses the average alignment coverage, computed from all base positions within the bin.

Default: 'max'
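The binning options can be combined as in the following sketch, which assumes ex1.sam is on the MATLAB path. The region 1:1000, the bin count, and the variable names are illustrative. The quoted name-value syntax is used so that the calls also work in releases before R2021a.

```matlab
% Sketch only: ex1.sam and the 1:1000 region are illustrative assumptions
BMObj = BioMap('ex1.sam');

% Ten equal-width bins, each summarized by its average coverage
[covMean, binStart] = getBaseCoverage(BMObj, 1, 1000, ...
    'numberOfBins', 10, 'binType', 'mean');

% Fixed 100-bp bins, each summarized by its least-covered base position
covMin = getBaseCoverage(BMObj, 1, 1000, ...
    'binWidth', 100, 'binType', 'min');

% One coverage value and one start position per bin
fprintf('%d bins, first bin starts at %d\n', numel(covMean), binStart(1));
```

Specify either binWidth or numberOfBins in a given call, not both.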

complementRanges

Logical specifying whether to return the alignment coverage for the base positions between segments, instead of within segments. If true, the length of Cov is numel(min(StartPos):max(EndPos)), and Cov contains NaN values for base positions within segments.

Default: false
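A minimal sketch of complementRanges follows. It assumes ex1.sam is on the MATLAB path, and the segments 1:10 and 30:40 are chosen only for illustration.

```matlab
% Sketch only: ex1.sam and the two segments below are illustrative assumptions
BMObj = BioMap('ex1.sam');
startPos = [1; 30];
endPos   = [10; 40];

% Coverage of the positions between the segments (here 11:29)
covBetween = getBaseCoverage(BMObj, startPos, endPos, ...
    'complementRanges', true);

% Per the description, positions inside the segments should now hold NaN
insideIsNaN = all(isnan(covBetween([1:10, 30:40])));

% The vector still spans min(StartPos):max(EndPos)
spanLength = numel(min(startPos):max(endPos));
```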

Spliced

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

Output Arguments

Cov

Row vector of nonnegative integers. This vector specifies the number of read sequences that align with each base position or bin in the requested regions. A set of ranges can be overlapping or segmented. For a range, the length of Cov is numel(StartPos:EndPos). For a segmented range, the length of Cov is numel(min(StartPos):max(EndPos)). Cov contains NaN values for base positions between segments. When binning occurs, the number of elements in Cov equals the number of bins.

BinStart

Row vector of positive integers specifying the start position of each bin. BinStart is the same length as Cov. If no binning occurs, then BinStart equals min(StartPos):max(EndPos).

Examples

Construct a BioMap object, and then return the alignment coverage of each of the first 12 base positions of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to each of
% the first 12 base positions of the reference sequence
cov = getBaseCoverage(BMObj1, 1, 12)
cov =

     1     1     2     2     3     4     4     4     5     5     5     5

Construct a BioMap object, and then return the alignment coverage of the range between 1 and 1000, on a bin-by-bin basis, using bins with a width of 100 bp:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to each 100-bp bin
% in the 1:1000 range of the reference sequence. Also return the
% start position of each bin
[cov, bin_starts] = getBaseCoverage(BMObj1, 1, 1000, 'binWidth', 100)
cov =

    17    20    41    44    45    48    48    45    46    42


bin_starts =

     1   101   201   301   401   501   601   701   801   901