7. Summary

These tables summarize BEDOPS utilities by option, file inputs and BED column requirements.

7.1. Set operation and statistical utilities

7.1.1. bedextract

  • Efficiently extracts features from BED input.

  • BEDOPS bedextract documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| --list-chr | Print every chromosome found in input.bed | 1 | 1 | 3 |
| <chromosome> | Retrieve all rows for specified chromosome, e.g. bedextract chr8 input.bed | 1 | 1 | 3 |
| <query> <reference> | Grab elements of query that overlap elements in reference. Same as bedops -e -1 query reference, except that this option fails when query contains fully-nested BED elements. May use - to indicate stdin for reference only. | 2 | 2 | 3 |

7.1.2. bedmap

  • Maps source signals from map-file onto qualified target regions from ref-file. Calculates an output for every ref-file element.

  • BEDOPS bedmap documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| --bases | Reports the total number of bases from map-file that overlap the ref-file's element. | 1 | 2 | 3 |
| --bases-uniq | Reports the number of distinct bases from ref-file's element overlapped by elements in map-file. | 1 | 2 | 3 |
| --bases-uniq-f | Reports the fraction of distinct bases from ref-file's element overlapped by elements in map-file. | 1 | 2 | 3 |
| --bp-ovr <int> | Require <int> bases of overlap between elements of input files. | 1 | 2 | 3 |
| --chrom <chromosome> | Process data for given <chromosome> only. | 1 | 2 | 3 |
| --count | Reports the number of overlapping elements in map-file. | 1 | 2 | 3 |
| --cv | Reports the coefficient of variation: the result of --stdev divided by the result of --mean. | 1 | 2 | 5 |
| --ec | Error-check all input files (slower). | 1 | 2 | 3 |
| --echo | Echo each line from ref-file. | 1 | 2 | 3 |
| --echo-map | Reports the overlapping elements found in map-file. | 1 | 2 | 3 |
| --echo-map-id | Reports the IDs (4th column) from overlapping map-file elements. | 1 | 2 | 4 |
| --echo-map-id-uniq | Lists unique IDs from overlapping map-file elements. | 1 | 2 | 4 |
| --echo-map-range | Reports the genomic range of overlapping elements from map-file. | 1 | 2 | 3 |
| --echo-map-score | Reports the scores (5th column) from overlapping map-file elements. | 1 | 2 | 5 |
| --echo-map-size | Calculates difference between start and stop coordinates (or size) of each mapped element. | 1 | 2 | 3 |
| --echo-overlap-size | Calculates size of overlap between each mapped element and its reference element. | 1 | 2 | 3 |
| --echo-ref-name | Reports the first 3 fields of ref-file element in chrom:start-end format. | 1 | 2 | 3 |
| --echo-ref-size | Reports the length of the ref-file element. | 1 | 2 | 3 |
| --faster | (Advanced) Strong input assumptions are made. Review documents before use. Compatible with --bp-ovr and --range overlap options only. | 1 | 2 | 5 |
| --fraction-ref <val> | The fraction of the element's size from ref-file that must overlap the element in map-file. Expects 0 < val <= 1. | 1 | 2 | 5 |
| --fraction-map <val> | The fraction of the element's size from map-file that must overlap the element in ref-file. Expects 0 < val <= 1. | 1 | 2 | 5 |
| --fraction-both <val> | Both --fraction-ref <val> and --fraction-map <val> must be true to qualify as overlapping. Expects 0 < val <= 1. | 1 | 2 | 5 |
| --fraction-either <val> | Either --fraction-ref <val> or --fraction-map <val> must be true to qualify as overlapping. Expects 0 < val <= 1. | 1 | 2 | 5 |
| --exact | Shorthand for --fraction-both 1. First three fields from map-file must be identical to ref-file element. | 1 | 2 | 5 |
| --indicator | Reports the presence of one or more overlapping elements in map-file as a binary value (0 or 1). | 1 | 2 | 3 |
| --kth <val> | Reports the value at the kth fraction. A generalized median-like calculation, where --kth 0.5 is the median. Expects 0 < val <= 1. | 1 | 2 | 5 |
| --mad <mult=1> | Reports the "median absolute deviation" of overlapping elements in map-file, multiplied by <mult>. | 1 | 2 | 5 |
| --max | Reports the highest score from overlapping elements in map-file. | 1 | 2 | 5 |
| --max-element | The lexicographically "smallest" element with the highest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
| --max-element-rand | A randomly chosen element with the highest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
| --mean | Reports the average score from overlapping elements in map-file. | 1 | 2 | 5 |
| --median | Reports the median score from overlapping elements in map-file. | 1 | 2 | 5 |
| --min | Reports the lowest score from overlapping elements in map-file. | 1 | 2 | 5 |
| --min-element | The lexicographically "smallest" element with the lowest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
| --min-element-rand | A randomly chosen element with the lowest score from overlapping elements in map-file. If no overlapping element exists, NAN is reported (unless --skip-unmapped is used). | 1 | 2 | 5 |
| --skip-unmapped | Omits printing reference elements which do not associate with any mapped elements. | 1 | 2 | 3 |
| --stdev | Reports the square root of the result of --variance. | 1 | 2 | 5 |
| --sum | Reports the accumulated value from scores of overlapping elements in map-file. | 1 | 2 | 5 |
| --sweep-all | Reads through entire map-file dataset to avoid early termination that may cause SIGPIPE or other I/O errors. | 1 | 2 | 3 |
| --tmean <low> <hi> | Reports the mean score from overlapping elements in map-file, after ignoring the bottom <low> and top <hi> fractions of those scores. Expects 0 <= low <= 1, 0 <= hi <= 1, low + hi <= 1. | 1 | 2 | 5 |
| --variance | Reports the variance of scores from overlapping elements in map-file. | 1 | 2 | 5 |
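
The overlap criteria and base-count options above can be illustrated with a small Python sketch. This is not how bedmap is implemented; it is a simplified model that assumes a single chromosome, half-open [start, end) coordinates, and non-empty intervals. The names `overlap_size`, `qualifies`, `bases` and `bases_uniq` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def overlap_size(a: Interval, b: Interval) -> int:
    """Bases shared by two half-open intervals (0 when they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def qualifies(ref: Interval, mapped: Interval, mode: str, val: float = 1) -> bool:
    """Apply one overlap criterion; mode is the option name without dashes."""
    ovr = overlap_size(ref, mapped)
    if ovr == 0:
        return False
    frac_ref = ovr / (ref[1] - ref[0])        # share of the ref element covered
    frac_map = ovr / (mapped[1] - mapped[0])  # share of the map element covered
    if mode == "bp-ovr":
        return ovr >= val
    if mode == "fraction-ref":
        return frac_ref >= val
    if mode == "fraction-map":
        return frac_map >= val
    if mode == "fraction-both":
        return frac_ref >= val and frac_map >= val
    if mode == "fraction-either":
        return frac_ref >= val or frac_map >= val
    if mode == "exact":
        return ref == mapped
    raise ValueError(f"unknown mode: {mode}")


def bases(ref: Interval, mapped: List[Interval]) -> int:
    """Total overlapping bases; a base covered twice is counted twice."""
    return sum(overlap_size(ref, m) for m in mapped)


def bases_uniq(ref: Interval, mapped: List[Interval]) -> int:
    """Distinct bases of ref covered by at least one mapped element."""
    clipped = sorted(
        (max(s, ref[0]), min(e, ref[1]))
        for s, e in mapped
        if overlap_size(ref, (s, e)) > 0
    )
    total, covered_to = 0, ref[0]
    for s, e in clipped:
        s = max(s, covered_to)  # skip bases already counted
        if e > s:
            total += e - s
            covered_to = e
    return total


ref = (100, 200)
mapped = [(90, 150), (140, 160), (300, 400)]
print(bases(ref, mapped))                          # 70
print(bases_uniq(ref, mapped))                     # 60
print(bases_uniq(ref, mapped) / (ref[1] - ref[0])) # 0.6, as for --bases-uniq-f
print(qualifies(ref, (140, 160), "fraction-both", 0.5))    # False
print(qualifies(ref, (140, 160), "fraction-either", 0.5))  # True
```

The last two calls show the difference between the "both" and "either" criteria: the element (140, 160) lies entirely inside the reference element, so its map fraction is 1.0, but it covers only 20% of the reference element.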

7.1.3. bedops

  • Offers set and multiset operations for files in BED format.

  • BEDOPS bedops documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| --chrom <chromosome> | Process data for given chromosome only. | 1 | No imposed limit | 3 |
| --complement, -c | Reports the intervening intervals between the input coordinate segments. | 1 | No imposed limit | 3 |
| --chop, -w | Breaks up merged regions into fixed-size chunks, optionally anchored on start coordinates a fixed distance apart. | 1 | No imposed limit | 3 |
| --difference, -d | Reports the intervals found in the first file that are not present in any other input file. | 2 | No imposed limit | 3 |
| --ec | Error-check input files (slower). | 1 | No imposed limit | 3 |
| --element-of, -e | Reports rows from the first file that overlap, by a specified percentage or number of base pairs, the merged segments from all other input files. | 2 | No imposed limit | 3 |
| --header | Accept headers (VCF, GFF, SAM, BED, WIG) in any input file. | 1 | No imposed limit | 3 |
| --intersect, -i | Reports the intervals common to all input files. | 2 | No imposed limit | 3 |
| --merge, -m | Reports intervals from all input files, after merging overlapping and adjoining segments. | 1 | No imposed limit | 3 |
| --not-element-of, -n | Reports exactly everything that --element-of does not, given the same overlap criterion. | 2 | No imposed limit | 3 |
| --partition, -p | Reports all disjoint intervals from all input files. Overlapping segments are cut up into pieces at all segment boundaries. | 1 | No imposed limit | 3 |
| --range L:R | Add L bases to all start coordinates and R bases to all end coordinates. Either value may be positive or negative to grow or shrink regions, respectively. With the -e or -n operation, the first (reference) file is not padded, unlike all other files. | 1 | No imposed limit | 3 |
| --range S | Pad input file(s) coordinates symmetrically by S bases. This is shorthand for --range -S:S. | 1 | No imposed limit | 3 |
| --symmdiff, -s | Reports the intervals found in exactly one input file. | 2 | No imposed limit | 3 |
| --everything, -u | Reports the intervals from all input files in sorted order. Duplicates are retained in the output. | 1 | No imposed limit | 3 |
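
To make the coordinate-level set operations above concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not bedops itself: every interval is on one chromosome, coordinates are half-open [start, end), each "file" is a plain list of (start, end) tuples, and the two-file operations take exactly two inputs. The function names are hypothetical and only mirror the option names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge(*files: List[Interval]) -> List[Interval]:
    """Like --merge: join overlapping and adjoining segments from all inputs."""
    out: List[Interval] = []
    for s, e in sorted(iv for f in files for iv in f):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Like --intersect for two inputs: coordinates present in both."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            out.append((s, e))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out


def difference(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Like --difference for two inputs: coordinates in a but not in b."""
    out, b = [], merge(b)
    for s, e in merge(a):
        for bs, be in b:
            if be <= s or bs >= e:
                continue  # no overlap with what remains of (s, e)
            if bs > s:
                out.append((s, bs))
            s = max(s, be)
            if s >= e:
                break
        if s < e:
            out.append((s, e))
    return out


def symmdiff(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Like --symmdiff for two inputs: coordinates found in exactly one input."""
    return merge(difference(a, b), difference(b, a))


def complement(*files: List[Interval]) -> List[Interval]:
    """Like --complement: gaps between the merged input segments."""
    m = merge(*files)
    return [(m[i][1], m[i + 1][0]) for i in range(len(m) - 1)]


def pad(iv: Interval, left: int, right: int) -> Interval:
    """Like --range L:R: add L to the start and R to the end coordinate."""
    return (iv[0] + left, iv[1] + right)


a = [(10, 20), (30, 40)]
b = [(15, 35)]
print(merge(a, b))       # [(10, 40)]
print(intersect(a, b))   # [(15, 20), (30, 35)]
print(difference(a, b))  # [(10, 15), (35, 40)]
print(symmdiff(a, b))    # [(10, 15), (20, 30), (35, 40)]
print(complement(a))     # [(20, 30)]
print(pad((10, 20), -5, 5))  # (5, 25), the same as a symmetric pad of 5
```

The sketch sorts its inputs in memory for convenience; it says nothing about how bedops reads or streams its input files.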

7.1.4. closest-features

  • For every element in input-file, find those elements in query-file nearest to its left and right edges.

  • BEDOPS closest-features documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 2 | 2 | 3 |
| --chrom <chromosome> | Process data for given <chromosome> only. | 2 | 2 | 3 |
| --dist | Output includes the signed distances between the input-file element and the closest elements in query-file. | 2 | 2 | 3 |
| --ec | Error-check all input files (slower). | 2 | 2 | 3 |
| --no-overlaps | Do not consider elements that overlap. Overlapping elements, otherwise, have highest precedence. | 2 | 2 | 3 |
| --no-ref | Do not echo elements from input-file. | 2 | 2 | 3 |
| --closest | Choose the nearest element from query-file only. Ties go to the leftmost closest element. | 2 | 2 | 3 |
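
The idea of a nearest left and right neighbour, and of the tie-break described for --closest, can be sketched in Python as follows. This is only a rough model: it assumes a single chromosome and half-open [start, end) coordinates, it ignores overlapping query elements entirely (similar in spirit to --no-overlaps), and its distance convention (the gap between the two nearest edges, negative on the left) is an assumption that may differ from what closest-features prints with --dist. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def neighbours(
    ref: Interval, queries: List[Interval]
) -> Tuple[Optional[Interval], Optional[Interval]]:
    """Nearest non-overlapping query element on each side of ref."""
    left = max(
        (q for q in queries if q[1] <= ref[0]), key=lambda q: q[1], default=None
    )
    right = min(
        (q for q in queries if q[0] >= ref[1]), key=lambda q: q[0], default=None
    )
    return left, right


def signed_distance(ref: Interval, q: Interval) -> int:
    """Gap between ref and q: negative when q lies to the left (assumed convention)."""
    if q[1] <= ref[0]:
        return q[1] - ref[0]
    return q[0] - ref[1]


def nearest(ref: Interval, queries: List[Interval]) -> Optional[Interval]:
    """Single nearest neighbour; on a tie, the left element wins."""
    left, right = neighbours(ref, queries)
    candidates = [q for q in (left, right) if q is not None]
    if not candidates:
        return None
    # min() returns the first minimal item, and the left candidate is listed first.
    return min(candidates, key=lambda q: abs(signed_distance(ref, q)))


ref = (100, 200)
queries = [(10, 50), (60, 90), (230, 260), (400, 500)]
left, right = neighbours(ref, queries)
print(left, signed_distance(ref, left))    # (60, 90) -10
print(right, signed_distance(ref, right))  # (230, 260) 30
print(nearest(ref, queries))               # (60, 90)
print(nearest(ref, [(60, 90), (210, 250)]))  # (60, 90): tie resolved to the left
```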

7.2. Sorting

7.2.1. sort-bed

  • Sorts input BED file(s) into the order required by other utilities. Loads all input data into memory.

  • BEDOPS sort-bed documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 1 | 1000 | 3 |
| --max-mem <val> | <val> specifies the maximum memory usage for the sort-bed process, which is useful for very large BED inputs. For example, --max-mem may be 8G, 8000M, or 8000000000 to specify 8 GB of memory. | 1 | 1000 | 3 |
| --unique | Report unique elements (those which only occur once) in output. | 1 | 1000 | 3 |
| --duplicates | Report duplicate elements (those which occur 2+ times) in output. | 1 | 1000 | 3 |
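
As a rough illustration of the ordering and filtering described above, the following Python sketch sorts BED-like rows by chromosome name (lexicographically), then by start and end coordinates (numerically). That ordering is stated here as an assumption about the order sort-bed produces; how ties on additional columns are broken is not modelled. The `unique` and `duplicates` helpers follow the table's wording, and printing each duplicated element only once is a further assumption. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, int, int]  # (chromosome, start, end); extra columns omitted


def sort_bed(rows: List[Row]) -> List[Row]:
    """Order rows by chromosome (as a string), then start, then end."""
    return sorted(rows, key=lambda r: (r[0], r[1], r[2]))


def unique(rows: List[Row]) -> List[Row]:
    """Sorted rows that occur exactly once in the input."""
    counts = Counter(rows)
    return [r for r in sort_bed(rows) if counts[r] == 1]


def duplicates(rows: List[Row]) -> List[Row]:
    """Sorted rows that occur two or more times, each listed once (assumption)."""
    counts = Counter(rows)
    seen, out = set(), []
    for r in sort_bed(rows):
        if counts[r] >= 2 and r not in seen:
            seen.add(r)
            out.append(r)
    return out


rows = [("chr2", 10, 20), ("chr10", 5, 8), ("chr2", 10, 15), ("chr10", 5, 8)]
print(sort_bed(rows))
# [('chr10', 5, 8), ('chr10', 5, 8), ('chr2', 10, 15), ('chr2', 10, 20)]
print(unique(rows))      # [('chr2', 10, 15), ('chr2', 10, 20)]
print(duplicates(rows))  # [('chr10', 5, 8)]
```

Note that a string comparison places chr10 before chr2, which differs from a "natural" numeric chromosome order.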

7.3. Compression and extraction

7.3.1. starch

  • Lossless compression of any BED file.

  • BEDOPS starch documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 1 | 1 | 3 |
| --bzip2 or --gzip | The internal compression method. The default --bzip2 method favors storage efficiency, while --gzip favors compression and extraction time performance. | 1 | 1 | 3 |
| --note="foo bar..." | Append note to output archive metadata (optional). | 1 | 1 | 3 |
| --report-progress=N | Write progress to standard error stream for every N input elements. | 1 | 1 | 3 |

7.3.2. unstarch

  • Extraction of a starch archive or attributes.

  • BEDOPS unstarch documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 1 | 1 | NA |
| --archive-type | Show archive's compression type (either bzip2 or gzip). | 1 | 1 | NA |
| --archive-version | Show archive version (at this time, either 1.x or 2.x). | 1 | 1 | NA |
| --archive-timestamp | Show archive creation timestamp (ISO 8601 format). | 1 | 1 | NA |
| --bases <chromosome> | Show total, non-unique base counts for optional <chromosome> (omitting <chromosome> shows total non-unique base count). | 1 | 1 | NA |
| --bases-uniq <chromosome> | Show unique base counts for optional <chromosome> (omitting <chromosome> shows total, unique base count). | 1 | 1 | NA |
| <chromosome> | Decompress information for a single <chromosome> only. | 1 | 1 | NA |
| --duplicatesExist or --duplicatesExistAsString with <chromosome> | Report if optional <chromosome> or chromosomes contain duplicate elements, as 0/1 numbers or false/true strings. | 1 | 1 | NA |
| --elements <chromosome> | Show element count for optional <chromosome> (omitting <chromosome> shows total element count). | 1 | 1 | NA |
| --elements-max-string-length | Show element maximum string length for optional <chromosome> (omitting <chromosome> shows maximum string length over all chromosomes). | 1 | 1 | NA |
| --is-starch | Test if the <starch-file> is a valid starch archive, returning 0/1 for a false/true result. | 1 | 1 | NA |
| --list or --list-json | Print the metadata for a starch file, either in tabular form or with JSON formatting. | 1 | 1 | NA |
| --list-chr or --list-chromosomes | List all chromosomes in starch archive (similar to bedextract --list-chr). | 1 | 1 | NA |
| --nestedsExist or --nestedsExistAsString with <chromosome> | Report if optional <chromosome> or chromosomes contain nested elements, as 0/1 numbers or false/true strings. | 1 | 1 | NA |
| --note | Show descriptive note (if originally added to archive). | 1 | 1 | NA |
| --signature with <chromosome> | Show SHA-1 signature of specified chromosome (Base64-encoded) or all signatures if chromosome is not specified. | 1 | 1 | NA |
| --verify-signature with <chromosome> | Compare SHA-1 signature of specified chromosome with the signature stored in the archive metadata, reporting an error if they do not match. | 1 | 1 | NA |

7.3.3. starchcat

  • Merge multiple starch archive inputs into one starch archive output.

  • BEDOPS starchcat documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 1 | No imposed limit | NA |
| --bzip2 or --gzip | The internal compression method. The default --bzip2 method favors storage efficiency, while --gzip favors compression and extraction time performance. | 1 | No imposed limit | NA |
| --note="foo bar..." | Append note to output archive metadata (optional). | 1 | No imposed limit | NA |
| --report-progress=N | Write progress to standard error stream for every N input elements. | 1 | No imposed limit | NA |

7.3.4. starchstrip

  • Extract or filter a starch archive by one or more specified chromosome names.

  • BEDOPS starchstrip documentation.

| option | description | min. file inputs | max. file inputs | min. BED columns |
|---|---|---|---|---|
| (no option) | NA | 1 | No imposed limit | NA |
| --include or --exclude with <chromosomes> | Writes output with inclusion or exclusion of specified chromosome name records (comma-delimited string). | NA | No imposed limit | NA |