SeqAn3 3.3.0
The Modern C++ library for sequence analysis.
Loading...
Searching...
No Matches
Writing SAM files

Construction and specialisation

The seqan3::sam_file_output class comes with two constructors, one for construction from a file name and one for construction from an existing stream and a known format. The first one automatically picks the format based on the extension of the file name. The second can be used if you have a non-file stream, like std::cout or std::ostringstream, that you want to read from and/or if you cannot use file-extension based detection, but know that your output file has a certain format.

In most cases the template parameters are deduced completely automatically:

#include <filesystem>
int main()
{
auto tmp_file = std::filesystem::temp_directory_path() / "my.sam";
seqan3::sam_file_output fout{tmp_file}; // SAM format detected, std::ofstream opened for file
}
A class for writing SAM files, both SAM and its binary representation BAM are supported.
Definition io/sam_file/output.hpp:74
Provides seqan3::sam_file_output and corresponding traits classes.
T remove(T... args)
T temp_directory_path(T... args)

Writing to std::cout:

#include <iostream>
int main()
{
}
The SAM format (tag).
Definition format_sam.hpp:108

Note that this is not the same as writing sam_file_output<> (with angle brackets). In the latter case they are explicitly set to their default values, in the former case automatic deduction happens which chooses different parameters depending on the constructor arguments. For opening from file, sam_file_output<> would have also worked, but for opening from stream it would not have.

Writing record-wise

#include <sstream>
#include <string>
#include <vector>
int main()
{
std::string read_id;
// ... e.g. compute and alignment
using alignment_type =
alignment_type dummy_alignment{}; // an empty dummy alignment
// the record type specifies the fields we want to write
// initialize record
record_type rec{read, ref_id, dummy_alignment};
// Write the record
fout.push_back(rec);
// same as
fout.push_back(record_type{read, ref_id, dummy_alignment});
// as all our fields are empty so this would print an
}
Provides seqan3::dna5, container aliases and string literals.
@ ref_id
The identifier of the (reference) sequence that seqan3::field::seq was aligned to.
A class template that holds a choice of seqan3::field.
Definition record.hpp:128
The class template that file records are based on; behaves like a std::tuple.
Definition record.hpp:193
Type that contains multiple types.
Definition type_list.hpp:29
Provides seqan3::type_list.

The easiest way to write to a SAM/BAM file is to use the push_back() member functions. These work similarly to how they work on a std::vector. You may also use a tuple like interface or the emplace_back() function but this is not recommended since one would have to keep track of the correct order of many fields (14 in total). For the record based interface using push_back() please also see the seqan3::record documentation on how to specify a record with the correct field and type lists.

You may also use the output file's iterator for writing, however, this rarely provides an advantage.

Writing record-wise (custom fields)

If you want to omit non-required parameter or change the order of the parameters, you can pass a non-empty fields trait object to the seqan3::sam_file_output constructor to select the fields that are used for interpreting the arguments.

The following snippet demonstrates the usage of such a field_traits object.

#include <filesystem>
#include <sstream>
#include <tuple>
int main()
{
// I only want to print the mapping position (field::ref_offset) and flag:
unsigned mapping_pos{1300};
// ...
fout.emplace_back(mapping_pos, flag); // note that the order the arguments is now different, because
// or: you specified that REF_OFFSET should be first
fout.push_back(std::tie(mapping_pos, flag));
}
sam_flag
An enum flag that describes the properties of an aligned read (given as a SAM record).
Definition sam_flag.hpp:76
@ none
None of the flags below are set.
@ flag
The alignment flag (bit information), uint16_t value.
T tie(T... args)

A different way of passing custom fields to the file is to pass a seqan3::record – instead of a tuple – to push_back(). The seqan3::record clearly indicates which of its elements has which seqan3::field so the file will use that information instead of the template argument. This is especially handy when reading from one file and writing to another, because you don't have to configure the output file to match the input file, it will just work:

#include <filesystem>
#include <sstream>
auto sam_file_raw = R"(@HD VN:1.6 SO:coordinate GO:none
@SQ SN:ref LN:45
r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
r003 0 ref 29 30 5S6M * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
r003 2064 ref 29 17 6H5M * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
r001 147 ref 237 30 9M = 7 -39 CAGCGGCAT * NM:i:1
)";
int main()
{
// fin uses custom fields, fout uses the default fields.
// output doesn't have to match the configuration of the input
for (auto & r : fin)
fout.push_back(r); // copy all the records.
}
A class for reading SAM files, both SAM and its binary representation BAM are supported.
Definition sam_file/input.hpp:242
Meta-header for the IO / SAM File submodule .

This will copy the seqan3::field::flag and seqan3::field::ref_offset value into the new output file.

Note
Note that the other SAM columns in the output file will have a default value, so unless you specify to read all SAM columns (see seqan3::format_sam) the output file will not be equal to the input file.

Writing record-wise in batches

You can write multiple records at once, by assigning to the file:

#include <sstream>
#include <string>
#include <tuple>
#include <vector>
int main()
{
using namespace seqan3::literals;
{"NATA"_dna5, "2nd"},
{"GATA"_dna5, "Third"}}; // a range of "records"
fout = range; // will iterate over the records and write them
// equivalent to:
range | fout;
}
The SeqAn namespace for literals.

File I/O pipelines

Record-wise writing in batches also works for writing from input files directly to output files, because input files are also input ranges in SeqAn:

#include <sstream>
auto sam_file_raw = R"(First 0 * 0 0 * * 0 0 ACGT *
2nd 0 * 0 0 * * 0 0 NATA *
Third 0 * 0 0 * * 0 0 GATA *
)";
int main()
{
// copying a file in one line:
// with seqan3::sam_file_output as a variable:
fout = fin;
// or in pipe notation:
}

This can be combined with file-based views to create I/O pipelines:

#include <ranges>
#include <sstream>
auto sam_file_raw = R"(@HD VN:1.6 SO:coordinate GO:none
@SQ SN:ref LN:45
r001 99 ref 7 30 * = 37 39 TTAGATAAAGGATACTG *
r003 0 ref 29 30 * * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
r003 2064 ref 29 17 * * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
r001 147 ref 237 30 * = 7 -39 CAGCGGCAT * NM:i:1
)";
int main()
{
auto input_file = seqan3::sam_file_input{std::istringstream{sam_file_raw}, seqan3::format_sam{}};
input_file | std::views::take(3) // take only the first 3 records
}

Formats

We currently support writing the following formats: