Alignments link query strings, such as other genomes or reads, to Paths.
The sequence that has been aligned.
The Path that the sequence follows in the graph it has been aligned to, containing the `Edit`s that modify the graph to produce the sequence.
The name of the sequence that has been aligned. Similar to read name in BAM.
The quality scores for the sequence, as values on a 0-255 scale.
The mapping quality score for the alignment, in Phreds.
The score for the alignment, in points.
The offset in the query at which this Alignment occurs.
The name of the sample that produced the aligned read.
The name of the read group to which the aligned read belongs.
The previous Alignment in the fragment. Contains just enough information to locate the full Alignment; e.g. contains an Alignment with only a name, or only a graph mapping position.
Similarly, the next Alignment in the fragment.
Flag marking the Alignment as secondary. All but one maximal-scoring alignment of a given read in a GAM file must be secondary.
Portion of aligned bases that are perfect matches, or 0 if no bases are aligned.
An estimate of the length of the fragment, if this Alignment is paired.
The loci that this alignment supports. TODO: get rid of this, we have annotations in our data model again.
Position of the alignment in reference paths embedded in the graph.
SAMTools-style flags
The fraction of bases in the alignment that are covered by MEMs with <=1 total hits in the graph
Correctness metric: 1 = perfectly aligned to truth, 0 = not overlapping the true alignment.
The ordered list of scores of secondary mappings
Score under the given fragment model, assume higher is better
The fragment length distribution under which a paired-end alignment was aligned.
True if this alignment's score is adjusted for haplotype consistency, and false otherwise.
Actual log probability haplotype consistency likelihood
The time this alignment took
A path/offset/orientation pair specifying the distance to the correct alignment
This can be set to true to annotate the Alignment as having been mapped correctly.
Annotations carried along with the Alignment.
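The rule on secondary alignments stated above (all but one maximal-scoring alignment of a given read must be secondary) can be checked mechanically. The following is a minimal sketch, not part of the schema: it assumes each record has already been reduced to a `(name, score, is_secondary)` tuple, that one name corresponds to one read, and the function name is hypothetical.

```python
from collections import defaultdict

def reads_violating_secondary_rule(alignments):
    """Return the sorted names of reads that break the secondary-flag rule.

    `alignments` is an iterable of (name, score, is_secondary) tuples.
    This tuple layout is an assumption made for illustration only.
    """
    by_read = defaultdict(list)
    for name, score, is_secondary in alignments:
        by_read[name].append((score, is_secondary))

    bad = []
    for name, records in by_read.items():
        best = max(score for score, _ in records)
        primaries = [score for score, secondary in records if not secondary]
        # Exactly one alignment may be non-secondary, and it must be
        # one of the maximal-scoring alignments for that read.
        if len(primaries) != 1 or primaries[0] != best:
            bad.append(name)
    return sorted(bad)
```

Paired-end data, where both mates may share a name, would need the grouping key extended (for example with the fragment links); that is left out of this sketch.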
Summarizes reads that map to a single position in the graph. This structure is pretty much identical to a line in Samtools pileup format. If qualities are set, they must have size = num_bases.
*Edges* describe linkages between nodes. They are bidirected, connecting the end (default) or start of the "from" node to the start (default) or end of the "to" node.
ID of upstream node.
ID of downstream node.
If the edge leaves from the 5' (start) of a node.
If the edge goes to the 3' (end) of a node.
Length of overlap between the connected `Node`s.
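To make the bidirected convention concrete, the sketch below resolves which side of each node an edge attaches to, following the defaults described above (end of the "from" node, start of the "to" node). The class and its field names (`from_id`, `to_id`, `from_start`, `to_end`) are assumptions chosen to mirror the field descriptions, not the generated protobuf API.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    # Hypothetical stand-in for the Edge message; names are assumptions.
    from_id: int              # ID of upstream node
    to_id: int                # ID of downstream node
    from_start: bool = False  # True if the edge leaves from the 5' (start) of the "from" node
    to_end: bool = False      # True if the edge goes to the 3' (end) of the "to" node

def attached_sides(edge):
    """Return ((node_id, side), (node_id, side)) for the two node ends joined."""
    from_side = "start" if edge.from_start else "end"
    to_side = "end" if edge.to_end else "start"
    return (edge.from_id, from_side), (edge.to_id, to_side)
```

With both flags left at their defaults, an edge from node 1 to node 2 joins the end of node 1 to the start of node 2; setting both flags flips both attachment sides.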
Keep pileup-like record for reads that span edges
total reads mapped
number of reads mapped on forward strand
Edits describe how to generate a new string from elements in the graph. To determine the new string, just walk the series of edits, stepping from_length distance in the basis node, and to_length in the novel element, replacing from_length in the basis node with the sequence. There are several types of Edit:
- *matches*: from_length == to_length; sequence is empty
- *snps*: from_length == to_length; sequence = alt
- *deletions*: to_length == 0 && from_length > to_length; sequence is empty
- *insertions*: from_length < to_length; sequence = alt
Length in the target/ref sequence that is removed.
Length in read/alt of the sequence it is replaced with.
The replacement sequence, if different from the original sequence.
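The walk over a series of edits described above can be sketched as follows. This is an illustrative stand-in, not the generated protobuf class: the `Edit` dataclass simply reuses the field names from the description, while `classify` and `apply_edits` are hypothetical helpers that assume all edits apply to a single node sequence in forward orientation.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    from_length: int    # length consumed in the basis (target/ref) node
    to_length: int      # length produced in the novel (read/alt) string
    sequence: str = ""  # replacement sequence; empty for matches and deletions

def classify(edit):
    """Name the kind of edit using the rules listed in the description."""
    if edit.from_length == edit.to_length:
        return "match" if not edit.sequence else "snp"
    if edit.to_length == 0 and edit.from_length > edit.to_length:
        return "deletion"
    if edit.from_length < edit.to_length:
        return "insertion"
    return "other"

def apply_edits(node_sequence, edits, offset=0):
    """Walk the edits over node_sequence and build the novel string."""
    out = []
    pos = offset
    for edit in edits:
        if edit.sequence:
            # SNPs and insertions carry their own replacement sequence.
            out.append(edit.sequence)
        else:
            # Matches copy to_length bases; deletions copy nothing.
            out.append(node_sequence[pos:pos + edit.to_length])
        pos += edit.from_length  # step from_length in the basis node
    return "".join(out)
```

For example, walking a 2-base match, a 1-base SNP to `T`, a 1-base deletion, a 2-base insertion of `GG`, and a final 2-base match over the node sequence `ACGTAC` yields `ACTGGAC`.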
Describes a genotype at a particular locus.
These refer to the offsets of the alleles in the Locus object.
Natural log of the likelihood.
Natural log of the prior.
Natural log of the posterior (unnormalized).
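Because the posterior is stored unnormalized and in natural-log space, comparing genotypes at a locus usually means normalizing across all candidates first. The sketch below shows one common way to do that, the log-sum-exp trick; it assumes the log posteriors of the candidate genotypes have been collected into a non-empty list of finite values, and the function name is hypothetical.

```python
import math

def normalize_log_posteriors(log_posteriors):
    """Return natural-log posteriors rescaled so their exponentials sum to 1."""
    m = max(log_posteriors)
    # Subtracting the maximum before exponentiating avoids underflow
    # when all values are very negative.
    log_total = m + math.log(sum(math.exp(x - m) for x in log_posteriors))
    return [x - log_total for x in log_posteriors]
```

The ordering of the genotypes is unchanged by this step, so the first entry of a list sorted by posterior stays first.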
*Graphs* are collections of nodes and edges. They can represent subgraphs of larger graphs or be wholly self-sufficient. Protobuf's message size limit of 67108864 bytes (64 MiB) means we typically keep graphs small, generating large graphs as collections of smaller subgraphs.
The `Node`s that make up the graph.
The `Edge`s that connect the `Node`s in the graph.
A set of named `Path`s that visit sequences of oriented `Node`s.
Used to serialize kmer matches.
If true, this kmer is backwards relative to its node, and position counts from the end of the node.
Support pinned to a location, which can be either a node or an edge
The support
The location
Describes a genetic locus with multiple possible alleles, a genotype, and observational support.
A locus may have an identifying name.
These are all the alleles at the locus, not just the called ones. Note that a primary reference allele may or may not appear.
These supports are per-allele, matching the alleles above
Sorted by likelihood or posterior; the first one is the "call".
We also have a Support for the locus overall, because reads may have supported multiple alleles and we want to know how many total there were.
We track the likelihood of each allele individually, in addition to genotype likelihoods. Stores the natural log of the likelihood.
A Mapping defines the relationship between a node in the system and another entity. An empty edit list implies a complete match; however, it is preferred to specify the full edit structure, as it is more complex to handle special cases.
The position at which the first Edit, if any, in the Mapping starts. Inclusive.
The series of `Edit`s that transform the node sequence into the corresponding region of the read/alt.
The 1-based rank of the mapping in its containing path.
A subgraph of the unrolled Graph in which each non-branching path is associated with an alignment of part of the read and part of the graph, such that any path through the MultipathAlignment indicates a valid alignment of a read to the graph.
Non-branching paths of the multipath alignment, each containing an alignment of part of the sequence to a Graph. IMPORTANT: downstream applications will assume these are stored in topological order.
-10 * log_10(probability of mismapping)
optional: indices of Subpaths that align the beginning of the read (i.e. source nodes)
Annotations carried along with the Alignment.
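The mapping quality defined above as -10 * log_10(probability of mismapping) is a Phred-scaled value. As a small worked illustration, the two helpers below convert in both directions; the function names are hypothetical, and no cap or rounding is applied, since the description does not specify any.

```python
import math

def phred_from_probability(p_mismapped):
    """Phred-scale an error probability: -10 * log10(p)."""
    if not 0.0 < p_mismapped <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p_mismapped)

def probability_from_phred(quality):
    """Invert the Phred scale: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)
```

Under this scale a mismapping probability of 0.001 corresponds to a quality of 30, and a quality of 20 corresponds to a 1 in 100 chance of mismapping.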
*Nodes* store sequence data.
Sequence of DNA bases represented by the Node.
A name provides an identifier.
Each Node has a unique positive nonzero ID within its Graph.
Collect pileup records by node. Saves some space and hashing over storing individually, assuming the data are not too sparse and the average node length is more than a couple of bases. The ith BasePileup in the array corresponds to the position at offset i.
Paths are walks through nodes defined by a series of `Edit`s. They can be used to represent:
- haplotypes
- mappings of reads, or alignments, by including edits
- relationships between nodes
- annotations from other data sources, such as: genes, exons, motifs, transcripts, peaks
The name of the path. Path names starting with underscore (_) are reserved for internal VG use.
The `Mapping`s which describe the order and orientation in which the Path visits `Node`s.
Set to true if the path is circular.
Optional length annotation for the Path.
Bundle up Node and Edge pileups
The Node on which the Position is.
The offset into that node's sequence at which the Position occurs.
True if we obtain the original sequence of the path by reverse complementing the mappings.
If the position is used to represent a position against a reference path
Describes a subgraph that is connected to the rest of the graph by two nodes.
What type of snarl is this?
Visits that connect the Snarl to the rest of the graph
points *INTO* the snarl
points *OUT OF* the snarl
If this Snarl is nested in another, this field should be filled in with a Snarl that has the start and end visits filled in (other information is optional/extraneous)
Allows snarls to be named, e.g. by the hash of the VCF variant they come from.
Indicate whether there is a reversing path contained in the Snarl from either the start to itself or the end to itself
Indicate whether the start of the Snarl is connected through to the end.
Indicate whether the snarl's net graph is free of directed cycles
Describes a walk through a Snarl where each step is given as either a node or a child Snarl (leaving the walk through the child Snarl to another SnarlTraversal)
Steps of the walk through a Snarl, including the start and end nodes. If the traversal includes a Visit that represents a Snarl, both the node entering the Snarl and the node leaving the Snarl should be included in the traversal.
The name of the traversal can be used for a variant allele ID (e.g. <parentSnarlHash>_0, <parentSnarlHash>_1, ...) or for some other arbitrary annotation, unique or non-unique (e.g. deleterious, gain_of_function, etc.), though these will be lost in any indices.
Enumeration of the classifications of snarls
A non-branching path of a MultipathAlignment
describes node sequence and edits to the graph sequences
the indices of subpaths in the multipath alignment that are to the right of this path, where right is in the direction of the end of the read sequence
score of this subpath's alignment
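Since downstream applications assume subpaths are stored in topological order, a writer may want to verify that before serializing. The sketch below assumes each subpath has been reduced to its list of right-hand neighbor indices (the list-of-lists layout and both function names are hypothetical). A storage order is topological exactly when every link points to a later index.

```python
def is_topologically_ordered(next_lists):
    """True if every subpath only links to subpaths stored after it.

    next_lists[i] holds the indices of the subpaths to the right of subpath i.
    """
    return all(j > i
               for i, successors in enumerate(next_lists)
               for j in successors)

def source_indices(next_lists):
    """Indices of subpaths that no other subpath links to (the source nodes)."""
    targets = {j for successors in next_lists for j in successors}
    return [i for i in range(len(next_lists)) if i not in targets]
```

The second helper recovers the subpaths that align the beginning of the read when the optional list of start indices has not been filled in.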
Aggregates information about the reads supporting an allele.
The overall quality of all the support, as -10 * log10(P(all support is wrong))
The number of supporting reads on the forward strand (which may be fractional)
The number of supporting reads on the reverse strand (which may be fractional)
TODO: what is this?
TODO: What is this?
Translations map from one graph to another. A collection of these provides a covering mapping between a from and to graph. If each "from" path through the base graph corresponds to a "to" path in an updated graph, then we can use these translations to project positions, mappings, and paths in the new graph into the old one using the Translator interface.
Describes a step of a walk through a Snarl either on a node or through a child Snarl
The node ID or snarl of this step (only one should be given)
only needs to contain the start and end Visits
Indicates: if node_id is specified, reverse complement of the node; if feature_id is specified, traversal of a child snarl entering backwards through end and leaving backwards through start.