package nucleus.genomics.v1

message BedGraphRecord

bedgraph.proto:36

Represents one line of a BedGraph file. See https://genome.ucsc.edu/goldenPath/help/bedgraph.html for details on the format.
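
As a rough illustration of what "one line of a BedGraph file" carries, the sketch below parses a four-column data line (chromosome, start, end, value) as described on the UCSC page. The `BedGraphLine` class and its field names are hypothetical stand-ins chosen for this example; they are not taken from bedgraph.proto.

```python
from dataclasses import dataclass


@dataclass
class BedGraphLine:
    """Hypothetical stand-in for BedGraphRecord; field names are assumptions."""
    reference_name: str
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    data_value: float


def parse_bedgraph_line(line: str) -> BedGraphLine:
    """Parse one four-column BedGraph data line (whitespace separated)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    chrom, start, end, value = fields
    return BedGraphLine(chrom, int(start), int(end), float(value))


record = parse_bedgraph_line("chr19 49302000 49302300 -1.0")
```

Track definition lines and comments are not handled here; a real reader would need to skip or interpret them.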

message BedHeader

bed.proto:89

message BedReaderOptions

bed.proto:99

Options for reading BED files.

message BedRecord

bed.proto:35

This message represents a single BED record. See https://genome.ucsc.edu/FAQ/FAQformat.html#format1 for details.

enum BedRecord.Strand

bed.proto:53

Used in: BedRecord

message BedWriterOptions

bed.proto:110

Options for writing BED files. Currently this is a placeholder message.

(message has no fields)

message CigarUnit

cigar.proto:34

A single CIGAR operation.

Used in: LinearAlignment

enum CigarUnit.Operation

cigar.proto:37

Describes the different types of CIGAR alignment operations that exist. Used wherever CIGAR alignments are used.

Used in: CigarUnit

message ContigInfo

reference.proto:45

This record type records information about a contig. This is used both in VCF header parsing and by GenomeReference objects for querying references. Due to its generality, this message is also used by the FastaReader to provide detailed information on the description line of a FASTA record even in cases where the record does not correspond to a reference genome contig.

Used in: FastaRecord, SamHeader, VcfHeader

message FastaReaderOptions

fasta.proto:66

enum FastaReaderOptions.DeflineParsing

fasta.proto:74

Used in: FastaReaderOptions

message FastaRecord

fasta.proto:38

This message represents a single FASTA record. This can be any FASTA file, representing DNA, RNA, protein, or other sequence.

message FastaWriterOptions

fasta.proto:92

Options for writing FASTA files. Currently this is a placeholder message but could be used to support different choices on output like the number of columns per line.

(message has no fields)

message FastqReaderOptions

fastq.proto:56

message FastqRecord

fastq.proto:34

This message represents a single FASTQ record.
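
To make the shape of a FASTQ record concrete, here is a minimal sketch that splits four-line records into an identifier, description, sequence, and quality string, and decodes Phred+33 qualities. `FastqEntry` and its field names are hypothetical and only approximate what FastqRecord might hold; unwrapped (single-line) sequences are assumed.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FastqEntry:
    """Hypothetical stand-in for FastqRecord; field names are assumptions."""
    id: str
    description: str
    sequence: str
    quality: str


def parse_fastq(lines: List[str]) -> Iterator[FastqEntry]:
    """Yield one entry per four-line FASTQ record (no line wrapping assumed)."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, sequence, plus, quality = (s.rstrip("\n") for s in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # The identifier is the text up to the first space; the rest is free-form.
        name, _, description = header[1:].partition(" ")
        yield FastqEntry(name, description, sequence, quality)


def phred33(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


entries = list(parse_fastq(["@read1 sample A", "GATTACA", "+", "IIIIII#"]))
```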

message FastqWriterOptions

fastq.proto:68

Options for writing FASTQ files. Currently this is a placeholder message but could be used to support different choices on output like whether the pad line should include the header or not.

(message has no fields)

message GffHeader

gff.proto:118

A message encoding the directives contained in a GFF3 file header. Consult the file format reference for detailed descriptions of these directives. Note that we do NOT handle the FASTA directive (a rarely used method of bundling reference sequences within a GFF file).

message GffHeader.GenomeBuildDirective

gff.proto:145

Used in: GffHeader

message GffHeader.OntologyDirective

gff.proto:130

An OntologyDirective holds the URI to a sequence ontology database, reflecting the ontology over the entities in the `type`, `source`, and `attributes` fields of a GffRecord.

Used in: GffHeader

message GffReaderOptions

gff.proto:152

(message has no fields)

message GffRecord

gff.proto:40

This message represents a single GFF3 record. See https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md for details on the file format; that document is quoted below. TODO: deal with %-encoding.
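
The sketch below shows one way to read a single nine-column GFF3 feature line, including the %-decoding that the TODO above mentions. It assumes the 1-based, closed GFF3 coordinates are converted to a 0-based half-open interval; `GffLine` and its field names are hypothetical and not taken from gff.proto.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class GffLine(NamedTuple):
    """Hypothetical stand-in for GffRecord; field names are assumptions."""
    seqid: str
    source: str
    feature_type: str
    start: int               # converted to 0-based, inclusive
    end: int                 # 0-based, exclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> GffLine:
    """Parse one tab-separated GFF3 feature line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes: Dict[str, str] = {}
    if attrs != ".":
        # Split on ';' first: literal semicolons inside values are %-encoded.
        for pair in attrs.split(";"):
            if pair:
                key, _, value = pair.partition("=")
                attributes[unquote(key)] = unquote(value)
    return GffLine(
        unquote(seqid), unquote(source), unquote(ftype),
        int(start) - 1, int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )


gff = parse_gff3_line("ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mrna0001;Note=a%3Bb")
```

`urllib.parse.unquote` is used rather than `unquote_plus` so that a literal `+` is left untouched.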

enum GffRecord.Strand

gff.proto:78

TODO: factor this out (here and BED, at least)

Used in: GffRecord

message GffWriterOptions

gff.proto:155

(message has no fields)

message LinearAlignment

reads.proto:40

A linear alignment can be represented by one CIGAR string. Describes the mapped position and local alignment of the read to the reference.

Used in: Read

message ListValue

struct.proto:91

`ListValue` is a wrapper around a repeated field of values. The JSON representation for `ListValue` is JSON array.

Used in: Read, Value, Variant, VariantCall

enum NullValue

struct.proto:83

`NullValue` is a singleton enumeration to represent the null value for the `Value` type union. The JSON representation for `NullValue` is JSON `null`.

Used in: Value

message Position

position.proto:38

An abstraction for referring to a genomic position, in relation to some already known reference. For now, represents a genomic position as a reference name, a base number on that reference (0-based), and a determination of forward or reverse strand.

Used in: learning.genomics.deepvariant.AlleleCount, LinearAlignment, Read

message Program

reads.proto:332

A Program is used in the SAM header to track how alignment data is generated. It is logically a sub-message of SamHeader, but is declared at the same scope to reduce verbosity.

Used in: SamHeader

message Range

range.proto:34

A 0-based half-open genomic coordinate range for search requests.

Used in: learning.genomics.deepvariant.CandidateHaplotypes, FastaRecord, GffHeader, GffRecord, ReferenceSequence
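
The 0-based half-open convention is easy to get wrong at the boundaries, so a small sketch may help. `HalfOpenRange` and its field names are hypothetical illustrations of the convention, not the generated Range class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpenRange:
    """Hypothetical stand-in for Range; field names are assumptions."""
    reference_name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    def length(self) -> int:
        # With a half-open interval the length is simply end - start.
        return self.end - self.start

    def overlaps(self, other: "HalfOpenRange") -> bool:
        # Adjacent ranges such as [0, 10) and [10, 20) share no base.
        return (self.reference_name == other.reference_name
                and self.start < other.end
                and other.start < self.end)


def from_one_based_closed(reference_name: str, start: int, end: int) -> HalfOpenRange:
    """Convert 1-based closed coordinates (as in VCF or GFF text) to half-open."""
    return HalfOpenRange(reference_name, start - 1, end)


first = HalfOpenRange("chr1", 0, 10)
second = HalfOpenRange("chr1", 10, 20)
```

One consequence of this design is that a range and its immediate neighbor never overlap, and the first ten bases written as 1..10 in a 1-based closed format map to `[0, 10)`.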

message Read

reads.proto:140

A read alignment describes a linear alignment of a string of DNA to a [reference sequence][learning.genomics.v1.Reference], in addition to metadata about the fragment (the molecule of DNA sequenced) and the read (the bases which were read by the sequencer). A read is equivalent to a line in a SAM file. A read belongs to exactly one read group and exactly one [read group set][learning.genomics.v1.ReadGroupSet].

For more genomics resource definitions, see [Fundamentals of Google Genomics](https://cloud.google.com/genomics/fundamentals-of-google-genomics).

### Reverse-stranded reads

Mapped reads (reads having a non-null `alignment`) can be aligned to either the forward or the reverse strand of their associated reference. Strandedness of a mapped read is encoded by `alignment.position.reverseStrand`.

If we consider the reference to be a forward-stranded coordinate space of `[0, reference.length)` with `0` as the left-most position and `reference.length` as the right-most position, reads are always aligned left to right. That is, `alignment.position.position` always refers to the left-most reference coordinate and `alignment.cigar` describes the alignment of this read to the reference from left to right. All per-base fields such as `alignedSequence` and `alignedQuality` share this same left-to-right orientation; this is true of reads which are aligned to either strand. For reverse-stranded reads, this means that `alignedSequence` is the reverse complement of the bases that were originally reported by the sequencing machine.

### Generating a reference-aligned sequence string

When interacting with mapped reads, it's often useful to produce a string representing the local alignment of the read to reference. The following pseudocode demonstrates one way of doing this:

    out = ""
    offset = 0
    for c in read.alignment.cigar {
      switch c.operation {
      case "ALIGNMENT_MATCH", "SEQUENCE_MATCH", "SEQUENCE_MISMATCH":
        out += read.alignedSequence[offset:offset+c.operationLength]
        offset += c.operationLength
        break
      case "CLIP_SOFT", "INSERT":
        offset += c.operationLength
        break
      case "PAD":
        out += repeat("*", c.operationLength)
        break
      case "DELETE":
        out += repeat("-", c.operationLength)
        break
      case "SKIP":
        out += repeat(" ", c.operationLength)
        break
      case "CLIP_HARD":
        break
      }
    }
    return out

### Converting to SAM's CIGAR string

The following pseudocode generates a SAM CIGAR string from the `cigar` field. Note that this is a lossy conversion (`cigar.referenceSequence` is lost).

    cigarMap = {
      "ALIGNMENT_MATCH": "M",
      "INSERT": "I",
      "DELETE": "D",
      "SKIP": "N",
      "CLIP_SOFT": "S",
      "CLIP_HARD": "H",
      "PAD": "P",
      "SEQUENCE_MATCH": "=",
      "SEQUENCE_MISMATCH": "X",
    }
    cigarStr = ""
    for c in read.alignment.cigar {
      cigarStr += c.operationLength + cigarMap[c.operation]
    }
    return cigarStr

(== resource_for v1.reads ==)
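
For readers who want something executable, the two pseudocode routines in the Read description can be sketched in Python as follows. This is only an illustration: each CIGAR unit is modeled as a plain `(operation name, length)` tuple rather than a CigarUnit message, and the function names are made up for this example.

```python
from typing import List, Tuple

# Each CIGAR unit is modeled as (operation name, operation length).
CigarUnits = List[Tuple[str, int]]

# Operation name to SAM CIGAR character, as listed in the pseudocode.
CIGAR_CHARS = {
    "ALIGNMENT_MATCH": "M", "INSERT": "I", "DELETE": "D", "SKIP": "N",
    "CLIP_SOFT": "S", "CLIP_HARD": "H", "PAD": "P",
    "SEQUENCE_MATCH": "=", "SEQUENCE_MISMATCH": "X",
}


def aligned_string(aligned_sequence: str, cigar: CigarUnits) -> str:
    """Build a string showing the read laid out against the reference."""
    out = []
    offset = 0  # index into aligned_sequence
    for operation, length in cigar:
        if operation in ("ALIGNMENT_MATCH", "SEQUENCE_MATCH", "SEQUENCE_MISMATCH"):
            out.append(aligned_sequence[offset:offset + length])
            offset += length
        elif operation in ("CLIP_SOFT", "INSERT"):
            offset += length          # bases consumed but not shown
        elif operation == "PAD":
            out.append("*" * length)
        elif operation == "DELETE":
            out.append("-" * length)  # reference bases missing from the read
        elif operation == "SKIP":
            out.append(" " * length)
        elif operation == "CLIP_HARD":
            pass                      # bases are absent from aligned_sequence
        else:
            raise ValueError(f"unknown CIGAR operation: {operation}")
    return "".join(out)


def sam_cigar_string(cigar: CigarUnits) -> str:
    """Render CIGAR units in SAM text form, e.g. '2S3M'."""
    return "".join(f"{length}{CIGAR_CHARS[op]}" for op, length in cigar)


example_cigar = [("CLIP_SOFT", 2), ("ALIGNMENT_MATCH", 3), ("DELETE", 2),
                 ("INSERT", 1), ("ALIGNMENT_MATCH", 2)]
example_alignment = aligned_string("NNACGTTG", example_cigar)
example_text = sam_cigar_string(example_cigar)
```

Unlike the pseudocode, the CIGAR conversion formats the length explicitly, since adding an integer to a string is an error in Python.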

message ReadGroup

reads.proto:276

A read group is all the data that's processed the same way by the sequencer. It is logically a sub-message of SamHeader, but is declared at the same scope to reduce verbosity.

Used in: SamHeader

message ReadRequirements

reads.proto:414

Describes requirements for a read for it to be returned by a SamReader.

Used in: learning.genomics.deepvariant.AlleleCounterOptions, learning.genomics.deepvariant.MakeExamplesOptions, learning.genomics.deepvariant.PileupImageOptions, SamReaderOptions

enum ReadRequirements.MinBaseQualityMode

reads.proto:447

How should we enforce the min_base_quality requirement?

Used in: ReadRequirements

message ReferenceSequence

reference.proto:77

A full, or partial, sequence of bases from a contig in a reference genome.

message SamHeader

reads.proto:235

The SamHeader message represents the metadata present in the header of a SAM/BAM file.

enum SamHeader.AlignmentGrouping

reads.proto:250

The GO field from the HD line.

Used in: SamHeader

enum SamHeader.SortingOrder

reads.proto:241

The SO field from the HD line.

Used in: SamHeader

message SamReaderOptions

reads.proto:369

The SamReaderOptions message is used to alter the properties of a SamReader. It enables reads to be omitted from parsing based on their attributes, as well as more fine-grained handling of particular fields within the SAM records. Next ID: 12.

enum SamReaderOptions.AuxFieldHandling

reads.proto:375

How should we handle the aux fields in the SAM record?

Used in: SamReaderOptions

message Struct

struct.proto:42

`Struct` represents a structured data value, consisting of fields which map to dynamically typed values. In some languages, `Struct` might be supported by a native representation. For example, in scripting languages like JS a struct is represented as an object. The details of that representation are described together with the proto support for the language. The JSON representation for `Struct` is JSON object.

Used in: Value

message Value

struct.proto:53

`Value` represents a dynamically typed value which can be either null, a number, a string, a boolean, a recursive struct value, or a list of values. A producer of a value is expected to set one of those variants; the absence of any variant indicates an error. The JSON representation for `Value` is a JSON value.

Used in: ListValue, Struct
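
As a loose illustration of the "exactly one variant is set" idea, the sketch below maps a JSON-like Python value to the name of the variant it would occupy. The variant names are borrowed from the well-known `google.protobuf.Value` type and are an assumption here; this package's struct.proto may name or extend them differently.

```python
def value_kind(value) -> str:
    """Return the (assumed) Value variant name for a JSON-like Python value."""
    if value is None:
        return "null_value"
    # bool must be tested before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool_value"
    if isinstance(value, (int, float)):
        return "number_value"
    if isinstance(value, str):
        return "string_value"
    if isinstance(value, dict):
        return "struct_value"
    if isinstance(value, (list, tuple)):
        return "list_value"
    raise TypeError(f"no Value variant for {type(value).__name__}")
```

Raising on unsupported types mirrors the rule that a Value with no variant set indicates an error.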

message Variant

variants.proto:46

A variant represents a change in DNA sequence relative to a reference sequence. For example, a variant could represent a SNP or an insertion. The definition of the Variant message closely follows the common VCF variant representation. Each of the calls on a variant represents a determination of genotype with respect to that variant. For example, a call might assign a probability of 0.32 to the occurrence of a SNP named rs1234 in a sample named NA12345. NextID: 17

Used in: learning.genomics.deepvariant.CallVariantsOutput, learning.genomics.deepvariant.DeepVariantCall

message VariantCall

variants.proto:118

A call represents the determination of genotype with respect to a particular variant. It may include associated information such as quality and phasing. For example, a call might assign a probability of 0.32 to the occurrence of a SNP named rs1234 in a call set with the name NA12345. NextID: 10

Used in: Variant

message VcfExtra

variants.proto:293

This record type is a catch-all for other types of headers. For example,
  ##pedigreeDB=http://url_of_pedigrees
The VcfExtra message would represent this with key="pedigreeDB", value="http://url_of_pedigrees".

Used in: VcfHeader, VcfStructuredExtra
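
The key/value split described for VcfExtra can be sketched as below. The function name is hypothetical; the only behavior assumed is that the key ends at the first `=`, so that values containing `=` (such as URLs with query strings) stay intact.

```python
from typing import Tuple


def parse_vcf_extra(line: str) -> Tuple[str, str]:
    """Split an unstructured '##key=value' VCF header line into (key, value)."""
    if not line.startswith("##"):
        raise ValueError(f"not a VCF meta-information line: {line!r}")
    # partition splits at the first '=' only, leaving later '=' in the value.
    key, sep, value = line[2:].rstrip("\n").partition("=")
    if not sep or not key:
        raise ValueError(f"expected '##key=value', got: {line!r}")
    return key, value


extra = parse_vcf_extra("##pedigreeDB=http://url_of_pedigrees")
```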

message VcfFilterInfo

variants.proto:220

The messages below are sub-messages of the VCF header. They are not nested within VcfHeader simply to avoid verbosity. We comment fields in one of three states:

"Required": Required by both the VCF file format and for downstream users of Variant and VariantCall protos.
"Required by VCF": Required by the VCF file format, unused otherwise.
"Optional": Optional within the VCF file format, unused otherwise.

This record type mirrors a VCF "FILTER" header.

Used in: VcfHeader

message VcfFormatInfo

variants.proto:258

This record type mirrors a VCF "FORMAT" header.

Used in: VcfHeader

message VcfHeader

variants.proto:173

This record type mirrors a VCF header. See https://samtools.github.io/hts-specs/VCFv4.3.pdf for details on the spec.

message VcfInfo

variants.proto:229

This message type mirrors a VCF "INFO" header.

Used in: VcfHeader

message VcfReaderOptions

variants.proto:309

The Vcf{Reader,Writer}Options messages are used to alter the properties of reading and writing variants. They enable certain fields to be omitted from parsing.

message VcfStructuredExtra

variants.proto:280

This record type is a catch-all for other headers containing multiple key-value pairs. For example, headers may have META lines that provide metadata about the VCF as a whole, e.g.
  ##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>
The VcfStructuredExtra message would represent this with key="META", and fields mapping "ID" -> "Assay", "Type" -> "String", etc.

Used in: VcfHeader
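
A structured header line such as the META example needs slightly more care than a plain split on commas, because the `Values=[WholeGenome, Exome]` entry itself contains a comma. The sketch below is one possible approach; the function names are hypothetical, a plain dict stands in for the message's fields, and escaped quotes inside quoted values are not handled.

```python
from typing import Dict, List, Tuple


def split_top_level(body: str) -> List[str]:
    """Split on commas that are outside double quotes and square brackets."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    in_quotes = False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "[":
            depth += 1
        elif not in_quotes and ch == "]":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_structured_extra(line: str) -> Tuple[str, Dict[str, str]]:
    """Parse '##KEY=<a=b,c=d>' into (KEY, {'a': 'b', 'c': 'd'})."""
    if not line.startswith("##"):
        raise ValueError(f"not a VCF meta-information line: {line!r}")
    key, sep, rest = line[2:].strip().partition("=")
    if not sep or not (rest.startswith("<") and rest.endswith(">")):
        raise ValueError(f"not a structured header line: {line!r}")
    fields: Dict[str, str] = {}
    for part in split_top_level(rest[1:-1]):
        if not part.strip():
            continue
        name, _, value = part.partition("=")
        fields[name.strip()] = value.strip()
    return key, fields


meta_key, meta_fields = parse_structured_extra(
    "##META=<ID=Assay,Type=String,Number=.,Values=[WholeGenome, Exome]>")
```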

message VcfWriterOptions

variants.proto:326