package lance.encodings21

message AllNullLayout

encodings_v2_1.proto:151

A layout used for pages where all values are null. There may be buffers of repetition and definition information if these are required to interpret what kind of nulls are present.

Used in: PageLayout

message BlobLayout

encodings_v2_1.proto:162

A layout where large binary data is encoded externally and only the descriptions (position + size) are placed in the page. Repdef information is stored in the descriptions. A description with a size of 0 and a position of 0 is an empty value. A description with a size of 0 and a non-zero position is a null value and the position is the repdef value.

Used in: PageLayout
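
The description rules above can be sketched as a small decoder. This is only an illustration: `BlobValue`, `read_description`, and the in-memory `external` byte string are hypothetical names standing in for the external blob storage, not part of the Lance API.

```python
from typing import NamedTuple, Optional


class BlobValue(NamedTuple):
    kind: str                 # "value", "empty", or "null"
    data: Optional[bytes]     # payload for "value" and "empty"
    repdef: Optional[int]     # repdef value for "null"


def read_description(position: int, size: int, external: bytes) -> BlobValue:
    """Interpret one (position, size) description (hypothetical helper)."""
    if size == 0 and position == 0:
        # size 0 and position 0: an empty value
        return BlobValue("empty", b"", None)
    if size == 0:
        # size 0 and non-zero position: a null, position carries the repdef value
        return BlobValue("null", None, position)
    # otherwise the description points at externally stored bytes
    return BlobValue("value", external[position:position + size], None)
```

For example, `read_description(6, 5, b"hello world")` yields the bytes `b"world"`, while `read_description(3, 0, ...)` is a null whose repdef value is 3.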

message BufferCompression

encodings_v2_1.proto:222

Compression applied to a single buffer of data. A buffer is the leaf of the compression tree. Unlike data blocks, which can be further compressed with a variety of techniques, a buffer cannot be understood in any particular way. A general compression scheme (something like zstd, lz4, etc.) may be applied to a buffer. The entire buffer is compressed as a single unit. If this happens then any parent encoding becomes opaque, even if it would normally be transparent. This is a leaf: no further compression is applied to the data.

Used in: Flat, General, InlineBitpacking, Variable

message ByteStreamSplit

encodings_v2_1.proto:463

A compression scheme where fixed-width values are transposed into a series of byte streams. This is commonly used for floating point values, where the upper bits (the sign and exponent) have a significantly different meaning than the lower bits (the mantissa). By splitting the values into byte streams we group the exponent bits together and the mantissa bits together. The end result is typically more compressible. Note that this encoding is mostly useful when combined with other encodings. It does not do any compression on its own. This is an opaque encoding. The input is a fixed-width data block. The output is a single fixed-width data block.

Used in: CompressiveEncoding
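
A minimal sketch of the byte transposition described above, assuming little-endian 32-bit floats. The function names and the stream ordering (stream `k` holds byte `k` of every value) are assumptions made for illustration; the actual Lance byte order may differ.

```python
import struct


def byte_stream_split(values, fmt="<f"):
    """Transpose fixed-width values into one byte stream per byte position."""
    width = struct.calcsize(fmt)
    raw = [struct.pack(fmt, v) for v in values]
    # Stream k collects byte k of every value; streams are stored back to back.
    return b"".join(bytes(r[k] for r in raw) for k in range(width))


def byte_stream_join(data, count, fmt="<f"):
    """Inverse of byte_stream_split for `count` values."""
    width = struct.calcsize(fmt)
    streams = [data[k * count:(k + 1) * count] for k in range(width)]
    return [
        struct.unpack(fmt, bytes(s[i] for s in streams))[0]
        for i in range(count)
    ]
```

The output has exactly the same size as the input, which matches the note that the encoding does no compression on its own. The benefit only appears when a later stage compresses the grouped bytes.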

enum CompressionScheme

encodings_v2_1.proto:205

Used in: BufferCompression

message CompressiveEncoding

encodings_v2_1.proto:469

An encoding that compresses a data block into buffers

Used in: ByteStreamSplit, Dictionary, FixedSizeList, Fsst, FullZipLayout, General, MiniBlockLayout, OutOfLineBitpacking, PackedStruct, Rle, Variable, VariablePackedStruct.FieldEncoding

message Constant

encodings_v2_1.proto:274

Compression algorithm where all values have a constant value (encoded in the description). This is a leaf encoding: no compression is applied to the data. The input can be any kind of data block. There is no output.

Used in: CompressiveEncoding

message Dictionary

encodings_v2_1.proto:341

A compression scheme where common values are stored in a dictionary and the values are encoded as indices into the dictionary. This is an opaque encoding unless the dictionary is considered metadata. The input is any kind of data block. There are two outputs:

- A data block of the same kind as the input (the dictionary)
- A fixed-width data block containing the indices into the dictionary

Used in: CompressiveEncoding
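
The two outputs of the dictionary scheme can be illustrated with plain Python lists. This is a sketch of the general technique only; `dictionary_encode` and `dictionary_decode` are hypothetical names, and the first-seen ordering of dictionary entries is an assumption.

```python
def dictionary_encode(values):
    """Return (dictionary, indices): distinct values plus one index per input value."""
    dictionary = []
    index_of = {}
    indices = []
    for v in values:
        if v not in index_of:
            # First occurrence: append to the dictionary
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original values by looking each index up in the dictionary."""
    return [dictionary[i] for i in indices]
```

The dictionary keeps the kind of the input (strings stay strings), while the indices are small integers that a fixed-width encoding such as bitpacking could compress further.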

message FixedSizeList

encodings_v2_1.proto:373

Converts a fixed-size-list of values into a flattened list of values. This encoding does not actually compress the data, it just flattens out the FSL layers. This is a transparent encoding. The input is a single block of fixed-width data (with a wide width and few items). The output is a single block of fixed-width data (with a narrow width and many items).

Used in: CompressiveEncoding
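
As an illustration of the wide-to-narrow conversion, the sketch below flattens rows of a fixed dimension into one list of scalars. The names `flatten_fsl` and `unflatten_fsl` are hypothetical, and Python lists stand in for fixed-width data blocks.

```python
def flatten_fsl(rows, dimension):
    """Flatten few wide items (rows of `dimension` values) into many narrow items."""
    for row in rows:
        if len(row) != dimension:
            raise ValueError("every row must have exactly `dimension` values")
    return [x for row in rows for x in row]


def unflatten_fsl(flat, dimension):
    """Regroup the flattened values into rows of `dimension` values."""
    return [flat[i:i + dimension] for i in range(0, len(flat), dimension)]
```

Because row `i` always starts at flat position `i * dimension`, individual items stay addressable, which is consistent with the encoding being transparent.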

message Flat

encodings_v2_1.proto:242

Fixed-width items placed contiguously in a single buffer. This is a leaf encoding: no compression is applied to the data. This is a transparent encoding by definition. The input is a fixed-width data block. The output is a single buffer.

Used in: CompressiveEncoding

message Fsst

encodings_v2_1.proto:325

A compression scheme for variable-width data. A small dictionary (referred to as a "symbol table") is used to compress the values. In this scheme there is a single symbol table for the entire page and it is stored in the encoding description itself. This is a transparent encoding. The input is a variable-width data block. The output is a single variable-width data block.

Used in: CompressiveEncoding

message FullZipLayout

encodings_v2_1.proto:121

A layout used for pages where the data is large. In this case the cost of transposing the data is relatively small (compared to the cost of writing the data) and so we just zip the buffers together.

Used in: PageLayout

message General

encodings_v2_1.proto:442

A compression scheme that wraps the underlying data with general compression.

Note: The application of wrapped compression will depend on the layout of the data. If we apply it to mini-block data then we compress entire mini-blocks. If we apply it to full-zip data then we compress each value individually.

Note: Wrapped compression is somewhat unique at the moment as it is applied to the output of the inner encoding and not the input like all other compressive encodings.

Note: General compression can usually be applied in two spots. We can apply it to individual buffers or we can apply it here, to the entire array. For example, let's say we are storing mini-blocks of strings and we are using FSST and bitpacking the offsets. We have something like this...

    WRAPPED(†3) -> FSST -> VARIABLE -(offsets)-> INLINE_BITPACKING -(data)-> FLAT -> BUFFER (†1)
                                    -(data)-> BUFFER (†2)

General compression can be applied at †1, †2, or †3 (or any combination of these).

- If we apply it at †1 then we apply it just to the bitpacked offsets.
- If we apply it at †2 then we apply it just to the FSST compressed data.
- If we apply it at †3 then we apply it to the entire mini-block (both offsets and data).

The input is a single data block of any kind. The output is a single data block of the same kind as the input.

Used in: CompressiveEncoding
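
The difference between compressing individual buffers (†1, †2) and wrapping the whole unit (†3) can be sketched as follows. This uses `zlib` from the Python standard library purely as a stand-in for zstd or lz4, and the length-list framing in `compress_wrapped` is a hypothetical choice, not the Lance on-disk format.

```python
import zlib


def compress_per_buffer(buffers):
    """Buffer-level compression: each buffer is compressed as its own unit."""
    return [zlib.compress(b) for b in buffers]


def decompress_per_buffer(compressed):
    return [zlib.decompress(c) for c in compressed]


def compress_wrapped(buffers):
    """Wrapped compression: all buffers are compressed together as one unit."""
    lengths = [len(b) for b in buffers]   # needed to split the buffers apart again
    return zlib.compress(b"".join(buffers)), lengths


def decompress_wrapped(blob, lengths):
    raw = zlib.decompress(blob)
    out, start = [], 0
    for n in lengths:
        out.append(raw[start:start + n])
        start += n
    return out
```

With per-buffer compression a reader can decompress only the offsets or only the data. With wrapped compression the whole unit has to be decompressed before either buffer is available.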

message InlineBitpacking

encodings_v2_1.proto:308

Bitpacking variant where the bits per value are stored inline in the chunks themselves. This variation of bitpacking allows for the number of bits per value to change throughout the buffer, which makes the compression more robust to outliers. This is an opaque encoding. The input is a fixed-width data block. The output is a single buffer.

Used in: CompressiveEncoding
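
A rough sketch of the idea, assuming non-negative integers, a hypothetical chunk size of 4 values, and a one-byte header per chunk holding the bit width. The real chunk size and header layout used by Lance are not given here, so treat all of these as illustrative choices.

```python
def inline_bitpack(values, chunk_size=4):
    """Pack values chunk by chunk; each chunk starts with its own bit width."""
    out = bytearray()
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        bits = max(1, max(chunk).bit_length())   # width chosen per chunk
        acc = 0
        for i, v in enumerate(chunk):
            acc |= v << (i * bits)
        out.append(bits)                          # inline header byte
        out += acc.to_bytes((bits * len(chunk) + 7) // 8, "little")
    return bytes(out)


def inline_unpack(buf, count, chunk_size=4):
    """Walk the buffer, reading each chunk's bit width before its values."""
    values, pos = [], 0
    while len(values) < count:
        n = min(chunk_size, count - len(values))
        bits = buf[pos]
        nbytes = (bits * n + 7) // 8
        acc = int.from_bytes(buf[pos + 1:pos + 1 + nbytes], "little")
        mask = (1 << bits) - 1
        values.extend((acc >> (i * bits)) & mask for i in range(n))
        pos += 1 + nbytes
    return values
```

In `inline_bitpack([1, 2, 3, 0, 1000, 5, 7, 9])` the outlier 1000 forces 10 bits per value only in the second chunk, while the first chunk still uses 2 bits per value. Because a chunk's position depends on the widths of the chunks before it, the buffer has to be walked sequentially, which fits the description of this encoding as opaque.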

message MiniBlockLayout

encodings_v2_1.proto:77

A layout used for pages where the data is small. In this case we can fit many values into a single disk sector and transposing buffers is expensive. As a result, we do not transpose the buffers but compress the data into small chunks (called mini blocks) which are roughly the size of a disk sector. The end result is a small amount of read amplification (since we must read an entire mini block at a time) but we have more flexibility in compression and do less work per value when compressing and decompressing in bulk.

Used in: PageLayout

message OutOfLineBitpacking

encodings_v2_1.proto:292

A compression scheme in which a single fixed-width block is "packed" into a smaller fixed-width block where each value has fewer bits. This is typically done by throwing away the most significant bits of each value when those bits are all the same. In this scheme the number of bits per value is fixed across the entire buffer and stored in this message. This is a transparent encoding. The input is a fixed-width data block. The output is a single fixed-width data block.

Used in: CompressiveEncoding
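
A minimal sketch of fixed-width bitpacking for non-negative integers, where `bits` plays the role of the bits-per-value stored in the message. The function names and the little-endian bit order are assumptions made for illustration.

```python
def bitpack(values, bits):
    """Pack each value into exactly `bits` bits, dropping the unused upper bits."""
    acc = 0
    for i, v in enumerate(values):
        if v >> bits:
            raise ValueError(f"{v} does not fit in {bits} bits")
        acc |= v << (i * bits)
    return acc.to_bytes((bits * len(values) + 7) // 8, "little")


def get_value(buf, bits, index):
    """Read one value directly from its known bit position."""
    acc = int.from_bytes(buf, "little")
    return (acc >> (index * bits)) & ((1 << bits) - 1)


def bitunpack(buf, bits, count):
    return [get_value(buf, bits, i) for i in range(count)]
```

Since every value occupies the same number of bits, value `i` always starts at bit `i * bits`, so single values can be located without decoding their neighbours. That property is what makes a fixed-width scheme like this transparent.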

message PackedStruct

encodings_v2_1.proto:386

Packs a struct containing only fixed-width children into a single fixed-width data block. The children are concatenated row by row and stored as a single fixed-width buffer. This is the legacy packed struct representation and remains available for backwards compatibility.

Used in: CompressiveEncoding
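
Row-by-row concatenation of fixed-width children can be sketched with the standard `struct` module. The example struct (one 32-bit integer and one 32-bit float, little-endian, no padding) and the helper names are hypothetical.

```python
import struct

# Hypothetical struct with two fixed-width children: int32 `id`, float32 `score`.
ROW = struct.Struct("<if")


def pack_rows(rows):
    """Concatenate the children of each row, one row after another."""
    return b"".join(ROW.pack(*row) for row in rows)


def unpack_row(buf, index):
    """Read all children of one row from its fixed position."""
    return ROW.unpack_from(buf, index * ROW.size)
```

Each packed row is `ROW.size` (8) bytes wide, so the result behaves like a single fixed-width data block and fetching one row touches one contiguous range.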

message PageLayout

encodings_v2_1.proto:172

Describes the structural encoding of a page

Used in: BlobLayout

enum RepDefLayer

encodings_v2_1.proto:51

Repetition and definition levels are described in more detail elsewhere. As we peel through the structure of an array we will encounter layers of struct and list. Each of these layers potentially adds a new level to the repetition and definition levels. This enum describes the meaning of each layer.

Used in: AllNullLayout, BlobLayout, FullZipLayout, MiniBlockLayout

message Rle

encodings_v2_1.proto:358

A compression scheme where runs of common values are encoded as a single value and a count. This is an opaque encoding unless the run lengths are considered metadata. The input is a single data block of any kind. There are two outputs:

- A data block of the same kind as the input (the run values)
- A fixed-width data block containing the lengths of the runs

Used in: CompressiveEncoding
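
The two outputs of run-length encoding can be illustrated with plain lists. This is a sketch of the general technique, with hypothetical function names, not the Lance implementation.

```python
def rle_encode(values):
    """Return (run_values, run_lengths) for consecutive repeated values."""
    run_values, run_lengths = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1      # extend the current run
        else:
            run_values.append(v)      # start a new run
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand each run value by its length."""
    out = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out
```

Finding the value at a given row requires summing run lengths up to that row, which illustrates why the encoding is opaque unless the run lengths are available as metadata.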

message Variable

encodings_v2_1.proto:261

Variable-width items have the values stored in one buffer and the offsets are output as a data block that may be further compressed. This is a partial leaf encoding. Values are not compressed but the offsets may be further compressed. This is a transparent encoding by definition. The input is a variable-width data block. The output is a single fixed-width data block (the offsets) and a single buffer (the values).

Used in: CompressiveEncoding
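
The split into offsets and values can be sketched as below. The sketch assumes Arrow-style offsets (N + 1 entries starting at 0); whether Lance stores a leading zero is not stated here, so that detail and the function names are assumptions.

```python
def encode_variable(items):
    """Split variable-width byte strings into (offsets, values)."""
    offsets = [0]
    values = bytearray()
    for item in items:
        values += item
        offsets.append(len(values))   # end of this item, start of the next
    return offsets, bytes(values)


def get_item(offsets, values, index):
    """Slice one item out of the values buffer using two adjacent offsets."""
    return values[offsets[index]:offsets[index + 1]]
```

The offsets form an increasing sequence of fixed-width integers, which is why they are a natural candidate for further compression such as bitpacking, while the values buffer is left as-is.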

message VariablePackedStruct

encodings_v2_1.proto:399

Variable-width packed struct encoding (2.2 extension). Each child value is compressed independently before being transposed into a row-major layout. This preserves per-field compression boundaries at the cost of disabling mini-block compression. Readers must prefer this field when present and fall back to the legacy encoding otherwise.

Used in: CompressiveEncoding

message VariablePackedStruct.FieldEncoding

encodings_v2_1.proto:404

Encoding description for a single child field

Used in: VariablePackedStruct