Each column has a metadata block that is placed at the end of the file. These may be read individually to allow for column projection.
Encoding information about the column itself. This typically describes how to interpret the column metadata buffers. For example, it could describe how statistics or dictionaries are stored in the column metadata.
The pages in the column
The file offsets of each of the column metadata buffers. There may be zero buffers.
The size (in bytes) of each of the column metadata buffers. This field will have the same length as `buffer_offsets` and may be empty.
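As an illustration of how the paired offset and size lists might be consumed, here is a minimal Python sketch. The function name `read_buffers` and the parameter name `buffer_sizes` are assumptions made for this example, not part of the format; only `buffer_offsets` is named in the descriptions above.

```python
import io
from typing import BinaryIO, List, Sequence


def read_buffers(
    f: BinaryIO,
    buffer_offsets: Sequence[int],
    buffer_sizes: Sequence[int],
) -> List[bytes]:
    """Read each (offset, size) pair from a seekable file object.

    Hypothetical helper: the two lists are expected to have the same
    length, and both may be empty.
    """
    if len(buffer_offsets) != len(buffer_sizes):
        raise ValueError("offsets and sizes must have the same length")

    buffers = []
    for offset, size in zip(buffer_offsets, buffer_sizes):
        f.seek(offset)
        data = f.read(size)
        if len(data) != size:
            raise EOFError(f"buffer at offset {offset} is truncated")
        buffers.append(data)
    return buffers


# Usage with an in-memory stand-in for a file.
demo = io.BytesIO(b"abcdefghij")
print(read_buffers(demo, [2, 7], [3, 2]))
```

Because each buffer is addressed independently, a reader can fetch only the buffers it needs rather than scanning the whole metadata block.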
This describes a page of column data.
The file offsets for each of the page buffers. The number of buffers is variable and depends on the encoding. There may be zero buffers (e.g. constant-encoded data), in which case this field is empty.
The size (in bytes) of each of the page buffers. This field will have the same length as `buffer_offsets` and may be empty.
Logical length (e.g. # rows) of the page
The encoding used to encode the page
The priority of the page. For tabular data this will be the top-level row number of the first row in the page (and top-level rows should not split across pages).
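Since the priority of a tabular page is the top-level row number of its first row, a reader could use it to locate the page holding a given row. The sketch below is one possible approach, not the format's prescribed lookup. It assumes the pages are sorted by priority and that each page's logical length equals its number of top-level rows (true for flat columns, not necessarily for nested data). `PageInfo` and `find_page` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class PageInfo:
    priority: int  # top-level row number of the first row in the page
    length: int    # logical length; assumed here to be top-level rows


def find_page(pages: List[PageInfo], row: int) -> int:
    """Return the index of the page that contains the top-level row.

    Assumes `pages` is sorted by priority.
    """
    priorities = [p.priority for p in pages]
    # Last page whose first row is <= the requested row.
    idx = bisect_right(priorities, row) - 1
    if idx < 0:
        raise IndexError(f"row {row} precedes the first page")
    page = pages[idx]
    if row >= page.priority + page.length:
        raise IndexError(f"row {row} is past the end of page {idx}")
    return idx


pages = [PageInfo(0, 100), PageInfo(100, 50), PageInfo(150, 25)]
print(find_page(pages, 120))
```

Because top-level rows do not split across pages, a single page index is enough to answer the lookup.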
The deferred encoding is used to place the encoding itself in a different part of the file. This is most commonly used to allow encodings to be shared across different columns. For example, when writing a file with thousands of columns, where many pages have the exact same encoding, it can be useful to cut down on the size of the metadata by using a deferred encoding.
Location of the buffer containing the encoding.
* If sharing encodings across columns then this will be in a global buffer.
* If sharing encodings across pages within a column this could be in a column metadata buffer.
* This could also be a page buffer if the encoding is not shared, needs to be written before the file ends, and the encoding is too large to load unless we first determine the page needs to be read. This combination seems unusual.
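To make the sharing idea concrete, here is a minimal writer-side sketch in Python. It is an illustration under assumptions, not the format's actual writer: `EncodingPool` and `intern` are hypothetical names, and the returned index merely stands in for whatever buffer location a real writer would record in the deferred encoding.

```python
from typing import Dict, List


class EncodingPool:
    """Deduplicate serialized encodings so identical ones are stored once."""

    def __init__(self) -> None:
        self.buffers: List[bytes] = []      # one entry per distinct encoding
        self._index: Dict[bytes, int] = {}  # encoding bytes -> buffer index

    def intern(self, encoding: bytes) -> int:
        """Return the index of the shared buffer holding `encoding`."""
        idx = self._index.get(encoding)
        if idx is None:
            idx = len(self.buffers)
            self.buffers.append(encoding)
            self._index[encoding] = idx
        return idx


pool = EncodingPool()
# Three pages, two of which share the same serialized encoding.
refs = [pool.intern(e) for e in (b"bitpack:8", b"bitpack:16", b"bitpack:8")]
print(refs, len(pool.buffers))
```

With thousands of columns using the same encoding, each page's metadata then carries only a small reference instead of a full copy of the encoding bytes.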
The encoding is placed directly in the metadata section
The bytes that make up the encoding, embedded directly in the metadata. This is the most common approach.
An encoding stores the information needed to decode a column or page. For example, it could describe if the page is using bit packing, and how many bits there are in each individual value. At the column level it can be used to wrap columns with dictionaries or statistics.
The encoding is stored elsewhere and not part of this protobuf message
The encoding is stored within this protobuf message
There is no encoding information
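The three cases above (stored elsewhere, stored inline, or absent) can be handled by a small dispatch on the reader side. The following Python sketch models them with plain dataclasses rather than generated protobuf classes. All names here (`Direct`, `Deferred`, `offset`, `size`, `resolve_encoding`) are hypothetical and chosen only for illustration.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional, Union


@dataclass
class Direct:
    encoding: bytes  # encoding bytes embedded in the metadata


@dataclass
class Deferred:
    offset: int  # assumed: file offset of the buffer holding the encoding
    size: int    # assumed: size of that buffer in bytes


Encoding = Union[Direct, Deferred, None]


def resolve_encoding(enc: Encoding, f: BinaryIO) -> Optional[bytes]:
    """Return the raw encoding bytes, or None when there is no encoding."""
    if enc is None:
        return None
    if isinstance(enc, Direct):
        return enc.encoding
    if isinstance(enc, Deferred):
        # The encoding lives elsewhere in the file; fetch it on demand.
        f.seek(enc.offset)
        data = f.read(enc.size)
        if len(data) != enc.size:
            raise EOFError("deferred encoding buffer is truncated")
        return data
    raise TypeError(f"unsupported encoding variant: {type(enc).__name__}")


demo = io.BytesIO(b"....shared-encoding....")
print(resolve_encoding(Deferred(offset=4, size=15), demo))
```

A deferred encoding costs an extra read, which is why embedding the bytes directly remains the common case when encodings are small and not shared.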