Dictionary field metadata
The file offset at which the dictionary values are stored. It is only valid if encoding is DICTIONARY. The logical type represents the value type of the column, e.g., a string value.
The length of dictionary values.
Supported encodings.
Invalid encoding.
Plain encoding.
Var-length binary encoding.
Dictionary encoding.
Run-length encoding.
Field metadata for a column.
Fully qualified name.
Field ID. See the comment in `DataFile.fields` for how field IDs are assigned.
Parent field ID. If not set, this is a top-level column.
Logical types, which support parameterized Arrow types.
PARENT types will always have logical type "struct".
REPEATED types may have logical types:
* "list"
* "large_list"
* "list.struct"
* "large_list.struct"
The final two are used if the list values are structs, and therefore the field is both implicitly REPEATED and PARENT.
LEAF types may have logical types:
* "null"
* "bool"
* "int8" / "uint8"
* "int16" / "uint16"
* "int32" / "uint32"
* "int64" / "uint64"
* "halffloat" / "float" / "double"
* "string" / "large_string"
* "binary" / "large_binary"
* "date32:day"
* "date64:ms"
* "decimal:128:{precision}:{scale}" / "decimal:256:{precision}:{scale}"
* "time:{unit}" / "timestamp:{unit}" / "duration:{unit}", where unit is "s", "ms", "us", "ns"
* "dict:{value_type}:{index_type}:false"
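The colon-separated form of the logical type strings above can be split into a base name and its parameters. The following is a minimal sketch, not part of any Lance API: `LogicalType` and `parse_logical_type` are hypothetical names, and the naive split on ":" assumes that no parameter itself contains a colon (which may not hold for every "dict:..." value type).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalType:
    """Hypothetical container: base type name plus raw string parameters."""
    name: str
    params: Tuple[str, ...]


def parse_logical_type(text: str) -> LogicalType:
    """Split a logical type string such as "decimal:128:10:2" on ':'.

    Assumption: parameters never contain ':' themselves.
    """
    name, *params = text.split(":")
    return LogicalType(name, tuple(params))


def is_list_of_structs(text: str) -> bool:
    """True for the two logical types that are both REPEATED and PARENT."""
    return text in ("list.struct", "large_list.struct")


decimal = parse_logical_type("decimal:128:10:2")
timestamp = parse_logical_type("timestamp:us")
plain = parse_logical_type("int32")
```

Here `decimal.params` holds the width, precision and scale as strings; converting them to integers is left to the caller.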
If this field is nullable.
The file offset at which the dictionary values are stored. It is only valid if encoding is DICTIONARY. The logical type represents the value type of the column, e.g., a string value.
Deprecated: optional extension type name. Use the metadata field ARROW:extension:name instead.
Optional field metadata (e.g. extension type name/parameters).
The storage class of the field. This determines the rate at which the field is compacted. Currently, there are only two storage classes:
* "" - The default storage class.
* "blob" - The field is compacted into fewer rows per fragment.
Fields that have non-default storage classes are stored in different datasets (e.g. blob fields are stored in the nested "_blobs" dataset).
A file descriptor that describes the contents of a Lance file
The schema of the file
The number of rows in the file
Metadata of one Lance file.
Position of the manifest in the file. If it is zero, the manifest is stored externally.
Logical offsets of each chunk group, i.e., the number of rows in each chunk.
The file position at which the page table is stored.
A page table is a matrix of N x M x 2, where N = num_fields, and M = num_batches. Each cell in the table is a pair of <position:int64, length:int64> of the page. Both position and length are int64 values. The <position, length> of all the pages in the same column are then contiguously stored.
Every field that is a part of the file will have a run in the page table. This includes struct columns, which will have a run of length 0 since they don't store any actual data.
For example, for the column 5 and batch 4, we have:
```text
position = page_table[5][4][0];
length = page_table[5][4][1];
```
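The indexing above can be sketched against a flat sequence of int64 values. This is an illustration only: `page_lookup` is a hypothetical helper, and it assumes the table has already been decoded into a Python list laid out field by field, so that all pages of one column are contiguous as described. How the raw bytes are decoded is not covered here.

```python
from typing import Sequence, Tuple


def page_lookup(
    page_table: Sequence[int], num_batches: int, field: int, batch: int
) -> Tuple[int, int]:
    """Return (position, length) for one page.

    Assumed layout: page_table[field][batch] is a (position, length) pair,
    flattened so that each field's num_batches pairs are stored together.
    """
    base = (field * num_batches + batch) * 2
    return page_table[base], page_table[base + 1]


# Made-up table with 2 fields and 3 batches: 2 * 3 * 2 = 12 values.
example_table = [
    0, 10, 10, 20, 30, 5,      # field 0: batches 0, 1, 2
    35, 8, 43, 8, 51, 4,       # field 1: batches 0, 1, 2
]
position, length = page_lookup(example_table, num_batches=3, field=1, batch=2)
```

With this layout, reading every page of one column touches a single contiguous slice of the table.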
The schema of the statistics. This might be empty, meaning there are no statistics. It also might not contain statistics for every field.
The field ids of the statistics leaf fields. This plays a similar role to the `fields` field in the DataFile message. Each of these field ids corresponds to a field in the stats_schema. There is one per column in the stats page table.
The file position of the statistics page table.
The page table is a matrix of N x 2, where N = length of stats_fields. This is the same layout as the main page table, except there is always only one batch.
For example, to get the stats column 5, we have:
```text
position = stats_page_table[5][0];
length = stats_page_table[5][1];
```
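Because there is one stats page table column per entry in the statistics field ids, the two can be paired up. The sketch below is hypothetical (`stats_pages_by_field_id` is not a Lance function) and assumes the N x 2 table has already been decoded into a flat list of integers, one (position, length) pair per column.

```python
from typing import Dict, Sequence, Tuple


def stats_pages_by_field_id(
    stats_fields: Sequence[int], stats_page_table: Sequence[int]
) -> Dict[int, Tuple[int, int]]:
    """Map each stats field id to its (position, length) pair.

    Assumed layout: column i of the table occupies flat indices 2*i and 2*i + 1.
    """
    if len(stats_page_table) != 2 * len(stats_fields):
        raise ValueError("expected exactly one (position, length) pair per stats field")
    return {
        field_id: (stats_page_table[2 * i], stats_page_table[2 * i + 1])
        for i, field_id in enumerate(stats_fields)
    }


# Made-up example: three stats leaf fields with ids 4, 7 and 9.
stats_pages = stats_pages_by_field_id([4, 7, 9], [100, 16, 116, 16, 132, 24])
```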
A schema which describes the data type of each of the columns
All fields in this file, including the nested fields.
Schema metadata.
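Since the schema stores all fields, including nested ones, in a single flat list, the nesting can be recovered from each field's parent field ID. The following is a minimal sketch under stated assumptions: `FieldInfo` is a hypothetical stand-in for the field message, and an unset parent is modeled as `None`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldInfo:
    """Hypothetical stand-in for a field: id, name, and optional parent id."""
    id: int
    name: str
    parent_id: Optional[int] = None  # None models "parent not set" (top-level)


def group_children(fields: List[FieldInfo]) -> Dict[Optional[int], List[int]]:
    """Group field ids by parent id; the key None collects top-level columns."""
    children: Dict[Optional[int], List[int]] = defaultdict(list)
    for field in fields:
        children[field.parent_id].append(field.id)
    return dict(children)


# Made-up schema: a struct column "a" with one child, plus a leaf column "c".
flat_fields = [
    FieldInfo(0, "a"),
    FieldInfo(1, "a.b", parent_id=0),
    FieldInfo(2, "c"),
]
tree = group_children(flat_fields)
```

`tree[None]` lists the top-level columns, and `tree.get(field_id, [])` gives the direct children of any field.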