sum stores the total length of the binary blobs in a stripe
A hybrid Julian/Gregorian calendar with a cutover point in October 1582.
A calendar that extends the Gregorian calendar backward indefinitely (the proleptic Gregorian calendar).
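The two calendars above assign different day counts to dates before the October 1582 cutover, which matters for any value stored as days since epoch. The sketch below (hypothetical helper names, standard Julian Day Number arithmetic, not ORC's own code) converts a civil date to days since 1970-01-01 under each calendar; Python's `datetime.date` is itself proleptic Gregorian, which gives a cross-check.

```python
from datetime import date

EPOCH_JDN = 2440588  # Julian Day Number of 1970-01-01 (Gregorian)


def _jdn(year: int, month: int, day: int, gregorian: bool) -> int:
    """Julian Day Number of a civil date (valid for years after -4800)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        return jdn - y // 100 + y // 400 - 32045
    return jdn - 32083  # Julian rules: every fourth year is a leap year


def proleptic_days(year: int, month: int, day: int) -> int:
    """Days since epoch, applying Gregorian rules to every date."""
    return _jdn(year, month, day, gregorian=True) - EPOCH_JDN


def hybrid_days(year: int, month: int, day: int) -> int:
    """Days since epoch, using Julian rules before 1582-10-15."""
    if (year, month, day) >= (1582, 10, 15):
        return proleptic_days(year, month, day)
    return _jdn(year, month, day, gregorian=False) - EPOCH_JDN


# Cross-check the Gregorian arithmetic against the standard library.
assert proleptic_days(1, 1, 1) == (date(1, 1, 1) - date(1970, 1, 1)).days

# The same label "1582-10-04" names two days that are 10 days apart.
print(hybrid_days(1582, 10, 4) - proleptic_days(1582, 10, 4))  # 10
```

In the hybrid calendar, 1582-10-04 is immediately followed by 1582-10-15, so the dates in between do not exist there.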
Statistics for list and map columns
The encoding of the bloom filters for this column:
  0 or missing = none or original
  1 = ORC-135 (UTC for timestamps)
In ORC v2 (and for encrypted columns in v1), each column has its column statistics written separately.
one value for each stripe in the file
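Grouping statistics by column instead of by stripe amounts to a transpose of the stripe-major layout used by the v0/v1 StripeStatistics message. A sketch using nested lists (with strings standing in for ColumnStatistics messages) as a hypothetical substitute for the generated protobuf classes:

```python
from typing import List, TypeVar

T = TypeVar("T")


def by_column(per_stripe: List[List[T]]) -> List[List[T]]:
    """Regroup stripe-major statistics (stripe -> column) into column-major
    order (column -> stripe), so each column holds one value per stripe."""
    if not per_stripe:
        return []
    n_cols = len(per_stripe[0])
    if any(len(stripe) != n_cols for stripe in per_stripe):
        raise ValueError("every stripe must list the same number of columns")
    return [[stripe[col] for stripe in per_stripe] for col in range(n_cols)]


# Two stripes, three columns.
stripes = [["s0c0", "s0c1", "s0c2"], ["s1c0", "s1c1", "s1c2"]]
print(by_column(stripes))
# [['s0c0', 's1c0'], ['s0c1', 's1c1'], ['s0c2', 's1c2']]
```

With the column-major form, a reader interested in one column can read that column's per-stripe values without touching the others.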
How the data was masked. This is not needed to read the file; it documents how the file was written.
the kind of masking, which may include third-party masks
parameters for the mask
the unencrypted column roots this mask was applied to
min,max values saved as days since epoch
all of the masks used in this file
all of the keys used in this file
The encrypted variants. Readers should prefer the first variant whose key the user has access to. If they don't have access to any of the keys, they should get the unencrypted masked data.
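The reader-side rule is a first-match search over the variants. In this sketch the data model (a `Variant` holding a key index into the file's key list, keys identified by name, and a caller-supplied set of key names the user may use) is a hypothetical simplification of the protobuf messages, not the ORC library's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Variant:
    root: int       # column id of the root of the encrypted subtree
    key_index: int  # index into the file's list of encryption keys


def choose_variant(variants: List[Variant], root: int, key_names: List[str],
                   accessible: Set[str]) -> Optional[Variant]:
    """Return the first variant for `root` whose key the user can access.

    None means the reader should return the unencrypted masked data instead.
    """
    for variant in variants:
        if variant.root == root and key_names[variant.key_index] in accessible:
            return variant
    return None
```

Because the search stops at the first match, the order in which a writer lists the variants expresses its preference.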
How are the local keys encrypted?
used for detecting future algorithms
Information about the encryption keys.
The description of an encryption variant. Each variant is a single subtype that is encrypted with a single key.
the column id of the root
The master key that was used to encrypt the local key, referenced as an index into the Encryption.key list.
the encrypted key for the file footer
the stripe statistics for this variant
encrypted file statistics as a FileStatistics
The contents of the file tail that must be serialized. This is serialized as part of OrcSplit and is also used by the footer cache.
Each implementation that writes ORC files should register for a code:
  0 = ORC Java
  1 = ORC C++
  2 = Presto
  3 = Scritchley Go from https://github.com/scritchley/orc
  4 = Trino
information about the encryption in this file
An informative description of the version of the software that wrote the file. It is interpreted within a given writer, so for example ORC 1.7.2 = "1.7.2". It may include suffixes, such as "-SNAPSHOT".
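The writer code and the free-form version string are usually read together. A small sketch with hypothetical function names that turns them into a label and strips a suffix, assuming suffixes always start with "-" as in "-SNAPSHOT":

```python
# Writer codes as registered in the list above.
WRITERS = {0: "ORC Java", 1: "ORC C++", 2: "Presto", 3: "Scritchley Go", 4: "Trino"}


def describe_writer(code: int, software_version: str) -> str:
    """Human-readable label, e.g. 'ORC Java 1.7.2-SNAPSHOT'."""
    name = WRITERS.get(code, f"unknown writer (code {code})")
    return f"{name} {software_version}".rstrip()


def base_version(software_version: str) -> str:
    """Drop a suffix such as '-SNAPSHOT' (assumes suffixes start with '-')."""
    return software_version.split("-", 1)[0]
```

The version string is informative only; the numeric writer version is the field meant for detecting known bugs.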
Which KeyProvider encrypted the local keys.
This message type is only used in ORC v0 and v1.
Serialized length must be less than 255 bytes
the version of the file format
  [0, 11] = Hive 0.11
  [0, 12] = Hive 0.12
The version of the writer that wrote the file. This number is updated when we make fixes or large changes to the writer so that readers can detect whether a given bug is present in the data. Only the Java ORC writer may use values under 6 (or missing), so that readers that predate ORC-202 treat the new writers correctly. Each writer should assign its own sequence of versions starting from 6.
Version of the ORC Java writer:
  0 = original
  1 = HIVE-8732 fixed (fixed stripe/file maximum statistics & string statistics use UTF-8 for min/max)
  2 = HIVE-4243 fixed (use real column names from Hive tables)
  3 = HIVE-12055 added (vectorized writer implementation)
  4 = HIVE-13083 fixed (decimals write present stream correctly)
  5 = ORC-101 fixed (bloom filters use UTF-8 consistently)
  6 = ORC-135 fixed (timestamp statistics use UTC)
  7 = ORC-517 fixed (decimal64 min/max incorrect)
  8 = ORC-203 added (trim very long string statistics)
  9 = ORC-14 added (column encryption)
Version of the ORC C++ writer:
  6 = original
Version of the Presto writer:
  6 = original
Version of the Scritchley Go writer:
  6 = original
Version of the Trino writer:
  6 = original
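A reader can turn the Java version table into a simple "is this fix present?" check. This sketch uses hypothetical names and reflects one reading of the rules above: a missing version counts as 0, non-Java writers (which start at 6) are treated as having every fix numbered 6 or below, and beyond 6 their numbering is unrelated to Java's, so the version alone cannot answer.

```python
from typing import Optional

JAVA = 0  # writer code for ORC Java, from the writer list

# Java writer versions at which some fixes landed, from the table above.
HIVE_8732 = 1  # string statistics use UTF-8 for min/max
ORC_135 = 6    # timestamp statistics use UTC
ORC_517 = 7    # decimal64 min/max fixed


def fix_present(writer: int, writer_version: Optional[int],
                fixed_in: int) -> Optional[bool]:
    """Whether data from this writer includes the Java fix numbered `fixed_in`.

    Returns None when the version number alone cannot answer the question.
    """
    version = 0 if writer_version is None else writer_version  # missing = original
    if writer == JAVA:
        return version >= fixed_in
    # Non-Java writers start at 6, so they count as having every fix <= 6;
    # above that, their version sequences are independent of Java's.
    return True if fixed_in <= 6 else None
```

For example, a reader would trust timestamp statistics as UTC only when `fix_present(writer, version, ORC_135)` is true.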
the number of bytes in the encrypted stripe statistics
Leave this last in the record
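The PostScript's serialized length is stored in the single final byte of the file, which is why its size is capped; a reader can therefore find it by reading a small tail of the file. A sketch with hypothetical helper names: it only slices out the raw bytes, and decoding them would need classes generated from the ORC .proto file, which are not shown.

```python
import os


def read_tail(path: str, max_bytes: int = 256) -> bytes:
    """Read up to the last `max_bytes` bytes of a file.

    256 bytes always covers a PostScript under 255 bytes plus its length byte.
    """
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek returns the new absolute position
        f.seek(max(0, size - max_bytes))
        return f.read()


def postscript_bytes(tail: bytes) -> bytes:
    """Slice the serialized PostScript out of bytes that end at end-of-file."""
    if not tail:
        raise ValueError("empty input")
    ps_len = tail[-1]  # final byte of the file = PostScript length
    if ps_len + 1 > len(tail):
        raise ValueError("input too short to contain the PostScript")
    return tail[-1 - ps_len:-1]
```

A typical call would be `postscript_bytes(read_tail("example.orc"))`, where the file name is only a placeholder.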
If you add new index stream kinds, make sure to update StreamName so that they are added to the stripe in the right area.
Virtual stream kinds to allocate space for encrypted index and data.
stripe statistics streams
A virtual stream kind that is used for setting the encryption IV.
sum stores the total length of all strings in a stripe
If the minimum or maximum value was longer than 1024 bytes, store a lower or upper bound instead of the minimum or maximum values above.
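One straightforward way to build such bounds is to keep a prefix for the lower bound and bump the last character of that prefix for the upper bound. This sketch assumes lengths are measured in UTF-8 bytes and cuts only on character boundaries; it illustrates the technique and may differ in detail from what ORC writers do. Python compares strings by code point, which matches UTF-8 byte order.

```python
from typing import Optional

LIMIT = 1024  # byte limit taken from the comment above


def utf8_prefix(s: str, limit: int = LIMIT) -> str:
    """Longest prefix of s whose UTF-8 encoding fits in `limit` bytes."""
    out, used = [], 0
    for ch in s:
        size = len(ch.encode("utf-8"))
        if used + size > limit:
            break
        out.append(ch)
        used += size
    return "".join(out)


def lower_bound(s: str) -> str:
    # Every prefix sorts at or before the full string.
    return utf8_prefix(s)


def upper_bound(s: str) -> Optional[str]:
    prefix = utf8_prefix(s)
    if prefix == s:
        return s  # short enough to store exactly
    chars = list(prefix)
    while chars:
        cp = ord(chars.pop()) + 1  # bump the last kept code point past s's
        if 0xD800 <= cp <= 0xDFFF:
            cp = 0xE000  # skip surrogates, which UTF-8 cannot encode
        if cp > 0x10FFFF:
            continue  # cannot bump; carry into the previous character
        candidate = "".join(chars) + chr(cp)
        if len(candidate.encode("utf-8")) <= LIMIT:
            return candidate
    return None  # no bounded upper bound exists
```

The loop handles the case where bumping a character makes it longer in UTF-8 (for example U+007F to U+0080) by falling back to a shorter prefix.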
one for each column encryption variant
the global file offset of the start of the stripe
the number of bytes of index
the number of bytes of data
the number of bytes in the stripe footer
the number of rows in this stripe
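Assuming the usual ORC stripe layout (index streams, then data streams, then the stripe footer, stored contiguously), the section boundaries follow from the offset and lengths above by addition. The class is a hypothetical Python stand-in for the protobuf message, with snake_case field names.

```python
from dataclasses import dataclass


@dataclass
class StripeInfo:
    offset: int          # global file offset of the start of the stripe
    index_length: int    # number of bytes of index
    data_length: int     # number of bytes of data
    footer_length: int   # number of bytes in the stripe footer
    number_of_rows: int  # number of rows in this stripe

    @property
    def data_start(self) -> int:
        return self.offset + self.index_length

    @property
    def footer_start(self) -> int:
        return self.data_start + self.data_length

    @property
    def end(self) -> int:
        """Offset one past the last byte of the stripe."""
        return self.footer_start + self.footer_length
```

For example, a first stripe starting right after the 3-byte "ORC" header, `StripeInfo(3, 100, 5000, 50, 10000)`, has its footer at offset 5103.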
If this is present, the reader should use this value as the encryption stripe id when setting the encryption IV. Otherwise, the reader should use one more than the previous stripe's encryptStripeId. For unmerged ORC files, the first stripe uses 1 and the remaining stripes leave it unset. For merged files, the stripe information is copied from the original files, so the first stripe of each input file resets it to 1. Note that 1 was chosen because protobuf v3 doesn't serialize primitive fields that hold the default value (e.g. 0).
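The id rule reduces to a running counter that resets wherever a stripe carries an explicit value. A sketch with a hypothetical helper, taking the stored values in stripe order (None where unset); treating a missing id on the very first stripe as 1 is an assumption the comment does not spell out.

```python
from typing import List, Optional


def resolve_stripe_ids(stored: List[Optional[int]]) -> List[int]:
    """Resolve each stripe's encryption stripe id from the stored values."""
    ids: List[int] = []
    previous = 0  # assumption: no previous stripe counts as 0, so a missing first id is 1
    for value in stored:
        current = value if value is not None else previous + 1
        ids.append(current)
        previous = current
    return ids


# A merged file built from inputs with 3 and 2 stripes.
print(resolve_stripe_ids([1, None, None, 1, None]))  # [1, 2, 3, 1, 2]
```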
For each encryption variant, the new encrypted local key to use until we find a replacement.
StripeStatistics (one per stripe), each of which contains the ColumnStatistics for each column. This message type is only used in ORC v0 and v1.
min,max values saved as milliseconds since epoch
store the lower 6 digits of the min/max timestamps to achieve nanosecond precision
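One way to picture the split: a nanosecond timestamp is divided into whole milliseconds and a six-digit remainder, and the two recombine exactly. This sketch assumes the stored field holds that remainder directly; the reference writers may encode the field differently, so treat the helper names and the encoding as illustrative.

```python
from typing import Tuple

NANOS_PER_MILLI = 1_000_000


def split_timestamp(epoch_nanos: int) -> Tuple[int, int]:
    """Split nanoseconds since epoch into (milliseconds, lower six digits).

    divmod floors, so the remainder stays in [0, 999999] even before 1970.
    """
    millis, lower6 = divmod(epoch_nanos, NANOS_PER_MILLI)
    return millis, lower6


def join_timestamp(millis: int, lower6: int) -> int:
    """Rebuild nanoseconds since epoch from the two stored parts."""
    return millis * NANOS_PER_MILLI + lower6
```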