Serialized state of an aggregator. Add additional fields here only if they make sense for all algorithms and if it does not make sense to expose them to users of the library, e.g. the encoding version.
The type of the aggregator.
Version of the encoded internal state. On a per-aggregator basis, set this field to indicate that the format of the aggregator encoding has changed such that the library has to decide how to decode. Do NOT change the default value, as this affects all aggregators.
Specifies the value type for the aggregation. If the value type is one supported by the DefaultOps<T> template, and that set of operations (or a compatible implementation) was used, then this will be a value of the DefaultOpsType.Id enum. Otherwise, this is a globally unique number corresponding to the value type and Ops implementation (e.g. the CL number in which the implementation is defined). Values for custom types should be greater than 1000. Implementors should consider registering a name for their custom type in custom-value-type.proto, to facilitate easier discovery and better error messages when conflicting types are merged.
This message contains common "public" properties of an aggregation algorithm. Add additional fields here only if they make sense for all algorithms.
Total number of values added to this aggregator.
Enumeration of all supported aggregation algorithms. Values should start from 100.
Sum all values added to the aggregator.
Computes a cardinality estimation using the HyperLogLog++ algorithm.
Additional metadata for each element in the result iterator.
(message has no fields)
Never instantiated; exists only to scope an enum and its associated options.
(message has no fields)
Each value corresponds to a C++ type T and its DefaultOps<T> instantiation. A ValueOps implementation returning something other than UNKNOWN for a given enum value is promising that the values are of the corresponding type, and that the Ops implementation performs the same operations as DefaultOps<T> does for that type.
int8, DefaultOps<int8> SerializeToString writes the single 2s-complement byte.
int16, DefaultOps<int16> SerializeToString writes the two little-endian 2s-complement bytes.
int32, DefaultOps<int32> SerializeToString uses varint encoding of the 2s-complement value in 32 bits, i.e. the result for negative integers is 5 bytes long, not 10 (see the sketch after this list).
int64, DefaultOps<int64> SerializeToString uses varint encoding of the 2s complement.
uint8, DefaultOps<uint8> SerializeToString writes the single byte.
uint16, DefaultOps<uint16> SerializeToString writes the two little-endian bytes.
uint32, DefaultOps<uint32> SerializeToString uses varint encoding.
uint64, DefaultOps<uint64> SerializeToString uses varint encoding.
float, DefaultOps<float> SerializeToString encodes the 4 little-endian IEEE 754 bytes.
double, DefaultOps<double> SerializeToString encodes the 8 little-endian IEEE 754 bytes.
string, DefaultOps<string> SerializeToString just copies the bytes.
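To make the int32/int64 entries concrete, here is a minimal standalone C++ sketch (not the library's DefaultOps code) of base-128 varint encoding applied to the 32-bit 2s-complement pattern of an int32; a negative value takes 5 bytes, versus the 10 bytes a sign-extended 64-bit varint needs:

  #include <cstdint>
  #include <cstdio>
  #include <string>

  // Base-128 varint encoding of an unsigned value, least significant group first.
  std::string VarintEncode(uint64_t v) {
    std::string out;
    do {
      uint8_t byte = v & 0x7F;
      v >>= 7;
      if (v != 0) byte |= 0x80;  // continuation bit
      out.push_back(static_cast<char>(byte));
    } while (v != 0);
    return out;
  }

  int main() {
    int32_t x = -1;
    // 32-bit 2s-complement pattern reinterpreted as unsigned: 0xFFFFFFFF.
    std::string enc32 = VarintEncode(static_cast<uint32_t>(x));           // 5 bytes
    std::string enc64 = VarintEncode(static_cast<uint64_t>(int64_t{x}));  // 10 bytes
    std::printf("32-bit: %zu bytes, 64-bit: %zu bytes\n", enc32.size(), enc64.size());
  }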
Represents an HLL++ aggregator in either sparse or normal representation. For more details on the algorithm, the representations, and the underlying concepts, see the HLL++ paper (https://goo.gl/pc916Z).
Size of the sparse list, i.e., how many distinct indexes are present in "sparse_data".
Precision / number of buckets for the normal representation. This field is used slightly differently across the v1 and v2 versions of the algorithm (see the encoding_version field in the AggregatorStateProto):
* In v1 this field is the total number of buckets 2^p, where "p" is the requested precision. Accepted values are powers of two in the [2^10, 2^24] interval.
* In v2 this field is the precision "p" directly. Accepted values are in the range [10, 24].
Encoding the precision rather than the number of buckets saves 1-2 bytes, which makes a fair difference when storing many small cardinalities. Note that different implementations might choose not to support the whole [10, 24] range of precisions.
Precision / number of buckets for the sparse representation. This field is used slightly differently across the v1 and v2 versions of the algorithm (see the encoding_version field in the AggregatorStateProto):
* In v1 this field is 2^sp, where "sp" is the sparse precision. Accepted values are powers of two in the [2^p, 2^25] interval.
* In v2 this field is the precision "sp" directly. Accepted values are in the range [p, 25].
Encoding the precision rather than the number of buckets saves 2-3 bytes, which makes a fair difference when storing many small cardinalities.
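Both precision fields above follow the same v1/v2 convention, so a decoder can normalize them with a small helper; this is a hedged sketch (the helper name and error handling are hypothetical, not part of the library):

  #include <cstdint>
  #include <stdexcept>

  // Hypothetical helper: in v1 the field stores the bucket count 2^precision,
  // in v2 it stores the precision itself; return the precision in both cases.
  int NormalizedPrecision(int encoding_version, uint32_t field_value) {
    if (encoding_version == 1) {
      // v1: expect a power of two; recover the exponent.
      if (field_value == 0 || (field_value & (field_value - 1)) != 0) {
        throw std::invalid_argument("v1 precision field must be a power of two");
      }
      int p = 0;
      while ((uint32_t{1} << p) != field_value) ++p;
      return p;  // callers should still range-check p against [10, 24] resp. [p, 25]
    }
    return static_cast<int>(field_value);  // v2: the field is the precision
  }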
Normal data representation. If this field is populated, there are exactly 2^p bytes in it. data[idx] represents rhoW for the substream with the given "idx". See the HLL++ paper (https://goo.gl/pc916Z) for a description of how "rhoW" and "idx" are computed.
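For illustration, the classic raw HyperLogLog estimate over such a 2^p-byte register array is sketched below (standard formula from the HLL literature; the actual HLL++ estimator additionally applies bias correction and a linear-counting path for small cardinalities, which are not shown):

  #include <cmath>
  #include <cstdint>
  #include <string>

  // Raw HLL estimate over a dense register array where data[idx] == rhoW(idx).
  double RawEstimate(const std::string& data) {
    const double m = static_cast<double>(data.size());  // m == 2^p registers
    double sum = 0.0;
    for (unsigned char rho : data) {
      sum += std::ldexp(1.0, -static_cast<int>(rho));  // 2^-rhoW
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);  // bias constant for m >= 128
    return alpha * m * m / sum;
  }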
Sparse data representation.
IMPORTANT: It is considered an error if the size of this field is bigger than precision_or_num_buckets (v1), resp. 2^precision_or_num_buckets (v2). The normal encoding should be used in this case, since the memory usage would be smaller.
For a sorted list of unsigned integers representing sparse data encodings, this field contains the varint encoding of the differences between consecutive values in the list: list[0], list[1] - list[0], ... , list[n] - list[n - 1].
Note: if "encoding_version" of the enclosing AggregatorStateProto is 1, the diffs are encoded as signed varints using ZigZag encoding and the sparse encodings are the ones defined in the HLL++ paper (https://goo.gl/pc916Z), i.e., different from the ones below.
In v2, there are two encodings possible for a value in sparse data format:
  enc(idx, rhoW) =
    1) 1 << (max(sp, p+6)) | (idx >> (sp - p)) | rhoW, if the last sp - p bits of idx are all 0;
    2) idx, otherwise.
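A hedged sketch (not the library's implementation) of producing the v2-style unsigned-varint diff stream from an already sorted list of sparse encodings; v1 would instead ZigZag-encode signed diffs as noted above:

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Appends the base-128 varint encoding of v to out.
  void AppendVarint(uint64_t v, std::string* out) {
    do {
      uint8_t byte = v & 0x7F;
      v >>= 7;
      if (v != 0) byte |= 0x80;  // continuation bit
      out->push_back(static_cast<char>(byte));
    } while (v != 0);
  }

  // Encodes a sorted list of sparse encodings as varint diffs:
  // list[0], list[1] - list[0], ..., list[n] - list[n - 1].
  std::string EncodeSparseDiffs(const std::vector<uint32_t>& sorted) {
    std::string out;
    uint32_t prev = 0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
      AppendVarint(i == 0 ? sorted[0] : sorted[i] - prev, &out);
      prev = sorted[i];
    }
    return out;
  }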
The estimated number of unique elements in the input set.
The expected error of the estimation algorithm.
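For orientation, the normal HLL representation has a relative standard error of roughly 1.04 / sqrt(2^p); the sketch below computes that approximation (it is not the library's exact error computation):

  #include <cmath>
  #include <cstdio>
  #include <initializer_list>

  // Approximate relative standard error of HLL with 2^p registers: 1.04 / sqrt(2^p).
  double ApproxRelativeError(int precision) {
    return 1.04 / std::sqrt(std::ldexp(1.0, precision));
  }

  int main() {
    for (int p : {10, 14, 24}) {
      std::printf("p=%d: ~%.4f%%\n", p, 100.0 * ApproxRelativeError(p));
    }
  }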