Serialized state of an aggregator. Add additional fields here only if they make sense for all algorithms and if it does not make sense to expose them to users of the library, e.g. the encoding version.
The type of the aggregator.
Version of the encoded internal state. On a per-aggregator basis, set this field to indicate that the format of the aggregator encoding has changed such that the library has to decide how to decode. Do NOT change the default value, as this affects all aggregators.
Specifies the value type for the aggregation. If the value type is one supported by the DefaultOps<T> template, and that set of operations (or a compatible implementation) was used, then this will be a value of the DefaultOpsType.Id enum. Otherwise, this is a globally unique number corresponding to the value type and Ops implementation (e.g. the CL number in which the implementation is defined). Values for custom types should be greater than 1000. Implementors should consider registering a name for their custom type in custom-value-type.proto, to facilitate easier discovery and better error messages when conflicting types are merged.
This message contains common "public" properties of an aggregation algorithm. Add additional fields here only if they make sense for all algorithms.
Total number of values added to this aggregator.
Enumeration of all supported aggregation algorithms. Values should start from 100.
Sum all values added to the aggregator.
Computes a cardinality estimation using the HyperLogLog++ algorithm.
Additional metadata for each element in the result iterator.
(message has no fields)
Never instantiated; exists only to scope an enum and its associated options.
(message has no fields)
Each value corresponds to a C++ type T and its DefaultOps<T> instantiation. A ValueOps implementation returning something other than UNKNOWN for a given enum value is promising that the values are of the corresponding type, and that the Ops implementation performs the same operations as DefaultOps<T> does for that type.
int8, DefaultOps<int8> SerializeToString writes the single 2s-complement byte.
int16, DefaultOps<int16> SerializeToString writes the two little-endian 2s-complement bytes.
int32, DefaultOps<int32> SerializeToString uses varint encoding of the 2s-complement value in 32 bits, i.e. the result for negative integers is 5 bytes long, not 10 (see the sketch after this list).
int64, DefaultOps<int64> SerializeToString uses varint encoding of the 2s complement.
uint8, DefaultOps<uint8> SerializeToString writes the single byte.
uint16, DefaultOps<uint16> SerializeToString writes the two little-endian bytes.
uint32, DefaultOps<uint32> SerializeToString uses varint encoding.
uint64, DefaultOps<uint64> SerializeToString uses varint encoding.
float, DefaultOps<float> SerializeToString encodes the 4 little-endian IEEE 754 bytes.
double, DefaultOps<double> SerializeToString encodes the 8 little-endian IEEE 754 bytes.
string, DefaultOps<string> SerializeToString just copies the bytes.
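To make the int32/int64 entries concrete, here is a minimal standalone C++ sketch (not the library's DefaultOps code) of base-128 varint encoding applied to the 32-bit 2s-complement pattern of an int32; a negative value takes 5 bytes, versus the 10 bytes a sign-extended 64-bit varint needs:

  #include <cstdint>
  #include <cstdio>
  #include <string>

  // Base-128 varint encoding of an unsigned value, least significant group first.
  std::string VarintEncode(uint64_t v) {
    std::string out;
    do {
      uint8_t byte = v & 0x7F;
      v >>= 7;
      if (v != 0) byte |= 0x80;  // continuation bit
      out.push_back(static_cast<char>(byte));
    } while (v != 0);
    return out;
  }

  int main() {
    int32_t x = -1;
    // 32-bit 2s-complement pattern reinterpreted as unsigned: 0xFFFFFFFF.
    std::string enc32 = VarintEncode(static_cast<uint32_t>(x));           // 5 bytes
    std::string enc64 = VarintEncode(static_cast<uint64_t>(int64_t{x}));  // 10 bytes
    std::printf("32-bit: %zu bytes, 64-bit: %zu bytes\n", enc32.size(), enc64.size());
  }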
Represents an HLL++ aggregator in either sparse or normal representation. For more details on the algorithm, the representations, and the underlying concepts, see the HLL++ paper (https://goo.gl/pc916Z).
Size of the sparse list, i.e., how many distinct indexes are present in "sparse_data".
Precision / number of buckets for the normal representation. This field is used slightly differently across the v1 and v2 versions of the algorithm (see the encoding_version field in the AggregatorStateProto):
* In v1 this field is the total number of buckets 2^p, where "p" is the requested precision. Accepted values are powers of two in the [2^10, 2^24] interval.
* In v2 this field is the precision "p" directly. Accepted values are in the range [10, 24].
Encoding the precision rather than the number of buckets saves 1-2 bytes, which makes a fair difference when storing many small cardinalities. Note that different implementations might choose not to support the whole [10, 24] range of precisions.
Precision / number of buckets for the sparse representation. This field is used slightly differently across the v1 and v2 versions of the algorithm (see the encoding_version field in the AggregatorStateProto):
* In v1 this field is 2^sp, where "sp" is the sparse precision. Accepted values are powers of two in the [2^p, 2^25] interval.
* In v2 this field is the precision "sp" directly. Accepted values are in the range [p, 25].
Encoding the precision rather than the number of buckets saves 2-3 bytes, which makes a fair difference when storing many small cardinalities.
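Both precision fields above follow the same v1/v2 convention, so a decoder can normalize them with a small helper; this is a hedged sketch (the helper name and error handling are hypothetical, not part of the library):

  #include <cstdint>
  #include <stdexcept>

  // Hypothetical helper: in v1 the field stores the bucket count 2^precision,
  // in v2 it stores the precision itself; return the precision in both cases.
  int NormalizedPrecision(int encoding_version, uint32_t field_value) {
    if (encoding_version == 1) {
      // v1: expect a power of two; recover the exponent.
      if (field_value == 0 || (field_value & (field_value - 1)) != 0) {
        throw std::invalid_argument("v1 precision field must be a power of two");
      }
      int p = 0;
      while ((uint32_t{1} << p) != field_value) ++p;
      return p;  // callers should still range-check p against [10, 24] resp. [p, 25]
    }
    return static_cast<int>(field_value);  // v2: the field is the precision
  }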
Normal data representation. If this field is populated, there are exactly 2^p bytes in it. data[idx] represents rhoW for the substream with the given "idx". See the HLL++ paper (https://goo.gl/pc916Z) for a description of how "rhoW" and "idx" are computed.
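For illustration, the classic raw HyperLogLog estimate over such a 2^p-byte register array is sketched below (standard formula from the HLL literature; the actual HLL++ estimator additionally applies bias correction and a linear-counting path for small cardinalities, which are not shown):

  #include <cmath>
  #include <cstdint>
  #include <string>

  // Raw HLL estimate over a dense register array where data[idx] == rhoW(idx).
  double RawEstimate(const std::string& data) {
    const double m = static_cast<double>(data.size());  // m == 2^p registers
    double sum = 0.0;
    for (unsigned char rho : data) {
      sum += std::ldexp(1.0, -static_cast<int>(rho));  // 2^-rhoW
    }
    const double alpha = 0.7213 / (1.0 + 1.079 / m);  // bias constant for m >= 128
    return alpha * m * m / sum;
  }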
Sparse data representation.
IMPORTANT: It is considered an error if the size of this field is bigger than precision_or_num_buckets (v1), resp. 2^precision_or_num_buckets (v2). The normal encoding should be used in this case, since the memory usage would be smaller.
For a sorted list of unsigned integers representing sparse data encodings, this field contains the varint encoding of the differences between consecutive values in the list: list[0], list[1] - list[0], ... , list[n] - list[n - 1].
Note: if "encoding_version" of the enclosing AggregatorStateProto is 1, the diffs are encoded as signed varints using ZigZag encoding and the sparse encodings are the ones defined in the HLL++ paper (https://goo.gl/pc916Z), i.e., different from the ones below.
In v2, there are two encodings possible for a value in sparse data format:
  enc(idx, rhoW) =
    1) 1 << (max(sp, p+6)) | (idx >> (sp - p)) | rhoW, if the last sp - p bits of idx are all 0;
    2) idx, otherwise.
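A hedged sketch (not the library's implementation) of producing the v2-style unsigned-varint diff stream from an already sorted list of sparse encodings; v1 would instead ZigZag-encode signed diffs as noted above:

  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Appends the base-128 varint encoding of v to out.
  void AppendVarint(uint64_t v, std::string* out) {
    do {
      uint8_t byte = v & 0x7F;
      v >>= 7;
      if (v != 0) byte |= 0x80;  // continuation bit
      out->push_back(static_cast<char>(byte));
    } while (v != 0);
  }

  // Encodes a sorted list of sparse encodings as varint diffs:
  // list[0], list[1] - list[0], ..., list[n] - list[n - 1].
  std::string EncodeSparseDiffs(const std::vector<uint32_t>& sorted) {
    std::string out;
    uint32_t prev = 0;
    for (std::size_t i = 0; i < sorted.size(); ++i) {
      AppendVarint(i == 0 ? sorted[0] : sorted[i] - prev, &out);
      prev = sorted[i];
    }
    return out;
  }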
The estimated number of unique elements in the input set.
The expected error of the estimation algorithm.
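For orientation, the normal HLL representation has a relative standard error of roughly 1.04 / sqrt(2^p); the sketch below computes that approximation (it is not the library's exact error computation):

  #include <cmath>
  #include <cstdio>
  #include <initializer_list>

  // Approximate relative standard error of HLL with 2^p registers: 1.04 / sqrt(2^p).
  double ApproxRelativeError(int precision) {
    return 1.04 / std::sqrt(std::ldexp(1.0, precision));
  }

  int main() {
    for (int p : {10, 14, 24}) {
      std::printf("p=%d: ~%.4f%%\n", p, 100.0 * ApproxRelativeError(p));
    }
  }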