Configuration for the aggregation processor.
Used in:
Configuration for timers. If not specified, quantile timers with default values will be used.
Timers will be aggregated into quantiles.
Timers will be aggregated into a reservoir and emitted in bulk up to the reservoir size at each aggregation interval.
Configuration for counters.
Configuration for gauges.
Configuration for histograms.
Configuration for summaries.
Flush interval for aggregated metrics. Defaults to 60s.
By default the flush interval is rounded to the nearest second and is pegged to *wall clock time*. A setting of 60s will attempt to fire as close to the 1st second of every minute as possible. A setting of 5s will attempt to fire at the [0, 5, 10, ...] seconds of every minute. Jitter for the sending of the aggregated metrics can be added via the `post_flush_send_jitter` setting. Setting this to false will instead pin the interval to process startup. This is preferable if this is being used as a first stage aggregation before sending to a central aggregator.
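The difference between the two pinning modes can be illustrated with a small sketch. This is not Pulse's implementation; the function name and the `pin_to_wall_clock` and `start` parameters are hypothetical, and times are plain seconds since the epoch.

```python
import math

def next_flush_time(now: float, interval: float, start: float,
                    pin_to_wall_clock: bool = True) -> float:
    """Return the next flush instant (seconds since epoch) strictly after `now`.

    Hypothetical helper: `start` is the process startup time and
    `pin_to_wall_clock` stands in for the boolean setting described above.
    """
    # Wall-clock mode anchors the schedule at the epoch, so a 60s interval
    # lands on minute boundaries. Otherwise the schedule is anchored at startup.
    anchor = 0.0 if pin_to_wall_clock else start
    elapsed_intervals = math.floor((now - anchor) / interval)
    return anchor + (elapsed_intervals + 1) * interval
```

With a 60s interval and a process that started at second 7, the wall-clock schedule fires on multiples of 60, while the startup-pinned schedule fires at 67, 127, 187, and so on.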
If true, the /last_aggregation admin endpoint will be available, which will dump the metric IDs aggregated in the previous window. Enabling this has a CPU and memory cost, so it is optional.
If set, defines the maximum amount of time that the aggregated data will be jittered before being sent further in the pipeline. The actual jitter will be a random value between 0 and this value. This can be used to avoid synchronized receives by backends if all processors restart at a similar time.
If set, the aggregation processor will emit Prometheus stale markers for missing metrics during each aggregation interval. In practice, this means emitting a metric with the same name and the special stale marker NaN value. In cases where the processor is handling metric values that come and go this may be required to avoid stale metrics in queries. See https://docs.google.com/document/d/1LPhVRSFkGNSuU1fBd81ulhsCPR4hkSZyyBj1SZ8fWOM/edit#heading=h.hfqkr2527w2g for more information.
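As a rough sketch of the idea, the metric IDs seen in the previous interval can be compared with those seen in the current one, and a stale marker emitted for each ID that disappeared. The function name and the use of plain string IDs are assumptions for illustration. The bit pattern below is the one Prometheus uses for its stale NaN, to the best of my knowledge.

```python
import struct

# Prometheus encodes staleness as a NaN with a specific bit pattern
# (distinct from the "normal" NaN used for ordinary NaN samples).
STALE_NAN_BITS = 0x7FF0000000000002
STALE_NAN = struct.unpack(">d", struct.pack(">Q", STALE_NAN_BITS))[0]

def stale_markers(previous_ids: set, current_ids: set) -> dict:
    """Map every metric ID that vanished since the last interval to a stale marker.

    Hypothetical helper: real metric IDs would include name and tags.
    """
    return {metric_id: STALE_NAN for metric_id in sorted(previous_ids - current_ids)}
```

A metric that is present in both intervals gets no marker. Only series that were emitted previously and are now missing do.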
Used in:
Optional aggregated counter name prefix.
Extended output configuration for counters.
Configuration for absolute counters.
Used in:
Whether to emit counters as delta rates or not. If not specified, counters are emitted as absolute counters. If specified, counter increments are converted to delta increments and divided by the flush interval in seconds to produce a per-second value for the flush. Using this option means that when working with Prometheus, rate() is no longer needed, which has substantial benefits when paired with elision: by default rate() does not work with missing values, and the rate window has to be as long as the elision window. Using this option also makes it easier to construct working queries using `or vector(0)`.
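The arithmetic described above is small enough to show directly. This is a minimal sketch with a hypothetical function name; it assumes the increments received during one flush interval are already available as a list.

```python
def per_second_rate(increments: list, flush_interval_secs: float) -> float:
    """Sum the counter increments seen in one flush interval and convert the
    total into a per-second value, as described for delta rate output."""
    return sum(increments) / flush_interval_secs
```

For example, increments of 30, 60, and 30 within a 60s flush interval yield a value of 2.0 per second, which can be graphed directly without rate().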
Extended output configuration for counters.
Used in:
The count of samples during the interval.
The sum of samples during the interval.
The rate of samples during the interval.
The lowest sample during the interval.
The largest sample during the interval.
Used in:
Optional aggregated gauge name prefix.
Extended output configuration for gauges.
Extended output configuration for gauges.
Used in:
The sum of samples during the interval. Important note: For this to work correctly the incoming gauges must have unique downstream IDs so that they can be differentiated.
The mean of samples during the interval. Important note: For this to work correctly the incoming gauges must have unique downstream IDs so that they can be differentiated.
The minimum sample during the interval.
The maximum sample during the interval.
Used in:
Optional aggregated histograms name prefix.
Used in:
Optional aggregated timer name prefix.
Aggregated timer error epsilon. Defaults to 0.01.
Aggregated timer quantiles. Defaults to [0.5, 0.95, 0.99].
Extended output configuration for timers.
Extended output configuration for timers.
Used in:
The mean time during the interval.
The lowest time during the interval.
The highest time during the interval.
The sample count during the interval.
The sample rate during the interval.
Used in:
The size of the reservoir for each timer. If not set defaults to 100.
If set to true, timers in the reservoir will be emitted as the synthetic "bulk timer" metric type. This type is internal to Pulse, but effectively packs all of the timer samples into a single pseudo metric to avoid repeated processing of metric names, tags, etc. If sending to another Pulse layer that understands this format, this can yield substantial efficiency gains. Note that if this type makes its way to a wire outflow, it will be converted back to individual timers. If it makes its way to Prometheus remote write, the type will be emitted in bulk format, but only Pulse understands how to read this format. Do not send this to other TSDBs.
Used in:
Optional aggregated summaries name prefix.
This is a basic buffer processor that accepts incoming metrics up to a limit before dropping. A fixed number of consumers pull metric batches and move them further along the pipeline. In the future we can add more sophisticated functionality such as spill to disk, a remote queue, etc.
Used in:
The maximum number of metrics that will be buffered. Metrics beyond this amount will be dropped.
The number of consumers that will pull metric batches from the buffer.
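The drop-when-full behavior can be sketched with a bounded queue. This is an illustration only: the class and method names are hypothetical, and a real processor would run the consumers concurrently rather than call `take_batch` by hand.

```python
import queue

class DroppingBuffer:
    """Bounded buffer that drops new metrics once the limit is reached."""

    def __init__(self, max_buffered_metrics: int):
        self._queue = queue.Queue(maxsize=max_buffered_metrics)
        self.dropped = 0

    def offer(self, metric) -> bool:
        """Try to buffer a metric. Returns False (and counts a drop) when full."""
        try:
            self._queue.put_nowait(metric)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take_batch(self, max_batch: int) -> list:
        """Called by a consumer: pull up to `max_batch` buffered metrics."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Once a consumer drains a batch, space frees up and new metrics are accepted again.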
Configuration for a cardinality limiting processor. The processor uses Cuckoo filters to keep track of each metric ID without using very much space. Multiple buckets can be configured so that the limits smoothly roll across multiple time periods. When checking for the limit, only the first bucket is consulted. After this check, the metric is added to all buckets which provides the smooth rotation.
Used in:
Configuration for a single global limit that will be used by all metrics that flow through this processor.
Configuration for per-pod limits. Each dynamically discovered pod will get its own set of buckets.
How many buckets to store the metrics across.
How often to rotate the buckets. During each rotation the first bucket is dropped and a fresh bucket is added.
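The check-first-bucket, add-to-all-buckets scheme can be sketched as follows. To keep the example self-contained it swaps the Cuckoo filters for exact Python sets, so it shows only the rotation logic and not the memory savings. All names are hypothetical.

```python
from collections import deque

class RollingCardinalityLimiter:
    """Rolling-bucket limiter. Exact sets stand in for Cuckoo filters here."""

    def __init__(self, bucket_count: int, limit_per_bucket: int):
        self._limit = limit_per_bucket
        self._buckets = deque(set() for _ in range(bucket_count))

    def allow(self, metric_id: str) -> bool:
        # Only the first (oldest) bucket is consulted for the limit check.
        first = self._buckets[0]
        if metric_id not in first and len(first) >= self._limit:
            return False
        # Admitted metrics are recorded in every bucket so that newer buckets
        # already know about active metrics when they rotate to the front.
        for bucket in self._buckets:
            bucket.add(metric_id)
        return True

    def rotate(self) -> None:
        # Drop the first bucket and append a fresh one.
        self._buckets.popleft()
        self._buckets.append(set())
```

A metric that stops being reported ages out after it has rotated out of every bucket, which is what frees capacity for new metric IDs.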
Global limit configuration.
Used in:
How many metrics can be stored in each bucket.
Per-pod limit configuration.
Used in:
The default cardinality limit for each pod.
Supplies a VRL program that can compute the cardinality limit for a pod. The program will be passed standard metadata for the pod and must return an integer result. See the mutate processor for more information on VRL programs.
Track cardinality of different metric groupings and output them on a built-in /cardinality_tracker admin endpoint. In the future this information will be sent via API to the SaaS for display. We can also consider emitting metrics.
Used in:
The cardinality types to track.
How often to rotate the trackers. For count trackers, after this interval the approximate cardinality will be snapped and shown on the admin endpoint until the next rotation snap. For TopK trackers, at every rotation interval the tracker will switch between the top-k estimation phase and the cardinality counting phase. Thus, it takes 2x this interval before anything is visible on the admin endpoint. If not specified, this defaults to 5 minutes.
If true, emit internal gauge stats for the snapped cardinality of each tracker using the fixed metric name "cardinality_tracker". Every emitted series includes tracker_name and value labels so they share one metric family. For Count trackers the value label is empty. For TopK trackers the value label contains the tracked top-k item.
Approximates the cardinality of a specific metric name, matched based on regular expression. Every metric that matches will be counted. This uses a HyperLogLog algorithm to approximate the cardinality so it is lightweight from a memory perspective.
Used in:
The regular expression to match metric names.
Approximates the top K cardinality of a specific metric grouping. The way this works is as follows:
1. In the first phase of each counting interval, a streaming topk algorithm (filtered space saving) is used to track the relative frequency of each grouping, as determined by the number of increments attributed to the grouping. (The grouping will be incremented for every metric that matches the grouping, multiple times for different tag sets.)
2. Once the top K groupings are known, the second phase of counting will maintain a hash map of exact grouping to HyperLogLog sketches. This will acquire a roughly accurate total cardinality count for each top grouping.
The purpose of this is to avoid having to keep a first tier HashMap of all possible groupings which would be very memory intensive. Both phases are repeated over time to maintain a roughly accurate accounting with limited memory usage.
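The two phases can be sketched in a few lines. This simplified version uses an exact `Counter` in place of the filtered space saving algorithm and exact sets in place of HyperLogLog sketches, so it shows the control flow only and has none of the memory benefits. The function names and the `(grouping, metric_id)` input shape are assumptions.

```python
from collections import Counter, defaultdict

def top_k_groupings(samples, k):
    """Phase 1: count increments per grouping and keep the k most frequent.

    `samples` is an iterable of (grouping, metric_id) pairs (hypothetical shape).
    """
    counts = Counter(group for group, _metric_id in samples)
    return [group for group, _count in counts.most_common(k)]

def cardinality_of(samples, top_groups):
    """Phase 2: count distinct metric IDs, but only for the chosen groupings."""
    wanted = set(top_groups)
    seen = defaultdict(set)
    for group, metric_id in samples:
        if group in wanted:
            seen[group].add(metric_id)
    return {group: len(ids) for group, ids in seen.items()}
```

Because phase 2 only tracks the groupings selected in phase 1, the per-grouping state is bounded by K rather than by the total number of groupings.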
Used in:
Grouping is performed by a regular expression on the metric name. If empty, all metrics are considered.
Grouping is performed by the content of a list of tag names. The tag values are effectively concatenated together to form the grouping. If a tag value is missing, it is treated as unknown.
The number of top groupings to track.
Used in:
Used in:
The name of the tracking type. Used for display in the admin endpoint.
Used in:
The set of rules to apply to determine whether a metric should be dropped or not. If any rule matches, the metric will be dropped. Order does *not* matter as Pulse may reorder the rules for performance reasons.
Used in:
The configuration will be loaded from a local/HTTP file source and reloaded as needed. The configuration is expected to be a YAML/JSON formatted representation of a DropConfig. The configuration will be internally converted to protobuf and verified using PGV.
The configuration is provided inline.
Used in:
The name of the rule, used for logging and meta metrics.
The mode of the rule. If set to TESTING, the rule will not drop any metrics, but will emit meta metrics for metrics that would have been dropped. This is useful for testing the rule before enabling it. The default is ENABLED.
The set of conditions. If any condition matches, the metric will be dropped. This is an OR set, grouped together so that rules can be tested and monitored as a block.
If set, a warning log will be emitted at most once per this duration (in seconds) when metrics are dropped by this rule. This is useful for debugging and monitoring that the rule is actively dropping metrics. If not set or set to 0, no warnings will be logged. Example: Setting this to 300 will log a warning at most once every 5 minutes when this rule drops a metric.
Used in:
Used in:
Will match on the metric name. If possible, favor non-regex name matches as these will be loaded into an FST for faster matching.
Will match a metric tag.
Will match a metric value.
Will perform a logical AND on the set of conditions and match if all of them match.
Matches if the condition does NOT match.
Will match metrics with timestamps older than the specified age.
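How the condition types compose can be sketched with small predicate functions. This is an illustration of the OR/AND/NOT semantics described above, not Pulse's configuration format. Every name and the dictionary shape of a metric are hypothetical, and Python's `re` is used where Pulse would use its own regex engine.

```python
import re

# Hypothetical metric shape: {"name": str, "tags": dict, "value": float}

def name_equals(target):
    return lambda m: m["name"] == target

def name_regex(pattern):
    compiled = re.compile(pattern)
    return lambda m: compiled.search(m["name"]) is not None

def tag_matches(tag, value=None):
    # With no value given, the condition matches if the tag is present.
    return lambda m: tag in m["tags"] and (value is None or m["tags"][tag] == value)

def all_of(*conditions):
    # Logical AND over a set of conditions.
    return lambda m: all(cond(m) for cond in conditions)

def negate(condition):
    # Matches if the wrapped condition does NOT match.
    return lambda m: not condition(m)

def should_drop(metric, conditions):
    # A rule's conditions form an OR set: any match drops the metric.
    return any(cond(metric) for cond in conditions)
```

For example, `should_drop(metric, [all_of(name_regex(r"^http\."), tag_matches("env", "dev"))])` drops only metrics whose name starts with `http.` and that carry the tag `env=dev`.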
Used in:
Used in:
The value to match against.
The operator to use for the match. The default is EQUAL.
Used in:
The string must match exactly.
The string must match the regex.
Used in:
The tag name to match.
The tag value to match. If not set, match will occur if the tag is present.
Used in:
The maximum age in seconds for a metric timestamp. Metrics with timestamps older than this duration (relative to the current time) will be matched. For example, if set to 3600, metrics with timestamps more than 1 hour old will be matched.
Used in:
Matches against "simple" metrics that have a single value. I.e., not histograms or summaries.
Match against NaN values.
Used in:
The value must be equal to the target.
The value must be not equal to the target.
The value must be greater than the target.
The value must be less than the target.
The value must be greater than or equal to the target.
The value must be less than or equal to the target.
Configuration for the elision processor.
Used in:
Global emission settings for any metrics/samples blocked by the processor. Emission can be overridden using regex via the `regex_overrides` settings.
Whether the processor runs in analyze only mode.
Regular expression overrides for emission settings. The first override that matches wins so ordering is important if there are overlaps.
Zero elision configuration. If not specified, defaults to enabled.
Blocklist configuration. If not specified, defaults to none.
Blocklist configuration.
Used in:
A file source providing regex patterns that will be allowed. One metric pattern per line.
A file source providing an FST for the exact allow list. This provides a compact format for allowing a large number of individual metrics. The FST format is the same as implemented in https://github.com/BurntSushi/fst.
A file source providing an FST for the exact block list. This provides a compact format for blocking a large number of individual metrics. The FST format is the same as implemented in https://github.com/BurntSushi/fst.
Block all metrics by default. This can be used in a setup where the desire is to perform elision on all metrics, and then allow list specific metrics on top.
Whether blocklist metric names are provided in Prometheus naming format.
Emission configuration.
Used in:
Metrics should be emitted at least every interval time.
Metrics should be emitted in the given ratio, from (0.0-1.0].
Metrics should be emitted every N seconds, based *only* on the metric timestamp. Additionally, the emission will be jittered based on a hash of the metric name. Thus, all metrics with the same name (but different tags) will get emitted on the same second. This method has various benefits including being stateless (no internode needed), resilient to out of order metrics, and having all variants of a metric show up at the same time interval which will make charts that show sporadic data look better.
Used in:
Emit every N periods.
How long each period is. Defaults to 60s.
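One way to picture the timestamp-only emission decision is the sketch below. It is an assumption about how such a scheme could work, not the actual algorithm: the metric name is hashed (CRC32 here, chosen only because it is deterministic) to pick which period out of every N the metric is emitted in, and the metric timestamp alone determines the current period.

```python
import zlib

def should_emit(name: str, timestamp_secs: int, every_n_periods: int,
                period_secs: int = 60) -> bool:
    """Stateless emission decision based only on the name and the timestamp.

    Hypothetical helper: all tag variants of `name` share the same offset,
    so they are emitted in the same period.
    """
    offset = zlib.crc32(name.encode("utf-8")) % every_n_periods
    period_index = timestamp_secs // period_secs
    return period_index % every_n_periods == offset
```

Since no state is kept, any node that sees the same metric with the same timestamp reaches the same decision, and out-of-order samples do not change the outcome.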
Configures overrides for emission based on metric name regex.
Used in:
The overridden emission configuration.
The regex to use for evaluating the emission override.
Zero elision configuration.
Used in:
Whether zeros should be elided. Defaults to true.
Configuration for counters.
Configuration for histograms.
Configuration for counters.
Used in:
Configuration for absolute counters.
Configuration for absolute counters.
Used in:
If true, elision will take place if the absolute counter does not change (true zero elision). This can yield substantially better savings, but has impact on queries as rate() does not work correctly across missing values in standard Prometheus. Queries will typically need to be adjusted to take into account the elision window. Alternatively, consider using the aggregation processor to switch to delta counters to avoid this problem (that will also require query changes, but the changes are more intuitive).
Configuration for histograms.
Used in:
If true, elision will take place if the absolute histogram sample count does not change (true zero elision). This can yield substantially better savings, but has impact on queries as rate() does not work correctly across missing values in standard Prometheus. Queries will typically need to be adjusted to take into account the elision window. Alternatively, consider using the aggregation processor to switch to delta histograms to avoid this problem (that will also require query changes, but the changes are more intuitive).
Configuration for the internode processor. Internode performs consistent hashing of metrics and sends them to other nodes in the cluster as necessary. This ensures that processors that depend on seeing the same metric consistently will see it. Currently the internode cluster configuration is hard coded and not dynamically discovered. This can be accomplished using Kubernetes stateful sets or some other similar mechanism that provides stable DNS for cluster members.
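For readers unfamiliar with consistent hashing, a generic hash ring is sketched below. The source does not say which consistent hashing scheme Pulse uses, so this is a textbook example rather than a description of the implementation. The class name, the replica count, and the use of SHA-256 are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Textbook consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas: int = 64):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _node in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, metric_id: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, self._hash(metric_id))
        return self._ring[idx % len(self._ring)][1]
```

Every node that builds the ring from the same node list routes a given metric to the same owner, and removing a node only reassigns the metrics that node owned.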
Used in:
The address the processor listens on for internode connections.
The total nodes, including this node. This is used only for sanity checking the provided `nodes` configuration.
The name of this node.
The set of nodes the processor should internode with.
Configuration for internode requests. If not specified, default values are used.
Configuration for health checks. If not specified, health checks are disabled. Take great care in configuring health checks as settings that are too aggressive can cause flapping which is likely to be worse than having no health checks at all. During startup, nodes are assumed to be healthy until proven otherwise given the configured settings. This is to avoid heavy churn during a mass restart event.
Used in:
Interval between health checks. There is no default; this must be configured. The timeout is specified in the RequestPolicy.
Number of consecutive failures before marking a node unhealthy. There is no default; this must be configured.
Number of consecutive successes before marking a node healthy. There is no default; this must be configured.
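The threshold behavior can be sketched as a small state machine. The class and parameter names are hypothetical. The only behaviors taken from the text are that nodes start out healthy and that a state change requires the configured number of consecutive results.

```python
class NodeHealth:
    """Tracks one node's health from consecutive check results."""

    def __init__(self, unhealthy_threshold: int, healthy_threshold: int):
        self._unhealthy_threshold = unhealthy_threshold
        self._healthy_threshold = healthy_threshold
        self._failures = 0
        self._successes = 0
        # Nodes are assumed healthy at startup until proven otherwise.
        self.healthy = True

    def record(self, success: bool) -> bool:
        """Record one health check result and return the current state."""
        if success:
            self._successes += 1
            self._failures = 0  # a success breaks any failure streak
            if self._successes >= self._healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0  # a failure breaks any success streak
            if self._failures >= self._unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

Low thresholds combined with a short interval make the state flip on transient errors, which is the flapping the description warns about.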
Configuration for an internode node.
Used in:
The name of the node.
The address of the node.
Configuration for internode requests.
Used in:
Internode request timeout. Default is 100ms. This timeout is used for health checks as well.
Retry policy. If not specified default values are used.
Defines the maximum number of internode requests that can be made in parallel across all nodes. If not specified defaults to 4 times the number of nodes. This allows for some amount of progress if there is a slow/bad internode node.
Defines a circuit breaker for the maximum pending internode requests that can be waiting for a request slot (the number of request slots is defined by max_concurrent_requests). By default this is equal to 2x max_concurrent_requests.
Processor that performs basic mutation based on metadata. Mutation is performed by a VRL program, which is documented at a high level here: https://vector.dev/docs/reference/vrl/. Pulse supports a limited set of operations on metrics, and provides context metadata that can be used during transforms.

The following context is provided:
- .name is the name of the metric. Can be modified.
- .tags are the metric tags as a map. Can be modified.
- .mtype is the type of the metric. Cannot be modified.

If using Kubernetes discovery, the following metadata is provided:
- %k8s.service.name is the name of the service if service discovery is enabled and there is an associated service.
- %k8s.namespace is the namespace of the pod.
- %k8s.pod.name is the name of the pod.
- %k8s.pod.ip is the IP of the pod.
- %k8s.pod.labels are the labels of the pod.
- %k8s.pod.annotations are the annotations of the pod.
- %k8s.node.name is the name of the node.
- %k8s.node.ip is the IP of the node.
- %k8s.node.labels are the labels of the node.
- %k8s.node.annotations are the annotations of the node.
- %prom.scrape.address is <IP>:<PORT> if the target has been scraped.

Errors and abort() are handled differently. Errors will emit periodic warnings and are assumed to be real operator errors. Aborts via abort() are assumed to be intentional with the desire to drop the metric.

Early return is possible using the return statement. The expression passed to return will be ignored so it can be anything, such as "return true".

It is possible to "flatten" Prometheus histograms and summaries into multiple component metrics. For histograms this emits multiple <metric>_bucket metrics, a <metric>_count metric, and a <metric>_sum metric. For summaries this emits multiple <metric> metrics with different quantiles and a <metric>_sum and <metric>_count metric. This is done by setting the following pseudo field to true:
.flatten_prom_histogram_and_summary = true

Pulse also implements some custom functions beyond what is available from core VRL. These include:
- pulse_inc_counter(name: string, value: int, tag_names: array, tag_values: array): Increments a counter with the given name by the given value. The name must be a constant string.
  name: (required) The name of the counter to increment. Must be a constant string.
  value: (required) The value to increment the counter by. Must be an integer.
  tag_names: (optional) An array of tag names to add to the counter.
  tag_values: (optional) An array of tag values to add to the counter. Required if tag_names is provided. Must be the same length as tag_names.
- pulse_log(value: any, level: string, rate_limit_secs: int): Outputs a rate limited log.
  value: (required) The value to log, of any type.
  level: (optional) The log level to use. Must be one of "trace", "debug", "info", "warn", "error". Defaults to "info".
  rate_limit_secs: (optional) The rate limit in seconds for the message. Defaults to 1 second.
- pulse_add_to_downstream_id(suffix: bytes): Adds the given suffix to the metric's downstream ID. This can be used to create different downstream IDs based on metadata. The suffix is added after a colon (:) separator. If the downstream ID is not InflowProvided, it will be converted to InflowProvided first. This is useful for aggregating metrics based on a particular metadata value before stripping it. For example, using the pod name before stripping it will allow proper gauge/counter aggregations for different downstream ID sources while still producing a single output aggregated metric.
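To make the histogram flattening concrete, the sketch below shows the shape of the output for one histogram. It is written in Python rather than VRL and is only an illustration: the function name and tuple layout are hypothetical, and the `le` tag on bucket metrics follows the usual Prometheus convention rather than anything stated in this description.

```python
def flatten_histogram(name, buckets, sample_count, sample_sum):
    """Expand one histogram into (metric name, tags, value) component metrics.

    `buckets` is a list of (upper_bound_label, cumulative_count) pairs,
    for example [("0.1", 3), ("+Inf", 5)].
    """
    flattened = [
        (f"{name}_bucket", {"le": upper_bound}, count)
        for upper_bound, count in buckets
    ]
    flattened.append((f"{name}_count", {}, sample_count))
    flattened.append((f"{name}_sum", {}, sample_sum))
    return flattened
```

A histogram with two buckets therefore becomes four simple metrics: two `_bucket` series, one `_count`, and one `_sum`.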
Used in:
The configuration for the populate cache processor. Cache population is required before calling the aggregation, elision, or sampler processor. TODO(mattklein123): Verify the previous during pipeline setup.
Used in:
(message has no fields)
Configuration for an individual processor.
Used in:
The routes that the processor sends to.
Alternate routes that the processor can send to (for example for failover). Usage is processor specific.
Configuration for a regex allow/deny processor. Metrics can either be explicitly allowed (any that don't match will be dropped) or explicitly denied (any that do match will be dropped). Both types of rules can be supplied. While these rules can be expressed as VRL programs and aborts in the mutate filter, this processor is provided for performance reasons. The regular expression syntax is the one documented here: https://docs.rs/regex/latest/regex/#syntax
Used in:
Allow rules. All are processed in parallel.
Deny rules. All are processed in parallel.
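A minimal sketch of allow/deny filtering is shown below. The description does not spell out how the two rule types interact when both are supplied, so this example assumes a metric must match at least one allow rule (if any exist) and no deny rule. It also uses Python's `re` module, whose syntax differs in places from the Rust regex syntax linked above. The function name is hypothetical.

```python
import re

def build_filter(allow, deny):
    """Return a predicate that says whether a metric name should be kept."""
    allow_res = [re.compile(pattern) for pattern in allow]
    deny_res = [re.compile(pattern) for pattern in deny]

    def keep(name: str) -> bool:
        # Assumption: with allow rules present, anything that matches none is dropped.
        if allow_res and not any(r.search(name) for r in allow_res):
            return False
        # Anything that matches a deny rule is dropped.
        return not any(r.search(name) for r in deny_res)

    return keep
```

For example, `build_filter([r"^app\."], [r"debug"])` keeps `app.requests` but drops both `app.debug.x` and `sys.cpu`.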
The configuration for the sampler processor. This processor will cache metrics and emit them onward every configured interval of time.
Used in:
How often a metric should be emitted to further pipeline stages. Defaults to 12 hours. This value is jittered over the full duration so the metric will not be emitted evenly at this interval.
After startup, this interval will determine the initial jittered interval for sending new metrics. The purpose of this is to avoid overloading the sampler backend, but still send new metrics in a reasonable time period. This defaults to 15 minutes if not set. After the startup interval, new metrics are immediately sent.