The Rerun Cloud public API.

## Headers

Most endpoints in the Rerun Cloud service require specific gRPC headers to be set.

The so-called "standard dataset headers" correspond to at least one of the following headers:
* `x-rerun-entry-id`: ID of the entry of interest, e.g. `1860390B087BC65F602d68eb646c385c`.
* `x-rerun-entry-name-bin`: Name of the entry of interest, e.g. `droid:sample2k`.

Headers with a `-bin` suffix must be base64-encoded (HTTP only supports ASCII values, so UTF-8 strings must be binary encoded).
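As a rough illustration of the header rules above, the following sketch builds the two standard dataset headers as key/value pairs. The helper name `standard_dataset_headers` is hypothetical and not part of any Rerun client. Note also that some gRPC client libraries base64-encode `-bin` values themselves when given raw bytes, so check your client before encoding manually.

```python
import base64


def standard_dataset_headers(entry_id=None, entry_name=None):
    """Build (key, value) header pairs; hypothetical helper for illustration only."""
    if entry_id is None and entry_name is None:
        raise ValueError("at least one of entry_id or entry_name is required")

    headers = []
    if entry_id is not None:
        # Plain ASCII header: sent as-is.
        headers.append(("x-rerun-entry-id", entry_id))
    if entry_name is not None:
        # `-bin` header: UTF-8 bytes, base64-encoded into an ASCII-safe value.
        encoded = base64.b64encode(entry_name.encode("utf-8")).decode("ascii")
        headers.append(("x-rerun-entry-name-bin", encoded))
    return headers


headers = standard_dataset_headers(entry_name="droid:sample2k")
```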
Name of the dataset entry to create. The name should be a short human-readable string. It must be unique within all entries in the catalog. If an entry with the same name already exists, the request will fail. Entry names ending with `__manifest` are reserved.
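The naming rules above are enforced by the server, but a client may want to catch obvious problems before sending a request. A minimal sketch, assuming the client already holds a set of existing entry names (the function `entry_name_problems` is hypothetical):

```python
RESERVED_SUFFIX = "__manifest"


def entry_name_problems(name, existing_names):
    """Return a list of reasons why `name` would be rejected (empty if none found)."""
    problems = []
    if name.endswith(RESERVED_SUFFIX):
        problems.append(f"names ending with {RESERVED_SUFFIX!r} are reserved")
    if name in existing_names:
        problems.append("an entry with this name already exists")
    return problems


problems = entry_name_problems("droid__manifest", {"droid:sample2k"})
```

A local check like this cannot replace the server-side uniqueness check, since another client may create the same name concurrently.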
If specified, create the entry using this specific ID. Use at your own risk.
Creates a custom index for a specific column (vector search, full-text search, etc). This endpoint requires the standard dataset headers.
The properties/configuration of the newly created index.
Backend-specific statistics about the index. This is guaranteed to be valid JSON.
Optional debug information about the index-creation task
Name of the table entry to create. The name should be a short human-readable string. It must be unique within all entries in the catalog. If an entry with the same name already exists, the request will fail. Entry names ending with `__manifest` are reserved.
Information about the table to register. This must be an encoded message of one of the following supported types:
- LanceTable
Schema of the table to create
(message has no fields)
Delete a custom index for a specific column. This endpoint requires the standard dataset headers.
Which column to delete the indexes for.
Which indexes were actually deleted. Can be empty if no matching indexes were found.
Run global maintenance operations on the platform: this includes optimization of all datasets, garbage collection of unused data, and can include more in the future.
Request all maintenance operations to run on all datasets
(message has no fields)
(message has no fields)
Rerun Manifests maintenance operations: scalar index creation, compaction, etc. This endpoint requires the standard dataset headers.
Optimize all builtin and user-defined indexes on this dataset. This merges all individual index deltas back in the main index, improving runtime performance of all indexes.
Retrain all user-defined indexes on this dataset from scratch for optimal runtime performance. This is faster than re-creating the indexes, and automatically keeps track of their configurations. This implies `optimize_indexes`.
Compact the underlying Lance fragments, for all Rerun Manifests. Hardcoded to the default (optimal) settings.
If set, all Lance fragments older than this date will be removed, for all Rerun Manifests. If the requested date is less than 1 hour old, it is ignored and a timestamp of 1 hour ago is used instead. This prevents files that are still in use (such as recent transaction files) from being removed, which would cause Lance dataset update issues. See https://docs.rs/lance/latest/lance/dataset/cleanup/index.html and https://docs.rs/lance/latest/lance/dataset/cleanup/fn.cleanup_old_versions.html
Override default platform behavior and allow cleanup of recent files. This will respect the value of `cleanup_before` timestamp even if it's more recent than 1 hour. ⚠️ Do not ever use this unless you know exactly what you're doing. Improper use will lead to data loss.
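The interaction between `cleanup_before` and the override flag can be sketched as follows. This is an illustration of the documented behavior, not the server implementation; the parameter name `unsafe_allow_recent` is a hypothetical stand-in for the override field.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(hours=1)


def effective_cleanup_before(requested, now, unsafe_allow_recent=False):
    """Return the timestamp that would actually be used for cleanup."""
    cutoff = now - MIN_AGE
    if unsafe_allow_recent or requested <= cutoff:
        # Either the override is set, or the request is already old enough.
        return requested
    # Requested date is more recent than 1 hour: fall back to "1 hour ago".
    return cutoff


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
used = effective_cleanup_before(now - timedelta(minutes=30), now)
```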
Fetch specific chunks from Rerun Cloud. This is the 2nd step of the 2-step query process: the result of the 1st step (`QueryDataset`) should include all the information needed to send the actual chunk requests. See `FetchChunksRequest` for details on the fields that describe each individual chunk.
Information about the chunks to fetch. These dataframes have to include the following columns:
* `chunk_id` - Chunk unique identifier
* `partition_id` - Partition this chunk belongs to. Currently needed as we pass this metadata back and forth
* `partition_layer` - Specific partition layer. Currently needed as we pass this metadata back and forth
* `chunk_key` - Chunk location details
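Before sending a fetch request, a client could verify that its dataframe carries every required column. A minimal sketch, assuming the column names are available as a plain list of strings (the helper `missing_chunk_columns` is hypothetical):

```python
REQUIRED_CHUNK_COLUMNS = ("chunk_id", "partition_id", "partition_layer", "chunk_key")


def missing_chunk_columns(column_names):
    """Return the required columns that are absent from `column_names`."""
    present = set(column_names)
    return [name for name in REQUIRED_CHUNK_COLUMNS if name not in present]


missing = missing_chunk_columns(["chunk_id", "chunk_key"])
```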
Every gRPC response, even within the confines of a stream, involves HTTP/2 overhead, which is not cheap. This is why we return a batch of `ArrowMsg` rather than a single one.
Returns the schema of the dataset manifest. To inspect the data of the dataset manifest, which is guaranteed to match the schema returned by this endpoint, check out `ScanDatasetManifest`. This endpoint requires the standard dataset headers.
(message has no fields)
Returns the schema of the dataset. This is the union of all the schemas from all the underlying partitions. It will contain all the indexes, entities and components present in the dataset. This endpoint requires the standard dataset headers.
(message has no fields)
Returns the schema of the partition table. This is not to be confused with the schema of the dataset itself. For that, refer to `GetDatasetSchema`. To inspect the data of the partition table, which is guaranteed to match the schema returned by this endpoint, check out `ScanPartitionTable`. This endpoint requires the standard dataset headers.
(message has no fields)
List all user-defined indexes in this dataset. This endpoint requires the standard dataset headers.
(message has no fields)
The respective properties/configuration of the indexes.
Backend-specific statistics about the indexes. This is guaranteed to be valid JSON. If non-empty, this is the same length as `indexes`, and in the same order.
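Because the statistics are either empty or aligned with `indexes`, a client can pair them positionally. A minimal sketch, assuming the statistics arrive as a list of JSON strings (the argument name `statistics_json` is hypothetical):

```python
import json


def pair_indexes_with_statistics(indexes, statistics_json):
    """Pair each index with its parsed statistics, or with None if none were returned."""
    if not statistics_json:
        return [(index, None) for index in indexes]
    if len(statistics_json) != len(indexes):
        raise ValueError("statistics must be empty or match `indexes` in length")
    # Same length, same order: zip pairs them positionally.
    return [(index, json.loads(stats)) for index, stats in zip(indexes, statistics_json)]


pairs = pair_indexes_with_statistics(["idx_a", "idx_b"], ['{"rows": 1}', '{"rows": 2}'])
```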
Perform Rerun-native queries on a dataset, returning the matching chunk IDs, as well as information that can be sent back to Rerun Cloud to fetch the actual chunks as part of a `FetchChunks` request. In this 2-step query process, the 1st step is getting information from the server about the chunks that contain relevant information. The 2nd step is fetching those chunks (the actual data).

These Rerun-native queries include:
* Filtering by specific partition and chunk IDs.
* Latest-at, range and dataframe queries.
* Arbitrary Lance filters.

To fetch the actual chunks themselves, see `FetchChunks`. Passing chunk IDs to this method effectively acts as an IF_EXIST filter. This endpoint requires the standard dataset headers.
Client can specify what partitions are queried. If left unspecified (empty list), all partitions will be queried.
Client can specify specific chunk ids to include. If left unspecified (empty list), all chunks that match other query parameters will be included.
Which entity paths are we interested in? Leave empty, and set `select_all_entity_paths`, in order to query all of them.
If set, the query will cover all existing entity paths. `entity_paths` must be empty, otherwise an error will be raised.

Truth table:
```text
select_all_entity_paths | entity_paths   | result
------------------------+----------------+--------
false                   | []             | valid query, empty results (no entity paths selected)
false                   | ['foo', 'bar'] | valid query, 'foo' & 'bar' selected
true                    | []             | valid query, all entity paths selected
true                    | ['foo', 'bar'] | invalid query, error
```
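The truth table can be expressed as a small function. This is a sketch of the documented semantics only; `resolve_entity_paths` and `all_known_paths` are hypothetical names, and the real server may report the error differently.

```python
def resolve_entity_paths(select_all_entity_paths, entity_paths, all_known_paths):
    """Return the entity paths a query would cover, following the truth table."""
    if select_all_entity_paths:
        if entity_paths:
            # true + non-empty list: invalid query.
            raise ValueError("`entity_paths` must be empty when `select_all_entity_paths` is set")
        # true + empty list: everything is selected.
        return list(all_known_paths)
    # false: exactly the listed paths (possibly none, giving empty results).
    return list(entity_paths)


selected = resolve_entity_paths(False, ["foo", "bar"], ["foo", "bar", "baz"])
```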
Which components are we interested in? If left unspecified, all existing components are considered of interest. This will perform a basic fuzzy match on the available columns' descriptors. The fuzzy logic is a simple case-sensitive `contains()` query. For example, given a `log_tick__SeriesLines:width` index, all of the following would match: `SeriesLines:width`, `width`, `SeriesLines`, etc.
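A case-sensitive `contains()` match of this kind can be sketched with plain substring checks. The function `matching_columns` is hypothetical and only mirrors the description above; it is not the server's matching code.

```python
def matching_columns(component_filters, column_descriptors):
    """Return the descriptors that contain at least one filter as a substring."""
    if not component_filters:
        # Unspecified: all existing components are of interest.
        return list(column_descriptors)
    return [
        descriptor
        for descriptor in column_descriptors
        if any(needle in descriptor for needle in component_filters)
    ]


columns = ["log_tick__SeriesLines:width", "log_tick__Points3D:radii"]
matched = matching_columns(["SeriesLines"], columns)
```

Python's `in` operator on strings is case-sensitive, which matches the described behavior.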
If set, static data will be excluded from the results.
If set, temporal data will be excluded from the results.
Generic parameters that will influence the behavior of the Lance scanner.
Query the status of submitted tasks
`QueryTasksRequest` is the request message for querying tasks status
If empty, all tasks are queried, provided the server allows it.
`QueryTasksResponse` is the response message for querying tasks status encoded as a record batch
Query the status of submitted tasks as soon as they are no longer pending
`QueryTasksOnCompletionRequest` is the request message for querying tasks status. This is close-to-a-copy of `QueryTasksRequest`, with the addition of a timeout.
If empty, all tasks are queried, provided the server allows it.
Time limit for the server to wait for task completion. The actual maximum time may be arbitrarily capped by the server.
`QueryTaskOnCompletionResponse` is the response message for querying tasks status encoded as a record batch. This is a copy of `QueryTasksResponse`.
Fetch metadata about a specific dataset. This endpoint requires the standard dataset headers.
(message has no fields)
Register a foreign table as a new table entry in the catalog.
Name of the table entry to create. The name should be a short human-readable string. It must be unique within all entries in the catalog. If an entry with the same name already exists, the request will fail. Entry names ending with `__manifest` are reserved.
Information about the table to register. This must be an encoded message of one of the following supported types:
- LanceTable
Details about the table that was created and registered.
Register new partitions with the Dataset. This endpoint requires the standard dataset headers.
Inspect the contents of the dataset manifest. The data will follow the schema returned by `GetDatasetManifestSchema`. This endpoint requires the standard dataset headers.
A list of column names to be projected server-side. If empty, all columns are returned. If not empty, the returned `RecordBatch`es are guaranteed to only have the requested columns, in the order they were requested. If a projected column does not exist, or is projected more than once, the `ScanDatasetManifest` call will fail with an `InvalidArgument` error.
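The projection rules can be sketched as a validation step. This is an illustration of the documented behavior; the `InvalidArgument` class below is a local stand-in for the gRPC status of the same name, and the column names are made up.

```python
class InvalidArgument(ValueError):
    """Local stand-in for the gRPC `InvalidArgument` status."""


def project_columns(requested, schema_columns):
    """Return the output column order for a projection, or raise InvalidArgument."""
    if not requested:
        # Empty projection: all columns are returned.
        return list(schema_columns)
    seen = set()
    for name in requested:
        if name not in schema_columns:
            raise InvalidArgument(f"column does not exist: {name}")
        if name in seen:
            raise InvalidArgument(f"column projected more than once: {name}")
        seen.add(name)
    # Columns come back in the order they were requested.
    return list(requested)


schema = ["col_a", "col_b", "col_c"]  # hypothetical column names
projected = project_columns(["col_c", "col_a"], schema)
```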
The contents of the dataset manifest (i.e. information about layers) as Arrow RecordBatch.
Inspect the contents of the partition table. The data will follow the schema returned by `GetPartitionTableSchema`. This endpoint requires the standard dataset headers.
A list of column names to be projected server-side. If empty, all columns are returned. If not empty, the returned `RecordBatch`es are guaranteed to only have the requested columns, in the order they were requested. If a projected column does not exist, or is projected more than once, the `ScanPartitionTable` call will fail with an `InvalidArgument` error.
Partitions metadata as Arrow RecordBatch.
TODO(jleibs): support ScanParameters iff we can plumb them into Datafusion TableProvider. Otherwise, just wait for Arrow Flight. `rerun.common.v1alpha1.ScanParameters scan_parameters = 2;`
Search a previously created index. This endpoint requires the standard dataset headers.
Index column that is queried
Query data. The type of data is index specific, and the caller must provide the right type: for vector search this should be a vector of appropriate size; for an inverted index this should be a string. Query data is represented as a unit (single row) RecordBatch with 1 column.
Index type specific properties
Scan parameters
Chunks as arrow RecordBatch
The dataset to modify.
The new values.
The updated dataset entry
The entry to modify.
The new values for updatable fields.
The updated entry details
(message has no fields)
Write chunks to one or more partitions. The partition ID for each individual chunk is extracted from their metadata (`rerun:partition_id`). This endpoint requires the standard dataset headers.
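Routing chunks by their `rerun:partition_id` metadata can be sketched as a grouping step. Here each chunk is modeled as a plain dictionary with a `metadata` mapping, which is an assumption for illustration; real chunks are Arrow data, and how a missing partition ID is handled is also an assumption.

```python
from collections import defaultdict

PARTITION_ID_KEY = "rerun:partition_id"


def group_chunks_by_partition(chunks):
    """Group chunks by the partition ID found in their metadata."""
    grouped = defaultdict(list)
    for chunk in chunks:
        partition_id = chunk["metadata"].get(PARTITION_ID_KEY)
        if partition_id is None:
            # Assumption: a chunk without a partition ID cannot be routed.
            raise ValueError(f"chunk is missing {PARTITION_ID_KEY!r} metadata")
        grouped[partition_id].append(chunk)
    return dict(grouped)


chunks = [
    {"id": 1, "metadata": {PARTITION_ID_KEY: "p0"}},
    {"id": 2, "metadata": {PARTITION_ID_KEY: "p1"}},
    {"id": 3, "metadata": {PARTITION_ID_KEY: "p0"}},
]
by_partition = group_chunks_by_partition(chunks)
```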
(message has no fields)
Write record batches to a table. This endpoint requires the standard dataset headers. TODO(#11645): endpoints with streaming input are not supported by `grpc-web`. A non-streaming shim will need to be added if/when the viewer uses this endpoint.
(message has no fields)
TODO(zehiko) add properties as needed
(message has no fields)
TODO(zehiko) add properties as needed
(message has no fields)
Where is the data for this data source stored (e.g. s3://bucket/file or file:///path/to/file)?
Which Partition Layer should this data source be registered to? Defaults to `base` if unspecified.
Is this a prefix URL (a directory)? If true, all files of `typ` under this prefix will be considered part of this data source.
What kind of data is it (e.g. rrd, mcap, Lance, etc)?
The blueprint dataset associated with this dataset (if any).
The partition of the blueprint dataset corresponding to the default blueprint (if any).
Dataset-specific information, may be updated with `UpdateDatasetEntry`
Read-only
Optional debug info
The amount of memory used by the task or service call in bytes
Minimal info about an Entry for high-level catalog summary
The EntryId is immutable
The name of this entry.
The kind of entry
Updatable fields of an Entry
The name of this entry.
What type of entry. This has strong implication on which APIs are available for this entry.
Always reserve unspecified as default value
Order as TYPE, TYPE_VIEW so things stay consistent as we introduce new types.
Application level error - used as `details` in the `google.rpc.Status` message
error code
unique identifier associated with the request (e.g. recording id, recording storage url)
human readable details about the error
Error codes for application level errors
unused
object store access error
metadata database access error
Encoding / decoding error
used to define which column we want to index
The path of the entity.
Component details
what kind of index do we want to create and what are its index specific properties.
Component / column we want to index.
What is the filter index i.e. timeline for which we will query the timepoints. TODO(zehiko) this might go away and we might just index across all the timelines
specific index query properties based on the index type
TODO(zehiko) add other properties as needed
TODO(zehiko) add properties as needed
(message has no fields)
A foreign table stored as a Lance table.
The URL of the Lance table.
If specified, will perform a latest-at query with the given parameters. You can combine this with a `QueryRange` in order to gather all the relevant chunks for a full-fledged dataframe query (i.e. they get OR'd together).
If specified, will perform a range query with the given parameters. You can combine this with a `QueryLatestAt` in order to gather all the relevant chunks for a full-fledged dataframe query (i.e. they get OR'd together).
If true, `columns` will contain the entire schema.
If true, `columns` always includes `chunk_id`.
If true, `columns` always includes `byte_offset` and `byte_size`.
If true, `columns` always includes `entity_path`.
If true, `columns` always includes all static component-level indexes.
If true, `columns` always includes all temporal chunk-level indexes.
If true, `columns` always includes all component-level indexes.
A chunk-level latest-at query, aka `LatestAtRelevantChunks`. This has the exact same semantics as the query of the same name on our `ChunkStore`.
Which index column should we perform the query on? E.g. `log_time`. Leave this empty to query for static data.
What index value are we looking for? Leave this empty to query for static data.
A chunk-level range query, aka `RangeRelevantChunks`. This has the exact same semantics as the query of the same name on our `ChunkStore`.
Which index column should we perform the query on? E.g. `log_time`.
What index range are we looking for?
Always reserve unspecified as default value
Not used yet
All of the entries in the associated namespace
Details specific to the table-provider
Always reserve unspecified as default value
Appends new rows to the existing table without modifying any existing rows.
Overwrites all existing rows in the table with the new rows.
`num_partitions` is deprecated. Use `target_partition_num_rows` instead. TODO(RR-2798): Remove in 0.6.
Target size of the IVF partition in rows