--- Write data ---
Register new partitions with the Dataset (asynchronously)
Register new partitions with the Dataset (blocking)
TODO(andrea): This is a copy of RegisterWithDatasetRequest. Eventually we _may_ get rid of the sync version; until then, we should make sure that the two objects are in sync.
Unimplemented.
Returns the schema of the partition table (i.e. the dataset manifest) itself, *not* the underlying dataset.
* To inspect the data of the partition table, use `ScanPartitionTable`.
* To retrieve the schema of the underlying dataset, use `GetDatasetSchema` instead.
Inspect the contents of the partition table (i.e. the dataset manifest). The returned data will follow the schema specified by `GetPartitionTableSchema`.
Returns the schema of the dataset. This is the union of all the schemas from all the underlying partitions. It will contain all the indexes, entities and components present in the dataset.
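As a quick orientation, here is a minimal client sketch for the two schema calls, assuming Python stubs generated by `protoc`/`grpcio` from these definitions. The module path, stub name, message names and field names are assumptions and may differ from the actual generated code.

```python
import grpc

# Hypothetical module path for the generated code -- adjust to the actual
# package layout of the compiled protos.
from rerun.manifest_registry.v1alpha1 import manifest_registry_pb2 as pb
from rerun.manifest_registry.v1alpha1 import manifest_registry_pb2_grpc as pb_grpc

channel = grpc.insecure_channel("localhost:51234")
stub = pb_grpc.ManifestRegistryServiceStub(channel)

dataset = pb.DatasetHandle(id="my_dataset")  # hypothetical dataset reference

# Schema of the partition table (the dataset manifest) itself:
partition_table_schema = stub.GetPartitionTableSchema(
    pb.GetPartitionTableSchemaRequest(dataset=dataset)
)

# Schema of the underlying dataset (union of all partition schemas):
dataset_schema = stub.GetDatasetSchema(
    pb.GetDatasetSchemaRequest(dataset=dataset)
)
```

The later sketches below reuse `stub`, `pb` and `dataset` from this snippet.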
Creates a custom index for a specific column (vector search, full-text search, etc.). The index can be created for all partitions or only for specific ones. Creating an index also creates a new index-specific chunk manifest for the Dataset; the chunk manifest contains information about the individual chunk rows of all chunks containing relevant index data.
List of specific partitions that will be indexed (all if left empty).
Specifies the behavior when an index for a partition has already been created.
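Continuing the client sketch above, a hedged illustration of what a `CreateIndex` request could look like. Only the meaning of each field is taken from the descriptions here; `IndexConfig`, the field names and the duplicate-behavior enum value are assumptions.

```python
request = pb.CreateIndexRequest(
    dataset=dataset,
    partition_ids=[],              # empty => create the index for all partitions
    config=pb.IndexConfig(         # hypothetical message name
        column="embedding",        # component/column to index
        time_index="log_time",     # filter index, i.e. the timeline queried at search time
        # ...plus index-kind-specific properties (e.g. vector dimensionality, metric)
    ),
    # behavior when an index for a partition already exists (enum name is a guess):
    on_duplicate=pb.IF_DUPLICATE_BEHAVIOR_OVERWRITE,
)
stub.CreateIndex(request)
```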
Recreate an index with the same configuration but (potentially) new data.
Search a previously created index. Performs a full-text, vector or scalar search. Currently only an indexed search is supported: the user must first call `CreateIndex` for the relevant column.
The response is a RecordBatch with 4 columns:
- 'partition_id': which partition the data is from
- 'timepoint': the points in time where the index query matches. Which time points are matched depends on the type of index that is queried. For example: for vector search it might be the timepoints where the top-K matches are found within *each* partition in the indexed entry; for an inverted index it might be the timepoints where the query string is found in the indexed column
- instance column: if the index column contains a batch of values (for example a list of embeddings), then each instance of the batch is a separate row in the resulting RecordBatch
- 'instance_id': a simple element index in the batch array. For example, if the indexed column is a list of embeddings [a, b, c] (where each embedding is of the same length), then the 'instance_id' of embedding 'a' is 0, the 'instance_id' of 'b' is 1, etc.
TODO(zehiko) add support for "brute force" search.
Dataset whose index we want to search
Index column that is queried
Query data - the type of data is index-specific. The caller must make sure to provide the right type: for vector search this should be a vector of the appropriate size, for an inverted index this should be a string. Query data is represented as a unit (single-row) RecordBatch with 1 column.
Index type specific properties
Scan parameters
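Since the query payload is a single-row, single-column RecordBatch, building it with `pyarrow` looks roughly like the sketch below (continuing the client sketch above). Only the shape of the batch is specified here; the request field names and the assumption that the batch travels as Arrow IPC stream bytes are guesses.

```python
import io
import pyarrow as pa

# A vector-search query: one row, one column, holding a single embedding whose
# length matches the indexed embeddings (3 here, purely for illustration).
query_batch = pa.RecordBatch.from_arrays(
    [pa.array([[0.1, 0.2, 0.3]], type=pa.list_(pa.float32()))],
    names=["query"],  # column name is an assumption
)

# Hypothetical serialization: *if* the request carries the batch as Arrow IPC
# stream bytes, this is how to produce them with pyarrow.
sink = io.BytesIO()
with pa.ipc.new_stream(sink, query_batch.schema) as writer:
    writer.write_batch(query_batch)

response = stub.SearchDataset(
    pb.SearchDatasetRequest(       # field names below are guesses
        dataset=dataset,
        column="embedding",        # the indexed column being queried
        query=sink.getvalue(),     # the unit (single-row) query RecordBatch
    )
)
# The response RecordBatch has 4 columns: 'partition_id', 'timepoint',
# the instance column, and 'instance_id', as described above.
```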
Perform Rerun-native queries on a dataset, returning the matching chunk IDs.
These Rerun-native queries include:
* Filtering by specific partition and chunk IDs.
* Latest-at, range and dataframe queries.
* Arbitrary Lance filters.
To fetch the actual chunks themselves, see `GetChunks`.
Passing chunk IDs to this method effectively acts as an IF_EXIST filter.
Dataset the client wants to query
Client can specify what partitions are queried. If left unspecified (empty list), all partitions will be queried.
Client can specify which chunk IDs to include. If left unspecified (empty list), all chunks that match the other query parameters will be included.
Which entity paths are we interested in? Leave empty to query all of them.
Generic parameters that will influence the behavior of the Lance scanner.
A chunk-level latest-at or range query, or both. This query is AND'd together with the `partition_ids` and `chunk_ids` filters above.
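A hedged request sketch for a chunk-ID-level latest-at query, continuing the client sketch above. Only `partition_ids`, `chunk_ids` and the latest-at/range structure appear in the descriptions here; the remaining names are assumptions.

```python
request = pb.QueryDatasetRequest(
    dataset=dataset,
    partition_ids=["recording_0001"],     # empty => query all partitions
    chunk_ids=[],                         # empty => no IF_EXIST filtering on chunk IDs
    entity_paths=["/camera/image"],       # empty => all entity paths
    query=pb.Query(
        latest_at=pb.QueryLatestAt(
            index="log_time",             # which index column to query
            at=1_700_000_000_000_000_000, # the index value we are looking for
        ),
    ),
)
# Returns the matching chunk IDs only; pass them to `GetChunks` to fetch the data.
chunk_ids_response = stub.QueryDataset(request)
```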
Perform Rerun-native queries on a dataset, returning the underlying chunks.
These Rerun-native queries include:
* Filtering by specific partition and chunk IDs.
* Latest-at, range and dataframe queries.
* Arbitrary Lance filters.
To fetch only the actual chunk IDs rather than the chunks themselves, see `QueryDataset`.
Dataset for which we want to get chunks
Client can specify from which partitions to get chunks. If left unspecified (empty list), data from all partitions (that match the other query parameters) will be included.
Client can specify chunk ids to include. If left unspecified (empty list), all chunks (that match other query parameters) will be included.
Which entity paths are we interested in? Leave empty to query all of them.
A chunk-level latest-at or range query, or both. This query is AND'd together with the `partition_ids` and `chunk_ids` filters above.
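And a matching sketch for fetching the chunks themselves over a time range, continuing the client sketch above. The field names, the `TimeRange` message and the assumption that the RPC streams its responses are all guesses.

```python
request = pb.GetChunksRequest(
    dataset=dataset,
    partition_ids=[],                     # empty => all partitions
    entity_paths=["/camera/image"],       # empty => all entity paths
    query=pb.Query(
        range=pb.QueryRange(
            index="log_time",             # which index column to query
            range=pb.TimeRange(start=0, end=1_700_000_000_000_000_000),
        ),
    ),
)
# Assuming a server-streaming RPC that yields the chunks as Arrow record batches:
for response in stub.GetChunks(request):
    pass  # decode the Arrow payload(s) carried by `response` here
```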
Retrieves the chunk manifest for a specific index.
Dataset for which we want to fetch chunk manifest
Chunk manifest is index specific
Scan parameters
Chunk manifest as arrow RecordBatches
Create manifests for all partitions in the Dataset. A partition manifest contains information about the chunks in a partition. This is normally done automatically as part of the registration process.
Dataset for which we want to create manifests
Create manifests for specific partitions. Manifests for all partitions will be created if left unspecified (empty list).
types of partitions and their storage location (same order as partition ids above)
Define what happens if create is called multiple times for the same Dataset / partitions
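For completeness, a hedged sketch of triggering manifest creation manually, continuing the client sketch above (normally this happens as part of registration). All names are assumptions derived from the field descriptions here.

```python
request = pb.CreatePartitionManifestsRequest(  # the actual request name may differ
    dataset=dataset,
    partition_ids=["recording_0001", "recording_0002"],  # empty => all partitions
    # storage locations / kinds, in the same order as the partition IDs above:
    data_sources=[
        pb.DataSource(storage_url="s3://bucket/recording_0001.rrd", kind="rrd"),
        pb.DataSource(storage_url="file:///data/recording_0002.rrd", kind="rrd"),
    ],
    # what happens if this is called multiple times for the same partitions
    # (enum name is a guess):
    on_duplicate=pb.IF_DUPLICATE_BEHAVIOR_ERROR,
)
stub.CreatePartitionManifests(request)
```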
Fetch the internal state of a Partition Manifest.
TODO(cmc): this should have response extensions too.
TODO(zehiko) add properties as needed
Used in:
(message has no fields)
TODO(zehiko) add properties as needed
Used in:
(message has no fields)
Used as response type in: frontend.v1alpha1.FrontendService.CreateIndex, ManifestRegistryService.CreateIndex
Used in:
Where is the data for this data source stored (e.g. s3://bucket/file or file:///path/to/file)?
What kind of data is it (e.g. rrd, mcap, Lance, etc)?
Used in:
Application level error - used as `details` in the `google.rpc.Status` message
error code
unique identifier associated with the request (e.g. recording id, recording storage url)
human readable details about the error
Error codes for application level errors
Used in:
unused
object store access error
metadata database access error
Encoding / decoding error
Used as response type in: frontend.v1alpha1.FrontendService.GetChunks, ManifestRegistryService.GetChunks
Used as response type in: frontend.v1alpha1.FrontendService.GetDatasetSchema, ManifestRegistryService.GetDatasetSchema
Used as response type in: frontend.v1alpha1.FrontendService.GetPartitionTableSchema, ManifestRegistryService.GetPartitionTableSchema
used to define which column we want to index
Used in:
The path of the entity.
Component details
Used in:
What kind of index do we want to create and what are its index-specific properties.
Component / column we want to index.
What is the filter index, i.e. the timeline for which we will query the timepoints. TODO(zehiko) this might go away and we might just index across all the timelines
Used in:
Used in:
Specific index query properties based on the index type
Used in:
TODO(zehiko) add other properties as needed
TODO(zehiko) add properties as needed
Used in:
(message has no fields)
Used in:
If specified, will perform a latest-at query with the given parameters. You can combine this with a `QueryRange` in order to gather all the relevant chunks for a full-fledged dataframe query (i.e. they get OR'd together).
If specified, will perform a range query with the given parameters. You can combine this with a `QueryLatestAt` in order to gather all the relevant chunks for a full-fledged dataframe query (i.e. they get OR'd together).
If true, `columns` will contain the entire schema.
If true, `columns` always includes `chunk_id`.
If true, `columns` always includes `byte_offset` and `byte_size`.
If true, `columns` always includes `entity_path`.
If true, `columns` always includes all static component-level indexes.
If true, `columns` always includes all temporal chunk-level indexes.
If true, `columns` always includes all component-level indexes.
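Because `latest_at` and `range` get OR'd together when both are set, a full dataframe-style query can be expressed with a single `Query`; a hedged sketch follows (all field names assumed, including the column-selection flags).

```python
start_ns, end_ns = 0, 1_700_000_000_000_000_000

# Gather both the latest-at state at `start_ns` and everything inside the range --
# together they cover the chunks needed for a full-fledged dataframe query.
query = pb.Query(
    latest_at=pb.QueryLatestAt(index="log_time", at=start_ns),
    range=pb.QueryRange(index="log_time", range=pb.TimeRange(start=start_ns, end=end_ns)),
    # column-selection flags as described above (names assumed):
    columns_always_include_chunk_ids=True,
    columns_always_include_entity_paths=True,
)
```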
Used as response type in: frontend.v1alpha1.FrontendService.QueryDataset, ManifestRegistryService.QueryDataset
A chunk-level latest-at query, aka `LatestAtRelevantChunks`. This has the exact same semantics as the query of the same name on our `ChunkStore`.
Used in:
Which index column should we perform the query on? E.g. `log_time`.
What index value are we looking for?
Which components are we interested in? If left unspecified, all existing components are considered of interest. This will perform a basic fuzzy match on the available columns' descriptors. The fuzzy logic is a simple case-sensitive `contains()` query. For example, given a `log_tick__SeriesLines:StrokeWidth#width` index, all of the following would match: `SeriesLines:StrokeWidth#width`, `StrokeWidth`, `Stroke`, `Width`, `width`, `SeriesLines`, etc.
TODO(cmc): I shall bring that back into a more structured form later. repeated rerun.common.v1alpha1.ComponentDescriptor fuzzy_descriptors = 3;
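The fuzzy component matching described above is just a case-sensitive substring test; in Python terms, roughly:

```python
def fuzzy_matches(column_descriptor: str, query: str) -> bool:
    """Case-sensitive `contains()` match on the column's descriptor."""
    return query in column_descriptor

descriptor = "log_tick__SeriesLines:StrokeWidth#width"
assert fuzzy_matches(descriptor, "SeriesLines:StrokeWidth#width")
assert fuzzy_matches(descriptor, "StrokeWidth")
assert fuzzy_matches(descriptor, "width")
assert not fuzzy_matches(descriptor, "WIDTH")  # case-sensitive, so this does not match
```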
A chunk-level range query, aka `RangeRelevantChunks`. This has the exact same semantics as the query of the same name on our `ChunkStore`.
Used in:
Which index column should we perform the query on? E.g. `log_time`.
What index range are we looking for?
Which components are we interested in? If left unspecified, all existing components are considered of interest. This will perform a basic fuzzy match on the available columns' descriptors. The fuzzy logic is a simple case-sensitive `contains()` query. For example, given a `log_tick__SeriesLines:StrokeWidth#width` index, all of the following would match: `SeriesLines:StrokeWidth#width`, `StrokeWidth`, `Stroke`, `Width`, `width`, `SeriesLines`, etc.
TODO(cmc): I shall bring that back into a more structured form later. repeated rerun.common.v1alpha1.ComponentDescriptor fuzzy_descriptors = 3;
Used as response type in: frontend.v1alpha1.FrontendService.ReIndex, ManifestRegistryService.ReIndex
Used as response type in: frontend.v1alpha1.FrontendService.RegisterWithDataset, ManifestRegistryService.RegisterWithDataset
Used as response type in: frontend.v1alpha1.FrontendService.ScanPartitionTable, ManifestRegistryService.ScanPartitionTable
Partitions metadata as arrow RecordBatch
Used as response type in: frontend.v1alpha1.FrontendService.SearchDataset, ManifestRegistryService.SearchDataset
Chunks as arrow RecordBatch
Used in:
Used in:
Used in:
Used as response type in: frontend.v1alpha1.FrontendService.WriteChunks, ManifestRegistryService.WriteChunks
(message has no fields)