AggregationType defines a list of various aggregations.
Used in:
Column is analogous to a database column and defines various semantic properties of a column. A column can either simply be a column in the base database schema or it can be an arbitrary expression over the base schema, e.g. `base_column1 + base_column2`.
Used in:
A descriptive name for this column.
A list of other terms/phrases used to refer to this column.
A brief description about this column, including things like what data this column has.
The SQL expression for this column. Could simply be a base table column name or an arbitrary SQL expression over one or more columns of the base table.
The data type of this column. TODO(nsehrawat): Consider creating an enum instead, with all Snowflake-supported data types.
The kind of this column: dimension, fact, or metric.
If true, assume that this column has unique values.
If no aggregation is specified, then this is the default aggregation applied to this column in the context of a grouping.
Sample values of this column.
Whether to index the values and retrieve them based on the question. If false, all sample values will be used as input to the model.
Retrieved literals of this column.
A Cortex Search Service configured on this column to retrieve literals.
If true, this column has limited possible values, all of which are in the sample_values field.
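The interaction between sample values, indexing, and retrieved literals described above can be sketched as follows. This is only an illustration under assumptions: the field names `index_and_retrieve_values` and `retrieved_literals`, and the helper `literals_for_prompt`, are hypothetical; only `sample_values` is named in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    # Hypothetical field names, except sample_values, which the text mentions.
    name: str
    expr: str
    sample_values: List[str] = field(default_factory=list)
    index_and_retrieve_values: bool = False
    retrieved_literals: List[str] = field(default_factory=list)


def literals_for_prompt(column: Column) -> List[str]:
    """Pick the literal values that would be passed to the model."""
    if column.index_and_retrieve_values:
        # Values were indexed, so only the literals retrieved for the
        # current question are used.
        return list(column.retrieved_literals)
    # Indexing is off: all sample values are used as input.
    return list(column.sample_values)
```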
ColumnKind defines various kinds of columns, mainly categorized into dimensions and measures.
Used in:
A column containing categorical values such as names, countries, dates.
A column containing numerical values such as revenue, impressions, salary. TODO: migrate to fact.
A column containing date/time data.
A "column" containing calculations about an entity such as sum_revenue, cvr.
Fully qualified Cortex Search Service name.
Used in:
Dimension columns contain categorical values (e.g. state, user_type, platform). NOTE: If modifying this protobuf, make appropriate changes in context_to_column_format() of snowpilot/semantic_context/protos/schema.py.
Used in:
A descriptive name for this dimension.
A list of other terms/phrases used to refer to this dimension.
A brief description about this dimension, including things like what data this dimension has.
The SQL expression defining this dimension. Could simply be a physical column name or an arbitrary SQL expression over one or more columns of the physical table.
The data type of this dimension. TODO(nsehrawat): Consider creating an enum instead, with all Snowflake-supported data types.
If true, assume that this dimension has unique values.
Sample values of this column.
A Cortex Search Service configured on this column to retrieve literals.
If true, this column has limited possible values, all of which are in the sample_values field.
Measure columns contain numerical values (e.g. revenue, impressions, salary). NOTE: If modifying this protobuf, make appropriate changes in to_column_format() of snowpilot/semantic_context/utils/utils.py.
Used in:
A descriptive name for this measure.
A list of other terms/phrases used to refer to this measure.
A brief description about this measure, including things like what data it has.
The SQL expression defining this measure. Could simply be a physical column name or an arbitrary SQL expression over one or more physical columns of the underlying physical table.
The data type of this measure. TODO(nsehrawat): Consider creating an enum instead, with all Snowflake-supported data types.
If no aggregation is specified, then this is the default aggregation applied to this measure in the context of a grouping.
Sample values of this measure.
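As a rough illustration of how a default aggregation might be applied to a measure in a grouped query, consider the sketch below. The function name `measure_select_expr` and the aggregation names passed to it are hypothetical; the actual AggregationType values are not listed in this text.

```python
from typing import Optional


def measure_select_expr(
    expr: str,
    default_aggregation: str,
    requested_aggregation: Optional[str] = None,
) -> str:
    """Wrap a measure's SQL expression in an aggregate function.

    The default aggregation is used only when no aggregation is requested.
    """
    aggregation = (requested_aggregation or default_aggregation).upper()
    return f"{aggregation}({expr})"
```

For example, a measure with expression `revenue` and a default of `sum` would be selected as `SUM(revenue)` in a grouping unless the question asks for a different aggregation.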
Defines a foreign key that references the primary key of another table.
Used in:
Base column names of the foreign key table.
The primary key table that this foreign key references.
Base column names of the primary key table.
FullyQualifiedTable is used to represent three part table names - (database, schema, table).
Used in:
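A minimal sketch of a three-part table name, assuming the parts are joined with dots and need no quoting. The class below is an illustrative stand-in, not the generated protobuf class, and the `qualified_name` method is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullyQualifiedTable:
    database: str
    schema: str
    table: str

    def qualified_name(self) -> str:
        # Assumes plain identifiers; quoting of special characters is not handled.
        return ".".join((self.database, self.schema, self.table))
```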
Type of the join: inner, left outer, etc.
Used in:
Metrics are named computations over a collection of columns. For now, we only allow a metric to be defined over columns from a single table. In the future, we'll expand to allowing metrics that refer to columns from multiple tables.
Used in:
A descriptive name of the metric.
A list of other terms/phrases used to refer to this metric.
A brief description of this metric, including details of what it computes.
The SQL expression to compute this metric. All columns used must be fully qualified with the logical table name. The expression must be an aggregate.
The filter associated with this metric. Do not expose this for now.
Used in:
A message that encapsulates custom instructions for each module.
Used in:
Custom instructions for SQL Generation.
Custom instructions for Question Categorization.
Filter represents a named SQL expression that's used for filtering. TODO: Add validation. We should only support WHERE-clause-style filters (no aggregations) and reject HAVING clauses.
Used in:
A descriptive name for this filter.
A list of other terms/phrases used to refer to this filter.
A brief description of this filter, including details of what this filter is typically used for.
The SQL expression of this filter.
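The validation mentioned in the TODO above is not specified here. One naive way to approximate it is to reject expressions that call a common aggregate function, as in the sketch below. The function name and the list of aggregates are assumptions, and a regular expression is not a SQL parser: a string literal containing text like `sum(` would be rejected incorrectly.

```python
import re

# Hypothetical, non-exhaustive list of aggregate function names.
_AGGREGATE_CALL = re.compile(r"\b(sum|avg|min|max|count)\s*\(", re.IGNORECASE)


def looks_like_where_filter(expr: str) -> bool:
    """Return True if the expression contains no obvious aggregate call."""
    return _AGGREGATE_CALL.search(expr) is None
```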
Defines a primary key of a table. In the general case, primary keys are a collection of columns of the table. For discussion: primary and foreign keys are potentially duplicative of join paths in a semantic model. However, a primary key implies uniqueness, which can be informative for getting the right aggregation level. For that reason, we are exposing only the PrimaryKey currently. Join paths seem more extensible than foreign keys for supporting joins. Further experimentation is needed to see if JoinPath and ForeignKey can yield similar results.
Used in:
Base column names that constitute the primary key.
Used in:
Only equi-join relationships are supported for now.
Relationship represents a join between two tables.
Used in:
A unique name of the join.
The left hand side table of the join.
The right hand side table of the join.
The expression used to join left and right tables. Only used internally.
Keys directly represent the join relationship.
Type of the join.
Type of the relationship.
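Since only equi-joins are supported, a join condition can in principle be derived from pairs of key columns. The sketch below shows one way to do that; the function name and the representation of keys as `(left_column, right_column)` tuples are assumptions, not the actual message layout.

```python
from typing import List, Tuple


def equi_join_condition(
    left_table: str,
    right_table: str,
    keys: List[Tuple[str, str]],
) -> str:
    """Build an ON-clause condition that equates each pair of key columns."""
    if not keys:
        raise ValueError("an equi-join needs at least one key pair")
    return " AND ".join(
        f"{left_table}.{left_col} = {right_table}.{right_col}"
        for left_col, right_col in keys
    )
```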
Type of the relationship - one-to-one, many-to-one, etc.
Used in:
Used in:
The semantic context relevant to generating SQL for answering a data question.
A descriptive name of the project.
A brief description of this project, including details of what kind of analysis this project enables.
List of tables in this project.
List of relationships in this project.
List of verified queries for this semantic model.
Custom instructions that will be applied to the final SQL generation.
Module-specific custom instructions. The SQL generation instruction here will take precedence over the legacy custom_instructions if it exists.
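The precedence rule between the two kinds of custom instructions can be sketched as below. The function and parameter names are hypothetical, and treating an empty string as "not set" is an assumption.

```python
from typing import Optional


def effective_sql_generation_instruction(
    legacy_custom_instructions: Optional[str],
    module_sql_generation: Optional[str],
) -> Optional[str]:
    """Prefer the module-specific SQL generation instruction when present."""
    if module_sql_generation:
        return module_sql_generation
    # Fall back to the legacy, model-wide custom instructions.
    return legacy_custom_instructions
```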
Table is analogous to a database table and provides a simple view over an existing database table. A table can leave out some columns from the base table and/or introduce new derived columns.
Used in:
A descriptive name for this table.
A list of other terms/phrases used to refer to this table.
A brief description of this table, including details of what kinds of analysis it is typically used for.
Fully qualified name of the underlying base table.
We allow two formats for specifying logical columns of a table: 1. As a list of columns. 2. As three separate lists of dimensions, time dimensions, and measures. For the external-facing YAML specification, we have chosen to go with (2). However, for the time being we'll support both (1) and (2) and continue using (1) as the internal representation.
Primary key of the table, if any.
Foreign keys of the table, if any.
Predefined filters on this table, if any.
NEXT_TAG: 14.
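To illustrate how format (2) of the logical column specification could be folded into the single-list format (1), here is a small sketch. It is not the `to_column_format()` helper referenced elsewhere in this document; the class `LogicalColumn`, the function `to_column_list`, and the kind labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

NamedExpr = Tuple[str, str]  # (name, SQL expression)


@dataclass
class LogicalColumn:
    name: str
    expr: str
    kind: str  # hypothetical labels: "dimension", "time_dimension", "measure"


def to_column_list(
    dimensions: Iterable[NamedExpr],
    time_dimensions: Iterable[NamedExpr],
    measures: Iterable[NamedExpr],
) -> List[LogicalColumn]:
    """Flatten the three separate lists into one list of tagged columns."""
    columns: List[LogicalColumn] = []
    groups = (
        ("dimension", dimensions),
        ("time_dimension", time_dimensions),
        ("measure", measures),
    )
    for kind, group in groups:
        for name, expr in group:
            columns.append(LogicalColumn(name=name, expr=expr, kind=kind))
    return columns
```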
Time dimension columns contain time values (e.g. sale_date, created_at, year). NOTE: If modifying this protobuf, make appropriate changes in to_column_format() of snowpilot/semantic_context/utils/utils.py.
Used in:
A descriptive name for this time dimension.
A list of other terms/phrases used to refer to this time dimension.
A brief description about this time dimension, including things like what data it has, the timezone of values, etc.
The SQL expression defining this time dimension. Could simply be a physical column name or an arbitrary SQL expression over one or more columns of the physical table.
The data type of this time dimension. TODO(nsehrawat): Consider creating an enum instead, with all Snowflake-supported data types.
If true, assume that this time dimension has unique values.
Sample values of this time dimension.
VerifiedQuery represents a (question, sql) pair that has been manually verified (e.g. by an analyst) to be correct.
Used in:
A name for this verified query. Mainly used for display purposes.
The name of the semantic model on which this verified query is based.
The question being answered.
The correct SQL query for answering the question.
Timestamp at which the query was last verified, measured in seconds since epoch, in UTC.
Name of the person who verified this query.
Whether to always include this question in the suggested questions module.
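Because the verification timestamp is expressed in seconds since epoch in UTC, it can be converted to a timezone-aware datetime as sketched below. The function and parameter names are hypothetical.

```python
from datetime import datetime, timezone


def verified_at_to_datetime(verified_at_seconds: int) -> datetime:
    """Convert seconds since the Unix epoch into a UTC-aware datetime."""
    return datetime.fromtimestamp(verified_at_seconds, tz=timezone.utc)
```

Passing `tz=timezone.utc` explicitly avoids interpreting the value in the local timezone of the machine running the code.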
VerifiedQueryRepository is simply a collection of verified queries.