Proto commits in GlareDB/glaredb

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:456b26b
Author:Sean Smith
Committer:GitHub

chore: Rename some crates (#3464) Except `rayexec_python`, want to rename that when we're ready to overwrite the current glaredb lib. `rayexec_wasm` -> `glaredb_wasm` is fine since we never had a wasm lib before.

The documentation is generated from this commit.

Commit:f70d6d3
Author:Sean Smith
Committer:GitHub

feat: Execution rework (#3401)

Commit:d262071
Author:Sean Smith
Committer:GitHub

chore: Move physical type types, add in addressable traits (#3390) Split out from #3382

Commit:60128c0
Author:Sean Smith
Committer:GitHub

chore: Remove LargeUtf8/LargeBinary (#3333) Utf8/Binary already covers both, there's no difference when executing on the physical types.

Commit:91f7c14
Author:Sean Smith
Committer:GitHub

feat: More numeric functions (#3303)

Adds a bunch of numeric functions. Currently many of these work on floats, decimal support to be added.

Current scalar function list:

```
>> select * from list_functions() where function_type = 'scalar';
┌───────────────┬───────────────┬───────────────┬───────────────┐
│ database_name │ schema_name   │ function_name │ function_type │
│ Utf8          │ Utf8          │ Utf8          │ Utf8          │
├───────────────┼───────────────┼───────────────┼───────────────┤
│ system        │ glare_catalog │ or            │ scalar        │
│ system        │ glare_catalog │ ceil          │ scalar        │
│ system        │ glare_catalog │ floor         │ scalar        │
│ system        │ glare_catalog │ degrees       │ scalar        │
│ system        │ glare_catalog │ radians       │ scalar        │
│ system        │ glare_catalog │ contains      │ scalar        │
│ system        │ glare_catalog │ struct_pack   │ scalar        │
│ system        │ glare_catalog │ negate        │ scalar        │
│ system        │ glare_catalog │ random        │ scalar        │
│ system        │ glare_catalog │ list_extract  │ scalar        │
│ system        │ glare_catalog │ is_null       │ scalar        │
│ system        │ glare_catalog │ is_not_true   │ scalar        │
│ system        │ glare_catalog │ is_false      │ scalar        │
│ system        │ glare_catalog │ +             │ scalar        │
│ system        │ glare_catalog │ mul           │ scalar        │
│ system        │ glare_catalog │ /             │ scalar        │
│ system        │ glare_catalog │ rem           │ scalar        │
│ system        │ glare_catalog │ mod           │ scalar        │
│ system        │ glare_catalog │ <             │ scalar        │
│ system        │ glare_catalog │ abs           │ scalar        │
│ system        │ glare_catalog │ acos          │ scalar        │
│ system        │ glare_catalog │ atan          │ scalar        │
│ system        │ glare_catalog │ cos           │ scalar        │
│ system        │ glare_catalog │ exp           │ scalar        │
│ system        │ glare_catalog │ log           │ scalar        │
│ system        │ glare_catalog │ log2          │ scalar        │
│ system        │ glare_catalog │ isnan         │ scalar        │
│ system        │ glare_catalog │ substring     │ scalar        │
│ system        │ glare_catalog │ prefix        │ scalar        │
│ system        │ glare_catalog │ length        │ scalar        │
│ system        │ glare_catalog │ char_length   │ scalar        │
│ system        │ glare_catalog │ not           │ scalar        │
│ system        │ glare_catalog │ list_values   │ scalar        │
│ system        │ glare_catalog │ date_part     │ scalar        │
│ system        │ glare_catalog │ date_trunc    │ scalar        │
│ system        │ glare_catalog │ epoch         │ scalar        │
│ system        │ glare_catalog │ array_distan… │ scalar        │
│ system        │ glare_catalog │ sub           │ scalar        │
│ system        │ glare_catalog │ div           │ scalar        │
│ system        │ glare_catalog │ %             │ scalar        │
│ system        │ glare_catalog │ <>            │ scalar        │
│ system        │ glare_catalog │ !=            │ scalar        │
│ system        │ glare_catalog │ <=            │ scalar        │
│ system        │ glare_catalog │ >=            │ scalar        │
│ system        │ glare_catalog │ sin           │ scalar        │
│ system        │ glare_catalog │ sqrt          │ scalar        │
│ system        │ glare_catalog │ tan           │ scalar        │
│ system        │ glare_catalog │ substr        │ scalar        │
│ system        │ glare_catalog │ suffix        │ scalar        │
│ system        │ glare_catalog │ character_le… │ scalar        │
│ system        │ glare_catalog │ regexp_repla… │ scalar        │
│ system        │ glare_catalog │ like          │ scalar        │
│ system        │ glare_catalog │ is_true       │ scalar        │
│ system        │ glare_catalog │ is_not_false  │ scalar        │
│ system        │ glare_catalog │ l2_distance   │ scalar        │
│ system        │ glare_catalog │ add           │ scalar        │
│ system        │ glare_catalog │ -             │ scalar        │
│ system        │ glare_catalog │ *             │ scalar        │
│ system        │ glare_catalog │ and           │ scalar        │
│ system        │ glare_catalog │ =             │ scalar        │
│ system        │ glare_catalog │ >             │ scalar        │
│ system        │ glare_catalog │ asin          │ scalar        │
│ system        │ glare_catalog │ cbrt          │ scalar        │
│ system        │ glare_catalog │ ln            │ scalar        │
│ system        │ glare_catalog │ repeat        │ scalar        │
│ system        │ glare_catalog │ starts_with   │ scalar        │
│ system        │ glare_catalog │ ends_with     │ scalar        │
│ system        │ glare_catalog │ concat        │ scalar        │
│ system        │ glare_catalog │ epoch_ms      │ scalar        │
│ system        │ glare_catalog │ epoch_s       │ scalar        │
│ system        │ glare_catalog │ is_not_null   │ scalar        │
└───────────────┴───────────────┴───────────────┴───────────────┘
```

Commit:0813f43
Author:Sean Smith
Committer:GitHub

feat: Clickbench tests (#337) * bump tesdata submodule * int16 cast * larger internal avg state * fixup! larger internal avg state * queries * add clickbench to ci * more queries, epoch functions * more * more, min/max utf8 * more, string length * regexp_replace * more queries * lint

Commit:5d87fb6
Author:Sean Smith
Committer:GitHub

feat: Add f16 data type (#335) * add fp16 * fixup! add fp16 * fixup! add fp16

Commit:fb808d1
Author:Sean Smith
Committer:GitHub

feat: Distinct aggregates (#331) * distinct aggs * wire up * lint

Commit:fb41576
Author:Sean Smith
Committer:GitHub

perf: Join order (#295) * shared or owned * logical selection * bitmap * some reorder * fixup! some reorder * asd;fjl * initial cost * allow pushing down filters for inner join * stuff and things * fixup! stuff and things * hyper * sure * fixup! sure * sure * sure * fixup! sure * ok * things * make explains a bit more readable * include stats in explain * things * ok * remove projection * stats * qwoeriu * hyper * edge * estimate * filters as edges * generate filters * asd * quick * bump version * ok * fixup! ok * correct order q5 * some comments * account for cross join * lint * move variable

Commit:1149f05
Author:Sean Smith
Committer:GitHub

feat: Some optimization stuff (#261) * python benchmark script * add some methods * add files option to bin * utf8 * reorder materialized * get column names from bind context in explain * wip more reorder * recompute table refs for mat scan after reorder * hash table resize * things * split local/global hash table types * allow operators to produce multiple batches * initial stats * column pruning pt 1 * pt 2 * pt 3 * better parquet projectoin * fix panic * pass bind context to expr rewrite * const fold, bump version * try parsing as i32 first * bitmap zip * computed batch module * selection * quick fix, hash join not handling selection right * different storage * ok * stuff and things * physical * seems reasonable * ok * fixup! ok * german * german builder * union metadata * string is sane * more * binary executor * Uniform * ternary * hmm * ok * add * arith * compile * concat * string * constant value as array * batch shims * cast fail state * cast and try cast technically * remove concat * remove slice * rename sort ops * select exec * select * sure * fix selection * fixup! fix selection * fixup! fix selection * fixup cross join, delete some coed * fixup! fixup cross join, delete some coed * typed/untyped null * fixup! typed/untyped null * select the selection * fixup! select the selection * rm old compute filter * rm old compute take code * rm old scalar * wip ipc changes * fixup ipc * remove lifetime from array data buffer * fixup row encoding * rm eval2, select2 * fill state * fixup sort acc * fxup formatting * fixup parquet read/write * unary update * fixup! unary update * from row * things * ok * more * more * more * date part * rm macros * decimal special casing * things * delete some code * fixups * rm array2 * fix selecting selected * test fixups * more fixups * amazing * fix selecting with nulls * Preallocate encoded row offsets * script fixes * validate decimal precision * add back case, standard slts passing * fixup! 
add back case, standard slts passing * simplify avg/sum aggregates * fixup sum unit tests * add full datatype in error message in extract date part * replace const eval with const fold rewrite * fixup ipc test * lint * more lint * more lint * comment out q15 * fixup! comment out q15 * do a second filter pushdown

Commit:5405f71
Author:Sean Smith
Committer:GitHub

feat: Materialized CTEs (#225) * return multiple values per hash * set proper table refs for using + right join * uncomment some slts * fix cte col aliases * prep subquery tests * fixup! prep subquery tests * move correlated columns into scope * look * ok * logical plan looks right * fixup! logical plan looks right * plan plna plan * mat * group by test case * mat * intermediate * asdkflj * minor executable planning cleanup * planet * keep ast cte def in ast tree after resolve * bind cte before binding query body * more cte * somehow * lint + comments * more lint * bump version

Commit:cb2bf93
Author:Sean Smith
Committer:GitHub

feat: Robustify planning (#208) * test * fixup! test * stub split * fixup! stub split * BindData -> BindContext, for consistency * rename binder mod to resolver * module renames * rename proto types * some type renames * method renames * more renames * rename more, new binder stub * context context context * bind * try resolve from select list * grouping sets * scan stuff * more * expression stubs * more * empty and some other things * bind select list * things * more * seems reasonable * ok * ok * ok * attach, detach, set, reset * small rename * insert * fixup! insert * create schema * create table * describe * explain * copy to * phys expr planner * phys agg * phys proj * track table refs on logical node * limit, describe phys * phys scan * phys create schema, table * phys drop, insert, copy to * agg stuff * phys sort * start trying to execute, unnest explain module * explain formatter, alias stuff * add a test * LogicalNode -> Node * remove table refs from node, add LogicalNode trait * ok this will work * refs on the rest * use child refs * more froms * fixup! more froms * from values * fixup! 
from values * more explain info * expr things * arith ops, subquery projection * display impls * bind setop * wip plan setop * fix bug * agg preproject * handle preproject in intermediate planner instead * remove ast expressions from select list * fix bug * cast operator inputs during bind * update proj expresion with reference to group by expression * fix group by no aggregates * column binder * more * replacement equivalent columns in projection with group by * ensure appended expressions in project list * ensure appended projections are replaced too * conj * expand directly into column expr * more join planning * more planning * fix cross join * fix join side bug * fix flip * fix split conjunction * bind cte * fix from alias for cte * subquery stuff * mut bind context in planning * scalar subquery * push arb * make and/or functions variadic * subquery exists * starts with, concat * extend column binding * agg stuff * logic is hard * deduplicate expressions in project & group by * robustify order by * additional tests * skip an agg test * verify column exprs * factor out left join bitmaps * add back HAVING, also some join stuff apparently * hash join stuff * ok * actually merge * USING * allow selecting using column * remove previous hash join impl * remove old physical expr * remove old logical planning stuff * remove old stuff * some lint * unnest nl join module * lint, some proto stuff * proto stuff * fixup! proto stuff * lint * fixups * fixup! fixups * bump version

Commit:88b97f1
Author:Sean Smith
Committer:GitHub

feat: Hybrid inserts (#196) * feat: Hybrid inserts * bump version

Commit:d9476ec
Author:Sean Smith
Committer:GitHub

feat: FORMAT option for COPY TO (#194) * parse copy to options * bind options * lookup * lint * bump version

Commit:f845bbf
Author:Sean Smith
Committer:GitHub

feat: COPY TO in hybrid queries (#188) * copy to proto * insert copy to functions into catalog * copy to in bind data * use it * bump version * physical copy to proto, bump version

Commit:9348d1f
Author:Sean Smith
Committer:GitHub

feat: Send attach info for hybrid plan requests (#183) * handle attach * fixup! handle attach * include database info in unbound table reference * proto * context extender * probably * save money * lint * bump version * removed old comment * add back scan proto, bump version * use location when planning scan * sure

Commit:b09391d
Author:Sean Smith
Committer:GitHub

feat: Memory catalog, persistent data source connections (#175) * stub * add scc * comments * notes * similarity * some creates * storage traits * load into system * datasource connection * fixup! datasource connection * use memory catalog, table storage in operators & planning * update errors, fix bugs * fix bugs * use memory catalogs in system funcs * don't return empty batch, return batch with no rows * correct varlen len * fix bugs * set bool after * list schemas, wasm array * bump version * remove some code * swap * some lint * lint * comments, nice error * bump version

Commit:c78e797
Author:Sean Smith
Committer:GitHub

feat: Initial hybrid exec (#153) * operator name * map * physical operator enum * fixup! physical operator enum * use it * ok * proto * more proto * more proto * more * operator convert * packed * seems too easy * function proto * remove serde for scalars * remove serde for aggs * remove serde from table funcs * unfuck bind data * bind data proto * bound ast proto * logical unary/binary in bound ast * fixup! logical unary/binary in bound ast * add back serde for ast * fixup! add back serde for ast * remove bound proto SQL PROTO * client stream * server buffer * sure * ok * move sink to executable * close * start hybrid requests * ok * fixup! ok * more * proto messages * ok * sure * switch on bind mode * handlers * attach client * loc * stream ids on intermediate * remote sink * ok * some more * lint * bump version * protoc ci * fixup! protoc ci * would be cool if that worked * : * coll * asdf * don't care * things

Commit:dd1dafb
Author:Sam Kleinman
Committer:GitHub

feat: `jaq` filter for input to json tables (#3099)

Commit:c75c961
Author:Sam Kleinman
Committer:GitHub

fix: excel table function should read data more consistently from remote sources (#3043)

Commit:d8e636d
Author:universalmind303
Committer:GitHub

fix: inserts clean up on failure (#2875)

This fixes a bug introduced by datafusion 36: some operations that previously failed during optimization, such as invalid casts, now fail at runtime. Because they now fail at runtime, we would still create the catalog and table and only fail during the insert afterwards. This left both the catalog and the storage in a bad state that didn't accurately reflect the operation.

e.g.

```sql
create table invalid_ctas as (select cast('test' as int) as 'bad_cast');
```

This updated the catalog and created a table for `invalid_ctas`, but when you'd query it you would get an error.

This PR makes sure that the operation is successful before committing the changes. It does so by exposing some new methods on the catalog client: `commit_state`, `mutate_and_commit`, and `mutate`, instead of the previous `mutate`.

The existing code was refactored to use `mutate_and_commit`, which is the same as the old `mutate`.

The code that requires the commit semantics (create table) now uses `mutate` to first get an uncommitted catalog state with those changes, then does all of its other actions:

- create the "native" table
- _Optional_ inserts into the table
- commits the catalog state

If any of the operations before the commit fail, then the catalog mutations are never committed.

---------

Co-authored-by: Sean Smith <scsmithr@gmail.com>
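
The mutate-then-commit flow described in the message for #2875 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GlareDB's actual code: `CatalogState`, `CatalogClient`, and `create_table_as` are hypothetical types and helpers, the catalog is reduced to a set of table names, and only the method names `mutate`, `commit_state`, and `mutate_and_commit` come from the commit message.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified catalog: just a set of table names.
#[derive(Clone, Debug, Default, PartialEq)]
struct CatalogState {
    tables: BTreeSet<String>,
}

/// Hypothetical catalog client (an assumption, not GlareDB's real type).
#[derive(Debug, Default)]
struct CatalogClient {
    committed: CatalogState,
}

impl CatalogClient {
    /// Returns an uncommitted copy of the state with `table` added.
    fn mutate(&self, table: &str) -> CatalogState {
        let mut next = self.committed.clone();
        next.tables.insert(table.to_string());
        next
    }

    /// Makes a previously mutated state the committed one.
    fn commit_state(&mut self, state: CatalogState) {
        self.committed = state;
    }

    /// Mutates and commits in one step, like the old `mutate`.
    fn mutate_and_commit(&mut self, table: &str) {
        let next = self.mutate(table);
        self.commit_state(next);
    }
}

/// Creates a table, runs the insert step, and commits only on success.
fn create_table_as(
    client: &mut CatalogClient,
    table: &str,
    insert: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    let pending = client.mutate(table); // uncommitted state
    insert()?; // on failure we return early and nothing is committed
    client.commit_state(pending);
    Ok(())
}
```

In this sketch a failing insert (for example, the invalid cast above) leaves `client.committed` untouched, which is the property the commit message describes.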

Commit:6cf8f2b
Author:universalmind303
Committer:GitHub

refactor: table options (#2767)

Commit:589b9e3
Author:Sam Kleinman
Committer:GitHub

feat: access sqlite dbs from object storage (#2772) Co-authored-by: Vaibhav Rabber <vrongmeal@gmail.com>

Commit:e6e85aa
Author:universalmind303

pull credentials entirely out of the options

Commit:0f19436
Author:universalmind303

get all tests passing, TODO: external schema

Commit:ace5fc1
Author:universalmind303

wip

Commit:4335900
Author:universalmind303

i think first pass is done

Commit:932884e
Author:universalmind303

wip: table options refactor

Commit:03b8ffb
Author:Sam Kleinman
Committer:GitHub

feat: initial external table schema (#2333)

Commit:6a03e0e
Author:universalmind303
Committer:GitHub

refactor: extensible function registry (#2699)

I didn't want to create 1 massive undecipherable PR, so I wanted to submit this first.

This ties a `FunctionRegistry` instance to a session instead of the previously global `FUNCTION_REGISTRY`.

Right now, the entries are written to both the catalog and the function registry. I'd prefer writing to just one or the other, but haven't quite figured that out yet, as our udfs are not serializable so they can't be written directly to the catalog.

Some other notable changes so far:

- some catalog changes to allow for adding new user defined functions to the catalog.
- internally, expose a new method `register_function` on the session.
- rework `cosine_similarity` to the new datafusion udf impl pattern for easier usage with the new `register_function` method.

I manually tested this by removing cosine_similarity from the builtin functions and registering it manually

```rs
let mut sess = engine
    .new_local_session_context(SessionVars::default(), SessionStorageConfig::default())
    .await?;
let udf = Arc::new(CosineSimilarity::new());
sess.register_function(udf).await?;
```

This works as expected in **local** sessions only. We'll have to modify the planner to handle udfs in a hybrid context at a later date.

----

### Next steps.

I'm going to try to integrate the work I did with this datafusion <--> polars POC in a subsequent PR. https://github.com/universalmind303/polar-fusion
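
The idea in #2699 of tying a registry to a session rather than a global can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: the `ScalarFn` trait, the `Session` struct, and the `Double` function are hypothetical and far simpler than the datafusion UDF traits or GlareDB's real session type; only the names `FunctionRegistry` and `register_function` are taken from the commit message.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical minimal UDF trait; the real datafusion traits differ.
trait ScalarFn: Send + Sync {
    fn name(&self) -> &str;
    fn call(&self, args: &[f64]) -> f64;
}

/// Registry owned by one session instead of living in a global static.
#[derive(Default)]
struct FunctionRegistry {
    funcs: HashMap<String, Arc<dyn ScalarFn>>,
}

impl FunctionRegistry {
    fn register(&mut self, f: Arc<dyn ScalarFn>) {
        self.funcs.insert(f.name().to_string(), f);
    }

    fn get(&self, name: &str) -> Option<Arc<dyn ScalarFn>> {
        self.funcs.get(name).cloned()
    }
}

/// Hypothetical session that carries its own registry.
#[derive(Default)]
struct Session {
    registry: FunctionRegistry,
}

impl Session {
    fn register_function(&mut self, f: Arc<dyn ScalarFn>) {
        self.registry.register(f);
    }
}

/// Example user-defined function used only for this sketch.
struct Double;

impl ScalarFn for Double {
    fn name(&self) -> &str {
        "double"
    }

    fn call(&self, args: &[f64]) -> f64 {
        args[0] * 2.0
    }
}
```

Because each `Session` owns its `FunctionRegistry`, a function registered on one session is not visible from another, which is the main behavioural difference from a process-wide registry.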

Commit:55bf8a3
Author:universalmind303

wip: extensible function registry

Commit:5051b59
Author:Vaibhav Rabber
Committer:GitHub

feat: SQLite data source (#2568) - [x] `CREATE EXTERNAL TABLE` - [x] `CREATE EXTERNAL DATABASE` - [x] Virtual listing - [x] Tests - [x] Fix filter pushdown for Sqlite - [x] Support dates/times/timestamps --------- Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:9b8483f
Author:hemanth94
Committer:GitHub

feat: create external table support for Excel (#2652)

Commit:56f8d38
Author:universalmind303
Committer:GitHub

feat: add openai_embed function (#2538) todo: - [ ] integration tests & slts

Commit:61e3981
Author:Sam Kleinman
Committer:GitHub

fix: add auth for cassandra (#2478) Co-authored-by: Grey <grey@glaredb.com>

Commit:abd3f55
Author:Sean Smith
Committer:GitHub

chore: Remove unneeded create credential message (#2474)

Commit:4b17c4a
Author:universalmind303
Committer:GitHub

feat: cassandra external database (#2374) I realized I branched off `slt-ci` instead of main for this. So this depends on https://github.com/GlareDB/glaredb/pull/2363 closes https://github.com/GlareDB/glaredb/issues/2346

Commit:0b4b1fc
Author:universalmind303
Committer:GitHub

feat: external table for cassandra (#2361) closes https://github.com/GlareDB/glaredb/issues/2345

Commit:f12fb20
Author:Vaibhav Rabber
Committer:GitHub

feat: Clickhouse followup items (#2340)

Commit:c21a473
Author:Sean Smith
Committer:GitHub

chore: Allow specifying aliases for table functions (#2364) Instead of duplicating the function definition itself. Also a few cleanups. Existing SLTs assert that this works as expected. SLTs for `parquet_scan`, etc are just copies of the `read_parquet` ones.

Commit:7c19b8b
Author:Sam Kleinman
Committer:GitHub

chore: mongo -> mongodb in symbol naming (#2343)

Commit:eaf795a
Author:Sean Smith
Committer:GitHub

feat: Clickhouse data source (#2300)

Adds support for clickhouse as a data source.

`read_clickhouse`:

```
> select * from read_clickhouse('clickhouse://localhost:9000/default', 'bikeshare_stations') limit 1;
┌────────────┬──────────┬────────┬─────────┬───┬─────────────────┬───────┬──────────────────┬───────────────────┐
│ station_id │ name     │ status │ address │ … │ footprint_width │ notes │ council_district │ modified_date     │
│ ──         │ ──       │ ──     │ ──      │   │ ──              │ ──    │ ──               │ ──                │
│ Int32      │ Utf8     │ Utf8   │ Utf8    │   │ Float32         │ Utf8  │ Int32            │ Timestamp<s, UTC> │
╞════════════╪══════════╪════════╪═════════╪═══╪═════════════════╪═══════╪══════════════════╪═══════════════════╡
│ 0          │ South C… │ active │ 1901 S… │ … │ 10.0            │ In t… │ 9                │ 2022-03-04T09:01… │
└────────────┴──────────┴────────┴─────────┴───┴─────────────────┴───────┴──────────────────┴───────────────────┘
```

External database:

```
> create external database ch
::: from clickhouse
::: options ( connection_string = 'clickhouse://localhost:9000/default' );
Database created
> select status, address from ch.default.bikeshare_stations limit 1;
┌────────┬──────────────────────────┐
│ status │ address                  │
│ ──     │ ──                       │
│ Utf8   │ Utf8                     │
╞════════╪══════════════════════════╡
│ active │ 1901 South Congress Ave. │
└────────┴──────────────────────────┘
```

External table:

```
> create external table stations
::: from clickhouse
::: options ( connection_string = 'clickhouse://localhost:9000/default',
::: table = 'bikeshare_stations' );
Table created
> select council_district, modified_date from stations limit 1;
┌──────────────────┬─────────────────────┐
│ council_district │ modified_date       │
│ ──               │ ──                  │
│ Int32            │ Timestamp<s, UTC>   │
╞══════════════════╪═════════════════════╡
│ 9                │ 2022-03-04T09:01:00 │
└──────────────────┴─────────────────────┘
```

---

Follow up items: https://github.com/GlareDB/glaredb/issues/2315

Commit:633169d
Author:Sean Smith
Committer:GitHub

chore: Add .clang-format (#2312)

Adds a `.clang-format` file for configuring `clang-format` (I use it for auto-formatting protobufs).

My motivation for adding this is that, by default, clang-format will collapse a protobuf message with a single field and no comments into a single line. For example, it will format this:

```protobuf
message Schema {
  repeated Field columns = 1;
}
```

Into this:

```protobuf
message Schema { repeated Field columns = 1; }
```

I don't like that way of formatting, and the only reason we had it in the code base was because I was using an unconfigured clang-format.

Unfortunately I was not able to find a specific option for this (there's a ton, so I might've just missed it: https://clang.llvm.org/docs/ClangFormatStyleOptions.html), so instead I just enabled the Google style. This seems to do mostly what I want.

Commit:b5cec63
Author:Vaibhav Rabber
Committer:GitHub

chore: Remove `fn runtime_preference()` from `TableFunc` trait. (#2288) We should only have 1 source of truth for runtime preference for the functions. Here, the `runtime_preference` method is removed so the runtime is always calculated from `detect_runtime`. Fixes #2165 --------- Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:e3cbff1
Author:Sam Kleinman
Committer:GitHub

feat(bson): add table function for bson files (#2202) Co-authored-by: Sean Smith <scsmithr@gmail.com>

Commit:a2d1c5f
Author:Sean Smith
Committer:GitHub

chore: Add go package option to protobuf definitions (#2241)

Commit:4256815
Author:Sean Smith
Committer:GitHub

feat: Simple query interface (#2201) Adds a simple query interface for use with Cloud.

Commit:b00ab67
Author:universalmind303
Committer:GitHub

feat: builtin descriptions & examples (#2178)

closes #2177.

Still need to finish adding examples and descriptions to all builtin functions, but this gives us a simple example & brief description for each builtin function.

example:

```sql
> select function_name, example, description from glare_catalog.functions where example is not null ORDER BY function_name asc;
┌────────────────────────────────────┬───────────────────────────────────────┬─────────────────────────────────────────────────────────────────────┐
│ function_name                      │ example                               │ description                                                         │
│ ──                                 │ ──                                    │ ──                                                                  │
│ Utf8                               │ Utf8                                  │ Utf8                                                                │
╞════════════════════════════════════╪═══════════════════════════════════════╪═════════════════════════════════════════════════════════════════════╡
│ approx_distinct                    │ approx_distinct(a)                    │ Gives the approximate count of distinct elements using HyperLogLog. │
│ approx_median                      │ approx_median(a)                      │ Gives the approximate median of a column                            │
│ approx_percentile_cont             │ approx_percentile_cont(a)             │ Gives the approximate percentile of a column                        │
│ approx_percentile_cont_with_weight │ approx_percentile_cont_with_weight(a) │ Gives the approximate percentile of a column with a weight column   │
│ array_agg                          │ array_agg(a)                          │ Returns a list containing all the values of a column                │
│ avg                                │ avg(a)                                │ Returns the average of a column                                     │
│ bit_and                            │ bit_and(a)                            │ Returns the bitwise AND of a column                                 │
│ bit_or                             │ bit_or(a)                             │ Returns the bitwise OR of a column                                  │
│ bit_xor                            │ bit_xor(a)                            │ Returns the bitwise XOR of a column                                 │
│ bool_and                           │ bool_and(a)                           │ Returns the boolean AND of a column                                 │
│ …                                  │ …                                     │ …                                                                   │
│ regr_r2                            │ regr_r2(x, y)                         │ Returns the coefficient of determination (R-squared) for non-null … │
│ regr_slope                         │ regr_slope(y, x)                      │ Returns the slope of the linear regression line for non-null pairs… │
│ regr_sxx                           │ regr_sxx(y, x)                        │ Returns the sum of squares of the independent variable for non-nul… │
│ regr_sxy                           │ regr_sxy(y, x)                        │ Returns the sum of products of independent times dependent variabl… │
│ regr_syy                           │ regr_syy(y, x)                        │ Returns the sum of squares of the dependent variable for non-null … │
│ stddev                             │ stddev(a)                             │ Returns the sample standard deviation of a column                   │
│ stddev_pop                         │ stddev_pop(a)                         │ Returns the population standard deviation of a column               │
│ sum                                │ sum(a)                                │ Returns the sum of a column                                         │
│ variance                           │ variance(a)                           │ Returns the sample variance of a column                             │
│ variance_pop                       │ variance_pop(a)                       │ Returns the population variance of a column                         │
└────────────────────────────────────┴───────────────────────────────────────┴─────────────────────────────────────────────────────────────────────┘
```

Commit:cffae1a
Author:universalmind303
Committer:GitHub

Merge branch 'main' into universalmind303/builtin-descriptions

Commit:035fe55
Author:Vaibhav Rabber
Committer:GitHub

chore: Send query text when executing query over RPC (#2179)

Commit:edcee38
Author:universalmind303
Committer:GitHub

Merge branch 'main' into universalmind303/builtin-descriptions

Commit:101f4c1
Author:Baasit
Committer:GitHub

chore: rename CREATE CREDENTIALS to CREATE CREDENTIAL (#2180) This PR adds `CREATE CREDENTIAL` as an SQL command and notifies the user of the deprecation of `CREATE CREDENTIALS`. Closes: #2133

Commit:7533701
Author:universalmind303

update bad comment string

Commit:e5c9cf9
Author:universalmind303
Committer:universalmind303

wip

Commit:667b574
Author:universalmind303
Committer:GitHub

fix: create or replace for credentials (#2134) closes #2107

Commit:582731d
Author:Vaibhav Rabber
Committer:GitHub

chore: Send `user_id` in RPC execute request to collect metrics (#2140) Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:ff955fa
Author:universalmind303
Committer:GitHub

feat: add lance as external table (#2120)

Commit:f9a276e
Author:Sean Smith
Committer:GitHub

feat: Azure blob storage (#1937)

Adds support for reading from azure storage containers.

Closes https://github.com/GlareDB/glaredb/issues/1838

# SQL

Table function support for `azure://...` paths:

```sql
SELECT *
FROM csv_scan(
    'azure://glaredb-test/copy_to/with_opts.csv',
    account_name => '${AZURE_ACCOUNT}',
    access_key => '${AZURE_ACCESS_KEY}'
);
```

Create credentials:

```sql
CREATE CREDENTIALS azure_creds PROVIDER azure
OPTIONS (
    account_name = '${AZURE_ACCOUNT}',
    access_key = '${AZURE_ACCESS_KEY}',
);
```

Table function with azure creds:

```sql
select * from iceberg_scan('azure://glaredb-test/iceberg/tables/lineitem_simple', azure_creds);
```

# CI

Adds the required secrets for hitting our azure account. The added SLTs are based off the existing tests we have for S3 and GCS.

# Next

- Azure data dir support (`con = glaredb.connect('azure://container/db')`)
- Panic workaround on invalid access keys

---------

Co-authored-by: tycho garen <garen@tychoish.com>

Commit:e93bf2b
Author:Sean Smith
Committer:GitHub

feat: SQL Server data source (#2018)

Adds support for SQL Server as a data source

Closes https://github.com/GlareDB/glaredb/issues/1101

# SQL

Create external table:

```sql
CREATE EXTERNAL TABLE large_table
FROM sql_server
OPTIONS (
    connection_string = 'server=tcp:localhost,1433;user=SA;password=Password123;TrustServerCertificate=true',
    schema = 'dbo',
    table = 'bikeshare_trips'
);
```

Create external database:

```sql
CREATE EXTERNAL DATABASE external_db
FROM sql_server
OPTIONS (
    connection_string = 'server=tcp:localhost,1433;user=SA;password=Password123;TrustServerCertificate=true',
);
```

# Next

- [ ] Virtual schema thing (for listing tables, column info)
- [ ] `read_sqlserver` function
- [ ] Push down filters to sql server
- [ ] `datatypes` table for SLT

Commit:045ae3b
Author:Vaibhav Rabber
Committer:GitHub

feat: `ALTER [DATABASE/TABLE] .. SET ACCESS_MODE TO ..` (#2057)

```
> create external database pg from postgres (
::: host = 'localhost',
::: port = 5433,
::: user = 'glaredb',
::: password = 'password',
::: database = 'glaredb_test' );
Database created
> select * from pg.public.test_insert where a = 111;
┌───────┬──────┐
│ a     │ b    │
│ ──    │ ──   │
│ Int32 │ Utf8 │
╞═══════╪══════╡
└───────┴──────┘
> insert into pg.public.test_insert values (111, 'glaredb');
Error: Not allowed to write into the object: pg.public.test_insert
> select * from pg.public.test_insert where a = 111;
┌───────┬──────┐
│ a     │ b    │
│ ──    │ ──   │
│ Int32 │ Utf8 │
╞═══════╪══════╡
└───────┴──────┘
> alter database pg set access_mode to read_write;
Database altered
> insert into pg.public.test_insert values (111, 'glaredb');
Inserted 1 row
> select * from pg.public.test_insert where a = 111;
┌───────┬─────────┐
│ a     │ b       │
│ ──    │ ──      │
│ Int32 │ Utf8    │
╞═══════╪═════════╡
│ 111   │ glaredb │
└───────┴─────────┘
> alter database pg set access_mode to read_only;
Database altered
> insert into pg.public.test_insert values (1234, 'glaredb');
Error: Not allowed to write into the object: pg.public.test_insert
```

Signed-off-by: Vaibhav <vrongmeal@gmail.com>
Co-authored-by: Sean Smith <scsmithr@gmail.com>

Commit:63d43bc
Author:Vaibhav Rabber
Committer:GitHub

feat: Add "read-only" mode (#2026)

```
> create external table local_table from local ( location 'testdata/csv/empty_col.csv' );
Table created
> select * from local_table;
┌───────┬──────┬───────┐
│       │ col1 │ col2  │
│ ──    │ ──   │ ──    │
│ Int64 │ Utf8 │ Utf8  │
╞═══════╪══════╪═══════╡
│ 0     │ a    │ hello │
│ 1     │ b    │ world │
└───────┴──────┴───────┘
> insert into local_table values (2, 'c', 'glaredb');
Error: Cannot write to read-only table: local_table
```

### TODO

- [ ] #2027

Fixes #2010

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:5db69a9
Author:Grey
Committer:GitHub

!feat(cloudauth): rm ce param (#1915)

Commit:f1f5a40
Author:universalmind303
Committer:GitHub

feat: extend builtin function catalog (#1910) closes https://github.com/GlareDB/glaredb/issues/1798

Commit:fd518ce
Author:Marko Grujic
Committer:GitHub

feat: Remote session context per database instead of per connection (#1894) This is a pre-condition for implementing distributed execution. Progresses #1867.

Commit:e6d6326
Author:universalmind303
Committer:GitHub

fix: create or replace for external tables (#1879) closes #1869

Commit:6b7d235
Author:universalmind303
Committer:GitHub

fix: create or replace for native & temp tables (#1854) closes #1851 Note: `EXTERNAL TABLE` is not included in this PR. This only includes support for native & temp tables.

Commit:5673695
Author:Sean Smith
Committer:GitHub

chore: Move batch stream proto def to a new common file (#1868) Moves some types around and renames some things for clarity. This adds a new `rpcsrv/common.proto` for types that will be shared between internal and external APIs. I also went ahead and started replacing `broadcast_id` with `work_id` as I believe it's clearer. I originally wrote a lot of the code around the concept of "broadcasting" batches, but I would say what we're currently doing (and will do with distributed exec) isn't really broadcasting, it's just some piece of work tied to a stream.

Commit:f32c305
Author:Marko Grujic
Committer:GitHub

feat: Generic external Delta and Iceberg tables (#1797) The PR introduces a `StorageOptions` metastore options struct, which aims to be flexible enough to capture config options from various object stores for the `object_store` crate, as well as differing use cases in a very compact way (basically a map). Building on this, the PR also adds support for: - Delta Lake databases in something other than S3 (though with Unity catalog still required); still need to add tests for this. - External Delta tables in arbitrary object stores (covered with tests). - External Iceberg tables in arbitrary object stores (covered with tests yet). Closes #1780
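
The "basically a map" idea in the commit above can be illustrated with a minimal sketch. This is not the struct from the PR: the field name `inner`, the helper methods `with` and `get`, and the option keys `bucket` and `region` are all hypothetical, chosen only to show how one string-to-string map can carry options for different object stores.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of a map-backed options struct.
/// The real `StorageOptions` in GlareDB may differ in shape and naming.
#[derive(Debug, Default, Clone, PartialEq)]
struct StorageOptions {
    // A sorted map keeps iteration order stable, which is convenient
    // when the options are serialized or compared.
    inner: BTreeMap<String, String>,
}

impl StorageOptions {
    /// Builder-style insert; a later value for the same key overwrites the earlier one.
    fn with(mut self, key: &str, value: &str) -> Self {
        self.inner.insert(key.to_string(), value.to_string());
        self
    }

    /// Look up an option by key.
    fn get(&self, key: &str) -> Option<&str> {
        self.inner.get(key).map(String::as_str)
    }
}

fn main() {
    // The keys below are illustrative, not a documented option set.
    let opts = StorageOptions::default()
        .with("bucket", "my-bucket")
        .with("region", "us-east-1");

    println!("bucket = {:?}", opts.get("bucket"));
    println!("missing = {:?}", opts.get("endpoint"));
}
```

Because every store-specific setting is just a key/value pair, adding support for a new object store in a design like this would not require a new options type, only new keys.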

Commit:705e7e8
Author:Cory Grinstead
Committer:GitHub

feat: remote execution of table funcs via dispatcher (#1681) * resolve merge conflicts * clippy * fix: error messages weren't transparent * forgot to add the table hints

Commit:0221f26
Author:Sean Smith
Committer:GitHub

feat: Update catalog with rpc (#1660) * feat: Update catalog with rpc This change makes it so that the local session will pull the catalog after it detects the query was a DDL query. ``` GlareDB (v0.4.0) Type \help for help. Using in-memory catalog > \open http:localhost:6542 Connected to Cloud deployment: unknown > create table hello7 (a int); Table created > insert into hello7 select 1 as a; Inserted 1 row > insert into hello7 select 2 as a; Inserted 1 row > select * from hello7; ┌────────────┐ │ $operation │ │ ── │ │ Utf8 │ ╞════════════╡ │ 1 │ │ 2 │ └────────────┘ ``` Other changes: - Reverts some of the execution stream/result stuff. It made working with the results pretty clumsy. - Adds a quick-fix around querying system tables by wrapping it in a local hint. This was useful for debugging. - Reinits the datafusion context on client attach. This was done to create a catalog mutator that didn't have a metastore client. Without this change, the catalog would inadvertently revert back to the local session's catalog since the session will attempt to update the catalog from the local cache with the client. Bugs: - The above example, the column name is incorrect. I'm not sure why yet, but plan on looking into it in a follow up. - The memory store not working is very inconvenient for testing. The above example was done by specifying a data dir. * properly dispatch external tables, refresh on error

Commit:f936ad2
Author:Cory Grinstead
Committer:GitHub

feat: DDL execs (#1613) * wip: create credentials exec * create credentials exec * create credentials exec * create credentials exec wip * code cleanup * add todo * code cleanup * remove unused code * remove unused code * chore: alter tunnel rotate keys physical plan (#1620) * chore: drop database physical plan (#1621) * chore: add physical plans for drop schemas, tunnel and views (#1622) * chore: Create schema physical plan (#1612) * Add types for schema * encode/decode * fix build * fmt fix --------- Co-authored-by: Adhish Singla <adhisingla94@gmail.com> * feat: More ddl physical plans (#1623) * cleanup * drop creds * references * view * external database * external table * tunnel * delete code * clippy + downgrade to df 28 * feat: Create table ddl (#1624) * create-table-ddl * wip: create table ddl * finish up create table --------- Co-authored-by: universalmind303 <cory.grinstead@gmail.com> * feat: Update (#1626) * feat: Insert (#1630) Fixes: #1601 Signed-off-by: Vaibhav <vrongmeal@gmail.com> * chore: add physical plan for create temp table (#1628) * chore: physical plan for create temp table * throw on duplicate temp table * add protobuf definitions of left over DDLs * fix temp table loading issue * fix build * chore: add physical plan for delete DML (#1631) * feat: SET/SHOW Variables (#1632) Signed-off-by: Vaibhav <vrongmeal@gmail.com> * chore: Copy To (#1633) * fix: optimize lp before creating physical plan (#1643) * feat: Plan schemas (#1646) * insert logical schema * start adding schemas to all operations * is temp * Cleaner integration of temp tables * local slts passing * fix test compilation * lint * lint * fmt * whoops * fix full temp table qualification, add missing rowsorts * fix python execute not pulling from stream * stream inspection * lint * remove extra print --------- Signed-off-by: Vaibhav <vrongmeal@gmail.com> Co-authored-by: Adhish Singla <adhisingla94@gmail.com> Co-authored-by: Sean Smith <scsmithr@gmail.com> Co-authored-by: Vaibhav Rabber <vrongmeal@gmail.com>

Commit:26d4c23
Author:Sean Smith
Committer:GitHub

chore: Remove init catalog endpoint for metastore (#1604) Removes needing to "init" a catalog before fetching or mutating it. Since we create catalogs lazily, the only thing "init" was doing was loading it into memory. This change makes it so that the other two endpoints just do that automatically if the catalog isn't found. These errors should no longer happen: ``` Missing database catalog: cbdfc895-3d01-4771-92b6-9ab1c43a3bb9 ``` (because I deleted it)
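
The load-on-first-access behavior described in the commit above can be sketched as follows. This is an illustration under assumptions, not the metastore's code: `Catalog`, `Metastore`, `get_or_load`, `fetch`, and `mutate` are hypothetical names, and "loading" is reduced to creating an empty in-memory catalog.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory catalog state; not GlareDB's actual type.
#[derive(Debug, Default, Clone, PartialEq)]
struct Catalog {
    version: u64,
    tables: Vec<String>,
}

/// Hypothetical stand-in for the metastore's set of loaded catalogs.
#[derive(Default)]
struct Metastore {
    loaded: HashMap<String, Catalog>,
}

impl Metastore {
    /// Shared helper: return the catalog for `db_id`, loading it if absent.
    /// With this in place, no separate "init" call is needed.
    fn get_or_load(&mut self, db_id: &str) -> &mut Catalog {
        // In this sketch, "loading" just creates an empty catalog.
        self.loaded.entry(db_id.to_string()).or_default()
    }

    /// Fetch endpoint: never fails with "missing catalog".
    fn fetch(&mut self, db_id: &str) -> Catalog {
        self.get_or_load(db_id).clone()
    }

    /// Mutate endpoint: adds a table and returns the new catalog version.
    fn mutate(&mut self, db_id: &str, table: &str) -> u64 {
        let catalog = self.get_or_load(db_id);
        catalog.tables.push(table.to_string());
        catalog.version += 1;
        catalog.version
    }
}

fn main() {
    let mut store = Metastore::default();
    // No init step: the first mutate loads the catalog on demand.
    let version = store.mutate("db-1", "hello");
    let catalog = store.fetch("db-1");
    println!("version={} tables={:?}", version, catalog.tables);
}
```

Routing both endpoints through one helper means a catalog that was never touched behaves the same as one that was already in memory.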

Commit:08c8079
Author:Sean Smith
Committer:GitHub

chore: Split session contexts (#1589) * remove (eventually) unused types for rpcsrv * Move context * more split * split dispatch * things * more things * wip * wipc * encode * lint * rename

Commit:de357f8
Author:Vaibhav Rabber
Committer:GitHub

feat: Support RPC server testing (#1575)

Commit:d17eb5c
Author:Sean Smith
Committer:GitHub

feat: Physical execs for client batch exchange (#1559) * wip * stuff * will definitely work first try * Add to logical plan * streamin * more stuff * stream stream stream * schema * physical_plan module * remove logical plan for exchange stuff * location aware table provider * lint * fix

Commit:c8cd4c0
Author:Vaibhav Rabber
Committer:GitHub

fix: Use named arguments for file compression (#1560)

Commit:c1e326d
Author:Vaibhav Rabber
Committer:GitHub

feat: Support remote table scans and inserts (#1553) ``` > select * from abc; ┌───────┐ │ a │ │ ── │ │ Int32 │ ╞═══════╡ │ 1 │ │ 2 │ │ 3 │ └───────┘ > insert into abc values (4), (5); Write success > select * from abc; ┌───────┐ │ a │ │ ── │ │ Int32 │ ╞═══════╡ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │ 5 │ └───────┘ ``` **Logs:** ``` initializing remote session session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 dispatching table access session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 table_ref=abc creating physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 executing physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 exec_id=e72dc636-faf9-4953-b3bb-0bd786a626e5 dispatching table access session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 table_ref=abc creating physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 insert into table provider session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 provider_id=21f8f6c2-51af-44bf-b3e3-57367363ca0c executing physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 exec_id=9c2a7f9f-7449-408e-b667-357e10975801 dispatching table access session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 table_ref=abc creating physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 executing physical plan session_id=accc8ed7-d0e1-463a-834d-6f5ae2199668 exec_id=ea5525e1-38a4-45db-b90a-da9b499e5216 ``` Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:154a677
Author:Cory Grinstead
Committer:GitHub

remote exec for DDL's (#1539)

Commit:758b0d7
Author:Vaibhav Rabber
Committer:GitHub

feat: Support remote execution for queries with tables (#1534) Basically everything happens on remote (except creating the logical plan). Table providers and Physical plans are created on remote and executed on remote server. All the calls are RPC. Everything is stored in session context against a unique ID which the client uses to call on the server. **Internal changes:** * Renamed the server's `--ignore-auth` to `--ignore-pg-auth`. * Use `--disable-rpc-auth` to start server for calling rpc directly without proxy. * Use `--ignore-rpc-auth` with local to call the rpc server directly when `--disable-rpc-auth` is set. Fixes: #1502 Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:2f5da0c
Author:Sean Smith
Committer:GitHub

feat: Authenticated rpc client (#1518) * feat: Authenticated rpc client * seems close * things * fmt * remove print, revert request renumber * Update data-dir help

Commit:9d36653
Author:Adhish Singla
Committer:GitHub

feat: close remote sessions on exit (#1524)

Commit:1abc988
Author:Sean Smith
Committer:GitHub

feat: rpcsrv (#1499) * feat: rpcsrv * add remote exec * things * response stream * things * env * lint

Commit:18353e3
Author:Sean Smith
Committer:GitHub

chore: Add protogen crate (#1498) - Will centralize protobufs and some other shared types. - Removes `metastore_client` crate Prep work for rpc execution.

Commit:f65db57
Author:Vaibhav Rabber
Committer:GitHub

feat: Support globbed location for external tables as well (#1436)
```sql
CREATE EXTERNAL TABLE abc
FROM gcs
CREDENTIALS gcp_creds
(
    location 'some/path/**/to/file.csv',
    -- Optional, if not provided, will be inferred from location
    file_type 'csv'
);
```
Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:86f69d1
Author:Vaibhav Rabber
Committer:GitHub

test: Add SLTs for schemas (#1384) And some minor cleanup. Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:284c339
Author:0x333
Committer:GitHub

feat: Handle duplicate schema names (#1363) * feat: Handle duplicate schema names (#1320) Description: - Handle 'IF NOT EXISTS' clause in schema creation. - Add logging for schema creation and mutation. - Include 'tracing' for logging. This commit addresses issue #1320 * Fix: Add PR review-suggested fixes (#1363) This commit fixes the following from PR #1363: - Removing the added log statements. - Removing the if_not_exists check from client side. - Setting if_not_exists to false in tests, except for: - drop_missing_schema_if_exists() - drop_database_if_exists() ===> Since if_not_exist has been implemented. PR review details link: https://github.com/GlareDB/glaredb/pull/1363#pullrequestreview-1544790419 --------- Co-authored-by: Vaibhav Rabber <vrongmeal@gmail.com>

Commit:870791c
Author:Vaibhav Rabber
Committer:GitHub

feat: Add background job to track storage usage (#1398) Runs a background job on `INSERT`s (with a batch delay of 5 minutes) for the deployment. ``` 〉select * from glare_catalog.deployment_metadata; +--------------+-------+ | key | value | +--------------+-------+ | storage_size | 3852 | +--------------+-------+ ``` Fixes #1155 Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:841dd51
Author:Adhish Singla
Committer:GitHub

chore: move "client" stuff to metastoreproto (#1311) * chore: move builtins.rs to sqlbuiltins fixes #1128 * fix import for builtins * Update crates/sqlexec/src/context.rs Co-authored-by: Sean Smith <scsmithr@gmail.com> * chore: move "client" stuff to metastoreproto Fixes #1128 * rename metastoreproto to metastore_client --------- Co-authored-by: Adhish Singla <adhish@Adhishs-MacBook-Pro.local> Co-authored-by: Sean Smith <scsmithr@gmail.com>

Commit:b1ace82
Author:Vaibhav Rabber
Committer:GitHub

feat: Add support for credentials objects (#1140)

Commit:c1362ac
Author:Sean Smith
Committer:GitHub

feat: Add native (internal) table support using delta-rs (#1129) * feat: Add native (internal) table support using delta-rs We're a data lake now * session stuff * wire it up * things * Wire up to cli args * Use correct path * asyncify it * Some insert stuff * Proxy gcs bucket

Commit:5a2d5b1
Author:Sean Smith
Committer:GitHub

feat: Delta lake data source (initial implementation) (#1119) * delta stubs * Accessor (s3) * Downgrade datafusion + use storage opts when opening table * fmt

Commit:901586e
Author:Sean Smith
Committer:GitHub

feat: Add functions for reading data sources (#1086) * Add sqlbuiltins crate * Resolve funcs * Add builtin table * Add test * Add other read funcs * fixup! Add other read funcs * fixup! Add other read funcs * Add test asserting separate namespaces

Commit:335aede
Author:Sean Smith
Committer:GitHub

chore: Move metastore related protos and types to separate crates (#1083) Doing this to resolve a dependency cycle for the table funcs that are being worked on. The idea is that crates that depend on types from metastore shouldn't depend on the service impl too. This was already true in our case, so the refactor was pretty straightforward. `datasources` still depends on `metastore`, but I'm going to open up a followup pr that adds a `metastorebuiltins` crate which will contain the stuff that `datasources` currently depends on.

Commit:b77badc
Author:Sean Smith
Committer:GitHub

feat: Make gcs and s3 credentials optional (#1062) ``` ~/Code/github.com/glaredb/glaredb [2] % cargo run --bin glaredb -- local Finished dev [unoptimized + debuginfo] target(s) in 0.58s Running `target/debug/glaredb local` GlareDB (v0.0.11) Using in-memory catalog > create external table v from gcs options (location = 'tyrell/voight_kampff.csv', bucket='glaredb-demos'); create table > select * from v limit 10; +--------------------------------------+---------------------+--------------------+-----------------+------------------+------------+ | Test_ID | Examiner_Name | Subject_Name | Questions_Asked | Final_Assessment | Test_Date | +--------------------------------------+---------------------+--------------------+-----------------+------------------+------------+ | 273886b2-c778-4757-8779-68cf4de40154 | Angelica Wilson | Eric Jarvis | 68 | Replicant | 2020-01-01 | | b9831559-803c-40b7-bf12-b21417ed6bcd | Robert Hall | Margaret Smith | 71 | Human | 2020-01-02 | | 38de6ce6-4012-4e0f-99b1-a2f4757fe8b9 | Victoria Sims | Mrs. Sheila Hall | 83 | Human | 2020-01-03 | | 2c36ad83-7049-4cfc-ba51-af96f1c2eebe | Shannon Horton | Andrew Wiggins | 82 | Human | 2020-01-04 | | 1b8827c1-cd9b-4fa6-8fc0-e7b402acc184 | Sheila Torres | Mark Tate | 84 | Human | 2020-01-05 | | 6cd8ee64-5ff2-475b-b276-c2bda44f6330 | Nicole Taylor | Justin Fleming | 47 | Replicant | 2020-01-06 | | 5b04797c-e861-41eb-a76d-12e7c1d3e6c0 | Patrick Harrison MD | Laura Hicks | 22 | Human | 2020-01-07 | | e477d5e3-f886-4d5c-8ba4-b403513450fc | Edward Salinas | Brittany Hendricks | 101 | Replicant | 2020-01-08 | | 0be01ad5-9359-48e6-85a7-38ca0abf9276 | Michael Nelson | Betty Lawson | 40 | Replicant | 2020-01-09 | | c4c59f2d-c94a-4f3b-bc0e-9a727f889bec | Susan Chandler | Debbie Weeks | 45 | Replicant | 2020-01-10 | +--------------------------------------+---------------------+--------------------+-----------------+------------------+------------+ ```

Commit:8029c98
Author:Vaibhav Rabber
Committer:GitHub

feat: `ALTER TUNNEL .. ROTATE KEYS` support. (#1017)

Commit:ed3f3c6
Author:Vaibhav Rabber
Committer:GitHub

feat: Implement tunnels (with SSH) (#975) Fixes: #789 Fixes: #965 Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Commit:0784f1c
Author:Sean Smith
Committer:GitHub

feat: Allow column aliases in views (#928) Allows view definitions like ``` create view my_view(a, b) as select 1, 2; ``` Also checks that what's being provided as a view body produces a valid plan.

Commit:b95e236
Author:Sean Smith
Committer:GitHub

feat: Support 'OR REPLACE' when creating a view (#926) * feat: Support 'OR REPLACE' when creating a view - [ ] Metastore logic - [ ] SLT tests * fixup! feat: Support 'OR REPLACE' when creating a view

Commit:02d2cd0
Author:Sean Smith
Committer:GitHub

feat: Support DROP SCHEMA <schema> CASCADE (#916) * feat: Support DROP SCHEMA <schema> CASCADE Also fixes some existing SLTs that weren't testing what they should. * fix * lint

Commit:903b8c2
Author:Sean Smith
Committer:GitHub

chore: Remove storing columns directly on external tables (#891) Closes #857 Instead these should/will be stored on the table options. Currently this just has columns for the internal tables, I didn't feel like we needed to keep the columns for the other data sources since we have the virtual catalog now. In the future, if we think storing column info is worth it, we can lazily load and store the column info using the virtual catalog.