Proto commits in pixelsdb/pixels

The Protocol Buffers files changed in these 94 commits:

Commit:df09a64
Author:huasiy
Committer:huasiy

only one worker will pass schema to downstream

Commit:0947cd2
Author:huasiy
Committer:huasiy

one worker writes to one port of downstream worker

Commit:eeb09a8
Author:Dongyang Geng
Committer:GitHub

[Issue #862] update retina interfaces (#863)

Commit:48c7fa3
Author:Dongyang Geng
Committer:GitHub

[Issue #860] fix visibility merge error (#861)

Commit:561a2e8
Author:Dongyang Geng
Committer:Haoqiong Bian

[Issue #741] add visibility structure for real-time CRUD (#847)
Co-authored-by: slzo <85170982+slzo@users.noreply.github.com>
Co-authored-by: Haoqiong Bian <bianhaoqiong@gmail.com>

Commit:6a717c9
Author:Dongyang Geng
Committer:Haoqiong Bian

[Issue #741] add visibility structure (#822) pixels-daemon: create retina rpc server. pixels-common: acts as a client and sends rpc requests to the retina server. pixels-retina: build and manage various data structures.

Commit:d8684a4
Author:Haoqiong Bian
Committer:GitHub

[Issue #838] support lookup unique and non-unique indexes. (#850) Secondary indexes can be unique or non-unique.

The documentation is generated from this commit.

Commit:0bf8ec8
Author:Haoqiong Bian
Committer:GitHub

[Issue #838] implement the framework for secondary indexes (#844)

Commit:f6abe87
Author:Haoqiong Bian
Committer:GitHub

[Issue #838] implement metadata service for secondary index (#839) Also add min/max row ids into file metadata objects and fix some minor problems in metadata.

Commit:9d27783
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] revise range index definition (#837) Remove the index structure from the metadata table RANGE_INDEXES.

Commit:f6396dc
Author:slzo
Committer:GitHub

[Issue #745] add ML-based auto-scaling policy (#818)

Commit:dde7312
Author:Haoqiong Bian
Committer:GitHub

[Issue #787] revise hidden column implementation and fix cf reading error (#788) 1. move row group level hidden column statistic into RowGroupStatistics. 2. revise the variable names related to the hidden column and transaction timestamp. 3. fix a bug at line 445 of PixelsRecordReaderImpl, using (this.timestamp != -1) to check whether hidden column should be read is incorrect.

Commit:1401421
Author:Haoqiong Bian
Committer:GitHub

[Issue #82] clean old retina implementation (#766) We will use a new design for real-time updates.

Commit:3f01bae
Author:Dongyang Geng
Committer:GitHub

[Issue #737] implement hidden timestamp column (#754) Adding a hidden column into the pixels file and filtering out invisible tuples based on timestamp ordering. This is compatible with the previous file format without such a hidden column.
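The timestamp-based filtering described in this commit can be sketched roughly as follows. This is only an illustration, not the Pixels implementation: the function name `visible_rows`, the row layout, and the `<=` comparison are assumptions made for the example.

```python
from typing import List, Tuple


def visible_rows(rows: List[Tuple[str, int]], query_ts: int) -> List[str]:
    """Return the values of rows that are visible to a reader at query_ts.

    Each row is (value, hidden_commit_ts). Assumption for this sketch:
    a row is visible when its hidden timestamp is not later than the
    reader's timestamp.
    """
    return [value for value, commit_ts in rows if commit_ts <= query_ts]


rows = [("a", 10), ("b", 20), ("c", 30)]
print(visible_rows(rows, 20))
```

A file written without the hidden column would simply skip this filter, which is one way the format could stay compatible with older files.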

Commit:0c7538c
Author:Haoqiong Bian
Committer:GitHub

[pixels-retina] clean proto files (#765)

Commit:0915e2c
Author:Jiaxuan Zhu
Committer:GitHub

[Issue #645] add scaling server and auto-scaling policies (#747) Add a grpc service to collect the scaling metrics and a background thread to trigger auto-scaling according to scaling policies.

Commit:ea992ee
Author:Haoqiong Bian
Committer:GitHub

[Issue #708] remove the file id generated by etcd (#709) The file ID is now generated in the metadata. The metadata schema should be updated using the SQL scripts provided in PR #709.

Commit:dd68717
Author:Haoqiong Bian
Committer:GitHub

[Issue #700] getting layout while getting table (#702) This is needed by `show create table` in the clients.

Commit:2dc06ac
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] get file paths from metadata instead of storage (#686) So that we can control the access of every file, this is necessary for range index and real-time update. We also adapt the implementation of projections accordingly.

Commit:f302a81
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] add import command and automatically add files in load and compact commands (#683) The import command imports files into a table. The files must already exist in the table's data path(s). Currently, queries do not use the file information in metadata. This can be implemented after adding a separate metadata table for the projections in the layout.

Commit:ae148f3
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] split heartbeat from cache coordinator/worker into heartbeat coordinator/worker (#675) This is necessary for implementing the index coordinator/worker. We also rename pixels data node to pixels worker.

Commit:db96a58
Author:Haoqiong Bian
Committer:GitHub

[Issue #666] revise naming and comments for streaming format (#671)

Commit:30b5a99
Author:Yijun Ma
Committer:GitHub

[Issue #666] http serialization functionality (#664) Adds functionality to serialize & deserialize data via HTTP connections between workers, so that workers do not have to materialize intermediate results in an external storage.

Commit:af419f4
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] add files related metadata RPCs (#668) By managing file information (e.g., file name and locality), we can index files by their ids in range index and cache. The file id also can be used to implement a very simple distributed file system.

Commit:449f2be
Author:Haoqiong Bian
Committer:GitHub

[Issue #82] merge the initial implementation of pixels-retina (#667) In this initial implementation, we tried to implement the row-wise writer in C++ (cpp/pixels-retina) and some related rpc protocols. We recently redesigned retina and will make major changes based on the initial implementation. Since the initial implementation is not intrusive to other features and modules of pixels, we will merge the initial version into the master branch before reimplementing it. Thanks xxchan and zhipeng for their contributions in the initial implementation.
Co-authored-by: Zhipeng Mao <34649843+mzp0514@users.noreply.github.com>
Co-authored-by: xxchan <37948597+xxchan@users.noreply.github.com>

Commit:92c42b8
Author:Haoqiong Bian
Committer:GitHub

[Issue #658] implement metadata RPCs for range and range index (#661) These RPCs will be used to implement the range partitioned index.

Commit:770c25a
Author:Haoqiong Bian
Committer:GitHub

[Issue #649] clean query results periodically and fix bugs in query manager (#652) 1. Clean the result of finished queries periodically instead of immediately after the first getQueryResult request. 2. Fix 'Query is not finished' for the repeated getQueryResult requests. 3. Fix dead-lock in popQueryResult of QueryManager. 4. Clean code.

Commit:5c771b1
Author:Dongyang Geng
Committer:GitHub

[Issue pixelsdb#649] ensure vm cost can be obtained from the query result (#650)

Commit:ddfcaaa
Author:Haoqiong Bian
Committer:GitHub

[Issue #506] support cost cents collection (#642) Revise the operators in pixels-planner to support getting output from the return value of the execute() method. Add methods for the VM workers to check the status of the root operator executed in CF workers. Add methods for the VM cluster to clean the states and intermediate results. Fix a bug in PartitionedJoinOperator.execute(). Add protocol for the VM cluster to report scan bytes and cost cents more efficiently.

Commit:44e119a
Author:Haoqiong Bian
Committer:GitHub

[Issue #506] support billed cents in pixels-server (#639) Allow pixels-trino to set scan size in the transaction properties and use it to calculate the billed cents.

Commit:93daa41
Author:Haoqiong Bian
Committer:GitHub

[Issue #506] get query costs from transaction service (#638) Add rpc protocol to set query costs as transaction properties. Get the spent and billed cents from the transaction service. Return the query costs and the query pending time in the get query result response.

Commit:41cac6c
Author:Tiannan Sha
Committer:GitHub

[Issue #580] add vector column and its writer reader and related metadata handling (#600) Implements a vector column type, with proper support in pixels writer, pixels reader, and type description.

Commit:6862237
Author:Haoqiong Bian
Committer:GitHub

[Issue #491] implement plan coordinator initialization (#574) Users can initialize the plan coordinator of a query plan by invoking initPlanCoordinator on the root operator of the plan. TaskId is changed from a string to an integer starting from 0 per stage.

Commit:e610345
Author:Haoqiong Bian
Committer:GitHub

[Issue #491] add worker coordinate service (#571) 1. implement a task queue for task scheduling of a stage in the query execution pipeline; 2. implement a worker registration and management framework to track the status of the workers at each stage.

Commit:f1c5760
Author:Haoqiong Bian
Committer:GitHub

[Issue #548] support alignment for isNull bitmap (#569) Add the following property in pixels.properties to control the alignment of the isNull bitmap:
# the alignment of the start offset of the isnull bitmap in a column chunk,
# it is for the compatibility of DuckDB and its unit is byte.
# for DuckDB, it is only effective when column.chunk.alignment also meets the alignment of the isNull bitmap
isnull.bitmap.alignment=8
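To make the effect of an alignment setting such as isnull.bitmap.alignment=8 concrete, here is a minimal sketch of rounding a start offset up to the next aligned position. The helper `align_up` is hypothetical and is not taken from the Pixels codebase; how the writer pads the gap is not shown.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the nearest multiple of alignment (in bytes)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    remainder = offset % alignment
    if remainder == 0:
        return offset  # already aligned, no padding needed
    return offset + (alignment - remainder)


# With an 8-byte alignment, a bitmap that would start at byte 13
# is moved to byte 16, leaving 3 bytes of padding before it.
print(align_up(13, 8))
```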

Commit:0133f1b
Author:Haoqiong Bian
Committer:GitHub

[Issue #545] support nulls padding in the file format (#566) Add a writer option for column writers. Handle nulls padding in the column writers and readers. Add a -p parameter to the LOAD command of pixels-cli to specify whether nulls padding is enabled. Optimize the de-compaction of the isNull bitmap in the column readers, single-table query performance on TPCH.orders is improved by 15%-20%. Fix a bug related to null records reading in the column writers of Date, Time, and Timestamp. Fix a bug related to float column writer by adding the FloatColumnWriter.

Commit:bbed2d5
Author:Haoqiong Bian
Committer:GitHub

[Issue #538] support encoding levels and dictionary encoding without cascading RLE (#561) 1. Support encoding levels in the column writers. The encoding level can be passed from pixels-cli's LOAD command using the -e parameter. 2. Support dictionary encoding without cascading run-length encoding for encoding level 1. 3. Remove adaptive encoding in StringColumnWriter. It was not implemented correctly. 4. Add the dictContent length as the last element into dictStarts. @yuly16 please update the C++ reader accordingly.
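Item 4 above (appending the dictContent length to dictStarts) can be illustrated with a small sketch. The names `build_dictionary` and `lookup` are made up for this example and the byte layout is an assumption; the point is only that a trailing end offset lets a reader slice every entry, including the last one, without a special case.

```python
from typing import List, Tuple


def build_dictionary(words: List[str]) -> Tuple[bytes, List[int]]:
    """Concatenate the words and record the start offset of each one.

    The total content length is appended as the final element of the
    starts list, so entry i always spans starts[i]..starts[i + 1].
    """
    content = bytearray()
    starts: List[int] = []
    for word in words:
        starts.append(len(content))
        content += word.encode("utf-8")
    starts.append(len(content))  # sentinel: length of the whole content
    return bytes(content), starts


def lookup(content: bytes, starts: List[int], index: int) -> str:
    """Decode dictionary entry `index` using two adjacent offsets."""
    return content[starts[index]:starts[index + 1]].decode("utf-8")


content, starts = build_dictionary(["ab", "cde", "f"])
print(starts, lookup(content, starts, 2))
```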

Commit:caa686c
Author:Lin Yuan
Committer:GitHub

[Issues #436, #438, #439, #441] an adaptive solution to save cloud query cost (#559)

Commit:38859cc
Author:Haoqiong Bian
Committer:GitHub

[Issue #531] support configurable endianness (#534) The endianness of Pixels writer is configured by column.chunk.little.endian=true/false. The endianness is then saved in the ColumnChunkIndex of each column chunk. The Pixels column readers will check the ColumnChunkIndex to use the right endianness. This is currently implemented in the Java codebase, and the C++ codebase should at least check and report errors for unsupported endianness.
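The reason a reader has to check the recorded endianness can be shown with a short sketch using Python's standard `struct` module. The functions `encode_longs` and `decode_longs` are hypothetical helpers for this example and do not mirror the Pixels writer or reader API.

```python
import struct
from typing import List


def encode_longs(values: List[int], little_endian: bool) -> bytes:
    """Pack signed 64-bit integers with the requested byte order."""
    prefix = "<" if little_endian else ">"
    return struct.pack(prefix + "q" * len(values), *values)


def decode_longs(data: bytes, little_endian: bool) -> List[int]:
    """Unpack signed 64-bit integers, assuming the given byte order."""
    prefix = "<" if little_endian else ">"
    return list(struct.unpack(prefix + "q" * (len(data) // 8), data))


encoded = encode_longs([1, 2, 3], little_endian=True)
print(decode_longs(encoded, little_endian=True))   # matching byte order
print(decode_longs(encoded, little_endian=False))  # mismatched byte order
```

Decoding with the wrong byte order does not fail; it silently yields different numbers, which is why storing the flag next to the data and checking it on read is useful.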

Commit:4c53d6b
Author:Haoqiong Bian
Committer:GitHub

[Issue #522] add layout configurations into the file format (#526)

Commit:5fb3906
Author:Haoqiong Bian
Committer:GitHub

[Issue #524] using full path as the key in file footer cache (#525)

Commit:efaaaf4
Author:Haoqiong Bian
Committer:GitHub

[Issue #491] fix query service and revise comments. (#509) Avoid unknown query status when there are concurrent updates on the queues and query results.

Commit:a7d1999
Author:Haoqiong Bian
Committer:GitHub

[Issue #490] support relaxed and best-effort query execution. (#503) Also: 1. revise metadata_schema.sql and docs; 2. fix bugs related to auto-scaling in QueryScheduleService; 3. support query status check in a batch manner.

Commit:635c7e5
Author:Haoqiong Bian
Committer:GitHub

[Issue #437] redesign the schema of metadata. (#474) Support the new features in pixels metadata: 1. user management and role-based authentication; 2. range-based table partitioning; 3. schema versioning; 4. path-granular catalog management. Currently, only the RPCs for catalog management are fully implemented. In addition, the metadata of tables and layouts are optimized. I also fixed a bug in layout permission conversion and a bug in layout dao's update method.

Commit:15af8cb
Author:Haoqiong Bian
Committer:GitHub

[Issue #431] finish pixels query server. (#464) Currently, cost estimation and cost collection are not implemented; hence we can not get the monetary cost of queries.

Commit:30141db
Author:zhaoshihan
Committer:GitHub

[Issue #423] pixels vhive invoker and worker. (#463) Extend pixels-turbo to vHive functions.

Commit:d6c4182
Author:Haoqiong Bian
Committer:GitHub

[Issue #459] update row count automatically. (#462) Automatically update row count of each table in the STAT command of pixels-cli.

Commit:2100ada
Author:Haoqiong Bian
Committer:GitHub

[Issue #431] refine transaction protocol. (#458)

Commit:1800c81
Author:Lin Yuan
Committer:GitHub

[Issue #441] SQLglot transpile integration (#455)
* Added Sqlglot executor for dialect transpile
* Removed gRPC hello example
* Added exception handler for SQLglot transpile

Commit:23af9bb
Author:Haoqiong Bian
Committer:GitHub

[Issue #441] add grpc example into pixels-server. (#442)

Commit:771067e
Author:Haoqiong Bian
Committer:GitHub

[Issue #371]: update metadata for view creation in Trino. (#372) LocalFS is also updated to check invalid paths.

Commit:15c1cb5
Author:Haoqiong Bian
Committer:GitHub

[Issue #170]: implement min/max in column stats in metadata. (#308)

Commit:f779fcc
Author:Haoqiong Bian
Committer:GitHub

[Issue #203]: support long decimal with 38 max digit precision and scale. (#296) Long decimal is currently not supported for column filtering. Fix several bugs in column vector and type description.

Commit:471dbdd
Author:Haoqiong Bian
Committer:GitHub

[Issue #258]: implement table and column statistics. (#287) 1. add optimizer module. 2. implement statistics.

Commit:5c56bf3
Author:Haoqiong Bian
Committer:GitHub

[Issue #170]: implement hash partitioned join. (#263) 1. support using row group as a partition in pixels file. 2. implement hash partitioning and hash partitioned join based on lambda.
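As a rough in-memory illustration of the general technique, the sketch below hash-partitions both inputs on the join key and then joins matching partitions independently. It says nothing about how Pixels maps partitions to row groups or to lambda workers; the function names and the tuple-based row format are assumptions for the example.

```python
from collections import defaultdict
from typing import List, Tuple


def hash_partition(rows: List[Tuple], key_index: int, num_partitions: int) -> List[List[Tuple]]:
    """Assign each row to a partition by hashing its join key."""
    partitions: List[List[Tuple]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def partitioned_join(left: List[Tuple], right: List[Tuple],
                     left_key: int, right_key: int, num_partitions: int) -> List[Tuple]:
    """Join partition i of the left input only with partition i of the right."""
    result: List[Tuple] = []
    left_parts = hash_partition(left, left_key, num_partitions)
    right_parts = hash_partition(right, right_key, num_partitions)
    for left_part, right_part in zip(left_parts, right_parts):
        table = defaultdict(list)  # build side: key -> rows
        for row in left_part:
            table[row[left_key]].append(row)
        for row in right_part:     # probe side
            for match in table.get(row[right_key], []):
                result.append(match + row)
    return result


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]
print(sorted(partitioned_join(left, right, 0, 0, 2)))
```

Because rows with equal keys always land in the same partition number, each pair of partitions can be joined by a separate worker without seeing the rest of the data.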

Commit:fd33ddd
Author:Haoqiong Bian
Committer:GitHub

[Issue #170]: optimize Pixels S3 writer and lambda. (#253) 1. clean and optimize the code in pixels-lambda; 2. reduce the buffer size in S3InputStream and S3OutputStream; 3. enable overwrite in PhysicalS3Writer.

Commit:9d1bbf5
Author:Haoqiong Bian
Committer:GitHub

[Issue #196]: refine type management and support decimal. (#197) Mainly fixed the type management/parsing, and add decimal support.

Commit:9181501
Author:Haoqiong Bian
Committer:GitHub

[Issue #194]: support views. (#195) Support creating, using, dropping, and checking the schema_information of views.

Commit:57798c2
Author:Haoqiong Bian
Committer:GitHub

[Issue #179]: fix show tables in information_schema. (#180) Tables in information_schema are system tables. However, Presto still calls the getTables method of the customer ConnectorMetadata class.

Commit:674a1ee
Author:Haoqiong Bian
Committer:GitHub

[Issue #174]: implement transaction server and pass query (trans) id into record reader. (#176) 1. Implement the transaction server and service using gRPC. The pushing operations of the watermarks are also implemented. 2. Push down the transaction (query) id into the record reader of Pixels.

Commit:1fb1e4f
Author:Haoqiong Bian
Committer:GitHub

[Issue #131]: implement projections for compact layout. (#134) Projection for compact layout is similar to the projections in C-Store, it contains a subset of the columns that are frequently accessed by a set of queries.

Commit:8f3f99e
Author:Haoqiong Bian
Committer:GitHub

[Issue #108]: replace hdfs FileSystem api with the unified Storage api. (#110) With the unified Storage interface, we can easily add different types of storage systems (e.g. S3, local fs, Ozone...) into Pixels. In this PR, we have already implemented the HDFS storage, which internally uses APIs in hadoop libraries to access HDFS. The local file storage is still to be implemented. Because this PR adds a new column to the TBLS table in the metadata database, some components such as pixels-presto may not be compatible with components from the previous version.

Commit:93cd994
Author:Haoqiong Bian
Committer:GitHub

[Issue #94]: Support Date and Time types. (#95) We added Date and Time data types in pixels-core, and tested them from pixels-presto and pixels-load using orders table in TPCH. For both types, they are represented as 4-byte integer and runlength+delta encoded. For Date, the 4-byte integer is the number of days from epoch (1970-1-1 0:0:0 UTC). For Time, the 4-byte integer is the number of millis in the day.
Changes:
* implement column reader/writer/vector for date and time.
* implement stats and Presto support for Date and Time types.
* test and fix Date and Time types support.
* test and fix Date/Time type integration in presto.
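The two integer representations described in this commit (days since the epoch for Date, milliseconds within the day for Time) can be sketched with Python's `datetime` module. The helper names are hypothetical, and the run-length plus delta encoding applied afterwards is not shown.

```python
from datetime import date, time

EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Number of days between the epoch (1970-01-01) and d."""
    return (d - EPOCH).days


def time_to_millis(t: time) -> int:
    """Number of milliseconds elapsed since midnight for t."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1000 + t.microsecond // 1000


print(date_to_days(date(2000, 1, 1)))
print(time_to_millis(time(12, 30, 0)))
```

Both values fit comfortably in a 4-byte signed integer: a day has at most 86,400,000 milliseconds, and day counts stay small for any realistic calendar date.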

Commit:72a80ab
Author:Haoqiong Bian

[Issue #36]: fix copyright and licence notice.

Commit:bc2cca5
Author:Haoqiong Bian

[Issue #36]: add copyright and license notice.

Commit:1ace268
Author:Haoqiong Bian
Committer:GitHub

Revert "[Issue #36]: add copyright and license notice."

Commit:30ae253
Author:Haoqiong Bian

[Issue #36]: add copyright and license notice.

Commit:0101906
Author:guodong

rename packages to io.pixelsdb

Commit:09222f9
Author:Haoqiong Bian

add metadata rpc methods

Commit:b261364
Author:Haoqiong Bian

update metadata service

Commit:5fea29c
Author:Haoqiong Bian

make metadata and daemon more robust

Commit:13f2318
Author:Haoqiong Bian

finish MetadataService and MetadataServiceImpl

Commit:0021f67
Author:Haoqiong Bian

update grpc impl

Commit:9de2755
Author:Haoqiong Bian

fix metadata service getlayout; add grpc protocol definition

Commit:608b076
Author:Haoqiong Bian

change maven plugin for protobuf, prepare for grpc

Commit:8264988
Author:Guodong Jin

add close for writer

Commit:6c8dcda
Author:Guodong Jin

add support for count(*) in reader

Commit:0cf190f
Author:Haoqiong Bian

the word statistic in comments of proto and core was changed to metric, revert it.

Commit:ef75442
Author:Haoqiong Bian

add metrics server and performance metrics definition.

Commit:9d51f51
Author:Guodong Jin

refactor writer and compactor

Commit:26ac330
Author:Guodong Jin

add integer column reader

Commit:9ae5d9b
Author:Guodong Jin
Committer:Haoqiong Bian

Basic pixel writer

Commit:a4e4d01
Author:Guodong Jin

add pixel positions and chunk stat

Commit:84b173f
Author:Haoqiong Bian
Committer:Guodong Jin

Master

Commit:d1113cd
Author:bianhq

update proto

Commit:5e99d1a
Author:Guodong Jin

add logic in main writer

Commit:2f00062
Author:Guodong Jin

add stat recorder for each type

Commit:7a61d33
Author:Guodong Jin

temp sync during writer

Commit:77bd760
Author:bianhq

init repo

Commit:6153684
Author:Guodong Jin

change proto java package and remove useless indexLength in RowGroupInformation

Commit:07e632b
Author:bianhq

update proto

Commit:1cede73
Author:Guodong Jin

change project name to pixel

Commit:4b6cc3c
Author:bianhq

init repo

Commit:2d10367
Author:Guodong Jin

add empty proto file