Proto commits in facebookincubator/velox

The Protocol Buffers files changed in these 11 commits:

2024-02-28

Commit:	09f56d5
Author:	Masha Basmanova	2024-02-27 18:34:31 -0800
Committer:	Facebook GitHub Bot	2024-02-27 18:34:31 -0800

Delete dead codegen code

Reviewed By: xiaoxmeng
Differential Revision: D54269600
fbshipit-source-id: adc6120e85c63678449572ac93f71372de8a13d1

The documentation is generated from this commit.

2023-02-08

Commit:	f54056d
Author:	ZJie1	2023-02-08 12:14:24 -0800
Committer:	Facebook GitHub Bot	2023-02-08 12:14:24 -0800

Update substrait to 0.23.0 (#3959)

Summary: Substrait supports physical join and expr evaluation in v0.23.0

Pull Request resolved: https://github.com/facebookincubator/velox/pull/3959
Reviewed By: Yuhta
Differential Revision: D43123510
Pulled By: pedroerp
fbshipit-source-id: f2c53a7e42a43445eed9c6c8fffa7e53b5106d94

2023-01-13

Commit:	58bb1d7
Author:	Xuedong Luan	2023-01-13 08:04:18 -0800
Committer:	Facebook GitHub Bot	2023-01-13 08:04:18 -0800

Update substrait to 0.22.0 (#3697)

Summary: Update substrait to newest release version 0.22.0

Pull Request resolved: https://github.com/facebookincubator/velox/pull/3697
Reviewed By: Yuhta
Differential Revision: D42497470
Pulled By: mbasmanova
fbshipit-source-id: 4b78f03f1f4c806981a4e4ac36a9008e36b58afe

2022-07-25

Commit:	4988cec
Author:	PenghuiJiao	2022-07-25 14:07:45 -0700
Committer:	Facebook GitHub Bot	2022-07-25 14:07:45 -0700

Update substrait to 0.7.0 (#2002)

Summary: Also, rename tests/data/sub.json to q6_first_stage.json.

Pull Request resolved: https://github.com/facebookincubator/velox/pull/2002
Reviewed By: pedroerp
Differential Revision: D37988016
Pulled By: mbasmanova
fbshipit-source-id: 163e17ba9fbd19f89ba431c8e95797c1486f0dc8

2022-06-23

Commit:	82d7763
Author:	usurai	2022-06-23 13:31:43 -0700
Committer:	Facebook GitHub Bot	2022-06-23 13:31:43 -0700

Add limited ORC reader support (#1746)

Summary: This PR is an attempt to add ORC reader support based on `DwrfReader`. For more context, please refer to https://github.com/facebookincubator/velox/issues/1747.

This PR introduces the following changes:
- Import the ORC `.proto` file from [apache/orc](https://github.com/apache/orc)
- Distinguish ORC and DWRF in `FileFormat` and make `DwrfReader` aware of the file format it is processing
- Add a new `PostScript` class that is shared by DWRF and ORC
- Don't try to read the cache when parsing ORC's `PostScript`
- For ORC's int DIRECT decoding, use RLE instead of `DirectDecoder`

Currently I use an ad hoc method to test the change: reading and printing the content of [apache/orc's example files](https://github.com/apache/orc/tree/main/examples) (excluding corrupted and encrypted ones) with the modified `DwrfReader` and comparing the output with that of [apache/orc's orc-contents](https://orc.apache.org/docs/cpp-tools.html#orc-contents). With the above changes, 24 of the 39 tests emit correct output. For the incorrect or crashed test cases, I think the cause is missing type support (DECIMAL, DATE, etc.) or missing feature support (bloom filter).

<details>
<summary>Detailed test cases</summary>

Test | Correct | Issue | Compression | Type
-- | -- | -- | -- | --
TestOrcFile.testSeek | Y | | zlib | struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>
TestOrcFile.test1 | Y | | zlib | struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>
TestOrcFile.testStringAndBinaryStatistics | Y | | zlib | struct<bytes1:binary,string1:string>
nulls-at-end-snappy | Y | | snappy | struct<_col0:tinyint,_col1:smallint,_col2:int,_col3:bigint,_col4:float,_col5:double,_col6:boolean>
TestStringDictionary.testRowIndex | Y | | zlib | struct<str:string>
TestOrcFile.testMemoryManagementV11 | Y | | none | struct<int1:int,string1:string>
TestOrcFile.columnProjection | Y | | none | struct<int1:int,string1:string>
TestOrcFile.emptyFile | Y | | none | none
TestOrcFile.metaData | Y | | none | struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>
demo-12-zlib | Y | | zlib | struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>
TestOrcFile.testWithoutIndex | Y | | snappy | struct<int1:int,string1:string>
zero | Y | | none | none
TestOrcFile.testPredicatePushdown | Y | | none | struct<int1:int,string1:string>
complextypes_iceberg | Y | | zlib | struct<id:bigint,int_array:array<int>,int_array_array:array<array<int>>,int_map:map<string,int>,int_map_array:array<map<string,int>>,nested_struct:struct<a:int,b:array<int>,c:struct<d:array<array<struct<e:int,f:string>>>>,g:map<string,struct<h:struct<i:array<double>>>>>>
TestOrcFile.testStripeLevelStats | Y | | zlib | struct<int1:int,string1:string>
TestOrcFile.testMemoryManagementV12 | Y | | none | struct<int1:int,string1:string>
TestVectorOrcFile.testLzo | Y | | lzo | struct<x:bigint,y:int,z:bigint>
TestVectorOrcFile.testZstd.0.12 | Y | | zstd | struct<x:bigint,y:int,z:bigint>
demo-11-zlib | Y | | zlib | struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>
TestVectorOrcFile.testLz4 | Y | | lz4 | struct<x:bigint,y:int,z:bigint>
TestOrcFile.testSnappy | Y | | snappy | struct<int1:int,string1:string>
version1999 | Y | | none | struct<>
orc_no_format | Y | | zlib | struct<_col0:array<string>,_col1:map<int,string>,_col2:struct<name:string,score:int>>
demo-11-none | Y | | none | struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>
TestOrcFile.testTimestamp | N | TIMESTAMP | zlib | timestamp
orc_split_elim_new | N | DECIMAL | zlib | struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(16,6),ts:timestamp>
decimal64_v2 | N | DECIMAL | zlib | struct<a:bigint,b:decimal(12,0),c:decimal(20,2),d:decimal(12,2),e:decimal(2,2)>
orc-file-11-format | N | DECIMAL | none | struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>,ts:timestamp,decimal1:decimal(0,0)>
TestOrcFile.testUnionAndTimestamp | N | TIMESTAMP, UNION | none | struct<time:timestamp,union:uniontype<int,string>,decimal:decimal(38,18)>
orc_split_elim | N | DECIMAL | none | struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(0,0),ts:timestamp>
TestOrcFile.testDate1900 | N | TIMESTAMP, DATE | zlib | struct<time:timestamp,date:date>
bad_bloom_filter_1.6.11 | N | bloom filter | zlib | struct<id:bigint,name:string>
TestOrcFile.testDate2038 | N | TIMESTAMP, DATE | zlib | struct<time:timestamp,date:date>
over1k_bloom | N | DECIMAL, bloom filter | zlib | struct<_col0:tinyint,_col1:smallint,_col2:int,_col3:bigint,_col4:float,_col5:double,_col6:boolean,_col7:string,_col8:timestamp,_col9:decimal(4,2),_col10:binary>
orc_split_elim_cpp | N | TIMESTAMP, DECIMAL | zlib | struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(16,6),ts:timestamp>
decimal64_v2_cplusplus | N | DECIMAL | zlib | struct<a:bigint,b:decimal(10,2),c:decimal(2,2),d:decimal(2,2),e:decimal(2,2)>
bad_bloom_filter_1.6.0 | N | bloom filter | zlib | struct<id:bigint,name:string>
decimal | N | DECIMAL | none | struct<_col0:decimal(10,5)>
orc_index_int_string | N | VARCHAR | zlib | struct<_col0:int,_col1:varchar(4)>

</details>

To print an ORC file:

```
_build/release/velox/examples/velox_example_scan_orc {path_to_orc_file}
```

A more important point I want to discuss is the modeling of `proto`:

> Add a new `PostScript` class that is shared by DWRF and ORC

As we introduce ORC's `.proto` file, a mechanism is needed to switch between the two `.proto` files flexibly. My current solution is to create a class that mimics the `proto` class's interfaces (I only implemented `PostScript` in this PR) and contains attributes of both DWRF and ORC.

Pull Request resolved: https://github.com/facebookincubator/velox/pull/1746
Reviewed By: HuamengJiang
Differential Revision: D37220861
Pulled By: zzhao0
fbshipit-source-id: 545c4834e0816220de4d3d93b7146589da62b349
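
The wrapper idea from the commit message above (one `PostScript`-like class that mimics the generated interface and carries attributes of both formats) might look roughly like the following. This is only an illustrative sketch, not the code from the PR: `DwrfPostScriptStub`, `OrcPostScriptStub`, `FileFormatSketch`, and `PostScriptSketch` are hypothetical names, the stubs stand in for the protobuf-generated classes, and the choice of `cacheSize` and `metadataLength` as the format-specific fields is an assumption made for the example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for the two protobuf-generated PostScript messages.
// The real generated classes have more fields and accessor methods.
struct DwrfPostScriptStub {
  std::uint64_t footerLength = 0;
  std::uint64_t cacheSize = 0;  // assumed DWRF-only for this sketch
};

struct OrcPostScriptStub {
  std::uint64_t footerLength = 0;
  std::uint64_t metadataLength = 0;  // assumed ORC-only for this sketch
};

enum class FileFormatSketch { kDwrf, kOrc };

// One class that holds attributes of both formats. Attributes that exist in
// only one format are optional and stay empty for the other format.
class PostScriptSketch {
 public:
  explicit PostScriptSketch(const DwrfPostScriptStub& ps)
      : format_(FileFormatSketch::kDwrf),
        footerLength_(ps.footerLength),
        cacheSize_(ps.cacheSize) {}

  explicit PostScriptSketch(const OrcPostScriptStub& ps)
      : format_(FileFormatSketch::kOrc),
        footerLength_(ps.footerLength),
        metadataLength_(ps.metadataLength) {}

  FileFormatSketch format() const { return format_; }

  // Shared attribute: readers can call this without checking the format.
  std::uint64_t footerLength() const { return footerLength_; }

  // Format-specific attributes: empty when the file is of the other format.
  std::optional<std::uint64_t> cacheSize() const { return cacheSize_; }
  std::optional<std::uint64_t> metadataLength() const { return metadataLength_; }

 private:
  FileFormatSketch format_;
  std::uint64_t footerLength_;
  std::optional<std::uint64_t> cacheSize_;
  std::optional<std::uint64_t> metadataLength_;
};
```

With a wrapper of this shape, a reader can use `footerLength()` for either format and skip cache handling whenever `cacheSize()` is empty, which is one way to express the "don't try to read the cache for ORC" rule listed in the commit.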

2022-05-20

Commit:	1910066
Author:	Jie1 Zhang	2022-05-20 06:03:32 -0700
Committer:	Facebook GitHub Bot	2022-05-20 06:03:32 -0700

Add velox-to-substrait converter (#1212)

Summary: Implement Velox-to-Substrait plan conversion for Values, Filter and Project plan nodes. Expression conversion is limited to a handful of hard-coded functions. Proper integration with the function registry as well as support for other plan nodes will come in a future PR.

Pull Request resolved: https://github.com/facebookincubator/velox/pull/1212
Reviewed By: kgpai
Differential Revision: D36536485
Pulled By: mbasmanova
fbshipit-source-id: 972ce70f6c681bd4a98a44db6b3594514c807530

2022-03-11

Commit:	2ed40db
Author:	Rui Mo	2022-03-11 13:24:22 -0800
Committer:	Facebook GitHub Bot	2022-03-11 13:24:22 -0800

Add Substrait-to-Velox conversion (#1048)

Summary: This PR adds the base layout for Substrait-to-Velox conversion; with this change, that layout is settled. Some Substrait plan representations can now be converted into Velox plans for computing. Support for more conversions will be added in following pull requests.

Based on [Substrait Commit](https://github.com/substrait-io/substrait/commit/9d9805be19c1d606dc2811395f203857db782872). The function names used are based on [Substrait PR](https://github.com/substrait-io/substrait/pull/147).

Pull Request resolved: https://github.com/facebookincubator/velox/pull/1048
Reviewed By: kagamiori
Differential Revision: D34773052
Pulled By: pedroerp
fbshipit-source-id: c4c56a43486519b567aa1d715a0d64ce02a447d1

2021-08-20

Commit:	d4d18cf
Author:	Pedro Eugenio Rocha Pedreira	2021-08-19 23:55:12 -0700
Committer:	Facebook GitHub Bot	2021-08-19 23:56:12 -0700

Fixing leftover copyright headers

Summary: Adding copyright headers to a couple of remaining files.

Reviewed By: amitkdutta
Differential Revision: D30444573
fbshipit-source-id: 2af861a0a9f569707c79e0cb271b3b6b34f31dfd

2021-08-19

Commit:	a081810
Author:	Pedro Eugenio Rocha Pedreira	2021-08-19 15:55:33 -0700
Committer:	Facebook GitHub Bot	2021-08-19 15:56:29 -0700

Rename ChecksumAlgorithm NULL option to NULL_ (#60)

Summary:
Pull Request resolved: https://github.com/facebookincubator/velox/pull/60

gcc on linux (9.3 on Ubuntu 20.04) seems to have a problem with the `NULL` option name and fails the compilation. Renaming the flag to NULL_ instead.

Interestingly enough, all other usages of the flag I could find already use "NULL_", so I'm a little confused about how this worked in the first place:

Reviewed By: zzhao0
Differential Revision: D30396818
fbshipit-source-id: 7def0b77955f76fd45af6235a802cf60ac1240e7
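
The commit message does not say why the `NULL` name breaks compilation. One plausible explanation, offered here as an assumption, is that `NULL` is a preprocessor macro in C and C++ (defined in `<cstddef>` among other headers), so an enumerator spelled `NULL` is replaced by the macro's expansion before the compiler ever sees an identifier. The sketch below uses a hypothetical hand-written enum, `ChecksumAlgorithmSketch`, with a made-up `CRC32` value; it is not the generated Velox code.

```cpp
#include <cstddef>  // defines the NULL macro

// Hypothetical enum for illustration only; the enumerator values are made up.
enum class ChecksumAlgorithmSketch {
  // NULL = 0,  // would not compile: the preprocessor rewrites NULL first
  NULL_ = 0,    // the trailing underscore makes this an ordinary identifier
  CRC32 = 1,
};

// Small helper so the enumerator values can be checked as plain integers.
constexpr int toInt(ChecksumAlgorithmSketch algorithm) {
  return static_cast<int>(algorithm);
}

static_assert(
    toInt(ChecksumAlgorithmSketch::NULL_) == 0,
    "NULL_ keeps the numeric value the NULL option would have had");
```

Because only the spelling changes and the numeric value stays the same, a rename of this kind would not alter how the value is encoded.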

2021-08-09

Commit:	bcff77a
Author:	Michael Shang	2021-08-09 01:59:57 -0700
Committer:	Facebook GitHub Bot	2021-08-09 02:26:26 -0700

Remove a quip link in comments (#2)

Summary:
Pull Request resolved: https://github.com/facebookincubator/Velox/pull/2

Reviewed By: mbasmanova
Differential Revision: D30185776
fbshipit-source-id: 3a8311cdd581c6f95e6a2a79580da387be3b9b18

Commit:	719915b
Author:	facebook-github-bot	2021-08-08 18:39:28 -0700
Committer:	facebook-github-bot	2021-08-08 18:39:41 -0700

Initial commit

fbshipit-source-id: 7c98b4413a035219f8a2949626de33424f6182e5