The Protocol Buffers files changed in these 11 commits:
Commit: | 09f56d5 | |
---|---|---|
Author: | Masha Basmanova | |
Committer: | Facebook GitHub Bot |
Delete dead codegen code Reviewed By: xiaoxmeng Differential Revision: D54269600 fbshipit-source-id: adc6120e85c63678449572ac93f71372de8a13d1
The documentation is generated from this commit.
Commit: | f54056d | |
---|---|---|
Author: | ZJie1 | |
Committer: | Facebook GitHub Bot |
Update substrait to 0.23.0 (#3959) Summary: Substrait supports physical join and expr evaluation in v0.23.0 Pull Request resolved: https://github.com/facebookincubator/velox/pull/3959 Reviewed By: Yuhta Differential Revision: D43123510 Pulled By: pedroerp fbshipit-source-id: f2c53a7e42a43445eed9c6c8fffa7e53b5106d94
Commit: | 58bb1d7 | |
---|---|---|
Author: | Xuedong Luan | |
Committer: | Facebook GitHub Bot |
Update substrait to 0.22.0 (#3697) Summary: Update substrait to newest release version 0.22.0 Pull Request resolved: https://github.com/facebookincubator/velox/pull/3697 Reviewed By: Yuhta Differential Revision: D42497470 Pulled By: mbasmanova fbshipit-source-id: 4b78f03f1f4c806981a4e4ac36a9008e36b58afe
Commit: | 4988cec | |
---|---|---|
Author: | PenghuiJiao | |
Committer: | Facebook GitHub Bot |
Update substrait to 0.7.0 (#2002) Summary: Also, rename tests/data/sub.json to q6_first_stage.json. Pull Request resolved: https://github.com/facebookincubator/velox/pull/2002 Reviewed By: pedroerp Differential Revision: D37988016 Pulled By: mbasmanova fbshipit-source-id: 163e17ba9fbd19f89ba431c8e95797c1486f0dc8
Commit: | 82d7763 | |
---|---|---|
Author: | usurai | |
Committer: | Facebook GitHub Bot |
Add limited ORC reader support (#1746)

Summary: This PR is an attempt to add ORC reader support based on `DwrfReader`. For more context, please refer to https://github.com/facebookincubator/velox/issues/1747.

This PR introduces the following changes:

- Import the `.proto` file of ORC from [apache/orc](https://github.com/apache/orc)
- Distinguish ORC and DWRF in `FileFormat` and make `DwrfReader` aware of the file format it's processing
- Add the new `PostScript` class that is shared by DWRF and ORC
- Don't try to read cache when parsing ORC's `PostScript`
- For ORC's int DIRECT decoding, use RLE instead of `DirectDecoder`

Currently I use an ad hoc method to test the change: I read and print the content of [apache/orc's example files](https://github.com/apache/orc/tree/main/examples) (excluding corrupted and encrypted ones) using the modified `DwrfReader` and compare the output with the output of [apache/orc's orc-contents](https://orc.apache.org/docs/cpp-tools.html#orc-contents). With the above changes, 24 out of 39 tests emit correct output. For the incorrect or crashed test cases, I think the cause is missing type support (DECIMAL, DATE, etc.) or missing feature support (bloom filter).

<details>
<summary>Detailed test cases</summary>

Test | Correct | Issue | Compression | Type
-- | -- | -- | -- | --
TestOrcFile.testSeek | Y | | zlib | `struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>`
TestOrcFile.test1 | Y | | zlib | `struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>`
TestOrcFile.testStringAndBinaryStatistics | Y | | zlib | `struct<bytes1:binary,string1:string>`
nulls-at-end-snappy | Y | | snappy | `struct<_col0:tinyint,_col1:smallint,_col2:int,_col3:bigint,_col4:float,_col5:double,_col6:boolean>`
TestStringDictionary.testRowIndex | Y | | zlib | `struct<str:string>`
TestOrcFile.testMemoryManagementV11 | Y | | none | `struct<int1:int,string1:string>`
TestOrcFile.columnProjection | Y | | none | `struct<int1:int,string1:string>`
TestOrcFile.emptyFile | Y | | none | none
TestOrcFile.metaData | Y | | none | `struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>>`
demo-12-zlib | Y | | zlib | `struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>`
TestOrcFile.testWithoutIndex | Y | | snappy | `struct<int1:int,string1:string>`
zero | Y | | none | none
TestOrcFile.testPredicatePushdown | Y | | none | `struct<int1:int,string1:string>`
complextypes_iceberg | Y | | zlib | `struct<id:bigint,int_array:array<int>,int_array_array:array<array<int>>,int_map:map<string,int>,int_map_array:array<map<string,int>>,nested_struct:struct<a:int,b:array<int>,c:struct<d:array<array<struct<e:int,f:string>>>>,g:map<string,struct<h:struct<i:array<double>>>>>>`
TestOrcFile.testStripeLevelStats | Y | | zlib | `struct<int1:int,string1:string>`
TestOrcFile.testMemoryManagementV12 | Y | | none | `struct<int1:int,string1:string>`
TestVectorOrcFile.testLzo | Y | | lzo | `struct<x:bigint,y:int,z:bigint>`
TestVectorOrcFile.testZstd.0.12 | Y | | zstd | `struct<x:bigint,y:int,z:bigint>`
demo-11-zlib | Y | | zlib | `struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>`
TestVectorOrcFile.testLz4 | Y | | lz4 | `struct<x:bigint,y:int,z:bigint>`
TestOrcFile.testSnappy | Y | | snappy | `struct<int1:int,string1:string>`
version1999 | Y | | none | `struct<>`
orc_no_format | Y | | zlib | `struct<_col0:array<string>,_col1:map<int,string>,_col2:struct<name:string,score:int>>`
demo-11-none | Y | | none | `struct<_col0:int,_col1:string,_col2:string,_col3:string,_col4:int,_col5:string,_col6:int,_col7:int,_col8:int>`
TestOrcFile.testTimestamp | N | TIMESTAMP | zlib | timestamp
orc_split_elim_new | N | DECIMAL | zlib | `struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(16,6),ts:timestamp>`
decimal64_v2 | N | DECIMAL | zlib | `struct<a:bigint,b:decimal(12,0),c:decimal(20,2),d:decimal(12,2),e:decimal(2,2)>`
orc-file-11-format | N | DECIMAL | none | `struct<boolean1:boolean,byte1:tinyint,short1:smallint,int1:int,long1:bigint,float1:float,double1:double,bytes1:binary,string1:string,middle:struct<list:array<struct<int1:int,string1:string>>>,list:array<struct<int1:int,string1:string>>,map:map<string,struct<int1:int,string1:string>>,ts:timestamp,decimal1:decimal(0,0)>`
TestOrcFile.testUnionAndTimestamp | N | TIMESTAMP, UNION | none | `struct<time:timestamp,union:uniontype<int,string>,decimal:decimal(38,18)>`
orc_split_elim | N | DECIMAL | none | `struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(0,0),ts:timestamp>`
TestOrcFile.testDate1900 | N | TIMESTAMP, DATE | zlib | `struct<time:timestamp,date:date>`
bad_bloom_filter_1.6.11 | N | bloom filter | zlib | `struct<id:bigint,name:string>`
TestOrcFile.testDate2038 | N | TIMESTAMP, DATE | zlib | `struct<time:timestamp,date:date>`
over1k_bloom | N | DECIMAL, bloom filter | zlib | `struct<_col0:tinyint,_col1:smallint,_col2:int,_col3:bigint,_col4:float,_col5:double,_col6:boolean,_col7:string,_col8:timestamp,_col9:decimal(4,2),_col10:binary>`
orc_split_elim_cpp | N | TIMESTAMP, DECIMAL | zlib | `struct<userid:bigint,string1:string,subtype:double,decimal1:decimal(16,6),ts:timestamp>`
decimal64_v2_cplusplus | N | DECIMAL | zlib | `struct<a:bigint,b:decimal(10,2),c:decimal(2,2),d:decimal(2,2),e:decimal(2,2)>`
bad_bloom_filter_1.6.0 | N | bloom filter | zlib | `struct<id:bigint,name:string>`
decimal | N | DECIMAL | none | `struct<_col0:decimal(10,5)>`
orc_index_int_string | N | VARCHAR | zlib | `struct<_col0:int,_col1:varchar(4)>`

</details>

To print an ORC file:

```
_build/release/velox/examples/velox_example_scan_orc {path_to_orc_file}
```

A more important point I want to discuss is the modeling of `proto`:

> Adding the new `PostScript` class that is shared by DWRF and ORC

As we introduce ORC's `.proto` file, a mechanism is needed to switch between the two `.proto` files flexibly. My current solution is to create a class that mimics the `proto` class's interfaces (I only implemented `PostScript` in this PR) and contains the attributes of both DWRF and ORC.

Pull Request resolved: https://github.com/facebookincubator/velox/pull/1746 Reviewed By: HuamengJiang Differential Revision: D37220861 Pulled By: zzhao0 fbshipit-source-id: 545c4834e0816220de4d3d93b7146589da62b349
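The wrapper idea described in this commit message (one `PostScript` class that mimics the generated `proto` interface and carries the attributes of both formats) could look roughly like the following sketch. It is not the code from the PR: the two `*Proto` structs are hypothetical stand-ins for the generated protobuf classes, and the field names (`footerLength`, `cacheSize`, `metadataLength`) and the `shouldReadCache` helper are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the generated protobuf messages.
// Field names are assumptions; the real .proto files define more fields.
struct DwrfPostScriptProto {
  std::uint64_t footerLength;
  std::uint64_t cacheSize; // assumed to exist only in DWRF
};

struct OrcPostScriptProto {
  std::uint64_t footerLength;
  std::uint64_t metadataLength; // assumed to exist only in ORC
};

enum class FileFormat { DWRF, ORC };

// One class with the union of both formats' attributes. Callers query
// hasX() before reading a format-specific attribute.
class PostScript {
 public:
  constexpr explicit PostScript(const DwrfPostScriptProto& ps)
      : format_(FileFormat::DWRF),
        footerLength_(ps.footerLength),
        hasCacheSize_(true),
        cacheSize_(ps.cacheSize),
        hasMetadataLength_(false),
        metadataLength_(0) {}

  constexpr explicit PostScript(const OrcPostScriptProto& ps)
      : format_(FileFormat::ORC),
        footerLength_(ps.footerLength),
        hasCacheSize_(false),
        cacheSize_(0),
        hasMetadataLength_(true),
        metadataLength_(ps.metadataLength) {}

  constexpr FileFormat fileFormat() const { return format_; }
  constexpr std::uint64_t footerLength() const { return footerLength_; }
  constexpr bool hasCacheSize() const { return hasCacheSize_; }
  constexpr std::uint64_t cacheSize() const { return cacheSize_; }
  constexpr bool hasMetadataLength() const { return hasMetadataLength_; }
  constexpr std::uint64_t metadataLength() const { return metadataLength_; }

 private:
  FileFormat format_;
  std::uint64_t footerLength_;
  bool hasCacheSize_;
  std::uint64_t cacheSize_;
  bool hasMetadataLength_;
  std::uint64_t metadataLength_;
};

// Hypothetical helper: skip the cache for ORC files, which have none here.
constexpr bool shouldReadCache(const PostScript& ps) {
  return ps.fileFormat() == FileFormat::DWRF && ps.hasCacheSize() &&
      ps.cacheSize() > 0;
}
```

A reader written against `PostScript` then needs only one code path, for example `if (shouldReadCache(ps)) { ... }`, instead of branching on which generated class it was handed. The trade-off is that every new format-specific attribute has to be added to the wrapper by hand.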
Commit: | 1910066 | |
---|---|---|
Author: | Jie1 Zhang | |
Committer: | Facebook GitHub Bot |
Add velox-to-substrait converter (#1212) Summary: Implement Velox-to-Substrait plan conversion for Values, Filter and Project plan nodes. Expression conversion is limited to a handful of hard-coded functions. Proper integration with the function registry as well as support for other plan nodes will come in a future PR. Pull Request resolved: https://github.com/facebookincubator/velox/pull/1212 Reviewed By: kgpai Differential Revision: D36536485 Pulled By: mbasmanova fbshipit-source-id: 972ce70f6c681bd4a98a44db6b3594514c807530
Commit: | 2ed40db | |
---|---|---|
Author: | Rui Mo | |
Committer: | Facebook GitHub Bot |
Add Substrait-to-Velox conversion (#1048) Summary: This PR adds the base layout for Substrait-to-Velox conversion. With this change, some Substrait plan representations can be converted into Velox plans for computing. Support for more conversions will be added in follow-up pull requests. Based on [Substrait Commit](https://github.com/substrait-io/substrait/commit/9d9805be19c1d606dc2811395f203857db782872). The function names used are based on: [Substrait PR](https://github.com/substrait-io/substrait/pull/147). Pull Request resolved: https://github.com/facebookincubator/velox/pull/1048 Reviewed By: kagamiori Differential Revision: D34773052 Pulled By: pedroerp fbshipit-source-id: c4c56a43486519b567aa1d715a0d64ce02a447d1
Commit: | d4d18cf | |
---|---|---|
Author: | Pedro Eugenio Rocha Pedreira | |
Committer: | Facebook GitHub Bot |
Fixing leftover copyright headers Summary: Adding copyright headers to a couple of remaining files. Reviewed By: amitkdutta Differential Revision: D30444573 fbshipit-source-id: 2af861a0a9f569707c79e0cb271b3b6b34f31dfd
Commit: | a081810 | |
---|---|---|
Author: | Pedro Eugenio Rocha Pedreira | |
Committer: | Facebook GitHub Bot |
Rename ChecksumAlgorithm NULL option to NULL_ (#60) Summary: Pull Request resolved: https://github.com/facebookincubator/velox/pull/60 gcc on linux (9.3 on Ubuntu 20.04) seems to have a problem with the `NULL` option name and fails the compilation. Renaming the flag to NULL_ instead. Interestingly enough, all other usages of the flag I could find already use "NULL_", so I'm a little confused about how this worked in the first place: Reviewed By: zzhao0 Differential Revision: D30396818 fbshipit-source-id: 7def0b77955f76fd45af6235a802cf60ac1240e7
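One plausible explanation for the failure, which the commit message does not state, is that `NULL` is a standard C and C++ macro: the preprocessor replaces it before the compiler sees the enumerator name. The sketch below illustrates the renamed form. The enum is a hand-written stand-in for the generated code, and the enumerators other than `NULL_`, their numeric values, and the `hasChecksum` helper are assumptions for illustration.

```cpp
#include <cstddef> // defines the NULL macro

// Hand-written stand-in for the generated enum (assumed shape).
enum class ChecksumAlgorithm {
  // NULL = 0,  // would not compile once NULL is defined as a macro:
  //            // the preprocessor rewrites the name before parsing
  NULL_ = 0, // trailing underscore avoids the macro
  CRC32 = 1, // assumed enumerator
  XXHASH = 2, // assumed enumerator
};

// Hypothetical helper: true when a real checksum algorithm is selected.
constexpr bool hasChecksum(ChecksumAlgorithm algorithm) {
  return algorithm != ChecksumAlgorithm::NULL_;
}
```

Whether a given toolchain rejects the unrenamed name depends on which headers define `NULL` before the enum is parsed, which may be why the problem only surfaced with some compilers.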
Commit: | bcff77a | |
---|---|---|
Author: | Michael Shang | |
Committer: | Facebook GitHub Bot |
Remove a quip link in comments (#2) Summary: Pull Request resolved: https://github.com/facebookincubator/Velox/pull/2 Reviewed By: mbasmanova Differential Revision: D30185776 fbshipit-source-id: 3a8311cdd581c6f95e6a2a79580da387be3b9b18
Commit: | 719915b | |
---|---|---|
Author: | facebook-github-bot | |
Committer: | facebook-github-bot |
Initial commit fbshipit-source-id: 7c98b4413a035219f8a2949626de33424f6182e5