The Protocol Buffers files changed in the following commits (only the last 100 relevant commits are shown):
| Commit: | 8dfef9f | |
|---|---|---|
| Author: | gaoouyenizi | |
| Committer: | GitHub | |
add expr string to SparkUDFWrapper (#967) Co-authored-by: lihao29 <lihao29@kuaishou.com>
The documentation is generated from this commit.
| Commit: | 97ff610 | |
|---|---|---|
| Author: | lihao29 | |
add expr string to SparkUDFWrapper
The documentation is generated from this commit.
| Commit: | f6de2b6 | |
|---|---|---|
| Author: | Harvey Yue | |
| Committer: | GitHub | |
Expect sha2 function result will be consistent with spark (#966)
| Commit: | 6845c44 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports WindowGroupLimitExec (#957) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | f4e26b5 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports WindowGroupLimitExec
| Commit: | 6e9098e | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
code refactoring and bug fixes (#952)

* fix rss writer: do not commit when native shuffle writing is not succeeded.
  fix unsafe lifetime error in spark UDAF wrapper.
  fix hash join and aggregate exec metrics when failed back to sorted
  refactor UnionExec to support PartitionerAwareUnionRDD.
  refactor celeborn reader: remove hard copied code from celeborn.
* fix EmptyRDD error with empty local table scan

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | c4c57da | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
fix rss writer: do not commit when native shuffle writing is not succeeded. fix unsafe lifetime error in spark UDAF wrapper. fix hash join and aggregate exec metrics when failed back to sorted refactor UnionExec to support PartitionerAwareUnionRDD. refactor celeborn reader: remove hard copied code from celeborn.
| Commit: | c343f0b | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor UnionExec to support PartitionerAwareUnionRDD. fix unsafe lifetime error in spark UDAF wrapper.
| Commit: | 26eef94 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor UnionExec to support PartitionerAwareUnionRDDPartition
| Commit: | 3316f0d | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
convert scalar value using arrow ipc, fixing unsupported map type (#938) fix GetIndexField with scalar value fix arrow writer with map type Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | cb655ec | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
convert scalar value using arrow ipc, fixing unsupported map type fix GetIndexField with scalar value fix arrow writer with map type
| Commit: | 5952072 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
NativeConverters adds aggregate function return type (#930) Coalesce adds params casting Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | c918c08 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
NativeConverters adds aggregate function return type Coalesce adds params casting
| Commit: | 3c40f39 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
rewrite UnionExec and support auto type casting (#927) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 660b30c | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
rewrite UnionExec and support auto type casting
| Commit: | 09ce344 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
ProjectExec adds cast automatically when data types not matched (#916) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | de4c260 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
ProjectExec adds cast automatically when data types not matched
| Commit: | 2c2f0a9 | |
|---|---|---|
| Author: | gy11233 | |
| Committer: | GitHub | |
Supports UDAF and other aggregate functions not implemented (#848)
| Commit: | c5a1ea8 | |
|---|---|---|
| Author: | guoying06 | |
formate name and schema
| Commit: | 9d88168 | |
|---|---|---|
| Author: | guoying06 | |
optimize
| Commit: | e3ba97b | |
|---|---|---|
| Author: | guoying06 | |
| Committer: | guoying06 | |
Dev udaf new merge NativeConverters
| Commit: | f8f2097 | |
|---|---|---|
| Author: | guoying06 | |
| Committer: | guoying06 | |
init DeclarativeAggregate udaf Resolving conflicted_file ArrowFFIExporter
| Commit: | a110e53 | |
|---|---|---|
| Author: | gy11233 | |
| Committer: | GitHub | |
Supports range partitioning (#734) Co-authored-by: guoying06 <guoying06@kuaishou.com>
| Commit: | d53d1b3 | |
|---|---|---|
| Author: | guoying06 | |
update range partition
| Commit: | 01108e3 | |
|---|---|---|
| Author: | guoying06 | |
update range partition
| Commit: | 9e459f8 | |
|---|---|---|
| Author: | gy11233 | |
| Committer: | GitHub | |
Dev repartitioning (#693)

* add round robin shuffle
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin log info
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update partition proto
* update partition proto
* update sort before round robin
* update sort before round robin
* update round robin
* update round robin
* update sort_batch_by_partition_id test
* update sort_batch_by_partition_id test

Co-authored-by: guoying06 <guoying06@kuaishou.com>
| Commit: | b83338f | |
|---|---|---|
| Author: | guoying06 | |
update partition proto
| Commit: | b9af073 | |
|---|---|---|
| Author: | guoying06 | |
update partition proto
| Commit: | 009d904 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
optimize bloom filter (#620) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | dd59e4d | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
optimize bloom filter
| Commit: | 4027465 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
update to datafusion-v42/arrow-v53/arrow-java-v16 (#574) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 6a68b11 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
update to datafusion-v42/arrow-v53/arrow-java-v16
| Commit: | a0d4b20 | |
|---|---|---|
| Author: | Harvey Yue | |
| Committer: | GitHub | |
support native scan orc format (#544) Co-authored-by: Zhang Li <richselian@gmail.com>
| Commit: | c2cc15f | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports bloom filter join (#532) fix decimal issue supports spark351 Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | a023a41 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports bloom filter join fix decimal issue supports spark351
| Commit: | baa06d6 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports native shuffled hash join (#509)

* supports native shuffled hash join
* supports distinct hash map during semi/anti join

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | a153dac | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports native shuffled hash join
| Commit: | 5fff824 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
release version 3.0.0 (#506)

* debug
* supports using spark.io.compression.codec for shuffle/broadcast compression
* use cached parquet metadata
* remove out-of-date codes
* refactor native broadcast to avoid duplicated broadcast jobs
* fix in_list conversion in from_proto.rs
* supports date type casting
* release version 3.0.0
  refactor join implementations to support existence joins and BHJ building hash map on driver side.
  supports spark333 batch shuffle reading.
  update rust-toolchain to latest nightly version.
  other minor improvements.
  update docs.

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 8e4b4cd | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
release version 3.0.0 refactor join implementations to support existence joins and BHJ building hash map on driver side. supports spark333 batch shuffle reading. update rust-toolchain to latest nightly version. other minor improvements. update docs.
| Commit: | ca48cd1 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
release version 3.0.0
| Commit: | 5e33279 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
joins refactoring
| Commit: | 9fbff78 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
joins refactoring
| Commit: | abd0308 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports BHJ in blaze
| Commit: | a905014 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor sort-merge join
| Commit: | 696a6a3 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports UDTF fallbacks to spark (#483) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 438de89 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports UDTF fallbacks to spark
| Commit: | 753be42 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports brickhouse UDF/UDAFs: array_union, collect, combine_unique (#479) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | f064b9c | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports brickhouse UDF/UDAFs: array_union, collect, combine_unique
| Commit: | 480148a | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports parquet sink with zstd compression level. (#471) use stable sorting for dynamic partitions. fix parquet sink metrics. code formatting. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 59fae25 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports parquet sink with zstd compression level. use stable sorting for dynamic partitions. fix parquet sink metrics. code formatting.
| Commit: | c84ad6a | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
update datafusion/arrow version to v36/v50. (#428) delegate sort-merge join to datafusion implementation. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | b7beb9e | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
update datafusion/arrow version to v36/v50. delegate sort-merge join to datafusion implementation.
| Commit: | a218cd6 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
fix acc loader/saver for AggDynScalar (#413) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | f9657b9 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
fix acc loader/saver for AggDynScalar
| Commit: | 07cc20d | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
implement get_json udf with sonic-rs (with serde-json fail-backing). (#399) implement json_tuple udtf. get_json flats nested arrays (keep consistent with hive UDFJson). Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 093747d | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
implement get_json udf with sonic-rs (with serde-json fail-backing). implement json_tuple udtf. get_json flats nested arrays (keep consistent with hive UDFJson).
| Commit: | aa89239 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
implement get_json udf with sonic-rs (with serde-json fail-backing). implement json_tuple udtf. get_json flats nested arrays (keep consistent with hive UDFJson).
| Commit: | 5dda9a0 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizating.
| Commit: | e7d9082 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports compressing multiple batches in ipc format. (#392) improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizating. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 7d0b947 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizating.
| Commit: | 25c2ce3 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizating.
| Commit: | 53d2cca | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
close FSDataOutputStream after writing finished (#363) supports writing parquet table with dynamic partitions Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | b557acf | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
close FSDataOutputStream after writing finished supports writing parquet table with dynamic partitions
| Commit: | 50fd3d0 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
minor fixes and reafcatoring (#333)

* code refactoring
  supports partial aggregate skipping
  fix incorrect assertion when converting concat_ws function
  fix ffi "not all nodes and buffers were consumed" issues
* supports nested type hashing
  supports nested type array() function
* use arrow snapshot version

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 4947ed5 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
code refactoring supports partial aggregate skipping fix incorrect assertion when converting concat_ws function fix ffi "not all nodes and buffers were consumed" issues
| Commit: | 3440789 | |
|---|---|---|
| Author: | Zhang Li | |
| Committer: | GitHub | |
supports native broadcast nested loop join (#282) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
| Commit: | 92cc27b | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports native broadcast nested loop join
| Commit: | 3b48eaa | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
add shims spark-3.3.3
| Commit: | f6d50a1 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor SMJ to support post filters add BlazeConf
| Commit: | a5c4cdc | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
native parquet writer
| Commit: | c29917b | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
supports combining filter and projection exec. supports cached exprs evaluator.
| Commit: | fb3d74d | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports native first() aggregator. supports scalar map value. refactor agg_buf by adding AggDynScalar.
| Commit: | 3214fd5 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
convert If to CaseWhen expr
| Commit: | a3711ff | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
use sc_and/or from datafusion v25-blaze
| Commit: | 85de466 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
remove datafusion-ext-file-format and move parquet-exec to datafusion-ext-plans
| Commit: | 7ba14bb | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports converting literal array values
| Commit: | dd42e51 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
new broadcast join
| Commit: | 7a4bac0 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports name_batch in named_struct
| Commit: | 6383958 | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
Map entries
| Commit: | f7a25ed | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports all timestamp units use postcard to serialize/deserialize data types
| Commit: | 737ab73 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
next KDev_MR_link:https://ksurl.cn/SMbgH7by
| Commit: | 8f37ecc | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
reduce partitions size in parquet scan
| Commit: | beb0110 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
implements native GenerateExec
| Commit: | bc312b3 | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
1. update datafusion version from 15.0.0 to 20.0.0
2. update arrow version from 28.0.0 to 34.0.0
3. update parquet version from 28.0.0 to 34.0.0
4. use new method to convert like expr
5. concat_batch method fix(skip name check)
6. decimal compute in add/minus/multipy/divide cast to double and then pass to datafusion
| Commit: | 08f71d8 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
remove SortMergeJoin.null_equals_null (always false in spark)
| Commit: | a8e3df7 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports native SortAggregate
| Commit: | 32c8264 | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
support rss shuffle for blaze
| Commit: | 3ee8c37 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports spillable aggregation
| Commit: | ed5f1e4 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
update datafusion to 15.0.0
| Commit: | 1475c19 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
supports native expand exec
| Commit: | 8dcf25a | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
implement short-circuiting logical operators fix aggregation transforming with inconvertible result expressions
| Commit: | 721a355 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
add native if expression
| Commit: | fe7e1e2 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
coalesce batches before shuffle write and ipc write
| Commit: | ded3178 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor: split out datafusion-ext-common, datafusion-ext-plans
| Commit: | 2992bf2 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor: rename package plan-serde to blaze-serde
| Commit: | 7314bcd | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
update arrow/datafusion to latest revision
| Commit: | 7d24bd8 | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
add string expr
| Commit: | 42fb286 | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
refactor SparkFallbackToJvmExpr to SparkExpressionWrapperExpr
| Commit: | 5893a1a | |
|---|---|---|
| Author: | zhangli20 | |
| Committer: | zhangli20 | |
implement native limits exec
| Commit: | 24f4066 | |
|---|---|---|
| Author: | lihao29 | |
| Committer: | zhangli20 | |
fix error for partition column data type does not match