These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
Commit: | 8dfef9f | |
---|---|---|
Author: | gaoouyenizi | |
Committer: | GitHub |
add expr string to SparkUDFWrapper (#967) Co-authored-by: lihao29 <lihao29@kuaishou.com>
The documentation is generated from this commit.
Commit: | 97ff610 | |
---|---|---|
Author: | lihao29 |
add expr string to SparkUDFWrapper
The documentation is generated from this commit.
Commit: | f6de2b6 | |
---|---|---|
Author: | Harvey Yue | |
Committer: | GitHub |
Expect sha2 function result will be consistent with spark (#966)
Commit: | 6845c44 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports WindowGroupLimitExec (#957) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | f4e26b5 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports WindowGroupLimitExec
Commit: | 6e9098e | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
code refactoring and bug fixes (#952)
* fix rss writer: do not commit when native shuffle writing has not succeeded. fix unsafe lifetime error in spark UDAF wrapper. fix hash join and aggregate exec metrics when falling back to sorted. refactor UnionExec to support PartitionerAwareUnionRDD. refactor celeborn reader: remove hard copied code from celeborn.
* fix EmptyRDD error with empty local table scan

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | c4c57da | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
fix rss writer: do not commit when native shuffle writing has not succeeded. fix unsafe lifetime error in spark UDAF wrapper. fix hash join and aggregate exec metrics when falling back to sorted. refactor UnionExec to support PartitionerAwareUnionRDD. refactor celeborn reader: remove hard copied code from celeborn.
Commit: | c343f0b | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor UnionExec to support PartitionerAwareUnionRDD. fix unsafe lifetime error in spark UDAF wrapper.
Commit: | 26eef94 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor UnionExec to support PartitionerAwareUnionRDDPartition
Commit: | 3316f0d | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
convert scalar value using arrow ipc, fixing unsupported map type (#938). fix GetIndexField with scalar value. fix arrow writer with map type. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | cb655ec | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
convert scalar value using arrow ipc, fixing unsupported map type. fix GetIndexField with scalar value. fix arrow writer with map type.
Commit: | 5952072 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
NativeConverters adds aggregate function return type (#930). Coalesce adds params casting. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | c918c08 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
NativeConverters adds aggregate function return type. Coalesce adds params casting.
Commit: | 3c40f39 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
rewrite UnionExec and support auto type casting (#927) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 660b30c | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
rewrite UnionExec and support auto type casting
Commit: | 09ce344 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
ProjectExec adds cast automatically when data types not matched (#916) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | de4c260 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
ProjectExec adds cast automatically when data types not matched
Commit: | 2c2f0a9 | |
---|---|---|
Author: | gy11233 | |
Committer: | GitHub |
Supports UDAF and other aggregate functions not implemented (#848)
Commit: | c5a1ea8 | |
---|---|---|
Author: | guoying06 |
format name and schema
Commit: | 9d88168 | |
---|---|---|
Author: | guoying06 |
optimize
Commit: | e3ba97b | |
---|---|---|
Author: | guoying06 | |
Committer: | guoying06 |
Dev udaf new merge NativeConverters
Commit: | f8f2097 | |
---|---|---|
Author: | guoying06 | |
Committer: | guoying06 |
init DeclarativeAggregate udaf. Resolving conflicted_file ArrowFFIExporter
Commit: | a110e53 | |
---|---|---|
Author: | gy11233 | |
Committer: | GitHub |
Supports range partitioning (#734) Co-authored-by: guoying06 <guoying06@kuaishou.com>
Commit: | d53d1b3 | |
---|---|---|
Author: | guoying06 |
update range partition
Commit: | 01108e3 | |
---|---|---|
Author: | guoying06 |
update range partition
Commit: | 9e459f8 | |
---|---|---|
Author: | gy11233 | |
Committer: | GitHub |
Dev repartitioning (#693)
* add round robin shuffle
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin log info
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update roundrobin
* update partition proto
* update partition proto
* update sort before round robin
* update sort before round robin
* update round robin
* update round robin
* update sort_batch_by_partition_id test
* update sort_batch_by_partition_id test

Co-authored-by: guoying06 <guoying06@kuaishou.com>
Commit: | b83338f | |
---|---|---|
Author: | guoying06 |
update partition proto
Commit: | b9af073 | |
---|---|---|
Author: | guoying06 |
update partition proto
Commit: | 009d904 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
optimize bloom filter (#620) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | dd59e4d | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
optimize bloom filter
Commit: | 4027465 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
update to datafusion-v42/arrow-v53/arrow-java-v16 (#574) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 6a68b11 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
update to datafusion-v42/arrow-v53/arrow-java-v16
Commit: | a0d4b20 | |
---|---|---|
Author: | Harvey Yue | |
Committer: | GitHub |
support native scan orc format (#544) Co-authored-by: Zhang Li <richselian@gmail.com>
Commit: | c2cc15f | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports bloom filter join (#532). fix decimal issue. supports spark351. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | a023a41 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports bloom filter join. fix decimal issue. supports spark351.
Commit: | baa06d6 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports native shuffled hash join (#509)
* supports native shuffled hash join
* supports distinct hash map during semi/anti join

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | a153dac | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports native shuffled hash join
Commit: | 5fff824 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
release version 3.0.0 (#506)
* debug
* supports using spark.io.compression.codec for shuffle/broadcast compression
* use cached parquet metadata
* remove out-of-date codes
* refactor native broadcast to avoid duplicated broadcast jobs
* fix in_list conversion in from_proto.rs
* supports date type casting
* release version 3.0.0. refactor join implementations to support existence joins and BHJ building hash map on driver side. supports spark333 batch shuffle reading. update rust-toolchain to latest nightly version. other minor improvements. update docs.

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 8e4b4cd | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
release version 3.0.0. refactor join implementations to support existence joins and BHJ building hash map on driver side. supports spark333 batch shuffle reading. update rust-toolchain to latest nightly version. other minor improvements. update docs.
Commit: | ca48cd1 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
release version 3.0.0
Commit: | 5e33279 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
joins refactoring
Commit: | 9fbff78 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
joins refactoring
Commit: | abd0308 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports BHJ in blaze
Commit: | a905014 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor sort-merge join
Commit: | 696a6a3 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports UDTF fallbacks to spark (#483) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 438de89 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports UDTF fallbacks to spark
Commit: | 753be42 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports brickhouse UDF/UDAFs: array_union, collect, combine_unique (#479) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | f064b9c | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports brickhouse UDF/UDAFs: array_union, collect, combine_unique
Commit: | 480148a | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports parquet sink with zstd compression level. (#471) use stable sorting for dynamic partitions. fix parquet sink metrics. code formatting. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 59fae25 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports parquet sink with zstd compression level. use stable sorting for dynamic partitions. fix parquet sink metrics. code formatting.
Commit: | c84ad6a | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
update datafusion/arrow version to v36/v50. (#428) delegate sort-merge join to datafusion implementation. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | b7beb9e | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
update datafusion/arrow version to v36/v50. delegate sort-merge join to datafusion implementation.
Commit: | a218cd6 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
fix acc loader/saver for AggDynScalar (#413) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | f9657b9 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
fix acc loader/saver for AggDynScalar
Commit: | 07cc20d | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
implement get_json udf with sonic-rs (with serde-json fallback). (#399) implement json_tuple udtf. get_json flattens nested arrays (keep consistent with hive UDFJson). Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 093747d | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
implement get_json udf with sonic-rs (with serde-json fallback). implement json_tuple udtf. get_json flattens nested arrays (keep consistent with hive UDFJson).
Commit: | aa89239 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
implement get_json udf with sonic-rs (with serde-json fallback). implement json_tuple udtf. get_json flattens nested arrays (keep consistent with hive UDFJson).
Commit: | 5dda9a0 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizations.
Commit: | e7d9082 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports compressing multiple batches in ipc format. (#392) improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizations. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 7d0b947 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizations.
Commit: | 25c2ce3 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports compressing multiple batches in ipc format. improved batch_serde format for better compression. implement radix tournament tree for linear time k-way merging. implement accumulator store. other minor refactoring and optimizations.
Commit: | 53d2cca | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
close FSDataOutputStream after writing finished (#363). supports writing parquet table with dynamic partitions. Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | b557acf | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
close FSDataOutputStream after writing finished. supports writing parquet table with dynamic partitions.
Commit: | 50fd3d0 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
minor fixes and refactoring (#333)
* code refactoring. supports partial aggregate skipping. fix incorrect assertion when converting concat_ws function. fix ffi "not all nodes and buffers were consumed" issues.
* supports nested type hashing. supports nested type array() function.
* use arrow snapshot version

Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 4947ed5 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
code refactoring. supports partial aggregate skipping. fix incorrect assertion when converting concat_ws function. fix ffi "not all nodes and buffers were consumed" issues.
Commit: | 3440789 | |
---|---|---|
Author: | Zhang Li | |
Committer: | GitHub |
supports native broadcast nested loop join (#282) Co-authored-by: zhangli20 <zhangli20@kuaishou.com>
Commit: | 92cc27b | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports native broadcast nested loop join
Commit: | 3b48eaa | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
add shims spark-3.3.3
Commit: | f6d50a1 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor SMJ to support post filters. add BlazeConf.
Commit: | a5c4cdc | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
native parquet writer
Commit: | c29917b | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
supports combining filter and projection exec. supports cached exprs evaluator.
Commit: | fb3d74d | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports native first() aggregator. supports scalar map value. refactor agg_buf by adding AggDynScalar.
Commit: | 3214fd5 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
convert If to CaseWhen expr
Commit: | a3711ff | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
use sc_and/or from datafusion v25-blaze
Commit: | 85de466 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
remove datafusion-ext-file-format and move parquet-exec to datafusion-ext-plans
Commit: | 7ba14bb | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports converting literal array values
Commit: | dd42e51 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
new broadcast join
Commit: | 7a4bac0 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports name_batch in named_struct
Commit: | 6383958 | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
Map entries
Commit: | f7a25ed | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports all timestamp units. use postcard to serialize/deserialize data types.
Commit: | 737ab73 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
next KDev_MR_link:https://ksurl.cn/SMbgH7by
Commit: | 8f37ecc | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
reduce partitions size in parquet scan
Commit: | beb0110 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
implements native GenerateExec
Commit: | bc312b3 | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
1. update datafusion version from 15.0.0 to 20.0.0
2. update arrow version from 28.0.0 to 34.0.0
3. update parquet version from 28.0.0 to 34.0.0
4. use new method to convert like expr
5. concat_batch method fix (skip name check)
6. decimal compute in add/minus/multiply/divide: cast to double and then pass to datafusion
Commit: | 08f71d8 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
remove SortMergeJoin.null_equals_null (always false in spark)
Commit: | a8e3df7 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports native SortAggregate
Commit: | 32c8264 | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
support rss shuffle for blaze
Commit: | 3ee8c37 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports spillable aggregation
Commit: | ed5f1e4 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
update datafusion to 15.0.0
Commit: | 1475c19 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
supports native expand exec
Commit: | 8dcf25a | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
implement short-circuiting logical operators. fix aggregation transforming with inconvertible result expressions.
Commit: | 721a355 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
add native if expression
Commit: | fe7e1e2 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
coalesce batches before shuffle write and ipc write
Commit: | ded3178 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor: split out datafusion-ext-common, datafusion-ext-plans
Commit: | 2992bf2 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor: rename package plan-serde to blaze-serde
Commit: | 7314bcd | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
update arrow/datafusion to latest revision
Commit: | 7d24bd8 | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
add string expr
Commit: | 42fb286 | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
refactor SparkFallbackToJvmExpr to SparkExpressionWrapperExpr
Commit: | 5893a1a | |
---|---|---|
Author: | zhangli20 | |
Committer: | zhangli20 |
implement native limits exec
Commit: | 24f4066 | |
---|---|---|
Author: | lihao29 | |
Committer: | zhangli20 |
fix error for partition column data type does not match