Proto commits in daphne-eu/daphne

The Protocol Buffers files changed in these 16 commits:

Commit:d735331
Author:Aristotelis Vontzalidis
Committer:Mark Dokter

[DAPHNE-#767] Initial HDFS support

This commit adds initial support to read and write files from Hadoop Filesystems in distributed mode. Besides the read, write and distributed functionality, this also contains new configuration options and a new context object to manage the connection information to the distributed filesystem. Finally, this feature requires the installation of more external dependencies. The compilation is therefore optional and can be activated with the --hdfs flag to build.sh.

Closes #767

Co-authored-by: KostasBitsakos <kbitsak@cslab.ece.ntua.gr>
Co-authored-by: Mark Dokter <mark@dokter.cc>

The documentation is generated from this commit.

Commit:5cca09c
Author:avontz@cslab.ece.ntua.gr
Committer:Aristotelis Vontzalidis

[DISTRIBUTED] Send data in chunks and serializer improvements.
- The distributed runtime was limited to a maximum size of INT_MAX per message (for both MPI and gRPC). With this commit, data is serialized into chunks that are then sent to the workers, which lifts this limit and allows bigger messages.
- Added a new command-line argument to set the maximum chunk size for distributed messages (default ~2GB).
- Use a pointer to hold the buffer inside the serialization iterator.
- The DaphneSerializer std::vector buffer used the reserve() and capacity() methods to manage available space. However, we should not write to a vector's memory (even if reserved) without first resizing the buffer. Therefore replaced reserve() calls with resize(). Similar replacements for the MPI and gRPC workers.
- Due to the use of resize(), it's best to use the smallest possible chunk size.
- Added a test case for distributed chunked messages.
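The chunking and the reserve()/resize() point from this commit message can be illustrated with a minimal sketch. It is not DAPHNE's DaphneSerializer: the names `splitIntoChunks` and `joinChunks` are hypothetical, and the sketch assumes the payload is already available as one in-memory byte buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical helper (not part of DAPHNE): split a serialized payload into
// chunks of at most maxChunkSize bytes each.
std::vector<std::vector<char>> splitIntoChunks(const std::vector<char>& data,
                                               std::size_t maxChunkSize) {
    assert(maxChunkSize > 0);
    std::vector<std::vector<char>> chunks;
    for (std::size_t offset = 0; offset < data.size(); offset += maxChunkSize) {
        const std::size_t len = std::min(maxChunkSize, data.size() - offset);
        std::vector<char> chunk;
        // resize() creates the elements, so writing through chunk.data() is
        // well-defined. reserve() would only allocate capacity; size() would
        // stay 0 and writing into that memory would be undefined behavior.
        chunk.resize(len);
        std::memcpy(chunk.data(), data.data() + offset, len);
        chunks.push_back(std::move(chunk));
    }
    return chunks;
}

// Hypothetical receiver side: concatenate the chunks in the order received.
std::vector<char> joinChunks(const std::vector<std::vector<char>>& chunks) {
    std::vector<char> out;
    for (const auto& c : chunks)
        out.insert(out.end(), c.begin(), c.end());
    return out;
}
```

Because resize() value-initializes every element before it is overwritten, each chunk buffer is touched twice, which is one plausible reading of why the commit recommends keeping the chunk size small.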

Commit:478ae7c
Author:Aristotelis Vontzalidis
Committer:GitHub

Initial implementation of DaphneSerializer (#467)
- Initial implementation of DaphneSerializer.
- For Structure, DenseMatrix, CSRMatrix, and fundamental types (Frame still missing, see #545).
- Variants for (de)serializing a data object as a whole or in chunks (for in-order and out-of-order deserialization).
- Updated WriteDaphne.h and ReadDaphne.h (reader/writer of DAPHNE binary data format) to use DaphneSerializer.
- Updated distributed kernels with DaphneSerializer.
- Distribute.h, Broadcast.h and DistributedCollect.h kernels updated with DaphneSerializer.
- Updated gRPC and MPI distributed backends (e.g., workers).
- Removed DAPHNE ProtoDataConverter.
- Test cases.
- API documentation in the source code.
- Fixed a small bug for MPI: MPI helper function, called by the distribute kernel, received the wrong rank.
- Contributes to #103, #465.

Co-authored-by: Stratos Psomadakis <774566+psomas@users.noreply.github.com>

Commit:be8d3d3
Author:avontz@cslab.ece.ntua.gr
Committer:avontz@cslab.ece.ntua.gr

Updated Distributed kernels with DaphneSerializer
- Distribute.h, Broadcast.h and DistributedCollect.h kernels updated with DaphneSerializer.
- Updated WorkerImplGRPC.h with DaphneSerializer.
- Removed Daphne ProtoDataConverter.

Commit:25c4486
Author:avontz@cslab.ece.ntua.gr
Committer:avontz@cslab.ece.ntua.gr

Updated Distributed runtime with DaphneSerializer.
- MPI makes use of DaphneSerializer.
- MPI is now independent of Protobufs.
- Added serialization of fundamental types.
- Updated gRPC to use DaphneSerialization for fundamental types.
- Other minor updates.

Commit:a1e92f5
Author:avontz@cslab.ece.ntua.gr
Committer:Mark Dokter

[DAPHNE-367] Distributed Pipelines Metadata Handling

This commit merges a longer development process to the main branch. The general topic is given in the first line of the commit message and the aggregated individual commit messages are listed below.

Closes #367

Initial AllocationDescriptor Distributed Implementation
- GRPC implementation

Moved gRPC-related classes and files under "runtime/distributed/proto/".
- Some files containing gRPC code were located under distributed/worker. Moved class ProtoDataConverter, class CallData
- Some files containing gRPC code were located under distributed/coordinator. Moved class DistributedGRPCCaller
- Updated CMAKE files.

Updated DistributedWorker
- Separated worker implementation from gRPC.
- Worker gRPC implementation now derives from base class Worker Implementation.
- Base class WorkerImpl contains generic functions for storing data, computing pipelines, etc.
- class WorkerImplGRPC contains functions for communicating with gRPC and using parent class for storing/computing data.
- TODO WorkerImplMPI.

Distributed pipeline kernel: Support for more than two outputs.

Enabling multiple outputs for distributed pipelines.
- There was already a partial implementation transferring the recent changes from vectorized pipelines to distributed pipelines.
- However, a few pieces were still missing to make it work:
  - The CallKernelOp generated for the DistributedPipelineOp in RewriteToCallKernelOpPass must have the attribute "hasVariadicResults" to ensure correct lowering in LowerToLLVMPass.
  - The number of outputs must come after the outputs in the kernel, and must not be added as an operand to the CallKernelOp, since it is added automatically for variadic results in LowerToLLVMPass.

[MINOR] Bugfix: grpc was not throwing an error when handling unsupported types (for now we support only Dense<double>)
- Support for broadcasting single double values.
- Minor fixes.
- Due to current Object Meta Data limitations, we only support unique inputs for a Distributed Pipeline (no duplicate pipeline inputs).

Distributed kernels
- Distributed kernels have specializations for each distributed backend implementation.
- Distributed kernels update the meta data and handle the communication using specific distributed-backend implementation.
- Distributed metadata now hold only information.
- TODO: Add simple transferTo/From functions in the meta data class for the distributed gRPC implementation.
- Various small changes.

Rebased onto main
- main includes the initial Meta Data implementation
- MetaDataObject mdo field of class Structure is now public. Distributed kernels need to access and modify the metadata of an object.
- Various small updates to kernels in order to support the new meta data implementation.

Updated distributed runtime tests
- WorkerTest.cpp now tests the generic WorkerImpl class, instead of the gRPC specific implementation.
- TODO: Add a test for the gRPC WorkerImpl class.
- Removed unused utility function "StartDistributedWorker"
- Disabled "DistributedRead" test. With the new Distributed-Pipeline implementation we do not support distributed read yet, therefore this test does not actually test something significant.
- Updated a few test-scripts for the distributed runtime, due to unique-pipeline-inputs limitations.

Cleanup.
- Added Status nested class to WorkerImpl.
- Renamed and moved AllocationDescriptorGRPC.
- Renamed Worker::StoredInfo::filename to identifier.
- Improved serialization from CSRMatrix to protobuf.
- Changed MetaDataObject mdo in Structure class, from public to private.
- Added getter by reference for modifying MetaDataObject of a Structure.
- Improved CSRMatrix serialization from Daphne object to protobuf.
- Fixed various warnings.
- Minor changes.
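The separation of the worker from gRPC described in this commit (a generic base class that stores data, and a backend-specific class that only handles communication) might look roughly like the following sketch. The class names `WorkerImplSketch` and `WorkerImplGrpcSketch`, the `handleStoreRequest` method, and the "obj<N>" identifier scheme are assumptions made for illustration; DAPHNE's actual WorkerImpl and WorkerImplGRPC have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, transport-independent worker logic: it only stores data and
// hands out identifiers. It knows nothing about gRPC or MPI.
class WorkerImplSketch {
public:
    virtual ~WorkerImplSketch() = default;

    // Store a data object and return the identifier under which it is kept.
    std::string store(std::vector<double> values) {
        std::string id = "obj" + std::to_string(nextId_++);
        stored_[id] = std::move(values);
        return id;
    }

    // Look up a previously stored object by its identifier.
    const std::vector<double>& get(const std::string& id) const {
        auto it = stored_.find(id);
        assert(it != stored_.end());
        return it->second;
    }

    // Each backend reports which transport it uses.
    virtual std::string backendName() const = 0;

private:
    std::map<std::string, std::vector<double>> stored_;
    std::size_t nextId_ = 0;
};

// Hypothetical backend-specific class: a real one would decode an incoming
// gRPC message here; this sketch receives the payload directly and delegates
// the storage to the base class.
class WorkerImplGrpcSketch : public WorkerImplSketch {
public:
    std::string backendName() const override { return "gRPC"; }

    std::string handleStoreRequest(const std::vector<double>& payload) {
        return store(payload);
    }
};
```

With this split, tests can exercise the storage logic through the base class without a running server, which matches the commit's note that WorkerTest.cpp now tests the generic class.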

Commit:bf56d8f
Author:avontz@cslab.ece.ntua.gr
Committer:Mark Dokter

[DAPHNE-96, DAPHNE-194] Distributed Runtime Refactoring

This commit merges a longer development process to the main branch. The general topic is given in the first line of the commit message and the aggregated individual commit messages are listed below.

Closes #96, Closes #194

Distributed runtime updates:
- Updated project structure
- Moved distributed related kernels and datastructures under runtime/distributed/coordinator/
- Generalized Broadcast&Distribute kernels
- DistributedWrapper implementation
- Updated Worker to support Vectorized execution. Implementation of vectorizedPipeline local kernel
- Updated distributedCollect Primitive
- TODO generated ir code needs to be fixed
- TODO Additional debugging needed on worker side

Distributed runtime updates:
- Extended parseType for rows/cols
- Updated Distributed Runtime for COLS combine
- Updated DistributedTest
- DistributedDescriptor implementation (metadata for the distributed runtime)
- Distributed allocation descriptor implementation. Simply holds object metadata information
- Distributed kernels (distribute/broadcast/compute, etc.) use template functions for each communication framework.
- MPI implementation missing.
- New enum type for distributed backend implementation

[MINOR] Changes for readCSVFiles after rebasing

Commit:abcb70b
Author:avontz@cslab.ece.ntua.gr
Committer:Aristotelis Vontzalidis

Support for CSRMatrices on the Distributed Runtime
- Updated Worker implementation to support CSR matrices
- Updated local primitives (distribute/broadcast/collect) for CSR support
- TODO: Add WorkerTest.cpp additional testcases
- TODO: ProtoDataConverter can be optimized for CSRMatrix serialization

Commit:0e64e70
Author:avontz@cslab.ece.ntua.gr
Committer:Aristotelis Vontzalidis

Int support for distributed runtime

Commit:c19293c
Author:avontz@cslab.ece.ntua.gr
Committer:Aristotelis Vontzalidis

Support for various datatypes on the Distributed runtime
- Support for `DenseMatrix<int64_t>`.
- Support for `CSRMatrix<double/int64_t>`.
- Implemented a pure virtual function for `class Structure` that converts objects to protobuf (similar to ProtoDataConverter).
- `static class ProtoDataConverter` was removed.
- Implemented a more efficient way to send CSRMatrices using protobufs.
- Various minor changes due to new serialization function.

Commit:80b5822
Author:kbitsakos

CSRMatrix Mull with Tests completed

Commit:be10fc1
Author:avontz@cslab.ece.ntua.gr
Committer:Aristotelis Vontzalidis

Tree-based broadcast initial implementation

Commit:81fbac6
Author:avontz@cslab.ece.ntua.gr

Support for CSRMatrices on the Distributed Runtime
- Updated Worker implementation to support CSR matrices
- Updated local primitives (distribute/broadcast/collect) for CSR support
- TODO: Add WorkerTest.cpp additional testcases
- TODO: ProtoDataConverter can be optimized for CSRMatrix serialization

Commit:232f62b
Author:avontz@cslab.ece.ntua.gr
Committer:avontz@cslab.ece.ntua.gr

Int support for distributed runtime

Commit:5e294bd
Author:Patrick Damme

[DAPHNE-#177] Worker FreeMem implementation
- Special gRPC request to tell distributed workers to free a cached data object.
- Triggered from the destructor of Handle (distributed matrix).
- Closes #177
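Triggering the free request from a destructor is an instance of RAII: the remote objects live exactly as long as the local handle. The sketch below shows the idea under stated assumptions. `HandleSketch` and its `FreeFn` callback are hypothetical stand-ins, and the callback takes the place of the actual gRPC request, whose message format is not described here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a distributed-matrix handle. Each entry pairs a
// worker address with the identifier of the object cached on that worker.
class HandleSketch {
public:
    // Stand-in for the "free" RPC: called once per cached partition.
    using FreeFn = std::function<void(const std::string& worker,
                                      const std::string& objectId)>;

    HandleSketch(std::vector<std::pair<std::string, std::string>> partitions,
                 FreeFn freeFn)
        : partitions_(std::move(partitions)), freeFn_(std::move(freeFn)) {
        assert(freeFn_);
    }

    // Copying would cause each partition to be freed twice, so forbid it.
    HandleSketch(const HandleSketch&) = delete;
    HandleSketch& operator=(const HandleSketch&) = delete;

    // When the handle goes away, tell every worker to drop its cached object.
    ~HandleSketch() {
        for (const auto& p : partitions_)
            freeFn_(p.first, p.second);
    }

private:
    std::vector<std::pair<std::string, std::string>> partitions_;
    FreeFn freeFn_;
};
```

A real implementation would also have to decide what happens when the request fails, since a destructor should not throw; this sketch leaves that to the callback.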

Commit:3c3ecf7
Author:Kevin Innerebner
Committer:Patrick Damme

[DAPHNE-#91] Distributed Operations.
- This commit pioneers the implementation of a distributed run-time for the Daphne prototype.
- The basic idea is to launch a set of distributed workers communicating to the main program via a gRPC/protobuf-based protocol.
- Distributed data:
  - The data is partitioned into blocks of size 512 rows x 512 columns.
  - A new type Handle provides a mapping from the position in a distributed matrix to the address where the data resides.
- Distribution primitives:
  - Three distribution operations and corresponding kernels are introduced:
    - Distribute: Partitions a local DenseMatrix and sends the partitions to the workers.
    - DistributedComputation: Sends a textual DaphneIR fragment to the workers and executes it on the individual data partitions. The results are cached in-memory on the workers.
    - Collect: Fetches the partitions of a distributed data object back into a local data object in the main program.
- Compiler integration:
  - An MLIR interface is used to flag DaphneIR operations as distribution-enabled.
  - A new compiler pass optionally rewrites those operations to appropriate uses of the new distribution operations.
- Some unit and script-level test cases.
- The current implementation provides the basic infrastructure, but naturally has a number of limitations, which we can extend upon later, e.g.:
  - Only DenseMatrix<double> is fully supported.
  - Only elementwise binary addition (EwAddOp) is supported.
  - Partitions to be processed must be co-located on the same node.
- Furthermore, several general changes not directly related to this issue are included:
  - Factoring out of the compiler chain and JIT-compilation/execution.
  - Declarative compiler pass registration.
  - Several small things.
- AMLS project SS2021.
- Closes #91.
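The 512 x 512 block partitioning and the position-to-address mapping described above can be sketched as follows. Only the block size comes from the commit message. The names `BlockMapSketch`, `assignBlocks` and `lookupWorker`, the round-robin placement, and the example addresses are assumptions for illustration and do not reflect how DAPHNE's Handle type is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Block edge length taken from the commit message (512 rows x 512 columns).
constexpr std::size_t kBlockSize = 512;

// Number of blocks needed to cover `extent` rows or columns (rounds up).
std::size_t numBlocks(std::size_t extent) {
    return (extent + kBlockSize - 1) / kBlockSize;
}

// Hypothetical mapping: (block row, block column) -> worker address.
using BlockMapSketch = std::map<std::pair<std::size_t, std::size_t>, std::string>;

// Assign blocks to workers in row-major order, round-robin. The placement
// policy is an assumption of this sketch.
BlockMapSketch assignBlocks(std::size_t rows, std::size_t cols,
                            const std::vector<std::string>& workers) {
    assert(!workers.empty());
    BlockMapSketch blocks;
    std::size_t next = 0;
    for (std::size_t br = 0; br < numBlocks(rows); ++br)
        for (std::size_t bc = 0; bc < numBlocks(cols); ++bc)
            blocks[std::make_pair(br, bc)] = workers[next++ % workers.size()];
    return blocks;
}

// Find the worker holding the element at (row, col) via its block index.
const std::string& lookupWorker(const BlockMapSketch& blocks,
                                std::size_t row, std::size_t col) {
    return blocks.at(std::make_pair(row / kBlockSize, col / kBlockSize));
}
```

For a 1000 x 600 matrix, for instance, both dimensions need two blocks, so four blocks are distributed; the blocks along the bottom and right edges are smaller than 512 x 512.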