Proto commits in cloudera/Impala

The Protocol Buffers files changed in these 70 commits:

Commit:31c6e2a
Author:Andrew Sherman
Committer:Jenkins

IMPALA-7985: Port RemoteShutdown() to KRPC.

The :shutdown command is used to shut down a remote server. The common case is that a user specifies the impalad to shut down by specifying a host, e.g. :shutdown('host100'). If a user has more than one impalad on a remote host then the form :shutdown('<host>:<port>') can be used to specify the port by which the impalad can be contacted. Prior to IMPALA-7985 this port was the backend port, e.g. :shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC port, e.g. :shutdown('host100:27000').

Shutdown is implemented by making an rpc call to the target impalad. This changes the implementation of this call to use KRPC.

To aid the user in finding the KRPC port, the KRPC address is added to the /backends section of the debug web page.

We attempt to detect the case where :shutdown is pointed at a thrift port (like the backend port) and print an informative message.

Documentation of this change will be done in IMPALA-8098.

Further improvements to DoRpcWithRetry() will be done in IMPALA-8143.

For discussion of why it was chosen to implement this change in an incompatible way, see comments in https://issues.apache.org/jira/browse/IMPALA-7985.

TESTING
Ran all end-to-end tests. Enhance the test for /backends in test_web_pages.py. In test_restart_services.py add a call to the old backend port to the test. Some expected error messages were changed in line with what KRPC returns.

Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Reviewed-on: http://gerrit.cloudera.org:8080/12260
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
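
As a rough illustration of the two :shutdown argument forms described in this commit message, the sketch below splits a target string into host and port. The function name `parse_shutdown_target` is hypothetical, and falling back to port 27000 when no port is given is an assumption taken from the example in the message, not from Impala's source.

```python
# Hypothetical helper; not part of Impala. The default port is an assumption
# based on the 'host100:27000' example in the commit message.
ASSUMED_KRPC_PORT = 27000


def parse_shutdown_target(target, default_port=ASSUMED_KRPC_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, sep, port = target.rpartition(":")
    if not sep:
        # No colon present: the whole string is the host name.
        return target, default_port
    if not host or not port.isdigit():
        raise ValueError("expected '<host>' or '<host>:<port>', got %r" % target)
    return host, int(port)


print(parse_shutdown_target("host100"))
print(parse_shutdown_target("host100:27000"))
```

A caller would pass the returned pair to whatever RPC client it uses; detecting that the port is a thrift port, as the message describes, would need a real connection attempt and is not modeled here.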

The documentation is generated from this commit.

Commit:5825509
Author:Thomas Tauber-Marshall
Committer:Jenkins

IMPALA-4555: Make QueryState's status reporting more robust

QueryState periodically collects runtime profiles from all of its fragment instances and sends them to the coordinator. Previously, each time this happens, if the rpc fails, QueryState will retry twice after a configurable timeout and then cancel the fragment instances under the assumption that the coordinator no longer exists.

We've found in real clusters that this logic is too sensitive to failed rpcs and can result in fragment instances being cancelled even in cases where the coordinator is still running.

This patch makes a few improvements to this logic:
- When a report fails to send, instead of retrying the same report quickly (after waiting report_status_retry_interval_ms), we wait the regular reporting interval (status_report_interval_ms), regenerate any stale portions of the report, and then retry.
- A new flag, --status_report_max_retries, is introduced, which controls the number of failed reports that are allowed before the query is cancelled. --report_status_retry_interval_ms is removed.
- Backoff is used for repeated failed attempts, such that for a period between retries of 't', on try 'n' the actual timeout will be t * n.

Testing:
- Added a test which results in a large number of failed intermediate status reports but still succeeds.

Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Reviewed-on: http://gerrit.cloudera.org:8080/12049
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
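
The backoff rule in this commit message (for a base period t, try n uses t * n) can be written out as a small sketch. The function name `backoff_schedule` and the sample values are illustrative assumptions; only the t * n rule comes from the message.

```python
def backoff_schedule(period_ms, max_retries):
    """Return the timeout for each try n = 1..max_retries, using t * n.

    period_ms plays the role of 't' in the commit message; max_retries stands
    in for a value such as --status_report_max_retries (values are made up).
    """
    if period_ms < 0 or max_retries < 0:
        raise ValueError("period_ms and max_retries must be non-negative")
    return [period_ms * n for n in range(1, max_retries + 1)]


# Example with an illustrative 5 second period and 3 allowed retries.
print(backoff_schedule(5000, 3))
```

The growth is linear rather than exponential, so the total wait before giving up stays bounded by t * N * (N + 1) / 2 for N retries.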

Commit:382b4de
Author:Andrew Sherman
Committer:Jenkins

IMPALA-7468: Port CancelQueryFInstances() to KRPC.

When the Coordinator needs to cancel a query (for example because a user has hit Control-C), it does this by sending a CancelQueryFInstances message to each fragment instance. This change switches this code to use KRPC.

Add new protobuf definitions for the messages, and remove the old thrift definitions. Move the server-side implementation of Cancel() from ImpalaInternalService to ControlService. Rework the scheduler so that the FInstanceExecParams always contains the KRPC address of the fragment executors, this address can then be used if a query is to be cancelled.

For now keep the KRPC calls to CancelQueryFInstances() as synchronous.

While moving the client-side code, remove the fault injection code that was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with --fault_injection_rpc_exception_type=1) as this tickles code in client-cache.h which is now not used.

TESTING:
Ran all end-to-end tests. No new tests as test_cancellation.py provides good coverage. Checked in debugger that DebugAction style fault injection (triggered from test_cancellation.py) was working correctly.

Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Reviewed-on: http://gerrit.cloudera.org:8080/12142
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Commit:8e84bf2
Author:Michael Ho
Committer:Laszlo Gaal

Add missing authorization in KRPC

In 2.12.0, Impala adopted the Kudu RPC library for certain backend services (TransmitData(), EndDataStream()). While the implementation uses Kerberos for authenticating users connecting to the backend services, there is no authorization implemented. This is a regression from the Thrift based implementation because it registered a SASL callback (SaslAuthorizeInternal) to be invoked during the connection negotiation. With this regression, an unauthorized but authenticated user may invoke RPC calls to Impala backend services.

This change fixes the issue above by overriding the default authorization method for the DataStreamService. The authorization method will only let an authenticated principal which matches FLAGS_principal / FLAGS_be_principal access the service. Also added a new startup flag --krb5_ccname to allow users to customize the locations of the Kerberos credentials cache.

Testing done:
1. Added a new test case in rpc-mgr-kerberized-test.cc to confirm an unauthorized user is not allowed to access the service.
2. Ran some queries in a Kerberos enabled cluster to make sure there is no error.
3. Exhaustive builds.

Thanks to Todd Lipcon for pointing out the problem and his guidance on the fix.

==C5_APPROVED_BUGFIX==

Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Reviewed-on: http://gerrit.cloudera.org:8080/11331
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
(cherry picked from commit dcb53473dd2c1e4ecf375af81ca3f9e2f61ead9f)
Reviewed-on: https://gerrit.sjc.cloudera.com/36787
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>
Reviewed-on: https://gerrit.sjc.cloudera.com/39276
Tested-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
Reviewed-by: Laszlo Gaal <laszlo.gaal@cloudera.com>
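
The difference between authentication and authorization that this commit addresses can be sketched as a check run after Kerberos has already established who the caller is. The function name, argument names, and principal strings below are hypothetical; the only detail taken from the message is that the caller's principal must match the configured principal or backend principal.

```python
def is_authorized(authenticated_principal, principal, be_principal=""):
    """Allow only callers whose principal matches a configured one.

    `principal` and `be_principal` stand in for FLAGS_principal and
    FLAGS_be_principal; an empty string means the flag is unset.
    This is a simplified model, not Impala's implementation.
    """
    allowed = {p for p in (principal, be_principal) if p}
    return authenticated_principal in allowed


# An authenticated user who is not the service principal is rejected.
print(is_authorized("alice@EXAMPLE.COM", "impala/host1@EXAMPLE.COM"))
```

Real principal matching may involve normalizing host names or realms; this sketch compares strings exactly.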

Commit:2afb2f0
Author:Michael Ho
Committer:Jenkins

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will independently report its status to the coordinator periodically. This creates a huge amount of RPCs to the coordinator under highly concurrent workloads, causing lock contention in the coordinator's backend states when multiple fragment instances send them at the same time. In addition, due to the lack of coordination between query fragment instances, a query may end without collecting the profiles from all fragment instances when one of them hits an error before another fragment instance manages to finish Prepare(), leading to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState (started by QueryExecMgr) to be responsible for periodically reporting the status and profiles of all fragment instances of a query running on a backend. As part of this refactoring, each query fragment instance will not report their errors individually. Instead, there is a cumulative status maintained per QueryState. It's set to the error status of the first fragment instance which hits an error or any general error (e.g. failure to start a thread) when starting fragment instances. With this change, the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
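
The "cumulative status set by the first error" idea in this commit message can be modeled in a few lines. The class and method names are hypothetical and the error values are made up; the sketch only shows the first-error-wins rule under a lock, not Impala's QueryState.

```python
import threading


class CumulativeStatus:
    """Keeps the first error reported by any fragment instance (simplified model)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._first_error = None  # None means no error has been recorded

    def report_error(self, source, message):
        """Record an error only if none has been recorded yet."""
        with self._lock:
            if self._first_error is None:
                self._first_error = (source, message)

    def overall(self):
        """Return the first recorded error, or None if all is well."""
        with self._lock:
            return self._first_error


status = CumulativeStatus()
status.report_error("finstance-1", "memory limit exceeded")
status.report_error("finstance-2", "cancelled")
print(status.overall())
```

A single reporting thread can then read `overall()` once per interval instead of each fragment instance sending its own error.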

Commit:56fb619
Author:Michael Ho
Committer:Jenkins

IMPALA-7213, IMPALA-7241: Port ReportExecStatus() RPC to use KRPC

This change converts ReportExecStatus() RPC from thrift based RPC to KRPC. This is done as part of the preparation for fixing IMPALA-2990, as we can take advantage of TCP connection multiplexing in KRPC to avoid overwhelming the coordinator with too many connections by reducing the number of TCP connections to one for each executor.

This patch also introduces a new service pool for all query execution control related RPCs in the future so that control commands from coordinators aren't blocked by long-running DataStream services' RPCs. To avoid unnecessary delays due to sharing the network connections between DataStream service and Control service, this change added the service name as part of the user credentials for the ConnectionId so each service will use a separate connection.

The majority of this patch is mechanical conversion of some Thrift structures used in ReportExecStatus() RPC to Protobuf. Note that the runtime profile is still retained as a Thrift structure as Impala clients will still fetch query profiles using Thrift RPCs. This also avoids duplicating the serialization implementation in both Thrift and Protobuf for the runtime profile. The Thrift runtime profiles are serialized and sent as a sidecar in ReportExecStatus() RPC.

This patch also fixes IMPALA-7241 which may lead to duplicated dml stats being applied. The fix is by adding a monotonically increasing version number for fragment instances' reports. The coordinator will ignore any report smaller than or equal to the version in the last report.

Testing done:
1. Exhaustive build.
2. Added some targeted test cases for profile serialization failure and RPC retries/timeout.

Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Reviewed-on: http://gerrit.cloudera.org:8080/10855
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
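
The IMPALA-7241 fix described in this commit message, where the coordinator ignores any report whose version is not newer than the last one applied, can be sketched as follows. The class name, field names, and the `rows_delta` payload are hypothetical stand-ins; only the version comparison comes from the message.

```python
class ReportDeduper:
    """Applies each fragment instance report at most once (simplified model)."""

    def __init__(self):
        self._last_version = {}  # instance id -> last applied report version
        self.rows_modified = 0   # stand-in for accumulated DML stats

    def apply(self, instance_id, version, rows_delta):
        """Apply a report unless its version is <= the last applied version."""
        last = self._last_version.get(instance_id)
        if last is not None and version <= last:
            return False  # stale or retried report: ignore it
        self._last_version[instance_id] = version
        self.rows_modified += rows_delta
        return True


coordinator = ReportDeduper()
coordinator.apply("finstance-1", 1, 100)
coordinator.apply("finstance-1", 1, 100)  # a retried RPC carrying the same report
print(coordinator.rows_modified)
```

Because retried RPCs carry the same version, the duplicate is dropped and the stats are counted once.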

Commit:6390769
Author:Michael Ho
Committer:Michael Ho

Add missing authorization in KRPC

In 2.12.0, Impala adopted the Kudu RPC library for certain backend services (TransmitData(), EndDataStream()). While the implementation uses Kerberos for authenticating users connecting to the backend services, there is no authorization implemented. This is a regression from the Thrift based implementation because it registered a SASL callback (SaslAuthorizeInternal) to be invoked during the connection negotiation. With this regression, an unauthorized but authenticated user may invoke RPC calls to Impala backend services.

This change fixes the issue above by overriding the default authorization method for the DataStreamService. The authorization method will only let an authenticated principal which matches FLAGS_principal / FLAGS_be_principal access the service. Also added a new startup flag --krb5_ccname to allow users to customize the locations of the Kerberos credentials cache.

Testing done:
1. Added a new test case in rpc-mgr-kerberized-test.cc to confirm an unauthorized user is not allowed to access the service.
2. Ran some queries in a Kerberos enabled cluster to make sure there is no error.
3. Exhaustive builds.

Thanks to Todd Lipcon for pointing out the problem and his guidance on the fix.

==C5_APPROVED_BUGFIX==

Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Reviewed-on: http://gerrit.cloudera.org:8080/11331
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
(cherry picked from commit dcb53473dd2c1e4ecf375af81ca3f9e2f61ead9f)
Reviewed-on: https://gerrit.sjc.cloudera.com/36787
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Michael Ho <kwho@cloudera.com>

Commit:dcb5347
Author:Michael Ho
Committer:Jenkins

Add missing authorization in KRPC

In 2.12.0, Impala adopted the Kudu RPC library for certain backend services (TransmitData(), EndDataStream()). While the implementation uses Kerberos for authenticating users connecting to the backend services, there is no authorization implemented. This is a regression from the Thrift based implementation because it registered a SASL callback (SaslAuthorizeInternal) to be invoked during the connection negotiation. With this regression, an unauthorized but authenticated user may invoke RPC calls to Impala backend services.

This change fixes the issue above by overriding the default authorization method for the DataStreamService. The authorization method will only let an authenticated principal which matches FLAGS_principal / FLAGS_be_principal access the service. Also added a new startup flag --krb5_ccname to allow users to customize the locations of the Kerberos credentials cache.

Testing done:
1. Added a new test case in rpc-mgr-kerberized-test.cc to confirm an unauthorized user is not allowed to access the service.
2. Ran some queries in a Kerberos enabled cluster to make sure there is no error.
3. Exhaustive builds.

Thanks to Todd Lipcon for pointing out the problem and his guidance on the fix.

Change-Id: I2f82dee5e721f2ed23e75fd91abbc6ab7addd4c5
Reviewed-on: http://gerrit.cloudera.org:8080/11331
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>

Commit:9de081a
Author:Lars Volker
Committer:Jenkins

IMPALA-7006: Remove KRPC folders

Change-Id: Ic677484c27ed18b105da0a6b0901df4eb9f248e6
Reviewed-on: http://gerrit.cloudera.org:8080/10756
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>

Commit:2a6d08d
Author:Lars Volker
Committer:Jenkins

IMPALA-7006: Add KRPC folders from kudu@334ecafd

cp -a ~/checkout/kudu/src/kudu/{rpc,util,security} be/src/kudu/

Change-Id: I232db2b4ccf5df9aca87b21dea31bfb2735d1ab7
Reviewed-on: http://gerrit.cloudera.org:8080/10757
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Lars Volker <lv@cloudera.com>

Commit:d17722e
Author:Michael Ho
Committer:Agnes Tevesz

IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender This change implements a couple of improvements to the profiles of KrpcDataStreamRecvr and KrpcDataStreamSender: - track pending number of deferred row batches over time in KrpcDataStreamRecvr - track the number of bytes dequeued over time in KrpcDataStreamRecvr - track the total time deferred RPCs queues are not empty - track the number of bytes sent from KrpcDataStreamSender over time - track the total amount of time spent in KrpcDataStreamSender, including time spent waiting for RPC completion. Sample profile of an Exchange node instance: EXCHANGE_NODE (id=21):(Total: 2s284ms, non-child: 64.926ms, % non-child: 2.84%) - ConvertRowBatchTime: 44.380ms - PeakMemoryUsage: 124.04 KB (127021) - RowsReturned: 287.51K (287514) - RowsReturnedRate: 125.88 K/sec Buffer pool: - AllocTime: 1.109ms - CumulativeAllocationBytes: 10.96 MB (11493376) - CumulativeAllocations: 562 (562) - PeakReservation: 112.00 KB (114688) - PeakUnpinnedBytes: 0 - PeakUsedReservation: 112.00 KB (114688) - ReadIoBytes: 0 - ReadIoOps: 0 (0) - ReadIoWaitTime: 0.000ns - WriteIoBytes: 0 - WriteIoOps: 0 (0) - WriteIoWaitTime: 0.000ns Dequeue: BytesDequeued(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 700.00 KB, 2.00 MB, 3.49 MB, 4.39 MB, 5.86 MB, 6.85 MB - FirstBatchWaitTime: 0.000ns - TotalBytesDequeued: 6.85 MB (7187850) - TotalGetBatchTime: 2s237ms - DataWaitTime: 2s219ms Enqueue: BytesReceived(500.000ms): 0, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 328.73 KB, 963.79 KB, 1.64 MB, 2.09 MB, 2.76 MB, 3.23 MB DeferredQueueSize(500.000ms): 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 - DispatchTime: (Avg: 108.593us ; Min: 30.525us ; Max: 1.524ms ; Number of samples: 281) - DeserializeRowBatchTime: 8.395ms - TotalBatchesEnqueued: 281 (281) - TotalBatchesReceived: 281 (281) - TotalBytesReceived: 3.23 MB (3387144) - TotalEarlySenders: 0 (0) - TotalEosReceived: 1 (1) - TotalHasDeferredRPCsTime: 15s446ms - TotalRPCsDeferred: 38 (38) Sample sender's profile: KrpcDataStreamSender (dst_id=21):(Total: 17s923ms, non-child: 604.494ms, % non-child: 3.37%) BytesSent(500.000ms): 0, 0, 0, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 46.54 KB, 46.54 KB, 46.54 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 974.44 KB, 2.82 MB, 4.93 MB, 6.27 MB, 8.28 MB, 9.69 MB - EosSent: 3 (3) - NetworkThroughput: 4.61 MB/sec - PeakMemoryUsage: 22.57 KB (23112) - RowsSent: 287.51K (287514) - RpcFailure: 0 (0) - RpcRetry: 0 (0) - SerializeBatchTime: 329.162ms - TotalBytesSent: 9.69 MB (10161432) - UncompressedRowBatchSize: 20.56 MB (21563550) Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd Reviewed-on: http://gerrit.cloudera.org:8080/9690 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/31545 Tested-by: Michael Ho <kwho@cloudera.com>

Commit:37eb97d
Author:Michael Ho
Committer:Laszlo Gaal

IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender This change implements a couple of improvements to the profiles of KrpcDataStreamRecvr and KrpcDataStreamSender: - track pending number of deferred row batches over time in KrpcDataStreamRecvr - track the number of bytes dequeued over time in KrpcDataStreamRecvr - track the total time deferred RPCs queues are not empty - track the number of bytes sent from KrpcDataStreamSender over time - track the total amount of time spent in KrpcDataStreamSender, including time spent waiting for RPC completion. Sample profile of an Exchange node instance: EXCHANGE_NODE (id=21):(Total: 2s284ms, non-child: 64.926ms, % non-child: 2.84%) - ConvertRowBatchTime: 44.380ms - PeakMemoryUsage: 124.04 KB (127021) - RowsReturned: 287.51K (287514) - RowsReturnedRate: 125.88 K/sec Buffer pool: - AllocTime: 1.109ms - CumulativeAllocationBytes: 10.96 MB (11493376) - CumulativeAllocations: 562 (562) - PeakReservation: 112.00 KB (114688) - PeakUnpinnedBytes: 0 - PeakUsedReservation: 112.00 KB (114688) - ReadIoBytes: 0 - ReadIoOps: 0 (0) - ReadIoWaitTime: 0.000ns - WriteIoBytes: 0 - WriteIoOps: 0 (0) - WriteIoWaitTime: 0.000ns Dequeue: BytesDequeued(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 700.00 KB, 2.00 MB, 3.49 MB, 4.39 MB, 5.86 MB, 6.85 MB - FirstBatchWaitTime: 0.000ns - TotalBytesDequeued: 6.85 MB (7187850) - TotalGetBatchTime: 2s237ms - DataWaitTime: 2s219ms Enqueue: BytesReceived(500.000ms): 0, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 328.73 KB, 963.79 KB, 1.64 MB, 2.09 MB, 2.76 MB, 3.23 MB DeferredQueueSize(500.000ms): 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 - DispatchTime: (Avg: 108.593us ; Min: 30.525us ; Max: 1.524ms ; Number of samples: 281) - DeserializeRowBatchTime: 8.395ms - TotalBatchesEnqueued: 281 (281) - TotalBatchesReceived: 281 (281) - TotalBytesReceived: 3.23 MB (3387144) - TotalEarlySenders: 0 (0) - TotalEosReceived: 1 (1) - TotalHasDeferredRPCsTime: 15s446ms - TotalRPCsDeferred: 38 (38) Sample sender's profile: KrpcDataStreamSender (dst_id=21):(Total: 17s923ms, non-child: 604.494ms, % non-child: 3.37%) BytesSent(500.000ms): 0, 0, 0, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 46.54 KB, 46.54 KB, 46.54 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 974.44 KB, 2.82 MB, 4.93 MB, 6.27 MB, 8.28 MB, 9.69 MB - EosSent: 3 (3) - NetworkThroughput: 4.61 MB/sec - PeakMemoryUsage: 22.57 KB (23112) - RowsSent: 287.51K (287514) - RpcFailure: 0 (0) - RpcRetry: 0 (0) - SerializeBatchTime: 329.162ms - TotalBytesSent: 9.69 MB (10161432) - UncompressedRowBatchSize: 20.56 MB (21563550) Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd Reviewed-on: http://gerrit.cloudera.org:8080/9690 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/31545 Tested-by: Michael Ho <kwho@cloudera.com>

Commit:344b39a
Author:Philip Zeyliger
Committer:Philip Zeyliger

Revert "IMPALA-6193: Track memory of incoming data streams"

This reverts commit 3bfda3348740e0951cbf8f60cde70cc4d1391c5e.

Commit:4a6518f
Author:Philip Zeyliger
Committer:Philip Zeyliger

Revert "KUDU-2301: (Part-1) Add instrumentation on a per connection level"

This reverts commit cf5ef7f70983747103ca2d5052a9a61f7eb4b349.

Commit:0773946
Author:Philip Zeyliger
Committer:Philip Zeyliger

Revert "IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender"

This reverts commit 421af4e40a862e6fb9184520ee64b4c86b77f7e2.

Conflicts:
be/src/runtime/krpc-data-stream-recvr.cc

Change-Id: I669485ef0b2252af4fa2d3f7eb1260bae9767c00

Commit:ac3bffb
Author:Michael Ho
Committer:Michael Ho

IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender This change implements a couple of improvements to the profiles of KrpcDataStreamRecvr and KrpcDataStreamSender: - track pending number of deferred row batches over time in KrpcDataStreamRecvr - track the number of bytes dequeued over time in KrpcDataStreamRecvr - track the total time deferred RPCs queues are not empty - track the number of bytes sent from KrpcDataStreamSender over time - track the total amount of time spent in KrpcDataStreamSender, including time spent waiting for RPC completion. Sample profile of an Exchange node instance: EXCHANGE_NODE (id=21):(Total: 2s284ms, non-child: 64.926ms, % non-child: 2.84%) - ConvertRowBatchTime: 44.380ms - PeakMemoryUsage: 124.04 KB (127021) - RowsReturned: 287.51K (287514) - RowsReturnedRate: 125.88 K/sec Buffer pool: - AllocTime: 1.109ms - CumulativeAllocationBytes: 10.96 MB (11493376) - CumulativeAllocations: 562 (562) - PeakReservation: 112.00 KB (114688) - PeakUnpinnedBytes: 0 - PeakUsedReservation: 112.00 KB (114688) - ReadIoBytes: 0 - ReadIoOps: 0 (0) - ReadIoWaitTime: 0.000ns - WriteIoBytes: 0 - WriteIoOps: 0 (0) - WriteIoWaitTime: 0.000ns Dequeue: BytesDequeued(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 700.00 KB, 2.00 MB, 3.49 MB, 4.39 MB, 5.86 MB, 6.85 MB - FirstBatchWaitTime: 0.000ns - TotalBytesDequeued: 6.85 MB (7187850) - TotalGetBatchTime: 2s237ms - DataWaitTime: 2s219ms Enqueue: BytesReceived(500.000ms): 0, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 328.73 KB, 963.79 KB, 1.64 MB, 2.09 MB, 2.76 MB, 3.23 MB DeferredQueueSize(500.000ms): 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 - DispatchTime: (Avg: 108.593us ; Min: 30.525us ; Max: 1.524ms ; Number of samples: 281) - DeserializeRowBatchTime: 8.395ms - TotalBatchesEnqueued: 281 (281) - TotalBatchesReceived: 281 (281) - TotalBytesReceived: 3.23 MB (3387144) - TotalEarlySenders: 0 (0) - TotalEosReceived: 1 (1) - TotalHasDeferredRPCsTime: 15s446ms - TotalRPCsDeferred: 38 (38) Sample sender's profile: KrpcDataStreamSender (dst_id=21):(Total: 17s923ms, non-child: 604.494ms, % non-child: 3.37%) BytesSent(500.000ms): 0, 0, 0, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 46.54 KB, 46.54 KB, 46.54 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 974.44 KB, 2.82 MB, 4.93 MB, 6.27 MB, 8.28 MB, 9.69 MB - EosSent: 3 (3) - NetworkThroughput: 4.61 MB/sec - PeakMemoryUsage: 22.57 KB (23112) - RowsSent: 287.51K (287514) - RpcFailure: 0 (0) - RpcRetry: 0 (0) - SerializeBatchTime: 329.162ms - TotalBytesSent: 9.69 MB (10161432) - UncompressedRowBatchSize: 20.56 MB (21563550) Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd Reviewed-on: http://gerrit.cloudera.org:8080/9690 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins Reviewed-on: http://gerrit.sjc.cloudera.com:8080/31545 Tested-by: Michael Ho <kwho@cloudera.com>

Commit:421af4e
Author:Michael Ho
Committer:Impala Public Jenkins

IMPALA-6685: Improve profiles in KrpcDataStreamRecvr and KrpcDataStreamSender This change implements a couple of improvements to the profiles of KrpcDataStreamRecvr and KrpcDataStreamSender: - track pending number of deferred row batches over time in KrpcDataStreamRecvr - track the number of bytes dequeued over time in KrpcDataStreamRecvr - track the total time deferred RPCs queues are not empty - track the number of bytes sent from KrpcDataStreamSender over time - track the total amount of time spent in KrpcDataStreamSender, including time spent waiting for RPC completion. Sample profile of an Exchange node instance: EXCHANGE_NODE (id=21):(Total: 2s284ms, non-child: 64.926ms, % non-child: 2.84%) - ConvertRowBatchTime: 44.380ms - PeakMemoryUsage: 124.04 KB (127021) - RowsReturned: 287.51K (287514) - RowsReturnedRate: 125.88 K/sec Buffer pool: - AllocTime: 1.109ms - CumulativeAllocationBytes: 10.96 MB (11493376) - CumulativeAllocations: 562 (562) - PeakReservation: 112.00 KB (114688) - PeakUnpinnedBytes: 0 - PeakUsedReservation: 112.00 KB (114688) - ReadIoBytes: 0 - ReadIoOps: 0 (0) - ReadIoWaitTime: 0.000ns - WriteIoBytes: 0 - WriteIoOps: 0 (0) - WriteIoWaitTime: 0.000ns Dequeue: BytesDequeued(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 700.00 KB, 2.00 MB, 3.49 MB, 4.39 MB, 5.86 MB, 6.85 MB - FirstBatchWaitTime: 0.000ns - TotalBytesDequeued: 6.85 MB (7187850) - TotalGetBatchTime: 2s237ms - DataWaitTime: 2s219ms Enqueue: BytesReceived(500.000ms): 0, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 23.36 KB, 328.73 KB, 963.79 KB, 1.64 MB, 2.09 MB, 2.76 MB, 3.23 MB DeferredQueueSize(500.000ms): 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0 - DispatchTime: (Avg: 108.593us ; Min: 30.525us ; Max: 1.524ms ; Number of samples: 281) - DeserializeRowBatchTime: 8.395ms - TotalBatchesEnqueued: 281 (281) - TotalBatchesReceived: 281 (281) - TotalBytesReceived: 3.23 MB (3387144) - TotalEarlySenders: 0 (0) - TotalEosReceived: 1 (1) - TotalHasDeferredRPCsTime: 15s446ms - TotalRPCsDeferred: 38 (38) Sample sender's profile: KrpcDataStreamSender (dst_id=21):(Total: 17s923ms, non-child: 604.494ms, % non-child: 3.37%) BytesSent(500.000ms): 0, 0, 0, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 34.78 KB, 46.54 KB, 46.54 KB, 46.54 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 58.31 KB, 974.44 KB, 2.82 MB, 4.93 MB, 6.27 MB, 8.28 MB, 9.69 MB - EosSent: 3 (3) - NetworkThroughput: 4.61 MB/sec - PeakMemoryUsage: 22.57 KB (23112) - RowsSent: 287.51K (287514) - RpcFailure: 0 (0) - RpcRetry: 0 (0) - SerializeBatchTime: 329.162ms - TotalBytesSent: 9.69 MB (10161432) - UncompressedRowBatchSize: 20.56 MB (21563550) Change-Id: I8ba405921b3df920c1e85b940ce9c8d02fc647cd Reviewed-on: http://gerrit.cloudera.org:8080/9690 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins

Commit:bb96230
Author:Sailesh Mukil
Committer:Jenkins

KUDU-2301: (Part-1) Add instrumentation on a per connection level

This patch returns the OutboundTransfer queue size on a per connection level and makes them accessible via the DumpRunningRpcs() call.

A test is added in rpc-test to ensure that this metric works as expected.

A future patch will add more metrics.

Change-Id: Iae1a5fe0066adf644a9cac41ad6696e1bbf00465
Reviewed-on: http://gerrit.cloudera.org:8080/9343
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9383
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins

Commit:cf5ef7f
Author:Sailesh Mukil
Committer:Impala Public Jenkins

KUDU-2301: (Part-1) Add instrumentation on a per connection level

This patch returns the OutboundTransfer queue size on a per connection level and makes them accessible via the DumpRunningRpcs() call.

A test is added in rpc-test to ensure that this metric works as expected.

A future patch will add more metrics.

Change-Id: Iae1a5fe0066adf644a9cac41ad6696e1bbf00465
Reviewed-on: http://gerrit.cloudera.org:8080/9343
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <todd@apache.org>
Reviewed-on: http://gerrit.cloudera.org:8080/9383
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Reviewed-by: Michael Ho <kwho@cloudera.com>
Tested-by: Impala Public Jenkins

Commit:5960485
Author:Lars Volker
Committer:Jenkins

IMPALA-6193: Track memory of incoming data streams

This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called "Data Stream Queued RPC Calls" until a receiver registers and takes over the early sender RPCs. Inside the receiver, memory for deferred RPCs is tracked against the fragment instance's memtracker until we unpack the batches and add them to the row batch queue.

The DCHECK in MemTracker::Close() covers that all memory consumed by a tracker gets released eventually. In addition to that, this change adds a custom cluster test that makes sure that queued memory gets tracked by inspecting the peak consumption of the new memtrackers.

Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Reviewed-on: http://gerrit.cloudera.org:8080/8914
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins
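
The handoff of tracked memory between trackers described in this commit message can be illustrated with a toy model. The `MemTracker` class below is a hypothetical Python stand-in, not Impala's C++ MemTracker; it keeps only a running consumption, a peak, and a close-time check that mirrors the idea of the DCHECK.

```python
class MemTracker:
    """Toy memory tracker: consumption, peak, and a zero check on close."""

    def __init__(self, label):
        self.label = label
        self.consumption = 0
        self.peak = 0

    def consume(self, num_bytes):
        self.consumption += num_bytes
        self.peak = max(self.peak, self.consumption)

    def release(self, num_bytes):
        self.consumption -= num_bytes

    def close(self):
        # Everything consumed must have been released by now.
        assert self.consumption == 0, "%s still holds memory" % self.label


def transfer(src, dst, num_bytes):
    """Hand tracked bytes from one tracker to another."""
    dst.consume(num_bytes)
    src.release(num_bytes)


service = MemTracker("Data Stream Service")
queued = MemTracker("Data Stream Queued RPC Calls")
service.consume(1024)            # an RPC payload arrives
transfer(service, queued, 1024)  # handed over to the stream manager
queued.release(1024)             # a receiver later takes the batch
service.close()
queued.close()
print(service.peak, queued.peak)
```

Inspecting `peak` after the fact is the same kind of check the custom cluster test in the message performs on the new trackers.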

Commit:3bfda33
Author:Lars Volker
Committer:Impala Public Jenkins

IMPALA-6193: Track memory of incoming data streams

This change adds memory tracking to incoming transmit data RPCs when using KRPC. We track memory against a global tracker called "Data Stream Service" until it is handed over to the stream manager. There we track it in a global tracker called "Data Stream Queued RPC Calls" until a receiver registers and takes over the early sender RPCs. Inside the receiver, memory for deferred RPCs is tracked against the fragment instance's memtracker until we unpack the batches and add them to the row batch queue.

The DCHECK in MemTracker::Close() covers that all memory consumed by a tracker gets released eventually. In addition to that, this change adds a custom cluster test that makes sure that queued memory gets tracked by inspecting the peak consumption of the new memtrackers.

Change-Id: I2df1204d2483313a8a18e5e3be6cec9e402614c4
Reviewed-on: http://gerrit.cloudera.org:8080/8914
Reviewed-by: Lars Volker <lv@cloudera.com>
Tested-by: Impala Public Jenkins

Commit:88f2bc1
Author:Michael Ho
Committer:Jenkins

IMPALA-4856: Port data stream service to KRPC This patch implements a new data stream service which utilizes KRPC. Similar to the thrift RPC implementation, there are 3 major components to the data stream services: KrpcDataStreamSender serializes and sends row batches materialized by a fragment instance to a KrpcDataStreamRecvr. KrpcDataStreamMgr is responsible for routing an incoming row batch to the appropriate receiver. The data stream service runs on the port FLAGS_krpc_port which is 29000 by default. Unlike the implementation with thrift RPC, KRPC provides an asynchronous interface for invoking remote methods. As a result, KrpcDataStreamSender doesn't need to create a thread per connection. There is one connection between two Impalad nodes for each direction (i.e. client and server). Multiple queries can multiplex on the same connection for transmitting row batches between two Impalad nodes. The asynchronous interface also avoids the possibility that a thread is stuck in the RPC code for an extended amount of time without checking for cancellation. A TransmitData() call with KRPC is in essence a trio of RpcController, a serialized protobuf request buffer and a protobuf response buffer. The call is invoked via a DataStreamService proxy object. The serialized tuple offsets and row batches are sent via "sidecars" in KRPC to avoid an extra copy into the serialized request buffer. Each impalad node creates a singleton DataStreamService object at start-up time. All incoming calls are served by a service thread pool created as part of DataStreamService. By default, the number of service threads equals the number of logical cores. The service threads are shared across all queries so the RPC handler should avoid blocking as much as possible. In the thrift RPC implementation, a thrift thread handling a TransmitData() RPC blocks for an extended period of time if the receiver has not yet been created when the call arrives.
In the KRPC implementation, we store TransmitData() or EndDataStream() requests which arrive before the receiver is ready in a per-receiver early sender list stored in KrpcDataStreamMgr. These RPC calls will be processed and responded to when the receiver is created or when a timeout occurs. Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr. If adding a row batch to a queue in KrpcDataStreamRecvr causes the buffer limit to be exceeded, the request will be stashed in a queue for deferred processing. The stashed RPC requests will not be responded to until they are processed so as to exert back pressure on the senders. An alternative would be to reply with an error, in which case the request / row batches would need to be sent again. This may end up consuming more network bandwidth than the thrift RPC implementation. This change adopts the behavior of allowing one stashed request per sender. All rpc requests and responses are serialized using protobuf. The equivalent of TRowBatch would be ProtoRowBatch which contains a serialized header about the meta-data of the row batch and two Kudu Slice objects which contain pointers to the actual data (i.e. tuple offsets and tuple data). This patch is based on an abandoned patch by Henry Robinson. TESTING ------- * Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true. TO DO ----- * Port some BE tests to KRPC services. Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Reviewed-on: http://gerrit.cloudera.org:8080/8023 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins

Commit:b4ea57a
Author:Michael Ho
Committer:Impala Public Jenkins

IMPALA-4856: Port data stream service to KRPC This patch implements a new data stream service which utilizes KRPC. Similar to the thrift RPC implementation, there are 3 major components to the data stream services: KrpcDataStreamSender serializes and sends row batches materialized by a fragment instance to a KrpcDataStreamRecvr. KrpcDataStreamMgr is responsible for routing an incoming row batch to the appropriate receiver. The data stream service runs on the port FLAGS_krpc_port which is 29000 by default. Unlike the implementation with thrift RPC, KRPC provides an asynchronous interface for invoking remote methods. As a result, KrpcDataStreamSender doesn't need to create a thread per connection. There is one connection between two Impalad nodes for each direction (i.e. client and server). Multiple queries can multiplex on the same connection for transmitting row batches between two Impalad nodes. The asynchronous interface also avoids the possibility that a thread is stuck in the RPC code for an extended amount of time without checking for cancellation. A TransmitData() call with KRPC is in essence a trio of RpcController, a serialized protobuf request buffer and a protobuf response buffer. The call is invoked via a DataStreamService proxy object. The serialized tuple offsets and row batches are sent via "sidecars" in KRPC to avoid an extra copy into the serialized request buffer. Each impalad node creates a singleton DataStreamService object at start-up time. All incoming calls are served by a service thread pool created as part of DataStreamService. By default, the number of service threads equals the number of logical cores. The service threads are shared across all queries so the RPC handler should avoid blocking as much as possible. In the thrift RPC implementation, a thrift thread handling a TransmitData() RPC blocks for an extended period of time if the receiver has not yet been created when the call arrives.
In the KRPC implementation, we store TransmitData() or EndDataStream() requests which arrive before the receiver is ready in a per-receiver early sender list stored in KrpcDataStreamMgr. These RPC calls will be processed and responded to when the receiver is created or when a timeout occurs. Similarly, there is limited space in the sender queues in KrpcDataStreamRecvr. If adding a row batch to a queue in KrpcDataStreamRecvr causes the buffer limit to be exceeded, the request will be stashed in a queue for deferred processing. The stashed RPC requests will not be responded to until they are processed so as to exert back pressure on the senders. An alternative would be to reply with an error, in which case the request / row batches would need to be sent again. This may end up consuming more network bandwidth than the thrift RPC implementation. This change adopts the behavior of allowing one stashed request per sender. All rpc requests and responses are serialized using protobuf. The equivalent of TRowBatch would be ProtoRowBatch which contains a serialized header about the meta-data of the row batch and two Kudu Slice objects which contain pointers to the actual data (i.e. tuple offsets and tuple data). This patch is based on an abandoned patch by Henry Robinson. TESTING ------- * Builds {exhaustive/debug, core/release, asan} passed with FLAGS_use_krpc=true. TO DO ----- * Port some BE tests to KRPC services. Change-Id: Ic0b8c1e50678da66ab1547d16530f88b323ed8c1 Reviewed-on: http://gerrit.cloudera.org:8080/8023 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins
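
The deferred-processing behaviour in the IMPALA-4856 message (stash a request when the receiver's buffer limit would be exceeded, and reply only once it is processed) can be sketched as follows. `SketchRequest` and `SketchRecvrQueue` are hypothetical names and this is not the KrpcDataStreamRecvr implementation; the sketch is single-threaded and does not enforce the one-stashed-request-per-sender rule, which is assumed to be upheld by senders waiting for their reply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical request: a payload size plus a callback that replies to the sender.
struct SketchRequest {
  int sender_id;
  int64_t batch_bytes;
  std::function<void()> respond;
};

// Hypothetical receiver queue with a byte limit and a deferred list.
class SketchRecvrQueue {
 public:
  explicit SketchRecvrQueue(int64_t limit_bytes) : limit_(limit_bytes) {
    assert(limit_bytes > 0);
  }

  // Returns true if the batch was queued and answered right away,
  // false if the request was stashed and the sender is left waiting.
  bool AddBatch(SketchRequest req) {
    const bool fits = queued_bytes_ + req.batch_bytes <= limit_;
    // An empty queue always admits one batch so an oversized batch cannot stall.
    if (deferred_.empty() && (fits || batches_.empty())) {
      Enqueue(req);
      return true;
    }
    deferred_.push_back(std::move(req));
    return false;
  }

  // Consumer side: pops one batch, then admits deferred requests that now fit.
  bool GetBatch(int64_t* bytes) {
    if (batches_.empty()) return false;
    *bytes = batches_.front();
    batches_.pop_front();
    queued_bytes_ -= *bytes;
    while (!deferred_.empty()) {
      const SketchRequest& next = deferred_.front();
      const bool fits = queued_bytes_ + next.batch_bytes <= limit_;
      if (!fits && !batches_.empty()) break;
      Enqueue(next);
      deferred_.pop_front();
    }
    return true;
  }

  std::size_t deferred_count() const { return deferred_.size(); }

 private:
  // Queues the batch and replies; the reply is what unblocks the sender.
  void Enqueue(const SketchRequest& req) {
    batches_.push_back(req.batch_bytes);
    queued_bytes_ += req.batch_bytes;
    if (req.respond) req.respond();
  }

  int64_t limit_;
  int64_t queued_bytes_ = 0;
  std::deque<int64_t> batches_;
  std::deque<SketchRequest> deferred_;
};
```

Withholding the reply is the back-pressure mechanism: a sender that has not been answered has no reason to send more, so no error round trip or retransmission of the row batch is needed.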

Commit:ff0068b
Author:Michael Ho
Committer:Jenkins

IMPALA-4670: Introduces RpcMgr class This patch introduces a new class, RpcMgr, which is the abstraction layer around KRPC core mechanics. It provides an interface RegisterService() for various services to register themselves. Kudu RPC is invoked via an auto-generated interface called proxy. This change implements an inline wrapper for a KRPC client to obtain a proxy for a particular service exported by a remote server. Last but not least, the RpcMgr will start all registered services if FLAGS_use_krpc is true. This patch hasn't yet added any service except for some test services in rpc-mgr-test. This patch is based on an abandoned patch by Henry Robinson. Testing done: a new backend test is added to exercise the code and demonstrate the way to interact with the KRPC framework. Change-Id: I8adb10ae375d7bf945394c38a520f12d29cf7b46 Reviewed-on: http://gerrit.cloudera.org:8080/7901 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins

Commit:dd4c6be
Author:Michael Ho
Committer:Impala Public Jenkins

IMPALA-4670: Introduces RpcMgr class This patch introduces a new class, RpcMgr, which is the abstraction layer around KRPC core mechanics. It provides an interface RegisterService() for various services to register themselves. Kudu RPC is invoked via an auto-generated interface called proxy. This change implements an inline wrapper for a KRPC client to obtain a proxy for a particular service exported by a remote server. Last but not least, the RpcMgr will start all registered services if FLAGS_use_krpc is true. This patch hasn't yet added any service except for some test services in rpc-mgr-test. This patch is based on an abandoned patch by Henry Robinson. Testing done: a new backend test is added to exercise the code and demonstrate the way to interact with the KRPC framework. Change-Id: I8adb10ae375d7bf945394c38a520f12d29cf7b46 Reviewed-on: http://gerrit.cloudera.org:8080/7901 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins
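
The register-then-start pattern described in the IMPALA-4670 message might look roughly like the sketch below. `SketchRpcMgr`, its `Handler` type, and `Call()` are hypothetical and only share the idea with the real RpcMgr: services register by name, nothing is reachable until the manager starts them, and starting is gated on a flag (standing in for FLAGS_use_krpc). `Call()` is an in-process stand-in for obtaining a proxy and invoking a remote method.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical service registry; not the RpcMgr interface from the patch.
class SketchRpcMgr {
 public:
  using Handler = std::function<std::string(const std::string&)>;

  // Registers a service by name. Fails after start-up or on a duplicate name.
  bool RegisterService(const std::string& name, Handler handler) {
    assert(static_cast<bool>(handler));
    if (started_) return false;
    return services_.emplace(name, std::move(handler)).second;
  }

  // Starts all registered services only if the (assumed) flag is set.
  void StartServices(bool use_krpc) { started_ = use_krpc; }

  // In-process stand-in for calling a method through a proxy.
  bool Call(const std::string& name, const std::string& request,
            std::string* response) const {
    if (!started_) return false;
    auto it = services_.find(name);
    if (it == services_.end()) return false;
    *response = it->second(request);
    return true;
  }

 private:
  bool started_ = false;
  std::map<std::string, Handler> services_;
};
```

Separating registration from start-up lets every service be known before any request is accepted, which is one plausible reason for the two-step interface.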

Commit:ebd0b24
Author:Michael Ho
Committer:jenkins

KUDU-2065: Support cancellation for outbound RPC call This change implements a new interface RpcController::Cancel() which takes a RpcController as argument and cancels any pending OutboundCall associated with it. RpcController::Cancel() queues a cancellation task scheduled on the reactor thread for that outbound call. Once the task is run, it will cancel the outbound call right away if the RPC hasn't started sending yet or if it has already sent the request and is waiting for a response. If cancellation happens when the RPC request is being sent, the RPC will be cancelled only after the RPC has finished sending the request. If the RPC is finished, the cancellation will be a no-op. Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf Reviewed-on: http://gerrit.cloudera.org:8080/7455 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon <todd@apache.org> Reviewed-on: http://gerrit.cloudera.org:8080/7743 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins

Commit:c1c4815
Author:Michael Ho
Committer:Impala Public Jenkins

KUDU-2065: Support cancellation for outbound RPC call This change implements a new interface RpcController::Cancel() which takes a RpcController as argument and cancels any pending OutboundCall associated with it. RpcController::Cancel() queues a cancellation task scheduled on the reactor thread for that outbound call. Once the task is run, it will cancel the outbound call right away if the RPC hasn't started sending yet or if it has already sent the request and is waiting for a response. If cancellation happens when the RPC request is being sent, the RPC will be cancelled only after the RPC has finished sending the request. If the RPC is finished, the cancellation will be a no-op. Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf Reviewed-on: http://gerrit.cloudera.org:8080/7455 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon <todd@apache.org> Reviewed-on: http://gerrit.cloudera.org:8080/7743 Reviewed-by: Sailesh Mukil <sailesh@cloudera.com> Tested-by: Impala Public Jenkins
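
The cancellation rules in the KUDU-2065 message amount to a small state machine, sketched below. `CallState` and `SketchOutboundCall` are hypothetical names, not Kudu's OutboundCall; the sketch is single-threaded and leaves out the reactor-thread task queue, modelling only what happens once the cancellation task runs.

```cpp
#include <cassert>

// Hypothetical call states for illustration only.
enum class CallState { kReady, kSending, kSent, kFinished, kCancelled };

class SketchOutboundCall {
 public:
  CallState state() const { return state_; }

  void StartSending() {
    if (state_ == CallState::kReady) state_ = CallState::kSending;
  }

  // A cancellation requested mid-send takes effect here, once sending ends.
  void FinishSending() {
    if (state_ != CallState::kSending) return;
    state_ = cancel_requested_ ? CallState::kCancelled : CallState::kSent;
  }

  void ReceiveResponse() {
    if (state_ == CallState::kSent) state_ = CallState::kFinished;
  }

  // Models the cancellation task running against this call.
  void Cancel() {
    switch (state_) {
      case CallState::kReady:  // Not yet sending: cancel right away.
      case CallState::kSent:   // Waiting for a response: cancel right away.
        state_ = CallState::kCancelled;
        break;
      case CallState::kSending:  // Mid-send: defer until the send completes.
        cancel_requested_ = true;
        break;
      case CallState::kFinished:   // Already finished: no-op.
      case CallState::kCancelled:  // Already cancelled: no-op.
        break;
    }
  }

 private:
  CallState state_ = CallState::kReady;
  bool cancel_requested_ = false;
};
```

Deferring the cancellation while a request is partly written presumably avoids leaving a half-sent message on a connection that other calls share.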

Commit:9c03500
Author:Henry Robinson
Committer:jenkins

IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8 Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c Reviewed-on: http://gerrit.cloudera.org:8080/5718 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins

Commit:c7db60a
Author:Henry Robinson
Committer:Impala Public Jenkins

IMPALA-4669: [KRPC] Import RPC library from kudu@314c9d8 Change-Id: I06ab5b56312e482a27fa484414c338438ad6972c Reviewed-on: http://gerrit.cloudera.org:8080/5718 Reviewed-by: Michael Ho <kwho@cloudera.com> Tested-by: Impala Public Jenkins

Commit:b87fbe6
Author:Henry Robinson
Committer:jenkins

IMPALA-4669: [SECURITY] Import Kudu security library from kudu@314c9d8 The security library provides Kerberos and TLS facilities to the rpc library. Change-Id: I76daeead00f672aa468f5ab6de4d70eac2078cb2 Reviewed-on: http://gerrit.cloudera.org:8080/5716 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>

Commit:84b8155
Author:Henry Robinson
Committer:Henry Robinson

IMPALA-4669: [SECURITY] Import Kudu security library from kudu@314c9d8 The security library provides Kerberos and TLS facilities to the rpc library. Change-Id: I76daeead00f672aa468f5ab6de4d70eac2078cb2 Reviewed-on: http://gerrit.cloudera.org:8080/5716 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>

Commit:31afd80
Author:Henry Robinson
Committer:jenkins

IMPALA-4669: [KUTIL] Import kudu_util library from kudu@314c9d8 Update LICENSE.txt and rat_exclude_files.txt Change-Id: I6d89384730b60354b5fae2b1472183d2a561d170 Reviewed-on: http://gerrit.cloudera.org:8080/5714 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Impala Public Jenkins

Commit:d6abb29
Author:Henry Robinson
Committer:Impala Public Jenkins

IMPALA-4669: [KUTIL] Import kudu_util library from kudu@314c9d8 Update LICENSE.txt and rat_exclude_files.txt Change-Id: I6d89384730b60354b5fae2b1472183d2a561d170 Reviewed-on: http://gerrit.cloudera.org:8080/5714 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Impala Public Jenkins

Commit:ac1564a
Author:Henry Robinson
Committer:jenkins

IMPALA-4758: (1/2) Update gutil/ from Kudu@a1bfd7b * Copy gutil from Kudu * Minimal changes to gutil/CMakeLists.txt Change-Id: Ic708a9c4e76ede17af9b06e0a0a8e9ae7d357960 Reviewed-on: http://gerrit.cloudera.org:8080/5687 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>

This commit does not contain any .proto files.

Commit:02f3e3f
Author:Henry Robinson
Committer:Henry Robinson

IMPALA-4758: (1/2) Update gutil/ from Kudu@a1bfd7b * Copy gutil from Kudu * Minimal changes to gutil/CMakeLists.txt Change-Id: Ic708a9c4e76ede17af9b06e0a0a8e9ae7d357960 Reviewed-on: http://gerrit.cloudera.org:8080/5687 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>

This commit does not contain any .proto files.

Commit:e8fe220
Author:David Alves
Committer:David Alves

Merge thirdparty from cdh5-trunk This is the first step towards merging impala-kudu with trunk. These are basically just mechanical changes, pulling from trunk thirdparty and just enough other changes to cmake build scripts or impala-config.sh to make it compile. NOTE: This patch is basically half-way between the impala-kudu build, that doesn't yet use the toolchain and the impala trunk build that does. As such this patch doesn't actually build stand-alone and serves merely the purpose of omitting +/- 650K loc from the merge patch itself. Change-Id: Ic794988dcadee16e687a82745b417605772ff325

Commit:113f88d
Author:David Alves
Committer:David Alves

Merge thirdparty from cdh5-trunk This is the first step towards merging impala-kudu with trunk. These are basically just mechanical changes, pulling from trunk thirdparty and just enough other changes to cmake build scripts or impala-config.sh to make it compile. NOTE: This patch is basically half-way between the impala-kudu build, that doesn't yet use the toolchain and the impala trunk build that does. As such this patch doesn't actually build stand-alone and serves merely the purpose of omitting +/- 650K loc from the merge patch itself. Change-Id: Ic794988dcadee16e687a82745b417605772ff325

Commit:706b757
Author:Martin Grund
Committer:Martin Grund

Optional Impala Toolchain This patch makes it possible to optionally enable the new Impala binary toolchain. For now there are no major version differences between the toolchain dependencies and what is currently kept in thirdparty. To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the folder where the binaries are available. In addition this patch moves gutil from the thirdparty directory into the source tree of be/src to allow easy propagation of compiler and linker flags. Furthermore, the thrift-cpp target was added as a dependency to all targets that require the generated thrift sources to be available before the build is started. What is the new toolchain: The goal of the toolchain is to homogenize the build environment and to make sure that Impala is built nearly identically on every platform. To achieve this, we limit the flexibility of using the system's host libraries and rather rely on a set of custom produced binaries including the necessary compiler. Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b Reviewed-on: http://gerrit.cloudera.org:8080/427 Reviewed-by: Adar Dembo <adar@cloudera.com> Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>

Commit:81f247b
Author:Martin Grund
Committer:Martin Grund

Optional Impala Toolchain This patch makes it possible to optionally enable the new Impala binary toolchain. For now there are no major version differences between the toolchain dependencies and what is currently kept in thirdparty. To enable the toolchain, export the variable IMPALA_TOOLCHAIN to the folder where the binaries are available. In addition this patch moves gutil from the thirdparty directory into the source tree of be/src to allow easy propagation of compiler and linker flags. Furthermore, the thrift-cpp target was added as a dependency to all targets that require the generated thrift sources to be available before the build is started. What is the new toolchain: The goal of the toolchain is to homogenize the build environment and to make sure that Impala is built nearly identically on every platform. To achieve this, we limit the flexibility of using the system's host libraries and rather rely on a set of custom produced binaries including the necessary compiler. Change-Id: If2dac920520e4a18be2a9a75b3184a5bd97a065b Reviewed-on: http://gerrit.cloudera.org:8080/427 Reviewed-by: Adar Dembo <adar@cloudera.com> Tested-by: Internal Jenkins Reviewed-by: Martin Grund <mgrund@cloudera.com>

Commit:f1fff5d
Author:David Alves
Committer:Martin Grund

Update/Trim thirdparty/gutil Working on the integration of Kudu and Impala we're starting to hit problems due to each project keeping slightly different versions of gutil. Simply copy/pasting Kudu's gutil is problematic as it refers to Kudu often (both in comments and namespace). Ideally we'd like to pull this to a common place such as the toolchain. Until we do so, however, the least problematic approach is to trim the dependencies to the ones that Impala actually uses and update the ones that are not trimmed to the latest version, from Kudu, which is what this patch does. Change-Id: I935c50344622ff12cde78ef809e7961378ec7774 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/6343 Tested-by: jenkins Reviewed-by: David Alves <david.alves@cloudera.com>

This commit does not contain any .proto files.

Commit:69dc9b1
Author:Srinath Shankar
Committer:Srinath Shankar

[CDH5] Remove cdh5.0.0 from thirdparty Change-Id: Ie55e301fe791a7baba512ac0fea291267ff3017e

Commit:2e001db
Author:Lenni Kuff
Committer:Lenni Kuff

[CDH5] Remove old /thirdparty Hadoop/Hbase/Hive dependencies Change-Id: I1fd3609fa533991da712204bbb422228e6b5d46f

Commit:5ec24e6
Author:Lenni Kuff
Committer:Lenni Kuff

[CDH5] Remove -SNAPSHOT suffix from /thirdparty dependencies Change-Id: Id8d69aebf404ac7471fb3456e85c2c21ddd1bd55

Commit:6f06b8f
Author:Lenni Kuff
Committer:Henry Robinson

Adding hive-0.10.0-cdh4.5.0 w/ PATCH-219 to /thirdparty PATCH-219 HIVE for DISTRO-557 Hive revision: 208131090a7888bd7038404e5a3003a906c16b36 Fixes 1s sleep before opening each metastore connection. Change-Id: Ic0cb7c4a7349b63d6cb97a2434a47f7ab11ddd38 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1311 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: Lenni Kuff <lskuff@cloudera.com>

Commit:29e65a5
Author:Matthew Jacobs
Committer:ishaan

[cdh5] Add latest cdh5 hadoop, hbase, and hive snapshots to thirdparty Change-Id: I60c93b259a26e86aca60f2b3b5b6226eabc0b5eb

Commit:16fabb9
Author:Alex Behm

Remove unused CDH4.5 dependencies from thirdparty. Change-Id: Ibe3745c50f748088d833e6d906a0d4f7f116987a

Commit:0bcfe7e
Author:Nong Li
Committer:jenkins

[CDH4] Remove docs and srcs from hbase thirdparty. Change-Id: I7da94152f625937a3abb3ac8cdb0c8372e520081 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1339 Reviewed-by: Lenni Kuff <lskuff@cloudera.com> Tested-by: jenkins

Commit:4925827
Author:Henry Robinson
Committer:Henry Robinson

Add gutil to thirdparty Change-Id: Ic7bf4c0faef6b11ef3a0cfff68af845127ff21e9 Reviewed-on: http://gerrit.ent.cloudera.com:8080/1025 Reviewed-by: Henry Robinson <henry@cloudera.com> Tested-by: Henry Robinson <henry@cloudera.com>

Commit:d2ccfba
Author:ishaan
Committer:Henry Robinson

Upgrade thirdparty to use CDH4.5 bits. The following changes have been made: -- Update hbase -- Update hive -- Update hadoop -- Update the parquet version to 1.2.5 Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb

Commit:35247fb
Author:ishaan
Committer:Henry Robinson

Remove CDH4.3 thirdparty libraries. Change-Id: Ic9ce07d46054bcf03ea98eb90b2972ed768aed3e

This commit does not contain any .proto files.

Commit:4629aa3
Author:Skye Wanderman-Milne
Committer:Henry Robinson

Upgrade Avro library to 1.7.4

Commit:71278c9
Author:Lenni Kuff
Committer:Henry Robinson

Update Hive to CDH4.3.0

Commit:3a596b9
Author:Lenni Kuff
Committer:Henry Robinson

Remove CDH4.2.0 Hive

Commit:dadb634
Author:Lenni Kuff
Committer:Henry Robinson

Add CDH 4.3.0 HBase to thirdparty

Commit:55154d4
Author:Lenni Kuff
Committer:Henry Robinson

Remove CDH 4.2.0 HBase from thirdparty

Commit:16482a5
Author:Skye Wanderman-Milne
Committer:Henry Robinson

thirdparty/avro

Commit:2639483
Author:Nong Li
Committer:Henry Robinson

Moved HBase to CDH4.2

Commit:de224a4
Author:Nong Li
Committer:Henry Robinson

Remove hive 0.9.

Commit:43be278
Author:Lenni Kuff
Committer:Henry Robinson

Add hive-0.10.0-cdh4.2.0 snapshot (for Hive JDBC driver) This change adds the hive-0.10.0-cdh4.2.0 snapshot. It is currently only used for JDBC testing but eventually we will move from hive-0.9.0 to hive-0.10.0. This snapshot may need to be refreshed at that time. From: CDH/hive - b57b6b7375edcbf54713c4900a3d48e1a76bd00a

Commit:fb46888
Author:Nong Li
Committer:Henry Robinson

Updated hive and hbase to rc3.

Commit:08b65c1
Author:ishaan
Committer:Lenni Kuff

Upgrade thirdparty to use CDH4.5 bits. The following changes have been made: -- Update hbase -- Update hive -- Update hadoop -- Update the parquet version to 1.2.5 Change-Id: Id6ceaef0e9eebab27ffd408160116fa84ed300fb

Commit:c89b054
Author:ishaan
Committer:Lenni Kuff

Remove CDH4.3 thirdparty libraries. Change-Id: Ic9ce07d46054bcf03ea98eb90b2972ed768aed3e

This commit does not contain any .proto files.

Commit:084fc74
Author:Skye Wanderman-Milne
Committer:Lenni Kuff

Upgrade Avro library to 1.7.4

Commit:30b28a6
Author:Lenni Kuff
Committer:Lenni Kuff

Update Hive to CDH4.3.0

Commit:f0130c6
Author:Lenni Kuff
Committer:Lenni Kuff

Add CDH 4.3.0 HBase to thirdparty

Commit:89d855d
Author:Skye Wanderman-Milne
Committer:Lenni Kuff

thirdparty/avro

Commit:34d45fd
Author:Henry Robinson
Committer:Henry Robinson

Upgrade to post-CDH4u0: Hadoop 2.0.0, Hive 0.8.1 and HBase 0.92.1

Commit:1671993
Author:Henry Robinson
Committer:Henry Robinson

Upgrade to hbase-0.92.0-cdh4b1

Commit:57a6a0a
Author:Henry Robinson
Committer:Henry Robinson

Upgrade Hive to hive-0.8.0-cdh4b1

Commit:1f779e8
Author:Alexander Behm
Committer:Alexander Behm

Added HBase test files, and loading of HBase metadata into Impala.