Proto commits in Netflix/genie

The Protocol Buffers files changed in these 39 commits:

Commit:96b4bd1
Author:Marco Primi
Committer:Marco Primi

Improve handling of command/cluster selector runtime errors

Motivation: allow a client or user to distinguish between 2 failure cases:
- The job is submitted with criteria that cannot be satisfied; criteria must be updated for this job to be accepted.
- The selector encountered a runtime error; resubmitting the job with the same criteria may work.

Until now, both cases would be handled the same, leading to:
* Error `412: Precondition Failed` via API
* Fatal error resolving job request via Agent
And the job would be marked FAILED with status message JobStatusMessages.FAILED_TO_RESOLVE_JOB.

This change better distinguishes the 2 error conditions such that:
- Selector runtime error returns a `500` via API
- Selector runtime error produces a retryable error in the agent
- Selector runtime error sets a different message with the final job status
- Cluster selector runtime error does not proceed to the next selector (for better determinism)

This allows clients to retry, if appropriate.
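The distinction this commit draws can be sketched as follows. This is a minimal Python illustration, not Genie's code (Genie itself is a Java project). The exception classes and the `classify_failure` function are hypothetical names introduced here; only the `412` and `500` status codes and the retryable/non-retryable split come from the commit message.

```python
class UnsatisfiableCriteriaError(Exception):
    """Hypothetical: no cluster/command can satisfy the submitted criteria."""


class SelectorRuntimeError(Exception):
    """Hypothetical: the selector itself failed while evaluating candidates."""


def classify_failure(error):
    """Map a resolution failure to (HTTP status, retryable).

    Unsatisfiable criteria need a changed request, so they are not retryable.
    A selector runtime error may succeed on resubmission, so it is retryable.
    """
    if isinstance(error, UnsatisfiableCriteriaError):
        return 412, False
    if isinstance(error, SelectorRuntimeError):
        return 500, True
    raise TypeError(f"unexpected failure type: {type(error).__name__}")


status, retryable = classify_failure(SelectorRuntimeError("selector timed out"))
print(status, retryable)
```

A client built on this split could resubmit unchanged when `retryable` is true and ask the user to revise the criteria otherwise.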

The documentation is generated from this commit.

Commit:1815e05
Author:Marco Primi
Committer:Marco Primi

Bug: integer overflow for large files

Updates the gRPC protocol to allow downloading files that are over 2GB in size. The previous version of the protocol was (naively) using int32 for offsets, so anything beyond the 2GB mark would produce an integer overflow server-side.
- Rename deprecated uint32 fields; they will be ignored by current and future agents in favor of the new int64 fields
- Add a field so that an agent can inform the server it is recent and supports files/ranges over 2GB
- Block requests whose range is over the 2GB mark if and only if the target agent is old and does not support it
- Migrate from int to long where appropriate

Old behavior:
1) Files smaller than 2GB: no problem
2) Range below 2GB for a file larger than 2GB: no problem
3) Range above 2GB for a file larger than 2GB: error 500, integer overflow

New behavior:
1) Unchanged
2) Unchanged
3) (against old agent) Unchanged (but better error message)
3) (against new agent) Successful
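To make the overflow concrete, the sketch below emulates what happens when an offset past the 2GB mark is narrowed to a signed 32-bit integer, and shows one way the "block only for old agents" rule could look. It is an illustration in Python under stated assumptions: `as_int32`, `range_allowed`, and the `agent_supports_large_files` flag are hypothetical names, and the actual field names and checks in Genie are not shown in the commit message.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def as_int32(value):
    """Emulate narrowing an integer to signed 32 bits (two's complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31


def range_allowed(range_end, agent_supports_large_files):
    """Hypothetical server-side check: ranges past the 32-bit limit are only
    served when the agent has announced that it understands 64-bit offsets."""
    return agent_supports_large_files or range_end <= INT32_MAX


three_gib = 3 * 1024**3
print(as_int32(three_gib))              # a negative number: the offset wrapped around
print(range_allowed(three_gib, False))  # old agent: request is blocked
print(range_allowed(three_gib, True))   # new agent: request goes through
```

Python integers do not overflow on their own, which is why the wrap is computed explicitly here.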

Commit:0a22e12
Author:Marco Primi
Committer:Marco Primi

Define and implement 'changeJobArchiveStatus' gRPC API

Commit:3804e73
Author:Marco Primi
Committer:Marco Primi

Define and implement agent gRPC service to retrieve job status

Commit:4115229
Author:Marco Primi
Committer:Marco Primi

Implement client and server for Agent configure endpoint

The server-side underlying service is still a NOOP and sends back an empty map of properties. The Agent still makes no use of the properties received.

Commit:194f9d5
Author:Marco Primi
Committer:Marco Primi

Define new gRPC service for agent runtime configuration

Create new 'configure' RPC method and Request/Response messages. For now, this is a stub: messages are empty and the service is NOOP.

Commit:d982a36
Author:Tom Gianos
Committer:Tom Gianos

Add option to disable archiving from agent CLI

Commit:489448d
Author:Tom Gianos
Committer:Tom Gianos

Deprecate agent requested archive location prefix

This makes the `--archiveLocationPrefix` argument from the agent CLI a no-op.

Commit:0d51fac
Author:Marco Primi
Committer:Marco Primi

Add missing service description in genie.proto

Commit:92346d6
Author:Marco Primi
Committer:Marco Primi

Fix handling of failed job resolution in agent/server protocol

Bug overview: When the agent was requesting a job resolution with criteria that cannot be satisfied, the server was not handling the case properly and was sending back a generic error (error code 'UNKNOWN'). On the agent side, this was surfaced to the user as a generic exception, i.e.:
`GenieRuntimeException Unhandled error: UNKNOWN: com.netflix.genie.common.exceptions.GeniePreconditionException:No cluster/command combination found for the given criteria. Unable to continue`

New behavior: When the specification resolution fails due to unsatisfiable criteria, the server sends a newly introduced error code: RESOLUTION_FAILED. This results in a clearer handling of the error on the agent, and the user message becomes:
`JobSpecificationResolutionException: No cluster/command combination found for the given criteria`

Backward compatibility:
- Agent newer than server: no behavior change; the agent surfaces the error the same as before.
- Server newer than the agent: no behavior change; the agent will fail to recognize the new RESOLUTION_FAILED code and treat it as UNKNOWN, therefore executing the same code path as before.
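The backward-compatibility argument rests on an older agent falling back to UNKNOWN when it receives a code it has never heard of. A minimal Python sketch of that fallback follows. The numeric values and the `decode_error` function are assumptions made for illustration; only the names UNKNOWN and RESOLUTION_FAILED come from the commit message.

```python
# Hypothetical numbering: the real enum values in genie.proto are not shown here.
OLD_AGENT_CODES = {0: "UNKNOWN"}
NEW_AGENT_CODES = {0: "UNKNOWN", 1: "RESOLUTION_FAILED"}


def decode_error(code, known_codes):
    """Return the error name for a numeric code, treating any unrecognized
    code as UNKNOWN so that older agents keep their previous behavior."""
    return known_codes.get(code, "UNKNOWN")


# A new server sends code 1 (RESOLUTION_FAILED in this sketch).
print(decode_error(1, NEW_AGENT_CODES))  # recognized by a new agent
print(decode_error(1, OLD_AGENT_CODES))  # old agent falls back to UNKNOWN
```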

Commit:86b81ba
Author:Marco Primi
Committer:Tom Gianos

Update Agent/Server protocol to send executable and arguments separately

Send executable and arguments (provided by Command) and job arguments (provided by user) as two separate arrays of strings via gRPC. Keep populating the deprecated `command_args` field so that older agents keep working.
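One way to picture the dual population of new and deprecated fields is the sketch below. It is a Python illustration only: the keys `executable_args` and `job_args` are hypothetical names, and the assumption that `command_args` carries the concatenation of both lists is a guess about the legacy format, not something the commit message states.

```python
def build_job_spec_message(executable_and_args, job_args):
    """Build a dict standing in for the protobuf message.

    New agents would read the two separate lists; older agents would keep
    reading the deprecated combined field.
    """
    executable_and_args = list(executable_and_args)
    job_args = list(job_args)
    return {
        "executable_args": executable_and_args,  # hypothetical field name
        "job_args": job_args,                    # hypothetical field name
        # Deprecated field kept for older agents (assumed: simple concatenation).
        "command_args": executable_and_args + job_args,
    }


message = build_job_spec_message(["spark-submit", "--master", "yarn"], ["job.py"])
print(message["command_args"])
```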

Commit:4c989ed
Author:Tom Gianos
Committer:Tom Gianos

Add optional Timeout field to JobSpecification

Crosses DTO, Proto and Entity layers.
Add corresponding converter logic.

Commit:1bc987a
Author:Tom Gianos
Committer:Tom Gianos

Autoformat proto file

Commit:fd1c9cc
Author:Tom Gianos
Committer:Tom Gianos

Add optional timeout to agent config proto messages

Include in convert methods for the proto to DTO and vice versa.

Commit:1bf40b1
Author:Marco Primi
Committer:Marco Primi

Proto definition for agent-server file streaming

Commit:69b6ea0
Author:Marco Primi
Committer:Marco Primi

Update genie.proto comments for consistency

Commit:50cb815
Author:Marco Primi
Committer:Marco Primi

Remove abandoned JobFileSync protocol

Commit:6e07e92
Author:Marco Primi
Committer:Marco Primi

Define and implement gRPC handshake protocol

Agent sends metadata to the server. The latter inspects it and can reject the client. This facilitates turning away agents that are running an incompatible or deprecated version.
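The handshake idea (agent sends metadata, server inspects it and may reject) can be sketched as below. This is a hedged Python illustration: the `agent_version` metadata key, the dotted version format, and the minimum version are all assumptions, since the commit message does not say which metadata the server actually inspects.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.0.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def handshake(agent_metadata, minimum_version="4.0.0"):
    """Return (accepted, reason) for the metadata an agent presents.

    The server turns away agents that do not report a version or that report
    one older than the (hypothetical) minimum it is willing to serve.
    """
    version = agent_metadata.get("agent_version")
    if version is None:
        return False, "agent did not report a version"
    if parse_version(version) < parse_version(minimum_version):
        return False, f"agent version {version} is older than {minimum_version}"
    return True, "accepted"


print(handshake({"agent_version": "4.1.0"}))
print(handshake({"agent_version": "3.9.2"}))
```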

Commit:488a1be
Author:Sumit Tandon
Committer:sumitnetflix

Persist archiveLocationPrefix and archiveLocation on the server

Persist archiveLocationPrefix on reserving job id. Calculate archiveLocation from archiveLocationPrefix when resolving job specification and return it to the client. Persist archiveLocation when saving job specification.

Commit:99a11d0
Author:Sumit Tandon
Committer:sumitnetflix

GRPC implementation of job kill service interfaces

Commit:e406c1e
Author:Marco Primi
Committer:Marco Primi

Define HeartBeat service, implement client and server services

The HeartBeat service uses bi-directional gRPC streaming to detect connections and disconnections at both ends.

Commit:1b74a51
Author:Tom Gianos
Committer:Tom Gianos

Fix misspelling of 'dependencies' in ExecutionResource message for proto JobService

Commit:758c38b
Author:Tom Gianos
Committer:Tom Gianos

Initial proto definition of the gRPC job file sync service

Includes messages for a bi-directional stream implementation which will push files from the agent location to the server for storage and retrieval via REST API.

Commit:37e51e7
Author:Marco Primi
Committer:Marco Primi

Review Agent/Server protocol and service implementations on both ends

- Align all protocols to return similar kinds of error messages
- Add missing services declaration and implementation in AgentJobService (both client and server)
- Update JobServiceProtoConverter to throw GenieConversionException where appropriate
- Add more conversion types to JobServiceProtoConverter and remove some redundant ones
- Stricter validation for parameters of AgentJobService

Commit:20b523b
Author:Marco Primi
Committer:Marco Primi

Remove deprecated 'agent registration' functionality

Commit:7badc30
Author:Tom Gianos
Committer:Tom Gianos

Add initial error handling for gRPC DTO conversion code

Commit:9d8208a
Author:Tom Gianos
Committer:Tom Gianos

Initial implementation of change job status gRPC API

Commit:88cfe13
Author:Tom Gianos
Committer:Tom Gianos

Initial implementation of the claimJob gRPC API

Commit:043126d
Author:Tom Gianos
Committer:Tom Gianos

Implement server side reservation of a job id

Commit:86a0267
Author:Tom Gianos
Committer:Tom Gianos

Modify proto definition file for updated service APIs and commit associated changes to supporting classes.

Commit:361cda4
Author:Tom Gianos
Committer:Tom Gianos

Refactor GRpc JobSpecificationService into JobService in preparation for the new agreed-upon workflow of id reservation -> spec resolve -> initialization -> running

Commit:789a711
Author:Tom Gianos
Committer:Tom Gianos

Add version as an option for criterion

Commit:25822b2
Author:Marco Primi
Committer:Marco Primi

Use consistent field naming convention in protobuf definitions

Consistently use lowercase with underscores for field names. Does not affect generated Java classes.
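The rename is invisible to Java callers because the protobuf Java generator derives camel-case accessor names from field names, so a field written as `jobId` or as `job_id` ends up with the same getter. The Python sketch below is a simplified approximation of that naming rule, written for illustration; `java_getter_name` is a hypothetical helper and does not reproduce every detail of the real generator.

```python
def java_getter_name(field_name):
    """Approximate the getter name generated for a protobuf field:
    drop underscores and capitalize the first letter of each segment."""
    segments = [segment for segment in field_name.split("_") if segment]
    return "get" + "".join(segment[0].upper() + segment[1:] for segment in segments)


# Both spellings of the field map to the same accessor name.
print(java_getter_name("jobId"))
print(java_getter_name("job_id"))
```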

Commit:a8efd61
Author:Marco Primi
Committer:Marco Primi

Add JobDirectoryLocation field to agent/server protocol and DTOs

JobDirectory is specified via CLI and echoed by the server (and logged for auditing and debugging). Later this also allows the server to choose the location (useful when launching an agent in response to a job request via API).

Commit:0230c09
Author:Tom Gianos
Committer:Tom Gianos

Server side implementation of the resolveJobSpecification rpc API. First versions of supporting v4 proto definitions and DTO classes

Commit:3a42699
Author:Tom Gianos
Committer:Tom Gianos

Reformat for 4 spaces per tab/indent in proto file

Commit:9ef594c
Author:Marco Primi
Committer:Marco Primi

Define and implement agent registration protocol

Commit:df7867d
Author:Marco Primi
Committer:Marco Primi

Repurpose 'Greeter' gRPC service example to 'Ping' service

The ping protocol can be used by the agent to test for server connectivity and exchange client/server metadata.

Commit:5bb2552
Author:Marco Primi
Committer:Marco Primi

Add protobuf/gRPC to build

- Add protobuf and gRPC dependencies
- Add and configure protobuf Gradle plugin and gRPC extension
- Set up additional source directory for generated classes
- Add new genie-proto module with placeholder message definitions
- Exclude generated code from various check-type tasks
- Set up Spring Boot auto-configuration and starter for gRPC server components (disabled by default)
- Add example gRPC server interceptor
- Create example service and unit test