Proto commits in jcrist/skein

The Protocol Buffers files changed in the following 52 commits:

Commit:702192d
Author:Jim Crist
Committer:GitHub

Add method to get application logs (#185)
* Add method to get application logs
  Adds `client.application_logs`, a method to get application logs for a finished application. This is equivalent to shelling out to `yarn logs -applicationId <appid>`, but can be done programmatically and using proxy-user permissions. This works for Hadoop 2.6-ish right now, and uses lots of semi-private APIs (Hadoop doesn't really expose its logs programmatically).

The documentation is generated from this commit.

Commit:abadaf9
Author:Jim Crist
Committer:GitHub

Add `ApplicationClient.add_container` (#174)
* Add `ApplicationClient.add_container`
  Adds a method to add a new container to a service, with optional service configuration overrides. So far only environment variables can be overridden; additional fields could be added as needed.
      # Add a new container for service `test`, with additional
      # environment variable `FOO` set to `BAR` in the container
      app.add_container('test', env={'FOO': 'BAR'})
  This is useful for services where every container should be *almost* identical, with small changes made to update configuration parameters as needed. I needed this for ``dask-gateway``, where each worker needs a unique name provided to it before startup.
  In the presence of container restarts, the container is restarted with the same overridden configuration. A test is added to check this behavior.
* Flaky get_applications test
  Only compare application names; the resource manager doesn't update all fields atomically, leading to potential differences in state between requests.
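The override and restart semantics described in this commit can be illustrated without a cluster. The following is a minimal sketch, not skein's implementation: `Service`, `Container`, `launch_env`, and `restart` are hypothetical names, and it assumes that overrides are merged over the service-level environment and stored with the container so that a restart reuses them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Service:
    """Hypothetical stand-in for a service definition."""
    name: str
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class Container:
    """Hypothetical record of one container and its env overrides."""
    service: Service
    env_overrides: Dict[str, str] = field(default_factory=dict)

    def launch_env(self) -> Dict[str, str]:
        # Overrides win over the service-level environment.
        return {**self.service.env, **self.env_overrides}

    def restart(self) -> "Container":
        # Assumption: a restart reuses the stored overrides unchanged.
        return Container(self.service, dict(self.env_overrides))


svc = Service("test", env={"FOO": "default", "PATH": "/usr/bin"})
container = Container(svc, env_overrides={"FOO": "BAR"})
restarted = container.restart()
```

Keeping the overrides on the container record, rather than mutating the service, leaves the service definition untouched for other containers.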

Commit:39ba759
Author:Jim Crist
Committer:GitHub

Support GPU and FPGA resources (#171) Adds (limited) support for Hadoop 3 resource requests. Currently only the builtin resource extensions (gpu and fpga) are supported, true custom requests could be added later with greater effort. If the version of hadoop or the configuration don't support these resources, a nice error is thrown.

Commit:4c50ac1
Author:Jim Crist
Committer:GitHub

Add filter methods to `Client.get_applications` (#162)
Allows filtering applications on:
- state
- name
- user
- queue
- start and finish times
These filters are also exposed through the CLI.
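As an illustration of how such filters combine, here is a small standalone sketch. It does not call skein: `AppReport`, `filter_applications`, and the `started_after` parameter are hypothetical names chosen for the example, and it assumes that an omitted filter matches everything and that the given filters are combined with logical AND.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class AppReport:
    """Hypothetical, simplified application report."""
    name: str
    state: str
    user: str
    queue: str
    start_time: float  # seconds since the epoch


def filter_applications(
    apps: Iterable[AppReport],
    states: Optional[Set[str]] = None,
    name: Optional[str] = None,
    user: Optional[str] = None,
    queue: Optional[str] = None,
    started_after: Optional[float] = None,
) -> List[AppReport]:
    """Keep reports matching every filter that was given (None = no filter)."""
    def keep(app: AppReport) -> bool:
        return (
            (states is None or app.state in states)
            and (name is None or app.name == name)
            and (user is None or app.user == user)
            and (queue is None or app.queue == queue)
            and (started_after is None or app.start_time >= started_after)
        )
    return [app for app in apps if keep(app)]


apps = [
    AppReport("etl", "RUNNING", "alice", "default", 100.0),
    AppReport("etl", "FINISHED", "alice", "default", 50.0),
    AppReport("train", "RUNNING", "bob", "gpu", 120.0),
]
running_alice = filter_applications(apps, states={"RUNNING"}, user="alice")
```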

Commit:4b0ffa7
Author:Jim Crist
Committer:GitHub

Add method to move application to another queue (#161) Adds support for moving an application between queues. This is doable using either the python API (`skein.Client.move_application`) or the CLI (`skein application mv`).

Commit:0f6f6e4
Author:Jim Crist
Committer:GitHub

Add API for querying queue states (#159) * Add API for querying queue states Adds methods to `skein.Client` for querying the status of yarn queues. * Py27 compat

Commit:0d74863
Author:Jim Crist
Committer:Jim Crist

Add `Client.get_nodes` for getting node information Adds a method to get information on the nodes in a YARN cluster. Allows filtering by node state, and provides access to useful info on selected nodes.

Commit:a80b895
Author:Jim Crist
Committer:GitHub

Support scaling by a delta instead of total count (#150) * Support scaling by a delta instead of total count Support scaling a service by a delta in number of containers rather than specifying the number of total containers. This can be useful if container lifetimes are being managed by some external service rather than the skein appmaster itself. * change -> delta
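The difference between the two ways of scaling comes down to how the target container count is computed. A minimal sketch, assuming exactly one of the two arguments must be given; `resolve_target` is a hypothetical helper, and clamping a negative result to zero is an assumption of this example rather than something the commit states.

```python
from typing import Optional


def resolve_target(current: int, count: Optional[int] = None,
                   delta: Optional[int] = None) -> int:
    """Return the desired number of containers for a service."""
    if (count is None) == (delta is None):
        raise ValueError("specify exactly one of `count` or `delta`")
    target = count if count is not None else current + delta
    # Assumption: a negative result is clamped to zero rather than rejected.
    return max(target, 0)


# Scale a service with 3 running containers.
absolute = resolve_target(3, count=5)   # set the total directly
relative = resolve_target(3, delta=-1)  # remove one container
```

With a delta, the caller does not need to know the current total, which is what makes it convenient when an external service manages container lifetimes.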

Commit:b9d0317
Author:Jim Crist
Committer:GitHub

Add `allow_failures` field (#145) * Add `allow_failures` field Adds support for allowing failures in a service. If set to True, the application will not terminate if the number of failures for a service exceeds `max_restarts` for that service. Default is False. This is useful for things like `dask` workers, where we may want to handle restart behavior manually instead of relying on skein to do it for us.
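The termination rule described here can be written as a single predicate. This is a sketch of the stated behavior only, with a hypothetical function name; it is not taken from skein's source.

```python
def should_terminate(failures: int, max_restarts: int,
                     allow_failures: bool = False) -> bool:
    """Decide whether a service's failures should end the application."""
    if allow_failures:
        # Failures are tolerated; restart handling is left to the user.
        return False
    # Terminate only once failures exceed the allowed restarts.
    return failures > max_restarts


default_behavior = should_terminate(failures=3, max_restarts=2)
tolerant = should_terminate(failures=3, max_restarts=2, allow_failures=True)
```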

Commit:9acaedd
Author:Jim Crist
Committer:GitHub

Deprecate ``commands`` for ``script`` (#125) Deprecate the ``commands`` field in favor of ``script`` instead. This removes the oddness of us composing a script from a list of commands, as well as the automatic addition of ``set -x -e``.
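For context, the deprecated behavior amounted to roughly the following: a list of commands was joined into one script with ``set -x -e`` added automatically. The helper below is a hypothetical reconstruction for illustration; the exact placement of the prefix and the joining rules are assumptions.

```python
from typing import Sequence


def compose_script(commands: Sequence[str]) -> str:
    """Join commands into one shell script, prefixed with `set -x -e`."""
    # Assumption: the prefix goes first and commands are newline-separated.
    return "\n".join(["set -x -e", *commands])


script = compose_script(["echo starting", "python worker.py"])
```

With ``script``, the user writes the full script text directly, so nothing is added implicitly.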

Commit:cf6e9a5
Author:Jim Crist
Committer:GitHub

Optionally run application on master node (#120)
* Add ability to run user code on the application master
  Allows running a single user process on the application master. This allows singleton services to start up *much* faster. The master process (currently called a "driver", but that's a bit too overloaded for my liking) also manages the lifetime of the application - when the process exits the application shuts down automatically, regardless of the state of other services. The application final status is determined by the exit code of the process (0 for succeeded, anything else for failed).
  This can be useful for running things like the dask client for dask-submit (to more robustly ensure the cluster shuts down after submission) or starting single processes like a jupyter notebook server (used in jcrist/yarnspawner to launch jupyterhub on YARN).
* Update docs
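The mapping from the process exit code to the final application status can be sketched as follows. `final_status` and `run_driver` are hypothetical names, and the status strings are placeholders for this example; the sketch only shows the rule stated above (0 for succeeded, anything else for failed).

```python
import subprocess
import sys
from typing import Sequence


def final_status(exit_code: int) -> str:
    """Map a process exit code to a final status string."""
    return "SUCCEEDED" if exit_code == 0 else "FAILED"


def run_driver(argv: Sequence[str]) -> str:
    """Run the process to completion and report the resulting status."""
    return final_status(subprocess.run(list(argv)).returncode)


# Two tiny child processes: one exits with 0, the other with 3.
ok = run_driver([sys.executable, "-c", "pass"])
bad = run_driver([sys.executable, "-c", "import sys; sys.exit(3)"])
```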

Commit:3da851c
Author:Jim Crist
Committer:GitHub

Rename daemon to driver (#116)
* Rename "Daemon" to "Driver"
  The "Daemon" process name has been confusing for some users, here we rename it to "Driver" to better reflect its purpose and usage. Note that this is (currently) a backwards incompatible change for expert users that may have been touching lower-level methods/classes. This will be fixed in a subsequent commit deprecating those features.
* Deprecate daemon smoothly for driver
  Raise nice warnings when old `Daemon` commands/classes are used. This includes:
  - `DaemonError`
  - `DaemonNotRunningError`
  - `skein daemon` cli command
  - `Client.start_global_daemon`
  - `Client.stop_global_daemon`
* Update changelog [skip-tests]

Commit:e403256
Author:Jim Crist
Committer:Jim Crist

Use protobuf-lite
Use protobuf-lite instead of protobuf. Drops another 1 MiB off the final jar size (down to 7.3 MiB from 8.3 MiB). This was complicated due to:
- Lack of up-to-date compiler plugin for protobuf-lite (had to set back to 3.0.0 from 3.5.1)
- Many many warnings in the generated code (suppressed by rewriting the generated source to add a suppression annotation)
- Lack of documentation on how to do any of this.

Commit:172a37f
Author:Jim Crist
Committer:Jim Crist

Make security configurable per application Allows specifying security configuration as part of the application spec. TLS credentials can be specified either as file locations or as raw bytestrings. This allows for launching applications with differing TLS credentials from the same originating daemon. Also quiets a noisy extraneous log message from grpc >= 1.15 when using fork + exec. This noisy extraneous message should be fixed in a future release of grpc (1.17?), but for now it's better to just quiet it ourselves.
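Accepting a credential either as a file location or as raw bytes can be handled with one small normalization step. The sketch below is illustrative only: `load_credential` is a hypothetical helper, not part of skein, and the PEM content is a placeholder.

```python
import os
import tempfile
from typing import Union


def load_credential(value: Union[str, bytes]) -> bytes:
    """Return credential bytes from either raw bytes or a file path."""
    if isinstance(value, bytes):
        return value
    with open(value, "rb") as f:
        return f.read()


# Placeholder content standing in for a real certificate.
pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cert.pem")
    with open(path, "wb") as f:
        f.write(pem)
    from_file = load_credential(path)

from_bytes = load_credential(pem)
```

Normalizing to bytes early means the rest of the code only deals with one representation, whichever way the credentials were supplied.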

Commit:b9736fd
Author:Jim Crist
Committer:GitHub

Add proxy-user support (#101) * Add `user` to spec * Add Proxy User Support * Add test, fix flake8 issues * Add documentation.

Commit:41f0dbe
Author:Jim Crist
Committer:GitHub

Add ability to set progress for running application (#93) Adds a `set_progress` method to an application, allows updating progress reports for applications that have a known amount of work to do.

Commit:1296685
Author:Jim Crist
Committer:GitHub

Support custom diagnostic messages on shutdown (#92) * Support custom diagnostic messages on shutdown

Commit:4c71e2b
Author:Jim Crist
Committer:GitHub

Add nodes, racks, and relax_locality (#90)
Add `nodes`, `racks`, and `relax_locality` to the skein specification. These are forwarded along for all container requests for the specified service, and provide another way for services to restrict locality to a subset of nodes.
- Added `nodes`, `racks`, and `relax_locality` to the specification
- Documented new fields
- Added tests for models and for functionality

Commit:9bf96d2
Author:Sergei Lebedev
Committer:Jim Crist

Added support for YARN node label expressions (#44) * Added support for YARN node label expressions * Propagate the node label expression ApplicationSpec->ServiceTracker The ApplicationSpec node label expression is used as a default for services missing an explicitly set one. * Added tests for YARN node labels The tests require jcrist/hadoop-test-cluster#2. * Reduced the memory of the requested containers in test_core.py This has no effect until jcrist/hadoop-test-cluster#2 is merged. * Final fixups
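The defaulting rule for node label expressions (the application-level value applies to any service without its own) can be sketched as below. `effective_labels` is a hypothetical helper; treating an empty string the same as an unset label is an assumption of this example.

```python
from typing import Dict, Optional


def effective_labels(app_label: str,
                     service_labels: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Resolve the node label expression used for each service."""
    # Assumption: None and "" both mean "not explicitly set".
    return {
        name: (label if label else app_label)
        for name, label in service_labels.items()
    }


labels = effective_labels("cpu", {"scheduler": None, "worker": "gpu"})
```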

Commit:57eff9a
Author:Jim Crist
Committer:Jim Crist

Add `log_level` field
- Add a `log_level` field to the `Master` configuration. This allows changing skein's debug level easily without the need to provide a custom log4j.properties file.
- Clean up a few additional logging messages
- Document and test the new field

Commit:91bd05f
Author:Jim Crist
Committer:Jim Crist

Improve logging, add exit_message
- Improve logging messages and coverage
- Add `exit_message` attribute on containers, logging diagnostic information from failed containers.
- Parse and handle failures due to exceeded memory

Commit:0b0449f
Author:Jim Crist
Committer:Jim Crist

Add ability to specify custom log4j configuration Adds a `master` field to the application specification, where configuration specific to the application master can go. For now this only supports overriding the logging configuration, but other things can go here as needed.

Commit:daaca80
Author:Jim Crist
Committer:GitHub

Add proxied pages through Web UI (#72) * Add support for proxying user provided pages * Final cleanups - Add tests - Rename methods - Add docstrings - A few assorted fixups * Increase timeout

Commit:ae0aa43
Author:Jim Crist
Committer:GitHub

Support for ACLs (#78) Adds support for Access Control Lists (ACLs). This allows filtering permissions on users/groups.

Commit:edbbe2f
Author:Jim Crist
Committer:GitHub

Add a Web UI (#68) Adds a web-based UI for monitoring the application. The states of all services and the key-value store are exposed. As per the YARN security documentation, the webui is secured using the AmIpFilter, and has been tested locally. Smoketests have been added for the UI, but most testing has to be manual for now.

Commit:db1e0c5
Author:Sergei Lebedev
Committer:Jim Crist

Added support for ViewFs (#58)
* Added support for ViewFs
  Prior to this commit the Daemon only acquired a delegation token for defaultFs, which did not work well in the presence of federation. Now the application spec contains a list of the namenodes to acquire the delegation token for in addition to defaultFs.
  In theory, all tokens should be automatically renewed by the RM. However, I did not have the occasion to test this.
* Addressed review comments
* Fix for kerberos

Commit:2f8d2d7
Author:Jim Crist
Committer:GitHub

Rewrite Key Value store (#40)
* Keystore uses bytes for values
  - Change value type in keystore to bytes instead of string. Keys are still utf-8 encoded strings. This encourages using human-readable names, while allowing sharing raw bytes between containers.
  - Rename `set` to `put` in cli
  - Add `skein kv ls` command
  - Allow piping value from stdin to `skein kv put`, which allows for easy storage of bytes in keystore.
* First pass at reworking the key-value store [WIP]
* WIP: add key-value store operators
* Register operations on KeyValueStore
  Adds corresponding methods for every key-value operator.
* Add docs for new operations
  Also clean up tab-complete on the ``skein.kv`` namespace, since that namespace will commonly be used by users.
* Add back testing for kv store
  Still need to test new capabilities.
* Return to full test coverage
* Track ownership in server
* Implement key ownership
  Owned keys are deleted when container shuts down.
* Add conditionals
  Adds conditional expression support.
* Implement transaction conditionals
  - Implement conditional expressions on the server
  - Test conditional expressions with the transaction command
* Add transaction support
  Finish implementing transaction support, add tests.
* Add Event support Python api
* Add interval-tree implementation
* Implement kv watch operations
  Allows for watching put and delete events on ranges in the key-value store.
* Fix thread safety issues
* Test event stream
* Final cleanups
  - Rename `contains` to `exists`
  - Spell check
  - Py27 compatibility
  - Rename `range_start`/`range_end` -> `start`/`end` to be more uniform everywhere.
* Improve test robustness
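Three ideas from this rewrite (bytes values under string keys, key ownership, and conditional writes) can be shown together in a toy in-memory store. This is a sketch under stated assumptions, not skein's key-value store: `KVStore`, `put_if_equal`, and `container_stopped` are hypothetical names, the single-condition write stands in for the richer transaction support, and clearing ownership on an un-owned `put` is an assumption.

```python
from typing import Dict, Optional


class KVStore:
    """Hypothetical in-memory store: str keys, bytes values, optional owners."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._owner: Dict[str, str] = {}

    def put(self, key: str, value: bytes, owner: Optional[str] = None) -> None:
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._data[key] = value
        if owner is None:
            # Assumption: an un-owned put clears any previous owner.
            self._owner.pop(key, None)
        else:
            self._owner[key] = owner

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put_if_equal(self, key: str, expected: Optional[bytes],
                     value: bytes) -> bool:
        """One-condition transaction: write only if the current value matches."""
        if self._data.get(key) != expected:
            return False
        self._data[key] = value
        return True

    def container_stopped(self, container_id: str) -> None:
        """Delete every key owned by the container that shut down."""
        owned = [k for k, o in self._owner.items() if o == container_id]
        for key in owned:
            del self._data[key]
            del self._owner[key]


kv = KVStore()
kv.put("address", b"host:8786", owner="worker_0")
kv.put("config", b"v1")
first = kv.put_if_equal("config", b"v1", b"v2")   # condition holds
second = kv.put_if_equal("config", b"v1", b"v3")  # condition no longer holds
kv.container_stopped("worker_0")
```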

Commit:9d05ad5
Author:Sergei Lebedev
Committer:Jim Crist

Exposed YARN node address in the ProtoBuf and Python API (#39) Closes #38

Commit:3028809
Author:Jim Crist
Committer:GitHub

Api cleanup (#36)
* Rename methods
  Client.kill -> Client.kill_application
  Client.applications -> Client.get_applications
  Client.status -> Client.application_report
  ApplicationClient.describe -> ApplicationClient.get_specification
  ApplicationClient.containers -> ApplicationClient.get_containers
  ApplicationClient.kill -> ApplicationClient.kill_container
* Remove `Application` class
  Class was unnecessary - user can build management classes out of client primitives.
* Remove obsolete finalize code
  - Remove tests for legacy finalize implementation
  - Remove need for weakref.finalize period
* Delete unused proto
* Remove deprecated `skein init` command.

Commit:5092904
Author:Jim Crist
Committer:GitHub

Add `shutdown` method to `ApplicationClient` (#16) Allows for stopping an application master without explicitly killing it.

Commit:03ca0ff
Author:Jim Crist

Support max_attempts and tags in spec
- Fixes spurious warning in resource manager when max_attempts wasn't set
- Allows tagging applications, providing better metadata.

Commit:94666d7
Author:Jim Crist
Committer:Jim Crist

Add restart behavior Support `max_restarts` argument for services, change failure handling behavior to only terminate the job once `max_restarts` has been exceeded.

Commit:001e738
Author:Jim Crist

Job -> ApplicationSpec Reduce the number of names a user needs to keep track of.

Commit:a8117d4
Author:Jim Crist

Support scaling services Allow scaling services up or down to meet a specified number of instances.

Commit:a11e339
Author:Jim Crist

Add ability to kill containers
- Can shutdown containers via commandline and python api
- Improve cli output from `skein container ls`

Commit:41a3e2e
Author:Jim Crist
Committer:Jim Crist

More refactor

Commit:ea247d6
Author:Jim Crist

Mild refactor

Commit:f0e6c67
Author:Jim Crist

Expose container query endpoint Can get query state from containers

Commit:a257906
Author:Jim Crist
Committer:Jim Crist

Add Container types

Commit:7b41506
Author:Jim Crist

Rename keystore -> kv Shorter name, and more accurate in name (this is a key-value store).

Commit:a17d175
Author:Jim Crist

Rework keystore api

Commit:9fa7d21
Author:Jim Crist
Committer:Jim Crist

Add wait-for-start behavior

Commit:b991485
Author:Jim Crist

Finish port to grpc Adds keystore support, including blocking on a key to wait.
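Blocking on a key until it is set is a classic condition-variable pattern. The sketch below shows that pattern in plain Python; it is not the gRPC-based implementation from this commit, and `BlockingStore` and its methods are hypothetical names.

```python
import threading
from typing import Dict, Optional


class BlockingStore:
    """Hypothetical store whose `wait` blocks until a key has been put."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._cond = threading.Condition()

    def put(self, key: str, value: str) -> None:
        with self._cond:
            self._data[key] = value
            # Wake every waiter so each can re-check its own key.
            self._cond.notify_all()

    def wait(self, key: str, timeout: Optional[float] = None) -> str:
        with self._cond:
            found = self._cond.wait_for(lambda: key in self._data,
                                        timeout=timeout)
            if not found:
                raise TimeoutError(f"key {key!r} was not set in time")
            return self._data[key]


store = BlockingStore()
# Another thread publishes the key shortly after the waiter starts.
threading.Timer(0.05, store.put, args=("address", "host:8786")).start()
value = store.wait("address", timeout=5)
```

This is the shape of the use case in which one container waits for another to publish its address before connecting.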

Commit:87cff8a
Author:Jim Crist

Move over inspect

Commit:f2a00ed
Author:Jim Crist

Completely remove jetty/jackson

Commit:6bfeb34
Author:Jim Crist

Implement listing applications

Commit:f14d910
Author:Jim Crist
Committer:Jim Crist

Status works with grpc

Commit:69c9bbd
Author:Jim Crist

First steps at grpc communication

Commit:2d5790a
Author:Jim Crist

Conversion between protos and model in python

Commit:46f652c
Author:Jim Crist

Consolidate files/local_resources

Commit:c96cf08
Author:Jim Crist

Define services and messages

Commit:78a2db4
Author:Jim Crist

Add grpc build configuration
- setup.py builds protobuf/grpc automatically
- maven builds protobuf/grpc automatically
- add proper dependencies to both libraries