The Protocol Buffers files changed in these 52 commits:
Commit: | 702192d | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add method to get application logs (#185)

* Add method to get application logs

Adds `client.application_logs`, a method to get application logs for a finished application. This is equivalent to shelling out to `yarn logs -applicationId <appid>`, but can be done programmatically and using proxy-user permissions.

This works for Hadoop 2.6-ish right now, and uses lots of semi-private APIs (Hadoop doesn't really expose its logs programmatically).
The documentation is generated from this commit.
Commit: | abadaf9 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add `ApplicationClient.add_container` (#174)

* Add `ApplicationClient.add_container`

Adds a method to add a new container to a service, with optional service configuration overrides. So far only environment variables can be overridden; additional fields could be added as needed.

    # Add a new container for service `test`, with additional
    # environment variable `FOO` set to `BAR` in the container
    app.add_container('test', env={'FOO': 'BAR'})

This is useful for services where every container should be *almost* identical, with small changes made to update configuration parameters as needed. I needed this for ``dask-gateway``, where each worker needs a unique name provided to it before startup.

In the presence of container restarts, the container is restarted with the same overridden configuration. A test is added to check this behavior.

* Flaky get_applications test

Only compare application names; the resource manager doesn't update all fields atomically, leading to potential differences in state between requests.
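The override behavior described in this commit can be pictured with a small, self-contained sketch. It does not use skein itself: `container_env` and `overrides_by_container` are hypothetical names, and the assumption is simply that per-container overrides are layered on top of the service-wide environment and remembered so that a restart sees the same values.

```python
from typing import Dict, Optional


def container_env(service_env: Dict[str, str],
                  overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the environment for one container.

    Hypothetical helper: starts from the service-wide environment and
    applies per-container overrides on top, without mutating either input.
    """
    merged = dict(service_env)
    merged.update(overrides or {})
    return merged


# Assumption: overrides are stored per container so a restart can reuse them.
service_env = {"FOO": "default", "PATH_EXTRA": "/opt/bin"}
overrides_by_container = {"test_0": {}, "test_1": {"FOO": "BAR"}}

first_start = container_env(service_env, overrides_by_container["test_1"])
after_restart = container_env(service_env, overrides_by_container["test_1"])
```

Because the overrides are looked up by container rather than recomputed, `after_restart` equals `first_start`, which is the property the commit's test is said to check.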
Commit: | 39ba759 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Support GPU and FPGA resources (#171)

Adds (limited) support for Hadoop 3 resource requests. Currently only the builtin resource extensions (gpu and fpga) are supported; true custom requests could be added later with greater effort. If the version of Hadoop or the configuration doesn't support these resources, an informative error is raised.
Commit: | 4c50ac1 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add filter methods to `Client.get_applications` (#162)

Allows filtering applications on:

- state
- name
- user
- queue
- start and finish times

These filters are also exposed through the CLI.
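As a rough illustration of how such filters combine, here is a plain-Python sketch that applies them to in-memory records. It is not skein's implementation: `AppReport`, its field names, and `filter_applications` are hypothetical stand-ins, the assumption is that a report must match every filter that is provided, and the time-based filters are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class AppReport:
    # Hypothetical stand-in for an application report.
    id: str
    name: str
    user: str
    queue: str
    state: str


def filter_applications(reports: Iterable[AppReport],
                        states: Optional[Set[str]] = None,
                        name: Optional[str] = None,
                        user: Optional[str] = None,
                        queue: Optional[str] = None) -> List[AppReport]:
    """Keep the reports that match every filter that was provided."""
    selected = []
    for report in reports:
        if states is not None and report.state not in states:
            continue
        if name is not None and report.name != name:
            continue
        if user is not None and report.user != user:
            continue
        if queue is not None and report.queue != queue:
            continue
        selected.append(report)
    return selected


reports = [
    AppReport("app_1", "etl", "alice", "default", "RUNNING"),
    AppReport("app_2", "etl", "bob", "default", "FINISHED"),
    AppReport("app_3", "notebook", "alice", "interactive", "RUNNING"),
]
running_for_alice = filter_applications(reports, states={"RUNNING"}, user="alice")
```

A filter left as `None` is ignored, so calling `filter_applications(reports)` returns every report.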
Commit: | 4b0ffa7 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add method to move application to another queue (#161)

Adds support for moving an application between queues. This is doable using either the python API (`skein.Client.move_application`) or the CLI (`skein application mv`).
Commit: | 0f6f6e4 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add API for querying queue states (#159)

* Add API for querying queue states

Adds methods to `skein.Client` for querying the status of YARN queues.

* Py27 compat
Commit: | 0d74863 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add `Client.get_nodes` for getting node information Adds a method to get information on the nodes in a YARN cluster. Allows filtering by node state, and provides access to useful info on selected nodes.
Commit: | a80b895 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Support scaling by a delta instead of total count (#150)

* Support scaling by a delta instead of total count

Support scaling a service by a delta in number of containers rather than specifying the number of total containers. This can be useful if container lifetimes are being managed by some external service rather than the skein appmaster itself.

* change -> delta
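The difference between the two ways of scaling can be sketched in a few lines of plain Python. `resolve_target` is a hypothetical helper, not skein's code; requiring exactly one of `count` or `delta`, rejecting a negative `count`, and clamping a delta-based result at zero are all assumptions made for this sketch.

```python
from typing import Optional


def resolve_target(current: int,
                   count: Optional[int] = None,
                   delta: Optional[int] = None) -> int:
    """Return the desired number of containers for a service.

    Exactly one of `count` (absolute total) or `delta` (relative change)
    must be given.
    """
    if (count is None) == (delta is None):
        raise ValueError("specify exactly one of count or delta")
    if count is not None:
        if count < 0:
            raise ValueError("count must be >= 0")
        return count
    # Assumption: a large negative delta scales down to zero, not below.
    return max(0, current + delta)
```

With a delta, the caller does not need to know the current total, which is the case the commit describes: an external manager can ask for "one more" or "two fewer" without first querying the service.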
Commit: | b9d0317 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add `allow_failures` field (#145)

* Add `allow_failures` field

Adds support for allowing failures in a service. If set to True, the application will not terminate if the number of failures for a service exceeds `max_restarts` for that service. Default is False.

This is useful for things like `dask` workers, where we may want to handle restart behavior manually instead of relying on skein to do it for us.
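The interaction between `allow_failures` and `max_restarts` described here amounts to a small decision rule. The sketch below is an illustration only: `should_terminate` is a hypothetical function name, and it assumes `max_restarts` is a non-negative limit.

```python
def should_terminate(failures: int, max_restarts: int,
                     allow_failures: bool = False) -> bool:
    """Decide whether a service's failures should end the application.

    With `allow_failures` set, exceeding `max_restarts` never terminates
    the application; otherwise it does once the limit is exceeded.
    """
    if allow_failures:
        return False
    return failures > max_restarts
```

For example, `should_terminate(3, max_restarts=2)` is true, while the same call with `allow_failures=True` is false and leaves restart handling to the caller.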
Commit: | 9acaedd | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Deprecate ``commands`` for ``script`` (#125)

Deprecate the ``commands`` field in favor of ``script``. This removes the oddness of us composing a script from a list of commands, as well as the automatic addition of ``set -x -e``.
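To make the "oddness" concrete, here is a sketch of what composing a script from a list of commands might look like. The helper name `script_from_commands` and the exact layout of the generated script are assumptions; only the idea of joining commands and prepending ``set -x -e`` comes from the commit message.

```python
from typing import Iterable


def script_from_commands(commands: Iterable[str]) -> str:
    """Join shell commands into one script, prefixed with `set -x -e`.

    `-x` echoes each command as it runs and `-e` aborts on the first
    failing command; both were added implicitly in this style.
    """
    return "\n".join(["set -x -e", *commands])


composed = script_from_commands(["source activate env", "python worker.py"])
```

With a ``script`` field the user writes the full text directly, so options such as ``set -x -e`` are present only if the user asks for them.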
Commit: | cf6e9a5 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Optionally run application on master node (#120)

* Add ability to run user code on the application master

Allows running a single user process on the application master. This allows singleton services to start up *much* faster. The master process (currently called a "driver", but that's a bit too overloaded for my liking) also manages the lifetime of the application: when the process exits, the application shuts down automatically, regardless of the state of other services. The application final status is determined by the exit code of the process (0 for succeeded, anything else for failed).

This can be useful for running things like the dask client for dask-submit (more robustly ensuring the cluster shuts down after submission) or starting single processes like a jupyter notebook server (used in jcrist/yarnspawner to launch jupyterhub on YARN).

* Update docs
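The exit-code rule above can be sketched without any YARN machinery. In this illustration `final_status` and `run_master_process` are hypothetical names and the status strings are placeholders; the only behavior taken from the commit is that 0 means success and anything else means failure.

```python
import subprocess
import sys
from typing import Sequence


def final_status(exit_code: int) -> str:
    """Map a process exit code to an application final status."""
    return "SUCCEEDED" if exit_code == 0 else "FAILED"


def run_master_process(argv: Sequence[str]) -> str:
    """Run a single process to completion and report the resulting status."""
    completed = subprocess.run(list(argv))
    return final_status(completed.returncode)


if __name__ == "__main__":
    # A process that exits with code 3 counts as a failed application.
    print(run_master_process([sys.executable, "-c", "raise SystemExit(3)"]))
```

On POSIX systems a process killed by a signal reports a negative return code through `subprocess`, so it is also treated as failed here.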
Commit: | 3da851c | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Rename daemon to driver (#116)

* Rename "Daemon" to "Driver"

The "Daemon" process name has been confusing for some users; here we rename it to "Driver" to better reflect its purpose and usage. Note that this is (currently) a backwards incompatible change for expert users that may have been touching lower-level methods/classes. This will be fixed in a subsequent commit deprecating those features.

* Deprecate daemon smoothly for driver

Raise nice warnings when old `Daemon` commands/classes are used. This includes:

- `DaemonError`
- `DaemonNotRunningError`
- `skein daemon` cli command
- `Client.start_global_daemon`
- `Client.stop_global_daemon`

* Update changelog [skip-tests]
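One common way to deprecate a renamed class smoothly in Python is to keep the old name as a thin subclass that warns on use. The sketch below shows that general pattern under assumptions: `DriverError` is the presumed new name, `_deprecated_alias` is a hypothetical helper, and none of this is claimed to be how skein actually implements its warnings.

```python
import warnings


class DriverError(Exception):
    """Error raised by the driver process (assumed new name)."""


def _deprecated_alias(new_cls, old_name):
    """Build a subclass of `new_cls` that warns whenever it is instantiated."""

    class Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated, use {new_cls.__name__} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    Alias.__name__ = Alias.__qualname__ = old_name
    return Alias


# Old name keeps working, but emits a DeprecationWarning when used.
DaemonError = _deprecated_alias(DriverError, "DaemonError")
```

Because the alias subclasses the new class, existing `except DriverError` handlers still catch errors raised under the old name.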
Commit: | e403256 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Use protobuf-lite

Use protobuf-lite instead of protobuf. Drops another 1 MiB off the final jar size (down to 7.3 MiB from 8.3 MiB). This was complicated due to:

- Lack of up-to-date compiler plugin for protobuf-lite (had to set back to 3.0.0 from 3.5.1)
- Many many warnings in the generated code (suppressed by rewriting the generated source to add a suppression annotation)
- Lack of documentation on how to do any of this.
Commit: | 172a37f | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Make security configurable per application Allows specifying security configuration as part of the application spec. TLS credentials can be specified either as file locations or as raw bytestrings. This allows for launching applications with differing TLS credentials from the same originating daemon. Also quiets a noisy extraneous log message from grpc >= 1.15 when using fork + exec. This noisy extraneous message should be fixed in a future release of grpc (1.17?), but for now it's better to just quiet it ourselves.
Commit: | b9736fd | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add proxy-user support (#101)

* Add `user` to spec
* Add Proxy User Support
* Add test, fix flake8 issues
* Add documentation.
Commit: | 41f0dbe | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add ability to set progress for running application (#93) Adds a `set_progress` method to an application, allows updating progress reports for applications that have a known amount of work to do.
Commit: | 1296685 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Support custom diagnostic messages on shutdown (#92) * Support custom diagnostic messages on shutdown
Commit: | 4c71e2b | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add nodes, racks, and relax_locality (#90)

Add `nodes`, `racks`, and `relax_locality` to the skein specification. These are forwarded along for all container requests for the specified service, and provide another way for services to restrict locality to a subset of nodes.

- Added `nodes`, `racks`, and `relax_locality` to the specification
- Documented new fields
- Added tests for models and for functionality
Commit: | 9bf96d2 | |
---|---|---|
Author: | Sergei Lebedev | |
Committer: | Jim Crist |
Added support for YARN node label expressions (#44)

* Added support for YARN node label expressions
* Propagate the node label expression ApplicationSpec->ServiceTracker

  The ApplicationSpec node label expression is used as a default for services missing an explicitly set one.

* Added tests for YARN node labels

  The tests require jcrist/hadoop-test-cluster#2.

* Reduced the memory of the requested containers in test_core.py

  This has no effect until jcrist/hadoop-test-cluster#2 is merged.

* Final fixups
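The defaulting rule for node label expressions can be illustrated with a short sketch. `resolve_node_labels` is a hypothetical helper, and treating both `None` and the empty string as "not explicitly set" is an assumption of this example.

```python
from typing import Dict, Optional


def resolve_node_labels(app_label: str,
                        service_labels: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return the effective node label expression for each service.

    A service without its own expression falls back to the
    application-level one.
    """
    return {
        name: (label if label else app_label)
        for name, label in service_labels.items()
    }


effective = resolve_node_labels("cpu", {"scheduler": None, "worker": "gpu"})
```

Here `scheduler` inherits the application-level expression while `worker` keeps its own.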
Commit: | 57eff9a | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add `log_level` field

- Add a `log_level` field to the `Master` configuration. This allows changing skein's debug level easily without the need to provide a custom log4j.properties file.
- Clean up a few additional logging messages
- Document and test the new field
Commit: | 91bd05f | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Improve logging, add exit_message

- Improve logging messages and coverage
- Add `exit_message` attribute on containers, logging diagnostic information from failed containers.
- Parse and handle failures due to exceeded memory
Commit: | 0b0449f | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add ability to specify custom log4j configuration Adds a `master` field to the application specification, where configuration specific to the application master can go. For now this only supports overriding the logging configuration, but other things can go here as needed.
Commit: | daaca80 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add proxied pages through Web UI (#72)

* Add support for proxying user provided pages
* Final cleanups
  - Add tests
  - Rename methods
  - Add docstrings
  - A few assorted fixups
* Increase timeout
Commit: | ae0aa43 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Support for ACLs (#78) Adds support for Access Control Lists (ACLs). This allows filtering permissions on users/groups.
Commit: | edbbe2f | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add a Web UI (#68) Adds a web-based UI for monitoring the application. The states of all services and the key-value store are exposed. As per the YARN security documentation, the webui is secured using the AmIpFilter, and has been tested locally. Smoketests have been added for the UI, but most testing has to be manual for now.
Commit: | db1e0c5 | |
---|---|---|
Author: | Sergei Lebedev | |
Committer: | Jim Crist |
Added support for ViewFs (#58)

* Added support for ViewFs

Prior to this commit the Daemon only acquired a delegation token for defaultFs, which did not work well in the presence of federation. Now the application spec contains a list of the namenodes to acquire the delegation token for, in addition to defaultFs.

In theory, all tokens should be automatically renewed by the RM. However, I did not have the occasion to test this.

* Addressed review comments
* Fix for kerberos
Commit: | 2f8d2d7 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Rewrite Key Value store (#40)

* Keystore uses bytes for values
  - Change value type in keystore to bytes instead of string. Keys are still utf-8 encoded strings. This encourages using human-readable names, while allowing sharing raw bytes between containers.
  - Rename `set` to `put` in cli
  - Add `skein kv ls` command
  - Allow piping value from stdin to `skein kv put`, which allows for easy storage of bytes in keystore.
* First pass at reworking the key-value store [WIP]
* WIP: add key-value store operators
* Register operations on KeyValueStore

  Adds corresponding methods for every key-value operator.

* Add docs for new operations

  Also clean up tab-complete on the ``skein.kv`` namespace, since that namespace will commonly be used by users.

* Add back testing for kv store

  Still need to test new capabilities.

* Return to full test coverage
* Track ownership in server
* Implement key ownership

  Owned keys are deleted when container shuts down.

* Add conditionals

  Adds conditional expression support.

* Implement transaction conditionals
  - Implement conditional expressions on the server
  - Test conditional expressions with the transaction command
* Add transaction support

  Finish implementing transaction support, add tests.

* Add Event support Python api
* Add interval-tree implementation
* Implement kv watch operations

  Allows for watching put and delete events on ranges in the key-value store.

* Fix thread safety issues
* Test event stream
* Final cleanups
  - Rename `contains` to `exists`
  - Spell check
  - Py27 compatibility
  - Rename `range_start`/`range_end` -> `start`/`end` to be more uniform everywhere.
* Improve test robustness
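Several ideas from this rewrite (bytes values, key ownership, and conditional transactions) can be shown together in a toy in-memory model. This is a sketch only, not skein's server or client code: the `KeyValueStore` class below, its method signatures, and the use of plain callables for conditions and operations are all assumptions, and it is not thread-safe.

```python
from typing import Callable, Dict, Iterable, List, Optional


class KeyValueStore:
    """Toy key-value store: str keys, bytes values, optional key owners."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._owners: Dict[str, str] = {}

    def put(self, key: str, value: bytes, owner: Optional[str] = None) -> None:
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._data[key] = value
        if owner is None:
            self._owners.pop(key, None)
        else:
            self._owners[key] = owner

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def exists(self, key: str) -> bool:
        return key in self._data

    def release_owner(self, container_id: str) -> List[str]:
        """Delete every key owned by a container that has shut down."""
        owned = sorted(k for k, o in self._owners.items() if o == container_id)
        for key in owned:
            del self._data[key]
            del self._owners[key]
        return owned

    def transaction(self,
                    conditions: Iterable[Callable[["KeyValueStore"], bool]],
                    on_success: Iterable[Callable[["KeyValueStore"], None]] = (),
                    on_failure: Iterable[Callable[["KeyValueStore"], None]] = ()) -> bool:
        """Run `on_success` if all conditions hold, else `on_failure`."""
        succeeded = all(cond(self) for cond in conditions)
        for op in (on_success if succeeded else on_failure):
            op(self)
        return succeeded


store = KeyValueStore()
store.put("address", b"host:1234", owner="worker_0")
# Claim the "leader" key only if nobody holds it yet.
claimed = store.transaction(
    conditions=[lambda s: not s.exists("leader")],
    on_success=[lambda s: s.put("leader", b"worker_0", owner="worker_0")],
)
```

Calling `store.release_owner("worker_0")` afterwards removes both `address` and `leader`, mirroring the rule that owned keys disappear when their container shuts down.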
Commit: | 9d05ad5 | |
---|---|---|
Author: | Sergei Lebedev | |
Committer: | Jim Crist |
Exposed YARN node address in the ProtoBuf and Python API (#39) Closes #38
Commit: | 3028809 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Api cleanup (#36)

* Rename methods
  - Client.kill -> Client.kill_application
  - Client.applications -> Client.get_applications
  - Client.status -> Client.application_report
  - ApplicationClient.describe -> ApplicationClient.get_specification
  - ApplicationClient.containers -> ApplicationClient.get_containers
  - ApplicationClient.kill -> ApplicationClient.kill_container
* Remove `Application` class

  Class was unnecessary - user can build management classes out of client primitives.

* Remove obsolete finalize code
  - Remove tests for legacy finalize implementation
  - Remove need for weakref.finalize period
* Delete unused proto
* Remove deprecated `skein init` command.
Commit: | 5092904 | |
---|---|---|
Author: | Jim Crist | |
Committer: | GitHub |
Add `shutdown` method to `ApplicationClient` (#16) Allows for stopping an application master without explicitly killing it.
Commit: | 03ca0ff | |
---|---|---|
Author: | Jim Crist |
Support max_attempts and tags in spec

- Fixes spurious warning in resource manager when max_attempts wasn't set
- Allows tagging applications, providing better metadata.
Commit: | 94666d7 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add restart behavior Support `max_restarts` argument for services, change failure handling behavior to only terminate the job once `max_restarts` has been exceeded.
Commit: | 001e738 | |
---|---|---|
Author: | Jim Crist |
Job -> ApplicationSpec

Reduce the number of names a user needs to keep track of.
Commit: | a8117d4 | |
---|---|---|
Author: | Jim Crist |
Support scaling services Allow scaling services up or down to meet a specified number of instances.
Commit: | a11e339 | |
---|---|---|
Author: | Jim Crist |
Add ability to kill containers

- Can shutdown containers via commandline and python api
- Improve cli output from `skein container ls`
Commit: | 41a3e2e | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
More refactor
Commit: | ea247d6 | |
---|---|---|
Author: | Jim Crist |
Mild refactor
Commit: | f0e6c67 | |
---|---|---|
Author: | Jim Crist |
Expose container query endpoint

Can query state from containers
Commit: | a257906 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add Container types
Commit: | 7b41506 | |
---|---|---|
Author: | Jim Crist |
Rename keystore -> kv Shorter name, and more accurate in name (this is a key-value store).
Commit: | a17d175 | |
---|---|---|
Author: | Jim Crist |
Rework keystore api
Commit: | 9fa7d21 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Add wait-for-start behavior
Commit: | b991485 | |
---|---|---|
Author: | Jim Crist |
Finish port to grpc Adds keystore support, including blocking on a key to wait.
Commit: | 87cff8a | |
---|---|---|
Author: | Jim Crist |
Move over inspect
Commit: | f2a00ed | |
---|---|---|
Author: | Jim Crist |
Completely remove jetty/jackson
Commit: | 6bfeb34 | |
---|---|---|
Author: | Jim Crist |
Implement listing applications
Commit: | f14d910 | |
---|---|---|
Author: | Jim Crist | |
Committer: | Jim Crist |
Status works with grpc
Commit: | 69c9bbd | |
---|---|---|
Author: | Jim Crist |
First steps at grpc communication
Commit: | 2d5790a | |
---|---|---|
Author: | Jim Crist |
Conversion between protos and model in python
Commit: | 46f652c | |
---|---|---|
Author: | Jim Crist |
Consolidate files/local_resources
Commit: | c96cf08 | |
---|---|---|
Author: | Jim Crist |
Define services and messages
Commit: | 78a2db4 | |
---|---|---|
Author: | Jim Crist |
Add grpc build configuration

- setup.py builds protobuf/grpc automatically
- maven builds protobuf/grpc automatically
- add proper dependencies to both libraries