Proto commits in petewarden/tensorflow_makefile

These 89 commits changed the Protocol Buffers files:

Commit:feed4e0
Author:Derek Murray

Merge commit for internal changes

The documentation is generated from this commit.

Commit:be8a3e2
Author:Yuan Yu
Committer:TensorFlower Gardener

Add the ability to return the cost model to the client as part of run metadata. Change: 122696039

Commit:5ec60d9
Author:Derek Murray

Merge commit for internal changes

Commit:49ec6ff
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Allow configuration of multiple per-session thread pools for inter-op scheduling. Allow RunOptions to select the thread pool to use. Change: 122636949

Commit:1d4fd06
Author:Derek Murray

Merge commit for internal changes

Commit:2ecbb22
Author:Kevin Robinson

Typo fix for comment in worker.proto

Commit:c34a0d7
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

Change behavior of DirectSession so that it places the entire graph by default, instead of only the pruned graphs. Several reasons motivate this change:
- A bug in the graph (program) today is only identified when you run the op.
- Interactive sessions need to allow graphs that have "bugs" in them because people just continue to add to their existing graph, so we need a way to preserve the old behavior. We do that by setting the field by default to true in InteractiveSession.
Tests added for both the change in direct_session as well as the InteractiveSession use case. Fixes #1914, #1748.
This refactors direct_session and simple_graph_execution_state (used by the distributed runtime) to share the same graph execution and placement code, to reduce code divergence. Change: 122210923

Commit:f3e8282
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add 'add_shapes' to SessionConfig.GraphOptions. Also: Extend session_factory error messages to describe the SessionOptions for which a unique factory cannot be found. Change: 121688635

Commit:768b499
Author:Noah Fiedel
Committer:TensorFlower Gardener

Removes InferenceExample from tensorflow.Example. Background: InferenceExample was confusing as : (a) it exposed Features rather than Examples and, (b) it was primarily intended for serving optimization. Change: 121402533

Commit:7a715d3
Author:Yuan Yu
Committer:TensorFlower Gardener

Integrate the memory optimizer from the OR team and cost model. Change: 121060255

Commit:da0433e
Author:Noah Fiedel
Committer:TensorFlower Gardener

Move NamedTensorProto from being a private detail of master.proto, to a public tensor in core/framework. Change: 120919381

Commit:7202b5d
Author:RJ Ryan
Committer:TensorFlower Gardener

Audio summary support.
* Add a simple S16LE WAV encoder.
* Add an Audio value type to Summary protocol buffer.
* Add AudioSummary kernel and op.
* Add support to EventAccumulator/EventMultiplexer for Audio events.
* Add 16-bit little endian encode/decode functions.
Change: 120854931
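The 16-bit little-endian encoding named in this commit can be illustrated with a minimal Python sketch. It uses only the standard `struct`, `wave`, and `io` modules; the helper names (`encode_s16le`, `decode_s16le`, `wav_bytes`), the default sample rate, and the scaling of floats by 32767 are assumptions made for illustration and are not taken from the TensorFlow encoder.

```python
import io
import struct
import wave


def encode_s16le(samples):
    """Pack floats in [-1.0, 1.0] as signed 16-bit little-endian PCM."""
    # Clip to the valid range, then scale to the int16 range (assumed scaling).
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def decode_s16le(data):
    """Unpack signed 16-bit little-endian PCM into a list of ints."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[: count * 2]))


def wav_bytes(samples, sample_rate=16000):
    """Wrap mono S16LE samples in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16 bits
        w.setframerate(sample_rate)
        w.writeframes(encode_s16le(samples))
    return buf.getvalue()
```

The `"<"` prefix in the `struct` format string is what selects little-endian byte order; `"h"` is a signed 16-bit integer.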

Commit:d111957
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

In proto_text, define the Map parsing functions as inside anonymous namespaces, and define them only once per .cc file. Previous code defined them multiple times and relied on 'inline' keyword to avoid ODR conflicts, but someone reported this not working with their build (ODR violation was reported). Change: 120454125

Commit:30334d2
Author:Geoffrey Irving
Committer:TensorFlower Gardener

Add a .Deprecated method to REGISTER_OP. This replaces the OP_DEPRECATED macro with something declarative, which in particular lets us throw exceptions at graph construction time based on deprecation. I've left the OP_DEPRECATED macro around in case uses elsewhere can't be expressed in a purely declarative manner. Change: 120386133

Commit:1150ce5
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Fix bugs in tools/proto_text: - Oneof values should be printed even when equal to the default. - Fields should be printed in tag number order, not declaration order. Change: 120223509
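The two printing rules described in this commit can be sketched in a few lines of Python. This is an illustration only: the field representation (`(name, tag, default)` tuples), the `debug_string` function, and the `oneof_members` argument are hypothetical and do not reflect how tools/proto_text is implemented.

```python
def debug_string(fields, values, oneof_members=frozenset()):
    """Render set fields as text, ordered by tag number.

    fields: list of (name, tag, default) tuples in declaration order.
    values: dict mapping field name to its current value.
    oneof_members: names of fields that belong to a oneof.
    """
    lines = []
    # Sort by tag number rather than relying on declaration order.
    for name, tag, default in sorted(fields, key=lambda f: f[1]):
        if name not in values:
            continue
        value = values[name]
        # A set oneof member is printed even when it equals the default;
        # ordinary fields at their default value are skipped.
        if name in oneof_members or value != default:
            lines.append("%s: %r" % (name, value))
    return "\n".join(lines)
```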

Commit:3c280f6
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Added a format for saving an inference graph that can be memmapped and a utility to convert a frozen graph into this format. Change: 120128412

Commit:df15baa
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add tools/proto_text for generating ProtoDebugString, ProtoShortDebugString, and ProtoParseFromString methods from protos. This will allow changing code used on mobile to use the proto LITE_RUNTIME, to reduce code size. This change is only for the tool itself. A future change will add a better genrule and use the generated code in tensorflow. Change: 119919087

Commit:cc7f05f
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Restrict the use of OUT_OF_RANGE to what it was intended for. Clarify that OUT_OF_RANGE is raised only when reaching the end of input for iterable contents. Change the few places where we incorrectly raised OUT_OF_RANGE to raise ILLEGAL_ARGUMENT instead. This will make code that catches the OUT_OF_RANGE exception more robust as it won't get confused by spurious uses of the exception class. Change: 119560848

Commit:0cd025b
Author:Manjunath Kudlur
Committer:TensorFlower Gardener

Enable constant folding in L0 optimization level. Change: 118861866

Commit:9b71f96
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add half support for the first basic ops, namely Cast and Const. Change: 118661449

Commit:160ac73
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add half support for the first basic ops, namely Cast and Const. Change: 118445579

Commit:b6d66ff
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add half support for the first basic ops, namely Cast and Const. Change: 118445207

Commit:22cfbd1
Author:Craig Citro
Committer:TensorFlower Gardener

Switch Docker instructions to always `--pull` on build. This fixes situations like Vincent hit, where a stale base image would lead to new packages based on old base packages. Change: 118071412

Commit:1e401c4
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Plumbing to allow RunOptions in the remote_session: - Add a RunOptions proto in RunStepRequest. - Still need to pass it back to the grpc session. Change: 118045453

Commit:61c252c
Author:Dan Smilkov
Committer:TensorFlower Gardener

Adding ability for users to store `RunOutputs` information to the events file and serve it via the TensorBoard back-end. The `RunOutputs` information contains execution statistics, such as compute time and memory usage for each node in the subgraph executed by a particular `session.run()`. A follow-up change will add the front-end support to overlay this data onto the graph. Change: 117981334

Commit:f10637b
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Allows explicitly setting a gradient function for a user defined function. Change: 117950844

Commit:9d03824
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add a half type to TensorFlow core, based on Eigen::half. Note that this is only the type, not support for it in any ops, so it is not useful for anything yet. In particular, neither TF_CALL_REAL_NUMBER_TYPES nor TF_CALL_GPU_NUMBER_TYPES list Eigen::half, so even though a lot of ops will end up declaring support for the new type, calling them will fail at runtime. Change: 117825461

Commit:b49474d
Author:Dan Smilkov
Committer:TensorFlower Gardener

Rename RunOutputs -> RunMetadata proto. Reason: it can be confusing to have session.run(ops, run_outputs=...), as users might think that run_outputs contains the outputs of the ops that are passed. In the C++ API it is even more confusing, since we have Session.Run(..., std::vector<Tensor>* outputs, RunOutputs* run_outputs). Change: 117764542

Commit:4c85a08
Author:Geoffrey Irving
Committer:TensorFlower Gardener

Rollforward of "Merge changes from github." Change: 117375570

Commit:f1bccd3
Author:Eugene Brevdo
Committer:TensorFlower Gardener

Another bugfix in run_and_gather_logs. Change: 117375431

Commit:9a4878c
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

Rollback of: "Merge changes from github." Change: 117304114

Commit:3ae663c
Author:Martin Wicke
Committer:TensorFlower Gardener

Merge changes from github. Change: 117301677

Commit:6f1acd3
Author:Eugene Brevdo
Committer:TensorFlower Gardener

A few more tweaks to test_log.proto to reflect what's accessible from bazel build. Also some updates to run_and_gather_logs. Reduce the set of tests that run_and_gather_logs_test runs to speed up the tests. Change: 117271064

Commit:13ab7ff
Author:Igor Babuschkin
Committer:Igor Babuschkin

Add complex128 dtype to framework

Commit:49792ef
Author:Eugene Brevdo
Committer:TensorFlower Gardener

Add the test runner run_and_gather_logs to tools/test
* includes a unit test that runs a benchmark and collects the results.
* CC tests in tf/core/kernels now set linkstatic so GPU-based CC tests can be run by this test runner.
* refactored the py test declarations in tensorflow.bzl to allow users to set main and args directly. necessary for the test runner.
Change: 117170938

Commit:ab6ffc9
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

TensorFlow: allow growth in the GPU BFC allocator. This allows an option to start the BFC allocator small and grow it over time as needed. This can lead to increased fragmentation, but the benefit is that only as much memory as "needed" is reserved. This option defaults to off, but can be turned on by passing an option to the first Session. This is done by adding one more layer of indirection between mapping a ChunkHandle to a pointer by introducing the concept of AllocationRegions, which are contiguous memory regions that mimic the previous implementation in their indexing (constant time indexing within an AllocationRegion). The drawback is that we must introduce one more lookup to find out which allocation region a pointer is a part of. This implementation uses a sorted vector and upper_bound to do a binary search based on end_ptr. Its impact is relatively low based on the microbenchmarks below, and if it were a cause for later concern, we can try to map the 'page tables' of multiple regions into one very large AllocationRegion, and hope that there are no holes between address spaces so that the ChunkHandle map is not too large for memory. That being said, this change appears to not slow down the ptb_word_lm benchmark, which was the initial impetus for most of the recent changes to this class, so this appears safe. Microbenchmarks I ran showed no real difference, even when there were multiple regions, and the ptb_word_lm benchmark also didn't change.
The following numbers bear this out:
At HEAD: (consumes 5.8GiB on my Titan Black)
Epoch: 1 Learning rate: 1.000
0.004 perplexity: 6119.287 speed: 679 wps
0.104 perplexity: 849.526 speed: 5743 wps
0.204 perplexity: 629.677 speed: 6935 wps
0.304 perplexity: 509.189 speed: 7461 wps
0.404 perplexity: 438.585 speed: 7760 wps
0.504 perplexity: 392.459 speed: 7953 wps
0.604 perplexity: 352.998 speed: 8081 wps
0.703 perplexity: 325.909 speed: 8182 wps
0.803 perplexity: 304.531 speed: 8261 wps
0.903 perplexity: 284.988 speed: 8322 wps
Epoch: 1 Train Perplexity: 270.398
Epoch: 1 Valid Perplexity: 178.860
Epoch: 2 Learning rate: 1.000
0.004 perplexity: 212.458 speed: 8836 wps
0.104 perplexity: 151.131 speed: 9039 wps
0.204 perplexity: 158.768 speed: 8950 wps
0.304 perplexity: 153.650 speed: 8925 wps
0.404 perplexity: 150.586 speed: 8910 wps
0.504 perplexity: 148.136 speed: 8817 wps
0.604 perplexity: 143.511 speed: 8778 wps
0.703 perplexity: 141.382 speed: 8773 wps
0.803 perplexity: 139.401 speed: 8775 wps
0.903 perplexity: 135.706 speed: 8777 wps
Epoch: 2 Train Perplexity: 133.618
Epoch: 2 Valid Perplexity: 143.462
Epoch: 3 Learning rate: 1.000
0.004 perplexity: 146.292 speed: 8947 wps
0.104 perplexity: 104.901 speed: 9325 wps
0.204 perplexity: 114.335 speed: 9108 wps
0.304 perplexity: 111.434 speed: 9046 wps
0.404 perplexity: 110.328 speed: 9014 wps
0.504 perplexity: 109.455 speed: 8995 wps
0.604 perplexity: 106.877 speed: 8984 wps
0.703 perplexity: 106.158 speed: 8978 wps
0.803 perplexity: 105.532 speed: 8966 wps
0.903 perplexity: 103.284 speed: 8965 wps
Epoch: 3 Train Perplexity: 102.326
Epoch: 3 Valid Perplexity: 132.332
Epoch: 4 Learning rate: 1.000
0.004 perplexity: 116.748 speed: 8990 wps
0.104 perplexity: 85.032 speed: 9172 wps
0.204 perplexity: 93.827 speed: 9051 wps
0.304 perplexity: 91.716 speed: 9010 wps
0.404 perplexity: 91.088 speed: 8966 wps
0.504 perplexity: 90.654 speed: 8955 wps
0.604 perplexity: 88.841 speed: 8952 wps
0.703 perplexity: 88.550 speed: 8943 wps
0.803 perplexity: 88.268 speed: 8932 wps
0.903 perplexity: 86.610 speed: 8924 wps
Epoch: 4 Train Perplexity: 86.030
Epoch: 4 Valid Perplexity: 127.415
Epoch: 5 Learning rate: 1.000
0.004 perplexity: 98.907 speed: 8952 wps
0.104 perplexity: 73.707 speed: 9238 wps
0.204 perplexity: 81.525 speed: 9112 wps
0.304 perplexity: 79.768 speed: 9074 wps
0.404 perplexity: 79.366 speed: 9060 wps
0.504 perplexity: 79.199 speed: 9039 wps
0.604 perplexity: 77.728 speed: 9037 wps
0.703 perplexity: 77.630 speed: 9037 wps
0.803 perplexity: 77.596 speed: 9033 wps
0.903 perplexity: 76.270 speed: 9005 wps
Epoch: 5 Train Perplexity: 75.907
Epoch: 5 Valid Perplexity: 126.183
Epoch: 6 Learning rate: 0.500
0.004 perplexity: 88.458 speed: 8816 wps
0.104 perplexity: 64.231 speed: 9143 wps
0.204 perplexity: 69.896 speed: 9050 wps
0.304 perplexity: 67.342 speed: 9016 wps
0.404 perplexity: 66.162 speed: 8989 wps
0.504 perplexity: 65.290 speed: 8952 wps
0.604 perplexity: 63.331 speed: 8945 wps
0.703 perplexity: 62.617 speed: 8942 wps
0.803 perplexity: 61.883 speed: 8943 wps
0.903 perplexity: 60.149 speed: 8934 wps
Epoch: 6 Train Perplexity: 59.222
Epoch: 6 Valid Perplexity: 119.635
Epoch: 7 Learning rate: 0.250
0.004 perplexity: 73.009 speed: 8941 wps
0.104 perplexity: 53.369 speed: 9241 wps
0.204 perplexity: 58.193 speed: 9115 wps
0.304 perplexity: 55.957 speed: 9091 wps
0.404 perplexity: 54.885 speed: 9073 wps
0.504 perplexity: 54.052 speed: 9059 wps
0.604 perplexity: 52.298 speed: 9053 wps
0.703 perplexity: 51.598 speed: 9036 wps
0.803 perplexity: 50.858 speed: 9024 wps
With this change: (Consumes 700MiB on my TitanBlack)
Epoch: 1 Learning rate: 1.000
0.004 perplexity: 6220.805 speed: 649 wps
0.104 perplexity: 847.498 speed: 5631 wps
0.204 perplexity: 628.919 speed: 6853 wps
0.304 perplexity: 506.395 speed: 7391 wps
0.404 perplexity: 435.559 speed: 7675 wps
0.504 perplexity: 389.903 speed: 7883 wps
0.604 perplexity: 351.013 speed: 8033 wps
0.703 perplexity: 324.474 speed: 8144 wps
0.803 perplexity: 303.551 speed: 8230 wps
0.903 perplexity: 284.267 speed: 8300 wps
Epoch: 1 Train Perplexity: 269.826
Epoch: 1 Valid Perplexity: 178.575
Epoch: 2 Learning rate: 1.000
0.004 perplexity: 214.660 speed: 8880 wps
0.104 perplexity: 152.258 speed: 9222 wps
0.204 perplexity: 159.331 speed: 9072 wps
0.304 perplexity: 154.358 speed: 9036 wps
0.404 perplexity: 151.455 speed: 9019 wps
0.504 perplexity: 148.906 speed: 9008 wps
0.604 perplexity: 144.203 speed: 8990 wps
0.703 perplexity: 142.134 speed: 8979 wps
0.803 perplexity: 140.096 speed: 8971 wps
0.903 perplexity: 136.424 speed: 8968 wps
Epoch: 2 Train Perplexity: 134.372
Epoch: 2 Valid Perplexity: 144.896
Epoch: 3 Learning rate: 1.000
0.004 perplexity: 146.571 speed: 9008 wps
0.104 perplexity: 105.991 speed: 9277 wps
0.204 perplexity: 114.965 speed: 9151 wps
0.304 perplexity: 112.041 speed: 9101 wps
0.404 perplexity: 110.948 speed: 9057 wps
0.504 perplexity: 110.141 speed: 9050 wps
0.604 perplexity: 107.539 speed: 9043 wps
0.703 perplexity: 106.877 speed: 9040 wps
0.803 perplexity: 106.181 speed: 9040 wps
0.903 perplexity: 103.940 speed: 9025 wps
Epoch: 3 Train Perplexity: 103.023
Epoch: 3 Valid Perplexity: 132.966
Epoch: 4 Learning rate: 1.000
0.004 perplexity: 117.296 speed: 8990 wps
0.104 perplexity: 85.532 speed: 9764 wps
0.204 perplexity: 94.076 speed: 9784 wps
0.304 perplexity: 91.875 speed: 9773 wps
0.404 perplexity: 91.423 speed: 9689 wps
0.504 perplexity: 91.090 speed: 9546 wps
0.604 perplexity: 89.244 speed: 9460 wps
0.703 perplexity: 89.004 speed: 9399 wps
0.803 perplexity: 88.732 speed: 9352 wps
0.903 perplexity: 87.097 speed: 9312 wps
Epoch: 4 Train Perplexity: 86.571
Epoch: 4 Valid Perplexity: 128.440
Epoch: 5 Learning rate: 1.000
0.004 perplexity: 100.152 speed: 8973 wps
0.104 perplexity: 74.050 speed: 9271 wps
0.204 perplexity: 81.658 speed: 9157 wps
0.304 perplexity: 79.822 speed: 9115 wps
0.404 perplexity: 79.594 speed: 9061 wps
0.504 perplexity: 79.486 speed: 9020 wps
0.604 perplexity: 78.066 speed: 8990 wps
0.703 perplexity: 78.046 speed: 8974 wps
0.803 perplexity: 77.968 speed: 8963 wps
0.903 perplexity: 76.702 speed: 8946 wps
Epoch: 5 Train Perplexity: 76.386
Epoch: 5 Valid Perplexity: 127.245
Change: 117032081
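The region lookup this commit describes (a sorted vector searched with upper_bound on end_ptr) can be sketched in Python with the standard `bisect` module, where `bisect_right` plays the role of `upper_bound`. The `RegionTable` class, its method names, and the use of plain integers as pointers are assumptions for illustration; the real allocator is C++ and is not reproduced here.

```python
import bisect


class RegionTable:
    """Maps a pointer to its contiguous region (hypothetical sketch)."""

    def __init__(self):
        self._ends = []     # region end addresses, kept sorted
        self._regions = []  # (start, end) tuples in the same order

    def add_region(self, start, size):
        end = start + size
        i = bisect.bisect_right(self._ends, end)
        self._ends.insert(i, end)
        self._regions.insert(i, (start, end))

    def region_for(self, ptr):
        """Return ((start, end), offset) for the region containing ptr."""
        # First region whose end address is strictly greater than ptr.
        i = bisect.bisect_right(self._ends, ptr)
        if i == len(self._regions):
            raise KeyError("pointer %d is past every region" % ptr)
        start, end = self._regions[i]
        if ptr < start:
            raise KeyError("pointer %d falls in a gap between regions" % ptr)
        # Within a region, indexing is a constant-time subtraction.
        return (start, end), ptr - start
```

The extra cost relative to a single region is the one binary search in `region_for`, which is logarithmic in the number of regions.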

Commit:56f1d64
Author:Eugene Brevdo
Committer:TensorFlower Gardener

Fix dependencies bugs Change: 116925769

Commit:848d554
Author:Derek Murray
Committer:TensorFlower Gardener

Prototype of an in-process gRPC server for TensorFlow/Python. Adds support for binding a TensorFlow server to any port, to support single-process testing. This interface is a work in progress. In particular, it supports launching a server, but the support for clean shutdown is incomplete. Change: 116593644

Commit:1442f04
Author:Sherry Moore
Committer:TensorFlower Gardener

Moved config.proto from core/framework to core/protobuf. Change: 116396958

Commit:fa07bfd
Author:Sherry Moore
Committer:TensorFlower Gardener

Added timeout support for Session APIs. To use in Python Session API:
  sess = tf.Session(config=tf.ConfigProto(session_timeout_in_ms=your_timeout_value))
Change: 116288637

Commit:b3a193d
Author:Derek Murray
Committer:TensorFlower Gardener

Adds an in-process gRPC server with support for clean shutdown. This CL refactors the `grpc_server_lib` library such that it exports a `ServerInterface` class with `Start()`, `Stop()`, and `Join()` methods; and a factory method that takes a new `ServerDef` proto for configuring the server. Also fixes a bug in "master.cc" whereby a `~Master()` would hang if GC was disabled. TODO(mrry): Add a SWIG wrapper so that this can be instantiated in Python. Change: 116194792

Commit:d3e8e04
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Use java_generate_equals_and_hash=true for Features and Examples. Without it, generic reflective implementations of equals() and hashCode() are used, which are much slower. Change: 116150839

Commit:ec1403e
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add optional comprehensive logging of memory allocation/deallocation events. When enabled, the following events are recorded:
- The start of a step, with the numerical step_id and a textual handle describing the step.
- A Tensor allocation, including the step_id, the name of the OpKernel, the data type, shape, allocation size, allocation_id, data pointer location, and allocator used (the allocation_id is local to an allocator).
- A Tensor deallocation, including the allocation_id and allocator used.
- A raw memory allocation, including the step_id, the name of the component (e.g. Eigen), the number of bytes, data pointer location, allocation_id and allocator used.
- A raw memory deallocation, including the step_id, the name of the component (e.g. Eigen), allocation_id and allocator used.
For now many Tensor allocations show 'unknown' for the kernel and step_id. These mostly come from Tensors allocated by the system from protocol buffers, and Tensors allocated by Ops using the Tensor constructor directly instead of calling OpKernelContext::allocate_temp. The latter can in principle be cleaned up one by one as necessary. The former would require some plumbing to associate an allocation with the appropriate step_id. With this CL memory logging is enabled by raising the VLOG level to 1. Once there is an ability to set process-wide options programmatically it would make sense to update the machinery to do that. Currently recorded events are logged as INFO, and they can all be retrieved by filtering the log for lines including __LOG_MEMORY__.
Some example lines are as follows:
I0301 13:38:55.797563 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown (from Proto)" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 4 allocator_name: "cuda_host" allocation_id: 2 has_single_reference: true ptr: 8717861408 } } }
I0301 13:38:55.802245 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: -6 kernel_name: "Unknown" tensor { dtype: DT_FLOAT shape { } allocation_description { requested_bytes: 4 allocated_bytes: 256 allocator_name: "gpu_bfc" allocation_id: 1 has_single_reference: true ptr: 47378989056 } } }
I0301 13:38:55.802347 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 2 allocator_name: "cuda_host" }
[...]
I0301 13:38:55.806454 81179 log_memory.cc:18] __LOG_MEMORY__ MemoryLogStep { step_id: 1 handle: "->/init;0" }
I0301 13:38:55.806659 81220 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "random_normal/shape" tensor { dtype: DT_INT32 shape { dim { size: 4 } } allocation_description { requested_bytes: 16 allocated_bytes: 16 allocator_name: "cuda_host" allocation_id: 1 ptr: 8717860896 } } }
[...]
I0301 13:38:56.362898 81218 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 1 kernel_name: "conv1/truncated_normal" tensor { dtype: DT_FLOAT shape { dim { size: 11 } dim { size: 11 } dim { size: 3 } dim { size: 96 } } allocation_description { requested_bytes: 139392 allocated_bytes: 139520 allocator_name: "gpu_bfc" allocation_id: 36 has_single_reference: true ptr: 47379030016 } } }
I0301 13:38:56.362894 81217 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 24 allocator_name: "gpu_bfc" }
I0301 13:38:56.362903 81213 log_memory.cc:18] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 1 kernel_name: "conv5/truncated_normal/mul" tensor { dtype: DT_FLOAT shape { dim { size: 3 } dim { size: 3 } dim { size: 1024 } dim { size: 1024 } } allocation_description { requested_bytes: 37748736 allocated_bytes: 37748736 allocator_name: "gpu_bfc" allocation_id: 34 ptr: 48512711168 } } }
[...]
I0229 16:39:57.482980 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawAllocation { step_id: 13 operation: "xentropy/EigenAllocator" num_bytes: 64 ptr: 47386857472 allocation_id: 625 allocator_name: "gpu_bfc" }
I0229 16:39:57.483147 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" deferred: true }
I0229 16:39:57.483197 76558 log_memory.cc:18] __LOG_MEMORY__ MemoryLogRawDeallocation { step_id: 13 operation: "xentropy/EigenAllocator" allocation_id: 625 allocator_name: "gpu_bfc" }
Change: 116065112
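Filtering a log for the __LOG_MEMORY__ marker, as this commit suggests, might look like the following Python sketch. The regular expression is inferred from the example lines quoted in the commit message, and the function names `memory_events` and `count_by_type` are hypothetical.

```python
import re
from collections import Counter

# Matches the marker followed by the event message name and its opening brace.
_EVENT = re.compile(r"__LOG_MEMORY__ (\w+) \{")


def memory_events(lines):
    """Yield (event_type, text_proto_body) for each memory log line."""
    for line in lines:
        match = _EVENT.search(line)
        if match:
            # Keep everything from the opening brace onward as the body.
            yield match.group(1), line[match.end() - 1:].strip()


def count_by_type(lines):
    """Count memory log events per event type."""
    return Counter(event for event, _ in memory_events(lines))
```

The body is left as a text-format string; parsing it into a message would need the generated protobuf classes, which this sketch deliberately avoids.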

Commit:9611873
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Adds RunWithOpts(), which takes a per-step RunOptions and RunOutputs, to the C++ Session API. Use it for optionally turning tracing on for a step and returning profiling info collected via StepStats. For DirectSession only. Example usage:
  RunOptions run_options;
  run_options.set_trace_level(RunOptions::FULL_TRACE);
  RunOutputs run_outputs;
  ASSERT_TRUE(!run_outputs.has_step_stats());
  Status s = session->RunWithOpts(run_options, inputs, output_names, target_nodes, &outputs, &run_outputs);
  ASSERT_TRUE(run_outputs.has_step_stats());
Change: 115693287

Commit:fdfbd3a
Author:Eugene Brevdo
Committer:TensorFlower Gardener

Final fix to TestReporter (hopefully). Change: 115675044

Commit:00986d4
Author:Derek Murray
Committer:TensorFlower Gardener

Initial version of the open-source distributed TensorFlow runtime. This includes a gRPC server (grpc_tensorflow_server) that can serve as both the master of a distributed TensorFlow computation, and an individual worker in the computation. The GrpcSession class is included to allow client programs (including Python clients) to interact with a server. See tensorflow/core/distributed_runtime/README.md for usage instructions. This change partially addresses issue #23. Change: 115634191

Commit:c38bbf4
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

Rollback of "TestReporter is back in. Maybe also fixed the Android build." Test fails. Change: 115602477

Commit:ad3ef4c
Author:Eugene Brevdo
Committer:TensorFlower Gardener

TestReporter is back in. Maybe also fixed the Android build. Change: 115589642

Commit:5c9f4f8
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

TensorFlow: fix bug in StringPiece::contains which made it always return true. Add a unittest to catch this type of regression in the future. Change: 115573280

Commit:fcfa866
Author:Eugene Brevdo
Committer:TensorFlower Gardener

Added TestReporter and test / benchmark reporting tools. These tools are meant to allow recording of benchmark & unit test structured output to pbtxt files in a directory only when the environment variable TEST_REPORT_FILE_PREFIX is set. For now, only saving of C++ microbenchmark output is supported. Change: 115518303
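The gating behaviour described here, where output is recorded only when TEST_REPORT_FILE_PREFIX is set, can be sketched in Python as follows. Only the environment variable name comes from the commit message; the function names and the way the test name is appended to the prefix (with `/` replaced by `__`) are assumptions for illustration.

```python
import os


def report_file_for(test_name, environ=None):
    """Return the report path, or None when reporting is disabled."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("TEST_REPORT_FILE_PREFIX")
    if not prefix:
        return None  # variable unset or empty: do not record anything
    # Assumed naming scheme: prefix followed by a filesystem-safe test name.
    return prefix + test_name.replace("/", "__")


def save_report(test_name, pbtxt, environ=None):
    """Write pbtxt text for a test if reporting is enabled."""
    path = report_file_for(test_name, environ)
    if path is None:
        return None
    with open(path, "w") as f:
        f.write(pbtxt)
    return path
```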

Commit:0aa874f
Author:Sherry Moore
Committer:TensorFlower Gardener

Added documentation for import_meta_graph and export_meta_graph. Change: 115005379

Commit:ea92856
Author:Geoffrey Irving
Committer:TensorFlower Gardener

Add versions to checkpoints. Checkpoints now have a version scheme analogous to that for GraphDefs. We have no plans to ever deprecate a checkpoint version, but it's good to have the scheme in place in case we need to. Change: 114364388

Commit:7218dad
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Enables java_multiple_files for all tf protos, and sets the outer class name consistently to <FileName>Protos. Also specifies the java namespace as org.tensorflow.*. This enables compiling tf protos with nano proto for Android (which currently does not work because of file/message name clashes) and follows the same convention for proto3 as used by the API platform. Change: 114279703

Commit:e52cecf
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Added the use of google.protobuf.Any in meta_graph.proto to allow packing any user protos. Change: 114256308

Commit:cc30473
Author:Josh Levenberg
Committer:Manjunath Kudlur

Add the Ops for the producer to MetaGraphDef. Change: 114175259

Commit:0269c36
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Added export_meta_graph() and import_meta_graph() for serializing/de-serializing the graph and other Python objects necessary for restarting training, running eval, or running inference into a MetaGraphDef protocol buffer. MetaGraphDef contains the following:
- MetaInfoDef: For storing version and other meta data associated with the meta graph.
- GraphDef: The Graph.
- SaverDef: The Saver.
- CollectionDef
  * Int64List
  * FloatList
  * BytesList
  * NodeList
  * AnyList
These are evolving APIs and subject to change. Change: 114026857

Commit:e830638
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Refactor the logic to apply optimization into a common module. Change: 113692577

Commit:1e14183
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Add assert_same_float_dtype. Change: 113646664

Commit:6c436f2
Author:Josh Levenberg
Committer:Manjunath Kudlur

Get rid of mysterious comment previously needed for syncing with the Google-internal version. Change: 113552310

Commit:0cee1d3
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Add a new 'log message' event type. This will be used in a subsequent change to allow Python code to log to the events file so that it can be viewed in TensorBoard and so logging events can be associated with models more easily. Change: 113194631

Commit:c2722a1
Author:Manjunath Kudlur
Committer:Vijay Vasudevan

- Added optimizer_options field to GraphOptions, moved graph optimization options there.
- Deprecated the existing skip_common_subexpression_elimination field.
Change: 113194182

Commit:a18d48f
Author:Geoffrey Irving
Committer:Vijay Vasudevan

Change GraphDef versions to use version, min_consumer, min_producer. Since all GraphDef versions to date are forwards compatible, we make no attempt to merge old-style and new-style GraphDef version information. The old int32 version field is ignored (and deprecated) and the new VersionDef versions field defaults to producer 0, min_consumer 0 if left out. As a benefit, once this CL is in we can immediately bump the scalar strictness GraphDef version, since consumers will either ignore the new versions (before this CL) or be forward compatible (after this CL). Later, I'll use the same mechanism and protobuf to add versioning to checkpoints. Change: 113091281
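A producer/min_consumer/min_producer scheme of this kind is typically checked in two directions, as in the Python sketch below. The `VersionDef` defaults follow the commit message; the `check_versions` function, its argument names, and the exact comparison rules are assumptions for illustration rather than a copy of the TensorFlow check.

```python
from dataclasses import dataclass


@dataclass
class VersionDef:
    producer: int = 0       # version of the code that wrote the data
    min_consumer: int = 0   # oldest consumer the data allows


def check_versions(versions, consumer, min_producer):
    """Return None if the data is usable, else a description of the problem.

    consumer: version of the code reading the data.
    min_producer: oldest producer the reading code still accepts.
    """
    if versions.producer < min_producer:
        return "producer %d is older than supported minimum %d" % (
            versions.producer, min_producer)
    if versions.min_consumer > consumer:
        return "data needs consumer >= %d, but this consumer is %d" % (
            versions.min_consumer, consumer)
    return None
```

With both fields defaulting to 0, data that omits the versions field passes any check whose `min_producer` is 0, which matches the forward-compatibility argument in the commit message.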

Commit:7d4a063
Author:Josh Levenberg
Committer:Manjunath Kudlur

Delete obsolete comment. Change: 113024053

Commit:7118462
Author:Manjunath Kudlur
Committer:Vijay Vasudevan

Added constant folding optimization pass.
- Graph* -> Graph* pass
- Creates a local executor and executes a copy of the constant "slice" of the original graph, and replaces nodes in original graph with constant nodes.
Change: 112971745
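The idea of constant folding can be shown on a toy graph in Python. This sketch evaluates foldable nodes directly instead of running a copy of the constant subgraph through an executor as the commit describes, and the graph encoding (a dict of tuples) and the `fold_constants` name are hypothetical.

```python
import operator

# Ops the toy folder knows how to evaluate.
OPS = {"Add": operator.add, "Mul": operator.mul}


def fold_constants(graph):
    """Return a copy of graph with constant-only subexpressions folded.

    graph maps a node name to ("Const", value), ("Placeholder",),
    or (op_name, [input_node_names]).
    """
    folded = dict(graph)
    changed = True
    while changed:  # repeat until no more nodes can be folded
        changed = False
        for name, node in list(folded.items()):
            if node[0] not in OPS:
                continue
            inputs = [folded[i] for i in node[1]]
            if all(n[0] == "Const" for n in inputs):
                value = OPS[node[0]](*(n[1] for n in inputs))
                folded[name] = ("Const", value)
                changed = True
    return folded
```

Nodes that depend on a placeholder are left untouched, since their value is only known at run time.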

Commit:668b2a7
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Adds UINT16 type to TensorFlow. Change: 112970147

Commit:9b70316
Author:Vijay Vasudevan
Committer:Vijay Vasudevan

Running our linter on a lot of files. Change: 112920860

Commit:d04e3b7
Author:Dongjoon Hyun
Committer:Vijay Vasudevan

Fix typos in core/framework.

Commit:00440e9
Author:Manjunath Kudlur

Merge commit for internal changes

Commit:6f62e43
Author:A. Unique TensorFlower
Committer:Manjunath Kudlur

Adds enough auditing to make it possible to track tensor buffers throughout an execution, and build a cost model of memory usage. There are two main components: 1) GPU allocators now assign to each allocated tensor buffer a unique ID so its use can be tracked within and across steps. 2) The checkin cleans up the tracking of usage of Tensor buffers, and makes it work for both sync and async kernels (async kernels did not previously track gpu memory correctly). Each use is now tracked by the OpKernelContext (for allocators that need this support) in a single uniquified set of TensorReferences. When the kernel finishes, the executor retrieves the list of references, logs it if needed in the nodeexecstats, then passes it to the device, which may add an additional reference to keep the memory from being reused until the execution completes. When the tensor is logged in the nodeexecstats a flag is set if there is a single remaining reference to the buffer, which means that the memory will be freed once the Op completes. Change: 112375683

Commit:b989c69
Author:Manjunath Kudlur

Merge commit for internal changes

Commit:cbff45c
Author:Geoffrey Irving
Committer:Vijay Vasudevan

Add GraphDef version 6 implementing scalar strictness, but don't use it. The kAllowLegacyScalars flag is now gone. Instead, scalar strictness now depends on the GraphDef version: we are lenient below 6 and strict with 6 and above. The current GraphDef version is still 5; new graphs will not yet use the new version by default. Outside of PLATFORM_GOOGLE, this change has no effect since the code was already scalar strict. The TensorShapeUtils versions of IsLegacyScalar and IsLegacyVector are also gone; users should now get them from OpKernel. The GraphDef version history has been moved to core/public/version.h to minimize the number of files that must be changed in future CLs. Change: 111947842
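Version-gated scalar strictness could be sketched in Python as below. The threshold of 6 comes from the commit message; treating a one-element vector (shape `[1]`) as the "legacy scalar" that lenient versions accept is an assumption, as are the names `STRICT_SCALAR_VERSION` and `is_scalar`.

```python
STRICT_SCALAR_VERSION = 6  # strict at this GraphDef version and above


def is_scalar(shape, graph_def_version):
    """Decide whether shape counts as a scalar for a given GraphDef version.

    shape is a sequence of dimension sizes; () is a true scalar.
    """
    if len(shape) == 0:
        return True
    # Assumption: lenient versions also accept a one-element vector.
    lenient = graph_def_version < STRICT_SCALAR_VERSION
    return lenient and list(shape) == [1]
```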

Commit:d1b8333
Author:Vijay Vasudevan

Merge commit for internal changes

Commit:681db2c
Author:Josh Levenberg
Committer:Vijay Vasudevan

Add note regarding when we can make parsing of AttrValue's holding an empty list more strict. We had some convenient GraphDef version bumps at the relevant time. Change: 111863523

Commit:e7f6336
Author:Dan Vanderkam

Fix broken reference to TensorBase::Serialize

Commit:be5bc79
Author:Vijay Vasudevan

TensorFlow: Add more graphdef validation, this time against OpDefs. The Graph construction code assumes that the GraphDef's NodeDefs match the OpDef specifications. When creating NodeDefs using NodeDef builder, we call ValidateNodeDef() function in node_def_util, but when loading an arbitrary protobuf, we do not. This changes the code to do some important sanity checking during Session::Create/Extend. Bumps the version of graph_def to only do this for new graphs. Also fixes placeholder's shape attr to really be optional, which it wasn't. Fix one test that was using an invalid graph. Change: 111657920

Commit:cb91829
Author:Eugene Brevdo
Committer:Vijay Vasudevan

Add PartialTensorShape C++ class. Wraps the TensorShapeProto, but supports a dimension value of -1 as meaning an "unknown" value. Contains compatibility and merging capabilities similar to the Python version of TensorShape. Future changes will add: * a bool flag to the TensorShape proto that signifies an unknown rank. * serialization/deserialization of Python TensorShape None values to -1 in the Proto. * A new Op attr type "partial_shape" that deserializes to PartialTensorShape. Change: 111622281
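
The compatibility and merging behaviour described here can be sketched in Python. This is an illustration only, with hypothetical function names, not the C++ class. It models known-rank shapes as tuples with -1 for an unknown dimension, and it leaves out unknown rank, which the commit lists as a future change.

```python
UNKNOWN = -1  # dimension value meaning "unknown", as in the commit message


def is_compatible(a, b):
    """Two partial shapes are compatible if they have the same rank and
    every pair of dimensions is equal or has at least one unknown."""
    return len(a) == len(b) and all(
        x == y or UNKNOWN in (x, y) for x, y in zip(a, b)
    )


def merge(a, b):
    """Combine two compatible partial shapes, keeping the known dimensions."""
    if not is_compatible(a, b):
        raise ValueError("incompatible shapes: %r vs %r" % (a, b))
    return tuple(y if x == UNKNOWN else x for x, y in zip(a, b))
```

For example, under these assumptions, merging `(2, -1, 3)` with `(-1, 4, 3)` yields `(2, 4, 3)`, while `(2, -1)` and `(3, -1)` are incompatible because their first dimensions are both known and differ.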

Commit:a1d4149
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Change: 111522571

Commit:0ce489f
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

New GraphDef version 4: TensorFlow is now scalar strict. The kAllowLegacyScalars flag is now gone. Instead, scalar strictness now depends on the GraphDef version: we are lenient below 4 and strict with 4 and above. All new graphs should use version >= 4. The internal Google version of tensorflow is now as scalar strict as the open source release. Note that outside of PLATFORM_GOOGLE, this change has no effect, since the code was already scalar strict. Change: 111467133

Commit:1c57936
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Added 'logging' import to control_flow_ops which is used in the file but not imported. Change: 110842260

Commit:ef50775
Author:A. Unique TensorFlower
Committer:Vijay Vasudevan

Change: 110592065

Commit:10e62dc
Author:Vijay Vasudevan

TensorFlow: merge changes from internal Change 110055925 Clean up interface for adjust_contrast and adjust_brightness. - Simplify kernel for adjust_contrast and remove all min/max and casts. - Change semantics of delta arg to adjust_brightness (always in [0,1)), and adjust users. - Add saturate_cast for casting images without over/underflow problems. - Add new numbers for adjust_contrast benchmark. This CL makes two changes to the public API: - It changes the semantics of the delta parameter of adjust_brightness, which was in the same range as the input image before, and now is always in [0,1). - It changes the semantics of adjust_contrast (the cc op), which wasn't hidden, but was shadowed by the python wrapper in image_ops. It's a little questionable whether this function was part of the public API. It definitely shouldn't have been. It is now hidden, although now it could be part of the public API, albeit with a different name. Change 110054427 update ci_build * add PYTHON_BIN_PATH and always run ./configure in ci_build * rename ci_build cache directory to bazel-ci_build-cache * sync ci_build/Dockerfile.cpu with docker/Dockerfile.devel * use "FROM nvidia/cuda:..." for gpu container * therefore no need of the tensorflow_extra_deps directory anymore * share install code between containers using ./install/*.sh scripts * do not inherit (and override the FROM clause) in dockerfiles anymore * print bazel test errors to stderr Change 110047126 Update ops.pbtxt. Change 110046428 Simplify the example for the Fill op. Base CL: 110056265
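
The idea behind a saturating cast can be shown with a short Python sketch. It is a hand-written illustration for an 8-bit unsigned target, not the TensorFlow op. The function name `saturate_cast_uint8` and the choice to truncate toward zero after clamping are assumptions.

```python
def saturate_cast_uint8(values):
    """Clamp each value to [0, 255] before converting to int, so that
    out-of-range inputs saturate instead of wrapping around."""
    lo, hi = 0, 255
    return [int(min(max(v, lo), hi)) for v in values]


def wrapping_cast_uint8(values):
    """For contrast: a naive cast that wraps modulo 256."""
    return [int(v) % 256 for v in values]


pixels = [-12.5, 0.0, 127.9, 300.0]
saturated = saturate_cast_uint8(pixels)
wrapped = wrapping_cast_uint8(pixels)
```

With the naive cast, a value of 300.0 wraps to 44, which turns an overexposed pixel dark. The saturating version pins it at 255.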

Commit:ddd4aaf
Author:Vijay Vasudevan

TensorFlow: upstream changes to git. Change 109695551 Update FAQ Change 109694725 Add a gradient for resize_bilinear op. Change 109694505 Don't mention variables module in docs variables.Variable should be tf.Variable. Change 109658848 Adding an option to create a new thread-pool for each session. Change 109640570 Take the snapshot of stream-executor. + Expose an interface for scratch space allocation in the interface. Change 109638559 Let image_summary accept uint8 input This allows users to do their own normalization / scaling if the default (very weird) behavior of image_summary is undesired. This required a slight tweak to fake_input.cc to make polymorphically typed fake inputs infer if their type attr is not set but has a default. Unfortunately, adding a second valid type to image_summary *disables* automatic implicit conversion from np.float64 to tf.float32, so this change is slightly backwards incompatible. Change 109636969 Add serialization operations for SparseTensor. Change 109636644 Update generated Op docs. Change 109634899 TensorFlow: add a markdown file for producing release notes for our releases. Seed with 0.5.0 with a boring but accurate description. Change 109634502 Let histogram_summary take any realnumbertype It used to take only floats, now it understands ints. Change 109634434 TensorFlow: update locations where we mention python 3 support, update them to current truth. Change 109632108 Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow - make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output - change docs to reflect new size constraints - change HSV format to be [0,1] for all components - add automatic dtype conversion for all adjust_* and grayscale conversion ops - fix up docs Change 109631077 Improve optimizer exceptions 1. grads_and_vars is now a tuple, so must be wrapped when passed to format. 2. Use '%r' instead of '%s' for dtype formatting Base CL: 109697989

Commit:54a644f
Author:Vijay Vasudevan

TensorFlow: upstream changes to git Change 109366961 TensorFlow BUILD: now that we have an ops library, set linkstatic to 1. This fixes a breakage in the would-be opensource build, and it *might* mean we can get rid of all of the RequireDefaultOps() calls in our code. The ops library is much smaller than the kernels library that was previously linked together. We set linkstatic=0 presumably since we didn't want to package a static copy of the kernels (very large) everywhere. But the op definitions are small, so this seems like a safe change to make. Time to build the various tests was not any longer after this change, and inspecting the example_trainer binary showed no large increase. Change 109363613 TensorFlow: new graph_def_builder_test needs to RequireDefaultOps. Change 109362569 Split ":ops" out of ":kernels" target in tensorflow/core. Change 109360666 Catch dtype and some shape errors sooner in `QueueBase`. Some avoidable errors were not being caught (e.g. the dtypes of the enqueue components were not checked against the queue's dtypes in Python), leading to cryptic messages at runtime. After this CL, they will be caught earlier. Change 109359569 TensorFlow: Expect g_ != nullptr in test Change 109350735 Add a version number to GraphDef We would like to be able to deprecate behavior in newly generated graphs without invalidating tensorflow's ability to read and evaluate old graphs. For this purpose, GraphDef now has a version field which can be checked inside op kernels to determine how backwards compatible to be. version.h defines TF_GRAPHDEF_VERSION_MIN and TF_GRAPHDEF_VERSION_MAX specifying the range of supported GraphDef versions in the current version of tensorflow. Also expose tf.__version__ and tf.__graph_def_version{,_min,_max}__ for Python interrogation purposes. Whenever we want to deprecate or change some GraphDef semantics, we will proceed as follows: 1. Bump TF_GRAPHDEF_VERSION_MAX, leaving TF_GRAPHDEF_VERSION_MIN unchanged. 
Describe the change in graph.proto, include the date introduced. 2. In each relevant kernel, implement the new behavior if the GraphDef version is new, but preserve the old behavior for previous GraphDef versions. 3. Wait six months or so (we need to formalize this somewhere). 4. Bump TF_GRAPHDEF_VERSION_MIN and remove the backwards compatibility. The GraphDef version is distinct from the open source version, but at least (4) and possibly (1) correspond to major version number bumps. The first GraphDef version bump is the upcoming scalar strictness change, which affects Google users only since open source is already scalar strict. This commit does not yet plumb the version number into OpKernelConstruction so that ops can access it. That will follow. Change 109350260 Made TensorShapeProto implicitly convertible to TensorShape. Base CL: 109366982
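
The versioning procedure described in Change 109350735 can be sketched in Python. The constant values and the function names below are hypothetical and chosen for illustration. Only the idea of a supported [min, max] range, with kernels branching on the graph's version, comes from the commit message.

```python
# Illustrative values only; the real ones are defined in version.h.
GRAPH_DEF_VERSION_MIN = 0
GRAPH_DEF_VERSION_MAX = 6


def check_graph_version(version):
    """Reject graphs outside the supported version range."""
    if not GRAPH_DEF_VERSION_MIN <= version <= GRAPH_DEF_VERSION_MAX:
        raise ValueError(
            "GraphDef version %d is outside the supported range [%d, %d]"
            % (version, GRAPH_DEF_VERSION_MIN, GRAPH_DEF_VERSION_MAX)
        )
    return version


def select_behavior(graph_version, introduced_in):
    """Step 2 of the procedure: new behavior for new graphs, old behavior
    preserved for graphs written before the change was introduced."""
    check_graph_version(graph_version)
    return "new" if graph_version >= introduced_in else "legacy"
```

In this sketch, step 4 of the procedure would correspond to raising `GRAPH_DEF_VERSION_MIN` to `introduced_in`, after which the `"legacy"` branch becomes unreachable and can be deleted.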

Commit:bf6b536
Author:Vijay Vasudevan

TensorFlow: Upstream changes to git. Change 109240606 Fix typo Change 109240358 Fix bug in Concat's shape inference due to legacy scalar handling. The shape function was inadvertently converting outputs of unknown shape (rank=None) to vectors of unknown length (rank=1), due to inability to distinguish between legacy scalars and vectors, because `max(1, None)` is 1. Change 109237152 Remove numarray requirement in python_config. Change 109234003 Fix typo in elu documentation. Change 109232946 Python must now be configured via ./configure script Change 109232134 Backported fixes to the tensor comparison operators from the public Eigen repository Change 109231761 Test invalid inputs to softmax_cross_entropy_with_logits. Change 109230218 Backported fixes to the tensor comparison operators from the public Eigen repository Change 109229915 Correct comments in seq2seq to show the right input types for embedding models. (Thanks to hugman@github for bringing this up.) Change 109229118 Fix resize_images example in documentation and allow resize_images to run on a single image with partially-known shape. Change 109228940 Fix demo and node add/remove button spacing Change 109227909 Include Elu in the NN docs. Change 109227059 Adds variable_op_scope and makes variable_scope always add a name_scope. This creates an op scope for variables that makes it easy to create independent operations with a default name by making that name unique for the current scope and it allows explicit names that are not made unique. Change 109224492 Streamline yuv -> rgb conversion to be done in one pass in native code. The entire process now takes ~2ms (including the ByteBuffer.get() calls), down from 10+ ms when the arrays were being interleaved in Java prior to conversion. Also abstracting common yuv->rgb color conversion into helper method. Change 109224389 Add ability to move nodes in and out of auxiliary nodes in graph. Change 109217177 Update generated Op docs. 
Change 109215030 Implementation of the ELU activation function: http://arxiv.org/abs/1511.07289 Change 109209848 When GPUBFCAllocator runs out of memory, also log a summary of chunks in use by size. Change 109206569 Switched to the public version of the Eigen::sign method since it supports complex numbers. Change 109199813 Modify tensorflow.SequenceExample to support multiple-length sequences. Base CL: 109241553
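
The Concat shape-inference bug in Change 109240358 rests on Python 2 ordering, where `None` compares below every integer, so `max(1, None)` evaluates to 1. The sketch below is written for Python 3, where that comparison raises `TypeError`, so the old ordering is emulated explicitly. Both function names are hypothetical and neither is the actual shape function.

```python
def legacy_rank_buggy(rank):
    """Emulates the Python 2 behavior of max(1, rank) when rank is None:
    None sorted below any int, so an unknown rank silently became 1."""
    effective = float("-inf") if rank is None else rank
    return max(1, effective)


def legacy_rank_guarded(rank):
    """Keeps an unknown rank unknown, and only applies the legacy-scalar
    promotion (rank 0 treated as rank 1) when the rank is known."""
    if rank is None:
        return None
    return max(1, rank)
```

The guarded version makes the unknown case an explicit branch, which is the general shape of fix the commit message implies. The actual patch may differ.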

Commit:ab34d55
Author:Vijay Vasudevan

TensorFlow: more features, performance improvements, and doc fixes. Changes: - Add Split/Concat() methods to TensorUtil (meant for convenience, not speed) by Chris. - Changes to linear algebra ops interface by Rasmus - Tests for tensorboard by Daniel - Fix bug in histogram calculation by Cassandra - Added tool for backwards compatibility of OpDefs. The tool checks in the history of OpDefs and their changes, and checks for backwards-incompatible changes. All done by @josh11b - Fix some protobuf example proto docs by Oliver - Add derivative of MatrixDeterminant by @yaroslavvb - Add a priority queue by @ebrevdo - Doc and typo fixes by Aurelien and @dave-andersen - Speed improvements to ConvBackwardFilter by @andydavis - Improve speed of Alexnet on TitanX by @zheng-xq - Add some host memory annotations to some GPU kernels by Yuan. - Add support for doubles in histogram summary by @jmchen-g Base CL: 108158338

Commit:56313de
Author:Vijay Vasudevan

TensorFlow: Doc and linter fixes, some additional tests and error handling, updates to website. Changes: - Removes redundant reshape from image models by @mrry - Default TensorBoard to localhost by @danmane - Reformatting of tensorflow/core by @josh11b - Make tutorials backwards compatible to 0.5.0 by @girving - Improve print documentation (md files not updated). - Add proper scrolling to sitemap by @martinwicke Base CL: 107956254

Commit:f2102f4
Author:Vijay Vasudevan

TensorFlow: upstream changes from the afternoon. Changes: - futurize --stage2 changes for Python 3 compatibility by @girving. - Small updates to documentation by @vrv, schuster and others - Account for failure of std::thread::hardware_concurrency by @ebrevdo. - More changes for backwards-compatibility tests by Josh - Updates to python op doc generation by Josh - Added support for using the best-fit allocator via ConfigProto by @vrv. - Rename LocalSession to DirectSession, since local was a bad name for it. - Enable tf.nn.moments() to work with tensors of unknown shape by @mrry. GITHUB_ISSUE: 139 - Changes for Android build by Andrew. Base CL: 107645181

Commit:cd9e60c
Author:Manjunath Kudlur

TensorFlow: Upstream latest changes to Git. Changes: - Updates to installation instructions. - Updates to documentation. - Minor modifications and tests for word2vec. Base CL: 107284192

Commit:f41959c
Author:Manjunath Kudlur

TensorFlow: Initial commit of TensorFlow library. TensorFlow is an open source software library for numerical computation using data flow graphs. Base CL: 107276108