Proto commits in benoitsteiner/tensorflow-opencl

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:5e855c5
Author:Benoit Steiner
Committer:GitHub

Merge pull request #13976 from benoitsteiner/branch_173415707 Branch 173415707

The documentation is generated from this commit.

Commit:7c1ff88
Author:Tayo Oguntebi
Committer:GitHub

Update node_def.proto Clarifying comments for valid device string in NodeDef, as discussed in PR #13874. Notes: 1. The device string is as emitted by: tensorflow/python/framework/device.py to_string() function. 2. I notice our regex convention does not use '+' (e.g. XX* instead of X+).
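
For illustration, a minimal sketch (assuming the TensorFlow 1.x Python API) of how the canonical device string stored in NodeDef.device is produced by the to_string() helper mentioned above:

```python
import tensorflow as tf

# Build the fully qualified device string described in node_def.proto.
spec = tf.DeviceSpec(job="worker", replica=0, task=1,
                     device_type="GPU", device_index=0)
print(spec.to_string())  # /job:worker/replica:0/task:1/device:GPU:0
```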

Commit:8357c31
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

First part of the refactoring allowing sparse multivalent feature columns. This change extends the split proto to allow feature ids within the feature columns. PiperOrigin-RevId: 173403860

Commit:355e25e
Author:Benoit Steiner
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC --- Commit 9f8523640 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 173145770 --- Commit 01b6b0638 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Cut tracing memory cost PiperOrigin-RevId: 173144626 --- Commit 5e23e0e67 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Erase cloned instructions on the fly when merging fusion nodes. This avoids the awkward situation where an RNG which is clearly eligible for fusion becomes ineligible mid-fusion because it suddenly has an extra (dead) user. PiperOrigin-RevId: 173141716 --- Commit 1038927c0 authored by Saurabh Saxena<srbs@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add SerializeIterator op that serializes an IteratorResource into a variant tensor. Add DeserializeIterator op that builds IteratorResource from a variant tensor. Move BundleReaderWrapper and BundleWriterWrapper from dataset.h to iterator_ops.cc. Add generic key-value store interfaces IteratorStateReader and IteratorStateWriter for reading/writing state of iterators. Get rid of IteratorBundleReader and IteratorBundleWriter. PiperOrigin-RevId: 173140858 --- Commit 57f3e529d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 173136642 --- Commit 0e56ffb7b authored by Shanqing Cai<cais@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix breakages in OSS builds See example breakages logs at: http://ci.tensorflow.org/job/tensorflow-cl-cpu-python3-pip/10847/console http://ci.tensorflow.org/job/tensorflow-cl-gpu/11008/console 1. CL/172477381 added the no_oss tag to tests with oss_serial tags, which broke the logic of OSS_SERIAL tests in pip.sh and run_pip_test.sh. This CL fixes that. 2. The nccl_kernels BUILD target in contrib/nccl/BUILD was missing some dependencies. This CL adds the missing ones. Fixes: #13918 PiperOrigin-RevId: 173133914 --- Commit 3ed049b67 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allows calling keras layers in eager mode. PiperOrigin-RevId: 173129805 --- Commit 4ec6f2b07 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Switching contrib.summaries API to be context-manager-centric PiperOrigin-RevId: 173129793 --- Commit 03b02ffc9 authored by Justine Tunney<jart@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Put Bazel mirror URLs first PiperOrigin-RevId: 173127955 --- Commit 46ab25e4d authored by David Majnemer<majnemer@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add support for convolutions with no spatial dimensions PiperOrigin-RevId: 173126950 --- Commit fc56349b7 authored by Derek Murray<mrry@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tf.data] Convert dataset arguments to tensors as early as possible. This change raises a `TypeError` earlier if (for example) the `batch_size` argument to `Dataset.batch()` has the incorrect type. PiperOrigin-RevId: 173126678 --- Commit 4f7503a87 authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: Support for registering multiple minibatches with register_fully_connected() PiperOrigin-RevId: 173121735 --- Commit 2845bfcd6 authored by Tim Harley<tharley@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Avoid listing all modified Enter/RefEnter nodes on INFO, use VLOG(1) instead. Leave a single, simple, message on INFO. PiperOrigin-RevId: 173121726 --- Commit 434695921 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: _check_registration() supports multiple towers. PiperOrigin-RevId: 173115870 --- Commit 670dddf4a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Multi-minibatch support for tf.contrib.kfac.fisher_blocks.FullyConnectedKFACBasicFB. PiperOrigin-RevId: 173109677 --- Commit dc13a8e2f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix import of meta graphs with partitioned variables into a scope. Saver inspects SliceInfo to decide the variable name when creating a checkpoint. Before this fix even if a partitioned variable ("weights") was imported into a scope "a" it would still be checkpointed as ("weights") instead of ("a/weights") since import_scoped_meta_graph was not adjusting the SliceInfo. WARNING: if you use import_meta_graph on graphs with partitioned_variables WITH an import_scope argument AND then create a Saver to write/read checkpoints this change may break your checkpoint loading. PiperOrigin-RevId: 173105796 --- Commit eea089bdb authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: K-FAC: Multi-tower support for ConvDiagonalFB. PiperOrigin-RevId: 173105412 --- Commit 9b9cbbe2a authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 Tperm type support for `Transpose` (#13909) * Add int64 Tperm type support for `Transpose` This fix adds int64 Tperm support for `Transpose`. In `array_ops.cc`, `Transpose` and `ConjugateTranspose` have been specified as accepting int32 and int64 perm types. However, only int32 kernels has been registered. This fix adds the int64 perm support by removing the constraint on Tperm, resolve the type at runtime, and copying the data type accordingly to correctly handle the int64/int32 types. Additional tests have been added as well. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 of perm in Transpose. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add namespace to hide PermutationHelper Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Enable use_gpu=True for perm type test. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * extra // namespace annotation * Adding a comment about int32 casting that should be safe. Permutations only contain values that refer to dimensions, and the maximum number of dimensions we have is 254, so an int32 is always safe here. --- Commit ac0004e71 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 shape support on GPU for stateless random ops. (#13908) * Add int64 shape support on GPU for stateless random ops. 
This fix adds int64 shape support on GPU for stateless random ops `StatelessRandomUniform`, `StatelessRandomNormal`, `StatelessTruncatedNormal`. The int64 shape for stateless random ops is already supported on CPU with int32/int64 processed properly through `MakeShape`. However, on GPU a type constraint `.TypeConstraint<int32>("T")` has been improperly added. Such a type constraint actually prevents an int64 shape type to run on GPU. (As a comparision, no type constraint on CPU). This fix removes the type constraint and allows int64 shape to be run on GPU. This fix also adds test cases for int64 shape support on stateless random ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 shape support for stateless random ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int32 to shape types tested. --- Commit 0d437c3be authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 padding support for MirrorPad (#13907) * Add int64 padding support for MirrorPad This fix adds int64 padding support for `MirrorPad`. In the `array_ops.cc` the `MirrorPad`/`MirrorPadGrad` has been specified as supporting int64 padding. The related kernels does not have the int64 padding registered though. This fix adds the int64 padding support. This fix also adds additional test cases for coverage. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update template for CPU and GPU support of int64 paddings. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 padding support for MirrorPad Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Put eigen header first like before, just in case. --- Commit 690003cc0 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add `int64` type `multiples` support for `tf.tile` (#13884) * Add `int64` type `multiples` support for `tf.tile` In the doc of `tf.tile` (tf.tile.__doc__) both `int32` and `int64` are supported for `multiples`. However, the kernel for `int64` is not registered yet. This fix adds the support of `int64` `multiples` so that the behavior matches the description of the docs. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update functors for int64 multiples support in `tf.tile` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update test cases for int64 of multiples in `tf.tile` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add GPU and non GPU tests Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * format with clang-format -i Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Move Tmultiples after T (as it is auxilliary) And use `use_gpu=True` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit fd8d517b9 authored by Yunxing Dai<yunxing@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add tests for convolution 1D RELNOTES: n/a PiperOrigin-RevId: 173060283 --- Commit 40c475b48 authored by formath<jinpengliu@163.com> Committed by Vijay Vasudevan<vrv@google.com>: add segment_reduction_ops to tf_op_files (#13901) --- Commit bfa4ec194 authored by Tayo Oguntebi<10927929+tayo@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Update node_def.proto comments (#13874) The device field had outdated comments. Note: We could consider adding tpu as an example here, e.g. "gpu" | "cpu" | "tpu". Thoughts? 
--- Commit c9cb5a58d authored by formath<jinpengliu@163.com> Committed by Vijay Vasudevan<vrv@google.com>: protobuf lib path bug fix for benckmark on osx (#13878) --- Commit 1c1dad105 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Add int64 axis support for reduction ops. (#13891) * Add int64 axis support for reduction ops. This fix is a follow up to PR 13863. In PR 13863 the program crash is fixed if int64 axis is passed to reduction ops, e.g. reduce_sum, reduce_max, etc. However, 13863 does not process the case of int64 support, it merely fixes the crash. This fix adds the support for int64 axis of reduction ops. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for mean, prod, sum Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for min and max. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add int64 axis support for reduce_all and reduce_any Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 axis support of reduce_any and reduce_all Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 17096081e authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Improve resize_bicubic performance by reorganizing loops (#13840) * Improve resize_bicubic performance by reorganizing loops This fix tries to address the issue raised in 13693 where performance of `resize_bicubic` is not on par with opencv. This fix rearranges the loops so that it is the same for num_channel=40 and num_channel=3: Pre-fix: ``` CHANNEL=40 opencv: 145.08ms tf: 314.26ms CHANNEL=3 opencv: 11.95ms tf: 8.95ms ``` Post-fix: ``` CHANNEL=40 opencv: 144.25ms tf: 214.55ms CHANNEL=3 opencv: 11.78ms tf: 14.07ms ``` This fix fixes 13693. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Keep special handling of `num_channels=3` for `resize_bicubic` This commit keeps special handling of `num_channels=3` for `resize_bicubic`: Without special handling: ``` opencv: 11.78ms tf: 14.07ms ``` With special handling: ``` opencv: 11.74ms tf: 9.46ms ``` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Expand Benchmark test for resize_bicubic Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update from review feedback. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit b927df57f authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Update protobuf.cmake to b04e5cba356212e4e8c66c61bbe0c3a20537c5b9 (#13893) This fix tries to address the issue raised in 8187 where protobuf.cmake used different version as bazel. The reason for discrepancy was due to the fact that a customerized protobuf was needed with Windows patch. Since the patch has been merged in (https://github.com/google/protobuf/pull/2203), it makes sense to update protobuf.cmake so that the same version of cmake is used. This fix fixes 8187. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit d1183ca6a authored by Vijay Vasudevan<vrv@google.com> Committed by GitHub<noreply@github.com>: Give each variable a unique name in accumulate_n_v2_eager_test. (#13886) --- Commit a69945810 authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update pin for bazel-toolchains to latest version PiperOrigin-RevId: 173002530 --- Commit 9d55c249c authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix doc in TF_CALL_ when invoked in mobile platform (#13881) * Fix doc in TF_CALL_ when defined(IS_MOBILE_PLATFORM) && !defined(__ANDROID_TYPES_FULL__) This is a small doc fix that includes bool as part of the types that is supported in mobile (IS_MOBILE_PLATFORM && !__ANDROID_TYPES_FULL__), as bool is clearly invoked in the following define. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Also add bool to android full version. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit ba49d8583 authored by Bjarke Hammersholt Roune<broune@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Slight change to reduce_test to avoid generating inf, which was triggering an inf detector unnecessarily. PiperOrigin-RevId: 172965466 --- Commit 93e8f3c67 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adding Python ApiDef overrides. PiperOrigin-RevId: 172960496 --- Commit 0d6a2e353 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change. PiperOrigin-RevId: 172960439 --- Commit 62df65c72 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add dtype argument to Mean and Accuracy object-oriented metrics. PiperOrigin-RevId: 172957714 --- Commit d7409d32b authored by Simone Cirillo<my.accounts@gmx.se> Committed by Vijay Vasudevan<vrv@google.com>: Fix import of spatial_softmax from tensorflow.contrib.layers (#13833) --- Commit df8bce63d authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix crash when `int64` axis is passed to `tf.reduce_sum` (#13863) * Fix crash when `int64` axis is passed to `tf.reduce_sum` This fix tries to fix the crash triggered by `int64` axis passed to `tf.reduce_sum`: ``` ubuntu@ubuntu:~/tensorflow2$ (cd && python) Python 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import tensorflow as tf >>> v = tf.reduce_sum([1,2,3], tf.constant(0, tf.int64)) 2017-10-20 15:55:06.993430: F tensorflow/core/framework/tensor.cc:601] Check failed: dtype() == expected_dtype (9 vs. 3) ubuntu@ubuntu:~/tensorflow2$ ``` The issue is caused by the fact that shape inference in `common_shape_fns.cc` only assumes int32 without proper handling of diffent types. In `math_ops.cc` both int32 and int64 are mentioned. NOTE that this fix does not address the issue that int64 is not supported. To allow int64 axis it is more than adding a template in `ReductionOp` as the type of the axis seems to be decided by some other ways in Eigen. This fix merely fixed the crash so that an error message will return without exit from the python program "No OpKernel was registered to support Op 'Sum' with these attrs". Still, I think its worth to at least allow the program to continue in case of unsupported kernel. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update implementation with a template helper function. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 29c7b4658 authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adding the Stanford Tensorflow class to community resources. PiperOrigin-RevId: 172956049 --- Commit f758b24a8 authored by Alexandre Passos<apassos@google.com> Committed by Vijay Vasudevan<vrv@google.com>: Variable name for the eager test (#13873) --- Commit a5fe66b15 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Removed some unnecessary broadcasts in binary ops where only one input needs broadcasting (which is a fairly common case, even in the fallback path). PiperOrigin-RevId: 172950493 --- Commit c77090a0a authored by Yong Tang<yong.tang.github@outlook.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix issues where int64 crops could not be passed to batch_to_space. (#13862) * Fix issues where int64 crops could not be passed to batch_to_space. This fix tries to address the issue where int64 `crops` could not be passed to `batch_to_space` even though both int32 and int64 are specified as supported in the docs (tf.batch_to_space.__doc__) The reason is that BatchToSpace kernel puts a constraint of int32 to crops data types. This fix removed the constraint so that int64 `crops` could be supported. NOTE: Just removing the constraint should work and it is not necessary to add specification to the kernel class template, as `SubtleMustCopyFlat` called in the class already correctly handled both int32 and int64 cases. Besides, other data types (e.g., float or double) will not be passed to the kernel as they are guarded by the specification in `array_ops.cc`. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Also remove int64/int32 type constraints for SpaceToBatch kernels Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for int64 crops of batch_to_space and space_to_batch Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Fix test failures. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 494837936 authored by Joshua V. Dillon<jvdillon@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make `tf.contrib.distributions` quadrature family accept a `Tensor` for `quadrature_grid_and_probs` argument. PiperOrigin-RevId: 172950094 --- Commit 9c825d32c authored by Jinze Bai<baijinze1994@163.com> Committed by Vijay Vasudevan<vrv@google.com>: Merge two GPU kernel launching to one in DiagOp. (#13859) --- Commit c0ca50a47 authored by Yan Facai (???)<facai.yan@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: ENH: add Relu6GradGrad (#13268) * ENH: add Relu6GradGrad * TST: add test case * CLN: import nn_grad * TST: add init value --- Commit 8ff33271e authored by Justin Lebar<jlebar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Dump the computation's SessionModule as part of the tf_compile rule. PiperOrigin-RevId: 172946149 --- Commit ebcae4a5e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add streaming_precision_recall_at_equal_thresholds This helper method computes streaming tp, fp, tn, fp, precision, and recall for the user in a way that exhibits O(T + N) time and space complexity (instead of O(T * N)), where T is the number of thresholds and N is the size of the predictions tensor. Thanks to Frank Chu for the efficient algorithm! 
PiperOrigin-RevId: 172946073 --- Commit ccfd9c1e5 authored by Sanjoy Das<sanjoy@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Log Hlo IR during AOT compilation PiperOrigin-RevId: 172944165 --- Commit 985031a10 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allows tfe.enable_eager_execution(device_policy=tfe.DEVICE_POLICY_WARN). PiperOrigin-RevId: 172943398 --- Commit 703182d85 authored by Mingxing Tan<tanmingxing@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add performance guide for fused decode_and_crop_jpeg optimization. PiperOrigin-RevId: 172943116 --- Commit 66b1f4383 authored by Francois Chollet<fchollet@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make Network compatible with eager mode. Currently it only allows to instantiate a Network in eager mode using the regular Keras API, and call it on eager tensors. PiperOrigin-RevId: 172942569 --- Commit 41df2cec2 authored by ashankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Testing pending CL: 172939383 --- Commit 37fd95179 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Simplifies capturing code in graph_callable to use recent function improvements. PiperOrigin-RevId: 172937003 --- Commit d1e7382af authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 172924803 PiperOrigin-RevId: 173347587

Commit:f226eb3
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Adds a C64 type to XLA, with actual compilation support coming soon. PiperOrigin-RevId: 173172916

Commit:1038927
Author:Saurabh Saxena
Committer:TensorFlower Gardener

Add SerializeIterator op that serializes an IteratorResource into a variant tensor. Add DeserializeIterator op that builds IteratorResource from a variant tensor. Move BundleReaderWrapper and BundleWriterWrapper from dataset.h to iterator_ops.cc. Add generic key-value store interfaces IteratorStateReader and IteratorStateWriter for reading/writing state of iterators. Get rid of IteratorBundleReader and IteratorBundleWriter. PiperOrigin-RevId: 173140858

Commit:bfa4ec1
Author:Tayo Oguntebi
Committer:Vijay Vasudevan

Update node_def.proto comments (#13874) The device field had outdated comments. Note: We could consider adding tpu as an example here, e.g. "gpu" | "cpu" | "tpu". Thoughts?

Commit:5c24b8b
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

XLA refactoring PiperOrigin-RevId: 172891551

Commit:3715cff
Author:Anna R
Committer:TensorFlower Gardener

Internal change. PiperOrigin-RevId: 172829126

Commit:14a66fd
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[TF:XLA] Update xla_data comments for And, Or, and Not. PiperOrigin-RevId: 172333451

Commit:5dd569c
Author:Mark Heffernan
Committer:TensorFlower Gardener

Make the HLO proto representation (hlo.proto) full fidelity. Hlo modules can be serialized to HLO protos and deserialized without any information loss. As part of this change, a bug is fixed in NameUniquer. Previously, passing names with numeric suffixes could result in name collisions. PiperOrigin-RevId: 172161360

Commit:d426d30
Author:Anna R
Committer:TensorFlower Gardener

Internal change. PiperOrigin-RevId: 172159815

Commit:d871fdc
Author:Jingyue Wu
Committer:TensorFlower Gardener

[Grappler] Remove reshapes whose source shape and destination shape are equal. Also makes ArithmeticOptimizer::Optimize run shape inference at the beginning, and clear _output_shapes at the end. PiperOrigin-RevId: 172133948
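
As an illustration, a minimal sketch (TF 1.x graph mode assumed) of the pattern this optimization targets: a Reshape whose source and destination shapes are equal is a no-op, so consumers can be rewired to the reshape's input.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[4, 8])
y = tf.reshape(x, [4, 8])  # source and destination shapes are equal: redundant
z = y + 1.0                # after optimization, z can read from x directly
```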

Commit:1c241e5
Author:Peter Hawkins
Committer:TensorFlower Gardener

[XLA] Add ShiftLeft, ShiftRightArithmetic, and ShiftRightLogical operators. PiperOrigin-RevId: 172091595
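
A plain-Python illustration of the three shift semantics being added (this is not the XLA client API, only the arithmetic the operator names refer to):

```python
x = -8                  # ...11111000 in two's complement
print(x << 1)           # ShiftLeft            -> -16
print(x >> 1)           # ShiftRightArithmetic -> -4  (sign bit preserved)
print((x & 0xFF) >> 1)  # ShiftRightLogical on an 8-bit value -> 124 (zero fill)
```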

Commit:2645045
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Extend the ExecuteParallel service interface to allow multiple devices per computation. PiperOrigin-RevId: 172071664

Commit:19708cc
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

A few profiler improvements. 1. Use an id_to_string map to reduce the profile size (2/3 in xception) 2. dedup code view's function name with extra file base name. 3. remove code view display heuristic that doesn't work in some cases. 4. make the profile_context thread-safe. PiperOrigin-RevId: 172031528

Commit:87c59cd
Author:Toby Boyd
Committer:TensorFlower Gardener

Internal change. PiperOrigin-RevId: 172013289

Commit:48eee55
Author:David Majnemer
Committer:TensorFlower Gardener

Automated g4 rollback of changelist 170358888 PiperOrigin-RevId: 171982861

Commit:721fbda
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[TF:XLA] Rename BINOP_LOGICAL_X to BINOP_X PiperOrigin-RevId: 171716540

Commit:3bafe0a
Author:Peter Hawkins
Committer:TensorFlower Gardener

Add uint32 and uint64 types to TensorFlow. This change merely creates the types, but does not register kernels that act on uint32/uint64 values. It also does not alter most op registration lists to include uint32/uint64 values. If desirable, that can be done in a subsequent change, although binary size will likely prove problematic if adding more kernels. The intent of the change is so XLA-compiled code can make use of uint32/uint64 types. Since XLA does not use traditional TensorFlow kernels, using uint32/uint64 operators from XLA will require only uint32/uint64 op registrations, but will require few new kernel registrations. PiperOrigin-RevId: 171681867
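
A hedged sketch (assuming a TensorFlow build that contains this change) showing that the new dtypes are visible from Python even though few kernels act on them:

```python
import tensorflow as tf

print(tf.uint32.as_datatype_enum)  # DataType enum value for DT_UINT32
print(tf.uint64.max)               # 18446744073709551615
```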

Commit:86238e8
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Track memory allocation/deallocation history. PiperOrigin-RevId: 171239477

Commit:9d2d6cd
Author:David Majnemer
Committer:TensorFlower Gardener

Automated g4 rollback of changelist 170130811 PiperOrigin-RevId: 170358888

Commit:6dc4aac
Author:Benoit Steiner
Committer:TensorFlower Gardener

Fixed outdated comment PiperOrigin-RevId: 170276755

Commit:0b853ef
Author:David Majnemer
Committer:TensorFlower Gardener

[XLA] Split input and output in ConvolutionDimensionNumbers This allows for additional freedom when reasoning and transforming the input and output of convolutions. PiperOrigin-RevId: 170130811

Commit:122ad24
Author:Igor Ganichev
Committer:TensorFlower Gardener

Add equality and hash functions for AttrDef and OpDef PiperOrigin-RevId: 170116027
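
The helpers themselves are C++; the Python sketch below only mirrors the intended semantics using the generated proto classes, which already compare field by field (the import path is the standard generated-proto location and is assumed here):

```python
from tensorflow.core.framework import op_def_pb2

a = op_def_pb2.OpDef(name="MyOp")
b = op_def_pb2.OpDef(name="MyOp")
print(a == b)  # True: all fields equal
a.attr.add(name="T", type="type")
print(a == b)  # False: attr lists now differ
```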

Commit:b29b839
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Map API change to enable mapping over an arbitrary set of dimensions. PiperOrigin-RevId: 170090055

Commit:e2e3a94
Author:Shanqing Cai
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC --- Commit 1e1b3d902 authored by Pete Warden<pete@petewarden.com> Committed by gunan<gunan@google.com>: Changed output directory for Pi CI build to fix permissions problem with nightlies (#13257) * Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change for Pi that's no longer needed * Fixed Pi Zero OpenBLAS build problems and tidied up directories used * More robust checks in Pi build script * Changed output directory for Pi CI build to fix permissions problem --- Commit fe3a2e65c authored by Yan Facai (???)<facai.yan@gmail.com> Committed by drpngx<drpngx@users.noreply.github.com>: check invalid string type for dest_nodes in extract_sub_graph (#13057) * BUG: check str type * TST: add unit test * CLN: remove list check * CLN: use warning * CLN: 2 indent * CLN: raise TypeError if not list * CLN: check string only --- Commit 225ab7629 authored by Jean Wanka<jm.wanka@gmail.com> Committed by Jean Wanka<jm.wanka@gmail.com>: Fix polynomial decay with cycle for global step=0 For polynomial decay with cycle=True the learning rate at step 0 becomes NaN, because in the process of calculating it we devide by 0. This change should fix it, by setting the multiplier for the decay steps to one for global_step=0. --- Commit 286f57061 authored by Bjarke Hammersholt Roune<broune@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make Service::TransferToClient not attempt to manipulate the literal when the transfer failed, preventing a crash and allowing the caller to see the reason for the failed transfer. PiperOrigin-RevId: 169770126 --- Commit e0501bc4d authored by Yong Tang<yong.tang.github@outlook.com> Committed by Shanqing Cai<cais@google.com>: Fix GRUBlockCell parameter naming inconsistency (#13153) * Fix GRUBlockCell parameter naming inconsistency This fix tries to fix the issue in 13137 where parameter `cell_size` is used instead of `num_units`. This is inconsistent with other RNN cells. This fix adds support of `num_units` while at the same time maintains backward compatiblility for `cell_size`. This fix fixes 13137. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add `@deprecated_args` for 'cell_size' in `GRUBlockCell` This commit adds `@deprecated_args` for 'cell_size' in `GRUBlockCell` Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Address review comment Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 02a2eba05 authored by Pete Warden<pete@petewarden.com> Committed by gunan<gunan@google.com>: Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change that's no longer needed (#13251) * Fix for RTLD_GLOBAL breakage of Pi builds, and removed Eigen version change for Pi that's no longer needed * Fixed Pi Zero OpenBLAS build problems and tidied up directories used * More robust checks in Pi build script --- Commit 8ef722253 authored by Sanjoy Das<sanjoy@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove a redundant setName. The EmitComputation should have emitted a function with the right name, so use a CHECK instead. PiperOrigin-RevId: 169764856 --- Commit 1b94147dc authored by Neal Wu<wun@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix broken GitHub links in tensorflow and tensorflow_models resulting from The Great Models Move (a.k.a. 
the research subfolder) PiperOrigin-RevId: 169763373 --- Commit b1ada5f0c authored by Justine Tunney<jart@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix TensorBoard python -m invoke in docs PiperOrigin-RevId: 169758752 --- Commit 2957cd894 authored by Mustafa Ispir<ispir@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Local run option of estimator training. PiperOrigin-RevId: 169756384 --- Commit 1dc2fe7ac authored by Gunhan Gulsoy<gunan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 166264198 PiperOrigin-RevId: 169998124

Commit:b9611a5
Author:Chris Leary
Committer:TensorFlower Gardener

[XLA] Add support for QuantizeAndDequantizeV2. PiperOrigin-RevId: 169955636

Commit:85c4a37
Author:Peter Hawkins
Committer:TensorFlower Gardener

[XLA] Adds an API to attach a device assignment to HLO operators. PiperOrigin-RevId: 169841868

Commit:97b8ccc
Author:Shanqing Cai

Merge commit for internal changes

Commit:054b882
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add fixed space sparse class stats handling. PiperOrigin-RevId: 169570470

Commit:befbc3d
Author:Shanqing Cai

Merge commit for internal changes

Commit:5ce3523
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Extending the core DebugIdentity tensorflow operation with support for writing to a singleton in-memory data structure that records a mapping from debug_urls to debug events. This simplifies reading a large number of states without writing to disk or making internal RPC calls for arbitrary nodes. PiperOrigin-RevId: 169337269

Commit:74680a3
Author:Anna R
Committer:TensorFlower Gardener

Internal change. PiperOrigin-RevId: 169303087

Commit:0332b40
Author:Amit Patankar

Merge commit for internal changes

Commit:d52cff9
Author:Anna R
Committer:TensorFlower Gardener

ApiDef will replace OpGenOverrides proto and take over responsibility for documentation from the OpDef. PiperOrigin-RevId: 169283263
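
A hedged sketch of what an ApiDef looks like when parsed from Python; the import path and field names below follow tensorflow/core/framework/api_def.proto and are assumptions about this particular build:

```python
from google.protobuf import text_format
from tensorflow.core.framework import api_def_pb2

api_defs = text_format.Parse("""
op {
  graph_op_name: "MyOp"
  summary: "Documentation now lives in the ApiDef rather than the OpDef."
}
""", api_def_pb2.ApiDefs())
print(api_defs.op[0].graph_op_name)  # MyOp
```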

Commit:c8535f3
Author:Joel Hestness
Committer:drpngx

Introduce MPI allreduce and allgather in a new contrib project (#12299) * Allreduce: Rebase to TF 1.3-rc1 (#3) * Introduce MPI allreduce in a new contrib project. This commit adds the tensorflow.contrib.mpi namespace and contrib project, which has a variety of ops that work with MPI. The MPI system works by starting a background thread which communicates between the different processes at a regular interval and schedules asynchronous reductions. At every tick, every rank will notify rank zero of the tensors it is ready to reduce, signifying completion with an empty DONE message. Rank zero will count how many ranks are ready to reduce every tensor, and, whenever a tensor is ready to reduce (that is, every rank is ready to reduce it), rank zero will issue a message to all other ranks directing them to reduce that tensor. This repeats for all the tensors that are ready to reduce, after which rank zero sends all other ranks a DONE message indicating that the tick is complete. Reviewed-by: Joel Hestness <jthestness@gmail.com> * Allreduce/Allgather: Major changes and fixes (#2) This commit constitutes many major updates to the TF MPI allreduce and allgather ops. Specifically, the following changes are included in this commit: 1) The allreduce and allgather ops had race conditions, which this commit fixes. Specifically, the BackgroundThreadLoop previously allocated temporary and output tensors after the main graph traversal thread has completed its call to MPIAll*::ComputeAsync(). Unfortunately, the ops kernel context's memory allocator is only guaranteed to be valid during the ComputeAsync call. This constraint requires ComputeAsync to allocate all tensors before returning; Otherwise, the memory allocator state may reflect allocations and deallocations from further ops that can cause races for the memory locations. To fix this, hoist the memory allocations to ComputeAsync. In this process, introduce a collective op record, which tracks the parameters of the op (e.g. input, output, and configurations). 2) Many models require capability to allreduce or allgather int64 tensors. We add functionality to handle long long data type (64-bit ints). 3) Eliminate the thread sleep. A major to-do item is to eliminate the need for polling between coordinator threads and other ranks. This change will require the coordinator rank to be able to wake up all other ranks when a collective is ready to be performed, but also for all ranks (i.e. background threads) to be woken up by graph traversal threads. In the meantime, remove the thread sleep, because it introduces significant run time overhead (e.g. >20%) for models with quick-running layers (e.g. few recurrent time-steps or few hidden nodes per layer). * mpi_ops.cc: Move toward more TF nature This commit changes a few bits and pieces to align more closely with Tensorflow structures and organization: 1) Use TF mutexes. TF mutexes provide nice scoping and management around std::mutex, and using them is consistent with other TF code. 2) Remove thread sleep at MPI initialization time. Thread sleep should not be used for polling activity. Instead, this commit replaces sleep-polling with a condition variable: The compute graph traversal thread waits on the condition variable until the background thread has completed initialization and signals the graph traversal thread that initialization is complete. 3) Slim MPI initialization check: Since TF permits many threads to be traversing the compute graph concurrently (e.g. 
with inter_op_parallelism_threads > 1), some graph traversal threads may not have set their GPU device ID. If such a thread executes an MPI op, it would fail the check in InitializedMPIOnSameDevice, because the background thread would be controlling a GPU with ID other than the default (0). Since graph traversal threads do not perform GPU activity, this GPU ID check was unnecessary. Remove it and refactor to just check whether MPI is initialized (IsMPIInitialized). * Rebase to TF 1.3.0-rc1 complete and tested * Allreduce: Rebase to TF 1.3-rc1 (#3) * Introduce MPI allreduce in a new contrib project. This commit adds the tensorflow.contrib.mpi namespace and contrib project, which has a variety of ops that work with MPI. The MPI system works by starting a background thread which communicates between the different processes at a regular interval and schedules asynchronous reductions. At every tick, every rank will notify rank zero of the tensors it is ready to reduce, signifying completion with an empty DONE message. Rank zero will count how many ranks are ready to reduce every tensor, and, whenever a tensor is ready to reduce (that is, every rank is ready to reduce it), rank zero will issue a message to all other ranks directing them to reduce that tensor. This repeats for all the tensors that are ready to reduce, after which rank zero sends all other ranks a DONE message indicating that the tick is complete. Reviewed-by: Joel Hestness <jthestness@gmail.com> * Allreduce/Allgather: Major changes and fixes (#2) This commit constitutes many major updates to the TF MPI allreduce and allgather ops. Specifically, the following changes are included in this commit: 1) The allreduce and allgather ops had race conditions, which this commit fixes. Specifically, the BackgroundThreadLoop previously allocated temporary and output tensors after the main graph traversal thread has completed its call to MPIAll*::ComputeAsync(). Unfortunately, the ops kernel context's memory allocator is only guaranteed to be valid during the ComputeAsync call. This constraint requires ComputeAsync to allocate all tensors before returning; Otherwise, the memory allocator state may reflect allocations and deallocations from further ops that can cause races for the memory locations. To fix this, hoist the memory allocations to ComputeAsync. In this process, introduce a collective op record, which tracks the parameters of the op (e.g. input, output, and configurations). 2) Many models require capability to allreduce or allgather int64 tensors. We add functionality to handle long long data type (64-bit ints). 3) Eliminate the thread sleep. A major to-do item is to eliminate the need for polling between coordinator threads and other ranks. This change will require the coordinator rank to be able to wake up all other ranks when a collective is ready to be performed, but also for all ranks (i.e. background threads) to be woken up by graph traversal threads. In the meantime, remove the thread sleep, because it introduces significant run time overhead (e.g. >20%) for models with quick-running layers (e.g. few recurrent time-steps or few hidden nodes per layer). * mpi_ops.cc: Move toward more TF nature This commit changes a few bits and pieces to align more closely with Tensorflow structures and organization: 1) Use TF mutexes. TF mutexes provide nice scoping and management around std::mutex, and using them is consistent with other TF code. 2) Remove thread sleep at MPI initialization time. 
Thread sleep should not be used for polling activity. Instead, this commit replaces sleep-polling with a condition variable: The compute graph traversal thread waits on the condition variable until the background thread has completed initialization and signals the graph traversal thread that initialization is complete. 3) Slim MPI initialization check: Since TF permits many threads to be traversing the compute graph concurrently (e.g. with inter_op_parallelism_threads > 1), some graph traversal threads may not have set their GPU device ID. If such a thread executes an MPI op, it would fail the check in InitializedMPIOnSameDevice, because the background thread would be controlling a GPU with ID other than the default (0). Since graph traversal threads do not perform GPU activity, this GPU ID check was unnecessary. Remove it and refactor to just check whether MPI is initialized (IsMPIInitialized). * Rebase to TF 1.3.0-rc1 complete and tested * Minor fixes * Point MPI message proto at contrib/mpi package * MPI Session: Fix graph handling * Pylint fixes * More pylint fixes * Python 2 pylint fix * MPI Collectives Ops: Fix coordinator shut down * Update copyrights to 2017 * Remove MPIDataType and switch to TF DataType * Add Allgather test, fix Allreduce test config * Fix BUILD file for TF sanity checks * Try guarding MPI collectives C++ files with TENSORFLOW_USE_MPI The TF build system on Github tries to build C++ source files in tensorflow/contrib/mpi_collectives even when configured with TF_NEED_MPI=0. This leads to a build failure when the mpi_collectives C++ files try to link against MPI third party headers, which are not set up. Unable to reproduce in contributor's build environment, we try guarding the MPI collectives C++ code with defines for TENSORFLOW_USE_MPI, similar to tensorflow/contrib/mpi. * Comment formatting Hopefully, this will trigger googlebot.
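
A pure-Python sketch of the tick protocol described above (not the actual contrib/mpi implementation): rank zero tallies the readiness reports and orders a reduction only for tensors that every rank has reported ready.

```python
def coordinator_tick(ready_reports, num_ranks):
    """ready_reports maps rank -> set of tensor names that rank is ready to reduce."""
    counts = {}
    for tensors in ready_reports.values():
        for name in tensors:
            counts[name] = counts.get(name, 0) + 1
    # Order a reduction for every tensor that all ranks are ready to reduce.
    return sorted(name for name, c in counts.items() if c == num_ranks)

reports = {0: {"grad_a", "grad_b"}, 1: {"grad_a"}, 2: {"grad_a", "grad_b"}}
print(coordinator_tick(reports, num_ranks=3))  # ['grad_a']
```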

Commit:e55574f
Author:drpngx
Committer:GitHub

Branch 168917534 (#13077) * Use HLO name, rather than pointer address, for profile counter name. This removes a source of nondeterminism in IR generation. PiperOrigin-RevId: 168779489 * Eager gradient tape doesn't keep tensors alive. PiperOrigin-RevId: 168782341 * Add missing back-quote PiperOrigin-RevId: 168785422 * Add in a comment that I forgot to add to a previous commit; NFC. PiperOrigin-RevId: 168786760 * Update ops-related pbtxt files. PiperOrigin-RevId: 168787665 * Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 168788051 * Fix typo "comptuation" (computation) PiperOrigin-RevId: 168799777 * Fix a bug in export GTFlow model to shared format with sparse float split PiperOrigin-RevId: 168802503 * Add signature def utility functions for inspection of input and output types and shapes. PiperOrigin-RevId: 168820997 * Apply const qualifiers whenever appropriate. PiperOrigin-RevId: 168824461 * TFE: Clearer error message when enable_eager_execution is called more than once PiperOrigin-RevId: 168834147 * [tf.contrib.data] Add colocation constraints between Iterator and Datasets. This restores the colocation behavior that was present when Dataset objects were passed as DT_RESOURCE tensors, and avoids the (currently not supported) case where TensorFlow may attempt to split the dataset pipeline across devices. PiperOrigin-RevId: 168841061 * Optimize C++ kernels for the matrix_band_part op, which is used in various ops operating on triangular or banded matrices: * Add benchmark for matrix_band_part. * Implement simple optimized CUDA kernel instead of calling Eigen generator. * Parallelize CPU kernel for matrix_band_part. * Support on-the-fly transposition in the underlying functors (to be used for future QR op in followup). Benchmarks: First column is of the form {device}_{shape}_{num_lower,num_upper} Test case Before After Speedup cpu_(10,16,16)_(-1,-1) 5.6505e-05 6.2108e-05 -9.92% cpu_(10,16,16)_(-1,0) 0.00010848 0.00010908 -0.55% cpu_(10,16,16)_(0,-1) 0.0001055 0.00011396 -8.02% cpu_(10,16,16)_(2,2) 0.000108 0.00011706 -8.39% cpu_(10,101,101)_(-1,-1) 0.00013697 6.0558e-05 +55.79% cpu_(10,101,101)_(-1,0) 0.00054002 0.00017703 +67.22% cpu_(10,101,101)_(0,-1) 0.00051188 0.00017607 +65.60% cpu_(10,101,101)_(2,2) 0.00050449 0.00016904 +66.49% cpu_(10,256,256)_(-1,-1) 0.00032043 5.6028e-05 +82.51% cpu_(10,256,256)_(-1,0) 0.001335 0.0004015 +69.93% cpu_(10,256,256)_(0,-1) 0.0013521 0.00038862 +71.26% cpu_(10,256,256)_(2,2) 0.001269 0.00039959 +68.51% cpu_(10,1000,1000)_(-1,-1) 0.0090729 6.3419e-05 +99.30% cpu_(10,1000,1000)_(-1,0) 0.01712 0.0047594 +72.20% cpu_(10,1000,1000)_(0,-1) 0.016647 0.0046474 +72.08% cpu_(10,1000,1000)_(2,2) 0.012737 0.0041161 +67.68% cpu_(10,1024,1024)_(-1,-1) 0.0093709 5.8889e-05 +99.37% cpu_(10,1024,1024)_(-1,0) 0.017075 0.0051999 +69.55% cpu_(10,1024,1024)_(0,-1) 0.016867 0.004617 +72.63% cpu_(10,1024,1024)_(2,2) 0.013191 0.003759 +71.50% cpu_(10,2048,2048)_(-1,-1) 0.028427 6.2466e-05 +99.78% cpu_(10,2048,2048)_(-1,0) 0.048134 0.017642 +63.35% cpu_(10,2048,2048)_(0,-1) 0.048773 0.017558 +64.00% cpu_(10,2048,2048)_(2,2) 0.036153 0.015452 +57.26% cpu_(10,10,4,4)_(-1,-1) 5.8055e-05 5.8055e-05 +0.00% cpu_(10,10,4,4)_(-1,0) 0.00015557 0.0001564 -0.54% cpu_(10,10,4,4)_(0,-1) 0.00015855 0.00015199 +4.14% cpu_(10,10,4,4)_(2,2) 0.00016379 0.00018096 -10.48% cpu_(10,10,10,10)_(-1,-1) 6.0558e-05 6.0558e-05 +0.00% cpu_(10,10,10,10)_(-1,0) 0.000368 0.00038695 -5.15% cpu_(10,10,10,10)_(0,-1) 0.00036263 0.00038612 -6.48% cpu_(10,10,10,10)_(2,2) 0.00038648 
0.00042963 -11.17% cpu_(10,10,16,16)_(-1,-1) 6.9022e-05 5.7578e-05 +16.58% cpu_(10,10,16,16)_(-1,0) 0.0005815 0.0001874 +67.77% cpu_(10,10,16,16)_(0,-1) 0.00059354 0.0001924 +67.58% cpu_(10,10,16,16)_(2,2) 0.00062239 0.00019097 +69.32% cpu_(10,10,101,101)_(-1,-1) 0.00014806 6.2823e-05 +57.57% cpu_(10,10,101,101)_(-1,0) 0.0039785 0.00078249 +80.33% cpu_(10,10,101,101)_(0,-1) 0.0040585 0.00076556 +81.14% cpu_(10,10,101,101)_(2,2) 0.0039514 0.00077307 +80.44% cpu_(10,10,256,256)_(-1,-1) 0.0026824 6.0558e-05 +97.74% cpu_(10,10,256,256)_(-1,0) 0.017269 0.0031619 +81.69% cpu_(10,10,256,256)_(0,-1) 0.020287 0.0030774 +84.83% cpu_(10,10,256,256)_(2,2) 0.011919 0.0026599 +77.68% cpu_(10,10,1000,1000)_(-1,-1) 0.065783 5.6982e-05 +99.91% cpu_(10,10,1000,1000)_(-1,0) 0.1361 0.054533 +59.93% cpu_(10,10,1000,1000)_(0,-1) 0.1397 0.053405 +61.77% cpu_(10,10,1000,1000)_(2,2) 0.10173 0.048561 +52.26% cpu_(10,10,1024,1024)_(-1,-1) 0.066231 7.5579e-05 +99.89% cpu_(10,10,1024,1024)_(-1,0) 0.13615 0.059931 +55.98% cpu_(10,10,1024,1024)_(0,-1) 0.13745 0.064931 +52.76% cpu_(10,10,1024,1024)_(2,2) 0.10493 0.054258 +48.29% cpu_(10,10,2048,2048)_(-1,-1) 0.23487 6.6042e-05 +99.97% cpu_(10,10,2048,2048)_(-1,0) 0.41014 0.24283 +40.79% cpu_(10,10,2048,2048)_(0,-1) 0.43621 0.26393 +39.49% cpu_(10,10,2048,2048)_(2,2) 0.29919 0.22302 +25.46% gpu_(10,16,16)_(-1,-1) 0.00010753 0.00010753 +0.00% gpu_(10,16,16)_(-1,0) 0.00011253 0.00012445 -10.59% gpu_(10,16,16)_(0,-1) 0.00012493 0.00013399 -7.25% gpu_(10,16,16)_(2,2) 0.000108 0.00011754 -8.83% gpu_(10,101,101)_(-1,-1) 0.00011849 8.7976e-05 +25.75% gpu_(10,101,101)_(-1,0) 0.00012743 0.00012243 +3.93% gpu_(10,101,101)_(0,-1) 0.00012958 0.00012362 +4.60% gpu_(10,101,101)_(2,2) 0.00011504 0.00011504 +0.00% gpu_(10,256,256)_(-1,-1) 0.00013447 9.7513e-05 +27.48% gpu_(10,256,256)_(-1,0) 0.00018752 0.00014746 +21.36% gpu_(10,256,256)_(0,-1) 0.00017798 0.00016904 +5.02% gpu_(10,256,256)_(2,2) 0.0001514 0.00013697 +9.53% gpu_(10,1000,1000)_(-1,-1) 0.0005095 9.8586e-05 +80.65% gpu_(10,1000,1000)_(-1,0) 0.00088501 0.00056589 +36.06% gpu_(10,1000,1000)_(0,-1) 0.00090456 0.00055242 +38.93% gpu_(10,1000,1000)_(2,2) 0.00080955 0.00049639 +38.68% gpu_(10,1024,1024)_(-1,-1) 0.00050902 9.7036e-05 +80.94% gpu_(10,1024,1024)_(-1,0) 0.00098789 0.00058246 +41.04% gpu_(10,1024,1024)_(0,-1) 0.001 0.00059545 +40.46% gpu_(10,1024,1024)_(2,2) 0.00082254 0.00049961 +39.26% gpu_(10,2048,2048)_(-1,-1) 0.001495 9.8944e-05 +93.38% gpu_(10,2048,2048)_(-1,0) 0.003535 0.0017736 +49.83% gpu_(10,2048,2048)_(0,-1) 0.0034965 0.0017921 +48.75% gpu_(10,2048,2048)_(2,2) 0.0027704 0.0015399 +44.41% gpu_(10,10,4,4)_(-1,-1) 0.00011086 9.1076e-05 +17.85% gpu_(10,10,4,4)_(-1,0) 0.0001235 0.00013411 -8.59% gpu_(10,10,4,4)_(0,-1) 0.00011849 0.0001204 -1.61% gpu_(10,10,4,4)_(2,2) 0.00010896 0.00013256 -21.66% gpu_(10,10,10,10)_(-1,-1) 0.00010657 9.5844e-05 +10.07% gpu_(10,10,10,10)_(-1,0) 0.00011754 0.00013602 -15.72% gpu_(10,10,10,10)_(0,-1) 0.00011909 0.00012004 -0.80% gpu_(10,10,10,10)_(2,2) 0.00013196 0.00011349 +14.00% gpu_(10,10,16,16)_(-1,-1) 0.00012898 0.00010705 +17.01% gpu_(10,10,16,16)_(-1,0) 0.00014353 0.00012338 +14.04% gpu_(10,10,16,16)_(0,-1) 0.00011599 0.00012493 -7.71% gpu_(10,10,16,16)_(2,2) 0.00011539 0.00011349 +1.65% gpu_(10,10,101,101)_(-1,-1) 0.00014699 0.00010252 +30.25% gpu_(10,10,101,101)_(-1,0) 0.0002141 0.00015497 +27.62% gpu_(10,10,101,101)_(0,-1) 0.0002017 0.00015843 +21.45% gpu_(10,10,101,101)_(2,2) 0.00018394 0.00015402 +16.27% gpu_(10,10,256,256)_(-1,-1) 0.00032747 9.0003e-05 +72.52% 
gpu_(10,10,256,256)_(-1,0) 0.00074494 0.00040746 +45.30% gpu_(10,10,256,256)_(0,-1) 0.00072503 0.00042391 +41.53% gpu_(10,10,256,256)_(2,2) 0.00061846 0.00038004 +38.55% gpu_(10,10,1000,1000)_(-1,-1) 0.0032645 0.00010896 +96.66% gpu_(10,10,1000,1000)_(-1,0) 0.007543 0.0038971 +48.34% gpu_(10,10,1000,1000)_(0,-1) 0.006058 0.0039405 +34.95% gpu_(10,10,1000,1000)_(2,2) 0.005198 0.003448 +33.67% gpu_(10,10,1024,1024)_(-1,-1) 0.0034155 9.1434e-05 +97.32% gpu_(10,10,1024,1024)_(-1,0) 0.007099 0.004158 +41.43% gpu_(10,10,1024,1024)_(0,-1) 0.006843 0.003849 +43.75% gpu_(10,10,1024,1024)_(2,2) 0.005506 0.0031376 +43.02% gpu_(10,10,2048,2048)_(-1,-1) 0.013119 0.00010097 +99.23% gpu_(10,10,2048,2048)_(-1,0) 0.028533 0.015175 +46.81% gpu_(10,10,2048,2048)_(0,-1) 0.028458 0.014926 +47.55% gpu_(10,10,2048,2048)_(2,2) 0.022175 0.011797 +46.80% PiperOrigin-RevId: 168849471 * * dataset_ops.read_batch_features() now discards keys for keyed Dataset. * dataset_ops.read_batch_features() ignores unnecessary repeat() when num_repeat == 1. PiperOrigin-RevId: 168855155 * Migrate TFGAN eval to opensource. PiperOrigin-RevId: 168855880 * [XLA] Remove superfluous locking from xla::ComputationBuilder. The class is thread compatible, not thread-safe. It is illegal to call non-const methods of the class concurrently. So the mutex is pointless. Also mark a couple of accessors const. PiperOrigin-RevId: 168857132 * Add ConvertGraphDefToXla to convert from GraphDef to xla::Computation. The main logic is simply refactored from tfcompile, with some minor cleanups along the way. PiperOrigin-RevId: 168857174 * Bugfix to tf.contrib.seq2seq beam_search_ops: GPU edge case of seq_len == 0. PiperOrigin-RevId: 168862288 * [tf.contrib.data] Add `batch_and_drop_remainder` transformation. This transformation, which is designed for use with `Dataset.apply()`, acts like the default of behavior of `tf.train.batch()`, which will truncate a finite input source if its number of elements is not an exact multiple of the batch size. A benefit of using this transformation is that it gives a statically known shape to the output elements, because they are all exactly `batch_size` in the 0th dimension. PiperOrigin-RevId: 168863148 * Minor renaming from tfcompile.Config to tf2xla.Config in comments. PiperOrigin-RevId: 168863860 * Certain ops don't need eager gradients to keep their inputs / outputs alive. PiperOrigin-RevId: 168864350 * [XLA] Add S64 while loop test. PiperOrigin-RevId: 168865653 * tfdbg: fix a bug in list_inputs and list_outputs wherein a tensor name like "x:1" fails to be processed because it were not converted to the node name ("x" in this example) first. Also simplify analyzer_cli_test.py a little through a new helper function. PiperOrigin-RevId: 168867948 * Adds multi_label_head in tf.contrib.estimator PiperOrigin-RevId: 168873313 * Script that generates __init__.py files based on tf_api_names annotations. PiperOrigin-RevId: 168878737 * Fixing the build command. PiperOrigin-RevId: 168881605 * Make sure all checked threads are joined before they are terminated. PiperOrigin-RevId: 168884294 * Output metrics in train mode for multihead. This is to be consistent with other heads who output the metric tensors in train mode. Outputting the metric tensors allow us for example to plot the metrics on the training set (and compare them to the metircs on the eval set). PiperOrigin-RevId: 168884726 * Automated g4 rollback of changelist 168458634 PiperOrigin-RevId: 168887778 * Adds DNNEstimator to tf.contrib.estimator. 
PiperOrigin-RevId: 168887825 * [tf.contrib.data] Expose `tf.contrib.data.batch_and_drop_remainder()`. PiperOrigin-RevId: 168888592 * disabling timeout test in opensource build PiperOrigin-RevId: 168890483 * Add ops that perform color transforms (including changing value, saturation and hue) in YIQ space. PiperOrigin-RevId: 168897736 * Update the minimum requirement of espsilon for batch norm. PiperOrigin-RevId: 168897907 * Adding support for capture-by-value. PiperOrigin-RevId: 168903482 * disabling failing tsan test PiperOrigin-RevId: 168903876 * disable asan for test timeout PiperOrigin-RevId: 168903999 * Internal change. PiperOrigin-RevId: 168910187 * Fix broken test: tensorflow/contrib/eager/python:datasets_test PiperOrigin-RevId: 168914742 * [XLA:CPU] Implement map fusion. PiperOrigin-RevId: 168915358 * Merge changes from github. END_PUBLIC I also integrated #13073 by hand to make TAP happy. --- Commit 92362d0f0 authored by Skye Wanderman-Milne<skyewm@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add WhileContext class and add plumbing for creating them. This change introduces WhileContext, which stores information about a while loop and will be used in future changes to generate while loop gradient graphs. Exit nodes in a while loop now have a pointer to their associated WhileContext. This will be used to retrieve the context for a given loop. This change adds an optional parameter to BuildWhileLoop() to create a WhileContext for the while loop (currently this is always true, but gradients will generate while loops without associated contexts). This change also adds a as-yet-unused option to BuildWhileLoop() to return the predicate output. PiperOrigin-RevId: 168562303 --- Commit a4f6e7c1a authored by RJ Ryan<rjryan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add mel-scale conversion matrix support to tf.contrib.signal. PiperOrigin-RevId: 168560255 --- Commit b00b6d23c authored by Henry Tan<henrytan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix a segmentation fault caused by invalid log directory in InternalFlush(). PiperOrigin-RevId: 168557063 --- Commit 2bc7a155a authored by Yong Tang<yong.tang.github@outlook.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Add uint16 support for tf.decode_raw (#12719) * Add uint16 support for tf.decode_raw This fix tries to address the request raised in 10124 where uint16 support for tf.decode_raw is needed. tf.decode_raw already support half, float32, float64, int8, int16, int32, int64, uint8. And uint16 was not supported. This fix adds uint16 support for tf.decode_raw. This fix fixes 10124. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Fix test failure caused by uint16 support of decode_raw and add unit tests. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 009285c09 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove benchmark for TensorShapeOld. PiperOrigin-RevId: 168551108 --- Commit dc1eda8a6 authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Fix CHECK-failure crash if a non-tuple was passed to GetTupleElement. PiperOrigin-RevId: 168550703 --- Commit 010922ed9 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops. 
PiperOrigin-RevId: 168549989 --- Commit c8a6131e9 authored by Mark Daoust<markdaoust@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: make `tf.sets` examples executable Fixes #12969 PiperOrigin-RevId: 168549712 --- Commit bece65c6f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use a map instead of a vector of Children() in the BeamEntry. The assumption is that since the entries are sparse (they are all populated, but most are never Active()), using the map will save memory and make iterating over the Children() more efficient. PiperOrigin-RevId: 168548814 --- Commit 0d5ab82ce authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 168548642 --- Commit 3331c574b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Implementing gradients for tf.image.resize_bicubic. PiperOrigin-RevId: 168547412 --- Commit 4982ef0fa authored by Martin Wicke<wicke@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add the ability to warn only once if deprecated functionality is used, and make that the default. PiperOrigin-RevId: 168545655 --- Commit 99423416a authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Make shape inference error messages for the While HLO more readable. Build the error lazily. PiperOrigin-RevId: 168531083 --- Commit d10374e45 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Discard some unneccessary logging commands. PiperOrigin-RevId: 168500721 --- Commit 83cbabb85 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix wrong format of logging message. PiperOrigin-RevId: 168497373 --- Commit eec4f1b3a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 168494944 --- Commit 69301f352 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 168494220 --- Commit 9d56f419c authored by Mingxing Tan<tanmingxing@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add crop_and_decode_jpeg_op that combines the crop and decode for better performance. PiperOrigin-RevId: 168493125 --- Commit 48ddf64d0 authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Make large params test only run in opt builds. PiperOrigin-RevId: 168491913 --- Commit 11d3ac29d authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add tests for large numbers of parameter / return values and while loops. PiperOrigin-RevId: 168487225 --- Commit 3cd6bdef5 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Added test cases on R4 slice. PiperOrigin-RevId: 168482049 --- Commit 46a81b5c3 authored by Jacques Pienaar<jpienaar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add cast S64 to F32 test. 
PiperOrigin-RevId: 168473650 --- Commit 59bdf598d authored by Derek Murray<mrry@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add an automatically-generated "tensorflow.python.platform.build_info" script. The motivation for this script is to provide better tools for diagnosing load-time errors (such as the ones that plague the Windows build due to DLL issues). Note that the script is intended to be self-contained, so that it is possible to import it without loading the entire TensorFlow runtime. This generated script currently contains a single symbol, `is_cuda_build`, which records whether the build has GPU support or not. PiperOrigin-RevId: 168471034 --- Commit c3b86347f authored by Olivia Nordquist<nolivia@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: reenabling tests that are passing PiperOrigin-RevId: 168466361 --- Commit c728665ec authored by Henry Tan<henrytan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add const qualifiers whenever appropriate. PiperOrigin-RevId: 168465926 --- Commit bf96fcd13 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use the scalar cache in MeanGrad. PiperOrigin-RevId: 168462267 --- Commit 1cada9ea2 authored by Olivia Nordquist<nolivia@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: reenabling test that passed after 100 runs w/o timing out PiperOrigin-RevId: 168458634 --- Commit 00c865566 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Generate error (instead of segfault) when trying to copy string tensor to GPU in EagerTensor constructor. PiperOrigin-RevId: 168457320 --- Commit 655f26fc7 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Resurrects autograd-free eager gradients. PiperOrigin-RevId: 168448557 --- Commit 8f37f3002 authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Cleanups to handling of arguments during XLA compilation: * combine resource kinds in XlaCompiler::Argument::Kind, use a separate XlaResource::Kind field to distinguish different kinds of resource. * merge XlaContext::HandleOrConstant and XlaExpression, which were almost identical. * remove XlaContext::Argument; instead, build XlaExpressions directly from XlaCompiler and add them to the XlaContext. PiperOrigin-RevId: 168439341 --- Commit 7f5346a80 authored by Gunhan Gulsoy<gunan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Reduce cmake log mess. * Echo off for the .bat scripts. * TF cmake: disable warnings in some of the patched projects (gif,jpeg,lmdb). PiperOrigin-RevId: 168432070 --- Commit 2ad85aa4d authored by Mark Heffernan<meheff@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use xla/tests:xla_internal_test_main for all tests under tf/compiler/xla and remove any main() definitions in tests. This enables use of flags in all tests. PiperOrigin-RevId: 168424796 --- Commit cd377811d authored by Henry Tan<henrytan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Comment and error message consistency cleanup. PiperOrigin-RevId: 168422582 --- Commit 7c19b82af authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update tf.sparse_reset_shape so that when shrinking the shape of an empty sparse tensor, the result has a shape of all zeros. PiperOrigin-RevId: 168419639 --- Commit fcacb40d4 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: FirstReadyManager for scheduling nodes in VirtualScheduler. The current FIFOManager may yield inefficient scheduling; _Recv pushed to the FIFO blocks other nodes that can run before _Recv due to the node order in FIFO. FirstReadyManager picks a node with the earliest time_ready in the queue, avoiding this problem. Also, fixed VirtualPlacer to properly set device when Node's device name does not include job name and to set GPU:0 as default device. PiperOrigin-RevId: 168418455 --- Commit 7e47624f5 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: eager: Initial support for iteration over tf.contrib.data.Dataset objects. TODO: - Support function-valued operation attributes in eager (Required for MapDataset, FilterDataset etc. which encode the per-element computation in a TensorFlow function) PiperOrigin-RevId: 168418250 --- Commit b0a397fce authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: eager: Remove unnecessary TFE_Context argument to TFE_OpSetDevice. PiperOrigin-RevId: 168417999 --- Commit 86211d554 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Graph transform to flatten atrous (dilated) convolutions (i.e., a sequence of SpaceToBatchND-Conv-BatchToSpaceND ops) to a regular Conv op with upsampled filters. PiperOrigin-RevId: 168414124 --- Commit 3438981ca authored by David G. Andersen<dga@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Apply exported symbol filtering to the c++ API analogously to what is filtered for the C API. Fixes bug reported in comments on #1924 PiperOrigin-RevId: 168413719 --- Commit 7e023d865 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA:CPU] Remove code from parallel CPU backend outlining that was causing unnecessary copies to be inserted, and which is no longer necessary since we added co-located buffer support for kCall. *) All bitcast copy is no longer necessary as CopyInsertion will insert copies at the root of the computation for a parameter which is live-out. *) Copy if root does not define buffer no longer necessary because colocated assignment looks at points-to set of root instruction. PiperOrigin-RevId: 168412076 --- Commit 5da4df92c authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Simplify some code in grappler_item_builder.cc, no change in logic. PiperOrigin-RevId: 168409110 --- Commit 82ec6241a authored by drpngx<drpngx@users.noreply.github.com> Committed by GitHub<noreply@github.com>: Add six and numpy imports --- Commit 9c4ce2452 authored by Mark Heffernan<meheff@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add flag parsing to more tests in xla/service specifically those which build HLO graphs. This enables, for example, dumping of the graphs with --xla_generate_hlo_graph. Also remove some superfluous tensorflow test_main dependencies. 
PiperOrigin-RevId: 168406746 --- Commit d4efa695c authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Relax the feed_nodes collection check, which triggers a false positive in some modes where the feed node collection is auto-generated. Keep it as a warning to help correct user-provided feed node lists. PiperOrigin-RevId: 168396408 --- Commit cbc46a856 authored by Changming Sun<chasun@microsoft.com> Committed by gunan<gunan@google.com>: Add a missing template explicit instantiation of SetZeroFunctor (#12791) --- Commit 7bb08f5bf authored by Kevin Slagle<kjslag@gmail.com> Committed by drpngx<drpngx@users.noreply.github.com>: fix ExponentialMovingAverage documentation so that ExponentialMovingAverage.apply is evaluated within control_dependencies (#12987) --- Commit e6b011763 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Extend c++ gradient_checker to complex types. PiperOrigin-RevId: 168392949 --- Commit 4086219a4 authored by Lyndon White<oxinabox@ucc.asn.au> Committed by drpngx<drpngx@users.noreply.github.com>: Correct minor typo in substr docs example (#12991) --- Commit f63aa7f49 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Migrate core TFGAN functions to opensource. PiperOrigin-RevId: 168391923 --- Commit bc6b60f1b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix tuple_losses bug caused by Python bug. PiperOrigin-RevId: 168386341 --- Commit 7a8c63da3 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Migrate `leaky_relu` to `nn_ops.py`. Will be used for TFGAN. PiperOrigin-RevId: 168386268 --- Commit f7ba16fdf authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Do not export from eval on train data steps. PiperOrigin-RevId: 168374021 --- Commit 9b9e54b34 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adding NCCL sum op, register all_sum gradient. Streamlining nccl test. PiperOrigin-RevId: 168347428 --- Commit bc300318e authored by Gunhan Gulsoy<gunan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update gemmlowp hash as the commit history seems to have changed in the repository. PiperOrigin-RevId: 168343607 --- Commit 1e96d54d9 authored by gunan<gunan@google.com> Committed by GitHub<noreply@github.com>: Also accept non-k8 CPU types in build pip package. (#12975) * Also accept non-k8 CPU types in build pip package. Fixes #12735 * Make the script work with `set -e`. --- Commit c0a4c7ffc authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Fix bug in ShapeUtil::ShapeIs that would lead to type inference errors. PiperOrigin-RevId: 168323589 --- Commit 4af9be964 authored by Amy<amy@infosleuth.net> Committed by drpngx<drpngx@users.noreply.github.com>: support passing in a source url to the mnist read_data_sets function, to make it easier to use 'fashion mnist' etc. (#12983) --- Commit 9f848734f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Tweak layer a bit to be eager friendly. 
PiperOrigin-RevId: 168312865 --- Commit 60f15462b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Change conv_input_scale and side_input_scale from attributes to inputs for improved flexibility, in fused_conv2d_bias_activation op. PiperOrigin-RevId: 168311988 --- Commit 4b4e10f9c authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adds dict support of eval metrics. PiperOrigin-RevId: 168310444 --- Commit ab7f22de6 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Move FusedConvBiasActivationShape out of common_shape_fns.cc to a lambda inside the op. PiperOrigin-RevId: 168300911 --- Commit 3a98035fa authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Augment metadata output with source-line info, as before. PiperOrigin-RevId: 168292527 --- Commit 349188152 authored by Yao Zhang<yaozhang@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Enable fused batch norm, which is 15-20% faster for training and inference. PiperOrigin-RevId: 168288154 --- Commit 08587d45b authored by Yuefeng Zhou<yuefengz@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Added back persistent memory tracking in queue op. The new tracking logic has avoided the crash in previous implementation: the queue_ passed to CreateTypedQueue may be unreffed if the resource is already created by another resource op that shares the same resource name and type. PiperOrigin-RevId: 168284509 --- Commit 733063d55 authored by Amit Patankar<amitpatankar@google.com> Committed by Amit Patankar<amitpatankar@google.com>: Fixing awkward wording. --- Commit c7ad6bfef authored by Amit Patankar<amitpatankar@google.com> Committed by Amit Patankar<amitpatankar@google.com>: Removing accidental hash. --- Commit 53dbc761a authored by Amit Patankar<amitpatankar@google.com> Committed by Amit Patankar<amitpatankar@google.com>: Adding Windows self check script to docs. --- Commit ed1135994 authored by Andrew Harp<andrewharp@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add -latomic flag to benchmark_model target to fix Android x86 build. PiperOrigin-RevId: 168281337 --- Commit c0348bb55 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update tf_export.py to take constant name as an argument instead of a constant. PiperOrigin-RevId: 168280613 --- Commit c3d19e40a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Cleanup training_ops to reduce code redudancy. PiperOrigin-RevId: 168280069 --- Commit 123fb01ee authored by Yao Zhang<yaozhang@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Set fused=False for batch norm, because the test assumes no bessel's correction. Fused=True would add bessel's correction to variance. PiperOrigin-RevId: 168274392 --- Commit f0e8c545e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Switch resource variables from copy-on-read to copy-on-write. RELNOTES: Change the signature of (C++) GetInputTensorFromVariable in training_op_helpers to support new copy-on-write semenatics of resource variables. 
PiperOrigin-RevId: 168273249 --- Commit 495cc8e47 authored by Yuan (Terry) Tang<terrytangyuan@users.noreply.github.com> Committed by drpngx<drpngx@users.noreply.github.com>: Minor wording change in timeseries module's README (#12938) * Minor wording change in timeseries module's README * Address comments --- Commit f13b876ed authored by Amit Patankar<amitpatankar@google.com> Committed by Amit Patankar<amitpatankar@google.com>: Making the default build from source version 1.4.0dev. The whl files that are built will be 1.3.0devDDMMYYYY. --- Commit 2356c0ff4 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Delete ScopedTFStatus to avoid leaking it for long running trainers(1+day). PiperOrigin-RevId: 168259652 --- Commit e15f4cae2 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Don't remove all aliases from linalg namespace. Get rid of redundant aliases. PiperOrigin-RevId: 168257658 --- Commit c58082642 authored by postBG<profile2697@gmail.com> Committed by drpngx<drpngx@users.noreply.github.com>: Fix minor typo in Programmers guide (#12965) * Fix minor typo in Programmers guide * change to "this" --- Commit 509372c2e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add a lot of operations' flops calculations PiperOrigin-RevId: 168256746 --- Commit 80ed8afc0 authored by Francois Chollet<fchollet@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add Flatten to core layers. PiperOrigin-RevId: 168254118 --- Commit a6223c01a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix locking of variables in SparseProximalGradientDescent, AdagradDA, SparseAdagradDA. PiperOrigin-RevId: 168252530 --- Commit abde00830 authored by Olivia Nordquist<nolivia@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: adding InputTensor class for symmetry with OutputTensor PiperOrigin-RevId: 168250085 --- Commit 0451032ca authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Fix variable naming style guide violation. PiperOrigin-RevId: 168245542 --- Commit a202a5a94 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 168245371 --- Commit f93e354cb authored by Derek Murray<mrry@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tf.contrib.data] Switch backend Dataset representation to DT_VARIANT. This change introduces a new `DatasetWrapper` type that wraps a `DatasetBase*` and can be stored in a DT_VARIANT tensor. All Dataset ops now consume and produce DT_VARIANT instead of DT_RESOURCE, and the underlying implementation is simplified because the `DatasetWrapper` can be passed directly by value without using the `ResourceMgr`. PiperOrigin-RevId: 168240571 --- Commit a4042cd2a authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Introduces the placeholder for _TrainingExecutor, which serves the implementation of tf.estimator.train_and_evaluate. 
PiperOrigin-RevId: 168240151 --- Commit 10ba148f7 authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Switch control_flow_ops library to use Resource variants of Stack operators, instead of deprecated Ref variants. PiperOrigin-RevId: 168234822 --- Commit ca43fe82b authored by Ali Yahya<alive@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: TFE: Improves the interfaces of tape.watch_variable() and implicit_grad(). tape.watch_variable() replaces tape.watch() and now is called on ResourceVariable objects instead of their underlying handles. implicit_grad() now returns a list of (gradient, variable) pairs to be consistent with tf.Optimizer's interface. PiperOrigin-RevId: 168232055 --- Commit b72862dfc authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: internal change PiperOrigin-RevId: 168225993 --- Commit da3280f4d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Re-enable tsan for sdca_estimator_test. PiperOrigin-RevId: 168186374 --- Commit c936c1155 authored by Yifei Feng<yifeif@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix pip tests for contrib/gan. - Add *_impl.py so tests can still access removed symbols. - Add /python directory layer to make *_impy.py and __init__.py not in the same dir. PiperOrigin-RevId: 168161722 --- Commit ce9a2b00f authored by Toby Boyd<tobyboyd@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Performance guide update PiperOrigin-RevId: 168159289 --- Commit 3bce4f9a0 authored by Shanqing Cai<cais@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: TFE: expose tfe.num_gpus() PiperOrigin-RevId: 168154345 --- Commit 67a7cbc28 authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Changed the default eval throttle secs from 2 min to 10 mins. PiperOrigin-RevId: 168120323 --- Commit 92bed178f authored by Eugene Brevdo<ebrevdo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Reduce cmake log mess. * Echo off for the .bat scripts. * TF cmake: disable warnings in some of the patched projects (gif,jpeg,lmdb). PiperOrigin-RevId: 168119914 --- Commit 702d59582 authored by joshkyh<joshkyh@users.noreply.github.com> Committed by Yifei Feng<fengyifei2026@gmail.com>: Corrected hyperlink for audio training tutorial (#12923) --- Commit 877c9deca authored by Frank Chen<frankchn@gmail.com> Committed by Yifei Feng<fengyifei2026@gmail.com>: Reverse change eb75ded6 so that internal tests will pass. (#12933) As support for int64 global steps is not ready in TPUs, I am reversing this change so that our internal performance and regression tests will pass. --- Commit 665966438 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Re-enable grpc_session_test. PiperOrigin-RevId: 168078694 --- Commit 405def792 authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Switch CallInliner to use CallGraph::VisitNodes. PiperOrigin-RevId: 168078645 --- Commit aba3466f1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Exposes Head and factory methods in tf.contrib.estimator. PiperOrigin-RevId: 168071246 --- Commit b76565b39 authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Some profiler fixes and cleanup. PiperOrigin-RevId: 168069346 --- Commit 32ffc5a81 authored by Jonas<sauercrowd@users.noreply.github.com> Committed by Yifei Feng<fengyifei2026@gmail.com>: Just a dot in order to be consistent (#12919) added a dot to the `7` to make clear it's a float (like every other number) --- Commit 0753b0c79 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Scope the scalar cache in the context. PiperOrigin-RevId: 168065417 --- Commit 48deb206b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Migrate TFGAN features to third_party. PiperOrigin-RevId: 168060880 --- Commit d2ae1311f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fixing an issue in the BUILD file of the LSH ops. PiperOrigin-RevId: 168056645 --- Commit 2f440eda4 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Expose NumpyReader for reading timeseries data. PiperOrigin-RevId: 168055838 --- Commit be1916ce7 authored by Daniel Grazian<dgr@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Added functionality to allow `SqlDataset` to interpret a database column as various numeric types, including several integer types and `dtypes.float64`. PiperOrigin-RevId: 168055827 --- Commit fa2000a0b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Supporting nightly windows pip packages. PiperOrigin-RevId: 168054959 --- Commit a263ea626 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: eager: Treat eager tensors as constants during graph construction. Unless capturing is explicitly enabled. PiperOrigin-RevId: 168052675 --- Commit 6e402d0d2 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make TODO a bit more specific. PiperOrigin-RevId: 168051381 --- Commit c779384bc authored by Daniel Grazian<dgr@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Added code example to the doc string for `SqlDataset`. PiperOrigin-RevId: 168049037 --- Commit ff6dd474a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use self._in_graph_mode consistently in ResourceVariable instead of sometimes getting it from the context. Also: fix formatting of a comment and use a more precise test to detect if initial_value is set. PiperOrigin-RevId: 168047258 --- Commit f331f528b authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Removes "fast paths" which are not fast in eager mode. PiperOrigin-RevId: 168046278 --- Commit 86f1713e5 authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Introduces TrainSpec and EvalSpec. PiperOrigin-RevId: 168040435 --- Commit c8b9e92f0 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: eager: Move "register_function" to context.py This will allow function registration from other modules without having to import "function.py". 
(And besides, the function really does belong on the context). PiperOrigin-RevId: 168040411 --- Commit 74137f994 authored by Shanqing Cai<cais@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix signed int overflow issue in tensor_id.cc When a node name has a long numeric suffix, e.g., "foo/y_0/gradient_debug_09684b60f2184c67b744721915034528" (as has happened with tfdbg GradientsDebugger), the parsing algorithm in ParseTensorName() may experience signed int overflow. Replacing the types with "unsigned int" resolves the issue. PiperOrigin-RevId: 168039195 --- Commit 450c3b562 authored by Rohan Jain<rohanj@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Using rendezvous manager to pass args / rets between devices during function remote execution. This enables CPU->GPU remote device executions now. PiperOrigin-RevId: 168038285 --- Commit 82cc6529f authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fixes the wording about StopIteration. PiperOrigin-RevId: 168034451 --- Commit fb5588002 authored by Gunhan Gulsoy<gunan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add a statement on install/index.md on what os are supported. PiperOrigin-RevId: 168032996 --- Commit f83f6b9ef authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Handle higher-order HLOs (e.g. While) in CallInliner and test. PiperOrigin-RevId: 168029345 --- Commit 8988ae365 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 167916124 PiperOrigin-RevId: 168916710 * Update ops-related pbtxt files. PiperOrigin-RevId: 168917157 * Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 168917534

Commit:1c8547e
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add ConvertGraphDefToXla to convert from GraphDef to xla::Computation. The main logic is simply refactored from tfcompile, with some minor cleanups along the way. PiperOrigin-RevId: 168857174

Commit:b76565b
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Some profiler fixes and cleanup. PiperOrigin-RevId: 168069346

Commit:18f3692
Author:Yifei Feng
Committer:gunan

Branch 167812735 (#12867) * Internal cleanup PiperOrigin-RevId: 167636242 * Move the Keras API to tf.keras. PiperOrigin-RevId: 167638421 * Automated g4 rollback of changelist 167604306 PiperOrigin-RevId: 167639833 * Call HloComputation.Accept instead of HloInstruction.Accept to get all instructions profiled. RELNOTES: n/a PiperOrigin-RevId: 167640259 * Add fast math attributes to all generated methods when fast math enabled. RELNOTES: n/a PiperOrigin-RevId: 167646637 * Extended ScratchSpace to expose its underlying scratch tensor object. PiperOrigin-RevId: 167649551 * Change zip(...)[1] to list(zip(...))[1], for python 3 compatibility. PiperOrigin-RevId: 167654035 * Add scoped timer to log jit compile times. RELNOTES: n/a PiperOrigin-RevId: 167656720 * Verify that predictions are in the expected range for ops that use thresholds, e.g. tf.contrib.metrics.streaming_auc. PiperOrigin-RevId: 167658134 * Internal change. PiperOrigin-RevId: 167658401 * Fix list formatting PiperOrigin-RevId: 167660250 * Enable java test. PiperOrigin-RevId: 167660276 * Add shape functions on debug ops. PiperOrigin-RevId: 167668811 * Increase session_bundle_test to a medium test. PiperOrigin-RevId: 167672587 * Include layout of convolution input data in the op_profile. PiperOrigin-RevId: 167680208 * Fix tf.sparse_add for SparseTensor with _ref typed values. Example: st = tf.SparseTensor( indices=[[1]], values=tf.Variable([1.0]), dense_shape=[1]) tf.sparse_add(st, st) PiperOrigin-RevId: 167681121 * Fix conversion to explicit scalar broadcast The dimensions field of a broadcast HLO op is meant to be populated with the dimensions that are broadcasted, which in case of a scalar is the empty vector. Generally, the rank of the operand of a broadcast op should always equal the size of the dimensions vector. PiperOrigin-RevId: 167686946 * Add 'unknown shape' shape functions on deprecated linalg ops. PiperOrigin-RevId: 167719029 * Be more careful in IsInitalized, and log when it is called on an unknown node_id. PiperOrigin-RevId: 167722344 * tfdbg: Refactor graph-processing code out of debug_data.py The basic idea is to separate the code in debug_data.py that handles graph structures into its own module (debug_graphs.py). This tackles an existing TODO item to simplify the code debug_data.DebugDumpDir. In a later CL, code will be added to debug_graphs.DebugGraph to allow reconstruction of the original GraphDef, i.e., the GraphDef without the Copy* and Debug* nodes inserted by tfdbg. This will be useful for, among other things, the TensorBoard Debugger Plugin. PiperOrigin-RevId: 167726113 * internal PiperOrigin-RevId: 167727508 * Update MaxPoolV2Shape to support NCHV_VECT_C. PiperOrigin-RevId: 167732437 * Deleting tf.contrib.learn.dnn benchmark tests. PiperOrigin-RevId: 167741308 * Fix off-by-one documentation error. sequence_lengths is the actual length of the sequence and therefor should not be used as zero-based indexing. The code is correct but the documentation was misleading. PiperOrigin-RevId: 167742082 * contrib summaries work in eager-graph mode (with defun) As a side effect fix issues related to using eager-defined variables in graph mode. PiperOrigin-RevId: 167744121 * Fix minor documentation error in ZlibInputStream. PiperOrigin-RevId: 167745218 * Sets the distributed training related properties of RunConfig based on TF_CONFIG. PiperOrigin-RevId: 167752997 * Improved documentation about eval ops in EstimatorSpec. 
PiperOrigin-RevId: 167753099 * Automated g4 rollback of changelist 156748870 PiperOrigin-RevId: 167753805 * Make cuda_solvers_gpu.cu.cc compile with nvcc8. PiperOrigin-RevId: 167754383 * Add csv dataset example to get_started/regression. PiperOrigin-RevId: 167754634 * Switches to OrderedDict to make the dictionary order deterministic so we have less randomness from graph building. PiperOrigin-RevId: 167755072 * Add int8 version of fused_conv2d_bias_activation operator for the forward phase, and support side_input and scaling parameters in float and int8 versions. PiperOrigin-RevId: 167763219 * Make the text summary write no plugin data content This is actually a safe removal because no logic makes use of the content of text plugin data. PiperOrigin-RevId: 167763880 * Avoid unnecessary buffer allocations & deallocations Before this change, when we reached the end of a file, we would (1) clear the existing buffer (which at large buffer sizes typically involved deallocating it). (2) reserve a buffer (which at large buffer sizes is non-trivial) (3) realize we had reached EoF, and therefore clear the buffer, deallocating it again. With this change, whenever the buffered reader detects an EoF condition, we remember it, so that we can short-circuit the above logic. The above optimization results in a more than 25x performance improvement for large buffers reading small files. PiperOrigin-RevId: 167766751 * [TF:XLA] In Literal: correctly handle operands with zero elements in Copy. PiperOrigin-RevId: 167769308 * Reduce batch size for resampler backward pass test, to speed up test. PiperOrigin-RevId: 167769539 * Remove `SimpleGraphExecutionState::costs_`, which is unused. PiperOrigin-RevId: 167772120 * detecting cycles when users add a control edge to a graph PiperOrigin-RevId: 167773598 * Make writer_test avoid setting content to a string That content field of the PluginData proto is going to be converted into a bytes field, and setting it to a string makes the test fail. Furthermore, the purpose of this test is to make sure that correct data is written, so setting the name of the plugin suffices. PiperOrigin-RevId: 167776457 * Propagate the original stack trace when exceptions caught be MonitoredSession are re-raised. PiperOrigin-RevId: 167781071 * Change trace.py to not access a graph as a default argument. Checks for None and access via default graph inside the function. PiperOrigin-RevId: 167788815 * Added custom metric support for tf.estimator.Estimator. PiperOrigin-RevId: 167788891 * A eager Saver that allows restore on create. PiperOrigin-RevId: 167789332 * Make content field of PluginData a bytes field The content field had previously been a string field, which had been problematic because string fields can only store UTF-8 strings. This problem can manifest in various ways. For instance, take the precision-recall curve plugin. Its summary collects data that scales in size based on the number of thresholds. When the content field is a string, the summary logic serializes the relevant data proto just fine when we only have a few thresholds (about 100). However, for large numbers of thresholds (ie, around 200), the summary logic fails to serialize and throws a cryptic error. ValueError: '\x10\xc8\x01' has type str, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added. Changing the content field to a bytes field fixes this issue because bytes fields are not restricted to UTF-8 strings. 
I just happened to have needed a long enough string for the string to no longer be a valid UTF-8 one. PiperOrigin-RevId: 167790594 * Temporarily disable tf_should_use wrapper, since it can cause python Graph/Operation/Tensor memory leaks. PiperOrigin-RevId: 167790657 * Ensure using "path" as a URI will keep working. PiperOrigin-RevId: 167793848 * Fix typo in graph transforms error message PiperOrigin-RevId: 167796563 * Merge changes from github. END_PUBLIC --- Commit 607816029 authored by Eugene Brevdo<ebrevdo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Extended ScratchSpace to expose its underlying scratch tensor object. PiperOrigin-RevId: 167649551 --- Commit db43fe68e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add fast math attributes to all generated methods when fast math enabled. RELNOTES: n/a PiperOrigin-RevId: 167646637 --- Commit aebe8cc6f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Call HloComputation.Accept instead of HloInstruction.Accept to get all instructions profiled. RELNOTES: n/a PiperOrigin-RevId: 167640259 --- Commit 0ab137cd8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 167604306 PiperOrigin-RevId: 167800256 * Update ops-related pbtxt files. PiperOrigin-RevId: 167802521 * Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 167804076 * Add sloppy_interleave dataset operator. When feeding data at high speed into a model from variable-latency data sources, head-of-line blocking can be a significant concern when using a deterministic input pipeline, such as interleave. This change introduces a new non-deterministic dataset operator that avoids head-of-line blocking. PiperOrigin-RevId: 167810743 * Update ops-related pbtxt files. PiperOrigin-RevId: 167811375 * tfdbg: Fix python3 breakage in grpc debug tests caused by bytes-type plugin_data content PiperOrigin-RevId: 167812508 * [XLA] Rip CheckFusionNode() out of instruction, and move it into the HLO verifier instead. CheckFusionNode() is linear in the size of the fusion node, and was called once per Fuse(), leading to run-time quadratic in the fusion node's size. PiperOrigin-RevId: 167812735 * Disable tensorflow/contrib/data/python/kernel_tests/sloppy_transformation_dataset_op_test.py in cmake.

Commit:9e9ffa3
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Unify all profile files (graph, run_meta, op_log) into one. Also allow the profiler to serialize/deserialize to/from file. PiperOrigin-RevId: 167815923
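As a purely illustrative sketch (the message and field names below are assumptions, not the actual tensorflow/core/profiler schema), a unified profile file could bundle the previously separate inputs like this:

    syntax = "proto3";

    // Hypothetical container for the inputs the profiler previously read from
    // separate files; serializing this single message yields one profile file.
    message ProfileBundleSketch {
      bytes graph_def = 1;     // serialized GraphDef
      bytes run_metadata = 2;  // serialized RunMetadata (run_meta)
      bytes op_log = 3;        // serialized op log with flops/params annotations
    }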

Commit:4d24e67
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Make content field of PluginData a bytes field The content field had previously been a string field, which had been problematic because string fields can only store UTF-8 strings. This problem can manifest in various ways. For instance, take the precision-recall curve plugin. Its summary collects data that scales in size based on the number of thresholds. When the content field is a string, the summary logic serializes the relevant data proto just fine when we only have a few thresholds (about 100). However, for large numbers of thresholds (ie, around 200), the summary logic fails to serialize and throws a cryptic error. ValueError: '\x10\xc8\x01' has type str, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added. Changing the content field to a bytes field fixes this issue because bytes fields are not restricted to UTF-8 strings. I just happened to have needed a long enough string for the string to no longer be a valid UTF-8 one. PiperOrigin-RevId: 167790594
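A minimal before/after sketch of the change described above (the field names follow the commit text; the field numbers are assumptions):

    syntax = "proto3";

    message PluginDataSketch {
      string plugin_name = 1;

      // Before: `string content = 2;` -- proto3 string fields must hold valid
      // UTF-8, so arbitrary serialized protos could trigger the
      // "isn't valid UTF-8 encoding" error quoted above.
      // After: a bytes field accepts any byte sequence.
      bytes content = 2;
    }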

Commit:6eac979
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Include layout of convolution input data in the op_profile. PiperOrigin-RevId: 167680208

Commit:0302320
Author:Eric Liu
Committer:TensorFlower Gardener

[tpu:profiler] Write RunMetadata of the computation graph to event file, if available. The RunMetadata can be used to annotate HLO graphs with colors based on node compute time. PiperOrigin-RevId: 167477021

Commit:6cdc01c
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Automatically fill in num_classes, growing_mode, pruning_mode, learning_rate and multi_class_strategy if not specified. PiperOrigin-RevId: 167188663
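Purely for illustration, the kind of learner config this refers to might look roughly as follows; only the field names come from the commit text, while the enum values and field numbers are assumptions:

    syntax = "proto3";

    message LearnerConfigSketch {
      uint32 num_classes = 1;  // auto-filled from the labels if left at 0
      enum GrowingMode { GROWING_MODE_UNSPECIFIED = 0; WHOLE_TREE = 1; LAYER_BY_LAYER = 2; }
      GrowingMode growing_mode = 2;  // defaulted if unspecified
      enum PruningMode { PRUNING_MODE_UNSPECIFIED = 0; PRE_PRUNE = 1; POST_PRUNE = 2; }
      PruningMode pruning_mode = 3;  // defaulted if unspecified
      float learning_rate = 4;       // defaulted if left at 0
      enum MultiClassStrategy { MULTI_CLASS_UNSPECIFIED = 0; TREE_PER_CLASS = 1; FULL_HESSIAN = 2; }
      MultiClassStrategy multi_class_strategy = 5;  // defaulted based on num_classes
    }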

Commit:ccc8180
Author:Benoit Steiner
Committer:TensorFlower Gardener

Annotate the performance database with expected distribution of values. PiperOrigin-RevId: 166876854

Commit:fa75ba9
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add debug flag to disable expensive LLVM optimization passes. RELNOTES: n/a PiperOrigin-RevId: 166766323

Commit:70a2de1
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Add reduce-precision-insertion pass options to address operation inputs. We add options to reduce the precision of individual operation inputs before optimization, and to reduce the precision of inputs to fusion nodes. This change also renames and significantly refactors the "pass timing" option enum, since it is no longer just about the pass timing. PiperOrigin-RevId: 166760001

Commit:587d728
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Refactor reduce-precision-insertion filters, add several more options. In particular, this adds the ability to add reduce-precision operations after fusion nodes based on the contents of those fusion nodes, and the ability to filter operations based on the "op_name" metadata. PiperOrigin-RevId: 166408392

Commit:b23b244
Author:Eric Liu
Committer:TensorFlower Gardener

[tpu:profiler] Support the Op Profile tool in TPU profiler. o Add an op_profile proto that defines a Profile class which assembles a hierarchical performance profile based on HLOs in trace_events. o Dump JSON-formatted op profile proto to the log directory. PiperOrigin-RevId: 166318667
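Sketch of a hierarchical op profile of the kind described; the metric names, message names, and field numbers are assumptions for illustration, not the actual op_profile proto:

    syntax = "proto3";

    message OpProfileSketch {
      message Metrics {
        double time_fraction = 1;  // share of total accelerator time
        double flops = 2;
      }
      message Node {
        string name = 1;             // category, op name, or HLO instruction
        Metrics metrics = 2;
        repeated Node children = 3;  // category -> op -> instruction hierarchy
      }
      Node root = 1;  // assembled from the HLOs referenced in trace_events
    }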

Commit:95db7ae
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Makes tf.Variable return correct initialized_value and initial_value for objects created from VariableDef protos. Previously self._initial_value wasn't set in such cases, which caused accessing var.initial_value to fail for variables in the imported meta graphs. PiperOrigin-RevId: 166252647

Commit:7359fec
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Implement Batchnorm Inference by expanding it into smaller ops. 1. Add batch norm inference support in batchnorm_rewriter 2. Connect xla's batchnorm inference to tf's FusedBatchNorm RELNOTES: n/a PiperOrigin-RevId: 165655351
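For reference, the expansion computes the standard batch-norm inference transform per channel c, with moving mean \mu_c, moving variance \sigma_c^2, scale \gamma_c, offset \beta_c, and the small constant \epsilon:

    y_c = \gamma_c \cdot \frac{x_c - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}} + \beta_c

which factors into a handful of element-wise subtract, multiply, and add operations; that is what "expanding into smaller ops" refers to here.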

Commit:c247826
Author:Benoit Steiner
Committer:TensorFlower Gardener

Added preliminary support for arithmetic simplifications PiperOrigin-RevId: 165476236

Commit:28ce1d1
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC --- Commit 9f81374c3 authored by raymondxyang<zihao.yang@microsoft.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Add option for build more python tests in Cmake (#11853) * Ignore Windows built project * Fix deprecated methods in tf.contrib.python * Fix regex match for Windows build in contrib.keras * Fix Regex match for Windows build in session_bundle * * Fix deprecated methods * Fix regex match for Windows * Fix compatibility issue with Python 3.x * Add missing ops into Windows build for test * Enabled more testcases for Windows build * Clean code and fix typo * Add conditional cmake mode for enabling more unit testcase * Add Cmake mode for major Contrib packages * Add supplementary info in RAEDME for new cmake option * * Update tf_tests after testing with TF 1.3 * Clean code and resolve conflicts * Fix unsafe regex matches and format code * Update exclude list after testing with latest master branch * Fix missing module --- Commit 98f0e1efe authored by Yong Tang<yong.tang.github@outlook.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Dynamic ksize and strides with MaxPool (#11875) * Dynamic ksize with max_pool This fix tries to fix the issue raised in 4746 where ksize is static (attr) with max_pool. This fix changes ksize to input tensor so that it is dynamic now. This fix fixes 4746. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add dynamic ksize to MaxPoolGrad and MaxPoolGradGrad Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Add test cases for max_pool_v2 Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Fix GPU Jenkins issue. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Enable MaxPoolV2 in GPU Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Hide MaxPoolV2 and other fixes. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 02d6bc185 authored by Bairen Yi<byronyi@users.noreply.github.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: remove useless variable (#12212) --- Commit ed6b0d905 authored by namrata-ibm<bhavenamrata@gmail.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Adding support for s390x in calculation of cpu_frequency (#12201) --- Commit 627dfc9dd authored by Taehoon Lee<taehoonlee@snu.ac.kr> Committed by Taehoon Lee<taehoonlee@snu.ac.kr>: Fix typos --- Commit c0f9b0a91 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: In fast-math mode emit a tanh that has a faster min/max. PiperOrigin-RevId: 164943597 --- Commit 87605f3d6 authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Use HloEvaluator for ComputeConstant, remove the need of a dedicated compute constant backend. PiperOrigin-RevId: 164940970 --- Commit 881de45c2 authored by Taehoon Lee<me@taehoonlee.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Add bool type supports for GPU kernels (#11927) * Add bool type supports for GPU kernels * Add bool type test codes for GPU kernels --- Commit eeacdcdb1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add missing "CPU" suffix in registrations. 
PiperOrigin-RevId: 164939527 --- Commit de01be952 authored by namrata-ibm<bhavenamrata@gmail.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Adding support for Big Endian in graph_constructor_test and wav_io (#12179) --- Commit 26719d29f authored by QingYing Chen<pkudysj@126.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Implement CRF decode (Viterbi decode) for tensor (#12056) * Implement CRF decoding for tensors * add test code for tensor version's CRF decoding * made modifications according to pylint * add some comments for crf decode * remove useless code * add comments at the top comment of crf module and add more comments in crf_test * capitalize first char of first word in comments * replace crf_decode test code with a deterministic example --- Commit f9a81ca2f authored by Pete Warden<pete@petewarden.com> Committed by gunan<gunan@google.com>: Create CI build script for Raspberry Pi (#12190) * Create CI build script for Raspberry Pi * Moved location of Pi build script --- Commit e2a163a90 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Merge code from PR #11940 with internal changes from cl/164796436, and update Python tests to also run on GPU. PiperOrigin-RevId: 164929133 --- Commit 08bbfa187 authored by Taehoon Lee<me@taehoonlee.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Fix typos (#12195) --- Commit ab96f41fb authored by Luke Iwanski<luke@codeplay.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: [OpenCL] Extends matmul_benchmark.py to cover SYCL (#11697) * [OpenCL] Extends matmul_benchmark.py to cover SYCL * Fixed typo * /gpu:0 -> /device:GPU:0 * Fixes control_flow_ops_py_test * /gpu: -> /device:GPU: * Fixes //tensorflow/python/profiler/internal:run_metadata_test * gpu: -> GPU: * Fixes tfprof_node * [OpenCL] Fixes device path to name with many colons (#123) The device path is constructed from a device name by replacing all colons with underscores. Some device names contain more than one colon, for example 'device:SYCL:0' which gives a path 'device_SYCL_0'. The previous code would not convert this back to the original device name, but rather to 'device:SYCL_0'. An alternative fix would be to convert all underscores to colons in the device name (i.e. remove the restriction inside `replace("_", ":", 1)`), however I'm not sure if there are any device names which contain underscores. * If no gpu device aviable fake one * gpu: -> device:GPU * Fixes profiler test * /gpu:x -> /device:GPU:x * Fixes debug_io_utils_test.cc test * Fixes device_name_utils_test.cc --- Commit 35e7a3665 authored by Yong Tang<yong.tang.github@outlook.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: Remove unneeded casting of int64 for reverse_sequence (#12192) This fix remove unneeded cast of int64 for reverse_sequence: ``` lengths = math_ops.to_int64(lengths) ``` as int32 has already been enabled for reverse_sequence. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> --- Commit 9fba8c185 authored by Anna R<annarev@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add benchmark dashboard link to benchmarks doc. Also, I added a link and description for Benchmarks page to Community index page. PiperOrigin-RevId: 164924906 --- Commit bb6f32fa7 authored by Mark Heffernan<meheff@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Make HloAliasAnalysis updatable after changes to the HLO graph. 
As part of this change make HloAliasAnalysis a thinner layer which basically only holds a map from HloValue to HloBuffer and vice versa. PiperOrigin-RevId: 164923041 --- Commit 9103096c1 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by Thomas K?ppe<tkoeppe@google.com>: Merged commit includes the following changes: 164923041 by meheff: Make HloAliasAnalysis updatable after changes to the HLO graph. As part of this change make HloAliasAnalysis a thinner layer which basically only holds a map from HloValue to HloBuffer and vice versa. -- PiperOrigin-RevId: 164923041 --- Commit 822603aed authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Merging sibling fusion instruction using multi_output_fusion PiperOrigin-RevId: 164920220 --- Commit c035aa2a8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 164917891 --- Commit e1e81d9ba authored by Luke Iwanski<luke@codeplay.com> Committed by Rasmus Munk Larsen<rmlarsen@google.com>: [OpenCL] Fixes double memcpy bug (#151) (#12173) * [OpenCL] Fixes double memcpy bug (#151) As the debg CopyOp is called on a Tensor without type, we need to use the DataType enum to get type information, and use this to pass the type on to Eigen. This is a workaround Eigen's need to have a type when calling memcpy. If the Eigen memcpy can be provided without a type requirement, then the memcpy in sycl_util is unnecessary. * Acts on feedback from: #12173/files/32cb12a9001b672425867b5a3110fd98e737a20b#r132496277 --- Commit d9ca2d86d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 164916465 --- Commit b8d13d218 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove more parts of DCASGD missed in the first pass. (47949b) PiperOrigin-RevId: 164914552 --- Commit 73b3d52c7 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: cmake fix PiperOrigin-RevId: 164911656 --- Commit 2173b5b0a authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Allow TFE_TensorHandleCopyToDevice to have the same device as src and destination. It will reuse the same underlying buffer in those cases. PiperOrigin-RevId: 164909906 --- Commit 13eb3b90e authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Experimental C and Python APIs to invoke TensorFlow kernels on concrete values. PiperOrigin-RevId: 164902588 --- Commit 7dfabcc01 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Initialize ExecutionOptions in ComputeConstant to default values. PiperOrigin-RevId: 164894867 --- Commit c8897e9bc authored by Benoit Steiner<bsteiner@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Static required time computation PiperOrigin-RevId: 164894645 --- Commit 076158f9b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Enable implicit->explicit conversion by default. PiperOrigin-RevId: 164890915 --- Commit 58c4a4cb1 authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Bugfix: number of input channels is not necessarily in the last dimension, after introduction of data_format param. PiperOrigin-RevId: 164889729 --- Commit 8f9b1af8a authored by Igor Saprykin<isaprykin@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Recover MonitoredSession when the Coordinator is requested to stop with one of the _PREEMPTION_ERRORS. When SyncReplicasOptimizer is used, a preemption in the Coordinator may result in two cases: Case 1) the session gets silently marked as complete Case 2) the session gets stuck This CL aims to solve and verify solutions for both of these problems. Fix 1 changes the should_stop logic. Fix 2 changes the CoordinatedSession.run() logic. SyncReplicasOptimizer runs a separate set of threads using a Coordinator instance. Those threads do FIFOQueue.enqueue; the main thread does a blocking FIFOQueue.dequeue. `sync_token_q` FIFOQueue is on parameter-servers. When one of the PS instances gets preempted, an AbortedError causes the Coordinator to stop via request_stop(ex). That by itself changes the state of MonitoredSession.should_stop() to True (Fix 1). Results of the blocking Dequeue operation are sent to the chief worker via Recv. What happens next depends on the amount of tokens in `sync_token_q`. If there are enough for the next call to Dequeue to return, then the low-level "tf session run() call" returns. The next iteration of the `while not MonitoredSession.should_stop()` loop decides that the training is complete (Case 1). If there are not enough tokens in `sync_token_q`, then the blocking Dequeue is going to keep waiting for them. This results in the graph execution getting stuck and the whole session getting garbage collected after 10 minutes (Case 2). We decided to fix that by re-creating a session after it gets garbage collected (Fix 2). An alternative was to try to cancel the pending Dequeue operation, but it's not clear that it is the right thing to do and it is also not easy. PiperOrigin-RevId: 164888390 --- Commit 46e4de6e5 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Undo loop fusion changes for now as they seem to be altering a few results. END_PUBLIC RELNOTES: n/a BEGIN_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 164825735 PiperOrigin-RevId: 165340331

Commit:0f5d179
Author:Amit Patankar

Merge commit for internal changes

Commit:aea1134
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Turn the device_id and resource_id fields of TraceEvents from uint64 to uint32 to reduce memory consumption. On the wire, this should be fully compatible unless values > uint32max were actually ever used, which is unlikely. PiperOrigin-RevId: 165171764
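Sketch of the before/after (the field names come from the commit text; the field numbers are assumptions). Because proto varint encoding is shared between uint32 and uint64, old values that fit in 32 bits decode unchanged:

    syntax = "proto3";

    message TraceEventSketch {
      // was: uint64 device_id = 1; uint64 resource_id = 2;
      uint32 device_id = 1;   // wire-compatible as long as values never exceeded uint32 max
      uint32 resource_id = 2;
    }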

Commit:3c482c6
Author:Shanqing Cai
Committer:TensorFlower Gardener

tfdbg: extend grpc_debug_server protocol for interactive debugging Previously, a grpc-gated debug op had two modes: DISABLED and ENABLED. This CL splits the ENABLED state into two states: READ_ONLY and READ_WRITE. * READ_ONLY is equivalent to the previous ENABLED state, wherein a debug op publishes the debug tensor to the grpc debug server and proceeds. It can be regarded as a "watchpoint" that doesn't block execution. * READ_WRITE is a "breakpoint". In addition to publishing the debug tensor, it blocks and awaits an EventReply proto response from the grpc debug server before proceeding. PiperOrigin-RevId: 164987725
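A minimal sketch of the split described above (the state names follow the commit text; the enum name and numeric values are assumptions):

    syntax = "proto3";

    enum GatedDebugOpStateSketch {
      STATE_UNSPECIFIED = 0;
      DISABLED = 1;    // debug op is a no-op
      READ_ONLY = 2;   // publish the debug tensor and proceed ("watchpoint")
      READ_WRITE = 3;  // publish, then block until an EventReply arrives ("breakpoint")
    }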

Commit:8ef9eee
Author:Rasmus Larsen

Merge commit for internal changes

Commit:87605f3
Author:Kay Zhu
Committer:TensorFlower Gardener

[TF:XLA] Use HloEvaluator for ComputeConstant, remove the need of a dedicated compute constant backend. PiperOrigin-RevId: 164940970

Commit:08bbfa1
Author:Taehoon Lee
Committer:Rasmus Munk Larsen

Fix typos (#12195)

Commit:ab96f41
Author:Luke Iwanski
Committer:Rasmus Munk Larsen

[OpenCL] Extends matmul_benchmark.py to cover SYCL (#11697) * [OpenCL] Extends matmul_benchmark.py to cover SYCL * Fixed typo * /gpu:0 -> /device:GPU:0 * Fixes control_flow_ops_py_test * /gpu: -> /device:GPU: * Fixes //tensorflow/python/profiler/internal:run_metadata_test * gpu: -> GPU: * Fixes tfprof_node * [OpenCL] Fixes device path to name with many colons (#123) The device path is constructed from a device name by replacing all colons with underscores. Some device names contain more than one colon, for example 'device:SYCL:0' which gives a path 'device_SYCL_0'. The previous code would not convert this back to the original device name, but rather to 'device:SYCL_0'. An alternative fix would be to convert all underscores to colons in the device name (i.e. remove the restriction inside `replace("_", ":", 1)`), however I'm not sure if there are any device names which contain underscores. * If no gpu device available fake one * gpu: -> device:GPU * Fixes profiler test * /gpu:x -> /device:GPU:x * Fixes debug_io_utils_test.cc test * Fixes device_name_utils_test.cc

Commit:11e2aef
Author:Bairen Yi
Committer:Rasmus Munk Larsen

[WIP] GPU Direct RDMA Out-of-Band Tensor Transport (#11392) * GPU Direct RDMA Out-of-Band Tensor Transport * [WIP] GPU Direct with customized allocator * [WIP] Data race problem * [WIP] Refactor and add checksum for GDR * [WIP] Add debug string to checksum check * Final piece of host memory fallback * Bugfix on memory region management * Add RDMA library headers as third party dependency * Revert "Add RDMA library headers as third party dependency" This reverts commit 5993e37f7f64a5c1e0d0645c59401f14e52ce3aa. * make buildifier happy * fix errors for non-RDMA target * fix good path * fix dangling pointer * add compile switch for GDR * make buildifier happy * tidy source format using clang-format --style=google * using buildifier to auto format * fix macro * fix build config * Fix a performance bug (and #11411 hopefully) * fix kUnknownNumaNode * prepare for a cleaned up refactoring * remove unnecessary changes * finishing moving to contrib * several quick fixes * fix tests * remove wrong stop() call in server join * fix a init race condition for gdr w.r.t. cuda * better reporting of errors via errno * add check for wildcard and loopback address * do not visit gpu allocators if no gpu is active * Revert "do not visit gpu allocators if no gpu is active" This reverts commit 3c54f86d24a0a2bbfc9339ccdf5bc6eff192344b. * fix for latest gcc and cpu-only build * complain louder on server side * reduce CPU overhead using event mode * leave platform-neutral gdr to future work * add docs * fix interference with MKL CPU allocator * fix testing if a tensor is on host * add several TODOs and checks * add a readme * Update README.md * update readme.md * fix several issues under VLOG * fix typo in readme * fix a race condition of GDR * remove unintended changes

Commit:4c60c96
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Make plugin_data an optional field of SummaryMetadata Every summary op writes data for a single plugin to process. Hence, each SummaryMetadata proto should have a single PluginData optional field (instead of a repeated one). This removes much complexity from TensorBoard logic that loops over the plugin data. It also simplifies the SQL schema - it can now enforce a one-to-one relationship between summary op and plugin. PiperOrigin-RevId: 164659570
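To see why the singular field simplifies consumers, here is a small Python sketch. The classes are stand-ins for the generated proto classes (the real messages live in TensorFlow's summary proto and have more fields); the `plugin_name` field is used only for illustration. Only the `SummaryMetadata`/`PluginData`/`plugin_data` names come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PluginData:
    plugin_name: str                      # illustrative field only

@dataclass
class SummaryMetadataRepeated:            # old shape: repeated plugin_data
    plugin_data: List[PluginData] = field(default_factory=list)

@dataclass
class SummaryMetadataSingle:              # new shape: a single optional plugin_data
    plugin_data: Optional[PluginData] = None

# With the repeated field, every consumer has to loop and pick a winner:
old_md = SummaryMetadataRepeated(plugin_data=[PluginData("scalars")])
plugin = next(iter(old_md.plugin_data), None)

# With the single field, the one-to-one relationship is enforced by the schema:
new_md = SummaryMetadataSingle(plugin_data=PluginData("scalars"))
plugin = new_md.plugin_data
```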

Commit:43eaa4c
Author:Justine Tunney
Committer:Benoit Steiner

Make PluginData best practice be binary proto Unlike JSON, protobufs have schemas with types and generate object mappers for us. This makes it easier to implement. The proto definition makes it clear during code review when the structure of permanent data is being changed. It also gives us a history of what it looked like at past revisions. This makes it easier for us to support data in the long term. PiperOrigin-RevId: 164032776

Commit:1848070
Author:A. Unique TensorFlower
Committer:Benoit Steiner

Fine-grained memory profiling: add residual_bytes, peak_bytes and output_bytes. Allow ordering/selecting/filtering by accelerator_micros/cpu_micros/peak_bytes/residual_bytes/output_bytes. Also updated the testdata. PiperOrigin-RevId: 164079214
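For context, the new columns are exposed through the profiler's select/order options. The sketch below assumes a TensorFlow 1.x build from around the same time where `tf.profiler` and `ProfileOptionBuilder` are available; the toy graph and the choice of selected columns are ours, not part of the commit.

```python
import tensorflow as tf

# Build a tiny graph and collect RunMetadata with full tracing enabled.
g = tf.Graph()
with g.as_default():
    x = tf.random_normal([256, 256])
    y = tf.matmul(x, x)

with tf.Session(graph=g) as sess:
    run_meta = tf.RunMetadata()
    sess.run(y,
             options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
             run_metadata=run_meta)

# Select and order by the memory columns added in this change.
opts = (tf.profiler.ProfileOptionBuilder()
        .with_max_depth(10)
        .select(['peak_bytes', 'residual_bytes', 'output_bytes'])
        .order_by('peak_bytes')
        .build())
tf.profiler.profile(g, run_meta=run_meta, cmd='scope', options=opts)
```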

Commit:121ac2b
Author:A. Unique TensorFlower
Committer:Benoit Steiner

Enable an event limit in a profiling request. Clients usually cannot usefully display more than 1,000,000 events, so cutting off at that limit reduces the amount of data clients have to deal with. PiperOrigin-RevId: 164018578

Commit:19c27ef
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Fine-grained memory profiling: add residual_bytes, peak_bytes and output_bytes. Allow ordering/selecting/filtering by accelerator_micros/cpu_micros/peak_bytes/residual_bytes/output_bytes. Also updated the testdata. PiperOrigin-RevId: 164079214

Commit:27e7c8f
Author:Justine Tunney
Committer:TensorFlower Gardener

Make PluginData best practice be binary proto Unlike JSON, protobufs have schemas with types and generate object mappers for us. This makes it easier to implement. The proto definition makes it clear during code review when the structure of permanent data is being changed. It also gives us a history of what it looked like at past revisions. This makes it easier for us to support data in the long term. PiperOrigin-RevId: 164032776

Commit:6f786dd
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Enable an event limit in a profiling request. Clients usually cannot usefully display more than 1,000,000 events, so cutting off at that limit reduces the amount of data clients have to deal with. PiperOrigin-RevId: 164018578
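In effect, the limit amounts to truncating the event stream before it is returned to the client. The constant and helper below are hypothetical, shown only to make the cutoff concrete.

```python
MAX_PROFILE_EVENTS = 1000000  # the display limit mentioned in the commit message

def limit_events(events, max_events=MAX_PROFILE_EVENTS):
    # Return at most max_events entries so the client receives a bounded payload.
    return events[:max_events]
```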

Commit:93bff4d
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Name change to strides (plural) in XLA service's Slice op. PiperOrigin-RevId: 163924726

Commit:edac90c
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Add support for generating pprof results to tf.profiler. A fun side effect is that it can profile not only time and memory but also parameters, etc. PiperOrigin-RevId: 163767517

Commit:724884f
Author:Justin Lebar
Committer:TensorFlower Gardener

Show layouts in HLO graph dump. Layouts are displayed as e.g. "f32[100,200]{0,1}". But constants used to be displayed as e.g. "f32[]{42}". To avoid ambiguity, constants are now displayed as e.g. "42 (f32[])". Also gets rid of the xla_hlo_graph_layout flag, which is no longer necessary since we're now showing layouts unconditionally. PiperOrigin-RevId: 163753637

Commit:6263539
Author:Allen Lavoie
Committer:TensorFlower Gardener

Grappler memory optimization: allow inputs to gradients with non-standard names to be recomputed Includes Python tests for name-scoped gradients. PiperOrigin-RevId: 163720208

Commit:a1fba7f
Author:Vijay Vasudevan
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC I dropped the following commit because it doesn't compile. I will follow up with Andrew to fix it or revert it. Commit 003deb88b authored by osdamv<osdamv@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Refactor and implementation of the camera API 1, it fixes #8736 (#10771) List of commits in this CL: --- Commit 446450369 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Use identity of param variable in cudnn_rnn.RNNParamsSaveable instead of parameter variable directly. The RNNParamsSaveable is usually used in a graph which also has a saver for the cudnn param variable itself, if the same op is used for both, fails with a two savers for same op error. PiperOrigin-RevId: 163431826 --- Commit d629a8316 authored by RJ Ryan<rjryan@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Increase bound on tf.contrib.signal.inverse_stft gradient error to avoid flakiness on macOS. PiperOrigin-RevId: 163426631 --- Commit 253bcbb71 authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Use HloEvaluator for convolution in reference_util. Also Speed up HloEvaluator's HandleConvolution in non-opt build, by moving calls to HloInstruction::shape() out of the inner loop. PiperOrigin-RevId: 163416183 --- Commit 569a00e68 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update API to traffic in unique_ptrs rather than owning raw pointers PiperOrigin-RevId: 163414320 --- Commit 31a77bc77 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Java: Update release to 1.3.0-rc1 PiperOrigin-RevId: 163413736 --- Commit 1ebbf4325 authored by Jonathan Hseu<vomjom@vomjom.net> Committed by GitHub<noreply@github.com>: Add missing grpc dependency (#11828) --- Commit 905abb1f9 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Test asserts should have `expected` first. PiperOrigin-RevId: 163409348 --- Commit d5cc143e2 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Increase timeout to deflake the test. PiperOrigin-RevId: 163407824 --- Commit ce1c7f02a authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Properly include logging header in xla_internal_test_main PiperOrigin-RevId: 163405986 --- Commit 22241cd42 authored by joetoth<joetoth@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: External leveldb link changed (#11833) table_format.txt was renamed to table_format.md --- Commit 6b7314de4 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Consolidating the code to fill the partition's function library into one place. Previously, Partition() and MasterSession::RegisterPartition() both fills in the partitioned graph's function library. PiperOrigin-RevId: 163400992 --- Commit 28373cfe7 authored by Frank Chen<frankchn@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Adds preliminary support for Cloud TPUs with Cluster Resolvers. This aims to allow users to have a better experienec when specifying one or multiple Cloud TPUs for their training jobs by allowing users to use names rather than IP addresses. 
PiperOrigin-RevId: 163393443 --- Commit e5353c941 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Don't prune nodes that have reference inputs. PiperOrigin-RevId: 163390862 --- Commit 226510834 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: C API: Groundwork for experimenting with TF_Tensor in device memory. TF_Tensor objects are always backed by host memory. This commit lays the groundwork for allowing TF_Tensor objects to refer to tensor data on device (e.g., GPU) memory. PiperOrigin-RevId: 163388079 --- Commit 613bf1c7c authored by Yuefeng Zhou<yuefengz@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: fix asan test failure in SingleMachineTest::ReleaseMemoryAfterDestruction. PiperOrigin-RevId: 163386941 --- Commit 4653d37a3 authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Change type to appease GPU builds. PiperOrigin-RevId: 163384927 --- Commit 9f131bd15 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Internal change PiperOrigin-RevId: 163378484 --- Commit 8bc0236c8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: PiperOrigin-RevId: 163366493 --- Commit 3b97f1f9b authored by Yangzihao Wang<yangzihao@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Change to only run one round of matmul benchmark. PiperOrigin-RevId: 163364341 --- Commit a4a3a3335 authored by Yun Peng<pcloudy@google.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix ./configure on Windows (#11775) * Fix ./configure on Windows * Disable bitwise_ops_test on Windows --- Commit ae3119d16 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Small changes to op framework. PiperOrigin-RevId: 163361071 --- Commit f40189d26 authored by qjivy<ji.qiu@spreadtrum.com> Committed by Vijay Vasudevan<vrv@google.com>: PR again: Enable building label_image with jpeg/gif/png decoder for Android. (#11475) * Enable building label_image with jpeg/gif/png decoder for Android. Add dependency "android_tesnorflow_image_op" to label_image, which is not overlapped with android_tensorflow_kernels. * Running buildifier to reformat the BUILD files for sanity check. --- Commit 599165861 authored by KB Sriram<kbsriram@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Add the Constant operator class (#11559) Create a custom operator class to create constants in the Graph, and introduce the Operator marker annotation to identify operator classes. Please see #7149 for the master tracking issue. --- Commit 86ca3506f authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Further BUILD cleanup PiperOrigin-RevId: 163360750 --- Commit 376bb063b authored by Pete Warden<petewarden@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Look inside functions to see which node types are used. PiperOrigin-RevId: 163360375 --- Commit 2139e7d8b authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tf.contrib.data] map expects a nested structure. 
Fixes #11786 PiperOrigin-RevId: 163359134 --- Commit d09304fca authored by Jonathan Hseu<vomjom@vomjom.net> Committed by Vijay Vasudevan<vrv@google.com>: Upgrade gRPC (#11768) * BUILD rule modifications * More build fixes * Code changes * More code fixes * Working tests * CMake build * Fix pprof * Fix header includes * CMake fix test * Bazel clean * Fix verbs * More verbs fixes * bazel clean for XLA * Windows build fix test * Add openssl/rand.h * New cmake build command * --config Release --- Commit 3cd828474 authored by David Norman<DavidNorman@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Fix error with default python path selection (#11814) * Fix error with default python path selection * Move setting of environment var outside if / else --- Commit ddd8e21b7 authored by Eli Bendersky<eliben@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Consolidate all similar main()s in tests into a single target. PiperOrigin-RevId: 163354724 --- Commit a36bca25b authored by Tayo Oguntebi<tayo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove ShapeWithoutPadding() utility function, as it is no longer needed. PiperOrigin-RevId: 163353430 --- Commit b26f9cd44 authored by David Norman<DavidNorman@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Ensure that the multi-instruction fuse can take shared inputs (#11748) * Ensure that the multi-instruction fuse can take shared inputs Note that the fuse action only works when the shared input / constant appears after all of its consumers in the list of instructions. * Add a comment describing the test --- Commit 34cbf161d authored by Jiri Simsa<jsimsa@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update Dataset API documentation. PiperOrigin-RevId: 163349457 --- Commit 2381ce5c3 authored by Abdullah Alrasheed<a.rasheed@tc-sa.com> Committed by Vijay Vasudevan<vrv@google.com>: DOC: Fix typo. (#11813) you could could be I/O bottlenecked. TO: you could be I/O bottlenecked. --- Commit e4a5c5356 authored by Toby Boyd<tobyboyd@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: ["Variable", "VariableV2", "VarHandleOp"] is the default for ps_ops=None PiperOrigin-RevId: 163344629 --- Commit 722f6f361 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix TensorForest's saveable object names so loading a savedmodel works. PiperOrigin-RevId: 163332598 --- Commit cda80a785 authored by Eric Liu<ioeric@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [tpu profiler] Dump HLO graphs in profile responses to the log directory. PiperOrigin-RevId: 163318992 --- Commit cea9ef6f5 authored by horance<horance-liu@users.noreply.github.com> Committed by Vijay Vasudevan<vrv@google.com>: Refactoring device name utils (#11797) * remove duplicated code for full_name and legacy_name for DeviceNameUtils * replace tabs * Real->Device --- Commit 1f7c0f917 authored by Kongsea<kongsea@gmail.com> Committed by Vijay Vasudevan<vrv@google.com>: Refine docstrings (#11800) --- Commit dd1f0cddd authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Supports lookup devices by fullname either in the canonical form or the legacy form. This makes DeviceSet behaves the same as DeviceMgr's FindDevice method. 
PiperOrigin-RevId: 163300346 --- Commit 631a364cd authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add Reduce, DynamicSlice and DynamicSliceUpdate to HloEvaluator. - Reduce is disabled explicitly for constant folding, as not all types of embedded computation can be currently supported by the evaluator. - Added support to evaluate HloModule to HloEvaluator. - Minor signature change to Evaluate(). PiperOrigin-RevId: 163299238 --- Commit a52470172 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Sets the incarnation number even when the attribute is set. PiperOrigin-RevId: 163299121 --- Commit a49fe0366 authored by Suharsh Sivakumar<suharshs@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove platform bridge for grpc_response_reader. PiperOrigin-RevId: 163295986 --- Commit 4404aa7cb authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add TODO comment explaining why the IsScalar check exists. PiperOrigin-RevId: 163292777 --- Commit 43036ac16 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove unnecessary break statements. PiperOrigin-RevId: 163291947 --- Commit fd5de4690 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add regression test for a corner case using Reduce that currently fails with the GPU backend. PiperOrigin-RevId: 163287986 --- Commit 32e198f2d authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Add tf.cross support. See #11788 PiperOrigin-RevId: 163287731 --- Commit 88abddbc3 authored by Alan Yee<alyee@ucsd.edu> Committed by Vijay Vasudevan<vrv@google.com>: Update README.md (#11793) Remove bad practices of sudo pip and install use safer pip install commands --- Commit 9b30dc3a8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove final mentions of `get_shape` in docstring. PiperOrigin-RevId: 163282839 --- Commit 423c1eea0 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BREAKING CHANGE: Fix semantic error in how maybe_batch* handles sparse tensors. PiperOrigin-RevId: 163276613 --- Commit 6028c071b authored by Justin Lebar<jlebar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Highlight incoming/outgoing edges on hover in HLO graphviz dumps, and other improvements. Other improvements: - Don't show tooltips for nodes and clusters. Previously we'd show a tooltip containing a pointer value expressed as decimal. Not so useful. - Show tooltips on edges with the to/from node names. - Fix bug wherein if we had - a node at the "edge" of the graph (so its operands aren't included unless they're referenced by another node), - with all of its operands included in the graph save one or more constants, and - those constants weren't referenced by any nodes not at the edge of the graph, we would incorrectly draw the node as "grayed out", indicating that one of its operands (namely, its constant operand) wasn't present in the graph. 
This is wrong because constants are inlined into their users, so they should always count as "displayed" for the purposes of determining whether a node is grayed out. PiperOrigin-RevId: 163276108 --- Commit ce7a355bd authored by Joshua V. Dillon<jvdillon@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update contrib/distributions/estimator_test build dependency. PiperOrigin-RevId: 163272464 --- Commit 1b8458a1c authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Shorten docstring line. PiperOrigin-RevId: 163269709 --- Commit 69e323cc6 authored by Asim Shankar<ashankar@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fix comment ypo PiperOrigin-RevId: 163266376 --- Commit 08790e73d authored by Chris Leary<leary@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Fix a bug in cloning outfeeds, carried the wrong shape. PiperOrigin-RevId: 163265592 --- Commit 1bad826d6 authored by Yangzihao Wang<yangzihao@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Rollback of GPU kernel implementation of transpose for tensors with one small dimension. END_PUBLIC BEGIN_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 162525519 PiperOrigin-RevId: 163490703

Commit:94934d9
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Optionally output a new TreePath proto during TensorForest inference for ultimate interpretability. PiperOrigin-RevId: 163466324

Commit:ef06051
Author:Vijay Vasudevan

Merge commit for internal changes

Commit:067deeb
Author:Eric Liu
Committer:TensorFlower Gardener

[tpu:profiler] Make duration of tracing changeable. Make the default duration 2 seconds. Also avoid dumping empty trace to log directory. PiperOrigin-RevId: 163248374

Commit:51cf7d0
Author:Benoit Steiner
Committer:TensorFlower Gardener

Updated the memory optimization config to introduce an explicit default value. This will make it possible to change the default behavior in the future by updating the meta optimizer code to interpret that default value differently (e.g. we could assume default means heuristics). The default value remains OFF. PiperOrigin-RevId: 163239483
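The idea behind an explicit default value is that configs which never set the field say "no opinion", and the optimizer decides what that currently means. The sketch below uses hypothetical Python stand-ins, not the real RewriterConfig enum names, to illustrate the pattern.

```python
from enum import IntEnum

class MemOptType(IntEnum):       # hypothetical stand-in for the proto enum
    DEFAULT_MEM_OPT = 0          # "no opinion": interpreted by the meta optimizer
    NO_MEM_OPT = 1
    MANUAL = 2
    HEURISTICS = 3

def resolve_mem_opt(requested: MemOptType) -> MemOptType:
    # Today DEFAULT maps to OFF. Flipping this single line later (e.g. to
    # HEURISTICS) changes behavior only for configs that never set the field,
    # without affecting configs that explicitly opted in or out.
    if requested is MemOptType.DEFAULT_MEM_OPT:
        return MemOptType.NO_MEM_OPT
    return requested
```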

Commit:caca1c5
Author:Benoit Steiner
Committer:TensorFlower Gardener

Introduced a default setting for constant folding, currently set to OFF. It will be turned on later. PiperOrigin-RevId: 163233994

Commit:62de0b9
Author:Vijay Vasudevan
Committer:GitHub

Branch 163121296 (#11767) * Update ops-related pbtxt files. PiperOrigin-RevId: 163014080 * Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 163014834 * Removing session reset since destroying the session object would delete its variables as well. Resetting session might unintentionally close other sessions in the same process. PiperOrigin-RevId: 163019166 * [XLA] Teach CPU and GPU compilers to optionally invoke the HLO insert-reduce-precision-operations pass. This also required a few additions and fixups. We add pieces to ReducePrecisionInsertion to translate between the protocol-buffer representation of the pass options and the predicate-function actually used in the pass. To facilitate this translation, we also add a function to HloOpcode to return the number of opcodes so that we can iterate over the whole set easily. PiperOrigin-RevId: 163037250 * Refactor HLO graph dumping. This also makes a few minor cosmetic changes, like moving the fusion type out of the fusion node and into the out-of-line computation and adjusting the arrow labels that we use to indicate operand numbers. PiperOrigin-RevId: 163038795 * Use correct order of arguments in call of valid_bitcast_callback_. There are platforms where bitcasts are not symmetric. I.e. there are shapes A and B so that A->B is a bitcast, but B->A not. So we have to consider the correct order when calling valid_bitcast_callback_. PiperOrigin-RevId: 163058665 * Two improvements to pip.sh 1. Distinguish between passed and skipped tests. 2. Allow skipping the smoke test of tensorflow install in clean virtualenv with NO_TEST_ON_INSTALL=1 PiperOrigin-RevId: 163065599 * [XLA] Update StatusOr implementation to use more nuanced type traits. Previously we would evaluate the is_copy_constructible trait before template parameters were fully defined; e.g. StatusOr<ThingIAmDefiningRightNow>, which could lead to surprising effects. Also, previously it was not possible to provide an error status to a StatusOr<T> where T was not default-constructible. PiperOrigin-RevId: 163073057 * [TF:XLA] Register a no-op kernel for ControlTrigger, but forbid the JIT marking pass from compiling ControlTrigger nodes. CL in preparation for compiling dynamic RNN gradients via XLA. PiperOrigin-RevId: 163073212 * Improve the HLO graph dumper's output. - Truncate long shapes. It's not uncommon to have giant tuples, and displaying the whole thing makes the graph unreadable. - Don't traverse into the users of a node with < 16 users. These are probably not interesting, and traversing into them can quickly blow up the graph, making it un-renderable. - Allow nodes which have multiple trivial subcomputations (e.g. select-and-scatter) to have those computations inlined. - Match additional patterns in MatchTrivialComputation PiperOrigin-RevId: 163079329 * If the value to be forwarded from a loop to its gradient is a constant, clone the constant instead of repeatedly pushing it onto a stack on each iteration. This should never consume more memory than the stack approach (notwithstanding swapping), and frequently should be much better. This change is in preparation for enabling XLA compilation of RNN gradients. PiperOrigin-RevId: 163082165 * [TF:XLA] Make the shape of a TensorArray flow value a scalar. Previously we used an f32[0] value, since the exact flow value does not matter, however this causes problems when a TensorArray computation is placed in a loop since the shape of the flow value is no longer loop invariant. 
PiperOrigin-RevId: 163082452 * Automated g4 rollback of changelist 163019166 PiperOrigin-RevId: 163083436 * Automated g4 rollback of changelist 162769374 PiperOrigin-RevId: 163086518 * internal change PiperOrigin-RevId: 163088509 * Clarify docstring for tf.rank. PiperOrigin-RevId: 163089480 * Reduce gather_op_test timeouts by reducing the size of testHigherRank. PiperOrigin-RevId: 163090428 * Add PopulationCount op (popcnt): element-wise counts the number of "on" bits. PiperOrigin-RevId: 163090921 * Show fusion nodes inline in HLO graph dumper. To make this work sanely I had to change NodeFilter so that it says to dump all nodes inside subcomputations. Previously, we passed an explicit NodeFilter down to DumpSubcomputation, and used that to control whether or not we dumped nodes in there. But this becomes unwieldy with inline fusion nodes, as sometimes you want to look at 'filter', and other times you want to look at 'filter_', and there's no good way to tell why. I also had to remove the heuristic whereby we'd pull in operands of nodes with just some operands shown. With the much bigger nodes that are generated by this change, the graph was becoming illegible. I think most of the confusion that heuristic was attempting to avoid is addressed by the fact that we "gray out" incomplete nodes. PiperOrigin-RevId: 163091423 * errors: Avoid stripping error details when convering POSIX errors to Status This change is made out of a desire to have additional information be reported when there are filesystem errors (for e.g. see #11628) PiperOrigin-RevId: 163091773 * C API: Fix a bug with TF_OperationGetAttrTensor when TF_STRING tensors are involved. The TensorBuffer owned by a TF_Tensor object has a different memory layout than the TensorBuffer owned by the corresponding tensorflow::Tensor object. This change consolidates conversions between the runtime's tensorflow::Tensor and the C API's TF_Tensor objects into a pair helper functions. The added test: CApiAttributesTest.StringTensor fails without corresponding changes to c_api.cc PiperOrigin-RevId: 163091789 * Speed up tf.contrib.signal spectral_ops_test.py by reducing the size of the gradient test. PiperOrigin-RevId: 163092423 * Add new CompareAndBitpackOp. PiperOrigin-RevId: 163093146 * Update ops-related pbtxt files. PiperOrigin-RevId: 163094455 * Minor tweaks to avoid unnecessary copies PiperOrigin-RevId: 163101160 * [BatchNormGrad] Add end-to-end test for BatchNormGrad RELNOTES: n/a PiperOrigin-RevId: 163101568 * Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 163102070 * [XLA] Add more unit tests for DynamicSlice and DynamicUpdateSlice. PiperOrigin-RevId: 163102445 * Adding missing deps to targets in llvm.BUILD. This was only working in non-sandboxed builds. PiperOrigin-RevId: 163103908 * Pass batch_size in params when use_tpu=False. PiperOrigin-RevId: 163105673 * Remove duplicate import. PiperOrigin-RevId: 163108237 * Implementation of UnsortedSegmentSum in tf2xla bridge. PiperOrigin-RevId: 163109769 * Add gradient checking tests for nn.moments(). PiperOrigin-RevId: 163110994 * Improved the speed of constant folding PiperOrigin-RevId: 163113085 * Convert configure to python. PiperOrigin-RevId: 163114551 * [TF:XLA] Ignore control edges from Enter nodes to the graph sink during loop functionalization. PiperOrigin-RevId: 163115904 * Support customized residual function in the residual wrapper. PiperOrigin-RevId: 163121296

Commit:82d7252
Author:Dandelion Mané
Committer:Amit Patankar

Add support for display_name and summary_description to the tf.summary.tensor_summary op. The display_name will be used to display the series in TensorBoard, in lieu of the tag, assuming that it is specified. (When it is not specified, behavior will stay the same.) The summary_description will allow the user to write a longform readable description of the summary series for display in TensorBoard. Markdown will be supported. This will make it possible for TensorBoard-2 summary ops to give the user direct control over the display name, solving https://github.com/tensorflow/tensorboard/issues/59. PiperOrigin-RevId: 162566261
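A possible usage of the extended op, assuming the post-change keyword arguments match the names given in the commit message (`display_name`, `summary_description`); the toy tensor and log directory are ours.

```python
import tensorflow as tf

values = tf.constant([0.1, 0.2, 0.3])

# display_name overrides the tag shown in TensorBoard; summary_description
# carries a longer, Markdown-capable description of the series.
summ = tf.summary.tensor_summary(
    name="activations",
    tensor=values,
    display_name="Layer 1 activations",
    summary_description="Distribution of activations for layer 1. *Markdown* is supported.")

with tf.Session() as sess:
    writer = tf.summary.FileWriter("/tmp/tensor_summary_demo", sess.graph)
    writer.add_summary(sess.run(summ), global_step=0)
    writer.close()
```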

Commit:d1a9ea6
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Teach CPU and GPU compilers to optionally invoke the HLO insert-reduce-precision-operations pass. This also required a few additions and fixups. We add pieces to ReducePrecisionInsertion to translate between the protocol-buffer representation of the pass options and the predicate-function actually used in the pass. To facilitate this translation, we also add a function to HloOpcode to return the number of opcodes so that we can iterate over the whole set easily. PiperOrigin-RevId: 163037250

Commit:cc34d2a
Author:Vijay Vasudevan

Merge commit for internal changes

Commit:2661f68
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Add support for sin(x) transcendental. PiperOrigin-RevId: 162889962

Commit:ccb051c
Author:Shanqing Cai

Merge commit for internal changes

Commit:557f4b8
Author:Fred Reiss
Committer:Jonathan Hseu

Added comment to clarify semantics of opt_level. (#11607)

Commit:fce5222
Author:Manjunath Kudlur
Committer:TensorFlower Gardener

Added DT_VARIANT type. A tensor with DT_VARIANT type can store arbitrary C++ data structures. DT_VARIANT is implemented using a type-erased data structure similar to std::any, but with extensions to make it compatible with tensorflow::Tensor. In particular, Encode and Decode methods need to be provided by C++ classes whose objects are stored in Variant. PiperOrigin-RevId: 162754827

Commit:83faa85
Author:Manjunath Kudlur
Committer:TensorFlower Gardener

Automated g4 rollback of changelist 162668355 PiperOrigin-RevId: 162733043

Commit:19cc9a7
Author:Manjunath Kudlur
Committer:TensorFlower Gardener

Added DT_VARIANT type. A tensor with DT_VARIANT type can store arbitrary C++ data structures. DT_VARIANT is implemented using a type-erased data structure similar to std::any, but with extensions to make it compatible with tensorflow::Tensor. In particular, Encode and Decode methods need to be provided by C++ classes whose objects are stored in Variant. PiperOrigin-RevId: 162668355

Commit:7caedc3
Author:Dandelion Mané
Committer:TensorFlower Gardener

Add support for display_name and summary_description to the tf.summary.tensor_summary op. The display_name will be used to display the series in TensorBoard, in lieu of the tag, assuming that it is specified. (When it is not specified, behavior will stay the same.) The summary_description will allow the user to write a longform readable description of the summary series for display in TensorBoard. Markdown will be supported. This will make it possible for TensorBoard-2 summary ops to give the user direct control over the display name, solving https://github.com/tensorflow/tensorboard/issues/59. PiperOrigin-RevId: 162566261

Commit:c2182e0
Author:Eli Bendersky
Committer:TensorFlower Gardener

[XLA] Move the flag from stream_assignment_flags into DebugOptions PiperOrigin-RevId: 162506964

Commit:a555b78
Author:Suharsh Sivakumar
Committer:TensorFlower Gardener

Add output_partitions support in distributed runtime. PiperOrigin-RevId: 162456565

Commit:9293c55
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

[XLA] Get rid of ServiceFlags by absorbing it into DebugOptions. After this change HloModuleConfig::hlo_profiling_enabled_ is redundant. I'll remove it in a future change. PiperOrigin-RevId: 162436163

Commit:09e9b15
Author:Eric Liu
Committer:TensorFlower Gardener

Add a gRPC client for profiling TPU (contrib/tpu/profiler/) This contains a gRPC client that starts/stops tracing and processes/stores the result trace data into a TensorBoard log directory. This also exposes trace_events proto classes via tf.contrib.tpu.profiler public API so that TensorBoard's profile plugin can process and visualize the profile. PiperOrigin-RevId: 162247333

Commit:a0ffaf3
Author:Frank Chen
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC --- Commit fe5338177 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops. PiperOrigin-RevId: 161727345 --- Commit c65f69119 authored by Eugene Brevdo<ebrevdo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Factor out DenseUpdate ops into dense_update_functor build dep. Also add support for complex types. PiperOrigin-RevId: 161726749 --- Commit 9a172989e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 161726324 --- Commit fd5530d6e authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: adding bazel-toolchains repo to workspace. This repo will be necessary for remote execution (specifically for cross OS compilation) PiperOrigin-RevId: 161719899 --- Commit 71c4ec8ed authored by Derek Murray<mrry@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add a mechanism for switching between multiple iterators by feeding a handle. With this change, you can do the following: 1. Fetch a string handle for any iterator, by evaluating the result of `Iterator.string_handle()`. 2. Define an `Iterator` object based on a `tf.string` placeholder handle. 3. Feed the placeholder using an evaluated string handle to use a particular iterator in a particular step. Concretely, this allows you to define two iterators for a training dataset and a test dataset, and choose which one to use on a per-run basis: ```python train_iterator = tf.contrib.data.Dataset(...).make_one_shot_iterator() train_iterator_handle = sess.run(train_iterator.string_handle()) test_iterator = tf.contrib.data.Dataset(...).make_one_shot_iterator() test_iterator_handle = sess.run(test_iterator.string_handle()) handle = tf.placeholder(tf.string, shape=[]) iterator = tf.contrib.data.Iterator.from_string_handle( handle, train_iterator.output_types) next_element = iterator.get_next() loss = f(next_element) train_loss = sess.run(loss, feed_dict={handle: train_iterator_handle}) test_loss = sess.run(loss, feed_dict={handle: test_iterator_handle}) ``` PiperOrigin-RevId: 161719836 --- Commit 6d6dda807 authored by Kay Zhu<kayzhu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Fix an issue where plugin/Executor backend is used by default when TF is built from source with XLA support. See Github issue #11122. The priority of the executor backend is set to be higher than the default (50) and CPUs (<100), and is therefore selected as the default when tf.device is not explicitly specified. PiperOrigin-RevId: 161717173 --- Commit 6b28eb084 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Rename HloLocation to HloPosition, to avoid ambiguity with MemoryLocation. PiperOrigin-RevId: 161716528 --- Commit 8e7f57371 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Expose tf.contrib.nn.rank_sampled_softmax_loss. PiperOrigin-RevId: 161716450 --- Commit e424d209a authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Use a more numerically accurate formulation of ResourceApplyRMSProp. 
PiperOrigin-RevId: 161706120 --- Commit 45a58d378 authored by Skye Wanderman-Milne<skyewm@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Introduce Python-only extensions to the C API Implements an incomplete version of Operation._add_control_input() using a new extension to make sure the plumbing works. This also adds header guards to c_api_internal.h, which were missing. For some reason the missing guards caused problems in the cmake build even though there doesn't appear to be any #include cycles. PiperOrigin-RevId: 161705859 --- Commit 4f5433634 authored by Jonathan Hseu<jhseu@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Rename TpuEstimator to TPUEstimator and TpuConfig to TPUConfig to follow PEP8 naming conventions. PiperOrigin-RevId: 161704561 --- Commit 38180d7bb authored by Yun Peng<pcloudy@google.com> Committed by gunan<gunan@google.com>: Disable nn_test on Windows (#11445) --- Commit e1de7a1b0 authored by Yun Peng<pcloudy@google.com> Committed by gunan<gunan@google.com>: Windows Bazel Build: Build TensorFlow with wrapper-less CROSSTOOL (#11454) --- Commit c9d03a568 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add tf.contrib.nn.rank_sampled_softmax_loss, a variant of tf.nn.sampled_softmax_loss that has been shown to improve rank loss. Paper: https://arxiv.org/abs/1707.03073 PiperOrigin-RevId: 161702455 --- Commit 9aa0dcbf2 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add shape check for MakeQuantileSummariesOp. PiperOrigin-RevId: 161698801 --- Commit 9c4da4a24 authored by vhasanov<KyotoSunshine@users.noreply.github.com> Committed by Frank Chen<frankchn@gmail.com>: Deleted unnecessary repetition of the same text. (#11459) The same text was repeated two times. I deleted the repetition. --- Commit d1e3cadda authored by DimanNe<dimanne@gmail.com> Committed by drpngx<drpngx@users.noreply.github.com>: Fix linking options issued by bazel in oorder to make gradients register (#11449) --- Commit 8605f7ab8 authored by Taehoon Lee<me@taehoonlee.com> Committed by Frank Chen<frankchn@gmail.com>: Fix typos (#11444) --- Commit 7c1fe9068 authored by Karl Lessard<karllessard@users.noreply.github.com> Committed by Frank Chen<frankchn@gmail.com>: [Java] Add base classes and utilities for operation wrappers. (#11188) * Add base classes and utilities for operation wrappers. * Rename Input interface to Operand * Introduce changes after code review --- Commit 2195db6d8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Remove unused flag: xla_hlo_graph_for_compute_constant PiperOrigin-RevId: 161686867 --- Commit a72fc31bc authored by Martin Wicke<martin.wicke@gmail.com> Committed by Martin Wicke<martin.wicke@gmail.com>: Remove tabs. Unassign contrib/framework. --- Commit 6e74bd65a authored by Martin Wicke<martin.wicke@gmail.com> Committed by Martin Wicke<martin.wicke@gmail.com>: Add CODEOWNERS Added what we know about contrib mainly, and some well-separated components. --- Commit de546d066 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BUILD cleanup in tensorflow/compiler/... PiperOrigin-RevId: 161679855 --- Commit 576c7b1ec authored by A. 
Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: BEGIN_PUBLIC Automated g4 rollback of changelist 161218103 PiperOrigin-RevId: 161868747

Commit:a891c24
Author:Frank Chen

Merge commit for internal changes

Commit:1d1f99d
Author:A. Unique TensorFlower
Committer:TensorFlower Gardener

Make ClientLibraryTestBase able to test all layouts for all input arguments. PiperOrigin-RevId: 161622376

Commit:78cc877
Author:horance
Committer:Frank Chen

modify SaverDef default version with v2 (#11429)

Commit:90d6421
Author:Shanqing Cai
Committer:TensorFlower Gardener

Merge changes from github. END_PUBLIC --- Commit d0f53f77f authored by Penghao Cen<scorpiocph@gmail.com> Committed by Shanqing Cai<cais@google.com>: Minor fix typo (#11323) --- Commit 02fcf564e authored by Chris Song<sjhshy@gmail.com> Committed by Chris Song<sjhshy@gmail.com>: Fix misspells. --- Commit 764c9b6b4 authored by Louis Tiao<ltiao@users.noreply.github.com> Committed by GitHub<noreply@github.com>: Fixed typo in docstring --- Commit f8cd1283e authored by Shanqing Cai<cais@google.com> Committed by Shanqing Cai<cais@google.com>: Chaser --- Commit 01383b946 authored by Shanqing Cai<cais@google.com> Committed by Shanqing Cai<cais@google.com>: Adapt TensorFlowTestCase.setUp() to new reset_default_graph() semantics Avoid calling reset_default_graph() directly to prevent exceptions in cases where test methods error out from within nested graph contexts, which can leave _default_graph_stack non-empty in certain Python versions. --- Commit 0ffc37890 authored by Amit Patankar<amitpatankar@google.com> Committed by Amit Patankar<amitpatankar@google.com>: Removing second declaration of functions. --- Commit f9c9cacb0 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Refactor ElementalIrEmitter's slice index finding code into IrArray::Index::SourceIndexOfSlice(). PiperOrigin-RevId: 161140653 --- Commit ba297aec9 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Update ops-related pbtxt files. PiperOrigin-RevId: 161138258 --- Commit 68d666737 authored by Alexandre Passos<apassos@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fixes a reentrant lock issue with tensors using ndarray memory which uses tensor memory. PiperOrigin-RevId: 161137788 --- Commit a2ee8bca3 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor. PiperOrigin-RevId: 161137741 --- Commit 755fa7b50 authored by Mark Daoust<markdaoust@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Block generate_test, and docs generating from running in python3. - Doc generation is currently unsupported in python3 - These both end in errors in python 3.5.1+ PiperOrigin-RevId: 161137467 --- Commit 97cbcac45 authored by Peter Hawkins<phawkins@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [TF:XLA] Fix failure in functionalize_control_flow rewrite for Enter nodes that are unused. Make sure we ignore such nodes without producing an error. PiperOrigin-RevId: 161136545 --- Commit dabcb60bc authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: [XLA] Add reasonable error messages to Builder::Build for bad parameter numbers. PiperOrigin-RevId: 161136262 --- Commit 0cbd249e8 authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Add complex tensors support to `matrix_determinant`. PiperOrigin-RevId: 161132422 --- Commit 335f1f14d authored by A. Unique TensorFlower<gardener@tensorflow.org> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Extend static shape inference for SparseTensors with dense_shapes constructed using slicing. 
PiperOrigin-RevId: 161132391 --- Commit 53604916e authored by Jianwei Xie<xiejw@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Fixed the missing labels test in TPUEstimator. PiperOrigin-RevId: 161131282 --- Commit 9f57dc8dd authored by Bruno Rosa<bruno.rosa@eldorado.org.br> Committed by Bruno Rosa<bruno.rosa@eldorado.org.br>: Use mcpu instead of march for ppc64le march is not support by gcc on ppc64le --- Commit 7d5c74a9c authored by Skye Wanderman-Milne<skyewm@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Move duplicate detection logic from Graph to FunctionLibraryDefinition Turns out this is more useful, since there are many function libraries that don't belong to a graph. This will be used in a future change. Note that this maintains the current behavior of Graph. In addition, updates FunctionDefsEqual() to handle unset attr entries (I ran into this when using this in said future change). PiperOrigin-RevId: 161126628 --- Commit 2caec3af1 authored by Shanqing Cai<cais@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Disable more timeseries py tests failing in OSS PIP GPU builds PiperOrigin-RevId: 161124799 --- Commit 0b5cce367 authored by Eugene Brevdo<ebrevdo@google.com> Committed by TensorFlower Gardener<gardener@tensorflow.org>: Get TopK op working on GPU again. Extend using cub's radix sort. 1. Undo rollback of Andreas Kirsch's initial implementation. 2. Use cub segmented radix sort if Andreas' heap-based impl for large k and small num_cols (thresholds of k=100, n=1000 determined empirically). 3. Use cub segmented radix sort if k == num_cols (this case is always faster). 4. Added benchmarks. Benchmarks show that the GPU implementation is up to 3x slower for small k but can be 10x faster for large num_cols and k. 
Benchmarks: Benchmark: m_128_n_10_k_5_use_gpu_False wall_time: 0.000166 s Throughput: 0.0077 GB/s Benchmark: m_128_n_10_k_5_use_gpu_True wall_time: 0.000796 s Throughput: 0.00161 GB/s Benchmark: m_128_n_10_k_9_use_gpu_False wall_time: 0.00017 s Throughput: 0.00751 GB/s Benchmark: m_128_n_10_k_9_use_gpu_True wall_time: 0.000796 s Throughput: 0.00161 GB/s Benchmark: m_128_n_10_k_10_use_gpu_False wall_time: 0.00017 s Throughput: 0.00753 GB/s Benchmark: m_128_n_10_k_10_use_gpu_True wall_time: 0.000775 s Throughput: 0.00165 GB/s Benchmark: m_128_n_100_k_1_use_gpu_False wall_time: 0.000155 s Throughput: 0.0826 GB/s Benchmark: m_128_n_100_k_1_use_gpu_True wall_time: 0.000796 s Throughput: 0.0161 GB/s Benchmark: m_128_n_100_k_50_use_gpu_False wall_time: 0.000247 s Throughput: 0.0519 GB/s Benchmark: m_128_n_100_k_50_use_gpu_True wall_time: 0.0008 s Throughput: 0.016 GB/s Benchmark: m_128_n_100_k_99_use_gpu_False wall_time: 0.000261 s Throughput: 0.049 GB/s Benchmark: m_128_n_100_k_99_use_gpu_True wall_time: 0.000794 s Throughput: 0.0161 GB/s Benchmark: m_128_n_100_k_100_use_gpu_False wall_time: 0.000239 s Throughput: 0.0536 GB/s Benchmark: m_128_n_100_k_100_use_gpu_True wall_time: 0.000777 s Throughput: 0.0165 GB/s Benchmark: m_128_n_1000_k_1_use_gpu_False wall_time: 0.000324 s Throughput: 0.395 GB/s Benchmark: m_128_n_1000_k_1_use_gpu_True wall_time: 0.000916 s Throughput: 0.14 GB/s Benchmark: m_128_n_1000_k_10_use_gpu_False wall_time: 0.00042 s Throughput: 0.305 GB/s Benchmark: m_128_n_1000_k_10_use_gpu_True wall_time: 0.000902 s Throughput: 0.142 GB/s Benchmark: m_128_n_1000_k_500_use_gpu_False wall_time: 0.0011 s Throughput: 0.116 GB/s Benchmark: m_128_n_1000_k_500_use_gpu_True wall_time: 0.00097 s Throughput: 0.132 GB/s Benchmark: m_128_n_1000_k_990_use_gpu_False wall_time: 0.00133 s Throughput: 0.0962 GB/s Benchmark: m_128_n_1000_k_990_use_gpu_True wall_time: 0.000993 s Throughput: 0.129 GB/s Benchmark: m_128_n_1000_k_1000_use_gpu_False wall_time: 0.00102 s Throughput: 0.126 GB/s Benchmark: m_128_n_1000_k_1000_use_gpu_True wall_time: 0.000964 s Throughput: 0.133 GB/s Benchmark: m_128_n_10000_k_10_use_gpu_False wall_time: 0.002 s Throughput: 0.64 GB/s Benchmark: m_128_n_10000_k_10_use_gpu_True wall_time: 0.00288 s Throughput: 0.445 GB/s Benchmark: m_128_n_10000_k_100_use_gpu_False wall_time: 0.00233 s Throughput: 0.549 GB/s Benchmark: m_128_n_10000_k_100_use_gpu_True wall_time: 0.00325 s Throughput: 0.394 GB/s Benchmark: m_128_n_10000_k_5000_use_gpu_False wall_time: 0.0127 s Throughput: 0.101 GB/s Benchmark: m_128_n_10000_k_5000_use_gpu_True wall_time: 0.00381 s Throughput: 0.336 GB/s Benchmark: m_128_n_10000_k_9900_use_gpu_False wall_time: 0.015 s Throughput: 0.0853 GB/s Benchmark: m_128_n_10000_k_9900_use_gpu_True wall_time: 0.00438 s Throughput: 0.292 GB/s Benchmark: m_128_n_10000_k_10000_use_gpu_False wall_time: 0.0104 s Throughput: 0.123 GB/s Benchmark: m_128_n_10000_k_10000_use_gpu_True wall_time: 0.00427 s Throughput: 0.3 GB/s Benchmark: m_128_n_100000_k_100_use_gpu_False wall_time: 0.0148 s Throughput: 0.865 GB/s Benchmark: m_128_n_100000_k_100_use_gpu_True wall_time: 0.0262 s Throughput: 0.488 GB/s Benchmark: m_128_n_100000_k_1000_use_gpu_False wall_time: 0.0201 s Throughput: 0.636 GB/s Benchmark: m_128_n_100000_k_1000_use_gpu_True wall_time: 0.0263 s Throughput: 0.486 GB/s Benchmark: m_128_n_100000_k_50000_use_gpu_False wall_time: 0.214 s Throughput: 0.0599 GB/s Benchmark: m_128_n_100000_k_50000_use_gpu_True wall_time: 0.0322 s Throughput: 0.398 GB/s Benchmark: 
m_128_n_100000_k_99000_use_gpu_False wall_time: 0.262 s Throughput: 0.0489 GB/s Benchmark: m_128_n_100000_k_99000_use_gpu_True wall_time: 0.0377 s Throughput: 0.34 GB/s Benchmark: m_128_n_100000_k_100000_use_gpu_False wall_time: 0.118 s Throughput: 0.108 GB/s Benchmark: m_128_n_100000_k_100000_use_gpu_True wall_time: 0.0365 s Throughput: 0.351 GB/s END_PUBLIC BEGIN_PUBLIC BEGIN_PUBLIC Automated g4 rollback of changelist 157169178 PiperOrigin-RevId: 161476569