These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
| Commit: | e0059bb | |
|---|---|---|
| Author: | Xin Zhou | |
| Committer: | TensorFlower Gardener | |
[XLA] Add support for int4 types in literal. PiperOrigin-RevId: 515099842
| Commit: | 5e9700e | |
|---|---|---|
| Author: | Adrian Kuegel | |
| Committer: | TensorFlower Gardener | |
Remove xla_gpu_softmax_fusion flag. The softmax fusion feature has been deleted. PiperOrigin-RevId: 514986409
| Commit: | 248bb25 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
[XLA] Update partition assignment with an overrideable partitioning algorithm selector method. PiperOrigin-RevId: 514939043
| Commit: | 8ad80b6 | |
|---|---|---|
| Author: | Songyi Han | |
| Committer: | TensorFlower Gardener | |
Support gather for all quantization schemes. This enables optimized weight-only for all schemes in XLA opset. PiperOrigin-RevId: 513824366
| Commit: | c9d1771 | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Populate SaverDef from TF Quantizer's cpp layer. PiperOrigin-RevId: 513729502
| Commit: | 29ca3c1 | |
|---|---|---|
| Author: | Reed Wanderman-Milne | |
| Committer: | TensorFlower Gardener | |
Rename BFloat16Normalization to FloatNormalization and support other types. BFloat16Normalization has been renamed to FloatNormalization, and FP8 is now also supported in addition to BF16. A subsequent change will use FloatNormalization to support FP8 inputs/outputs in all instructions that currently support FP16 inputs/outputs. Similarly, BFloat16Support is renamed to FloatSupport. This is a very mechanical change. The vast majority of changes are renaming classes, methods, local variables, BUILD target names, and filenames to not refer to the type BF16. The only changes in logic are (1) adding a low_precision_type_ field to FloatSupport, (2) adding methods LowPrecisionType() and HighPrecisionType() to FloatSupport, and (3) changing FloatNormalization to use such methods instead of hardcoding BF16 and F32. PiperOrigin-RevId: 513657280
| Commit: | 0f1066e | |
|---|---|---|
| Author: | Jie Sun | |
| Committer: | TensorFlower Gardener | |
Add a tag to indicate which source this InputPipelineAnalysisResult is from. PiperOrigin-RevId: 513621717
| Commit: | 38a4470 | |
|---|---|---|
| Author: | He Jiang | |
| Committer: | TensorFlower Gardener | |
Let minibenchmark support in-memory model. The in-memory model will be passed to subprocess through pipe. PiperOrigin-RevId: 513573577
| Commit: | d9da53c | |
|---|---|---|
| Author: | Alan Kelly | |
| Committer: | TensorFlower Gardener | |
assertProtoEqual calls checkFloatEqAndReplace, which recurses through the proto and compares all floats/doubles using relative tolerance. The current version's comparison is not valid for all floats. For example, we probably want 50000 and 49999.9 to compare equal, but they won't in the original version because the float format strings cut digits rather than rounding, which could result in 50000 and 49999, depending on the formatting provided. With this change, tests will be less brittle, as a single different bit will not cause a failure if rtol is used. PiperOrigin-RevId: 512999626
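The difference between the two comparison strategies can be sketched in a few lines of Python. This is only an illustration of the idea in the commit message, not TensorFlow's implementation: `truncate_format` and `close_with_rtol` are hypothetical helpers, and the tolerance value is an assumption.

```python
import math


def truncate_format(value: float) -> str:
    """Hypothetical formatter that cuts fractional digits instead of rounding."""
    return str(int(value))


def close_with_rtol(a: float, b: float, rtol: float = 1e-5) -> bool:
    """Compare two floats using a relative tolerance (assumed default)."""
    return math.isclose(a, b, rel_tol=rtol)


a, b = 50000.0, 49999.9

# String comparison after truncation treats the values as different.
strings_match = truncate_format(a) == truncate_format(b)

# Relative-tolerance comparison treats them as equal.
values_match = close_with_rtol(a, b)

print(strings_match, values_match)
```

With these inputs the truncated strings differ (`50000` versus `49999`), while the relative-tolerance check accepts the pair.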
| Commit: | 98c639b | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Find the name of the "file_prefix" tensor from the cpp layer of TF Quantizer. Instead of finding "file_prefix" tensor from the python layer, find the name from the cpp layer so that the information relayed by the `ExportedModel` is more complete and the utilities for saving the model aren't fragmented. PiperOrigin-RevId: 512921226
| Commit: | 231de16 | |
|---|---|---|
| Author: | Xin Zhou | |
| Committer: | TensorFlower Gardener | |
[XLA] Add int4 types: U4/S4. PiperOrigin-RevId: 512150267
| Commit: | dea1165 | |
|---|---|---|
| Author: | Matt Kreileder | |
| Committer: | TensorFlower Gardener | |
This CL introduces a new message, `CompilationCachingSettings`, in the acceleration config proto file and adds a field of this message type into the `TFLiteSettings` message. `CompilationCachingSettings` defines standardised compilation caching fields. Specifically we give (binary) stable delegates a set of fields that can be used for compilation caching. See also tensorflow/lite/core/experimental/acceleration/configuration/c/stable_delegate.h and an example stable delegate tensorflow/lite/delegates/utils/experimental/sample_stable_delegate. PiperOrigin-RevId: 512038131
| Commit: | 3f43b4e | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Support quantizing models that use asset files to initialize resources. This change allows TF Quantizer to quantize and export models that use asset files to initialize resources like hash tables. A canonical example is a model that uses [`tf.lookup.TextFileInitializer`](https://www.tensorflow.org/api_docs/python/tf/lookup/TextFileInitializer). `AssetFileDef`s should be added to the collection in order to identify the string tensor to which the file names should be passed. Corresponding `AssetFileDef`s are extracted from the export passes of the TF Quantizer. Asset files are copied to the output Saved Model directories. PiperOrigin-RevId: 511932899
| Commit: | 5eaf871 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | Michael Hudgins | |
Merged commit includes the following changes:

- 511837617 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 509256232.
- 511836298 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA:CPU] Outline fusion regions in the presence of implicit constant-like operands
- 511832706 by A. Unique TensorFlower<gardener@tensorflow.org>: Remove decorator usage for parallel devices as a first step in removing `control_flow_ops.py`'s dependency on `def_function.py` to eliminate circular dependencies.
- 511832154 by A. Unique TensorFlower<gardener@tensorflow.org>: [StableHLO to MHLO] Handle bounds of Gather op. Based on: https://github.com/openxla/stablehlo/pull/908
- 511832050 by A. Unique TensorFlower<gardener@tensorflow.org>: Specify `save_type='checkpoint'` when calling `trackable_children`.
- 511830259 by A. Unique TensorFlower<gardener@tensorflow.org>: Uses the same clustering algorithm in the TF2XLA bridge for TAC passes. They have better results than the current clustering in the number of clusters.
- 511830057 by A. Unique TensorFlower<gardener@tensorflow.org>: Add an option to strip location information from ops. These end up becoming huge during large fusions (like TpuRewritePass) and in practice don't help much with readability, as locations for TF are typically indicated by the op name.
- 511826780 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] Add all-gather-start/done multi-operand shape inference tests
- 511815596 by A. Unique TensorFlower<gardener@tensorflow.org>: Describe potential problem in case if the LLVM toolchain is used.
- 511810373 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/1ed8df8df17f431936090acde5122456c5eed394.
- 511801672 by A. Unique TensorFlower<gardener@tensorflow.org>: Plumb CollectiveAllToAllV2 to MLIR. Also fix an inconsistency regarding CollectiveGatherV2. Follow up of https://github.com/tensorflow/tensorflow/pull/59598
- 511799604 by A. Unique TensorFlower<gardener@tensorflow.org>: Integrate LLVM at llvm/llvm-project@219ba2fb7b0a. Updates LLVM usage to match [219ba2fb7b0a](https://github.com/llvm/llvm-project/commit/219ba2fb7b0a)
- 511797844 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/9b50571c5b76f32103ad5099b0993581af3d0592.
- 511793598 by A. Unique TensorFlower<gardener@tensorflow.org>: Update type-checks of `DatasetV1` and `DatasetV2` to use abstract types. Changes usages of the internal `DatasetV1` and `DatasetV2` types to use the `tensorflow.types.data` versions instead of the concrete implementations. This helps reduce the tendency for cyclic dependencies involving the `dataset_ops.py` module. Usages of the concrete type (e.g. instantiation, member access) are not affected by this change.
- 511765379 by A. Unique TensorFlower<gardener@tensorflow.org>: [GmlSt] Remove 'distribute' attribute from ParallelOp tiling params.
- 511760944 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 510948939.
- 511756866 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 482514900.
- 511749935 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA:CPU Next] Disable tiling of linalg.generic.
- 511746972 by A. Unique TensorFlower<gardener@tensorflow.org>: [GmlSt] Do not tile linalg.generic if it was tiled already. The "tiled labels" are not populated correctly. Theoretically, we shouldn't check for the parent.
- 511745314 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA:CPU Next] Fix scalarization of scf.for. If the type of the output is not specified, then FromElementsOp creates a 1D tensor: `void FromElementsOp::build(OpBuilder &builder, OperationState &result, ValueRange elements) { assert(!elements.empty() && "expected at least one element"); Type resultType = RankedTensorType::get( {static_cast<int64_t>(elements.size())}, elements.front().getType()); build(builder, result, resultType, elements); }` The test that I had before in scalarization.mlir was 1D by coincidence and therefore worked.
- 511741158 by A. Unique TensorFlower<gardener@tensorflow.org>: Fill calculates the output tensor if all the information required is available during Prepare. This means that the output will be available for subsequent operator's Prepare and Eval will be free
- 511737032 by A. Unique TensorFlower<gardener@tensorflow.org>: [GmlSt] Remove gml_st.materialize.
- 511736776 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA:CPU Next] Disable scf.if vectorization.
- 511734615 by A. Unique TensorFlower<gardener@tensorflow.org>: [DelegatePerformance] Removed a strengthened precondition and changed the metric value type from float to double. The change removes the strengthened precondition from the inherited method computeModelReport() in the derived class ModelBenchmarkReport. It also changes the metric value type from float to double to avoid potential calculation errors.
- 511730756 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA:CPU Next] Add a pattern to tile linalg.generic to 1 and fuse greedily.
- 511726589 by A. Unique TensorFlower<gardener@tensorflow.org>: [GmlSt] Remove useless peeling label.
- 511719057 by A. Unique TensorFlower<gardener@tensorflow.org>: Transforming grouped convolution to depth wise when possible.
- 511716904 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #59775: Fix tensor_or_memref build error without math.h. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/59775 https://github.com/tensorflow/tensorflow/pull/59536#issuecomment-1439039311 Since previous PR has been rolled back, this one adopted reviewer's feedback to fix Mac build error. Thanks @reedwm for the feedback and suggestion! Copybara import of the project:
  - 4f9f856e0159f5dd7c4e8d0e3b5232e55795f700 by Chao Chen <cchen104@amd.com>: fix tensor_or_memref build error without math.h

  Merging this change closes #59775
- 511715187 by A. Unique TensorFlower<gardener@tensorflow.org>: compat: Update forward compatibility horizon to 2023-02-23
- 511715185 by A. Unique TensorFlower<gardener@tensorflow.org>: Update GraphDef version to 1416.
- 511712677 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] Further speedup HloModule::Print by using integer key for CanonicalNameMap.
- 511712612 by A. Unique TensorFlower<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops.
- 511708539 by A. Unique TensorFlower<gardener@tensorflow.org>: Update ops-related pbtxt files.
- 511706378 by A. Unique TensorFlower<gardener@tensorflow.org>: Add SegmentMaxV2op with num_segments as additional input. The only difference with SegmentMax is the additional input `num_segment`. This helps in evaluating the output shape in compile time.
- 511703897 by A. Unique TensorFlower<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops.
- 511703121 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #59616: ReLU Epilogue Fusion for FP8 GEMMs in XLA. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/59616 Enables the epilogue fusion of ReLU activations for FP8 GEMMs. Copybara import of the project:
  - 8813bb2940dd4e5d53bb4092c1a1ee2a2f00a13b by Philipp Hack <phack@nvidia.com>: Epilogue fusion of ReLU activations for FP8 GEMMs.
  - c9ee1d5869eaafa7e89842e9a460b96c9b912481 by Philipp Hack <phack@nvidia.com>: Epilogue fusion of ReLU activations for FP8 GEMMs.

  Merging this change closes #59616
- 511693497 by A. Unique TensorFlower<gardener@tensorflow.org>: Add SegmentMinV2op with num_segments as additional input. The only difference with SegmentMin is the additional input `num_segment`. This helps in evaluating the output shape in compile time.
- 511692851 by A. Unique TensorFlower<gardener@tensorflow.org>: Go: Update generated wrapper functions for TensorFlow ops.
- 511690064 by A. Unique TensorFlower<gardener@tensorflow.org>: [jax2tf] Use CUDA and ROCM instead of GPU for XlaCallModuleOp platforms. JAX is moving to using ROCM and CUDA instead of the generic GPU platform type and it is already supporting separate lowerings for ROCM and CUDA. To keep up with this functionality, we move the XlaCallModuleOp to supporting ROCM and CUDA platforms.
- 511687577 by A. Unique TensorFlower<gardener@tensorflow.org>: VLOG(1) upon fallback in phase 2. Whether the new or old bridge ran is usually the first thing to find out when debugging. So when fallback from new bridge to old bridge happens, it should be reported with log level 1.
- 511684159 by A. Unique TensorFlower<gardener@tensorflow.org>: Cast BF16 Depthwise conv2D ops to f32 ops. This change is to cast BF16 Depthwise Conv2d ops to f32 to make it ready for quantization. But, as the Depthwise conv2D quantization is disabled due to performance improvement issue for now, this change does not guarantee the BF16 Depthwise Conv2D quantization.
- 511677183 by A. Unique TensorFlower<gardener@tensorflow.org>: Added a util function to process einsum
- 511676949 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data Retry empty repetitions when repeating data service dataset. The ForeverRepeat op assumes if the first repetition produces no data, all future repetitions will produce no data. That is not always true. For example, when using tf.data service, different repetitions may produce different numbers of elements, and empty repetitions should be retried.
- 511657402 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] More speedups to HloModule::Print.
- 511647352 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 511611390.
- 511644926 by A. Unique TensorFlower<gardener@tensorflow.org>: Refactor TfrtPipelineOptions to a separate file so that it can be reused.
- 511642132 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/410c7f3b07f1d1170b13242a2cf9af4ec7edd33f.
- 511640480 by A. Unique TensorFlower<gardener@tensorflow.org>: Update `DimsAre` matcher to be polymorphic over both `TfLiteTensor` and `TfLiteIntArrays`.
- 511639990 by A. Unique TensorFlower<gardener@tensorflow.org>: Fix Relayout handling under XLA SPMD. According to the plans by samuelslee@, now that chensunx@ contributed the pass to lower Relayout to Identity.
- 511639775 by A. Unique TensorFlower<gardener@tensorflow.org>: Integrate LLVM at llvm/llvm-project@a7b6978285c1. Updates LLVM usage to match [a7b6978285c1](https://github.com/llvm/llvm-project/commit/a7b6978285c1)
- 511637257 by A. Unique TensorFlower<gardener@tensorflow.org>: Include full node_def info in activity watcher.
- 511637139 by A. Unique TensorFlower<gardener@tensorflow.org>: Update TFRT dependency to use revision http://github.com/tensorflow/runtime/commit/bd588322b3d6660903ee0df9b55d2f589aa5dc8a.
- 511632436 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 511597567.
- 511611390 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #57956: Performance Enhancements for Sparse Embedding Lookups. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/57956 Introduces performance options for sparse embedding lookups that can appreciably speed up the training of recommendation systems. Sparse lookups alternatively accept inputs described by RaggedTensors which are more memory efficient. Performance is further increased by the optional use of a simplified and typically faster embedding lookup. In the sparse embedding micro benchmarks in tensorflow/python/eager/benchmarks_test.py, the number of examples per second on a DGX A100 system increases from approx. 1,300 with SparseTensor and without simplified lookup to approx. 11,200 with RaggedTensor inputs and simplified lookup (+760%). The combination of SparseTensor inputs and simplified lookup yields approx. 3,000 examples per second (+130%). Copybara import of the project:
  - 00ee1a9deea66c511b4e6b32f961c7598a299db3 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - fe6eb184ab75dfb7399919e617014d4e61388221 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 47157c3e57ef7700044e6627563c813eafc7ba9c by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 8826a9b387e32223f59397e7047c9f29f3fea674 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 4e770724eddc7450b82c58f89a56968c1ec0aa99 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 3d1096aafcecbdf83d700272e088c2225ec058d6 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - c115b80151098e3c574c52625cee97c4c41af40f by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 842591cef9cffa40b32e176ded14453bfbc96678 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 2a307f3f09a88b26586ad81685685a78ccfebcd1 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - ee90704c034d0c04557e7870670f10c17a999290 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 9e216e3d3f541d07951b00b457e7f764300b5da8 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - c393302d3804b35dacedba125448da080f68af8b by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 5165f071ebf6d0d7568789e6f5e9034707149b61 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - f3e88c4c6d75910c3cecbe310d779048cc84c5a6 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 83847914532dbab5fb2ac052983fdc85cde81c49 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 497f9c39efe136ba9539f46f1625996a1a81b8f4 by Philipp Hack <phack@nvidia.com>: Adds performance enhancements for sparse embedding lookups.
  - 2166e5defae2f93b66bee8544f544c033e0ec66b by Philipp Hack <phack@nvidia.com>: Performance enhancements for sparse embedding lookups.
  - 13dd2c877d75f043be1ce014ceae1027a9876d41 by Philipp Hack <phack@nvidia.com>: Performance enhancements for sparse embedding lookups.
  - a83f4d4890ca7f0ad3982a8d224a15be1ee21ee9 by Philipp Hack <phack@nvidia.com>: Performance enhancements for sparse embedding lookups.

  Merging this change closes #57956
- 511609283 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data Ramp down `autotune_buffer_optimization` experiment.
- 511606842 by A. Unique TensorFlower<gardener@tensorflow.org>: Avoid double counting graph building time for nested functions.
- 511606056 by A. Unique TensorFlower<gardener@tensorflow.org>: PR #59619: [NVIDIA TF] Throw error only for non-empty function definitions. Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/59619 During Function Serialization and Deserialization with Saved Models there can be Node ops with type `func` with default values that have no associated name. In such cases, we shouldn't look for undefined empty strings in Function Library. This is a trivial change and helps with how Saved Models are loaded. Copybara import of the project:
  - e0fe60f9a9db8886cf423949bbd2e5da796397ae by Pavani Majety <pmajety@nvidia.com>: [BugFix] Throw error only for non-empty function definitions. Add reason for the required change. Add comment to both locations.

  Merging this change closes #59619
- 511605660 by A. Unique TensorFlower<gardener@tensorflow.org>: Fix typo in comment
- 511604096 by A. Unique TensorFlower<gardener@tensorflow.org>: #tf-data-service Put distributed_save_test and snapshot_ft_test together.
- 511603520 by A. Unique TensorFlower<gardener@tensorflow.org>: update tracking bug now that the step id is propagated correctly.
- 511602286 by A. Unique TensorFlower<gardener@tensorflow.org>: add a size() member function to HloProtoMap.
- 511599929 by A. Unique TensorFlower<gardener@tensorflow.org>: gpu_delegate: Update to support Mali G715
- 511597567 by A. Unique TensorFlower<gardener@tensorflow.org>: [tf-lite] Enable parallel transpose.
- 511596662 by A. Unique TensorFlower<gardener@tensorflow.org>: add an aggregated stats for per step result. the goal is to remove all_reduce_db_per_core. we will lose the capabilities of separate compute time and synchronization time for TPU only. but I think it is fine, most of collective in TPU now are async ops, so these algorithm is no longer very informative. we will count all async collectives as synchronization time for tpu. for gpu, this doesn't apply.
- 511590747 by A. Unique TensorFlower<gardener@tensorflow.org>: [PJRT C API] Add a README file to provide communication channel and resources.
- 511585380 by A. Unique TensorFlower<gardener@tensorflow.org>: Automated rollback of changelist 511565925.
- 511578143 by A. Unique TensorFlower<gardener@tensorflow.org>: [XLA] Minor fixes in ShardingPropagation.
  - Moves the misplaced comment block of replicate_on_last_tile_dim_.
  - Uses existing variable root_instr to eliminate redundant accesses.
  - Replaces bitwise and with logical and.
  - Uses reverse iterator instead of reversing the list.
- 511569312 by A. Unique TensorFlower<gardener@tensorflow.org>: Remove `type_spec` direct dependency on `framework/ops.py`. Changes `type_spec` references to `tf.Tensor` to use an abstract base type for `isinstance` checks. Uses the conversion function in `tensor_conversion_registry` instead of its wrapper in `ops`. While `tensor_conversion_registry` has an indirect dependency on `ops.py`, there is future work planned to remove that dependency.
- 511568795 by A. Unique TensorFlower<gardener@tensorflow.org>: Check the shape of indices matches the shape of dense_shape in sparse_fill_empty_rows op
- 511567163 by A. Unique TensorFlower<gardener@tensorflow.org>: [xla-next][mlir][sparse] add mhlo sparsity rewriting to pipeline
- 511565925 by A. Unique TensorFlower<gardener@tensorflow.org>: Update visibility to fix OSS build

PiperOrigin-RevId: 511837617
| Commit: | 8d163fb | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Introduce a new flag to enable TPU model quantization support. As the TPU model quantization support is unstable, we introduce a flag to trigger the feature for other users to avoid unexpected errors. PiperOrigin-RevId: 511338951
| Commit: | ec1f4e4 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Removed unused import. PiperOrigin-RevId: 511321244
| Commit: | 188e9f9 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Add extension field in GoogleEdgeTpuSettings. PiperOrigin-RevId: 511237828
| Commit: | 3c2afe0 | |
|---|---|---|
| Author: | David Rim | |
| Committer: | TensorFlower Gardener | |
Add experimental flag to disable layer norm and other mul->fc fusing. PiperOrigin-RevId: 510440994
| Commit: | 5a10d92 | |
|---|---|---|
| Author: | Wilsin Gosti | |
| Committer: | TensorFlower Gardener | |
#tf-data Save gap times used by stage based autotune in model proto. PiperOrigin-RevId: 510257921
| Commit: | e027bb7 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Fix collective shape mismatch error on GPU and CPU. The key for InstanceRec is extended to also include the step_id, such that collective instances from different steps become independent from each other, as they conceptually are. This aligns the treatment of Collective Ops and Send/Recv Ops (via the Rendezvous) under DTensor. A (hacky but effective) special case filtering is added to only enable this under DTensor without changing the V2 Collective Op signatures. MWMS does not (always) guarantee steps from different workers have the same step_id, and cannot be enrolled into this change. The test case demonstrates the typical failure pattern. Note the test case fails on TPU for a different reason (fixing on TPU needs dynamic shape support). PiperOrigin-RevId: 510237456
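The effect of adding `step_id` to the record key can be pictured with a small table keyed by a tuple. This is a conceptual sketch only: `InstanceKey`, `InstanceTable`, and `register` are hypothetical names, and the real `InstanceRec` handling in TensorFlow's C++ runtime is not shown in this log.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class InstanceKey:
    """Hypothetical lookup key: the collective's instance key plus the step."""
    instance_key: int
    step_id: int


class InstanceTable:
    """Toy table that rejects shape mismatches within one (instance, step)."""

    def __init__(self) -> None:
        self._records: Dict[InstanceKey, Tuple[int, ...]] = {}

    def register(self, instance_key: int, step_id: int,
                 shape: Tuple[int, ...]) -> Tuple[int, ...]:
        key = InstanceKey(instance_key, step_id)
        recorded = self._records.setdefault(key, shape)
        if recorded != shape:
            raise ValueError(f"shape mismatch for {key}: {recorded} vs {shape}")
        return recorded


table = InstanceTable()
table.register(7, step_id=1, shape=(4, 8))
# Same instance key in a later step with a different shape: no clash,
# because the step_id is part of the key.
table.register(7, step_id=2, shape=(2, 8))
```

If the key held only `instance_key`, the second call would raise, which mirrors the failure pattern the commit message describes.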
| Commit: | 688b63d | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Define GoogleEdgeTpuSettings in configuration.proto. PiperOrigin-RevId: 510178940
| Commit: | 342af17 | |
|---|---|---|
| Author: | Johannes Reifferscheid | |
| Committer: | TensorFlower Gardener | |
Add a flag for experimental deallocation. PiperOrigin-RevId: 510105479
| Commit: | 9b4ed39 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Source data transfer addresses from the worker server. This doesn't change any existing functionality but will let users opt out of a forthcoming data transfer experiment. Overview of changes:

- The selection of a default data transfer protocol now happens in `DataServiceClient` rather than `DataServiceDatasetOp`.
- Servers now convey to clients a list of (protocol, address) pairs rather than a single address, and clients choose from this list using the user-specified protocol.

PiperOrigin-RevId: 509993158
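The client-side selection described above can be sketched as a lookup over the advertised pairs. This is a minimal illustration, not the `DataServiceClient` code: the function name, the protocol strings, and the addresses are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def choose_transfer_address(
    offered: List[Tuple[str, str]], requested_protocol: str
) -> Optional[str]:
    """Return the address advertised for the requested protocol, if any.

    `offered` is the (protocol, address) list a worker server conveys.
    """
    for protocol, address in offered:
        if protocol == requested_protocol:
            return address
    return None


# Hypothetical list advertised by one worker.
offered = [("grpc", "worker0:5000"), ("local", "worker0")]

selected = choose_transfer_address(offered, "grpc")
missing = choose_transfer_address(offered, "unknown")
print(selected, missing)
```

Returning `None` for an unknown protocol leaves the fallback policy to the caller, which the commit message does not specify.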
| Commit: | eee393b | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Update flatbuffer generated configuration_generated.h.oss PiperOrigin-RevId: 509876768
| Commit: | 3a39e69 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Extends `MetricsHookInterface` with the ability to export more granular compilation metrics.

1. Extends `CompilationLogEntry` with `pass_metrics` defined by a new `PassMetrics` proto.
2. Renames `RecordStageLatency` to the more generic `RecordCompilationMetrics`.

PiperOrigin-RevId: 509867105
| Commit: | e57e25d | |
|---|---|---|
| Author: | Ilia Sergachev | |
| Committer: | TensorFlower Gardener | |
[XLA:GPU] Add a flag to use Triton-based GEMM emitter for any GEMM that it supports. Also move Triton-GEMM-related code closer together. PiperOrigin-RevId: 509781425
| Commit: | d3509a4 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
[tf.data] Implement `warm_start` feature for all the asynchronous operations. PiperOrigin-RevId: 509533594
| Commit: | b49cc3f | |
|---|---|---|
| Author: | Swachhand Lokhande | |
| Committer: | TensorFlower Gardener | |
Add a compiled_using_pjrt field to persistent cache key. This is to differentiate the executables that would be built by XLA and PJRT clients when they are being serialized to disk for persistence. There would be no changes to the filenames of XLA serialized executables. PJRT serialized executables will have '__pjrt' appended to their filenames. This also adds the SerializedExecutable function back to PjRtDeviceCompilerClient. Removes XLA dependencies for `//tensorflow/lite/delegates/flex:delegate_data`. A unit test indirectly depends on `//tensorflow/core/ops:string_ops_op_lib` under `//tensorflow/compiler/jit:xla_kernel_creator`. This adds the dependency to the test target directly instead. PiperOrigin-RevId: 509277222
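The naming rule stated in the message can be sketched as follows. The function and the sample base name are hypothetical, and where exactly the suffix sits relative to any file extension is an assumption; only the `'__pjrt'` suffix itself comes from the commit message.

```python
def cache_entry_name(base_name: str, compiled_using_pjrt: bool) -> str:
    """Hypothetical helper: append '__pjrt' only for PJRT-built executables."""
    if compiled_using_pjrt:
        return base_name + "__pjrt"
    return base_name  # XLA-built executables keep their existing name


xla_name = cache_entry_name("cluster_0__abc123", compiled_using_pjrt=False)
pjrt_name = cache_entry_name("cluster_0__abc123", compiled_using_pjrt=True)
print(xla_name, pjrt_name)
```

Keeping the XLA name unchanged means previously persisted entries would still be found under the same name.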
| Commit: | 90f4292 | |
|---|---|---|
| Author: | Jun Xu | |
| Committer: | TensorFlower Gardener | |
Asynchronously remove functions in remote workers. Before this CL, context.remove_function only removed function_defs from the local context. After this CL, function_defs in remote contexts should also be removed. This should prevent duplicated function_defs errors when running transformed functions in a distributed environment. This should also fix a memory leak in remote devices, since an EagerDefinedFunction that gets deleted will now also be deleted from the remote eager context. PiperOrigin-RevId: 509259962
| Commit: | f687b71 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
[DelegatePerformance] Replace initialization_max_regression_percentage_allowed and average_warm_up_max_regression_percentage_allowed with startup_overhead_max_regression_percentage_allowed, where startup overhead time = initialization time + average warmup time - average inference time. PiperOrigin-RevId: 509218901
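As a worked instance of the formula above, here is a short sketch. The function name, the microsecond unit, and the sample timings are assumptions chosen for illustration; only the formula itself is taken from the commit message.

```python
def startup_overhead_us(
    initialization_us: float,
    average_warmup_us: float,
    average_inference_us: float,
) -> float:
    """startup overhead = initialization + average warmup - average inference."""
    return initialization_us + average_warmup_us - average_inference_us


# Made-up timings: 1200 us init, 900 us warmup, 400 us steady-state inference.
overhead = startup_overhead_us(1200.0, 900.0, 400.0)
print(overhead)
```

Subtracting the average inference time removes the cost that a warmup run would incur anyway, leaving only the extra time attributable to startup.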
| Commit: | f775935 | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Remove unused variable shared_names from TF Quantizer's `ExportedModel`. Since TF quantizer now uses `SaverDef` to restore / save variables, `variable_shared_names` of `ExportedModel` is no longer used from the Python side. This change cleans up the production & use of `variable_shared_names` in the TF Quantizer's c++ API. PiperOrigin-RevId: 509160886
| Commit: | e692db8 | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Use Saver to save or restore variables for TF Quantizer. This change fixes a workaround for restoring variables by using `Saver`. In the previous workaround, empty variables with the same `shared_name` were created to restore from checkpoint, but this caused problems especially when the variable's node name and shared name differed. Using `Saver` follows the saved model semantics more closely and is the method used from within the saved model library. Additionally, to make the saving and restoring behavior more correct, it now sets the location of each `VarHandleOp` to be the same as its shared_name from `InsertRestoreOpPass`. The name of the VarHandleOp should be set to its shared_name. This is required when the exported model is re-imported as MLIR. The MLIR import process looks at the variable's location name (== node name) before looking at the shared_name to restore variables. When the two names are different, the session cannot find the node in the graph to get the variables' tensor values. PiperOrigin-RevId: 509112645
| Commit: | 3e24055 | |
|---|---|---|
| Author: | Sergey Kozub | |
| Committer: | TensorFlower Gardener | |
Add support for emitting the int8x32 cuDNN convolution reordering custom calls.

1) Add a debug flag `xla_gpu_enable_cudnn_int8x32_convolution_reordering` (disabled by default).
2) If the flag is set, add reordering custom call during the vectorization pass.
3) Update convolution layout normalization callback to make sure both the input and the output to the reordering custom call have NCHW_VECT_C physical layout.
4) Correctly set convolution and filter handle attributes in the DNN implementation.

PiperOrigin-RevId: 508678175
| Commit: | c9f8e0f | |
|---|---|---|
| Author: | Sergey Kozub | |
| Committer: | TensorFlower Gardener | |
Add `reordered_int8_nchw_vect` flag to convolution backend proto. This is necessary to disambiguate layouts that could not be otherwise detected by XlaConvShapesToStreamExecutorLayouts, in this case int8x32 reordered filter and bias. PiperOrigin-RevId: 508586274
| Commit: | a5dedd2 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Internal change PiperOrigin-RevId: 508522408
| Commit: | 7ee0a6b | |
|---|---|---|
| Author: | Swachhand Lokhande | |
| Committer: | TensorFlower Gardener | |
Add a compiled_using_pjrt field to persistent cache key. This is to differentiate the executables that would be built by XLA and PJRT clients when they are being serialized to disk for persistence. There would be no changes to the filenames of XLA serialized executables. PJRT serialized executables will have '__pjrt' appended to their filenames. This also adds the SerializedExecutable function back to PjRtDeviceCompilerClient. Removes XLA dependencies for `//tensorflow/lite/delegates/flex:delegate_data` PiperOrigin-RevId: 508512797
| Commit: | f8ffe16 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Added hardware_cluster_ids to the EdgeTPU settings. PiperOrigin-RevId: 508445790
| Commit: | bfd2853 | |
|---|---|---|
| Author: | Justin Lebar | |
| Committer: | TensorFlower Gardener | |
Remove ConfigProto.experimental.friendly_name in favor of experimental.session_metadata.name. In cl/507687578 I added ConfigProto.experimental.friendly_name and used it to set the name on XLA clusters created by autoclustering. I didn't realize that we already had experimental.session_metadata.name, which is basically the same thing. So this CL removes friendly_name and converts autoclustering to use session_metadata.name. Very sorry for the churn. PiperOrigin-RevId: 508434449
| Commit: | 19f20fd | |
|---|---|---|
| Author: | Justin Lebar | |
| Committer: | TensorFlower Gardener | |
Add experimental `friendly_name` config option. Today when you use XLA autoclustering, the XLA module names have the form e.g. cluster_15111732669523428041_0__XlaCompiledKernel_true__XlaHasReferenceVars_false__XlaNumConstantArgs_0__XlaNumResourceArgs_0_.2393 If you have a program with many TensorFlow nets, it is hard to tell which cluster_1234 belongs to which net. With this change, you can set a name in the TF session config and the name gets propagated down to the XLA cluster name: my_friendly_name_15111732669523428041_0__<snip> For now this only works for nets that enable XLA via autoclustering. I'd like to do something similar for nets that use tf.function and inference-converter. PiperOrigin-RevId: 507687578
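The naming effect described in this commit can be sketched as below. This is only an illustration of swapping the `cluster` prefix for a session-level name; `xla_cluster_name` and its parameters are hypothetical and do not mirror the actual autoclustering code.

```python
def xla_cluster_name(fingerprint: int, index: int, session_name: str = "") -> str:
    """Builds the leading part of a cluster name (hypothetical helper).

    Falls back to the generic "cluster" prefix when no session name is set.
    """
    prefix = session_name or "cluster"
    return f"{prefix}_{fingerprint}_{index}"


# Without a configured name the generic prefix is used.
print(xla_cluster_name(15111732669523428041, 0))
# With a configured name the prefix identifies the net.
print(xla_cluster_name(15111732669523428041, 0, "my_friendly_name"))
```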
| Commit: | 4941c8c | |
|---|---|---|
| Author: | Roman Dzhabarov | |
| Committer: | TensorFlower Gardener | |
[xla/metrics] Clarify comment on the task_index in the metrics.proto. PiperOrigin-RevId: 507599269
| Commit: | f158f88 | |
|---|---|---|
| Author: | Ilia Sergachev | |
| Committer: | TensorFlower Gardener | |
[XLA:GPU] Add triton-based matmul emitter. PiperOrigin-RevId: 507498160
| Commit: | cf48610 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Handle snapshot and stream completion in tf.data service dispatcher. PiperOrigin-RevId: 507023442
| Commit: | 03ea9a8 | |
|---|---|---|
| Author: | Marcello Maggioni | |
| Committer: | TensorFlower Gardener | |
[XLA] Add way to allow propagation to output only to a subset of root instruction tuple shardings. PiperOrigin-RevId: 506935285
| Commit: | 5a86ef6 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
[XLA] Create skeleton for a partition assignment pass, which annotates the given module with (good) shardings, by adding: - an HLO pass: PartitionAssignment - a base class: PartitioningAlgorithm, - a no-op derived class extending PartitioningAlgorithm: NoopPartitioning, and - a flag to determine the algorithm (kind/type): xla_partitioning_algorithm. PiperOrigin-RevId: 506423268
| Commit: | 846ebbb | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Remove time (AKA time_fraction) field, since it's no longer used. We now compute this in the frontend to avoid storing this redundant field in the protobuf. PiperOrigin-RevId: 506372540
| Commit: | 8c77288 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Manage snapshot streams assignments in tf.data service dispatcher. Related changes: - Added `DispatcherService::GetSnapshotStreams`, a new readonly API for seeing the state of snapshot stream assignments from the dispatcher's perspective. - Made `DispatcherConfig.worker_timeout_ms` configurable. PiperOrigin-RevId: 506287683
| Commit: | 4ffdd6f | |
|---|---|---|
| Author: | Ian Hua | |
| Committer: | TensorFlower Gardener | |
Fix build breakage for DPB. PiperOrigin-RevId: 506261904
| Commit: | c31e086 | |
|---|---|---|
| Author: | Yang Chen | |
| Committer: | TensorFlower Gardener | |
#tf-data-service SnapshotSplitProvider verifies local split indexes. It compares the expected next split index with the one received from the dispatcher. If not equal, it reads the split from files. If equal, it will return the result from the dispatcher. PiperOrigin-RevId: 505797858
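A minimal sketch of the comparison this commit describes, assuming splits can be represented as strings. `next_split` and `read_split_from_files` are hypothetical names, not the `SnapshotSplitProvider` API.

```python
from typing import Callable


def next_split(expected_index: int,
               dispatcher_index: int,
               dispatcher_split: str,
               read_split_from_files: Callable[[int], str]) -> str:
    """Returns the split the worker should process next (hypothetical helper)."""
    if dispatcher_index != expected_index:
        # The dispatcher is ahead of the worker, so recover the expected
        # split from the files the dispatcher has already written.
        return read_split_from_files(expected_index)
    # Indexes agree, so the dispatcher's answer can be used directly.
    return dispatcher_split


# Stand-in for splits already written to the file system.
files = {2: "split-2-from-files"}
print(next_split(2, 2, "split-2-from-dispatcher", files.__getitem__))
print(next_split(2, 3, "split-3-from-dispatcher", files.__getitem__))
```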
| Commit: | 7456b69 | |
|---|---|---|
| Author: | Yishuang Pang | |
| Committer: | TensorFlower Gardener | |
Add a compiled_using_pjrt field to persistent cache key. This is to differentiate the executables that would be built by XLA and PJRT clients when they are being serialized to disk for persistence. There would be no changes to the filenames of XLA serialized executables. PJRT serialized executables will have '__pjrt' appended to their filenames. PiperOrigin-RevId: 505721399
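The filename rule above can be sketched as follows. Only the `'__pjrt'` suffix comes from the commit message; the base filename and the helper `serialized_executable_filename` are assumptions made for illustration.

```python
def serialized_executable_filename(base_name: str, compiled_using_pjrt: bool) -> str:
    """Appends '__pjrt' for PJRT-built executables (hypothetical helper).

    XLA-built executables keep their existing filename unchanged.
    """
    suffix = "__pjrt" if compiled_using_pjrt else ""
    return f"{base_name}{suffix}"


print(serialized_executable_filename("my_cache_entry", compiled_using_pjrt=False))
print(serialized_executable_filename("my_cache_entry", compiled_using_pjrt=True))
```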
| Commit: | 7d5c1fe | |
|---|---|---|
| Author: | Yang Chen | |
| Committer: | TensorFlower Gardener | |
#tf-data-service GetSnapshotSplit returns the split indexes. The worker needs them to validate that the splits it receives and the splits it reads from files do not have gaps. For example, in the following sequence: worker requests the kth split -> dispatcher receives request -> worker dies -> worker restarts -> worker reads from files and processes (k-1) splits -> dispatcher writes the kth split -> worker requests the next split -> dispatcher returns the (k+1)th split. In the above sequence, the worker misses the kth split. If the worker knows the split index, it will wait for the kth split to appear in the file system when it receives the (k+1)th split. PiperOrigin-RevId: 505144776
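The gap in the sequence above can be detected with a simple index comparison. The sketch below is an assumption-laden illustration: `SplitResponse` and `missing_split_indexes` are hypothetical and are not part of the tf.data service API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitResponse:
    """Hypothetical stand-in for the dispatcher's reply, including the index."""
    split_index: int
    payload: str


def missing_split_indexes(next_expected_index: int,
                          response: SplitResponse) -> List[int]:
    """Lists the splits the worker skipped and must wait for in the file system."""
    return list(range(next_expected_index, response.split_index))


# After restarting, the worker has processed the splits before index k,
# so it expects k next, but the dispatcher hands out k + 1.
k = 3
response = SplitResponse(split_index=k + 1, payload="split-4")
print(missing_split_indexes(k, response))
```

An empty list means the received split is the expected one and no waiting is needed.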
| Commit: | 54bb01d | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Input tf.data service worker address into snapshot split providers. PiperOrigin-RevId: 505136715
| Commit: | e3cbf21 | |
|---|---|---|
| Author: | Ilia Sergachev | |
| Committer: | TensorFlower Gardener | |
[XLA][NFC] Fix mistypes in comments and strings. PiperOrigin-RevId: 505097297
| Commit: | dfe994a | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Add tf.data service worker readonly API for getting the progress of snapshot tasks. PiperOrigin-RevId: 505091618
| Commit: | 8421f7a | |
|---|---|---|
| Author: | Catherine Payne | |
| Committer: | TensorFlower Gardener | |
Removing unused enum values for tfxla bridge flag from TF Public API PiperOrigin-RevId: 504986360
| Commit: | 34d5ec1 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Create Quantization Options for StableHLO. PiperOrigin-RevId: 504519209
| Commit: | e780748 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Add functionality to detect lost workers in tf.data service dispatcher. PiperOrigin-RevId: 504407156
| Commit: | 50cdf7a | |
|---|---|---|
| Author: | Dan Suh | |
| Committer: | TensorFlower Gardener | |
Preserve function aliases through the TF quantizer. Function aliases associate user-defined names to TF functions. This information is in the `MetaGraphDef` and is used to identify specific functions of interest (the actual function names change every time the model is saved as SavedModel). This change preserves the function aliases through the TF functions, so that the same functions (quantized, after the quantization passes) are still associated with the same aliases. This is done by preventing the aliased functions from being inlined by `MarkFunctionsNoinlinePass`. The pass receives a list of names that should not be inlined, so the names should be passed along from the quantizer's python API. PiperOrigin-RevId: 504396824
| Commit: | 952a5ed | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Remove aggregated memory BW utilization - server. This is the server-side change; the parent CL already removed this information from the frontend. Previously, we were reporting sum(bws)/sum(peak_bws). But now we report bw/peak_bw for each different memory separately. The aggregated stat doesn't mean much, since it's combining apples and oranges. Removing redundant and misleading information. PiperOrigin-RevId: 504375462
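A small worked example shows why the aggregate hides the per-memory picture. The bandwidth numbers and memory names below are invented for illustration and do not come from any real profile.

```python
# Hypothetical measurements in GB/s: (achieved bandwidth, peak bandwidth).
memories = {
    "hbm": (300.0, 1200.0),
    "on_chip_read": (900.0, 1000.0),
}

# Per-memory utilization: bw / peak_bw, reported separately.
per_memory = {name: bw / peak for name, (bw, peak) in memories.items()}

# Old aggregated stat: sum(bws) / sum(peak_bws).
total_bw = sum(bw for bw, _ in memories.values())
total_peak = sum(peak for _, peak in memories.values())
aggregated = total_bw / total_peak

print(per_memory)
print(aggregated)
```

Here one memory is at 25% and the other at 90% of its peak, while the aggregate lands near 55% and matches neither.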
| Commit: | 5760ad4 | |
|---|---|---|
| Author: | Swachhand Lokhande | |
| Committer: | TensorFlower Gardener | |
Add a compiled_using_pjrt field to persistent cache key. This is to differentiate the executables that would be built by XLA and PJRT clients when they are being serialized to disk for persistence. There would be no changes to the filenames of XLA serialized executables. PJRT serialized executables will have '__pjrt' appended to their filenames. PiperOrigin-RevId: 504338570
| Commit: | 232354c | |
|---|---|---|
| Author: | Johannes Reifferscheid | |
| Committer: | TensorFlower Gardener | |
Tool for executing IR at different compilation stages. PiperOrigin-RevId: 504216243
| Commit: | a450771 | |
|---|---|---|
| Author: | Jie Sun | |
| Committer: | TensorFlower Gardener | |
Deprecate unused computation_name. PiperOrigin-RevId: 504129492
| Commit: | aee9c7f | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
In tf.data service heartbeat proto, change snapshot task type from repeated to map. This enables a cleaner interaction between the dispatcher and snapshot manager and helps document that a worker can have only one active task per snapshot. PiperOrigin-RevId: 504071976
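The difference between the two shapes can be sketched with plain Python containers. The keys and field names here (`snapshot`, `stream_index`, snapshot path as the map key) are assumptions; the commit message does not show the proto definition.

```python
# Hypothetical task records reported in a heartbeat.
tasks_as_list = [
    {"snapshot": "/snapshots/a", "stream_index": 0},
    {"snapshot": "/snapshots/a", "stream_index": 1},  # a list cannot rule this out
]

# Keying by snapshot keeps at most one active task per snapshot:
# a later entry for the same key replaces the earlier one.
tasks_as_map = {}
for task in tasks_as_list:
    tasks_as_map[task["snapshot"]] = task["stream_index"]

print(len(tasks_as_list), len(tasks_as_map))
```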
| Commit: | c5da75a | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Fix typos in comments describing FP8 dtypes. Number of mantissa bits was incorrect. PiperOrigin-RevId: 503512825
| Commit: | 9b5d7c6 | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Move distributed snapshot task protos from dispatcher.proto to common.proto. These make more sense to be accessed from common.proto in a forthcoming CL testing worker<->dispatcher interactions. Also `TaskDef` is already in common.proto so maybe `SnapshotTaskDef` should be there, too. PiperOrigin-RevId: 503484754
| Commit: | 48f9d3d | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Adds BACKEND_PASSES stage to the `CompilationLogEntry` proto. PiperOrigin-RevId: 503476779
| Commit: | b08ccd4 | |
|---|---|---|
| Author: | Parker Schuh | |
| Committer: | TensorFlower Gardener | |
Move xplane_to_trace_events and trace_events_to_json to tsl. PiperOrigin-RevId: 503320621
| Commit: | e70ef6c | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Propagate `tensorflow.SessionMetadata` in `EagerContext` PiperOrigin-RevId: 503214970
| Commit: | d872675 | |
|---|---|---|
| Author: | Yang Chen | |
| Committer: | TensorFlower Gardener | |
#tf-data-service Distributed snapshot task def supports compression. PiperOrigin-RevId: 502978855
| Commit: | eb08bba | |
|---|---|---|
| Author: | Matt Callanan | |
| Committer: | TensorFlower Gardener | |
Create class to manage tf.data service dispatcher snapshots. This is mostly a 1:1 restructuring with the following changes: 1) Added simple snapshot recovery from on-disk state. 2) Removed all members tracking snapshot, stream, and source completion. I think these may have been structured incorrectly, and either way they weren't tested or used. I'll reevaluate when stream completion is implemented. 3) Removed some validations that weren't tested and/or were related to #1. Will add back after addressing #1. 4) Renamed directory -> path. PiperOrigin-RevId: 502934739
| Commit: | 50f28e7 | |
|---|---|---|
| Author: | Peter Buchlovsky | |
| Committer: | TensorFlower Gardener | |
[LatencyHidingScheduler] Save core frequency in schedule proto. PiperOrigin-RevId: 502719264
| Commit: | a6c5ea0 | |
|---|---|---|
| Author: | Yang Chen | |
| Committer: | TensorFlower Gardener | |
#tf-data-service Use dynamic sharding for distributed snapshots. PiperOrigin-RevId: 502694864
| Commit: | e97a908 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Use proto array to track memory bytes accessed. Also report BWs for each BW type, not just HBM. We track on-chip read, on-chip write, off-chip read+write, and all bytes. Also remove overall BW utilization, since it combines different BWs and peak BWs together, so it doesn't give a good sense of where the memory bottlenecks are. PiperOrigin-RevId: 502640430
| Commit: | 2f9d51d | |
|---|---|---|
| Author: | Yang Chen | |
| Committer: | TensorFlower Gardener | |
#tf-data-service Run `SnapshotStreamWriter` in `WorkerImpl`. The workers will run SnapshotStreamWriter when they receive requests in the dispatcher heartbeat responses. PiperOrigin-RevId: 501966318
| Commit: | c377e93 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
deprecate computation name which is redundant. PiperOrigin-RevId: 501941984
| Commit: | 74c86fa | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Clean up server-side bw fields that the frontend no longer uses. The frontend no longer uses these fields, so remove them from the server side. PiperOrigin-RevId: 501928865
| Commit: | 0e5a06d | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Adds CODE_GENERATION stage to the `CompilationLogEntry` proto. PiperOrigin-RevId: 501923237
| Commit: | dcb1ee6 | |
|---|---|---|
| Author: | Jie Sun | |
| Committer: | TensorFlower Gardener | |
deprecate computation name which is redundant. PiperOrigin-RevId: 501919730
| Commit: | a81e0c0 | |
|---|---|---|
| Author: | Rahul Joshi | |
| Committer: | TensorFlower Gardener | |
[XLA/GPU] Introduce latency hiding scheduler in XLA/GPU pipeline. - Plug in a default 'NOP' model where all instructions have the same latency. - Add a flag to enable the latency hiding scheduler; it is disabled by default. PiperOrigin-RevId: 501916365
| Commit: | 203a572 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Frontend: Add on-chip memory BW reports to the op profile xprof view. This is the follow-on CL from the previous server-side changes. Previously we only reported overall and hbm memory utilization. However, each memory has its own bandwidth and utilization, so it's better to report them separately instead of in aggregate. Eventually, we should probably remove the aggregate utilization, which is the only utilization we originally had. PiperOrigin-RevId: 501913756
| Commit: | 17f90f2 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
deprecate computation name which is redundant. PiperOrigin-RevId: 501909535
| Commit: | 7a6509d | |
|---|---|---|
| Author: | Jie Sun | |
| Committer: | TensorFlower Gardener | |
deprecate computation name which is redundant. PiperOrigin-RevId: 501888732
| Commit: | 1541e58 | |
|---|---|---|
| Author: | TensorFlower Gardener | |
Merge pull request #59098 from Tixxx:tixxx/disable_mlir_pretty_print PiperOrigin-RevId: 501769378
| Commit: | b8bb4f4 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Handle cross-program prefetch for multiple buffers PiperOrigin-RevId: 501712511
| Commit: | bae6b22 | |
|---|---|---|
| Author: | Anlun Xu | |
| Committer: | TensorFlower Gardener | |
[XLA:GPU] Enable AOT autotuning for GEMMs PiperOrigin-RevId: 501709191
| Commit: | 9bdb20f | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Add `CompilationEnvironments` to `ExecutableBuildOptions` PiperOrigin-RevId: 501676104
| Commit: | 1ddac25 | |
|---|---|---|
| Author: | TJ | |
rebase and fix compile error
| Commit: | 36d1a91 | |
|---|---|---|
| Author: | TJ | |
| Committer: | TJ | |
Change disable_pretty_form to enable_pretty_form
| Commit: | dd499a2 | |
|---|---|---|
| Author: | Mangpo Phothilimthana | |
| Committer: | TensorFlower Gardener | |
Remove flag_config_ from HloModuleConfig. PiperOrigin-RevId: 501472408
| Commit: | 20487ed | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Add comment about HBM memory space. PiperOrigin-RevId: 501427230
| Commit: | fce0f2c | |
|---|---|---|
| Author: | Daniel Chen | |
| Committer: | TensorFlower Gardener | |
Add GeneratedCodeInfo annotator to tf ops python wrapper generator. PiperOrigin-RevId: 501375027
| Commit: | 6e0da4f | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Server-side: Add on-chip memory BW reports to the op profile xprof view. This is the server-side change; the child CL will take care of the front-end changes. Previously we only reported overall and HBM memory utilization. However, each memory has its own bandwidth and utilization, so it's better to report them separately instead of in aggregate. Eventually, we should probably remove the aggregate utilization, which is the only utilization we originally had. PiperOrigin-RevId: 501330504
| Commit: | 678188d | |
|---|---|---|
| Author: | Alexander Belyaev | |
| Committer: | TensorFlower Gardener | |
[XLA:CPU Next] Add tiling and fusion transformations to XLA:CPU Next pipeline. The pipeline is disabled by default. If `enable_tiling_and_fusion` flag is true, then tiling/fusion/peeling/vectorization transformations will be used instead of linalg-elementwise fusion. PiperOrigin-RevId: 501206180
| Commit: | e02a9a6 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Add proto serialization/deserialization support for `xla::CompilationEnvironments` PiperOrigin-RevId: 501188397
| Commit: | 85cc017 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Adds HLO_PASSES stage to the `CompilationLogEntry` proto. PiperOrigin-RevId: 501126130
| Commit: | 474a2ca | |
|---|---|---|
| Author: | Tres Popp | |
| Committer: | TensorFlower Gardener | |
Remove xla_cpu_enable_mlir_lowering This is now replaced with --xla_cpu_use_xla_runtime PiperOrigin-RevId: 500955199
| Commit: | 2d0b4f2 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Add offset counter for tf ops python wrapper generator. The offset counter will calculate the byte offsets of where the REGISTER_OP is called and output the result in a text file. PiperOrigin-RevId: 500811592
| Commit: | 2039ea2 | |
|---|---|---|
| Author: | Jie Sun | |
| Committer: | TensorFlower Gardener | |
Add a timeline string to the memory view preprocess proto. 1. It holds the neato graph processed/generated by the tool and will be populated in a future CL. 2. In internal code, we use graphviz to render the neato graph and save the graphviz URL into the same field (replacing the graph). This allows a timeline graph URL to be shown in the frontend of the memory viewer tools. PiperOrigin-RevId: 500776195
| Commit: | e34f775 | |
|---|---|---|
| Author: | A. Unique TensorFlower | |
| Committer: | TensorFlower Gardener | |
Text format and parser for dynamic_shape_metadata_prefix_bytes PiperOrigin-RevId: 500764229
| Commit: | 922f519 | |
|---|---|---|
| Author: | Peter Buchlovsky | |
| Committer: | TensorFlower Gardener | |
[LatencyHidingScheduler] Add HloModule to latency-hiding schedule proto. PiperOrigin-RevId: 500687813
| Commit: | a99f386 | |
|---|---|---|
| Author: | He Jiang | |
| Committer: | TensorFlower Gardener | |
Mark is_precission_loss_allowed as obsolete based on https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/gpu/delegate_options.h PiperOrigin-RevId: 500669676