Proto commits in graphcore/tensorflow

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:a25bcc9
Author:Jack Hunt
Committer:Jack Hunt

FP8 Conv Ops
Summary: FP8 conv implementation that tries to maintain the usual conv APIs.
Test Plan: CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Subscribers: awf
Maniphest Tasks: T65648
Differential Revision: https://phabricator.sourcevertex.net/D76354

The documentation is generated from this commit.

Commit:fd3c9b5
Author:Samuel Hornby
Committer:Samuel Hornby

FP8 Matmul custom op
Summary: Provide the FP8 * FP8 matmul operation.
Test Plan: included
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh, alfiee
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Maniphest Tasks: T65647
Differential Revision: https://phabricator.sourcevertex.net/D72812

Commit:b1f1f0c
Author:George Pawelczak
Committer:George Pawelczak

Pass the executable options to serialize/deserialize
Summary: Make sure the engine is re-created with the correct options. Fixes T67132. TF1.15 only.
Test Plan: Poprun tests pass; CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, christiana, frederikm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, christiana, frederikm
Maniphest Tasks: T67132
Differential Revision: https://phabricator.sourcevertex.net/D72622

Commit:535c61b
Author:George Pawelczak
Committer:George Pawelczak

Pass the executable options to serialize/deserialize
Summary: Make sure the engine is re-created with the correct options. Fixes T67132.
Test Plan: Poprun tests pass; CI
Reviewers: christiana, frederikm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Reviewed By: christiana, frederikm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Maniphest Tasks: T67132
Differential Revision: https://phabricator.sourcevertex.net/D72619

Commit:e03ae94
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Implement ConvertToF8/ConvertFromF8 custom ops
Summary: This commit adds two custom ops for converting to and from f8 tensors, which are represented by a tuple of u8 data and a u8 metadata scalar. It also fixes tuple support in `HloPoplarTestBase::ExecuteNoHloPasses` for f8_test. Fixes T65650.
Test Plan: CI, new numeric f8_test
Reviewers: georgep, samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Reviewed By: samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Maniphest Tasks: T65650
Differential Revision: https://phabricator.sourcevertex.net/D71192

Commit:b979f29
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Replace selects which could be computed at compile time with mask fusion
Summary: This commit adds a new pass, mask_finder, that searches for selects whose condition can be computed at compile time. If such a select has a broadcast of a constant as one of its true/false operands, it can be replaced with a sequence of Poplar copy() calls. Fixes T36290.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, gauthamg
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk
Subscribers: georgep
Maniphest Tasks: T36290
Differential Revision: https://phabricator.sourcevertex.net/D41764
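When the condition of a select is a compile-time constant and one operand is a broadcast constant, one plausible lowering is to copy the other operand and then copy the constant into the positions where the condition holds. The pure-Python sketch below illustrates that equivalence for a 1-D tensor with the constant on the true side. The function names are hypothetical; this is not the mask_finder pass or the Poplar lowering itself.

```python
from typing import List, Sequence


def select_reference(mask: Sequence[bool], on_true: Sequence[float],
                     on_false: Sequence[float]) -> List[float]:
    """Element-wise select, evaluated at run time."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


def masked_copy_plan(mask: Sequence[bool]) -> List[range]:
    """'Compile time': turn a constant mask into contiguous index ranges where it is True."""
    ranges: List[range] = []
    start = None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            ranges.append(range(start, i))
            start = None
    if start is not None:
        ranges.append(range(start, len(mask)))
    return ranges


def select_as_copies(plan: List[range], constant: float,
                     on_false: Sequence[float]) -> List[float]:
    """'Run time': copy the non-constant operand, then overwrite each planned range with the constant."""
    out = list(on_false)
    for r in plan:
        out[r.start:r.stop] = [constant] * len(r)
    return out


mask = [True, True, False, False, True]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
plan = masked_copy_plan(mask)
print(select_as_copies(plan, 0.0, x))  # [0.0, 0.0, 3.0, 4.0, 0.0]
```

Because the mask is known ahead of time, the per-element comparison disappears and only a fixed list of copies remains.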

Commit:b377563
Author:Jake
Committer:Jake

Use remote buffers to store entry computation arguments and results, when available.
Summary: What's changed:
- Use remote buffers instead of data streams to handle entry computation arguments and results.
- Saves HEXOPT space when there are large streams.
- Disabled
This opens the option to only copy exactly what we need, instead of unconditionally copying everything. Ref T63600
Test Plan: CI tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, gauthamg, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, gauthamg, babakk
Subscribers: babakk, tomm
Maniphest Tasks: T63600
Differential Revision: https://phabricator.sourcevertex.net/D67730

Commit:ed6f862
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add allow-non-inplace flag to fusion config
Summary: This commit adds a flag to the fusion config that allows fusions to indicate whether they support both inplace and non-inplace variants of lowering. Fixes T61776.
Test Plan: CI, no functional changes
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk
Maniphest Tasks: T61776
Differential Revision: https://phabricator.sourcevertex.net/D66706

Commit:a85a0e5
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add allow-non-inplace flag to fusion config
Summary: This commit adds a flag to the fusion config that allows fusions to indicate whether they support both inplace and non-inplace variants of lowering. Fixes T61776.
Test Plan: CI, no functional changes
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk
Maniphest Tasks: T61776
Differential Revision: https://phabricator.sourcevertex.net/D66706

Commit:fc09013
Author:Sam Hornby
Committer:George Pawelczak

Make inputs to tensor lists uninitialised
Summary: To prevent copies before the while loops, mark these inputs as uninitialised.
Reviewers: vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd
Subscribers: babakk
Maniphest Tasks: T53525
Differential Revision: https://phabricator.sourcevertex.net/D64827

Commit:aece6fc
Author:Sam Hornby
Committer:Sam Hornby

Make inputs to tensor lists uninitialised
Summary: To prevent copies before the while loops, mark these inputs as uninitialised.
Reviewers: vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd
Subscribers: babakk
Maniphest Tasks: T53525
Differential Revision: https://phabricator.sourcevertex.net/D64827

Commit:bc6122f
Author:Jake
Committer:George Pawelczak

Add softmax and stable softmax as IPU ops.
Summary: Allow users to target the poplibs softmax and stable softmax from the Python frontend. This doesn't replace the TF2XLA softmax because I'm not sure whether it's always better. Ref T59577
Test Plan: Added a softmax test.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm
Subscribers: babakk, georgep
Maniphest Tasks: T59577
Differential Revision: https://phabricator.sourcevertex.net/D65228
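The usual difference between "softmax" and "stable softmax" is that the stable form subtracts the maximum input before exponentiating, which gives the same result mathematically but keeps exp() from overflowing. The pure-Python sketch below illustrates that difference under this common definition; it is not the poplibs implementation, and the commit does not name the Python functions that expose the new ops.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float]) -> List[float]:
    """Direct form exp(x_i) / sum_j exp(x_j); math.exp raises OverflowError for large inputs."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(x: Sequence[float]) -> List[float]:
    """Subtracts max(x) first, so every exp() argument is <= 0 and cannot overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


print(stable_softmax([1000.0, 1000.0]))  # [0.5, 0.5]
# softmax([1000.0, 1000.0]) raises OverflowError, because math.exp(1000.0) is out of range.
```

The extra max reduction is the price of stability, which is one reason a framework might keep both variants available.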

Commit:d4fc01a
Author:yanislavd
Committer:George Pawelczak

Add a `StaticMultiUpdateAdd` instruction
Summary: Add a multi update add instruction, in which the update indices are a static attribute.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk
Subscribers: babakk, georgep
Maniphest Tasks: T54057
Differential Revision: https://phabricator.sourcevertex.net/D63985
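As a rough illustration of what "multi update add with static indices" computes, the pure-Python sketch below adds each update row into the operand row named by the corresponding index, with repeated indices accumulating. It assumes row-wise updates along the first dimension and no extra scale factor; the function name is hypothetical and this is not the HLO instruction's actual definition.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def static_multi_update_add(operand: Sequence[Row], updates: Sequence[Row],
                            indices: Tuple[int, ...]) -> List[Row]:
    """Adds updates[k] into row indices[k] of operand; repeated indices accumulate.

    `indices` is a plain tuple to mirror the idea that it is fixed at compile
    time rather than supplied as a run-time tensor.
    """
    if len(updates) != len(indices):
        raise ValueError("need exactly one update row per index")
    out = [list(row) for row in operand]  # leave the input untouched
    for k, idx in enumerate(indices):
        out[idx] = [a + b for a, b in zip(out[idx], updates[k])]
    return out


result = static_multi_update_add([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
                                 [[1.0, 2.0], [3.0, 4.0]],
                                 indices=(2, 2))
print(result)  # [[0.0, 0.0], [1.0, 1.0], [6.0, 8.0]]
```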

Commit:7829694
Author:Jake
Committer:Jake

Add softmax and stable softmax as IPU ops.
Summary: Allow users to target the poplibs softmax and stable softmax from the Python frontend. This doesn't replace the TF2XLA softmax because I'm not sure whether it's always better. Ref T59577
Test Plan: Added a softmax test.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm
Subscribers: babakk, georgep
Maniphest Tasks: T59577
Differential Revision: https://phabricator.sourcevertex.net/D65228

Commit:af7718d
Author:yanislavd
Committer:yanislavd

Add a `StaticMultiUpdateAdd` instruction
Summary: Add a multi update add instruction, in which the update indices are a static attribute.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk
Subscribers: babakk, georgep
Maniphest Tasks: T54057
Differential Revision: https://phabricator.sourcevertex.net/D63985

Commit:2b42c68
Author:George Pawelczak
Committer:George Pawelczak

Merge branch 'poplar/r2.5/release' into poplar/r2.6/release

Commit:1c7babe
Author:yanislavd
Committer:yanislavd

Add a StaticMultiSlice HLO instruction
Summary: Add a StaticMultiSlice HLO instruction.
Test Plan: Numerical test
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, alfiee, jackh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Subscribers: harrym
Maniphest Tasks: T54057
Differential Revision: https://phabricator.sourcevertex.net/D62944
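The commit message does not spell out the semantics, so the following is only a sketch under an assumption: that a static multi-slice gathers whole rows of an operand at indices fixed at compile time. The function name is hypothetical and the code is pure Python, not the HLO instruction.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def static_multi_slice(operand: Sequence[Row], indices: Tuple[int, ...]) -> List[Row]:
    """Gathers the rows of `operand` named by `indices`, in that order.

    `indices` is a plain tuple to mirror a compile-time attribute; each output
    row is a copy, so mutating the result does not touch `operand`.
    """
    return [list(operand[i]) for i in indices]


table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(static_multi_slice(table, (2, 0, 2)))  # [[4.0, 5.0], [0.0, 1.0], [4.0, 5.0]]
```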

Commit:2a321d1
Author:yanislavd
Committer:yanislavd

Add a StaticMultiSlice HLO instruction
Summary: Add a StaticMultiSlice HLO instruction.
Test Plan: Numerical test
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, alfiee, jackh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Subscribers: harrym
Maniphest Tasks: T54057
Differential Revision: https://phabricator.sourcevertex.net/D62944

Commit:e0d6128
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add CloneMethod_DeduceNewOrderOrBypass
Summary: This commit adds a deduce-or-bypass clone method, which passes the input through as is unless it has an allocation target. Ref T51153.
Test Plan: CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh
Maniphest Tasks: T51153
Differential Revision: https://phabricator.sourcevertex.net/D61820

Commit:cb2bf54
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add CloneMethod_DeduceNewOrderOrBypass
Summary: This commit adds a deduce-or-bypass clone method, which passes the input through as is unless it has an allocation target. Ref T51153.
Test Plan: CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh
Maniphest Tasks: T51153
Differential Revision: https://phabricator.sourcevertex.net/D61820

Commit:1e0b974
Author:Sam Hornby
Committer:Sam Hornby

Provide option to optimise for latency
Summary: Provide an option that aims to reduce the number of packets at all costs. Extend all our visitors to have {Ap/Pre}pendToSequence methods, and always force {In/Out}feed programs to be added by these methods. `TF2.5 only` version of D60642. lint
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, jackh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh
Subscribers: alfiee, marieanne, vladimirm, harrym, babakk, jackh
Maniphest Tasks: T54941
Differential Revision: https://phabricator.sourcevertex.net/D61243

Commit:bd5df52
Author:Sam Hornby
Committer:Sam Hornby

Provide option to optimise for latency
Summary: Provide an option that aims to reduce the number of packets at all costs. Extend all our visitors to have {Ap/Pre}pendToSequence methods, and always force {In/Out}feed programs to be added by these methods. `TF1.15 only`
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, jackh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh, babakk
Subscribers: jackh, babakk, harrym, vladimirm, marieanne
Maniphest Tasks: T54941
Differential Revision: https://phabricator.sourcevertex.net/D60642

Commit:03a0de4
Author:Babak Khataee
Committer:Babak Khataee

Adding config option for controlling how much tile memory a dynamic-slice can use before being replaced.
Summary: Adding `ipu_config.slices.replace_dynamic_slice_threshold` config option for controlling how much tile memory a dynamic-slice can use before being considered for replacement.
Test Plan: CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh, vladimirm, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Subscribers: godfrey.da.costa, samho
Maniphest Tasks: T54306
Differential Revision: https://phabricator.sourcevertex.net/D61016

Commit:d05cf94
Author:Babak Khataee
Committer:Babak Khataee

Adding config option for controlling how much tile memory a dynamic-slice can use before being replaced.
Summary: Adding `ipu_config.slices.replace_dynamic_slice_threshold` config option for controlling how much tile memory a dynamic-slice can use before being considered for replacement.
Test Plan: CI
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh, vladimirm, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Subscribers: godfrey.da.costa, samho
Maniphest Tasks: T54306
Differential Revision: https://phabricator.sourcevertex.net/D61016

Commit:c5119bd
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add deferred allocation for deduce/bypass copies
Summary: Explicit copies prevent loop parameters from having a proper layout. This commit adds deferred allocation support for copies. Fixes T55199.
Test Plan: CI, new test
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T55199
Differential Revision: https://phabricator.sourcevertex.net/D59864

Commit:64e9686
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add deferred allocation for deduce/bypass copies
Summary: Explicit copies prevent loop parameters from having a proper layout. This commit adds deferred allocation support for copies. Fixes T55199.
Test Plan: CI, new test
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T55199
Differential Revision: https://phabricator.sourcevertex.net/D59864

Commit:e669b18
Author:Babak Khataee
Committer:George Pawelczak

Adding TF op for gcl::allReduceWithinReplica
Summary: Adds the op/kernel/inst and Poplar op def for gcl::allReduceWithinReplica, following the usual pattern. Adds the Python function `within_replicas.all_reduce`, which accepts a list of sharded tensors and returns the reduced results gathered over all the shards.
Test Plan: New tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, yanislavd, vladimirm, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd, vladimirm, samuelh
Maniphest Tasks: T53767
Differential Revision: https://phabricator.sourcevertex.net/D59164
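To make "reduced results gathered over all the shards" concrete, the pure-Python sketch below simulates a within-replica all-reduce on plain lists: every shard ends up holding the element-wise reduction of all shards' tensors. It assumes sum as the reduction and equal-length shards; the function name is hypothetical and this is not the `within_replicas.all_reduce` API or its exact signature.

```python
from typing import List, Sequence

Shard = List[float]


def all_reduce_within_replica(shards: Sequence[Shard]) -> List[Shard]:
    """Every shard receives the element-wise sum of all shards' tensors."""
    if not shards or any(len(s) != len(shards[0]) for s in shards):
        raise ValueError("need one or more shards of equal length")
    total = [sum(vals) for vals in zip(*shards)]
    # Each shard gets its own copy of the reduced tensor.
    return [list(total) for _ in shards]


print(all_reduce_within_replica([[1.0, 2.0], [3.0, 4.0]]))  # [[4.0, 6.0], [4.0, 6.0]]
```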

Commit:c085e64
Author:Babak Khataee
Committer:Babak Khataee

Adding TF op for gcl::allReduceWithinReplica
Summary: Adds the op/kernel/inst and Poplar op def for gcl::allReduceWithinReplica, following the usual pattern. Adds the Python function `within_replicas.all_reduce`, which accepts a list of sharded tensors and returns the reduced results gathered over all the shards.
Test Plan: New tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, yanislavd, vladimirm, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd, vladimirm, samuelh
Maniphest Tasks: T53767
Differential Revision: https://phabricator.sourcevertex.net/D59164

Commit:f96cd5f
Author:Babak Khataee
Committer:George Pawelczak

Adding TF op for gcl::reduceScatterWithinReplica
Summary: Initial op/kernel/inst/opdef for calling gcl::reduceScatterWithinReplica, following the usual pattern. Adds the Python function `within_replicas.reduce_scatter`, which accepts a list of sharded tensors and returns a tuple of reduced results scattered over the shards. TF2.4 only (TF1: D59367)
Test Plan: New tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Maniphest Tasks: T52884
Differential Revision: https://phabricator.sourcevertex.net/D58826
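A reduce-scatter combines a reduction with a split: the shards' tensors are reduced element-wise, and each shard keeps only its own piece of the result. The pure-Python sketch below assumes sum as the reduction and equal contiguous chunks, with chunk i going to shard i; the function name is hypothetical and the chunking order of the real op is not stated in the commit.

```python
from typing import List, Sequence, Tuple

Shard = List[float]


def reduce_scatter_within_replica(shards: Sequence[Shard]) -> Tuple[Shard, ...]:
    """Sums the shards element-wise, then gives shard i the i-th equal chunk of the sum."""
    if not shards or any(len(s) != len(shards[0]) for s in shards):
        raise ValueError("need one or more shards of equal length")
    n = len(shards)
    total = [sum(vals) for vals in zip(*shards)]
    if len(total) % n != 0:
        raise ValueError("tensor length must be divisible by the number of shards")
    chunk = len(total) // n
    return tuple(total[i * chunk:(i + 1) * chunk] for i in range(n))


print(reduce_scatter_within_replica([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]))
# ([11.0, 22.0], [33.0, 44.0])
```

Compared with an all-reduce, each shard ends up holding 1/n of the reduced data, which is why the result is returned as a tuple of per-shard pieces.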

Commit:dda5015
Author:Babak Khataee
Committer:George Pawelczak

Adding initial ops for targeting gcl::AllGatherWithinReplica call.
Summary: Initial op/kernel/inst/opdef for calling gcl::AllGatherWithinReplica, following the usual pattern. Adds the Python function `within_replicas.all_gather`, which accepts a list of sharded tensors and returns a gathered tensor for each shard via a tuple.
Test Plan: New tests.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, dominicm, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: vladimirm, georgep
Maniphest Tasks: T52751
Differential Revision: https://phabricator.sourcevertex.net/D58119
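An all-gather moves data without reducing it: afterwards every shard holds every shard's contribution. The pure-Python sketch below assumes the gathered tensor stacks the contributions along a new leading axis in shard order (the real op may instead concatenate); the function name is hypothetical and this is not the `within_replicas.all_gather` API.

```python
from typing import List, Sequence, Tuple

Shard = List[float]


def all_gather_within_replica(shards: Sequence[Shard]) -> Tuple[List[Shard], ...]:
    """Returns, for each shard, a copy of all shards' tensors stacked in shard order."""
    gathered = [list(s) for s in shards]
    # One independent copy of the gathered data per shard, returned as a tuple.
    return tuple([list(row) for row in gathered] for _ in shards)


print(all_gather_within_replica([[1.0, 2.0], [3.0, 4.0]]))
# ([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]])
```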

Commit:178069d
Author:Christian aan de Wiel
Committer:George Pawelczak

Some bugprone and performance fixes
Summary: Linter fixes
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: jakeh, vladimirm, babakk, georgep
Maniphest Tasks: T53781
Differential Revision: https://phabricator.sourcevertex.net/D58742

Commit:822ebd0
Author:Babak Khataee
Committer:Babak Khataee

TF1 - Adding TF op for gcl::reduceScatterWithinReplica
Summary: Initial op/kernel/inst/opdef for calling gcl::reduceScatterWithinReplica, following the usual pattern. Adds the Python function `within_replicas.reduce_scatter`, which accepts a list of sharded tensors and returns a tuple of reduced results scattered over the shards. TF1.15 only (TF2: D58826). The original diff failed to merge due to BUILD differences, and also had a runtime difference in the Python API (`tensor.ref` doesn't exist in TF1).
Test Plan: New tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee
Subscribers: alfiee
Maniphest Tasks: T52884
Differential Revision: https://phabricator.sourcevertex.net/D59367

Commit:6323583
Author:Babak Khataee
Committer:Babak Khataee

Adding initial ops for targeting gcl::AllGatherWithinReplica call.
Summary: Initial op/kernel/inst/opdef for calling gcl::AllGatherWithinReplica, following the usual pattern. Adds the Python function `within_replicas.all_gather`, which accepts a list of sharded tensors and returns a gathered tensor for each shard via a tuple.
Test Plan: New tests.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, dominicm, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: vladimirm, georgep
Maniphest Tasks: T52751
Differential Revision: https://phabricator.sourcevertex.net/D58119

Commit:eafb977
Author:George Pawelczak
Committer:George Pawelczak

Merge branch 'poplar/r2.4/release' into poplar/r2.5/release

Commit:692da6f
Author:Jake
Committer:Jake

Target poplibs GeluErf
Summary: What's changed:
- Target poplibs gelu_erf.
TF2.4 only. TF2 version of D57755. Resolves T52832
Test Plan: CI + new test case.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Subscribers: georgep
Maniphest Tasks: T52832
Differential Revision: https://phabricator.sourcevertex.net/D58115
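The name suggests the exact, erf-based form of GELU, x * Phi(x) with Phi the standard normal CDF, as opposed to the widely used tanh approximation; that reading is an assumption, since the commit only names the poplibs function. The pure-Python sketch below computes both forms for comparison; it is not the poplibs kernel.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh approximation of GELU, shown for comparison."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


print(gelu_erf(1.0))  # approximately 0.8413
print(abs(gelu_erf(1.0) - gelu_tanh(1.0)))  # small, but not zero
```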

Commit:8a1319e
Author:Jake
Committer:Jake

Target poplibs GeluErf
Summary: What's changed:
- Target poplibs gelu_erf.
TF1.15 only; the TF2 version is a separate diff. Ref T52832
Test Plan: CI + new test case.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Subscribers: georgep
Maniphest Tasks: T52832
Differential Revision: https://phabricator.sourcevertex.net/D57755

Commit:10f4ecd
Author:George Pawelczak
Committer:George Pawelczak

Add ipu.control_flow_ops.barrier
Summary: Add a barrier op to force control flow. Ref T52106. TF2.4 only.
Test Plan: CI, added new tests
Reviewers: babakk, samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved
Maniphest Tasks: T52106
Differential Revision: https://phabricator.sourcevertex.net/D57600

Commit:1ba1aff
Author:George Pawelczak
Committer:George Pawelczak

Add ipu.control_flow_ops.barrier
Summary: Add a barrier op to force control flow. Ref T52106. TF1.15 only.
Test Plan: CI, added new tests
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk
Subscribers: vladimirm, douglaso
Maniphest Tasks: T52106
Differential Revision: https://phabricator.sourcevertex.net/D57602

Commit:d69e6ed
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add copy config and clone method to backend config
Summary: This commit allows a copy instruction to specify which clone method it would like to use for the output tensor. Fixes T51151.
Test Plan: CI, new test, check tile balance for copies with CloneMethod_PreserveOrderUnlessAliases
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh
Subscribers: jackh, georgep
Maniphest Tasks: T51151
Differential Revision: https://phabricator.sourcevertex.net/D56691

Commit:69d7fad
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Add copy config and clone method to backend config
Summary: This commit allows a copy instruction to specify which clone method it would like to use for the output tensor. Fixes T51151.
Test Plan: CI, new test, check tile balance for copies with CloneMethod_PreserveOrderUnlessAliases
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh
Subscribers: jackh, georgep
Maniphest Tasks: T51151
Differential Revision: https://phabricator.sourcevertex.net/D56691

Commit:c217f0c
Author:Piotr Chmiel
Committer:Piotr Chmiel

Fuse scale with reduction
Summary: Fixes T42432. popops supports fusing a single-element f32 scale with a reduction of one of the following types: ADD, LOG_ADD, SQUARE_ADD.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: georgep
Maniphest Tasks: T42432
Differential Revision: https://phabricator.sourcevertex.net/D55667
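A fusion like this is typically a pattern match: a multiply of a supported reduction by a single-element constant is rewritten into one reduction that carries the scale. The pure-Python sketch below shows that rewrite on a toy expression tree, assuming the pattern is `Mul(Reduce(op, x), Const(c))` in either operand order; every class and function name is hypothetical and this is not the Graphcore fusion code. The toy evaluator handles only ADD and SQUARE_ADD, because the commit does not say how the scale combines with LOG_ADD.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Const:
    value: float  # single-element (scalar) f32 constant


@dataclass(frozen=True)
class Param:
    name: str


@dataclass(frozen=True)
class Reduce:
    op: str  # "ADD", "LOG_ADD" or "SQUARE_ADD"
    operand: "Expr"


@dataclass(frozen=True)
class Mul:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class ScaledReduce:
    op: str
    operand: "Expr"
    scale: float  # applied to the reduction result


Expr = Union[Const, Param, Reduce, Mul, ScaledReduce]
FUSABLE_OPS = {"ADD", "LOG_ADD", "SQUARE_ADD"}


def fuse_scale(e: Expr) -> Expr:
    """Rewrite Mul(Reduce(op, x), Const(c)) (either order) into ScaledReduce(op, x, c)."""
    if isinstance(e, Mul):
        for red, const in ((e.lhs, e.rhs), (e.rhs, e.lhs)):
            if isinstance(red, Reduce) and isinstance(const, Const) and red.op in FUSABLE_OPS:
                return ScaledReduce(red.op, red.operand, const.value)
    return e


def evaluate(e: Expr, env: Dict[str, List[float]]):
    """Toy evaluator, used only to check that the rewrite preserves the value."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Param):
        return env[e.name]
    if isinstance(e, Mul):
        return evaluate(e.lhs, env) * evaluate(e.rhs, env)
    if isinstance(e, (Reduce, ScaledReduce)):
        xs = evaluate(e.operand, env)
        if e.op == "ADD":
            r = sum(xs)
        elif e.op == "SQUARE_ADD":
            r = sum(v * v for v in xs)
        else:
            raise NotImplementedError(e.op)
        return r * e.scale if isinstance(e, ScaledReduce) else r
    raise TypeError(e)


expr = Mul(Reduce("SQUARE_ADD", Param("x")), Const(0.5))
print(fuse_scale(expr))  # ScaledReduce(op='SQUARE_ADD', operand=Param(name='x'), scale=0.5)
```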

Commit:5bae55f
Author:Piotr Chmiel
Committer:Piotr Chmiel

Fuse scale with reduction
Summary: Fixes T42432. popops supports fusing a single-element f32 scale with a reduction of one of the following types: ADD, LOG_ADD, SQUARE_ADD.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: georgep
Maniphest Tasks: T42432
Differential Revision: https://phabricator.sourcevertex.net/D55667

Commit:12d2734
Author:George White
Committer:George White

Rename `IpuInterCopy` as `InterIpuCopy`
Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case to their `InterIpuCopy` counterparts, which sound better and are easier to search for. Fixes T49914.
Test Plan: Use the existing tests. This is an aesthetic change only.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T49914
Differential Revision: https://phabricator.sourcevertex.net/D56748

Commit:6de845d
Author:George White
Committer:George White

Rename `IpuInterCopy` as `InterIpuCopy`
Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case to their `InterIpuCopy` counterparts, which sound better and are easier to search for. Fixes T49914.
Test Plan: Use the existing tests. This is an aesthetic change only.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T49914
Differential Revision: https://phabricator.sourcevertex.net/D56748

Commit:7954103
Author:George White

Revert InterIpuCopy rename
Summary: This reverts commit c34fbba7204f.
Test Plan: revert-hammer

Commit:bca6c9a
Author:George White

Revert InterIpuCopy rename
Summary: This reverts commit 5e8286bbb96b.
Test Plan: revert-hammer

Commit:5e8286b
Author:George White
Committer:George White

Rename `IpuInterCopy` as `InterIpuCopy`
Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case to their `InterIpuCopy` counterparts, which sound better and are easier to search for. Fixes T49914.
Test Plan: Use the existing tests. This is an aesthetic change only.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T49914
Differential Revision: https://phabricator.sourcevertex.net/D56224

Commit:c34fbba
Author:George White
Committer:George White

Rename `IpuInterCopy` as `InterIpuCopy`
Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case to their `InterIpuCopy` counterparts, which sound better and are easier to search for. Fixes T49914.
Test Plan: Use the existing tests. This is an aesthetic change only.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep
Maniphest Tasks: T49914
Differential Revision: https://phabricator.sourcevertex.net/D56224

Commit:5e5b154
Author:George Pawelczak

Revert “Fuse scale with reduction”
Summary: This reverts commit 872c25960fc50e216984f1c9f415fb3bfe94d7a8.
Test Plan: revert-hammer

Commit:4e1f49d
Author:George Pawelczak

Revert “Fuse scale with reduction”
Summary: This reverts commit db36495955742f40dc5ce7eafe1bf9096479ceef.
Test Plan: revert-hammer

Commit:de4efb5
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Implement GradientAccumulatorAddWithScale
Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the form `acc <- acc * acc_scale + grad * grad_scale`, which enables support for a running mean. In this implementation, `grad * grad_scale` is computed in Python and passed to the new op along with `acc_scale`. Ref T46005. TF2.4 only; TF1 diff: D53583
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh
Maniphest Tasks: T46005
Differential Revision: https://phabricator.sourcevertex.net/D55699
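The update `acc <- acc * acc_scale + grad * grad_scale` yields a running mean if, at step n (counting from 1), `acc_scale = (n - 1) / n` and `grad_scale = 1 / n`: if acc holds the mean of the first n - 1 gradients, the update gives ((n - 1) * mean + grad) / n, the mean of the first n. The pure-Python sketch below checks this; the function names and the choice of scales are illustrative assumptions, not the op's API. As in the commit, the gradient is pre-multiplied by `grad_scale` before the accumulate call.

```python
from typing import List, Sequence


def accumulate_with_scale(acc: Sequence[float], scaled_grad: Sequence[float],
                          acc_scale: float) -> List[float]:
    """acc <- acc * acc_scale + scaled_grad, where scaled_grad = grad * grad_scale."""
    return [a * acc_scale + g for a, g in zip(acc, scaled_grad)]


def running_mean(grads: Sequence[Sequence[float]]) -> List[float]:
    """Accumulates a running mean using acc_scale = (n-1)/n and grad_scale = 1/n."""
    acc = [0.0] * len(grads[0])
    for n, grad in enumerate(grads, start=1):
        grad_scale = 1.0 / n
        acc_scale = (n - 1) / n
        # The gradient is scaled first, mirroring the split described in the commit.
        acc = accumulate_with_scale(acc, [g * grad_scale for g in grad], acc_scale)
    return acc


print(running_mean([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]))  # ≈ [3.0, 5.0] (up to rounding)
```

With `acc_scale = 1` and `grad_scale = 1` the same update reduces to a plain sum, which is what the replaced GradientAccumulatorAdd computed.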

Commit:32a7060
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Implement GradientAccumulatorAddWithScale
Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the form `acc <- acc * acc_scale + grad * grad_scale`, which enables support for a running mean. In this implementation, `grad * grad_scale` is computed in Python and passed to the new op along with `acc_scale`. Ref T46005. TF1.15 only; TF2 diff: D55699
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh
Maniphest Tasks: T46005
Differential Revision: https://phabricator.sourcevertex.net/D53583

Commit:db36495
Author:Piotr Chmiel
Committer:Piotr Chmiel

Fuse scale with reduction
Summary: Fixes T42432. popops supports fusing a single-element f32 scale with a reduction of one of the following types: ADD, LOG_ADD, SQUARE_ADD.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: georgep
Maniphest Tasks: T42432
Differential Revision: https://phabricator.sourcevertex.net/D55667

Commit:872c259
Author:Piotr Chmiel
Committer:Piotr Chmiel

Fuse scale with reduction
Summary: Fixes T42432. popops supports fusing a single-element f32 scale with a reduction of one of the following types: ADD, LOG_ADD, SQUARE_ADD.
Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm
Subscribers: georgep
Maniphest Tasks: T42432
Differential Revision: https://phabricator.sourcevertex.net/D55667

Commit:19dde1a
Author:Gautham Ganapathy

Revert "Implement GradientAccumulatorAddWithScale" This reverts commit 4e7bb3206235438527ea5d2fd11cf14ca5288936.

Commit:d9f9933
Author:Gautham Ganapathy

Revert "Implement GradientAccumulatorAddWithScale" This reverts commit f6eb0436d06329d8e4267a238e0f2bc728a581ff.

Commit:4e7bb32
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale`. REF T46005 TF2.4 Only TF1 diff: D53583 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D55699

Commit:f6eb043
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale`. REF T46005 TF1.15 Only TF2 diff: D55699 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D53583

Commit:c58213b
Author:Babak Khataee
Committer:Babak Khataee

Adding missing return statements. Summary: Precursor to setting `-Werror=return-type` compiler flag which makes missing return statements an error. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49252 Differential Revision: https://phabricator.sourcevertex.net/D55485

Commit:755c0f2
Author:Babak Khataee
Committer:Babak Khataee

Adding missing return statements. Summary: Precursor to setting `-Werror=return-type` compiler flag which makes missing return statements an error. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49252 Differential Revision: https://phabricator.sourcevertex.net/D55485

Commit:02e2af1
Author:George White
Committer:George White

Combine multiple gather operations into AllGather Summary: - Create a colocator to merge multiple gather operations into a single AllGather operation where possible. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, alfiee, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, babakk Subscribers: georgep Maniphest Tasks: T48296 Differential Revision: https://phabricator.sourcevertex.net/D54111
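Merging several gathers into one AllGather amounts to packing each replica's tensors into a single buffer, running one collective, and splitting the result back out. The sketch below simulates replicas as Python lists to show that the packed version produces the same result as one AllGather per tensor; it is an illustration under the assumption that packing is plain concatenation of flat buffers, not the colocator pass itself, and all names are hypothetical.

```python
from typing import List

Tensor = List[float]


def all_gather(per_replica: List[Tensor]) -> List[List[Tensor]]:
    """Simulated AllGather: every replica receives the tensors of all replicas."""
    return [[list(t) for t in per_replica] for _ in per_replica]


def separate_all_gathers(per_replica: List[List[Tensor]]) -> List[List[List[Tensor]]]:
    """One AllGather per tensor; result[r][k] is tensor k as seen on replica r."""
    num_tensors = len(per_replica[0])
    per_tensor = [all_gather([rep[k] for rep in per_replica]) for k in range(num_tensors)]
    return [[per_tensor[k][r] for k in range(num_tensors)] for r in range(len(per_replica))]


def combined_all_gather(per_replica: List[List[Tensor]]) -> List[List[List[Tensor]]]:
    """Pack each replica's tensors, run a single AllGather, then unpack."""
    sizes = [len(t) for t in per_replica[0]]
    packed = [[x for t in rep for x in t] for rep in per_replica]
    gathered = all_gather(packed)  # gathered[r][s]: replica s's packed buffer on replica r
    result = []
    for view in gathered:
        per_tensor, offset = [], 0
        for size in sizes:
            per_tensor.append([buf[offset:offset + size] for buf in view])
            offset += size
        result.append(per_tensor)
    return result


inputs = [[[1.0], [2.0, 3.0]], [[4.0], [5.0, 6.0]]]  # 2 replicas, 2 tensors each
print(combined_all_gather(inputs) == separate_all_gathers(inputs))  # True
```

The benefit on hardware would come from paying the collective's fixed overhead once instead of once per gather; the simulation only checks that the packed and unpacked results agree.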

Commit:9248ae5
Author:George White
Committer:George White

Combine multiple gather operations into AllGather Summary: - Create a colocator to merge multiple gather operations into a single AllGather operation where possible. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, alfiee, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, babakk Subscribers: georgep Maniphest Tasks: T48296 Differential Revision: https://phabricator.sourcevertex.net/D54111

Commit:7172879
Author:Mark Fowden
Committer:Mark Fowden

Remove _profiling from IPUConfig Summary: Removes the hidden _profiling category from IPUConfig that was added to temporarily support tests that still used profiling features. Also remove auto_assign_report_subdirectories from the internal config protobuf and executor as it's now redundant. Depends on D54505. Fixes T39600. Applies to both branches. Test Plan: CI. Removed relevant IPUConfig tests. Reviewers: alfiee, georgew, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T39600 Differential Revision: https://phabricator.sourcevertex.net/D54432

Commit:9255c74
Author:Mark Fowden
Committer:Mark Fowden

Remove _profiling from IPUConfig Summary: Removes the hidden _profiling category from IPUConfig that was added to temporarily support tests that still used profiling features. Also remove auto_assign_report_subdirectories from the internal config protobuf and executor as it's now redundant. Depends on D54505. Fixes T39600. Applies to both branches. Test Plan: CI. Removed relevant IPUConfig tests. Reviewers: alfiee, georgew, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T39600 Differential Revision: https://phabricator.sourcevertex.net/D54432

Commit:e5c42f0
Author:Babak Khataee
Committer:Babak Khataee

Update PrngSeedState to call poplar::setStochasticRounding Summary: Adding StochasticRoundingMethod_None option so stochastic rounding can be disabled/enabled via the PrngSeedState class. This makes it easier to keep calls to poplar::setStochasticRounding in sync with the order of poplar program execution, as we already do that for the other stochastic rounding modes. Test Plan: CI + New C++ tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48700 Differential Revision: https://phabricator.sourcevertex.net/D54429

Commit:9023901
Author:Babak Khataee
Committer:Babak Khataee

Update PrngSeedState to call poplar::setStochasticRounding Summary: Adding StochasticRoundingMethod_None option so stochastic rounding can be disabled/enabled via the PrngSeedState class. This makes it easier to keep calls to poplar::setStochasticRounding in sync with the order of poplar program execution, as we already do that for the other stochastic rounding modes. Test Plan: CI + New C++ tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48700 Differential Revision: https://phabricator.sourcevertex.net/D54429

Commit:b2a9380
Author:Babak Khataee
Committer:Babak Khataee

Removing deterministicWorkers backend option Summary: Removing deterministicWorkers backend option since it's a global setting and so can't be set per instruction, which was the original intention. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48677 Differential Revision: https://phabricator.sourcevertex.net/D54293

Commit:ffab0de
Author:Babak Khataee
Committer:Babak Khataee

Removing deterministicWorkers backend option Summary: Removing deterministicWorkers backend option since it's a global setting and so can't be set per instruction, which was the original intention. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48677 Differential Revision: https://phabricator.sourcevertex.net/D54293

Commit:d838b4c
Author:George Pawelczak
Committer:George Pawelczak

Add Poplar checks into embedded runtime Summary: Ref T48682 Test Plan: CI Reviewers: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T48682 Differential Revision: https://phabricator.sourcevertex.net/D54281

Commit:523a059
Author:George Pawelczak
Committer:George Pawelczak

Add Poplar checks into embedded runtime Summary: Ref T48682 Test Plan: CI Reviewers: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T48682 Differential Revision: https://phabricator.sourcevertex.net/D54281

Commit:43fac35
Author:George Pawelczak
Committer:George Pawelczak

Remove HloReplicationIndexInstruction Summary: Fix T47046 Test Plan: CI Reviewers: babakk, vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T47046 Differential Revision: https://phabricator.sourcevertex.net/D54265

Commit:7921881
Author:George Pawelczak
Committer:George Pawelczak

Remove HloReplicationIndexInstruction Summary: Fix T47046 Test Plan: CI Reviewers: babakk, vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T47046 Differential Revision: https://phabricator.sourcevertex.net/D54265

Commit:d67a243
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Add reduce-mean support in reduce-scatter Summary: REF T47313 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Subscribers: hakons Maniphest Tasks: T47313 Differential Revision: https://phabricator.sourcevertex.net/D53827
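Reduce-scatter with a mean reduction gives each replica one slice of the elementwise mean over all replicas. A minimal simulation, assuming the input length divides evenly by the replica count and that "MEAN" is simply the sum divided by the number of replicas; the function and operation names are hypothetical and do not correspond to a real GCL or popops call.

```python
from typing import List


def reduce_scatter(per_replica: List[List[float]], op: str = "SUM") -> List[List[float]]:
    """Simulated reduce-scatter: replica i receives slice i of the reduction.

    per_replica[r] is the full-length input held by replica r. Supported ops
    in this sketch are "SUM" and "MEAN".
    """
    num_replicas = len(per_replica)
    length = len(per_replica[0])
    if length % num_replicas:
        raise ValueError("input length must be divisible by the replica count")
    totals = [sum(column) for column in zip(*per_replica)]  # elementwise sum
    if op == "MEAN":
        totals = [t / num_replicas for t in totals]
    elif op != "SUM":
        raise ValueError(f"unsupported op in this sketch: {op}")
    chunk = length // num_replicas
    return [totals[i * chunk:(i + 1) * chunk] for i in range(num_replicas)]


print(reduce_scatter([[1, 2, 3, 4], [3, 4, 5, 6]], op="MEAN"))  # [[2.0, 3.0], [4.0, 5.0]]
```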

Commit:979528b
Author:Gautham Ganapathy
Committer:Gautham Ganapathy

Add reduce-mean support in reduce-scatter Summary: REF T47313 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Subscribers: hakons Maniphest Tasks: T47313 Differential Revision: https://phabricator.sourcevertex.net/D53827

Commit:051581e
Author:Babak Khataee
Committer:Babak Khataee

Adding enable_experimental_prng stability flag. Summary: Adding enable_experimental_prng stability flag for conditionally enabling work related to sr/prng seed management. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T46295 Differential Revision: https://phabricator.sourcevertex.net/D51859

Commit:30ade88
Author:Babak Khataee
Committer:Babak Khataee

Adding enable_experimental_prng stability flag. Summary: Adding enable_experimental_prng stability flag for conditionally enabling work related to sr/prng seed management. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T46295 Differential Revision: https://phabricator.sourcevertex.net/D51859

Commit:46ac05a
Author:Babak Khataee
Committer:Babak Khataee

Adding StochasticRoundingMethod backend option (TF1) Summary: Fixing up D51361 for TF1 - Removed TF2 specific optypes from NeedsSpecificSeedType. This change adds the StochasticRoundingMethod option to the backend config. This is intended to be used as an explicit way of describing how we want to perform stochastic rounding (with an identical seed/differing seed or either). By having an extra backend option we avoid having to overload the meaning of being replica identical and having to add an extra category to the replica dataflow analysis, which will further complicate it. StochasticRoundingMethod gets set by the AddStochasticRoundingOptions so only instructions which require a specific type of seed will cause the seeds to be changed. It's currently set up so that instructions which read/restructure data don't change the seed. TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T45895 Differential Revision: https://phabricator.sourcevertex.net/D51636

Commit:9b27851
Author:Babak Khataee
Committer:Babak Khataee

Adding StochasticRoundingMethod backend option Summary: This change adds the `StochasticRoundingMethod` option to the backend config. This is intended to be used as an explicit way of describing how we want to perform stochastic rounding (with an identical seed/differing seed or either). By having an extra backend option we avoid having to overload the meaning of being replica identical and having to add an extra category to the replica dataflow analysis, which will further complicate it. `StochasticRoundingMethod` gets set by the `AddStochasticRoundingOptions` so only instructions which require a specific type of seed will cause the seeds to be changed. It's currently set up so that instructions which read/restructure data don't change the seed. (TF1 - D51636) TF2.4 Only Test Plan: C++ Tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T45895 Differential Revision: https://phabricator.sourcevertex.net/D51361
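The summary describes seeds changing only before instructions that require a specific seed type, while "either" instructions (such as ones that only read or restructure data) keep whatever seed is active. A minimal sketch of that decision over a schedule, assuming a simple linear instruction order; the enum members and function are hypothetical stand-ins, not the backend's actual `StochasticRoundingMethod` values or the `AddStochasticRoundingOptions` pass.

```python
from enum import Enum
from typing import Iterable, List


class StochasticRoundingMethod(Enum):
    """Hypothetical mirror of the backend option described above."""
    IDENTICAL_SEEDS = "identical"  # all replicas must round with the same seed
    DIFFERING_SEEDS = "differing"  # replicas must round with different seeds
    ANY = "any"                    # no requirement, e.g. read/restructure ops


def plan_seed_changes(methods: Iterable[StochasticRoundingMethod],
                      initial: StochasticRoundingMethod) -> List[int]:
    """Return indices of instructions before which the active seed must change."""
    current = initial
    changes = []
    for index, method in enumerate(methods):
        if method is StochasticRoundingMethod.ANY or method is current:
            continue  # no requirement, or the active seed already satisfies it
        current = method
        changes.append(index)
    return changes


M = StochasticRoundingMethod
schedule = [M.DIFFERING_SEEDS, M.ANY, M.ANY, M.IDENTICAL_SEEDS, M.ANY, M.IDENTICAL_SEEDS, M.DIFFERING_SEEDS]
print(plan_seed_changes(schedule, initial=M.DIFFERING_SEEDS))  # [3, 6]
```

Keeping "either" instructions out of the decision is what avoids needless seed swaps around data-movement ops, which is the behaviour the commit describes.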

Commit:2a11142
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Allow merging of the remote buffers across identical clusters Summary: Follow the logic in subcomputation graph caching and compare elementwise cluster computations. Allow merging buffers across identical clusters so they could be reused. Propagate all merged indices so that the new size is changed in all remote buffer info structures. Fix T45972. Test Plan: CI, fixed HW test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Maniphest Tasks: T45972 Differential Revision: https://phabricator.sourcevertex.net/D51403

Commit:c1caec5
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Allow merging of the remote buffers across identical clusters Summary: Follow the logic in subcomputation graph caching and compare elementwise cluster computations. Allow merging buffers across identical clusters so they could be reused. Propagate all merged indices so that the new size is changed in all remote buffer info structures. Fix T45972. Test Plan: CI, fixed HW test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Maniphest Tasks: T45972 Differential Revision: https://phabricator.sourcevertex.net/D51403

Commit:57734a3
Author:Christian aan de Wiel
Committer:Christian aan de Wiel

Move `enable_fast_math` to algebraic simplifier config Summary: TF2.4 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Subscribers: georgep, vladimirm Maniphest Tasks: T45300 Differential Revision: https://phabricator.sourcevertex.net/D51300

Commit:6d8f8d6
Author:Christian aan de Wiel
Committer:Christian aan de Wiel

Move `enable_fast_math` to algebraic simplifier config Summary: TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Subscribers: jamiep, vladimirm, georgep Maniphest Tasks: T45300 Differential Revision: https://phabricator.sourcevertex.net/D51010

Commit:1ecc322
Author:Samuel Hornby
Committer:Samuel Hornby

Make accumulation count of resource update runtime input Summary: Provide gradient accumulation op inside the resource update, and use this later when finding the gradient accumulation count of resource updates. Also adapt passes to handle this as optional. TF2.4 only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Subscribers: markf, jakeh Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50184

Commit:0d01511
Author:Samuel Hornby
Committer:Samuel Hornby

Make accumulation count of resource update runtime input Summary: Provide gradient accumulation op inside the resource update, and use this later when finding the gradient accumulation count of resource updates. Also adapt passes to handle this as optional. TF1.15 only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: markf, jakeh Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D51231

Commit:b7d0ca2
Author:George Pawelczak

Merge branch 'poplar/r2.4/release' into poplar/r2.4/merge

Commit:78a1cf2
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Use CollectiveBalancedReorder for replicated tensor sharding clusters Summary: This commit provides both host and runtime rearrangements for the clusters with reduce-scatter/all-gather to ensure that only minimal exchange is required. Fix T35351. Overview of the changes: Replicated resource update elementwise clustering pass: - Marks elementwise clusters with 'partitioned_elementwise_cluster' attribute to allow custom visitor for those clusters later. - Insert new custom instructions: collective-rearrange and undo-collective-rearrange before reduce-scatter and after all-gather for unpartitioned remote buffers. Add replicated elementwise cluster visitor, and add additional validation rules in it. GCL collective balance reorder may return any particular shape depending on the input layout, so validate it not only against XLA shape, but also replica slice and collectives tensor. Add host rearrangement for remote buffers in poplar executor class. This is the host-side equivalent of the collective-reorder/undo-collective-reorder instructions. Test Plan: CI, additional host rearrangement code in replicated_resource_update_elementwise_clustering_hw_test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Subscribers: hakons Maniphest Tasks: T35351 Differential Revision: https://phabricator.sourcevertex.net/D44634

Commit:1fe7aa4
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Use CollectiveBalancedReorder for replicated tensor sharding clusters Summary: This commit provides both host and runtime rearrangements for the clusters with reduce-scatter/all-gather to ensure that only minimal exchange is required. Fix T35351. Overview of the changes: Replicated resource update elementwise clustering pass: - Marks elementwise clusters with 'partitioned_elementwise_cluster' attribute to allow custom visitor for those clusters later. - Insert new custom instructions: collective-rearrange and undo-collective-rearrange before reduce-scatter and after all-gather for unpartitioned remote buffers. Add replicated elementwise cluster visitor, and add additional validation rules in it. GCL collective balance reorder may return any particular shape depending on the input layout, so validate it not only against XLA shape, but also replica slice and collectives tensor. Add host rearrangement for remote buffers in poplar executor class. This is the host-side equivalent of the collective-reorder/undo-collective-reorder instructions. Test Plan: CI, additional host rearrangement code in replicated_resource_update_elementwise_clustering_hw_test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Subscribers: hakons Maniphest Tasks: T35351 Differential Revision: https://phabricator.sourcevertex.net/D44634

Commit:bcabd5f
Author:Christian aan de Wiel
Committer:Christian aan de Wiel

Add dot strength reduction optimisation Summary: TF2.4 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Subscribers: hakons, vladimirm, markf Maniphest Tasks: T44870, T45300 Differential Revision: https://phabricator.sourcevertex.net/D50808

Commit:8cf92fb
Author:Christian aan de Wiel
Committer:Christian aan de Wiel

Add dot strength reduction optimisation Summary: TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Subscribers: markf, vladimirm, hakons Maniphest Tasks: T44870, T45300 Differential Revision: https://phabricator.sourcevertex.net/D50520

Commit:473aef2
Author:Samuel Hornby
Committer:Samuel Hornby

Add gradient accumulation count op Summary: To be used to track dynamic counts for resource update op Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Subscribers: markf Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50437

Commit:ad4ca24
Author:Samuel Hornby
Committer:Samuel Hornby

Add gradient accumulation count op Summary: To be used to track dynamic counts for resource update op Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Subscribers: markf Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50437

Commit:96b97c2
Author:George Pawelczak
Committer:George Pawelczak

Track whether an executable can stall on lack of inputs Summary: Track whether the compiled module can stall without more data. Ref T41143 Test Plan: CI. Pipeline already tested. Added a test for IO tiles which stalled. Reviewers: jakeh, gauthamg, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T41143 Differential Revision: https://phabricator.sourcevertex.net/D49943

Commit:8657e1b
Author:George Pawelczak
Committer:George Pawelczak

Track whether an executable can stall on lack of inputs Summary: Track whether the compiled module can stall without more data. Ref T41143 Test Plan: CI. Pipeline already tested. Added a test for IO tiles which stalled. Reviewers: jakeh, gauthamg, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T41143 Differential Revision: https://phabricator.sourcevertex.net/D49943

Commit:4b6bd6b
Author:Alfie Edwards
Committer:Alfie Edwards

Adding ReduceMany op and colocation helper Summary: TF2.4 Only Adding hlo-only ReduceMany op. Reductions (including fusion reductions) can be combined into these ReduceMany ops. The interface to control this is a new IPU config option optimizations.maximum_reduce_many_buffer_size. This also has a change to the clustering scheduler to prevent a memory regression in a test. The change makes it so that ops with a valid colocator helper will not be put into their own cluster if the buffer size for the colocator is zero. This will prevent colocator helpers added in future from affecting the schedule in unrelated models. V1 Diff: D49202 Test Plan: Tests check that simple reduces and reduce fusions get combined in hlo according to the specified optimizations.maximum_reduce_many_buffer_size. There is also a test which executes a graph with a ReduceMany and checks the output values. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T40084 Differential Revision: https://phabricator.sourcevertex.net/D48529
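The summary says reductions are combined into ReduceMany ops subject to `optimizations.maximum_reduce_many_buffer_size`, and that a buffer size of zero leaves ops uncombined. The sketch below shows one way such a size limit could bound grouping, using a hypothetical greedy packer over reductions in schedule order; the real pass works on HLO with colocator helpers and its exact policy is not described here, so treat the grouping rule and all names as assumptions.

```python
from typing import List, Tuple


def group_reductions(reductions: List[Tuple[str, int]],
                     max_buffer_bytes: int) -> List[List[str]]:
    """Hypothetical greedy grouping of (name, output_bytes) reductions.

    Consecutive reductions share a group while their combined output size
    stays within max_buffer_bytes. With a limit of 0, every reduction of
    non-zero size ends up in its own group, i.e. nothing is combined.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in reductions:
        if current and current_bytes + size <= max_buffer_bytes:
            current.append(name)
            current_bytes += size
        else:
            if current:
                groups.append(current)
            current, current_bytes = [name], size
    if current:
        groups.append(current)
    return groups


reductions = [("a", 4), ("b", 4), ("c", 8), ("d", 2)]
print(group_reductions(reductions, max_buffer_bytes=8))  # [['a', 'b'], ['c'], ['d']]
print(group_reductions(reductions, max_buffer_bytes=0))  # [['a'], ['b'], ['c'], ['d']]
```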

Commit:7d02f9e
Author:Alfie Edwards
Committer:Alfie Edwards

Adding ReduceMany op and colocation helper Summary: TF1.15 Only Adding hlo-only ReduceMany op. Reductions (including fusion reductions) can be combined into these ReduceMany ops. The interface to control this is a new IPU config option optimizations.maximum_reduce_many_buffer_size. This also has a change to the clustering scheduler to prevent a memory regression in a test. The change makes it so that ops with a valid colocator helper will not be put into their own cluster if the buffer size for the colocator is zero. This will prevent colocator helpers added in future from affecting the schedule in unrelated models. V2 Diff: D48529 Test Plan: Tests check that simple reduces and reduce fusions get combined in hlo according to the specified optimizations.maximum_reduce_many_buffer_size. There is also a test which executes a graph with a ReduceMany and checks the output values. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: zhenyingl, mariok, georgep Maniphest Tasks: T40084 Differential Revision: https://phabricator.sourcevertex.net/D49202

Commit:4c148c6
Author:Håkon Sandsmark
Committer:Håkon Sandsmark

Remove verified streams Summary: Fixes T43482. TF2.4 Only. Test Plan: Tested with Poplar with the public API removed as in D49012. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Maniphest Tasks: T43482 Differential Revision: https://phabricator.sourcevertex.net/D49852

Commit:c081a5d
Author:Håkon Sandsmark
Committer:Håkon Sandsmark

Remove verified streams Summary: Fixes T43482. TF1.15 Only. Test Plan: Tested with Poplar with the public API removed as in D49012. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, anthonyb, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Maniphest Tasks: T43482 Differential Revision: https://phabricator.sourcevertex.net/D49757

Commit:b7c6387
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Populate gcl::CollectiveBalancedHostRearrangement objects in PoplarExecutableCore Summary: This commit creates all host rearrangement objects in advance and speeds up run preparations. Fix T44246. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T44246 Differential Revision: https://phabricator.sourcevertex.net/D49526

Commit:c3dda97
Author:Vladimir Menshakov
Committer:Vladimir Menshakov

Populate gcl::CollectiveBalancedHostRearrangement objects in PoplarExecutableCore Summary: This commit creates all host rearrangement objects in advance and speeds up run preparations. Fix T44246. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T44246 Differential Revision: https://phabricator.sourcevertex.net/D49526

Commit:b5cc1a6
Author:Alfie Edwards
Committer:Alfie Edwards

Adding poplar options flags for slice operations Summary: Adds a slices.poplar_options dictionary to the config similar to matmuls.poplar_options. The options specified get passed into calls to popops::multiSlice, popops::multiUpdate, popops::multiUpdateAdd, and popops::embedding::plan. Slice options can also be specified per-pipeline-stage as part of PipelineStageOptions. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T42623 Differential Revision: https://phabricator.sourcevertex.net/D49190