These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
| Commit: | a25bcc9 | |
|---|---|---|
| Author: | Jack Hunt | |
| Committer: | Jack Hunt | |
FP8 Conv Ops Summary: FP8 conv implementation that tries to maintain the usual conv APIs. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Subscribers: awf Maniphest Tasks: T65648 Differential Revision: https://phabricator.sourcevertex.net/D76354
The documentation is generated from this commit.
| Commit: | fd3c9b5 | |
|---|---|---|
| Author: | Samuel Hornby | |
| Committer: | Samuel Hornby | |
FP8 Matmul custom op Summary: Provide the FP8 * FP8 matmul operation Test Plan: included Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh, alfiee Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Maniphest Tasks: T65647 Differential Revision: https://phabricator.sourcevertex.net/D72812
| Commit: | b1f1f0c | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Pass the executable options to serialize/deserialize Summary: Make sure the engine is re-created with correct options. Fix T67132 TF1.15 Only Test Plan: Poprun tests pass CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, christiana, frederikm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, christiana, frederikm Maniphest Tasks: T67132 Differential Revision: https://phabricator.sourcevertex.net/D72622
| Commit: | 535c61b | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Pass the executable options to serialize/deserialize Summary: Make sure the engine is re-created with correct options. Fix T67132 Test Plan: Poprun tests pass CI Reviewers: christiana, frederikm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: christiana, frederikm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T67132 Differential Revision: https://phabricator.sourcevertex.net/D72619
| Commit: | e03ae94 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Implement ConvertToF8/ConvertFromF8 custom ops Summary: This commit adds two custom ops for converting to and from f8 tensors represented by tuple of u8 data and u8 metadata scalar. Fix tuple support in `HloPoplarTestBase::ExecuteNoHloPasses` for f8_test. Fix T65650. Test Plan: CI, new numeric f8_test Reviewers: georgep, samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T65650 Differential Revision: https://phabricator.sourcevertex.net/D71192
| Commit: | b979f29 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Replace selects which could be computed at compile time with mask fusion Summary: This commit adds new path - mask_finder that searches for a select whose condition we can compute at compile time. If this select has a broadcast of a constant as one of its true/false operands, we can replace it with a sequence of poplar copy() calls. Fix T36290. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, gauthamg Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk Subscribers: georgep Maniphest Tasks: T36290 Differential Revision: https://phabricator.sourcevertex.net/D41764
| Commit: | b377563 | |
|---|---|---|
| Author: | Jake | |
| Committer: | Jake | |
Use remote buffers to store entry computation arguments and results, when available. Summary: What's changed: - Use remote buffers instead of data streams to handle entry computation arguments and results. - Saves HEXOPT space when there are large streams. - Disabled This opens the option to only copy exactly what we need, instead of unconditionally copying everything. Ref T63600 Test Plan: CI tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, gauthamg, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, gauthamg, babakk Subscribers: babakk, tomm Maniphest Tasks: T63600 Differential Revision: https://phabricator.sourcevertex.net/D67730
| Commit: | ed6f862 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add allow-non-inplace flag to fusion config Summary: This commit adds flag to fusion config allowing fusions to indicate if they support both inplace and non-inplace variants of lowering. Fix T61776. Test Plan: CI, no functional changes Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk Maniphest Tasks: T61776 Differential Revision: https://phabricator.sourcevertex.net/D66706
| Commit: | a85a0e5 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add allow-non-inplace flag to fusion config Summary: This commit adds flag to fusion config allowing fusions to indicate if they support both inplace and non-inplace variants of lowering. Fix T61776. Test Plan: CI, no functional changes Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk Maniphest Tasks: T61776 Differential Revision: https://phabricator.sourcevertex.net/D66706
| Commit: | fc09013 | |
|---|---|---|
| Author: | Sam Hornby | |
| Committer: | George Pawelczak | |
Make inputs to tensor lists uninitialised Summary: To prevent copies before the while loops mark these inputs as uninitialised Reviewers: vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd Subscribers: babakk Maniphest Tasks: T53525 Differential Revision: https://phabricator.sourcevertex.net/D64827
| Commit: | aece6fc | |
|---|---|---|
| Author: | Sam Hornby | |
| Committer: | Sam Hornby | |
Make inputs to tensor lists uninitialised Summary: To prevent copies before the while loops mark these inputs as uninitialised Reviewers: vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd Subscribers: babakk Maniphest Tasks: T53525 Differential Revision: https://phabricator.sourcevertex.net/D64827
| Commit: | bc6122f | |
|---|---|---|
| Author: | Jake | |
| Committer: | George Pawelczak | |
Add softmax and stable softmax as ipu ops. Summary: Allow users to target the poplibs softmax and stable softmax from the python frontend. This doesn't replace the TF2XLA softmax because I'm not sure whether it's always better. Ref T59577 Test Plan: Added a softmax test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm Subscribers: babakk, georgep Maniphest Tasks: T59577 Differential Revision: https://phabricator.sourcevertex.net/D65228
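As an illustration of the distinction this commit message draws, here is a minimal sketch of the usual numerical meaning of "softmax" versus "stable softmax". It is written in plain Python, the function names are hypothetical, and it is not the poplibs implementation or the IPU Python API; it only shows why subtracting the maximum first avoids overflow.

```python
import math


def softmax(xs):
    """Plain softmax: exp(x_i) / sum_j exp(x_j). Can overflow for large inputs."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(xs):
    """Softmax with the maximum subtracted first, so every exponent is <= 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For small inputs the two functions agree up to rounding. For inputs such as `[1000.0, 1001.0]` the plain version fails in `math.exp`, while the stable version still returns a valid distribution, at the cost of an extra pass to find the maximum.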
| Commit: | d4fc01a | |
|---|---|---|
| Author: | yanislavd | |
| Committer: | George Pawelczak | |
Add a `StaticMultiUpdateAdd` instruction Summary: Add a multi update add instruction, in which the update indices are a static attribute. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk Subscribers: babakk, georgep Maniphest Tasks: T54057 Differential Revision: https://phabricator.sourcevertex.net/D63985
| Commit: | 7829694 | |
|---|---|---|
| Author: | Jake | |
| Committer: | Jake | |
Add softmax and stable softmax as ipu ops. Summary: Allow users to target the poplibs softmax and stable softmax from the python frontend. This doesn't replace the TF2XLA softmax because I'm not sure whether it's always better. Ref T59577 Test Plan: Added a softmax test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm Subscribers: babakk, georgep Maniphest Tasks: T59577 Differential Revision: https://phabricator.sourcevertex.net/D65228
| Commit: | af7718d | |
|---|---|---|
| Author: | yanislavd | |
| Committer: | yanislavd | |
Add a `StaticMultiUpdateAdd` instruction Summary: Add a multi update add instruction, in which the update indices are a static attribute. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, babakk Subscribers: babakk, georgep Maniphest Tasks: T54057 Differential Revision: https://phabricator.sourcevertex.net/D63985
| Commit: | 2b42c68 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Merge branch 'poplar/r2.5/release' into poplar/r2.6/release
| Commit: | 1c7babe | |
|---|---|---|
| Author: | yanislavd | |
| Committer: | yanislavd | |
Add a StaticMultiSlice HLO instruction Summary: Add a StaticMultiSlice HLO instruction Test Plan: Numerical test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, alfiee, jackh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Subscribers: harrym Maniphest Tasks: T54057 Differential Revision: https://phabricator.sourcevertex.net/D62944
| Commit: | 2a321d1 | |
|---|---|---|
| Author: | yanislavd | |
| Committer: | yanislavd | |
Add a StaticMultiSlice HLO instruction Summary: Add a StaticMultiSlice HLO instruction Test Plan: Numerical test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, alfiee, jackh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Subscribers: harrym Maniphest Tasks: T54057 Differential Revision: https://phabricator.sourcevertex.net/D62944
| Commit: | e0d6128 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add CloneMethod_DeduceNewOrderOrBypass Summary: This commit adds deducing bypass method, allowing to pass input as is unless it has an allocation target. Ref T51153. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh Maniphest Tasks: T51153 Differential Revision: https://phabricator.sourcevertex.net/D61820
| Commit: | cb2bf54 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add CloneMethod_DeduceNewOrderOrBypass Summary: This commit adds deducing bypass method, allowing to pass input as is unless it has an allocation target. Ref T51153. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh Maniphest Tasks: T51153 Differential Revision: https://phabricator.sourcevertex.net/D61820
| Commit: | 1e0b974 | |
|---|---|---|
| Author: | Sam Hornby | |
| Committer: | Sam Hornby | |
Provide option to optimise for latency Summary: Provide an option to aim to reduce number of packets at all costs. Extend all our visitors to have an {Ap/Pre}pendToSequence and always force {In/Out}feed programs to be added by these methods `TF2.5 only` version of D60642 lint Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, jackh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh Subscribers: alfiee, marieanne, vladimirm, harrym, babakk, jackh Maniphest Tasks: T54941 Differential Revision: https://phabricator.sourcevertex.net/D61243
| Commit: | bd5df52 | |
|---|---|---|
| Author: | Sam Hornby | |
| Committer: | Sam Hornby | |
Provide option to optimise for latency Summary: Provide an option to aim to reduce number of packets at all costs. Extend all our visitors to have an {Ap/Pre}pendToSequence and always force {In/Out}feed programs to be added by these methods `TF1.15 only` Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, jackh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jackh, babakk Subscribers: jackh, babakk, harrym, vladimirm, marieanne Maniphest Tasks: T54941 Differential Revision: https://phabricator.sourcevertex.net/D60642
| Commit: | 03a0de4 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding config option for controlling how much tile memory a dynamic-slice can use before being replaced. Summary: Adding `ipu_config.slices.replace_dynamic_slice_threshold` config option for controlling how much tile memory a dynamic-slice can use before being considered for replacement. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh, vladimirm, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: godfrey.da.costa, samho Maniphest Tasks: T54306 Differential Revision: https://phabricator.sourcevertex.net/D61016
| Commit: | d05cf94 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding config option for controlling how much tile memory a dynamic-slice can use before being replaced. Summary: Adding `ipu_config.slices.replace_dynamic_slice_threshold` config option for controlling how much tile memory a dynamic-slice can use before being considered for replacement. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, samuelh, vladimirm, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: godfrey.da.costa, samho Maniphest Tasks: T54306 Differential Revision: https://phabricator.sourcevertex.net/D61016
| Commit: | c5119bd | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add deferred allocation for deduce/bypass copies Summary: Explicit copies prevent loop parameters from having proper layout. This commit adds deferred allocation support for copies. Fix T55199. Test Plan: CI, new test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T55199 Differential Revision: https://phabricator.sourcevertex.net/D59864
| Commit: | 64e9686 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add deferred allocation for deduce/bypass copies Summary: Explicit copies prevent loop parameters from having proper layout. This commit adds deferred allocation support for copies. Fix T55199. Test Plan: CI, new test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, samuelh, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T55199 Differential Revision: https://phabricator.sourcevertex.net/D59864
| Commit: | e669b18 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | George Pawelczak | |
Adding TF op for gcl::allReduceWithinReplica Summary: Adding op/kernel/inst and poplar op def for gcl::allReduceWithinReplica, follows the usual pattern. Adds the python function within_replicas.all_reduce which accepts a list of sharded tensors and returns the reduced results gathered over all the shards. Test Plan: New tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, yanislavd, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd, vladimirm, samuelh Maniphest Tasks: T53767 Differential Revision: https://phabricator.sourcevertex.net/D59164
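The behaviour described for `within_replicas.all_reduce` can be modelled with a small sketch, assuming each shard holds an equally shaped list of numbers and that the reduction is a sum. Both are assumptions made for illustration; the helper name below is hypothetical and this is not the actual op or its Python wrapper.

```python
def all_reduce_within_replica(shards):
    """Model of an all-reduce over shards.

    `shards` is a list with one equally sized list of numbers per shard.
    Every shard receives its own copy of the element-wise sum.
    """
    reduced = [sum(values) for values in zip(*shards)]
    return [list(reduced) for _ in shards]
```

With three shards holding `[1, 2]`, `[10, 20]` and `[100, 200]`, each shard in this model ends up with `[111, 222]`, which matches the "reduced results gathered over all the shards" wording in the commit message.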
| Commit: | c085e64 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding TF op for gcl::allReduceWithinReplica Summary: Adding op/kernel/inst and poplar op def for gcl::allReduceWithinReplica, follows the usual pattern. Adds the python function within_replicas.all_reduce which accepts a list of sharded tensors and returns the reduced results gathered over all the shards. Test Plan: New tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, yanislavd, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, yanislavd, vladimirm, samuelh Maniphest Tasks: T53767 Differential Revision: https://phabricator.sourcevertex.net/D59164
| Commit: | f96cd5f | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | George Pawelczak | |
Adding TF op for gcl::reduceScatterWithinReplica Summary: Initial op/kernel/inst/opdef for calling gcl::reduceScatterWithinReplica. Follows the usual pattern. Adds the python function within_replicas.reduce_scater which accepts a list of sharded tensors and returns a tuple of reduced results scattered over the shards. TF2.4 Only (TF1 - D59367) Test Plan: New tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Maniphest Tasks: T52884 Differential Revision: https://phabricator.sourcevertex.net/D58826
| Commit: | dda5015 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | George Pawelczak | |
Adding initial ops for targeting gcl::AllGatherWithinReplica call. Summary: Initial op/kernel/inst/opdef for calling gcl::AllGatherWithinReplica. Follows the usual pattern. Adds the python function `within_replicas.all_gather` which accepts a list of sharded tensors and returns a gathered tensor for each shard via a tuple. Test Plan: New tests. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, dominicm, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: vladimirm, georgep Maniphest Tasks: T52751 Differential Revision: https://phabricator.sourcevertex.net/D58119
| Commit: | 178069d | |
|---|---|---|
| Author: | Christian aan de Wiel | |
| Committer: | George Pawelczak | |
Some bugprone and performance fixes Summary: Linter fixes Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: jakeh, vladimirm, babakk, georgep Maniphest Tasks: T53781 Differential Revision: https://phabricator.sourcevertex.net/D58742
| Commit: | 822ebd0 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
TF1 - Adding TF op for gcl::reduceScatterWithinReplica Summary: Initial op/kernel/inst/opdef for calling gcl::reduceScatterWithinReplica. Follows the usual pattern. Adds the python function within_replicas.reduce_scater which accepts a list of sharded tensors and returns a tuple of reduced results scattered over the shards. TF1.15 Only (TF2 - D58826) Original diff failed to Merge due to BUILD differences but also had a runtime difference in the python API (tensor.ref doesn't exist in TF1) Test Plan: New tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee Subscribers: alfiee Maniphest Tasks: T52884 Differential Revision: https://phabricator.sourcevertex.net/D59367
| Commit: | 6323583 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding initial ops for targeting gcl::AllGatherWithinReplica call. Summary: Initial op/kernel/inst/opdef for calling gcl::AllGatherWithinReplica. Follows the usual pattern. Adds the python function `within_replicas.all_gather` which accepts a list of sharded tensors and returns a gathered tensor for each shard via a tuple. Test Plan: New tests. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, dominicm, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: vladimirm, georgep Maniphest Tasks: T52751 Differential Revision: https://phabricator.sourcevertex.net/D58119
| Commit: | eafb977 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Merge branch 'poplar/r2.4/release' into poplar/r2.5/release
| Commit: | 692da6f | |
|---|---|---|
| Author: | Jake | |
| Committer: | Jake | |
Target poplibs GeluErf Summary: What's changed: - Target poplibs gelu_erf. TF2.4 Only TF2 version of D57755. Resolves T52832 Test Plan: CI + new test case. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T52832 Differential Revision: https://phabricator.sourcevertex.net/D58115
| Commit: | 8a1319e | |
|---|---|---|
| Author: | Jake | |
| Committer: | Jake | |
Target poplibs GeluErf Summary: What's changed: - Target poplibs gelu_erf. TF1.15 Only Another diff for TF2 Ref T52832 Test Plan: CI + new test case. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T52832 Differential Revision: https://phabricator.sourcevertex.net/D57755
| Commit: | 10f4ecd | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Add ipu.control_flow_ops.barrier Summary: Add a barrier op to force control flow. Ref T52106 TF2.4 Only Test Plan: CI, added new tests Reviewers: babakk, samuelh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T52106 Differential Revision: https://phabricator.sourcevertex.net/D57600
| Commit: | 1ba1aff | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Add ipu.control_flow_ops.barrier Summary: Add a barrier op to force control flow. Ref T52106 TF1.15 Only Test Plan: CI, added new tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, babakk Subscribers: vladimirm, douglaso Maniphest Tasks: T52106 Differential Revision: https://phabricator.sourcevertex.net/D57602
| Commit: | d69e6ed | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add copy config and clone method to backend config Summary: This commit allows the copy instruction to specify which clone method it would like to use for the output tensor. Fix T51151. Test Plan: CI, new test, check tile balance for copies with CloneMethod_PreserveOrderUnlessAliases Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh Subscribers: jackh, georgep Maniphest Tasks: T51151 Differential Revision: https://phabricator.sourcevertex.net/D56691
| Commit: | 69d7fad | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Add copy config and clone method to backend config Summary: This commit allows the copy instruction to specify which clone method it would like to use for the output tensor. Fix T51151. Test Plan: CI, new test, check tile balance for copies with CloneMethod_PreserveOrderUnlessAliases Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, jackh Subscribers: jackh, georgep Maniphest Tasks: T51151 Differential Revision: https://phabricator.sourcevertex.net/D56691
| Commit: | c217f0c | |
|---|---|---|
| Author: | Piotr Chmiel | |
| Committer: | Piotr Chmiel | |
Fuse scale with reduction Summary: Fixes T42432 popops supports fusing single element, f32 scale with reduction having one of the following types ADD, LOG_ADD, SQUARE_ADD Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: georgep Maniphest Tasks: T42432 Differential Revision: https://phabricator.sourcevertex.net/D55667
| Commit: | 5bae55f | |
|---|---|---|
| Author: | Piotr Chmiel | |
| Committer: | Piotr Chmiel | |
Fuse scale with reduction Summary: Fixes T42432 popops supports fusing single element, f32 scale with reduction having one of the following types ADD, LOG_ADD, SQUARE_ADD Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: georgep Maniphest Tasks: T42432 Differential Revision: https://phabricator.sourcevertex.net/D55667
| Commit: | 12d2734 | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Rename `IpuInterCopy` as `InterIpuCopy` Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case with their `InterIpuCopy` counterpart, which sounds better, and is easier to search. Fixes T49914. Test Plan: Use the existing tests. This is an aesthetic change only. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49914 Differential Revision: https://phabricator.sourcevertex.net/D56748
| Commit: | 6de845d | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Rename `IpuInterCopy` as `InterIpuCopy` Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case with their `InterIpuCopy` counterpart, which sounds better, and is easier to search. Fixes T49914. Test Plan: Use the existing tests. This is an aesthetic change only. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49914 Differential Revision: https://phabricator.sourcevertex.net/D56748
| Commit: | 7954103 | |
|---|---|---|
| Author: | George White | |
Revert InterIpuCopy rename Summary: This reverts commit c34fbba7204f. Test Plan: revert-hammer Reviewers: Subscribers:
| Commit: | bca6c9a | |
|---|---|---|
| Author: | George White | |
Revert InterIpuCopy rename Summary: This reverts commit 5e8286bbb96b. Test Plan: revert-hammer Reviewers: Subscribers:
| Commit: | 5e8286b | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Rename `IpuInterCopy` as `InterIpuCopy` Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case with their `InterIpuCopy` counterpart, which sounds better, and is easier to search. Fixes T49914. Test Plan: Use the existing tests. This is an aesthetic change only. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49914 Differential Revision: https://phabricator.sourcevertex.net/D56224
| Commit: | c34fbba | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Rename `IpuInterCopy` as `InterIpuCopy` Summary: This commit renames all mentions of `IpuInterCopy` in camel-case, kebab-case and snake-case with their `InterIpuCopy` counterpart, which sounds better, and is easier to search. Fixes T49914. Test Plan: Use the existing tests. This is an aesthetic change only. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49914 Differential Revision: https://phabricator.sourcevertex.net/D56224
| Commit: | 5e5b154 | |
|---|---|---|
| Author: | George Pawelczak | |
Revert “Fuse scale with reduction” Summary: This reverts commit 872c25960fc50e216984f1c9f415fb3bfe94d7a8. Test Plan: revert-hammer Reviewers: Subscribers:
| Commit: | 4e1f49d | |
|---|---|---|
| Author: | George Pawelczak | |
Revert “Fuse scale with reduction” Summary: This reverts commit db36495955742f40dc5ce7eafe1bf9096479ceef. Test Plan: revert-hammer Reviewers: Subscribers:
| Commit: | de4efb5 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes in an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale` REF T46005 TF2.4 Only TF1 diff: D53583 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D55699
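The update rule `acc <- acc * acc_scale + grad * grad_scale` quoted in this commit message can be sketched in plain Python to show how it yields a running mean. The function names and the particular choice of scales (`acc_scale = n / (n + 1)`, `grad_scale = 1 / (n + 1)` for the n-th gradient, counting from zero) are assumptions for illustration, not the op's actual interface. As the message describes, `grad * grad_scale` is computed on the Python side and only `acc_scale` is passed alongside it.

```python
def accumulate_with_scale(acc, scaled_grad, acc_scale):
    """Apply acc <- acc * acc_scale + scaled_grad, element-wise."""
    return [a * acc_scale + g for a, g in zip(acc, scaled_grad)]


def running_mean(grads):
    """Fold a sequence of gradients into their mean using the scaled update."""
    acc = [0.0] * len(grads[0])
    for n, grad in enumerate(grads):
        acc_scale = n / (n + 1)      # weight kept by the accumulator so far
        grad_scale = 1.0 / (n + 1)   # weight given to the new gradient
        scaled_grad = [g * grad_scale for g in grad]  # done before the op
        acc = accumulate_with_scale(acc, scaled_grad, acc_scale)
    return acc
```

After each step the accumulator in this sketch holds the mean of the gradients seen so far, so its magnitude stays comparable to a single gradient instead of growing with the number of accumulation steps.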
| Commit: | 32a7060 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes in an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale` REF T46005 TF1.15 Only TF2 diff: D55699 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D53583
| Commit: | db36495 | |
|---|---|---|
| Author: | Piotr Chmiel | |
| Committer: | Piotr Chmiel | |
Fuse scale with reduction Summary: Fixes T42432 popops supports fusing single element, f32 scale with reduction having one of the following types ADD, LOG_ADD, SQUARE_ADD Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: georgep Maniphest Tasks: T42432 Differential Revision: https://phabricator.sourcevertex.net/D55667
| Commit: | 872c259 | |
|---|---|---|
| Author: | Piotr Chmiel | |
| Committer: | Piotr Chmiel | |
Fuse scale with reduction Summary: Fixes T42432 popops supports fusing single element, f32 scale with reduction having one of the following types ADD, LOG_ADD, SQUARE_ADD Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm Subscribers: georgep Maniphest Tasks: T42432 Differential Revision: https://phabricator.sourcevertex.net/D55667
| Commit: | 19dde1a | |
|---|---|---|
| Author: | Gautham Ganapathy | |
Revert "Implement GradientAccumulatorAddWithScale" This reverts commit 4e7bb3206235438527ea5d2fd11cf14ca5288936.
| Commit: | d9f9933 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
Revert "Implement GradientAccumulatorAddWithScale" This reverts commit f6eb0436d06329d8e4267a238e0f2bc728a581ff.
| Commit: | 4e7bb32 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes in an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale` REF T46005 TF2.4 Only TF1 diff: D53583 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D55699
| Commit: | f6eb043 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Implement GradientAccumulatorAddWithScale Summary: Replace GradientAccumulatorAdd with GradientAccumulatorAddWithScale, which takes in an additional scale parameter for scaling the accumulator value prior to accumulation. The objective is to enable accumulation of the type `acc <- acc * acc_scale + grad * grad_scale`, which will enable us to support a running mean. In this implementation, `grad * grad_scale` will be computed in Python and passed to the new op along with `acc_scale` REF T46005 TF1.15 Only TF2 diff: D55699 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, vladimirm, samuelh Maniphest Tasks: T46005 Differential Revision: https://phabricator.sourcevertex.net/D53583
| Commit: | c58213b | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding missing return statements. Summary: Precursor to setting `-Werror=return-type` compiler flag which makes missing return statements an error. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49252 Differential Revision: https://phabricator.sourcevertex.net/D55485
| Commit: | 755c0f2 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding missing return statements. Summary: Precursor to setting `-Werror=return-type` compiler flag which makes missing return statements an error. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T49252 Differential Revision: https://phabricator.sourcevertex.net/D55485
| Commit: | 02e2af1 | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Combine multiple gather operations into AllGather Summary: - Create a colocator to merge multiple gather operations into a single AllGather operation where possible. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, alfiee, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, babakk Subscribers: georgep Maniphest Tasks: T48296 Differential Revision: https://phabricator.sourcevertex.net/D54111
| Commit: | 9248ae5 | |
|---|---|---|
| Author: | George White | |
| Committer: | George White | |
Combine multiple gather operations into AllGather Summary: - Create a colocator to merge multiple gather operations into a single AllGather operation where possible. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, alfiee, babakk Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, alfiee, babakk Subscribers: georgep Maniphest Tasks: T48296 Differential Revision: https://phabricator.sourcevertex.net/D54111
| Commit: | 7172879 | |
|---|---|---|
| Author: | Mark Fowden | |
| Committer: | Mark Fowden | |
Remove _profiling from IPUConfig Summary: Removes the hidden _profiling category from IPUConfig that was added to temporarily support tests that still used profiling features. Also remove auto_assign_report_subdirectories from the internal config protobuf and executor as it's now redundant. Depends on D54505 Fixes T39600 Applies to both branches. Test Plan: CI. Removed relevant IPUConfig tests. Reviewers: alfiee, georgew, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T39600 Differential Revision: https://phabricator.sourcevertex.net/D54432
| Commit: | 9255c74 | |
|---|---|---|
| Author: | Mark Fowden | |
| Committer: | Mark Fowden | |
Remove _profiling from IPUConfig Summary: Removes the hidden _profiling category from IPUConfig that was added to temporarily support tests that still used profiling features. Also remove auto_assign_report_subdirectories from the internal config protobuf and executor as it's now redundant. Depends on D54505 Fixes T39600 Applies to both branches. Test Plan: CI. Removed relevant IPUConfig tests. Reviewers: alfiee, georgew, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T39600 Differential Revision: https://phabricator.sourcevertex.net/D54432
| Commit: | e5c42f0 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Update PrngSeedState to call poplar::setStochasticRounding Summary: Adding StochasticRoundingMethod_None option so stochastic rounding can be disabled/enabled via the PrngSeedState class. This makes it easier to keep calls to poplar::setStochasticRounding in sync with the order of poplar program execution, as we already do that for the other stochastic rounding modes. Test Plan: CI + New C++ tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48700 Differential Revision: https://phabricator.sourcevertex.net/D54429
| Commit: | 9023901 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Update PrngSeedState to call poplar::setStochasticRounding Summary: Adding StochasticRoundingMethod_None option so stochastic rounding can be disabled/enabled via the PrngSeedState class. This makes it easier to keep calls to poplar::setStochasticRounding in sync with the order of poplar program execution, as we already do that for the other stochastic rounding modes. Test Plan: CI + New C++ tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48700 Differential Revision: https://phabricator.sourcevertex.net/D54429
| Commit: | b2a9380 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Removing deterministicWorkers backend option Summary: Removing deterministicWorkers backend option since it's a global setting and so can't be set per instruction, which was the original intention. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48677 Differential Revision: https://phabricator.sourcevertex.net/D54293
| Commit: | ffab0de | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Removing deterministicWorkers backend option Summary: Removing deterministicWorkers backend option since it's a global setting and so can't be set per instruction, which was the original intention. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T48677 Differential Revision: https://phabricator.sourcevertex.net/D54293
| Commit: | d838b4c | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Add Poplar checks into embedded runtime Summary: Ref T48682 Test Plan: CI Reviewers: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T48682 Differential Revision: https://phabricator.sourcevertex.net/D54281
| Commit: | 523a059 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Add Poplar checks into embedded runtime Summary: Ref T48682 Test Plan: CI Reviewers: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T48682 Differential Revision: https://phabricator.sourcevertex.net/D54281
| Commit: | 43fac35 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Remove HloReplicationIndexInstruction Summary: Fix T47046 Test Plan: CI Reviewers: babakk, vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T47046 Differential Revision: https://phabricator.sourcevertex.net/D54265
| Commit: | 7921881 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Remove HloReplicationIndexInstruction Summary: Fix T47046 Test Plan: CI Reviewers: babakk, vladimirm, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: babakk, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T47046 Differential Revision: https://phabricator.sourcevertex.net/D54265
| Commit: | d67a243 | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Add reduce-mean support in reduce-scatter Summary: REF T47313 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Subscribers: hakons Maniphest Tasks: T47313 Differential Revision: https://phabricator.sourcevertex.net/D53827
| Commit: | 979528b | |
|---|---|---|
| Author: | Gautham Ganapathy | |
| Committer: | Gautham Ganapathy | |
Add reduce-mean support in reduce-scatter Summary: REF T47313 Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, vladimirm, jackh, hakons Subscribers: hakons Maniphest Tasks: T47313 Differential Revision: https://phabricator.sourcevertex.net/D53827
| Commit: | 051581e | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding enable_experimental_prng stability flag. Summary: Adding enable_experimental_prng stability flag for conditionally enabling work related to sr/prng seed management. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T46295 Differential Revision: https://phabricator.sourcevertex.net/D51859
| Commit: | 30ade88 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding enable_experimental_prng stability flag. Summary: Adding enable_experimental_prng stability flag for conditionally enabling work related to sr/prng seed management. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T46295 Differential Revision: https://phabricator.sourcevertex.net/D51859
| Commit: | 46ac05a | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding StochasticRoundingMethod backend option (TF1) Summary: Fixing up D51361 for TF1 - Removed TF2 specific optypes from NeedsSpecificSeedType This change adds the StochasticRoundingMethod option to the backend config. This is intended to be used as an explicit way of describing how we want to perform stochastic rounding (with an identical seed/differing seed or either). By having an extra backend option we avoid having to overload the meaning of being replica identical and having to add an extra category to the replica dataflow analysis, which will further complicate it. StochasticRoundingMethod gets set by the AddStochasticRoundingOptions so only instructions which require a specific type of seed will cause the seeds to be changed. It's currently setup so that instructions which read/restructure data don't change the seed. TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T45895 Differential Revision: https://phabricator.sourcevertex.net/D51636
| Commit: | 9b27851 | |
|---|---|---|
| Author: | Babak Khataee | |
| Committer: | Babak Khataee | |
Adding StochasticRoundingMethod backend option Summary: This change adds the `StochasticRoundingMethod` option to the backend config. This is intended to be used as an explicit way of describing how we want to perform stochastic rounding (with an identical seed/differing seed or either). By having an extra backend option we avoid having to overload the meaning of being replica identical and having to add an extra category to the replica dataflow analysis, which will further complicate it. `StochasticRoundingMethod` gets set by the `AddStochasticRoundingOptions` so only instructions which require a specific type of seed will cause the seeds to be changed. It's currently setup so that instructions which read/restructure data don't change the seed. (TF1 - D51636) TF2.4 Only Test Plan: C++ Tests Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T45895 Differential Revision: https://phabricator.sourcevertex.net/D51361
| Commit: | 2a11142 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Allow merging of the remote buffers across identical clusters Summary: Follow the logic in subcomputation graph caching and compare elementwise cluster computations. Allow merging buffers across identical clusters so they can be reused. Propagate all merged indices so the new size is changed in all remote buffer info structures. Fix T45972. Test Plan: CI, fixed HW test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Maniphest Tasks: T45972 Differential Revision: https://phabricator.sourcevertex.net/D51403
| Commit: | c1caec5 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Allow merging of the remote buffers across identical clusters Summary: Follow the logic in subcomputation graph caching and compare elementwise cluster computations. Allow merging buffers across identical clusters so they can be reused. Propagate all merged indices so the new size is changed in all remote buffer info structures. Fix T45972. Test Plan: CI, fixed HW test Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Maniphest Tasks: T45972 Differential Revision: https://phabricator.sourcevertex.net/D51403
| Commit: | 57734a3 | |
|---|---|---|
| Author: | Christian aan de Wiel | |
| Committer: | Christian aan de Wiel | |
Move `enable_fast_math` to algebraic simplifier config Summary: TF2.4 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Subscribers: georgep, vladimirm Maniphest Tasks: T45300 Differential Revision: https://phabricator.sourcevertex.net/D51300
| Commit: | 6d8f8d6 | |
|---|---|---|
| Author: | Christian aan de Wiel | |
| Committer: | Christian aan de Wiel | |
Move `enable_fast_math` to algebraic simplifier config Summary: TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf Subscribers: jamiep, vladimirm, georgep Maniphest Tasks: T45300 Differential Revision: https://phabricator.sourcevertex.net/D51010
| Commit: | 1ecc322 | |
|---|---|---|
| Author: | Samuel Hornby | |
| Committer: | Samuel Hornby | |
Make accumulation count of resource update runtime input Summary: Provide gradient accumulation op inside the resource update, and use this when finding the gradient accumulation count later of resource updates. Also adapt passes to handle this as an optional. TF2.4 only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Subscribers: markf, jakeh Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50184
| Commit: | 0d01511 | |
|---|---|---|
| Author: | Samuel Hornby | |
| Committer: | Samuel Hornby | |
Make accumulation count of resource update runtime input Summary: Provide gradient accumulation op inside the resource update, and use this when finding the gradient accumulation count later of resource updates. Also adapt passes to handle this as an optional. TF1.15 only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, markf Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: markf, jakeh Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D51231
| Commit: | b7d0ca2 | |
|---|---|---|
| Author: | George Pawelczak | |
Merge branch 'poplar/r2.4/release' into poplar/r2.4/merge
| Commit: | 78a1cf2 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Use CollectiveBalancedReorder for replicated tensor sharding clusters Summary: This commit provides both host and runtime rearrangements for the clusters with reduce-scatter/all-gather to ensure the minimal exchange is required. Fix T35351. Overview of the changes: Replicated resource update elementwise clustering pass: - Marks elementwise clusters with 'partitioned_elementwise_cluster' attribute to allow custom visitor for those clusters later. - Insert new custom instructions: collective-rearrange and undo-collective-rearrange before reduce-scatter and after all-gather for unpartitioned remote buffers. Add replicated elementwise cluster visitor, and add additional validation rules in it. GCL collective balance reorder may return any particular shape depending on the input layout, so validate it not only against XLA shape, but also replica slice and collectives tensor. Add host rearrangement for remote buffers in poplar executor class. This is host-side equivalent of the collective-reorder/undo-collective-reorder instructions. Test Plan: CI, additional host rearrangement code in replicated_resource_update_elementwise_clustering_hw_test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Subscribers: hakons Maniphest Tasks: T35351 Differential Revision: https://phabricator.sourcevertex.net/D44634
| Commit: | 1fe7aa4 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Use CollectiveBalancedReorder for replicated tensor sharding clusters Summary: This commit provides both host and runtime rearrangements for the clusters with reduce-scatter/all-gather to ensure the minimal exchange is required. Fix T35351. Overview of the changes: Replicated resource update elementwise clustering pass: - Marks elementwise clusters with 'partitioned_elementwise_cluster' attribute to allow custom visitor for those clusters later. - Insert new custom instructions: collective-rearrange and undo-collective-rearrange before reduce-scatter and after all-gather for unpartitioned remote buffers. Add replicated elementwise cluster visitor, and add additional validation rules in it. GCL collective balance reorder may return any particular shape depending on the input layout, so validate it not only against XLA shape, but also replica slice and collectives tensor. Add host rearrangement for remote buffers in poplar executor class. This is host-side equivalent of the collective-reorder/undo-collective-reorder instructions. Test Plan: CI, additional host rearrangement code in replicated_resource_update_elementwise_clustering_hw_test. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh, georgep, hakons Subscribers: hakons Maniphest Tasks: T35351 Differential Revision: https://phabricator.sourcevertex.net/D44634
| Commit: | bcabd5f | |
|---|---|---|
| Author: | Christian aan de Wiel | |
| Committer: | Christian aan de Wiel | |
Add dot strength reduction optimisation Summary: TF2.4 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Subscribers: hakons, vladimirm, markf Maniphest Tasks: T44870, T45300 Differential Revision: https://phabricator.sourcevertex.net/D50808
| Commit: | 8cf92fb | |
|---|---|---|
| Author: | Christian aan de Wiel | |
| Committer: | Christian aan de Wiel | |
Add dot strength reduction optimisation Summary: TF1.15 Only Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, hakons Subscribers: markf, vladimirm, hakons Maniphest Tasks: T44870, T45300 Differential Revision: https://phabricator.sourcevertex.net/D50520
| Commit: | 473aef2 | |
|---|---|---|
| Author: | Samuel Hornby | |
| Committer: | Samuel Hornby | |
Add gradient accumulation count op Summary: To be used to track dynamic counts for resource update op Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Subscribers: markf Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50437
| Commit: | ad4ca24 | |
|---|---|---|
| Author: | Samuel Hornby | |
| Committer: | Samuel Hornby | |
Add gradient accumulation count op Summary: To be used to track dynamic counts for resource update op Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, markf, georgep Subscribers: markf Maniphest Tasks: T41151 Differential Revision: https://phabricator.sourcevertex.net/D50437
| Commit: | 96b97c2 | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Track whether an executable can stall on lack of inputs Summary: Track whether the compiled module can stall without more data. Ref T41143 Test Plan: CI Pipeline already tested Added a test for IO tiles which stalled. Reviewers: jakeh, gauthamg, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T41143 Differential Revision: https://phabricator.sourcevertex.net/D49943
| Commit: | 8657e1b | |
|---|---|---|
| Author: | George Pawelczak | |
| Committer: | George Pawelczak | |
Track whether an executable can stall on lack of inputs Summary: Track whether the compiled module can stall without more data. Ref T41143 Test Plan: CI Pipeline already tested Added a test for IO tiles which stalled. Reviewers: jakeh, gauthamg, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Reviewed By: jakeh, hakons, #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved Maniphest Tasks: T41143 Differential Revision: https://phabricator.sourcevertex.net/D49943
| Commit: | 4b6bd6b | |
|---|---|---|
| Author: | Alfie Edwards | |
| Committer: | Alfie Edwards | |
Adding ReduceMany op and colocation helper Summary: TF2.4 Only Adding hlo-only ReduceMany op. Reductions (including fusion reductions) can be combined into these ReduceMany ops. The interface to control this is a new ipu config option optimizations.maximum_reduce_many_buffer_size. This also has a change to the clustering scheduler to prevent a memory regression in a test. The change makes it so that ops with a valid colocator helper will not be put into their own cluster if the buffer size for the colocator is zero. This will prevent colocator helpers added in future from affecting the schedule in unrelated models. V1 Diff: D49202 Test Plan: Tests check that simple reduces and reduce fusions get combined in hlo according to the specified optimizations.maximum_reduce_many_buffer_size. There is also a test which executes a graph with a ReduceMany and checks the output values. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T40084 Differential Revision: https://phabricator.sourcevertex.net/D48529
| Commit: | 7d02f9e | |
|---|---|---|
| Author: | Alfie Edwards | |
| Committer: | Alfie Edwards | |
Adding ReduceMany op and colocation helper Summary: TF1.15 Only Adding hlo-only ReduceMany op. Reductions (including fusion reductions) can be combined into these ReduceMany ops. The interface to control this is a new ipu config option optimizations.maximum_reduce_many_buffer_size. This also has a change to the clustering scheduler to prevent a memory regression in a test. The change makes it so that ops with a valid colocator helper will not be put into their own cluster if the buffer size for the colocator is zero. This will prevent colocator helpers added in future from affecting the schedule in unrelated models. V2 Diff: D48529 Test Plan: Tests check that simple reduces and reduce fusions get combined in hlo according to the specified optimizations.maximum_reduce_many_buffer_size. There is also a test which executes a graph with a ReduceMany and checks the output values. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: zhenyingl, mariok, georgep Maniphest Tasks: T40084 Differential Revision: https://phabricator.sourcevertex.net/D49202
| Commit: | 4c148c6 | |
|---|---|---|
| Author: | Håkon Sandsmark | |
| Committer: | Håkon Sandsmark | |
Remove verified streams Summary: Fixes T43482. TF2.4 Only. Test Plan: Tested with Poplar with the public API removed as in D49012. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Maniphest Tasks: T43482 Differential Revision: https://phabricator.sourcevertex.net/D49852
| Commit: | c081a5d | |
|---|---|---|
| Author: | Håkon Sandsmark | |
| Committer: | Håkon Sandsmark | |
Remove verified streams Summary: Fixes T43482. TF1.15 Only. Test Plan: Tested with Poplar with the public API removed as in D49012. Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, anthonyb, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, jakeh Maniphest Tasks: T43482 Differential Revision: https://phabricator.sourcevertex.net/D49757
| Commit: | b7c6387 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Populate gcl::CollectiveBalancedHostRearrangement objects in PoplarExecutableCore Summary: This commit creates all host rearrangement objects in advance and speeds up run preparations. Fix T44246. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T44246 Differential Revision: https://phabricator.sourcevertex.net/D49526
| Commit: | c3dda97 | |
|---|---|---|
| Author: | Vladimir Menshakov | |
| Committer: | Vladimir Menshakov | |
Populate gcl::CollectiveBalancedHostRearrangement objects in PoplarExecutableCore Summary: This commit creates all host rearrangement objects in advance and speeds up run preparations. Fix T44246. Test Plan: CI Reviewers: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep, hakons, jakeh Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Maniphest Tasks: T44246 Differential Revision: https://phabricator.sourcevertex.net/D49526
| Commit: | b5cc1a6 | |
|---|---|---|
| Author: | Alfie Edwards | |
| Committer: | Alfie Edwards | |
Adding poplar options flags for slice operations Summary: Adds a slices.poplar_options dictionary to the config similar to matmuls.poplar_options. The options specified get passed into calls to popops::multiSlice, popops::multiUpdate, popops::multiUpdateAdd, and popops::embedding::plan. Slice options can also be specified per-pipeline-stage as part of PipelineStageOptions. Reviewers: #tensorflow, simonl, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Reviewed By: #tensorflow, #framework_ip_review_-_any_oss_or_third-party_code_use_has_been_approved, georgep Subscribers: georgep Maniphest Tasks: T42623 Differential Revision: https://phabricator.sourcevertex.net/D49190