The Protocol Buffers files changed in these 6 commits:
| Commit: | c67c9fc |
| --- | --- |
| Author: | xihui-wu |
| Committer: | GitHub |
Take SwiftProtobuf dependency out from ModelSupport (#694)
The documentation is generated from this commit.
| Commit: | 912eac1 |
| --- | --- |
| Author: | Xihui Wu |
| Committer: | Xihui Wu |
address
| Commit: | 7adce8d |
| --- | --- |
| Author: | Xihui Wu |
| Committer: | Xihui Wu |
Separate out GPT2 & BERT support from ModelSupport module
| Commit: | 6b444e6 |
| --- | --- |
| Author: | Michelle Casbon |
| Committer: | GitHub |
Move BytePairEncoder to ModelSupport (#358)

* Move BytePairEncoder to ModelSupport

  Move out of Model/Text to avoid circular dependencies caused by usage in datasets. Rename file to match struct name. Move CMake entries to match file location.
| Commit: | dd3e547 |
| --- | --- |
| Author: | Anthony Platanios |
| Committer: | GitHub |
[WIP] Added support for BERT. (#231)

* Added initial support for BERT.
* Renamed 'LayerNormalization' to 'LayerNorm'.
* Added a 'TextModels' SwiftPM target.
* Fixed some of the compilation errors.
* Added 'Optimizer' protocol.
* Removed 'truncatedNormalInitializer'.
* Added initial support for BERT.
* Renamed 'LayerNormalization' to 'LayerNorm'.
* Added a 'TextModels' SwiftPM target.
* Fixed some of the compilation errors.
* Added 'Optimizer' protocol.
* Removed 'truncatedNormalInitializer'.
* Minor cleanup.
* Change `@differentiable` function default arguments from closures to function references (sketched below this commit message). Related issues:
  - https://bugs.swift.org/browse/TF-690
  - https://bugs.swift.org/browse/TF-1030
* Fix non-differentiability error by using `withoutDerivative(at:)` at the correct location (sketched below this commit message):

  ```
  swift-models/Models/Text/BERT.swift:292:6: error: function is not differentiable
  @differentiable(wrt: self)
  ~^~~~~~~~~~~~~~~~~~~~~~~~~
  swift-models/Models/Text/BERT.swift:293:17: note: when differentiating this function definition
  public func callAsFunction(_ input: TextBatch) -> Tensor<Scalar> {
                  ^
  swift-models/Models/Text/BERT.swift:299:58: note: cannot differentiate through 'inout' arguments
  let positionPaddingIndex = withoutDerivative(at: { () -> Int in
                                                           ^
  ```

* Add code for CoLA task. Add code and data utilities for the CoLA task. Code shared by eaplatanios@. Original sources are listed in comments at the top of each file. This is progress towards end-to-end BERT training. Todo: implement a main function with data loading and training loop.
* Add working main function. The BERT for CoLA training loop compiles: https://i.imgur.com/5KyewAg.png
  Todo:
  - Fine-tune training so that loss decreases.
  - Generalize dataset utilities to work with the CoLA remote URL.
* Tune learning rate schedule, add gradient clipping. Loss still does not steadily decrease:

  ```
  [Epoch: 0] Loss: 0.50369537
  [Epoch: 1] Loss: 0.7813513
  [Epoch: 2] Loss: 1.0023696
  [Epoch: 3] Loss: 0.8235911
  [Epoch: 4] Loss: 0.621686
  [Epoch: 5] Loss: 0.93954027
  [Epoch: 6] Loss: 0.76672614
  [Epoch: 7] Loss: 0.45236698
  [Epoch: 8] Loss: 0.6538984
  [Epoch: 9] Loss: 0.7307098
  [Epoch: 10] Loss: 0.90539706
  [Epoch: 11] Loss: 0.6684798
  [Epoch: 12] Loss: 0.5408703
  [Epoch: 13] Loss: 1.113673
  ```

* Made some minor edits to get the BERT classifier training to work for CoLA. (#293)
* Rename "epoch" to "step" in the training loop. The training loop operates over minibatches, not full passes over the dataset, so "step" is the correct term, not "epoch".
* Add CoLA evaluation. Evaluation reveals the model is not actually learning:

  ```
  True positives: 0
  True negatives: 322
  False positives: 0
  False negatives: 322
  ▿ 1 key/value pair
    ▿ (2 elements)
      - key: "matthewsCorrelationCoefficient"
      - value: 0.0
  ```

  We ought to debug the loss function and the BERT classifier class count.
* Fix BERT training. Change the class count to 2 and use softmax cross entropy. The evaluation metric now improves but sometimes decreases back to zero; the model isn't very stable, so there may be more room for improvement.

  After 80 steps:

  ```
  True positives: 567
  True negatives: 170
  False positives: 152
  False negatives: 170
  ▿ 1 key/value pair
    ▿ (2 elements)
      - key: "matthewsCorrelationCoefficient"
      - value: 0.3192948
  ```

  After 130 steps:

  ```
  True positives: 717
  True negatives: 0
  False positives: 322
  False negatives: 0
  ▿ 1 key/value pair
    ▿ (2 elements)
      - key: "matthewsCorrelationCoefficient"
      - value: 0.0
  ```

* Make the training loop an infinite loop. Improvement todo: make the training loop print epochs.
* Fixed BERT. (#294) Fix various issues:
  1. The sigmoid cross entropy loss was applied on logits of shape `[B, 1]` and labels of shape `[B]`. This forced a silent broadcast of logits to shape `[B, B]`, which resulted in the loss not being informative for training (sketched below this commit message).
  2. The batch size was too small. I added a comment in the main script code explaining how batching works in my data pipelines.
  3. This is a minor one, but there is a bug with how I was copying the prefetching iterator. As a temporary solution, I disabled prefetching for the dev and test sets so that the dev set is the same across runs.

  This is not currently tuned but it's working. After ~20 steps MCC should be at about 0.28, and after ~200 steps it should be getting close to 0.50.
* Minor edits.
  - Remove extraneous comment.
  - Remove trailing whitespace.
  - Change `dump` to `print`.
* Temporarily disabled bucketing.
* Delete extraneous file.

Co-authored-by: Dan Zheng <danielzheng@google.com>
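The default-argument change above swaps closure literals for named function references in `@differentiable` signatures. Below is a minimal sketch of that pattern; `identityActivation` and `project(_:activation:)` are hypothetical names, not code from the repository. The point is only that the default value is a reference to a named function rather than a closure literal such as `{ $0 }`.

```swift
import TensorFlow

// A named @differentiable function that can serve as a default argument.
@differentiable
func identityActivation(_ x: Tensor<Float>) -> Tensor<Float> { x }

// Hypothetical API. Before the change, the default would have been a closure
// literal (e.g. `{ $0 }`), which ran into TF-690 / TF-1030; referencing a named
// function avoids those issues.
@differentiable(wrt: input)
func project(
    _ input: Tensor<Float>,
    activation: @differentiable (Tensor<Float>) -> Tensor<Float> = identityActivation
) -> Tensor<Float> {
    activation(input * 2)
}
```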
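The non-differentiability fix relies on `withoutDerivative(at:)`, which returns its argument while telling the differentiation transform to treat it as a constant. The following is a minimal, hypothetical sketch of the pattern, not the actual `BERT.swift` code:

```swift
import TensorFlow

// An Int (the sequence length) is derived from the input inside a
// differentiable function and wrapped in `withoutDerivative(at:)`, so it is
// treated as a constant and does not interfere with differentiation through x.
@differentiable
func meanOverSequence(_ x: Tensor<Float>) -> Tensor<Float> {
    // x has shape [batch, length]; keep the length out of the derivative.
    let sequenceLength = withoutDerivative(at: x.shape[1])
    return x.sum(squeezingAxes: 1) / Float(sequenceLength)
}
```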
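The shape bug described in "Fixed BERT. (#294)" is easy to see in isolation. This is a self-contained sketch with random logits and made-up labels (for illustration only): the first loss shows the silent `[B, 1]` vs. `[B]` broadcast that made the sigmoid loss uninformative, the second shows the two-class softmax formulation the fix switched to.

```swift
import TensorFlow

let batchSize = 4
let labels = Tensor<Int32>([1, 0, 1, 1])  // shape [B], binary CoLA-style labels

// Problematic pattern: logits of shape [B, 1] against float labels of shape [B].
// Elementwise ops broadcast the pair to [B, B], so the reduced loss mixes every
// example with every label instead of pairing each example with its own label.
let singleLogit = Tensor<Float>(randomNormal: [batchSize, 1])
let broadcastLoss = sigmoidCrossEntropy(
    logits: singleLogit, labels: Tensor<Float>(labels))

// Fixed pattern: two-class logits of shape [B, 2] with integer labels and
// softmax cross entropy, so the shapes line up and no broadcast occurs.
let twoClassLogits = Tensor<Float>(randomNormal: [batchSize, 2])
let fixedLoss = softmaxCrossEntropy(logits: twoClassLogits, labels: labels)

print(broadcastLoss, fixedLoss)
```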
| Commit: | e36eeb5 |
| --- | --- |
| Author: | Anthony Platanios |
| Committer: | Dan Zheng |
Added initial support for BERT.