The Protocol Buffers files changed in the following 21 commits:
| Commit: | 5f6fb67 | |
|---|---|---|
| Author: | Nisaba Authors | |
| Committer: | Copybara-Service | |
Make TropicalWeight constructor explicit PiperOrigin-RevId: 875202875
The documentation is generated from this commit.
| Commit: | 3796a3d | |
|---|---|---|
| Author: | Brian Roark | |
| Committer: | Copybara-Service | |
No public description PiperOrigin-RevId: 781118078
| Commit: | 92e66f5 | |
|---|---|---|
| Author: | Brian Roark | |
| Committer: | Copybara-Service | |
For on-the-fly word-piece segmentation in the decoder. Uses (now standard) word-initial word-piece markers. Prior trie-based methods using word-internal word-piece markers are retained in the decoder for backwards compatibility. PiperOrigin-RevId: 624994656
| Commit: | 5293817 | |
|---|---|---|
| Author: | Kyle Gorman | |
| Committer: | Copybara-Service | |
No public description PiperOrigin-RevId: 601483328
| Commit: | 1ff6a79 | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Finite-state transducer (FST)-based Joint n-gram (aka pair or paired LM) decoder. Initial revision. PiperOrigin-RevId: 583400625
| Commit: | 5a66c15 | |
|---|---|---|
| Author: | Cibu Johny | |
| Committer: | Copybara-Service | |
Correcting the lint and formatting errors. No changes to functionality or logic. PiperOrigin-RevId: 545488775
| Commit: | fd3ae9f | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Fixing Nisaba build. PiperOrigin-RevId: 527318348
| Commit: | 96e1768 | |
|---|---|---|
| Author: | Nisaba Authors | |
| Committer: | Copybara-Service | |
Internal change PiperOrigin-RevId: 520138673
| Commit: | 50066c6 | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Clarify the use of Apun Iyek in Meetei Mayek. PiperOrigin-RevId: 422627463
| Commit: | 972782c | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Initial revision of Thaana (`Thaa`) script for Dhivehi.

As a side change, introducing protocol buffer describing Brahmic script-specific configuration parameters. This is very helpful for Thaana which, while possessing some features common with Brahmic scripts, nevertheless does not strictly qualify as an abugida. In particular, it does not possess inherent vowels.

### Notes:

1. In occasional words found online, some of the consonant clusters are missing `SUKUN`, which is currently disallowed by the ISO and well-formedness grammar. I've checked 50K word list from [An Crúbadán](http://crubadan.org/) and we are getting 1,005 *well-formedness* failures, mostly in loan words and foreign names. Most of these are due to omitted `SUKUN`. On the same word list, the number of romanisation failures is 900.
2. At the moment, similar to other Brahmic scripts, the vowel hiatus is marked with a `.` in our romanisations of Dhivehi.
3. Some of the words for test are taken from Divehi wordlist at [An Crúbadán](http://crubadan.org/).

PiperOrigin-RevId: 401246711
| Commit: | b34830f | |
|---|---|---|
| Author: | Cibu Johny | |
| Committer: | Copybara-Service | |
Common functions to read letter_languages and unicode_string text protos and verify them. PiperOrigin-RevId: 396437953
| Commit: | d49ffe9 | |
|---|---|---|
| Author: | Cibu Johny | |
| Committer: | Copybara-Service | |
Internal change PiperOrigin-RevId: 392411949
| Commit: | 15c765a | |
|---|---|---|
| Author: | Cibu Johny | |
| Committer: | Copybara-Service | |
Updating unicode_strings proto fields uname_prefix and to_uname_prefix to accept multiple prefix values. Existing text protocol buffers work as is. PiperOrigin-RevId: 390350611
| Commit: | 0f0ab6e | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Initial revision of a letter-language/region index. PiperOrigin-RevId: 388473914
| Commit: | 20cc204 | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Adding timer for computing elapsed time. PiperOrigin-RevId: 386015346
| Commit: | ce849c2 | |
|---|---|---|
| Author: | Alexander Gutkin | |
| Committer: | Copybara-Service | |
Moving script operations under `script` subdirectory. PiperOrigin-RevId: 386014618
| Commit: | 6d127e9 | |
|---|---|---|
| Author: | Cibu Johny | |
| Committer: | Copybara-Service | |
Adding to_uname_prefix field to unicode_strings proto. With this field, a separate prefix can be specified for the to_uname field of the Item messages in the file. PiperOrigin-RevId: 374437704
| Commit: | 31379b8 | |
|---|---|---|
| Author: | Alexander Gutkin | |
Migrating Oriya definitions from tsv files to Google protocol buffers.
| Commit: | ae9f683 | |
|---|---|---|
| Author: | Alexander Gutkin | |
Internal change.
| Commit: | be25ef8 | |
|---|---|---|
| Author: | Alexander Gutkin | |
Simplified Unicode strings proto and the corresponding parser.
| Commit: | 4235313 | |
|---|---|---|
| Author: | Alexander Gutkin | |
Scaffolding for protocol buffer format for representing the script data. This compiles easily into the corresponding intermediate Pynini/Thrax string files in TSV format. The benefit of this additional layer of representation is that it allows us to have more informative encoding of the source data while retaining the bells and whistles of the original format, such as comments.