Proto commits in google-research/nisaba

The Protocol Buffers files changed in the following 21 commits:

Commit:5f6fb67
Author:Nisaba Authors
Committer:Copybara-Service

Make TropicalWeight constructor explicit.
PiperOrigin-RevId: 875202875

The documentation is generated from this commit.

Commit:3796a3d
Author:Brian Roark
Committer:Copybara-Service

No public description.
PiperOrigin-RevId: 781118078

Commit:92e66f5
Author:Brian Roark
Committer:Copybara-Service

For on-the-fly word-piece segmentation in the decoder. Uses (now standard) word-initial word-piece markers. Prior trie-based methods using word-internal word-piece markers are retained in the decoder for backwards compatibility.
PiperOrigin-RevId: 624994656
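To illustrate what a word-initial word-piece marker looks like, here is a minimal sketch of greedy longest-match segmentation. It is not the decoder's actual method, which this page does not show; the marker character, the `segment` function, and the toy vocabulary are all assumptions made for illustration.

```python
# Hypothetical illustration only: none of these names come from Nisaba.
WORD_START = "\u2581"  # assumed word-initial marker (SentencePiece-style)

# Toy vocabulary: only the first piece of a word carries the marker.
VOCAB = {WORD_START + "trans", WORD_START, "lit", "er", "ation"}


def segment(word, vocab):
    """Greedy longest-match-first segmentation of a single word.

    Returns the list of pieces, or None if some position in the word
    cannot be covered by any piece in the vocabulary.
    """
    text = WORD_START + word
    pieces = []
    start = 0
    while start < len(text):
        end = len(text)
        # Shrink the candidate until it is found in the vocabulary.
        while end > start and text[start:end] not in vocab:
            end -= 1
        if end == start:
            return None
        pieces.append(text[start:end])
        start = end
    return pieces


pieces = segment("transliteration", VOCAB)
```

With word-internal markers, the continuation pieces (`lit`, `er`, `ation`) would be the marked ones and the first piece would be left plain.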

Commit:5293817
Author:Kyle Gorman
Committer:Copybara-Service

No public description.
PiperOrigin-RevId: 601483328

Commit:1ff6a79
Author:Alexander Gutkin
Committer:Copybara-Service

Finite-state transducer (FST)-based Joint n-gram (aka pair or paired LM) decoder. Initial revision.
PiperOrigin-RevId: 583400625

Commit:5a66c15
Author:Cibu Johny
Committer:Copybara-Service

Correcting the lint and formatting errors. No changes to functionality or logic.
PiperOrigin-RevId: 545488775

Commit:fd3ae9f
Author:Alexander Gutkin
Committer:Copybara-Service

Fixing Nisaba build.
PiperOrigin-RevId: 527318348

Commit:96e1768
Author:Nisaba Authors
Committer:Copybara-Service

Internal change.
PiperOrigin-RevId: 520138673

Commit:50066c6
Author:Alexander Gutkin
Committer:Copybara-Service

Clarify the use of Apun Iyek in Meetei Mayek.
PiperOrigin-RevId: 422627463

Commit:972782c
Author:Alexander Gutkin
Committer:Copybara-Service

Initial revision of Thaana (`Thaa`) script for Dhivehi.

As a side change, introducing a protocol buffer describing Brahmic script-specific configuration parameters. This is very helpful for Thaana which, while sharing some features with Brahmic scripts, nevertheless does not strictly qualify as an abugida. In particular, it does not possess inherent vowels.

### Notes:

1. In occasional words found online, some of the consonant clusters are missing `SUKUN`, which is currently disallowed by the ISO and well-formedness grammar. I've checked a 50K word list from [An Crúbadán](http://crubadan.org/) and we are getting 1,005 *well-formedness* failures, mostly in loan words and foreign names. Most of these are due to omitted `SUKUN`. On the same word list, the number of romanisation failures is 900.
1. At the moment, similar to other Brahmic scripts, the vowel hiatus is marked with a `.` in our romanisations of Dhivehi.
1. Some of the test words are taken from the Divehi wordlist at [An Crúbadán](http://crubadan.org/).

PiperOrigin-RevId: 401246711

Commit:b34830f
Author:Cibu Johny
Committer:Copybara-Service

Common functions to read letter_languages and unicode_string text protos and verify them.
PiperOrigin-RevId: 396437953

Commit:d49ffe9
Author:Cibu Johny
Committer:Copybara-Service

Internal change.
PiperOrigin-RevId: 392411949

Commit:15c765a
Author:Cibu Johny
Committer:Copybara-Service

Updating unicode_strings proto fields uname_prefix and to_uname_prefix to accept multiple prefix values. Existing text protocol buffers work as is.
PiperOrigin-RevId: 390350611

Commit:0f0ab6e
Author:Alexander Gutkin
Committer:Copybara-Service

Initial revision of a letter-language/region index.
PiperOrigin-RevId: 388473914

Commit:20cc204
Author:Alexander Gutkin
Committer:Copybara-Service

Adding timer for computing elapsed time.
PiperOrigin-RevId: 386015346

Commit:ce849c2
Author:Alexander Gutkin
Committer:Copybara-Service

Moving script operations under `script` subdirectory.
PiperOrigin-RevId: 386014618

Commit:6d127e9
Author:Cibu Johny
Committer:Copybara-Service

Adding to_uname_prefix field to unicode_strings proto. With this field, a separate prefix can be specified for the to_uname field in the Item messages in the file.
PiperOrigin-RevId: 374437704

Commit:31379b8
Author:Alexander Gutkin

Migrating Oriya definitions from tsv files to Google protocol buffers.

Commit:ae9f683
Author:Alexander Gutkin

Internal change.

Commit:be25ef8
Author:Alexander Gutkin

Simplified Unicode strings proto and the corresponding parser.

Commit:4235313
Author:Alexander Gutkin

Scaffolding for protocol buffer format for representing the script data. This compiles easily into the corresponding intermediate Pynini/Thrax string files in TSV format. The benefit of this additional layer of representation is that it allows us to have more informative encoding of the source data while retaining the bells and whistles of the original format, such as comments.
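As a rough sketch of the idea of compiling structured records into a TSV string file while keeping comments, the snippet below turns a list of items into tab-separated lines. The record fields (`source`, `target`, `comment`), the `#` comment prefix, and the `items_to_tsv` function are hypothetical. They do not reflect the actual proto schema or compiler in the repository.

```python
# Hypothetical illustration only: the field names and the comment syntax
# are assumptions, not the actual Nisaba proto schema.
def items_to_tsv(items):
    """Render records as TSV lines, emitting each comment on its own line."""
    lines = []
    for item in items:
        comment = item.get("comment")
        if comment:
            lines.append("# " + comment)
        lines.append(item["source"] + "\t" + item["target"])
    return "\n".join(lines) + "\n"


items = [
    {"source": "\u0b15", "target": "k", "comment": "ORIYA LETTER KA"},
    {"source": "\u0b16", "target": "kh"},
]
tsv = items_to_tsv(items)
```

Keeping the comment as a field of the record, rather than as free text in the TSV file, is one way the structured layer can carry more information than the flat format.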