Proto commits in google/sentencepiece

The Protocol Buffers files changed in these 30 commits:

Commit:53de765
Author:Taku Kudo

Allow loading precomputed seed sentencepieces for unigram from a file.

The documentation is generated from this commit.

Commit:e58bb68
Author:Taku Kudo

add pretokenization_delimiter options. Initialize seed pieces more accurately.

Commit:9f3ed99
Author:Taku Kudo

Sync internal to github. DP related features are added.

Commit:fab966a
Author:Taku Kudo

sync from internal

Commit:3a5bc58
Author:Taku Kudo

Revert "sync from internal". This reverts commit 05db0894d8ea44b203c3501306061cde9e42c48e.

Commit:05db089
Author:Taku Kudo

sync from internal

Commit:8eaa672
Author:Taku Kudo

change the type of input_sentence_size from int32 to uint64

Commit:f7bc3db
Author:Taku Kudo

merges internal changes to github

Commit:8b921ac
Author:Taku Kudo

Revert the default size of piece length.

Commit:329383b
Author:Taku Kudo

Initial release of 0.19. Merged internal sentencepiece.

Commit:61d59db
Author:Sentencepiece Team
Committer:Keith Stevens

Project import generated by Copybara. PiperOrigin-RevId: 251772401

Commit:59b48eb
Author:Taku Kudo

added --treat_whitespace_as_suffix option to make _ be a suffix of word.

Commit:7b19d68
Author:Taku Kudo

use builtin protobuf-lite package in third_party

Commit:904edfe
Author:Taku Kudo

deprecated mining_sentence_size and training_sentence_size. Load all sentences by default.

Commit:5f635d0
Author:Taku Kudo

support to change the piece of unk/bos/eos/pad

Commit:55e4da4
Author:Taku Kudo

added --max_sentence_length flag

Commit:6d0fd75
Author:Taku Kudo

added --split_by_number flag

Commit:b40cca7
Author:Taku Kudo

Added --use_all_vocab=true flag for WORD/CHAR model

Commit:b6a74ee
Author:Taku Kudo

Added self testing feature.

Commit:5dac483
Author:Taku Kudo

Added --unk_surface option to allow user to change unknown surface string.

Commit:e437e30
Author:Taku Kudo

Support vocab restriction feature

Commit:f228e55
Author:Taku Kudo

Reimplement Trainer with Proto reflection

Commit:ca8754b
Author:Taku Kudo

Add --hard_vocab_limit flag.

Commit:ecbd55a
Author:Taku Kudo
Committer:GitHub

Merge pull request #53 from google/sr: Support to change ids of <unk>, <s>, </s>

Commit:d102897
Author:Taku Kudo

Support to change ids of <unk>, <s>, </s>

Commit:ea41f40
Author:Taku Kudo
Committer:GitHub

Merge pull request #48 from tetsuok/fix-default-vocab-size: Fix inconsistent default vocab size

Commit:9f3bc53
Author:Tetsuo Kiso

Fix typo in examples of control symbols

Commit:ca20a5a
Author:Tetsuo Kiso

Fix inconsistent default vocab size

Commit:c6a1a19
Author:Taku Kudo

Add Sample/NBestEncode

Commit:2928ce5
Author:Taku Kudo

Initialize repository