The Protocol Buffers files changed in the following 30 commits:
Commit: | 53de765 | |
---|---|---|
Author: | Taku Kudo |
Allow loading precomputed seed sentencepieces for unigram from a file.
The documentation is generated from this commit.
Commit: | e58bb68 | |
---|---|---|
Author: | Taku Kudo |
add pretokenization_delimiter options. Initialize seed pieces more accurately.
Commit: | 9f3ed99 | |
---|---|---|
Author: | Taku Kudo |
Sync internal to github. DP related features are added.
Commit: | fab966a | |
---|---|---|
Author: | Taku Kudo |
sync from internal
Commit: | 3a5bc58 | |
---|---|---|
Author: | Taku Kudo |
Revert "sync from internal" This reverts commit 05db0894d8ea44b203c3501306061cde9e42c48e.
Commit: | 05db089 | |
---|---|---|
Author: | Taku Kudo |
sync from internal
Commit: | 8eaa672 | |
---|---|---|
Author: | Taku Kudo |
change the type of input_sentence_size from int32 to uint64
Commit: | f7bc3db | |
---|---|---|
Author: | Taku Kudo |
merges internal changes to github
Commit: | 8b921ac | |
---|---|---|
Author: | Taku Kudo |
Revert the default size of piece length.
Commit: | 329383b | |
---|---|---|
Author: | Taku Kudo |
Initial release of 0.19. Merged internal sentencepiece.
Commit: | 61d59db | |
---|---|---|
Author: | Sentencepiece Team | |
Committer: | Keith Stevens |
Project import generated by Copybara. PiperOrigin-RevId: 251772401
Commit: | 59b48eb | |
---|---|---|
Author: | Taku Kudo |
added --treat_whitespace_as_suffix option to make _ be a suffix of word.
Commit: | 7b19d68 | |
---|---|---|
Author: | Taku Kudo |
use builtin protobuf-lite package in third_party
Commit: | 904edfe | |
---|---|---|
Author: | Taku Kudo |
deprecated mining_sentence_size and training_sentence_size. Load all sentences by default.
Commit: | 5f635d0 | |
---|---|---|
Author: | Taku Kudo |
support to change the piece of unk/bos/eos/pad
Commit: | 55e4da4 | |
---|---|---|
Author: | Taku Kudo |
added --max_sentence_length flag
Commit: | 6d0fd75 | |
---|---|---|
Author: | Taku Kudo |
added --split_by_number flag
Commit: | b40cca7 | |
---|---|---|
Author: | Taku Kudo |
Added --use_all_vocab=true flag for WORD/CHAR model
Commit: | b6a74ee | |
---|---|---|
Author: | Taku Kudo |
Added self testing feature.
Commit: | 5dac483 | |
---|---|---|
Author: | Taku Kudo |
Added --unk_surface option to allow user to change unknown surface string.
Commit: | e437e30 | |
---|---|---|
Author: | Taku Kudo |
Support vocab restriction feature
Commit: | f228e55 | |
---|---|---|
Author: | Taku Kudo |
Reimplement Trainer with Proto reflection
Commit: | ca8754b | |
---|---|---|
Author: | Taku Kudo |
Add --hard_vocab_limit flag.
Commit: | ecbd55a | |
---|---|---|
Author: | Taku Kudo | |
Committer: | GitHub |
Merge pull request #53 from google/sr Support to change ids of <unk>, <s>, </s>
Commit: | d102897 | |
---|---|---|
Author: | Taku Kudo |
Support to change ids of <unk>, <s>, </s>
Commit: | ea41f40 | |
---|---|---|
Author: | Taku Kudo | |
Committer: | GitHub |
Merge pull request #48 from tetsuok/fix-default-vocab-size Fix inconsistent default vocab size
Commit: | 9f3bc53 | |
---|---|---|
Author: | Tetsuo Kiso |
Fix typo in examples of control symbols
Commit: | ca20a5a | |
---|---|---|
Author: | Tetsuo Kiso |
Fix inconsistent default vocab size
Commit: | c6a1a19 | |
---|---|---|
Author: | Taku Kudo |
Add Sample/NBestEncode
Commit: | 2928ce5 | |
---|---|---|
Author: | Taku Kudo |
Initialize repository