Proto commits in modelscope/dash-infer

The Protocol Buffers files changed in the following 6 commits:

Commit: a9cbbdc
Author: yjc9696
Committer: GitHub

support MOE EP (#73)

Co-authored-by: yangjiacheng.yjc <yangjiacheng.yjc@alibaba-inc.com>

The documentation is generated from this commit.

Commit: 163850f
Author: zhenglaiwen.zlw
Committer: zhenglaiwen.zlw

some bugfix
- uuid crash issue
- update lora implement
- set page size by param
- delete deprecated files

Commit: a8b9f8e
Author: Jiejing Zhang
Committer: zhenglaiwen.zlw

Update For Version 2.0: add support for CUDA and VLM (#43)

* release dashinfer 2.0 version
thirdparty: add cutlass.
python: spanattention build from source.
benchmark: add stop model in the end.

Commit: a216786
Author: Jiejing Zhang
Committer: Jiejing Zhang

Update For Version 2.0: add support for CUDA and VLM (#43)

* release dashinfer 2.0 version
thirdparty: add cutlass.
python: spanattention build from source.
benchmark: add stop model in the end.

Commit: 9ef6e35
Author: zhenglaiwen.zlw
Committer: zhenglaiwen.zlw

fix memory leak bug, add default config to helper, update convert_model api
- bugfix
  - helper: check if get empty generated_elem
  - fix python input memory leak
  - avoid async copy python inputs
  - fix bug caused by inconsistent definition of RequestHandle
- engine
  - worker, model: EnqueueRequest -> StartRequestImpl
  - generation: output token_logprobs
- helper
  - add default config
  - add ConfigManager to merge and check user config
  - use torch related api only within the helper class
  - release torch model after conversion
- examples
  - cpp: erase screen before get inputs
  - py: shutdown executor after finishing tasks
  - py: use jinja template to format prompt
  - py: update ipynb basic example and corresponding doc
- doc
  - add model_type to root readme
  - update modelscope notebook pic and doc
  - update future plan in root readme

Commit: 877529e
Author: Laiwen Zheng
Committer: Laiwen Zheng

add source code