Query the LLM to generate text or tokens.
The maximum output length of a sequence. It is used in JetStream to control the output/decode length of a sequence and is not passed to the engine. Always set max_tokens <= (max_target_length - max_prefill_predict_length), where max_target_length is the maximum total length of a sequence and max_prefill_predict_length is the maximum length of the input/prefill of a sequence.
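As a concrete illustration of that length budget, a minimal sketch with made-up numbers is below; max_target_length and max_prefill_predict_length are engine-side configuration values, not fields of this request.

```python
# Illustrative numbers only: these two values come from the engine's
# configuration, not from DecodeRequest.
max_target_length = 2048           # maximum total length of a sequence
max_prefill_predict_length = 1024  # maximum length of the input/prefill

# max_tokens bounds only the decode/output phase, so it must fit in the
# budget left over after the prefill.
max_tokens = 256
assert max_tokens <= max_target_length - max_prefill_predict_length
```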
The client can pass the input either as a string, in which case the server will tokenize it, or as tokens, in which case it is the client's responsibility to tokenize the input strings with the correct tokenizer.
Indicates whether the content has a beginning of sequence (BOS) token.
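A minimal client-side sketch of the two input forms, assuming the generated Python bindings are importable as jetstream_pb2 and that the oneof members are named text_content and token_content with nested TextContent/TokenContent messages; verify the exact names and field numbers against the .proto.

```python
from jetstream.core.proto import jetstream_pb2  # assumed module path

# Option 1: send a raw string; the server tokenizes it.
text_request = jetstream_pb2.DecodeRequest(
    max_tokens=256,
    text_content=jetstream_pb2.DecodeRequest.TextContent(
        text="Why is the sky blue?"
    ),
)

# Option 2: tokenize on the client with the same tokenizer the server uses
# and send the ids directly (the ids below are purely illustrative).
token_request = jetstream_pb2.DecodeRequest(
    max_tokens=256,
    token_content=jetstream_pb2.DecodeRequest.TokenContent(
        token_ids=[1, 1036, 338, 278, 14744, 7254, 29973]
    ),
)
```

When sending pre-tokenized input, also set the BOS flag described above so the server knows whether a beginning-of-sequence token is already included.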
Checks if the model server is live.
(message has no fields)
Denotes whether the model server is live.
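A hedged liveness-check sketch: the module paths, the OrchestratorStub class name, and the is_live field name are assumptions based on this page, not guaranteed by it.

```python
import grpc

from jetstream.core.proto import jetstream_pb2, jetstream_pb2_grpc  # assumed paths

channel = grpc.insecure_channel("localhost:9000")  # address is illustrative
stub = jetstream_pb2_grpc.OrchestratorStub(channel)  # stub name assumed

# HealthCheckRequest carries no fields; the response holds the liveness flag.
response = stub.HealthCheck(jetstream_pb2.HealthCheckRequest())
print("model server live:", response.is_live)  # field name assumed
```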
Used in:
Used in:
Used in:
InitialContent supports returning initial one-off response data from the stream. It is a placeholder for future features such as a history cache.
Used in:
(message has no fields)
Used in:
Supports multiple samples in the StreamContent. The size of the Sample list depends on the text generation strategy the engine uses.
Used in:
The text string decoded from token id(s).
List of token ids, one list per sample. When speculative decoding is disabled, the list size should be 1; when speculative decoding is enabled, the list size should be >= 1.
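Putting the streaming pieces together, a sketch of consuming Decode responses; the stub and the stream_content, samples, text, and token_ids names mirror the comments on this page but should be checked against the .proto.

```python
import grpc

from jetstream.core.proto import jetstream_pb2, jetstream_pb2_grpc  # assumed paths

channel = grpc.insecure_channel("localhost:9000")  # address is illustrative
stub = jetstream_pb2_grpc.OrchestratorStub(channel)  # stub name assumed

request = jetstream_pb2.DecodeRequest(
    max_tokens=256,
    text_content=jetstream_pb2.DecodeRequest.TextContent(text="Why is the sky blue?"),
)

# Decode is server-streaming: each message is one DecodeResponse.
for response in stub.Decode(request):
    # The first message may be an InitialContent placeholder; its
    # stream_content is then empty and the inner loop is skipped.
    for sample in response.stream_content.samples:
        # One Sample per generated candidate. token_ids holds a single id per
        # step unless speculative decoding produced several accepted tokens.
        print(sample.text, list(sample.token_ids))
```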