Service methods

Model Info
  Request: empty (message has no fields)

Service discovery
  Request: empty (message has no fields)
  Response: the other shards' URLs

Empties the batch cache
  Request: optional batch ID
  Response: empty (message has no fields)

Remove requests from a cached batch
  Request: batch ID; the requests to keep
  Response: the filtered batch (cached)

Warmup the model and compute max cache size
  Request: batch to warm up on
  Response: maximum number of tokens supported by the model

Prefill batch and decode first token
  Request: batch
  Response: generations; next batch (cached)

Decode token for a list of prefilled batches
  Request: cached batches
  Response: decodes; next batch (cached)

Health check
  Request: empty (message has no fields)
  Response: empty (message has no fields)
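Read together, the methods above suggest a gRPC service shaped roughly as follows. This is a sketch only: the service, rpc, and request/response message names are all assumptions reconstructed from the comments; only the comments themselves come from the page.

```proto
syntax = "proto3";

// Sketch of the service implied by the method list above.
// All identifiers here are assumed; only the comments are from the page.
service TextGenerationService {
  // Model Info
  rpc Info (InfoRequest) returns (InfoResponse);
  // Service discovery
  rpc ServiceDiscovery (ServiceDiscoveryRequest) returns (ServiceDiscoveryResponse);
  // Empties batch cache
  rpc ClearCache (ClearCacheRequest) returns (ClearCacheResponse);
  // Remove requests from a cached batch
  rpc FilterBatch (FilterBatchRequest) returns (FilterBatchResponse);
  // Warmup the model and compute max cache size
  rpc Warmup (WarmupRequest) returns (WarmupResponse);
  // Prefill batch and decode first token
  rpc Prefill (PrefillRequest) returns (PrefillResponse);
  // Decode token for a list of prefilled batches
  rpc Decode (DecodeRequest) returns (DecodeResponse);
  // Health check
  rpc Health (HealthRequest) returns (HealthResponse);
}
```

A plausible lifecycle, inferred from the comments alone: warm up once at startup, then alternate prefill for new batches and decode for cached ones, filtering finished requests out of cached batches and clearing the cache as needed.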
Batch
  - Batch ID
  - Individual requests
  - Batch size (== len(requests))
  - Maximum number of tokens this batch will grow to
Cached batch
  - Batch ID
  - Individual request IDs
  - Batch size (== number of requests)
  - Maximum number of tokens this batch will grow to
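The two batch descriptions above could correspond to message shapes like the following. Field names and types are assumptions; the `Request` type is the request message described further down the page.

```proto
// Sketch of the two batch messages above; identifiers are assumed.
message Batch {
  // Batch ID
  uint64 id = 1;
  // Individual requests (the Request message described below)
  repeated Request requests = 2;
  // Batch size (== len(requests))
  uint32 size = 3;
  // Maximum number of tokens this batch will grow to
  uint32 max_tokens = 4;
}

// Lighter variant kept in the server-side cache: it carries only
// request IDs, since the server already holds the full requests.
message CachedBatch {
  // Batch ID
  uint64 id = 1;
  // Individual request IDs
  repeated uint64 request_ids = 2;
  // Batch size
  uint32 size = 3;
  // Maximum number of tokens this batch will grow to
  uint32 max_tokens = 4;
}
```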
Generated text
  - Output
  - Number of generated tokens
  - Finish reason
  - Seed
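A hedged sketch of the generated-text message. The field names are assumptions, and the finish-reason enum values below are purely illustrative guesses (length limit, end-of-sequence token, stop sequence), since the page does not list them.

```proto
// Sketch; identifiers and enum values are assumed, not from the page.
message GeneratedText {
  // Output
  string text = 1;
  // Number of generated tokens
  uint32 generated_tokens = 2;
  // Finish reason
  FinishReason finish_reason = 3;
  // Seed
  optional uint64 seed = 4;
}

// Hypothetical finish reasons for illustration only.
enum FinishReason {
  FINISH_REASON_LENGTH = 0;
  FINISH_REASON_EOS_TOKEN = 1;
  FINISH_REASON_STOP_SEQUENCE = 2;
}
```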
Generation
  - Request ID
  - Prefill tokens (optional)
  - Token ID
  - Logprob
  - Text
  - Is it a special token
  - Complete generated text
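One way to read the per-token fields above is a message like this, emitted once per decoded token. Identifiers are assumptions; `PrefillTokens` and `GeneratedText` are the messages described elsewhere on this page.

```proto
// Sketch of a per-token generation record; identifiers are assumed.
message Generation {
  // Request ID
  uint64 request_id = 1;
  // Prefill tokens (optional; typically only on the first generation)
  PrefillTokens prefill_tokens = 2;
  // Token ID
  uint32 token_id = 3;
  // Logprob
  float token_logprob = 4;
  // Text
  string token_text = 5;
  // Is it a special token
  bool token_is_special = 6;
  // Complete generated text, presumably set once the request finishes
  optional GeneratedText generated_text = 7;
}
```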
Next token chooser parameters
  - Temperature: exponential scaling of the output probability distribution
  - Top-k: restrict sampling to the k highest-probability tokens
  - Top-p: restrict sampling to the top tokens whose probabilities sum to at least the cutoff
  - Typical-p: restrict sampling to locally typical tokens whose probabilities sum to at least the cutoff
  - Whether to apply sampling on the logits
  - Random seed for sampling
  - Repetition penalty
  - Token watermarking using "A Watermark for Large Language Models"
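The sampling knobs above map naturally onto a flat parameters message. All field names and types here are assumptions; the comments are from the page.

```proto
// Sketch of the sampling parameters; identifiers are assumed.
message NextTokenChooserParameters {
  // exponential scaling of the output probability distribution
  float temperature = 1;
  // restrict sampling to the k highest-probability tokens
  uint32 top_k = 2;
  // restrict sampling to top tokens summing to at least prob_cut_off
  float top_p = 3;
  // restrict sampling to locally typical tokens summing to at least prob_cut_off
  float typical_p = 4;
  // apply sampling on the logits (greedy decoding when false)
  bool do_sample = 5;
  // random seed for sampling
  uint64 seed = 6;
  // repetition penalty
  float repetition_penalty = 7;
  // token watermarking using "A Watermark for Large Language Models"
  bool watermark = 8;
}
```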
Prefill tokens
  - Prefill token IDs
  - Prefill logprobs
  - Prefill token texts
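The three parallel lists above suggest a columnar layout, with one entry per prompt token across each field. Field names are assumptions.

```proto
// Sketch; identifiers are assumed. The three repeated fields are
// parallel arrays: ids[i], logprobs[i], and texts[i] describe the
// same prompt token.
message PrefillTokens {
  // Prefill token IDs
  repeated uint32 ids = 1;
  // Prefill logprobs
  repeated float logprobs = 2;
  // Prefill token texts
  repeated string texts = 3;
}
```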
Request
  - Request ID
  - The generation context
  - Context truncation
  - Next token chooser parameters
  - Stopping criteria parameters
  - Return prefill logprobs
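A hedged sketch of the request message, composing the two parameter messages described on this page. Identifiers are assumptions.

```proto
// Sketch of a single generation request; identifiers are assumed.
message Request {
  // Request ID
  uint64 id = 1;
  // The generation context
  string inputs = 2;
  // Context truncation
  uint32 truncate = 3;
  // Next token chooser parameters
  NextTokenChooserParameters parameters = 4;
  // Stopping criteria parameters
  StoppingCriteriaParameters stopping_parameters = 5;
  // Return prefill logprobs
  bool prefill_logprobs = 6;
}
```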
Stopping criteria parameters
  - Maximum number of generated tokens
  - Optional stopping sequences
  - Ignore the end-of-sequence token (used for benchmarking)