/ Model Info
/ Empty request
(message has no fields)
/ Service discovery
/ Empty request
(message has no fields)
/ Other shards' URLs
/ Empties the batch cache
/ Optional batch ID
/ Empty response
(message has no fields)
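Since the viewer strips all message, field, and RPC identifiers, the proto3 fragments added through this listing reconstruct a plausible shape from the comments alone: every name and field number is an assumption, only the comments come from the page, and the fragments are meant to be read as pieces of one sketch .proto file. For the three control calls above (model info, service discovery, cache clearing), a minimal sketch could look like:

// Sketch only: all identifiers are assumed; comments mirror the listing above.
message InfoRequest {}               // Empty request (no fields)
message ServiceDiscoveryRequest {}   // Empty request (no fields)

message ServiceDiscoveryResponse {
  // Other shards' URLs
  repeated string urls = 1;
}

message ClearCacheRequest {
  // Optional batch ID
  optional uint64 id = 1;
}

message ClearCacheResponse {}        // Empty response (no fields)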
/ Remove requests from a cached batch
/ Batch ID
/ Requests to keep
/ Filtered Batch (cached)
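A sketch of the filter exchange, assuming the request carries the batch ID plus the IDs of the requests to keep, and the response returns the filtered cached batch (CachedBatch as sketched further down):

message FilterBatchRequest {
  // Batch ID
  uint64 batch_id = 1;
  // Requests to keep
  repeated uint64 request_ids = 2;
}

message FilterBatchResponse {
  // Filtered batch (cached)
  CachedBatch batch = 1;
}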
/ Warmup the model and compute max cache size
/ Batch to warmup on
/ Empty response
/ Maximum number of tokens supported by the model
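The warmup response is labelled "Empty response" above but also lists a maximum-token field, so the sketch assumes that field exists; names remain assumptions:

message WarmupRequest {
  // Batch to warm up on
  Batch batch = 1;
}

message WarmupResponse {
  // Maximum number of tokens supported by the model
  optional uint32 max_supported_total_tokens = 1;
}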
/ Prefill batch and decode first token
/ Batch
/ Optional cached batch
/ Generation
/ Next batch (cached)
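A sketch of the prefill exchange, again with assumed names; Batch, CachedBatch, and Generation refer to the message sketches later in the listing:

message PrefillRequest {
  // Batch to prefill
  Batch batch = 1;
  // Optional cached batch
  optional CachedBatch cached_batch = 2;
}

message PrefillResponse {
  // One generation per request in the batch
  repeated Generation generations = 1;
  // Next batch (cached)
  optional CachedBatch batch = 2;
}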
/ Embed
/ Batch
/ Embeddings
/ Error message on failure
/ Classify
/ Batch
/ Classifications
/ Error message on failure
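Embed and Classify appear to follow the same pattern: a request wrapping a Batch, and a response carrying per-request results plus an error string. All names are assumptions; the result types point at the per-request messages sketched below:

message EmbedRequest {
  // Batch to embed
  Batch batch = 1;
}

message EmbedResponse {
  // Embeddings, one per request
  repeated Embedding embeddings = 1;
  // Error message on failure
  string error_msg = 2;
}

message ClassifyRequest {
  // Batch to classify
  Batch batch = 1;
}

message ClassifyResponse {
  // Classifications, one list per request
  repeated ClassifyPredictionList classifications = 1;
  // Error message on failure
  string error_msg = 2;
}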
/ Decode token for a list of prefilled batches
/ Cached batches
/ Decodes
/ Next batch (cached)
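A sketch of the decode exchange, which takes the cached batches and returns the next generations plus the merged next batch; names assumed:

message DecodeRequest {
  // Cached batches to decode
  repeated CachedBatch batches = 1;
}

message DecodeResponse {
  // Decodes, one generation per request
  repeated Generation generations = 1;
  // Next batch (cached)
  optional CachedBatch batch = 2;
}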
/ Health check
(message has no fields)
(message has no fields)
/ Download adapter
/ Adapter Parameters
/ Adapter source
/ Token for external API (predibase / HuggingFace)
/ True if download occurred, false if skipped
/ Fraction of the adapter memory limit consumed by the adapter.
/ If no limit is set, will return 0.
/ When the total across all loaded adapters exceeds the
/ adapter_memory_fraction limit, no more adapters will be
/ loaded to GPU and LoRAX will begin swapping.
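A sketch of the adapter download exchange, with assumed names; AdapterParameters and AdapterSource refer to the sketches in the message section below:

message DownloadAdapterRequest {
  // Adapter parameters
  AdapterParameters adapter_parameters = 1;
  // Adapter source
  AdapterSource adapter_source = 2;
  // Token for external API (predibase / HuggingFace)
  optional string api_token = 3;
}

message DownloadAdapterResponse {
  // True if download occurred, false if skipped
  bool downloaded = 1;
  // Fraction of the adapter memory limit consumed by the adapter.
  // If no limit is set, will return 0. When the total across all loaded
  // adapters exceeds the adapter_memory_fraction limit, no more adapters
  // will be loaded to GPU and LoRAX will begin swapping.
  float memory_fraction = 2;
}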
/ Load adapter
/ Adapter Parameters
/ Adapter source
/ Adapter index
/ Token for external API (predibase / HuggingFace)
/ True if load occurred, false if skipped
/ Offload adapter
/ Adapter Parameters
/ Adapter source
/ Adapter index
/ True if offload occurred, false if skipped
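Load and offload appear to mirror the download call, adding an adapter index, and the order of the comments above suggests one service exposing all thirteen RPCs. Every identifier below, including the LoraxService name, is an assumption inferred from the comments; the model-info response fields are not listed on this page, so InfoResponse is referenced but not sketched.

message LoadAdapterRequest {
  // Adapter parameters
  AdapterParameters adapter_parameters = 1;
  // Adapter source
  AdapterSource adapter_source = 2;
  // Adapter index
  uint32 adapter_index = 3;
  // Token for external API (predibase / HuggingFace)
  optional string api_token = 4;
}

message LoadAdapterResponse {
  // True if load occurred, false if skipped
  bool loaded = 1;
}

message OffloadAdapterRequest {
  // Adapter parameters
  AdapterParameters adapter_parameters = 1;
  // Adapter source
  AdapterSource adapter_source = 2;
  // Adapter index
  uint32 adapter_index = 3;
}

message OffloadAdapterResponse {
  // True if offload occurred, false if skipped
  bool offloaded = 1;
}

// Health check uses empty request and response messages.
message HealthRequest {}
message HealthResponse {}

// Assumed service definition tying the RPCs above together.
service LoraxService {
  rpc Info (InfoRequest) returns (InfoResponse);
  rpc ServiceDiscovery (ServiceDiscoveryRequest) returns (ServiceDiscoveryResponse);
  rpc ClearCache (ClearCacheRequest) returns (ClearCacheResponse);
  rpc FilterBatch (FilterBatchRequest) returns (FilterBatchResponse);
  rpc Warmup (WarmupRequest) returns (WarmupResponse);
  rpc Prefill (PrefillRequest) returns (PrefillResponse);
  rpc Embed (EmbedRequest) returns (EmbedResponse);
  rpc Classify (ClassifyRequest) returns (ClassifyResponse);
  rpc Decode (DecodeRequest) returns (DecodeResponse);
  rpc Health (HealthRequest) returns (HealthResponse);
  rpc DownloadAdapter (DownloadAdapterRequest) returns (DownloadAdapterResponse);
  rpc LoadAdapter (LoadAdapterRequest) returns (LoadAdapterResponse);
  rpc OffloadAdapter (OffloadAdapterRequest) returns (OffloadAdapterResponse);
}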
Used in:
/ Adapter IDs
/ Adapter weights for merging
/ Merge strategy (default: linear)
/ [0, 1], 0: full pruning, 1: no pruning
/ Majority sign method (default: total)
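A sketch of the adapter-parameters message, assuming the merge-related fields point at the MergeStrategy and MajoritySignMethod enums sketched below:

message AdapterParameters {
  // Adapter IDs
  repeated string adapter_ids = 1;
  // Adapter weights for merging
  repeated float weights = 2;
  // Merge strategy (default: linear)
  MergeStrategy merge_strategy = 3;
  // [0, 1], 0: full pruning, 1: no pruning
  float density = 4;
  // Majority sign method (default: total)
  MajoritySignMethod majority_sign_method = 5;
}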
Used in:
/ Adapters loaded using the HuggingFace Hub
/ Adapters loaded via remote filesystem path
/ Adapters loaded via local filesystem path
/ Adapters loaded via predibase
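The four comments read like the values of an adapter-source enum; the value names below (HUB, S3, LOCAL, PBASE) are assumptions chosen to match the descriptions:

enum AdapterSource {
  // Adapters loaded using the HuggingFace Hub
  HUB = 0;
  // Adapters loaded via remote filesystem path
  S3 = 1;
  // Adapters loaded via local filesystem path
  LOCAL = 2;
  // Adapters loaded via predibase
  PBASE = 3;
}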
Used in:
/ Alternative Token IDs
/ Alternative Logprobs
/ Alternative tokens
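A sketch of the alternative-tokens message; the third field is assumed to hold the decoded text of the alternatives:

message AlternativeTokens {
  // Alternative token IDs
  repeated uint32 ids = 1;
  // Alternative logprobs
  repeated float logprobs = 2;
  // Alternative tokens (decoded text, assumed)
  repeated string texts = 3;
}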
Used in:
/ Batch ID
/ Individual requests
/ Batch size (==len(requests))
/ Maximum number of tokens this batch will grow to
/ Maximum number of Paged Attention blocks
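A sketch of the batch message; field names assumed:

message Batch {
  // Batch ID
  uint64 id = 1;
  // Individual requests
  repeated Request requests = 2;
  // Batch size (== len(requests))
  uint32 size = 3;
  // Maximum number of tokens this batch will grow to
  uint32 max_tokens = 4;
  // Maximum number of Paged Attention blocks
  uint32 max_blocks = 5;
}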
Used in:
/ Batch ID
/ Individual requests ids
/ Batch size (==len(requests))
/ Maximum number of tokens this batch will grow to
/ Number of tokens in the next forward
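The cached-batch variant keeps only the request IDs; again, names assumed:

message CachedBatch {
  // Batch ID
  uint64 id = 1;
  // Individual request IDs
  repeated uint64 request_ids = 2;
  // Batch size (== len(request_ids))
  uint32 size = 3;
  // Maximum number of tokens this batch will grow to
  uint32 max_tokens = 4;
  // Number of tokens in the next forward pass
  uint32 current_tokens = 5;
}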
Used in:
/ Request ID
/ Classifications
Used in:
/ Request ID
/ Embedding values
Used in:
/ Request ID
/ Entities
/ XXX
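The three small blocks above (classifications, embedding values, and entities, each keyed by a request ID) could be per-request result messages along these lines; the names and element types are guesses, and the field commented only "XXX" on the page is noted but not typed because its meaning is unclear:

message ClassifyPredictionList {
  // Request ID
  uint64 request_id = 1;
  // Classifications (element type is a guess)
  repeated string predictions = 2;
}

message Embedding {
  // Request ID
  uint64 request_id = 1;
  // Embedding values
  repeated float values = 2;
}

message EntityList {
  // Request ID
  uint64 request_id = 1;
  // Entities (element type is a guess; the page does not list entity fields)
  repeated string entities = 2;
  // A third field is commented only "XXX" on the page and is omitted here.
}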
Used in:
/ Output
/ Number of generated tokens
/ Number of skipped tokens due to speculative decoding hits
/ Finish reason
/ Seed
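A sketch of the generated-text message; the finish reason is assumed to be an enum defined elsewhere in the file, since its values do not appear on this page:

message GeneratedText {
  // Output text
  string text = 1;
  // Number of generated tokens
  uint32 generated_tokens = 2;
  // Number of skipped tokens due to speculative decoding hits
  uint32 skipped_tokens = 3;
  // Finish reason (enum values not listed on this page)
  FinishReason finish_reason = 4;
  // Seed
  optional uint64 seed = 5;
}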
Used in:
/ Request ID
/ Prefill tokens (optional)
/ Next tokens
/ Complete generated text
/ Prefill tokens length
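A sketch of the per-request generation message; NextTokens refers to the token-list message sketched further down, and the complete text is assumed to be present only once the request finishes:

message Generation {
  // Request ID
  uint64 request_id = 1;
  // Prefill tokens (optional)
  optional NextTokens prefill_tokens = 2;
  // Next tokens
  NextTokens tokens = 3;
  // Complete generated text (assumed present only when the request is finished)
  optional GeneratedText generated_text = 4;
  // Prefill tokens length
  uint32 prefill_tokens_length = 5;
}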
Used in:
/ Binary image data.
/ Image MIME type.
Used in:
/ Plain text data
/ Image data
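The input-chunk comments (plain text or image) read like a oneof; both the oneof and the names are assumptions:

message Image {
  // Binary image data
  bytes data = 1;
  // Image MIME type
  string mimetype = 2;
}

message InputChunk {
  oneof chunk {
    // Plain text data
    string text = 1;
    // Image data
    Image image = 2;
  }
}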
Used in:
/ Total method
/ Frequency method
Used in:
/ Linear combination of adapters
/ TIES method for combining adapters
/ DARE method for combining adapters
/ DARE + TIES method for combining adapters
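The two blocks above read like enums for the majority-sign method and the merge strategy; the value names are assumptions matched to the descriptions:

enum MajoritySignMethod {
  // Total method
  TOTAL = 0;
  // Frequency method
  FREQUENCY = 1;
}

enum MergeStrategy {
  // Linear combination of adapters
  LINEAR = 0;
  // TIES method for combining adapters
  TIES = 1;
  // DARE method for combining adapters
  DARE_LINEAR = 2;
  // DARE + TIES method for combining adapters
  DARE_TIES = 3;
}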
Used in:
/ exponential scaling of the output probability distribution
/ restricting to the k highest probability elements
/ restricting to top tokens whose probabilities sum to <= prob_cut_off
/ restricting to top tokens whose probabilities sum to <= prob_cut_off
/ apply sampling on the logits
/ random seed for sampling
/ repetition penalty
/ frequency penalty
/ presence penalty
/ token watermarking using "A Watermark for Large Language Models"
/ adapter to use with lora exchange
/ JSON schema used for constrained decoding (Outlines)
/ returning the k highest probability alternatives
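The thirteen comments above map naturally onto a sampling-parameters message; the field names below follow common LLM-serving conventions (temperature, top_k, top_p, and so on) but are still assumptions:

message NextTokenChooserParameters {
  // exponential scaling of the output probability distribution
  float temperature = 1;
  // restricting to the k highest probability elements
  uint32 top_k = 2;
  // restricting to top tokens whose probabilities sum to <= prob_cut_off
  float top_p = 3;
  // restricting to top tokens whose probabilities sum to <= prob_cut_off
  float typical_p = 4;
  // apply sampling on the logits
  bool do_sample = 5;
  // random seed for sampling
  uint64 seed = 6;
  // repetition penalty
  float repetition_penalty = 7;
  // frequency penalty
  float frequency_penalty = 8;
  // presence penalty
  float presence_penalty = 9;
  // token watermarking using "A Watermark for Large Language Models"
  bool watermark = 10;
  // adapter to use with lora exchange
  string adapter_id = 11;
  // JSON schema used for constrained decoding (Outlines)
  string schema = 12;
  // returning the k highest probability alternatives
  uint32 return_k_alternatives = 13;
}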
Used in:
/ Token IDs
/ Logprobs
/ Decoded text for each token
/ Whether each token is a special token
/ Alternative tokens (optional)
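A sketch of the token-list message returned for prefill and decode steps, with one entry per token and an optional list of alternatives; names assumed:

message NextTokens {
  // Token IDs
  repeated uint32 ids = 1;
  // Logprobs
  repeated float logprobs = 2;
  // Decoded text for each token
  repeated string texts = 3;
  // Whether each token is a special token
  repeated bool is_special = 4;
  // Alternative tokens (optional)
  repeated AlternativeTokens alternative_tokens = 5;
}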
Used in:
/ Adapter params
/ Adapter source
/ Adapter index
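The page gives no hint of this message's name, so AdapterDescriptor below is purely a placeholder for a small message that ties adapter parameters, source, and index together:

message AdapterDescriptor {
  // Adapter parameters
  AdapterParameters adapter_parameters = 1;
  // Adapter source
  AdapterSource adapter_source = 2;
  // Adapter index
  uint32 adapter_index = 3;
}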
Used in:
/ Request ID
/ The generation context
/ Tokenized inputs
/ Context truncation
/ Next Token Chooser Parameters
/ Stopping Criteria Parameters
/ Return prefill logprobs
/ Adapter index
/ Paged attention blocks
/ Paged attention slots
/ Tokens that can be retrieved from the KV cache.
/ This value is set for the first prefill and never reset
/ Chunk of tokens that must be computed for the first prefill
/ This value is set for the first prefill and never reset
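A sketch of the request message, the largest one in the listing; field names are assumptions, and the last two fields are assumed to support chunked prefill with a paged KV cache:

message Request {
  // Request ID
  uint64 id = 1;
  // The generation context
  string inputs = 2;
  // Tokenized inputs
  TokenizedInputs tokenized_inputs = 3;
  // Context truncation
  uint32 truncate = 4;
  // Next token chooser parameters
  NextTokenChooserParameters parameters = 5;
  // Stopping criteria parameters
  StoppingCriteriaParameters stopping_parameters = 6;
  // Return prefill logprobs
  bool prefill_logprobs = 7;
  // Adapter index
  uint32 adapter_index = 8;
  // Paged attention blocks
  repeated uint32 blocks = 9;
  // Paged attention slots
  repeated uint32 slots = 10;
  // Tokens that can be retrieved from the KV cache
  // (set for the first prefill and never reset)
  uint32 cache_len = 11;
  // Chunk of tokens that must be computed for the first prefill
  // (set for the first prefill and never reset)
  optional uint32 chunk_len = 12;
}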
Used in:
/ Maximum number of generated tokens
/ Optional stopping sequences
/ Ignore end of sequence token (used for benchmarking)
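A sketch of the stopping-criteria message; names assumed:

message StoppingCriteriaParameters {
  // Maximum number of generated tokens
  uint32 max_new_tokens = 1;
  // Optional stopping sequences
  repeated string stop_sequences = 2;
  // Ignore end of sequence token (used for benchmarking)
  bool ignore_eos_token = 3;
}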
Used in:
/ Token IDs
/ Chunks
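A sketch of the tokenized-inputs message; the chunks are assumed to be the InputChunk message sketched above:

message TokenizedInputs {
  // Token IDs
  repeated uint32 ids = 1;
  // Chunks (text and image segments of the input)
  repeated InputChunk input_chunks = 2;
}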