| Key | Default | Type | Description |
|---|---|---|---|
| auth-token | (none) | String | Authentication token for secured Triton servers. |
| compression | (none) | String | Compression algorithm for the request body. Currently only `gzip` is supported. When set, the request body is compressed to reduce network bandwidth. |
| custom-headers | (none) | Map | Custom HTTP headers as comma-separated key:value pairs, for example `X-Custom-Header:value,X-Another:value2`. |
| flatten-batch-dim | false | Boolean | Whether to flatten the batch dimension of array inputs. When true, an input of shape [1,N] is sent with shape [N]. |
| priority | (none) | Integer | Request priority level (0-255). Higher values indicate higher priority. |
| sequence-end | false | Boolean | Whether this request marks the end of a sequence for stateful models. When true, Triton releases the state held for the sequence after processing this request. See Triton Stateful Models for more details. |
| sequence-id | (none) | String | Sequence ID for stateful models. A sequence is a series of inference requests that must be routed to the same model instance so that state is preserved across requests (e.g., for RNN/LSTM models). See Triton Stateful Models for more details. |
| sequence-start | false | Boolean | Whether this request marks the start of a new sequence for stateful models. When true, Triton initializes the state for the sequence before processing this request. See Triton Stateful Models for more details. |
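The `auth-token`, `compression`, and `custom-headers` options all shape the outgoing HTTP request. The sketch below shows one plausible way they combine, under stated assumptions: the token is sent as an `Authorization: Bearer` header (the table does not say which header is used), the body is Triton's JSON inference request, and gzip compression is signalled with `Content-Encoding: gzip`. The function names `parse_custom_headers` and `build_request` are hypothetical and not part of the connector.

```python
import gzip
import json
from typing import Dict, Optional, Tuple


def parse_custom_headers(spec: str) -> Dict[str, str]:
    """Parse a 'K1:v1,K2:v2' string into a dict (splits on the first ':' of each pair)."""
    headers: Dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


def build_request(body: dict,
                  auth_token: Optional[str] = None,
                  compression: Optional[str] = None,
                  custom_headers: str = "") -> Tuple[Dict[str, str], bytes]:
    """Return (headers, payload) for a JSON inference request (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    headers.update(parse_custom_headers(custom_headers))
    if auth_token is not None:
        # Assumption: the token is passed as a bearer token.
        headers["Authorization"] = f"Bearer {auth_token}"
    payload = json.dumps(body).encode("utf-8")
    if compression is not None:
        if compression != "gzip":
            raise ValueError("only gzip compression is supported")
        payload = gzip.compress(payload)
        headers["Content-Encoding"] = "gzip"
    return headers, payload
```

Compressing after serialization keeps the header logic independent of the payload format; the server only needs `Content-Encoding` to know it must decompress first.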
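The `flatten-batch-dim` option changes only the declared shape of an array input, not its data. A minimal sketch, assuming inputs are nested Python lists serialized as Triton-style JSON tensors (`name`, `shape`, `datatype`, row-major `data`); `to_input_tensor` is a hypothetical helper. The table specifies only the [1,N] to [N] case, so this sketch drops the leading dimension only when it equals 1 and leaves every other shape unchanged.

```python
from typing import Any, Dict, List


def infer_shape(values: Any) -> List[int]:
    """Shape of a rectangular nested list, read from its first elements."""
    shape: List[int] = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return shape


def flatten(values: Any) -> List[Any]:
    """Row-major flattening of a nested list."""
    if isinstance(values, list):
        out: List[Any] = []
        for v in values:
            out.extend(flatten(v))
        return out
    return [values]


def to_input_tensor(name: str, values: Any, datatype: str,
                    flatten_batch_dim: bool = False) -> Dict[str, Any]:
    shape = infer_shape(values)
    # Assumption: only a leading batch dimension of size 1 is dropped.
    if flatten_batch_dim and len(shape) > 1 and shape[0] == 1:
        shape = shape[1:]
    return {"name": name, "shape": shape, "datatype": datatype,
            "data": flatten(values)}
```

For example, `to_input_tensor("INPUT0", [[1.0, 2.0, 3.0]], "FP32", flatten_batch_dim=True)` declares shape `[3]` instead of `[1, 3]`, which suits models whose input is configured without a batch dimension.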
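The `priority`, `sequence-id`, `sequence-start`, and `sequence-end` options are per-request settings. The sketch below assumes they are forwarded as the `parameters` object of a Triton JSON inference request, using the parameter names `priority`, `sequence_id`, `sequence_start`, and `sequence_end`. The helpers `request_parameters` and `sequence_requests` are hypothetical, and rejecting start/end flags without a sequence ID is a validation choice of this sketch, not documented connector behavior. The 0-255 range check mirrors the table.

```python
from typing import Any, Dict, List, Optional


def request_parameters(sequence_id: Optional[str] = None,
                       sequence_start: bool = False,
                       sequence_end: bool = False,
                       priority: Optional[int] = None) -> Dict[str, Any]:
    """Build the per-request parameters object from the connector options."""
    params: Dict[str, Any] = {}
    if priority is not None:
        if not 0 <= priority <= 255:
            raise ValueError("priority must be in the range 0-255")
        params["priority"] = priority
    if sequence_id is not None:
        params["sequence_id"] = sequence_id
        params["sequence_start"] = sequence_start
        params["sequence_end"] = sequence_end
    elif sequence_start or sequence_end:
        # Sketch-only rule: sequence flags are meaningless without an ID.
        raise ValueError("sequence-start/sequence-end require a sequence-id")
    return params


def sequence_requests(sequence_id: str,
                      steps: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Tag each step of one sequence: the first starts it, the last ends it."""
    requests = []
    for i, inputs in enumerate(steps):
        params = request_parameters(sequence_id=sequence_id,
                                    sequence_start=(i == 0),
                                    sequence_end=(i == len(steps) - 1))
        requests.append({"inputs": inputs, "parameters": params})
    return requests
```

All requests share one `sequence_id` so Triton can route them to the same model instance; only the first sets `sequence_start` and only the last sets `sequence_end`, so the sequence's state is initialized once and released once.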