| Key | Default | Type | Description |
| --- | --- | --- | --- |
| auth-token | (none) | String | Authentication token for secured Triton servers. |
| compression | (none) | String | Compression algorithm for the request body. Currently only gzip is supported. When set, the request body is compressed to reduce network bandwidth. |
| custom-headers | (none) | Map | Custom HTTP headers as key-value pairs. Example: 'X-Custom-Header:value,X-Another:value2' |
| endpoint | (none) | String | Full URL of the Triton Inference Server endpoint, e.g., https://triton-server:8000/v2/models. Both HTTP and HTTPS are supported; HTTPS is recommended for production. |
| flatten-batch-dim | false | Boolean | Whether to flatten the batch dimension for array inputs. When true, shape [1,N] becomes [N]. |
| model-name | (none) | String | Name of the model to invoke on the Triton server. |
| model-version | "latest" | String | Version of the model to use. |
| priority | (none) | Integer | Request priority level (0-255). Higher values indicate higher priority. |
| sequence-end | false | Boolean | Whether this request marks the end of a sequence for stateful models. When true, Triton releases the model's state after processing this request. See Triton Stateful Models for more details. |
| sequence-id | (none) | String | Sequence ID for stateful models. A sequence is a series of inference requests that must be routed to the same model instance to maintain state across requests (e.g., for RNN/LSTM models). See Triton Stateful Models for more details. |
| sequence-start | false | Boolean | Whether this request marks the start of a new sequence for stateful models. When true, Triton initializes the model's state before processing this request. See Triton Stateful Models for more details. |
| timeout | 30 s | Duration | HTTP request timeout (connect + read + write). This applies to each individual request and is separate from Flink's async timeout. |
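
To illustrate how the options listed above fit together, the following Python sketch merges user-supplied values with the documented defaults, checks the constraints stated in the descriptions (gzip-only compression, priority in 0-255, HTTP or HTTPS endpoint), and parses a `custom-headers` string in the format shown in the example. The helper names `parse_custom_headers` and `validate_options`, and the model name `my-model`, are hypothetical and used only for illustration; the connector's own parsing and validation may differ, for instance in case sensitivity or in how header values containing commas are handled.

```python
from typing import Dict

# Defaults copied from the option list; options shown as "(none)" have no default.
DEFAULTS: Dict[str, str] = {
    "flatten-batch-dim": "false",
    "model-version": "latest",
    "sequence-start": "false",
    "sequence-end": "false",
    "timeout": "30 s",
}

BOOLEAN_KEYS = ("flatten-batch-dim", "sequence-start", "sequence-end")


def parse_custom_headers(raw: str) -> Dict[str, str]:
    """Parse 'Name:value,Other:value2' into a dict.

    Assumption: entries are comma-separated and split on the first colon,
    as suggested by the documented example.
    """
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate trailing commas
        name, sep, value = pair.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


def validate_options(user_options: Dict[str, str]) -> Dict[str, str]:
    """Merge user options over the defaults and check documented constraints."""
    options = {**DEFAULTS, **user_options}

    endpoint = options.get("endpoint", "")
    if endpoint and not endpoint.startswith(("http://", "https://")):
        raise ValueError("endpoint must be an HTTP or HTTPS URL")

    compression = options.get("compression")
    if compression is not None and compression != "gzip":
        raise ValueError("only gzip compression is supported")

    if "priority" in options:
        priority = int(options["priority"])
        if not 0 <= priority <= 255:
            raise ValueError("priority must be in the range 0-255")

    for key in BOOLEAN_KEYS:
        if options[key] not in ("true", "false"):
            raise ValueError(f"{key} must be 'true' or 'false'")

    return options


if __name__ == "__main__":
    opts = validate_options({
        "endpoint": "https://triton-server:8000/v2/models",
        "model-name": "my-model",  # hypothetical model name
        "compression": "gzip",
        "priority": "10",
    })
    print(opts["model-version"], opts["timeout"])
    print(parse_custom_headers("X-Custom-Header:value,X-Another:value2"))
```

Options that the user does not set keep their documented defaults, so `model-version` resolves to `latest` and `timeout` to `30 s` in the example call.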