| Key | Default | Type | Description |
|---|---|---|---|
| auth-token | (none) | String | Authentication token for secured Triton servers. |
| compression | (none) | String | Compression algorithm for the request body. Currently only `gzip` is supported. When set, the request body is compressed to reduce network bandwidth. |
| custom-headers | (none) | Map | Custom HTTP headers as comma-separated `key:value` pairs, e.g., `X-Custom-Header:value,X-Another:value2`. |
| endpoint | (none) | String | Full URL of the Triton Inference Server endpoint, e.g., `https://triton-server:8000/v2/models`. Both HTTP and HTTPS are supported; HTTPS is recommended for production. |
| flatten-batch-dim | false | Boolean | Whether to flatten the batch dimension of array inputs. When true, shape `[1,N]` becomes `[N]`. Defaults to false. |
| model-name | (none) | String | Name of the model to invoke on the Triton server. |
| model-version | "latest" | String | Version of the model to use. Defaults to `latest`. |
| priority | (none) | Integer | Request priority level (0-255). Higher values indicate higher priority. |
| sequence-end | false | Boolean | Whether this request marks the end of a sequence for stateful models. When true, Triton releases the model's state for the sequence after processing this request. See Triton Stateful Models for more details. |
| sequence-id | (none) | String | Sequence ID for stateful models. A sequence is a series of inference requests that must be routed to the same model instance so that state is preserved across requests (e.g., for RNN/LSTM models). See Triton Stateful Models for more details. |
| sequence-start | false | Boolean | Whether this request marks the start of a new sequence for stateful models. When true, Triton initializes the model's state before processing this request. See Triton Stateful Models for more details. |
| timeout | 30 s | Duration | HTTP request timeout covering connect, read, and write. It applies to each individual request and is separate from Flink's async I/O timeout. Defaults to 30 seconds. |
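To illustrate what the `custom-headers` and `compression` options describe, here is a minimal sketch of how a client could turn them into HTTP headers and a request body. It is not the connector's implementation: `parse_custom_headers` and `prepare_body` are hypothetical helpers. The parser is a simplification of Flink's map-option syntax and does not handle values that themselves contain commas. Sending gzip bodies with a `Content-Encoding: gzip` header is standard HTTP, and Triton's HTTP endpoint accepts it, but whether the connector sets exactly these headers is an assumption.

```python
from __future__ import annotations

import gzip
import json


def parse_custom_headers(spec: str) -> dict[str, str]:
    """Parse a 'Key:value,Key2:value2' string (hypothetical helper).

    Splits each entry on the first ':' only, so values may contain ':'.
    Simplification: values containing ',' are not supported.
    """
    headers: dict[str, str] = {}
    if not spec:
        return headers
    for pair in spec.split(","):
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"malformed header entry: {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


def prepare_body(payload: dict, compression: str | None) -> tuple[bytes, dict[str, str]]:
    """Serialize a JSON payload and optionally gzip it (hypothetical helper).

    Returns the body bytes plus the content headers to send alongside it.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compression is None:
        return body, headers
    if compression != "gzip":
        # Mirrors the documented restriction: only gzip is supported.
        raise ValueError(f"unsupported compression: {compression!r}")
    headers["Content-Encoding"] = "gzip"
    return gzip.compress(body), headers
```

For example, `parse_custom_headers("X-Custom-Header:value,X-Another:value2")` yields `{"X-Custom-Header": "value", "X-Another": "value2"}`, and those entries would be merged with the content headers returned by `prepare_body`.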
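The model, batching, priority, and sequence options could map onto a Triton HTTP/REST (KServe v2) inference request roughly as follows. This is a minimal sketch under stated assumptions. The URL layout `{endpoint}/{model}[/versions/{version}]/infer` and the `sequence_id`, `sequence_start`, `sequence_end`, and `priority` request parameters come from Triton's protocol and its extensions. Several behaviors are assumptions about the connector rather than documented facts: treating `latest` as "omit the version segment", the exact flattening rule, and rejecting start/end flags without an ID. `build_infer_request` is a hypothetical name.

```python
from __future__ import annotations


def build_infer_request(
    endpoint: str,
    model_name: str,
    inputs: list[dict],
    model_version: str = "latest",
    flatten_batch_dim: bool = False,
    priority: int | None = None,
    sequence_id: str | None = None,
    sequence_start: bool = False,
    sequence_end: bool = False,
) -> tuple[str, dict]:
    """Return (url, json_payload) for a KServe v2 style infer call (sketch)."""
    base = endpoint.rstrip("/")
    # Assumption: 'latest' means "omit the version and let Triton's version
    # policy choose"; any other value is sent as an explicit version.
    if model_version == "latest":
        url = f"{base}/{model_name}/infer"
    else:
        url = f"{base}/{model_name}/versions/{model_version}/infer"

    tensors = []
    for tensor in inputs:
        shape = list(tensor["shape"])
        # flatten-batch-dim: drop a leading batch dimension of size 1,
        # e.g. [1, N] -> [N]. Row-major 'data' is unchanged by this.
        if flatten_batch_dim and len(shape) > 1 and shape[0] == 1:
            shape = shape[1:]
        tensors.append({**tensor, "shape": shape})

    parameters: dict = {}
    if priority is not None:
        if not 0 <= priority <= 255:
            raise ValueError("priority must be in 0-255")
        parameters["priority"] = priority
    if sequence_id is not None:
        # Triton sequence-extension parameters for stateful models.
        parameters["sequence_id"] = sequence_id
        parameters["sequence_start"] = sequence_start
        parameters["sequence_end"] = sequence_end
    elif sequence_start or sequence_end:
        # A choice made in this sketch: start/end flags need a sequence ID.
        raise ValueError("sequence-start/sequence-end require sequence-id")

    payload: dict = {"inputs": tensors}
    if parameters:
        payload["parameters"] = parameters
    return url, payload
```

As a usage example with hypothetical model and tensor names, `build_infer_request("https://triton-server:8000/v2/models", "my_model", [{"name": "INPUT0", "datatype": "FP32", "shape": [1, 3], "data": [0.1, 0.2, 0.3]}], flatten_batch_dim=True, sequence_id="user-42", sequence_start=True)` returns the URL `https://triton-server:8000/v2/models/my_model/infer`. The input shape becomes `[3]`, and the sequence parameters are attached so Triton can route the request to the instance holding that sequence's state.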