GPT Inference Parameters
Parameters for controlling GPT model behavior during inference. Pass them as a dictionary to the `inference_parameters` argument of the `.generate()` or `.infer()` method (see the example below the table).
| Parameter | Type | Description |
|---|---|---|
| `min_new_tokens` | int | Minimum number of tokens to generate. |
| `max_new_tokens` | int | Maximum number of tokens to generate. |
| `do_sample` | bool | If `True`, uses sampling-based decoding instead of greedy decoding. |
| `prompt_lookup_num_tokens` | int | Number of tokens to match for prompt-lookup decoding, which can speed up inference. |
| `temperature` | float; 0.0–1.0 | Higher values produce more stochastic/creative outputs; lower values produce more deterministic/predictable outputs. |
| `top_k` | int | Limits sampling to the k most probable tokens. |
| `top_p` | float; 0.0–1.0 | Limits sampling to the smallest set of tokens whose cumulative probability exceeds p (nucleus sampling). |
| `repetition_penalty` | float | Higher values discourage repetitive outputs. |
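A minimal sketch of passing these parameters, assuming a `model` object that has already been loaded and exposes the `.generate()` method described above. The `model` variable, the prompt, and the specific parameter values are illustrative placeholders; only the parameter names and types come from the table.

```python
# Hypothetical example: `model` is assumed to be an already-loaded
# GPT model object exposing .generate(); values are illustrative.
inference_parameters = {
    "min_new_tokens": 1,          # always produce at least one token
    "max_new_tokens": 128,        # cap the generation length
    "do_sample": True,            # sample instead of greedy decoding
    "temperature": 0.7,           # moderately creative outputs
    "top_k": 50,                  # restrict to the 50 most probable tokens
    "top_p": 0.9,                 # nucleus sampling threshold
    "repetition_penalty": 1.2,    # discourage repeated phrases
}

output = model.generate(
    ["Write a haiku about inference."],
    inference_parameters=inference_parameters,
)
print(output)
```

Note that `top_k` and `top_p` only take effect when `do_sample` is `True`; with greedy decoding the most probable token is always chosen.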