core.inference.text_generation_controllers.encoder_decoder_text_generation_controller#

Module Contents#

Classes#

EncoderDecoderTextGenerationController

The text generation controller for encoder-decoder architectures

API#

class core.inference.text_generation_controllers.encoder_decoder_text_generation_controller.EncoderDecoderTextGenerationController(
inference_wrapped_model: megatron.core.inference.model_inference_wrappers.abstract_model_inference_wrapper.AbstractModelInferenceWrapper,
tokenizer,
)#

Bases: megatron.core.inference.text_generation_controllers.text_generation_controller.TextGenerationController

The text generation controller for encoder-decoder architectures

This class inherits from TextGenerationController, adding support for the encoder input encoder_prompt.

Initialization

prep_inference_input(
prompts_tokens: torch.Tensor,
active_requests: OrderedDict[str, megatron.core.inference.inference_request.InferenceRequest],
use_attention_mask: bool = False,
) Dict[str, Any]#

Prepares input data for inference using the respective wrapper’s prep_inference_input method.

Parameters:
  • prompts_tokens (torch.Tensor) – A tensor of shape [batch_size, max_sequence_length]

  • active_requests (OrderedDict[str, InferenceRequest]) – The input active requests

  • use_attention_mask (bool) – Whether to use an attention mask. Should be set to True only when exclusively doing prefill (no decode) with variable prompt lengths.

Returns:

A dict of the inference input for the current batch.
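The method’s contract can be illustrated with a minimal, self-contained sketch. The `MockEncoderDecoderController` and `MockInferenceWrapper` classes below are hypothetical stand-ins for the megatron.core types (plain nested lists stand in for `torch.Tensor`, and `0` is assumed to be the pad token), so this shows only the shape of the delegation, not the real implementation:

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class InferenceRequest:
    """Hypothetical stand-in for megatron.core's InferenceRequest."""
    request_id: str
    prompt_tokens: List[int]


class MockInferenceWrapper:
    """Plays the role of AbstractModelInferenceWrapper for this sketch."""

    def prep_inference_input(self, prompts_tokens):
        return {"tokens": prompts_tokens}


class MockEncoderDecoderController:
    """Sketch of the controller's prep_inference_input flow."""

    def __init__(self, inference_wrapped_model, tokenizer=None):
        self.inference_wrapped_model = inference_wrapped_model
        self.tokenizer = tokenizer

    def prep_inference_input(
        self,
        prompts_tokens,  # shape [batch_size, max_sequence_length]
        active_requests: "OrderedDict[str, InferenceRequest]",
        use_attention_mask: bool = False,
    ) -> Dict[str, Any]:
        # Delegate to the wrapped model's own prep method, then attach an
        # attention mask only when explicitly requested (prefill-only
        # batches with variable prompt lengths).
        inference_input = self.inference_wrapped_model.prep_inference_input(
            prompts_tokens
        )
        if use_attention_mask:
            inference_input["attention_mask"] = [
                [tok != 0 for tok in row]  # 0 = pad token, assumed
                for row in prompts_tokens
            ]
        return inference_input


controller = MockEncoderDecoderController(MockInferenceWrapper())
requests = OrderedDict({"req-0": InferenceRequest("req-0", [5, 6, 7])})
batch = [[5, 6, 7, 0]]  # nested list in place of a torch.Tensor
out = controller.prep_inference_input(batch, requests, use_attention_mask=True)
print(sorted(out.keys()))  # ['attention_mask', 'tokens']
```

As the sketch suggests, the controller itself stays model-agnostic: the wrapped model decides what goes into the inference-input dict, and the controller only augments it when `use_attention_mask=True`.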