core.inference.text_generation_controllers.encoder_decoder_text_generation_controller#

Module Contents#

Classes#

EncoderDecoderTextGenerationController

The text generation controller for encoder-decoder architectures

API#

class core.inference.text_generation_controllers.encoder_decoder_text_generation_controller.EncoderDecoderTextGenerationController(
inference_wrapped_model: megatron.core.inference.model_inference_wrappers.abstract_model_inference_wrapper.AbstractModelInferenceWrapper,
tokenizer,
)#

Bases: megatron.core.inference.text_generation_controllers.text_generation_controller.TextGenerationController

The text generation controller for encoder-decoder architectures

This class inherits from TextGenerationController, adding support for the encoder input encoder_prompt.

Initialization

prep_inference_input(
prompts_tokens: torch.Tensor,
active_requests: OrderedDict[str, megatron.core.inference.inference_request.InferenceRequest],
use_attention_mask: bool = False,
) Dict[str, Any]#

Prepares input data for inference using the respective wrapper’s prep_inference_input method.

Parameters:
  • prompts_tokens (torch.Tensor) – A tensor of shape [batch_size, max_sequence_length]

  • active_requests (OrderedDict[str, InferenceRequest]) – The input active requests

  • use_attention_mask (bool) – Whether to use an attention mask. Should be set to True only when exclusively doing prefill (no decode) with variable prompt lengths.

Returns:

A dict of the inference input for the current batch.
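The method’s contract can be illustrated with a minimal, self-contained sketch. The `MockEncoderDecoderController` and `MockInferenceWrapper` classes below are hypothetical stand-ins for the megatron.core types (plain nested lists stand in for `torch.Tensor`, and `0` is assumed to be the pad token), so this shows only the shape of the delegation, not the real implementation:

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class InferenceRequest:
    """Hypothetical stand-in for megatron.core's InferenceRequest."""
    request_id: str
    prompt_tokens: List[int]


class MockInferenceWrapper:
    """Plays the role of AbstractModelInferenceWrapper for this sketch."""

    def prep_inference_input(self, prompts_tokens):
        return {"tokens": prompts_tokens}


class MockEncoderDecoderController:
    """Sketch of the controller's prep_inference_input flow."""

    def __init__(self, inference_wrapped_model, tokenizer=None):
        self.inference_wrapped_model = inference_wrapped_model
        self.tokenizer = tokenizer

    def prep_inference_input(
        self,
        prompts_tokens,  # shape [batch_size, max_sequence_length]
        active_requests: "OrderedDict[str, InferenceRequest]",
        use_attention_mask: bool = False,
    ) -> Dict[str, Any]:
        # Delegate to the wrapped model's own prep method, then attach an
        # attention mask only when explicitly requested (prefill-only
        # batches with variable prompt lengths).
        inference_input = self.inference_wrapped_model.prep_inference_input(
            prompts_tokens
        )
        if use_attention_mask:
            inference_input["attention_mask"] = [
                [tok != 0 for tok in row]  # 0 = pad token, assumed
                for row in prompts_tokens
            ]
        return inference_input


controller = MockEncoderDecoderController(MockInferenceWrapper())
requests = OrderedDict({"req-0": InferenceRequest("req-0", [5, 6, 7])})
batch = [[5, 6, 7, 0]]  # nested list in place of a torch.Tensor
out = controller.prep_inference_input(batch, requests, use_attention_mask=True)
print(sorted(out.keys()))  # ['attention_mask', 'tokens']
```

As the sketch suggests, the controller itself stays model-agnostic: the wrapped model decides what goes into the inference-input dict, and the controller only augments it when `use_attention_mask=True`.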