morpheus.stages.preprocess.preprocess_nlp_stage.PreprocessNLPStage

class PreprocessNLPStage(c, vocab_hash_file='data/bert-base-cased-hash.txt', truncation=False, do_lower_case=False, add_special_tokens=False, stride=-1, column='data')[source]

Bases: morpheus.stages.preprocess.preprocess_base_stage.PreprocessBaseStage

Prepare NLP input DataFrames for inference.

Parameters

c : morpheus.config.Config
vocab_hash_file : str
truncation : bool
do_lower_case : bool
add_special_tokens : bool
stride : int
column : str
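
As a minimal sketch of how the constructor parameters above fit together, the helper below builds a stage from a configuration object. The helper name `make_preprocess_stage` is hypothetical, and the use of `Config()`, `PipelineModes.NLP`, and `feature_length` is an assumption about the surrounding Morpheus configuration API that this page does not document. The argument values are chosen only for illustration. Imports are deferred into the function body so the snippet can be loaded on a machine without Morpheus installed.

```python
def make_preprocess_stage(vocab_hash_file="data/bert-base-cased-hash.txt",
                          column="data"):
    """Build a PreprocessNLPStage (hypothetical helper, not part of Morpheus)."""
    # Deferred imports: Morpheus is only required when the helper is called.
    from morpheus.config import Config, PipelineModes
    from morpheus.stages.preprocess.preprocess_nlp_stage import PreprocessNLPStage

    # Assumption: an NLP-mode config with an illustrative sequence length.
    config = Config()
    config.mode = PipelineModes.NLP
    config.feature_length = 256

    # Keyword names mirror the class signature documented on this page.
    return PreprocessNLPStage(
        config,
        vocab_hash_file=vocab_hash_file,
        truncation=True,
        do_lower_case=False,
        add_special_tokens=False,
        stride=-1,
        column=column,
    )
```

The returned stage would then be added to a pipeline like any other stage; that wiring is outside the scope of this sketch.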

Attributes

has_multi_input_ports
has_multi_output_ports
input_ports
is_built
name
output_ports
unique_name

Methods

`accepted_types`()	Returns accepted input types for this stage.
`build`(builder[, do_propagate])	Build this stage.
`can_build`([check_ports])	Determines if all inputs have been built allowing this node to be built.
`get_all_input_stages`()	Get all input stages to this stage.
`get_all_inputs`()	Get all input senders to this stage.
`get_all_output_stages`()	Get all output stages from this stage.
`get_all_outputs`()	Get all output receivers from this stage.
`get_needed_columns`()	Stages that need columns inserted into the dataframe should populate the `self._needed_columns` dictionary with a mapping of column names to `morpheus.common.TypeId`.
`join`()	Awaitable method that stages can implement to perform cleanup steps when the pipeline is stopped.
`on_start`()	This function can be overridden to add use-case-specific logic at the start of any stage in the pipeline.
`pre_process_batch`(x, vocab_hash_file, ...)	For NLP use cases, this function performs pre-processing.
`start_async`()	This function is called along with on_start during stage initialization.
`stop`()	Stages can implement this to perform cleanup steps when the pipeline is stopped.
`supports_cpp_node`()	Specifies whether this Stage is capable of creating C++ nodes.

_build(builder, in_ports_streams)[source]

Constructs this stage’s internal mrc.SegmentObject. The input to this function includes the value returned by the upstream stage.

The inputs are the mrc.Builder for this stage and a StreamPair tuple that contains the input mrc.SegmentObject and the message data type.

Parameters

builder : mrc.Builder
in_ports_streams : morpheus.pipeline.pipeline.StreamPair

Returns

typing.List[morpheus.pipeline.pipeline.StreamPair]

accepted_types()[source]

build(builder, do_propagate=True)[source]

Build this stage.

Parameters

builder : mrc.Builder
do_propagate : bool, optional

can_build(check_ports=False)[source]

Determines if all inputs have been built allowing this node to be built.

Parameters

check_ports : bool, optional

Returns

bool
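
The readiness rule described for `can_build` can be sketched with a toy stand-in. `ToyStage` and its fields are hypothetical and are not Morpheus classes; the sketch models only the "all input stages are built" condition and does not attempt to model the `check_ports` option.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyStage:
    """Hypothetical stand-in for a pipeline stage (not a Morpheus class)."""
    name: str
    input_stages: List["ToyStage"] = field(default_factory=list)
    is_built: bool = False

    def can_build(self) -> bool:
        # A stage is ready once every upstream stage has been built.
        # A stage with no inputs is trivially ready.
        return all(stage.is_built for stage in self.input_stages)

    def build(self) -> None:
        if not self.can_build():
            raise RuntimeError(f"{self.name}: upstream stages are not built yet")
        self.is_built = True


source = ToyStage("source")
preprocess = ToyStage("preprocess-nlp", input_stages=[source])
```

In this toy model, `preprocess.can_build()` is false until `source.build()` has been called, which mirrors the ordering constraint the method is described as checking.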

get_all_input_stages()[source]

Get all input stages to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_inputs()[source]

Get all input senders to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

get_all_output_stages()[source]

Get all output stages from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.StreamWrapper]

get_all_outputs()[source]

Get all output receivers from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

get_needed_columns()[source]

property has_multi_input_ports: bool

Indicates if this stage has multiple input ports.

Returns

bool

property has_multi_output_ports: bool

Indicates if this stage has multiple output ports.

Returns

bool

property input_ports: List[morpheus.pipeline.receiver.Receiver]

Input ports to this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Receiver]

property is_built: bool

Indicates if this stage has been built.

Returns

bool

async join()[source]

property name: str

The name of the stage. Used in logging. Each derived class should override this property with a unique name.

Returns

str

on_start()[source]

property output_ports: List[morpheus.pipeline.sender.Sender]

Output ports from this stage.

Returns

typing.List[morpheus.pipeline.pipeline.Sender]

static pre_process_batch(x, vocab_hash_file, do_lower_case, seq_len, stride, truncation, add_special_tokens, column)[source]

For NLP use cases, this function performs pre-processing.

Parameters

x : morpheus.pipeline.messages.MultiMessage
vocab_hash_file : str
do_lower_case : bool
seq_len : int
stride : int
truncation : bool
add_special_tokens : bool
column : str

Returns

morpheus.pipeline.messages.MultiInferenceNLPMessage
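
To give a feel for how `seq_len`, `stride`, and `truncation` can interact, here is a pure-Python sketch of a generic sliding-window split over token ids. It is not the tokenizer Morpheus uses: the function name `split_tokens`, the `pad_id` parameter, and the assumption that `stride` means "how far each window advances" are illustrative choices, and the sketch requires a positive `stride` rather than handling the `-1` default.

```python
from typing import List


def split_tokens(token_ids: List[int], seq_len: int, stride: int,
                 truncation: bool, pad_id: int = 0) -> List[List[int]]:
    """Split token ids into fixed-length windows (illustrative sketch only)."""
    if seq_len <= 0 or not 0 < stride <= seq_len:
        raise ValueError("require seq_len > 0 and 0 < stride <= seq_len")

    def pad(window: List[int]) -> List[int]:
        # Right-pad a short window so every row has exactly seq_len entries.
        return window + [pad_id] * (seq_len - len(window))

    # With truncation, or when the input already fits, emit exactly one row.
    if truncation or len(token_ids) <= seq_len:
        return [pad(token_ids[:seq_len])]

    # Otherwise slide a window forward by `stride` tokens until the end of
    # the input is covered; consecutive rows overlap by seq_len - stride.
    windows = []
    start = 0
    while True:
        windows.append(pad(token_ids[start:start + seq_len]))
        if start + seq_len >= len(token_ids):
            break
        start += stride
    return windows
```

With ten tokens, `seq_len=4`, and `stride=3`, this sketch yields three windows, each sharing one token with its neighbor; with `truncation=True` it yields a single window holding the first four tokens.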

async start_async()[source]

stop()[source]

supports_cpp_node()[source]

property unique_name: str

Unique name of stage. Generated by appending stage id to stage name.

Returns

str
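
The naming rule above can be sketched as follows. `NamedStage`, the `"preprocess-nlp"` name, and the `-` separator are assumptions made for illustration; the page states only that the stage id is appended to the stage name.

```python
class NamedStage:
    """Hypothetical stage used to illustrate name and unique_name."""

    def __init__(self, stage_id: int):
        # Assumption: each stage is assigned a numeric id by the pipeline.
        self._id = stage_id

    @property
    def name(self) -> str:
        # Derived classes would override this with their own name.
        return "preprocess-nlp"

    @property
    def unique_name(self) -> str:
        # Append the stage id to the stage name (separator is assumed).
        return f"{self.name}-{self._id}"
```

Two instances of the same stage class therefore share a `name` but differ in `unique_name`, which is useful when the same stage type appears more than once in a pipeline.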