Class FilterDetectionsStage
Defined in File filter_detection.hpp
Base Type
public mrc::pymrc::PythonNode< std::shared_ptr< MultiMessage >, std::shared_ptr< MultiMessage > >
- class FilterDetectionsStage : public mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>
FilterDetectionsStage is used to filter rows from a dataframe based on values in a tensor or dataframe column using a specified criterion. Rows in the meta dataframe are excluded if their associated value in the datasource indicated by field_name is less than or equal to threshold.
This stage can operate in two different modes set by the copy argument. When the copy argument is true (default), rows that meet the filter criteria are copied into a new dataframe. When false, sliced views are used instead.
Setting copy=true should be used when the number of matching records is expected to be both high and in non-adjacent rows. In this mode, the stage will generate only one output message for each incoming message, regardless of the size of the input and the number of matching records. However, this comes at the cost of needing to allocate additional memory and perform the copy. Note: In most other stages, messages emitted contain a reference to the original MessageMeta emitted into the pipeline by the source stage. When using copy mode this won’t be the case and could cause the original MessageMeta to be deallocated after this stage.
Setting copy=false should be used when either the number of matching records is expected to be very low or the matching records are likely to be contained in adjacent rows. In this mode, slices of contiguous blocks of rows are emitted in multiple output messages. Performing a slice is relatively low-cost; however, for each incoming message the number of emitted messages could be high (in the worst case scenario as high as half the number of records in the incoming message). Depending on the downstream stages, this can cause performance issues, especially if those stages need to acquire the Python GIL.
Public Types
- using base_t = mrc::pymrc::PythonNode<std::shared_ptr<MultiMessage>, std::shared_ptr<MultiMessage>>
Public Functions
- FilterDetectionsStage(float threshold, bool copy, FilterSource filter_source, std::string field_name = "probs")
Construct a new Filter Detections Stage object.
- Parameters
threshold – Threshold to classify
copy – Whether or not to perform a copy; default=true
filter_source – Indicate if the values used for filtering exist in either an output tensor (FilterSource::TENSOR) or a column in a Dataframe (FilterSource::DATAFRAME)
field_name – Name of the tensor or Dataframe column to filter on; default="probs"
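As a rough illustration of the filter criterion and the two copy modes described for this class, the following standalone sketch models them on plain standard-library containers. It does not use Morpheus, MRC, or cuDF types and is not the stage's implementation: keep_mask, filter_copy, and filter_slices are hypothetical names, and a std::vector<float> stands in for both the dataframe rows and the values being filtered on.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helpers for illustration only; none of these are Morpheus APIs.

// A row is kept only when its value is strictly greater than the threshold,
// i.e. rows with value <= threshold are excluded.
std::vector<bool> keep_mask(const std::vector<float>& values, float threshold)
{
    std::vector<bool> mask;
    mask.reserve(values.size());
    for (float v : values)
    {
        mask.push_back(v > threshold);
    }
    return mask;
}

// copy=true analogue: copy every kept row into one new container, so each
// input yields exactly one output no matter how the kept rows are spread out.
std::vector<float> filter_copy(const std::vector<float>& rows, const std::vector<bool>& mask)
{
    assert(rows.size() == mask.size());
    std::vector<float> kept;
    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        if (mask[i])
        {
            kept.push_back(rows[i]);
        }
    }
    return kept;
}

// copy=false analogue: produce one half-open [start, stop) range per
// contiguous run of kept rows; each range would map to one output message.
std::vector<std::pair<std::size_t, std::size_t>> filter_slices(const std::vector<bool>& mask)
{
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    std::size_t i = 0;
    while (i < mask.size())
    {
        if (!mask[i])
        {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < mask.size() && mask[i])
        {
            ++i;
        }
        slices.emplace_back(start, i);
    }
    return slices;
}
```

For example, with values {0.1, 0.9, 0.8, 0.2, 0.7} and a threshold of 0.5, filter_copy returns a single container holding 0.9, 0.8, and 0.7, while filter_slices returns two ranges, [1, 3) and [4, 5). A mask that alternates kept and excluded rows over 8 records produces 4 ranges, which matches the worst case of half the number of records noted in the description.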