morpheus.utils.column_info

Functions

column_listjoin(df, col_name) Returns the array series df[col_name] as a flattened string series.
create_increment_col(df, column_name[, ...]) Create a new integer column counting unique occurrences of values in column_name, grouped per period (per day by default) using the timestamp values in timestamp_column and then by groupby_column; the values increment starting at 1.
process_dataframe(df_in, input_schema) Processes a dataframe according to the given schema.

Classes

BoolColumn(name, dtype, input_name[, ...]) Subclass of RenameColumn, adds the ability to map a set of custom values to boolean values.
ColumnInfo(name, dtype) Defines a single column and type-cast.
CustomColumn(name, dtype, process_column_fn) Subclass of ColumnInfo, defines a column to be computed by a user-defined function process_column_fn.
DataFrameInputSchema([json_columns, ...]) Defines the schema specifying the columns to be included in the output DataFrame.
DateTimeColumn(name, dtype, input_name) Subclass of RenameColumn, specific to casting UTC-localized datetime values.
DistinctIncrementColumn(name, dtype, input_name) Subclass of RenameColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the timestamp_column field.
IncrementColumn(name, dtype, input_name, ...) Subclass of DateTimeColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the input_name field.
PreparedDFInfo(df, columns_to_preserve) Represents the result of preparing a DataFrame, along with the available columns to be preserved.
RenameColumn(name, dtype, input_name) Subclass of ColumnInfo, adds the ability to also perform a rename.
StringCatColumn(name, dtype, input_columns, sep) Subclass of ColumnInfo, concatenates values from multiple columns into a new string column separated by sep.
StringJoinColumn(name, dtype, input_name, sep) Subclass of RenameColumn, converts incoming list values to strings by joining them with sep.
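The class table describes an inheritance chain in which each column type refines how a single output column is produced from the input frame. The following self-contained pandas sketch illustrates that idea with hypothetical stand-in classes. It is not the Morpheus implementation: the process_column method, the _source hook, and the true_values/false_values fields are assumptions made for illustration.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class ColumnInfo:
    """Hypothetical stand-in: one output column plus its target dtype."""
    name: str
    dtype: str

    def _source(self, df: pd.DataFrame) -> pd.Series:
        # Base behavior (assumed): read the input column with the same name.
        return df[self.name]

    def process_column(self, df: pd.DataFrame) -> pd.Series:
        # Every subclass ends with the same type-cast step.
        return self._source(df).astype(self.dtype)


@dataclass
class RenameColumn(ColumnInfo):
    """Reads from input_name; the result is emitted under name."""
    input_name: str = ""

    def _source(self, df: pd.DataFrame) -> pd.Series:
        return df[self.input_name]


@dataclass
class BoolColumn(RenameColumn):
    """Maps custom tokens to True/False (field names are assumptions)."""
    true_values: list = field(default_factory=list)
    false_values: list = field(default_factory=list)

    def _source(self, df: pd.DataFrame) -> pd.Series:
        mapping = {v: True for v in self.true_values}
        mapping.update({v: False for v in self.false_values})
        return df[self.input_name].map(mapping)


@dataclass
class StringJoinColumn(RenameColumn):
    """Joins each list value into one string separated by sep."""
    sep: str = ","

    def _source(self, df: pd.DataFrame) -> pd.Series:
        return df[self.input_name].apply(self.sep.join)


raw = pd.DataFrame({
    "userName": ["alice", "bob"],
    "ok": ["yes", "no"],
    "tags": [["a", "b"], ["c"]],
})
columns = [
    RenameColumn(name="username", dtype="str", input_name="userName"),
    BoolColumn(name="success", dtype="bool", input_name="ok",
               true_values=["yes"], false_values=["no"]),
    StringJoinColumn(name="tags", dtype="str", input_name="tags", sep="|"),
]
# Build the output frame one schema column at a time.
out = pd.DataFrame({c.name: c.process_column(raw) for c in columns})
print(out)
```

Each stand-in only overrides where its values come from, while the shared dtype cast stays in the base class. This mirrors the "Subclass of ..., adds the ability to ..." relationships listed in the table.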

column_listjoin(df, col_name)

Returns the array series df[col_name] as a flattened string series.

Parameters
df

The dataframe from which to get the column.

col_name

The column to transform.

Returns
pandas.Series

A series with the arrays in the column flattened to strings.
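A minimal pandas sketch of the behavior described above, assuming each cell of df[col_name] holds a list of strings. The comma separator and the name column_listjoin_sketch are assumptions for illustration, not details taken from the Morpheus source.

```python
import pandas as pd


def column_listjoin_sketch(df: pd.DataFrame, col_name: str, sep: str = ",") -> pd.Series:
    """Hypothetical re-implementation: flatten each list in df[col_name] to one string."""
    # Join each list's elements with sep (assumed separator), then use pandas' string dtype.
    return df[col_name].apply(lambda values: sep.join(map(str, values))).astype("string")


df = pd.DataFrame({"groups": [["admin", "dev"], ["ops"], []]})
flat = column_listjoin_sketch(df, "groups")
print(flat.tolist())  # ['admin,dev', 'ops', '']
```

An empty list becomes an empty string rather than a missing value in this sketch; the real function may treat that case differently.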

create_increment_col(df, column_name, groupby_column='username', timestamp_column='timestamp', period='D')

Create a new integer column counting unique occurrences of values in column_name, grouped per period (per day by default) using the timestamp values in timestamp_column and then by groupby_column; the values increment starting at 1.

Parameters
df

The input dataframe.

column_name

Name of the column in which unique occurrences are counted.

groupby_column

The column to group by.

timestamp_column

The column containing timestamp values.

period: str, default “D”

The period used to bucket the values in timestamp_column before grouping; the default "D" buckets by day.

Returns
pandas.Series

The new column with incrementing values.
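The following sketch shows one way such a column could be computed with pandas; it is an illustration, not the Morpheus implementation. It assumes rows are already sorted by timestamp_column, that column_name contains no nulls, and that period is a pandas period alias passed to Series.dt.to_period. The name increment_col_sketch is hypothetical.

```python
import pandas as pd


def increment_col_sketch(df, column_name, groupby_column="username",
                         timestamp_column="timestamp", period="D"):
    # Bucket each row's timestamp into a period ("D" means calendar day).
    bucket = pd.to_datetime(df[timestamp_column]).dt.to_period(period)

    def running_distinct(values: pd.Series) -> pd.Series:
        # pd.factorize numbers values by order of first appearance: 0, 1, 2, ...
        codes = pd.Series(pd.factorize(values)[0] + 1, index=values.index)
        # The running maximum equals the count of distinct values seen so far.
        return codes.cummax()

    # Group by (period bucket, groupby_column) and count within each group.
    return df.groupby([bucket, df[groupby_column]])[column_name].transform(running_distinct)


df = pd.DataFrame({
    "username": ["alice", "alice", "alice", "bob", "alice"],
    "timestamp": ["2024-04-01 08:00", "2024-04-01 09:00", "2024-04-01 10:00",
                  "2024-04-01 10:30", "2024-04-02 08:00"],
    "location": ["NY", "SF", "NY", "LA", "SF"],
})
print(increment_col_sketch(df, "location").tolist())  # [1, 2, 2, 1, 1]
```

Alice's third row on 2024-04-01 revisits NY, so her count stays at 2; her first row on 2024-04-02 starts a new daily group and resets to 1.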

process_dataframe(df_in, input_schema)

Processes a dataframe according to the given schema.

Parameters
df_in

The input dataframe to process.

input_schema

The schema used to process the dataframe.

Returns
pandas.DataFrame

The processed dataframe.
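To show the general shape of schema-driven processing, here is a minimal sketch using a hypothetical MiniSchema in place of DataFrameInputSchema. Each output column is produced by a callable, and preserve_columns copies input columns through unchanged. Both names and this structure are assumptions; the real DataFrameInputSchema also accepts json_columns and other options that this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Callable

import pandas as pd


@dataclass
class MiniSchema:
    """Hypothetical stand-in for DataFrameInputSchema (not the Morpheus class)."""
    # Output column name -> function computing it from the input frame.
    columns: dict[str, Callable[[pd.DataFrame], pd.Series]] = field(default_factory=dict)
    # Input columns copied to the output unchanged.
    preserve_columns: list[str] = field(default_factory=list)


def process_dataframe_sketch(df_in: pd.DataFrame, schema: MiniSchema) -> pd.DataFrame:
    # Start from the preserved input columns (an empty list keeps only the index).
    out = df_in[schema.preserve_columns].copy()
    # Compute each schema column from the original input frame.
    for name, fn in schema.columns.items():
        out[name] = fn(df_in)
    return out


raw = pd.DataFrame({
    "userName": ["alice", "bob"],
    "count": ["3", "5"],
    "ip": ["10.0.0.1", "10.0.0.2"],
})
schema = MiniSchema(
    columns={
        "username": lambda d: d["userName"].astype(str),  # rename
        "count": lambda d: d["count"].astype("int64"),    # type-cast
    },
    preserve_columns=["ip"],
)
result = process_dataframe_sketch(raw, schema)
print(result)
```

Computing every column from df_in, rather than from the partially built output, keeps each column definition independent of the order in which the schema lists them.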

© Copyright 2024, NVIDIA. Last updated on Apr 11, 2024.