morpheus.utils.column_info

Functions

column_listjoin(df, col_name)

Returns the array series df[col_name] as a flattened string series.

create_increment_col(df, column_name[, ...])

Create a new integer column that counts the unique occurrences of values in column_name. Rows are grouped per day using the timestamp values in timestamp_column and then by groupby_column; the returned values increment starting at 1.

process_dataframe(df_in, input_schema)

Processes a dataframe according to the given schema.

Classes

BoolColumn(name, dtype, input_name[, ...])

Subclass of RenameColumn, adds the ability to map a set of custom values to boolean values.

ColumnInfo(name, dtype)

Defines a single column and type-cast.

CustomColumn(name, dtype, process_column_fn)

Subclass of ColumnInfo, defines a column to be computed by a user-defined function process_column_fn.

DataFrameInputSchema([json_columns, ...])

Defines the schema specifying the columns to be included in the output DataFrame.

DateTimeColumn(name, dtype, input_name)

Subclass of RenameColumn, specific to casting UTC-localized datetime values.

DistinctIncrementColumn(name, dtype, input_name)

Subclass of RenameColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the timestamp_column field.

IncrementColumn(name, dtype, input_name, ...)

Subclass of DateTimeColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the input_name field.

RenameColumn(name, dtype, input_name)

Subclass of ColumnInfo, adds the ability to also perform a rename.

StringCatColumn(name, dtype, input_columns, sep)

Subclass of ColumnInfo, concatenates values from multiple columns into a new string column separated by sep.

StringJoinColumn(name, dtype, input_name, sep)

Subclass of RenameColumn, converts incoming list values to a string by joining them with sep.
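
As a rough illustration of two of the transformations described above, the following sketch reproduces the documented behavior of StringCatColumn (concatenating several columns with sep) and BoolColumn (mapping custom values to booleans) using plain pandas. It does not use the Morpheus classes themselves; the sample data and the names `true_values`, `false_values`, `location`, and `succeeded` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data, not taken from Morpheus.
df = pd.DataFrame({
    "city": ["Austin", "Oslo"],
    "country": ["US", "NO"],
    "status": ["success", "failure"],
})

# StringCatColumn-style behavior: concatenate values from several
# columns into one string column, separated by sep.
sep = ", "
location = df["city"].astype(str).str.cat(df["country"].astype(str), sep=sep)

# BoolColumn-style behavior: map a set of custom values to booleans.
# The set names below are illustrative only.
true_values = {"success"}
false_values = {"failure"}
mapping = {**{v: True for v in true_values}, **{v: False for v in false_values}}
succeeded = df["status"].map(mapping)
```

Values that appear in neither set would map to a missing value in this sketch; how the real class treats them is not stated here.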

column_listjoin(df, col_name)

Returns the array series df[col_name] as a flattened string series.

Parameters:
df : pandas.DataFrame

The dataframe from which to get the column.

col_name : str

The column to transform.

Returns:
pandas.Series

A series with the arrays in the column flattened to strings.
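
The flattening described above can be sketched in plain pandas as follows. This is a minimal illustration, not the Morpheus implementation: the helper name `listjoin_sketch` is hypothetical, and the comma separator is an assumption, since the separator is not specified in this reference.

```python
import pandas as pd


def listjoin_sketch(df: pd.DataFrame, col_name: str, sep: str = ",") -> pd.Series:
    """Join each list-like cell of df[col_name] into a single string.

    Hypothetical helper; the separator default is an assumption.
    """
    return df[col_name].map(lambda items: sep.join(str(item) for item in items))


# Hypothetical sample data: each cell holds a list of strings.
df = pd.DataFrame({"tags": [["a", "b"], ["c"], []]})
flat = listjoin_sketch(df, "tags")
```

In this sketch an empty list becomes an empty string.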

create_increment_col(df, column_name, groupby_column='username', timestamp_column='timestamp', period='D')

Create a new integer column that counts the unique occurrences of values in column_name. Rows are grouped per day using the timestamp values in timestamp_column and then by groupby_column; the returned values increment starting at 1.

Parameters:
df : pandas.DataFrame

The input dataframe.

column_name : str

Name of the column in which unique occurrences are counted.

groupby_column : str, default “username”

The column to group by.

timestamp_column : str, default “timestamp”

The column containing timestamp values.

period : str, default “D”

The period to group by.

Returns:
pandas.Series

The new column with incrementing values.
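
One way to read the description above is as a running count of distinct values seen so far within each (period, group) bucket. The sketch below implements that reading in plain pandas. It is an assumption-laden illustration rather than the library's code: the name `increment_col_sketch` is hypothetical, rows are assumed to be in time order already, and missing timestamps or values are not handled.

```python
import pandas as pd


def increment_col_sketch(df, column_name, groupby_column="username",
                         timestamp_column="timestamp", period="D"):
    """Running count of distinct column_name values per (period, group)."""
    # Bucket each row by period, e.g. "D" gives one bucket per calendar day.
    bucket = df[timestamp_column].dt.to_period(period)
    keys = [bucket, df[groupby_column]]

    # Within each bucket, number distinct values 1, 2, 3, ... in order of
    # first appearance.
    rank = df.groupby(keys)[column_name].transform(
        lambda s: pd.Series(pd.factorize(s)[0] + 1, index=s.index)
    )

    # The running maximum of that rank is the number of distinct values
    # seen so far in the bucket.
    return rank.groupby(keys).cummax()


# Hypothetical sample data.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 09:00", "2023-01-01 10:00",
        "2023-01-01 11:00", "2023-01-02 08:00",
    ]),
    "username": ["alice", "alice", "alice", "bob", "alice"],
    "app": ["A", "B", "A", "A", "C"],
})
counts = increment_col_sketch(df, "app")
```

For alice on 2023-01-01 the values are 1, 2, 2: the repeated app "A" does not raise the count. The count restarts at 1 for bob and again for alice on the following day.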

process_dataframe(df_in, input_schema)

Processes a dataframe according to the given schema.

Parameters:
df_in : pandas.DataFrame or cudf.DataFrame

The input dataframe to process.

input_schema : object

The schema used to process the dataframe.

Returns:
pandas.DataFrame

The processed dataframe.
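
To make the idea of schema-driven processing concrete, here is a minimal pandas sketch in which each column definition names an output column, a dtype to cast to, and optionally a different source column, loosely mirroring the ColumnInfo and RenameColumn descriptions above. `ColumnSpec` and `apply_schema_sketch` are hypothetical stand-ins invented for this example; they are not the Morpheus classes or the actual process_dataframe logic.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ColumnSpec:
    """Hypothetical column definition: output name, dtype, optional source."""
    name: str
    dtype: str
    input_name: Optional[str] = None


def apply_schema_sketch(df_in: pd.DataFrame, specs: list) -> pd.DataFrame:
    """Build an output frame containing only the columns listed in specs."""
    out = pd.DataFrame(index=df_in.index)
    for spec in specs:
        # Fall back to the output name when no rename is requested.
        source = spec.input_name or spec.name
        out[spec.name] = df_in[source].astype(spec.dtype)
    return out


# Hypothetical sample data; "unused" is not in the schema and is dropped.
raw = pd.DataFrame({"user": ["alice", "bob"], "bytes": ["10", "20"], "unused": [0, 1]})
processed = apply_schema_sketch(
    raw,
    [ColumnSpec("username", "str", input_name="user"), ColumnSpec("bytes", "int64")],
)
```

The output contains only the columns named in the schema, in schema order, which matches the description of DataFrameInputSchema as specifying the columns to include in the output DataFrame.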

© Copyright 2023, NVIDIA. Last updated on Aug 23, 2023.