morpheus.utils.column_info
Functions
column_listjoin(df, col_name)
Returns the array series df[col_name] as a flattened string series.
create_increment_col(df, column_name[, ...])
Create a new integer column counting unique occurrences of values in column_name, grouped per day using the timestamp values in timestamp_column and then by groupby_column, returning incrementing values starting at 1.
process_dataframe(df_in, input_schema)
Processes a dataframe according to the given schema.
Classes
BoolColumn(name, dtype, input_name[, ...])
Subclass of RenameColumn, adds the ability to map a set of custom values to boolean values.
ColumnInfo(name, dtype)
Defines a single column and type-cast.
CustomColumn(name, dtype, process_column_fn)
Subclass of ColumnInfo, defines a column to be computed by a user-defined function process_column_fn.
DataFrameInputSchema([json_columns, ...])
Defines the schema specifying the columns to be included in the output DataFrame.
DateTimeColumn(name, dtype, input_name)
Subclass of RenameColumn, specific to casting UTC localized datetime values.
DistinctIncrementColumn(name, dtype, input_name)
Subclass of RenameColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the timestamp_column field.
IncrementColumn(name, dtype, input_name, ...)
Subclass of DateTimeColumn, counts the unique occurrences of a value in groupby_column over a specific time window period based on dates in the input_name field.
PreparedDFInfo(df, columns_to_preserve)
Represents the result of preparing a DataFrame along with available columns to be preserved.
RenameColumn(name, dtype, input_name)
Subclass of ColumnInfo, adds the ability to also perform a rename.
StringCatColumn(name, dtype, input_columns, sep)
Subclass of ColumnInfo, concatenates values from multiple columns into a new string column separated by sep.
StringJoinColumn(name, dtype, input_name, sep)
Subclass of RenameColumn, converts incoming list values to string by joining by sep.
- column_listjoin(df, col_name)
Returns the array series df[col_name] as a flattened string series.
- Parameters
- df
  The dataframe from which to get the column.
- col_name
  The column to transform.
- Returns
- pandas.Series
  A series with the arrays in the column flattened to strings.
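The flattening described above can be illustrated with plain pandas. This is a minimal sketch of the idea, not the Morpheus implementation: the helper name `listjoin_sketch` and the `,` separator are assumptions made for illustration only.

```python
import pandas as pd


def listjoin_sketch(df: pd.DataFrame, col_name: str, sep: str = ",") -> pd.Series:
    """Join each list-valued cell of df[col_name] into one string.

    Hypothetical stand-in for column_listjoin; the separator is an assumption.
    """
    return df[col_name].apply(lambda values: sep.join(str(v) for v in values))


# Each cell holds a list; the result holds one string per row.
df = pd.DataFrame({"roles": [["admin", "dev"], ["guest"], []]})
flattened = listjoin_sketch(df, "roles")
print(flattened.tolist())
```

An empty list becomes an empty string here; whether the real function treats empty or missing values the same way is not stated in this reference.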
- create_increment_col(df, column_name, groupby_column='username', timestamp_column='timestamp', period='D')
Create a new integer column counting unique occurrences of values in column_name grouped per-day using the timestamp values in timestamp_column and then grouping by groupby_column returning incrementing values starting at 1.
- Parameters
- df
  The input dataframe.
- column_name
  Name of the column in which unique occurrences are counted.
- groupby_column
  The column to group by.
- timestamp_column
  The column containing timestamp values.
- period: str, default “D”
  The period to group by.
- Returns
- pandas.Series
  The new column with incrementing values.
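To make the counting rule concrete, the following pandas-only sketch computes a running count of distinct values per (period, group) bucket. It is an approximation under stated assumptions, not the Morpheus source: the helper name `increment_col_sketch` is hypothetical, timestamps are assumed to be timezone-naive, and rows are assumed to be in chronological order.

```python
import pandas as pd


def increment_col_sketch(df, column_name, groupby_column="username",
                         timestamp_column="timestamp", period="D"):
    """Running count of distinct column_name values per (period, group).

    Hypothetical illustration only; missing values are not handled.
    """
    # Bucket each row by time period (per day by default).
    bucket = pd.to_datetime(df[timestamp_column]).dt.to_period(period)
    keys = [bucket, df[groupby_column]]

    # Within each bucket, number the values 1, 2, 3, ... in order of first appearance.
    codes = df.groupby(keys)[column_name].transform(
        lambda s: pd.Series(pd.factorize(s)[0] + 1, index=s.index)
    )

    # The running maximum is the number of distinct values seen so far.
    return codes.groupby(keys).cummax()


df = pd.DataFrame({
    "username": ["alice", "alice", "alice", "bob", "alice"],
    "timestamp": ["2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 10:00",
                  "2024-01-01 11:00", "2024-01-02 08:00"],
    "app": ["mail", "mail", "chat", "mail", "chat"],
})
increments = increment_col_sketch(df, "app")
print(increments.tolist())
```

In this sample, alice's third row on 2024-01-01 introduces a second distinct app, so the count rises to 2; the count restarts at 1 for bob and again for alice on the next day.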
- process_dataframe(df_in, input_schema)
Processes a dataframe according to the given schema.
- Parameters
- df_in
  The input dataframe to process.
- input_schema
  The schema used to process the dataframe.
- Returns
- pandas.DataFrame
  The processed dataframe.
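The general idea of schema-driven processing (select, rename, and type-cast columns) can be sketched without Morpheus installed. Everything below is hypothetical: `ColumnSpec` and `apply_schema_sketch` are illustrative stand-ins loosely modelled on the ColumnInfo and RenameColumn descriptions above, and they do not reproduce the behavior of process_dataframe or DataFrameInputSchema.

```python
from dataclasses import dataclass
from typing import List, Optional

import pandas as pd


@dataclass
class ColumnSpec:
    """Hypothetical column description: output name, dtype, optional source column."""
    name: str
    dtype: str
    input_name: Optional[str] = None


def apply_schema_sketch(df_in: pd.DataFrame, specs: List[ColumnSpec]) -> pd.DataFrame:
    """Build a new dataframe containing only the columns named in specs."""
    out = pd.DataFrame(index=df_in.index)
    for spec in specs:
        source = spec.input_name or spec.name  # rename when input_name is given
        out[spec.name] = df_in[source].astype(spec.dtype)
    return out


raw = pd.DataFrame({
    "userPrincipalName": ["alice", "bob"],
    "count": ["3", "5"],
    "unused": [0, 1],
})
schema = [
    ColumnSpec("username", "string", input_name="userPrincipalName"),
    ColumnSpec("count", "int64"),
]
processed = apply_schema_sketch(raw, schema)
print(processed)
```

Columns not named in the schema (here `unused`) are dropped in this sketch, which mirrors the summary statement that the schema specifies the columns to be included in the output DataFrame.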