Functions
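- column_listjoin(df, col_name): Returns the array series df[col_name] as a flattened string series.
- create_increment_col(df, column_name, ...): Create a new integer column counting unique occurrences of values in column_name, grouped per-day and by groupby_column.
- process_dataframe(df_in, input_schema): Processes a dataframe according to the given schema.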
Classes
|
Subclass of |
|
Defines a single column and type-cast. |
|
Subclass of |
|
Defines the schema specifying the columns to be included in the output |
|
Subclass of |
|
Subclass of |
|
Subclass of |
|
Subclass of |
|
Subclass of |
|
Subclass of |
- column_listjoin(df, col_name)
  Returns the array series df[col_name] as a flattened string series.
  - Parameters:
    - df : pandas.DataFrame
      The dataframe from which to get the column.
    - col_name : str
      The column to transform.
  - Returns:
    - pandas.Series
      A series with the arrays in the column flattened to strings.
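As an illustration of what "flattened to strings" can mean, here is a minimal pandas sketch. It does not reproduce the library's implementation: the helper name `listjoin_sketch` and the comma separator `sep` are assumptions, since the text above does not state how the array elements are joined.

```python
import pandas as pd


def listjoin_sketch(df, col_name, sep=","):
    """Hypothetical stand-in: join each list-like cell into one string.

    The separator is an assumption; the documentation above does not name one.
    """
    joined = df[col_name].map(lambda items: sep.join(str(item) for item in items))
    return joined.astype("string")


# Each cell of "roles" holds a list of strings.
df = pd.DataFrame({"roles": [["admin", "dev"], ["guest"], []]})
flat = listjoin_sketch(df, "roles")
print(flat.tolist())
```

An empty list becomes an empty string in this sketch; the real function may handle that case differently.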
- create_increment_col(df, column_name, groupby_column='username', timestamp_column='timestamp', period='D')
  Create a new integer column counting unique occurrences of values in column_name, grouped per-day using the timestamp values in timestamp_column and then grouping by groupby_column, returning incrementing values starting at 1.
  - Parameters:
    - df : pandas.DataFrame
      The input dataframe.
    - column_name : str
      Name of the column in which unique occurrences are counted.
    - groupby_column : str, default "username"
      The column to group by.
    - timestamp_column : str, default "timestamp"
      The column containing timestamp values.
    - period : str, default "D"
      The period to group by.
  - Returns:
    - pandas.Series
      The new column with incrementing values.
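The description above can be read as a running count of distinct values seen so far within each (period, group) pair. The following is a rough sketch of that reading in plain pandas, not the library's code: the helper name `increment_col_sketch` and the sample data are hypothetical, and details such as missing-value and timezone handling are left out.

```python
import pandas as pd


def increment_col_sketch(df, column_name, groupby_column="username",
                         timestamp_column="timestamp", period="D"):
    """Hypothetical sketch: running count of distinct values per period and group."""
    # Bucket every row into its period (one calendar day for period="D").
    periods = pd.to_datetime(df[timestamp_column]).dt.to_period(period)
    keys = [periods, df[groupby_column]]
    # Within each (period, group), number distinct values 1, 2, 3, ...
    # in order of first appearance.
    codes = df.groupby(keys)[column_name].transform(
        lambda s: pd.factorize(s)[0] + 1
    )
    # The running maximum of those codes is the number of distinct
    # values seen so far in that (period, group).
    return codes.groupby(keys).cummax().astype(int)


df = pd.DataFrame({
    "username": ["alice", "alice", "alice", "bob", "alice"],
    "timestamp": [
        "2024-01-01 08:00", "2024-01-01 09:00", "2024-01-01 10:00",
        "2024-01-01 11:00", "2024-01-02 08:00",
    ],
    "app": ["mail", "mail", "chat", "mail", "chat"],
})
increments = increment_col_sketch(df, "app")
print(increments.tolist())  # [1, 1, 2, 1, 1]
```

In this sample, alice's count rises to 2 when she uses a second app on the same day, and it starts again at 1 on the next day.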
- process_dataframe(df_in, input_schema)
  Processes a dataframe according to the given schema.
  - Parameters:
    - df_in : pandas.DataFrame or cudf.DataFrame
      The input dataframe to process.
    - input_schema : object
      The schema used to process the dataframe.
  - Returns:
    - pandas.DataFrame
      The processed dataframe.
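The schema type is not spelled out here, so the sketch below only illustrates the general idea suggested by the class summaries (a schema listing output columns, each with a type-cast). The names `ColumnSketch`, `SchemaSketch` and `process_dataframe_sketch` are hypothetical and are not the module's classes or function.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ColumnSketch:
    """Hypothetical: one output column and the dtype to cast it to."""
    name: str
    dtype: str


@dataclass
class SchemaSketch:
    """Hypothetical: the columns to include in the output, in order."""
    columns: list


def process_dataframe_sketch(df_in, schema):
    """Keep only the schema's columns and cast each to its declared dtype."""
    out = pd.DataFrame(index=df_in.index)
    for col in schema.columns:
        out[col.name] = df_in[col.name].astype(col.dtype)
    return out


raw = pd.DataFrame({
    "user": ["alice", "bob"],
    "count": [3.0, 5.0],
    "extra": [0.1, 0.2],  # not in the schema, so it is dropped
})
schema = SchemaSketch(columns=[
    ColumnSketch("user", "string"),
    ColumnSketch("count", "int64"),
])
processed = process_dataframe_sketch(raw, schema)
print(processed.dtypes)
```

The sketch assumes every schema column already exists in the input; a real schema may also derive new columns or rename existing ones.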