- class EncoderDataFrame(*args, **kwargs)[source]
Bases: pandas.core.frame.DataFrame
- Attributes
  - T
  - at: Access a single value for a row/column label pair.
  - attrs: Dictionary of global attributes of this dataset.
  - axes: Return a list representing the axes of the DataFrame.
  - columns: The column labels of the DataFrame.
  - dtypes: Return the dtypes in the DataFrame.
  - empty: Indicator whether DataFrame is empty.
  - flags: Get the properties associated with this pandas object.
  - iat: Access a single value for a row/column pair by integer position.
  - iloc: Purely integer-location based indexing for selection by position.
  - index: The index (row labels) of the DataFrame.
  - loc: Access a group of rows and columns by label(s) or a boolean array.
  - ndim: Return an int representing the number of axes / array dimensions.
  - shape: Return a tuple representing the dimensionality of the DataFrame.
  - size: Return an int representing the number of elements in this object.
  - style: Returns a Styler object.
  - values: Return a Numpy representation of the DataFrame.
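Because EncoderDataFrame subclasses pandas.DataFrame, these attributes behave exactly as on a plain DataFrame. A quick illustration with plain pandas (the data here is made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(df.shape)        # (3, 2): (rows, columns)
print(df.ndim)         # 2 axes
print(df.size)         # 6 elements
print(df.iloc[0, 1])   # 4.0: single cell by integer position
print(df.at[2, "a"])   # 3: single cell by row/column label
```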
Methods
- abs(): Return a Series/DataFrame with absolute numeric value of each element.
- add(other[, axis, level, fill_value]): Get Addition of dataframe and other, element-wise (binary operator add).
- add_prefix(prefix): Prefix labels with string prefix.
- add_suffix(suffix): Suffix labels with string suffix.
- agg([func, axis]): Aggregate using one or more operations over the specified axis.
- aggregate([func, axis]): Aggregate using one or more operations over the specified axis.
- align(other[, join, axis, level, copy, ...]): Align two objects on their axes with the specified join method.
- all([axis, bool_only, skipna, level]): Return whether all elements are True, potentially over an axis.
- any([axis, bool_only, skipna, level]): Return whether any element is True, potentially over an axis.
- append(other[, ignore_index, ...]): Append rows of other to the end of caller, returning a new object.
- apply(func[, axis, raw, result_type, args]): Apply a function along an axis of the DataFrame.
- applymap(func[, na_action]): Apply a function to a Dataframe elementwise.
- asfreq(freq[, method, how, normalize, ...]): Convert time series to specified frequency.
- asof(where[, subset]): Return the last row(s) without any NaNs before where.
- assign(**kwargs): Assign new columns to a DataFrame.
- astype(dtype[, copy, errors]): Cast a pandas object to a specified dtype dtype.
- at_time(time[, asof, axis]): Select values at particular time of day (e.g., 9:30AM).
- backfill([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna() with method='bfill'.
- between_time(start_time, end_time[, ...]): Select values between particular times of the day (e.g., 9:00-9:30 AM).
- bfill([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna() with method='bfill'.
- bool(): Return the bool of a single element Series or DataFrame.
- boxplot([column, by, ax, fontsize, rot, ...]): Make a box plot from DataFrame columns.
- clip([lower, upper, axis, inplace]): Trim values at input threshold(s).
- combine(other, func[, fill_value, overwrite]): Perform column-wise combine with another DataFrame.
- combine_first(other): Update null elements with value in the same location in other.
- compare(other[, align_axis, keep_shape, ...]): Compare to another DataFrame and show the differences.
- convert_dtypes([infer_objects, ...]): Convert columns to best possible dtypes using dtypes supporting pd.NA.
- copy([deep]): Make a copy of this object's indices and data.
- corr([method, min_periods]): Compute pairwise correlation of columns, excluding NA/null values.
- corrwith(other[, axis, drop, method]): Compute pairwise correlation.
- count([axis, level, numeric_only]): Count non-NA cells for each column or row.
- cov([min_periods, ddof]): Compute pairwise covariance of columns, excluding NA/null values.
- cummax([axis, skipna]): Return cumulative maximum over a DataFrame or Series axis.
- cummin([axis, skipna]): Return cumulative minimum over a DataFrame or Series axis.
- cumprod([axis, skipna]): Return cumulative product over a DataFrame or Series axis.
- cumsum([axis, skipna]): Return cumulative sum over a DataFrame or Series axis.
- describe([percentiles, include, exclude, ...]): Generate descriptive statistics.
- diff([periods, axis]): First discrete difference of element.
- div(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator truediv).
- divide(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator truediv).
- dot(other): Compute the matrix multiplication between the DataFrame and other.
- drop([labels, axis, index, columns, level, ...]): Drop specified labels from rows or columns.
- drop_duplicates([subset, keep, inplace, ...]): Return DataFrame with duplicate rows removed.
- droplevel(level[, axis]): Return Series/DataFrame with requested index / column level(s) removed.
- dropna([axis, how, thresh, subset, inplace]): Remove missing values.
- duplicated([subset, keep]): Return boolean Series denoting duplicate rows.
- eq(other[, axis, level]): Get Equal to of dataframe and other, element-wise (binary operator eq).
- equals(other): Test whether two objects contain the same elements.
- eval(expr[, inplace]): Evaluate a string describing operations on DataFrame columns.
- ewm([com, span, halflife, alpha, ...]): Provide exponential weighted (EW) functions.
- expanding([min_periods, center, axis, method]): Provide expanding transformations.
- explode(column[, ignore_index]): Transform each element of a list-like to a row, replicating index values.
- ffill([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna() with method='ffill'.
- fillna([value, method, axis, inplace, ...]): Fill NA/NaN values using the specified method.
- filter([items, like, regex, axis]): Subset the dataframe rows or columns according to the specified index labels.
- first(offset): Select initial periods of time series data based on a date offset.
- first_valid_index(): Return index for first non-NA value or None, if no non-NA value is found.
- floordiv(other[, axis, level, fill_value]): Get Integer division of dataframe and other, element-wise (binary operator floordiv).
- from_dict(data[, orient, dtype, columns]): Construct DataFrame from dict of array-like or dicts.
- from_records(data[, index, exclude, ...]): Convert structured or record ndarray to DataFrame.
- ge(other[, axis, level]): Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
- get(key[, default]): Get item from object for given key (ex: DataFrame column).
- groupby([by, axis, level, as_index, sort, ...]): Group DataFrame using a mapper or by a Series of columns.
- gt(other[, axis, level]): Get Greater than of dataframe and other, element-wise (binary operator gt).
- head([n]): Return the first n rows.
- hist([column, by, grid, xlabelsize, xrot, ...]): Make a histogram of the DataFrame's columns.
- idxmax([axis, skipna]): Return index of first occurrence of maximum over requested axis.
- idxmin([axis, skipna]): Return index of first occurrence of minimum over requested axis.
- infer_objects(): Attempt to infer better dtypes for object columns.
- info([verbose, buf, max_cols, memory_usage, ...]): Print a concise summary of a DataFrame.
- insert(loc, column, value[, allow_duplicates]): Insert column into DataFrame at specified location.
- interpolate([method, axis, limit, inplace, ...]): Fill NaN values using an interpolation method.
- isin(values): Whether each element in the DataFrame is contained in values.
- isna(): Detect missing values.
- isnull(): Detect missing values.
- items(): Iterate over (column name, Series) pairs.
- iteritems(): Iterate over (column name, Series) pairs.
- iterrows(): Iterate over DataFrame rows as (index, Series) pairs.
- itertuples([index, name]): Iterate over DataFrame rows as namedtuples.
- join(other[, on, how, lsuffix, rsuffix, sort]): Join columns of another DataFrame.
- keys(): Get the 'info axis' (see Indexing for more).
- kurt([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis.
- kurtosis([axis, skipna, level, numeric_only]): Return unbiased kurtosis over requested axis.
- last(offset): Select final periods of time series data based on a date offset.
- last_valid_index(): Return index for last non-NA value or None, if no non-NA value is found.
- le(other[, axis, level]): Get Less than or equal to of dataframe and other, element-wise (binary operator le).
- lookup(row_labels, col_labels): Label-based "fancy indexing" function for DataFrame.
- lt(other[, axis, level]): Get Less than of dataframe and other, element-wise (binary operator lt).
- mad([axis, skipna, level]): Return the mean absolute deviation of the values over the requested axis.
- mask(cond[, other, inplace, axis, level, ...]): Replace values where the condition is True.
- max([axis, skipna, level, numeric_only]): Return the maximum of the values over the requested axis.
- mean([axis, skipna, level, numeric_only]): Return the mean of the values over the requested axis.
- median([axis, skipna, level, numeric_only]): Return the median of the values over the requested axis.
- melt([id_vars, value_vars, var_name, ...]): Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
- memory_usage([index, deep]): Return the memory usage of each column in bytes.
- merge(right[, how, on, left_on, right_on, ...]): Merge DataFrame or named Series objects with a database-style join.
- min([axis, skipna, level, numeric_only]): Return the minimum of the values over the requested axis.
- mod(other[, axis, level, fill_value]): Get Modulo of dataframe and other, element-wise (binary operator mod).
- mode([axis, numeric_only, dropna]): Get the mode(s) of each element along the selected axis.
- mul(other[, axis, level, fill_value]): Get Multiplication of dataframe and other, element-wise (binary operator mul).
- multiply(other[, axis, level, fill_value]): Get Multiplication of dataframe and other, element-wise (binary operator mul).
- ne(other[, axis, level]): Get Not equal to of dataframe and other, element-wise (binary operator ne).
- nlargest(n, columns[, keep]): Return the first n rows ordered by columns in descending order.
- notna(): Detect existing (non-missing) values.
- notnull(): Detect existing (non-missing) values.
- nsmallest(n, columns[, keep]): Return the first n rows ordered by columns in ascending order.
- nunique([axis, dropna]): Count number of distinct elements in specified axis.
- pad([axis, inplace, limit, downcast]): Synonym for DataFrame.fillna() with method='ffill'.
- pct_change([periods, fill_method, limit, freq]): Percentage change between the current and a prior element.
- pipe(func, *args, **kwargs): Apply func(self, *args, **kwargs).
- pivot([index, columns, values]): Return reshaped DataFrame organized by given index / column values.
- pivot_table([values, index, columns, ...]): Create a spreadsheet-style pivot table as a DataFrame.
- plot: alias of pandas.plotting._core.PlotAccessor.
- pop(item): Return item and drop from frame.
- pow(other[, axis, level, fill_value]): Get Exponential power of dataframe and other, element-wise (binary operator pow).
- prod([axis, skipna, level, numeric_only, ...]): Return the product of the values over the requested axis.
- product([axis, skipna, level, numeric_only, ...]): Return the product of the values over the requested axis.
- quantile([q, axis, numeric_only, interpolation]): Return values at the given quantile over requested axis.
- query(expr[, inplace]): Query the columns of a DataFrame with a boolean expression.
- radd(other[, axis, level, fill_value]): Get Addition of dataframe and other, element-wise (binary operator radd).
- rank([axis, method, numeric_only, ...]): Compute numerical data ranks (1 through n) along axis.
- rdiv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
- reindex([labels, index, columns, axis, ...]): Conform Series/DataFrame to new index with optional filling logic.
- reindex_like(other[, method, copy, limit, ...]): Return an object with matching indices as other object.
- rename([mapper, index, columns, axis, copy, ...]): Alter axes labels.
- rename_axis([mapper, index, columns, axis, ...]): Set the name of the axis for the index or columns.
- reorder_levels(order[, axis]): Rearrange index levels using input order.
- replace([to_replace, value, inplace, limit, ...]): Replace values given in to_replace with value.
- resample(rule[, axis, closed, label, ...]): Resample time-series data.
- reset_index([level, drop, inplace, ...]): Reset the index, or a level of it.
- rfloordiv(other[, axis, level, fill_value]): Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
- rmod(other[, axis, level, fill_value]): Get Modulo of dataframe and other, element-wise (binary operator rmod).
- rmul(other[, axis, level, fill_value]): Get Multiplication of dataframe and other, element-wise (binary operator rmul).
- rolling(window[, min_periods, center, ...]): Provide rolling window calculations.
- round([decimals]): Round a DataFrame to a variable number of decimal places.
- rpow(other[, axis, level, fill_value]): Get Exponential power of dataframe and other, element-wise (binary operator rpow).
- rsub(other[, axis, level, fill_value]): Get Subtraction of dataframe and other, element-wise (binary operator rsub).
- rtruediv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
- sample([n, frac, replace, weights, ...]): Return a random sample of items from an axis of object.
- select_dtypes([include, exclude]): Return a subset of the DataFrame's columns based on the column dtypes.
- sem([axis, skipna, level, ddof, numeric_only]): Return unbiased standard error of the mean over requested axis.
- set_axis(labels[, axis, inplace]): Assign desired index to given axis.
- set_flags(*[, copy, allows_duplicate_labels]): Return a new object with updated flags.
- set_index(keys[, drop, append, inplace, ...]): Set the DataFrame index using existing columns.
- shift([periods, freq, axis, fill_value]): Shift index by desired number of periods with an optional time freq.
- skew([axis, skipna, level, numeric_only]): Return unbiased skew over requested axis.
- slice_shift([periods, axis]): Equivalent to shift without copying data.
- sort_index([axis, level, ascending, ...]): Sort object by labels (along an axis).
- sort_values(by[, axis, ascending, inplace, ...]): Sort by the values along either axis.
- sparse: alias of pandas.core.arrays.sparse.accessor.SparseFrameAccessor.
- squeeze([axis]): Squeeze 1 dimensional axis objects into scalars.
- stack([level, dropna]): Stack the prescribed level(s) from columns to index.
- std([axis, skipna, level, ddof, numeric_only]): Return sample standard deviation over requested axis.
- sub(other[, axis, level, fill_value]): Get Subtraction of dataframe and other, element-wise (binary operator sub).
- subtract(other[, axis, level, fill_value]): Get Subtraction of dataframe and other, element-wise (binary operator sub).
- sum([axis, skipna, level, numeric_only, ...]): Return the sum of the values over the requested axis.
- swap([likelihood]): Performs random swapping of data.
- swapaxes(axis1, axis2[, copy]): Interchange axes and swap values axes appropriately.
- swaplevel([i, j, axis]): Swap levels i and j in a MultiIndex.
- tail([n]): Return the last n rows.
- take(indices[, axis, is_copy]): Return the elements in the given positional indices along an axis.
- to_clipboard([excel, sep]): Copy object to the system clipboard.
- to_csv([path_or_buf, sep, na_rep, ...]): Write object to a comma-separated values (csv) file.
- to_dict([orient, into]): Convert the DataFrame to a dictionary.
- to_excel(excel_writer[, sheet_name, na_rep, ...]): Write object to an Excel sheet.
- to_feather(path, **kwargs): Write a DataFrame to the binary Feather format.
- to_gbq(destination_table[, project_id, ...]): Write a DataFrame to a Google BigQuery table.
- to_hdf(path_or_buf, key[, mode, complevel, ...]): Write the contained data to an HDF5 file using HDFStore.
- to_html([buf, columns, col_space, header, ...]): Render a DataFrame as an HTML table.
- to_json([path_or_buf, orient, date_format, ...]): Convert the object to a JSON string.
- to_latex([buf, columns, col_space, header, ...]): Render object to a LaTeX tabular, longtable, or nested table/tabular.
- to_markdown([buf, mode, index, storage_options]): Print DataFrame in Markdown-friendly format.
- to_numpy([dtype, copy, na_value]): Convert the DataFrame to a NumPy array.
- to_parquet([path, engine, compression, ...]): Write a DataFrame to the binary parquet format.
- to_period([freq, axis, copy]): Convert DataFrame from DatetimeIndex to PeriodIndex.
- to_pickle(path[, compression, protocol, ...]): Pickle (serialize) object to file.
- to_records([index, column_dtypes, index_dtypes]): Convert DataFrame to a NumPy record array.
- to_sql(name, con[, schema, if_exists, ...]): Write records stored in a DataFrame to a SQL database.
- to_stata(path[, convert_dates, write_index, ...]): Export DataFrame object to Stata dta format.
- to_string([buf, columns, col_space, header, ...]): Render a DataFrame to a console-friendly tabular output.
- to_timestamp([freq, how, axis, copy]): Cast to DatetimeIndex of timestamps, at beginning of period.
- to_xarray(): Return an xarray object from the pandas object.
- to_xml([path_or_buffer, index, root_name, ...]): Render a DataFrame to an XML document.
- transform(func[, axis]): Call func on self producing a DataFrame with transformed values.
- transpose(*args[, copy]): Transpose index and columns.
- truediv(other[, axis, level, fill_value]): Get Floating division of dataframe and other, element-wise (binary operator truediv).
- truncate([before, after, axis, copy]): Truncate a Series or DataFrame before and after some index value.
- tshift([periods, freq, axis]): Shift the time index, using the index's frequency if available.
- tz_convert(tz[, axis, level, copy]): Convert tz-aware axis to target time zone.
- tz_localize(tz[, axis, level, copy, ...]): Localize tz-naive index of a Series or DataFrame to target time zone.
- unstack([level, fill_value]): Pivot a level of the (necessarily hierarchical) index labels.
- update(other[, join, overwrite, ...]): Modify in place using non-NA values from another DataFrame.
- value_counts([subset, normalize, sort, ...]): Return a Series containing counts of unique rows in the DataFrame.
- var([axis, skipna, level, ddof, numeric_only]): Return unbiased variance over requested axis.
- where(cond[, other, inplace, axis, level, ...]): Replace values where the condition is False.
- xs(key[, axis, level, drop_level]): Return cross-section from the Series/DataFrame.
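The one method above that is not inherited from pandas.DataFrame is swap, summarized only as "Performs random swapping of data". The sketch below is a hypothetical illustration of those semantics in plain pandas/numpy, not the class's actual implementation; the function name swap_sketch and the per-column donor-draw strategy are assumptions:

```python
import numpy as np
import pandas as pd

def swap_sketch(df: pd.DataFrame, likelihood: float = 0.15, seed: int = 0) -> pd.DataFrame:
    """Hypothetical sketch: replace each cell, with probability `likelihood`,
    by a value drawn (with replacement) from elsewhere in the same column."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        mask = rng.random(len(out)) < likelihood              # rows to swap
        donors = rng.integers(0, len(out), size=int(mask.sum()))  # donor rows
        out.loc[mask, col] = df[col].to_numpy()[donors]
    return out

df = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})
swapped = swap_sketch(df, likelihood=0.3)
# Shape and columns are preserved; only cell values move within a column.
```

This kind of corruption is what a denoising autoencoder trains against: the swapped frame keeps each column's marginal distribution while breaking row-wise structure at the chosen rate.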
- abs()[source]
Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
- Returns
  - abs
    Series/DataFrame containing the absolute value of each element.
See also
  numpy.absolute: Calculate the absolute value element-wise.
Notes
For complex inputs, 1.2 + 1j, the absolute value is \(\sqrt{a^2 + b^2}\).
Examples
Absolute numeric values in a Series.

>>> s = pd.Series([-1.10, 2, -3.33, 4])
>>> s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64

Absolute numeric values in a Series with complex numbers.

>>> s = pd.Series([1.2 + 1j])
>>> s.abs()
0    1.56205
dtype: float64

Absolute numeric values in a Series with a Timedelta element.

>>> s = pd.Series([pd.Timedelta('1 days')])
>>> s.abs()
0   1 days
dtype: timedelta64[ns]

Select rows with data closest to certain value using argsort (from StackOverflow).

>>> df = pd.DataFrame({
...     'a': [4, 5, 6, 7],
...     'b': [10, 20, 30, 40],
...     'c': [100, 50, -30, -50]
... })
>>> df
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50
>>> df.loc[(df.c - 43).abs().argsort()]
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
- add(other, axis='columns', level=None, fill_value=None)[source]
Get Addition of dataframe and other, element-wise (binary operator add).
Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
  - other: scalar, sequence, Series, or DataFrame
    Any single or multiple element data structure, or list-like object.
  - axis: {0 or ‘index’, 1 or ‘columns’}
    Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
  - level: int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
  - fill_value: float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
  - DataFrame
    Result of the arithmetic operation.
See also
  DataFrame.add: Add DataFrames.
  DataFrame.sub: Subtract DataFrames.
  DataFrame.mul: Multiply DataFrames.
  DataFrame.div: Divide DataFrames (float division).
  DataFrame.truediv: Divide DataFrames (float division).
  DataFrame.floordiv: Divide DataFrames (integer division).
  DataFrame.mod: Calculate modulo (remainder after division).
  DataFrame.pow: Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- add_prefix(prefix)[source]
Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
- Parameters
  - prefix: str
    The string to add before each label.
- Returns
  - Series or DataFrame
    New Series or DataFrame with updated labels.
See also
  Series.add_suffix: Suffix row labels with string suffix.
  DataFrame.add_suffix: Suffix column labels with string suffix.
Examples
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.add_prefix('item_')
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64

>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6

>>> df.add_prefix('col_')
   col_A  col_B
0      1      3
1      2      4
2      3      5
3      4      6
- add_suffix(suffix)[source]
Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
- Parameters
  - suffix: str
    The string to add after each label.
- Returns
  - Series or DataFrame
    New Series or DataFrame with updated labels.
See also
  Series.add_prefix: Prefix row labels with string prefix.
  DataFrame.add_prefix: Prefix column labels with string prefix.
Examples
>>> s = pd.Series([1, 2, 3, 4])
>>> s
0    1
1    2
2    3
3    4
dtype: int64

>>> s.add_suffix('_item')
0_item    1
1_item    2
2_item    3
3_item    4
dtype: int64

>>> df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df
   A  B
0  1  3
1  2  4
2  3  5
3  4  6

>>> df.add_suffix('_col')
   A_col  B_col
0      1      3
1      2      4
2      3      5
3      4      6
- agg(func=None, axis=0, *args, **kwargs)[source]
Aggregate using one or more operations over the specified axis.
- Parameters
  - func: function, str, list or dict
    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
    Accepted combinations are:
    - function
    - string function name
    - list of functions and/or function names, e.g. [np.sum, 'mean']
    - dict of axis labels -> functions, function names or list of such.
  - axis: {0 or ‘index’, 1 or ‘columns’}, default 0
    If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
  - *args
    Positional arguments to pass to func.
  - **kwargs
    Keyword arguments to pass to func.
- Returns
  - scalar, Series or DataFrame
    The return can be:
    - scalar : when Series.agg is called with single function
    - Series : when DataFrame.agg is called with a single function
    - DataFrame : when DataFrame.agg is called with several functions
See also
  DataFrame.apply: Perform any type of operations.
  DataFrame.transform: Perform transformation type operations.
  core.groupby.GroupBy: Perform operations over groups.
  core.resample.Resampler: Perform operations over resampled bins.
  core.window.Rolling: Perform operations over rolling window.
  core.window.Expanding: Perform operations over expanding window.
  core.window.ExponentialMovingWindow: Perform operation over exponential weighted window.
Notes
The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.
A passed user-defined-function will be passed a Series for evaluation.
Examples
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
- aggregate(func=None, axis=0, *args, **kwargs)[source]
Aggregate using one or more operations over the specified axis.
- Parameters
  - func: function, str, list or dict
    Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
    Accepted combinations are:
    - function
    - string function name
    - list of functions and/or function names, e.g. [np.sum, 'mean']
    - dict of axis labels -> functions, function names or list of such.
  - axis: {0 or ‘index’, 1 or ‘columns’}, default 0
    If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
  - *args
    Positional arguments to pass to func.
  - **kwargs
    Keyword arguments to pass to func.
- Returns
  - scalar, Series or DataFrame
    The return can be:
    - scalar : when Series.agg is called with single function
    - Series : when DataFrame.agg is called with a single function
    - DataFrame : when DataFrame.agg is called with several functions
See also
  DataFrame.apply: Perform any type of operations.
  DataFrame.transform: Perform transformation type operations.
  core.groupby.GroupBy: Perform operations over groups.
  core.resample.Resampler: Perform operations over resampled bins.
  core.window.Rolling: Perform operations over rolling window.
  core.window.Expanding: Perform operations over expanding window.
  core.window.ExponentialMovingWindow: Perform operation over exponential weighted window.
Notes
The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).
agg is an alias for aggregate. Use the alias.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.
A passed user-defined-function will be passed a Series for evaluation.
Examples
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])

Aggregate these functions over the rows.

>>> df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0

Different aggregations per column.

>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0

Aggregate different functions over the columns and rename the index of the resulting DataFrame.

>>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0

Aggregate over the columns.

>>> df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
- align(other, join='outer', axis=None, level=None, copy=True, fill_value=None, method=None, limit=None, fill_axis=0, broadcast_axis=None)[source]
Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
- Parameters
  - other: DataFrame or Series
  - join: {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
  - axis: allowed axis of the other object, default None
    Align on index (0), columns (1), or both (None).
  - level: int or level name, default None
    Broadcast across a level, matching Index values on the passed MultiIndex level.
  - copy: bool, default True
    Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
  - fill_value: scalar, default np.NaN
    Value to use for missing values. Defaults to NaN, but can be any “compatible” value.
  - method: {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
    Method to use for filling holes in reindexed Series:
    - pad / ffill: propagate last valid observation forward to next valid.
    - backfill / bfill: use NEXT valid observation to fill gap.
  - limit: int, default None
    If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
  - fill_axis: {0 or ‘index’, 1 or ‘columns’}, default 0
    Filling axis, method and limit.
  - broadcast_axis: {0 or ‘index’, 1 or ‘columns’}, default None
    Broadcast values along this axis, if aligning two objects of different dimensions.
- Returns
  - (left, right): (DataFrame, type of other)
    Aligned objects.
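A minimal sketch of align with the default outer join on the row axis (the data here is made up):

```python
import pandas as pd

left = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
right = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])

# Default outer join: both results are reindexed to the union of the
# two row indexes; labels missing on one side are filled with NaN.
left_aligned, right_aligned = left.align(right, join="outer", axis=0)
print(left_aligned.index.tolist())   # ['a', 'b', 'c']
print(right_aligned.index.tolist())  # ['a', 'b', 'c']
```

Note that align only reshapes; it never combines values, which makes it a common first step before element-wise arithmetic on objects with mismatched indexes.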
- all(axis=0, bool_only=None, skipna=True, level=None, **kwargs)[source]
Return whether all elements are True, potentially over an axis.
Returns True unless there at least one element within a series or along a Dataframe axis that is False or equivalent (e.g. zero or empty).
- Parameters
- axis : {0 or 'index', 1 or 'columns', None}, default 0
Indicate which axis or axes should be reduced.
0 / 'index' : reduce the index, return a Series whose index is the original column labels.
1 / 'columns' : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_only : bool, default None
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
- skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
- level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- **kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, then a DataFrame is returned; otherwise, a Series is returned.
See also
Series.all : Return True if all elements are True.
DataFrame.any : Return True if one (or more) elements are True.
Examples
Series
>>> pd.Series([True, True]).all()
True
>>> pd.Series([True, False]).all()
False
>>> pd.Series([], dtype="float64").all()
True
>>> pd.Series([np.nan]).all()
True
>>> pd.Series([np.nan]).all(skipna=False)
True
DataFrames
Create a dataframe from a dictionary.
>>> df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
>>> df
   col1   col2
0  True   True
1  True  False
Default behaviour checks if column-wise values all return True.
>>> df.all()
col1     True
col2    False
dtype: bool
Specify axis='columns' to check if row-wise values all return True.
>>> df.all(axis='columns')
0     True
1    False
dtype: bool
Or axis=None for whether every value is True.
>>> df.all(axis=None)
False
- any(axis=0, bool_only=None, skipna=True, level=None, **kwargs)[source]
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’, None}, default 0
0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.
1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.
None : reduce all axes, return a scalar.
- bool_onlybool, default None
- skipnabool, default True
- levelint or level name, default None
- **kwargsany, default None
Indicate which axis or axes should be reduced.
Include only boolean columns. If None, will attempt to use everything, then use only boolean data. Not implemented for Series.
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
If level is specified, then a DataFrame is returned; otherwise, a Series is returned.
See also
numpy.any : Numpy version of this method.
Series.any : Return whether any element is True.
Series.all : Return whether all elements are True.
DataFrame.any : Return whether any element is True over requested axis.
DataFrame.all : Return whether all elements are True over requested axis.
Examples
Series
For Series input, the output is a scalar indicating whether any element is True.
>>> pd.Series([False, False]).any()
False
>>> pd.Series([True, False]).any()
True
>>> pd.Series([], dtype="float64").any()
False
>>> pd.Series([np.nan]).any()
False
>>> pd.Series([np.nan]).any(skipna=False)
True
DataFrame
Whether each column contains at least one True element (the default).
>>> df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
>>> df
   A  B  C
0  1  0  0
1  2  2  0

>>> df.any()
A     True
B     True
C    False
dtype: bool
Aggregating over the columns.
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2

>>> df.any(axis='columns')
0    True
1    True
dtype: bool

>>> df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
>>> df
       A  B
0   True  1
1  False  0

>>> df.any(axis='columns')
0     True
1    False
dtype: bool
Aggregating over the entire DataFrame with axis=None.
>>> df.any(axis=None)
True
any for an empty DataFrame is an empty Series.
>>> pd.DataFrame([]).any()
Series([], dtype: bool)
- append(other, ignore_index=False, verify_integrity=False, sort=False)[source]
Append rows of other to the end of caller, returning a new object.
Columns in other that are not in the caller are added as new columns.
- Parameters
- other : DataFrame or Series/dict-like object, or list of these
The data to append.
- ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
- verify_integrity : bool, default False
If True, raise ValueError on creating index with duplicates.
- sort : bool, default False
Sort columns if the columns of self and other are not aligned.
Changed in version 1.0.0: Changed to not sort by default.
- Returns
- DataFrame
A new DataFrame consisting of the rows of caller and the rows of other.
See also
concat : General function to concatenate DataFrame or Series objects.
Notes
If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
Examples
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
>>> df
   A  B
x  1  2
y  3  4
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])
>>> df.append(df2)
   A  B
x  1  2
y  3  4
x  5  6
y  7  8
With ignore_index set to True:
>>> df.append(df2, ignore_index=True)
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
The following, while not recommended methods for generating DataFrames, show two ways to generate a DataFrame from multiple data sources.
Less efficient:
>>> df = pd.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df
   A
0  0
1  1
2  2
3  3
4  4
More efficient:
>>> pd.concat([pd.DataFrame([i], columns=['A']) for i in range(5)],
...           ignore_index=True)
   A
0  0
1  1
2  2
3  3
4  4
- apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)[source]
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
- Parameters
- func : function
Function to apply to each column or row.
- axis : {0 or 'index', 1 or 'columns'}, default 0
Axis along which the function is applied:
0 or 'index': apply function to each column.
1 or 'columns': apply function to each row.
- raw : bool, default False
Determines if row or column is passed as a Series or ndarray object:
False : passes each row or column as a Series to the function.
True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
- result_type : {'expand', 'reduce', 'broadcast', None}, default None
These only act when axis=1 (columns):
'expand' : list-like results will be turned into columns.
'reduce' : returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
'broadcast' : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.
The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.
- args : tuple
Positional arguments to pass to func in addition to the array/series.
- **kwargs
Additional keyword arguments to pass as keyword arguments to func.
- Returns
- Series or DataFrame
Result of applying func along the given axis of the DataFrame.
See also
DataFrame.applymap : For elementwise operations.
DataFrame.aggregate : Only perform aggregating type operations.
DataFrame.transform : Only perform transforming type operations.
Notes
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See gotchas.udf-mutation for more details.
Examples
>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9
Using a numpy universal function (in this case the same as np.sqrt(df)):
>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
Using a reducing function on either axis
>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
Returning a list-like will result in a Series
>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
Passing result_type='expand' will expand list-like results to columns of a DataFrame.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2
Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.
>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
- applymap(func, na_action=None, **kwargs)[source]
Apply a function to a DataFrame elementwise.
This method applies a function that accepts and returns a scalar to every element of a DataFrame.
- Parameters
- func : callable
Python function, returns a single value from a single value.
- na_action : {None, 'ignore'}, default None
If 'ignore', propagate NaN values, without passing them to func.
New in version 1.2.
- **kwargs
Additional keyword arguments to pass as keyword arguments to func.
New in version 1.3.0.
- Returns
- DataFrame
Transformed DataFrame.
See also
DataFrame.apply : Apply a function along input axis of DataFrame.
Examples
>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> df.applymap(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5
Like Series.map, NA values can be ignored:
>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.applymap(lambda x: len(str(x)), na_action='ignore')
      0  1
0  <NA>  4
1     5  5
Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.
>>> df.applymap(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
But it’s better to avoid applymap in that case.
>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
- asfreq(freq, method=None, how=None, normalize=False, fill_value=None)[source]
Convert time series to specified frequency.
Returns the original data conformed to a new index with the specified frequency.
If the index of this DataFrame is a PeriodIndex, the new index is the result of transforming the original index with PeriodIndex.asfreq (so the original index will map one-to-one to the new index).
Otherwise, the new index will be equivalent to pd.date_range(start, end, freq=freq) where start and end are, respectively, the first and last entries in the original index (see pandas.date_range()). The values corresponding to any timesteps in the new index which were not present in the original index will be null (NaN), unless a method for filling such unknowns is provided (see the method parameter below).
The resample() method is more appropriate if an operation on each group of timesteps (such as an aggregate) is necessary to represent the data at the new frequency.
- Parameters
- freq : DateOffset or str
Frequency DateOffset or string.
- method : {'backfill'/'bfill', 'pad'/'ffill'}, default None
Method to use for filling holes in reindexed Series (note this does not fill NaNs that already were present):
'pad' / 'ffill': propagate last valid observation forward to next valid.
'backfill' / 'bfill': use NEXT valid observation to fill.
- how : {'start', 'end'}, default 'end'
For PeriodIndex only (see PeriodIndex.asfreq).
- normalize : bool, default False
Whether to reset output index to midnight.
- fill_value : scalar, optional
Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).
- Returns
- DataFrame
DataFrame object reindexed to the specified frequency.
See also
reindex : Conform DataFrame to new index with optional filling logic.
Notes
To learn more about the frequency strings, please see this link.
Examples
Start by creating a series with 4 one minute timestamps.
>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:01:00  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:03:00  3.0
Upsample the series into 30 second bins.
>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0
Upsample again, providing a fill value.
>>> df.asfreq(freq='30S', fill_value=9.0)
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  9.0
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  9.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  9.0
2000-01-01 00:03:00  3.0
Upsample again, providing a method.
>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  2.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  3.0
2000-01-01 00:03:00  3.0
- asof(where, subset=None)[source]
Return the last row(s) without any NaNs before where.
The last row (for each element in where, if list) without any NaN is taken. In case of a DataFrame, the last row without NaN considering only the subset of columns (if not None).
If there is no good value, NaN is returned for a Series or a Series of NaN values for a DataFrame.
- Parameters
- where : date or array-like of dates
Date(s) before which the last row(s) are returned.
- subset : str or array-like of str, default None
For DataFrame, if not None, only use these columns to check for NaNs.
- Returns
- scalar, Series, or DataFrame
The return can be:
scalar : when self is a Series and where is a scalar
Series : when self is a Series and where is an array-like, or when self is a DataFrame and where is a scalar
DataFrame : when self is a DataFrame and where is an array-like
See also
merge_asof : Perform an asof merge. Similar to left join.
Notes
Dates are assumed to be sorted. Raises if this is not the case.
Examples
A Series and a scalar where.
>>> s = pd.Series([1, 2, np.nan, 4], index=[10, 20, 30, 40])
>>> s
10    1.0
20    2.0
30    NaN
40    4.0
dtype: float64
>>> s.asof(20)
2.0
For a sequence where, a Series is returned. The first value is NaN, because the first element of where is before the first index value.
>>> s.asof([5, 20])
5     NaN
20    2.0
dtype: float64
Missing values are not considered. The following is 2.0, not NaN, even though NaN is at the index location for 30.
>>> s.asof(30)
2.0
Take all columns into consideration
>>> df = pd.DataFrame({'a': [10, 20, 30, 40, 50],
...                    'b': [None, None, None, None, 500]},
...                   index=pd.DatetimeIndex(['2018-02-27 09:01:00',
...                                           '2018-02-27 09:02:00',
...                                           '2018-02-27 09:03:00',
...                                           '2018-02-27 09:04:00',
...                                           '2018-02-27 09:05:00']))
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']))
                      a   b
2018-02-27 09:03:30 NaN NaN
2018-02-27 09:04:30 NaN NaN
Take a single column into consideration
>>> df.asof(pd.DatetimeIndex(['2018-02-27 09:03:30',
...                           '2018-02-27 09:04:30']),
...         subset=['a'])
                        a   b
2018-02-27 09:03:30  30.0 NaN
2018-02-27 09:04:30  40.0 NaN
- assign(**kwargs)[source]
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.
- Parameters
- **kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though pandas doesn't check it). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned.
- Returns
- DataFrame
A new DataFrame with the new columns in addition to all the existing columns.
Notes
Assigning multiple columns within the same assign is possible. Later items in **kwargs may refer to newly created or modified columns in df; items are computed and assigned into df in order.
Examples
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0
Where the value is a callable, evaluated on df:
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
- astype(dtype, copy=True, errors='raise')[source]
Cast a pandas object to a specified dtype dtype.
- Parameters
- dtype : data type, or dict of column name -> data type
Use a numpy.dtype or Python type to cast entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame's columns to column-specific types.
- copy : bool, default True
Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
- errors : {'raise', 'ignore'}, default 'raise'
Control raising of exceptions on invalid data for provided dtype.
raise : allow exceptions to be raised.
ignore : suppress exceptions. On error return original object.
- Returns
- casted : same type as caller
See also
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.
Notes
Deprecated since version 1.3.0: Using astype to convert from timezone-naive dtype to timezone-aware dtype is deprecated and will raise in a future version. Use Series.dt.tz_localize() instead.
Examples
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object
Cast all columns to int32:
>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object
Cast col1 to int32 using a dictionary:
>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object
Create a series:
>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64
Convert to categorical type:
>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]
Convert to ordered categorical type with custom ordering:
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]
Note that using copy=False and changing data on a new pandas object may propagate changes:
>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64
Create a series of dates:
>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
- property at: pandas.core.indexing._AtIndexer
Access a single value for a row/column label pair.
Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.
- Raises
- KeyError
If ‘label’ does not exist in DataFrame.
See also
DataFrame.iat : Access a single value for a row/column pair by integer position.
DataFrame.loc : Access a group of rows and columns by label(s).
Series.at : Access a single value using a label.
Examples
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
Get value at specified row/column pair
>>> df.at[4, 'B']
2
Set value at specified row/column pair
>>> df.at[4, 'B'] = 10
>>> df.at[4, 'B']
10
Get value within a Series
>>> df.loc[5].at['B']
4
- at_time(time, asof=False, axis=None)[source]
Select values at particular time of day (e.g., 9:30AM).
- Parameters
- timedatetime.time or str
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
- Returns
- Series or DataFrame
- Raises
- TypeError
If the index is not a DatetimeIndex.
See also
between_time : Select values between particular times of the day.
first : Select initial periods of time series based on a date offset.
last : Select final periods of time series based on a date offset.
DatetimeIndex.indexer_at_time : Get just the index locations for values at particular time of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='12H')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-09 12:00:00  2
2018-04-10 00:00:00  3
2018-04-10 12:00:00  4
>>> ts.at_time('12:00')
                     A
2018-04-09 12:00:00  2
2018-04-10 12:00:00  4
- property attrs: dict[Hashable, Any]
Dictionary of global attributes of this dataset.
Warning: attrs is experimental and may change without warning.
See also
DataFrame.flags : Global flags applying to this object.
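No example is given for attrs in this extract; a minimal sketch (the 'source' key is a made-up metadata name, and, per the warning above, propagation of attrs through operations is experimental):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
print(df.attrs)                   # starts out as an empty dict: {}
df.attrs['source'] = 'sensor-1'   # arbitrary, user-defined metadata
print(df.attrs)                   # {'source': 'sensor-1'}
```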
- property axes: list[Index]
Return a list representing the axes of the DataFrame.
It has the row axis labels and column axis labels as the only members. They are returned in that order.
Examples
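The Examples entry above is empty in this extract; a minimal sketch of what axes returns:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
# df.axes == [row index, column index], in that order.
row_labels, col_labels = df.axes
print(row_labels)   # RangeIndex(start=0, stop=2, step=1)
print(col_labels)   # Index(['col1', 'col2'], dtype='object')
```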
- backfill(axis=None, inplace=False, limit=None, downcast=None)[source]
Synonym for DataFrame.fillna() with method='bfill'.
- Returns
- Series/DataFrame or None
Object with missing values filled or None if inplace=True.
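No example is given for this synonym; a minimal sketch of the fill it performs (shown via the equivalent bfill(), which also avoids the method name being deprecated in newer pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2.0, np.nan, 4.0]})
# Each NaN is replaced by the next valid observation below it;
# a trailing NaN with no later valid value would stay NaN.
filled = df.bfill()  # same result as df.fillna(method='bfill')
print(filled['A'].tolist())  # [2.0, 2.0, 4.0, 4.0]
```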
- between_time(start_time, end_time, include_start=True, include_end=True, axis=None)[source]
Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting start_time to be later than end_time, you can get the times that are not between the two times.
- Parameters
- start_time : datetime.time or str
Initial time as a time filter limit.
- end_time : datetime.time or str
End time as a time filter limit.
- include_start : bool, default True
Whether the start time needs to be included in the result.
- include_end : bool, default True
Whether the end time needs to be included in the result.
- axis : {0 or 'index', 1 or 'columns'}, default 0
Determine the time range on the index (0) or columns (1).
- Returns
- Series or DataFrame
Data from the original object filtered to the specified dates range.
- Raises
- TypeError
If the index is not a DatetimeIndex.
See also
at_time : Select values at a particular time of the day.
DatetimeIndex.indexer_between_time : Get just the index locations for values between particular times of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4
>>> ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
You get the times that are not between two times by setting start_time later than end_time:
>>> ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
- bfill(axis=None, inplace=False, limit=None, downcast=None)[source]
Synonym for DataFrame.fillna() with method='bfill'.
- Returns
- Series/DataFrame or None
Object with missing values filled or None if inplace=True.
- bool()[source]
Return the bool of a single element Series or DataFrame.
This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).
- Returns
- bool
The value in the Series or DataFrame.
See also
Series.astype : Change the data type of a Series, including to boolean.
DataFrame.astype : Change the data type of a DataFrame, including to boolean.
numpy.bool_ : NumPy boolean data type, used by pandas for boolean values.
Examples
The method will only work for single element objects with a boolean value:
>>> pd.Series([True]).bool()
True
>>> pd.Series([False]).bool()
False

>>> pd.DataFrame({'col': [True]}).bool()
True
>>> pd.DataFrame({'col': [False]}).bool()
False
- boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)[source]
Make a box plot from DataFrame columns.
Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
For further details see Wikipedia's entry for boxplot.
- Parameters
- column : str or list of str, optional
Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().
- by : str or array-like, optional
Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.
- ax : object of class matplotlib.axes.Axes, optional
The matplotlib axes to be used by boxplot.
- fontsize : float or str
Tick label font size in points or as a string (e.g., large).
- rot : int or float, default 0
The rotation angle of labels (in degrees) with respect to the screen coordinate system.
- grid : bool, default True
Setting this to True will show the grid.
- figsize : a tuple (width, height) in inches
The size of the figure to create in matplotlib.
- layout : tuple (rows, columns), optional
For example, (3, 5) will display the subplots using 3 columns and 5 rows, starting from the top-left.
- return_type : {'axes', 'dict', 'both'} or None, default 'axes'
The kind of object to return. The default is axes.
'axes' returns the matplotlib axes the boxplot is drawn on.
'dict' returns a dictionary whose values are the matplotlib Lines of the boxplot.
'both' returns a namedtuple with the axes and dict.
When grouping with by, a Series mapping columns to return_type is returned.
If return_type is None, a NumPy array of axes with the same shape as layout is returned.
- backend : str, default None
Backend to use instead of the backend specified in the option plotting.backend. For instance, 'matplotlib'. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
New in version 1.0.0.
- **kwargs
All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().
- Returns
- result
See Notes.
See also
Series.plot.hist : Make a histogram.
matplotlib.pyplot.boxplot : Matplotlib equivalent plot.
Notes
The return type depends on the return_type parameter:
'axes' : object of class matplotlib.axes.Axes
'dict' : dict of matplotlib.lines.Line2D objects
'both' : a namedtuple with structure (ax, lines)
For data grouped with by, return a Series of the above or a numpy array:
Series
array (for return_type = None)
Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.
Examples
Boxplots can be created for every column in the dataframe by df.boxplot() or indicating the columns to be used.
Boxplots of variables distributions grouped by the values of a third variable can be created using the option by.
A list of strings (i.e. ['X', 'Y']) can be passed to boxplot in order to group the data by combination of the variables in the x-axis.
The layout of boxplot can be adjusted giving a tuple to layout.
Additional formatting can be done to the boxplot, like suppressing the grid (grid=False), rotating the labels in the x-axis (i.e. rot=45) or changing the fontsize (i.e. fontsize=15).
The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)
<class 'matplotlib.axes._subplots.AxesSubplot'>
When grouping with by, a Series mapping columns to return_type is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)
<class 'pandas.core.series.Series'>
If return_type is None, a NumPy array of axes with the same shape as layout is returned:
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
<class 'numpy.ndarray'>
- clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)[source]
Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis.
- Parameters
- lower : float or array-like, default None
Minimum threshold value. All values below this threshold will be set to it. A missing threshold (e.g. NA) will not clip the value.
- upper : float or array-like, default None
Maximum threshold value. All values above this threshold will be set to it. A missing threshold (e.g. NA) will not clip the value.
- axis : int or str axis name, optional
Align object with lower and upper along the given axis.
- inplace : bool, default False
Whether to perform the operation in place on the data.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with numpy.
- Returns
- Series or DataFrame or None
Same type as calling object with the values outside the clip boundaries replaced, or None if inplace=True.
See also
Series.clip : Trim values at input threshold in series.
DataFrame.clip : Trim values at input threshold in dataframe.
numpy.clip : Clip (limit) the values in an array.
Examples
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = pd.DataFrame(data)
>>> df
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5
Clips per column using lower and upper thresholds:
>>> df.clip(-4, 6)
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
Clips using specific lower and upper thresholds per column element:
>>> t = pd.Series([2, -4, -1, 6, 3])
>>> t
0    2
1   -4
2   -1
3    6
4    3
dtype: int64

>>> df.clip(t, t + 4, axis=0)
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
Clips using specific lower threshold per column element, with missing values:
>>> t = pd.Series([2, -4, np.NaN, 6, 3])
>>> t
0    2.0
1   -4.0
2    NaN
3    6.0
4    3.0
dtype: float64

>>> df.clip(t, axis=0)
   col_0  col_1
0      9      2
1     -3     -4
2      0      6
3      6      8
4      5      3
- columns: Index
The column labels of the DataFrame.
- combine(other, func, fill_value=None, overwrite=True)[source]
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.
- Parameters
- other : DataFrame
The DataFrame to merge column-wise.
- func : function
Function that takes two Series as inputs and returns a Series or a scalar. Used to merge the two dataframes column by column.
- fill_value : scalar value, default None
The value to fill NaNs with prior to passing any column to the merge func.
- overwrite : bool, default True
If True, columns in self that do not exist in other will be overwritten with NaNs.
- Returns
- DataFrame
Combination of the provided DataFrames.
See alsoDataFrame.combine_first
Combine two DataFrame objects and default to non-null values in frame calling the method.
Examples
Combine using a simple function that chooses the smaller column.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3

Example using a true element-wise combine function.

>>> df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3

Using fill_value fills Nones prior to passing the column to the merge function.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0

However, if the same element in both dataframes is None, that None is preserved:

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  3.0

Example that demonstrates the use of overwrite and behavior when the axes differ between the dataframes.

>>> df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0

>>> df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0

Demonstrating the preference of the passed in dataframe.

>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN

>>> df2.combine(df1, take_smaller, overwrite=False)
     A    B    C
0  0.0  NaN  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
- combine_first(other)[source]
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame with non-null values from the other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two.
- Parameters
- other : DataFrame
Provided DataFrame to use to fill null values.
- Returns
- DataFrame
The result of combining the provided DataFrame with the other object.
See also
DataFrame.combine
Perform series-wise operation on two DataFrames using a given function.
Examples
>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

Null values still persist if the location of that null value does not exist in other:

>>> df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
- compare(other, align_axis=1, keep_shape=False, keep_equal=False)[source]
Compare to another DataFrame and show the differences.
New in version 1.1.0.
- Parameters
- other : DataFrame
Object to compare with.
- align_axis : {0 or ‘index’, 1 or ‘columns’}, default 1
Determine which axis to align the comparison on.
- 0, or ‘index’ : Resulting differences are stacked vertically with rows drawn alternately from self and other.
- 1, or ‘columns’ : Resulting differences are aligned horizontally with columns drawn alternately from self and other.
- keep_shape : bool, default False
If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
- keep_equal : bool, default False
If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
- Returns
- DataFrame
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with ‘self’ and ‘other’ stacked alternately at the inner level.
- Raises
- ValueError
When the two DataFrames don’t have identical labels or shape.
See also
Series.compare
Compare with another Series and show differences.
DataFrame.equals
Test whether two objects contain the same elements.
Notes
Matching NaNs will not appear as a difference.
Can only compare identically-labeled (i.e. same shape, identical row and column labels) DataFrames.
Examples
>>> df = pd.DataFrame(
...     {
...         "col1": ["a", "a", "b", "b", "a"],
...         "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
...         "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
...     },
...     columns=["col1", "col2", "col3"],
... )
>>> df
  col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2
  col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

Align the differences on columns:

>>> df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

Stack the differences on rows:

>>> df.compare(df2, align_axis=0)
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0

Keep the equal values:

>>> df.compare(df2, keep_equal=True)
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0

Keep all original rows and columns:

>>> df.compare(df2, keep_shape=True)
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN

Keep all original rows and columns and also all original values:

>>> df.compare(df2, keep_shape=True, keep_equal=True)
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
- convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True)[source]
Convert columns to best possible dtypes using dtypes supporting pd.NA.
New in version 1.0.0.
- Parameters
- infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.
- convert_string : bool, default True
Whether object dtypes should be converted to StringDtype().
- convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.
- convert_boolean : bool, default True
Whether object dtypes should be converted to BooleanDtypes().
- convert_floating : bool, default True
Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully cast to integers.
New in version 1.2.0.
- Returns
- Series or DataFrame
Copy of input object with new dtype.
See also
infer_objects
Infer dtypes of objects.
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
to_numeric
Convert argument to a numeric type.
Notes
By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_floating, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.
For object-dtyped columns, if infer_objects is True, use the inference rules as during normal Series/DataFrame construction. Then, if possible, convert to StringDtype, BooleanDtype or an appropriate integer or floating extension type, otherwise leave as object.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.
Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns to the nullable floating extension type.
In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.
Examples
>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0

>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: object

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string
- copy(deep=True)[source]
Make a copy of this object’s indices and data.
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).
- Parameters
- deep : bool, default True
Make a deep copy, including a copy of the data and the indices. With deep=False neither the indices nor the data are copied.
- Returns
- copy : Series or DataFrame
Object type matches caller.
Notes
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
While Index objects are copied when deep=True, the underlying numpy array is not copied for performance reasons. Since Index is immutable, the underlying data can be safely shared and a copy is not needed.
Examples
>>> s = pd.Series([1, 2], index=["a", "b"])
>>> s
a    1
b    2
dtype: int64

>>> s_copy = s.copy()
>>> s_copy
a    1
b    2
dtype: int64

Shallow copy versus default (deep) copy:

>>> s = pd.Series([1, 2], index=["a", "b"])
>>> deep = s.copy()
>>> shallow = s.copy(deep=False)

Shallow copy shares data and index with original.

>>> s is shallow
False
>>> s.values is shallow.values and s.index is shallow.index
True

Deep copy has own copy of data and index.

>>> s is deep
False
>>> s.values is deep.values or s.index is deep.index
False

Updates to the data shared by shallow copy and original are reflected in both; deep copy remains unchanged.

>>> s[0] = 3
>>> shallow[1] = 4
>>> s
a    3
b    4
dtype: int64
>>> shallow
a    3
b    4
dtype: int64
>>> deep
a    1
b    2
dtype: int64

Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.

>>> s = pd.Series([[1, 2], [3, 4]])
>>> deep = s.copy()
>>> s[0][0] = 10
>>> s
0    [10, 2]
1     [3, 4]
dtype: object
>>> deep
0    [10, 2]
1     [3, 4]
dtype: object
- corr(method='pearson', min_periods=1)[source]
Compute pairwise correlation of columns, excluding NA/null values.
- Parameters
- method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable : callable with input two 1d ndarrays and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.
- min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.
- Returns
- DataFrame
Correlation matrix.
See also
DataFrame.corrwith
Compute pairwise correlation with another DataFrame or Series.
Series.corr
Compute the correlation between two Series.
Examples
>>> def histogram_intersection(a, b):
...     v = np.minimum(a, b).sum().round(decimals=1)
...     return v
>>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
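The example above uses a custom callable; the built-in methods are selected by name. A minimal sketch (the values here are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'dogs': [.2, .0, .6, .2],
                   'cats': [.3, .6, .0, .1]})

# Rank-based (Spearman) correlation; the diagonal is always 1.0.
spearman = df.corr(method='spearman')
```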
- corrwith(other, axis=0, drop=False, method='pearson')[source]
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.
- Parameters
- other : DataFrame, Series
Object with which to compute correlations.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise.
- drop : bool, default False
Drop missing indices from result.
- method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:
pearson : standard correlation coefficient
kendall : Kendall Tau correlation coefficient
spearman : Spearman rank correlation
callable : callable with input two 1d ndarrays and returning a float.
- Returns
- Series
Pairwise correlations.
See also
DataFrame.corr
Compute pairwise correlation of columns.
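A minimal worked sketch of column-wise correlation against a second frame (data invented for illustration; each column of df2 is an exact multiple of the matching column of df1, so every correlation comes out 1.0):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(20).reshape(5, 4),
                   columns=['a', 'b', 'c', 'd'])
df2 = df1 * 2  # perfectly linearly correlated, column by column

# Column-wise Pearson correlation between the two frames.
result = df1.corrwith(df2)
```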
- count(axis=0, level=None, numeric_only=False)[source]
Count non-NA cells for each column or row.
The values None, NaN, NaT, and optionally numpy.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’, counts are generated for each column. If 1 or ‘columns’, counts are generated for each row.
- level : int or str, optional
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.
- numeric_only : bool, default False
Include only float, int or boolean data.
- Returns
- Series or DataFrame
For each column/row the number of non-NA/null entries. If level is specified, returns a DataFrame.
See also
Series.count
Number of non-NA elements in a Series.
DataFrame.value_counts
Count unique combinations of columns.
DataFrame.shape
Number of DataFrame rows and columns (including NA elements).
DataFrame.isna
Boolean same-sized DataFrame showing places of NA elements.
Examples
Constructing DataFrame from a dictionary:
>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", "Lewis", "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
...                    "Single": [False, True, True, True, False]})
>>> df
  Person   Age  Single
0   John  24.0   False
1   Myla   NaN    True
2  Lewis  21.0    True
3   John  33.0    True
4   Myla  26.0   False

Notice the uncounted NA values:

>>> df.count()
Person    5
Age       4
Single    5
dtype: int64

Counts for each row:

>>> df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64
- cov(min_periods=None, ddof=1)[source]
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.
Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.
This method is generally used for the analysis of time series data to understand the relationship between different measures across time.
- Parameters
- min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.
- ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
New in version 1.1.0.
- Returns
- DataFrame
The covariance matrix of the series of the DataFrame.
See also
Series.cov
Compute covariance with another Series.
core.window.ExponentialMovingWindow.cov
Exponential weighted sample covariance.
core.window.Expanding.cov
Expanding sample covariance.
core.window.Rolling.cov
Rolling sample covariance.
Notes
Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable because the estimated covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimated correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.
Examples
>>> df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)], ... columns=['dogs', 'cats']) >>> df.cov() dogs cats dogs 0.666667 -1.000000 cats -1.000000 1.666667
>>> np.random.seed(42) >>> df = pd.DataFrame(np.random.randn(1000, 5), ... columns=['a', 'b', 'c', 'd', 'e']) >>> df.cov() a b c d e a 0.998438 -0.020161 0.059277 -0.008943 0.014144 b -0.020161 1.059352 -0.008543 -0.024738 0.009826 c 0.059277 -0.008543 1.010670 -0.001486 -0.000271 d -0.008943 -0.024738 -0.001486 0.921297 -0.013692 e 0.014144 0.009826 -0.000271 -0.013692 0.977795
Minimum number of periods
This method also supports an optional
min_periods
keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:>>> np.random.seed(42) >>> df = pd.DataFrame(np.random.randn(20, 3), ... columns=['a', 'b', 'c']) >>> df.loc[df.index[:5], 'a'] = np.nan >>> df.loc[df.index[5:10], 'b'] = np.nan >>> df.cov(min_periods=12) a b c a 0.316741 NaN -0.150812 b NaN 1.248003 0.191417 c -0.150812 0.191417 0.895202
- cummax(axis=None, skipna=True, *args, **kwargs)[source]
Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative maximum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative maximum of Series or DataFrame.
See also
core.window.Expanding.max
Similar functionality but ignores NaN values.
DataFrame.max
Return the maximum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
Examples
Series
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use skipna=False:

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row, use axis=1:

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
- cummin(axis=None, skipna=True, *args, **kwargs)[source]
Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative minimum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative minimum of Series or DataFrame.
See also
core.window.Expanding.min
Similar functionality but ignores NaN values.
DataFrame.min
Return the minimum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
Examples
Series
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use skipna=False:

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row, use axis=1:

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
- cumprod(axis=None, skipna=True, *args, **kwargs)[source]
Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative product.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative product of Series or DataFrame.
See also
core.window.Expanding.prod
Similar functionality but ignores NaN values.
DataFrame.prod
Return the product over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
Examples
Series
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use skipna=False:

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row, use axis=1:

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0
- cumsum(axis=None, skipna=True, *args, **kwargs)[source]
Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative sum.
- Parameters
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’.
- skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- *args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.
- Returns
- Series or DataFrame
Return cumulative sum of Series or DataFrame.
See also
core.window.Expanding.sum
Similar functionality but ignores NaN values.
DataFrame.sum
Return the sum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
Examples
Series
>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use skipna=False:

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

DataFrame

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                   columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row, use axis=1:

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
- describe(percentiles=None, include=None, exclude=None, datetime_is_numeric=False)[source]
Generate descriptive statistics.
Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.
- Parameters
- percentiles : list-like of numbers, optional
The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
- include : ‘all’, list-like of dtypes or None (default), optional
A white list of data types to include in the result. Ignored for Series. Here are the options:
‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number. To limit it instead to object columns submit the numpy.object data type. Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
None (default) : The result will include all numeric columns.
- exclude : list-like of dtypes or None (default), optional
A black list of data types to omit from the result. Ignored for Series. Here are the options:
A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types submit numpy.number. To exclude object columns submit the data type numpy.object. Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
None (default) : The result will exclude nothing.
- datetime_is_numeric : bool, default False
Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default.
New in version 1.1.0.
- Returns
- Series or DataFrame
Summary statistics of the Series or Dataframe provided.
See also
DataFrame.count
Count number of non-NA/null observations.
DataFrame.max
Maximum of the values in the object.
DataFrame.min
Minimum of the values in the object.
DataFrame.mean
Mean of the values.
DataFrame.std
Standard deviation of the observations.
DataFrame.select_dtypes
Subset of a DataFrame including/excluding columns based on their dtype.
Notes
For numeric data, the result’s index will include count, mean, std, min, max as well as lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.
If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.
For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.
The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
Examples
Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
- diff(periods=1, axis=0)[source]
First discrete difference of element.
Calculates the difference of a Dataframe element compared with another element in the Dataframe (default is element in previous row).
- Parameters
- periods : int, default 1
Periods to shift for calculating difference, accepts negative values.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Take difference over rows (0) or columns (1).
- Returns
- DataFrame
First differences of the DataFrame.
See also
DataFrame.pct_change
Percent change over given number of periods.
DataFrame.shift
Shift index by desired number of periods with an optional time freq.
Series.diff
First discrete difference of object.
Notes
For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in DataFrame, however dtype of the result is always float64.
Examples
Difference with previous row
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
...                    'b': [1, 1, 2, 3, 5, 8],
...                    'c': [1, 4, 9, 16, 25, 36]})
>>> df
   a  b   c
0  1  1   1
1  2  1   4
2  3  2   9
3  4  3  16
4  5  5  25
5  6  8  36

>>> df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0

Difference with previous column

>>> df.diff(axis=1)
    a  b   c
0 NaN  0   0
1 NaN -1   3
2 NaN -1   7
3 NaN -1  13
4 NaN  0  20
5 NaN  2  28

Difference with 3rd previous row

>>> df.diff(periods=3)
     a    b     c
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  3.0  2.0  15.0
4  3.0  4.0  21.0
5  3.0  6.0  27.0

Difference with following row

>>> df.diff(periods=-1)
     a    b     c
0 -1.0  0.0  -3.0
1 -1.0 -1.0  -5.0
2 -1.0 -1.0  -7.0
3 -1.0 -2.0  -9.0
4 -1.0 -3.0 -11.0
5  NaN  NaN   NaN

Overflow in input dtype

>>> df = pd.DataFrame({'a': [1, 0]}, dtype=np.uint8)
>>> df.diff()
       a
0    NaN
1  255.0
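The boolean-XOR behavior mentioned in the Notes can be sketched as follows (a minimal example; the column name `flag` is purely illustrative):

```python
import pandas as pd

# For boolean dtypes, diff() applies operator.xor() rather than
# subtraction, so each row flags whether the value changed relative
# to the previous row (the first row is NaN, as usual).
df = pd.DataFrame({"flag": [True, True, False, False, True]})
changed = df.diff()
print(changed)
```

This makes diff() a convenient change detector on boolean columns.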
- div(other, axis='columns', level=None, fill_value=None)[source]
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with the operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with the operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
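The fill_value semantics described above can be sketched in isolation (a minimal example; the column name `a` is purely illustrative):

```python
import numpy as np
import pandas as pd

# fill_value substitutes for a value that is missing on one side of
# the operation, but positions missing in *both* inputs stay NaN.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan]})
other = pd.DataFrame({"a": [2.0, np.nan, np.nan]})
result = df.div(other, fill_value=1.0)
print(result)
```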
- divide(other, axis='columns', level=None, fill_value=None)[source]
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_valuefloat or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360

Add a scalar with the operator version, which returns the same results.

>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361

Divide by constant with reverse version.

>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0

>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778

Subtract a list and Series by axis with the operator version.

>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358

>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359

Multiply a DataFrame of different shape with the operator version.

>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4

>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN

>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0

Divide by a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720

>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- dot(other)[source]
Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the values of an other Series, DataFrame or a numpy array.
It can also be called using self @ other in Python >= 3.5.
- Parameters
- otherSeries, DataFrame or array-like
The other object to compute the matrix product with.
- Returns
- Series or DataFrame
If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame or a np.array.
See also
Series.dot
Similar method for Series.
Notes
The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the matrix product here.
Examples
Here we multiply a DataFrame with a Series.

>>> df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = pd.Series([1, 1, 2, 1])
>>> df.dot(s)
0    -4
1     5
dtype: int64

Here we multiply a DataFrame with another DataFrame.

>>> other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other)
   0  1
0  1  4
1  2  2

Note that the dot method gives the same result as @

>>> df @ other
   0  1
0  1  4
1  2  2

The dot method also works if other is a np.array.

>>> arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr)
   0  1
0  1  4
1  2  2

Note how shuffling of the objects does not change the result.

>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2)
0    -4
1     5
dtype: int64
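The alignment step described in the Notes can be sketched directly with labeled axes (a minimal example; the labels `x` and `y` are purely illustrative):

```python
import pandas as pd

# dot() aligns the DataFrame's columns with the Series' index before
# multiplying, so a shuffled index gives the same answer as the
# natural order.
df = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "y"])
s = pd.Series([10, 1], index=["y", "x"])   # order differs from df's columns
out = df.dot(s)                            # row i = 1*x_i + 10*y_i
print(out)
```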
- drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')[source]
Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide for more information about the now unused levels.
- Parameters
- labelssingle label or list-like
Index or column labels to drop.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
- indexsingle label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
- columnssingle label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- levelint or level name, optional
For MultiIndex, level from which the labels will be removed.
- inplacebool, default False
If False, return a copy. Otherwise, do operation inplace and return None.
- errors{‘ignore’, ‘raise’}, default ‘raise’
If ‘ignore’, suppress error and only existing labels are dropped.
- Returns
- DataFrame or None
DataFrame without the removed index or column labels or None if inplace=True.
- Raises
- KeyError
If any of the labels is not found in the selected axis.
See also
DataFrame.loc
Label-location based indexer for selection by label.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
DataFrame.drop_duplicates
Return DataFrame with duplicate rows removed, optionally only considering certain columns.
Series.drop
Return Series with specified index labels removed.
Examples
>>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
...                   columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Drop columns

>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11

>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11

Drop a row by index

>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

Drop columns and/or rows of MultiIndex DataFrame

>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = pd.DataFrame(index=midx, columns=['big', 'small'],
...                   data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
...                         [250, 150], [1.5, 0.8], [320, 250],
...                         [1, 0.8], [0.3, 0.2]])
>>> df
                 big  small
lama   speed    45.0   30.0
       weight  200.0  100.0
       length    1.5    1.0
cow    speed    30.0   20.0
       weight  250.0  150.0
       length    1.5    0.8
falcon speed   320.0  250.0
       weight    1.0    0.8
       length    0.3    0.2

>>> df.drop(index='cow', columns='small')
                 big
lama   speed    45.0
       weight  200.0
       length    1.5
falcon speed   320.0
       weight    1.0
       length    0.3

>>> df.drop(index='length', level=1)
                 big  small
lama   speed    45.0   30.0
       weight  200.0  100.0
cow    speed    30.0   20.0
       weight  250.0  150.0
falcon speed   320.0  250.0
       weight    1.0    0.8
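The errors parameter described above can be sketched as follows (a minimal example; the column names are purely illustrative):

```python
import pandas as pd

# With errors='ignore', labels that are absent from the axis are
# silently skipped instead of raising KeyError; only the labels that
# exist are dropped.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
out = df.drop(columns=["B", "Z"], errors="ignore")
print(out.columns.tolist())
```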
- drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False)[source]
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes are ignored.
- Parameters
- subsetcolumn label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns.
- keep{‘first’, ‘last’, False}, default ‘first’
Determines which duplicates (if any) to keep.
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
- inplacebool, default False
Whether to drop duplicates in place or to return a copy.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
New in version 1.0.0.
- Returns
- DataFrame or None
DataFrame with duplicates removed or None if inplace=True.
See also
DataFrame.value_counts
Count unique combinations of columns.
Examples
Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns.

>>> df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use subset.

>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurrences, use keep.

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
- droplevel(level, axis=0)[source]
Return Series/DataFrame with requested index / column level(s) removed.
- Parameters
- levelint, str, or list-like
If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels.
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the level(s) is removed:
0 or ‘index’: remove level(s) from the row index.
1 or ‘columns’: remove level(s) from the columns.
- Returns
- Series/DataFrame
Series/DataFrame with requested index / column level(s) removed.
Examples
>>> df = pd.DataFrame([
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])

>>> df.columns = pd.MultiIndex.from_tuples([
...     ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])

>>> df
level_1   c   d
level_2   e   f
a b
1 2      3   4
5 6      7   8
9 10    11  12

>>> df.droplevel('a')
level_1   c   d
level_2   e   f
b
2        3   4
6        7   8
10      11  12

>>> df.droplevel('level_2', axis=1)
level_1   c   d
a b
1 2      3   4
5 6      7   8
9 10    11  12
- dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)[source]
Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.
0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing value.
Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.
- how{‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
- threshint, optional
Require that many non-NA values.
- subsetarray-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
- inplacebool, default False
If True, do operation inplace and return None.
- Returns
- DataFrame or None
DataFrame with NA entries dropped from it or None if inplace=True.
See also
DataFrame.isna
Indicate missing values.
DataFrame.notna
Indicate existing (non-missing) values.
DataFrame.fillna
Replace missing values.
Series.dropna
Drop missing values.
Index.dropna
Drop missing indices.
Examples
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
...                             pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Drop the rows where at least one element is missing.

>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop the columns where at least one element is missing.

>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman

Drop the rows where all elements are missing.

>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep only the rows with at least 2 non-NA values.

>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Define in which columns to look for missing values.

>>> df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Keep the DataFrame with valid entries in the same variable.

>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
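Combining thresh with axis is also possible, as in this minimal sketch (the column names are purely illustrative):

```python
import numpy as np
import pandas as pd

# Drop every *column* that has fewer than two non-NA values:
# 'full' has 3 non-NA values and survives, 'sparse' has only 1.
df = pd.DataFrame({"full": [1.0, 2.0, 3.0],
                   "sparse": [np.nan, np.nan, 3.0]})
out = df.dropna(axis="columns", thresh=2)
print(out.columns.tolist())
```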
- property dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns. Columns with mixed types are stored with the object dtype. See the User Guide for more.
- Returns
- pandas.Series
The data type of each column.
Examples
>>> df = pd.DataFrame({'float': [1.0],
...                    'int': [1],
...                    'datetime': [pd.Timestamp('20180310')],
...                    'string': ['foo']})
>>> df.dtypes
float              float64
int                  int64
datetime    datetime64[ns]
string              object
dtype: object
- duplicated(subset=None, keep='first')[source]
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
- Parameters
- subsetcolumn label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns.
- keep{‘first’, ‘last’, False}, default ‘first’
Determines which duplicates (if any) to mark.
first : Mark duplicates as True except for the first occurrence.
last : Mark duplicates as True except for the last occurrence.
False : Mark all duplicates as True.
- Returns
- Series
Boolean series denoting duplicated rows.
See also
Index.duplicated
Equivalent method on index.
Series.duplicated
Equivalent method on Series.
Series.drop_duplicates
Remove duplicate values from Series.
DataFrame.drop_duplicates
Remove duplicate values from DataFrame.
Examples
Consider dataset containing ramen rating.

>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, for each set of duplicated values, the first occurrence is set on False and all others on True.

>>> df.duplicated()
0    False
1     True
2    False
3    False
4    False
dtype: bool

By using ‘last’, the last occurrence of each set of duplicated values is set on False and all others on True.

>>> df.duplicated(keep='last')
0     True
1    False
2    False
3    False
4    False
dtype: bool

By setting keep on False, all duplicates are True.

>>> df.duplicated(keep=False)
0     True
1     True
2    False
3    False
4    False
dtype: bool

To find duplicates on specific column(s), use subset.

>>> df.duplicated(subset=['brand'])
0    False
1     True
2    False
3     True
4     True
dtype: bool
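Because the result is a boolean Series, it composes naturally with summing and boolean indexing, as in this minimal sketch (the column name `a` is purely illustrative):

```python
import pandas as pd

# Summing the boolean mask counts the repeated rows, and negating it
# filters them out (keeping first occurrences), mirroring
# drop_duplicates().
df = pd.DataFrame({"a": [1, 1, 2, 2, 2]})
n_repeats = int(df.duplicated().sum())
firsts = df[~df.duplicated()]
print(n_repeats, list(firsts["a"]))
```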
- property empty: bool
Indicator whether DataFrame is empty.
True if DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
- Returns
- bool
If DataFrame is empty, return True, if not return False.
See also
Series.dropna
Return series without null values.
DataFrame.dropna
Return DataFrame with labels on given axis omitted where (all or any) data are missing.
Notes
If DataFrame contains only NaNs, it is still not considered empty. See the example below.
Examples
An example of an actual empty DataFrame. Notice the index is empty:

>>> df_empty = pd.DataFrame({'A' : []})
>>> df_empty
Empty DataFrame
Columns: [A]
Index: []
>>> df_empty.empty
True

If we only have NaNs in our DataFrame, it is not considered empty! We will need to drop the NaNs to make the DataFrame empty:

>>> df = pd.DataFrame({'A' : [np.nan]})
>>> df
    A
0 NaN
>>> df.empty
False
>>> df.dropna().empty
True
- eq(other, axis='columns', level=None)[source]
Get Equal to of dataframe and other, element-wise (binary operator eq).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300

Comparison with a scalar, using either the operator or method:

>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False

>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False

When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True

Use the method to control the broadcast axis:

>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True

When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False

Use the method to control the axis:

>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False

Compare to a DataFrame of different shape.

>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150

>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False

Compare to a MultiIndex by level.

>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225

>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
- equals(other)[source]
Test whether two objects contain the same elements.
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
The row/column index does not need to have the same type, as long as the values are considered equal. Corresponding columns must be of the same dtype.
- Parameters
- otherSeries or DataFrame
The other Series or DataFrame to be compared with the first.
- Returns
- bool
True if all elements are the same in both objects, False otherwise.
See also
Series.eq
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.
DataFrame.eq
Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.
testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.
testing.assert_frame_equal
Like assert_series_equal, but targets DataFrames.
numpy.array_equal
Return True if two arrays have the same shape and elements, False otherwise.
Examples
>>> df = pd.DataFrame({1: [10], 2: [20]})
>>> df
    1   2
0  10  20

DataFrames df and exactly_equal have the same types and values for their elements and column labels, which will return True.

>>> exactly_equal = pd.DataFrame({1: [10], 2: [20]})
>>> exactly_equal
    1   2
0  10  20
>>> df.equals(exactly_equal)
True

DataFrames df and different_column_type have the same element types and values, but have different types for the column labels, which will still return True.

>>> different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
>>> different_column_type
   1.0  2.0
0   10   20
>>> df.equals(different_column_type)
True

DataFrames df and different_data_type have different types for the same values for their elements, and will return False even though their column labels are the same values and types.

>>> different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
>>> different_data_type
      1     2
0  10.0  20.0
>>> df.equals(different_data_type)
False
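The NaN rule above is what distinguishes equals() from the element-wise eq(), as this minimal sketch shows (the column name `a` is purely illustrative):

```python
import numpy as np
import pandas as pd

# equals() treats NaNs in the same location as equal, whereas the
# element-wise eq() follows NaN != NaN and reports False there.
df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [1.0, np.nan]})
elementwise = df1.eq(df2)
whole = df1.equals(df2)
print(elementwise, whole)
```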
- eval(expr, inplace=False, **kwargs)[source]
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows eval to run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.
- Parameters
- exprstr
The expression string to evaluate.
- inplacebool, default False
If the expression contains an assignment, whether to perform the operation inplace and mutate the existing DataFrame. Otherwise, a new DataFrame is returned.
- **kwargs
See the documentation for eval() for complete details on the keyword arguments accepted by query().
- Returns
- ndarray, scalar, pandas object, or None
The result of the evaluation or None if inplace=True.
See also
DataFrame.query
Evaluates a boolean expression to query the columns of a frame.
DataFrame.assign
Can evaluate an expression or function to create new values for a column.
eval
Evaluate a Python expression as a string using various backends.
Notes
For more details see the API documentation for eval(). For detailed examples see enhancing performance with eval.
Examples
>>> df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2
>>> df.eval('A + B')
0    11
1    10
2     9
3     8
4     7
dtype: int64

Assignment is allowed though by default the original DataFrame is not modified.

>>> df.eval('C = A + B')
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
>>> df
   A   B
0  1  10
1  2   8
2  3   6
3  4   4
4  5   2

Use inplace=True to modify the original DataFrame.

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7

Multiple columns can be assigned to using multi-line expressions:

>>> df.eval(
...     '''
... C = A + B
... D = A - B
... '''
... )
   A   B   C  D
0  1  10  11 -9
1  2   8  10 -6
2  3   6   9 -3
3  4   4   8  0
4  5   2   7  3
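Expressions can also reference local Python variables with the `@` prefix, as in this minimal sketch (the variable name `threshold` is purely illustrative):

```python
import pandas as pd

# '@name' inside the expression resolves to the local variable of
# that name; by default the original DataFrame is left unmodified.
df = pd.DataFrame({"A": [1, 2, 3]})
threshold = 2
out = df.eval("B = A > @threshold")
print(out)
```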
- ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0, times=None)[source]
Provide exponential weighted (EW) functions.
Available EW functions: mean(), var(), std(), corr(), cov().
Exactly one parameter: com, span, halflife, or alpha must be provided.
- Parameters
- comfloat, optional
Specify decay in terms of center of mass, \(\alpha = 1 / (1 + com)\), for \(com \geq 0\).
- spanfloat, optional
Specify decay in terms of span, \(\alpha = 2 / (span + 1)\), for \(span \geq 1\).
- halflifefloat, str, timedelta, optional
Specify decay in terms of half-life, \(\alpha = 1 - \exp\left(-\ln(2) / halflife\right)\), for \(halflife > 0\).
If times is specified, the time unit (str or timedelta) over which an observation decays to half its value. Only applicable to mean(), and the halflife value will not apply to the other functions.
New in version 1.1.0.
- alphafloat, optional
Specify smoothing factor \(\alpha\) directly, \(0 < \alpha \leq 1\).
- min_periodsint, default 0
Minimum number of observations in window required to have a value (otherwise result is NA).
- adjustbool, default True
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).
When adjust=True (default), the EW function is calculated using weights \(w_i = (1 - \alpha)^i\). For example, the EW moving average of the series [\(x_0, x_1, ..., x_t\)] would be:
\[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... + (1 - \alpha)^t}\]
When adjust=False, the exponentially weighted function is calculated recursively:
\[\begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t\end{split}\]
- ignore_nabool, default False
Ignore missing values when calculating weights; specify True to reproduce pre-0.15.0 behavior.
When ignore_na=False (default), weights are based on absolute positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \((1-\alpha)^2\) and \(1\) if adjust=True, and \((1-\alpha)^2\) and \(\alpha\) if adjust=False.
When ignore_na=True (reproducing pre-0.15.0 behavior), weights are based on relative positions. For example, the weights of \(x_0\) and \(x_2\) used in calculating the final weighted average of [\(x_0\), None, \(x_2\)] are \(1-\alpha\) and \(1\) if adjust=True, and \(1-\alpha\) and \(\alpha\) if adjust=False.
- axis{0, 1}, default 0
The axis to use. The value 0 identifies the rows, and 1 identifies the columns.
- timesstr, np.ndarray, Series, default None
New in version 1.1.0.
Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.
If str, the name of the column in the DataFrame representing the times.
If 1-D array like, a sequence with the same shape as the observations.
Only applicable to mean().
.- Returns
- DataFrame
A Window sub-classed for the particular operation.
Notes
More details can be found at: Exponentially weighted windows.
Examples
>>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

>>> df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

Specifying times with a timedelta halflife when computing mean.

>>> times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
>>> df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686
- expanding(min_periods=1, center=None, axis=0, method='single')[source]
Provide expanding transformations.
- Parameters
- min_periodsint, default 1
- centerbool, default False
- axisint or str, default 0
- methodstr {‘single’, ‘table’}, default ‘single’
Minimum number of observations in window required to have a value (otherwise result is NA).
Set the labels at the center of the window.
Execute the rolling operation per single column or row ('single') or over the entire object ('table'). This argument is only implemented when specifying engine='numba' in the method call.
New in version 1.3.0.
- Returns
- a Window sub-classed for the particular operation
Notes
By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.
Examples
>>> df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
>>> df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
>>> df.expanding(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  3.0
4  7.0
- explode(column, ignore_index=False)[source]
Transform each element of a list-like to a row, replicating index values.
New in version 0.25.0.
- Parameters
- columnIndexLabel
- ignore_indexbool, default False
Column(s) to explode. For multiple columns, specify a non-empty list in which each element is a str or tuple, and the specified columns' list-like data must have matching lengths on each row of the frame.
New in version 1.3.0: Multi-column explode.
If True, the resulting index will be labeled 0, 1, …, n - 1.
New in version 1.1.0.
- Returns
- DataFrame
Exploded lists to rows of the subset columns; index will be duplicated for these rows.
- Raises
- ValueError
If columns of the frame are not unique.
If the specified columns to explode are an empty list.
If the specified columns to explode do not have a matching count of elements row-wise in the frame.
See also
DataFrame.unstack
Pivot a level of the (necessarily hierarchical) index labels.
DataFrame.melt
Unpivot a DataFrame from wide format to long format.
Series.explode
Explode a DataFrame from list-like columns to long format.
Notes
This routine will explode list-likes including lists, tuples, sets, Series, and np.ndarray. The result dtype of the subset rows will be object. Scalars will be returned unchanged, and empty list-likes will result in a np.nan for that row. In addition, the ordering of rows in the output will be non-deterministic when exploding sets.
Examples
>>> df = pd.DataFrame({'A': [[0, 1, 2], 'foo', [], [3, 4]],
...                    'B': 1,
...                    'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
>>> df
           A  B          C
0  [0, 1, 2]  1  [a, b, c]
1        foo  1        NaN
2         []  1         []
3     [3, 4]  1     [d, e]
Single-column explode.
>>> df.explode('A')
     A  B          C
0    0  1  [a, b, c]
0    1  1  [a, b, c]
0    2  1  [a, b, c]
1  foo  1        NaN
2  NaN  1         []
3    3  1     [d, e]
3    4  1     [d, e]
Multi-column explode.
>>> df.explode(list('AC'))
     A  B    C
0    0  1    a
0    1  1    b
0    2  1    c
1  foo  1  NaN
2  NaN  1  NaN
3    3  1    d
3    4  1    e
- ffill(axis=None, inplace=False, limit=None, downcast=None)[source]
Synonym for DataFrame.fillna() with method='ffill'.
- Returns
- Series/DataFrame or None
Object with missing values filled or None if inplace=True.
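A minimal runnable sketch of this equivalence (the data and column name are illustrative, not from the docs above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0]})

# ffill() propagates the last valid observation forward,
# exactly like fillna(method='ffill').
assert df.ffill().equals(df.fillna(method="ffill"))
print(df.ffill()["A"].tolist())  # [1.0, 1.0, 1.0, 4.0]
```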
- fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)[source]
Fill NA/NaN values using the specified method.
- Parameters
- valuescalar, dict, Series, or DataFrame
- method{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
- axis{0 or ‘index’, 1 or ‘columns’}
- inplacebool, default False
- limitint, default None
- downcastdict, default is None
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
Method to use for filling holes in reindexed Series. pad / ffill: propagate last valid observation forward to next valid. backfill / bfill: use next valid observation to fill gap.
Axis along which to fill missing values.
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
- Returns
- DataFrame or None
Object with missing values filled or None if inplace=True.
See also
interpolate
Fill NaN values using interpolation.
reindex
Conform object to new index.
asfreq
Convert TimeSeries to specified frequency.
Examples
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C  D
0  0.0  2.0  0.0  0
1  3.0  4.0  0.0  1
2  0.0  0.0  0.0  5
3  0.0  3.0  0.0  4
We can also propagate non-null values forward or backward.
>>> df.fillna(method="ffill")
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  3.0  4.0 NaN  5
3  3.0  3.0 NaN  4
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  2.0  1
2  0.0  1.0  2.0  5
3  0.0  3.0  2.0  4
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  NaN  1
2  NaN  1.0  NaN  5
3  NaN  3.0  NaN  4
When filling using a DataFrame, replacement happens along the same column names and same indices.
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C  D
0  0.0  2.0  0.0  0
1  3.0  4.0  0.0  1
2  0.0  0.0  0.0  5
3  0.0  3.0  0.0  4
- filter(items=None, like=None, regex=None, axis=None)[source]
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
- Parameters
- itemslist-like
- likestr
- regexstr (regular expression)
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Keep labels from axis which are in items.
Keep labels from axis for which “like in label == True”.
Keep labels from axis for which re.search(regex, label) == True.
The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, ‘index’ for Series, ‘columns’ for DataFrame.
- Returns
- same type as input object
See also
DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
Notes
The items, like, and regex parameters are enforced to be mutually exclusive. axis defaults to the info axis that is used when indexing with [].
Examples
>>> df = pd.DataFrame(np.array(([1, 2, 3], [4, 5, 6])),
...                   index=['mouse', 'rabbit'],
...                   columns=['one', 'two', 'three'])
>>> df
        one  two  three
mouse     1    2      3
rabbit    4    5      6
>>> # select columns by name
>>> df.filter(items=['one', 'three'])
        one  three
mouse     1      3
rabbit    4      6
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1)
        one  three
mouse     1      3
rabbit    4      6
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0)
        one  two  three
rabbit    4    5      6
- first(offset)[source]
Select initial periods of time series data based on a date offset.
When having a DataFrame with dates as index, this function can select the first few rows based on a date offset.
- Parameters
- offsetstr, DateOffset or dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘1M’ will display all the rows having their index within the first month.
- Returns
- Series or DataFrame
A subset of the caller.
- Raises
- TypeError
If the index is not a DatetimeIndex.
See also
last
Select final periods of time series based on a date offset.
at_time
Select values at a particular time of the day.
between_time
Select values between particular times of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4
Get the rows for the first 3 days:
>>> ts.first('3D')
            A
2018-04-09  1
2018-04-11  2
Notice that data for the first 3 calendar days was returned, not the first 3 days observed in the dataset, and therefore data for 2018-04-13 was not returned.
- first_valid_index()[source]
Return index for first non-NA value or None, if no non-NA value is found.
- Returns
- scalartype of index
Notes
If all elements are NA/null, returns None. Also returns None for an empty Series/DataFrame.
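A short sketch of the behavior described above (the data values are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan, 2.0, 3.0]})
print(df.first_valid_index())  # 2 -- first row holding a non-NA value

# All-NA and empty frames yield None.
print(pd.DataFrame({"A": [np.nan]}).first_valid_index())  # None
print(pd.DataFrame().first_valid_index())                 # None
```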
- property flags: pandas.core.flags.Flags
Get the properties associated with this pandas object.
The available flags are
Flags.allows_duplicate_labels
See also
Flags
Flags that apply to pandas objects.
DataFrame.attrs
Global metadata applying to this dataset.
Notes
“Flags” differ from “metadata”. Flags reflect properties of the pandas object (the Series or DataFrame). Metadata refer to properties of the dataset, and should be stored in DataFrame.attrs.
Examples
>>> df = pd.DataFrame({"A": [1, 2]})
>>> df.flags
<Flags(allows_duplicate_labels=True)>
Flags can be get or set using attribute access:
>>> df.flags.allows_duplicate_labels
True
>>> df.flags.allows_duplicate_labels = False
Or by slicing with a key:
>>> df.flags["allows_duplicate_labels"]
False
>>> df.flags["allows_duplicate_labels"] = True
- floordiv(other, axis='columns', level=None, fill_value=None)[source]
Get Integer division of dataframe and other, element-wise (binary operator floordiv).
Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.
- Parameters
- otherscalar, sequence, Series, or DataFrame
- axis{0 or ‘index’, 1 or ‘columns’}
- levelint or label
- fill_valuefloat or None, default None
Any single or multiple element data structure, or list-like object.
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
Broadcast across a level, matching Index values on the passed MultiIndex level.
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
- Returns
- DataFrame
Result of the arithmetic operation.
See also
DataFrame.add
Add DataFrames.
DataFrame.sub
Subtract DataFrames.
DataFrame.mul
Multiply DataFrames.
DataFrame.div
Divide DataFrames (float division).
DataFrame.truediv
Divide DataFrames (float division).
DataFrame.floordiv
Divide DataFrames (integer division).
DataFrame.mod
Calculate modulo (remainder after division).
DataFrame.pow
Calculate exponential power.
Notes
Mismatched indices will be unioned together.
Examples
>>> df = pd.DataFrame({'angles': [0, 3, 4],
...                    'degrees': [360, 180, 360]},
...                   index=['circle', 'triangle', 'rectangle'])
>>> df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with the operator version, which returns the same results.
>>> df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
>>> df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.
>>> df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
>>> df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.
>>> df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
>>> df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
...        axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a DataFrame of different shape with operator version.
>>> other = pd.DataFrame({'angles': [0, 3, 4]},
...                      index=['circle', 'triangle', 'rectangle'])
>>> other
           angles
circle          0
triangle        3
rectangle       4
>>> df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
>>> df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
...                              'degrees': [360, 180, 360, 360, 540, 720]},
...                             index=[['A', 'A', 'A', 'B', 'B', 'B'],
...                                    ['circle', 'triangle', 'rectangle',
...                                     'square', 'pentagon', 'hexagon']])
>>> df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
>>> df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
- classmethod from_dict(data, orient='columns', dtype=None, columns=None)[source]
Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index allowing dtype specification.
- Parameters
- datadict
- orient{‘columns’, ‘index’}, default ‘columns’
- dtypedtype, default None
- columnslist, default None
Of the form {field : array-like} or {field : dict}.
The “orientation” of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
Data type to force, otherwise infer.
Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.
- Returns
- DataFrame
See also
DataFrame.from_records
DataFrame from structured ndarray, sequence of tuples or dicts, or DataFrame.
DataFrame
DataFrame object creation using constructor.
Examples
By default the keys of the dict become the DataFrame columns:
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
Specify orient='index' to create the DataFrame using dictionary keys as rows:
>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> pd.DataFrame.from_dict(data, orient='index')
       0  1  2  3
row_1  3  2  1  0
row_2  a  b  c  d
When using the ‘index’ orientation, the column names can be specified manually:
>>> pd.DataFrame.from_dict(data, orient='index',
...                        columns=['A', 'B', 'C', 'D'])
       A  B  C  D
row_1  3  2  1  0
row_2  a  b  c  d
- classmethod from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)[source]
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of tuples or dicts, or DataFrame.
- Parameters
- datastructured ndarray, sequence of tuples or dicts, or DataFrame
- indexstr, list of fields, array-like
- excludesequence, default None
- columnssequence, default None
- coerce_floatbool, default False
- nrowsint, default None
Structured input data.
Field of array to use as the index, alternately a specific set of input labels to use.
Columns or fields to exclude.
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
Number of rows to read if data is an iterator.
- Returns
- DataFrame
See also
DataFrame.from_dict
DataFrame from dict of array-like or dicts.
DataFrame
DataFrame object creation using constructor.
Examples
Data can be provided as a structured ndarray:
>>> data = np.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
...                 dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
Data can be provided as a list of dicts:
>>> data = [{'col_1': 3, 'col_2': 'a'},
...         {'col_1': 2, 'col_2': 'b'},
...         {'col_1': 1, 'col_2': 'c'},
...         {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
Data can be provided as a list of tuples with corresponding columns:
>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> pd.DataFrame.from_records(data, columns=['col_1', 'col_2'])
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
- ge(other, axis='columns', level=None)[source]
Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
- levelint or label
Any single or multiple element data structure, or list-like object.
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:
>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:
>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
- get(key, default=None)[source]
Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
- Parameters
- keyobject
- Returns
- valuesame type as items contained in object
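The docs above give no example for get; a minimal sketch (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"temp": [20, 21], "wind": [5, 7]})

# An existing key returns that column; a missing key returns the default.
print(df.get("temp").tolist())         # [20, 21]
print(df.get("humidity", default=-1))  # -1
```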
- groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)[source]
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
- Parameters
- bymapping, function, label, or list of labels
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
- levelint, level name, or sequence of such, default None
- as_indexbool, default True
- sortbool, default True
- group_keysbool, default True
- squeezebool, default False
- observedbool, default False
- dropnabool, default True
Used to determine the groups for the groupby. If by is a function, it's called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see the .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.
Split along rows (0) or columns (1).
If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
When calling apply, add group keys to index to identify pieces.
Reduce the dimensionality of the return type if possible, otherwise return a consistent type.
Deprecated since version 1.1.0.
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
If True, and if group keys contain NA values, NA values together with the row/column will be dropped. If False, NA values will also be treated as a key in groups.
New in version 1.1.0.
- Returns
- DataFrameGroupBy
Returns a groupby object that contains information about the groups.
See also
resample
Convenience method for frequency conversion and resampling of time series.
Notes
See the user guide for more.
Examples
>>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
...                               'Parrot', 'Parrot'],
...                    'Max Speed': [380., 370., 24., 26.]})
>>> df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
>>> df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
Hierarchical Indexes
We can groupby different levels of a hierarchical index using the level parameter:
>>> arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
...           ['Captive', 'Wild', 'Captive', 'Wild']]
>>> index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
>>> df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
...                   index=index)
>>> df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
>>> df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
>>> df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
We can also choose to include NA in group keys or not by setting the dropna parameter; the default setting is True:
>>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by=["b"]).sum()
     a  c
b
1.0  2  3
2.0  2  5
>>> df.groupby(by=["b"], dropna=False).sum()
     a  c
b
1.0  2  3
2.0  2  5
NaN  1  4
>>> l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
>>> df = pd.DataFrame(l, columns=["a", "b", "c"])
>>> df.groupby(by="a").sum()
      b      c
a
a  13.0   13.0
b  12.3  123.0
>>> df.groupby(by="a", dropna=False).sum()
        b      c
a
a    13.0   13.0
b    12.3  123.0
NaN  12.3   33.0
- gt(other, axis='columns', level=None)[source]
Get Greater than of dataframe and other, element-wise (binary operator gt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
- levelint or label
Any single or multiple element data structure, or list-like object.
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See also
DataFrame.eq
Compare DataFrames for equality elementwise.
DataFrame.ne
Compare DataFrames for inequality elementwise.
DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.
DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.
DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.
DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).
Examples
>>> df = pd.DataFrame({'cost': [250, 150, 100],
...                    'revenue': [100, 250, 300]},
...                   index=['A', 'B', 'C'])
>>> df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:
>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:
>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
- head(n=5)[source]
Return the first n rows.
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
For negative values of n, this function returns all rows except the last n rows, equivalent to df[:-n].
- Parameters
- nint, default 5
Number of rows to select.
- Returns
- same type as caller
The first n rows of the caller object.
See also
DataFrame.tail
Returns the last n rows.
Examples
>>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
...                               'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
6      shark
7      whale
8      zebra
Viewing the first 5 lines
>>> df.head()
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
Viewing the first n lines (three in this case):
>>> df.head(3)
      animal
0  alligator
1        bee
2     falcon
For negative values of n:
>>> df.head(-3)
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
- hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, backend=None, legend=False, **kwargs)[source]
Make a histogram of the DataFrame’s columns.
A histogram is a representation of the distribution of data. This function calls matplotlib.pyplot.hist() on each series in the DataFrame, resulting in one histogram per column.
- Parameters
- dataDataFrame
- columnstr or sequence, optional
- byobject, optional
- gridbool, default True
- xlabelsizeint, default None
- xrotfloat, default None
- ylabelsizeint, default None
- yrotfloat, default None
- axMatplotlib axes object, default None
- sharexbool, default True if ax is None else False
- shareybool, default False
- figsizetuple, optional
- layouttuple, optional
- binsint or sequence, default 10
- backendstr, default None
- legendbool, default False
- **kwargs
The pandas object holding the data.
If passed, will be used to limit data to a subset of columns.
If passed, then used to form histograms for separate groups.
Whether to show axis grid lines.
If specified changes the x-axis label size.
Rotation of x axis labels. For example, a value of 90 displays the x labels rotated 90 degrees clockwise.
If specified changes the y-axis label size.
Rotation of y axis labels. For example, a value of 90 displays the y labels rotated 90 degrees clockwise.
The axes to plot the histogram on.
In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in. Note that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure.
In case subplots=True, share y axis and set some y axis labels to invisible.
The size in inches of the figure to create. Uses the value in matplotlib.rcParams by default.
Tuple of (rows, columns) for the layout of the histograms.
Number of histogram bins to be used. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, gives bin edges, including left edge of first bin and right edge of last bin. In this case, bins is returned unmodified.
Backend to use instead of the backend specified in the option plotting.backend. For instance, 'matplotlib'. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.
New in version 1.0.0.
Whether to show the legend.
New in version 1.1.0.
All other plotting keyword arguments to be passed to matplotlib.pyplot.hist().
- Returns
- matplotlib.AxesSubplot or numpy.ndarray of them
See also
matplotlib.pyplot.hist
Plot a histogram using matplotlib.
Examples
This example draws a histogram based on the length and width of some animals, displayed in three bins.
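The example code itself did not survive extraction; a sketch consistent with that description (the animal data is illustrative, not from the docs above):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import pandas as pd

# Illustrative lengths and widths of a few animals.
df = pd.DataFrame({
    "length": [1.5, 0.5, 1.2, 0.9, 3.0],
    "width": [0.7, 0.2, 0.15, 0.2, 1.1],
}, index=["pig", "rabbit", "duck", "chicken", "horse"])

# One histogram per column, each using three bins.
axes = df.hist(bins=3)
print(axes.shape)
```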
- property iat: pandas.core.indexing._iAtIndexer
Access a single value for a row/column pair by integer position.
Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.
- Raises
- IndexError
When integer position is out of bounds.
See also
DataFrame.at
Access a single value for a row/column label pair.
DataFrame.loc
Access a group of rows and columns by label(s).
DataFrame.iloc
Access a group of rows and columns by integer position(s).
Examples
>>> df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
...                   columns=['A', 'B', 'C'])
>>> df
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30
Get value at specified row/column pair
>>> df.iat[1, 2]
1
Set value at specified row/column pair
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2]
10
Get value within a series
>>> df.loc[0].iat[1]
2
- idxmax(axis=0, skipna=True)[source]
Return index of first occurrence of maximum over requested axis.
NA/null values are excluded.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
- skipnabool, default True
The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Indexes of maxima along the specified axis.
- Raises
- ValueError
If the row/column is empty
See alsoSeries.idxmax
Return index of the maximum element.
Notes
This method is the DataFrame version of ndarray.argmax.
Examples
Consider a dataset containing food consumption in Argentina.
>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51          37.20
Wheat Products       103.11          19.66
Beef                  55.48        1712.00
By default, it returns the index for the maximum value in each column.
>>> df.idxmax()
consumption     Wheat Products
co2_emissions             Beef
dtype: object
To return the index for the maximum value in each row, use
axis="columns"
.>>> df.idxmax(axis="columns") Pork co2_emissions Wheat Products consumption Beef co2_emissions dtype: object
- idxmin(axis=0, skipna=True)[source]
Return index of first occurrence of minimum over requested axis.
NA/null values are excluded.
- Parameters
- axis{0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
- skipnabool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.
- Returns
- Series
Indexes of minima along the specified axis.
- Raises
- ValueError
If the row/column is empty.
See also
Series.idxmin
Return index of the minimum element.
Notes
This method is the DataFrame version of ndarray.argmin.
Examples
Consider a dataset containing food consumption in Argentina.
>>> df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
...                    'co2_emissions': [37.2, 19.66, 1712]},
...                   index=['Pork', 'Wheat Products', 'Beef'])
>>> df
                consumption  co2_emissions
Pork                  10.51          37.20
Wheat Products       103.11          19.66
Beef                  55.48        1712.00
By default, it returns the index for the minimum value in each column.
>>> df.idxmin()
consumption                Pork
co2_emissions    Wheat Products
dtype: object
To return the index for the minimum value in each row, use axis="columns".
>>> df.idxmin(axis="columns")
Pork                consumption
Wheat Products    co2_emissions
Beef                consumption
dtype: object
- property iloc: pandas.core.indexing._iLocIndexer
Purely integer-location based indexing for selection by position.
.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
Allowed inputs are:
An integer, e.g. 5.
A list or array of integers, e.g. [4, 3, 0].
A slice object with ints, e.g. 1:7.
A boolean array.
A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers, which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
See more at Selection by Position.
See also
DataFrame.iat
Fast integer location scalar accessor.
DataFrame.loc
Purely label-location based indexer for selection by label.
Series.iloc
Purely integer-location based indexing for selection by position.
Examples
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
...           {'a': 100, 'b': 200, 'c': 300, 'd': 400},
...           {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
>>> df = pd.DataFrame(mydict)
>>> df
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
Indexing just the rows
With a scalar integer.
>>> type(df.iloc[0])
<class 'pandas.core.series.Series'>
>>> df.iloc[0]
a    1
b    2
c    3
d    4
Name: 0, dtype: int64
With a list of integers.
>>> df.iloc[[0]]
   a  b  c  d
0  1  2  3  4
>>> type(df.iloc[[0]])
<class 'pandas.core.frame.DataFrame'>
>>> df.iloc[[0, 1]]
     a    b    c    d
0    1    2    3    4
1  100  200  300  400
With a slice object.
>>> df.iloc[:3]
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
With a boolean mask the same length as the index.
>>> df.iloc[[True, False, True]]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000
With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index labels are even.
>>> df.iloc[lambda x: x.index % 2 == 0]
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000
Indexing both axes
You can mix the indexer types for the index and columns. Use : to select the entire axis.
With scalar integers.
>>> df.iloc[0, 1]
2
With lists of integers.
>>> df.iloc[[0, 2], [1, 3]]
      b     d
0     2     4
2  2000  4000
With slice objects.
>>> df.iloc[1:3, 0:3]
      a     b     c
1   100   200   300
2  1000  2000  3000
With a boolean array whose length matches the columns.
>>> df.iloc[:, [True, False, True, False]]
      a     c
0     1     3
1   100   300
2  1000  3000
With a callable function that expects the Series or DataFrame.
>>> df.iloc[:, lambda df: [0, 2]]
      a     c
0     1     3
1   100   300
2  1000  3000
- index: Index
The index (row labels) of the DataFrame.
- infer_objects()[source]
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.
- Returns
- convertedsame type as input object
See also
to_datetime
Convert argument to datetime.
to_timedelta
Convert argument to timedelta.
to_numeric
Convert argument to numeric type.
convert_dtypes
Convert argument to best possible dtype.
Examples
>>> df = pd.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df
   A
1  1
2  2
3  3
>>> df.dtypes
A    object
dtype: object
>>> df.infer_objects().dtypes
A    int64
dtype: object
- info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None, null_counts=None)[source]
Print a concise summary of a DataFrame.
This method prints information about a DataFrame including the index dtype and columns, non-null values and memory usage.
- Parameters
- dataDataFrame
DataFrame to print information about.
- verbosebool, optional
Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
- bufwritable buffer, defaults to sys.stdout
Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to further process the output.
- max_colsint, optional
When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
- memory_usagebool, str, optional
Specifies whether total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting.
True always shows memory usage. False never shows memory usage. A value of ‘deep’ is equivalent to “True with deep introspection”. Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, a memory estimation is made based on column dtype and number of rows, assuming values consume the same memory amount for corresponding dtypes. With deep memory introspection, a real memory usage calculation is performed at the cost of computational resources.
- show_countsbool, optional
Whether to show the non-null counts. By default, this is shown only if the DataFrame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows the counts.
- null_countsbool, optional
Deprecated since version 1.2.0: Use show_counts instead.
- Returns
- None
This method prints a summary of a DataFrame and returns None.
See also
DataFrame.describe
Generate descriptive statistics of DataFrame columns.
DataFrame.memory_usage
Memory usage of DataFrame columns.
Examples
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values,
...                    "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00
Prints information of all columns:
>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
Prints a summary of the column count and dtypes, but no per-column information:
>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
Pipe the output of DataFrame.info to a buffer instead of sys.stdout, get the buffer content and write it to a text file:
>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> s = buffer.getvalue()
>>> with open("df_info.txt", "w",
...           encoding="utf-8") as f:
...     f.write(s)
260
The memory_usage parameter allows deep introspection mode, especially useful for big DataFrames and fine-tuning memory optimization:
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = pd.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 22.9+ MB
>>> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 165.9 MB
- insert(loc, column, value, allow_duplicates=False)[source]
Insert column into DataFrame at specified location.
Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.
- Parameters
- locint
Insertion index. Must verify 0 <= loc <= len(columns).
- columnstr, number, or hashable object
Label of the inserted column.
- valueint, Series, or array-like
- allow_duplicatesbool, optional
See also
Index.insert
Insert new item by index.
Examples
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert(1, "newcol", [99, 99])
>>> df
   col1  newcol  col2
0     1      99     3
1     2      99     4
>>> df.insert(0, "col1", [100, 100], allow_duplicates=True)
>>> df
   col1  col1  newcol  col2
0   100     1      99     3
1   100     2      99     4
Notice that pandas uses index alignment in case of value from type Series:
>>> df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
>>> df
   col0  col1  col1  newcol  col2
0   NaN   100     1      99     3
1   5.0   100     2      99     4
- interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=None, **kwargs)[source]
Fill NaN values using an interpolation method.
Please note that only method='linear' is supported for DataFrame/Series with a MultiIndex.
- Parameters
- methodstr, default ‘linear’
Interpolation technique to use. One of:
‘linear’: Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes.
‘time’: Works on daily and higher resolution data to interpolate given length of interval.
‘index’, ‘values’: use the actual numerical values of the index.
‘pad’: Fill in NaNs using existing values.
‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. These methods use the numerical values of the index. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. df.interpolate(method='polynomial', order=5).
‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes.
‘from_derivatives’: Refers to scipy.interpolate.BPoly.from_derivatives, which replaces the ‘piecewise_polynomial’ interpolation method in scipy 0.18.
- axis{0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to interpolate along.
- limitint, optional
Maximum number of consecutive NaNs to fill. Must be greater than 0.
- inplacebool, default False
Update the data in place if possible.
- limit_direction{‘forward’, ‘backward’, ‘both’}, optional
Consecutive NaNs will be filled in this direction.
- If limit is specified:
If ‘method’ is ‘pad’ or ‘ffill’, ‘limit_direction’ must be ‘forward’.
If ‘method’ is ‘backfill’ or ‘bfill’, ‘limit_direction’ must be ‘backward’.
- If ‘limit’ is not specified:
If ‘method’ is ‘backfill’ or ‘bfill’, the default is ‘backward’;
else the default is ‘forward’.
Changed in version 1.1.0: raises ValueError if limit_direction is ‘forward’ or ‘both’ and method is ‘backfill’ or ‘bfill’; raises ValueError if limit_direction is ‘backward’ or ‘both’ and method is ‘pad’ or ‘ffill’.
- limit_area{None, ‘inside’, ‘outside’}, default None
If limit is specified, consecutive NaNs will be filled with this restriction.
None: No fill restriction.
‘inside’: Only fill NaNs surrounded by valid values (interpolate).
‘outside’: Only fill NaNs outside valid values (extrapolate).
- downcastoptional, ‘infer’ or None, defaults to None
Downcast dtypes if possible.
- ``**kwargs``optional
Keyword arguments to pass on to the interpolating function.
- Returns
- Series or DataFrame or None
Returns the same object type as the caller, interpolated at some or all NaN values, or None if inplace=True.
See also
fillna
Fill missing values using different methods.
scipy.interpolate.Akima1DInterpolator
Piecewise cubic polynomials (Akima interpolator).
scipy.interpolate.BPoly.from_derivatives
Piecewise polynomial in the Bernstein basis.
scipy.interpolate.interp1d
Interpolate a 1-D function.
scipy.interpolate.KroghInterpolator
Interpolate polynomial (Krogh interpolator).
scipy.interpolate.PchipInterpolator
PCHIP 1-d monotonic cubic interpolation.
scipy.interpolate.CubicSpline
Cubic spline data interpolator.
Notes
The ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’ and ‘akima’ methods are wrappers around the respective SciPy implementations of similar names. These use the actual numerical values of the index. For more information on their behavior, see the SciPy documentation and SciPy tutorial.
Examples
Filling in NaN in a Series via linear interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
Filling in NaN in a Series by padding, but filling at most two consecutive NaN at a time.
>>> s = pd.Series([np.nan, "single_one", np.nan,
...                "fill_two_more", np.nan, np.nan, np.nan,
...                4.71, np.nan])
>>> s
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object
Filling in NaN in a Series via polynomial interpolation or splines: Both ‘polynomial’ and ‘spline’ methods require that you also specify an order (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64
Fill the DataFrame forward (that is, going down) along each column using linear interpolation.
Note how the last entry in column ‘a’ is interpolated differently, because there is no entry after it to use for interpolation. Note how the first entry in column ‘b’ remains NaN, because there is no entry before it to use for interpolation.
>>> df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
...                    (np.nan, 2.0, np.nan, np.nan),
...                    (2.0, 3.0, np.nan, 9.0),
...                    (np.nan, 4.0, -4.0, 16.0)],
...                   columns=list('abcd'))
>>> df
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
Using polynomial interpolation.
>>> df['d'].interpolate(method='polynomial', order=2)
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64
- isin(values)[source]
Whether each element in the DataFrame is contained in values.
- Parameters
- valuesiterable, Series, DataFrame or dict
The result will only be true at a location if all the labels match. If values is a Series, that is the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.
- Returns
- DataFrame
DataFrame of booleans showing whether each element in the DataFrame is contained in values.
See also
DataFrame.eq
Equality test for DataFrame.
Series.isin
Equivalent method on Series.
Series.str.contains
Test if pattern or regex is contained within a string of a Series or Index.
Examples
>>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
...                   index=['falcon', 'dog'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings).
>>> df.isin([0, 2])
        num_legs  num_wings
falcon      True       True
dog        False       True
When values is a dict, we can pass values to check for each column separately:
>>> df.isin({'num_wings': [0, 3]})
        num_legs  num_wings
falcon     False      False
dog        False       True
When values is a Series or DataFrame, the index and column must match. Note that ‘dog’ matches nothing here, because there is no ‘dog’ row in other.
>>> other = pd.DataFrame({'num_legs': [8, 2], 'num_wings': [0, 2]},
...                      index=['spider', 'falcon'])
>>> df.isin(other)
        num_legs  num_wings
falcon      True       True
dog        False      False
- isna()[source]
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
Examples
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
- isnull()[source]
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, get mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
- Returns
- DataFrame
Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
See also
DataFrame.isnull
Alias of isna.
DataFrame.notna
Boolean inverse of isna.
DataFrame.dropna
Omit axes labels with missing values.
isna
Top-level isna.
Examples
Show which entries in a DataFrame are NA.
>>> df = pd.DataFrame(dict(age=[5, 6, np.NaN],
...                        born=[pd.NaT, pd.Timestamp('1939-05-27'),
...                              pd.Timestamp('1940-04-25')],
...                        name=['Alfred', 'Batman', ''],
...                        toy=[None, 'Batmobile', 'Joker']))
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
Show which entries in a Series are NA.
>>> ser = pd.Series([5, 6, np.NaN])
>>> ser
0    5.0
1    6.0
2    NaN
dtype: float64
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
- items()[source]
Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.
- Yields
- labelobject
The column names for the DataFrame being iterated over.
- contentSeries
The column entries belonging to each label, as a Series.
See also
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples
Iterate over DataFrame rows as namedtuples of the values.
Examples
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                    'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
- iteritems()[source]
Iterate over (column name, Series) pairs.
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.
- Yields
- labelobject
The column names for the DataFrame being iterated over.
- contentSeries
The column entries belonging to each label, as a Series.
See also
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.itertuples
Iterate over DataFrame rows as namedtuples of the values.
Examples
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
...                    'population': [1864, 22000, 80000]},
...                   index=['panda', 'polar', 'koala'])
>>> df
         species  population
panda       bear        1864
polar       bear       22000
koala  marsupial       80000
>>> for label, content in df.items():
...     print(f'label: {label}')
...     print(f'content: {content}', sep='\n')
...
label: species
content: panda         bear
polar         bear
koala    marsupial
Name: species, dtype: object
label: population
content: panda     1864
polar    22000
koala    80000
Name: population, dtype: int64
- iterrows()[source]
Iterate over DataFrame rows as (index, Series) pairs.
- Yields
- indexlabel or tuple of label
The index of the row. A tuple for a MultiIndex.
- dataSeries
The data of the row as a Series.
See also
DataFrame.itertuples
Iterate over DataFrame rows as namedtuples of the values.
DataFrame.items
Iterate over (column name, Series) pairs.
Notes
Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). For example,
>>> df = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
>>> row = next(df.iterrows())[1]
>>> row
int      1.0
float    1.5
Name: 0, dtype: float64
>>> print(row['int'].dtype)
float64
>>> print(df['int'].dtype)
int64
To preserve dtypes while iterating over the rows, it is better to use itertuples(), which returns namedtuples of the values and is generally faster than iterrows.
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
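As an added illustrative sketch of this caveat (not part of the original docstring): with mixed dtypes, the yielded row is backed by a copy, so writes to it never reach the DataFrame.

```python
import pandas as pd

# Mixed dtypes (int and object) force iterrows() to hand back copies.
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
for _, row in df.iterrows():
    row['a'] = 0          # writes into a copy of the row, not into df
print(df['a'].tolist())   # the original values survive: [1, 2, 3]
```

The reverse can also bite: for some homogeneous dtypes the row may be a view, so relying on either behavior is fragile; mutate the DataFrame directly (e.g. via .loc) instead.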
- itertuples(index=True, name='Pandas')[source]
Iterate over DataFrame rows as namedtuples.
- Parameters
- indexbool, default True
If True, return the index as the first element of the tuple.
- namestr or None, default “Pandas”
The name of the returned namedtuples or None to return regular tuples.
- Returns
- iterator
An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values.
See also
DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.items
Iterate over (column name, Series) pairs.
Notes
The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. On python versions < 3.7 regular tuples are returned for DataFrames with a large number of columns (>254).
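The renaming behavior described above can be sketched as follows (an added example, not from the original docstring): a column name containing a space is not a valid Python identifier, so it is replaced by a positional name.

```python
import pandas as pd

# 'num legs' is not a valid identifier, so itertuples() renames that
# field positionally (to _1, since the index occupies position 0).
df = pd.DataFrame({'num legs': [4, 2], 'num_wings': [0, 2]},
                  index=['dog', 'hawk'])
row = next(df.itertuples())
print(row._fields)   # ('Index', '_1', 'num_wings')
```

The valid identifier 'num_wings' is kept as-is; only the offending name is replaced.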
Examples
>>> df = pd.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2
>>> for row in df.itertuples():
...     print(row)
...
Pandas(Index='dog', num_legs=4, num_wings=0)
Pandas(Index='hawk', num_legs=2, num_wings=2)
By setting the index parameter to False, we can remove the index as the first element of the tuple:
>>> for row in df.itertuples(index=False):
...     print(row)
...
Pandas(num_legs=4, num_wings=0)
Pandas(num_legs=2, num_wings=2)
With the name parameter set, we set a custom name for the yielded namedtuples:
>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)
- join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)[source]
Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.
- Parameters
- otherDataFrame, Series, or list of DataFrame
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
- onstr, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
- how{‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘left’
How to handle the operation of the two objects.
left: use calling frame’s index (or column if on is specified)
right: use other’s index.
outer: form union of calling frame’s index (or column if on is specified) with other’s index, and sort it lexicographically.
inner: form intersection of calling frame’s index (or column if on is specified) with other’s index, preserving the order of the calling’s one.
- lsuffixstr, default ‘’
Suffix to use from left frame’s overlapping columns.
- rsuffixstr, default ‘’
Suffix to use from right frame’s overlapping columns.
- sortbool, default False
Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
- Returns
- DataFrame
A DataFrame containing columns from both the caller and other.
See also
DataFrame.merge
For column(s)-on-column(s) operations.
Notes
Parameters on, lsuffix, and rsuffix are not supported when passing a list of DataFrame objects.
Support for specifying index levels as the on parameter was added in version 0.23.0.
Examples
>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
...                    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
>>> df
  key   A
0  K0  A0
1  K1  A1
2  K2  A2
3  K3  A3
4  K4  A4
5  K5  A5
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
...                       'B': ['B0', 'B1', 'B2']})
>>> other
  key   B
0  K0  B0
1  K1  B1
2  K2  B2
Join DataFrames using their indexes.
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
  key_caller   A key_other    B
0         K0  A0        K0   B0
1         K1  A1        K1   B1
2         K2  A2        K2   B2
3         K3  A3       NaN  NaN
4         K4  A4       NaN  NaN
5         K5  A5       NaN  NaN
If we want to join using the key columns, we need to set key to be the index in both df and other. The joined DataFrame will have key as its index.
>>> df.set_index('key').join(other.set_index('key'))
       A    B
key
K0    A0   B0
K1    A1   B1
K2    A2   B2
K3    A3  NaN
K4    A4  NaN
K5    A5  NaN
Another option to join using the key columns is to use the on parameter. DataFrame.join always uses other’s index, but we can use any column in df. This method preserves the original DataFrame’s index in the result.
>>> df.join(other.set_index('key'), on='key')
  key   A    B
0  K0  A0   B0
1  K1  A1   B1
2  K2  A2   B2
3  K3  A3  NaN
4  K4  A4  NaN
5  K5  A5  NaN
- keys()[source]
Get the ‘info axis’ (see Indexing for more).
This is index for Series, columns for DataFrame.
- Returns
- Index
Info axis.
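A short illustrative sketch (an added example, not part of the original docstring): for a DataFrame, keys() returns the columns; for a Series, the index.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(list(df.keys()))   # columns of the DataFrame: ['a', 'b']

s = pd.Series([10, 20], index=['x', 'y'])
print(list(s.keys()))    # index of the Series: ['x', 'y']
```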
- kurt(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
- kurtosis(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)[source]
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
- Parameters
- axis{index (0), columns (1)}
Axis for the function to be applied on.
- skipnabool, default True
Exclude NA/null values when computing the result.
- levelint or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series.
- numeric_onlybool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
- **kwargs
Additional keyword arguments to be passed to the function.
- Returns
- Series or DataFrame (if level specified)
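A small illustrative sketch (an added example, not from the original docstring): five equally spaced values are flatter-tailed than a normal distribution (negative excess kurtosis), while a column dominated by one outlier is heavy-tailed (positive).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [1, 1, 1, 1, 10]})
# Column-wise unbiased (Fisher) excess kurtosis; normal data scores 0.0.
print(df.kurtosis())
# a   -1.2   (flatter than normal)
# b    5.0   (heavy-tailed, driven by the single outlier)
```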
- last(offset)[source]
Select final periods of time series data based on a date offset.
For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.
- Parameters
- offsetstr, DateOffset, dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.
- Returns
- Series or DataFrame
A subset of the caller.
- Raises
- TypeError
If the index is not a DatetimeIndex.
See also
first
Select initial periods of time series based on a date offset.
at_time
Select values at a particular time of the day.
between_time
Select values between particular times of the day.
Examples
>>> i = pd.date_range('2018-04-09', periods=4, freq='2D')
>>> ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts
            A
2018-04-09  1
2018-04-11  2
2018-04-13  3
2018-04-15  4
Get the rows for the last 3 days:
>>> ts.last('3D')
            A
2018-04-13  3
2018-04-15  4
Notice that the data for the last 3 calendar days was returned, not the last 3 observed days in the dataset, and therefore data for 2018-04-11 was not returned.
- last_valid_index()[source]
Return index for last non-NA value, or None if no non-NA value is found.
- Returns
- scalartype of index
Notes
If all elements are NA/null, returns None. Also returns None for an empty Series/DataFrame.
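A short illustrative sketch (an added example, not from the original docstring):

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, 2.0, np.nan, np.nan])
print(s.last_valid_index())                       # 1 (last label holding a non-NA value)
print(pd.Series([np.nan]).last_valid_index())     # None (no non-NA values at all)
print(pd.Series(dtype=float).last_valid_index())  # None (empty Series)
```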
- le(other, axis='columns', level=None)[source]
Get Less than or equal to of dataframe and other, element-wise (binary operator le).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.
Equivalent to ==, !=, <=, <, >=, >, with support to choose axis (rows or columns) and level for comparison.
- Parameters
- otherscalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.
- axis{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).
- levelint or label
Broadcast across a level, matching Index values on the passed MultiIndex level.
- Returns
- DataFrame of bool
Result of the comparison.
See alsoDataFrame.eq
DataFrame.ne
DataFrame.le
DataFrame.lt
DataFrame.ge
DataFrame.gt
Compare DataFrames for equality elementwise.
Compare DataFrames for inequality elementwise.
Compare DataFrames for less than inequality or equality elementwise.
Compare DataFrames for strictly less than inequality elementwise.
Compare DataFrames for greater than inequality or equality elementwise.
Compare DataFrames for strictly greater than inequality elementwise.
Notes
Mismatched indices will be unioned together.
NaN
values are considered different (i.e.NaN
!=NaN
).Examples
>>> df = pd.DataFrame({'cost': [250, 150, 100], ... 'revenue': [100, 250, 300]}, ... index=['A', 'B', 'C']) >>> df cost revenue A 250 100 B 150 250 C 100 300
Comparison with a scalar, using either the operator or method:
>>> df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
>>> df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When `other` is a `Series`, the columns of a DataFrame are aligned with the index of `other` and broadcast:
>>> df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number of elements in `other`:
>>> df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:
>>> df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.
>>> other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
...                      index=['A', 'B', 'C', 'D'])
>>> other
   revenue
A      300
B      250
C      100
D      150
>>> df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.
>>> df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
...                              'revenue': [100, 250, 300, 200, 175, 225]},
...                             index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
...                                    ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
>>> df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
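The Notes above (mismatched indices are unioned, and `NaN` never compares equal to anything, including another `NaN`) can be sketched with two small, hypothetical frames:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': [1.0, np.nan]}, index=['a', 'b'])
df2 = pd.DataFrame({'x': [1.0, np.nan]}, index=['b', 'c'])

# Alignment unions the indexes to ['a', 'b', 'c']; labels present on only
# one side are filled with NaN, and any comparison involving NaN is False.
result = df1.le(df2)
print(result)
# Every cell is False here: each aligned pair contains at least one NaN.
```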
- property loc: pandas.core.indexing._LocIndexer
Access a group of rows and columns by label(s) or a boolean array.
`.loc[]` is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
- A single label, e.g. `5` or `'a'` (note that `5` is interpreted as a label of the index, and never as an integer position along the index).
- A list or array of labels, e.g. `['a', 'b', 'c']`.
- A slice object with labels, e.g. `'a':'f'`. Warning: contrary to usual Python slices, both the start and the stop are included.
- A boolean array of the same length as the axis being sliced, e.g. `[True, False, True]`.
- An alignable boolean Series. The index of the key will be aligned before masking.
- An alignable Index. The Index of the returned selection will be the input.
- A `callable` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).
See more at Selection by Label.
- Raises
- KeyError
If any items are not found.
- IndexingError
If an indexed key is passed and its index is unalignable to the frame index.
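A minimal sketch of the first failure mode listed above, using a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'max_speed': [1, 4, 7]},
                  index=['cobra', 'viper', 'sidewinder'])

# Requesting a label that does not exist raises KeyError rather than
# returning an empty selection.
try:
    df.loc['python']
except KeyError:
    print('label not found')
```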
See also
DataFrame.at : Access a single value for a row/column label pair.
DataFrame.iloc : Access group of rows and columns by integer position(s).
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
Series.loc : Access group of values using labels.
Examples
Getting values
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...                   index=['cobra', 'viper', 'sidewinder'],
...                   columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8
Single label. Note this returns the row as a Series.
>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64
List of labels. Note using `[[]]` returns a DataFrame.
>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8
Single label for row and column
>>> df.loc['cobra', 'shield']
2
Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.
>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64
Boolean list with the same length as the row axis
>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8
Alignable boolean Series:
>>> df.loc[pd.Series([False, True, False],
...                  index=['viper', 'sidewinder', 'cobra'])]
            max_speed  shield
sidewinder          7       8
Index (same behavior as `df.reindex`):
>>> df.loc[pd.Index(["cobra", "viper"], name="foo")]
       max_speed  shield
foo
cobra          1       2
viper          4       5
Conditional that returns a boolean Series
>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8
Conditional that returns a boolean Series with column labels specified
>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7
Callable that returns a boolean Series
>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8
Setting values
Set value for all items matching the list of labels
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50
Set value for an entire row
>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50
Set value for an entire column
>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50
Set value for rows matching callable condition
>>> df.loc[df['s