Pandas

Because MACPie relies heavily on the pandas library, it provides a rich set of functions that operate on pandas.DataFrame and pandas.Series objects through this API.

I/O functions

`read_csv`(filepath_or_buffer[, engine, ...])	Parse a CSV file into a `pandas.DataFrame`.
`read_excel`(filepath_or_buffer[, sheet_name, ...])	Parse an Excel file into a `pandas.DataFrame`.
`read_file`(filepath_or_buffer[, format_options])	Parse a file into a `pandas.DataFrame`.
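All three readers return a `pandas.DataFrame`. As a rough, pandas-only illustration of the dispatch a generic reader like `read_file` performs, the sketch below chooses a reader from the file suffix. The helper `read_by_extension` and its suffix table are hypothetical; this is not MACPie's implementation, and it ignores `format_options`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_by_extension(path, **kwargs):
    """Hypothetical helper: dispatch to a pandas reader based on the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, **kwargs)
    if suffix in (".xls", ".xlsx"):
        # pandas needs an Excel engine (e.g. openpyxl) installed for this branch.
        return pd.read_excel(path, **kwargs)
    raise ValueError(f"unsupported file type: {suffix!r}")


# Usage: write a small CSV to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "visits.csv"
    pd.DataFrame({"pidn": [1, 2], "score": [10, 20]}).to_csv(csv_path, index=False)
    df = read_by_extension(csv_path)

print(df.shape)  # (2, 2)
```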

Selecting data

`filter_by_id`(df, id_col_name, ids)	Filters a `pandas.DataFrame` object to only include a specified list of numerical IDs in a specified numerical ID column.
`filter_labels`(*dfs[, axis, filter_level, ...])	Filter DataFrame row or column labels.
`filter_labels_pair`(left, right[, axis])	Filter row or column labels on a pair of DataFrames.
`get_col_name`(df, col_name)	Get the properly-cased column name from `df`, ignoring case.
`get_col_names`(df, col_names[, strict])	Get the properly-cased column names from `df`, ignoring case.
`get_cols_by_prefixes`(df, prefixes[, ...])	Get columns that start with the prefixes.
`remove_trailers`(ser[, predicates, ...])	Removes trailing elements in a `pandas.Series` object for which each predicate in `predicates` returns True.
`rtrim`(ser[, trim_empty_string])	Trim trailing missing values from a series.
`rtrim_longest`(*sers[, trim_empty_string, ...])	Trim trailing missing values from each series.
`subset`(*dfs, **kwargs)	Subset rows or columns of one or more DataFrames according to filtered labels.
`subset_pair`(left, right[, axis])	Subset rows or columns of a pair of DataFrames according to filtered labels.
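Two of the selection helpers above are easy to approximate in plain pandas. The sketch below shows a case-insensitive column lookup in the spirit of `get_col_name` and a trailing-missing-value trim in the spirit of `rtrim`. Both functions (`get_col_name_ci`, `rtrim_na`) are hypothetical stand-ins, not MACPie's code; treating empty strings as missing when `trim_empty_string` is true is an assumption based on the parameter name.

```python
import numpy as np
import pandas as pd


def get_col_name_ci(df, col_name):
    """Return the column label in df that equals col_name, ignoring case."""
    for col in df.columns:
        if isinstance(col, str) and col.lower() == col_name.lower():
            return col
    raise KeyError(col_name)


def rtrim_na(ser, trim_empty_string=True):
    """Drop trailing missing values (and optionally trailing empty strings)."""
    missing = ser.isna()
    if trim_empty_string:
        missing = missing | (ser == "")
    # Positions of the elements that are NOT treated as missing.
    present = np.flatnonzero(~missing.to_numpy())
    if present.size == 0:
        return ser.iloc[:0]
    return ser.iloc[: present[-1] + 1]


df = pd.DataFrame({"PIDN": [1, 2], "DCDate": ["2020-01-01", "2020-02-01"]})
print(get_col_name_ci(df, "pidn"))  # PIDN
print(rtrim_na(pd.Series(["a", "", None, ""])).tolist())  # ['a']
```

Interior gaps are left alone: only the run of missing values at the end of the series is removed.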

Indexing

`drop_suffix`(df, suffix)	Removes the `suffix` in any column name containing the `suffix`.
`flatten_multiindex`(df[, axis, delimiter])	Flatten (i.e., collapse) the MultiIndex on `axis` into a single level, joining the labels with `delimiter`.
`insert`(df, col_name, col_value, **kwargs)	Adds a column to the end of the DataFrame.
`prepend_multi_index_level`(df, level_name[, axis])	Prepend a MultiIndex level.
`replace_suffix`(df, old_suffix, new_suffix)	For any column names containing `old_suffix`, replace the `old_suffix` with `new_suffix`.
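The sketch below approximates two of the indexing helpers with plain pandas: flattening MultiIndex column labels with a delimiter, as `flatten_multiindex` does for an axis, and swapping a column-name suffix, as `replace_suffix` does. The function names are hypothetical, and matching `old_suffix` only at the end of a name (rather than anywhere in it) is an assumption.

```python
import pandas as pd


def flatten_columns(df, delimiter="_"):
    """Collapse MultiIndex column labels into single strings joined by delimiter."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        out.columns = [delimiter.join(map(str, labels)) for labels in out.columns]
    return out


def replace_col_suffix(df, old_suffix, new_suffix):
    """Rename columns ending in old_suffix so that they end in new_suffix."""
    if not old_suffix:
        raise ValueError("old_suffix must be a non-empty string")
    renames = {
        col: col[: -len(old_suffix)] + new_suffix
        for col in df.columns
        if isinstance(col, str) and col.endswith(old_suffix)
    }
    return df.rename(columns=renames)


cols = pd.MultiIndex.from_tuples([("visit1", "score"), ("visit1", "date")])
wide = pd.DataFrame([[7, "2020-01-01"]], columns=cols)
print(list(flatten_columns(wide).columns))  # ['visit1_score', 'visit1_date']
```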

Describing data

`add_diff_days`(df, col_start, col_end[, ...])	Adds a column whose values are the number of days between `col_start` and `col_end`.
`any_duplicates`(df, col[, ignore_nan])	Return `True` if there are any duplicates in `col`.
`count_trailers`(ser[, predicates, count_na, ...])	Counts trailing elements in a `pandas.Series` object for which each predicate in `predicates` returns True.
`is_date_col`(df, arr_or_dtype)	Check whether the provided array or dtype is of the datetime64 dtype.
`mark_duplicates_by_cols`(df, cols)	Create a column in `df` called `get_option("column.system.duplicates")` which is a boolean Series denoting duplicate rows as identified by `cols`.
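The quantities these describing functions report map onto standard pandas operations. The sketch below computes a day difference between two date columns, checks a column for duplicates while ignoring NaN, and adds a boolean duplicate flag. The column name `_duplicates` is a hypothetical placeholder for whatever `get_option("column.system.duplicates")` returns, and flagging every member of a duplicate group (`keep=False`) is an assumption about MACPie's behavior.

```python
import pandas as pd

df = pd.DataFrame({
    "pidn": [1, 1, 2],
    "start": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-01-15"]),
    "end": pd.to_datetime(["2020-01-31", "2020-03-01", "2020-02-14"]),
})

# Whole days between two date columns (the quantity add_diff_days describes).
df["diff_days"] = (df["end"] - df["start"]).dt.days

# Are there duplicates in a column? Dropping NaN first ignores missing values.
has_dups = bool(df["pidn"].dropna().duplicated().any())

# Boolean column flagging rows duplicated on the given columns; keep=False marks
# every member of a duplicate group, not just the second and later occurrences.
df["_duplicates"] = df.duplicated(subset=["pidn"], keep=False)

print(df[["pidn", "diff_days", "_duplicates"]])
```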

Combining data

`date_proximity`(left, right[, id_on, ...])	Links data across two `pandas.DataFrame` objects by date proximity.
`merge`(left, right[, on, left_on, right_on, ...])	Merge `pandas.DataFrame` objects with a database-style join, similar to `pandas.DataFrame.merge()`, but with additional options.
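Date-proximity linking can be approximated in plain pandas with `pandas.merge_asof`, which matches each left row to the nearest right row on a sorted key. The sketch below swaps in that pandas function; it is not how MACPie's `date_proximity` is implemented, and the column names and the 30-day tolerance are made up for illustration. Note that `merge_asof` links each left row to at most one right row.

```python
import pandas as pd

visits = pd.DataFrame({
    "pidn": [1, 1, 2],
    "visit_date": pd.to_datetime(["2020-01-10", "2020-06-01", "2020-03-05"]),
})
labs = pd.DataFrame({
    "pidn": [1, 1, 2],
    "lab_date": pd.to_datetime(["2020-01-03", "2020-05-20", "2020-08-01"]),
    "result": [4.2, 5.1, 3.9],
})

# merge_asof requires both frames to be sorted on their date keys.
linked = pd.merge_asof(
    visits.sort_values("visit_date"),
    labs.sort_values("lab_date"),
    left_on="visit_date",
    right_on="lab_date",
    by="pidn",                        # only match rows with the same ID
    direction="nearest",              # look both before and after the visit
    tolerance=pd.Timedelta(days=30),  # leave unmatched if no lab within 30 days
)
print(linked)
```

In this example the visit for `pidn` 2 stays unmatched because its only lab is more than 30 days away.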

Comparing data

`compare`(left, right[, subset_pair_kwargs])	Compare two DataFrames and show the differences.
`diff_cols`(left, right[, ...])	Find the column differences between two DataFrames.
`diff_rows`(left, right[, subset_pair_kwargs])	Find the row differences between two DataFrames (that share the same columns).
`equals`(left, right[, subset_pair_kwargs])	For testing equality of `pandas.DataFrame` objects.
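The comparisons above can be approximated with stock pandas operations: `Index.difference` for column differences, an outer `merge` with `indicator=True` for row differences, and `DataFrame.compare` for cell-level differences. This is a sketch, not MACPie's implementation. Because `DataFrame.compare` only accepts identically labeled frames, the sketch restricts both frames to their shared columns first.

```python
import pandas as pd

left = pd.DataFrame({"pidn": [1, 2, 3], "score": [10, 20, 30]})
right = pd.DataFrame({"pidn": [1, 2, 4], "score": [10, 25, 40], "site": ["a", "b", "c"]})

# Column differences: labels present in only one frame.
left_only_cols = left.columns.difference(right.columns)
right_only_cols = right.columns.difference(left.columns)

# Row differences on the shared columns, via an outer merge with an indicator column.
shared = list(left.columns.intersection(right.columns))
rows = left[shared].merge(right[shared], on=shared, how="outer", indicator=True)
row_diffs = rows[rows["_merge"] != "both"]

# Cell-by-cell differences; DataFrame.compare needs identically labeled frames.
cell_diffs = left[shared].compare(right[shared])
print(cell_diffs)
```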

Converting data

`conform`(left, right[, subset_pair_kwargs, ...])	Conform one DataFrame to another.
`mimic_dtypes`(left, right[, categorical])	Cast column data types in `right` to be the same as those in `left` where the column name is the same.
`mimic_index_order`(left, right[, axis])	Order the `right` labels as close as possible to the order of the `left` labels.
`to_datetime`(df, date_col_name, **kwargs)	Convert `date_col_name` column in `df` to datetime.
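A plain-pandas sketch of the conversion steps: parse a text date column with `pandas.to_datetime`, cast the columns that `right` shares with `left` to `left`'s dtypes, and reorder `right`'s columns to follow `left`. It approximates the ideas behind `to_datetime`, `mimic_dtypes`, and `mimic_index_order` but is not MACPie's code; appending `right`-only columns at the end is an assumption.

```python
import pandas as pd

left = pd.DataFrame({
    "pidn": [1, 2],
    "dcdate": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})
right = pd.DataFrame({
    "extra": [0, 1],
    "dcdate": ["2020-03-01", "2020-04-01"],
    "pidn": ["3", "4"],
})

# Parse the text dates first (the step to_datetime describes).
right["dcdate"] = pd.to_datetime(right["dcdate"])

# Cast right's shared columns to left's dtypes (the idea behind mimic_dtypes).
shared = [col for col in left.columns if col in right.columns]
right = right.astype({col: left[col].dtype for col in shared})

# Put shared columns in left's order, then append right-only columns
# (roughly what mimic_index_order does for column labels).
right = right[shared + [col for col in right.columns if col not in shared]]
print(right.dtypes)
```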

Group by

`group_by_keep_one`(df, group_by_col, ...[, ...])	Given a `pandas.DataFrame` object, group on the `group_by_col` column and keep only the earliest or latest row in each group, as determined by the date in the `date_col_name` column.
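One way to express "keep the earliest or latest row per group" in plain pandas is to sort by date and then drop all but the first row in each group. The helper `keep_one_per_group` and its `keep` argument are hypothetical; the real `group_by_keep_one` signature is only partly shown above, and this is not MACPie's implementation.

```python
import pandas as pd


def keep_one_per_group(df, group_by_col, date_col_name, keep="earliest"):
    """Keep the earliest- or latest-dated row in each group (hypothetical helper)."""
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    return (
        df.sort_values(date_col_name, ascending=(keep == "earliest"), kind="stable")
        .drop_duplicates(subset=group_by_col, keep="first")
        .sort_index()  # restore the original row order
    )


df = pd.DataFrame({
    "pidn": [1, 1, 2, 2],
    "dcdate": pd.to_datetime(["2020-05-01", "2020-01-01", "2021-03-01", "2021-07-01"]),
    "score": [12, 10, 20, 22],
})
print(keep_one_per_group(df, "pidn", "dcdate", keep="latest"))
```

Sorting with `kind="stable"` keeps the original row order among tied dates, and because missing dates sort last, a row with `NaT` is kept only when its group has no dated rows.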

Sorting data

`sort_values_pair`(left, right[, right_only, axis])	Sort the pair of DataFrames using their common labels.
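Assuming "common labels" means the column labels both frames share, sorting the pair reduces to sorting each frame by those shared columns. The sketch below shows that interpretation in plain pandas; it does not model the `right_only` or `axis` parameters and is not MACPie's implementation.

```python
import pandas as pd

left = pd.DataFrame({"pidn": [2, 1, 1], "visit": [1, 2, 1], "score": [5, 6, 7]})
right = pd.DataFrame({"visit": [1, 1, 2], "pidn": [1, 2, 1], "site": ["a", "b", "c"]})

# Sort both frames by the column labels they have in common, in left's order.
common = [col for col in left.columns if col in right.columns]
left_sorted = left.sort_values(common, ignore_index=True)
right_sorted = right.sort_values(common, ignore_index=True)

print(left_sorted)
print(right_sorted)
```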

Accessors

Pandas lets you extend pandas objects with additional “namespaces” (also called accessors). MACPie adds the mac namespace to pandas.DataFrame and pandas.Series objects, providing access to many of these functions as methods, along with additional methods.

See the corresponding accessor classes for the methods available through the mac namespace.
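The namespace mechanism itself is part of pandas' public extension API. The sketch below registers a toy accessor under the hypothetical name `demo` with `pandas.api.extensions.register_dataframe_accessor`; it only illustrates how a namespace such as mac becomes available and is not MACPie's accessor code. `pandas.api.extensions.register_series_accessor` works the same way for `pandas.Series`.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    """Toy accessor; after registration it is reachable as df.demo."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def get_col_name(self, col_name):
        """Case-insensitive column lookup, called as df.demo.get_col_name(...)."""
        for col in self._obj.columns:
            if isinstance(col, str) and col.lower() == col_name.lower():
                return col
        raise KeyError(col_name)


df = pd.DataFrame({"PIDN": [1, 2], "Score": [10, 20]})
print(df.demo.get_col_name("score"))  # Score
```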

DataFrame Accessor

Methods on this accessor class are available on pandas.DataFrame objects via the mac namespace.

`MacDataFrameAccessor`(df)	Custom DataFrame accessor to extend the `pandas.DataFrame` object.

Series Accessor

Methods on this accessor class are available on pandas.Series objects via the mac namespace.

`MacSeriesAccessor`(ser)	Custom Series accessor to extend the `pandas.Series` object.