Pandas

Because MACPie relies heavily on the pandas library, this API provides a rich set of functions that work with pandas.DataFrame and pandas.Series objects.

I/O Functions

read_csv(filepath_or_buffer[, engine, ...])

Parse a csv file into a pandas.DataFrame.

read_excel(filepath_or_buffer[, sheet_name, ...])

Parse an Excel file into a pandas.DataFrame.

read_file(filepath_or_buffer[, format_options])

Parse a file into a pandas.DataFrame.

Selecting data

filter_by_id(df, id_col_name, ids)

Filters a pandas.DataFrame object to only include a specified list of numerical IDs in a specified numerical ID column.
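The core of this kind of ID filtering can be expressed in plain pandas with Series.isin. The sketch below is not MACPie's implementation; filter_ids_sketch and the pidn column are hypothetical names chosen for illustration.

```python
import pandas as pd


def filter_ids_sketch(df: pd.DataFrame, id_col_name: str, ids) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose ID is in ids."""
    # isin() builds a boolean mask; boolean indexing keeps the matching rows
    return df[df[id_col_name].isin(ids)]


df = pd.DataFrame({"pidn": [1, 2, 3, 4], "score": [10, 20, 30, 40]})
subset = filter_ids_sketch(df, "pidn", [2, 4])
print(subset["pidn"].tolist())  # [2, 4]
```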

filter_labels(*dfs[, axis, filter_level, ...])

Filter dataframe row or column labels.

filter_labels_pair(left, right[, axis])

Filter row or column labels on a pair of dataframes.

get_col_name(df, col_name)

Get the properly-cased column name from df, ignoring case.

get_col_names(df, col_names[, strict])

Get the properly-cased column names from df, ignoring case.
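A case-insensitive column lookup of this kind can be sketched in plain pandas as follows. The helper names are hypothetical, the strict parameter is not modeled, and raising KeyError on a miss is an assumption rather than documented MACPie behavior.

```python
import pandas as pd


def get_col_name_sketch(df: pd.DataFrame, col_name: str):
    """Hypothetical helper: return the column matching col_name, ignoring case."""
    target = col_name.lower()
    for col in df.columns:
        if str(col).lower() == target:
            return col
    # Assumption: a missing column raises KeyError; MACPie may behave differently
    raise KeyError(col_name)


def get_col_names_sketch(df: pd.DataFrame, col_names):
    """Hypothetical helper: look up several columns case-insensitively."""
    return [get_col_name_sketch(df, name) for name in col_names]


df = pd.DataFrame(columns=["PIDN", "DCDate", "score"])
print(get_col_name_sketch(df, "dcdate"))            # DCDate
print(get_col_names_sketch(df, ["pidn", "SCORE"]))  # ['PIDN', 'score']
```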

get_cols_by_prefixes(df, prefixes[, ...])

Get columns that start with the prefixes.
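Prefix matching on column names can be sketched with str.startswith, which accepts a tuple and matches if any element matches. This hypothetical helper returns a list of names and is case-sensitive; whether MACPie returns names or a sub-frame, and how it treats case, is not stated here.

```python
import pandas as pd


def cols_by_prefixes_sketch(df: pd.DataFrame, prefixes):
    """Hypothetical helper: names of columns starting with any prefix, in frame order."""
    # startswith(tuple) is True if the name starts with any of the prefixes
    return [col for col in df.columns if str(col).startswith(tuple(prefixes))]


df = pd.DataFrame(columns=["pidn", "cdr_total", "cdr_box", "mmse_total"])
print(cols_by_prefixes_sketch(df, ["cdr_"]))  # ['cdr_total', 'cdr_box']
```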

remove_trailers(ser[, predicates, ...])

Removes trailing elements in a pandas.Series object for which each predicate in predicates returns True.
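The trailing-element logic shared by remove_trailers, rtrim, and count_trailers can be sketched as a backwards scan that stops at the first element failing any predicate. The helpers below are hypothetical, and treating empty strings as trailers only mirrors the idea of the trim_empty_string option, not its exact semantics.

```python
import pandas as pd


def remove_trailers_sketch(ser: pd.Series, predicates) -> pd.Series:
    """Hypothetical helper: drop trailing elements for which every predicate is True."""
    end = len(ser)
    # Walk backwards while the current last element satisfies all predicates
    while end > 0 and all(pred(ser.iloc[end - 1]) for pred in predicates):
        end -= 1
    return ser.iloc[:end]


def is_missing_or_empty(value) -> bool:
    # Assumption: NA-like values and empty strings both count as trailers
    return pd.isna(value) or value == ""


ser = pd.Series([1, None, 3, None, ""])
trimmed = remove_trailers_sketch(ser, [is_missing_or_empty])
n_trailers = len(ser) - len(trimmed)  # what a count_trailers-style helper would report
print(len(trimmed), n_trailers)  # 3 2
```

Note that the interior None is kept: only the run at the end of the series is removed.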

rtrim(ser[, trim_empty_string])

Trim trailing missing values from series.

rtrim_longest(*sers[, trim_empty_string, ...])

Trim trailing missing values from each series.

subset(*dfs, **kwargs)

Subset rows or columns of one or more DataFrames according to filtered labels.

subset_pair(left, right[, axis])

Subset rows or columns of a pair of dataframes according to filtered labels.

Indexing

drop_suffix(df, suffix)

Removes the suffix from any column name that contains it.

flatten_multiindex(df[, axis, delimiter])

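Flatten (i.e. collapse) the MultiIndex on the given axis into a single level, joining the level values with delimiter.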
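Flattening MultiIndex columns usually amounts to joining each tuple of level values with a delimiter. This plain-pandas sketch assumes "_" as the delimiter and string-joins every level; MACPie's default delimiter and its handling of non-string levels may differ.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("visit1", "score"), ("visit1", "date"), ("visit2", "score")]
)
df = pd.DataFrame([[1, "2020-01-01", 2]], columns=columns)

# Iterating a MultiIndex yields tuples; join each one into a single-level label
delimiter = "_"
df.columns = [delimiter.join(map(str, col)) for col in df.columns]
print(df.columns.tolist())  # ['visit1_score', 'visit1_date', 'visit2_score']
```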

insert(df, col_name, col_value, **kwargs)

Adds a column to the end of the DataFrame.

prepend_multi_index_level(df, level_name[, axis])

Prepend a MultiIndex level.
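In plain pandas, one way to prepend a level is to concatenate a one-entry dict of frames: the dict key becomes the new outermost level. This is a sketch of the general idea; "visit1" is a hypothetical level name and MACPie may build the level differently.

```python
import pandas as pd

df = pd.DataFrame({"score": [1, 2], "date": ["2020-01-01", "2020-02-01"]})

# concat() with a dict uses its keys as a new outermost column level (axis=1)
prepended = pd.concat({"visit1": df}, axis=1)
print(prepended.columns.tolist())  # [('visit1', 'score'), ('visit1', 'date')]
```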

replace_suffix(df, old_suffix, new_suffix)

For any column names containing old_suffix, replace the old_suffix with new_suffix.
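Suffix replacement on column names can be sketched with DataFrame.rename, and passing an empty new_suffix gives drop_suffix-like behavior. The helper is hypothetical and only renames names that end with old_suffix; the description above says "containing", so MACPie's matching rule may be broader.

```python
import pandas as pd


def replace_suffix_sketch(df: pd.DataFrame, old_suffix: str, new_suffix: str) -> pd.DataFrame:
    """Hypothetical helper: swap old_suffix for new_suffix at the end of column names."""
    mapping = {
        col: col[: -len(old_suffix)] + new_suffix
        for col in df.columns
        # Guard against an empty suffix, which would slice the whole name away
        if old_suffix and isinstance(col, str) and col.endswith(old_suffix)
    }
    # rename() leaves columns that are not in the mapping untouched
    return df.rename(columns=mapping)


df = pd.DataFrame(columns=["pidn_x", "score_x", "site"])
print(replace_suffix_sketch(df, "_x", "_left").columns.tolist())  # ['pidn_left', 'score_left', 'site']
print(replace_suffix_sketch(df, "_x", "").columns.tolist())       # ['pidn', 'score', 'site']
```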

Describing data

add_diff_days(df, col_start, col_end[, ...])

Adds a column whose values are the number of days between col_start and col_end.
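The day difference itself can be computed in plain pandas by subtracting datetime columns and reading .dt.days. The column names start, end, and diff_days are hypothetical, and computing end minus start (so the result can be negative) is an assumption about sign convention.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2021-01-01", "2021-03-01"]),
    "end": pd.to_datetime(["2021-01-31", "2021-02-27"]),
})

# Subtracting datetime columns yields Timedeltas; .dt.days extracts whole days
df["diff_days"] = (df["end"] - df["start"]).dt.days
print(df["diff_days"].tolist())  # [30, -2]
```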

any_duplicates(df, col[, ignore_nan])

Return True if there are any duplicates in col.
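A duplicate check like this can be sketched with Series.duplicated, which treats NaN values as equal to each other. The helper is hypothetical, and reading ignore_nan as "drop missing values before checking" is an assumption.

```python
import pandas as pd


def any_duplicates_sketch(df: pd.DataFrame, col: str, ignore_nan: bool = False) -> bool:
    """Hypothetical helper mirroring the documented parameters."""
    values = df[col].dropna() if ignore_nan else df[col]
    # duplicated() flags every repeat after the first occurrence
    return bool(values.duplicated().any())


df = pd.DataFrame({"pidn": [1.0, 2.0, None, None]})
print(any_duplicates_sketch(df, "pidn"))                   # True (the two NaNs match)
print(any_duplicates_sketch(df, "pidn", ignore_nan=True))  # False
```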

count_trailers(ser[, predicates, count_na, ...])

Counts trailing elements in a pandas.Series object for which each predicate in predicates returns True.

is_date_col(df, arr_or_dtype)

Check whether the provided array or dtype is of the datetime64 dtype.

mark_duplicates_by_cols(df, cols)

Create a column in df called get_option("column.system.duplicates") which is a boolean Series denoting duplicate rows as identified by cols.
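The marking step can be approximated with DataFrame.duplicated over the given columns. Here "_duplicates" is a hypothetical stand-in for the name returned by get_option("column.system.duplicates"), and marking every member of a duplicate group (keep=False) rather than only the later repeats is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "pidn": [1, 1, 2],
    "dcdate": ["2020-01-01", "2020-01-01", "2020-01-01"],
})

# keep=False marks all rows in a duplicate group, not just the repeats
df["_duplicates"] = df.duplicated(subset=["pidn", "dcdate"], keep=False)
print(df["_duplicates"].tolist())  # [True, True, False]
```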

Combining data

date_proximity(left, right[, id_on, ...])

Links data across two pandas.DataFrame objects by date proximity.
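The matching rules of date_proximity (time window, direction, tie handling) are not described here, so the sketch below swaps in pandas.merge_asof to illustrate the general idea of linking rows by ID and nearest date. The column names and the 90-day tolerance are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({
    "pidn": [1, 2],
    "dcdate": pd.to_datetime(["2021-01-10", "2021-06-01"]),
})
right = pd.DataFrame({
    "pidn": [1, 1, 2],
    "dcdate": pd.to_datetime(["2021-01-01", "2021-01-12", "2021-05-01"]),
    "score": [10, 11, 20],
})

# merge_asof requires both frames to be sorted by the "on" key
linked = pd.merge_asof(
    left.sort_values("dcdate"),
    right.sort_values("dcdate"),
    on="dcdate",
    by="pidn",                        # only match rows with the same ID
    direction="nearest",              # closest date, earlier or later
    tolerance=pd.Timedelta(days=90),  # no match if more than 90 days apart
)
print(linked["score"].tolist())  # [11, 20]
```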

merge(left, right[, on, left_on, right_on, ...])

Merge pandas.DataFrame objects with a database-style join, similar to pandas.DataFrame.merge(), but with additional options.

Comparing data

compare(left, right[, subset_pair_kwargs])

Compare two DataFrames and show the differences.

diff_cols(left, right[, ...])

Find the column differences between two DataFrames.
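At its simplest, a column difference is a set difference of the two column indexes. This plain-pandas sketch shows only that idea; MACPie's diff_cols may report more (for example, case handling or ordering), which is not stated here.

```python
import pandas as pd

left = pd.DataFrame(columns=["pidn", "score", "date"])
right = pd.DataFrame(columns=["pidn", "score", "site"])

# Index.difference returns the labels present only in the calling index
left_only = left.columns.difference(right.columns)
right_only = right.columns.difference(left.columns)
print(left_only.tolist(), right_only.tolist())  # ['date'] ['site']
```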

diff_rows(left, right[, subset_pair_kwargs])

Find the row differences between two DataFrames (that share the same columns).
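One plain-pandas way to find row differences between frames with the same columns is an outer merge on all shared columns with indicator=True, keeping the rows not found on both sides. This is a swapped-in technique for illustration, not necessarily how diff_rows works.

```python
import pandas as pd

left = pd.DataFrame({"pidn": [1, 2, 3], "score": [10, 20, 30]})
right = pd.DataFrame({"pidn": [1, 2, 4], "score": [10, 25, 40]})

# Merging on all shared columns; "_merge" records where each row came from
merged = left.merge(right, how="outer", indicator=True)
diff = merged[merged["_merge"] != "both"]
print(diff)
```

Rows labeled left_only exist only in left, and right_only rows exist only in right; a changed value (pidn 2 here) shows up once on each side.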

equals(left, right[, subset_pair_kwargs])

Test pandas.DataFrame objects for equality.

Converting data

conform(left, right[, subset_pair_kwargs, ...])

Conform one DataFrame to another.

mimic_dtypes(left, right[, categorical])

Cast column data types in right to be the same as those in left where the column name is the same.
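The basic casting step can be sketched with DataFrame.astype and a dict built from the shared column names. The categorical option is not modeled, and the sample data is hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"pidn": [1, 2], "score": [1.5, 2.5]})
right = pd.DataFrame({"pidn": ["3", "4"], "score": ["7", "8"], "note": ["a", "b"]})

# Cast only the columns both frames share, using left's dtypes
common = left.columns.intersection(right.columns)
right = right.astype({col: left[col].dtype for col in common})
print(right.dtypes)
```

Columns that exist only in right (note here) keep their original dtype.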

mimic_index_order(left, right[, axis])

Order the right labels as close as possible to the order of the left labels.
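A simple approximation of this reordering puts the shared labels in left's order and appends right-only labels at the end. The helper is hypothetical and handles columns only; MACPie's "as close as possible" rule may place right-only labels differently.

```python
import pandas as pd


def mimic_order_sketch(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reorder right's columns to follow left's column order."""
    shared = [col for col in left.columns if col in right.columns]
    # Labels only right has keep their relative order and go at the end
    extra = [col for col in right.columns if col not in left.columns]
    return right[shared + extra]


left = pd.DataFrame(columns=["pidn", "date", "score"])
right = pd.DataFrame(columns=["score", "site", "pidn"])
print(mimic_order_sketch(left, right).columns.tolist())  # ['pidn', 'score', 'site']
```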

to_datetime(df, date_col_name, **kwargs)

Convert date_col_name column in df to datetime.

Group by

group_by_keep_one(df, group_by_col, ...[, ...])

Given a pandas.DataFrame object, group on the group_by_col column and keep only the earliest or latest row in each group as determined by the date in the date_col_name column.
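Keeping the earliest or latest row per group can be sketched in plain pandas by sorting on the date column and taking head(1) or tail(1) per group. Column names are hypothetical, and tie-breaking between rows with the same date is not modeled.

```python
import pandas as pd

df = pd.DataFrame({
    "pidn": [1, 1, 2, 2],
    "dcdate": pd.to_datetime(["2021-03-01", "2021-01-01", "2021-05-01", "2021-07-01"]),
    "score": [13, 11, 25, 27],
})

by_date = df.sort_values("dcdate")
# head(1) keeps the first (earliest) row of each group, tail(1) the last (latest)
earliest = by_date.groupby("pidn").head(1)
latest = by_date.groupby("pidn").tail(1)
print(earliest.sort_values("pidn")["score"].tolist())  # [11, 25]
print(latest.sort_values("pidn")["score"].tolist())    # [13, 27]
```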

Sorting data

sort_values_pair(left, right[, right_only, axis])

Sort the pair of DataFrames using their common labels.

Accessors

Pandas allows adding additional “namespaces” to pandas objects to extend them. MACPie adds the mac namespace to pandas.DataFrame and pandas.Series objects to provide access to many of these methods and more.

Refer to the corresponding accessor classes for the methods available via the mac namespace.
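Pandas exposes this extension mechanism through pandas.api.extensions.register_dataframe_accessor (and register_series_accessor for Series). The sketch registers a hypothetical "demo" namespace with a made-up method purely to show the mechanism; it is not MACPie's mac accessor.

```python
import pandas as pd


# Hypothetical namespace "demo", registered only to illustrate the mechanism
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes the DataFrame the accessor is accessed on
        self._obj = pandas_obj

    def col_names_lower(self):
        """Example method: column names in lower case."""
        return [str(col).lower() for col in self._obj.columns]


df = pd.DataFrame(columns=["PIDN", "DCDate"])
print(df.demo.col_names_lower())  # ['pidn', 'dcdate']
```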

DataFrame Accessor

Methods on this accessor class are available on pandas.DataFrame objects via the mac namespace.

MacDataFrameAccessor(df)

Custom DataFrame accessor to extend the pandas.DataFrame object.

Series Accessor

Methods on this accessor class are available on pandas.Series objects via the mac namespace.

MacSeriesAccessor(ser)

Custom Series accessor to extend the pandas.Series object.