macpie.pandas.group_by_keep_one

macpie.pandas.group_by_keep_one(df: DataFrame, group_by_col: str, date_col_name: str, keep: str = 'all', id_col_name: str | None = None, drop_duplicates: bool = False) -> DataFrame

Group the rows of a pandas.DataFrame on the group_by_col column and, depending on keep, retain only the earliest or latest row in each group, as determined by the date in the date_col_name column.

Parameters:
df: DataFrame

The DataFrame to operate on.

group_by_col: str

The DataFrame column to group on.

date_col_name: str

The date column used to determine which row is earliest or latest.

keep: {‘all’, ‘earliest’, ‘latest’}, default ‘all’

Specify which row of each group to keep.

  • all: keep all rows

  • earliest: in each group, keep only the earliest (i.e. oldest) row

  • latest: in each group, keep only the latest (i.e. most recent) row
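The three keep options can be illustrated with plain pandas. The following is a minimal sketch of the semantics described above, not macpie's actual implementation; the helper name keep_one_sketch and the sample columns pid, visit_date, and score are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: two groups (pid 1 and pid 2) with several dated rows.
df = pd.DataFrame(
    {
        "pid": [1, 1, 2, 2, 2],
        "visit_date": pd.to_datetime(
            ["2020-01-05", "2021-03-10", "2019-07-01", "2019-02-15", "2022-11-30"]
        ),
        "score": [10, 12, 7, 9, 8],
    }
)


def keep_one_sketch(df, group_by_col, date_col_name, keep="all"):
    """Pandas-only approximation of the keep options; not macpie's code."""
    if keep == "all":
        return df.copy()
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'all', 'earliest', or 'latest'")
    agg = "min" if keep == "earliest" else "max"
    # Broadcast each group's earliest/latest date back onto every row of that group.
    target = df.groupby(group_by_col)[date_col_name].transform(agg)
    # Rows that tie on the target date are all retained in this sketch.
    return df[df[date_col_name] == target]


earliest = keep_one_sketch(df, "pid", "visit_date", keep="earliest")
latest = keep_one_sketch(df, "pid", "visit_date", keep="latest")
```

In this sketch, rows sharing the same earliest or latest date within a group are all kept; the id_col_name and drop_duplicates parameters described below concern how such ties are handled.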

id_col_name: str, optional

Used to sort the results if there are duplicates. If drop_duplicates=True, the column specified here is also used to identify duplicates.

drop_duplicates: bool, default False

If True and more than one row in a group qualifies as ‘earliest’ or ‘latest’, drop all duplicates except the first occurrence. If id_col_name is specified, that column is also used to identify duplicates.
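Tie handling may be easier to see with a small example. The sketch below uses plain pandas and rests on an assumption: that duplicates are identified by the group and date columns, plus the id column when one is given. This is one reading of the description above, not confirmed macpie behavior, and the names visits, pid, visit_date, and record_id are hypothetical.

```python
import pandas as pd

# Hypothetical data: pid 1 has two rows tied on the earliest date.
visits = pd.DataFrame(
    {
        "pid": [1, 1, 1, 2],
        "visit_date": pd.to_datetime(
            ["2020-01-05", "2020-01-05", "2021-03-10", "2019-07-01"]
        ),
        "record_id": [102, 101, 103, 104],
    }
)

# Keep every row whose date equals its group's earliest date (ties included).
earliest_date = visits.groupby("pid")["visit_date"].transform("min")
tied = visits[visits["visit_date"] == earliest_date]

# Sorting on the id column makes the order among tied rows deterministic.
tied_sorted = tied.sort_values(["pid", "visit_date", "record_id"])

# Assumed behavior without an id column: rows sharing group and date are
# duplicates, so only the first occurrence survives.
without_id = tied_sorted.drop_duplicates(subset=["pid", "visit_date"], keep="first")

# Assumed behavior with an id column: rows are duplicates only if the id
# matches as well, so tied rows with distinct ids are both kept.
with_id = tied_sorted.drop_duplicates(
    subset=["pid", "visit_date", "record_id"], keep="first"
)
```

Under these assumptions, supplying an id column narrows what counts as a duplicate, so fewer rows are dropped.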

Returns:
DataFrame

A DataFrame containing the rows kept from each group.