macpie.pandas.group_by_keep_one

macpie.pandas.group_by_keep_one(df: DataFrame, group_by_col: str, date_col_name: str, keep: str = 'all', id_col_name: str | None = None, drop_duplicates: bool = False) → DataFrame

Group a pandas.DataFrame on the group_by_col column and, depending on keep, retain only the earliest or latest row in each group, as determined by the dates in the date_col_name column.

Parameters:

df : DataFrame

group_by_col : str

The DataFrame column to group on

date_col_name : str

The date column to determine which row is earliest or latest

keep : {‘all’, ‘earliest’, ‘latest’}, default ‘all’

Specify which row of each group to keep.

all: keep all rows
earliest: in each group, keep only the earliest (i.e. oldest) row
latest: in each group, keep only the latest (i.e. most recent) row

id_col_name : str, optional

Used to sort results if there are duplicates. If drop_duplicates=True, the column specified here is also used to identify duplicates.

drop_duplicates : bool, default False

If True and more than one row in a group qualifies as ‘earliest’ or ‘latest’, drop all duplicates except the first occurrence. If id_col_name is specified, that column is also used to identify duplicates.
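
The tie handling described for id_col_name and drop_duplicates can be approximated in plain pandas. The sketch below is not macpie's implementation. It assumes that "duplicates" means rows sharing the same group and date values (plus the id column when one is given), and the column names pidn, dcdate, and instr_id are hypothetical.

```python
import pandas as pd

# Hypothetical data: three rows in one group, all tied for the earliest date.
tied = pd.DataFrame({
    "pidn": [1, 1, 1],
    "dcdate": pd.to_datetime(["2020-01-05", "2020-01-05", "2020-01-05"]),
    "instr_id": [7, 7, 8],
})

# Every row whose date equals its group's minimum is a candidate for 'earliest'.
group_min = tied.groupby("pidn")["dcdate"].transform("min")
candidates = tied[tied["dcdate"] == group_min]

# Assumption: the id column serves as the final sort key among tied rows.
candidates = candidates.sort_values(["pidn", "dcdate", "instr_id"])

# Without an id column, ties collapse to the first occurrence per group and date.
without_id = candidates.drop_duplicates(subset=["pidn", "dcdate"], keep="first")

# With an id column, rows count as duplicates only if the id matches as well.
with_id = candidates.drop_duplicates(
    subset=["pidn", "dcdate", "instr_id"], keep="first"
)

print(without_id)
print(with_id)
```

Under these assumptions, including the id column in the comparison keeps one row per distinct id among the tied rows instead of a single row per group.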

Returns:

DataFrame: A DataFrame of the result.
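
To illustrate what the three keep options select, here is a minimal plain-pandas sketch of the behavior described above. It is an approximation under stated assumptions, not macpie's actual code: keep_one_sketch is a hypothetical helper, the column names pidn, dcdate, and score are made up, and rows tied on the date are all retained.

```python
import pandas as pd


def keep_one_sketch(df, group_by_col, date_col_name, keep="all"):
    """Hypothetical approximation of the documented keep semantics."""
    if keep == "all":
        return df.copy()
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'all', 'earliest', or 'latest'")
    # Broadcast each group's min (or max) date back onto its rows.
    agg = "min" if keep == "earliest" else "max"
    target = df.groupby(group_by_col)[date_col_name].transform(agg)
    # Keep the rows whose date matches the group's target date.
    return df[df[date_col_name] == target]


visits = pd.DataFrame({
    "pidn": [1, 1, 1, 2, 2],
    "dcdate": pd.to_datetime(
        ["2020-01-05", "2020-03-10", "2021-07-01", "2019-11-20", "2020-02-14"]
    ),
    "score": [10, 12, 15, 7, 9],
})

earliest = keep_one_sketch(visits, "pidn", "dcdate", keep="earliest")
latest = keep_one_sketch(visits, "pidn", "dcdate", keep="latest")

print(earliest["score"].tolist())  # [10, 7]
print(latest["score"].tolist())    # [15, 9]
```

With the default keep='all', the sketch returns every row unchanged, mirroring the "keep all rows" option in the parameter description.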