macpie.pandas.group_by_keep_one
- macpie.pandas.group_by_keep_one(df: DataFrame, group_by_col: str, date_col_name: str, keep: str = 'all', id_col_name: str | None = None, drop_duplicates: bool = False) → DataFrame
Given a pandas.DataFrame object, group on the group_by_col column and, depending on keep, keep only the earliest or latest row(s) in each group, as determined by the date in the date_col_name column.
- Parameters:
- df : DataFrame
- group_by_col : str
The DataFrame column to group on.
- date_col_name : str
The date column used to determine which row is earliest or latest.
- keep : {‘all’, ‘earliest’, ‘latest’}, default ‘all’
Specify which row of each group to keep.
all: keep all rows
earliest: in each group, keep only the earliest (i.e. oldest) row
latest: in each group, keep only the latest (i.e. most recent) row
- id_col_name : str, optional
Column used to sort the results if there are duplicates. If drop_duplicates=True, this column is also used to identify duplicates.
- drop_duplicates : bool, default False
If True and more than one row in a group is determined to be the earliest or latest, drop all duplicates except the first occurrence. If id_col_name is specified, that column is also used to identify duplicates.
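The tie handling described for keep and drop_duplicates can be illustrated with plain pandas. This is a minimal sketch under stated assumptions, not macpie's implementation: keep_extreme_rows is a hypothetical helper, and the columns subject_id, visit_date and visit_id are made-up example data. It keeps every row whose date matches its group's minimum or maximum, so tied rows survive until a drop_duplicates step keeps only the first.

```python
import pandas as pd


def keep_extreme_rows(
    df: pd.DataFrame, group_by_col: str, date_col_name: str, keep: str
) -> pd.DataFrame:
    """Hypothetical helper: keep rows matching the group's earliest/latest date."""
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    agg = "min" if keep == "earliest" else "max"
    # Broadcast each group's min/max date back onto every row of that group.
    target = df.groupby(group_by_col)[date_col_name].transform(agg)
    # Every row tied on the extreme date is retained.
    return df[df[date_col_name] == target]


# Hypothetical data: subject 1 has two visits tied on its earliest date.
df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2],
        "visit_date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2021-06-01", "2019-03-15", "2022-08-30"]
        ),
        "visit_id": [10, 11, 12, 20, 21],
    }
)

earliest = keep_extreme_rows(df, "subject_id", "visit_date", "earliest")
latest = keep_extreme_rows(df, "subject_id", "visit_date", "latest")

# Mimic drop_duplicates=True: keep the first row per (group, date) pair.
earliest_deduped = earliest.drop_duplicates(
    subset=["subject_id", "visit_date"], keep="first"
)
```

Here earliest contains both tied rows for subject 1, while earliest_deduped keeps only the first of them.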
- Returns:
- DataFrame
A DataFrame of the result.
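Putting the parameters together, a plain-pandas approximation of the documented behavior might look like the sketch below. group_by_keep_one_sketch is hypothetical. Two choices in it are assumptions about macpie rather than details confirmed by this page: the result is always sorted by the group, date and (if given) id columns, and drop_duplicates is applied only when keep is 'earliest' or 'latest'.

```python
from typing import Optional

import pandas as pd


def group_by_keep_one_sketch(
    df: pd.DataFrame,
    group_by_col: str,
    date_col_name: str,
    keep: str = "all",
    id_col_name: Optional[str] = None,
    drop_duplicates: bool = False,
) -> pd.DataFrame:
    """Hypothetical approximation of macpie.pandas.group_by_keep_one."""
    if keep not in ("all", "earliest", "latest"):
        raise ValueError("keep must be 'all', 'earliest' or 'latest'")

    if keep == "all":
        result = df
    else:
        agg = "min" if keep == "earliest" else "max"
        # Each row gets its group's earliest/latest date; keep rows that match it.
        target = df.groupby(group_by_col)[date_col_name].transform(agg)
        result = df[df[date_col_name] == target]

    # Assumption: sort by group, date and (optionally) id so ties come out in id order.
    key_cols = [group_by_col, date_col_name]
    if id_col_name is not None:
        key_cols.append(id_col_name)
    result = result.sort_values(key_cols)

    if drop_duplicates and keep != "all":
        # Keep only the first row per (group, date[, id]) key.
        result = result.drop_duplicates(subset=key_cols, keep="first")
    return result


# Hypothetical example data.
df = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2],
        "visit_date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2021-06-01", "2019-03-15", "2022-08-30"]
        ),
        "visit_id": [10, 11, 12, 20, 21],
    }
)

latest = group_by_keep_one_sketch(df, "subject_id", "visit_date", keep="latest")
earliest_one = group_by_keep_one_sketch(
    df, "subject_id", "visit_date", keep="earliest", drop_duplicates=True
)
earliest_by_id = group_by_keep_one_sketch(
    df, "subject_id", "visit_date", keep="earliest",
    id_col_name="visit_id", drop_duplicates=True,
)
```

Because id_col_name is part of the duplicate key in this sketch, tied rows with different ids are both retained even when drop_duplicates=True. That is one literal reading of the parameter description, not verified against macpie's source.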