macpie.group_by_keep_one

macpie.group_by_keep_one(dset: Dataset, keep: str = 'all', drop_duplicates: bool = False) -> None

Given a Dataset object, group on the Dataset.id2_col_name column and keep only the earliest or latest row in each group as determined by the date in the Dataset.date_col_name column.

This is the Dataset analog of macpie.pandas.group_by_keep_one().

Parameters:
dset: Dataset
keep: {‘all’, ‘earliest’, ‘latest’}, default ‘all’

Specify which row of each group to keep.

  • all: keep all rows

  • earliest: in each group, keep only the earliest (i.e. oldest) row

  • latest: in each group, keep only the latest (i.e. most recent) row

drop_duplicates: bool, default False

If True and more than one row in a group qualifies as ‘earliest’ or ‘latest’, drop all duplicates except the first occurrence. If dset has an id_col_name, that column is also used to identify duplicates.
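
The semantics described above can be illustrated with plain pandas. The following is a minimal sketch, not macpie's implementation: the helper `keep_one_per_group`, its parameters, and the column names `pidn`, `dcdate`, and `instr_id` are hypothetical stand-ins for `Dataset.id2_col_name`, `Dataset.date_col_name`, and `Dataset.id_col_name`. It also assumes that duplicates are identified by the group, date, and (if given) id columns together, which the text above does not spell out. Unlike the documented function, which returns None, the sketch returns a new DataFrame.

```python
import pandas as pd


def keep_one_per_group(df, group_col, date_col, keep="all",
                       drop_duplicates=False, id_col=None):
    """Plain-pandas sketch of the documented behaviour (hypothetical helper)."""
    if keep == "all":
        return df
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'all', 'earliest', or 'latest'")

    # For every row, compute the earliest (or latest) date within its group.
    target = df.groupby(group_col)[date_col].transform(
        "min" if keep == "earliest" else "max"
    )
    # Keep the rows whose date matches that target; ties are all retained.
    result = df[df[date_col] == target]

    if drop_duplicates:
        # Assumption: duplicates are judged on group, date, and id columns.
        subset = [group_col, date_col]
        if id_col is not None:
            subset.append(id_col)
        result = result.drop_duplicates(subset=subset, keep="first")
    return result


# Hypothetical data: rows 0 and 1 tie for the earliest date in group 1.
df = pd.DataFrame({
    "pidn": [1, 1, 1, 2, 2],
    "dcdate": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2021-06-01", "2019-03-01", "2022-03-01"]
    ),
    "instr_id": [10, 10, 12, 13, 14],
})

earliest = keep_one_per_group(df, "pidn", "dcdate", keep="earliest")
deduped = keep_one_per_group(df, "pidn", "dcdate", keep="earliest",
                             drop_duplicates=True, id_col="instr_id")
latest = keep_one_per_group(df, "pidn", "dcdate", keep="latest")
```

In this sketch, `earliest` retains both tied rows for group 1, while `deduped` keeps only the first of them because the two rows also share the same `instr_id`.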