macpie.date_proximity
- macpie.date_proximity(left: Dataset, right: Dataset, get: str = 'all', when: str = 'earlier_or_later', days: int = 90, dropna: bool = False, drop_duplicates: bool = False, duplicates_indicator: bool = False, merge_suffixes=('_x', '_y'), prepend_level_name: bool = True) -> None
Links data across two Dataset objects by date proximity, first joining them on their Dataset.id2_col_name.

Specifically, a left Dataset contains a timepoint anchor, and a right Dataset is linked to the left by retrieving all rows that match on Dataset.id2_col_name and whose Dataset.date_col_name fields are within a certain time range of each other.

This is the Dataset analog of macpie.pandas.date_proximity().

- Parameters:
- left : Dataset
Contains the timepoint anchor (i.e. date_col).
- right : Dataset
The Dataset to link.
- get : {'all', 'closest'}, default 'all'
Indicates which rows of the right Dataset to link in reference to the timepoint anchor:
all: keep all rows
closest: get only the closest row that is within `days` days of the timepoint anchor
- when : {'earlier', 'later', 'earlier_or_later'}, default 'earlier_or_later'
Indicates which rows of the right Dataset to link in temporal relation to the timepoint anchor:
earlier: get only rows that are earlier than the timepoint anchor
later: get only rows that are later (more recent) than the timepoint anchor
earlier_or_later: get rows that are earlier or later than the timepoint anchor
- days : int, default 90
The time range, measured in days.
- dropna : bool, default False
Whether to exclude rows that did not find any match.
- drop_duplicates : bool, default False
If True and more than one matching row is found in the right DataFrame, all but the last one are dropped.
- duplicates_indicator : bool or str, default False
If True, adds a column called "_mp_duplicates" to the output DataFrame, denoting which rows are duplicates. The column can be given a different name by providing a string argument.
- merge_suffixes : list-like, default ("_x", "_y")
A length-2 sequence where the first element is the suffix to add to the left DataFrame columns and the second element is the suffix to add to the right DataFrame columns.
- prepend_level_name : bool, default True
Whether to add a top-level index, using the Dataset.name attribute, to the column indexes in left and right respectively (thus creating a pandas.MultiIndex if needed).
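
Examples

The matching rules described by get, when, days and dropna can be pictured with a small pure-Python sketch. This is not macpie's implementation and does not use the Dataset class. The helper link_by_date_proximity, the dictionary keys "id2", "date" and "label", and the sample rows are all hypothetical, chosen only for illustration. The sketch also assumes that the days boundary is inclusive and that same-day rows satisfy every when value, neither of which is stated above.

```python
from datetime import date


def link_by_date_proximity(left_rows, right_rows, get="all",
                           when="earlier_or_later", days=90, dropna=False):
    """Hypothetical stand-in that mimics the documented matching rules."""
    linked = []
    for anchor in left_rows:
        matches = []
        for row in right_rows:
            # Join step: only rows sharing the same id2 value are candidates.
            if row["id2"] != anchor["id2"]:
                continue
            # Negative delta means the right row is earlier than the anchor.
            delta = (row["date"] - anchor["date"]).days
            if abs(delta) > days:  # assumption: the boundary is inclusive
                continue
            if when == "earlier" and delta > 0:
                continue
            if when == "later" and delta < 0:
                continue
            matches.append(row)
        if get == "closest" and matches:
            # Keep the single row nearest in time (ties: first one found).
            matches = [min(matches,
                           key=lambda r: abs((r["date"] - anchor["date"]).days))]
        if dropna and not matches:
            continue  # drop anchors that found no match
        linked.append((anchor, matches))
    return linked


def labels(result, i=0):
    """Labels of the right rows linked to the i-th anchor."""
    return [r["label"] for r in result[i][1]]


left = [
    {"id2": 1, "date": date(2020, 6, 1)},
    {"id2": 3, "date": date(2020, 6, 1)},  # no right rows share this id2
]
right = [
    {"id2": 1, "date": date(2020, 4, 15), "label": "r1"},  # 47 days earlier
    {"id2": 1, "date": date(2020, 6, 20), "label": "r2"},  # 19 days later
    {"id2": 1, "date": date(2021, 1, 1), "label": "r3"},   # outside 90 days
    {"id2": 2, "date": date(2020, 6, 2), "label": "r4"},   # different id2
]

print(labels(link_by_date_proximity(left, right)))                  # ['r1', 'r2']
print(labels(link_by_date_proximity(left, right, get="closest")))   # ['r2']
print(labels(link_by_date_proximity(left, right, when="earlier")))  # ['r1']
print(len(link_by_date_proximity(left, right, dropna=True)))        # 1
```

With the defaults, both rows within 90 days of the anchor are linked. get="closest" narrows that to the nearest one, when="earlier" keeps only the row preceding the anchor, and dropna=True removes the anchor that matched nothing.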