macpie.date_proximity

macpie.date_proximity(left: Dataset, right: Dataset, get: str = 'all', when: str = 'earlier_or_later', days: int = 90, dropna: bool = False, drop_duplicates: bool = False, duplicates_indicator: bool = False, merge_suffixes=('_x', '_y'), prepend_level_name: bool = True) → None

Links data across two Dataset objects by date proximity, first joining them on their Dataset.id2_col_name.

Specifically, a left Dataset contains a timepoint anchor, and a right Dataset is linked to the left by retrieving all rows that match on Dataset.id2_col_name, and whose Dataset.date_col_name fields are within a certain time range of each other.
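
The linking rule described above can be illustrated with a minimal pure-Python sketch. It is not macpie's implementation: the rows are plain dicts, the column names pid and visit_date are hypothetical stand-ins for Dataset.id2_col_name and Dataset.date_col_name, and the inclusive comparison against days is an assumption.

```python
from datetime import date

# Hypothetical data: "pid" plays the role of Dataset.id2_col_name,
# "visit_date" the role of Dataset.date_col_name.
left = [
    {"pid": 1, "visit_date": date(2020, 1, 1)},
    {"pid": 2, "visit_date": date(2020, 6, 1)},
]
right = [
    {"pid": 1, "visit_date": date(2019, 12, 1), "score": 10},  # 31 days earlier
    {"pid": 1, "visit_date": date(2020, 5, 1), "score": 12},   # 121 days later
    {"pid": 2, "visit_date": date(2020, 6, 15), "score": 7},   # 14 days later
]


def link_by_date_proximity(left, right, days=90):
    """Pair each left row with right rows that share its id and lie within `days` days."""
    linked = []
    for l_row in left:
        for r_row in right:
            if r_row["pid"] != l_row["pid"]:
                continue  # rows must first match on the id column
            # Signed offset: negative means the right row is earlier than the anchor.
            diff = (r_row["visit_date"] - l_row["visit_date"]).days
            if abs(diff) <= days:  # inclusive bound is an assumption
                linked.append((l_row, r_row, diff))
    return linked


pairs = link_by_date_proximity(left, right)
summary = [(l_row["pid"], r_row["score"], diff) for l_row, r_row, diff in pairs]
```

Here the second right row for pid 1 is left out because it falls 121 days after the anchor, outside the default 90-day range.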

This is the Dataset analog of macpie.pandas.date_proximity().

Parameters:

left : Dataset

Contains the timepoint anchor (i.e. the date column named by Dataset.date_col_name).

right : Dataset

The Dataset to link.

get : {'all', 'closest'}, default 'all'

Indicates which rows of the right Dataset to link in reference to the timepoint anchor:

all: keep all rows
closest: get only the single closest row that falls within the number of days given by days of the timepoint anchor
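
As a rough illustration of the two get options, the sketch below works on signed day offsets of candidate right rows relative to one timepoint anchor. The helper name select is hypothetical, and the sketch assumes that 'all' still restricts rows to the days range, following the function description above.

```python
def select(offsets, get="all", days=90):
    """Choose candidate offsets (in days, relative to the anchor) per the `get` option."""
    # Assumption: both options only consider rows within `days` days of the anchor.
    in_range = [d for d in offsets if abs(d) <= days]
    if get == "all":
        return in_range
    if get == "closest":
        # Keep the one offset with the smallest absolute distance, if any remain.
        return [min(in_range, key=abs)] if in_range else []
    raise ValueError(f"invalid get option: {get!r}")


# Hypothetical offsets: 40 days earlier, 5 days later, 60 days later, 200 days later.
offsets = [-40, 5, 60, 200]
all_rows = select(offsets, get="all")
closest_row = select(offsets, get="closest")
```

With these offsets, 'all' keeps the three rows inside the 90-day range, while 'closest' keeps only the row 5 days after the anchor.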

when : {'earlier', 'later', 'earlier_or_later'}, default 'earlier_or_later'

Indicates which rows of the right Dataset to link in temporal relation to the timepoint anchor:

earlier: get only rows that are earlier than the timepoint anchor
later: get only rows that are later (more recent) than the timepoint anchor
earlier_or_later: get rows that are earlier or later than the timepoint anchor
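
The when options can be pictured as a filter on the sign of each row's day offset from the timepoint anchor. This is a sketch under stated assumptions, not macpie's code: the helper filter_when is hypothetical, and because the text above does not say how a same-day row (offset 0) is classified, the strict comparisons used here are an assumption.

```python
def filter_when(offsets, when="earlier_or_later"):
    """Filter signed day offsets (right date minus anchor date) by temporal direction."""
    if when == "earlier":
        return [d for d in offsets if d < 0]  # right row precedes the anchor
    if when == "later":
        return [d for d in offsets if d > 0]  # right row follows the anchor
    if when == "earlier_or_later":
        return list(offsets)                  # no directional restriction
    raise ValueError(f"invalid when option: {when!r}")


# Hypothetical offsets in days relative to one anchor.
offsets = [-30, -5, 10, 45]
earlier = filter_when(offsets, "earlier")
later = filter_when(offsets, "later")
either = filter_when(offsets, "earlier_or_later")
```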

days : int, default 90

The size of the time range, measured in days from the timepoint anchor.

dropna : bool, default False

Whether to exclude rows that did not find any match

drop_duplicates : bool, default False

If True and more than one matching row is found in the right DataFrame, all of them are dropped except the last one.

duplicates_indicator : bool or str, default False

If True, adds a column to the output DataFrame called “_mp_duplicates” denoting which rows are duplicates. The column can be given a different name by providing a string argument.
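
To make the two duplicate-handling options concrete, here is a small stdlib-only sketch over linked rows represented as dicts. The helpers mark_duplicates and keep_last are hypothetical, as is the pid key; flagging every member of a duplicate group (rather than only the extra rows) is an assumption about what the indicator column contains.

```python
from collections import Counter

# Hypothetical linked rows: pid 1 matched two right rows, pid 2 matched one.
matches = [
    {"pid": 1, "score": 10},
    {"pid": 1, "score": 11},
    {"pid": 2, "score": 7},
]


def mark_duplicates(rows, key="pid", indicator="_mp_duplicates"):
    """Add a boolean column flagging rows whose key occurs more than once."""
    counts = Counter(row[key] for row in rows)
    return [{**row, indicator: counts[row[key]] > 1} for row in rows]


def keep_last(rows, key="pid"):
    """Keep only the last row seen for each key."""
    last = {}
    for row in rows:
        last[row[key]] = row  # later rows overwrite earlier ones
    return list(last.values())


flagged = mark_duplicates(matches)
renamed = mark_duplicates(matches, indicator="is_dup")  # custom column name
deduped = keep_last(matches)
```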

merge_suffixes : list-like, default ('_x', '_y')

A length-2 sequence where the first element is the suffix to add to the left DataFrame columns, and the second element is the suffix to add to the right DataFrame columns.
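
The effect of the suffixes can be sketched with plain dicts. The helper merge_row and the column names are hypothetical, and the sketch assumes the pandas.merge convention that suffixes are applied only to column names present on both sides (other than the join key); whether macpie applies them in exactly this way is not stated here.

```python
def merge_row(left_row, right_row, on="pid", suffixes=("_x", "_y")):
    """Combine one left row and one right row, suffixing overlapping column names."""
    # Columns present on both sides, other than the join key, need disambiguation.
    overlap = (left_row.keys() & right_row.keys()) - {on}
    merged = {on: left_row[on]}
    for name, value in left_row.items():
        if name != on:
            merged[name + suffixes[0] if name in overlap else name] = value
    for name, value in right_row.items():
        if name != on:
            merged[name + suffixes[1] if name in overlap else name] = value
    return merged


left_row = {"pid": 1, "visit_date": "2020-01-01", "age": 70}
right_row = {"pid": 1, "visit_date": "2019-12-01", "score": 10}
merged = merge_row(left_row, right_row)
custom = merge_row(left_row, right_row, suffixes=("_left", "_right"))
```

In this example only visit_date appears on both sides, so it becomes visit_date_x and visit_date_y, while age and score keep their names.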

prepend_level_name : bool, default True

Whether to add a top-level index using the Dataset.name attribute to column indexes in left and right respectively (thus creating a pandas.MultiIndex if needed).