Dataset#

A Dataset is a tabular data structure that contains columns representing common data associated with clinical research:

  • id_col_name to represent record IDs

  • date_col_name to represent record collection dates

  • id2_col_name to represent a secondary ID column, most commonly a patient/subject ID

Constructor#

Dataset([data, id_col_name, ...])

A Dataset object is a pandas.DataFrame that has columns representing common data associated with clinical research assessments.

Column changes#

Methods related to changing the columns in the Dataset.

Dataset.create_id_col([col_name, start_index])

Create the id_col_name column with a sequential numerical index.
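The idea of a sequential ID column can be sketched in plain pandas. This is only an illustration of the concept, not macpie's implementation: the helper name add_sequential_id, the column name "record_id", and the starting value of 1 are assumptions made for the example.

```python
import pandas as pd


def add_sequential_id(df, col_name="record_id", start_index=1):
    """Return a copy of df with a sequential integer ID column placed first.

    Hypothetical helper: the name and defaults are assumptions, not macpie's API.
    """
    out = df.copy()
    ids = list(range(start_index, start_index + len(out)))
    out.insert(0, col_name, ids)  # insert as the leftmost column
    return out


df = pd.DataFrame({"pidn": [10, 10, 20], "score": [27, 25, 30]})
with_ids = add_sequential_id(df)
print(with_ids)
```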

Dataset.rename_col(old_col, new_col[, inplace])

Rename old_col to new_col.

Dataset.prepend_level(level[, inplace])

Create a MultiIndex by adding level as the first level.
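To make the effect of prepending a level concrete, here is a minimal pandas sketch. It assumes the MultiIndex is built on the column labels and that the columns start out flat; the helper name prepend_column_level and the level value "cdr" are hypothetical and are not taken from macpie.

```python
import pandas as pd


def prepend_column_level(df, level):
    """Return a copy of df whose flat columns become a two-level MultiIndex.

    Hypothetical helper for illustration; assumes df.columns is not
    already a MultiIndex.
    """
    out = df.copy()
    out.columns = pd.MultiIndex.from_product([[level], out.columns])
    return out


df = pd.DataFrame({"pidn": [1, 2], "score": [0.5, 1.0]})
result = prepend_column_level(df, "cdr")
print(result.columns.tolist())
```

After this step, each column is addressed by a tuple such as ("cdr", "score"), which keeps columns from different sources distinguishable once they are combined.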

Dataset.drop_sys_cols([inplace])

Drop all sys_cols from the Dataset.

Dataset.keep_cols(cols[, inplace])

Keep specified columns (thus dropping the rest).

Dataset.keep_fields(selected_fields[, inplace])

Keep specified fields (and drop the rest).

Combining, group by#

date_proximity(left, right[, get, when, ...])

Links data across two Dataset objects by date proximity, first joining them on their Dataset.id2_col_name.
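The general technique of linking by date proximity can be sketched with plain pandas: join on the secondary ID, compute the gap between the two dates, and keep the closest match within a window. This is not macpie's implementation and does not use its parameters. The function link_closest, the 90-day symmetric window, the "closest row wins" rule, and all column names are assumptions chosen for the example.

```python
import pandas as pd

left = pd.DataFrame({
    "pidn": [1, 2],
    "visit_date": pd.to_datetime(["2020-01-10", "2020-03-01"]),
})
right = pd.DataFrame({
    "pidn": [1, 1, 2],
    "lab_date": pd.to_datetime(["2020-01-08", "2020-02-20", "2020-06-01"]),
    "value": [5.1, 5.9, 4.2],
})


def link_closest(left, right, id2, left_date, right_date, days=90):
    """For each left row, keep the right row closest in time within `days`.

    Hypothetical helper; assumes each (id2, left_date) pair is unique in left.
    """
    # Step 1: join on the secondary ID so only same-subject rows are compared.
    merged = left.merge(right, on=id2, how="inner")
    # Step 2: absolute gap in whole days between the two dates.
    merged["diff_days"] = (merged[right_date] - merged[left_date]).abs().dt.days
    # Step 3: discard candidates outside the window.
    within = merged[merged["diff_days"] <= days]
    # Step 4: keep the single closest candidate per left row.
    closest = within.sort_values("diff_days").drop_duplicates(
        subset=[id2, left_date], keep="first"
    )
    return closest.sort_values([id2, left_date]).reset_index(drop=True)


linked = link_closest(left, right, "pidn", "visit_date", "lab_date")
print(linked)
```

In this toy data, subject 2 has no lab result within 90 days of the visit, so only subject 1 appears in the result.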

group_by_keep_one(dset[, keep, drop_duplicates])

Given a Dataset object, group on the Dataset.id2_col_name column and keep only the earliest or latest row in each group as determined by the date in the Dataset.date_col_name column.
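The grouping behaviour described above can be approximated in plain pandas as follows. This is a sketch under stated assumptions rather than macpie's code: the helper keep_one_per_group, the keep values "earliest" and "latest", and the column names are all hypothetical.

```python
import pandas as pd


def keep_one_per_group(df, id2_col, date_col, keep="earliest"):
    """Keep the earliest or latest row per id2_col value, judged by date_col.

    Hypothetical helper; the accepted `keep` values are assumptions.
    """
    if keep not in ("earliest", "latest"):
        raise ValueError("keep must be 'earliest' or 'latest'")
    # Sort so that, within each subject, rows run from oldest to newest.
    ordered = df.sort_values([id2_col, date_col])
    grouped = ordered.groupby(id2_col, sort=False)
    # head(1) takes the oldest row per group, tail(1) the newest.
    picked = grouped.head(1) if keep == "earliest" else grouped.tail(1)
    return picked.reset_index(drop=True)


visits = pd.DataFrame({
    "pidn": [1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2021-05-01", "2020-01-15", "2019-07-04", "2022-02-02"]
    ),
    "score": [3, 1, 2, 4],
})

earliest = keep_one_per_group(visits, "pidn", "visit_date", keep="earliest")
latest = keep_one_per_group(visits, "pidn", "visit_date", keep="latest")
print(earliest)
print(latest)
```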

Reshaping, sorting#

Dataset.sort_by_id2()

Sort the Dataset by id2_col_name.

Excel IO#

Serialization to and from Excel requires an excel_dict, a dict containing the information needed to round-trip a Dataset.

Dataset.cross_section(excel_dict)

Return the Dataset defined by excel_dict from this Dataset.

Dataset.display_name_generator

The function used to generate display_name.

Dataset.to_excel_dict()

Convert the Dataset to a dictionary representation needed for Excel reading/writing.

Dataset.from_excel_dict(excel_dict, df)

Construct a Dataset from a dictionary representation.

Helpers#

DatasetFields(*args, **kwargs)

A tabular representation of a set of macpie.Dataset fields.

Subclasses#

LavaDataset(*args, **kwargs)

A Dataset using LAVA defaults.