macpie.LavaDataset

class macpie.LavaDataset(*args, **kwargs)

A Dataset preconfigured with LAVA defaults. (LAVA is the data management system used at the Memory and Aging Center.) The defaults are:

  • id_col_name = "InstrID"

  • date_col_name = "DCDate"

  • id2_col_name = "PIDN"
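
The three defaults listed above can be pictured as a small mapping that explicit keyword arguments may override. The sketch below is a plain-Python stand-in, not macpie's implementation: the names LAVA_DEFAULTS and resolve_key_columns are hypothetical, and the assumption that explicit keyword arguments take precedence over the defaults is not stated in this reference.

```python
# Hypothetical stand-in for illustration only; this is not macpie code.
# The three values are the LAVA defaults documented for LavaDataset.
LAVA_DEFAULTS = {
    "id_col_name": "InstrID",
    "date_col_name": "DCDate",
    "id2_col_name": "PIDN",
}


def resolve_key_columns(**kwargs):
    """Return key-column settings, starting from the LAVA defaults.

    Assumption: an explicitly passed keyword argument overrides the
    matching default. Unrelated keyword arguments are ignored here.
    """
    resolved = dict(LAVA_DEFAULTS)
    for name, value in kwargs.items():
        if name in LAVA_DEFAULTS:
            resolved[name] = value
    return resolved


# No arguments: all three LAVA defaults apply.
defaults_only = resolve_key_columns()

# One override: only date_col_name changes ("VisitDate" is a made-up column).
overridden = resolve_key_columns(date_col_name="VisitDate")
```

Check the actual constructor behavior against the macpie source before relying on this precedence.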

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

abs()

Return a Series/DataFrame with absolute numeric value of each element.

add(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

add_prefix(prefix)

Prefix labels with string prefix.

add_suffix(suffix)

Suffix labels with string suffix.

add_tag(tag)

Add a tag to the Dataset.

agg([func, axis])

Aggregate using one or more operations over the specified axis.

aggregate([func, axis])

Aggregate using one or more operations over the specified axis.

align(other[, join, axis, level, copy, ...])

Align two objects on their axes with the specified join method.

all([axis, bool_only, skipna, level])

Return whether all elements are True, potentially over an axis.

any(*[, axis, bool_only, skipna, level])

Return whether any element is True, potentially over an axis.

append(other[, ignore_index, ...])

Append rows of other to the end of caller, returning a new object.

apply(func[, axis, raw, result_type, args])

Apply a function along an axis of the DataFrame.

applymap(func[, na_action])

Apply a function to a DataFrame elementwise.

asfreq(freq[, method, how, normalize, ...])

Convert time series to specified frequency.

asof(where[, subset])

Return the last row(s) without any NaNs before where.

assign(**kwargs)

Assign new columns to a DataFrame.

astype(dtype[, copy, errors])

Cast a pandas object to a specified dtype.

at_time(time[, asof, axis])

Select values at particular time of day (e.g., 9:30AM).

backfill(*[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='bfill'.

between_time(start_time, end_time[, ...])

Select values between particular times of the day (e.g., 9:00-9:30 AM).

bfill(*[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='bfill'.

bool()

Return the bool of a single element Series or DataFrame.

boxplot([column, by, ax, fontsize, rot, ...])

Make a box plot from DataFrame columns.

clear_tags()

Clear all tags.

clip([lower, upper, axis, inplace])

Trim values at input threshold(s).

combine(other, func[, fill_value, overwrite])

Perform column-wise combine with another DataFrame.

combine_first(other)

Update null elements with value in the same location in other.

compare(other[, align_axis, keep_shape, ...])

Compare to another DataFrame and show the differences.

convert_dtypes([infer_objects, ...])

Convert columns to best possible dtypes using dtypes supporting pd.NA.

copy([deep])

Make a copy of this object's indices and data.

corr([method, min_periods, numeric_only])

Compute pairwise correlation of columns, excluding NA/null values.

corrwith(other[, axis, drop, method, ...])

Compute pairwise correlation.

count([axis, level, numeric_only])

Count non-NA cells for each column or row.

cov([min_periods, ddof, numeric_only])

Compute pairwise covariance of columns, excluding NA/null values.

create_id_col([col_name, start_index])

Create id_col_name with sequential numerical index.

cross_section(excel_dict)

Return the Dataset defined by excel_dict from this Dataset.

cummax([axis, skipna])

Return cumulative maximum over a DataFrame or Series axis.

cummin([axis, skipna])

Return cumulative minimum over a DataFrame or Series axis.

cumprod([axis, skipna])

Return cumulative product over a DataFrame or Series axis.

cumsum([axis, skipna])

Return cumulative sum over a DataFrame or Series axis.

date_proximity(**kwargs)

default_display_name_generator(dset)

Default function to use as display_name_generator.

describe([percentiles, include, exclude, ...])

Generate descriptive statistics.

diff([periods, axis])

First discrete difference of element.

div(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

divide(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

dot(other)

Compute the matrix multiplication between the DataFrame and other.

drop([labels, axis, index, columns, level, ...])

Drop specified labels from rows or columns.

drop_duplicates([subset, keep, inplace, ...])

Return DataFrame with duplicate rows removed.

drop_sys_cols([inplace])

Drop all sys_cols from Dataset.

droplevel(level[, axis])

Return Series/DataFrame with requested index / column level(s) removed.

dropna(*[, axis, how, thresh, subset, inplace])

Remove missing values.

duplicated([subset, keep])

Return boolean Series denoting duplicate rows.

eq(other[, axis, level])

Get Equal to of dataframe and other, element-wise (binary operator eq).

equals(other)

Test whether two Datasets are equal.

eval(expr, *[, inplace])

Evaluate a string describing operations on DataFrame columns.

ewm([com, span, halflife, alpha, ...])

Provide exponentially weighted (EW) calculations.

excel_dict_has_tags(excel_dict, tags)

Helper function to determine if an excel_dict has tags.

expanding([min_periods, center, axis, method])

Provide expanding window calculations.

explode(column[, ignore_index])

Transform each element of a list-like to a row, replicating index values.

ffill(*[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='ffill'.

fillna([value, method, axis, inplace, ...])

Fill NA/NaN values using the specified method.

filter([items, like, regex, axis])

Subset the dataframe rows or columns according to the specified index labels.

first(offset)

Select initial periods of time series data based on a date offset.

first_valid_index()

Return index for first non-NA value or None, if no non-NA value is found.

floordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

from_dict(data[, orient, dtype, columns])

Construct DataFrame from dict of array-like or dicts.

from_excel_dict(excel_dict, df)

Construct a Dataset from a dictionary representation.

from_file(filepath, **kwargs)

Construct Dataset from a file.

from_records(data[, index, exclude, ...])

Convert structured or record ndarray to DataFrame.

ge(other[, axis, level])

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

get(key[, default])

Get item from object for given key (ex: DataFrame column).

group_by_keep_one(**kwargs)

groupby([by, axis, level, as_index, sort, ...])

Group DataFrame using a mapper or by a Series of columns.

gt(other[, axis, level])

Get Greater than of dataframe and other, element-wise (binary operator gt).

has_tag(tag)

Returns True if the Dataset contains tag.

head([n])

Return the first n rows.

hist([column, by, grid, xlabelsize, xrot, ...])

Make a histogram of the DataFrame's columns.

idxmax([axis, skipna, numeric_only])

Return index of first occurrence of maximum over requested axis.

idxmin([axis, skipna, numeric_only])

Return index of first occurrence of minimum over requested axis.

infer_objects()

Attempt to infer better dtypes for object columns.

info([verbose, buf, max_cols, memory_usage, ...])

Print a concise summary of a DataFrame.

insert(loc, column, value[, allow_duplicates])

Insert column into DataFrame at specified location.

interpolate([method, axis, limit, inplace, ...])

Fill NaN values using an interpolation method.

isetitem(loc, value)

Set the given value in the column with position 'loc'.

isin(values)

Whether each element in the DataFrame is contained in values.

isna()

Detect missing values.

isnull()

DataFrame.isnull is an alias for DataFrame.isna.

items()

Iterate over (column name, Series) pairs.

iteritems()

Iterate over (column name, Series) pairs.

iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

itertuples([index, name])

Iterate over DataFrame rows as namedtuples.

join(other[, on, how, lsuffix, rsuffix, ...])

Join columns of another DataFrame.

keep_cols(cols[, inplace])

Keep specified columns (thus dropping the rest).

keep_fields(selected_fields[, inplace])

Keep specified fields (and drop the rest).

keys()

Get the 'info axis' (see Indexing for more).

kurt([axis, skipna, level, numeric_only])

Return unbiased kurtosis over requested axis.

kurtosis([axis, skipna, level, numeric_only])

Return unbiased kurtosis over requested axis.

last(offset)

Select final periods of time series data based on a date offset.

last_valid_index()

Return index for last non-NA value or None, if no non-NA value is found.

le(other[, axis, level])

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

lookup(row_labels, col_labels)

Label-based "fancy indexing" function for DataFrame.

lt(other[, axis, level])

Get Less than of dataframe and other, element-wise (binary operator lt).

mad([axis, skipna, level])

Return the mean absolute deviation of the values over the requested axis.

mask(cond[, other, inplace, axis, level, ...])

Replace values where the condition is True.

max([axis, skipna, level, numeric_only])

Return the maximum of the values over the requested axis.

mean([axis, skipna, level, numeric_only])

Return the mean of the values over the requested axis.

median([axis, skipna, level, numeric_only])

Return the median of the values over the requested axis.

melt([id_vars, value_vars, var_name, ...])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

memory_usage([index, deep])

Return the memory usage of each column in bytes.

merge(right[, how, on, left_on, right_on, ...])

Merge DataFrame or named Series objects with a database-style join.

min([axis, skipna, level, numeric_only])

Return the minimum of the values over the requested axis.

mod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator mod).

mode([axis, numeric_only, dropna])

Get the mode(s) of each element along the selected axis.

mul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

multiply(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

ne(other[, axis, level])

Get Not equal to of dataframe and other, element-wise (binary operator ne).

nlargest(n, columns[, keep])

Return the first n rows ordered by columns in descending order.

notna()

Detect existing (non-missing) values.

notnull()

DataFrame.notnull is an alias for DataFrame.notna.

nsmallest(n, columns[, keep])

Return the first n rows ordered by columns in ascending order.

nunique([axis, dropna])

Count number of distinct elements in specified axis.

pad(*[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='ffill'.

pct_change([periods, fill_method, limit, freq])

Percentage change between the current and a prior element.

pipe(func, *args, **kwargs)

Apply chainable functions that expect Series or DataFrames.

pivot(*[, index, columns, values])

Return reshaped DataFrame organized by given index / column values.

pivot_table([values, index, columns, ...])

Create a spreadsheet-style pivot table as a DataFrame.

pop(item)

Return item and drop from frame.

pow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

prepend_level(level[, inplace])

Create a MultiIndex by adding level as the first level.

prod([axis, skipna, level, numeric_only, ...])

Return the product of the values over the requested axis.

product([axis, skipna, level, numeric_only, ...])

Return the product of the values over the requested axis.

quantile([q, axis, numeric_only, ...])

Return values at the given quantile over requested axis.

query(expr, *[, inplace])

Query the columns of a DataFrame with a boolean expression.

radd(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

rank([axis, method, numeric_only, ...])

Compute numerical data ranks (1 through n) along axis.

rdiv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

reindex([labels, index, columns, axis, ...])

Conform Series/DataFrame to new index with optional filling logic.

reindex_like(other[, method, copy, limit, ...])

Return an object with matching indices as other object.

rename([mapper, index, columns, axis, copy, ...])

Alter axes labels.

rename_axis([mapper, inplace])

Set the name of the axis for the index or columns.

rename_col(old_col, new_col[, inplace])

Rename old_col to new_col.

reorder_levels(order[, axis])

Rearrange index levels using input order.

replace([to_replace, value, inplace, limit, ...])

Replace values given in to_replace with value.

replace_tag(old_tag, new_tag)

Replace old_tag with new_tag.

resample(rule[, axis, closed, label, ...])

Resample time-series data.

reset_index([level, drop, inplace, ...])

Reset the index, or a level of it.

rfloordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

rmod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator rmod).

rmul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

rolling(window[, min_periods, center, ...])

Provide rolling window calculations.

round([decimals])

Round a DataFrame to a variable number of decimal places.

rpow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

rsub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

rtruediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

sample([n, frac, replace, weights, ...])

Return a random sample of items from an axis of object.

select_dtypes([include, exclude])

Return a subset of the DataFrame's columns based on the column dtypes.

sem([axis, skipna, level, ddof, numeric_only])

Return unbiased standard error of the mean over requested axis.

set_axis(labels, *[, axis, inplace, copy])

Assign desired index to given axis.

set_flags(*[, copy, allows_duplicate_labels])

Return a new object with updated flags.

set_index(keys, *[, drop, append, inplace, ...])

Set the DataFrame index using existing columns.

shift([periods, freq, axis, fill_value])

Shift index by desired number of periods with an optional time freq.

skew([axis, skipna, level, numeric_only])

Return unbiased skew over requested axis.

slice_shift([periods, axis])

Equivalent to shift without copying data.

sort_by_id2()

Sort df by id2_col_name.

sort_index(*[, axis, level, ascending, ...])

Sort object by labels (along an axis).

sort_values(by, *[, axis, ascending, ...])

Sort by the values along either axis.

squeeze([axis])

Squeeze 1 dimensional axis objects into scalars.

stack([level, dropna])

Stack the prescribed level(s) from columns to index.

std([axis, skipna, level, ddof, numeric_only])

Return sample standard deviation over requested axis.

sub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

subtract(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

sum([axis, skipna, level, numeric_only, ...])

Return the sum of the values over the requested axis.

swapaxes(axis1, axis2[, copy])

Interchange axes and swap values axes appropriately.

swaplevel([i, j, axis])

Swap levels i and j in a MultiIndex.

tail([n])

Return the last n rows.

take(indices[, axis, is_copy])

Return the elements in the given positional indices along an axis.

to_clipboard([excel, sep])

Copy object to the system clipboard.

to_csv([path_or_buf, sep, na_rep, ...])

Write object to a comma-separated values (csv) file.

to_dict([orient, into])

Convert the DataFrame to a dictionary.

to_excel(excel_writer[, sheet_name, na_rep, ...])

Write Dataset to an Excel sheet.

to_excel_dict()

Convert the Dataset to a dictionary representation needed for Excel reading/writing.

to_feather(path, **kwargs)

Write a DataFrame to the binary Feather format.

to_gbq(destination_table[, project_id, ...])

Write a DataFrame to a Google BigQuery table.

to_hdf(path_or_buf, key[, mode, complevel, ...])

Write the contained data to an HDF5 file using HDFStore.

to_html([buf, columns, col_space, header, ...])

Render a DataFrame as an HTML table.

to_json([path_or_buf, orient, date_format, ...])

Convert the object to a JSON string.

to_latex([buf, columns, col_space, header, ...])

Render object to a LaTeX tabular, longtable, or nested table.

to_markdown([buf, mode, index, storage_options])

Print DataFrame in Markdown-friendly format.

to_numpy([dtype, copy, na_value])

Convert the DataFrame to a NumPy array.

to_orc([path, engine, index, engine_kwargs])

Write a DataFrame to the ORC format.

to_parquet([path, engine, compression, ...])

Write a DataFrame to the binary parquet format.

to_period([freq, axis, copy])

Convert DataFrame from DatetimeIndex to PeriodIndex.

to_pickle(path[, compression, protocol, ...])

Pickle (serialize) object to file.

to_records([index, column_dtypes, index_dtypes])

Convert DataFrame to a NumPy record array.

to_sql(name, con[, schema, if_exists, ...])

Write records stored in a DataFrame to a SQL database.

to_stata(path, *[, convert_dates, ...])

Export DataFrame object to Stata dta format.

to_string([buf, columns, col_space, header, ...])

Render a DataFrame to a console-friendly tabular output.

to_timestamp([freq, how, axis, copy])

Cast to DatetimeIndex of timestamps, at beginning of period.

to_xarray()

Return an xarray object from the pandas object.

to_xml([path_or_buffer, index, root_name, ...])

Render a DataFrame to an XML document.

transform(func[, axis])

Call func on self producing a DataFrame with the same axis shape as self.

transpose(*args[, copy])

Transpose index and columns.

truediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

truncate([before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.

tshift([periods, freq, axis])

Shift the time index, using the index's frequency if available.

tz_convert(tz[, axis, level, copy])

Convert tz-aware axis to target time zone.

tz_localize(tz[, axis, level, copy, ...])

Localize tz-naive index of a Series or DataFrame to target time zone.

unstack([level, fill_value])

Pivot a level of the (necessarily hierarchical) index labels.

update(other[, join, overwrite, ...])

Modify in place using non-NA values from another DataFrame.

value_counts([subset, normalize, sort, ...])

Return a Series containing counts of unique rows in the DataFrame.

var([axis, skipna, level, ddof, numeric_only])

Return unbiased variance over requested axis.

where(cond[, other, inplace, axis, level, ...])

Replace values where the condition is False.

xs(key[, axis, level, drop_level])

Return cross-section from the Series/DataFrame.
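
The tag-related methods in the listing above (add_tag, has_tag, replace_tag, clear_tags) can be read as simple operations on a collection of labels. The sketch below is a hypothetical stand-in class named TagStore, written in plain Python. It assumes tags are kept in a list and that adding an existing tag is a no-op; neither detail is confirmed by this reference.

```python
class TagStore:
    """Hypothetical stand-in mirroring the documented tag method names.

    Assumptions (not taken from macpie): tags live in a list, and
    adding a tag that is already present does nothing.
    """

    def __init__(self):
        self.tags = []

    def add_tag(self, tag):
        # Add a tag, skipping it if already present (assumed behavior).
        if tag not in self.tags:
            self.tags.append(tag)

    def has_tag(self, tag):
        # Report whether the tag is present.
        return tag in self.tags

    def replace_tag(self, old_tag, new_tag):
        # Swap old_tag for new_tag, leaving other tags untouched.
        self.tags = [new_tag if t == old_tag else t for t in self.tags]

    def clear_tags(self):
        # Remove every tag.
        self.tags = []


store = TagStore()
store.add_tag("duplicates")
store.add_tag("duplicates")  # second call leaves the list unchanged
store.replace_tag("duplicates", "reviewed")
```

The tag names "duplicates" and "reviewed" are made up for the example.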

Attributes

FIELD_DATE_COL_VALUES_POSSIBLE

Possible default values for date_col_name of Dataset.

FIELD_DATE_COL_VALUE_DEFAULT

Default value for date_col_name of Dataset.

FIELD_ID2_COL_VALUES_POSSIBLE

Possible default values for id2_col_name of Dataset.

FIELD_ID2_COL_VALUE_DEFAULT

Default value for id2_col_name of Dataset.

FIELD_ID_COL_VALUES_POSSIBLE

Possible default values for id_col_name of Dataset.

FIELD_ID_COL_VALUE_DEFAULT

Default value for id_col_name of Dataset.

T

all_fields

Returns list of all fields of this Dataset.

at

Access a single value for a row/column label pair.

attrs

Dictionary of global attributes of this dataset.

axes

Return a list representing the axes of the DataFrame.

col_count

Number of columns in Dataset.

columns

The column labels of the DataFrame.

date_col_errors

Errors flag to use when parsing date_col_name.

date_col_name

Column to use as record collection date.

display_name

The name for this Dataset suitable for display as generated by the display_name_generator function.

display_name_generator

The function used to generate display_name.

dtypes

Return the dtypes in the DataFrame.

empty

Indicator whether Series/DataFrame is empty.

excel_sheetname

Generates a valid Excel sheet name by truncating display_name to 30 characters.

flags

Get the properties associated with this pandas object.

history

History information as generated by the macpie.util.MethodHistory decorator.

iat

Access a single value for a row/column pair by integer position.

id2_col_name

Column to use as secondary record IDs.

id_col_name

Column to use as record IDs.

iloc

Purely integer-location based indexing for selection by position.

index

The index (row labels) of the DataFrame.

key_cols

Returns list of non-null key column names of this Dataset, defined as id_col_name, date_col_name, and id2_col_name.

key_fields

Returns list of all key fields of this Dataset (analog of key_cols).

loc

Access a group of rows and columns by label(s) or a boolean array.

name

Name of Dataset.

ndim

Return an int representing the number of axes / array dimensions.

non_key_cols

Returns list of non-key column names of this Dataset, defined as any columns that are not key_cols or sys_cols.

non_key_fields

Returns list of all non-key fields of this Dataset (analog of non_key_cols).

row_count

Number of rows in Dataset.

shape

Return a tuple representing the dimensionality of the DataFrame.

size

Return an int representing the number of elements in this object.

style

Returns a Styler object.

sys_cols

Returns list of system column names of this Dataset, defined as any columns whose names start with the value of the column.system.prefix option.

sys_fields

Returns list of all system fields of this Dataset (analog of sys_cols).

tag_duplicates

Tag that denotes this Dataset has duplicates.

tags

Tag(s) of Dataset.

values

Return a NumPy representation of the DataFrame.
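
The key_cols, sys_cols, non_key_cols, and excel_sheetname attributes are described in terms of each other, so a small plain-Python sketch may make the relationships easier to follow. It is not macpie code: the function names split_columns and sheet_name are hypothetical, the prefix value "_sys_" is a placeholder for whatever the column.system.prefix option holds, and "non-null" is read here as "the column name is not None".

```python
# Placeholder value; the real prefix comes from the column.system.prefix option.
SYS_PREFIX = "_sys_"


def split_columns(columns, id_col_name, date_col_name, id2_col_name,
                  sys_prefix=SYS_PREFIX):
    """Partition column names the way the attribute descriptions suggest."""
    # key_cols: the configured key column names that are not None (assumed reading).
    candidates = [id_col_name, date_col_name, id2_col_name]
    key_cols = [c for c in candidates if c is not None]
    # sys_cols: any column whose name starts with the system prefix.
    sys_cols = [c for c in columns if c.startswith(sys_prefix)]
    # non_key_cols: everything that is neither a key column nor a system column.
    non_key_cols = [c for c in columns
                    if c not in key_cols and c not in sys_cols]
    return key_cols, sys_cols, non_key_cols


def sheet_name(display_name, max_len=30):
    """Truncate a display name to 30 characters, as excel_sheetname is described."""
    return display_name[:max_len]


# Made-up columns; id2_col_name is left unset to show the non-null filter.
columns = ["InstrID", "DCDate", "_sys_row", "Score"]
key_cols, sys_cols, non_key_cols = split_columns(
    columns, "InstrID", "DCDate", None
)
short_name = sheet_name("a_very_long_dataset_display_name_for_excel")
```

In this example the unset id2_col_name is dropped from key_cols, "_sys_row" lands in sys_cols, and "Score" is the only non-key column.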