macpie command#

Commands for manipulating and analyzing data contained in Excel and/or CSV/text files.

Main Options#

These main options help macpie know which columns in your files are the key columns (such as columns containing your primary IDs and dates and/or any secondary IDs). If they are not specified using these options, then defaults (as described below) will be used.

-i <STRING>, --id-col=<STRING>#

Default=InstrID. ID column header. The column header of the primary ID column. In general, this column contains the primary key/index (unique identifiers) of the dataset. In a research data management system, this is typically the ID of a specific data form or assessment.

-d <STRING>, --date-col=<STRING>#

Default=DCDate. Date column header. The column header of the primary Date column. In a research data management system, this is typically the date the form or assessment was completed or collected.

-j <STRING>, --id2-col=<STRING>#

Default=PIDN. ID2 column header. The column header of the primary ID2 column. In general, this column contains the secondary key/index of the dataset. In a research data management system, this is typically the ID of the patient, subject, or participant who completed the form or assessment.

-v, --verbose#

Verbose messages. Output more details on what the executed command is doing or has done.
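As a rough illustration of how the main options and their defaults fit together, here is a minimal sketch using Python's standard `argparse` module. It is not macpie's own parser: `build_parser` and the program name `macpie-sketch` are hypothetical, and only the option names and defaults are taken from the descriptions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the main options; not macpie's actual parser.
    parser = argparse.ArgumentParser(prog="macpie-sketch")
    parser.add_argument("-i", "--id-col", default="InstrID", help="ID column header")
    parser.add_argument("-d", "--date-col", default="DCDate", help="Date column header")
    parser.add_argument("-j", "--id2-col", default="PIDN", help="ID2 column header")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose messages")
    return parser


# No options given: the documented defaults apply.
defaults = build_parser().parse_args([])

# Long and short spellings resolve to the same values.
long_form = build_parser().parse_args(["--id2-col=VID", "--date-col=VDate"])
short_form = build_parser().parse_args(["-j", "VID", "-d", "VDate"])

print(defaults.id_col, defaults.date_col, defaults.id2_col)
print(long_form == short_form)
```

Any key column that is not named explicitly falls back to its default, which is why `--id-col` stays `InstrID` in both the long and short forms.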

macpie keepone#

This command groups rows that have the same macpie --id2-col value, and allows you to keep only the earliest or latest row in each group as determined by the macpie --date-col values (discarding the other rows in the group).

Usage#

$ macpie keepone [OPTIONS] PRIMARY

Options#

-k <STRING>, --keep=<STRING> (all|earliest|latest)#

Specify which rows of the PRIMARY file to keep.

  • all (default): keep all rows

  • earliest: for each unique value in the column specified by the macpie --id2-col option, keep only the earliest row (determined by the values in the macpie --date-col column)

  • latest: for each unique value in the column specified by the macpie --id2-col option, keep only the latest row (determined by the values in the macpie --date-col column)
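The three `--keep` behaviors can be sketched in plain Python as follows. This only illustrates the grouping logic described above and is not macpie's implementation: the `keep_one` function and the sample rows are hypothetical, and the tie-breaking rule (the first row seen wins when dates are equal) is an assumption, since the text does not say how ties are handled.

```python
from datetime import date


def keep_one(rows, keep="all", id2_col="PIDN", date_col="DCDate"):
    """Keep all rows, or only the earliest/latest row per id2_col value."""
    if keep == "all":
        return list(rows)
    if keep not in ("earliest", "latest"):
        raise ValueError(f"unknown keep value: {keep!r}")
    chosen = {}  # id2 value -> row kept so far (dicts preserve insertion order)
    for row in rows:
        current = chosen.get(row[id2_col])
        if current is None:
            chosen[row[id2_col]] = row
        elif keep == "earliest" and row[date_col] < current[date_col]:
            chosen[row[id2_col]] = row
        elif keep == "latest" and row[date_col] > current[date_col]:
            chosen[row[id2_col]] = row
        # Assumption: on equal dates the first row seen is kept.
    return list(chosen.values())


# Hypothetical sample data using the default column names.
rows = [
    {"InstrID": 1, "PIDN": 10, "DCDate": date(2020, 1, 5)},
    {"InstrID": 2, "PIDN": 10, "DCDate": date(2021, 3, 9)},
    {"InstrID": 3, "PIDN": 11, "DCDate": date(2019, 7, 1)},
]
earliest = keep_one(rows, keep="earliest")
latest = keep_one(rows, keep="latest")
print([r["InstrID"] for r in earliest])  # [1, 3]
print([r["InstrID"] for r in latest])    # [2, 3]
```

PIDN 10 has two rows, so `earliest` and `latest` each keep a different one of them, while PIDN 11 has a single row that survives either way.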

Arguments#

PRIMARY#

Required. A list of filenames and/or directories.

Output#

The results of each dataset will be stored in a corresponding worksheet inside the results file.

Examples#

  1. For each PIDN, keep only the earliest CDR record as determined by its DCDate.

    $ macpie keepone --keep=earliest cdr.csv
    

    Equivalent command but using shorter single-dash option names for brevity:

    $ macpie keepone -k earliest cdr.csv
    
  2. For each VID (a column containing Visit IDs), keep only the latest record as determined by its VDate value (VDate being a column containing Visit Dates).

    $ macpie --id2-col=VID --date-col=VDate keepone --keep=latest visits.csv
    

    Equivalent command but using shorter single-dash option names for brevity:

    $ macpie -j VID -d VDate keepone -k latest visits.csv
    

API#

macpie keepone#

This command groups rows that have the same --id2-col value, and allows you to keep only the earliest or latest row in each group as determined by the --date-col values (discarding the other rows in the group).

primary: pathlib.Path

A file path

macpie keepone [OPTIONS] [PRIMARY]...

Options

-k, --keep <keep>#
Options: all | earliest | latest

Arguments

PRIMARY#

Optional argument(s)

macpie merge#

This command is a common follow-up to the link command. It lets you select specific fields across various datasets and merge them into one dataset, thereby dropping the unwanted fields, of which there can be many.

The output file of the link command includes a worksheet named _available_fields, which lists all the fields across all the datasets that you passed to the link command. Place an "x" next to the fields you want, and the merge command will attempt to merge only those marked fields into a single dataset. The linking fields (i.e. id_col_name, date_col_name, and id2_col of the primary argument in the link command, e.g. InstrID, DCDate, PIDN) will always be included.

NOTE: The output file of this command can also be an input to this same command.
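The field-selection rule described above (marked fields plus the linking fields) can be sketched as follows. This is an illustration under stated assumptions, not macpie's code: the `select_fields` function, the three-column row layout of (dataset, field, mark), and the `Notes` field are all hypothetical, since the actual layout of the _available_fields worksheet is not specified here.

```python
def select_fields(available_fields, link_fields):
    """Return (dataset, field) pairs to merge: marked fields plus link fields."""
    selected = []
    for dataset, field, mark in available_fields:
        marked = mark.strip() == "x"
        # Linking fields are always included, whether or not they are marked.
        if marked or field in link_fields:
            selected.append((dataset, field))
    return selected


# Hypothetical rows of an _available_fields-style sheet: (dataset, field, mark).
available_fields = [
    ("cdr", "PIDN", ""),
    ("cdr", "DCDate", ""),
    ("cdr", "InstrID", ""),
    ("cdr", "CDRTot", "x"),
    ("cdr", "BoxScore", "x"),
    ("cdr", "Notes", ""),
    ("faq", "FAQTot", "x"),
    ("faq", "Notes", ""),
]
link_fields = {"PIDN", "DCDate", "InstrID"}

selected = select_fields(available_fields, link_fields)
print(selected)
```

In this sketch the unmarked `Notes` fields are dropped from both datasets, while the three linking fields survive without being marked.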

Usage#

$ macpie merge PRIMARY

Options#

Show a short summary of the usage and options.

Arguments#

PRIMARY#

Required. Filename of the results file created by the link command or by this command.

Output#

In the results file, all the merged fields will be in a single worksheet. Any dataset that was not merged (by choice or because there were duplicates) will remain in its own worksheet. If a dataset could not be merged because there were duplicates, you can remove the duplicates, save the file, and run this same command to attempt the merge again.

Examples#

  1. After linking cdr.csv and faq.csv together, I decide I only want the following fields in my dataset:

    • CDRTot and BoxScore from cdr.csv

    • FAQTot from faq.csv

    1. So first, open the results file from the link command and navigate to the _available_fields worksheet.

    2. Mark an "x" next to those fields.

    3. Save the file.

    4. Run the following command:

      $ macpie merge results_XXX.xlsx
      

API#

macpie merge#

macpie merge [OPTIONS] PRIMARY

Options

--keep-original, --no-keep-original#

Arguments

PRIMARY#

Required argument