Add arbitrary preprocessing code to a multiverse analysis pipeline

Usage

add_preprocess(.df, process_name, code)

Arguments

.df

The original data.frame (e.g., the base data set). If part of a set of add_* decision functions in a pipeline, the base data will be passed along as an attribute.

process_name

a character string. A descriptive name for what the preprocessing step accomplishes.

code

the literal code you would like to execute after data are filtered. glue syntax is allowed. An example might be centering or scaling a predictor after the appropriate filters are applied to the data.

The code should be written to work with pipes (i.e., |> or %>%). When the pipeline is run, the pre-processing code receives the base data after any filters have been applied. This makes mutate calls the most natural choice, but other functions that take a data.frame as their first argument should work as well, as long as they also return a data.frame.
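As a rough illustration of the two points above (glue syntax and pipe compatibility), the sketch below expands a glue template by hand and then runs the resulting step on a filtered toy data set. It is not multitool's internal implementation: the objects template, expanded, toy, and result are hypothetical names, and the assumption is only that a {ivs} placeholder is filled with each variable name from the "ivs" set.

```r
library(glue)
library(dplyr)

# Template as it might be passed to the `code` argument.
template <- "mutate({ivs} = scale({ivs}))"

# Assumed expansion: substitute each variable name for the {ivs} placeholder.
# glue() is vectorized, so this yields one code string per variable.
expanded <- as.character(glue(template, ivs = c("iv1", "iv2", "iv3")))
print(expanded)

# Each expanded string is an ordinary pipe step. Written out by hand for
# iv1, it runs after the filter and returns a data.frame.
toy <- data.frame(iv1 = c(1, 2, 3, 10), include1 = c(0, 0, 0, 1))

result <- toy |>
  filter(include1 == 0) |>
  mutate(iv1 = scale(iv1))

print(result)
```

Because the step runs after the filter, scale() uses the mean and standard deviation of the retained rows only, which is why the order of filtering and pre-processing matters.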

Value

a data.frame with three columns: type, group, and code. type indicates the decision type, group names the decision, and code is the actual code that will be executed. If part of a pipe, the current set of decisions will be appended as new rows.

Examples


library(tidyverse)
library(multitool)

the_data <-
  data.frame(
    id   = 1:500,
    iv1  = rnorm(500),
    iv2  = rnorm(500),
    iv3  = rnorm(500),
    mod1 = rnorm(500),
    mod2 = rnorm(500),
    mod3 = rnorm(500),
    cov1 = rnorm(500),
    cov2 = rnorm(500),
    dv1  = rnorm(500),
    dv2  = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )

the_data |>
  add_filters(include1 == 0, include2 != 3, include2 != 2, include3 > -2.5) |>
  add_variables("ivs", iv1, iv2, iv3) |>
  add_variables("dvs", dv1, dv2) |>
  add_variables("mods", starts_with("mod")) |>
  add_preprocess("scale_iv", 'mutate({ivs} = scale({ivs}))')
#> # A tibble: 16 × 3
#>    type       group    code                          
#>    <chr>      <chr>    <chr>                         
#>  1 filters    include1 include1 == 0                 
#>  2 filters    include1 include1 %in% unique(include1)
#>  3 filters    include2 include2 != 3                 
#>  4 filters    include2 include2 != 2                 
#>  5 filters    include2 include2 %in% unique(include2)
#>  6 filters    include3 include3 > -2.5               
#>  7 filters    include3 include3 %in% unique(include3)
#>  8 variables  ivs      iv1                           
#>  9 variables  ivs      iv2                           
#> 10 variables  ivs      iv3                           
#> 11 variables  dvs      dv1                           
#> 12 variables  dvs      dv2                           
#> 13 variables  mods     mod1                          
#> 14 variables  mods     mod2                          
#> 15 variables  mods     mod3                          
#> 16 preprocess scale_iv mutate({ivs} = scale({ivs}))