Add a model and formula to a multiverse pipeline

Usage

add_model(
  .df,
  model_desc,
  code,
  additional_args = NULL,
  add_standardized = TRUE,
  add_performance = TRUE
)

Arguments

.df

The original data.frame (e.g., the base data set). If used as part of a set of add_* decision functions in a pipeline, the base data will be passed along as an attribute.

model_desc

a human-readable name you would like to give the model.

code

literal model syntax you would like to run. You can use glue syntax inside formulas to dynamically generate variable names based on a variable grid. For example, if you make a variable grid named iv with two versions of your IV (e.g., iv1 and iv2), you can write your formula like so: lm(happiness ~ {iv} + control_var). The only requirement is that the variables written in the formula actually exist in the underlying data. You are also responsible for loading any packages needed to run a particular model (e.g., lme4 for mixed models).
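The placeholder mechanism described above can be sketched with the glue package directly. This is only an illustration of how a `{iv}` placeholder expands into one model call per alternative, not a reproduction of multitool's internals; the names `ivs`, `template`, and `model_code` are made up for the example.

```r
library(glue)

# Hypothetical variable grid: two alternative versions of the IV
ivs <- c("iv1", "iv2")

# Model template as it might be written in the code argument;
# only the curly braces are treated specially by glue
template <- "lm(happiness ~ {iv} + control_var)"

# glue() is vectorised: one model string is produced per alternative
model_code <- glue(template, iv = ivs)

model_code
```

Each resulting string is an ordinary model call, which is why every variable named in the expanded formula has to exist in the underlying data.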

additional_args

a list of any additional arguments supplied to parameters::parameters().

add_standardized

logical. Indicates whether or not to produce standardized model coefficients via parameters::standardize_parameters(). This is usually desirable, but for some model types you might want to skip this step. To do so, set to FALSE.

add_performance

logical. Indicates whether or not to run performance::performance() on the model. Defaults to TRUE. Set to FALSE when computation is expensive or when model fit statistics are not needed.
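As a minimal sketch of the two logical switches described above, the pipeline below opts out of both standardized coefficients and performance statistics. It uses only the arguments shown in Usage; the data set `small_data`, its column names, and the model description are assumptions made up for illustration.

```r
library(multitool)

# Small simulated data set; all column names here are illustrative
set.seed(1)
small_data <- data.frame(
  dv1 = rnorm(100),
  iv1 = rnorm(100),
  iv2 = rnorm(100)
)

# The model step skips standardized coefficients and
# model performance statistics
pipeline <-
  small_data |>
  add_variables("ivs", iv1, iv2) |>
  add_model(
    "linear model, no extras",
    lm(dv1 ~ {ivs}),
    add_standardized = FALSE,
    add_performance = FALSE
  )

pipeline
```

Skipping both steps may be worthwhile when the multiverse is large or the model is slow to fit, since each is computed for every combination of decisions.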

Value

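a data.frame with one row per decision alternative and the columns type, group, and code: type indicates the decision type, group names the decision, and code is the actual code that will be executed. As the example output shows, the model options (additional_args, add_standardized, and add_performance) are also recorded as columns. If part of a pipe, the current set of decisions will be appended as new rows.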

Examples


library(tidyverse)
library(multitool)

the_data <-
  data.frame(
    id   = 1:500,
    iv1  = rnorm(500),
    iv2  = rnorm(500),
    iv3  = rnorm(500),
    mod1 = rnorm(500),
    mod2 = rnorm(500),
    mod3 = rnorm(500),
    cov1 = rnorm(500),
    cov2 = rnorm(500),
    dv1  = rnorm(500),
    dv2  = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )

the_data |>
  add_filters(include1 == 0, include2 != 3, include2 != 2, include3 > -2.5) |>
  add_variables("ivs", iv1, iv2, iv3) |>
  add_variables("dvs", dv1, dv2) |>
  add_variables("mods", starts_with("mod")) |>
  add_preprocess("scale_iv", 'mutate({ivs} = scale({ivs}))') |>
  add_model("linear model", lm({dvs} ~ {ivs} * {mods}))
#> # A tibble: 17 × 6
#>    type       group       code  additional_args add_standardized add_performance
#>    <chr>      <chr>       <chr> <lgl>           <lgl>            <lgl>          
#>  1 filters    include1    incl… NA              NA               NA             
#>  2 filters    include1    incl… NA              NA               NA             
#>  3 filters    include2    incl… NA              NA               NA             
#>  4 filters    include2    incl… NA              NA               NA             
#>  5 filters    include2    incl… NA              NA               NA             
#>  6 filters    include3    incl… NA              NA               NA             
#>  7 filters    include3    incl… NA              NA               NA             
#>  8 variables  ivs         iv1   NA              NA               NA             
#>  9 variables  ivs         iv2   NA              NA               NA             
#> 10 variables  ivs         iv3   NA              NA               NA             
#> 11 variables  dvs         dv1   NA              NA               NA             
#> 12 variables  dvs         dv2   NA              NA               NA             
#> 13 variables  mods        mod1  NA              NA               NA             
#> 14 variables  mods        mod2  NA              NA               NA             
#> 15 variables  mods        mod3  NA              NA               NA             
#> 16 preprocess scale_iv    muta… NA              NA               NA             
#> 17 models     linear mod… lm({… NA              TRUE             TRUE