Multiple Imputation in EMA Research

A Practical Introduction for Applied Researchers

Jonas Dora, University of Washington (jdora@uw.edu)

2025-05-26

Missing data is ubiquitous in EMA research

  • Meta-analytic average is ~20% missingness (Wrzus & Neubauer, 2022)
    • Depending on study design & population, 10-40% missingness can be expected
  • Different sources of missingness
    • Missingness mechanism known (planned missingness designs)
    • Missingness mechanism not known (we should assume missing not at random)

Missing data can bias your inferences

  • Missing data can lead to biased estimates when data are not missing at random
  • In EMA, missing data might follow patterns related to the variables in your model
  • Example: Let’s say we do an EMA study to understand the daily within-person association between mood and alcohol use. Participants might be less likely to complete a morning EMA survey when:
    • They got drunk the night before
    • They are highly distressed right now
  • Neither listwise deletion nor FIML can handle this case
  • If you use listwise deletion/FIML in this scenario, your results will be biased by an unknown magnitude in an unknown direction; not good!
# Set seed for reproducibility
set.seed(1707)

# Define study parameters
n_participants <- 100
n_days <- 14
n_signals <- 5

# Simulation function
simulate_ema_data <- function(missing = TRUE) { # to be able to generate a df with/without missingness later
  ema_data <- expand.grid(
    PID = 1:n_participants,
    study.day = 1:n_days,
    EMA.signal = 1:n_signals
  )
  
  # Sort data because I'm neurotic
  ema_data <- ema_data %>%
    arrange(PID, study.day, EMA.signal)
  
  # Add person-level random effects
  person_effects <- data.frame(
    PID = 1:n_participants,
    mood_effect = rnorm(n_participants, 0, 0.5),
    alc_use_effect = rnorm(n_participants, 0, 0.5)
  )
  
  ema_data <- ema_data %>% 
    left_join(person_effects, by = "PID")
  # Initialize some variables for later
  ema_data$mood <- NA
  ema_data$alc.use <- NA
  ema_data$alc.intox <- NA
  
  # Generate data
  for (pid in 1:n_participants) {
    for (day in 1:n_days) {
      idx <- which(ema_data$PID == pid & ema_data$study.day == day)
      
      base_mood <- rnorm(1, 0, 1) + ema_data$mood_effect[idx[1]]
      
      for (signal in 1:n_signals) {
        curr_idx <- idx[ema_data$EMA.signal[idx] == signal]
        
        if (signal == 1) {
          ema_data$mood[curr_idx] <- base_mood + rnorm(1, 0, 0.3)
        } else {
          prev_mood <- ema_data$mood[idx[ema_data$EMA.signal[idx] == (signal-1)]]
          ema_data$mood[curr_idx] <- prev_mood * 0.7 + rnorm(1, 0, 0.5)
        }
        
        # Only generate alcohol use for signal 1
        if (signal == 1) {
          # Alcohol use depends on mood (lower mood -> higher probability of drinking)
          logit_p <- -1 + (-0.34 * ema_data$mood[curr_idx]) + ema_data$alc_use_effect[curr_idx]
          prob <- exp(logit_p) / (1 + exp(logit_p))
          ema_data$alc.use[curr_idx] <- rbinom(1, 1, prob)
        }
        
        # Only generate alcohol intoxication for signal 5
        if (signal == 5) {
          curr_alc_use <- ema_data$alc.use[idx[ema_data$EMA.signal[idx] == 1]]
          
          if (!is.na(curr_alc_use) && curr_alc_use == 1) {
            # If they drank, generate intoxication level
            ema_data$alc.intox[curr_idx] <- rexp(1, rate = 1/2) + 1
          } else {
            # If they didn't drink, intoxication is 0
            ema_data$alc.intox[curr_idx] <- 0
          }
        }
      }
    }
  }
  
  # Create lagged intoxication variable for use in missingness pattern
  ema_data <- ema_data %>%
    group_by(PID) %>%
    mutate(prev_day_intox = lag(alc.intox, n = n_signals)) %>%
    ungroup()
  
  # Apply missingness if specified
  if (missing) {
    for (i in 1:nrow(ema_data)) {
      # Only apply missingness to observations where alcohol use is asked (signal 1)
      if (ema_data$EMA.signal[i] == 1) {
        miss_prob <- 0.25
        
        # Increase probability based on previous day's intoxication (if available)
        prev_intox <- ema_data$prev_day_intox[i]
        if (!is.na(prev_intox)) {
          # Increase miss probability as previous day's intoxication increases
          miss_prob <- miss_prob + (0.2 * prev_intox)
        }
        
        # Increase probability based on current mood AND current alcohol use
        curr_mood <- ema_data$mood[i]
        curr_alc_use <- ema_data$alc.use[i]
        
        if (!is.na(curr_mood)) {
          # Decrease miss probability as mood increases (negative relationship)
          miss_prob <- miss_prob - (0.10 * curr_mood)
        }
        
        # Much higher missingness when they actually drank
        if (!is.na(curr_alc_use) && curr_alc_use == 1) {
          miss_prob <- miss_prob + 0.25  # need this to get it to work, it's awkward
        }
        
        # Ensure probability is between 0 and 1
        miss_prob <- pmax(0, pmin(0.95, miss_prob))
        
        # Apply missingness
        if (runif(1) < miss_prob) {
          ema_data$alc.use[i] <- NA
        }
      }
    }
  }
  
  # Return the dataset
  return(ema_data)
}

# Simulate two datasets (one with missingness, one without)
ema_data_missing <- simulate_ema_data(missing = TRUE)
ema_data_complete <- simulate_ema_data(missing = FALSE)

# Calculate missingness percentage in alcohol use variable
missing_percentage <- ema_data_missing %>%
  filter(EMA.signal == 1) %>%
  summarise(missing_percent = mean(is.na(alc.use)) * 100) %>%
  pull(missing_percent)

# Print missingness percentage
cat("Percentage of missing data:", round(missing_percentage, 1), "%\n")
Percentage of missing data: 30.9 %

Summary

  • Missing data has biased the true relationship between mood and alcohol use
  • The odds ratio for mood effect is 1.4 in the complete data but 1.19 with missing data
  • This bias occurs because in my simulation:
    1. People who used alcohol night before are less likely to respond in the morning
    2. People are less likely to respond when in a bad mood
    3. Lower mood increases probability of alcohol use
  • MI can help reduce this bias (but I almost never see EMA studies use MI)

Multiple imputation example in EMA research

Percentage of missing data: 38.7 %
# Examine missingness patterns in our data
md.pattern(impute_data)

    PID study.day age neg.urgency extraversion conscientiousness agreeableness
858   1         1   1           1            1                 1             1
542   1         1   1           1            1                 1             1
      0         0   0           0            0                 0             0
    openness gender_f mood alc.use peer.context goal motivation     
858        1        1    1       1            1    1          1    0
542        1        1    0       0            0    0          0    5
           0        0  542     542          542  542        542 2710
# Create an initial mids object
ini <- mice(impute_data, maxit = 0, print = FALSE)

# Extract the default methods and predictor matrix
meth <- ini$meth
pred <- ini$pred

# Set appropriate multilevel imputation methods
meth["mood"] <- "2l.pmm"  # Multilevel predictive mean matching
meth["alc.use"] <- "2l.binary"  # Multilevel binary logistic
meth["peer.context"] <- "2l.binary"  
meth["goal"] <- "2l.pmm"  
meth["motivation"] <- "2l.pmm" 

print(meth)
              PID         study.day              mood           alc.use 
               ""                ""          "2l.pmm"       "2l.binary" 
     peer.context               age       neg.urgency      extraversion 
      "2l.binary"                ""                ""                "" 
conscientiousness     agreeableness          openness              goal 
               ""                ""                ""          "2l.pmm" 
       motivation          gender_f 
         "2l.pmm"                "" 
# Set PID as clustering variable (-2) to account for within-person dependence
pred[, "PID"] <- -2  # This designates PID as the cluster variable

# Specify predictor matrix
pred["alc.use", c("study.day", "mood", "peer.context", "age", "neg.urgency", 
                  "goal", "motivation", "gender_f")] <- 1
pred["alc.use", c("extraversion", "conscientiousness", "agreeableness", "openness")] <- 0
pred["mood", c("study.day", "alc.use", "peer.context", "age", "neg.urgency",
               "extraversion", "goal", "motivation", "gender_f")] <- 1
pred["mood", c("conscientiousness", "agreeableness", "openness")] <- 0
pred["peer.context", c("mood", "alc.use", "neg.urgency")] <- 1
pred["peer.context", c("study.day", "age", "extraversion", "conscientiousness",
                       "agreeableness", "openness", "goal", "motivation", "gender_f")] <- 0

print(pred["alc.use", ])
              PID         study.day              mood           alc.use 
               -2                 1                 1                 0 
     peer.context               age       neg.urgency      extraversion 
                1                 1                 1                 0 
conscientiousness     agreeableness          openness              goal 
                0                 0                 0                 1 
       motivation          gender_f 
                1                 1 
# Run the imputation
imp <- mice(impute_data, 
            m = 20,            
            maxit = 20,       
            method = meth,    
            predictorMatrix = pred,  
            seed = 1707,
            print = F)    
# Check convergence of the imputation algorithm
# This shows whether each chain has converged to a stable distribution
con_imp <- mice::convergence(imp)

# Examine PSRF (potential scale reduction factor) values
# PSRF = Rhat
# Values close to 1 indicate good convergence
print(con_imp[con_imp$vrb == "alc.use", ])
   .it     vrb           ac      psrf
61   1 alc.use           NA 0.9741973
62   2 alc.use -0.302604382 0.9970422
63   3 alc.use  0.116996522 0.9858374
64   4 alc.use  0.008032434 0.9874435
65   5 alc.use  0.052664661 0.9715946
66   6 alc.use  0.076052455 1.0906550
67   7 alc.use  0.051735709 1.0482012
68   8 alc.use  0.057073272 1.0046944
69   9 alc.use  0.089848615 0.9914499
70  10 alc.use  0.093281445 0.9907460
71  11 alc.use  0.101968167 1.0055585
72  12 alc.use  0.111693202 0.9927337
73  13 alc.use  0.112600346 1.0025160
74  14 alc.use  0.135027333 1.0129710
75  15 alc.use  0.141613603 1.0076589
76  16 alc.use  0.125800895 1.0036086
77  17 alc.use  0.143388783 1.0011747
78  18 alc.use  0.144191908 1.0038905
79  19 alc.use  0.125533013 1.0075405
80  20 alc.use  0.117900218 1.0026739
print(con_imp[con_imp$vrb == "mood", ])
   .it  vrb          ac      psrf
41   1 mood          NA 1.3610892
42   2 mood  0.23883190 1.3485760
43   3 mood  0.05207759 1.1375956
44   4 mood  0.03532566 1.0178350
45   5 mood -0.01633312 1.0018989
46   6 mood  0.02577737 0.9906877
47   7 mood  0.05218038 1.0554583
48   8 mood  0.03234683 1.0241945
49   9 mood  0.02543299 1.0015312
50  10 mood  0.07443303 1.0265345
51  11 mood  0.05896736 1.0268163
52  12 mood  0.07616777 0.9996484
53  13 mood  0.08942376 0.9937040
54  14 mood  0.08474352 1.0115785
55  15 mood  0.09326196 1.0290397
56  16 mood  0.07992721 1.0235071
57  17 mood  0.07136240 1.0187112
58  18 mood  0.10475139 1.0059539
59  19 mood  0.08641873 1.0076602
60  20 mood  0.08432917 1.0024021
# Plot convergence trace plots with increased iterations
plot(imp, c("alc.use", "mood"))

# Create density plots to compare distributions across imputations

# Complete the datasets into long format for easier plotting
imp_long <- mice::complete(imp, action = "long", include = TRUE)

# Create density plot for mood by imputation
p1 <- ggplot(data = dplyr::mutate(imp_long, `.imp` = as.factor(`.imp`)),
             mapping = aes(x = mood, color = `.imp`)) +
  geom_density() +
  facet_wrap(~ `.imp`) +
  #scale_color_manual(values = my_colors) +
  labs(color = "Imputation", 
       title = "Distribution of Mood Across Imputations",
       x = "Mood", 
       y = "Density") +
  theme_minimal() +
  theme(text = element_text(size = 12))

p1

# Create scatterplot to show relationship between key variables
# Focus on mood vs alcohol use for imputed vs observed values
imp1 <- complete(imp, 1)  # Get first imputed dataset

# Create dataset for plotting
scatter_data <- data.frame(
  mood = imp1$mood,
  alc_use = imp1$alc.use,
  is_imputed = is.na(impute_data$alc.use)
)

ggplot(scatter_data, aes(x = mood, y = alc_use, color = is_imputed)) +
  geom_jitter(height = 0.05, width = 0, alpha = 0.6) +
  labs(color = "Imputed Value", 
       title = "Relationship Between Mood and Alcohol Use",
       subtitle = "Comparing observed and imputed values",
       x = "Mood", 
       y = "Alcohol Use") +
  scale_color_manual(values = c("FALSE" = "blue", "TRUE" = "red"),
                     labels = c("FALSE" = "Observed", "TRUE" = "Imputed")) +
  theme_minimal() +
  theme(text = element_text(size = 12))

Odds ratios for mood effect on alcohol use:
Complete data (true):  1.41844 
Missing data (biased):  1.297017 
Imputed data (pooled):  1.426534 

Pooling results across imputed datasets

# Fit model to each imputed dataset 
imp_fits <- with(imp, glmer(alc.use ~ mood + (1 | PID), 
                           family = binomial(link = "logit")))

# Pool the results 
pooled_results <- pool(imp_fits)

# View the pooled results
summary(pooled_results)
         term   estimate  std.error statistic       df      p.value
1 (Intercept)  0.5970593 0.48578038  1.229072 13.31101 0.2403298948
2        mood -0.3577774 0.08957245 -3.994279 55.51241 0.0001927245

Frequentist Pooling: Ruben’s rules

  • Pooled estimate: simply the mean of estimates across imputations (assumes normality)
  • Total variance = within-imputation variance + between-imputation variance (with adjustment)
  • Degrees of freedom are adjusted based on fraction of missing information (FMI)
    • FMI = (between-imputation variance + between-imputation variance/m)/total variance
    • Higher FMI –> missingness has bigger effect on results (and thus lower df)
# Fit Bayesian model to each imputed dataset
# Using brms to fit Bayesian multilevel models
priors <- c(prior(normal(0, 0.5), class = "b"),
            prior(normal(0, 1.5), class = "Intercept"),
            prior(cauchy(0, 0.5), class = "sd"))


# Then run brm_multiple
bayesian_pooled <- brm_multiple(
 alc.use ~ mood + (1| PID),
 data = imp,   
 family = bernoulli(),
 prior = priors,
 chains = 4, cores = 4, iter = 4000,
 seed = 1707
)

# Summary of pooled Bayesian results
summary(bayesian_pooled)
 Family: bernoulli 
  Links: mu = logit 
Formula: alc.use ~ mood + (1 | PID) 
   Data: imp (Number of observations: 1400) 
  Draws: 80 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 160000

Group-Level Effects: 
~PID (Number of levels: 100) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.53      0.12     0.27     0.76 1.25      225      308

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.60      0.46    -0.30     1.23 3.23       89      128
mood         -0.35      0.09    -0.52    -0.19 1.45      155      646

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Bayesian Pooling: Combining posterior samples

  • Posterior samples from each imputed dataset are simply combined
  • Equal weights are given to each imputation
  • Automatically accounts for both within- and between-imputation variance
  • Provides one full posterior distribution for each parameter
  • More computationally intensive but conceptually simpler than frequentist pooling
# Bayesian joint modeling imputation
# Define model formulas with mi() notation
bform <- bf(mood | mi() ~ 1 + mi(alc.use) + study.day + peer.context + age + (1|PID)) +
  bf(alc.use | mi() ~ 1 + study.day + peer.context + age + (1|PID)) + 
  set_rescor(FALSE)

# Fit the joint model directly on the incomplete data
bayesian_pooled2 <- brm_multiple(bform, 
                       data = imp,
                       family = gaussian(),
                       chains = 4, cores = 4, iter = 4000,
                       seed = 1707
)

# Summary of joint model results
summary(bayesian_pooled2)
 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: mood | mi() ~ 1 + mi(alc.use) + study.day + peer.context + age + (1 | PID) 
         alc.use | mi() ~ 1 + study.day + peer.context + age + (1 | PID) 
   Data: imp (Number of observations: 1400) 
  Draws: 80 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 160000

Group-Level Effects: 
~PID (Number of levels: 100) 
                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(mood_Intercept)       0.47      0.05     0.37     0.57 1.17      309
sd(alcuse_Intercept)     0.11      0.02     0.06     0.15 1.26      221
                     Tail_ESS
sd(mood_Intercept)       1100
sd(alcuse_Intercept)      425

Population-Level Effects: 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mood_Intercept         -0.60      0.46    -1.49     0.19 1.61      132      699
alcuse_Intercept        0.46      0.15     0.15     0.73 1.75      120      200
mood_study.day         -0.00      0.01    -0.02     0.01 1.25      223      824
mood_peer.context       0.49      0.07     0.35     0.61 1.23      237      303
mood_age                0.01      0.01    -0.01     0.04 1.60      133      644
alcuse_study.day       -0.00      0.00    -0.01     0.00 1.13      368     1237
alcuse_peer.context     0.14      0.03     0.07     0.21 1.38      170      302
alcuse_age              0.00      0.00    -0.00     0.01 1.37      173      573
mood_mialc.use         -0.42      0.08    -0.57    -0.26 1.37      172      754

Family Specific Parameters: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_mood       0.94      0.02     0.89     0.99 1.30      199      341
sigma_alcuse     0.45      0.03     0.41     0.50 2.40       97      365

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Bayesian Joint Modeling Imputation

  • Handles imputation and analysis in a single step
  • This only works in Bayes as we can simultaneously sample from the posterior of missing values and parameters of interest
  • This incorporates uncertainty coming from different sources (sampling, parameter estimation, missing data, …)
  • Only works with linear models due to limitations in Stan

Things I didn’t mention so far

  • Passive imputation for scales & derived variables

    # Example: Creating a composite negative mood score from component items
    meth["mood.negative"] <- "~I(mood.angry + mood.unhappy + mood.sad + mood.anxious)"
    pred["mood.negative", c("mood.angry", "mood.unhappy", "mood.sad", "mood.anxious")] <- 1
  • Interactions in imputation models

    # Example: Including an interaction term in the imputation model
    meth["int1"] <- "~I(mood.negative*neg.urgency)"
    pred["int1", c("mood.negative", "neg.urgency")] <- 1
  • Convergence issues

    • Figure out which variable causes the issue
    • Increase maxit if chains don’t converge (default is often too low)
    • Consider different imputation methods
    • Trial & error {shrug}
  • Planning for MI at study conception

    • Collect auxiliary variables that predict both missingness and values
    • Consider planned missingness designs (don’t overdo it) to reduce participant burden
    • Add items that directly contain information about missingness