Joshua A. Goode – Introduction to SLCMA

Notes on Slide Navigation

Copying Code Blocks

All code blocks can be copied by clicking the clipboard icon in the upper right corner. If the icon is hidden, hovering your mouse cursor in the area should reveal it.

Scrolling Content

In some cases, content may extend beyond the limits of the slide. It may be necessary to scroll up/down or left/right.

Some Caveats

Let’s Keep it Simple, Silly

My goal is to focus largely on conceptual issues related to SLCMA and studies of DNA methylation. Although I briefly discuss the slcma R Package created by Dr. Andrew Smith, this is not meant to be a full tutorial.

I’m Just a Padawan

Dr. Andrew Smith is a Jedi Master. I learned everything I know about SLCMA from him. Some materials have been borrowed/adapted from his teaching. I am grateful and humbled by the opportunity to learn from him.

The Importance of Social Science

DNAm Research

Limited Focus in Biology
- Current Exposure
- Ever Exposed
Contribution of Social Science
- Several large panel studies
- Many years of data
- A plethora of data types
- Really interesting questions/theory

What We Need

A way to use our vast data to test hypotheses across the life course
- Systematic to avoid false-positive results
- Efficient to accommodate analysis of high-dimensional data
- Easy to use
SLCMA can help!

What is SLCMA?

Structured
Life
Course
Modeling
Approach

Which life course hypothesis best fits our data?

SLCMA Hypotheses

Big 3

Sensitive Periods
Accumulation
Recency

Others

Mobility
Change
Always Exposed
Ever Exposed

Hypotheses (Big 3)

Sensitive Periods \(\left(SP \text{ at } t_j\right)\)

An exposure has its strongest effect on the outcome when it occurs at a specific time point, due to heightened plasticity or reprogramming at that age
Just the exposure variable at each age
Can be continuous or binary
\[SP_j = x_j\]

Hypotheses (Big 3)

Accumulation \(\left(Acc\right)\)

Every additional time point of exposure affects the outcome in a dose-response manner, independent of the exposure timing
Add up the exposure variable across all time points
Can be continuous or binary
\[Acc = \sum_{j=1}^m{x_j}\]

Hypotheses (Big 3)

Recency \(\left(Rec\right)\)

More proximal exposures (closer in time to the measurement of the outcome) are more strongly linked to the outcome than are more distal exposures
Add up the products of each exposure variable multiplied by its age of observation
Can be continuous or binary
\[Rec = \sum_{j=1}^m{\left(x_jt_j\right)}\]
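As a minimal worked sketch of the three encodings, assume a hypothetical person exposed at ages 4 and 43 but not at age 26. The ages match the example data used later; the exposure pattern and variable names are invented for illustration.

```r
# hypothetical exposure history: exposed at ages 4 and 43, not at 26
x   <- c(1, 0, 1)     # x_j, binary exposure at each time point
age <- c(4, 26, 43)   # t_j, age at each observation

sp  <- x              # sensitive periods: one predictor per time point
acc <- sum(x)         # accumulation: number of exposed time points
rec <- sum(x * age)   # recency: exposures weighted by age of observation

acc  # 2
rec  # 47
```

Later exposures get larger weights in `rec`, so the age-43 exposure contributes 43 of the 47 points here.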

SLCMA Steps

Fit a regression model for each single life course hypothesis of interest, as well as groups of compound hypotheses

Measure the goodness-of-fit of each model and select the best one

Calculate appropriate p-values for the selected model

Simulate Example Data

Variable	Description
`y`	Outcome
`sp04`	Binary Exposure (\(Age = 4\))
`sp26`	Binary Exposure (\(Age = 26\))
`sp43`	Binary Exposure (\(Age = 43\))
`acc`	Accumulation
`rec`	Recency

Simulate Example Data

# set seed
set.seed(1234)

# simulate exposure & outcome data
n <- c(141, 20, 88, 80, 40, 35, 317, 367)
y <- c(28.7, 27.5, 29.1, 27.8, 28.3, 27.1, 27.6, 26.0)
se <- c(0.5, 1.1, 0.7, 0.7, 0.8, 0.9, 0.3, 0.2)
sp04 <- rep(c(0, 1, 0, 0, 1, 1, 0, 1), times = n)
sp26 <- rep(c(0, 0, 1, 0, 1, 0, 1, 1), times = n)
sp43 <- rep(c(0, 0, 0, 1, 0, 1, 1, 1), times = n)
e <- lm(rnorm(sum(n)) ~ sp04 * sp26 * sp43)$residuals
y <- rep(y, n) + rep(se * sqrt(n), n) * e / sd(e)

# construct accumulation & recency measures
acc <- sp04 + sp26 + sp43
rec <- (sp04 * 4) + (sp26 * 26) + (sp43 * 43)

# create data frame
dats_bin <- data.frame(cbind(y, sp04, sp26, sp43, acc, rec))

# clean up
rm(n, y, se, sp04, sp26, sp43, acc, rec, e)

Step 1: Fit Models

Which single hypothesis best fits our data?

model_sp04 <- lm(y ~ sp04, data = dats_bin)
model_sp26 <- lm(y ~ sp26, data = dats_bin)
model_sp43 <- lm(y ~ sp43, data = dats_bin)
model_acc <- lm(y ~ acc, data = dats_bin)
model_rec <- lm(y ~ rec, data = dats_bin)

Step 1: Fit Models

Which single hypothesis best fits our data?

Hyp.	Coeff.	R²
SP₀₄	-1.737	0.026
SP₂₆	-1.075	0.008
SP₄₃	-1.820	0.023
Acc	-0.964	0.034
Rec	-0.032	0.025
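A table like the one above could be assembled along the following lines. This is a sketch rather than the code behind the slide. The first lines rebuild `dats_bin` so the block runs on its own (the vector of target means is renamed `mu` here), and the names `hyps`, `fits`, and `single_fits` are my own.

```r
# rebuild the simulated data so this block runs on its own
set.seed(1234)
n  <- c(141, 20, 88, 80, 40, 35, 317, 367)
mu <- c(28.7, 27.5, 29.1, 27.8, 28.3, 27.1, 27.6, 26.0)
se <- c(0.5, 1.1, 0.7, 0.7, 0.8, 0.9, 0.3, 0.2)
sp04 <- rep(c(0, 1, 0, 0, 1, 1, 0, 1), times = n)
sp26 <- rep(c(0, 0, 1, 0, 1, 0, 1, 1), times = n)
sp43 <- rep(c(0, 0, 0, 1, 0, 1, 1, 1), times = n)
e <- lm(rnorm(sum(n)) ~ sp04 * sp26 * sp43)$residuals
y <- rep(mu, n) + rep(se * sqrt(n), n) * e / sd(e)
dats_bin <- data.frame(y, sp04, sp26, sp43,
                       acc = sp04 + sp26 + sp43,
                       rec = 4 * sp04 + 26 * sp26 + 43 * sp43)

# one predictor per single hypothesis
hyps <- c("sp04", "sp26", "sp43", "acc", "rec")

# fit y ~ <hypothesis> for each one
fits <- lapply(hyps, function(h) {
  lm(reformulate(h, response = "y"), data = dats_bin)
})

# collect the slope and R-squared from each fit
single_fits <- data.frame(
  hyp   = hyps,
  coeff = sapply(fits, function(m) unname(coef(m)[2])),
  r2    = sapply(fits, function(m) summary(m)$r.squared)
)
print(single_fits, digits = 3)
```

Because the noise term is built from the residuals of a saturated model, the mean of `y` in each exposure pattern equals its target value, so the slopes depend only on the targets and group sizes.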

Step 1: Fit Models

Which compound hypothesis best fits our data?

model_acc_sp04 <- lm(y ~ acc + sp04, data = dats_bin)
model_acc_sp26 <- lm(y ~ acc + sp26, data = dats_bin)
model_acc_sp43 <- lm(y ~ acc + sp43, data = dats_bin)
model_acc_rec <- lm(y ~ acc + rec, data = dats_bin)

Step 1: Fit Models

Which compound hypothesis best fits our data?

Hyp.	Coeff.	R²
Acc + SP₀₄	-0.731	0.036
Acc + SP₂₆	-1.389	0.039
Acc + SP₄₃	-0.836	0.034
Acc + Rec	-1.179	0.034

Step 1: Fit Models

Which compound hypothesis best fits our data?

model_acc_sp26_sp04 <- lm(y ~ acc + sp26 + sp04, data = dats_bin)
model_acc_sp26_sp43 <- lm(y ~ acc + sp26 + sp43, data = dats_bin)
model_acc_sp26_rec <- lm(y ~ acc + sp26 + rec, data = dats_bin)

Step 1: Fit Models

Which compound hypothesis best fits our data?

Hyp.	Coeff.	R²
Acc + SP₂₆ + SP₀₄	-1.380	0.039
Acc + SP₂₆ + SP₄₃	-1.396	0.039
Acc + SP₂₆ + Rec	-1.397	0.039

Step 2: Compare Model Fit

Are compound hypotheses improving our model?

Model	Hypothesis	R²
1	Acc	0.034
2	Acc + SP₂₆	0.039
3	Acc + SP₂₆ + SP₀₄	0.039
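One way to look at this comparison is an elbow plot of R² against the number of hypotheses in the model. The sketch below uses only the three R² values from the table above; the data frame and column names are my own.

```r
# R-squared values copied from the table above
fit <- data.frame(
  n_hyp = 1:3,
  hyp   = c("Acc", "Acc + SP26", "Acc + SP26 + SP04"),
  r2    = c(0.034, 0.039, 0.039)
)

# gain in R-squared from each added hypothesis
fit$gain <- c(fit$r2[1], diff(fit$r2))
print(fit)

# elbow plot: look for the point where the curve flattens
plot(fit$n_hyp, fit$r2, type = "b", xaxt = "n",
     xlab = "Number of hypotheses in model", ylab = expression(R^2))
axis(1, at = fit$n_hyp)
```

The gains after the first hypothesis are small (0.005, then 0.000), which is consistent with carrying Acc alone forward to Step 3.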

Step 3: Calculate Correct p-Value

final_model <- summary(lm(y ~ acc, data = dats_bin))
print(final_model)


Call:
lm(formula = y ~ acc, data = dats_bin)

Residuals:
    Min      1Q  Median      3Q     Max 
-21.771  -3.318  -0.039   3.160  20.029 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  29.1837     0.3376  86.433  < 2e-16 ***
acc          -0.9642     0.1566  -6.158 1.04e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.215 on 1086 degrees of freedom
Multiple R-squared:  0.03373,   Adjusted R-squared:  0.03285 
F-statistic: 37.92 on 1 and 1086 DF,  p-value: 1.039e-09

Step 3: Calculate Correct p-Value

The \(p\)-value in our output \(\left(p = 1.04 \times 10^{-9}\right)\) is incorrect.

It assumes we tested only a single hypothesis when we actually tested five.

The easiest way to address this is with a Bonferroni correction, which multiplies the \(p\)-value by the number of hypotheses tested. \[5\left(1.04 \times 10^{-9}\right) \approx 5.2 \times 10^{-9}\]
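The same correction can be done in base R with `p.adjust()`. In this short sketch the unrounded p-value is taken from the `lm` output above, and the variable names are my own.

```r
p_naive <- 1.039e-09   # p-value reported for acc in the lm output
n_tests <- 5           # single hypotheses tested: SP04, SP26, SP43, Acc, Rec

# Bonferroni: multiply by the number of tests, capped at 1
p_bonf <- p.adjust(p_naive, method = "bonferroni", n = n_tests)
p_bonf  # 5.195e-09
```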

`slcma` R Package

A Better Way

As James Ingram said, “There’s gotta be a better way.”
The slcma R Package simplifies everything.

Please Cite the Package

Smith, A. (2024). SLCMA: Structured Approach to Evaluating Life-Course Hypotheses (SLCMA). https://www.r-project.org

`slcma` R Package

SLCMA Steps w/ Package

Fit a regression model for each single life course hypothesis of interest, as well as groups of compound hypotheses
- Manually: Fit a model for each hypothesis, as well as compound hypotheses
- Package: Uses LARS to fit each model

`slcma` R Package

SLCMA Steps w/ Package

Measure the goodness-of-fit of each model and select the best one
- Manually: Create an elbow plot
- Package: Generates an elbow plot with a single command

`slcma` R Package

SLCMA Steps w/ Package

Calculate appropriate p-values for the selected model
- Manually: Apply a Bonferroni correction
- Package: Uses fixed LASSO inference or max-|t| test to correct p-values

Installing the Package

The slcma package is available on GitHub
Can be installed with the install_github() command from the remotes package

# install the package
remotes::install_github("thedunnlab/slcma")

# load the package
library(slcma)

Step 1: Fit Models

slcma_model <- slcma(y ~ sp04 + sp26 + sp43 +
                       Accumulation(sp04, sp26, sp43) +
                       Recency(weights = c(4, 26, 43), sp04, sp26, sp43),
                     data = dats_bin)

                                              Term                             Role
                                       (Intercept)       Adjusted for in all models
                                              sp04 Available for variable selection
                                              sp26 Available for variable selection
                                              sp43 Available for variable selection
                    Accumulation(sp04, sp26, sp43) Available for variable selection
 Recency(weights = c(4, 26, 43), sp04, sp26, sp43) Available for variable selection

Step 2: Compare Model Fit

summary(slcma_model)


Summary of LARS procedure
 Step              Variable selected Variable removed Variables R-squared
    0                                                         0     0.000
    1 Accumulation(sp04, sp26, sp43)                          1     0.022
    2                           sp04                          2     0.029
    3                           sp43                          3     0.039

Step 2: Compare Model Fit

plot(slcma_model)

Step 3: Calculate Correct p-Value

slcmaInfer(slcma_model, 1, method = "slcmaFLI")


Inference for model at Step 1 of LARS procedure

Number of selected variables: 1
R-squared from lasso fit: 0.022

Results from fixed lasso inference (selective inference):

Standard deviation of noise (specified or estimated) sigma = 5.210

Testing results at lambda = 18.578, with alpha = 0.050

                                 Coef P-value  CI.lo CI.up LoTailArea UpTailArea
Accumulation(sp04, sp26, sp43) -0.964       0 -1.272 -0.63      0.024      0.024

SLCMA for Methylation Data

Clocks/Surrogates
- Just basic outcome variables
- Easy to do

Differentially Methylated Probes (EWAS)
- SLCMA for each probe
- A lot of models
  - \(\text{5 Hypotheses} \times \text{850,000 Probes} \approx \text{4,250,000 Models}\)
  - Package is super helpful!

Post-Selection Inference Methods

Naive Calculation

Inflated FWER
Biased p-Values
Fast Computation

Bonferroni Correction

FWER Controlled
Unbiased p-Values
Fast Computation
Overly Conservative

Fixed LASSO Inference

FWER Controlled
Unbiased p-Values
Slow Computation

Max-|t| Test

FWER Controlled
Unbiased p-Values
Slow Computation
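The difference between the naive calculation and the Bonferroni correction can be seen in a small null simulation. This is a sketch under invented settings: the sample size, number of replicates, and exposure prevalence of 0.4 are assumptions, and the outcome is pure noise. It picks the best of the five single hypotheses each time and checks how often that winner looks significant.

```r
set.seed(42)
n_obs <- 200; n_sim <- 500; alpha <- 0.05

# smallest p-value across the five single-hypothesis models, with y pure noise
min_p <- replicate(n_sim, {
  sp04 <- rbinom(n_obs, 1, 0.4)
  sp26 <- rbinom(n_obs, 1, 0.4)
  sp43 <- rbinom(n_obs, 1, 0.4)
  X <- cbind(sp04, sp26, sp43,
             acc = sp04 + sp26 + sp43,
             rec = 4 * sp04 + 26 * sp26 + 43 * sp43)
  y <- rnorm(n_obs)   # null: no exposure affects the outcome
  # Pearson cor.test gives the same p-value as the slope test in lm(y ~ x)
  min(apply(X, 2, function(x) cor.test(x, y)$p.value))
})

# share of simulated datasets in which the selected hypothesis is "significant"
fwer <- c(naive      = mean(min_p < alpha),       # winner's raw p-value
          bonferroni = mean(min_p * 5 < alpha))   # multiplied by number of tests
fwer
```

In runs like this the naive rate is expected to exceed α, while the Bonferroni rate should sit close to or below it. Fixed LASSO inference and the max-|t| test are not reproduced here.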

Post-Selection Inference Methods

Fixed LASSO Inference

Uses selectiveInference Package
2-Tail CIs; 1-Tail p-Values
Strange Warnings

Max-|t| Test

“Baked into” slcma Package
CIs Very Slow
No Compound Hypotheses

Post-Selection Inference Methods

When the true hypothesis is compound, the power to select a single hypothesis is greater for the Max-|t| Test than for Fixed LASSO Inference.

Additional Considerations

M-Values vs. Beta Values

M-Values have better statistical properties
Beta Values are easier to interpret
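The two scales are related by a base-2 logit transform, which is the standard definition of an M-value. In this small sketch the helper names `beta_to_m` and `m_to_beta` are my own.

```r
# convert between beta values (proportion methylated, 0 to 1) and M-values
beta_to_m <- function(beta) log2(beta / (1 - beta))
m_to_beta <- function(m) 2^m / (2^m + 1)

beta <- c(0.1, 0.5, 0.9)
m <- beta_to_m(beta)
m              # negative below 0.5, zero at 0.5, positive above
m_to_beta(m)   # recovers the original beta values
```

A practical pattern is to run the models on M-values and convert effect estimates back to the beta scale when reporting.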

Parallel Processing

SLCMA to detect differentially methylated probes is computationally intensive
Parallel processing is key
Best approach will depend on your cluster
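As a rough sketch of the idea, the per-probe work can be spread across cores with the base `parallel` package. Everything here is a stand-in: the exposures and methylation matrix are simulated noise, the sizes are toy values, and `best_hyp()` is a hypothetical helper that only ranks single hypotheses by R². It does not replace the `slcma` package or its inference step.

```r
library(parallel)

set.seed(1)
n_obs <- 200; n_probes <- 50   # toy sizes; a real array has ~850,000 probes

# hypothetical binary exposures and derived hypotheses
expo <- data.frame(sp04 = rbinom(n_obs, 1, 0.4),
                   sp26 = rbinom(n_obs, 1, 0.4),
                   sp43 = rbinom(n_obs, 1, 0.4))
expo$acc <- expo$sp04 + expo$sp26 + expo$sp43
expo$rec <- 4 * expo$sp04 + 26 * expo$sp26 + 43 * expo$sp43

# hypothetical methylation matrix (probes in columns), pure noise
meth <- matrix(rnorm(n_obs * n_probes), nrow = n_obs,
               dimnames = list(NULL, paste0("probe", seq_len(n_probes))))

# for one probe: R-squared of each single-hypothesis model, keep the best
# (for a single predictor, R-squared equals the squared correlation)
best_hyp <- function(j) {
  r2 <- sapply(names(expo), function(h) cor(meth[, j], expo[[h]])^2)
  data.frame(probe = colnames(meth)[j],
             hyp   = names(which.max(r2)),
             r2    = max(r2))
}

# forking is unavailable on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- do.call(rbind, mclapply(seq_len(n_probes), best_hyp, mc.cores = cores))
head(res)
```

On a cluster, the same per-probe function could instead be applied to chunks of probes submitted as separate jobs; which option works best depends on the scheduler.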

Additional Resources

Basic Introduction to SLCMA

Smith, B. J., Smith, A. D. A. C., & Dunn, E. C. (2022). Statistical Modeling of Sensitive Period Effects Using the Structured Life Course Modeling Approach (SLCMA). In S. L. Andersen (Ed.), Sensitive Periods of Brain Development and Preventive Interventions (pp. 215–234). Springer International Publishing. https://doi.org/10.1007/7854_2021_280

SLCMA for High-Throughput Analysis

Zhu, Y., Simpkin, A. J., Suderman, M. J., Lussier, A. A., Walton, E., Dunn, E. C., & Smith, A. D. A. C. (2020). A Structured Approach to Evaluating Life-Course Hypotheses: Moving Beyond Analyses of Exposed Versus Unexposed in the -Omics Context. American Journal of Epidemiology, 190(6), 1101–1112. https://doi.org/10.1093/aje/kwaa246

SLCMA DNA Methylation Example

Lussier, A. A., Zhu, Y., Smith, B. J., Cerutti, J., Fisher, J., Melton, P. E., Wood, N. M., Cohen-Woods, S., Huang, R.-C., Mitchell, C., Schneper, L., Notterman, D. A., Simpkin, A. J., Smith, A. D. A. C., Suderman, M. J., Walton, E., Relton, C. L., Ressler, K. J., & Dunn, E. C. (2023). Association between the timing of childhood adversity and epigenetic patterns across childhood and adolescence: Findings from the Avon Longitudinal Study of Parents and Children (ALSPAC) prospective cohort. The Lancet Child & Adolescent Health, 7(8), 532–543. https://doi.org/10.1016/S2352-4642(23)00127-X

Thank You

Please feel free to reach out with questions.
If there is interest, I can put together a video tutorial that dives a bit more in depth.

https://bit.ly/jagoode
