MethylSurroGetR

Tools to Generate Predicted Values of DNAm Surrogates in R

Joshua A. Goode
Trey Smith

June 19, 2025

Some Quick Notes

Notes on Slide Navigation

Copying Code Blocks

All code blocks can be copied by clicking the clipboard icon in the upper right corner. If the icon is hidden, hovering your mouse cursor in the area should reveal it.

Scrolling Content

In some cases, content may extend beyond the limits of the slide. It may be necessary to scroll up/down or left/right.

🚧 Under Development 🚧

This package is still in development and not yet ready for general use.

Proceed with caution!

What is MethylSurroGetR?

Simple set of user-friendly functions for generating predicted values from existing DNA methylation surrogates

  • Data Management
  • Handling Missing Data
  • Generating Estimates

What is MethylSurroGetR?

  • Does not develop surrogates
  • Existing packages already address well-known clocks
  • Fills the gap for recently published and/or less well-known surrogates
  • Allows construction of MRSs/PEGs from EWAS results
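
As a rough illustration of the last point, a weights vector for a methylation risk score could be assembled from EWAS summary statistics along these lines. The data frame, its column names (probe, beta, p), and the p-value threshold are all made up for this sketch; they are not part of the package or of any published EWAS.

```r
# Made-up EWAS summary statistics, for illustration only
ewas <- data.frame(
  probe = c("cg02", "cg07", "cg08", "cg13"),
  beta  = c(-0.0091, -0.0076, 0.0012, -0.0074),
  p     = c(1e-9, 3e-8, 0.20, 5e-10)
)

# Assumed significance threshold; choose one appropriate to the study
p_threshold <- 1e-7

# Keep probes that pass the threshold and use their effect sizes as weights
hits <- ewas[ewas$p < p_threshold, ]
mrs_weights <- setNames(hits$beta, hits$probe)

print(mrs_weights)
```

The result is a named numeric vector, which is the shape the weights argument of surro_set() is described as expecting.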

Install & Load

# install.packages("remotes")
remotes::install_github("jagoode27/MethylSurroGetR")

library(MethylSurroGetR)

Load Example Methylation Data

data("beta_matrix_miss", package = "MethylSurroGetR")
print(beta_matrix_miss)
         samp1      samp2      samp3     samp4      samp5
cg01 0.1028646 0.32037324 0.48290240        NA 0.36948887
cg02 0.2875775         NA         NA 0.8830174         NA
cg04 0.4348927 0.18769112 0.89035022 0.1422943 0.98421920
cg05 0.9849570 0.78229430 0.91443819 0.5492847 0.15420230
cg07 0.8998250 0.24608773         NA 0.3279207 0.95450365
cg08 0.8895393 0.69280341 0.64050681 0.9942698 0.65570580
cg09        NA 0.09359499 0.60873498 0.9540912         NA
cg10 0.8864691         NA         NA 0.5854834 0.14190691
cg12 0.1750527         NA 0.14709469        NA 0.69000710
cg13 0.9630242 0.90229905 0.69070528 0.7954674 0.02461368
cg14 0.1306957         NA 0.93529980 0.6478935         NA
cg16 0.6531019 0.33282354 0.30122890 0.3198206 0.89139412
cg17 0.1428000 0.41454634 0.41372433 0.3688455 0.15244475
cg19 0.3435165 0.48861303 0.06072057 0.3077200         NA
cg20 0.6567581 0.95447383 0.94772694 0.2197676         NA

Load Example Surrogate Weights

data("wts_vec_lin", package = "MethylSurroGetR")
print(wts_vec_lin)
        cg02         cg03         cg06         cg07         cg08         cg11 
-0.009083377 -0.001155999  0.005978497 -0.007562015  0.001218960 -0.005869372 
        cg13         cg15         cg17         cg18    Intercept 
-0.007449367  0.005066157  0.007900907 -0.002510744  1.211000000 

Create methyl_surro Object

surro_set(methyl, weights, intercept = NULL)
  • methyl: Numeric matrix of methylation data
    • CpG sites as row names
    • sample IDs as column names
  • weights: Named numeric vector of surrogate weights
  • intercept: Optional character string to identify the name of the intercept in the weights object

Create methyl_surro Object

lin_surrogate <- surro_set(methyl = beta_matrix_miss,
                           weights = wts_vec_lin,
                           intercept = "Intercept")
print(lin_surrogate)
$methyl
         samp1     samp2     samp3     samp4      samp5
cg02 0.2875775        NA        NA 0.8830174         NA
cg07 0.8998250 0.2460877        NA 0.3279207 0.95450365
cg08 0.8895393 0.6928034 0.6405068 0.9942698 0.65570580
cg13 0.9630242 0.9022990 0.6907053 0.7954674 0.02461368
cg17 0.1428000 0.4145463 0.4137243 0.3688455 0.15244475
cg03        NA        NA        NA        NA         NA
cg06        NA        NA        NA        NA         NA
cg11        NA        NA        NA        NA         NA
cg15        NA        NA        NA        NA         NA
cg18        NA        NA        NA        NA         NA

$weights
        cg02         cg03         cg06         cg07         cg08         cg11 
-0.009083377 -0.001155999  0.005978497 -0.007562015  0.001218960 -0.005869372 
        cg13         cg15         cg17         cg18 
-0.007449367  0.005066157  0.007900907 -0.002510744 

$intercept
Intercept 
    1.211 

attr(,"class")
[1] "methyl_surro"
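
The alignment visible in the output above (intercept split off, rows restricted to probes in the weights, all-NA rows added for probes absent from the data) can be approximated in base R. This is only a sketch of the idea on a small subset of the example values; align_to_weights() is a hypothetical helper, not the package's implementation.

```r
# Small subset of the example data shown above
methyl <- matrix(
  c(0.1028646, 0.32037324,
    0.2875775, NA,
    0.8998250, 0.24608773),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cg01", "cg02", "cg07"), c("samp1", "samp2"))
)
weights <- c(cg02 = -0.009083377, cg03 = -0.001155999,
             cg07 = -0.007562015, Intercept = 1.211)

# Hypothetical helper: mimics the structure printed above
align_to_weights <- function(methyl, weights, intercept = NULL) {
  int_val <- NULL
  if (!is.null(intercept)) {
    # Separate the intercept from the probe weights
    int_val <- weights[intercept]
    weights <- weights[names(weights) != intercept]
  }
  probes  <- names(weights)
  present <- probes[probes %in% rownames(methyl)]
  absent  <- setdiff(probes, present)
  # All-NA placeholder rows for probes that are not in the data
  na_rows <- matrix(NA_real_, nrow = length(absent), ncol = ncol(methyl),
                    dimnames = list(absent, colnames(methyl)))
  list(methyl    = rbind(methyl[present, , drop = FALSE], na_rows),
       weights   = weights,
       intercept = int_val)
}

aligned <- align_to_weights(methyl, weights, intercept = "Intercept")
print(aligned$methyl)
```

Here cg01 is dropped because it carries no weight, and cg03 appears as an all-NA row because it has a weight but no data.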

Two Types of Missing Values

  • Missing Observations
    • probes present in target data
    • missing for some samples
  • Missing Probes
    • probes not present in target data
    • missing for all samples
      • removed during QC
      • not on chip

Two Types of Missing Values

     samp1 samp2 samp3 samp4 samp5
cg02 0.288    NA    NA 0.883    NA
cg07 0.900 0.246    NA 0.328 0.955
cg08 0.890 0.693 0.641 0.994 0.656
cg13 0.963 0.902 0.691 0.795 0.025
cg17 0.143 0.415 0.414 0.369 0.152
cg03    NA    NA    NA    NA    NA
cg06    NA    NA    NA    NA    NA
cg11    NA    NA    NA    NA    NA
cg15    NA    NA    NA    NA    NA
cg18    NA    NA    NA    NA    NA

Check for Missing Values

methyl_miss(methyl_surro)
  • methyl_surro: methyl_surro object created with surro_set()

Check for Missing Values

missing <- methyl_miss(methyl_surro = lin_surrogate)
print(missing)
$missing_obs
cg02 cg07 
 0.6  0.2 

$missing_probes
[1] "cg03" "cg06" "cg11" "cg15" "cg18"
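
The two kinds of missingness can be told apart from the per-probe proportion of NA values: a probe with some but not all values missing has missing observations, and a probe with every value missing is a missing probe. The following base R sketch uses a few rows of the example matrix and is an illustration of that distinction, not the code behind methyl_miss().

```r
# A few rows from the aligned example matrix shown earlier
methyl <- rbind(
  cg02 = c(0.2875775, NA, NA, 0.8830174, NA),
  cg07 = c(0.8998250, 0.2460877, NA, 0.3279207, 0.95450365),
  cg08 = c(0.8895393, 0.6928034, 0.6405068, 0.9942698, 0.65570580),
  cg03 = rep(NA_real_, 5)
)
colnames(methyl) <- paste0("samp", 1:5)

# Proportion of samples with a missing value, per probe
prop_missing <- rowMeans(is.na(methyl))

# Partially missing probes: missing observations
missing_obs <- prop_missing[prop_missing > 0 & prop_missing < 1]

# Fully missing probes: missing probes
missing_probes <- names(prop_missing)[prop_missing == 1]

print(missing_obs)
print(missing_probes)
```

For cg02 and cg07 the proportions are 3/5 and 1/5, which agrees with the 0.6 and 0.2 reported above.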

Impute Missing Observations

impute_obs(methyl_surro,
           method = c("mean", "median"),
           min_nonmiss_prop = 0)
  • methyl_surro: methyl_surro object
  • method: Character string indicating the imputation method
    • Current options are “mean” or “median”
    • Currently developing KNN and weighted KNN options
  • min_nonmiss_prop: Optional minimum proportion of non-missing data required in a probe for imputation to proceed

Impute Missing Observations

lin_surrogate <- impute_obs(methyl_surro = lin_surrogate,
                            method = "mean",
                            min_nonmiss_prop = 0)
print(lin_surrogate)
$methyl
         samp1     samp2     samp3     samp4      samp5
cg02 0.2875775 0.5852975 0.5852975 0.8830174 0.58529746
cg07 0.8998250 0.2460877 0.6070843 0.3279207 0.95450365
cg08 0.8895393 0.6928034 0.6405068 0.9942698 0.65570580
cg13 0.9630242 0.9022990 0.6907053 0.7954674 0.02461368
cg17 0.1428000 0.4145463 0.4137243 0.3688455 0.15244475
cg03        NA        NA        NA        NA         NA
cg06        NA        NA        NA        NA         NA
cg11        NA        NA        NA        NA         NA
cg15        NA        NA        NA        NA         NA
cg18        NA        NA        NA        NA         NA

$weights
        cg02         cg03         cg06         cg07         cg08         cg11 
-0.009083377 -0.001155999  0.005978497 -0.007562015  0.001218960 -0.005869372 
        cg13         cg15         cg17         cg18 
-0.007449367  0.005066157  0.007900907 -0.002510744 

$intercept
Intercept 
    1.211 

attr(,"class")
[1] "methyl_surro"
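
Mean imputation here works row by row: each missing observation is replaced by the mean of the non-missing values for the same probe, and fully missing probes are left alone. The sketch below reproduces that behavior in base R. impute_row_mean() is a hypothetical helper, and its handling of min_nonmiss_prop (skip probes whose non-missing proportion is below the threshold) is an assumption based on the argument description.

```r
methyl <- rbind(
  cg02 = c(0.2875775, NA, NA, 0.8830174, NA),
  cg07 = c(0.8998250, 0.2460877, NA, 0.3279207, 0.95450365),
  cg03 = rep(NA_real_, 5)
)
colnames(methyl) <- paste0("samp", 1:5)

# Hypothetical helper: row-wise mean imputation
impute_row_mean <- function(methyl, min_nonmiss_prop = 0) {
  for (probe in rownames(methyl)) {
    x <- methyl[probe, ]
    nonmiss_prop <- mean(!is.na(x))
    # Skip fully missing probes and probes below the threshold
    if (nonmiss_prop > 0 && nonmiss_prop >= min_nonmiss_prop) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      methyl[probe, ] <- x
    }
  }
  methyl
}

imputed <- impute_row_mean(methyl)
print(imputed)

# With a stricter threshold, cg02 (40% non-missing) is left untouched
strict <- impute_row_mean(methyl, min_nonmiss_prop = 0.5)
print(strict)
```

The imputed values for cg02 and cg07 agree with the package output above to the printed precision.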

Fill Missing Probes

reference_fill(
  methyl_surro,
  reference,
  type = c("probes", "obs", "all")
)
  • methyl_surro: methyl_surro object
  • reference: Named numeric vector of methylation reference values
  • type: Character string to identify which probes to fill

Fill Missing Probes

data("ref_vec_mean", package = "MethylSurroGetR")
print(ref_vec_mean)
     cg01      cg02      cg03      cg04      cg05      cg06      cg07      cg08 
0.3992451 0.6616689 0.4948262 0.5278895 0.6770353 0.5526592 0.4940793 0.7745650 
     cg09      cg10      cg11      cg12      cg13      cg14      cg15      cg16 
0.5281033 0.4982656 0.4566024 0.3856340 0.6752219 0.5866269 0.4004940 0.4996738 
     cg17      cg18      cg19      cg20 
0.2984722 0.3923206 0.3747138 0.7031609 

Fill Missing Probes

lin_surrogate <- reference_fill(methyl_surro = lin_surrogate,
                                reference = ref_vec_mean,
                                type = "probes")
print(lin_surrogate)
$methyl
         samp1     samp2     samp3     samp4      samp5
cg02 0.2875775 0.5852975 0.5852975 0.8830174 0.58529746
cg07 0.8998250 0.2460877 0.6070843 0.3279207 0.95450365
cg08 0.8895393 0.6928034 0.6405068 0.9942698 0.65570580
cg13 0.9630242 0.9022990 0.6907053 0.7954674 0.02461368
cg17 0.1428000 0.4145463 0.4137243 0.3688455 0.15244475
cg03 0.4948262 0.4948262 0.4948262 0.4948262 0.49482616
cg06 0.5526592 0.5526592 0.5526592 0.5526592 0.55265924
cg11 0.4566024 0.4566024 0.4566024 0.4566024 0.45660238
cg15 0.4004940 0.4004940 0.4004940 0.4004940 0.40049405
cg18 0.3923206 0.3923206 0.3923206 0.3923206 0.39232059

$weights
        cg02         cg03         cg06         cg07         cg08         cg11 
-0.009083377 -0.001155999  0.005978497 -0.007562015  0.001218960 -0.005869372 
        cg13         cg15         cg17         cg18 
-0.007449367  0.005066157  0.007900907 -0.002510744 

$intercept
Intercept 
    1.211 

attr(,"class")
[1] "methyl_surro"
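
With type = "probes", every sample receives the same reference value for a probe that is missing entirely, as the constant rows above show. A base R sketch of that step follows, using three samples and three probes from the example. fill_missing_probes() is a hypothetical helper written for illustration only.

```r
methyl <- rbind(
  cg02 = c(0.2875775, 0.5852975, 0.5852975),
  cg03 = rep(NA_real_, 3),
  cg06 = rep(NA_real_, 3)
)
colnames(methyl) <- paste0("samp", 1:3)

# Reference values taken from ref_vec_mean as printed above
reference <- c(cg02 = 0.6616689, cg03 = 0.4948262, cg06 = 0.5526592)

# Hypothetical helper: fill only probes that are missing for all samples
fill_missing_probes <- function(methyl, reference) {
  all_na   <- rowSums(!is.na(methyl)) == 0
  fillable <- intersect(rownames(methyl)[all_na], names(reference))
  for (probe in fillable) {
    methyl[probe, ] <- reference[[probe]]
  }
  methyl
}

filled <- fill_missing_probes(methyl, reference)
print(filled)
```

Observed values, such as those for cg02, are not overwritten by the reference.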

Estimate Surrogate Values

surro_calc(methyl_surro,
           transform = c("linear", "count", "probability"))
  • methyl_surro: methyl_surro object
  • transform: Character string specifying the transformation to apply
    • "linear": For surrogates estimated with Gaussian models
    • "count": For surrogates estimated with Poisson models
    • "probability": For surrogates estimated with binomial models

Estimate Surrogate Values

estimates <- surro_calc(methyl_surro = lin_surrogate,
                        transform = "linear")
print(estimates)
   samp1    samp2    samp3    samp4    samp5 
1.197718 1.202317 1.201093 1.199796 1.201382 
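
The linear estimate is the intercept plus the weighted sum of the methylation values. The sketch below recomputes it for samp1 from the values printed above. The count and probability lines assume the usual inverse link functions for Poisson and binomial models (exponential and inverse logit); the slides do not confirm that surro_calc() uses exactly these.

```r
weights <- c(cg02 = -0.009083377, cg03 = -0.001155999, cg06 =  0.005978497,
             cg07 = -0.007562015, cg08 =  0.001218960, cg11 = -0.005869372,
             cg13 = -0.007449367, cg15 =  0.005066157, cg17 =  0.007900907,
             cg18 = -0.002510744)
intercept <- 1.211

# samp1 column after imputation and reference filling
samp1 <- c(cg02 = 0.2875775, cg07 = 0.8998250, cg08 = 0.8895393,
           cg13 = 0.9630242, cg17 = 0.1428000, cg03 = 0.4948262,
           cg06 = 0.5526592, cg11 = 0.4566024, cg15 = 0.4004940,
           cg18 = 0.3923206)

# Match probes by name so the order of the two vectors does not matter
linear_pred <- intercept + sum(weights * samp1[names(weights)])
print(round(linear_pred, 6))

# Assumed transformations for the other two options
count_scale <- exp(linear_pred)     # inverse of the log link
prob_scale  <- plogis(linear_pred)  # inverse of the logit link
```

The rounded linear predictor agrees with the samp1 estimate of 1.197718 reported above.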

Piping Commands

estimates <- beta_matrix_miss |>
  surro_set(weights = wts_vec_lin, intercept = "Intercept") |>
  impute_obs(method = "mean") |>
  reference_fill(reference = ref_vec_mean, type = "probes") |>
  surro_calc(transform = "linear")

print(estimates)
   samp1    samp2    samp3    samp4    samp5 
1.197718 1.202317 1.201093 1.199796 1.201382 
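
The pipeline relies on R's native pipe, which passes the left-hand value as the first argument of the call on the right and requires R 4.1.0 or later. A small base R example, unrelated to the package, shows that the piped and nested forms are equivalent:

```r
# Nested form: read from the inside out
nested <- sum(round(sqrt(c(1, 2, 3)), digits = 2))

# Piped form: read from top to bottom
piped <- c(1, 2, 3) |>
  sqrt() |>
  round(digits = 2) |>
  sum()

identical(nested, piped)
```

This is why each function in the chain above omits its first argument: the methyl_surro object returned by one step becomes the input to the next.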

Thank You


  • Text-based version of this tutorial is available HERE.
  • Please feel free to reach out with any questions.