quant_summary()

An R function that generates ‘quick and dirty’ summary statistics and correlations for numeric variables. This is often useful when running coding checks.

Author

Joshua A. Goode

Published

September 4, 2024

The Importance of Checking Data

Work in Progress

This section is currently under construction and will be completed soon.

Quant Summary Function

The Function

quant_summmary <- function(df){
  summary <- apply(df, 2, function(var){
    # all-missing column: pad the 9 statistics with NA so both branches return 11 values
    if (sum(!is.na(var)) == 0) {
      c(rep(NA, times = 9), sum(is.na(var)), sum(!is.na(var)))
    } else {
      c(mean(var, na.rm = TRUE), sd(var, na.rm = TRUE),
        quantile(var, c(0, 0.05, 0.25, 0.50, 0.75, 0.95, 1.00), na.rm = TRUE),
        sum(is.na(var)), sum(!is.na(var)))
    }
  })
  summary <- data.frame(t(summary))
  colnames(summary) <- c("mean", "sd", "min", "q05", "q25", "q50", "q75", "q95", "max", "miss", "nonmiss")
  corrs <- data.frame(round(cor(df), digits = 3))
  return(list(summary = summary, corrs = corrs))
}
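
The layout of the summary table follows from how apply() works: with MARGIN = 2 it calls the function once per column and binds the equal-length results together as the columns of a matrix, so t() is needed to get one row per variable. Here is a minimal sketch of that step on a made-up data frame (toy and its values are illustrative only, and just two statistics are computed):

```r
# toy data, made up for illustration
toy <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(5, 5, 8))

# MARGIN = 2 applies the function to each column; each call returns 2 values
stats <- apply(toy, 2, function(var) c(mean(var), sd(var)))
dim(stats)  # 2 statistics (rows) by 3 variables (columns)

# transpose so that each variable becomes a row, then label the columns
by_variable <- data.frame(t(stats))
colnames(by_variable) <- c("mean", "sd")
by_variable
```

This only yields a matrix when every call returns a vector of the same length, which is why the all-missing branch and the regular branch of the function need to return the same number of values.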

Examples

Basic Approach

We’re loading the dplyr package here for data management only; it is not required for the quant_summary() function.

Because the function expects a data frame that contains only numeric columns, we’re using the select() function in dplyr to select the variables we want.

# loading packages
library(dplyr)

# selecting the variables to summarize from the built-in mtcars data
subset_data <- mtcars %>%
  select(c(mpg, hp, disp, wt))
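
When a data frame mixes numeric and non-numeric columns, listing every variable by name can get tedious. One option, sketched here with the built-in iris data (not used elsewhere on this page), is to let select() keep every numeric column via the where() helper. This assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# keep only the numeric columns; the factor column Species is dropped
numeric_only <- iris %>%
  select(where(is.numeric))

names(numeric_only)
```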

We can run the function to create a new object. Although it’s incredibly unoriginal, we’re calling it data_summary.

data_summary <- quant_summmary(subset_data)

Because our new object is stored as a list, we can access the summary statistics in the summary element.

data_summary$summary
          mean          sd    min    q05       q25     q50    q75       q95
mpg   20.09062   6.0269481 10.400 11.995  15.42500  19.200  22.80  31.30000
hp   146.68750  68.5628685 52.000 63.650  96.50000 123.000 180.00 253.55000
disp 230.72188 123.9386938 71.100 77.350 120.82500 196.300 326.00 449.00000
wt     3.21725   0.9784574  1.513  1.736   2.58125   3.325   3.61   5.29275
         max miss nonmiss
mpg   33.900    0      32
hp   335.000    0      32
disp 472.000    0      32
wt     5.424    0      32

Similarly, we can access the correlation matrix in the corrs element.

data_summary$corrs
        mpg     hp   disp     wt
mpg   1.000 -0.776 -0.848 -0.868
hp   -0.776  1.000  0.791  0.659
disp -0.848  0.791  1.000  0.888
wt   -0.868  0.659  0.888  1.000
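
The mtcars variables have no missing values, so every correlation above is filled in. The function calls cor() with its default settings, though, under which any pair involving a variable with missing values comes back as NA. The sketch below, on a made-up data frame, contrasts that default with cor()'s use = "pairwise.complete.obs" option. Whether pairwise deletion is appropriate depends on the analysis, so treat this as one possible adjustment rather than part of the function.

```r
# toy data with one missing value in x (made up for illustration)
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 6, 8, 10),
  z = c(5, 3, 4, 1, 2)
)

# default behavior: pairs that involve x are NA
cor_default <- round(cor(toy), digits = 3)

# pairwise deletion: each pair uses the rows where both variables are observed
cor_pairwise <- round(cor(toy, use = "pairwise.complete.obs"), digits = 3)

cor_default
cor_pairwise
```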

We could also access these elements without creating an object.

quant_summmary(subset_data)$summary
          mean          sd    min    q05       q25     q50    q75       q95
mpg   20.09062   6.0269481 10.400 11.995  15.42500  19.200  22.80  31.30000
hp   146.68750  68.5628685 52.000 63.650  96.50000 123.000 180.00 253.55000
disp 230.72188 123.9386938 71.100 77.350 120.82500 196.300 326.00 449.00000
wt     3.21725   0.9784574  1.513  1.736   2.58125   3.325   3.61   5.29275
         max miss nonmiss
mpg   33.900    0      32
hp   335.000    0      32
disp 472.000    0      32
wt     5.424    0      32

quant_summmary(subset_data)$corrs
        mpg     hp   disp     wt
mpg   1.000 -0.776 -0.848 -0.868
hp   -0.776  1.000  0.791  0.659
disp -0.848  0.791  1.000  0.888
wt   -0.868  0.659  0.888  1.000

Modified Approach

If we’re feeling fancy, we can combine the summary statistics and correlations into a single table. Here we’re using the rename_with() function in dplyr to rename the correlation columns as {var}_r.

cbind(
  quant_summmary(subset_data)$summary,
  quant_summmary(subset_data)$corrs |>
    rename_with(~ paste0(.x, "_r"))
)
          mean          sd    min    q05       q25     q50    q75       q95
mpg   20.09062   6.0269481 10.400 11.995  15.42500  19.200  22.80  31.30000
hp   146.68750  68.5628685 52.000 63.650  96.50000 123.000 180.00 253.55000
disp 230.72188 123.9386938 71.100 77.350 120.82500 196.300 326.00 449.00000
wt     3.21725   0.9784574  1.513  1.736   2.58125   3.325   3.61   5.29275
         max miss nonmiss  mpg_r   hp_r disp_r   wt_r
mpg   33.900    0      32  1.000 -0.776 -0.848 -0.868
hp   335.000    0      32 -0.776  1.000  0.791  0.659
disp 472.000    0      32 -0.848  0.791  1.000  0.888
wt     5.424    0      32 -0.868  0.659  0.888  1.000
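
Since dplyr is optional here, the same renaming can also be done in base R. Below is a small sketch, assuming a correlation data frame built the same way as the corrs element (the two-variable subset and the names corrs and corrs_r are only there to keep the example short):

```r
# correlation table for two mtcars variables, rounded as in the function
corrs <- data.frame(round(cor(mtcars[, c("mpg", "wt")]), digits = 3))

# append "_r" to each column name without dplyr
corrs_r <- setNames(corrs, paste0(names(corrs), "_r"))
corrs_r
```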