scale_minmax()

An R function to recode variables using min-max scaling.
Author

Joshua A. Goode

Published

July 18, 2024

A Brief Introduction to Min-Max Scaling

In min-max scaling, values are recoded to have a particular range. The simplest approach is to recode values to have range \([0, 1]\) (see Eq. 1).

\[\begin{equation}\tag{Eq. 1: Range [0, 1]} x' = \frac{x - \min(x)}{\max(x) - \min(x)} \end{equation}\]

Alternatively, values can be recoded to have range \([a, b]\) (see Eq. 2).

\[\begin{equation}\tag{Eq. 2: Range [a, b]} x' = \frac{\left(x - \min(x)\right)\left(b - a\right)}{\max(x) - \min(x)} + a \end{equation}\]
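
To make the two equations concrete, here is a minimal sketch that applies them by hand to a small made-up vector. The values in x and the choice of a = -1 and b = 1 are assumptions for illustration only.

```r
# A small made-up vector, used only for illustration
x <- c(2, 4, 6, 10)

# Eq. 1: rescale to [0, 1]
x01 <- (x - min(x)) / (max(x) - min(x))
x01
# [1] 0.00 0.25 0.50 1.00

# Eq. 2: rescale to [a, b], here a = -1 and b = 1
a <- -1
b <- 1
xab <- (x - min(x)) * (b - a) / (max(x) - min(x)) + a
xab
# [1] -1.0 -0.5  0.0  1.0
```

Note that Eq. 1 is just Eq. 2 with a = 0 and b = 1, so the spacing between values is preserved in both cases; only the endpoints change.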

Min-Max Scaling in R

Tips
  • All code blocks on this page can be copied by clicking in the upper right corner.
  • As with all content on my site, please feel free to reach out if you have any questions.

The Function

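scale_minmax <- function(x, to_min = 0, to_max = 1) {
  # Rescale x linearly so that min(x) maps to to_min and max(x) maps to to_max.
  # Missing values are ignored when finding the range and stay NA in the output.
  x_min <- min(x, na.rm = TRUE)
  x_max <- max(x, na.rm = TRUE)
  (x - x_min) * (to_max - to_min) / (x_max - x_min) + to_min
}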
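
One case worth knowing about: if every observed value of x is the same, the denominator is zero and the result is NaN. The sketch below shows that behavior and one possible guard. The wrapper scale_minmax_checked is a hypothetical name introduced here for illustration; it is not part of the function above, and stopping with an error is only one of several reasonable choices.

```r
# Copy of scale_minmax() from above so this block runs on its own
scale_minmax <- function(x, to_min = 0, to_max = 1){
  (x - min(x, na.rm = TRUE)) * (to_max - to_min) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)) +
    to_min
}

# A constant vector has max(x) == min(x), so the formula computes 0 / 0
scale_minmax(c(5, 5, 5))
# [1] NaN NaN NaN

# Hypothetical wrapper (an assumption, not the original function):
# stop early when the input has no spread
scale_minmax_checked <- function(x, to_min = 0, to_max = 1) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    stop("x has a single unique value, so it cannot be rescaled")
  }
  scale_minmax(x, to_min = to_min, to_max = to_max)
}

scale_minmax_checked(c(1, 2, 3))
# [1] 0.0 0.5 1.0
```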

Examples

Example 1

To use the function, we just call scale_minmax and provide the object to be recoded. By default, it will recode values to range \([0,1]\).

ex1 <- scale_minmax(mtcars$mpg)

Let’s check a few things to make sure everything is working as we expect. First, we’ll check that the range of the new variable matches the default \([0, 1]\).

c(min(ex1, na.rm = TRUE), max(ex1, na.rm = TRUE))
[1] 0 1

That looks good, so let’s check that the correlation between the original and recoded values is 1.

cor(mtcars$mpg[!is.na(mtcars$mpg)], ex1[!is.na(ex1)])
[1] 1

That also looks good! Finally, let’s check we have the same number of missing values in our new variable.

sum(is.na(mtcars$mpg)) == sum(is.na(ex1))
[1] TRUE

We’re good to go!
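
Since mtcars$mpg has no missing values, the last check passes trivially here. As a rough sketch of how the function behaves when missing values are present, the block below sets two entries to NA first. The object names mpg_na and ex1_na and the choice of positions 3 and 10 are assumptions made up for this illustration.

```r
# Copy of scale_minmax() from above so this block runs on its own
scale_minmax <- function(x, to_min = 0, to_max = 1){
  (x - min(x, na.rm = TRUE)) * (to_max - to_min) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)) +
    to_min
}

# Hypothetical copy of mpg with two values set to missing
mpg_na <- mtcars$mpg
mpg_na[c(3, 10)] <- NA

ex1_na <- scale_minmax(mpg_na)

# Missing values stay in the same positions
identical(is.na(mpg_na), is.na(ex1_na))
# [1] TRUE

# The observed values still span the full [0, 1] range
range(ex1_na, na.rm = TRUE)
# [1] 0 1
```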

Example 2

Alternatively, we can specify the range for our new variable using the to_min and to_max arguments. A common approach is to recode values to \([-1, 1]\).

ex2 <- scale_minmax(mtcars$mpg, to_min = -1, to_max = 1)

Let’s run through our checks again to convince ourselves that everything still works.

c(min(ex2, na.rm = TRUE), max(ex2, na.rm = TRUE))
[1] -1  1
cor(mtcars$mpg[!is.na(mtcars$mpg)], ex2[!is.na(ex2)])
[1] 1
sum(is.na(mtcars$mpg)) == sum(is.na(ex2))
[1] TRUE

Everything looks good!

Example 3

We can also choose arbitrary values for to_min and to_max if we want to. This example recodes values to range \([1981, 1984]\).

ex3 <- scale_minmax(mtcars$mpg, to_min = 1981, to_max = 1984)

We’ll check everything one last time just to be sure.

c(min(ex3, na.rm = TRUE), max(ex3, na.rm = TRUE))
[1] 1981 1984
cor(mtcars$mpg[!is.na(mtcars$mpg)], ex3[!is.na(ex3)])
[1] 1
sum(is.na(mtcars$mpg)) == sum(is.na(ex3))
[1] TRUE

Everything still looks good!
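
Because the same three checks are repeated in every example, they could be bundled into a small helper. The sketch below is one way to do that. The function name check_minmax is hypothetical, and it makes two choices that differ slightly from the checks above: it compares with all.equal() to tolerate floating-point error, and it compares the positions of missing values rather than only their count.

```r
# Copy of scale_minmax() from above so this block runs on its own
scale_minmax <- function(x, to_min = 0, to_max = 1){
  (x - min(x, na.rm = TRUE)) * (to_max - to_min) /
    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE)) +
    to_min
}

# Hypothetical helper (an assumption for illustration) that bundles
# the range, correlation, and missing-value checks
check_minmax <- function(original, recoded, to_min = 0, to_max = 1) {
  keep <- !is.na(original) & !is.na(recoded)
  c(
    range_ok = isTRUE(all.equal(range(recoded, na.rm = TRUE), c(to_min, to_max))),
    cor_ok   = isTRUE(all.equal(cor(original[keep], recoded[keep]), 1)),
    na_ok    = identical(is.na(original), is.na(recoded))
  )
}

ex3 <- scale_minmax(mtcars$mpg, to_min = 1981, to_max = 1984)
checks <- check_minmax(mtcars$mpg, ex3, to_min = 1981, to_max = 1984)
checks
```

For the three examples on this page, every element of the returned vector should be TRUE.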