All about *apply family in R

R has many *apply functions which are very helpful to simplify our code. The *apply functions are all covered in dplyr package but it is still good to know the differences and how to use it. It is just too convenient to ignore them.

First, the following Mnemonics gives you an overview of what each *apply function do in general.

Mnemonics

  • lapply is a list apply which acts on a list or vector and returns a list.
  • sapply is a simple lapply (function defaults to returning a vector or matrix when possible)
  • vapply is a verified apply (allows the return object type to be prespecified)
  • rapply is a recursive apply for nested lists, i.e. lists within lists
  • tapply is a tagged apply where the tags identify the subsets
  • apply is generic: applies a function to a matrix’s rows or columns (or, more generally, to dimensions of an array)

Example:

apply
For sum/mean of each row/columns, there are more optimzed function: colMeans, rowMeans, colSums, rowSums.While using apply to dataframe, it will automatically coerce it to a matrix.

# Two dimensional matrix# Two dimensional matrix
myMetric <- matrix(floor(runif(15,0,100)),5,3)
myMetric
# apply min to rows
apply(myMetric,1,min)
# apply min to columns
apply(myMetric,2,min)

[,1] [,2] [,3]
[1,] 28 22 6
[2,] 31 75 80
[3,] 7 88 96
[4,] 15 70 27
[5,] 74 84 12 //
[1] 6 31 7 15 12 //
[1] 7 22 6 //

lapply
For list vector, it applies the function to each element in it. lapply is the workhorse under all * apply functions. The most fundamental one.

x <- list(a = runif(5,0,1), b = seq(1:10), c = seq(10:100))
lapply(x, FUN = mean)

# Result

$a
[1] 0.4850281

$b
[1] 5.5

$c
[1] 46

sapply
sapply is doing the similar to lapply, it is just the output different. It simplifies the output to a vector rather than a list.

x <- list(a = runif(5,0,1), b = seq(1:10), c = seq(10:100))
sapply(x, FUN = mean)

a                 b                  c
0.2520706 5.5000000 46.0000000

vapply – similar to sapply, just speed faster.

rapply
This is a recursive apply, especially useful for a nested list structure. For example:

#Append ! to string, otherwise increment
myFun <- function(x){
if (is.character(x)){
return(paste(x,”!”,sep=””))
}
else{
return(x + 1)
}
}

#A nested list structure
l <- list(a = list(a1 = “Boo”, b1 = 2, c1 = “Eeek”),
b = 3, c = “Yikes”,
d = list(a2 = 1, b2 = list(a3 = “Hey”, b3 = 5)))

#Result is named vector, coerced to character
rapply(l,myFun)

#Result is a nested list like l, with values altered
rapply(l, myFun, how = “replace”)

a.a1 a.b1 a.c1 b c d.a2 d.b2.a3 d.b2.b3
“Boo!” “3” “Eeek!” “4” “Yikes!” “2” “Hey!” “6”

[1] “break”
$a
$a$a1
[1] “Boo!”

$a$b1
[1] 3

$a$c1
[1] “Eeek!”

 

$b
[1] 4

$c
[1] “Yikes!”

$d
$d$a2
[1] 2

$d$b2
$d$b2$a3
[1] “Hey!”

$d$b2$b3
[1] 6

tapply
For when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, avg, ddply, etc.)

x <- 1:20
y = factor(rep(letters[1:5], each = 4))
tapply(x,y,sum)

a b c d e
10 26 42 58 74

mapply and map
For when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, and then the 2nd elements of each, etc., coercing the result to a vector/array as in sapply.

**Map** is a wrapper to mapply with SIMPLIFY = FALSE, so it will be guaranteed to return a list.

mapply(sum, 1:5, 1:10,1:20)
mapply(rep, 1:4, 4:1)

[1] 3 6 9 12 15 13 16 19 22 25 13 16 19 22 25 23 26 29 32 35
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

 

This post is compiled from stackoverflow’s top answers.

A better view of this is to look at the R Notebook I’ve created: https://rpubs.com/euler-tech/292794