Today I’m going to talk about a useful family of functions that allows you to repeatedly perform a specified function (e.g., sum(), mean()) across a vector, list, matrix, or data frame. For those of you familiar with ‘for’ loops, the apply() family often lets you avoid writing those and instead wrap the loop into one simple function. In this blog post, I’m going to discuss the functions apply(), lapply(), sapply(), and tapply() (as well as using the dplyr library for similar tasks). These functions all end in apply() because you apply the function you want across all the specified elements.

This data set is in wide format* and describes the heights of five individuals (e.g., plants) in inches at three different time points (0, 10, and 20 days). The first column contains the IDs for each individual, and each successive column describes their heights at time points 0, 10, and 20, in that order.

example <- data.frame(indiv = c("A", "B", "C", "D", "E"), …

apply() lets you perform a function across a data frame’s rows or columns. In the arguments, you specify what you want as follows: apply(X = data.frame, MARGIN = 1, FUN = ). First, you enter the data frame you want to analyze; then MARGIN asks which dimension you want to analyze. MARGIN = 1 indicates that you want to analyze across the data frame’s rows, while MARGIN = 2 analyzes across its columns. Then you enter the name of the function that will be applied to the rows or columns (don’t include parentheses or function arguments).

So let’s try finding the mean plant height for each row (i.e., for each individual). We also have to subset our data to only contain height values (columns 2 through 4), because our first column contains the individual identifiers.

# Use lapply to find the mean of each list element
lapply(plants, mean)

You’ll notice that the output of lapply() is also a list, where the means of height, mass, and flowers are saved as list elements of the same name. lapply() does the same thing as the for loop, but takes far less space and effort to write. lapply() ends up being the best of the three methods I just showed you.

In the previous example, our means were returned as elements in a list, but each list element held just one value. There wasn’t really any reason for those values to be put in a list instead of, say, a vector. This is where the sapply() function comes in. It goes hand-in-hand with lapply() and works the same way: it accepts a list and a function name as input. But instead of returning a list, it returns the answers in the simplest possible format. In our case, this means returning the answers as a vector, which usually makes them easier to work with down the line.

Note: You may have noticed that in all of my examples, I’m using apply() functions across a list or a data frame. Even though the apply() family can be used across a simple vector, there’s often no need to do so. Many functions in R are already “vectorized”, which means the function is applied to every element of the vector without you having to loop through one element at a time. For example, the sqrt() function is vectorized: sqrt(vector) and sapply(vector, sqrt) return the same answer, so using an apply() function is unnecessary. If you have the option, it is almost always faster to use the vectorized function than to run a loop or use an apply() function. And in some cases, running a for loop might even be faster than using an apply() function. Check out this blog post by Michael Mayer for a great comparison of different methods.
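To make the apply() walkthrough on the plant-height data concrete, here is a minimal sketch. The post doesn’t show the height values, so the numbers and the column names height_0, height_10, and height_20 below are made up for illustration; only the indiv IDs come from the post.

```r
# Hypothetical wide-format data: ID column plus heights (inches) at days 0, 10, 20.
# The height values and column names are invented; only the IDs match the post.
example <- data.frame(indiv     = c("A", "B", "C", "D", "E"),
                      height_0  = c(1, 2, 1, 3, 2),
                      height_10 = c(4, 5, 3, 6, 5),
                      height_20 = c(7, 8, 5, 12, 11))

# Columns 2 through 4 hold the heights. Column 1 (indiv) is character and would
# turn the matrix that apply() builds internally into text, so leave it out.
heights <- example[, 2:4]

# MARGIN = 1: apply mean() to each row, giving one mean height per individual
row_means <- apply(heights, MARGIN = 1, FUN = mean)
row_means  # 4 5 3 7 6

# MARGIN = 2: apply mean() to each column, giving one mean per time point
col_means <- apply(heights, MARGIN = 2, FUN = mean)
col_means
```

Note that FUN = mean is passed without parentheses: apply() receives the function itself and calls it once per row (or column).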
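The comparison between lapply() and a for loop can be sketched on a list called plants with elements height, mass, and flowers, as in the post. The post doesn’t show the list’s values, so the numbers below are invented, and each element is assumed to be a numeric vector.

```r
# Hypothetical list: element names follow the post, values are made up
plants <- list(height  = c(10, 12, 14),
               mass    = c(2.0, 2.5, 3.0),
               flowers = c(3, 5, 4))

# for-loop version: preallocate a named list and fill it one element at a time
loop_means <- vector("list", length(plants))
names(loop_means) <- names(plants)
for (i in seq_along(plants)) {
  loop_means[[i]] <- mean(plants[[i]])
}

# lapply() version: one call, returns a list named after the input elements
lapply_means <- lapply(plants, mean)

identical(loop_means, lapply_means)  # TRUE
```

Both produce the same named list; lapply() simply handles the preallocation, naming, and indexing for you.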
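The difference between lapply() and sapply() shows up directly in the type of the result. This sketch reuses the same made-up plants list (values are hypothetical): because every mean is a single number, sapply() simplifies the list into a named numeric vector.

```r
# Hypothetical list (same invented values as before)
plants <- list(height  = c(10, 12, 14),
               mass    = c(2.0, 2.5, 3.0),
               flowers = c(3, 5, 4))

means_list <- lapply(plants, mean)  # named list, one element per input element
means_vec  <- sapply(plants, mean)  # simplified to a named numeric vector

str(means_list)
str(means_vec)

# A vector is easier to work with downstream, e.g. direct indexing or arithmetic
means_vec[["height"]]
means_vec * 2
```

When the function returns one value per element, the sapply() result is the same as calling unlist() on the lapply() result.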
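The point about vectorized functions can be checked in a few lines. This sketch uses an arbitrary example vector and confirms that sqrt(x) and sapply(x, sqrt) give identical results; the timing lines are only a rough illustration, and the actual numbers will vary by machine.

```r
x <- c(1, 4, 9, 16)

vectorized <- sqrt(x)          # one call handles the whole vector
via_sapply <- sapply(x, sqrt)  # sqrt() is called once per element

identical(vectorized, via_sapply)  # TRUE: same values, same type

# Rough timing on a larger random vector; exact times depend on the machine
big <- runif(1e5)
system.time(sqrt(big))
system.time(sapply(big, sqrt))
```

The vectorized call does its looping in compiled code, while sapply() makes a separate R-level function call for each element, which is why the vectorized version is usually much faster.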