# 13. R functions that I use all the time

· Blair Fix

Out of the box, R comes with many functions that make data analysis easy. In this post, I’ll review some of the functions that I use frequently, in whatever order rolls off the top of my head.

To get the ball rolling, we need a vector to work with. Let’s define a vector `x` that contains some random(ish) numbers:

``````> x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
``````

(Remember that the `c` function is how we combine numbers into vectors.)

Now on to the functions.

### length

Having filled a vector with some data, often the first thing I want to know is how much data I have. R has a function for that called `length`. It tells you how many elements are in a vector:

``````> length(x)
[1] 10
``````

Nice! It looks like I put 10 numbers into `x`.

### sort

When I see numbers that are in no particular order, I have an urge to sort them. R has a `sort` function for that:

``````> sort(x)
[1] -5 -2 -1  0  1  2  4  8 10 15
``````

It just feels better seeing things in order!
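If you’d rather see the biggest numbers first, `sort` takes a `decreasing` argument. Here’s a small sketch (the name `x_desc` is just something I made up for illustration):

```r
# Same vector as above
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Sort from largest to smallest instead of the default smallest to largest
x_desc = sort(x, decreasing = TRUE)
print(x_desc)
# [1] 15 10  8  4  2  1  0 -1 -2 -5
```

Note that `sort` returns a new vector. The original `x` stays in whatever order you typed it.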

### mean

Now let’s get to some summary statistics. The mean is by far the most popular. R has a function for that called (drum roll) `mean`:

``````> mean(x)
[1] 3.2
``````

### median

Let’s not leave out our friendly `median` function, which returns the midpoint of the data:

``````> median(x)
[1] 1.5
``````

Hmm … the median is not the same as the mean. That suggests the data (that I made up) is skewed. More on skewness sometime in the future.
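To see what ‘midpoint’ means for our vector, here’s a rough sketch of the median worked out by hand. It assumes an even number of values (we have 10), in which case the median is the average of the two middle values once the data is sorted. The names `middle` and `median_by_hand` are mine, not part of R:

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

x_sort = sort(x)
n = length(x_sort)

# With an even number of values, take the two values in the middle
middle = x_sort[c(n / 2, n / 2 + 1)]

# ... and average them
median_by_hand = mean(middle)
print(median_by_hand)
# [1] 1.5
```

That matches what `median(x)` gave us. For a vector with an odd number of values, the median is simply the single middle value, so this sketch would need adjusting.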

### standard deviation

The standard deviation is a workhorse stat that mathematicians love and the general public often misunderstands. Calculate it in R with `sd`:

``````> sd(x)
[1] 6.124632
``````
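If you’re curious what `sd` does under the hood, here’s a minimal sketch of the same calculation done by hand. R’s `sd` computes the sample standard deviation, which divides by `n - 1` rather than `n`. The names `deviations` and `sd_by_hand` are just for illustration:

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# How far each value sits from the mean
deviations = x - mean(x)

# Sum the squared deviations, divide by n - 1, then take the square root
sd_by_hand = sqrt(sum(deviations^2) / (length(x) - 1))

# Should agree with the built-in function (up to floating-point error)
all.equal(sd_by_hand, sd(x))
```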

### max/min

Want to know the high and/or low values in your data? Use the `max` and `min` functions:

``````> max(x)
[1] 15

> min(x)
[1] -5
``````
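Two related tricks, sketched below: `range` returns the min and max in one call, and `which.max`/`which.min` tell you where in the vector those values live. The variable names are my own:

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Smallest and largest value together, in that order
x_range = range(x)
print(x_range)
# [1] -5 15

# Position (index) of the largest and smallest values
pos_max = which.max(x)   # 5, because x[5] is 15
pos_min = which.min(x)   # 6, because x[6] is -5
```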

### summary

If you want to see all the summary statistics in one go, R has a function for that called `summary`:

``````> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -5.00   -0.75    1.50    3.20    7.00   15.00
``````

From left to right, you get the minimum value, the 1st quartile, the median, the mean, the 3rd quartile, and the maximum value. Nifty!
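The labels that `summary` prints are also the names of the values it returns, so you can pull out a single statistic by name. A small sketch (`x_summary` is a name I made up):

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

x_summary = summary(x)

# Grab individual statistics using the printed labels
x_summary[["Median"]]    # 1.5
x_summary[["1st Qu."]]   # -0.75
x_summary[["Max."]]      # 15
```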

### quantiles

Speaking of quartiles, R has a nice function for calculating ‘quantiles’. It’s called `quantile`. If that language is confusing, you can think of the function as returning percentiles.

For example, here’s how I’d get the 30th percentile in `x`:

``````> quantile(x, 0.3)
 30%
-0.3
``````

R gives me the percentile I’m calculating, followed by its value.

Here’s the 90th percentile:

``````> quantile(x, 0.9)
 90%
10.5
``````
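You can also ask for several percentiles at once by handing `quantile` a vector of probabilities. Here’s a sketch that recovers the quartiles reported by `summary` (the name `quartiles` is mine):

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# 25th, 50th and 75th percentiles in a single call
quartiles = quantile(x, c(0.25, 0.5, 0.75))
print(quartiles)
# 25% is -0.75, 50% is 1.50 (the median), 75% is 7.00
```

The result is a named vector, so `quartiles[["50%"]]` gives you just the median.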

### head/tail

The `head` and `tail` functions return the start and end of a vector.

Here are the first 4 values of `x`:

``````> head(x, 4)
[1]  1  4 -2 10
``````

And here are the last 3 values of `x`:

``````> tail(x, 3)
[1] 2 8 0
``````
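Both functions also accept a negative number, which flips the meaning to ‘everything except’. A quick sketch, with variable names of my own choosing:

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Everything except the last 2 values
all_but_last_2 = head(x, -2)
print(all_but_last_2)
# [1]  1  4 -2 10 15 -5 -1  2

# Everything except the first 3 values
all_but_first_3 = tail(x, -3)
print(all_but_first_3)
# [1] 10 15 -5 -1  2  8  0
```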

### N largest/smallest values

To get the single largest/smallest value, we’ve got the `max` and `min` functions. But what about if I want to know the 3 largest/smallest values? How would I do that?

The answer is that we combine the `sort` function with the `head`/`tail` functions.

Suppose we want the 3 largest values in `x`. First we’d sort `x`:

``````> sort(x)
[1] -5 -2 -1  0  1  2  4  8 10 15
``````

You can see that the 3 largest values live in the last 3 elements. So we’ll take the tail of the sorted values:

``````> x_sort = sort(x)
> tail(x_sort, 3)
[1]  8 10 15
``````
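If you don’t need to keep the sorted vector around, the two steps can be nested into one line. And if you do this a lot, you could wrap it in a little helper. Here’s a sketch; note that `n_largest` is a hypothetical function I’m defining here, not something that ships with R:

```r
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Sort and take the tail in a single expression
largest_3 = tail(sort(x), 3)
print(largest_3)
# [1]  8 10 15

# Hypothetical helper: the n largest values of a vector, in ascending order
n_largest = function(v, n) {
  tail(sort(v), n)
}

n_largest(x, 2)
# [1] 10 15
```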

Or suppose we want the 2 smallest values. Now we take the head of `x_sort`:

``````x_sort = sort(x)