# 13. R functions that I use all the time

Out of the box, R comes with many functions that make data analysis easy. In this post, I’ll review some of the functions that I use frequently, in whatever order rolls off the top of my head.

To get the ball rolling, we need a vector to work with. Let’s define a vector `x` that contains some random(ish) numbers:

```
> x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
```

(Remember that the `c` function is how we combine numbers into vectors.)

Now on to the functions.

### length

Having filled a vector with some data, often the first thing I want to know is how much data I have. R has a function for that called `length`. It tells you how many elements are in a vector:

```
> length(x)
[1] 10
```

Nice! It looks like I put 10 numbers into `x`.

### sort

When I see numbers that are in no particular order, I have an urge to sort them. R has a `sort` function for that:

```
> sort(x)
[1] -5 -2 -1 0 1 2 4 8 10 15
```

It just feels better seeing things in order!
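By default `sort` puts values in increasing order. It also takes a `decreasing` argument if you want largest-first order instead; here is a minimal sketch using the same `x`:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Default: smallest to largest
x_up = sort(x)

# decreasing = TRUE flips the order: largest to smallest
x_down = sort(x, decreasing = TRUE)
print(x_down)   # values: 15 10 8 4 2 1 0 -1 -2 -5
```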

### mean

Now let’s get to some summary statistics. The mean is by far the most popular. R has a function for that called (drum roll) `mean`:

```
> mean(x)
[1] 3.2
```
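The mean is just the sum of the values divided by how many values there are. As a quick by-hand check (a sketch, not a claim about how `mean` works internally), `sum` and `length` reproduce the 3.2:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Mean by hand: total of the values divided by the number of values
mean_by_hand = sum(x) / length(x)   # 32 / 10 = 3.2

# all.equal() compares numbers up to floating-point tolerance
all.equal(mean_by_hand, mean(x))
# [1] TRUE
```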

### median

Let’s not leave out our friendly `median` function, which returns the midpoint of the data:

```
> median(x)
[1] 1.5
```

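Hmm … the median is not the same as the mean. The mean (3.2) sits well above the median (1.5), which suggests the data (that I made up) is skewed: a few large values, like 15, pull the mean upward. More on skewness sometime in the future.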
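Where does 1.5 come from? With an even number of values there is no single middle element, so the median is the average of the two middle values of the sorted data. Here is a by-hand sketch that assumes `length(x)` is even, as it is here:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

x_sort = sort(x)     # -5 -2 -1 0 1 2 4 8 10 15
n = length(x_sort)   # 10, an even count

# Assumes n is even: average the two middle sorted values (positions 5 and 6)
median_by_hand = (x_sort[n / 2] + x_sort[n / 2 + 1]) / 2   # (1 + 2) / 2 = 1.5

median_by_hand == median(x)
# [1] TRUE
```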

### standard deviation

The standard deviation is a workhorse stat that mathematicians love and the general public often misunderstands. Calculate it in R with `sd`:

```
> sd(x)
[1] 6.124632
```
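`sd` computes the *sample* standard deviation, which divides the summed squared deviations by n - 1 rather than n. Spelling that formula out by hand reproduces the value above:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
n = length(x)

# Squared distance of each value from the mean
sq_dev = (x - mean(x))^2

# Sample variance divides by n - 1; the standard deviation is its square root
sd_by_hand = sqrt(sum(sq_dev) / (n - 1))   # sqrt(337.6 / 9)

all.equal(sd_by_hand, sd(x))
# [1] TRUE
```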

### max/min

Want to know the high and/or low values in your data? Use the `max` and `min` functions:

```
> max(x)
[1] 15
> min(x)
[1] -5
```
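If you want both ends at once, base R’s `range` function returns the minimum and maximum together as a two-element vector. A short sketch:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# range() gives c(min(x), max(x)) in a single call
lo_hi = range(x)
print(lo_hi)   # values: -5 15
```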

### summary

If you want to see all the summary statistics in one go, R has a function for that called `summary`:

```
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  -5.00   -0.75    1.50    3.20    7.00   15.00
```

From left to right, you get the minimum, the first quartile, the median, the mean, the third quartile, and the maximum. Nifty!
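The quartiles in that output are simply the values at the 25%, 50%, and 75% marks. As a sanity check, asking `quantile` for those three probabilities should give the same -0.75, 1.50, and 7.00:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# 1st quartile, median, and 3rd quartile in one call
q = quantile(x, c(0.25, 0.5, 0.75))
print(q)   # values: -0.75 1.5 7, labelled 25%, 50%, 75%
```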

### quantiles

Speaking of quartiles, R has a nice function for calculating ‘quantiles’. It’s called `quantile`. If that language is confusing, you can think of the function as returning percentiles.

For example, here’s how I’d get the 30th percentile in `x`:

```
> quantile(x, 0.3)
30%
-0.3
```

R gives me the percentile I’m calculating, followed by its value.
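Notice that -0.3 isn’t one of the values in `x`. By default (`type = 7`), `quantile` interpolates between neighbouring sorted values: it computes a fractional position h = (n - 1) * p + 1 and moves that far between the two sorted values on either side. A by-hand sketch of that default rule, assuming 0 <= p < 1 so that `lo + 1` is a valid index:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
x_sort = sort(x)
n = length(x_sort)
p = 0.3

# Default (type 7) rule: fractional position in the sorted data
h = (n - 1) * p + 1    # 3.7
lo = floor(h)          # 3

# Move 70% of the way from the 3rd sorted value to the 4th
q_by_hand = x_sort[lo] + (h - lo) * (x_sort[lo + 1] - x_sort[lo])   # -1 + 0.7 * 1

all.equal(q_by_hand, unname(quantile(x, p)))
# [1] TRUE
```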

Here’s the 90th percentile:

```
> quantile(x, 0.9)
90%
10.5
```
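`quantile` also accepts a vector of probabilities, so several percentiles can come from a single call. Including 0 and 1 is a handy way to see that the 0th and 100th percentiles are just the minimum and maximum:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)

# Four percentiles at once; the result is a named vector
pct = quantile(x, c(0, 0.3, 0.9, 1))
print(pct)   # values: -5 -0.3 10.5 15

# The first and last entries (0% and 100%) are the min and max of the data
pct[[1]] == min(x)
pct[[4]] == max(x)
```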

### head and tail

The `head` and `tail` functions return the start and end of a vector.

Here are the first 4 values of `x`:

```
> head(x, 4)
[1] 1 4 -2 10
```

And here are the last 3 values of `x`:

```
> tail(x, 3)
[1] 2 8 0
```
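Under the hood, `head` and `tail` are a convenient shorthand for indexing the front and back of a vector with square brackets. When you leave out the count, both return 6 elements by default. A sketch of the equivalent indexing, assuming `x` has at least 6 elements:

```
x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
n = length(x)

# First 4 values: positions 1 through 4
first4 = x[1:4]

# Last 3 values: positions n - 2 through n
last3 = x[(n - 2):n]

identical(first4, head(x, 4))
identical(last3, tail(x, 3))

# With no count given, head() returns the first 6 values
first6 = head(x)
```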

### N largest/smallest values

To get the single largest/smallest value, we’ve got the `max` and `min` functions. But what if I want to know the 3 largest/smallest values? How would I do that?

The answer is to combine the `sort` function with the `head`/`tail` functions.

Suppose we want the 3 largest values in `x`. First we’d sort `x`:

```
> sort(x)
[1] -5 -2 -1 0 1 2 4 8 10 15
```

You can see that the 3 largest values live in the last 3 elements. So we’ll take the tail of the sorted values:

```
> x_sort = sort(x)
> tail(x_sort, 3)
[1] 8 10 15
```

Or suppose we want the 2 smallest values. Now we take the head of `x_sort`:

```
> x_sort = sort(x)
> head(x_sort, 2)
[1] -5 -2
```
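If you find yourself doing this often, the sort-then-tail and sort-then-head steps fold naturally into two tiny helper functions. The names `largest_n` and `smallest_n` are hypothetical, made up for this sketch; they are not part of base R:

```
# Hypothetical helper (not base R): the n largest values of v
largest_n = function(v, n) {
  # Sort ascending, then keep the last n values
  tail(sort(v), n)
}

# Hypothetical helper (not base R): the n smallest values of v
smallest_n = function(v, n) {
  # Sort ascending, then keep the first n values
  head(sort(v), n)
}

x = c(1, 4, -2, 10, 15, -5, -1, 2, 8, 0)
top3 = largest_n(x, 3)      # 8 10 15
bottom2 = smallest_n(x, 2)  # -5 -2
```

Note that `largest_n` returns its values in increasing order, exactly as `tail(sort(x), 3)` does.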

### That pretty much covers it

I’d say that the functions above cover 90% of the calculating that I do in my own research.

The hard part isn’t using these functions. (As you can see, they’re super easy.) The hard part is usually getting the data into a suitable form to apply these functions.

More on that in the future.