I have a `data.frame` and I need to calculate the mean per group (i.e. per `Month`, below).

``````Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
``````

My desired output is like below, where the values for `Rate1` and `Rate2` are the group means. Please disregard the value, I have made it up for the example.

``````Name       Rate1       Rate2
Aira        23.21       12.2
Ben         45.23       43.9
Cat         33.22       32.2
``````

## Solution 1

This type of operation is exactly what `aggregate` was designed for:

``````d <- read.table(text=
'Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43

aggregate(d[, 3:4], list(d\$Name), mean)

Group.1    Rate1    Rate2
1    Aira 16.33333 47.00000
2     Ben 31.33333 50.33333
3     Cat 44.66667 54.00000
``````

Here we aggregate columns 3 and 4 of data.frame `d`, grouping by `d\$Name`, and applying the `mean` function.

Or, using a formula interface:

``````aggregate(. ~ Name, d[-2], mean)
``````

## Solution 2

Or use `group_by` & `summarise_at` from the `dplyr` package:

``````library(dplyr)

d %>%
group_by(Name) %>%
summarise_at(vars(-Month), funs(mean(., na.rm=TRUE)))

# A tibble: 3 x 3
Name  Rate1 Rate2
<fct> <dbl> <dbl>
1 Aira   16.3  47.0
2 Ben    31.3  50.3
3 Cat    44.7  54.0
``````

See `?summarise_at` for the many ways to specify the variables to act on. Here, `vars(-Month)` says all variables except `Month`.

In more recent versions of tidyverse/`dplyr`, using `summarise(across(...))` is preferred to `summarise_at`:

``````d %>%
group_by(Name) %>%
summarise(across(-Month, mean, na.rm = TRUE))
``````

## Solution 3

You can also use package `plyr`, which is somehow more versatile:

``````library(plyr)

ddply(d, .(Name), summarize,  Rate1=mean(Rate1), Rate2=mean(Rate2))

Name    Rate1    Rate2
1 Aira 16.33333 47.00000
2  Ben 31.33333 50.33333
3  Cat 44.66667 54.00000
``````

## Solution 4

A option is using the package `data.table`, which also has the class data.frame, but operations like you are looking for are computed much faster.

``````library(data.table)
mydt <- structure(list(Name = c("Aira", "Aira", "Aira", "Ben", "Ben", "Ben", "Cat", "Cat", "Cat"), Month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Rate1 = c(15.6396600443877, 2.15649279424609, 6.24692918928743, 2.37658797276116, 34.7500663272292, 3.28750138697048, 29.3265553981065, 17.9821839334431, 10.8639802575958), Rate2 = c(17.1680489538369, 5.84231656330206, 8.54330866437461, 5.88415184986176, 3.02064294862551, 17.2053351400752, 16.9552950199166, 2.56058000170089, 15.7496228048122)), .Names = c("Name", "Month", "Rate1", "Rate2"), row.names = c(NA, -9L), class = c("data.table", "data.frame"))
``````

Now to take the mean of Rate1 and Rate2 for all 3 months, for each person (Name): First, decide which columns you want to take the mean of

``````colstoavg <- names(mydt)[3:4]
``````

Now we use lapply to take the mean over the columns we want to avg (colstoavg)

``````mydt.mean <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=Name,.SDcols=colstoavg]

mydt.mean
Name     Rate1     Rate2
1: Aira  8.014361 10.517891
2:  Ben 13.471385  8.703377
3:  Cat 19.390907 11.755166
``````

## Solution 5

Here are a variety of ways to do this in base `R` including an alternative `aggregate` approach. The examples below return means per month, which I think is what you requested. Although, the same approach could be used to return means per person:

Using `ave`:

``````my.data <- read.table(text = '
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

Rate1.mean <- with(my.data, ave(Rate1, Month, FUN = function(x) mean(x, na.rm = TRUE)))
Rate2.mean <- with(my.data, ave(Rate2, Month, FUN = function(x) mean(x, na.rm = TRUE)))

my.data <- data.frame(my.data, Rate1.mean, Rate2.mean)
my.data
``````

Using `by`:

``````my.data <- read.table(text = '
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

by.month <- as.data.frame(do.call("rbind", by(my.data, my.data\$Month, FUN = function(x) colMeans(x[,3:4]))))
colnames(by.month) <- c('Rate1.mean', 'Rate2.mean')
by.month <- cbind(Month = rownames(by.month), by.month)

my.data <- merge(my.data, by.month, by = 'Month')
my.data
``````

Using `lapply` and `split`:

``````my.data <- read.table(text = '
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

ly.mean <- lapply(split(my.data, my.data\$Month), function(x) c(Mean = colMeans(x[,3:4])))
ly.mean <- as.data.frame(do.call("rbind", ly.mean))
ly.mean <- cbind(Month = rownames(ly.mean), ly.mean)

my.data <- merge(my.data, ly.mean, by = 'Month')
my.data
``````

Using `sapply` and `split`:

``````my.data <- read.table(text = '
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
my.data

sy.mean <- t(sapply(split(my.data, my.data\$Month), function(x) colMeans(x[,3:4])))
colnames(sy.mean) <- c('Rate1.mean', 'Rate2.mean')
sy.mean <- data.frame(Month = rownames(sy.mean), sy.mean, stringsAsFactors = FALSE)
my.data <- merge(my.data, sy.mean, by = 'Month')
my.data
``````

Using `aggregate`:

``````my.data <- read.table(text = '
Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43
Cat        3      45        32
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

my.summary <- with(my.data, aggregate(list(Rate1, Rate2), by = list(Month),
FUN = function(x) { mon.mean = mean(x, na.rm = TRUE) } ))

my.summary <- do.call(data.frame, my.summary)
colnames(my.summary) <- c('Month', 'Rate1.mean', 'Rate2.mean')
my.summary

my.data <- merge(my.data, my.summary, by = 'Month')
my.data
``````

EDIT: June 28, 2020

Here I use `aggregate` to obtain the column `means` of an entire `matrix` by group where group is defined in an external `vector`:

``````my.group <- c(1,2,1,2,2,3,1,2,3,3)

my.data <- matrix(c(   1,    2,    3,    4,    5,
10,   20,   30,   40,   50,
2,    4,    6,    8,   10,
20,   30,   40,   50,   60,
20,   18,   16,   14,   12,
1000, 1100, 1200, 1300, 1400,
2,    3,    4,    3,    2,
50,   40,   30,   20,   10,
1001, 2001, 3001, 4001, 5001,
1000, 2000, 3000, 4000, 5000), nrow = 10, ncol = 5, byrow = TRUE)
my.data

my.summary <- aggregate(list(my.data), by = list(my.group), FUN = function(x) { my.mean = mean(x, na.rm = TRUE) } )
my.summary
#  Group.1          X1       X2          X3       X4          X5
#1       1    1.666667    3.000    4.333333    5.000    5.666667
#2       2   25.000000   27.000   29.000000   31.000   33.000000
#3       3 1000.333333 1700.333 2400.333333 3100.333 3800.333333
``````

## Solution 6

I describe two ways to do this, one based on data.table and the other based on reshape2 package . The data.table way already has an answer, but I have tried to make it cleaner and more detailed.

The data is like this:

`````` d <- structure(list(Name = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("Aira", "Ben", "Cat"), class = "factor"),
Month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Rate1 = c(12L,
18L, 19L, 53L, 22L, 19L, 22L, 67L, 45L), Rate2 = c(23L, 73L,
45L, 19L, 87L, 45L, 87L, 43L, 32L)), .Names = c("Name", "Month",
"Rate1", "Rate2"), class = "data.frame", row.names = c(NA, -9L
))
Name Month Rate1 Rate2
1 Aira     1    12    23
2 Aira     2    18    73
3 Aira     3    19    45
4  Ben     1    53    19
5  Ben     2    22    87
6  Ben     3    19    45

library("reshape2")
mym <- melt(d, id = c("Name"))
res <- dcast(mym, Name ~ variable, mean)
res
#Name Month    Rate1    Rate2
#1 Aira     2 16.33333 47.00000
#2  Ben     2 31.33333 50.33333
#3  Cat     2 44.66667 54.00000
``````

Using data.table:

``````# At first, I convert the data.frame to data.table and then I group it
setDT(d)
d[, .(Rate1 = mean(Rate1), Rate2 = mean(Rate2)), by = .(Name)]
#   Name    Rate1    Rate2
#1: Aira 16.33333 47.00000
#2:  Ben 31.33333 50.33333
#3:  Cat 44.66667 54.00000
``````

There is another way of doing it by avoiding to write many argument for j in data.table using a .SD

``````d[, lapply(.SD, mean), by = .(Name)]
#   Name Month    Rate1    Rate2
#1: Aira     2 16.33333 47.00000
#2:  Ben     2 31.33333 50.33333
#3:  Cat     2 44.66667 54.00000
``````

if we only want to have Rate1 and Rate2 then we can use the .SDcols as follows:

``````d[, lapply(.SD, mean), by = .(Name), .SDcols = 3:4]
#  Name    Rate1    Rate2
#1: Aira 16.33333 47.00000
#2:  Ben 31.33333 50.33333
#3:  Cat 44.66667 54.00000
``````

## Solution 7

You can also accomplish this using the `sqldf` package as shown below:

``````library(sqldf)

x <- read.table(text='Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87
Ben        3      19        45
Cat        1      22        87
Cat        2      67        43

sqldf("
select
Name
,avg(Rate1) as Rate1_float
,avg(Rate2) as Rate2_float
,avg(Rate1) as Rate1
,avg(Rate2) as Rate2
from x
group by
Name
")

#  Name Rate1_float Rate2_float Rate1 Rate2
#1 Aira    16.33333    47.00000    16    47
#2  Ben    31.33333    50.33333    31    50
#3  Cat    44.66667    54.00000    44    54
``````

I am a recent convert to `dplyr` as shown in other answers, but `sqldf` is nice as most data analysts/data scientists/developers have at least some fluency in SQL. In this way, I think it tends to make for more universally readable code than `dplyr` or other solutions presented above.

UPDATE: In responding to the comment below, I attempted to update the code as shown above. However, the behavior was not as I expected. It seems that the column definition (i.e. `int` vs `float`) is only carried through when the column alias matches the original column name. When you specify a new name, the aggregate column is returned without rounding.

## Solution 8

You could also use the generic function `cbind()` and `lm()` without the intercept:

``````cbind(lm(d\$Rate1~-1+d\$Name)\$coef,lm(d\$Rate2~-1+d\$Name)\$coef)
>               [,1]     [,2]
>d\$NameAira 16.33333 47.00000
>d\$NameBen  31.33333 50.33333
>d\$NameCat  44.66667 54.00000
``````