I have the following data frame

``````r
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)
``````

I want to calculate the mean of val1 and val2 grouped by id1 and id2, and simultaneously count the number of rows for each id1-id2 combination. I can perform each calculation separately:

``````r
# calculate mean
aggregate(. ~ id1 + id2, data = x, FUN = mean)

# count rows
aggregate(. ~ id1 + id2, data = x, FUN = length)
``````

In order to do both calculations in one call, I tried

``````r
do.call("rbind", aggregate(. ~ id1 + id2, data = x, FUN = function(x) data.frame(m = mean(x), n = length(x))))
``````

However, I get garbled output along with a warning:

``````
#     m   n
# id1 1   2
# id2 1   1
#     1.5 2
#     2   2
#     3.5 2
#     3   2
#     6.5 2
#     8   2
#     7   2
#     6   2
# Warning message:
#   In rbind(id1 = c(1L, 2L, 1L, 2L), id2 = c(1L, 1L, 2L, 2L), val1 = list( :
#   number of columns of result is not a multiple of vector length (arg 1)
``````

I could use the plyr package, but my data set is quite large and plyr is very slow (almost unusable) when the size of the dataset grows.

How can I use `aggregate` or other functions to perform several calculations in one call?

## Solution 1

You can do it all in one step and get proper labeling:

``````r
> aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
#   id1 id2 val1.mn val1.n val2.mn val2.n
# 1   a   x     1.5    2.0     6.5    2.0
# 2   b   x     2.0    2.0     8.0    2.0
# 3   a   y     3.5    2.0     7.0    2.0
# 4   b   y     3.0    2.0     6.0    2.0
``````

This creates a dataframe with two id columns and two matrix columns:

``````r
str( aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) )
'data.frame':   4 obs. of  4 variables:
 $ id1 : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2 : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1: num [1:4, 1:2] 1.5 2 3.5 3 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"
 $ val2: num [1:4, 1:2] 6.5 8 7 6 2 2 2 2
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "mn" "n"
``````
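Because `val1` and `val2` come back as matrix columns, you can pull out a single statistic by indexing the matrix with the names given inside `FUN`. A minimal sketch follows. The object name `res` is only for illustration, and `v` is used as the argument name so that it does not shadow the data frame `x`:

```r
# Sample data from the question (first unnamed column becomes row names)
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# FUN returns a named vector, so each value column becomes a 2-column matrix
res <- aggregate(. ~ id1 + id2, data = x,
                 FUN = function(v) c(mn = mean(v), n = length(v)))

# Index a matrix column by the names set in FUN
res$val1[, "mn"]   # per-group means of val1
res$val2[, "n"]    # per-group row counts
```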

As pointed out by @lord.garbage below, this can be converted to a dataframe with "simple" columns by using `do.call(data.frame, ...)`:

``````r
str( do.call(data.frame, aggregate(. ~ id1+id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) ) ) )
'data.frame':   4 obs. of  6 variables:
 $ id1    : Factor w/ 2 levels "a","b": 1 2 1 2
 $ id2    : Factor w/ 2 levels "x","y": 1 1 2 2
 $ val1.mn: num  1.5 2 3.5 3
 $ val1.n : num  2 2 2 2
 $ val2.mn: num  6.5 8 7 6
 $ val2.n : num  2 2 2 2
``````

To name several variables explicitly on the left-hand side (LHS) of the formula, wrap them in `cbind()`:

``````r
aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = function(x) c(mn = mean(x), n = length(x) ) )
``````
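The `cbind()` form combines with the `do.call(data.frame, ...)` conversion above to give flat columns for just the chosen variables. This is a sketch, and `flat` is an illustrative name:

```r
# Sample data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# Aggregate only the listed variables, then flatten the matrix columns
flat <- do.call(data.frame,
                aggregate(cbind(val1, val2) ~ id1 + id2, data = x,
                          FUN = function(v) c(mn = mean(v), n = length(v))))
names(flat)
```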

## Solution 2

Given this in the question:

> I could use the plyr package, but my data set is quite large and plyr is very slow (almost unusable) when the size of the dataset grows.

Then in `data.table` (`1.9.4+`) you could try:

``````r
> library(data.table)
> DT <- as.data.table(x)
> DT
id1 id2 val1 val2
1:   a   x    1    9
2:   a   x    2    4
3:   a   y    3    5
4:   a   y    4    9
5:   b   x    1    7
6:   b   y    4    4
7:   b   x    3    9
8:   b   y    2    8

> DT[ , .(mean(val1), mean(val2), .N), by = .(id1, id2)]   # simplest
id1 id2  V1  V2 N
1:   a   x 1.5 6.5 2
2:   a   y 3.5 7.0 2
3:   b   x 2.0 8.0 2
4:   b   y 3.0 6.0 2

> DT[ , .(val1.m = mean(val1), val2.m = mean(val2), count = .N), by = .(id1, id2)]  # named
id1 id2 val1.m val2.m count
1:   a   x    1.5    6.5     2
2:   a   y    3.5    7.0     2
3:   b   x    2.0    8.0     2
4:   b   y    3.0    6.0     2

> DT[ , c(lapply(.SD, mean), count = .N), by = .(id1, id2)]   # mean over all columns
id1 id2 val1 val2 count
1:   a   x  1.5  6.5     2
2:   a   y  3.5  7.0     2
3:   b   x  2.0  8.0     2
4:   b   y  3.0  6.0     2
``````

For timings comparing `aggregate` (used in the question and in all 3 other answers) with `data.table`, see this benchmark (the `agg` and `agg.x` cases).

## Solution 3

With the `dplyr` package you can do this using `summarise_all()`, which applies the given functions (in this case `mean` and `n()`) to each of the non-grouping columns:

``````r
library(dplyr)

# note: funs() is deprecated since dplyr 0.8.0
x %>%
  group_by(id1, id2) %>%
  summarise_all(funs(mean, n()))
``````

which gives:

``````
     id1    id2 val1_mean val2_mean val1_n val2_n
1      a      x       1.5       6.5      2      2
2      a      y       3.5       7.0      2      2
3      b      x       2.0       8.0      2      2
4      b      y       3.0       6.0      2      2
``````

If you don't want to apply the function(s) to all non-grouping columns, use `summarise_at()` and either list the columns to include or exclude the unwanted ones with a minus sign:

``````r
# inclusion
x %>%
  group_by(id1, id2) %>%
  summarise_at(vars(val1, val2), funs(mean, n()))

# exclusion
x %>%
  group_by(id1, id2) %>%
  summarise_at(vars(-val2), funs(mean, n()))
``````

## Solution 4

You could add a `count` column, aggregate with `sum`, then scale back to get the `mean`:

``````r
x$count <- 1
agg <- aggregate(. ~ id1 + id2, data = x, FUN = sum)
agg
#   id1 id2 val1 val2 count
# 1   a   x    3   13     2
# 2   b   x    4   16     2
# 3   a   y    7   14     2
# 4   b   y    6   12     2

agg[c("val1", "val2")] <- agg[c("val1", "val2")] / agg$count
agg
#   id1 id2 val1 val2 count
# 1   a   x  1.5  6.5     2
# 2   b   x  2.0  8.0     2
# 3   a   y  3.5  7.0     2
# 4   b   y  3.0  6.0     2
``````

It has the advantage of preserving your column names and creating a single `count` column.
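The same sum-then-divide idea can also be expressed with base R's `rowsum()`, which computes per-group column sums in compiled code. This is a sketch of an alternative rather than the approach above. It assumes the two keys are merged into one grouping factor with `interaction()`, so groups are labelled `a.x`, `b.x`, and so on:

```r
# Sample data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# Assumption: combine both keys into one factor (levels like "a.x")
grp <- interaction(x$id1, x$id2, drop = TRUE)

# Per-group sums of the value columns, and per-group row counts
sums <- rowsum(x[c("val1", "val2")], grp)
n    <- rowsum(rep(1L, nrow(x)), grp)

# Divide every column of the sums by the counts to get the means
means <- sums / as.vector(n)
means$count <- as.vector(n)
means
```

Unlike the version above, this leaves `x` unmodified. The group labels end up as row names, so you would need to split them back into `id1`/`id2` if you want separate key columns.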

## Solution 5

Perhaps you want to merge?

``````r
x.mean <- aggregate(. ~ id1+id2, x, mean)
x.len  <- aggregate(. ~ id1+id2, x, length)

merge(x.mean, x.len, by = c("id1", "id2"))

id1 id2 val1.x val2.x val1.y val2.y
1   a   x    1.5    6.5      2      2
2   a   y    3.5    7.0      2      2
3   b   x    2.0    8.0      2      2
4   b   y    3.0    6.0      2      2
``````
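By default, `merge()` disambiguates the clashing column names with `.x` and `.y`. Its `suffixes` argument lets you label them by statistic instead. A sketch:

```r
# Sample data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

x.mean <- aggregate(. ~ id1 + id2, data = x, FUN = mean)
x.len  <- aggregate(. ~ id1 + id2, data = x, FUN = length)

# Name the clashing columns after the statistic instead of .x/.y
m <- merge(x.mean, x.len, by = c("id1", "id2"), suffixes = c(".mean", ".n"))
m
```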

## Solution 6

You can also use `plyr::each()` to apply multiple functions at once:

``````r
aggregate(cbind(val1, val2) ~ id1 + id2, data = x, FUN = plyr::each(avg = mean, n = length))
``````
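If you would rather not depend on plyr, a small base-R helper can play the same role. `multi_fun()` below is a hypothetical name, not an existing function. It takes named functions and returns a single function that applies each of them to a vector and returns a named numeric vector, which `aggregate()` turns into matrix columns:

```r
# Sample data from the question
x <- read.table(text = "  id1 id2 val1 val2
1   a   x    1    9
2   a   x    2    4
3   a   y    3    5
4   a   y    4    9
5   b   x    1    7
6   b   y    4    4
7   b   x    3    9
8   b   y    2    8", header = TRUE)

# Hypothetical helper: bundle several named summary functions into one FUN
multi_fun <- function(...) {
  fns <- list(...)
  # vapply keeps the list names, giving e.g. c(avg = ..., n = ...)
  function(v) vapply(fns, function(f) f(v), numeric(1))
}

res <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x,
                 FUN = multi_fun(avg = mean, n = length))
res
```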

## Solution 7

Since `dplyr` 1.0.0, the `summarize_all()` and `summarize_at()` functions used above have been superseded by `summarize(across(...))`, where you select the columns to operate on (`val1:val2` here).

We can also supply a list of functions to `across()` and set the output column names with a glue specification (`{.col}` = original column name, `{.fn}` = function name in the list).

More information on `across()` can be found in the official documentation.

``````r
library(dplyr)

x %>% group_by(id1, id2) %>%
  summarize(across(val1:val2, list(mean = mean, n = length), .names = "{.col}_{.fn}"))

# A tibble: 4 × 6
# Groups:   id1 
id1   id2   val1_mean val1_n val2_mean val2_n
<chr> <chr>     <dbl>  <int>     <dbl>  <int>
1 a     x           1.5      2       6.5      2
2 a     y           3.5      2       7        2
3 b     x           2        2       8        2
4 b     y           3        2       6        2
``````