At one point my code ends up with a list of data frames, which I want to combine into a single big data frame.

I got some pointers from an earlier question which was trying to do something similar but more complex.

Here's an example of what I am starting with (this is grossly simplified for illustration):

``````listOfDataFrames <- vector(mode = "list", length = 100)

for (i in 1:100) {
  listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                      b = rnorm(500), c = rnorm(500))
}
``````

I am currently using this:

``````  df <- do.call("rbind", listOfDataFrames)
``````

## Solution 1

Use `bind_rows()` from the dplyr package:

``````library(dplyr)
bind_rows(listOfDataFrames, .id = "column_label")
``````
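To show what `.id` does, here is a minimal sketch on a tiny named list. The list `list_of_dataframes` and its names `first` and `second` are made up for illustration; with a named list the labels come from the names, and with an unnamed list a numeric sequence is used instead.

```r
library(dplyr)

# Hypothetical named list of small data frames (names chosen for illustration)
list_of_dataframes <- list(
  first  = data.frame(a = c("x", "y"), b = c(1, 2)),
  second = data.frame(a = "z",         b = 3)
)

# The new "column_label" column records which list element each row came from
combined <- bind_rows(list_of_dataframes, .id = "column_label")
combined
```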

## Solution 2

One other option is to use a plyr function:

``````library(plyr)
df <- ldply(listOfDataFrames, data.frame)
``````

This is a little slower than the original:

``````> system.time({ df <- do.call("rbind", listOfDataFrames) })
   user  system elapsed
   0.25    0.00    0.25
> system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
   user  system elapsed
   0.30    0.00    0.29
> identical(df, df2)
[1] TRUE
``````

My guess is that using `do.call("rbind", ...)` is going to be the fastest approach that you will find unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
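The preallocation idea in (a) and (b) could be sketched roughly like this. It only applies when all columns share one type, since a matrix cannot mix characters and numbers; the `pieces` list below is a made-up stand-in, not the data from the question.

```r
# Hypothetical input: a list of numeric matrices with identical columns
set.seed(1)
pieces <- lapply(1:5, function(i) {
  matrix(rnorm(6), nrow = 3, ncol = 2, dimnames = list(NULL, c("b", "c")))
})

# Allocate the full result once instead of growing it piece by piece
rows_per_piece <- vapply(pieces, nrow, integer(1))
result <- matrix(NA_real_,
                 nrow = sum(rows_per_piece),
                 ncol = ncol(pieces[[1]]),
                 dimnames = list(NULL, colnames(pieces[[1]])))

# Work out which rows each piece occupies, then assign in place
end   <- cumsum(rows_per_piece)
start <- end - rows_per_piece + 1L
for (i in seq_along(pieces)) {
  result[start[i]:end[i], ] <- pieces[[i]]
}
```

Whether this actually beats `do.call("rbind", ...)` depends on the data, so it is worth timing on your own list before committing to it.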

Edit 1:

Based on Hadley's comment, here's the latest version of `rbind.fill` from CRAN:

``````> system.time({ df3 <- rbind.fill(listOfDataFrames) })
   user  system elapsed
   0.24    0.00    0.23
> identical(df, df3)
[1] TRUE
``````

This is easier than `rbind`, and marginally faster (these timings hold up over multiple runs). As far as I understand it, the version of `plyr` on GitHub is even faster than this.

## Solution 3

For completeness, I thought the answers to this question needed an update. "My guess is that using `do.call("rbind", ...)` is going to be the fastest approach that you will find..." That was probably true in May 2010 and for some time after, but in about Sep 2011 a new function `rbindlist` was introduced in the `data.table` package version 1.8.2, with the remark that "This does the same as `do.call("rbind",l)`, but much faster". How much faster?

``````library(rbenchmark)
benchmark(
do.call = do.call("rbind", listOfDataFrames),
plyr_rbind.fill = plyr::rbind.fill(listOfDataFrames),
plyr_ldply = plyr::ldply(listOfDataFrames, data.frame),
data.table_rbindlist = as.data.frame(data.table::rbindlist(listOfDataFrames)),
replications = 100, order = "relative",
columns=c('test','replications', 'elapsed','relative')
)
``````

``````                  test replications elapsed relative
4 data.table_rbindlist          100    0.11    1.000
1              do.call          100    9.39   85.364
2      plyr_rbind.fill          100   12.08  109.818
3           plyr_ldply          100   15.14  137.636
``````
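Note that the benchmark wraps `rbindlist` in `as.data.frame()` because `rbindlist` returns a `data.table`. A minimal sketch of that conversion on a tiny made-up list (`small_list` is hypothetical, standing in for `listOfDataFrames`):

```r
library(data.table)

# Small hypothetical list, standing in for listOfDataFrames
small_list <- list(
  data.frame(a = c("x", "y"), b = c(1, 2)),
  data.frame(a = "z",         b = 3)
)

dt <- rbindlist(small_list)   # result is a data.table
df <- as.data.frame(dt)       # plain data.frame, as in the benchmark above

class(dt)
class(df)
```

If you are happy to keep working with a `data.table`, the conversion step can simply be dropped.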

## Solution 4

Code:

``````library(microbenchmark)

dflist <- vector(length=100,mode="list")  # 100 slots, matching the loop below
for(i in 1:100)
{
  dflist[[i]] <- data.frame(a=runif(n=260),b=runif(n=260),
                            c=rep(LETTERS,10),d=rep(LETTERS,10))
}

mb <- microbenchmark(
plyr::rbind.fill(dflist),
dplyr::bind_rows(dflist),
data.table::rbindlist(dflist),
plyr::ldply(dflist,data.frame),
do.call("rbind",dflist),
times=1000)

ggplot2::autoplot(mb)
``````

Session:

``````R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

> packageVersion("plyr")
 1.8.4
> packageVersion("dplyr")
 0.5.0
> packageVersion("data.table")
 1.9.6
``````

UPDATE: Rerun 31-Jan-2018. Ran on the same computer. New versions of packages. Added seed for seed lovers.

``````set.seed(21)
library(microbenchmark)

dflist <- vector(length=100,mode="list")  # 100 slots, matching the loop below
for(i in 1:100)
{
  dflist[[i]] <- data.frame(a=runif(n=260),b=runif(n=260),
                            c=rep(LETTERS,10),d=rep(LETTERS,10))
}

mb <- microbenchmark(
plyr::rbind.fill(dflist),
dplyr::bind_rows(dflist),
data.table::rbindlist(dflist),
plyr::ldply(dflist,data.frame),
do.call("rbind",dflist),
times=1000)

ggplot2::autoplot(mb) + ggplot2::theme_bw()  # ggplot2 is not attached, so qualify theme_bw()

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

> packageVersion("plyr")
 1.8.4
> packageVersion("dplyr")
 0.7.2
> packageVersion("data.table")
 1.10.4
``````

UPDATE: Rerun 06-Aug-2019.

``````set.seed(21)
library(microbenchmark)

dflist <- vector(length=100,mode="list")  # 100 slots, matching the loop below
for(i in 1:100)
{
  dflist[[i]] <- data.frame(a=runif(n=260),b=runif(n=260),
                            c=rep(LETTERS,10),d=rep(LETTERS,10))
}

mb <- microbenchmark(
plyr::rbind.fill(dflist),
dplyr::bind_rows(dflist),
data.table::rbindlist(dflist),
plyr::ldply(dflist,data.frame),
do.call("rbind",dflist),
purrr::map_df(dflist,dplyr::bind_rows),
times=1000)

ggplot2::autoplot(mb) + ggplot2::theme_bw()  # ggplot2 is not attached, so qualify theme_bw()

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

> packageVersion("plyr")
 1.8.4
> packageVersion("dplyr")
 0.8.3
> packageVersion("data.table")
 1.12.2
> packageVersion("purrr")
 0.3.2
``````

UPDATE: Rerun 18-Nov-2021.

``````set.seed(21)
library(microbenchmark)

dflist <- vector(length=100,mode="list")  # 100 slots, matching the loop below
for(i in 1:100)
{
  dflist[[i]] <- data.frame(a=runif(n=260),b=runif(n=260),
                            c=rep(LETTERS,10),d=rep(LETTERS,10))
}

mb <- microbenchmark(
plyr::rbind.fill(dflist),
dplyr::bind_rows(dflist),
data.table::rbindlist(dflist),
plyr::ldply(dflist,data.frame),
do.call("rbind",dflist),
Reduce("rbind",dflist),
purrr::map_df(dflist,dplyr::bind_rows),
times=1000)

ggplot2::autoplot(mb) + ggplot2::theme_bw()  # ggplot2 is not attached, so qualify theme_bw()

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

> packageVersion("plyr")
 1.8.6
> packageVersion("dplyr")
 1.0.7
> packageVersion("data.table")
 1.14.2
> packageVersion("purrr")
 0.3.4
``````

## Solution 5

There is also `bind_rows(x, ...)` in `dplyr`.

``````> system.time({ df.Base <- do.call("rbind", listOfDataFrames) })
   user  system elapsed
   0.08    0.00    0.07
>
> system.time({ df.dplyr <- as.data.frame(bind_rows(listOfDataFrames)) })
   user  system elapsed
   0.01    0.00    0.02
>
> identical(df.Base, df.dplyr)
[1] TRUE
``````

## Solution 6

Here's another way this can be done. I'm adding it to the answers because `reduce` is a very effective functional tool that is often overlooked as a replacement for loops. In this particular case, neither of these is significantly faster than `do.call`.

using base R:

``````df <- Reduce(rbind, listOfDataFrames)
``````

or, using the tidyverse:

``````library(tidyverse) # or, library(dplyr); library(purrr)
df <- listOfDataFrames %>% reduce(bind_rows)
``````
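For anyone wondering how `Reduce` differs from `do.call` here: `Reduce` folds the list pairwise, building an intermediate data frame at each step, while `do.call` makes one `rbind` call with every element as an argument. A small base R sketch (the list `small_list` is made up for illustration) that checks the two give matching results:

```r
# Small hypothetical list, standing in for listOfDataFrames
set.seed(42)
small_list <- lapply(1:4, function(i) data.frame(id = i, value = rnorm(3)))

# Reduce folds pairwise: rbind(rbind(rbind(x1, x2), x3), x4)
df_reduce <- Reduce(rbind, small_list)

# do.call makes a single call: rbind(x1, x2, x3, x4)
df_docall <- do.call("rbind", small_list)

# Compare the two results
all.equal(df_reduce, df_docall)
```

The repeated intermediate copies are the reason the pairwise version tends not to scale as well on long lists.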

## Solution 7

How it should be done in the tidyverse:

``````df.dplyr.purrr <- listOfDataFrames %>% map_df(bind_rows)
``````

## Solution 8

The only thing missing from the `data.table` solutions is an identifier column that tells you which data frame in the list each row came from.

Something like this:

``````df_id <- data.table::rbindlist(listOfDataFrames, idcol = TRUE)
``````

The `idcol` parameter adds a column (`.id`) identifying which data frame in the list each row originated from. The result would look something like this:

``````.id a           b           c
1   u -0.05315128 -1.31975849
1   b -1.00404849  1.15257952
1   y  1.17478229 -0.91043925
1   q -1.65488899  0.05846295
1   c -1.43730524  0.95245909
1   b  0.56434313  0.93813197
``````
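If the list is named, `idcol` uses those names as the identifiers, and passing a string instead of `TRUE` sets the column name. A minimal sketch, where the list `named_list`, its names, and the column name `"source"` are all made up for illustration:

```r
library(data.table)

# Hypothetical named list; "batch_a" and "batch_b" are illustrative names
named_list <- list(
  batch_a = data.frame(a = c("u", "b"), b = c(0.1, 0.2)),
  batch_b = data.frame(a = "y",         b = 0.3)
)

# idcol = "source" names the identifier column instead of the default ".id"
df_id <- rbindlist(named_list, idcol = "source")
df_id
```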

## Solution 9

An updated visual for those wanting to compare some of the recent answers (I wanted to compare the purrr and dplyr solutions). Basically, I combined the answers from @TheVTM and @rmf.

Code:

``````library(microbenchmark)
library(data.table)
library(tidyverse)

dflist <- vector(length=100,mode="list")  # 100 slots, matching the loop below
for(i in 1:100)
{
  dflist[[i]] <- data.frame(a=runif(n=260),b=runif(n=260),
                            c=rep(LETTERS,10),d=rep(LETTERS,10))
}

mb <- microbenchmark(
dplyr::bind_rows(dflist),
data.table::rbindlist(dflist),
purrr::map_df(dflist, bind_rows),
do.call("rbind",dflist),
times=500)

ggplot2::autoplot(mb)
``````

Session Info:

``````sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
``````

Package Versions:

``````> packageVersion("tidyverse")
 1.1.1
> packageVersion("data.table")
 1.10.0
``````