I have a need that I imagine could be satisfied by
reshape, but I can't quite figure out.
I have a list of names (
brand), and accompanying ID number (
id). This data is in long form, so names can have multiple ID's. I'd like to de-dupicate by the name (
brand) and concatenate the multiple possible
id's into a string separated by a comment.
brand id RadioShack 2308 Rag & Bone 4466 Ragu 1830 Ragu 4518 Ralph Lauren 1638 Ralph Lauren 2719 Ralph Lauren 2720 Ralph Lauren 2721 Ralph Lauren 2722
RadioShack 2308 Rag & Bone 4466 Ragu 1830,4518 Ralph Lauren 1638,2719,2720,2721,2722
How would I accomplish this?
Let's call your data.frame
> aggregate(id ~ brand, data = DF, c) brand id 1 RadioShack 2308 2 Rag & Bone 4466 3 Ragu 1830, 4518 4 Ralph Lauren 1638, 2719, 2720, 2721, 2722
Another alternative using
result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")
This produces the same result and now
id is not a
list anymore. Thanks to @Frank comment. To see the
class of each column try:
> sapply(result, class) brand id "factor" "character"
As mentioned by @DavidArenburg in the comments, another alternative is using the
aggregate(id ~ brand, data = DF, toString)
A nice clean one line in
results as a list
DF[ , .(id = list(id)), by = brand] brand id 1: RadioShack 2308 2: Rag & Bone 4466 3: Ragu 1830,4518 4: Ralph Lauren 1638,2719,2720,2721,2722 >
results as a string
DF[ , .(id = paste(id, collapse=",")), by = brand] brand id 1: RadioShack 2308 2: Rag & Bone 4466 3: Ragu 1830,4518 4: Ralph Lauren 1638,2719,2720,2721,2722
Even though the two results appear the same (that is when you print them, they look identical), they are in fact very different and allow for different functionality.
Namely, using the list option (the first one) allows you to then perform functions on the orignal
The latter will allow you to display the information more easily (including exporting to
excel), but to operate on the
id's will require splicing them back.
Or using dplyr:
library(dplyr) DF %>% group_by(brand) %>% summarise(id = paste(id, collapse = ","))
DF is the name of your data.frame.
Here's the information in base R:
myby <- by(df$id,df$brand,function(x)paste(x,collapse=","))
The formatting of "by" objects is weird. You can take
data.frame(id=c(myby)) and the brands will become rownames:
# id # RadioShack 2308 # Rag & Bone 4466 # Ragu 1830,4518 # Ralph Lauren 1638,2719,2720,2721,2722
Alternately, if you load the
data.table package, this will work:
dt <- data.table(df) dt[,paste(id,collapse=","),by=brand] # brand V1 # 1: RadioShack 2308 # 2: Rag & Bone 4466 # 3: Ragu 1830,4518 # 4: Ralph Lauren 1638,2719,2720,2721,2722