In the following example, "Algorithms in C++" is present twice.

The $unset modifier can remove a particular field, but how can I remove a duplicate entry from an array field?

{
  "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), 
  "favorites" : {
    "books" : [
      "Algorithms in C++",    
      "The Art of Computer Programming", 
      "Graph Theory",      
      "Algorithms in C++"
    ]
  }, 
  "name" : "robert"
}

Solution 1

As of MongoDB 2.2 you can use the aggregation framework with an $unwind, $group and $project stage to achieve this:

db.users.aggregate([{$unwind: '$favorites.books'},
                    {$group: {_id: '$_id',
                              books: {$addToSet: '$favorites.books'},
                              name: {$first: '$name'}}},
                    {$project: {'favorites.books': '$books', name: '$name'}}
                   ])

Note the need for the $project to rename the favorites field, since $group aggregate fields cannot be nested.
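
To make the three stages more concrete, here is a rough plain-JavaScript emulation of what the pipeline does to the sample document. It runs outside MongoDB, the helper names (unwindBooks, groupById) are made up for illustration, and it only mimics the stages for this one document shape.

```javascript
// Sample document from the question (ObjectId replaced by a plain string).
const users = [{
  _id: "4f6cd3c47156522f4f45b26f",
  favorites: { books: ["Algorithms in C++", "The Art of Computer Programming",
                       "Graph Theory", "Algorithms in C++"] },
  name: "robert"
}];

// $unwind: one output document per array element.
function unwindBooks(docs) {
  return docs.flatMap(doc =>
    doc.favorites.books.map(book => ({ ...doc, favorites: { books: book } })));
}

// $group with $addToSet and $first: collect the distinct books per _id.
function groupById(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc._id)) {
      groups.set(doc._id, { _id: doc._id, books: new Set(), name: doc.name });
    }
    groups.get(doc._id).books.add(doc.favorites.books);
  }
  return [...groups.values()];
}

// $project: move the flat "books" field back under "favorites".
const result = groupById(unwindBooks(users)).map(g => ({
  _id: g._id,
  favorites: { books: [...g.books] },
  name: g.name
}));

console.log(JSON.stringify(result, null, 2));
```

The JavaScript Set happens to keep insertion order; MongoDB's $addToSet makes no such promise, so the order of the real pipeline's output may differ.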

Solution 2

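The easiest solution is to use $setUnion (available since MongoDB 2.6; the $addFields stage used below requires MongoDB 3.4+). Note that $setUnion does not guarantee the order of the resulting array: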

db.users.aggregate([
    {'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
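
Keep in mind that aggregate() only returns the deduplicated documents; it does not change what is stored. One possible way to persist the result, assuming MongoDB 4.2 or later (where updateMany accepts an aggregation pipeline as its update argument), is sketched below. The helper name dedupePipeline is hypothetical, and the commented-out call is what would be run in the shell.

```javascript
// Hypothetical helper: builds a one-stage update pipeline that rewrites
// the array at `fieldPath` as its set union with an empty array.
function dedupePipeline(fieldPath) {
  return [
    { $set: { [fieldPath]: { $setUnion: ["$" + fieldPath, []] } } }
  ];
}

const pipeline = dedupePipeline("favorites.books");
console.log(JSON.stringify(pipeline));

// In the shell (MongoDB 4.2+), the pipeline would be passed as the update.
// The filter limits the update to documents that actually have the field:
// db.users.updateMany({ "favorites.books": { $exists: true } }, pipeline);
```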

Another, longer version, based on the idea from @kynan's answer, preserves all the other fields without listing them explicitly (MongoDB 3.6+, since it relies on $mergeObjects):

> db.users.aggregate([
    {'$unwind': {
        'path': '$favorites.books',
        // output the document even if its list of books is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'books': {'$addToSet': '$favorites.books'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // per the docs, for each field $mergeObjects keeps the value from the last document merged,
      // so the values computed by $group (in $$ROOT) take precedence, including the new deduped array
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    // this stage wouldn't be necessary if the field wasn't nested
    {'$addFields': {'favorites.books': '$books'}},
    {'$project': {'_other_fields': 0, 'books': 0}}
])

{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }
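
The role of the merge order in the $replaceRoot stage may be easier to see outside MongoDB. The following plain-JavaScript sketch imitates the last three stages on a hand-written $group result. Object spread is used as a stand-in for $mergeObjects (both are shallow and let later objects win), and the ObjectId is replaced by a plain string.

```javascript
// What the $group stage would emit for the sample document (books already deduplicated).
const grouped = {
  _id: "4f6cd3c47156522f4f45b26f",
  books: ["The Art of Computer Programming", "Graph Theory", "Algorithms in C++"],
  _other_fields: {
    _id: "4f6cd3c47156522f4f45b26f",
    name: "robert",
    favorites: { books: "Algorithms in C++" } // $first of the unwound documents
  }
};

// $replaceRoot + $mergeObjects: later objects overwrite earlier ones, top level only.
const merged = { ...grouped._other_fields, ...grouped };

// $addFields: copy the deduplicated array into the nested field.
merged.favorites = { ...merged.favorites, books: merged.books };

// $project: drop the helper fields.
delete merged._other_fields;
delete merged.books;

console.log(JSON.stringify(merged));
```

Because the merge is shallow, the nested favorites.books still holds a single unwound value after the merge, which is why the extra $addFields step is needed.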

Solution 3

What you have to do is use map-reduce to detect and count the duplicate entries, then use $set to replace the entire books array for the document with { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }.

This has been discussed several times here; please see:

Removing duplicate records using MapReduce

Fast way to find duplicates on indexed column in mongodb

http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce

http://www.mongodb.org/display/DOCS/MapReduce

How to remove duplicate record in MongoDB by MapReduce?
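
As a rough illustration of the detect-and-count step, the following plain-JavaScript sketch mimics a map phase that emits (book, 1) pairs and a reduce phase that sums them. It runs on an in-memory copy of the sample array rather than through MongoDB's mapReduce command, and the function names are made up.

```javascript
const books = ["Algorithms in C++", "The Art of Computer Programming",
               "Graph Theory", "Algorithms in C++"];

// Map phase: emit one [key, value] pair per array entry.
function mapBooks(list) {
  return list.map(book => [book, 1]);
}

// Reduce phase: sum the values emitted for each key.
function reduceCounts(pairs) {
  const counts = new Map();
  for (const [key, value] of pairs) {
    counts.set(key, (counts.get(key) || 0) + value);
  }
  return counts;
}

const counts = reduceCounts(mapBooks(books));

// Entries counted more than once are the duplicates.
const duplicates = [...counts].filter(([, n]) => n > 1).map(([book]) => book);
console.log(duplicates); // [ 'Algorithms in C++' ]

// The keys form the deduplicated array that would then be written back with $set.
const deduped = [...counts.keys()];
```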

Solution 4

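// Return a copy of arr without duplicates, keeping the first occurrence of each value.
// Values are used as object keys, so they are compared by their string form.
function unique(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (!hash.hasOwnProperty(arr[i])) {
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// Rewrite favorites.books in every document with its deduplicated version.
db.collection.find({}).forEach(function (doc) {
    db.collection.update({ _id: doc._id }, { $set: { "favorites.books": unique(doc.favorites.books) } });
})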
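
If the shell or driver supports ES2015, a Set could replace the hash object. This is only a sketch of an alternative, not part of the answer above, and the name uniqueWithSet is made up. A Set keeps first-insertion order and, unlike object keys, does not coerce values to strings, so 1 and "1" stay distinct.

```javascript
// Deduplicate by passing the array through a Set, which keeps insertion order.
function uniqueWithSet(arr) {
  return Array.from(new Set(arr));
}

const books = ["Algorithms in C++", "The Art of Computer Programming",
               "Graph Theory", "Algorithms in C++"];

console.log(JSON.stringify(uniqueWithSet(books)));
console.log(uniqueWithSet([1, "1", 1])); // [ 1, '1' ]
```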

Solution 5

Starting in MongoDB 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.

For instance, in order to remove duplicates from an array:

// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory",
//     "Algorithms in C++"
//   ]},
//   "name" : "robert"
// }
db.collection.aggregate(
  { $set:
    { "favorites.books":
      { $function: {
          body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
          args: ["$favorites.books"],
          lang: "js"
      }}
    }
  }
)
// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory"
//   ]},
//   "name" : "robert"
// }

This has the advantages of:

  • keeping the original order of the array (if that's not a requirement, then prefer @Dennis Golomazov's $setUnion answer)
  • being more efficient than a combination of expensive $unwind and $group stages.

$function takes 3 parameters:

  • body, which is the function to apply, whose parameter is the array to modify.
  • args, which contains the fields from the record that the body function takes as parameter. In our case "$favorites.books".
  • lang, which is the language in which the body function is written. Only js is currently available.
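
Because body is ordinary JavaScript, it can be tried outside MongoDB first. A small sketch, with the function pulled out under the made-up name dedupeKeepOrder:

```javascript
// Same logic as the $function body: keep an element only at its first index.
const dedupeKeepOrder = books => books.filter((v, i, a) => a.indexOf(v) === i);

const books = ["Algorithms in C++", "The Art of Computer Programming",
               "Graph Theory", "Algorithms in C++"];

// Logs the three distinct titles in their original order.
console.log(JSON.stringify(dedupeKeepOrder(books)));
```

Each indexOf call scans the array from the start, so the cost grows quadratically with the array length; for short lists like this one that is unlikely to matter.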