mongodb

mongodb-query

aggregation-framework

I've got a list of articles, and each of them has an array property which lists various individuals mentioned in them:

_id: {
    $oid: "52b632a9e4f2ba13c82ccd23"
},
providerName: "The Guardian",
url: "http://feeds.theguardian.com/c/34708/f/663860/s/3516cebc/sc/38/l/0L0Stheguardian0N0Cmusic0C20A130Cdec0C220Cwaterboys0Efishermans0Eblues0Etour0Ehammersmith/story01.htm",
subject: "The Waterboys – review",
class_artist: [
    "paul mccartney"
]

I've been trying (unsuccessfully) to get a list of all the individual artists (class_artist), based on the number of articles they've been tagged in within the past 7 days.

I've gotten as far as:

var date = new Date();
date.setDate(date.getDate() - 7);

db.articles.group({
    key: { class_artist: 1 },
    cond: { class_date: { $gt: date } },
    reduce: function ( curr, result ) { result.cnt++; },
    initial: { cnt : 0 }
}).sort({cnt: -1});

But unfortunately, it doesn't count them based on the individual array values, but by array compositions (that is, lists of artists).

I tried using the $unwind function, but have not been able to make it work.

Solution 1

What framework are you using? This is not MongoDB shell and looks like some weird wrapper around MapReduce. In that case $unwind would not be available, and you need it for user in the aggregation framework. Here's what you want in the mongo shell:

db.articles.aggregate([
  {$match: { class_date: { $gte: date } } },
  {$project: { _id: 0, class_artist: 1 } },
  {$unwind: "$class_artist" },
  {$group: { _id: "$class_artist", tags: { $sum: 1 } }},
  {$project: { _id: 0,class_artist: "$_id", tags: 1 } },
  {$sort: { tags: -1 } }
])

So efficiently:

  1. Filter by date because you already set a var for the last 7 days
  2. Project only the field(s) we need { We need only one! }
  3. Unwind the array so we now have a record for every array element in every document
  4. Group on the Artist from the expanded documents
  5. Project into a document format you can use as group messed around with _id
  6. Sort the results in reverse order to see the top tagged first

And the great thing about aggregation is you can gradually build up those stages to see what is going on.

Shake and bake into your own driver implmentation or ODM framework as required.