
Is there a query for calculating how many distinct values a field contains in a DB?

For example, I have a country field with 8 distinct values (Spain, England, France, etc.).

If someone adds more documents with a new country, I would like the query to return 9.

Is there an easier way than grouping and counting?

Solution 1

MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

There is a shell db.collection.distinct() helper as well:

> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]

> db.countries.distinct('country').length
4
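Conceptually, distinct(...).length collects the unique values of a field and counts them. The sketch below mimics that in plain JavaScript on an in-memory array so it runs without a MongoDB server. The helper countDistinct and the sample documents are hypothetical, and the sketch only handles scalar fields (the real distinct also treats each element of an array field as a separate value).

```javascript
// Plain-JavaScript illustration of distinct(field).length; NOT the MongoDB API.
// Hypothetical helper: counts the unique values of a scalar field.
function countDistinct(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    // Skip documents that do not have the field at all.
    if (doc[field] !== undefined) {
      seen.add(doc[field]);
    }
  }
  return seen.size;
}

// Hypothetical sample data with the same four countries as the shell output.
const countries = [
  { country: "Spain" },
  { country: "England" },
  { country: "France" },
  { country: "England" },
  { country: "Australia" },
];

console.log(countDistinct(countries, "country")); // 4

// Adding a document with a new country raises the count by one.
countries.push({ country: "Italy" });
console.log(countDistinct(countries, "country")); // 5
```

Because the count is recomputed from the data each time, it automatically reflects newly added values, which is the behavior the question asks for.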

As noted in the MongoDB documentation:

Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.

Solution 2

Here is an example using the aggregation API. To make the case more involved, it groups case-insensitively by the words in an array property of each document (keywords) and reports how often each keyword occurs.

db.articles.aggregate([
    {
        $match: {
            keywords: { $not: {$size: 0} }
        }
    },
    { $unwind: "$keywords" },
    {
        $group: {
            _id: {$toLower: '$keywords'},
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);

This gives results such as:

{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
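To make the stage-by-stage behavior concrete, the sketch below replays the same steps in plain JavaScript on an in-memory array. It is an illustration only, not how MongoDB executes the pipeline; topKeywords and the sample articles are hypothetical. Note that this pipeline reports how often each keyword occurs; the number of distinct keywords would be the number of groups before the count filter and the limit are applied.

```javascript
// Plain-JavaScript walk-through of the keyword pipeline, run on an
// in-memory array instead of a MongoDB collection (illustration only).
function topKeywords(articles, minCount = 2, limit = 100) {
  const counts = new Map();
  for (const article of articles) {
    // $match + $unwind: articles with a missing or empty keywords array
    // contribute nothing.
    if (!Array.isArray(article.keywords) || article.keywords.length === 0) continue;
    // $unwind: visit each keyword separately.
    for (const keyword of article.keywords) {
      // $group on { $toLower: ... }: case-insensitive key, $sum: 1 per occurrence.
      const key = keyword.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .filter((group) => group.count >= minCount) // $match: { count: { $gte: 2 } }
    .sort((a, b) => b.count - a.count)          // $sort: { count: -1 }
    .slice(0, limit);                           // $limit: 100
}

// Hypothetical sample documents.
const articles = [
  { keywords: ["Obesity", "cancer"] },
  { keywords: ["obesity", "HIV"] },
  { keywords: ["OBESITY", "Cancer"] },
  { keywords: [] },
];

console.log(topKeywords(articles));
// [ { _id: 'obesity', count: 3 }, { _id: 'cancer', count: 2 } ]
```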

Solution 3

With MongoDB 3.4.4 and newer, you can use the $arrayToObject operator and a $replaceRoot stage to return all the counts as a single document.

For example, suppose you have a collection of users with different roles and you want to count how many users have each role. You would run the following aggregation pipeline:

db.users.aggregate([
    { "$group": {
        "_id": { "$toLower": "$role" },
        "count": { "$sum": 1 }
    } },
    { "$group": {
        "_id": null,
        "counts": {
            "$push": { "k": "$_id", "v": "$count" }
        }
    } },
    { "$replaceRoot": {
        "newRoot": { "$arrayToObject": "$counts" }
    } }    
])

Example Output

{
    "user" : 67,
    "superuser" : 5,
    "admin" : 4,
    "moderator" : 12
}
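$arrayToObject turns an array of { k, v } documents into one object, much like JavaScript's Object.fromEntries does with [key, value] pairs. The sketch below replays the pipeline in plain JavaScript on an in-memory array; roleCounts and the sample users are hypothetical, and it assumes every user has a string role.

```javascript
// Plain-JavaScript analogue of the role-count pipeline (illustration only).
// Assumes every user document has a string `role`.
function roleCounts(users) {
  // First $group: case-insensitive role -> number of users.
  const counts = new Map();
  for (const user of users) {
    const role = user.role.toLowerCase();
    counts.set(role, (counts.get(role) || 0) + 1);
  }
  // Second $group with _id: null: push every { k, v } pair into one array.
  const pairs = [...counts].map(([k, v]) => ({ k, v }));
  // $replaceRoot + $arrayToObject: turn the pairs into a single object.
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

// Hypothetical sample users.
const users = [
  { role: "User" },
  { role: "user" },
  { role: "Admin" },
  { role: "moderator" },
];

const result = roleCounts(users);
console.log(result); // { user: 2, admin: 1, moderator: 1 }
// The number of distinct roles is the number of keys.
console.log(Object.keys(result).length); // 3
```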

Solution 4

You can use Mongo Shell Extensions. It is a single .js import that you can append to your $HOME/.mongorc.js, or use programmatically if you are also coding in Node.js/io.js.

Sample

For each distinct value of a field, count its occurrences in the documents, optionally filtered by a query:

> db.users.distinctAndCount('name', {name: /^a/i})

{
  "Abagail": 1,
  "Abbey": 3,
  "Abbie": 1,
  ...
}

The field parameter can also be an array of fields:

> db.users.distinctAndCount(['name','job'], {name: /^a/i})

{
  "Austin,Educator" : 1,
  "Aurelia,Educator" : 1,
  "Augustine,Carpenter" : 1,
  ...
}
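The extension's own source is not reproduced here. As a hypothetical sketch of the output shape shown above, the plain-JavaScript function below counts occurrences per distinct value, takes an optional predicate standing in for the query argument, and joins multiple fields with a comma as in the sample output. distinctAndCountSketch is an invented name, not part of the extension.

```javascript
// Hypothetical re-creation of the *output shape* of distinctAndCount;
// this is NOT the Mongo Shell Extensions implementation.
function distinctAndCountSketch(docs, fields, filter = () => true) {
  const fieldList = Array.isArray(fields) ? fields : [fields];
  const result = {};
  for (const doc of docs) {
    if (!filter(doc)) continue; // stands in for the optional query argument
    // Multiple fields are combined into one "a,b" key, as in the sample output.
    const key = fieldList.map((f) => doc[f]).join(",");
    result[key] = (result[key] || 0) + 1;
  }
  return result;
}

// Hypothetical sample users.
const users = [
  { name: "Abbey", job: "Educator" },
  { name: "Abbey", job: "Carpenter" },
  { name: "Abbie", job: "Educator" },
  { name: "Bob", job: "Educator" },
];

// Stands in for the query {name: /^a/i}.
const startsWithA = (doc) => /^a/i.test(doc.name);
console.log(distinctAndCountSketch(users, "name", startsWithA)); // { Abbey: 2, Abbie: 1 }
console.log(distinctAndCountSketch(users, ["name", "job"], startsWithA));
```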

Solution 5

To find the distinct values of field_1 in a collection while also applying a WHERE-style condition, pass a query document as the second argument:

db.your_collection_name.distinct('field_1', { /* query filter, i.e. the WHERE condition */ })

So, to find the number of distinct names in a collection where age > 25:

db.your_collection_name.distinct('names', {'age': {"$gt": 25}}).length
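The effect of the query argument is that documents are filtered first and only the survivors contribute distinct values. The sketch below shows that ordering in plain JavaScript on an in-memory array; countDistinctWhere and the sample people are hypothetical, and a predicate function stands in for the query document.

```javascript
// Plain-JavaScript analogue of distinct(field, query).length (illustration only).
// `predicate` stands in for the MongoDB query document.
function countDistinctWhere(docs, field, predicate) {
  const values = new Set();
  for (const doc of docs) {
    // Filter first, then collect the field's value if present.
    if (predicate(doc) && doc[field] !== undefined) values.add(doc[field]);
  }
  return values.size;
}

// Hypothetical sample documents.
const people = [
  { names: "Ann", age: 30 },
  { names: "Ann", age: 41 },
  { names: "Bob", age: 22 },
  { names: "Cid", age: 26 },
];

// Equivalent of the filter { age: { $gt: 25 } }.
console.log(countDistinctWhere(people, "names", (doc) => doc.age > 25)); // 2
```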

Hope it helps!

Solution 6

I use this query:

var collection = "countries"; var field = "country";
// For each distinct value, run a separate count and print "collection, value: count"
db[collection].distinct(field).forEach(function (value) {
    print(collection + ", " + value + ": " + db[collection].count({ [field]: value }));
});

Output:

countries, England: 3536
countries, France: 238
countries, Australia: 1044
countries, Spain: 16

This first retrieves all the distinct values and then counts the occurrences of each one with a separate count() query.
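Because this issues one distinct() plus one count() per distinct value, the number of queries grows with the number of distinct values. The plain-JavaScript sketch below (in-memory data, hypothetical helper names, no MongoDB calls) contrasts that strategy with a single grouping pass, which is roughly what a $group stage does within one aggregation.

```javascript
// Illustration of the two strategies on an in-memory array (no MongoDB).
// Hypothetical sample documents.
const docs = [
  { country: "England" },
  { country: "France" },
  { country: "England" },
  { country: "Spain" },
];

// Strategy above: one distinct pass, then one count pass per value.
function countPerValue(docs, field) {
  let passes = 1; // the distinct() call
  const values = [...new Set(docs.map((d) => d[field]))];
  const counts = {};
  for (const value of values) {
    passes += 1; // one count() call per distinct value
    counts[value] = docs.filter((d) => d[field] === value).length;
  }
  return { counts, passes };
}

// Single pass over the data, like a $group with { $sum: 1 }.
function countSinglePass(docs, field) {
  const counts = {};
  for (const d of docs) counts[d[field]] = (counts[d[field]] || 0) + 1;
  return { counts, passes: 1 };
}

console.log(countPerValue(docs, "country").passes); // 4
console.log(countSinglePass(docs, "country").passes); // 1
```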

Solution 7

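I wanted a more concise answer and came up with the following, based on the documentation for aggregation and $group:

db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])

This returns one document per distinct country together with its number of documents, so the number of returned documents is the number of distinct countries.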
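When the returned groups are counted to get the number of distinct countries, note that documents without a country field (or with country: null) are grouped together under _id: null, so they add one extra group. The sketch below mimics that in plain JavaScript on an in-memory array; groupCount and the sample documents are hypothetical.

```javascript
// Plain-JavaScript analogue of
// [{ $group: { _id: "$country", count: { $sum: 1 } } }] (illustration only).
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    // Missing (or null) values fall into a single null group, as with $group.
    const key = doc[field] ?? null;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

// Hypothetical sample documents; the last one has no country.
const countries = [
  { country: "Spain" },
  { country: "France" },
  { country: "Spain" },
  {},
];

const groups = groupCount(countries, "country");
console.log(groups.length); // 3 (Spain, France and the null group)
// Exclude the null group to count only real country values.
console.log(groups.filter((g) => g._id !== null).length); // 2
```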

Solution 8

If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:

db.users.aggregate([
  { $group: { _id: '$country' } },
  { $count: 'countOfUniqueCountries' }
]);
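This pipeline returns a single document whose countOfUniqueCountries field holds the number of groups. One edge case: if the input is empty (for example an empty collection), $count emits no document at all, so client code should treat an empty result as 0. The plain-JavaScript sketch below mirrors both cases on in-memory data; countUniqueStage is a hypothetical helper, not a MongoDB API.

```javascript
// Plain-JavaScript analogue of
// [{ $group: { _id: "$country" } }, { $count: "countOfUniqueCountries" }].
function countUniqueStage(docs, field, outputName) {
  // $group with only an _id: one entry per distinct value (missing -> null).
  const groups = new Set(docs.map((doc) => doc[field] ?? null));
  // $count emits no document at all when its input is empty.
  return groups.size === 0 ? [] : [{ [outputName]: groups.size }];
}

console.log(countUniqueStage(
  [{ country: "Spain" }, { country: "France" }, { country: "Spain" }],
  "country",
  "countOfUniqueCountries",
)); // [ { countOfUniqueCountries: 2 } ]

console.log(countUniqueStage([], "country", "countOfUniqueCountries")); // []
```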