I have a collection of documents like this one:

{
    "_id" : ObjectId("..."),
    "field1": "some string",
    "field2": "another string",
    "field3": 123
}

I'd like to be able to iterate over the entire collection, and find the entire number of fields there are. In this example document there are 3 (I don't want to include _id), but it ranges from 2 to 50 fields in a document. Ultimately, I'm just looking for the average number of fields per document.

Any ideas?

Solution 1

Iterate over the entire collection, and find the entire number of fields there are

Now you can utilise aggregation operator $objectToArray (SERVER-23310) to turn keys into values and count them. This operator is available in MongoDB v3.4.4+

For example:

db.collection.aggregate([
         {"$project":{"numFields":{"$size":{"$objectToArray":"$$ROOT"}}}}, 
         {"$group":{"_id":null, "fields":{"$sum":"$numFields"}, "docs":{"$sum":1}}}, 
         {"$project":{"total":{"$subtract":["$fields", "$docs"]}, _id:0}}
])

First stage $project is to turn all keys into array to count fields. Second stage $group is to sum the number of keys/fields in the collection, also the number of documents processed. Third stage $project is subtracting the total number of fields with the total number of documents (As you don't want to count for _id ).

You can easily add $avg to count for average on the last stage.

Solution 2

PRIMARY> var count = 0;
PRIMARY> db.my_table.find().forEach( function(d) { for(f in d) { count++; } });
PRIMARY> count
1074942

This is the most simple way I could figure out how to do this. On really large datasets, it probably makes sense to go the Map-Reduce path. But, while your set is small enough, this'll do.

This is O(n^2), but I'm not sure there is a better way.

Solution 3

You could create a Map-Reduce job. In the Map step iterate over the properties of each document as a javascript object, output the count and reduce to get the total.

Solution 4

For a simple way just find() all value and for each set of record get size of array.

db.getCollection().find(<condition>)

then for each set of result, get the size of array.

sizeOf(Array[i])