My data looks like this:

session, age, firstName, lastName
1, 28, John, Doe
1, 21, Donna, Keren
2, 32, Jenna, Haze
2, 52, Tommy, Lee

I'd like to get all the rows which are the largest (by age) per session. So So for the above input my output would look like:

sessionid, age, firstName, lastName
1, 28, John, Doe
2, 52, Tommy, Lee

because John has the largest age in the session = 1 group and Tommy has the largest age on the session=2 group.

I need to export the result to a file (csv) and it may contain lots of records.

How can I achieve this?

Solution 1

MongoDB aggregation offers the $max operator, but in your case you want the "whole" record as it is. So the appropriate thing to do here is $sort and then use the $first operator within a $group statement:

    { "$sort": { "session": 1, "age": -1 } },
    { "$group": {
        "_id": "$session",
        "age": { "$first": "$age" },
        "firstName": { "$first" "$firstName" },
        "lastName": { "$first": "$lastName" }

So the "sorting" gets the order right, and the "grouping" picks the first occurrence within the "grouping" key where those fields exist.

Mostly $first here because the $sort is done in reverse order. You can also use $last when in an ascending order as well.

Solution 2

You could try the below aggregation query which uses max attribute:

   $group: {
       "_id": "$session",           
       "age": { $max: "$age" }
   { $out : "max_age" }

The results should be outputted to the new collection max_age and then you could dump it into a csv.

Note: it will give only the session and max age and will not return other fields.