I have a MongoDb collection contains of 284.116 tweets. The problem is the "author" field in some objects are in object type, but in other objects -this "author" field- are in array type. So the problem is I want to filter which ones are Array, and which ones are Object.

For example: The author field's type is object.

{
    "_id" : ObjectId("55edfbd11a87d41d987a6dc1"),
    "tweet" : "Back in my dorm, yay!",
    "uri" : "https://twitter.com/natalylug0/status/640994018529181696",
    "date" : "2015-09-08 00:04:17",
    "country" : "U.S.A.",
    "city" : "Texas",
    "state" : "Dallas",
    "author" : {
        "username" : "Nataly",
        "uri" : "https://twitter.com/natalylug0",
        "screenname" : "natalylug0"
    }
}

And the other one: The author field's type is array.

{
    "_id" : ObjectId("55ee3a00e11fbb1030d659fe"),
    "author" : [ 
        {
            "username" : "Relapsed Shini",
            "uri" : "https://twitter.com/iPictoraL",
            "screenname" : "iPictoraL"
        }
    ],
    "tweet" : "@zumbiezuza 😍😍😍💚 ily zoeeeeeeee",
    "uri" : "https://twitter.com/iPictoraL/status/641060812140900352",
    "date" : "2015-09-08 01:29:42",
    "country" : "U.S.A.",
    "city" : "Texas",
    "state" : "Dallas"
}

So I executed query like this:

db.getCollection('tweets').find({ author: { $type: 4} })

And what I get is

Fetched 0 record(s) 

But if execute $type:3 I get 284.116 values which is the same value of size of this collection.

So my question is, how can I filter objects which "author" fields contain arrays.

Solution 1

Actually there is a "gotcha" listed in the documentation for $type specifically about arrays:

When applied to arrays, $type matches any inner element that is of the specified type. Without projection this means that the entire array will match if any element has the right type. With projection, the results will include just those elements of the requested type.

So that means that rather than detect whether the "element itself" is in array, what is actually being tested is the "inner element" of the array to see what type it is.

Now the documentation itself suggests this JavaScript test with $where:

.find({ "$where": "return Array.isArray(this.author)" })

But I think that's pretty horrible as there is a better way.

The trick is in "dot notation", where you ask for the 0 index element of the array to $exists

.find({ "author.0": { "$exists": true } })

Which is just the basic case that if the "0th" element exists then the field is present and the data is therefore an array.

Once you understand that logical premise then it is pretty simple test. The only thing that cannot be matched by that is a "truly empty" array, in which case you can fall back to the JavaScript alternative if needed. But this can actually use an index, so it would be preferred to use the latter form.

Solution 2

Here's a better way to do what you originally asked; that is to actually check if a certain field holds an array type value:

.find({ "author": { "$gte": [] } })

MongoDB's $type functionality for arrays, though well documented, is IMO inconsistent with all the other $type checks, and obviously doesn't work for this use case, but since around 2.6, you can use the above query to check whether the value is an array (empty or not).

I say this is "better" than the currently selected answer, because executing code via $where is not recommended, unless standard query constructs truly cannot get the job done.

To elaborate, $where is not recommended due to performance via lack of ability to use indexes in executed code. More detail: https://docs.mongodb.com/manual/reference/operator/query/where/#considerations

Also, if you'd like to check for non-empty arrays specifically, use this:

.find({ "author": { "$gt": [] } })

Technically, this one is also better than the current answer's corresponding $exists solution, as the field may have a non-array object with a field named "0", and that would match as a "non-empty array", which is wrong in that case.

Solution 3

Since mongoDB version 3.2 we have $isArray for an aggregation pipeline, which allow to do something like:

db.tweets.aggregate([
   {$set: {isArray: {$cond: [{ $isArray: "$author" }, 1, 0]}}},
   {$match: {isArray: 1}}
])

Playground example A

Or even:

db.tweets.aggregate([
  {$match: {$expr: {$isArray: "$author"}}}
])

Playground example B