Is it slow/poor form to use the $in operator in MongoDB with a large array of possibilities?

posts.find({
    author : {
        $in : ['friend1','friend2','friend3'....'friend40'] 
    }
})

App Engine, for example, won't let you use more than 30 because they translate directly to one query per item in the IN array, and so instead force you into using their method for handling fan out. While that's probably the most efficient method in Mongo too, the code for it is significantly more complex so I'd prefer to just use this generic method.

Will Mongo execute these $in queries efficiently for reasonable-sized datasets?

Solution 1

It can be fairly efficient with small lists (hard to say what small is, but at least into the tens/hundreds) for $in. It does not work like app-engine since mongodb has actual btree indexes and isn't a column store like bigtable.

With $in it will skip around in the index to find the matching documents, or walk through the whole collection if there isn't an index to use.

Solution 2

Assuming you have created index on the author field, from algorithmic point of view, the time complexity of $in operation is: $(N*log(M)), where N is the length of input array and M is the size of the collection.

The time complexity of $in operation will not change unless you change a database (Though I don't think any db can break O(N*log(M))).

However, from engineering point of view, if N goes to a big number, it is better to let your business logic server to simulate the $in operation, either by batch or one-by-one.

This is simply because: memory in database servers is way more valuable than the memory in business logic servers.

Solution 3

If you build an index (ensureIndex) on the list element, it should be pretty quick.

Have you tried using explain()? Its a good, built-in way to profile your queries: http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Use%7B%7Bexplain%7D%7D.