mongodb

indexing

mongodb-indexes

b-tree

I'm looking for a tool to get a decent estimate of how large a MongoDB index will be based on a few signals like:

  • How many documents in my collection
  • The size of the indexed field(s)
  • The size of the _id I'm using if not ObjectId
  • Geo/Non-geo

Has anyone stumbled across something like this? I can imagine it would be extremely useful given Mongo's performance degradation once it hits the memory wall and documents start getting paged out to disk. If I have a functioning database and want to add another index, the only way I'll know if it will be too big is to actually add it.

It wouldn't need to be accurate down to the bit, but with some assumptions about B-Trees and the index implementation I'm sure it could be reasonable enough to be helpful.

If this doesn't exist already I'd like to build and open source it, so if I've missed any required parameters for this calculation please include in your answer.

Solution 1

I just spoke with some of the 10gen engineers and there isn't a tool but you can do a back of the envelope calculation that is based on this formula:

2 * [ n * ( 18 bytes overhead + avg size of indexed field + 5 or so bytes of conversion fudge factor ) ]

Where n is the number of documents you have.

The overhead and conversion padding are mongo specific but the 2x comes from the b-tree data structure being roughly half full (but having allocated 100% of the space a full tree would require) in the worst case.

I'd explain more but I'm learning about it myself at the moment. This presentation will have more details: http://www.10gen.com/presentations/mongosp-2011/mongodb-internals

Solution 2

You can check the sizes of the indexes on a collection by using command:

db.collection.stats()

More details here: http://docs.mongodb.org/manual/reference/method/db.collection.stats/#db.collection.stats

Solution 3

Another way to calculate is to ingest ~1000 or so documents into every collection, in other words, build a small scale model of what you're going to end up within production, create indexes or what have you and calculate the final numbers based on db.collection.stats() average.

Edit (from a comment):

Tyler's answer describes the original MMAP storage engine circa MongoDB 2.0, but this formula definitely isn't applicable to modern versions of MongoDB. WiredTiger, the default storage engine in MongoDB 3.2+, uses index prefix compression so index sizes will vary based on the distribution of key values. There are also a variety of index types and options which might affect sizing. The best approach for a reasonable estimate would be using empirical estimation with representative test data for your projected growth.

Solution 4

Best option is to test in non-prod deployment!

Insert 1000 documents and check index sizes , insert 100000 documents and check index sizes and so one.

Easy way to check in a loop all collections total index sizes:

  var y=0;db.adminCommand("listDatabases").databases.forEach(function(d){mdb=db.getSiblingDB(d.name);mdb.getCollectionNames().forEach(function(c){s=mdb[c].stats(1024*1024).totalIndexSize;y=y+s;print("db.Collection:"+d.name+"."+c+" totalIndexSize: "+s+" MB"); })});print("============================");print("Instance totalIndexSize: "+y+" MB");