I'm about to adopt MongoDB for a new project and I've chosen it for flexibility, not scalability so will be running it on one machine. From the documentation and web posts I keep reading that all indexes are in RAM. This just isn't making sense to me as my indexes will easily be larger than the amount of available RAM.
Can anyone share some insight on the index/RAM relationship and what happens when both an individual index and all of my indexes exceed the size of available RAM?
MongoDB keeps what it can of the indexes in RAM. They'll be swaped out on an LRU basis. You'll often see documentation that suggests you should keep your "working set" in memory: if the portions of index you're actually accessing fit in memory, you'll be fine.
It is the working set size plus MongoDB's indexes which should ideally reside in RAM at all times i.e. the amount of available RAM should ideally be at least the working set size plus the size of indexes plus what the rest of the OS (Operating System) and other software running on the same machine needs. If the available RAM is less than that, LRUing is what happens and we might therefore get significant slowdown. One thing to keep in mind is that in an index btree buckets are cached, not individual index keys i.e. if we had a uniform distribution of keys in an index including for historical data, we might need more of the index in RAM compared to when we have a compound index on time plus something else. With the latter, keys in the same btree bucket are usually from the same time era, so this caveat does not happen. Also, we should keep in mind that our field names in BSON are stored in the records (but not the index) so if we are under memory pressure they should be kept short.
Those who are interested in MongoDB's current virtual memory usage (which of course also is about RAM), can have a look at the status of mongod.