mongodb

performance

debian

ubuntu

I am using MongoDB to store log files. MongoDB and MySQL run on the same machine, and virtualizing the MongoDB environment is not an option. I am afraid I will soon run into performance issues, as the logs collection grows very fast. Is there a way to limit MongoDB's resident memory so that it won't eat all available memory and excessively slow down the MySQL server?

DB machine: Debian 'lenny' 5

Other solutions (please comment):

  • As we need all historical data, we cannot use capped collections, but I am also considering a cron script that dumps and deletes old data

  • Should I also consider using smaller keys, as suggested on other forums?

Solution 1

Hey Vlad, you have a couple of simple strategies here regarding logs.

The first thing to know is that Mongo can generally handle lots of successive inserts without a lot of RAM. The reason is simple: you only insert or update recent data. So the index size grows, but the older data is constantly paged out.

Put another way, you can break out the RAM usage into two major parts: index & data.

If you're running typical logging, the data portion is constantly being flushed away, so only the index really stays in RAM.

The second thing to know is that you can mitigate the index issue by putting logs into smaller buckets. Think of it this way: if you write each day's logs into its own date-stamped collection (call it logs20101206), then each index only covers one day, and you control how much index has to sit in RAM.

As you roll over to a new day, the previous day's index is no longer accessed, so it simply gets paged out of RAM.
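To make the bucketing idea concrete, here is a minimal sketch of how the collection name could be derived from a log entry's date. The function name daily_collection_name and the "logs" prefix are my own choices for illustration; only the logs20101206 naming pattern comes from the answer above.

```python
from datetime import date, datetime


def daily_collection_name(when, prefix="logs"):
    """Return the date-stamped collection name for a given date or datetime.

    The prefix + YYYYMMDD pattern mirrors the "logs20101206" example;
    the function itself is a hypothetical helper, not a MongoDB API.
    """
    return f"{prefix}{when:%Y%m%d}"


# An entry logged on 6 December 2010 lands in "logs20101206".
todays_bucket = daily_collection_name(date(2010, 12, 6))

# With a driver such as pymongo, the write would then go to db[todays_bucket]
# (assumption: `db` is a database handle you already have).
```

Since every write for a given day goes to the same small collection, only that day's index needs to stay hot.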

"but I am also considering using a cron script that dumps and deletes old data"

Logging by day also makes it easy to delete old data. In three months, when you're done with the data, you simply run db.logs20101206.drop() and the collection goes away instantly. Note that you don't reclaim disk space (it's all pre-allocated), but new data will fill up the freed space.
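For the cron-script side of this, a rough sketch of picking out which daily collections are old enough to dump and drop might look like the following. The function name expired_collections, the 90-day default, and the "logs" prefix are assumptions for illustration; the function only selects names and leaves the actual dump and drop to you.

```python
from datetime import date, datetime, timedelta


def expired_collections(names, today, keep_days=90, prefix="logs"):
    """Return the date-stamped collection names older than keep_days.

    Names that do not look like prefix + YYYYMMDD are ignored, so system
    collections and unrelated data are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue
        stamp = name[len(prefix):]
        if len(stamp) != 8 or not stamp.isdigit():
            continue
        try:
            day = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            # Eight digits that are not a real calendar date, e.g. 20101399.
            continue
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

In a cron job you would feed it the database's collection names, dump each returned collection if you need the history, and then drop it, for example with the db.logs20101206.drop() call shown above.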

"Should I also consider using smaller keys, as suggested on other forums?"

Yes.

In fact, I have it built into my data objects. I access data using logs.action or logs->action, but underneath, the data is actually saved to logs.a. Field names are stored in every document, so it's really easy to spend more space on "fields" than on "values". It's worth shrinking the field names and abstracting the mapping away elsewhere.
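As an illustration of that abstraction, here is a small sketch of a translation layer between readable field names and short stored keys. The FIELD_MAP contents and the shorten/expand function names are hypothetical; only the action -> a mapping comes from the answer.

```python
# Hypothetical mapping from readable names to the short keys actually stored.
FIELD_MAP = {"action": "a", "timestamp": "t", "message": "m"}
REVERSE_MAP = {short: full for full, short in FIELD_MAP.items()}


def shorten(doc):
    """Translate readable field names to short keys before saving."""
    return {FIELD_MAP.get(key, key): value for key, value in doc.items()}


def expand(doc):
    """Translate short stored keys back to readable names after loading."""
    return {REVERSE_MAP.get(key, key): value for key, value in doc.items()}


stored = shorten({"action": "login", "message": "ok", "user": "vlad"})
# stored == {"a": "login", "m": "ok", "user": "vlad"}
```

Fields that are not in the map pass through unchanged, so you can shorten only the names that appear in every log entry.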

Solution 2

For version 3.2+, which uses the WiredTiger storage engine by default, the --wiredTigerCacheSizeGB option is relevant to the question. You can set it if you know exactly what you are doing. I don't know whether it's best practice; I just read about it in the documentation and am raising it here.
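If you start mongod from a script, a sketch of passing that option could look like this. The helper name mongod_args, the 1 GB value in the example, and the /var/lib/mongodb data path are assumptions; adjust them to your setup. As far as I understand the documentation, this caps WiredTiger's internal cache rather than the total memory of the process, so treat it as a partial limit.

```python
def mongod_args(cache_size_gb, dbpath="/var/lib/mongodb"):
    """Build a mongod command line that caps the WiredTiger cache.

    cache_size_gb and dbpath are caller-supplied assumptions; this helper
    only assembles the argument list and does not start the server.
    """
    if cache_size_gb <= 0:
        raise ValueError("cache size must be positive")
    return [
        "mongod",
        "--dbpath", dbpath,
        "--wiredTigerCacheSizeGB", str(cache_size_gb),
    ]


# Example: ask for a 1 GB cache; pass the list to subprocess.run() to launch.
args = mongod_args(1)
```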

Solution 3

For Windows it seems possible to control the amount of memory MongoDB uses, see this tutorial at Captain Codeman:

Limit MongoDB memory use on Windows without Virtualization