Let's say we are designing a new system and have decided to use MongoDB as the primary database. The data schema is very similar to a blog with [growing] comments.

The book "50 Tips and Tricks for MongoDB Developers", in Tip #6 ("Do not embed fields that have unbound growth"), says it is inefficient to constantly append data to the end of an array (though it also hints that comments are a "weird edge case").

Let's say our new system is like those "comments" in a blog - growing all the time, with individual entries occasionally edited or deleted.

So, having recognized that there could be a performance issue with MongoDB, what alternative database (it must be horizontally scalable) could serve this purpose? We don't mind keeping MongoDB as our primary database and moving only the "comments" to an alternative database. What options are available?

Notes:

Redis Hashes fit the description of our "comments" data structure (constantly growing but sometimes modified or deleted), BUT we do not need a pure in-memory database: we don't wish to dedicate that much RAM when the data can be persisted to disk. Otherwise it would be a good fit for our problem.

What about CouchDB? We have not investigated this product yet. How does it perform with a growing data structure?

Solution 1

To add to what Thilo said above, the reason to "not embed fields that have unbound growth" is that this kind of document growth can force MongoDB to move the document once it exceeds the space currently allocated to it. You can read more about this in the Padding Factor section of the documentation.

Those moves are relatively expensive, especially if they happen frequently. It may therefore be worth limiting the size of the comments equivalent in your main collection (essentially bounding that growth, e.g. keeping only the most recent X), and perhaps even pre-populating that document field (essentially manual padding), to reduce the moves caused by comment additions and changes.
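One way to bound the embedded array is to have the server trim it on every append. The following is a minimal sketch, not a definitive implementation: it only builds the update document, using the standard `$push` modifiers `$each` and `$slice`. The field name `recent_comments`, the cap of 20, and the helper name are assumptions made for illustration.

```python
def bounded_push_update(comment, field="recent_comments", keep_last=20):
    """Build an update document that appends `comment` and trims the array.

    `field` and `keep_last` are hypothetical names chosen for this sketch.
    A negative $slice keeps only the last `keep_last` elements.
    """
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    return {
        "$push": {
            field: {
                "$each": [comment],
                "$slice": -keep_last,
            }
        }
    }


update = bounded_push_update({"author": "alice", "text": "Nice post"})
# With pymongo this could be applied as (assumed collection name `posts`):
#   posts.update_one({"_id": post_id}, update)
```

Because the array never grows past the cap, the document size stays roughly stable after the first few comments, which is the point of bounding the growth.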

Solution 2

You could stick with MongoDB but embed only the most recent comments (limited by number) in the main document, and keep all the rest in a separate collection.
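The data flow of that hybrid layout can be sketched with plain Python containers standing in for the two collections. Everything here (the `BlogStore` class, the field names, the cap of 3) is hypothetical. The sketch also makes one assumption beyond the answer: the separate store holds the full history, including the comments that are still embedded, so an edit or delete would have to touch both places.

```python
from collections import defaultdict

RECENT_LIMIT = 3  # hypothetical cap on embedded comments


class BlogStore:
    """In-memory stand-in for two collections: posts and comments."""

    def __init__(self):
        self.posts = {}                    # post_id -> post document
        self.comments = defaultdict(list)  # post_id -> full comment history

    def add_post(self, post_id, title):
        self.posts[post_id] = {"title": title, "recent_comments": []}

    def add_comment(self, post_id, comment):
        # The full history goes to the separate "collection".
        self.comments[post_id].append(comment)
        # The post keeps only the newest RECENT_LIMIT comments embedded.
        recent = self.posts[post_id]["recent_comments"]
        recent.append(comment)
        del recent[:-RECENT_LIMIT]


store = BlogStore()
store.add_post("p1", "Hello")
for i in range(5):
    store.add_comment("p1", {"n": i})
```

Rendering the post page then needs only the post document, while a "show all comments" view reads from the separate store.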

Solution 3

Mongo sounds like it would work fine for you guys. Just keep the "comments" in a separate collection, as opposed to a sub-element of another document, i.e. a page (continuing the blog example).

As for Mongo's performance, as long as the indexes fit in RAM you should be fine.
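As a rough illustration of the separate-collection layout, each comment becomes its own small document that points back at its page, and a compound index serves the "comments for this page, newest first" lookup. The field names, the index choice, and the helper functions below are assumptions for this sketch, not something the answer prescribes.

```python
from datetime import datetime, timezone

# Hypothetical compound index: all comments for a post, newest first.
# With pymongo it could be created via: comments.create_index(COMMENT_INDEX)
COMMENT_INDEX = [("post_id", 1), ("created_at", -1)]


def make_comment(post_id, author, text, now=None):
    """Build one standalone comment document referencing its post."""
    return {
        "post_id": post_id,
        "author": author,
        "text": text,
        "created_at": now or datetime.now(timezone.utc),
    }


def comments_for_post_query(post_id):
    """Filter that the index above can serve."""
    return {"post_id": post_id}


doc = make_comment("p1", "bob", "First!",
                   now=datetime(2012, 1, 1, tzinfo=timezone.utc))
# With pymongo (assumed collection name `comments`):
#   comments.insert_one(doc)
#   comments.find(comments_for_post_query("p1")).sort("created_at", -1)
```

Adding, editing, or deleting a comment then touches only that one small document, and the page document itself never grows.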

Solution 4

Your main problem is that you will probably be making updates and deletes to data in different memory pages, which means the writes cannot be as sequential. Many databases will have the same problem in this situation, so switching away from MongoDB won't solve anything.