I am trying to create an index on a MongoDB collection with 10 million records, but I get the following error:

db.logcollection.ensureIndex({"Module":1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "ok" : 0,
        "errmsg" : "Btree::insert: key too large to index, failing play.logcollection.$Module_1 1100 { : \"RezGainUISystem.Net.WebException: The request was aborted: The request was canceled.\r\n   at System.Net.ConnectStream.InternalWrite(Boolean async, Byte...\" }",
        "code" : 17282
}

How can I create this index in MongoDB?

Solution 1

MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit (1024 bytes). You can, however, create a hashed index or a text index instead:

db.logcollection.createIndex({"Module":"hashed"})

or

db.logcollection.createIndex({"Module":"text"})

Solution 2

You can silence this behaviour by launching the mongod instance with the following command:

mongod --setParameter failIndexKeyTooLong=false

or by executing the following command from the mongo shell:

db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

If you have confirmed that your field exceeds the limit only rarely, one way to solve this issue is to split the offending field into parts of less than 1 KB each, measured in bytes. For example, a field val would be split into the fields val_1, val_2 and so on. MongoDB stores text as valid UTF-8, so you need a function that splits UTF-8 strings without cutting a multi-byte character in half:

def split_utf8(s, n):
    """
    Split s into UTF-8 encoded byte chunks of at most n bytes each, never cutting inside a multi-byte character. n must be at least 4 (the longest UTF-8 sequence), otherwise the loop cannot make progress.

    (s[k] & 0xc0) == 0x80 checks whether byte k is a continuation byte (part of a character that started earlier) or a lead byte that indicates how many bytes there are in the multi-byte sequence.

    As an aside, bytes in a UTF-8 stream can be classified as follows:

    With the high bit set to 0, it's a single byte value.
    With the two high bits set to 10, it's a continuation byte.
    Otherwise, it's the first byte of a multi-byte sequence and the number of leading 1 bits indicates how many bytes there are in total for this sequence (110... means two bytes, 1110... means three bytes, etc).
    """
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        # Back up to the nearest character boundary.
        # In Python 3, indexing a bytes object already yields an int, so no ord() is needed.
        while (s[k] & 0xc0) == 0x80:
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s
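
As a usage sketch, the chunks can be turned into the val_1, val_2, ... fields mentioned above. The helper name to_indexed_fields and the prefix parameter are assumptions made for illustration; the splitter is repeated (in Python 3 form) so that the snippet runs on its own.

```python
def split_utf8(s, n):
    """Yield UTF-8 byte chunks of s, each at most n bytes, cut on character boundaries."""
    data = s.encode("utf-8")
    while len(data) > n:
        k = n
        # Step back while data[k] is a continuation byte (bit pattern 10xxxxxx).
        while (data[k] & 0xC0) == 0x80:
            k -= 1
        yield data[:k]
        data = data[k:]
    yield data


def to_indexed_fields(value, n, prefix="val"):
    """Map a long string to {prefix_1: ..., prefix_2: ...} (hypothetical field naming)."""
    return {
        f"{prefix}_{i}": chunk.decode("utf-8")
        for i, chunk in enumerate(split_utf8(value, n), start=1)
    }


# A tiny limit of 4 bytes is used only to make the splitting visible.
fields = to_indexed_fields("h\u00e9llo w\u00f6rld", 4)
for name, part in fields.items():
    print(name, len(part.encode("utf-8")), "bytes")
```

Joining the parts in field order gives back the original string, so nothing is lost by the split.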

Then you can define your compound index:

db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})

or a separate index for each val_i:

db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})

Important: If you use the field in a compound index, be careful with the second argument of the split_utf8 function. For each document, you need to subtract the byte sizes of the other field values that make up the index key. For example, for the index {a: 1, b: 1, val: 1} the budget for val is 1024 - sizeof(value(a)) - sizeof(value(b)).
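
The arithmetic can be sketched as follows. This is only an approximation under stated assumptions: it handles string values only, it counts UTF-8 payload bytes and ignores the structural overhead MongoDB adds per field, and the names split_budget and margin are made up for this example.

```python
INDEX_KEY_LIMIT = 1024  # bytes, the limit quoted in this thread


def split_budget(other_values, limit=INDEX_KEY_LIMIT, margin=0):
    """Bytes left for the split field after the other string values in the key.

    margin is a hypothetical safety allowance for per-field overhead.
    """
    used = sum(len(v.encode("utf-8")) for v in other_values)
    budget = limit - used - margin
    if budget < 4:
        # A UTF-8 character can take up to 4 bytes, so smaller budgets are unusable.
        raise ValueError("other fields leave no room for the split field")
    return budget


# For index {a: 1, b: 1, val: 1} with a = "abc" and b = "de":
print(split_budget(["abc", "de"]))             # 1024 - 3 - 2 = 1019
print(split_budget(["abc", "de"], margin=64))  # 1019 - 64 = 955
```

The result is what you would pass as n to split_utf8 for that document.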

In all other cases, use either a hashed index or a text index.

Solution 3

As several people have pointed out in the other answers, the error "key too large to index" means that you are attempting to create an index on a field or fields whose values exceed 1024 bytes in length.

For ASCII text, 1024 bytes corresponds to 1024 characters. Non-ASCII characters take more than one byte each in UTF-8, so the limit is reached with fewer characters.
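
A quick way to see the difference between characters and bytes is to measure the UTF-8 length directly. This is a minimal sketch; utf8_len is a helper name chosen for the example, and it measures only the string payload, not the full index entry.

```python
def utf8_len(text):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


samples = {
    "ascii": "a" * 1024,          # 1 byte per character
    "accented": "\u00e9" * 1024,  # 2 bytes per character
    "cjk": "\u65e5" * 1024,       # 3 bytes per character
}

for name, text in samples.items():
    print(name, len(text), "characters,", utf8_len(text), "bytes")
```

All three samples have 1024 characters, but only the ASCII one stays at 1024 bytes.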

There is no way around this, because it is an intrinsic limit set by MongoDB, as mentioned on the MongoDB Limits and Thresholds page:

The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.

Setting failIndexKeyTooLong to false is not a solution, as mentioned on the server parameters manual page:

...these operations would successfully insert or modify a document but the index or indexes would not include references to the document.

What that sentence means is that the offending document will not be included in the index, and may be missing from query results.

For example, assuming an index on a is created once the parameter has been set to false:

> db.test.insert({_id: 0, a: "abc"})

> db.test.insert({_id: 1, a: "def"})

> db.test.insert({_id: 2, a: <string more than 1024 characters long>})

> db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )

> db.test.find()
{"_id": 0, "a": "abc"}
{"_id": 1, "a": "def"}
{"_id": 2, "a": <string more than 1024 characters long>}
Fetched 3 record(s) in 2ms

> db.test.find({a: {$ne: "abc"}})
{"_id": 1, "a": "def"}
Fetched 1 record(s) in 1ms

Because MongoDB was forced to ignore the failIndexKeyTooLong error, the last query does not return the offending document (the document with _id: 2 is missing), so the result set is wrong.
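
The effect can be imitated with a toy model in plain Python. This is not how MongoDB implements indexes; build_index, find_ne_via_index and find_ne_via_scan are hypothetical helpers that only mimic "oversized keys are left out of the index".

```python
LIMIT = 1024  # bytes, the index key limit discussed above

docs = [
    {"_id": 0, "a": "abc"},
    {"_id": 1, "a": "def"},
    {"_id": 2, "a": "x" * 2000},  # too large to index
]


def build_index(documents, field, limit=LIMIT):
    """Toy index of (key, _id) pairs; oversized keys are silently skipped."""
    entries = []
    for doc in documents:
        key = doc[field]
        if len(key.encode("utf-8")) < limit:
            entries.append((key, doc["_id"]))
    return sorted(entries)


def find_ne_via_index(index, value):
    """Answer a $ne-style query by walking the index only."""
    return [doc_id for key, doc_id in index if key != value]


def find_ne_via_scan(documents, field, value):
    """Answer the same query by scanning every document."""
    return [d["_id"] for d in documents if d[field] != value]


index_a = build_index(docs, "a")
print(find_ne_via_index(index_a, "abc"))   # [1]
print(find_ne_via_scan(docs, "a", "abc"))  # [1, 2]
```

The index-backed query silently drops _id 2, while the full scan returns it.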

Solution 4

When you run into the index key limit, the solution depends on the needs of your schema. Only in extremely rare cases is key matching on a value larger than 1024 bytes a design requirement. In fact, nearly all databases impose an index key limit, although it is typically somewhat configurable in legacy relational databases (Oracle/MySQL/PostgreSQL), so that you can easily shoot yourself in the foot.

For quick search, a "text" index is designed to optimize searching and pattern matching on long text fields, and is well suited to that use case. More commonly, however, the requirement is a uniqueness constraint on long text values, and a "text" index does not behave like a unique scalar index with { unique: true } set (it is more like an index over an array of all the text strings in the field).

Taking inspiration from MongoDB's GridFS, uniqueness checks can easily be implemented by adding an "md5" field to the document and creating a unique scalar index on it, somewhat like a custom unique hashed index. This allows a virtually unlimited text field length (up to the ~16 MB document size limit) that is indexed for search and unique across the collection.

const md5 = require('md5');
const mongoose = require('mongoose');

let Schema = new mongoose.Schema({
  text: {
    type: String,
    required: true,
    trim: true,
    set: function(v) {
      // Keep the md5 field in sync whenever text is assigned.
      this.md5 = md5(v);
      return v;
    }
  },
  md5: {
    type: String,
    required: true,
    trim: true
  }
});

Schema.index({ md5: 1 }, { unique: true });
Schema.index({ text: "text" }, { background: true });
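
The same idea can be sketched outside Mongoose. The class below is an in-memory stand-in for a collection with a unique index on the digest; UniqueTextStore is a made-up name for illustration, not part of any MongoDB driver.

```python
import hashlib


class UniqueTextStore:
    """In-memory stand-in for a collection with a unique index on an md5 field."""

    def __init__(self):
        self._by_digest = {}

    def insert(self, text):
        # The digest is always 32 hex characters, far below the 1024-byte key limit,
        # no matter how long the text is.
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in self._by_digest:
            raise ValueError("duplicate text")
        self._by_digest[digest] = text
        return digest


store = UniqueTextStore()
digest = store.insert("x" * 5000)
print(len(digest))  # 32

try:
    store.insert("x" * 5000)
except ValueError as exc:
    print("rejected:", exc)
```

The long text itself is never used as a key; only its fixed-length digest is.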

Solution 5

In my case I was trying to index a large subdocument array. When I looked at my query, it was actually matching on a subproperty of a subproperty, so I changed the index to target that sub-subproperty and it worked.

Here, goals was the large subdocument array. The failing "key too large" index looked like {"goals": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}, and the query looked like this:

emailsDisabled: {$ne: true},
priorityEmailsDisabled: {$ne: true},
goals: {
  $elemMatch: {
    "topPriority.ymd": ymd,
  }
}

Once I changed the index to {"goals.topPriority.ymd": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}, it worked fine.

To be clear, all I'm certain of is that this change allowed me to create the index. Whether the index actually serves that query is a separate question that I have not yet answered.
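
A rough way to see why the narrower path helps is to compare what each index would have to store per array element. This is a toy model under loud assumptions: the document shape, the title field and the ymd values are invented, JSON length is used as a crude proxy for BSON key size, and keys_for_path only imitates how an index over an array path yields one key per element.

```python
import json

doc = {
    "emailsDisabled": False,
    "goals": [
        {"title": "t" * 2000, "topPriority": {"ymd": "2020-01-01"}},
        {"title": "u" * 50, "topPriority": {"ymd": "2020-01-02"}},
    ],
}


def approx_size(value):
    """Crude proxy for key size: byte length of the compact JSON encoding."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def keys_for_path(document, path):
    """Collect the values an index on a dotted path would see (toy model)."""
    values = [document]
    for part in path.split("."):
        next_values = []
        for v in values:
            items = v if isinstance(v, list) else [v]
            for item in items:
                if isinstance(item, dict) and part in item:
                    next_values.append(item[part])
        values = next_values
    # An array at the end of the path contributes one key per element.
    flat = []
    for v in values:
        flat.extend(v if isinstance(v, list) else [v])
    return flat


for path in ("goals", "goals.topPriority.ymd"):
    sizes = [approx_size(k) for k in keys_for_path(doc, path)]
    print(path, "largest key ~", max(sizes), "bytes")
```

With the whole subdocument as the key, one element is well over 1024 bytes; with only the ymd value as the key, every entry is a few bytes.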