mongodb

mongodb-query

upsert

I have documents that look something like this, with a unique index on bars.name:

{ name: 'foo', bars: [ { name: 'qux', somefield: 1 } ] }

I want to either update the sub-document matched by { name: 'foo', 'bars.name': 'qux' } with $set: { 'bars.$.somefield': 2 }, or create a new sub-document { name: 'qux', somefield: 2 } under { name: 'foo' }.

Is it possible to do this using a single query with upsert, or will I have to issue two separate ones?

Related: 'upsert' in an embedded document (it suggests changing the schema to use the sub-document identifier as the key, but that is from two years ago and I'm wondering if there are better solutions now.)

Solution 1

No, there isn't really a better solution to this, so here is an explanation of why.

Suppose you have a document in place that has the structure as you show:

{ 
  "name": "foo", 
  "bars": [{ 
       "name": "qux", 
       "somefield": 1 
  }] 
}

If you do an update like this

db.foo.update(
    { "name": "foo", "bars.name": "qux" },
    { "$set": { "bars.$.somefield": 2 } },
    { "upsert": true }
)

Then all is fine because a matching document was found. But if you change the value of "bars.name":

db.foo.update(
    { "name": "foo", "bars.name": "xyz" },
    { "$set": { "bars.$.somefield": 2 } },
    { "upsert": true }
)

Then you will get a failure. The only thing that has really changed here is that in MongoDB 2.6 and above the error is a little more succinct:

WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 16836,
        "errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: bars.$.somefield"
    }
})

That is better in some ways, but you really do not want to "upsert" anyway. What you want to do is add the element to the array where the "name" does not currently exist.

So what you really want is the "result" from the update attempt without the "upsert" flag to see if any documents were affected:

db.foo.update(
    { "name": "foo", "bars.name": "xyz" },
    { "$set": { "bars.$.somefield": 2 } }
)

Yielding in response:

WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })

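So when "nMatched" is 0 (check the matched count rather than "nModified", which is also 0 when the element exists but already holds the same value) you know you want to issue the following update: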

db.foo.update(
    { "name": "foo" },
    { "$push": { "bars": {
        "name": "xyz",
        "somefield": 2
    }}}
)

There really is no other way to do exactly what you want. Adding to the array here is not strictly a "set" type of operation: $addToSet compares whole elements, so an element with the same "name" but a different "somefield" would be added as a second entry. That means you cannot use $addToSet together with the "bulk update" functionality to "cascade" your update requests.

In this case it seems like you need to check the result, or otherwise accept reading the whole document and checking whether to update or insert a new array element in code.
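
The "check the result, then push" flow described above could be wrapped in a small helper. This is only a sketch: it assumes the Node.js driver's updateOne, whose result exposes matchedCount, and the helper name upsertBar and its parameters are made up for illustration. Note that the two steps are separate writes, so a concurrent writer can slip in between them.

```javascript
// Hypothetical helper: update bars[].somefield where bars.name matches,
// otherwise push a new element. `collection` is assumed to expose the
// Node.js driver's updateOne(filter, update), resolving to { matchedCount }.
async function upsertBar(collection, parentName, barName, somefield) {
  // Step 1: try to update the existing array element in place.
  const res = await collection.updateOne(
    { name: parentName, "bars.name": barName },
    { $set: { "bars.$.somefield": somefield } }
  );
  if (res.matchedCount > 0) return "updated";

  // Step 2: nothing matched, so append a new element to the parent document.
  await collection.updateOne(
    { name: parentName },
    { $push: { bars: { name: barName, somefield } } }
  );
  return "pushed";
}
```

Usage would be something like: await upsertBar(db.collection('foo'), 'foo', 'xyz', 2)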

Solution 2

If you don't mind changing the schema a bit and having a structure like this:

{
  "name": "foo",
  "bars": {
    "qux": { "somefield": 1 },
    "xyz": { "somefield": 2 }
  }
}

You can perform your operations in one go. This reiterates the answer in 'upsert' in an embedded document, for completeness.
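
With the keyed layout, a single $set on a dotted path covers both cases, because $set creates the intermediate bars.<key> object when it is missing. The sketch below only builds the update document. keyedBarUpdate is an illustrative name, and it assumes the key contains no "." and does not start with "$", since such names do not work safely inside a dotted path.

```javascript
// Hypothetical builder for the keyed schema: turns ("xyz", { somefield: 2 })
// into { $set: { "bars.xyz.somefield": 2 } }.
function keyedBarUpdate(barName, fields) {
  // Assumption: keys with "." or a leading "$" are rejected up front,
  // because they would be misread as path separators or operators.
  if (barName.includes(".") || barName.startsWith("$")) {
    throw new Error("unsupported key for a dotted path: " + barName);
  }
  const set = {};
  for (const [key, value] of Object.entries(fields)) {
    set[`bars.${barName}.${key}`] = value;
  }
  return { $set: set };
}

// Usage in the shell, same call whether or not "xyz" exists yet:
// db.foo.update({ name: "foo" }, keyedBarUpdate("xyz", { somefield: 2 }), { upsert: true })
```

Keep in mind that the unique index on bars.name from the question has no equivalent in this layout, since the names are now field keys rather than values.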

Solution 3

I was looking for the same feature, and found that in version 4.2 and above, MongoDB supports updates with an aggregation pipeline.
Combined with a few other techniques, this makes it possible to upsert a subdocument with a single query.

It's a very verbose query, but I believe it's viable if you know you won't have too many records in the sub-document array. Here's an example of how to achieve this:

const documentQuery = { _id: '123' }
const subDocumentToUpsert = { name: 'xyz', id: '1' }

collection.updateOne(documentQuery, [
    {
        $set: {
            sub_documents: {
                $cond: {
                    if: { $not: ['$sub_documents'] },
                    then: [subDocumentToUpsert],
                    else: {
                        $cond: {
                            if: { $in: [subDocumentToUpsert.id, '$sub_documents.id'] },
                            then: {
                                $map: {
                                    input: '$sub_documents',
                                    as: 'sub_document',
                                    in: {
                                        $cond: {
                                            if: { $eq: ['$$sub_document.id', subDocumentToUpsert.id] },
                                            then: subDocumentToUpsert,
                                            else: '$$sub_document',
                                        },
                                    },
                                },
                            },
                            else: { $concatArrays: ['$sub_documents', [subDocumentToUpsert]] },
                        },
                    },
                },
            },
        },
    },
])
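
To make the three branches easier to follow, here is a plain JavaScript mirror of what the pipeline computes for sub_documents. It is for illustration only and is not something the server runs. The function name is made up, and it assumes the field is either absent/null or an array. Note that, like the pipeline, it replaces the matching sub-document wholesale rather than merging fields.

```javascript
// Plain-JS mirror of the pipeline's branching (illustrative, not server code).
function upsertSubDocument(subDocuments, toUpsert) {
  // Field missing or null: start a new array (the `$not` branch).
  if (subDocuments == null) return [toUpsert];

  // id already present: replace every matching element (the `$map` branch).
  if (subDocuments.some((doc) => doc.id === toUpsert.id)) {
    return subDocuments.map((doc) => (doc.id === toUpsert.id ? toUpsert : doc));
  }

  // id not present: append to the end (the `$concatArrays` branch).
  return [...subDocuments, toUpsert];
}
```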

Solution 4

There's a way to do it in two queries - but it will still work in a bulkWrite.

This is relevant because in my case not being able to batch it is the biggest hangup. With this solution, you don't need to collect the result of the first query, which allows you to do bulk operations if you need to.

Here are the two successive queries to run for your example:

// Update subdocument if existing
collection.updateMany({
    name: 'foo', 'bars.name': 'qux' 
}, {
    $set: { 
        'bars.$.somefield': 2 
    }
})
// Insert subdocument otherwise
collection.updateMany({
    name: 'foo', 'bars.name': { $ne: 'qux' }
}, {
    $push: { 
        bars: {
            somefield: 2, name: 'qux'
        }
    }
})

This also has the added benefit of avoiding corrupted data and race conditions if multiple applications are writing to the database concurrently. The "not already present" check is part of the second query's filter, and MongoDB applies an update to a single document atomically, so you won't risk ending up with two bars: {somefield: 2, name: 'qux'} subdocuments in your document if two applications run the same queries at the same time.
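
As a rough sketch of the batching idea, the two operations can be generated per sub-document and handed to bulkWrite in one call. The builder name barUpsertOps is made up for illustration, the operation shape is the one the Node.js driver's bulkWrite accepts, and the "not present" check is written with $ne on the array field.

```javascript
// Hypothetical builder: returns the update-then-push pair for one sub-document,
// in the { updateMany: { filter, update } } shape used by bulkWrite().
function barUpsertOps(parentName, barName, somefield) {
  return [
    {
      // Update the element if one with this name already exists.
      updateMany: {
        filter: { name: parentName, "bars.name": barName },
        update: { $set: { "bars.$.somefield": somefield } },
      },
    },
    {
      // Push a new element only if no element has this name.
      updateMany: {
        filter: { name: parentName, "bars.name": { $ne: barName } },
        update: { $push: { bars: { name: barName, somefield } } },
      },
    },
  ];
}

// Usage: several sub-documents in a single round trip.
// await collection.bulkWrite([
//   ...barUpsertOps("foo", "qux", 2),
//   ...barUpsertOps("foo", "xyz", 3),
// ]);
```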