I have a collection called article_category that stores, for each article_id, all the category IDs the article belongs to (all_category_id). The documents look like this:

Collection 1: article_category

{
  "article_id": 2015110920343902,
  "all_category_id": [5,8,10]
}

Then I have another collection called article, which stores all my posts:

Collection 2: article

{
  "title": "This is example rows in article collection",
  "article_id": 2015110920343902
},
{
  "title": "Something change",
  "article_id": 2015110920343903
},
{
  "title": "This is another rows",
  "article_id": 2015110920343904
}

Now I want to run a MongoDB query that finds articles whose title matches a regex and whose category_id is 8. Here is my query, but it does not work:

db.article.aggregate(
{
  $match: 
  {
    title: 
    {
       $regex: /example/
    }
  }
},
{
    $lookup:
       {
         from: "article_category",
         pipeline: [
            { $match: { category_id: 8 } }
         ],
         as: "article_category"
       }
  }
)

The query above only returns the records matched by the regex; it does not filter them by category_id.

Any ideas?

Solution 1

First of all, the field is all_category_id, not category_id. Secondly, you don't link the lookup to each article, so every document ends up with exactly the same article_category array. Lastly, you probably want to filter out articles that have no matching category. The $lookup pipeline should look more like this:

db.article.aggregate([
  { $match: {
      title: { $regex: /example/ }
  } },
  { $lookup: {
    from: "article_category",
    let: {
      article_id: "$article_id"
    },
    pipeline: [
      { $match: {
          $expr: { $and: [
              { $in: [ 8, "$all_category_id" ] },
              { $eq: [ "$article_id", "$$article_id" ] }
          ] }
      } }
    ],
    as: "article_category"
  } },
  { $match: {
    $expr: { $gt: [
      { $size: "$article_category"},
      0
    ] }
  } }
] )
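To see why the final $match is needed, here is a plain-JavaScript model of the pipeline above, run on the sample documents from the question. It is not MongoDB code: `correlatedLookup` is a hypothetical helper, arrays stand in for the collections, `Array.prototype.filter` stands in for each $match, and it assumes every category document has an all_category_id array. What it shows is that $lookup only adds a field; it never removes articles by itself.

```javascript
// Plain-JavaScript model of the pipeline above (not MongoDB code).
// Arrays stand in for the two collections; sample data is from the question.
const articles = [
  { title: "This is example rows in article collection", article_id: 2015110920343902 },
  { title: "Something change", article_id: 2015110920343903 },
  { title: "This is another rows", article_id: 2015110920343904 },
];
const articleCategory = [
  { article_id: 2015110920343902, all_category_id: [5, 8, 10] },
];

// Hypothetical helper; assumes every category document has an all_category_id array.
function correlatedLookup(articles, categories, categoryId, titleRegex) {
  return articles
    // { $match: { title: { $regex: ... } } }
    .filter((a) => titleRegex.test(a.title))
    // $lookup with let/pipeline: each article only gets the category documents
    // that contain categoryId AND carry its own article_id.
    .map((a) => ({
      ...a,
      article_category: categories.filter(
        (c) => c.all_category_id.includes(categoryId) && c.article_id === a.article_id
      ),
    }))
    // $lookup never drops documents, so keep only non-empty joins ($size > 0).
    .filter((a) => a.article_category.length > 0);
}

const result = correlatedLookup(articles, articleCategory, 8, /example/);
console.log(result.map((a) => a.article_id)); // [ 2015110920343902 ]
```

Without the last filter, every article matching the regex would still come back, just with an empty article_category array, which is the behavior described in the question.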

UPDATE:

If you don't match on article_id, the $lookup attaches an identical article_category array to every article.

Let's say your article_category collection has another document:

{
  "article_id": 0,
  "all_category_id": [5,8,10]
}

With { $eq: [ "$article_id", "$$article_id" ] } in the pipeline, the resulting article_category is:

[ 
  { 
    "article_id" : 2015110920343902, 
    "all_category_id" : [ 5, 8, 10 ] 
  } 
]

Without it:

[ 
  { 
    "article_id" : 2015110920343902, 
    "all_category_id" : [ 5, 8, 10 ] 
  },
  {
    "article_id": 0,
    "all_category_id": [ 5, 8, 10 ]
  }
]
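The same comparison as a small plain-JavaScript sketch (not MongoDB code). `lookupFor` is a hypothetical helper that models only the inner $lookup pipeline for a single article, with the article_id condition switched on or off; the two category documents are the ones shown above.

```javascript
// Plain-JavaScript model of the inner $lookup pipeline for one article.
const articleCategory = [
  { article_id: 2015110920343902, all_category_id: [5, 8, 10] },
  { article_id: 0, all_category_id: [5, 8, 10] },
];

// Hypothetical helper: `correlate` toggles the
// { $eq: [ "$article_id", "$$article_id" ] } condition.
function lookupFor(article, categories, correlate) {
  return categories.filter(
    (c) =>
      c.all_category_id.includes(8) &&
      (!correlate || c.article_id === article.article_id)
  );
}

const example = { title: "This is example rows in article collection", article_id: 2015110920343902 };
const other = { title: "Something change", article_id: 2015110920343903 };

console.log(lookupFor(example, articleCategory, true).length);  // 1
console.log(lookupFor(example, articleCategory, false).length); // 2
console.log(lookupFor(other, articleCategory, false).length);   // 2: same array for every article
console.log(lookupFor(other, articleCategory, true).length);    // 0
```

The uncorrelated variant gives even "Something change", which has no category document at all, a non-empty array, so a later "non-empty" filter would not remove it.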

If the latter is what you need, it would be much simpler to make two find requests:

db.article.find({ title: { $regex: /example/ } })

and

db.article_category.find({ all_category_id: 8 })

Solution 2

You have a couple of things wrong here. category_id should be all_category_id. Use the join condition in $lookup, and move the $match out of the $lookup stage, after an $unwind, for an optimized lookup.

Use $project with exclusion to drop the looked-up field from the final response, for example {$project:{article_category:0}}.

Try

db.article.aggregate([
  {"$match":{"title":{"$regex":/example/}}},
  {"$lookup":{
    "from":"article_category",
    "localField":"article_id",
    "foreignField":"article_id",
    "as":"article_category"
  }},
  {"$unwind":"$article_category"},
  {"$match":{"article_category.all_category_id":8}}
])
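As a rough plain-JavaScript model of this pipeline (not MongoDB code; `lookupUnwindMatch` is a hypothetical helper and arrays stand in for the collections), the equality join, the $unwind and the final $match look like this. It also applies the optional {$project:{article_category:0}} as a last step, and it assumes all_category_id is always an array.

```javascript
// Plain-JavaScript model of the localField/foreignField pipeline (not MongoDB code).
const articles = [
  { title: "This is example rows in article collection", article_id: 2015110920343902 },
  { title: "Something change", article_id: 2015110920343903 },
  { title: "This is another rows", article_id: 2015110920343904 },
];
const articleCategory = [
  { article_id: 2015110920343902, all_category_id: [5, 8, 10] },
];

// Hypothetical helper; assumes all_category_id is always an array.
function lookupUnwindMatch(articles, categories, categoryId, titleRegex) {
  return articles
    // {"$match":{"title":{"$regex":...}}}
    .filter((a) => titleRegex.test(a.title))
    // $lookup with localField/foreignField: equality join on article_id.
    .map((a) => ({
      ...a,
      article_category: categories.filter((c) => c.article_id === a.article_id),
    }))
    // $unwind: one document per array element; an empty array yields no
    // document (default preserveNullAndEmptyArrays: false).
    .flatMap((a) => a.article_category.map((c) => ({ ...a, article_category: c })))
    // {"$match":{"article_category.all_category_id":8}}: true when the array contains 8.
    .filter((a) => a.article_category.all_category_id.includes(categoryId))
    // Optional {$project:{article_category:0}}: drop the joined field.
    .map(({ article_category, ...rest }) => rest);
}

const result = lookupUnwindMatch(articles, articleCategory, 8, /example/);
console.log(result.map((a) => a.article_id)); // [ 2015110920343902 ]
```

One consequence of $unwind worth keeping in mind: if an article had several article_category documents containing 8, it would appear once per matching document.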

For an uncorrelated subquery, try:

db.article.aggregate([
  {"$match":{"title":{"$regex":/example/}}},
  {"$lookup":{
    "from":"article_category",
    "pipeline":[{"$match":{"all_category_id":8}}],
    "as":"categories"
  }},
  {"$match":{"categories":{"$ne":[]}}}
])
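A plain-JavaScript model of the uncorrelated version (again not MongoDB code; `uncorrelatedLookup` is a hypothetical helper) makes its behavior visible: because the sub-pipeline never refers to the current article, every article receives the same categories array, so the final $match keeps either all regex matches or none of them.

```javascript
// Plain-JavaScript model of the uncorrelated $lookup (not MongoDB code).
const articles = [
  { title: "This is example rows in article collection", article_id: 2015110920343902 },
  { title: "Something change", article_id: 2015110920343903 },
  { title: "This is another rows", article_id: 2015110920343904 },
];
const articleCategory = [
  { article_id: 2015110920343902, all_category_id: [5, 8, 10] },
];

// Hypothetical helper; assumes all_category_id is always an array.
function uncorrelatedLookup(articles, categories, categoryId, titleRegex) {
  // The sub-pipeline never refers to the current article, so its result is
  // the same for every article; compute it once.
  const matched = categories.filter((c) => c.all_category_id.includes(categoryId));
  return articles
    .filter((a) => titleRegex.test(a.title))
    .map((a) => ({ ...a, categories: matched }))
    // {"$match":{"categories":{"$ne":[]}}}: keeps all articles or none.
    .filter((a) => a.categories.length > 0);
}

const ids = uncorrelatedLookup(articles, articleCategory, 8, /This/).map((a) => a.article_id);
console.log(ids); // [ 2015110920343902, 2015110920343904 ]
```

Here article 2015110920343904 is returned even though it has no entry in article_category, because some other document contains category 8. That is the "same array for every article" case from Solution 1's update; use the correlated form if each article must itself be in the category.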