What's the most 'mongo' way of representing many-to-many relationships that have attributes?

So for example:

Intro


MYSQL tables

people => firstName, lastName, ...

Movies => name, length ..

peopleMovies => movieId, personId, language, role

Solution 1


Embed people into movies...?

In MongoDB I understand it's good to denormalize and embed but I don't want to embed people into movies, it just doesn't logically make any sense. Because people don't necessarily only have to belongs to movies.

Solution 2


People and Movies will be two separate collections. People => embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]

Movies => embed [{movieId: 12, personId: 1, language: "English", role: "Main"} ...]

The issue with this solution is that when we want to update a person's role for a specific movie we'll need to run two update queries to ensure data is in sync in both collections.

Solution 3


We can also do something much more relational like and end up with three collections

People => firstName, lastName, ... Movies => name, length .. Castings => movieId, personId, language, role

The issue with this is that because of the lack of a join statement in MongoDB, it would take 3 queries to go from people -> movies and vice versa.

Here is my question, what are some other ways to model something like this in MongoDB and in a more NoSQL way. In terms of the solutions provided, which one would be the best in terms of performance and convention in mongo.

Solution 1

In many ways meteor's API encourages flat relational documents, however MongoDB is a non-relational data store. This conflict is, unfortunately, left as an exercise for the developer to solve.

The notion of schema structure and joins is an enormous topic to cover within a single answer, so I will attempt to be as succinct as possible.

Reasons why you should choose a relational model

Assume you have comment and post data. Consider what would happen if you embedded comments within your posts.

  • DDP operates on documents. All of the comments will be sent every time a new comment in the same post is added.

  • allow and deny rules operate on documents. It may be unreasonable to expect that the same rules apply simultaneously to both posts and comments.

  • Publications tend to make more sense in terms of collections. In the above scenario, we could not easily publish a list of comments independent of their posts.

  • Relational databases exist for good reasons. One of them is to avoid the multiple modification problem inherent in your second solution.

Reasons why you should choose an embedded model

  • Joins are not supported natively by MongoDB, and there isn't a core package to produce a reactive join.

Recommendations

Use your third solution. In my experience, the reasons for choosing a relational model far outweigh the restrictions imposed by the data store. Of course overcoming the lack of joins isn't easy, but the pain is likely to be isolated to only a handful of publish functions. Here are some resources I'd highly recommend:

If you need more information beyond this, please comment below and I will update my answer.

Solution 2

I think you should denormalize your collections. The important point when designing MongoDB collections and documents is to think about your views. What data do you need to display your view? The idea is that you should try to have that data be part of your document.

For example, in your case, you probably have a view for Movies where you want to display information about a movie. But that page about a movie probably needs just basic information about each person (first name, last name, photo URL). Not all other things. And vice-versa, the page about a person will probably list all movies, but also again only a subset of information about each movie is needed, like title, year, and poster photo URL.

So one option would be to have two collections, but then embed (denormalize) just those few fields you need between collections. So for example, Movies collection would have a field people which would be an array of subdocuments. And the People collection would have movies field which would be an array of subdocuments, with those extra fields you want to specify role and so on.

So documents might be something like the following. For movies:

{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "...",
  people: [
    {
      person: {
        _id: "BBB",
        firstName: "...",
        lastName: "...",
        photoURL: "..."
      },
      role: "..."
    }
  ]
}

For people:

{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "...",
  movies: [
    {
      _id: "AAA",
      title: "...",
      year: 2015,
      posterURL: "..."
    }
  ]
}

Of course, the question is how to keep those fields in sync. What if you update the poster photo URL of a movie, you want it to be updated in all Person documents as well. To solve this problem, we developed PeerDB, a package to define relations between collections which then makes sure they are kept in sync.

So in your case, I would have such collections defined in PeerDB, in CoffeeScript:

class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'
    fields: =>
      people: [
        person: @ReferenceField People, ['firstName', 'lastName', 'photoURL'], true, 'movies', ['title', 'year', 'posterURL']
      ]

In short, this definition says that people.person field should be a reference to People collection and kept in sync for firstName, lastName, photoURL. Additionally, a reverse reference field should be made in People documents under the field movies with title, year, posterURL.

Pretty simple. But there are some downsides. The arrays could get very big (maybe not in the case of movies and people, but for some other data) which could make documents too big for MongoDB per-document limits (at the moment 16 MB). Additionally, if you observe, you will see that for People documents there is no information about the role in the list of movies. This is because the role is not part of the referenced document, but it is something which is next to the reference. What if you would want to display role for movie a person was in on the person page/view?

So, maybe it would be better to have three collections, one for basic information about movies, another for people, and then a collection for relation between people and movies. So data could be maybe something like, for movies:

{
  _id: "AAA",
  title: "...",
  year: 2015,
  length: 120,
  posterURL: "..."
}

For people:

{
  _id: "BBB",
  firstName: "...",
  lastName: "...",
  photoURL: "..."
}

For casting:

{
  _id: "...",
  movie: {
    _id: "AAA",
    title: "...",
    year: 2015,
    posterURL: "..."
  },
  person: {
    _id: "BBB",
    firstName: "...",
    lastName: "...",
    photoURL: "..."
  },
  role: "..."
}

And PeerDB definitions:

class People extends Document
  @Meta
    name: 'People'

class Movies extends Document
  @Meta
    name: 'Movies'

class Casting extends Document
  @Meta
    name: 'Casting'
    fields: =>
      person: @ReferenceField People, ['firstName', 'lastName', 'photoURL']
      movie: @ReferenceField Movies, ['title', 'year', 'posterURL']

PeerDB would then make sure that things are kept in sync. It would also remove casting document if a movie or a person is deleted from the database.

This then allows you to make a Meteor publish which is efficient and does not require any dynamic building of related queries. You simply publish Casting collection and this is it. You can even query on some condition. For example, you want to display all directors sorted by firstName and lastName and their movies? Possible with only one query.