Skip to content

FastApi & MongoDB - the full guide #1515

Closed
@mclate

Description

Description

In this issue i'd like to gather all the information about the use of MongoDB, FastApi and Pydantic. At this point this is a "rather complete" solution, but i'd like to gather feedback and comments from the community to se how it can be improved.

The biggest pain point that started this and several other threads when trying to use FastAPI with mongo is the _id field. There are several issues here:

  1. Most known one - _id field being ObjectId, which is not very JSON-friendly
  2. _id field by it's naming is not very python-friendly (that is, written as is in Pydantic model, it would become a private field - many IDEs will point that)

Below i'll try to describe solutions i've found in different places and see what cases do the cover and what's left unsolved.

Let's say, we have some Joe, who's a regular developer. Joe just discovered FastAPI and is familiar with mongo (to the extend that he can create and fetch documents from DB). Joe wants to build clean and fast api that would:

1️⃣ Be able to define mongo-compatible documents as regular Pydantic models (with all the proper validations in place):

class User(BaseModel):
    id: ObjectId = Field(description="User id")
    name: str = Field()

2️⃣ Write routes that would use native Pydantic models as usual:

@app.post('/me', response_model=User)
def save_me(body: User):
   ...

3️⃣ Have api to return json like {"id": "5ed8b7eaccda20c1d4e95bb0", "name": "Joe"} (it's quite expected in the "outer world" to have id field for the document rather than _id. And it just looks nicer.)
4️⃣ Have Swagger and ReDoc documentation to display fields id (str), name (str)
5️⃣ Be able to save Pydantic documents into Mongo with proper id field substitution:

user = User(id=ObjectId(), name='Joe')
inserted = db.user.insert_one(user) # This should insert document as `{"_id": user.id, "name": "Joe"}`
assert inserted.inserted_id == user.id

6️⃣ Should be able to fetch documents from Mongo with proper id matching:

user_id = ObjectId()
found = db.user.find({"_id": user_id})
user = User(**found)
assert user.id == user_id

Known solutions

Validating ObjectId

As proposed in #452, one can define custom field for ObjectId and apply validations to it. One can also create base model that would encode ObjectId into strings:

class OID(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return ObjectId(str(v))
        except InvalidId:
            raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):
    class Config(BaseConfig):
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

class User(MongoModel):
    id: OID = Field()
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    return body

Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
☑️ ☑️

Dealing with _id

Another suggested option would be to use alias="_id" on Pydantic model:

class MongoModel(BaseModel):
    class Config(BaseConfig):
        allow_population_by_field_name = True  # << Added
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

class User(MongoModel):
    id: OID = Field(alias="_id")  # << Notice alias
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.dict(by_alias=True))  # << Inserting as dict with aliased fields
    assert res.inserted_id == body.id
    return body

Now are able to save to DB using User.id field as _id - that solves 5️⃣.

However, how Swagger and ReDoc show id field as _id, and json that is returned looks like this: {"_id":"5ed803afba6455fd78659988","name":"Joe"}. This is a regression for 3️⃣ and 4️⃣
Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
☑️ ☑️ ✅️ ☑️

Hacking our way through

We can do some extra coding to keep id field and make proper inserting into DB. Effectively, we're shuffling id and _id field in MongoModel upon dumping/loading.

class MongoModel(BaseModel):

    class Config(BaseConfig):
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

    @classmethod
    def from_mongo(cls, data: dict):
        """We must convert _id into "id". """
        if not data:
            return data
        id = data.pop('_id', None)
        return cls(**dict(data, id=id))

    def mongo(self, **kwargs):
        exclude_unset = kwargs.pop('exclude_unset', True)
        by_alias = kwargs.pop('by_alias', True)

        parsed = self.dict(
            exclude_unset=exclude_unset,
            by_alias=by_alias,
            **kwargs,
        )

        # Mongo uses `_id` as default key. We should stick to that as well.
        if '_id' not in parsed and 'id' in parsed:
            parsed['_id'] = parsed.pop('id')

        return parsed

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())  # << Notice that we should use `User.mongo()` now.
    assert res.inserted_id == body.id
    return body

This brings back documentation and proper output and solves the insertion:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
✅️ ✅️ ✅️ ☑️

Looks like we're getting closer...

Fetching docs from DB

Now, let's try to fetch doc from DB and return it:

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())  # << Notice that we should use `User.mongo()` now.
    assert res.inserted_id == body.id

    found = col.find_one({'_id': res.inserted_id})
    return found

    """
    pydantic.error_wrappers.ValidationError: 1 validation error for User
    response -> id
      field required (type=value_error.missing)
    """

The workaround for this is to use User.from_mongo:

@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())
    assert res.inserted_id == body.id

    found = col.find_one({'_id': res.inserted_id})
    return User.from_mongo(found)  # << Notice that we should use `User.from_mongo()` now.

This seem to cover fetching from DB. Now we have:

1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣
✅️ ✅️ ✅️ ✅️

Conclusion and questions

Under the spoiler one can find final code to make FastApi work with mongo in the most "native" way:

Full code
class OID(str):
  @classmethod
  def __get_validators__(cls):
      yield cls.validate

  @classmethod
  def validate(cls, v):
      try:
          return ObjectId(str(v))
      except InvalidId:
          raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):

  class Config(BaseConfig):
      allow_population_by_field_name = True
      json_encoders = {
          datetime: lambda dt: dt.isoformat(),
          ObjectId: lambda oid: str(oid),
      }

  @classmethod
  def from_mongo(cls, data: dict):
      """We must convert _id into "id". """
      if not data:
          return data
      id = data.pop('_id', None)
      return cls(**dict(data, id=id))

  def mongo(self, **kwargs):
      exclude_unset = kwargs.pop('exclude_unset', True)
      by_alias = kwargs.pop('by_alias', True)

      parsed = self.dict(
          exclude_unset=exclude_unset,
          by_alias=by_alias,
          **kwargs,
      )

      # Mongo uses `_id` as default key. We should stick to that as well.
      if '_id' not in parsed and 'id' in parsed:
          parsed['_id'] = parsed.pop('id')

      return parsed


class User(MongoModel):
  id: OID = Field()
  name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
  assert isinstance(body.id, ObjectId)
  res = db.insert_one(body.mongo())
  assert res.inserted_id == body.id

  found = col.find_one({'_id': res.inserted_id})
  return User.from_mongo(found)

And the list of things that are sub-optimal with given code:

  1. One can no longer return any data and expect FastApi to apply response_model validation. Have to use User.from_mongo with every return. This is somewhat a code duplication. Would be nice to get rid of this somehow
  2. The amount of "boilerplate" code needed to make FastAPI work "natively" with mongo is quite significant and it's not that straightforward. This can lead to potential errors and raises entry bar for someone who wants to start using FastAPI with mongo
  3. There is still this duality, where in models one uses id field, while all mongo queries are built using _id. Afraid there is no way to get rid of this though... (I'm aware that MongoEngine and other ODM engines cover this, but specifically decided to stay out of this subject and focus on "native" code)

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions