What is MongoDB Schema Validation?

Schema Validation

MongoDB has a flexible schema by default, meaning that no strict schema validation is enforced out of the box. If you want to be sure that all of the documents in a collection share a common structure, you can implement schema validation rules.

MongoDB supports JSON Schema validation. The $jsonSchema operator allows us to define our document structure.

Example

db.createCollection("posts", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "title", "body" ],
      properties: {
        title: {
          bsonType: "string",
          description: "Title of post - Required."
        },
        body: {
          bsonType: "string",
          description: "Body of post - Required."
        },
        category: {
          bsonType: "string",
          description: "Category of post - Optional."
        },
        likes: {
          bsonType: "int",
          description: "Post like count. Must be an integer - Optional."
        },
        tags: {
          bsonType: ["string"],
          description: "Must be an array of strings - Optional."
        },
        date: {
          bsonType: "date",
          description: "Must be a date - Optional."
        }
      }
    }
  }
})
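With this validator in place, inserts that violate the rules are rejected. As a hypothetical example (the document values here are made up for illustration), the following insert would fail because likes is a string rather than an int:

```javascript
// Assuming the "posts" collection was created with the validator above,
// this insert would be rejected with a "Document failed validation" error,
// because "likes" must be an int:
db.posts.insertOne({
  title: "My first post",
  body: "Hello world",
  likes: "five"
})
```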

A crucial idea when creating any backend application is schema validation. Users occasionally provide input that does not meet our expectations, so we must validate any data before it is added to the database.
MongoDB makes it simple to validate data whenever you add a new entry to your database documents, for example:

  • checking whether an email is unique
  • checking whether a value falls inside a certain range
  • checking whether a phone number has exactly 10 digits

In the event that a field doesn’t meet the necessary requirements, MongoDB also offers the ability to implement custom error messages.
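As a rough sketch of those checks in plain JavaScript (the function names and signatures here are illustrative assumptions, not part of any MongoDB or mongoose API):

```javascript
// Hypothetical stand-alone validators for the three checks listed above.
function isUniqueEmail(email, existingEmails) {
  // In MongoDB itself, uniqueness is usually enforced with a unique index;
  // this plain-JS version just checks against a list of known emails.
  return !existingEmails.includes(email.toLowerCase());
}

function isInRange(value, min, max) {
  // True only for numbers within the inclusive range [min, max].
  return typeof value === "number" && value >= min && value <= max;
}

function hasTenDigits(phone) {
  // Exactly ten digits and nothing else.
  return /^\d{10}$/.test(String(phone));
}
```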

In this article, we will go through the steps of designing a basic NodeJs application, setting up MongoDB Atlas (the cloud version of MongoDB) within the application, and then creating a simple user model and implementing simple validation methods on that model.

PREREQUISITES:

This tutorial is designed with the understanding that you are familiar with setting up a cluster on MongoDB Atlas and obtaining a connection URI from it.

STEP 1: SETTING UP NODE-SERVER

Now let's set up our project. See How to Install NodeJs if Node is not already installed on your system. After installation, make a folder for the project.
Open a terminal there (Git Bash is suggested for Windows users) and type the following command:

npm init -y

This will initialize the NPM in your current folder, and you will see a package.json file which will look like the following. The name will depend on your project folder’s name.

{
  "name": "schema-validation",
  "version": "1.0.0",
  "description": "",
  "main": "app.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}

Now we need to install some node packages. For this run the following command.

npm i express dotenv mongoose --save

Now we are ready to start our project.
Create a file called app.js in your project’s root folder and write the following code in that file.

const express = require('express'),
  app = express();

// The following lines make sure that our app can parse the JSON data in our API calls

app.use(express.json());
app.use(express.urlencoded({
  extended: false
}));

const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

In the above code, we are importing express module of node to create a server and we are setting up the server at port 8080. Here we have passed process.env.PORT to make sure if we host this application it will run on the port provided by the hosting environment.

You can run this application by typing node app.js in the terminal, but we will also install the NPM package nodemon to make development easier, as it reruns the application automatically whenever we make a change. To install nodemon run:


sudo npm i nodemon --save-dev --global

Now run nodemon app.js in the terminal. If you see the "Server running on port 8080" message, you are good to go.

STEP 2: SETTING UP MONGODB ATLAS

Visit MongoDB Atlas. You will notice a dropdown with the name Project0 in the top left corner. Click on that dropdown and click on create new project.


The given window will appear after that.


Enter the name of your project, click on next, and then on create project.

Now a window will appear as shown below. Click build database and select the free tier on the next page.

After that you will get a window like this. Click on create cluster (this will take some time).


Now your cluster is created and it’s time to acquire URI.
Click on connect in the given window.


Now, in the given window, choose "Allow Access from Anywhere" for the IP address, then enter a username and password (save this password somewhere, as you will need it to connect to the database).


Create user and then click on choose connection method.


Click on connect your application and copy the URI that you get in your window.


Now use the acquired Mongo URI as shown below in your app.js file.

const express = require("express"),
  app = express(),
  mongoose = require("mongoose");

require("dotenv")
  .config();

const mongouri = "mongodb+srv://User1:" + process.env.MONGO_PASS + "@cluster0.wakey.mongodb.net/myFirstDatabase?retryWrites=true&w=majority";

try {
  mongoose.connect(mongouri, {
    useUnifiedTopology: true,
    useNewUrlParser: true
  });
} catch (error) {
  console.log(error);
}
process.on('unhandledRejection', error => {
  console.log('unhandledRejection', error.message);
});

//Following lines make sure that our app can parse the json data in our api calls

app.use(express.json());
app.use(express.urlencoded({
  extended: false
}));

//setting port for api
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

Now we are going to create a user model without validation.
Create a folder named models in your project root, create a user.js file inside it, and write the given code. I have explained it below.

var mongoose = require('mongoose'),
  Schema = mongoose.Schema;

/**
 * User Schema
 */
var userSchema = new Schema({
  fullName: {
    type: String
  },
  email: {
    type: String
  },
  phone: {
    type: Number
  },
  created: {
    type: Date,
    default: Date.now
  }
});

module.exports = mongoose.model('User', userSchema);

Here we have defined the user schema using mongoose (a node module for managing MongoDB databases), and we are exporting it with module.exports so it can be used in other modules of the NodeJs application.
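The schema above does not validate anything yet. One way to prepare validation is to write the checks as plain predicate functions; the regular expressions below are illustrative assumptions, and the comments sketch how they could be plugged into a mongoose schema's validate option:

```javascript
// Hypothetical validator predicates; the regex patterns are assumptions,
// not taken from any official specification.
const isValidEmail = (v) => /^\S+@\S+\.\S+$/.test(v);
const isTenDigitPhone = (v) => /^\d{10}$/.test(String(v));

// In a mongoose schema these could be wired in roughly like:
//   email: { type: String, validate: { validator: isValidEmail,
//            message: "Email format is invalid" } },
//   phone: { type: Number, validate: { validator: isTenDigitPhone,
//            message: "Phone number must have 10 digits" } }
```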

Now we will create a single POST API route to accept data to be inserted into the database.
Create a routes folder in your project root, create a user.js file inside it, and write the given code.

var express = require("express"),
  router = express.Router(),
  User = require("../models/user");

router.post("/add", function (req, res) {
  const user = new User({
    fullName: req.body.fullName,
    email: req.body.email,
    phone: req.body.phone,
  });
  user.save((err, user) => {
    if (err) {
      res.status(500)
        .send({
          message: err
        });
      return;
    } else res.status(200)
      .send({
        message: "User Inserted to database!!"
      })
  });
});

module.exports = router;
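For the route to be reachable, it still has to be mounted in app.js. A minimal sketch, assuming the route file lives at routes/user.js as created above (the /api/user path prefix is an arbitrary choice):

```javascript
// In app.js, after the app.use(express.json()) lines:
const userRoutes = require("./routes/user");
app.use("/api/user", userRoutes);
// A POST request to /api/user/add will now reach the route defined above.
```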

Step 3 — Validating Number Fields

Suppose that the following document was inserted earlier into a collection named peaks:

Mountain with a string value for its height

{
    "name": "Manaslu",
    "height": "8163m",
    "location": "Nepal"
}

Even though this document’s height value is a string instead of a number, the insertMany() method used to insert this document succeeded. This was possible because no validation rules for the height field had been added yet.

MongoDB will accept any value for this field — even values that don’t make any sense for this field, like negative values — as long as the inserted document is written in valid JSON syntax. To work around this, you can extend the schema validation document from the previous step to include additional rules regarding the height field.

Start by ensuring that the height field is always present in newly-inserted documents and that it’s always expressed as a number. Modify the schema validation with the following command:

db.runCommand({
    "collMod": "peaks",
    "validator": {
        $jsonSchema: {
            "bsonType": "object",
            "description": "Document describing a mountain peak",
            "required": ["name", "height"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "Name must be a string and is required"
                },
                "height": {
                    "bsonType": "number",
                    "description": "Height must be a number and is required"
                }
            },
        }
    }
})

In this command’s schema document, the height field is included in the required array. Likewise, there’s a height document within the properties object that will require any new height values to be a number. Again, the description field is auxiliary, and any description you include should only be to help other users understand the intention behind the JSON Schema.

MongoDB will respond with a short success message to let you know that the collection was successfully modified:

Output:

{ "ok" : 1 }

Now you can test the new rule. Try inserting a document with the minimal document structure required to pass the validation document. The following method will insert a document containing the only two mandatory fields, name and height:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300
    }
)

The insertion will succeed:

Output:

{
  acknowledged: true,
  insertedId: ObjectId("61e0c8c376b24e08f998e371")
}

Next, try inserting a document with a missing height field:

db.peaks.insertOne(
    {
        "name": "Test peak"
    }
)

Then try another that includes the height field, but this field contains a string value:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": "8300m"
    }
)

Both times, the operations will trigger an error message and fail:

Output:

WriteError({
        "index" : 0,
        "code" : 121,
        "errmsg" : "Document failed validation",
        . . .
})

However, if you try inserting a mountain peak with a negative height, the mountain will save properly:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": -100
    }
)

To prevent this, you could add a few more properties to the schema validation document. Replace the current schema validation settings by running the following operation:

db.runCommand({
    "collMod": "peaks",
    "validator": {
        $jsonSchema: {
            "bsonType": "object",
            "description": "Document describing a mountain peak",
            "required": ["name", "height"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "Name must be a string and is required"
                },
                "height": {
                    "bsonType": "number",
                    "description": "Height must be a number between 100 and 10000 and is required",
                    "minimum": 100,
                    "maximum": 10000
                }
            },
        }
    }
})

The new minimum and maximum attributes set constraints on values included in height fields, ensuring they can’t be lower than 100 or higher than 10000. This range makes sense in this case, as this collection is used to store information about mountain peak heights, but you could choose any values you like for these attributes.
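In plain JavaScript terms, the combined bsonType, minimum, and maximum rules for height behave roughly like this hypothetical predicate (an approximation for illustration, not how MongoDB implements validation internally):

```javascript
// A plain-JS approximation of the height rules in the validator above.
function heightIsValid(height) {
  return typeof height === "number"  // bsonType: "number"
    && height >= 100                 // minimum: 100
    && height <= 10000;              // maximum: 10000
}
```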

Now, if you try inserting a peak with a negative height value again, the operation will fail:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": -100
    }
)
Output:

WriteError({
        "index" : 0,
        "code" : 121,
        "errmsg" : "Document failed validation",
        . . .
})

As this output shows, your document schema now validates string values held in each document’s name field as well as numeric values held in the height fields. Continue reading to learn how to validate array values stored in each document’s location field.

Step 4 — Validating Array Fields

Now that each peak’s name and height values are being verified by schema validation constraints, we can turn our attention to the location field to guarantee its data consistency.

Specifying the location for mountains is trickier than one might expect, since many peaks, including many of the famous eight-thousanders, span more than one country. Because of this, it makes sense to store each peak’s location data as an array containing one or more country names instead of a single string value. As with the height values, making sure each location field’s data type is consistent across every document helps when summarizing data with aggregation pipelines.

First, consider some examples of location values that users might enter, and weigh which ones would be valid or invalid:

  • ["Nepal", "China"]: a two-element array; this would be a valid value for a mountain spanning two countries.
  • ["Nepal"]: a single-element array; this would also be a valid value for a mountain located in a single country.
  • "Nepal": a plain string; this would be invalid because, although it lists a single country, the location field should always contain an array.
  • []: an empty array; this would not be a valid value, since every mountain must exist in at least one country.
  • ["Nepal", "Nepal"]: this two-element array would also be invalid, as it contains the same value twice.
  • ["Nepal", 15]: lastly, this two-element array would be invalid, as one of its values is a number instead of a string, and a number is not a valid location name.

To ensure that MongoDB will correctly interpret each of these examples as valid or invalid, run the following operation to create some new validation rules for the peaks collection:

db.runCommand({
    "collMod": "peaks",
    "validator": {
        $jsonSchema: {
            "bsonType": "object",
            "description": "Document describing a mountain peak",
            "required": ["name", "height", "location"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "Name must be a string and is required"
                },
                "height": {
                    "bsonType": "number",
                    "description": "Height must be a number between 100 and 10000 and is required",
                    "minimum": 100,
                    "maximum": 10000
                },
                "location": {
                    "bsonType": "array",
                    "description": "Location must be an array of strings",
                    "minItems": 1,
                    "uniqueItems": true,
                    "items": {
                        "bsonType": "string"
                    }
                }
            },
        }
    }
})

In this $jsonSchema object, the location field is included within the required array as well as the properties object. There, it’s defined with a bsonType of array to ensure that the location value is always an array rather than a single string or a number.

The minItems property validates that the array must contain at least one element, and the uniqueItems property is set to true to ensure that elements within each location array will be unique. This will prevent values like ["Nepal", "Nepal"] from being accepted. Lastly, the items subdocument defines the validation schema for each individual array item. Here, the only expectation is that every item within a location array must be a string.
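Expressed as a plain JavaScript predicate (again an approximation for illustration, not how MongoDB implements it), the location rules amount to:

```javascript
// A plain-JS approximation of the array rules in the validator above.
function locationIsValid(location) {
  return Array.isArray(location)                     // bsonType: "array"
    && location.length >= 1                          // minItems: 1
    && new Set(location).size === location.length    // uniqueItems: true
    && location.every((i) => typeof i === "string"); // items: bsonType "string"
}
```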

Note: The available schema document properties are different for each bsonType and, depending on the field type, you will be able to validate different aspects of the field value. For example, with number values you could define minimum and maximum allowable values to create a range of acceptable values. In the previous example, by setting the location field’s bsonType to array, you can validate features particular to arrays.

You can find details on all possible validation choices in the JSON Schema documentation.

After executing the command, MongoDB will respond with a short success message that the collection was successfully modified with the new schema document:

Output:

{ "ok" : 1 }

Now try inserting documents matching the examples prepared earlier to test how the new rules behave. Once again, let’s use the minimal document structure, with only the name, height, and location fields present.

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": ["Nepal", "China"]
    }
)

The document will be inserted successfully as it fulfills all the defined validation expectations. Similarly, the following document will insert without error:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": ["Nepal"]
    }
)

However, if you were to run any of the following insertOne() methods, they would trigger a validation error and fail:

db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": "Nepal"
    }
)
db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": []
    }
)
db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": ["Nepal", "Nepal"]
    }
)
db.peaks.insertOne(
    {
        "name": "Test peak",
        "height": 8300,
        "location": ["Nepal", 15]
    }
)

As per the validation rules you defined previously, the location values provided in these operations are considered invalid.

After following this step, three primary fields describing a mountain top are already being validated through MongoDB’s schema validation feature. In the next step, you’ll learn how to validate nested documents using the ascents field as an example.

Step 5 — Validating Embedded Documents

At this point, your peaks collection has three fields — name, height, and location — that are being kept in check by schema validation. This step focuses on defining validation rules for the ascents field, which describes successful attempts at summiting each peak. In the example document representing Mount Everest, the ascents field was structured as follows:

The Everest document

{
    "name": "Everest",
    "height": 8848,
    "location": ["Nepal", "China"],
    "ascents": {
        "first": {
            "year": 1953,
        },
        "first_winter": {
            "year": 1980,
        },
        "total": 5656,
    }
}

The ascents subdocument contains a total field whose value represents the total number of ascent attempts for the given mountain. It also contains information on the first winter ascent of the mountain as well as the first ascent overall. These, however, might not be essential to the mountain description. After all, some mountains might not have been ascended in winter yet, or the ascent dates are disputed or not known. For now, just assume the information that you will always want to have in each document is the total number of ascent attempts.

You can change the schema validation document so that the ascents field must always be present and its value must always be a subdocument. This subdocument, in turn, must always contain a total attribute holding a number greater than or equal to zero. The first and first_winter fields aren’t required for the purposes of this guide, so the validation schema won’t consider them, and they can take flexible forms.

Once again, replace the schema validation document for the peaks collection by running the following runCommand() method:

db.runCommand({
    "collMod": "peaks",
    "validator": {
        $jsonSchema: {
            "bsonType": "object",
            "description": "Document describing a mountain peak",
            "required": ["name", "height", "location", "ascents"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "Name must be a string and is required"
                },
                "height": {
                    "bsonType": "number",
                    "description": "Height must be a number between 100 and 10000 and is required",
                    "minimum": 100,
                    "maximum": 10000
                },
                "location": {
                    "bsonType": "array",
                    "description": "Location must be an array of strings",
                    "minItems": 1,
                    "uniqueItems": true,
                    "items": {
                        "bsonType": "string"
                    }
                },
                "ascents": {
                    "bsonType": "object",
                    "description": "Ascent attempts information",
                    "required": ["total"],
                    "properties": {
                        "total": {
                            "bsonType": "number",
                            "description": "Total number of ascents must be 0 or higher",
                            "minimum": 0
                        }
                   }
                }
            },
        }
    }
})

Anytime the document contains subdocuments under any of its fields, the JSON Schema for that field follows the exact same syntax as the main document schema. Just like how the same documents can be nested within one another, the validation schema nests them within one another as well. This makes it straightforward to define complex validation schemas for document structures containing multiple subdocuments in a hierarchical structure.

In this JSON Schema document, the ascents field is included within the required array, making it mandatory. It also appears in the properties object where it’s defined with a bsonType of object, just like the root document itself.

Notice that the definition for ascents validation follows a similar principle as the root document. It has the required field, denoting properties the subdocument must contain. It also defines a properties list, following the same structure. Since the ascents field is a subdocument, its values will be validated just like those of a larger document would be.

Within ascents, there’s a required array whose only value is total, meaning that every ascents subdocument will be required to contain a total field. Following that, the total value is described thoroughly within the properties object, which specifies that this must always be a number with a minimum value of zero.

Again, because neither the first nor the first_winter fields are mandatory for the purposes of this guide, they aren’t included in these validation rules.
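The embedded-document rules can likewise be approximated by a plain JavaScript predicate, purely for illustration (MongoDB evaluates the JSON Schema itself; this sketch just mirrors its logic):

```javascript
// A plain-JS approximation of the ascents rules in the validator above.
function ascentsIsValid(ascents) {
  return typeof ascents === "object" && ascents !== null
    && !Array.isArray(ascents)            // bsonType: "object"
    && typeof ascents.total === "number"  // required: ["total"], bsonType: "number"
    && ascents.total >= 0;                // minimum: 0
}
```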

With this schema validation document applied, try inserting the sample Mount Everest document from the first step to verify it allows you to insert documents you’ve already established as valid:

db.peaks.insertOne(
    {
        "name": "Everest",
        "height": 8848,
        "location": ["Nepal", "China"],
        "ascents": {
            "first": {
                "year": 1953,
            },
            "first_winter": {
                "year": 1980,
            },
            "total": 5656,
        }
    }
)

The document saves successfully, and MongoDB returns the new object identifier:

Output:

{
        "acknowledged" : true,
        "insertedId" : ObjectId("619100f51292cb2faee531f8")
}

To make sure the last pieces of validation work properly, try inserting a document that doesn’t include the ascents field:

db.peaks.insertOne(
    {
        "name": "Everest",
        "height": 8848,
        "location": ["Nepal", "China"]
    }
)

This time, the operation will trigger an error message pointing out a failed document validation:

Output:

WriteError({
        "index" : 0,
        "code" : 121,
        "errmsg" : "Document failed validation",
        . . .
})

Now try inserting a document whose ascents subdocument is missing the total field:

db.peaks.insertOne(
    {
        "name": "Everest",
        "height": 8848,
        "location": ["Nepal", "China"],
        "ascents": {
            "first": {
                "year": 1953,
            },
            "first_winter": {
                "year": 1980,
            }
        }
    }
)

This will again trigger an error.

As a final test, try entering a document that contains an ascents field with a total value, but this value is negative:

db.peaks.insertOne(
    {
        "name": "Everest",
        "height": 8848,
        "location": ["Nepal", "China"],
        "ascents": {
            "first": {
                "year": 1953,
            },
            "first_winter": {
                "year": 1980,
            },
            "total": -100
        }
    }
)

This insertion will fail as well, since the minimum constraint requires total to be 0 or higher.

Conclusion

By following this tutorial, you learned how to use JSON Schema documents to validate document structures before storing them in a collection. You then used JSON Schema documents to check field types and to apply value constraints to numbers and arrays. You also learned how to validate subdocuments in a hierarchical document structure.

Although it shouldn’t be seen as a replacement for data validation performed at the application level, MongoDB’s schema validation feature can help prevent violations of the data constraints that are crucial to maintaining the meaning of your data. Schema validation lets you keep the flexibility of a schemaless approach to data storage while still enforcing structure where it matters: you remain in total control of which parts of the document structure you want to validate and which you’d like to leave open-ended.
