MongoDB – Replication | How It Works?

If you are interested to learn about the sharding in MongoDB

Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Why Replication?

  • To keep your data safe
  • High (24*7) availability of data
  • Disaster recovery
  • No downtime for maintenance (like backups, index rebuilds, compaction)
  • Read scaling (extra copies to read from)
  • Replica set is transparent to the application

How Replication Works in MongoDB

MongoDB achieves replication by the use of replica set. A replica set is a group of mongod instances that host the same data set. In a replica, one node is primary node that receives all write operations. All other instances, such as secondaries, apply operations from the primary so that they have the same data set. Replica set can have only one primary node.

  • Replica set is a group of two or more nodes (generally minimum 3 nodes are required).
  • In a replica set, one node is primary node and remaining nodes are secondary.
  • All data replicates from primary to secondary node.
  • At the time of automatic failover or maintenance, election establishes for primary and a new primary node is elected.
  • After the recovery of failed node, it again join the replica set and works as a secondary node.

A typical diagram of MongoDB replication is shown in which client application always interact with the primary node and the primary node then replicates the data to the secondary nodes.

MongoDB Replication

Replica Set Features

  • A cluster of N nodes
  • Any one node can be primary
  • All write operations go to primary
  • Automatic failover
  • Automatic recovery
  • Consensus election of primary

Set Up a Replica Set

In this tutorial, we will convert standalone MongoDB instance to a replica set. To convert to replica set, following are the steps −

  • Shutdown already running MongoDB server.
  • Start the MongoDB server by specifying — replSet option. Following is the basic syntax of –replSet −
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet "REPLICA_SET_INSTANCE_NAME"

Example

mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0
  • It will start a mongod instance with the name rs0, on port 27017.
  • Now start the command prompt and connect to this mongod instance.
  • In Mongo client, issue the command rs.initiate() to initiate a new replica set.
  • To check the replica set configuration, issue the command rs.conf(). To check the status of replica set issue the command rs.status().

Add Members to Replica Set

To add members to replica set, start mongod instances on multiple machines. Now start a mongo client and issue a command rs.add().

Syntax

The basic syntax of rs.add() command is as follows −

>rs.add(HOST_NAME:PORT)

Example

Suppose your mongod instance name is mongod1.net and it is running on port 27017. To add this instance to replica set, issue the command rs.add() in Mongo client.

>rs.add("mongod1.net:27017")
>

You can add mongod instance to replica set only when you are connected to primary node. To check whether you are connected to primary or not, issue the command db.isMaster() in mongo client.

Why do we need replication in MongoDB?

Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions.

What is the purpose of replication?

The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Theory advances in fits and starts with conceptual leaps, unexpected observations, and a patchwork of evidence.

What is the benefit of replication?

Improve data availability

Data replication enhances the resilience and reliability of systems by storing data at multiple sites across the network. That means, in case of a technical glitch due to malware, software errors, hardware failure or other disruption, data access can still occur from a different site.

How do I enable replication in MongoDB?

To start, you’ll need MongoDB installed on three or more nodes. Each of the nodes in the cluster will need to be able to communicate with the others over a standard port (27017 by default). Additionally, each replica set member needs to have a hostname that is resolvable from the others.

Overview: Network Connectivity

  • Establish a virtual private network. Ensure that your network topology routes all traffic between members within a single site over the local area network.
  • Configure access control to prevent connections from unknown clients to the replica set.
  • Configure networking and firewall rules so that incoming and outgoing packets are permitted only on the default MongoDB port and only from within your deployment. See the IP Binding considerations.
  • Ensure that each member of a replica set is accessible by way of resolvable DNS or hostnames.

For more detail, check out: production notes in our documentation and security hardening.

1: Start each member of the replica set with the appropriate options.

The following example specifies the replica set name and the ip binding through the --replSet and —bind_ip command-line options:

mongod --auth --replSet "rs0" --bind_ip localhost,<hostname(s)|ip address(es)>

For <hostname(s)|ip address(es)>, specify the hostname(s) and/or ip address(es) for your mongod instance that remote clients (including the other members of the replica set) can use to connect to the instance.

2: Connect a mongo shell to one of the mongod instances.

From the same machine where one of the mongod is running, start the mongo shell. To connect to the mongod listening to localhost on the default port of 27017, simply issue:

mongo -u $USERNAME -p $PASSWORD

Depending on how you installed MongoDB and set up your environment, you may need to specify the path to the mongo binary.

3: Initiate the replica set.

From the mongo shell, run the full rs.initiate({...}) on replica set member 0. This command initializes the replica set, and should only be run on the first replica set member. On subsequent nodes, you can run the command without parameters – just rs.initiate().

rs.initiate({
  _id :  "rs0",
  members: [
    { _id:  0, host:  "mongodb0.example.net:27017" },
    { _id:  1, host:  "mongodb1.example.net:27017" },
    { _id:  2, host:  "mongodb2.example.net:27017" }
  ]
})

MongoDB initiates a replica set, using the default replica set configuration.

4: View the replica set configuration.

Use rs.conf() to display the replica set configuration object:

    rs.conf()

The replica set configuration object resembles the following:

{
   "_id" : "rs0",
   "version" : 1,
   "protocolVersion" : NumberLong(1),
   "members" : [
      {
         "_id" : 0,
         "host" : "mongodb0.example.net:27017",
         "arbiterOnly" : false,
         "buildIndexes" : true,
         "hidden" : false,
         "priority" : 1,
         "tags" : {

         },
         "slaveDelay" : NumberLong(0),
         "votes" : 1
      },
      {
         "_id" : 1,
         "host" : "mongodb1.example.net:27017",
         "arbiterOnly" : false,
         "buildIndexes" : true,
         "hidden" : false,
         "priority" : 1,
         "tags" : {

         },
         "slaveDelay" : NumberLong(0),
         "votes" : 1
      },
      {
         "_id" : 2,
         "host" : "mongodb2.example.net:27017",
         "arbiterOnly" : false,
         "buildIndexes" : true,
         "hidden" : false,
         "priority" : 1,
         "tags" : {

         },
         "slaveDelay" : NumberLong(0),
         "votes" : 1
      }

   ],
   "settings" : {
      "chainingAllowed" : true,
      "heartbeatIntervalMillis" : 2000,
      "heartbeatTimeoutSecs" : 10,
      "electionTimeoutMillis" : 10000,
      "catchUpTimeoutMillis" : -1,
      "getLastErrorModes" : {

      },
      "getLastErrorDefaults" : {
         "w" : 1,
         "wtimeout" : 0
      },
      "replicaSetId" : ObjectId("585ab9df685f726db2c6a840")
   }
}

5: Ensure that the replica set has a primary.

Use rs.status() to identify the primary in the replica set.

How does MongoDB detect replication lag?

Replication lag is a delay in data being copied to a secondary server after an update on the primary server. Short windows of replication lag are normal, and should be considered in systems that choose to read the eventually-consistent secondary data. Replication lag can also prevent secondary servers from assuming the primary role if the primary goes down.

If you want to check your current replication lag:

  • In a Mongo shell connected to the primary, call the rs.printSlaveReplicationInfo() method.
    This returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary server.Replication lag can be due to several things, including:
  • Network Latency: Check your ping and traceroute to see if there is a network routing issue or packet loss. See: ping diagonistic documentation, troubleshooting replica sets.
  • Disk Throughput: Sometimes the secondary server is unable to flush data to disk as rapidly as the primary. This is common on multi-tenant systems, especially if the system accesses disk devices over an IP network. System-level tools, like vmstat or iostat can help you find out more. See: production notes, mongostat.
  • Concurrency: Long-running operations on the primary can block replications. Set up your write concern so that write operations don’t return if replication can’t keep up with the load. Alternatively, check slow queries and long-running operations via the database profiler. See: Write Concern.
  • Appropriate Write Concern: If the primary requires a large amount of writes (due to a bulk load operation or a sizable data ingestion), the secondaries may not be able to keep up with the changes on the oplog. Consider setting your write concern to “majority” in order to ensure that large operations are properly replicated.

Conclusion

Rather than having to configure and manage everything yourself, you can always use MongoDB Atlas. It streamlines and automates your replica sets, making the process effortless for you. MongoDB Atlas can also deploy globally sharded replica sets with few clicks, enabling data locality, disaster recovery, and multi-region deployments.

MongoDB – Replication | How It Works?
Show Buttons
Hide Buttons