I have been playing around with MongoDb, thanks to the M101J Course offered by Mongodb University. These NoSQL datastores are gaining popularity due to a number of reasons and one among them being the ease with which they can be scaled out i.e horizontal scaling. This horizontal scaling in MongoDB can be achieved by creating a sharded cluster of mongodb instances.
You might want to understand the concept of sharding before continuing. The MongoDB reference has a very clear explanation for the same here.
A sharded setup of mongodb requires the following:
- Mongodb Configuration server – this stores the cluster’s metadata
- mongos instance – this is the router and routes the queries to different shards based on the sharding key
- Individual mongodb instances – these act as the shards.
The below is the architecture diagram of a sample mongodb sharded setup (Source: MongodDB Reference)
Lets create all of the above components in a single instance i.e on your localhost.
Creating Mongodb Config Server
$ mkdir dataconfigdb $ mongod --configsvr --port 27010
- First creates a data directory to store the cluster metadata
- Second launches the config server deamon on port 27010. The default port 27019, but I have overriden by using the
--port
command line option.
Setting up Query Routers (mongos instances)
This is the routing service for the sharding cluster where by it takes queries from the application and gets the data from the specific shards. Query routers can be setup by using the mongos
command as shown below:
$ mongos -configdb localhost:27010 --port 27011
This outputs a number of things on the console starting with the following line:
2015-02-01T18:51:35.606+0300 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
It is recommended to run with 3 configdb server for production so as to avoid a single point of failure. But for our testing, 1 configdb server should be fine.
--configdb
command line option is used to let the Query router know about the config servers we have setup. It takes a comma separated : values like –configdb host1:port1,host2:port2. In our case we have only 1 config server.
Running mongodb shards
Now we need to run the actual mongodb instances which store the shared data. We will created 3 sharded instances of mongodb and run all of these on localhost on different ports and provide each mongodb instance its own --dbpath
as shown below:
Mongodb Shard – 1
$ mongod --port 27012 --dbpath datadb
Mongodb Shard – 2
$ mongod --port 27013 --dbpath datadb2
Mongodb Shard – 3
$ mongod --port 27014 --dbpath datadb3
Now we have three shards of mongodb running on localhost. For the database I will be using the students
database having collection grades
. The structure of the documents in grades
is given below:
{ "_id" : ObjectId("50906d7fa3c412bb040eb577"), "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
You can choose any database of your choice.
Registering the shards with mongos
Now that we have created our two mongodb shards running at localhost:27012 and localhost:27013 respectively, we will go ahead and register these shards with our mongos query router, also define which database we need to shard and then enable sharding on the collection we are interested by providing the shard key. All these have to be carried out by connecting to the mongos query router as shown in the below commands:
$ mongo --port 27011 --host localhost mongos> sh.addShard("localhost:27012") { "shardAdded" : "shard0000", "ok" : 1 } mongos> sh.addShard("localhost:27013") { "shardAdded" : "shard0001", "ok" : 1 } mongos> sh.addShard("localhost:27014") { "shardAdded" : "shard0001", "ok" : 1 } mongos> sh.enableSharding("students") { "ok" : 1 } mongos> sh.shardCollection("students.grades", {"student_id" : 1}) { "collectionsharded" : "students.grades", "ok" : 1 } mongos>
In the sh.shardCollection
we specify the collection and the field from the collection which is to be used as a shard key.
Adding data to the mongodb sharded cluster
Lets connect to mongos and run some code to populate data to the grades collection in students database.
for ( i = 1; i < 10000; i++ ) { db.grades.insert({student_id: i, type: "exam", score : Math.random() * 100 }); db.grades.insert({student_id: i, type: "quiz", score : Math.random() * 100 }); db.grades.insert({student_id: i, type: "homework", score : Math.random() * 100 }); } WriteResult({ "nInserted" : 1 })
After inserting the data we would notice some activity in the mongos daemon stating that it is moving some chunks for specific shard and so on i.e the balancer will be in action trying to balance the data across the shards. The output will be something like:
2015-02-02T18:26:26.770+0300 [Balancer] moving chunk ns: students.grades moving ( ns: students.grades, shard: shard0000:localhost:27012, lastmod: 1|1||000000000000000000000000, min: { student_id: MinKey }, max: { student_id: 200.0 }) shard0000:localhost:27012 -> shard0001:localhost:27013 2015-02-02T18:31:12.314+0300 [Balancer] moving chunk ns: students.grades moving ( ns: students.grades, shard: shard0000:localhost:27012, lastmod: 2|2||000000000000000000000000, min: { student_id: 200.0 }, max: { student_id: 2096.0 }) shard0000:localhost:27012 -> shard0002:localhost:27014
Lets look at the status of the shards by connecting to the mongos. It can be achieved by using the sh.status()
command.
$ mongo --port 27011 --host localhost mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54cf95d9d9309193f5fa0780") } shards: { "_id" : "shard0000", "host" : "localhost:27012" } { "_id" : "shard0001", "host" : "localhost:27013" } { "_id" : "shard0002", "host" : "localhost:27014" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "blog", "partitioned" : false, "primary" : "shard0000" } { "_id" : "course", "partitioned" : false, "primary" : "shard0000" } { "_id" : "m101", "partitioned" : false, "primary" : "shard0000" } { "_id" : "school", "partitioned" : false, "primary" : "shard0000" } { "_id" : "students", "partitioned" : true, "primary" : "shard0000" } students.grades shard key: { "student_id" : 1 } chunks: shard0001 1 shard0002 1 shard0000 1 { "student_id" : { "$minKey" : 1 } } -->> { "student_id" : 200 } on : shard0001 Timestamp(2, 0) { "student_id" : 200 } -->> { "student_id" : 2096 } on : shard0002 Timestamp(3, 0) { "student_id" : 2096 } -->> { "student_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(3, 1) { "_id" : "task-db", "partitioned" : false, "primary" : "shard0000" } { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
The above output shows that the database students
is sharded and the sharded collection is the grades
collection. It also shows the different shards available and the range of shard keys distributed across different shards. So on shard0001 we have student_id
from minimum to 200, then on shard0002 we have student_id
from 200 upto 2096 and the rest in shard0000.
We can also connect to individual shards and query to find out the max and minimum student ids available.
On shard0000
$ mongo --host localhost --port 27012 MongoDB shell version: 2.6.7 connecting to: localhost:27012/test > use students switched to db students > db.grades.find().sort({student_id : 1}).limit(1) { "_id" : ObjectId("54cf97295a23cc67efa848c8"), "student_id" : 2096, "type" : "exam", "score" : 6.7372970981523395 } > db.grades.find().sort({student_id : -1}).limit(1) { "_id" : ObjectId("54cf973b5a23cc67efa8a567"), "student_id" : 9999, "type" : "homework", "score" : 60.64519872888923 }
On shard0001
C:UsersMohamed>mongo --host localhost --port 27013 MongoDB shell version: 2.6.7 connecting to: localhost:27013/test > use students switched to db students > db.grades.find().sort({student_id:1}).limit(1).pretty() { "_id" : ObjectId("54cf97d05a23cc67efa8a568"), "student_id" : 1, "type" : "exam", "score" : 5.511052813380957 } > db.grades.find().sort({student_id:-1}).limit(1).pretty() { "_id" : ObjectId("54cf97d15a23cc67efa8a7bc"), "student_id" : 199, "type" : "homework", "score" : 51.78457708097994 }
On shard0002
$ mongo --host localhost --port 27014 MongoDB shell version: 2.6.7 connecting to: localhost:27014/test > use students switched to db students > db.grades.find().sort({student_id:1}).limit(1).pretty() { "_id" : ObjectId("54cf971f5a23cc67efa83292"), "student_id" : 200, "type" : "homework", "score" : 79.56434232182801 } > db.grades.find().sort({student_id:-1}).limit(1).pretty() { "_id" : ObjectId("54cf97295a23cc67efa848c7"), "student_id" : 2095, "type" : "homework", "score" : 62.75710032787174 }
Lets execute the same set of queries on the mongos query router and see that the results this time will include data from all the shards and not just individual shard.
$ mongo --port 27011 --host localhost MongoDB shell version: 2.6.7 connecting to: localhost:27011/test mongos> use students switched to db students mongos> db.grades.find().sort({student_id:-1}).limit(1).pretty() { "_id" : ObjectId("54cf973b5a23cc67efa8a567"), "student_id" : 9999, "type" : "homework", "score" : 60.64519872888923 } mongos> db.grades.find().sort({student_id:1}).limit(1).pretty() { "_id" : ObjectId("54cf97d05a23cc67efa8a568"), "student_id" : 1, "type" : "exam", "score" : 5.511052813380957 } mongos>
So this brings to the end of setting up sharded mongodb cluster on localhost. Hope it was informative and useful!
Categories: Code, Design, Open Source
Hey, thanks for the great tips! I’m just wondering to which node you’re connecting to insert the data? I’ve tried to rebuld the infrastructure explained here (and connected to 27011) but i can’t see any entries when running queries on the router.
You would add data by connecting to mongos running on port 27011.
while u add data via the router, you can open the console running the mongos daemon and see some logs there which indicate where the data is being moved and if there are any exceptions. Also notes I just updated the post to modify the program which was populating the data and also registering the mongodb running on 27014 port as the third shard.
I was already connecting to the router and fixed my issue by adding the shardKey to my mongoose.Schema http://mongoosejs.com/docs/guide.html#shardKey
Thanks for your help!
Thanks Sanaulla. Successfully set up Mangodb cluster in localhost. Followed your step by step tutorial and it works like a charm.
glad that the post helped you 🙂
thanks lot!
thanks for reading!
Hi there, great post, few doubts..
If one of my server goes down, then what will happen to my data on that server?
I shut down one server, 27014, after inserting data, and now querying gives error..how to solve that, will my app access suffer until the server is up. Is replication the solution for above ?
And if I don’t replicate and my server goes down, is that data gone forever ?
hey..I did cluster setup successfully with one shard as a replica set. Then I added user to the mongos router and created keyfile and later on used the same key file on all the cluster members.But when I am authenticating, only on router it is returning auth true and validate uaser but on shard it is showing auth failed…can anybody please help me?
Very helpful.. Short and Simple…
Glad that it was helpful
Hi Mohamed,
I am new in mongo DB sharding,i am having 3 server with mongo installed can u help me to mongodb sharding for testing purpose.
You can follow the instructions in the article. In place of localhost you have to specify the IP of your servers having mongo installed.
hey thanks, but i m not getting the data on
shard0001 1
shard0002 1
I getting data only in shard0000
Plz give me solution,
I got status as follows:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("58d4aea8067c4a435b3b514a")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:27012" }
{ "_id" : "shard0001", "host" : "localhost:27013" }
{ "_id" : "shard0002", "host" : "localhost:27014" }
active mongoses:
"3.2.9" : 1
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "students", "primary" : "shard0000", "partitioned" : true }
students.grades
shard key: { "student_id" : 1 }
unique: false
balancing: true
chunks:
shard0000 1
{ "student_id" : { "$minKey" : 1 } } -->> { "student_id"
: { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)
{ "_id" : "test", "primary" : "shard0001", "partitioned" : false }
You have only 1 record in the student.grades collection. Try to add more data. Use the for loop I have provided above to populate more records and then see the status. You will find the distribution of records into different shards.
Hi Mohamed,
Thanks for this great article.
I followed the said step but not getting success. I stuck on the second step while running
mongos –configdb localhost:27010 –port 27011
got the below output
BadValue: configdb supports only replica set connection string
try ‘mongos –help’ for more information
I am trying with MongoDB shell version v3.4.7 on Windows 10.
Please help me.
getting error…
mongod –configsvr –port 27010
2019-04-24T18:44:19.147+0530 I CONTROL [initandlisten] MongoDB starting : pid=11138 port=27010 dbpath=/data/configdb master=1 64-bit host=aman-VirtualBox
2019-04-24T18:44:19.147+0530 I CONTROL [initandlisten] db version v3.2.22
2019-04-24T18:44:19.147+0530 I CONTROL [initandlisten] git version: 105acca0d443f9a47c1a5bd608fd7133840a58dd
2019-04-24T18:44:19.147+0530 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-04-24T18:44:19.147+0530 I CONTROL [initandlisten] allocator: tcmalloc
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] modules: none
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] build environment:
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] distmod: ubuntu1604
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] distarch: x86_64
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] target_arch: x86_64
2019-04-24T18:44:19.148+0530 I CONTROL [initandlisten] options: { net: { port: 27010 }, sharding: { clusterRole: “configsvr” } }
2019-04-24T18:44:19.184+0530 I STORAGE [initandlisten] exception in initAndListen: 98 Unable to create/open lock file: /data/configdb/mongod.lock errno:13 Permission denied Is a mongod instance already running?, terminating
2019-04-24T18:44:19.187+0530 I CONTROL [initandlisten] dbexit: rc: 100