Mongo Scaling
Mongo Scaling
Mongo Scaling
com
Scaling
node_a1
write
node_b1 node_a1
write
write
write
shard2
node_c2 node_b2 node_a2
write
shard2
node_c2 node_b2 node_a2
shard3
node_c3 node_b3 node_a3
write
10
11
Schema
Data model effects performance
Partial versus full document writes Partial versus full document reads
Schema and Schema usage critical for scaling and
perfromance
Roundtrips to database Disk seek time Size of data to read & write
12
Indexes
Index common queries Do not over index
13
, 5)
[5, 10)
[10,
[-
, 5) buckets
[5, 7)
[7, 9)
[9, 10)
[10,
) buckets
{...} {...} {...} {...} {...} {...} {...} {...} {...} {...} {...}
15
Picking an a Index
find({x:
10,
y:
foo})
index on y
remember
16
What is Sharding
Ad-hoc partitioning Consistent hashing
17
MongoDB Sharding
Automatic partitioning and management Range based Convert to sharded system with no downtime Fully consistent
18
- +
- + - 40 41 +
- + - 40 41 + 51 +
41 50
- + - 40 41 + 51 + 61 +
22
41 50
51 60
Sunday, August 21, 2011
- + - 40 41 + 51 + 61 +
23
41 50
51 60
Sunday, August 21, 2011
shard1 - 40 41 50 51 60 61 +
Sunday, August 21, 2011 23
- 40 41 50 51 60 61 +
Sunday, August 21, 2011 24
shard1 - 40 41 50 51 60 61 +
Sunday, August 21, 2011 24
shard1 - 40
shard2 41 50
51 60 61 +
Sunday, August 21, 2011 24
shard1 - 40
shard2 41 50
shard3
51 60 61 +
Sunday, August 21, 2011 24
Sharding Features
Shard data without no downtime Automatic balancing as data is written Commands routed (switched) to correct node
Inserts - must have the Shard Key Updates - must have the Shard Key Queries Indexed Queries
With Shard Key - routed to nodes Without Shard Key - scatter gather With Shard Key - routed in order Without Shard Key - distributed sort merge
25
MongoDB Replication
MongoDB replication like MySQL replication
Asynchronous master/slave
Variations:
26
27
Member 2
28
Member 2 PRIMARY
Member 2 DOWN
Member 3 PRIMARY
Member 2 DOWN
Member 3 PRIMARY
RECOVERING
Member 2
Automatic recovery
32
Member 3 PRIMARY
Member 2
33
34
Cannot be elected as PRIMARY Can vote in an election Do not hold any data
Hidden {hidden:True} Tagging - New in 2.0 tags
:
{"dc":
"ny"},
"rack":
"r23s5"}
Sunday, August 21, 2011 35
Using Replicas
slaveOk() - driver will send read requests to Secondaries - driver will always send writes to Primary Java examples - DB.slaveOk() - Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
36
Safe Writes
db.runCommand({getLastError:
1,
w
:
1})
- ensure write is synchronous - command returns after primary has written to memory
w=n or w='majority'
- n is the number of nodes data must be replicated to - driver will always send writes to Primary
- Each member is "tagged" e.g. "US_EAST", "EMEA", "US_WEST" - Ensure that the write is executed in each tagged "region"
fsync:true
- Ensures changed disk blocks are ushed to disk
j:true
Replication features
Reads from Primary are always consistent Reads from Secondaries are eventually consistent Automatic failover if a Primary fails Automatic recovery when a node joins the set
38
39
Schema #1
>
db.profiles.save( {
_id
:
"
facebook_name
:
"alvin.j.richards",
twitter_name
:
"jonnyeight",
linkedin_name
:
"alvinrichards",
details
:
{
loc:
[50.78076,7.181969],
...} }) >
db.profiles.ensureIndex({facebook_name:1}) >
db.runCommand(
{
shardCollection
:
social.profiles,
key
:
{
facebook_name
:
1}
}
)
40
Schema #1
Good: Schema is simple to understand Easy to add new identifiers, e.g. foursquare name Query is routed to a shard
db.profiles.find({facebook_name:
"alvin.j.richards"})
Bad: Each identifier needs a separate index More indexes means less data in memory Memory contention and disk paging Query is scatter/gathered across cluster
db.profiles.find({linkedin_name:"alvinrichards"})
41
Schema #2
>
db.profiles.save( {
_id
:
ObjectId("1234")
details
:
{loc:
[50.78076,7.181969],
...}}) >
db.identfiers.save( {
_id
:
{type:
"facebook_name",
value:
"alvin.j.richards},
profile:
ObjectId("1234")}) >
db.identfiers.save( {
_id
:
{type:
"twitter_name",
value:
"jonnyeight},
profile:
ObjectId("1234")}) >
db.runCommand(
{
shardCollection
:
social.identifiers,
key
:
{
_id
:
1}
}
) >
db.runCommand(
{
shardCollection
:
social.profiles,
key
:
{
_id
:
1}
}
)
Sunday, August 21, 2011 42
Schema #2
Good: Easy to add new identifiers, e.g. foursquare name All query are routed to a shard >
db.profiles.find(
{_id
:
{type:
"facebook_name":
value:
"alvin.j.richards"}})
>
db.profiles.find(
{_id
:
{type:
"foursquare_id":
value:
"alvin10gen"}})
Bad: Schema is more complex Two lookups are required for each access (but both routed) Need to maintain links (data relationships)
43
Summary
Schema & Index design Simplest way to scale Sharding Automatically scale writes Replication Automatically scale reads
Sunday, August 21, 2011 44
download at mongodb.org
[email protected] MongoDB Munich, Germany - October 10
conferences,
appearances,
and
meetups
http://www.10gen.com/events
http://bit.ly/mongoW
http://linkd.in/joinmongo
45