
Flushing Out an Idea

In our last post we discovered what looks like a promising approach to getting the next level of performance from MongoDB: pull the routing logic inside the mongos process into the driver and bypass the synchronous interactions between the mongos and the mongod processes.

What would that solution look like?

First we need to understand which routing-related functions the mongos performs. For this discussion we will ignore the other cluster maintenance functions of the mongos, such as balancing, and instead focus only on the routing of operations and the functions that routing triggers.

Routing

The routing of requests is conceptually simple. The config database contains the layout of each database and collection. We simply need to read the information from the appropriate collections within the config database and then apply the routing rules. There are a few situations we need to be aware of (a routing sketch follows the list):

  1. Non-partitioned database. In this case we route all of the requests to the primary shard.
  2. Partitioned database / unsharded collection. Similarly, we need to send all of the requests for this collection to the primary shard for the database.
  3. Partitioned database / sharded collection. In this case we need to read the chunks for the collection and use the ranges defined by the chunks to determine the right set of shards to send each request to. Each request will have to be evaluated against the chunks to find the right subset of shards to use. Let's look at the details of chunk based routing.
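
The decision itself can be expressed in a few lines. Below is a minimal sketch of the three cases; the Shard and CollectionMetadata types are hypothetical stand-ins for whatever metadata the driver caches from the config database, not an actual driver API.

    import java.util.Collections;
    import java.util.Set;

    final class Router {

        record Shard(String name) {}

        // Hypothetical view of the metadata cached from the config database.
        interface CollectionMetadata {
            boolean databasePartitioned();
            boolean collectionSharded();
            Shard primaryShard();
            Set<Shard> shardsForQuery(Object query); // chunk-range evaluation
        }

        static Set<Shard> route(CollectionMetadata meta, Object query) {
            if (!meta.databasePartitioned() || !meta.collectionSharded()) {
                // Cases 1 and 2: everything goes to the database's primary shard.
                return Collections.singleton(meta.primaryShard());
            }
            // Case 3: evaluate the query against the chunk ranges.
            return meta.shardsForQuery(query);
        }
    }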

Chunk Based Routing

Each document in the chunks collection of the config database contains the shard, the minimum value (inclusive), and the maximum value (exclusive) for the chunk's range. For chunk based routing we need to simulate the relative ordering that MongoDB uses.
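
Given a total ordering over shard key values, the chunk lookup itself reduces to an ordered-map search. Here is a minimal sketch, assuming the driver has already loaded the chunks; the Chunk record and ChunkMap class are illustrative, not part of the driver:

    import java.util.Map;
    import java.util.TreeMap;

    final class ChunkMap<K extends Comparable<K>> {

        record Chunk<K>(K min, K max, String shard) {}

        // Chunks keyed by their inclusive minimum value.
        private final TreeMap<K, Chunk<K>> byMin = new TreeMap<>();

        void add(Chunk<K> chunk) {
            byMin.put(chunk.min(), chunk);
        }

        /** Returns the shard owning the chunk whose [min, max) range contains key. */
        String shardFor(K key) {
            Map.Entry<K, Chunk<K>> entry = byMin.floorEntry(key);
            if (entry == null || key.compareTo(entry.getValue().max()) >= 0) {
                throw new IllegalStateException("No chunk covers " + key);
            }
            return entry.getValue().shard();
        }
    }

The floorEntry call finds the candidate chunk in O(log n), and the check against max guards against gaps in the chunk map. The catch is that K must compare values exactly the way MongoDB does, which is the subject of the next two sections.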

Range Shard Keys

MongoDB first orders items based on the type of the value. As an example, all string values will always compare less than a document value. There are a few groups of types that MongoDB considers to be the same. The best example of types comparing equal is the numeric types. This type equality allows an integer value of 1 to compare equal to a double value of 1.0, which in turn compares equal to the long value NumberLong(1). We have added an ElementType.compare(ElementType) method in the 2.0.0 release to reflect this ordering.
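
As a rough illustration, the type-level comparison can be modeled as a rank that all numeric types share, so that an int 1, a double 1.0, and a NumberLong(1) tie at the type level and fall through to a value comparison. The rank order below mirrors the documented BSON comparison order, but the enum and the partial mapping are illustrative, not the driver's implementation:

    enum TypeRank {
        MIN_KEY, NULL, NUMBER, STRING, DOCUMENT, ARRAY, BINARY,
        OBJECT_ID, BOOLEAN, DATE, TIMESTAMP, REGEX, MAX_KEY;

        static TypeRank of(byte bsonTypeByte) {
            switch (bsonTypeByte) {
                case 0x01: // double
                case 0x10: // int32
                case 0x12: // int64
                    return NUMBER; // all numeric types rank together
                case 0x02:
                    return STRING;
                case 0x03:
                    return DOCUMENT;
                case 0x04:
                    return ARRAY;
                default: // remaining BSON type bytes elided for brevity
                    throw new IllegalArgumentException(
                            "type 0x" + Integer.toHexString(bsonTypeByte & 0xFF));
            }
        }
    }

Two elements then compare by rank first and only by value when the ranks tie.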

After considering the type of the element, MongoDB then compares the values. This comparison is more straightforward, but for strings we have to remember that MongoDB does not use a collator and instead sorts based on the UTF-8 encoding of the string. This ordering has been expressed in the driver by making the Element interface extend the Comparable interface.
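
For strings, matching the server's collator-free ordering means comparing the UTF-8 encodings byte by byte, treating each byte as unsigned. A minimal, self-contained sketch:

    import java.nio.charset.StandardCharsets;

    final class Utf8Compare {
        static int compare(String a, String b) {
            byte[] ab = a.getBytes(StandardCharsets.UTF_8);
            byte[] bb = b.getBytes(StandardCharsets.UTF_8);
            int n = Math.min(ab.length, bb.length);
            for (int i = 0; i < n; i++) {
                int d = (ab[i] & 0xFF) - (bb[i] & 0xFF); // unsigned byte compare
                if (d != 0) {
                    return d;
                }
            }
            return ab.length - bb.length; // shorter string sorts first
        }
    }

Note that String.compareTo compares UTF-16 code units, which can disagree with the UTF-8 byte order for code points outside the basic multilingual plane, so the explicit byte comparison matters.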

To ensure that our ordering and the ordering of the MongoDB server match, we have created a simple integration test that inserts several documents using different values and types for the _id field. We then perform a find of all documents with a sort on the _id field and ensure the result matches the ordering we get from comparing the Element values.

Hashed Shard Keys

Hashed shard keys require that we first compute the right hash. We need to be sure that the hash values the driver generates exactly match the values generated by the server, and that the hash does not change between versions. We created a Jira ticket and pull request to create a set of hash test vectors that the driver then uses to check its own hash algorithm.
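
For the curious, here is a sketch of how the 64-bit hash of a numeric key could be computed, based on our reading of the server's hasher: MD5 over the 4-byte little-endian seed, the 4-byte little-endian canonical type (all numeric types share one canonical type value), and the value squashed to an 8-byte little-endian long, with the first 8 digest bytes, read little-endian, forming the hash. Every detail here is an assumption to be verified against the test vectors, not a statement of the server's contract:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    final class HashedShardKey {
        static long hash(long value, int seed) {
            try {
                final MessageDigest md5 = MessageDigest.getInstance("MD5");
                final ByteBuffer in = ByteBuffer.allocate(16)
                        .order(ByteOrder.LITTLE_ENDIAN);
                in.putInt(seed);    // seed (assumed server default is 0)
                in.putInt(10);      // assumed canonical type shared by all numbers
                in.putLong(value);  // numeric value squashed to a long
                final byte[] digest = md5.digest(in.array());
                // First 8 digest bytes, little-endian, as a signed long.
                return ByteBuffer.wrap(digest)
                        .order(ByteOrder.LITTLE_ENDIAN).getLong();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 unavailable", e);
            }
        }
    }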

With the correct hash value we can reuse the same logic for finding the right shard, simply using the long hash value instead of the original element value.

Detecting Chunk Moves

We now understand how to route requests to the right shards. How are we going to detect a change in the distribution of chunks due to a chunk move? For that we reverse engineer the shard version synchronization that happens between the mongos and the mongod. The full details of the interaction are beyond the scope of this post, but there is a series of setShardVersion commands that the mongos uses to synchronize the version of the chunk map used by both processes. When the mongod updates the version of the chunk map, future queries to that collection from the mongos will return an error with the ShardConfigStale bit set.
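
From the driver's side, the natural reaction to that error is to refresh the chunk map and retry. A minimal sketch of the retry loop, where StaleConfigException and the refresh hook are hypothetical stand-ins for whatever the driver ends up exposing:

    import java.util.concurrent.Callable;

    final class StaleConfigRetry {
        static class StaleConfigException extends RuntimeException {}

        interface ChunkMapRefresher {
            void refresh(); // re-read config.chunks and rebuild the chunk map
        }

        static <T> T call(Callable<T> op, ChunkMapRefresher refresher,
                          int maxRetries) throws Exception {
            for (int attempt = 0;; attempt++) {
                try {
                    return op.call();
                } catch (StaleConfigException stale) {
                    if (attempt >= maxRetries) {
                        throw stale;
                    }
                    refresher.refresh(); // chunk map changed; reload and retry
                }
            }
        }
    }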

Splitting Chunks

The last function of the mongos is breaking up chunks when they get too big. It does this by issuing splitVector commands after seeing specific volumes of writes to the various chunks. We have found this strategy to be less than optimal in practice since the splitVector commands tend to be issued when the write demand on the cluster is at its highest.

Instead, we recommend a split window (similar to the balancer window) that serially checks the chunks in the cluster, using the dataSize command to quickly estimate the size of each chunk. Chunks that have grown beyond the maximum chunk size can then be split, as sketched below.
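
A pass over the cluster during the split window might look like the following. The Chunk and AdminCommands interfaces are hypothetical placeholders for the driver's command plumbing; dataSize and split themselves are standard server commands:

    import java.util.List;

    final class SplitWindow {
        interface Chunk {
            String namespace(); // e.g. "db.collection"
            Object min();       // inclusive lower bound document
            Object max();       // exclusive upper bound document
        }

        interface AdminCommands {
            /** Runs { dataSize: ns, min: ..., max: ..., estimate: true }. */
            long estimatedChunkBytes(Chunk chunk);

            /** Runs { split: ns, bounds: [min, max] } to split the chunk. */
            void splitChunk(Chunk chunk);
        }

        static void pass(AdminCommands admin, List<? extends Chunk> chunks,
                         long maxChunkBytes) {
            for (Chunk chunk : chunks) {
                // dataSize is cheap relative to splitVector, so we can
                // afford to sweep every chunk inside the window.
                if (admin.estimatedChunkBytes(chunk) > maxChunkBytes) {
                    admin.splitChunk(chunk);
                }
            }
        }
    }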

Wrapping Up

We now know that we have to implement the routing of requests and add the ability to detect chunk moves. On the surface it all looks very reasonable but also very risky. Even a small error in routing a request can easily cause the entire system to fail.

In the next post we will look at a solution that should provide data integrity guarantees while still minimizing the mongos bottleneck.