Yahoo! Cloud Serving Benchmark

The YCSB (Yahoo! Cloud Serving Benchmark) provides a suite for testing various data stores against a set of well-defined workloads that attempt to mimic different use cases. For our purposes we are using the framework to compare the performance of the MongoDB Inc. supported driver (legacy) to the MongoDB Asynchronous Java Driver.

Based on previous testing with the YCSB and other users' experience with the MongoDB driver for YCSB, we have created an enhanced version of both the legacy and the asynchronous driver. The enhancements are beyond the scope of this document but contain what we feel is the optimal, realistic configuration for both drivers. We have also simplified the YCSB configuration to be based on the MongoDB URI only. All of the changes are available from the GitHub Repository. We have also created a YCSB Pull Request containing all of the updates so they will be included in the upstream releases.

The results of the benchmark for the six provided workloads (A-F) show clearly that the MongoDB Asynchronous Java Driver has lower latency and higher throughput across the benchmark scenarios. In addition, its latency rises much more slowly as contention for the available connections increases.

The following sections contain a high-level description of each workload and the results for each. The full launch script, output, and analysis spreadsheet are also available for review.

In the graphs below the latency values are displayed as a stock chart where the maximum value is the 95th percentile and the close value is the average latency. It is important to note that the resolution of the 95th percentile reported by the YCSB is in milliseconds, while the minimum and average values are reported in microseconds. When the 95th percentile is less than 1 millisecond, a value of zero is reported, whereas the minimum and average values are still reported as non-zero.
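
To make the resolution mismatch concrete, the following is a minimal sketch of how a sub-millisecond 95th percentile collapses to zero while the average stays non-zero. The class name, the sample latencies, and the nearest-rank percentile calculation are our own illustration and are not taken from the YCSB source.

```java
import java.util.Arrays;

// Illustrative only: not the YCSB implementation. Sample values are made up.
final class LatencyResolution {

    /** Truncates a latency in microseconds to whole milliseconds. */
    static long toWholeMillis(long micros) {
        return micros / 1000;
    }

    /** Average of the samples, kept in microseconds. */
    static double averageMicros(long[] samplesMicros) {
        return Arrays.stream(samplesMicros).average().orElse(0.0);
    }

    /** 95th percentile (nearest-rank method), reported in whole milliseconds. */
    static long percentile95Millis(long[] samplesMicros) {
        if (samplesMicros.length == 0) {
            return 0;
        }
        long[] sorted = samplesMicros.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(0.95 * sorted.length); // 1-based rank
        return toWholeMillis(sorted[rank - 1]);
    }

    public static void main(String[] args) {
        // Hypothetical latencies, all below one millisecond.
        long[] samples = {310, 420, 450, 480, 520, 560, 610, 700, 850, 990};
        System.out.println("average (us): " + averageMicros(samples));
        System.out.println("95th percentile (ms): " + percentile95Millis(samples));
    }
}
```

With these samples the average is 589 microseconds, yet the 95th percentile is reported as 0 milliseconds, which is why some of the charts show a maximum below the close value.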

Workload A - Update Heavy Workload

The first workload has a 50/50 split between reads and updates of individual records. This workload tries to model a session store for a web container.

All of the default settings for the workload are the same as those provided by the benchmark except the number of records and operations have been increased to 1,000,000 each.
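
As a rough illustration of the settings described above, the sketch below collects the Workload A operation mix and the increased counts as YCSB core workload properties. The property names are the standard CoreWorkload ones; the class name is hypothetical, and the actual launch script may pass these values differently.

```java
import java.util.Properties;
import java.util.TreeSet;

// Illustrative only: a hypothetical helper, not part of the published launch script.
final class WorkloadAOverrides {

    /** Builds the Workload A mix with the record and operation counts used here. */
    static Properties build() {
        Properties p = new Properties();
        p.setProperty("readproportion", "0.5");     // 50% reads
        p.setProperty("updateproportion", "0.5");   // 50% updates
        p.setProperty("recordcount", "1000000");    // increased from the default
        p.setProperty("operationcount", "1000000"); // increased from the default
        return p;
    }

    public static void main(String[] args) {
        Properties p = build();
        // Print in sorted order so the output is stable.
        for (String name : new TreeSet<>(p.stringPropertyNames())) {
            System.out.println(name + "=" + p.getProperty(name));
        }
    }
}
```

The other workloads differ only in the proportions (for example, reads and inserts for Workload D, or scans and inserts for Workload E) and, for Workload E, in the operation count.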

Figures: Workload A - Throughput; Workload A - Read Latency; Workload A - Update Latency.

Workload B - Read Mostly Workload

The second workload has a 95/5 split between reads and updates of individual records. This workload tries to model a blog or photo sharing entry where reads dominate but there may be occasional updates for comments or tags.

All of the default settings for the workload are the same as those provided by the benchmark except the number of records and operations have been increased to 1,000,000 each.

Figures: Workload B - Throughput; Workload B - Read Latency; Workload B - Update Latency.

Workload C - Read Only Workload

The third workload only performs reads of records. This workload tries to model a read-only cache where updates are performed by some offline process.

All of the default settings for the workload are the same as those provided by the benchmark except the number of records and operations have been increased to 1,000,000 each.

Figures: Workload C - Throughput; Workload C - Read Latency.

Workload D - Read Latest Workload

The fourth workload has a 95/5 split between reads and inserts of individual records. This workload tries to model a social network of user status updates.

All of the default settings for the workload are the same as those provided by the benchmark except the number of records and operations have been increased to 1,000,000 each.

Figures: Workload D - Throughput; Workload D - Read Latency; Workload D - Insert Latency.

Workload E - Short Ranges Workload

The fifth workload has a 95/5 split between scans and inserts of records. This workload tries to model a threaded discussion of clustered comments.

All of the default settings for the workload are the same as those provided by the benchmark except the number of records has been increased to 1,000,000 and the number of operations has been increased to 250,000.

This workload reads large volumes of data from the MongoDB servers. mongostat regularly reported in excess of 100 MB/second netout during the Asynchronous Java Driver runs. We suspect the relatively flat throughput is primarily due to the limited network bandwidth between the machines.
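
The bandwidth reasoning can be checked with simple arithmetic. The sketch below converts the observed netout into megabits per second and compares it with a link capacity. The 1,000 Mbit/second capacity is purely an assumption for illustration; we did not measure the actual capacity between the instances.

```java
// Illustrative only: the link capacity below is an assumed value, not a measurement.
final class NetOutBandwidth {

    /** Converts megabytes per second to megabits per second (1 byte = 8 bits). */
    static double megabytesToMegabits(double megabytesPerSecond) {
        return megabytesPerSecond * 8.0;
    }

    /** Fraction of the link consumed by the given netout rate. */
    static double utilization(double megabytesPerSecond, double linkMegabitsPerSecond) {
        return megabytesToMegabits(megabytesPerSecond) / linkMegabitsPerSecond;
    }

    public static void main(String[] args) {
        double netOutMegabytes = 100.0;     // lower bound reported by mongostat
        double assumedLinkMegabits = 1000.0; // hypothetical capacity, an assumption
        System.out.println("netout (Mbit/s): " + megabytesToMegabits(netOutMegabytes));
        System.out.println("utilization: " + utilization(netOutMegabytes, assumedLinkMegabits));
    }
}
```

Under that assumption, 100 MB/second is already 800 Mbit/second, or about 80% of the link before protocol overhead, which would be consistent with the flat throughput.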

Figures: Workload E - Throughput; Workload E - Scan Latency; Workload E - Insert Latency.

Workload F - Read-Modify-Write Workload

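The last workload has a 50/50 split between reads and read-modify-write operations on records. In a read-modify-write operation the client reads a record, modifies it, and writes the change back.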
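
A minimal sketch of what a read-modify-write operation involves is shown below, using an in-memory map as a stand-in for the data store. The class and method names are hypothetical. We are assuming, based on our understanding of the YCSB, that the inner read and update are also counted as individual operations in addition to the combined measurement.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: an in-memory map stands in for MongoDB.
final class ReadModifyWriteSketch {
    private final Map<String, String> store = new HashMap<>();
    int reads;   // number of read operations performed
    int updates; // number of update operations performed

    String read(String key) {
        reads++;
        return store.get(key);
    }

    void update(String key, String value) {
        updates++;
        store.put(key, value);
    }

    /** Reads the record, appends a suffix, writes it back; returns elapsed nanoseconds. */
    long readModifyWrite(String key, String suffix) {
        long start = System.nanoTime();
        String current = read(key);
        update(key, (current == null ? "" : current) + suffix);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        ReadModifyWriteSketch db = new ReadModifyWriteSketch();
        db.update("user1", "a");
        long nanos = db.readModifyWrite("user1", "b");
        System.out.println("value=" + db.read("user1") + " elapsedNanos=" + nanos);
    }
}
```

Because the combined operation spans one read and one update, its latency is expected to be roughly the sum of the two.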

All of the default settings for the workload are the same as those provided by the benchmark except the number of records and operations have been increased to 1,000,000 each.

Figures: Workload F - Throughput; Workload F - Read/Modify/Write Latency; Workload F - Read Latency; Workload F - Update Latency.

Test Environment

The test client was run on an Amazon EC2 m3.xlarge instance within the US East (Virginia) Zone #12. The instance used the MongoDB-provided AMI ami-b0ba55d8, which is configured to use MongoDB 2.6.1 with 4000 IOPS. OpenJDK 1.7.0_55 was used.

The MongoDB server was a standalone mongod also running on an Amazon EC2 m3.xlarge instance within the US East (Virginia) Zone #12. It used the same AMI as the client. The average ping time between the machines was about 0.25 ms. The mongod process was started with the system init scripts.