The MongoDB Asynchronous Java Driver provides a high performance alternative to the 10gen driver. This driver has been created to ensure maximum utilization of resources on the client, in the network, on the server, and in the developer's IDE.
The main motivation for the driver is performance. Performance here means not just raw speed but also the driver's ease of use and expressive power. These usability factors benefit developer productivity and the resulting code's readability. Considerable effort has been put into making the library easy to use and able to clearly express the developer's intent.
For the more common non-trivial commands to the server, a set of domain objects and associated builders is provided. In addition to the commands, there is also a query builder that provides a very natural mechanism for defining queries:
Document query = where("f").greaterThan(23).lessThan(42).and("g").lessThan(3);
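To make the intent of the fluent style concrete, the following is a minimal standalone sketch of how such a builder can accumulate conditions. It is not the driver's implementation: the QuerySketch class is hypothetical, it uses plain maps instead of the driver's Document type, and it assumes that greaterThan and lessThan correspond to MongoDB's $gt and $lt operators.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a fluent query builder; not part of the driver. */
final class QuerySketch {
    // Field name -> operator map, kept in insertion order.
    private final Map<String, Map<String, Object>> fields = new LinkedHashMap<>();
    // Operator map of the field currently being constrained.
    private Map<String, Object> current;

    /** Starts a query on the given field. */
    static QuerySketch where(String field) {
        return new QuerySketch().and(field);
    }

    /** Switches to another field; conditions on different fields are combined. */
    QuerySketch and(String field) {
        current = fields.computeIfAbsent(field, k -> new LinkedHashMap<>());
        return this;
    }

    /** Assumed to map to the $gt operator. */
    QuerySketch greaterThan(Object value) {
        current.put("$gt", value);
        return this;
    }

    /** Assumed to map to the $lt operator. */
    QuerySketch lessThan(Object value) {
        current.put("$lt", value);
        return this;
    }

    Map<String, Map<String, Object>> build() {
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(
                where("f").greaterThan(23).lessThan(42).and("g").lessThan(3).build());
    }
}
```

Under these assumptions, the query above corresponds to a document of the shape { f: { $gt: 23, $lt: 42 }, g: { $lt: 3 } }.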
Similar effort has been put into making command construction as clear as possible. A good example is the aggregate command, which offers a lot of functionality in a single command but is also complex and difficult to construct clearly and concisely. Consider the 10gen example provided here and compare it to the same pipeline built with the expressive power of this driver's helper classes.
DocumentBuilder b1 = BuilderFactory.start();
DocumentBuilder b2 = BuilderFactory.start();
Aggregate.Builder builder = new Aggregate.Builder();
builder.group(id().addField("state").addField("city"),
              set("pop").sum("pop"))
       .sort(asc("pop"))
       .group(id("_id.state"),
              set("biggestcity").last("_id.city"),
              set("biggestpop").last("pop"),
              set("smallestcity").first("_id.city"),
              set("smallestpop").first("pop"))
       .project(
              includeWithoutId(),
              set("state", field("_id")),
              set("biggestCity",
                  b1.add(set("name", field("biggestcity")))
                    .add(set("pop", field("biggestpop")))),
              set("smallestCity",
                  b2.add(set("name", field("smallestcity")))
                    .add(set("pop", field("smallestpop")))));
While the usability of the driver is critical, its primary reason for existing is to enable maximum performance from a MongoDB server. A series of benchmarks has been created to measure the performance of the Asynchronous driver relative to the 10gen-supported (legacy) driver.
YCSB (Yahoo! Cloud Serving Benchmark) provides a standard set of workloads for comparing the performance of various data stores. Instead of benchmarking different data stores, we have used the benchmark to compare the relative performance of the legacy MongoDB Java Driver and the MongoDB Asynchronous Java Driver. The YCSB results show that the MongoDB Asynchronous Java Driver has lower latency, lower variability in latency, and higher throughput across all of the benchmark scenarios. In addition, this driver's latency rises much more slowly as contention for the available connections increases.
In addition to the workload-based benchmark, we have also created two micro-benchmarks.
The first attempts to isolate the performance of the driver when performing very simple insert and update operations. The benchmark demonstrates the throughput of the two drivers under various durability/write concerns. Of particular note is that the Asynchronous Java Driver maintains its throughput even at increasing levels of durability/write concern.
The second micro-benchmark compares the efficiency of the BSON libraries bundled with the two drivers. The benchmark serializes and deserializes a series of increasingly complex documents and measures the time for each. It was originally intended to determine whether pre-determining the size of objects before writing them was better than writing the objects and then filling in the size once the document was written. It showed conclusively that, because of the buffer management and copying overhead of the second approach, pre-determining the object size is better. When the benchmark was extended to the legacy driver's BSON library, it showed that the Asynchronous Java Driver was always faster and, for small documents, orders of magnitude faster.
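The two size-handling strategies compared by the BSON benchmark can be illustrated with a minimal sketch. This is not the driver's BSON library: the BsonSizeSketch class and its method names are hypothetical, only int32 fields are supported, and the second strategy is simplified to buffering the elements and then prefixing the size. The byte layout (little-endian int32 total size, then elements, then a 0x00 terminator) follows the BSON format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not the driver's code: BSON documents with int32 fields only. */
final class BsonSizeSketch {

    /** Strategy 1: compute the encoded size up front, then write once into an exact-size buffer. */
    static byte[] writePreSized(Map<String, Integer> doc) {
        int size = 4 + 1; // int32 length prefix + trailing 0x00 terminator
        for (String name : doc.keySet()) {
            // type byte + field name as a cstring (UTF-8 bytes + 0x00) + int32 value
            size += 1 + name.getBytes(StandardCharsets.UTF_8).length + 1 + 4;
        }
        ByteBuffer out = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(size);
        for (Map.Entry<String, Integer> field : doc.entrySet()) {
            out.put((byte) 0x10); // BSON type tag for int32
            out.put(field.getKey().getBytes(StandardCharsets.UTF_8)).put((byte) 0x00);
            out.putInt(field.getValue());
        }
        out.put((byte) 0x00);
        return out.array();
    }

    /** Strategy 2: write the elements into a growable buffer, then fill in the size afterwards. */
    static byte[] writeThenBackfill(Map<String, Integer> doc) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, Integer> field : doc.entrySet()) {
            byte[] name = field.getKey().getBytes(StandardCharsets.UTF_8);
            int value = field.getValue();
            body.write(0x10);
            body.write(name, 0, name.length);
            body.write(0x00);
            for (int shift = 0; shift < 32; shift += 8) {
                body.write(value >>> shift); // little-endian: low byte first
            }
        }
        body.write(0x00);
        byte[] elements = body.toByteArray(); // first extra copy
        ByteBuffer out = ByteBuffer.allocate(4 + elements.length).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(4 + elements.length); // the size is only known now
        out.put(elements);               // second extra copy
        return out.array();
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new LinkedHashMap<>();
        doc.put("a", 1);
        System.out.println(writePreSized(doc).length + " bytes, same output: "
                + java.util.Arrays.equals(writePreSized(doc), writeThenBackfill(doc)));
    }
}
```

Both methods produce identical bytes; the difference is that the second one needs a growable buffer and extra copies, which is the kind of overhead the benchmark measured.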
A driver cannot be judged on usability and performance alone. It also has to work and keep working. The MongoDB Asynchronous Driver has an extensive test suite that ensures that each release works as intended and limits regressions.
The driver has an extensive set of over 1,000 unit tests. The unit tests provide 96% line and 95% branch coverage (as measured by Cobertura) and are designed to verify the behavior of each class and method. To make sure that developers run the unit tests often, it is important that they run quickly. To achieve this goal, the tests make extensive use of the EasyMock mocking support library. The result is a unit test suite that finishes in under 30 seconds.
Unit testing only proves that the driver does what the developer intended and continues to do the same thing as changes are made. The driver's integration tests have been created to ensure that the developer's and driver's interpretation of the various MongoDB-provided documentation matches the MongoDB server's implementation. These tests tend to focus on specific functionality of the MongoDB servers, such as authentication or cluster configuration detection. For these tests, a local MongoDB instance in the appropriate configuration is started as part of the test.
The last suite of tests is the acceptance tests, which are designed to exercise both the interfaces provided by the driver and the interaction of the driver with the various MongoDB applications and configurations. Like the integration tests, the acceptance tests start various configurations of MongoDB as part of the test, but their goal is to exercise as much of the functional breadth of the driver as possible, including common failure modes.
As an example of the sufficiency of the test suite: When porting the driver to the 2.2 version of MongoDB it was discovered, via failing integration and acceptance tests, that the MongoDB servers now required all connections to be authenticated, even for serverStatus commands. A simple change to the server that had not been clearly documented was quickly caught and the driver updated to work with the new requirements.
In addition to the unit, integration, and acceptance tests we have also used the FindBugs, PMD and CPD static analysis tools to perform checks for common errors and defects. The driver's source code reports zero issues across all 3 tools with zero filters and maximal effort.
Why expend the effort to create a new driver? Why not use the effort to improve the 10gen supported (legacy) Java driver? Those are very good questions. To get to the answer we have to look at the very core of the legacy driver's processing model.
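The legacy driver maintains a set of open connections. When a processing thread sends a request, the following actions are performed: a connection is checked out of the pool, the request is sent on that connection, the response is read from it, and the connection is checked back in.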
This synchronous checkout/request/response/checkin means that each request must wait for the complete round-trip time of the request, processing, and response. For many applications the induced latency can be overcome simply by using more connections. For highly distributed or high-performance applications, however, the resulting connection explosion induces performance and scheduler issues on the MongoDB servers that limit the utility of this approach.
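The checkout/request/response/checkin cycle can be sketched in a few lines. This is an illustration only, not the legacy driver's code: SyncPoolSketch, Connection, and roundTrip are hypothetical names, and the server is simulated by a plain function rather than a socket.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a synchronous, pooled request cycle. */
final class SyncPoolSketch {

    /** Stand-in for a socket connection; the "server" is just a function. */
    static final class Connection {
        private final UnaryOperator<String> server;

        Connection(UnaryOperator<String> server) {
            this.server = server;
        }

        /** Sends the request and blocks until the reply is available. */
        String roundTrip(String request) {
            return server.apply(request);
        }
    }

    private final BlockingQueue<Connection> pool;

    SyncPoolSketch(int size, UnaryOperator<String> server) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new Connection(server));
        }
    }

    String send(String request) {
        Connection connection;
        try {
            connection = pool.take(); // 1. checkout: blocks while every connection is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        try {
            return connection.roundTrip(request); // 2 + 3. request and response
        } finally {
            pool.add(connection); // 4. checkin
        }
    }

    int idleConnections() {
        return pool.size();
    }

    public static void main(String[] args) {
        SyncPoolSketch pool = new SyncPoolSketch(2, request -> "reply:" + request);
        System.out.println(pool.send("ping"));
    }
}
```

In this model a connection is unavailable to other threads for the whole round trip, so the number of requests in flight can never exceed the number of connections.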
Rather than try to force a more asynchronous model onto the 10gen driver, the MongoDB Asynchronous Java Driver is a completely new driver that is asynchronous at its very core. The interface to the driver still provides synchronous methods, but they are implemented using the asynchronous variants of the methods.
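The idea of layering synchronous methods on top of asynchronous ones can be sketched as follows. The class and method names are hypothetical and the "server" is simulated; CompletableFuture from the Java standard library is used here purely as a stand-in and is not claimed to be what the driver uses.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch: a synchronous method implemented via its asynchronous variant. */
final class SyncOverAsyncSketch {

    /** Asynchronous variant: returns immediately; the reply arrives later on another thread. */
    static CompletableFuture<Integer> countAsync(String collection) {
        // Simulated server work: the "count" is simply the length of the name.
        return CompletableFuture.supplyAsync(() -> collection.length());
    }

    /** Synchronous variant: issues the asynchronous request, then waits for its reply. */
    static int count(String collection) {
        return countAsync(collection).join();
    }

    public static void main(String[] args) {
        System.out.println(count("users"));
    }
}
```

With this layering there is only one code path to the server, and callers choose whether to block simply by choosing which variant to call.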
For each physical socket connection, a pair of threads is created. The first is responsible for pushing requests to the server. The second is responsible for reading the replies and matching them with the appropriate request callback (if any). This simple reader/writer model keeps the programming simple and has been shown to perform as well as, if not better than, using Java's NIO package to service multiple connections from a single thread.
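A minimal sketch of the reader/writer pairing, under stated assumptions: in-memory queues stand in for the socket, an extra thread plays the server by echoing each request, and replies are matched to callbacks by a request id. All names are hypothetical, and none of this is taken from the driver's source.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of one connection served by a writer thread and a reader thread. */
final class ReaderWriterSketch {

    /** A message tagged with the id of the request it belongs to. */
    static final class Message {
        final int requestId;
        final String body;

        Message(int requestId, String body) {
            this.requestId = requestId;
            this.body = body;
        }
    }

    /** One loop iteration of a worker thread. */
    interface Step {
        void run() throws InterruptedException;
    }

    private final BlockingQueue<Message> toSend = new LinkedBlockingQueue<>();    // application -> writer
    private final BlockingQueue<Message> socketOut = new LinkedBlockingQueue<>(); // writer -> "server"
    private final BlockingQueue<Message> socketIn = new LinkedBlockingQueue<>();  // "server" -> reader
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    ReaderWriterSketch() {
        // Writer thread: pushes queued requests onto the connection.
        start("writer", () -> socketOut.put(toSend.take()));
        // Simulated server: echoes each request back with the same request id.
        start("server", () -> {
            Message request = socketOut.take();
            socketIn.put(new Message(request.requestId, "echo:" + request.body));
        });
        // Reader thread: matches each reply to the callback registered for its request id.
        start("reader", () -> {
            Message reply = socketIn.take();
            CompletableFuture<String> callback = pending.remove(reply.requestId);
            if (callback != null) {
                callback.complete(reply.body);
            }
        });
    }

    private static void start(String name, Step step) {
        Thread thread = new Thread(() -> {
            try {
                while (true) {
                    step.run();
                }
            } catch (InterruptedException e) {
                // Interrupted: let the worker thread exit.
            }
        }, name);
        thread.setDaemon(true); // do not keep the JVM alive for the sketch
        thread.start();
    }

    /** Queues a request and returns at once; the future completes when the reply is read. */
    CompletableFuture<String> send(String body) {
        int id = nextId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply); // register the callback before the request can be answered
        toSend.add(new Message(id, body));
        return reply;
    }

    public static void main(String[] args) {
        ReaderWriterSketch connection = new ReaderWriterSketch();
        System.out.println(connection.send("a").join());
    }
}
```

Because the sender never waits for a reply before queuing the next request, many requests can be in flight on a single connection at once.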
The driver cannot guarantee that two requests to the server, even from a single thread, will be executed in the order they are submitted on the client. This is due to the asynchronous nature of the core of the driver and its default behavior of trying to balance requests across all open connections. Clients that need to serialize requests to the server can create a serialized version of a Mongo implementation. The only difference between the serialized and non-serialized versions is that the serialized version uses only a single connection to the server. The asSerializedMongo() method and the creation of the MongoDatabase and MongoCollection client-side objects are extremely lightweight and involve no server interaction, which makes even per-transaction creation feasible.
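The difference between the balanced and serialized modes can be illustrated with a small dispatch sketch. It is hypothetical throughout: DispatchSketch is not a driver class, round-robin is only an assumed balancing policy, and connections are represented by their index.

```java
/** Hypothetical sketch: choosing a connection for each request. */
final class DispatchSketch {
    private final int connectionCount;
    private int next;

    DispatchSketch(int connectionCount) {
        this.connectionCount = connectionCount;
    }

    /** Returns a view pinned to a single connection; creating it involves no server interaction. */
    DispatchSketch asSerialized() {
        return new DispatchSketch(1);
    }

    /** Picks the connection index for the next request (round-robin is an assumption). */
    synchronized int pickConnection() {
        int chosen = next;
        next = (next + 1) % connectionCount;
        return chosen;
    }

    public static void main(String[] args) {
        DispatchSketch balanced = new DispatchSketch(3);
        DispatchSketch serialized = balanced.asSerialized();
        // Consecutive requests spread over connections in balanced mode ...
        System.out.println(balanced.pickConnection() + ", " + balanced.pickConnection());
        // ... but always share one connection in serialized mode.
        System.out.println(serialized.pickConnection() + ", " + serialized.pickConnection());
    }
}
```

In the balanced case two consecutive requests may travel on different connections and so can reach the server in either order; in the serialized case they share one connection and therefore keep their submission order.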