Friday, December 13, 2013

PayPal's node.js versus Java benchmark and its analysis

PayPal engineers shared their test results on node.js versus Java from both system performance and developer performance perspectives. Baron Schwartz followed with a good analysis of the results. I suggest that everyone interested in service performance and scalability read both.

I happen to know most aspects of this topic well: Java, node.js, service orchestration, NIO/AIO, and performance and scalability, including the USL (Universal Scalability Law). The following are my assumptions about, and understanding of, the original benchmark and the analysis.
  1. The application that they implemented in both Java and node.js is a service orchestration, or a server-side mashup. That is, for an account overview page request, the server side needs to make a series of requests to other services/APIs/resources. According to the description, I assume there are 3 serial blocks, and inside each block there are 2 to 5 parallel requests.
  2. The underlying services/APIs/resources are independent of the service/resource that generates the account overview page. We can assume that the response time characteristics (mean, second moment, or distribution) of the underlying services/APIs/resources do not change because of the load on the account overview page service/resource. This assumption is required for the conclusion that the comparison is apples-to-apples.
  3. The account overview page requests result in an I/O-intensive workload on the system, not a CPU-intensive one. More precisely, it is reverse I/O: the service orchestration initiates parallel clients, either HTTP or others such as DB. If you want to know more about this, please google C10K or Reverse C10K. I assume they used 5 cores for the Java implementation to make sure that the CPU would not become a bottleneck in the benchmark because of the number of concurrent threads. The maximum number of concurrent threads would be N x P, where N is the number of concurrent requests and P is the number of threads per request. In this case, P could be 6: 1 for the inbound overview request and 5 for outbound API/resource requests. In their measurement, this number can reach 15 x 6 = 90. I assume they have a good Java HTTP client library that has a shared thread pool.
  4. It is not clear which Java Servlet container was used in the benchmark. Some recent Servlet containers do support NIO. However, most Servlets are still synchronous. Only a few platforms, such as Jetty and Netty, support the asynchronous style. The difference between synchronous and asynchronous Servlet processing is that the former has to block during outbound I/O. In node.js, everything is asynchronous if you program in the natural way.
  5. In the USL models, Java's kappa (crosstalk or coherency penalty) is higher because the threads share resources, including CPUs and memory. Its sigma (serialization or contention penalty) is lower because the tasks are performed in parallel by the threads. node.js's kappa is lower because the I/O operations are processed serially by the single event loop. For the same reason, its sigma is much higher. This shows the beauty of the USL model.
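
To make points 3 and 5 concrete, here is a minimal Python sketch of the thread-count arithmetic and of the standard USL throughput formula, X(N) = lambda * N / (1 + sigma * (N - 1) + kappa * N * (N - 1)). The function names and the coefficient values are my own illustrative assumptions; they are not the values fitted in the original analysis.

```python
import math


def max_threads(concurrent_requests, threads_per_request):
    """Upper bound on concurrent threads for a thread-per-I/O design (N x P)."""
    return concurrent_requests * threads_per_request


def usl_throughput(n, lam, sigma, kappa):
    """USL throughput at concurrency n.

    lam   -- throughput of a single request stream (n = 1)
    sigma -- serialization (contention) penalty
    kappa -- crosstalk (coherency) penalty
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma, kappa):
    """Concurrency at which USL throughput peaks: sqrt((1 - sigma) / kappa)."""
    return math.sqrt((1 - sigma) / kappa)


if __name__ == "__main__":
    # 15 concurrent requests, 6 threads each (1 inbound + 5 outbound).
    print("max threads:", max_threads(15, 6))

    # Hypothetical coefficients, chosen only to show the qualitative shapes:
    # "thread-like" has low sigma and higher kappa,
    # "event-loop-like" has higher sigma and lower kappa.
    profiles = {
        "thread-like": (0.02, 0.002),
        "event-loop-like": (0.20, 0.0002),
    }
    for name, (sigma, kappa) in profiles.items():
        peak = usl_peak_concurrency(sigma, kappa)
        print(name, "peaks near N =", round(peak, 1))
        for n in (1, 5, 10, 15):
            print("  N =", n, "X =", round(usl_throughput(n, 100.0, sigma, kappa), 1))
```

With a higher sigma the curve flattens early, while a higher kappa makes throughput fall again once N passes the peak, which is the qualitative contrast described in point 5.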