Friday, December 13, 2013

PayPal's node.js versus Java benchmark and its analysis

PayPal engineers shared their test results on node.js versus Java from both system performance and developer performance perspectives. Baron Schwartz followed with a good analysis of the results. I suggest that everyone interested in service performance and scalability read both.

I happen to be familiar with most aspects of this topic: Java, node.js, service orchestration, NIO/AIO, and performance and scalability, including the USL (Universal Scalability Law). The following are my assumptions about, and understanding of, the original benchmark and the analysis.
  1. The application that they implemented in both Java and node.js is a service orchestration, or server-side mashup. That is, to serve an account overview page request, the server side needs to make a series of service/API requests to other services/APIs/resources. According to the description, I assume there are 3 serial blocks and, inside each block, 2 to 5 parallel requests.
  2. The underlying services/APIs/resources are independent of the service/resource that generates the account overview page. We can assume that the response-time characteristics (mean, second-order moments, or distribution) of the underlying services/APIs/resources do not change with the load on the account overview page service/resource. This assumption is required for the conclusion to be an apples-to-apples comparison.
  3. The account overview page requests result in an I/O-intensive workload on the system, not a CPU-intensive one. More precisely, it is reverse I/O: the service orchestration itself initiates parallel client connections, over HTTP or other protocols such as database connections. If you want to know more about this, search for C10K or Reverse C10K. I assume they used 5 cores for the Java implementation to make sure the CPU would not become a bottleneck in the benchmark because of the number of concurrent threads. The maximum number of concurrent threads would be N x P, where N is the number of concurrent requests and P is the number of threads per request. In this case, P could be 6: 1 for the inbound overview request and 5 for outbound API/resource requests. In their measurement, this number can reach 15 x 6 = 90. I assume they have a good Java HTTP client library that uses a shared thread pool.
  4. It is not clear which Java Servlet container was used in the benchmark. Some of the latest Servlet containers do support NIO. However, most Servlets are still synchronous. Only a few platforms, such as Jetty and Netty, support an asynchronous style. The difference between synchronous and asynchronous Servlet processing is that the former has to block during outbound I/O. In node.js, everything is asynchronous if you program in the natural way.
  5. In the USL models, Java's kappa (crosstalk or coherency penalty) is higher because the threads share resources, including CPUs and memory. Its sigma (serialization or contention penalty) is lower because the tasks are performed in parallel by the threads. node.js's kappa is lower because the I/O operations are processed serially by the single event loop. For the same reason, its sigma is much higher. This shows the beauty of the USL model.
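
For reference, the USL gives relative capacity at concurrency N as C(N) = N / (1 + sigma (N - 1) + kappa N (N - 1)). The Python sketch below evaluates it for two sets of coefficients. The coefficient values are invented for illustration only; they are not taken from the PayPal measurements or from Baron Schwartz's fit, and merely mimic the qualitative pattern in item 5 (lower sigma with higher kappa versus higher sigma with lower kappa).

```python
def usl_capacity(n, sigma, kappa):
    """Relative capacity C(N) under the Universal Scalability Law."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_concurrency(sigma, kappa):
    """Concurrency at which C(N) peaks: N* = sqrt((1 - sigma) / kappa)."""
    return ((1 - sigma) / kappa) ** 0.5


# Hypothetical coefficients, chosen only to show the shape of the two curves.
profiles = {
    "lower sigma, higher kappa": (0.02, 0.005),
    "higher sigma, lower kappa": (0.20, 0.0005),
}

for name, (sigma, kappa) in profiles.items():
    curve = [round(usl_capacity(n, sigma, kappa), 2) for n in (1, 5, 10, 15)]
    print(name, curve, "peak near N =", round(peak_concurrency(sigma, kappa), 1))
```

With a larger sigma the curve flattens early; with a larger kappa it eventually turns downward, which is why the two coefficients together can describe quite different systems.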

Thursday, December 5, 2013

Friday, October 25, 2013

HealthCare.gov is a service orchestration

There has been a lot of news recently about the healthcare.gov website issues. The terms hub and service have been used widely to describe the technical nature of the system to a general audience, like this. It is good that they did not use the more technical term orchestration. Scalability issues are intrinsic to orchestrations that are realized in a hub-spoke structure.

The central conductor service resides at the hub, and clients and partner services are at the spoke ends.
1. The central conductor service needs to support many concurrent orchestration instances and handle more input/output (I/O) operations than normal services, which brings more performance and scalability challenges.
2. The central conductor service is a single point of failure and, therefore, is critical to the reliability of a service composition.
Such scalability issues, and an architectural approach to addressing them, are discussed in my thesis titled RESTful Service Composition. The key idea is to implement a composition in a flow-like structure rather than a hub-spoke one.
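
To make the first point concrete, here is a minimal Python sketch of a hub-spoke conductor. Everything in it is hypothetical: the spoke names, the simulated latency, and the HubConductor class are stand-ins and say nothing about how healthcare.gov is actually built. It only shows that every outbound call of every orchestration instance passes through the one hub.

```python
import asyncio


async def call_spoke(name, delay=0.01):
    """Stand-in for one I/O call from the hub to a partner service."""
    await asyncio.sleep(delay)  # simulated network latency
    return f"{name}: ok"


class HubConductor:
    """Central conductor: all orchestration I/O goes through this object."""

    def __init__(self, spokes):
        self.spokes = spokes
        self.io_operations = 0

    async def orchestrate(self, request_id):
        # One orchestration instance fans out to every spoke service.
        self.io_operations += len(self.spokes)
        replies = await asyncio.gather(*(call_spoke(s) for s in self.spokes))
        return request_id, replies


async def run(concurrent_requests):
    hub = HubConductor(["identity", "eligibility", "plans"])  # hypothetical spokes
    results = await asyncio.gather(
        *(hub.orchestrate(i) for i in range(concurrent_requests))
    )
    return hub.io_operations, results


io_ops, results = asyncio.run(run(50))
print("outbound I/O operations handled by the hub:", io_ops)
```

The hub's I/O count grows as the number of concurrent orchestration instances times the number of spokes, and if the hub stops, every instance stops with it.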

Tuesday, October 8, 2013

Clone a resource in HTTP

In order to clone or copy a file, we execute a command like
cp a b
Given a file named a, the command creates a new file named b that has the same content as a. Similarly, we can clone a resource A. The client sends the following request to the server.
PUT /resources/B
...
{"source": "/resources/A"}
However, a client normally does not have the knowledge to properly name a new resource. A more reasonable alternative is the following request.
POST /resources
...
{"source": "/resources/A"}
All of these are straightforward, but they really smell like RPC. They do not follow the REST constraint of self-descriptive messages. How can the server be sure that the current state of /resources/A is what the client wants? What if the resource has been changed or deleted after the request was sent out? The following approach may be better:
GET /resources/A
Then
POST /resources
...
the representation of /resources/A
What if the representation is huge? Maybe we can do the following:
GET /resources/A
or
HEAD /resources/A
Then
POST /resources
...
{"source": "/resources/A",
"ETag": "frompreviousrequest"}

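One way the server side of the last variant might behave is sketched below in Python, using a toy in-memory store. The ResourceStore class, the hash-based ETag, the generated path for the new resource, and the 404/412 status codes are all assumptions made for illustration; the post does not specify any of them.

```python
import hashlib
import itertools


class ResourceStore:
    """Toy in-memory stand-in for the server side of /resources."""

    def __init__(self):
        self._data = {}
        self._ids = itertools.count(1)

    @staticmethod
    def etag_of(body):
        # Assumed ETag scheme: a short hash of the representation.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]

    def put(self, path, body):
        self._data[path] = body

    def head(self, path):
        """Like HEAD /resources/A: return only the current ETag."""
        return self.etag_of(self._data[path])

    def clone(self, request):
        """Like POST /resources with {"source": ..., "ETag": ...}."""
        source = request["source"]
        if source not in self._data:
            return 404, None  # source was deleted after the client looked
        if self.etag_of(self._data[source]) != request["ETag"]:
            return 412, None  # source changed since then (assumed status code)
        new_path = f"/resources/{next(self._ids)}"  # the server names the copy
        self._data[new_path] = self._data[source]
        return 201, new_path


store = ResourceStore()
store.put("/resources/A", "hello")
etag = store.head("/resources/A")
status, location = store.clone({"source": "/resources/A", "ETag": etag})
store.put("/resources/A", "changed")
stale_status, _ = store.clone({"source": "/resources/A", "ETag": etag})
print(status, location, stale_status)
```

The ETag ties the clone request to the exact state the client saw, so the server can refuse to copy a resource that has changed or disappeared in the meantime.
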
Monday, August 5, 2013

A lightweight JavaScript form data binder

In one of my current projects, I need a JavaScript module to serialize the user's form input into JSON and send it to the server. I also need it to deserialize data from the server and populate the form with it, so that the user can view or update it. Basically, it is a two-way form-JSON binder. AngularJS fits this scenario perfectly, but it is a little heavy from my perspective.

js-binding is the one that I chose. I modified it slightly to avoid serializing blank inputs and deserializing undefined JSON properties. It is then up to the server side to decide whether a field should be updated when the request says nothing about it. The modified version is on GitHub.

Using the serialization function, I can also check whether the form was updated before submitting the data to the server.
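
The behavior described above can be sketched independently of the DOM. The following is a Python illustration over a plain dictionary of field names and values, not the js-binding API; the function names and the sample fields are hypothetical.

```python
import json


def serialize(form):
    """Form fields -> JSON text, leaving out blank inputs."""
    filled = {name: value for name, value in form.items() if value.strip() != ""}
    return json.dumps(filled, sort_keys=True)


def deserialize(payload, form):
    """JSON text -> form fields; fields the JSON does not mention stay as they are."""
    data = json.loads(payload)
    for name in form:
        if name in data:
            form[name] = str(data[name])
    return form


def is_dirty(form, snapshot):
    """True if the form no longer serializes to the snapshot taken at load time."""
    return serialize(form) != snapshot


form = {"name": "", "email": "", "phone": ""}
deserialize('{"name": "Ada", "email": "ada@example.org"}', form)
snapshot = serialize(form)            # state as loaded from the server
dirty_before = is_dirty(form, snapshot)
form["phone"] = "555-0100"            # the user edits one field
dirty_after = is_dirty(form, snapshot)
print(snapshot, dirty_before, dirty_after)
```

Comparing the serialized form against a snapshot taken right after loading is enough to tell whether anything needs to be submitted.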