Friday, December 13, 2013

PayPal's node.js versus Java benchmark and its analysis

PayPal engineers shared their test results on node.js versus Java, from both system performance and developer productivity perspectives. Baron Schwartz followed with a good analysis of the results. I suggest everyone interested in service performance and scalability read them.

I happen to know most aspects of this topic well: Java, node.js, service orchestration, NIO/AIO, and performance and scalability, including the USL (Universal Scalability Law). The following are my assumptions about, and understanding of, the original benchmark and the analysis.
  1. The application that they implemented in both Java and node.js is a service orchestration, or a server-side mashup. That is, for an account overview page request, the server side needs to make a series of service/API requests to other services/APIs/resources. According to the description, I assume there are 3 serial blocks, and inside each block there are 2 to 5 parallel requests (see the sketch after this list). 
  2. The underlying services/APIs/resources are independent of the service/resource that generates the account overview page. We can assume the response time characteristics (mean, second moment, or full distribution) of the underlying services/APIs/resources do not change because of the load on the account overview page service/resource. This assumption is required for an apples-to-apples comparison. 
  3. The account overview page requests result in an I/O-intensive workload on the system, not a CPU-intensive one. More precisely, it is reversed I/O: the service orchestration itself initiates parallel clients, whether HTTP or others such as DB connections. If you want to know more about this, please google C10K or Reverse C10K. I assume they used 5 cores for the Java implementation in order to make sure the CPU would not become a bottleneck in the benchmark because of the number of concurrent threads. The max number of concurrent threads would be N x P, where N is the number of concurrent overview requests and P is the number of threads per request. In this case, P could be 6: 1 for the inbound overview request and 5 for the outbound API/resource requests. In their measurement, this number can reach 15 x 6 = 90. I assume they have a good Java HTTP client library that uses a shared thread pool. 
  4. It is not clear which Java Servlet container was used in the benchmark. Some of the latest Servlet containers do support NIO. However, most Servlets are still synchronous, and only a few platforms, such as Jetty and Netty, support an asynchronous style. The difference between synchronous and asynchronous Servlet processing is that the former has to block during outbound I/O. In node.js, everything is asynchronous if you program in a natural way.
  5. In the USL models, Java's kappa (crosstalk or coherency penalty) is higher because the threads share resources, including CPUs and memory. Its sigma (serialization or contention penalty) is lower because the tasks are performed in parallel by the threads. node.js's kappa is lower because the I/O operations are processed serially by the single event loop. For the same reason, its sigma is much higher. This shows the beauty of the USL model.
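
For reference, the USL models relative capacity at concurrency N as C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1)), so a high sigma caps node.js's scalability through serialization, while a high kappa caps Java's through coherency traffic. To make the orchestration shape in item 1 concrete, here is a minimal node.js sketch; the endpoint names and block sizes are my assumptions, not PayPal's code.

var http = require('http');

// Fetch one URL and hand the whole response body to the callback.
function fetch(url, callback) {
  http.get(url, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  }).on('error', callback);
}

// Run one block of requests in parallel; call done once all have returned.
function parallel(urls, done) {
  var results = [], pending = urls.length;
  urls.forEach(function (url, i) {
    fetch(url, function (err, body) {
      results[i] = err ? null : body;
      if (--pending === 0) done(results);
    });
  });
}

// Three serial blocks, each fanning out to a few parallel API calls; the
// single event loop stays free while the outbound I/O is in flight.
parallel(['http://api.local/profile', 'http://api.local/settings'], function (a) {
  parallel(['http://api.local/balance', 'http://api.local/activity'], function (b) {
    parallel(['http://api.local/offers', 'http://api.local/footer'], function (c) {
      console.log('account overview assembled from', a, b, c);
    });
  });
});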

Friday, October 25, 2013

HealthCare.gov is a service orchestration

There has been a lot of news recently about the healthcare.gov website issues. The terms hub and service have been used widely to describe the technical nature of the system to a general audience, like this. It is good that they did not use the more technical term orchestration. Scalability issues are intrinsic to orchestrations that are realized in a hub-spoke structure.

The central conductor service resides at the hub, and clients and partner services are at the spoke ends.
1. The central conductor service needs to support many concurrent orchestration instances and handle more input/output (I/O) operations than ordinary services, which brings extra performance and scalability challenges.
2. The central conductor service is a single point of failure and, therefore, is critical to the reliability of a service composition.
Such scalability issues, and an architectural approach to addressing them, are discussed in my thesis, titled RESTful Service Composition. The key idea is to implement a composition in a flow-like structure rather than a hub-spoke one.
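
To make the contrast concrete, here is a minimal node.js sketch with made-up service names (nothing below is the actual healthcare.gov design). In the hub-spoke form, every request and response passes through the conductor; in the flow form, each service enriches the message and forwards it downstream, so the data path bypasses the hub.

// A stub standing in for a real network call (hypothetical).
function callService(name, input, cb) {
  setImmediate(function () { cb({ from: name, input: input }); });
}

// Hub-spoke: the conductor issues every outbound call and aggregates the
// results, so its I/O load grows with every partner service it talks to.
function conductor(request, done) {
  callService('identity', request, function (id) {
    callService('eligibility', id, function (elig) {
      callService('plans', elig, function (plans) {
        done({ identity: id, eligibility: elig, plans: plans });
      });
    });
  });
}

conductor({ applicant: 'example' }, function (page) { console.log(page); });

// Flow: each participant performs its own step and forwards the partial
// result to the next hop, spreading the composition's I/O across services.
function identityStep(msg, forward) {
  msg.identity = { verified: true };    // local work (placeholder)
  forward('eligibility-service', msg);  // hand off downstream
}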

Tuesday, October 8, 2013

Clone a resource in HTTP

In order to clone or copy a file, we execute a command like
cp a b
Given a file named a, the command creates a new file named b that has the same content as a. Similarly, we can clone a resource A. The client sends the following request to the server.
PUT /resources/B
...
{"source": "/resources/A"}
However, a client normally does not have the knowledge to properly name a new resource. A more reasonable alternative is the following request.
POST /resources
...
{"source": "/resources/A"}
All of these are straightforward, but they really smell like RPC; they do not follow the REST constraint of self-descriptive messages. How could the server be sure that /resources/A is what the client wants? What if the resource has been changed or deleted after the request was sent out? The following approach may be better:
GET /resources/A
Then
POST /resources
...
the representation of /resources/A
What if the representation is huge? Maybe we can do the following:
GET /resources/A
or
HEAD /resources/A
Then
POST /resources
...
{"source": "/resources/A",
"ETag": "frompreviousrequest"}

Monday, August 5, 2013

A lightweight JavaScript form data binder

In one of my current projects, I need a JavaScript module to serialize the user's form input into JSON and send it to the server. I also need it to deserialize data from the server into the form, so that the user can view or update it. Basically, it is a two-way form-JSON binder. AngularJS fits this scenario perfectly, but it is a little heavy for my purposes.

js-binding is the one I chose. I modified it slightly to avoid serializing blank inputs and deserializing undefined JSON properties; it is the server side that decides whether a field should be updated when the request says nothing about it. The modified version is on GitHub.

Using the serialization function, I can also check whether the form was updated before submitting the data to the server.
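
The check itself is simple. Below is a minimal sketch, where serialize(form) stands in for the binder's form-to-object call (a placeholder, not js-binding's actual API):

// serialize(form) is a placeholder for the binder's form-to-object call.
var baseline = JSON.stringify(serialize(form));

function submitIfChanged(form) {
  var current = JSON.stringify(serialize(form));
  if (current === baseline) {
    return; // nothing changed, skip the round trip
  }
  // POST `current` to the server, then reset the baseline on success.
  baseline = current;
}

Note this comparison relies on the serializer emitting properties in a stable order.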

Friday, November 16, 2012

The performance of a simple PV snapshot application

In order to check the impact of programming style, synchronous versus asynchronous, on the performance of a PV snapshot, I set up a simple benchmark and tested it with five different implementations. The benchmark takes a snapshot of the current value of (1) 97 connected PVs, (2) 97 connected and 4 disconnected PVs, and (3) 97 connected and 15 disconnected PVs. Four implementations are written in Python with the pyepics library. The last one is written in node.js and gets the PV values via the system caget command-line tool. All the implementations are on GitHub.
The pv and ca implementations are based on suggestions from the pyepics developers. The improvement is huge when all the PVs are connected: for 97 PVs, the time required drops from about 2700 milliseconds to about 200 milliseconds. However, they do not perform well when there are disconnected PVs. In the programs, I set all the connection timeouts to 1 second. The pv version takes an extra 2 seconds for each disconnected PV, and the ca version takes an extra 1 second. The major parts of the pv and ca versions are in the GitHub repository.

Threading should improve the situation by waiting for the timeouts in parallel rather than in sequence. However, the overheads of creating processes and sharing state between processes are also enormous. The major parts of the multi-threading pv and ca versions are likewise in the repository.

The node.js implementation performs much better than both the pv and ca multi-threading implementations. I think this is because 1) the processes in node.js are much lighter than those in Python, and 2) there is no shared state between processes in node.js because of the use of asynchronous callbacks and closures. At the protocol level, channel access is fully parallel and asynchronous. It seems we should do the same at the high-level programming level to leverage that advantage.
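
A minimal sketch of that node.js approach, spawning one caget child process per PV and collecting the values as the callbacks fire; the 1-second timeout mirrors the benchmark's setup, while the helper itself is my assumption, not the repository's code:

var execFile = require('child_process').execFile;

// Start all caget processes at once; each callback closes over its own PV,
// so there is no shared state to synchronize.
function snapshot(pvs, done) {
  var values = {}, pending = pvs.length;
  pvs.forEach(function (pv) {
    // -t prints the value only; the timeout covers disconnected PVs in parallel.
    execFile('caget', ['-t', pv], { timeout: 1000 }, function (err, stdout) {
      values[pv] = err ? null : stdout.trim();
      if (--pending === 0) done(values);
    });
  });
}

snapshot(['EXAMPLE:PV1', 'EXAMPLE:PV2'], function (values) {
  console.log(values);
});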

Monday, September 17, 2012

Ubuntu 12.04 and nvidia video card

I upgraded Ubuntu on a Dell Vostro 460 from 10.04 to 12.04. The video driver had no problem once I activated the "version current" driver available in the system's Additional Drivers. Later, I messed up the system when I tried to install a driver from the nvidia website: lightdm failed to load, and I could only log in to console mode. It can be fixed by forcing a reinstall of the nvidia-current package.
sudo apt-get --reinstall install nvidia-current
If this does not work, you might also try
sudo apt-get --reinstall install lightdm
A few days ago, Ubuntu updated its kernel. Lightdm loaded, but the configuration was wrong; I realized I had to re-enable the additional drivers. A small surprise was that "version current" stopped working while "post-release update" worked. The other thing that annoyed me was the blue-face problem with YouTube Flash video. The fix is described in this post. The trickiest part was that you can only turn off the hardware acceleration option while the video is in full-screen mode.