An engineering team from Netflix just published their experience of rebuilding Zuul, moving it from synchronous blocking I/O to asynchronous non-blocking I/O. Here are a few observations of interest from my perspective.
1. Asynchronous programming is hard.
Concurrency might be one of the most challenging problems in programming. In Java, multithreading has long been the norm for concurrency. The Java concurrency package makes programmers' lives easier, but it does not change the nature of concurrency. Testing and debugging multithreaded programs is difficult. Reproducing races, deadlocks, and suspicious execution sequences is tricky.
A programmer who likes to be the boss in her/his program will be disappointed by asynchronous programming. The execution order is not guaranteed by the order of the code. The routines say "do not call me, I'll call you" right after the initial call. Testing and debugging also require a different approach than for synchronous code. The biggest challenge is that programmers have to change their mindset from sequential to concurrent, from a single-player game to a multi-player game, and accept the fact that there is no NOW!
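For example, with Java's `CompletableFuture` the callback that appears first in the source is not necessarily the one that runs first; completion order follows the work, not the code. A minimal sketch (the task names and sleep durations are arbitrary, chosen only to make the ordering visible):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class CallbackOrder {
    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();

        // "Do not call me, I'll call you": each callback runs when its task
        // finishes, not where it appears in the source.
        CompletableFuture<Void> slow = CompletableFuture.runAsync(() -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { }
        }).thenRun(() -> log.add("slow"));

        CompletableFuture<Void> fast = CompletableFuture.runAsync(() -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { }
        }).thenRun(() -> log.add("fast"));

        CompletableFuture.allOf(slow, fast).join();
        // "slow" was registered first in the code, but "fast" completed first.
        System.out.println(log);  // [fast, slow]
    }
}
```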
2. For CPU-intensive independent jobs, there is no big performance difference between multi-threaded synchronous blocking I/O and event-based asynchronous non-blocking I/O.
When discussing performance and scalability, we should first identify the bottleneck. There is only one bottleneck at a time, and a new one emerges when the current one disappears. When the jobs are CPU-intensive, the bottleneck is the CPU, and the throughput is determined by the CPU. Although blocking versus non-blocking I/O affects how long the processes wait, its impact on throughput is dwarfed by the CPU processing time. It will, however, definitely have an impact on memory. I hope the Netflix team can compare the memory profiles as well.
In fact, multi-threading does not help accelerate CPU-intensive jobs. In theory, the best arrangement is one process per core, fed from a job queue by non-blocking I/O. A timeout might be required when a job takes too long to finish.
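The arrangement above can be sketched in Java with a fixed pool of one worker per core and a timeout on each result. The summation loop is just a stand-in for real CPU-bound work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CpuBoundPool {
    public static void main(String[] args) throws Exception {
        // One worker per core: extra threads would only add
        // context-switching overhead for CPU-bound work.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final long n = 1_000_000 + i;
            results.add(pool.submit(() -> {
                long sum = 0;                      // stand-in CPU-bound job
                for (long k = 0; k < n; k++) sum += k;
                return sum;
            }));
        }

        for (Future<Long> f : results) {
            // Guard against a job that takes too long to finish.
            System.out.println(f.get(5, TimeUnit.SECONDS));
        }
        pool.shutdown();
    }
}
```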
3. Async is the natural way to work with non-blocking I/O (or async I/O), and it brings a capacity gain for I/O-intensive jobs.
For I/O-intensive jobs, the time spent waiting for I/O is significant. If resources are held while waiting for I/O to finish, they are wasted: they do not contribute to throughput but do result in high utilization. Async, by contrast, frees resources while the I/O is in progress and reacquires them only when the I/O is done and the data is available. The C10K problem is the best resource discussing this at the OS level.
4. The contention and coherency of a system are determined by both the characteristics of the jobs and the system's implementation.
In the Universal Scalability Law (USL), contention is the coefficient of the first-order penalty for concurrency, and coherency is the coefficient of the second-order penalty. Contention represents the part of a job that cannot be parallelized. When there are N requests in the system, each request's contended part affects the other (N-1) requests. Coherency represents the part of each job that can generate new contention, which produces the second-order effect. Thrashing typically occurs when the coherency penalty is dominant. One example of coherency arises when a request that updates a DB record finds its copy of the data is stale. Obviously, the request needs to wait and retry once its copy is refreshed, and the refresh time can potentially be affected by every other request that updates the DB. The same can happen with memory shared among multiple threads.
In order to achieve better scalability, we need to design the system so that contention and coherency are reduced. Async can do nothing about contention on the CPU, but it can reduce contention and coherency on I/O.
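For reference, the USL expresses relative capacity at concurrency N as C(N) = N / (1 + α(N-1) + βN(N-1)), where α is the contention coefficient and β the coherency coefficient. A small Java sketch (with purely illustrative coefficients, not measured from any real system) shows how a dominant second-order coherency term makes capacity fall off — the thrashing described above:

```java
public class Usl {
    // Universal Scalability Law: relative capacity at concurrency n,
    // with contention coefficient alpha (first order) and coherency
    // coefficient beta (second order).
    static double capacity(int n, double alpha, double beta) {
        return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1));
    }

    public static void main(String[] args) {
        // Illustrative coefficients only.
        double alpha = 0.05, beta = 0.001;
        for (int n : new int[] {1, 8, 32, 128}) {
            System.out.printf("N=%d capacity=%.2f%n", n, capacity(n, alpha, beta));
        }
    }
}
```

Note how capacity peaks and then drops as N grows: the βN(N-1) term eventually dominates, so adding concurrency reduces throughput.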
Monday, September 26, 2016
Wednesday, February 24, 2016
Resource representation generation on first GET retrieval
Intent
Provide a resource type that has a new resource representation instance generated on the first retrieval. In HTTP, this is often a GET request to a URL that identifies the resource.

Also Known As
The closest pattern I can find is the Multiton pattern.
Motivation
I have implemented this pattern in two scenarios.

In the first scenario, an application provides remote control of a specific CCD detector. The detector scans a sample and acquires an image at every scan spot. The native image format is not supported by Web browsers. Users can watch the scan progress on a page, which retrieves a new image once acquisition finishes on a scan spot. The image is then converted to PNG and sent to the client. The PNG image is saved, and all later requests are served directly without conversion. The challenge in this scenario is that the application should convert each image to PNG only once. That implies that all requests for an image arriving before its PNG file is available are served together.
In the other scenario, a client retrieves a user's thumbnail photo from an application. The application gets the photo from an Active Directory when it is requested for the first time, saves it, and serves it locally thereafter. Similar to the first scenario, the application should retrieve the photo from the AD service only once.
Design
When a resource representation is requested but is not yet available in the local file system, the request is put into an array holding all the requests for the same representation. This array is stored in a hash table keyed by the resource's identifier. When a resource is requested for the first time, the representation is not in the local file system and the corresponding key is not in the hash table; the key is then created, and the first request is pushed into the array. All subsequent requests for the same resource are pushed into the array while the application is generating the representation. When the representation is ready, it is saved to the file system, the key is removed from the hash table, and all the requests in the array are served in a batch. The design can be implemented in various programming languages, but there is a big difference between an implementation in a non-event-driven language like Java and one in an event-driven environment like node.js.

Challenge 1: synchronization of the hash table
Adding requests for the same resource to the hash table and removing them from it has to be synchronized.

| Java | node.js |
|---|---|
| We will have to use a concurrent utility class like `java.util.concurrent.ConcurrentHashMap<String, List>`. | A simple object like `{"resource-identifier": []}` will work. |
Challenge 2: asynchronous processing
While the application is generating or retrieving the resource, we want the thread originally allocated to the request to be freed, and the handling of the request to resume when the resource finally becomes available. Before Servlet 3, we had to use something like Jetty continuations for this in Java. In node.js, by contrast, there is only a single thread, so the processing is asynchronous by default.

| Java | node.js |
|---|---|
| Put a Jetty continuation or a Servlet 3 `AsyncContext` into the hash map, and write the response from there when the resource is available. | Just put the standard `http.ServerResponse` instances into the array. |
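To make both challenges concrete, here is a minimal, self-contained Java sketch of the pattern. Everything here is illustrative: the class name `FirstGetGenerator`, the in-memory map standing in for the file system, and the `Consumer<String>` callback standing in for writing an HTTP response (in a real servlet this would be completing an `AsyncContext`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class FirstGetGenerator {
    // key: resource identifier; value: callbacks waiting for that resource.
    private final Map<String, List<Consumer<String>>> pending = new ConcurrentHashMap<>();
    private final Map<String, String> store = new ConcurrentHashMap<>(); // stands in for the file system
    private final ExecutorService generator = Executors.newSingleThreadExecutor();

    // `render` stands in for writing the HTTP response.
    public void get(String id, Consumer<String> render) {
        String cached = store.get(id);
        if (cached != null) {              // already generated: serve directly
            render.accept(cached);
            return;
        }
        final boolean[] first = {false};
        // compute() is atomic per key, so adding a waiter and creating the
        // list cannot race with the batch removal below.
        pending.compute(id, (k, waiters) -> {
            if (waiters == null) {         // first request: create the waiter list
                waiters = new ArrayList<>();
                first[0] = true;
            }
            waiters.add(render);           // queue this request
            return waiters;
        });
        if (first[0]) {
            generator.submit(() -> {
                String rep = "PNG:" + id;              // expensive one-time generation
                store.put(id, rep);                    // "save to the file system"
                List<Consumer<String>> waiters = pending.remove(id);
                waiters.forEach(w -> w.accept(rep));   // serve the batch
            });
        }
    }

    public static void main(String[] args) throws Exception {
        FirstGetGenerator app = new FirstGetGenerator();
        CountDownLatch done = new CountDownLatch(3);
        // Three concurrent requests for the same image: generated once,
        // served three times.
        for (int i = 0; i < 3; i++) {
            app.get("scan-42", rep -> { System.out.println(rep); done.countDown(); });
        }
        done.await();
        app.generator.shutdown();
    }
}
```

This is a sketch, not a hardened implementation: for instance, a request arriving in the narrow window between the store update and the batch removal may trigger a second (harmless, idempotent) generation.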
To improve performance, we can also add cache-control headers to these generated resources, in addition to keeping the copy in the file system.