An engineering team from Netflix just published their experience of building a new Zuul from synchronous blocking I/O to asynchronous non-blocking I/O. Here are interesting observations from my perspective.
1. Asynchronous programming is hard.
Concurrency might be one of the most challenging problem in programming. In Java, multithreading already became the norm for concurrency. The Java concurrency package made programmers' life easier than it really is. But it did not change the nature of concurrency. Testing and debugging of multithreading programs are difficult. Reproducing racing, locking, and suspicious execution sequences are tricking.
If a programmer likes to be the boss in her/his program, then she/he will be disappointed by asynchronous programming. No executing sequence is garnered by the sequence in code. The routines say "do not call me, I'll call you." right after the initial call. The way to testing and debugging is different from that for synchronous code. The biggest challenge is that programmers have to change their mindset from sequential to concurrent, from a single player game to a multiple player game, and accept the fact that there is no NOW!
2. For CPU-intensive independent jobs, there is no big performance difference between multi-threading sync blocking and event-based async non-blocking.
When discussing performance and scalability, we should first identify the bottleneck. We should see only one bottleneck at one time, and will see a new one emerging when the current one disappears. When the jobs are CPU-intensive, the bottleneck is the CPU, and the throughput is decided by the CPU. Although whether the I/O is blocking or non-blocking will affect how long the processes will wait, its impact on the throughput is shrouded by the processing time in CPU. However, it will definitely have an impact on the memory. I hope the Netflix team could compare the memory profiles.
In fact, multi-threading does not help accelerate CPU-intensive jobs. In theory, multiprocess with a non-blocking I/O feeding to the job queue will do the best, where the number of processes is decided by the number of cores. Timeout might be required when a job takes too long to finish.
3. Async is the natural way to work with non-blocking I/O (or async I/O), and it introduces capacity gain for I/O intensive jobs.
For I/O intensive jobs, the time to finish the I/O will be significant. If resources are occupied while waiting for the I/O to finish, they are wasted and do not contribute to the throughput but results in a high utilization. On the contrary, async frees resources when I/O is going on, and recapture them only when I/O is done and data is available. The C10K problem is the best resource discussing this on the OS level.
4. Contention and Coherency of a system are decided by both the characteristics of jobs and the system's implementation.
In the Universal Scalability Law (USL), the contention is the coefficient of the first order penalty for concurrency, and the coherency is that of the second order. The contention represents the part of job that cannot be paralleled. When the number of requests in a system is N, the contention will affect (N-1) of the rest requests. The coherency represents the part of each job that can result in the generation of a new contention, which results in the second order effect. Thrashing occurs typically when coherency penalty is dominant. One example of coherency occurs when a request resulting in updating a DB record finds its copy of data is stale. Obviously, the request needs to wait and try again when its copy gets refreshed. The refreshment time can potentially be affected by all other requests that update the DB. This can also occur to shared memory of multiple threads.
In order to achieve better scalability, we need to design the system so that the contention and coherency can be reduced. Async can do nothing about the contention on CPU, but it can reduce the contention and coherency on I/O.