Tuesday, October 28, 2008

Two models of load testing and corresponding tools

When conducting a load test, a tester first needs to decide what kind of workload to apply: load characterized by the number of clients, or load characterized by the arrival rate (interarrival time). Those who always use one tool for load testing would do well to try other options to get a different interpretation of system behavior.

Tools that specify the number of clients:
JMeter and Faban are both developed in Java. The most important parameter to configure is the number of simulated users that will send requests to the server. The simulated concurrent users are naturally mapped to Java threads. Both tools can specify think times drawn from different distributions. One difference between them is that JMeter specifies how many rounds the request scenario will run, while Faban specifies how long the test will run.

Tools that specify the arrival rate:
Since Erlang processes are very lightweight, a new process can be spawned very quickly, and a machine can support thousands of Erlang processes without problems. This makes it possible for Tsung to generate load with a given interarrival time, which is difficult for Java-based tools.
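
The two workload models can be sketched in a few lines of Java. This is only an illustration of the idea, not how JMeter, Faban, or Tsung are implemented; the class and method names (WorkloadModels, runClosed, runOpen) are hypothetical, the think time and interarrival time are fixed rather than drawn from a distribution, and the "request" is whatever Runnable the caller passes in.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: names and structure are assumptions for illustration.
class WorkloadModels {

    // Closed model: a fixed number of simulated users, one thread each.
    // A user sends a request, waits for it to finish, thinks, and repeats.
    static int runClosed(int users, int rounds, long thinkMillis, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        AtomicInteger completed = new AtomicInteger();
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int r = 0; r < rounds; r++) {
                    request.run();
                    completed.incrementAndGet();
                    if (!pause(thinkMillis)) return; // stop if interrupted
                }
            });
        }
        finish(pool);
        return completed.get();
    }

    // Open model: a new request arrives every interarrivalMillis,
    // whether or not earlier requests have completed.
    static int runOpen(int arrivals, long interarrivalMillis, Runnable request) {
        // One thread per in-flight request: cheap in Erlang, costly in Java.
        ExecutorService pool = Executors.newCachedThreadPool();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < arrivals; i++) {
            pool.execute(() -> {
                request.run();
                completed.incrementAndGet();
            });
            if (!pause(interarrivalMillis)) break; // stop if interrupted
        }
        finish(pool);
        return completed.get();
    }

    // Sleeps for the given time; returns false if the thread was interrupted.
    private static boolean pause(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Stops accepting work and waits for outstanding requests to finish.
    private static void finish(ExecutorService pool) {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In the closed model the offered load drops automatically when the server slows down, because each user waits for its response. In the open model arrivals keep coming, so slow responses pile up as extra in-flight threads.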

Friday, October 17, 2008

Finally they heard that ... "web-based services"

Back in 2007, the W3C held a workshop on "Web of Services for Enterprise Computing". I have blogged about it here. The presentations at the workshop helped me understand why some thought the so-called "web services" were not about the web. It was around that time that I started to believe that REST is much better than WS-* for the problems I was looking at. All this came to mind when I read the CFP of ICWS2009 this morning. It sounds like the organizers have begun to realize that "web services" should be "web-based services". Two of the three major conference areas have "Web" in their titles: "Web-based Services" and "Web Services Applications beyond Web". I hope to have a manuscript ready for it by January 19, 2009.

Tuesday, September 23, 2008

Reflection from Pat Helland about Amazon S3 outage

The post titled "Confidence in the Cloud" by data guru Pat is really worth reading. Some interesting quotations from his post:
"the implementations which sent the minimum amount of data seemed to be the most resilient."
which leads to
"Communicating less information within a message is usually best. If you send extra stuff, it can cause corruption!"
For the cloud, he predicts:
"As I look at data center cost structures, it is clear that it is going to be a competitive business with many advantages to large data center managers with large economies of scale. In a handful of years, most companies will look to offsite providers for their reliable servers."

Wednesday, September 3, 2008

Word cloud of my reading

Created by Wordle. You may need Firefox, because Chrome does not support the applet. 

Comic of Google Chrome

It's quite interesting to read what Google engineers think about web browser (web client) design with respect to multi-threading versus multi-process for tabs (HTTP tasks).

It is also interesting to see what the multi-process design means for memory management. Many blocking and crashing problems result from shared memory and a shared memory space.

Wednesday, August 27, 2008

UofS calendar updated to Aug. 2009

A new academic year will start soon. I have updated the calendar of UofS Academic Schedule to Aug. 2009.

You can subscribe to the calendar, which is also available in XML and iCal formats.

Wednesday, July 30, 2008

Amazon S3 gossip and decentralized control

Via Werner, I read the technical explanation of, and solutions for, the last S3 outage. It is interesting to know that a "gossip" protocol is used in S3 for messaging among servers. The explanation did not give many details about this gossip protocol. I suspect that it is a kind of p2p flooding. Rumor flooding would then produce the reported result: "On Sunday, we saw a large number of servers that were spending almost all of their time gossiping and a disproportionate amount of servers that had failed while gossiping." The rumor resulted from a small number of corruptions of the original message. This reminds me of a manuscript I wrote about centralized versus decentralized systems.
A system applying decentralized control paradigms can easily reach several local optimal solutions, while it is hard for such a system to check which solution is the global optimal solution. The systems are sometimes trapped in locally optimized situations, and cannot get out without outside interferences. The “circular mill” of army ants is a typical example for the local-optimization issue. For army ants that are blind and move by following the ant ahead, an isolated group of ants may form a circle which will get larger and larger until the ants die of starvation.

(picture from T. C. Schneirla. Army ants. A study in social organization. W. H. Freeman & Co, San Francisco, 1971.).

Thursday, July 10, 2008

Find the tail in a distribution using Pareto principle

In the July-August 2008 issue of Harvard Business Review, an article titled "Should You Invest in the Long Tail?" raised a discussion between the author of the Long Tail book and the article's author. One of the major disagreements is how to distinguish the head from the tail of a distribution. The term tail is also used in the context of heavy-tailed distributions and power laws. The mathematical definition of a heavy tail compares the tail of a distribution (the complementary cumulative distribution function) with an exponential function. If the tail decays more slowly than any exponential function, which suggests that the decay is likely polynomial, then the distribution has a heavy tail. Using an exponential function to tell where the tail begins in a distribution is not straightforward.

I found another easy way to cut the tail out: the Pareto principle, or 80-20 rule. Whether the split is 80-20 or 70-30, we can always write the equation
F(k) = 1 - rank(k)
where k is the sequence number of an item in the studied set (of size N) sorted in descending order of popularity, rank(k) is the ranking of the kth item as a fraction (k/N), and F(k) is the cumulative function. Given a set of N items, the tail of its distribution can be determined by the intersection of F(k) and f(k) = 1 - k/N. The intersection point indicates where the cumulative function value is equal to 1 - rank(k). For example, when rank(k) = 20%, F(k) = 80% at the intersection, which is 80-20. Since F(k) is a strictly increasing function, f(k) = 1 - k/N is a strictly decreasing function, and both take values in the same range, [0, 1], there is one and only one intersection of the two functions.

Let's use an example to check whether this method can find a "good" point to divide the tail from the head. The Zipf distribution is very popular for characterizing the distribution of ranked items like web pages, words, films, and books. The figure shows the cumulative function of a Zipf distribution with N = 1000 and s = 1. We get a very interesting result: the intersection is right at the 80-20 point.
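
The intersection can also be found numerically. Below is a minimal Java sketch, assuming the Zipf weights are 1/k^s normalized over N items and taking the "intersection" to be the first rank k at which F(k) reaches or passes 1 - k/N (with discrete ranks the two curves rarely meet exactly). The class and method names are made up for this illustration.

```java
// Hypothetical sketch: ParetoTail, zipfCumulative and headSize are
// illustrative names, not from any library.
class ParetoTail {

    // Returns cum[0..n], where cum[k] is F(k): the cumulative share of the
    // top-k items under a Zipf distribution with exponent s over n items.
    static double[] zipfCumulative(int n, double s) {
        double[] cum = new double[n + 1]; // cum[0] stays 0
        double total = 0.0;
        for (int k = 1; k <= n; k++) {
            total += 1.0 / Math.pow(k, s);
            cum[k] = total;
        }
        for (int k = 1; k <= n; k++) {
            cum[k] /= total; // normalize so that F(n) = 1
        }
        return cum;
    }

    // Returns the smallest rank k with F(k) >= 1 - k/N, i.e. the point where
    // the increasing curve F(k) crosses the decreasing line f(k) = 1 - k/N.
    static int headSize(double[] cum) {
        int n = cum.length - 1;
        for (int k = 1; k <= n; k++) {
            if (cum[k] >= 1.0 - (double) k / n) {
                return k;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int n = 1000;
        double[] cum = zipfCumulative(n, 1.0);
        int k = headSize(cum);
        System.out.printf("head = top %.1f%% of items, holding %.1f%% of the mass%n",
                100.0 * k / n, 100.0 * cum[k]);
    }
}
```

Everything from rank k + 1 onward is then the tail. Trying other values of s shows how the split point moves as the distribution becomes more or less skewed.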

Monday, July 7, 2008

Solution for latex2html not generating images

Recently I updated GhostScript and NetPbm on my Windows box, and found that latex2html started to report errors when generating images. The error messages said "bad file descriptor" when executing pstoimg.bat. Google returned several pages that describe exactly the same symptom but offer no solutions. Some suggested using the debug mode of latex2html to see more detailed traces. It really helps! I found that the NetPbm executable was trying to locate a file named rgb.txt. It tried several places that are standard directories on Linux systems, but failed. The output also suggested setting an RGBDEF environment variable. The file is in the misc directory of the NetPbm I installed. After I set the environment variable to point to it, the problem was fixed.

It seems that every application had better have a DEBUG mode.

Wednesday, May 21, 2008

Java String to byte array and InputStream

Based on Google results, the best solutions are:
byte[] byteArray = yourString.getBytes(charsetName);

InputStream stream = new ByteArrayInputStream(yourString.getBytes(charsetName));
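
For a version that compiles and runs as-is, here is a small sketch. Note that getBytes(String charsetName) throws the checked UnsupportedEncodingException; the sketch assumes UTF-8 is the wanted encoding and uses the getBytes(Charset) overload with StandardCharsets (Java 7 and later), which avoids that exception. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; only the JDK calls inside are real APIs.
class StringStreams {

    // Encodes the string as UTF-8 bytes (assumed encoding for this sketch).
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the encoded bytes in an in-memory InputStream.
    static InputStream toStream(String s) {
        return new ByteArrayInputStream(toBytes(s));
    }

    public static void main(String[] args) throws IOException {
        InputStream stream = toStream("caf\u00e9");
        // Number of bytes available to read; non-ASCII characters
        // take more than one byte in UTF-8.
        System.out.println(stream.available());
    }
}
```

Always passing an explicit charset matters here: the no-argument getBytes() uses the platform default encoding, so the same string can produce different bytes on different machines.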

Monday, May 19, 2008

Start trying Google App Engine

Just got a message from Google that my account was activated. A little anxious to start a journey of learning Python and doing experiments.

Sunday, May 18, 2008

Two font problems that keep EPS graphics from appearing in a LaTeX document

I am working on Windows XP, using Visio for drawing. My procedure for producing an EPS graphic is to print the Visio page to PS, then convert the PS to EPS using GSview. There are many other ways to get EPS out of a Visio drawing. MetafileToEPSConverter is a good tool that I highly recommend; remember to set its printing quality higher than 600 dpi to get nice figures in the PS file.

If you have recently adjusted your system DPI setting for a new big LCD, you may find that your old EPS files can no longer be displayed correctly in a new PS file. The symptom is that the graphic just flashes and disappears, leaving a blank space. The solution is simply to change the DPI setting back to the old value.

The other problem I encountered was that the fonts in EPS files looked weird and some characters were even missing. When I embedded the files in LaTeX documents, the graphics were simply not there. Later I figured out that this might be caused by Ghostscript; it was solved when I updated both GSview and Ghostscript to the latest versions.

Tuesday, May 13, 2008

Switch to Blogger from Movable Type

I used to blog on Movable Type hosted by the University, but was a little disappointed after losing long pieces of typing several times. The first alternative blog hosting service I considered was WordPress, but I was surprised that they even charge for updating the CSS of a template. I had already registered on Blogger, and I read many Blogger-hosted blogs in my reader. So I finally decided to move here. My old posts will stay on MT for search engines' sake until I no longer have access to the university service.

Wednesday, February 27, 2008

Steve Vinoski on REST, Web Services, and Erlang

Just finished the video of the interview with Steve Vinoski by Stefan Tilkov. I highly recommend watching it if you read Vinoski's blog regularly. As Stefan promised, InfoQ has a nice new video infrastructure now. The two-way synchronization of video and transcript is cool.

Thursday, January 17, 2008

Say thanks to Mark

Mark Baker is one of the Canadians in my list of "FIELDING HAS A POSSE". You can find his name in many old discussions of "REST VS SOAP" topics on various blogs and in the Yahoo SOA and REST discussion groups. Maybe he is a little tired of that now. But I agree that "the war really has been won" for him. He has already spread the ideas to many people like me. And now I try to pass the message on to others as I learn it. Thank you, Mark.

Tuesday, January 15, 2008

"Grid computing zone is being retired"

Thank you for your continued interest in grid computing and for looking to developerWorks to keep you up to date. The Grid computing zone is no longer being updated because the zone is being discontinued.

from IBM developerWorks

Tuesday, January 8, 2008

Slides for TRLabs SOA workshop

TRLabs just organized a successful workshop on SOA. I gave a short presentation about REST there. Here are the slides.