Wednesday, July 30, 2008

Amazon S3 gossip and decentralized control

Via Werner, I read the technical explanation of the recent S3 outage and the fixes that followed. It is interesting to learn that S3 uses a "gossip" protocol to pass messages among its servers. The explanation gives few details about this protocol, but I suspect it is a kind of p2p flooding. If so, a rumor flooding through the system would produce exactly the reported result: "On Sunday, we saw a large number of servers that were spending almost all of their time gossiping and a disproportionate amount of servers that had failed while gossiping." The rumor started from a small number of corrupted copies of the original message. This reminds me of a manuscript I wrote about centralized and decentralized systems.
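Amazon's explanation does not describe the protocol, so the following is only a rough sketch of the kind of flooding I suspect, not S3's actual implementation. It assumes a push-style gossip in which every server that holds the rumor forwards it to a few random peers each round. The names spread_rumor, num_servers, fanout, and max_rounds are hypothetical.

```python
import random

def spread_rumor(num_servers=100, fanout=3, seed=0, max_rounds=50):
    """Simulate push-style gossip (an assumed model, not S3's protocol).

    Server 0 starts with a corrupted message. Each round, every server
    that already holds it forwards it to `fanout` randomly chosen peers.
    Returns the number of servers holding the rumor after each round.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    infected = {0}
    history = [len(infected)]
    for _ in range(max_rounds):
        if len(infected) == num_servers:
            break                      # everyone has heard the rumor
        newly_reached = set()
        for _server in infected:
            # A sender may pick itself or an already-infected peer;
            # those sends are wasted, as in real flooding.
            newly_reached.update(rng.sample(range(num_servers), fanout))
        infected |= newly_reached
        history.append(len(infected))
    return history

if __name__ == "__main__":
    for round_number, count in enumerate(spread_rumor()):
        print(f"round {round_number}: {count} servers hold the rumor")
```

In this toy model the number of servers holding the rumor can grow quickly from a single bad message, and no server has a global view that would let it notice the rumor is wrong.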
A system that applies decentralized control can easily reach one of several locally optimal solutions, but it is hard for such a system to check which solution is globally optimal. Such systems are sometimes trapped in a local optimum and cannot get out without outside intervention. The “circular mill” of army ants is a typical example of this local-optimization issue. Army ants are blind and move by following the ant ahead, so an isolated group of ants may form a circle that gets larger and larger until the ants die of starvation.

(Picture from T. C. Schneirla, Army Ants: A Study in Social Organization, W. H. Freeman & Co., San Francisco, 1971.)
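The local-optimum trap can be illustrated with a tiny hill-climbing sketch. This is my own toy example, not taken from the manuscript or from any real system: the heights list and the hill_climb function are hypothetical. Each "agent" only looks at its immediate neighbors, much like an ant following the ant ahead, and stops as soon as no neighbor is better.

```python
def hill_climb(heights, start):
    """Greedy local search: move to the best adjacent position until
    no neighbor is strictly higher, then return where we stopped."""
    position = start
    while True:
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(heights)]
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[position]:
            return position            # no better neighbor: a local peak
        position = best

# A landscape with a small peak (index 2) and the true peak (index 6).
heights = [1, 3, 5, 4, 2, 6, 9, 7]

if __name__ == "__main__":
    for start in (0, 4):
        stop = hill_climb(heights, start)
        print(f"start {start} -> stops at {stop} (height {heights[stop]})")
```

An agent starting at index 0 stops on the small peak at index 2, while one starting at index 4 reaches the true peak at index 6. Neither can tell from local information alone which peak is the global one, and the first cannot leave its peak unless something outside the search moves it.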