Tuesday, August 21, 2012

Two leaky designs related to URL length limit

I discuss two cases of leaky design in this post. The first is Google Calendar, and the second is eBay POST caching.

Google Calendar

If you use Google Calendar, you might have seen the broken droid with a 414 error code like this.

The reason for the 414 is that the request URI is so long that the server refuses to handle it. The maximum URL length a server allows depends on its implementation. Whether the limit is 255 bytes, 2K characters, or even 4K characters, it will be reached when the URL is used to encode a complicated set of request query parameters.

In the case of Google Calendar, when a user clicks the link to add an event in GMail, an HTTP GET request is sent to the Calendar server with the details of the event encoded in the URL. For a normal event, often represented in iCalendar format, the URL provides enough room for the data. But some event messages carry extra text, and when GMail tries to encode that into the request URI to Calendar, a 414 can happen.
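As a rough illustration of how quickly the limit is reached (the endpoint and parameter name here are hypothetical, not GMail's actual ones), URL-encoding an iCalendar payload whose description carries pasted email text inflates the query string well past common server limits:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlLengthDemo {
    // Hypothetical endpoint and parameter name, for illustration only.
    static final String BASE = "https://calendar.example.com/add?ical=";

    public static String buildUrl(String ical) {
        return BASE + URLEncoder.encode(ical, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StringBuilder ical = new StringBuilder(
                "BEGIN:VEVENT\nSUMMARY:Team sync\nDESCRIPTION:");
        // Simulate an event whose description carries forwarded email text.
        for (int i = 0; i < 200; i++) {
            ical.append("Some forwarded message text, line ").append(i).append("\n");
        }
        ical.append("END:VEVENT");
        String url = buildUrl(ical.toString());
        System.out.println("Encoded URL length: " + url.length());
        // Many servers cap the request line around 4K-8K bytes;
        // a URL like this easily exceeds that and draws a 414.
    }
}
```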

A possible fix would be a modified version of GMail that keeps the URL to a proper length while still encoding the important information. However, that is not the right solution, because the interface between GMail and Google Calendar is not RESTful and simply does not follow the HTTP specification: HTTP GET is for retrieving the representation of a resource, not for creating a new one. A side effect of this design is that a duplicate event is created every time a user clicks the same link and issues the same HTTP GET request.

A straightforward fix is to use POST to create a new event; the details from the original email can then easily be included in the message body, and a 414 will never happen. To prevent creation of duplicate events, the server might need to verify whether the same event already exists in the same time slot.
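A minimal sketch of that server-side duplicate check (the EventStore class is hypothetical; a real Calendar backend would query its datastore rather than an in-memory set):

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class EventStore {
    // An event counts as a duplicate if summary and time slot both match;
    // the record's generated equals()/hashCode() encode exactly that rule.
    public record Event(String summary, long startMillis, long endMillis) {}

    private final Set<Event> events = ConcurrentHashMap.newKeySet();

    /** Handles the POST: creates the event unless an identical one already
     *  occupies the same time slot. Returns true if a new event was created. */
    public boolean create(Event e) {
        Objects.requireNonNull(e);
        return events.add(e);  // add() is a no-op for duplicates
    }
}
```

Repeating the same POST is then harmless: the second call simply reports that nothing new was created, instead of silently inserting a second copy as the GET-based design does.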

If, in some cases, the designer really wants to create a new resource via a GET request, the request URI should identify the resource to be created. That is, instead of returning a 404, the server creates a new resource based on the request URI and then returns the representation of that resource. For a server serving many concurrent clients that might request the same URL at the same time, special care must be taken to prevent creation of duplicate resources and to hold subsequent requests until the resource representation is available. I developed such a service using a Servlet Filter to do these tasks with Jetty continuations.
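My filter used Jetty continuations to park waiting requests; the same dedup-and-wait idea can be condensed into a standalone sketch (the class and the creator function are illustrative, not the original filter code):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Sketch of create-on-GET: the first request for a URI triggers creation;
 *  concurrent requests for the same URI wait on the same result instead of
 *  creating duplicates. */
public class CreateOnGet {
    private final ConcurrentHashMap<String, CompletableFuture<String>> inFlight =
            new ConcurrentHashMap<>();
    private final Function<String, String> creator;  // builds the representation

    public CreateOnGet(Function<String, String> creator) {
        this.creator = creator;
    }

    public String get(String uri) {
        // computeIfAbsent guarantees exactly one future (hence one creation)
        // per URI, no matter how many clients race on it.
        CompletableFuture<String> f = inFlight.computeIfAbsent(
                uri, u -> CompletableFuture.supplyAsync(() -> creator.apply(u)));
        return f.join();  // later requests block here until the resource exists
    }
}
```

In the servlet version, the place of join() is taken by suspending the continuation and resuming it when the future completes, so no container thread is blocked while the resource is being built.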

eBay POST caching

In this post, an eBay engineer described their careful design for caching POST messages for eBay search. The two major motivations for the design are 1) to leverage the performance and scalability benefits of HTTP caching, and 2) to avoid the length limitation of the HTTP request URI. I think the design is leaky because it introduces extra special headers into the message, and naturally not all intermediaries can understand those headers and help with the caching. The forward and reverse proxies in the diagram might only exist in eBay's own customized deployment.

As Akara Sucharitakul pointed out in his comment, a 303 can be used for the initial POST request, and caching will then work with the following GET. However, this does not reduce the round-trip time. My thought is to generate the MD5 key on the client via JavaScript and then GET the resource identified by the key. If the user is the first to issue such a request, a 404 will be returned, and the client will then do the real POST to create that resource on the server.
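The post proposes JavaScript on the client, but the key derivation is the same in any language; a sketch in Java (the /search/{key} URL scheme is my assumption, not eBay's actual one):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SearchKey {
    /** Derives the cacheable resource key from the POST body, exactly as the
     *  server would, so client and server agree on the resource URI. */
    public static String md5Hex(String body) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is always present in the JDK
        }
    }
    // Client flow: GET /search/{md5Hex(body)}. On 404, POST the body once to
    // create the resource; every later GET for the same key is served by the
    // origin or by any plain HTTP cache along the way.
}
```

Because the key is a pure function of the request body, only the unlucky first client pays for the extra GET-then-POST round trip; everyone after that hits a cacheable GET with no special headers involved.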