Picture the scene: A perfectly ordinary apocalypse-morning. I wake up to the sun shining, and to birds chirping. After breakfast, I head over to Sainsbury’s, put on my home-made apocalypse-mask and, as I’ve recently become accustomed to, I open up the SmartShop app to begin scanning and bagging my groceries as I shop.

Everything is going smoothly. A bag of coffee takes a while to process, but I attribute that to spotty mobile coverage in that part of the store. Checking out takes less than a minute and, as usual, there is no queue.

I walk my groceries home and, feeling somewhat accomplished,[1] I unpack my bags and settle on the sofa to review my receipt.[2] Everything is fine, until I get to the coffee; or rather, the three identical coffee entries I find on my receipt. I’m puzzled for a moment; I’m quite sure I only scanned the coffee once. Then it hits me… TCP.

TCP?

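The Transmission Control Protocol has sat at the heart of network communication since the early days of the internet. HTTP requests have traditionally travelled over TCP connections; underneath, a request is just text (a request line, a few headers and an optional body) carried in TCP packets.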

There’s a good reason TCP is so popular, and it’s to do with how unreliable networks are. When a networked device sends a packet, it has no idea whether that packet will get to its intended destination intact, especially over a wireless network.

TCP’s cunning solution to this problem is to require an acknowledgement, or “ack”, to be sent for each packet. This lets the sender know that its packets were received. If the sender doesn’t receive an ack before a timeout, the data is resent… which is where we get into problems.
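To make the "resend until acked" idea concrete, here is a minimal sketch in Python. It is a toy model rather than real TCP: `LossyLink` and `send_with_retransmit` are names invented for this example, and the link's behaviour is scripted instead of random so the outcome is repeatable.

```python
class LossyLink:
    """Toy link. Each attempt follows a scripted outcome:
    'ok', 'drop_data' (payload lost) or 'drop_ack' (payload arrives, ack lost)."""

    def __init__(self, script):
        self.script = list(script)
        self.delivered = []  # everything the receiver actually saw

    def transmit(self, payload):
        """Return True if an ack made it back to the sender."""
        outcome = self.script.pop(0) if self.script else "ok"
        if outcome == "drop_data":
            return False
        self.delivered.append(payload)
        return outcome != "drop_ack"


def send_with_retransmit(payload, transmit, max_attempts=5):
    """Resend until an ack arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if transmit(payload):
            return attempt
    raise TimeoutError("no ack after %d attempts" % max_attempts)


link = LossyLink(["drop_data", "drop_ack"])
attempts = send_with_retransmit("coffee", link.transmit)
print(attempts)        # 3
print(link.delivered)  # ['coffee', 'coffee']
```

The sender is happy after three attempts, but the receiver has seen the payload twice, because a lost ack looks exactly like a lost packet from the sender's side.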

Two Generals’

To characterise the issue we run into, let’s use a classic thought experiment - the Two Generals’ problem. Let’s imagine there are two generals in an army, leading their respective forces to attack a target from two sides. They need to attack at the same time, and so they send messengers between each other to determine when they will attack.

The first general sends a message saying they should attack at midday. However, he doesn’t trust the messenger – he could get lost, or ambushed, or even desert – so in his message, he asks the second general to send a messenger back to him to confirm he received the message. But now the second general is in a dilemma; how does he know that the first general will receive his response? Should he ask for confirmation of his confirmation? Won’t the first general want a confirmation of his confirmation of the second general’s confirmation of his message? Where does it end?

This is a solved problem… right?

I hear you cry.

TCP puts a sequence number on each packet so that if a packet is received twice, the duplicate is ignored. This works around the Two Generals’ problem well enough in practice. If the first general doesn’t receive an acknowledgement of his message within the expected timeframe, he sends another messenger with the same message, and keeps doing so until an acknowledgement is received. He puts an ID number in the message so that if the second general receives two messages, he knows they are duplicates and doesn’t… attack twice? I don’t know; this analogy is breaking down.
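The receiving side of that arrangement can be sketched like this. It is a simplification under stated assumptions: real TCP numbers bytes rather than whole messages and acknowledges them cumulatively, and the `Receiver` class is made up for illustration.

```python
class Receiver:
    """Acknowledges every message, but acts on each sequence number only once."""

    def __init__(self):
        self.seen = set()   # sequence numbers already processed
        self.applied = []   # messages that were actually acted upon

    def receive(self, seq, message):
        if seq not in self.seen:
            self.seen.add(seq)
            self.applied.append(message)
        # Ack duplicates too: the sender may have missed the earlier ack.
        return ("ack", seq)


second_general = Receiver()
second_general.receive(1, "attack at midday")
second_general.receive(1, "attack at midday")  # second messenger, same ID
print(second_general.applied)  # ['attack at midday']
```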

Assuming the first general receives at least one acknowledgement in time for the attack, all goes as planned… except when it doesn’t.

Back to reality and client-server communication

In the TCP world, connections are memory hogs. To alleviate the stress, and to prevent abuse, TCP connections ordinarily have short timeouts, so that if a client stops responding, the server can quickly drop the connection and focus on responsive clients. Unfortunately, this makes TCP connections high maintenance, potentially causing different, but equally important, problems for servers maintaining a high volume of long-lived connections.[3]

For HTTP to work at scale, it has to be cheap for servers to maintain the appearance of a session with the user. Add to that spotty low-bandwidth (and sometimes expensive!) wireless networking, and it’s easy to see how one might conclude that HTTP connections should be as short-lived as possible. But herein lies the issue – TCP sequence numbers are allocated on a per-connection basis, so numbering from any previous connections is lost. This means that if a connection is lost after a request is sent but before an acknowledgement is received, we are left with a dilemma – do we try again? In order to gracefully handle spotty wireless networking, we would like to be able to retry a few times, but without knowing if our request was received, we might create duplicates.
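A toy model shows how the duplicate sneaks in. The assumption here is that duplicate detection lives entirely inside a connection object and is thrown away with it; the `Connection` class and its `send` method are illustrative, not any real networking API.

```python
class Connection:
    """Toy connection: its duplicate detection dies with the connection."""

    def __init__(self, basket):
        self.basket = basket  # server-side state shared across connections
        self.seen = set()     # per-connection sequence numbers

    def send(self, seq, item):
        if seq not in self.seen:
            self.seen.add(seq)
            self.basket.append(item)


basket = []

first = Connection(basket)
first.send(1, "coffee")   # arrives, but the connection drops before we hear back

second = Connection(basket)  # retry on a fresh connection: numbering starts over
second.send(1, "coffee")

print(basket)  # ['coffee', 'coffee']
```

Within a single connection the second `send(1, ...)` would have been ignored. Across two connections, the server has no way to tell that the requests are the same one.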

It’s actually even worse than that

Now let’s assume we choose not to retry. We now have a worse problem – how to break the news to the user. Most apps display some kind of error message along the lines of “That didn’t work, maybe try again?”, but this actually compounds the problem. After all, what happens if the user does try again like you asked them to? Now you have the same problem – duplicate requests. The user doesn’t know any better, and it’s even harder to link together the original operation with the retry, since only the user is aware that this even is a retry.

And how should they know that this could be a problem? Your app told them that their first attempt failed, so they would naturally expect their shopping basket (or whatever server-side state your app is manipulating) not to have changed. So now we’re relying on the user to fix our problems, even though we left them potentially misinformed.

Back to TCP

Let’s tackle one problem at a time – first, retrying our HTTP requests. Essentially, we’re trying to make sure that a request will have the same effect regardless of how many times we send it. This is called idempotence.

We already know that TCP has sequence numbers, and we can draw from that. Let’s put an additional header into our requests - a GUID that’s unique to each request. When we send retries, we can use the same GUID as in the original request. The server can keep track of these GUIDs, ignoring duplicate requests that contain the same GUID as a previously received request (we can just respond with an empty “200 OK”, or some other suitable situation-dependent behaviour).
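Here is a minimal in-memory sketch of that server-side bookkeeping. The `BasketServer` class is hypothetical, the key would travel in a header of your choosing (something like `Idempotency-Key`), and replaying the stored response for a duplicate is just one of the "suitable behaviours" mentioned above. A real server would also need to expire old keys and share them between instances.

```python
import uuid


class BasketServer:
    """Toy server that remembers which request IDs it has already handled."""

    def __init__(self):
        self.basket = {}     # item -> quantity
        self.responses = {}  # request ID -> response already produced

    def add_item(self, request_id, item, quantity=1):
        # Duplicate request: replay the earlier response, change nothing.
        if request_id in self.responses:
            return self.responses[request_id]
        self.basket[item] = self.basket.get(item, 0) + quantity
        response = {"status": 200, "basket": dict(self.basket)}
        self.responses[request_id] = response
        return response


server = BasketServer()

request_id = str(uuid.uuid4())  # generated once per logical operation
server.add_item(request_id, "coffee")
server.add_item(request_id, "coffee")  # automatic retry reuses the same ID
server.add_item(request_id, "coffee")

print(server.basket)  # {'coffee': 1}
```

The important detail is on the client: the GUID is created once, before the first attempt, and reused for every retry of that same operation.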

We still have a problem

Whether we retry or not, an operation may still need to eventually fail if the maximum number of retries is reached, which leaves us with our second problem. When this happens, we still have to tell the user that something failed, and they’re still likely to retry the operation manually themselves, which could still cause duplicate operations.

Fortunately, there are other ways to achieve idempotence – for our shopping basket example, we can simply send the full basket in our request instead of just the item we intend to add. That way, no matter how many times we retry an operation, we’ll still end up in the same state once it succeeds.
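The difference between the two styles of request is easy to see side by side. This is a sketch with invented helper names (`add_item`, `replace_basket`), treating the basket as a plain dictionary of item to quantity.

```python
def add_item(basket, item, quantity=1):
    """Incremental update: every delivery changes the result."""
    updated = dict(basket)
    updated[item] = updated.get(item, 0) + quantity
    return updated


def replace_basket(_current, desired):
    """Full-basket update: the result depends only on the request itself."""
    return dict(desired)


start = {"milk": 1}
desired = {"milk": 1, "coffee": 1}

incremental = start
replaced = start
for _ in range(3):  # the same request delivered three times
    incremental = add_item(incremental, "coffee")
    replaced = replace_basket(replaced, desired)

print(incremental)  # {'milk': 1, 'coffee': 3}
print(replaced)     # {'milk': 1, 'coffee': 1}
```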

Of course, the basket could have hundreds of items in it, which would make requests unexpectedly large. So we could instead hash the basket contents and send the hash of the previous basket together with the change we’d like to make. That request might look something like this:

POST /basket/add_remove HTTP/1.1
Host: supermarket.example.com
Content-Type: application/json
Content-Length: 103
Authorization: Bearer <token>

{
  "basket_md5": "9a5cd1f7af275b0ac11adb1630d44ac0",
  "add": {
    "coffee": 1
  },
  "remove": {}
}

If the hashes don’t match, the server can return an error indicating this, and the client can either make a separate request for the contents of the current basket and decide what to do, or it can send a “full basket update” to the server to re-sync.
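On the server, the check might look roughly like the sketch below. Several things here are assumptions rather than anything the request format dictates: the basket is hashed as compact JSON with sorted keys (client and server would have to agree on this), a mismatch is reported as `409`, and the function names are made up. MD5 appears only because the example field is `basket_md5`; it acts as a fingerprint, not a security measure.

```python
import hashlib
import json


def basket_md5(basket):
    """Fingerprint a basket using a canonical JSON encoding (an assumed convention)."""
    canonical = json.dumps(basket, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def add_remove(basket, request):
    """Apply the change only if the client saw the basket we currently hold."""
    if request["basket_md5"] != basket_md5(basket):
        return 409, basket  # stale view: client should re-sync
    updated = dict(basket)
    for item, quantity in request["add"].items():
        updated[item] = updated.get(item, 0) + quantity
    for item, quantity in request["remove"].items():
        remaining = updated.get(item, 0) - quantity
        if remaining > 0:
            updated[item] = remaining
        else:
            updated.pop(item, None)
    return 200, updated


basket = {"milk": 1}
request = {"basket_md5": basket_md5(basket), "add": {"coffee": 1}, "remove": {}}

status, basket = add_remove(basket, request)
print(status, basket)  # 200 {'milk': 1, 'coffee': 1}

# The response was lost, so the client retries the identical request.
status, basket = add_remove(basket, request)
print(status, basket)  # 409 {'milk': 1, 'coffee': 1}
```

The retry is rejected because the basket's hash changed when the first attempt succeeded, so the coffee is only ever added once.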

Conclusion

When designing a RESTful API, it’s important to carefully consider how clients will handle request failures. Most of the time, this will mean making requests idempotent in some way.

This isn’t really a problem with TCP, but more a problem with the assumptions that people make when designing and consuming APIs. But since this misconception is so common, it has ultimately become a problem inherent in many web APIs and consumers of those APIs. Even when an API is designed with features such as idempotency headers, they are often misused (or not used at all) by clients, rendering them useless.

As for my coffee, Sainsbury’s were kind enough to give me a partial refund in the form of a voucher, and in future I’ll be diligently running through my basket at the checkout before I pay.


  1. I went outside today!

  2. I split the cost of some items with my housemates.

  3. TCP connections aren’t considered resource heavy by modern standards, but historically they were. In any case, long-lived TCP connections still leave you exposed to resource-exhaustion attacks, where a client opens many connections and simply holds them open.