Using the If-Match Header to Avoid Collisions
by Christoph Schiessl on Python and FastAPI
If you are building an HTTP API and have to support users with concurrent access to the same updatable resource, you must safeguard against collisions. There are many ways to do this, but today, I want to discuss one approach that uses the ETag response header and the If-Match request header.
What are Collisions?
Imagine you have your web application running on localhost:8000 with the following two endpoints:
- GET /documents/{id}: returns the current content of the document identified by the given id.
- PUT /documents/{id}: accepts a request body with some content and creates a new document under the given id (or replaces the document's content if it already exists).
Now, consider the following timeline of two users, Alice and Bob, who are concurrently interacting with your web application:
- Alice loads the current content of document number 1 with GET /documents/1.
- Bob also loads the current content of document number 1 with GET /documents/1. At this point, Alice and Bob have the same content.
- Both users modify their local copy of the content somehow.
- Alice is happy with her modifications and submits her finished work to the server with PUT /documents/1.
- Shortly after, Bob is also done with his modifications and submits his updated local content to the server with PUT /documents/1.
The last step causes the collision! Bob is unaware that Alice has already updated the document before him, so he doesn't know that his local copy is outdated. When he saves his work, he overwrites the changes that Alice has saved before. This situation is known as a lost update and there are many ways to deal with it. For instance, we could introduce pessimistic locking to guarantee exclusivity to users so that updates from other users are blocked while they modify the content.
That said, pessimistic locking is often overkill because, in many cases, it's good enough to detect conflicts when they occur so that later updates are blocked and earlier updates aren't overwritten.
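To make the problem concrete, here is a minimal sketch of the timeline above. It does not involve HTTP at all; the `documents` dictionary and the two helper functions are hypothetical stand-ins for the server and its two endpoints.

```python
# Hypothetical in-memory store standing in for the server's database.
documents = {"1": "Original text."}


def get_document(doc_id: str) -> str:
    """Return the current content (what GET /documents/{id} would send)."""
    return documents[doc_id]


def put_document(doc_id: str, content: str) -> None:
    """Unconditionally replace the content (what an unprotected PUT does)."""
    documents[doc_id] = content


# Alice and Bob both load the same content.
alice_copy = get_document("1")
bob_copy = get_document("1")

# Both edit their local copies independently.
alice_copy += " Alice's paragraph."
bob_copy += " Bob's paragraph."

# Alice saves first, then Bob saves and silently overwrites her work.
put_document("1", alice_copy)
put_document("1", bob_copy)

print(documents["1"])  # Original text. Bob's paragraph.
```

Alice's paragraph is gone, and neither user received any hint that something went wrong.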
How to Prevent Lost Updates?
We must extend our API endpoints to support the ETag response header and the If-Match request header to detect conflicts and block users from accidentally overwriting each other's updates. Essentially, we have to implement the following logic:
- GET /documents/{id}: returns the document's current content and an ETag response header that represents the document's current content (e.g., using some hash function). Because servers compare If-Match values with the strong comparison function, this should be a strong ETag; a weak one (prefixed with W/) never matches under If-Match.
- PUT /documents/{id}: accepts a request body with some content and creates a new document identified by the given id (or replaces the document's content if it already exists). It also accepts an optional If-Match header that triggers the following logic if present: If the value of the document's ETag matches the value of the given If-Match header, the endpoint performs the update as before. However, if the two values don't match, the HTTP request must fail with the status 412 Precondition Failed and must not update the document's content.
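The rule for PUT requests against an existing document can be sketched as a small pure function. This is only an illustration under a few assumptions: the names `compute_etag` and `status_for_update` are made up for this example, and MD5 is just one possible hash function for deriving the ETag from the content.

```python
from hashlib import md5
from http import HTTPStatus
from typing import Optional


def compute_etag(content: str) -> str:
    """Derive an ETag value from the content (assumption: hex-encoded MD5)."""
    return md5(content.encode()).hexdigest()


def status_for_update(current_content: str, if_match: Optional[str]) -> HTTPStatus:
    """Return the status a PUT against an existing document should produce."""
    if if_match is None:
        # No If-Match header: behave like the unprotected endpoint.
        return HTTPStatus.NO_CONTENT
    if if_match == compute_etag(current_content):
        # The client saw the current version, so the update may proceed.
        return HTTPStatus.NO_CONTENT
    # The client's copy is outdated: reject and leave the document untouched.
    return HTTPStatus.PRECONDITION_FAILED


current = "This is the content."
print(status_for_update(current, compute_etag(current)).value)  # 204
print(status_for_update(current, "some-stale-etag").value)  # 412
```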
Given this extended API, we can run the timeline from before again and observe the different behavior:
- Alice loads the current content of document number 1 with GET /documents/1.
- Bob also loads the current content of document number 1 with GET /documents/1. At this point, Alice and Bob have the same content and the same ETag value representing the content.
- Both users modify their local copy of the content somehow.
- Alice is happy with her modifications and submits her finished work to the server with PUT /documents/1. She also adds an If-Match header with the ETag value she received when loading the document.
- Shortly after, Bob is also done with his modifications and submits his updated local copy to the server with PUT /documents/1. He, too, adds an If-Match header with the ETag value he received when loading the document.
Now, we get a different behavior for the last step. The If-Match header Bob submitted no longer matches the ETag value the server computes from the document's current content, because Alice's update changed the content and, with it, the ETag value. Due to the new logic in the extended API, the server rejects Bob's PUT request with the status 412 Precondition Failed instead of silently overwriting Alice's update.
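The same timeline can be replayed without any HTTP involved. The sketch below uses a hypothetical `DocumentStore` class whose `get` and `put` methods mimic the two endpoints and return plain status codes; it is meant to illustrate the sequence of events, not to prescribe an implementation.

```python
from hashlib import md5
from typing import Optional


class DocumentStore:
    """Hypothetical in-memory stand-in for the extended API."""

    def __init__(self) -> None:
        self._documents: dict[str, str] = {}

    @staticmethod
    def _etag(content: str) -> str:
        return md5(content.encode()).hexdigest()

    def get(self, doc_id: str) -> tuple[str, str]:
        """Mimic GET: return the content together with its ETag value."""
        content = self._documents[doc_id]
        return content, self._etag(content)

    def put(self, doc_id: str, content: str, if_match: Optional[str] = None) -> int:
        """Mimic PUT: return the status code the endpoint would respond with."""
        current = self._documents.get(doc_id)
        if current is None:
            self._documents[doc_id] = content
            return 201
        if if_match is not None and if_match != self._etag(current):
            return 412  # stale ETag value: the document stays untouched
        self._documents[doc_id] = content
        return 204


store = DocumentStore()
store.put("1", "Original text.")

# Alice and Bob load the same content and the same ETag value.
alice_content, alice_etag = store.get("1")
bob_content, bob_etag = store.get("1")

# Alice saves first; Bob's later PUT carries an ETag value that is now stale.
alice_status = store.put("1", alice_content + " Alice's paragraph.", if_match=alice_etag)
bob_status = store.put("1", bob_content + " Bob's paragraph.", if_match=bob_etag)

print(alice_status, bob_status)  # 204 412
print(store.get("1")[0])  # Original text. Alice's paragraph.
```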
Blocking Conflicting Updates with FastAPI
All of this isn't too hard to implement in FastAPI. The GET endpoint below attempts to return the content of the document with the given id. If this document doesn't exist, it simply fails with the status 404 Not Found. On the other hand, if it does exist, there are two possible responses.
Firstly, if the request has an If-None-Match header and its value matches the calculated ETag value, then the request is short-circuited with 304 Not Modified (i.e., HTTP caching behavior). Secondly, if the request doesn't include an If-None-Match header or its value doesn't match the computed ETag value, the server responds with a normal 200 OK status and returns the document's content in the response body.
For the PUT endpoint, we start by determining whether the given document id is already known. If not, we create a new document no matter what. However, if this request is about updating an existing document, we have two options again.
Firstly, if the request has an If-Match header and its value doesn't match the calculated ETag value, then the request fails with the status 412 Precondition Failed, and the document's content is not updated. Secondly, if the request doesn't include an If-Match header or its value matches the computed ETag value, the server updates the document's content according to the text in the request body and responds with a 204 No Content status.
from fastapi import FastAPI, Request, status
from hashlib import md5
from fastapi.responses import PlainTextResponse
import uvicorn

app = FastAPI()
documents_database: dict[str, str] = {}


@app.get("/documents/{id}")
async def get_document(request: Request, id: str) -> PlainTextResponse:
    if (current_content := documents_database.get(id)) is not None:
        etag = md5(current_content.encode()).hexdigest()
        if request.headers.get("If-None-Match") == etag:
            return PlainTextResponse(status_code=status.HTTP_304_NOT_MODIFIED, headers={"ETag": etag})
        return PlainTextResponse(status_code=status.HTTP_200_OK, headers={"ETag": etag}, content=current_content)
    else:
        return PlainTextResponse(status_code=status.HTTP_404_NOT_FOUND)


@app.put("/documents/{id}")
async def put_document(request: Request, id: str) -> PlainTextResponse:
    if (current_content := documents_database.get(id)) is not None:
        etag = md5(current_content.encode()).hexdigest()
        if request.headers.get("If-Match") and request.headers.get("If-Match") != etag:
            return PlainTextResponse(status_code=status.HTTP_412_PRECONDITION_FAILED)
        documents_database[id] = (await request.body()).decode()
        return PlainTextResponse(status_code=status.HTTP_204_NO_CONTENT)
    else:
        documents_database[id] = (await request.body()).decode()
        return PlainTextResponse(status_code=status.HTTP_201_CREATED)


if __name__ == "__main__":
    uvicorn.run(app=app)
If we run this program with python app.py, we can test it using curl.
$ curl -i -s -X GET localhost:8000/documents/1
HTTP/1.1 404 Not Found
date: Fri, 14 Jun 2024 07:09:41 GMT
server: uvicorn
content-length: 0
content-type: text/plain; charset=utf-8

$ curl -i -s -X PUT --header "Content-Type: text/plain" --data "This is the content." localhost:8000/documents/1
HTTP/1.1 201 Created
date: Fri, 14 Jun 2024 07:09:49 GMT
server: uvicorn
content-length: 0
content-type: text/plain; charset=utf-8

$ curl -i -s -X GET localhost:8000/documents/1
HTTP/1.1 200 OK
date: Fri, 14 Jun 2024 07:10:00 GMT
server: uvicorn
etag: 3cf463e834022556dbf5f5dc571cbb0f
content-length: 20
content-type: text/plain; charset=utf-8

This is the content.
Document number 1 doesn't exist initially. After creating it, we can fetch its content in subsequent GET requests. The responses to these GET requests include an ETag header, whose value we can use to see the HTTP caching behavior in action:
$ curl -i -s -X GET --header "If-None-Match: 3cf463e834022556dbf5f5dc571cbb0f" localhost:8000/documents/1
HTTP/1.1 304 Not Modified
date: Fri, 14 Jun 2024 07:10:40 GMT
server: uvicorn
etag: 3cf463e834022556dbf5f5dc571cbb0f
content-type: text/plain; charset=utf-8
We can then proceed to update the document's content, using the same ETag value for the If-Match header. The PUT succeeds and responds with the status 204 No Content because the value of the If-Match header and the calculated ETag value match. The successful update changes the document's ETag value, so a GET that still sends the original ETag value in If-None-Match no longer gets 304 Not Modified; it receives a full 200 OK response with the new content.
$ curl -i -s -X PUT --header "Content-Type: text/plain" --header "If-Match: 3cf463e834022556dbf5f5dc571cbb0f" --data "This is *updated* content." localhost:8000/documents/1
HTTP/1.1 204 No Content
date: Fri, 14 Jun 2024 07:10:49 GMT
server: uvicorn
content-type: text/plain; charset=utf-8

$ curl -i -s -X GET --header "If-None-Match: 3cf463e834022556dbf5f5dc571cbb0f" localhost:8000/documents/1
HTTP/1.1 200 OK
date: Fri, 14 Jun 2024 07:11:11 GMT
server: uvicorn
etag: c7557d8a6989d574813b034fa42c86f6
content-length: 26
content-type: text/plain; charset=utf-8

This is *updated* content.
Next, we try to PUT the document again, still using the original ETag value for the If-Match request header. This time, the request fails with the status 412 Precondition Failed and does not update the document's content. This is the whole point of this article: it is precisely the behavior that prevents lost updates in the real world.
$ curl -i -s -X PUT --header "Content-Type: text/plain" --header "If-Match: 3cf463e834022556dbf5f5dc571cbb0f" --data "This is *differently updated* content." localhost:8000/documents/1
HTTP/1.1 412 Precondition Failed
date: Fri, 14 Jun 2024 07:11:20 GMT
server: uvicorn
content-length: 0
content-type: text/plain; charset=utf-8

$ curl -i -s -X GET localhost:8000/documents/1
HTTP/1.1 200 OK
date: Fri, 14 Jun 2024 07:11:29 GMT
server: uvicorn
etag: c7557d8a6989d574813b034fa42c86f6
content-length: 26
content-type: text/plain; charset=utf-8

This is *updated* content.
Finally, if we PUT the document again but don't include an If-Match request header, then all bets are off. The behavior that gave us 412 Precondition Failed is not active, meaning there is no protection against lost updates.
$ curl -i -s -X PUT --header "Content-Type: text/plain" --data "This is *differently updated* content." localhost:8000/documents/1
HTTP/1.1 204 No Content
date: Fri, 14 Jun 2024 07:11:37 GMT
server: uvicorn
content-type: text/plain; charset=utf-8

$ curl -i -s -X GET localhost:8000/documents/1
HTTP/1.1 200 OK
date: Fri, 14 Jun 2024 07:11:45 GMT
server: uvicorn
etag: d8ed2dc3564dbab400f8b4ec488b0f6e
content-length: 38
content-type: text/plain; charset=utf-8

This is *differently updated* content.
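If unconditional updates are not acceptable for your use case, one possible variation is to make the If-Match header mandatory for updates of existing documents. The sketch below is not part of the application above; `strict_update_status` is a hypothetical helper, and answering with 428 Precondition Required (a status defined for exactly this kind of situation) is an assumption about how you might want a stricter API to behave.

```python
from hashlib import md5
from http import HTTPStatus
from typing import Optional


def strict_update_status(current_content: str, if_match: Optional[str]) -> HTTPStatus:
    """Decide the status of a PUT against an existing document (hypothetical helper)."""
    if if_match is None:
        # Assumption: refuse unconditional updates instead of allowing them.
        return HTTPStatus.PRECONDITION_REQUIRED  # 428
    etag = md5(current_content.encode()).hexdigest()
    if if_match != etag:
        # The client's copy is outdated.
        return HTTPStatus.PRECONDITION_FAILED  # 412
    return HTTPStatus.NO_CONTENT  # 204


content = "This is *updated* content."
print(strict_update_status(content, None).value)  # 428
print(strict_update_status(content, md5(content.encode()).hexdigest()).value)  # 204
```

The trade-off is that every client must then fetch the document (or at least its ETag value) before it can update it.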
Conclusion
I hope this article made you consider competing updates and inspired you to write safer code for your next project. Thank you very much for reading, and see you soon!