HTTP Caching with ETag and If-None-Match Headers
by Christoph Schiessl on Python and FastAPI
When you build web applications, you generally want to limit their resource consumption as much as possible. Usually, you want to keep the files you transfer to the client small or, even better, avoid transfers altogether. Modern browsers have a variety of built-in mechanisms that make caching of previously requested resources seamless, thereby preventing retransfers of data in many cases. One of these mechanisms, and the topic of this article, is the ETag header.
ETag response header
The idea behind ETag headers (ETag is short for entity tag) is easy to explain: When the HTTP server delivers a resource (e.g., a file), it adds an ETag response header that contains an identifier for the current version of the resource. For this purpose, it's common to use a hash value of the response body (e.g., using the SHA-1 algorithm). Alternatives, such as the currently deployed Git revision of the requested resource, would also be feasible. Whatever you use, you must ensure that the identifier changes whenever the underlying resource changes.
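To make the hash-based approach concrete, here is a minimal sketch that derives an ETag value from a response body using SHA-1. The function name body_etag and the sample pages are made up for illustration; this is not how any particular server does it.

```python
import hashlib


def body_etag(body: bytes) -> str:
    """Return a quoted ETag value derived from the SHA-1 hash of the body."""
    digest = hashlib.sha1(body).hexdigest()
    # ETag values are sent wrapped in double quotes.
    return f'"{digest}"'


# Hypothetical response bodies, used only to show the behavior.
page_v1 = b"<!doctype html><title>v1</title>"
page_v2 = b"<!doctype html><title>v2</title>"

print(body_etag(page_v1))
print(body_etag(page_v1) == body_etag(page_v1))  # same body, same tag
print(body_etag(page_v1) == body_etag(page_v2))  # changed body, different tag
```

The trade-off of hashing the body is that the server has to read the whole resource before it can answer, which is one reason cheaper inputs are sometimes used instead.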
Imagine having a simple FastAPI application and a dist directory containing a single index.html file.
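import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()
# Serve everything in dist/; html=True makes directory requests load index.html.
app.mount("/", StaticFiles(directory="dist", html=True), name="dist")

if __name__ == "__main__":
    uvicorn.run(app=app, port=3000)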
$ tree --noreport .
.
├── app.py
└── dist
    └── index.html
If you start your FastAPI app with python app.py and request the index.html file, you'll see the ETag header:
$ http GET http://localhost:3000/index.html
HTTP/1.1 200 OK
content-length: 62
content-type: text/html; charset=utf-8
date: Tue, 02 Apr 2024 18:29:56 GMT
etag: "ad14c587836bb86ca86326dd61c9bf23"
last-modified: Tue, 02 Apr 2024 18:29:50 GMT
server: uvicorn
<!doctype html><meta charset=utf-8><title>/index.html</title>
The HTTP client doesn't need to know, and usually doesn't care, how the HTTP server calculates its ETag headers, but in this case, you can look it up in the source code (FastAPI re-exports the StaticFiles class from Starlette, so that is where the implementation lives). As it turns out, StaticFiles uses an MD5 hash of the file's modification timestamp and size in bytes. Knowing how that works, we can replicate the calculation on the command line:
$ python -c "import os; s = os.stat('dist/index.html'); print(f'{s.st_mtime}-{s.st_size}')" | xargs echo -n | md5sum
ad14c587836bb86ca86326dd61c9bf23 -
Here, I use the os.stat() function to get meta-information about index.html, and then I concatenate and print its modification time and size. Next, I use echo -n to remove the trailing line break that Python prints out, and finally, I pipe the result into md5sum to calculate the hash. And sure enough, the hash value from the command line matches the one we got in the HTTP response header.
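The same calculation can also be written entirely in Python. The following is a sketch that mirrors the shell pipeline above, under the assumption that StaticFiles hashes the string "<mtime>-<size>" as described; the function name static_file_etag is hypothetical, and the demo uses a temporary file instead of dist/index.html so that it runs anywhere.

```python
import hashlib
import os
import tempfile


def static_file_etag(path: str) -> str:
    """Return a quoted MD5 hash of '<mtime>-<size>' for the file at path."""
    stat = os.stat(path)
    etag_base = f"{stat.st_mtime}-{stat.st_size}"
    return f'"{hashlib.md5(etag_base.encode()).hexdigest()}"'


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    with open(path, "wb") as f:
        f.write(b"<!doctype html>")
    before = static_file_etag(path)

    # Simulate `touch` by moving the modification time forward one second.
    stat = os.stat(path)
    os.utime(path, (stat.st_atime, stat.st_mtime + 1))
    after = static_file_etag(path)

    print(before)
    print(after)
    print(before != after)
```

Note that the content of the file never enters the hash: only its modification time and size do, which is why touching the file is enough to change the value.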
We can also reverse this and use the command line to predict the hash values we will get from the HTTP response. For example, if we somehow modify index.html so that its modification time and/or file size change, we can use the shell script from above again to recalculate the hash value ...
$ touch dist/index.html
$ python -c "import os; s = os.stat('dist/index.html'); print(f'{s.st_mtime}-{s.st_size}')" | xargs echo -n | md5sum
eb2785e4b5012179e9ffffca80d32eb7 -
... then we get the same hash value that we will get from a subsequent HTTP response.
$ http GET http://localhost:3000/index.html
HTTP/1.1 200 OK
content-length: 62
content-type: text/html; charset=utf-8
date: Tue, 02 Apr 2024 18:31:10 GMT
etag: "eb2785e4b5012179e9ffffca80d32eb7"
last-modified: Tue, 02 Apr 2024 18:30:56 GMT
server: uvicorn
<!doctype html><meta charset=utf-8><title>/index.html</title>
That covers the essentials of the ETag header and its implementation in FastAPI. However, this is only one side of the coin, and it means little without the other side: client-side support with the If-None-Match header.
If-None-Match request header
If the HTTP client supports ETag caching and receives a response that includes an ETag header, then it will copy the value of this header (including double quotes) and include it in subsequent requests for the same resource. This is done with the If-None-Match request header, which is interpreted by the HTTP server as follows:
- If the computed value for the ETag header in the response and the value of the If-None-Match header in the request are the same, it responds with the status 304 Not Modified (without a response body).
- Otherwise, if these two values differ, the server responds with the status 200 OK (with a response body).
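The two rules above can be sketched as a small Python function. This is a simplified illustration, not the code FastAPI runs: the name respond is made up, and the handling of several comma-separated tags and of the * wildcard is a rough approximation that ignores weak validators (the W/ prefix).

```python
from typing import Optional, Tuple


def respond(
    current_etag: str, if_none_match: Optional[str], body: bytes
) -> Tuple[int, bytes]:
    """Return (status, body) for a GET request, honoring If-None-Match."""
    if if_none_match is not None:
        # The header may carry several tags separated by commas.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            # The client already has this version: send no body.
            return 304, b""
    # No header, or a tag for an outdated version: send the full body.
    return 200, body


etag = '"eb2785e4b5012179e9ffffca80d32eb7"'
page = b"<!doctype html><title>/index.html</title>"

print(respond(etag, None, page)[0])
print(respond(etag, etag, page)[0])
print(respond(etag, '"8aeaf4ec41fe782adf2f7b86d884754a"', page)[0])
```

The comparison is a plain string match on the quoted value, which is why the client has to send the tag back exactly as it received it.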
The bottom line is that the server doesn't resend a response body the client already has, which saves the resources that transferring it again would have wasted.
Fortunately, it's straightforward with httpie to include If-None-Match headers in requests to simulate this behavior on the command line.
$ http GET http://localhost:3000/index.html
HTTP/1.1 200 OK
content-length: 62
content-type: text/html; charset=utf-8
date: Tue, 02 Apr 2024 18:33:23 GMT
etag: "eb2785e4b5012179e9ffffca80d32eb7"
last-modified: Tue, 02 Apr 2024 18:30:56 GMT
server: uvicorn
<!doctype html><meta charset=utf-8><title>/index.html</title>
$ http GET http://localhost:3000/index.html 'If-None-Match: "eb2785e4b5012179e9ffffca80d32eb7"'
HTTP/1.1 304 Not Modified
date: Tue, 02 Apr 2024 18:33:48 GMT
etag: "eb2785e4b5012179e9ffffca80d32eb7"
server: uvicorn
$ # touch the file so that its ETag changes
$ touch dist/index.html
$ http GET http://localhost:3000/index.html 'If-None-Match: "eb2785e4b5012179e9ffffca80d32eb7"'
HTTP/1.1 200 OK
content-length: 62
content-type: text/html; charset=utf-8
date: Tue, 02 Apr 2024 18:33:57 GMT
etag: "8aeaf4ec41fe782adf2f7b86d884754a"
last-modified: Tue, 02 Apr 2024 18:33:55 GMT
server: uvicorn
<!doctype html><meta charset=utf-8><title>/index.html</title>
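The exchange above is what a caching client does on its own. As a rough sketch of that client-side bookkeeping, the following class remembers the last ETag and body per URL and revalidates with If-None-Match. Everything here is hypothetical: ETagCache, the transport signature, and make_fake_server (an in-memory stand-in for the server, so the example runs without a network).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport signature: (url, request_headers) -> (status, etag, body).
Response = Tuple[int, Optional[str], bytes]
Transport = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Remember the last ETag and body per URL and revalidate with If-None-Match."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Send the stored tag back verbatim, double quotes included.
            headers["If-None-Match"] = cached[0]
        status, etag, body = self._transport(url, headers)
        if status == 304 and cached is not None:
            # Not modified: reuse the stored body.
            return cached[1]
        if status == 200 and etag is not None:
            self._entries[url] = (etag, body)
        return body


def make_fake_server(etag: str, body: bytes, log: List[int]) -> Transport:
    """Build an in-memory stand-in for the server that records each status it sends."""

    def transport(url: str, headers: Dict[str, str]) -> Response:
        if headers.get("If-None-Match") == etag:
            log.append(304)
            return 304, etag, b""
        log.append(200)
        return 200, etag, body

    return transport


statuses: List[int] = []
cache = ETagCache(make_fake_server('"eb2785e4"', b"<!doctype html>", statuses))
first = cache.get("/index.html")
second = cache.get("/index.html")
print(statuses, first == second)
```

The second call transfers no body, yet the caller still gets the full page back, because the cache fills it in from the stored copy.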
That's everything for today. You now know the basics of ETag-based HTTP caching and have seen it work in a FastAPI application. Thank you for reading, and see you soon!