Date/Time Formatting According to RFC 5322
by Christoph Schiessl on Python
Several legacy standards define their own date/time formats. Modern systems usually rely on well-known formats like ISO 8601, but older standards often do not. We cannot simply abandon these standards: they are widely used, and the whole point of establishing a standard is that it does not change once published. One of these standards with custom date/time formatting rules, and possibly the most important one for the Internet today, is RFC 5322.
RFC 5322 defines the format of email messages. Every email is serialized to text using this standard when transmitted over the wire. Like HTTP requests and responses, serialized emails consist of a header section (a list of key-value pairs) and a body. One of the mandatory fields in the header is the Date field, which is why the standard defines its own date/time format.
Furthermore, other standards build on top of RFC 5322, and some reference it specifically to reuse its date/time formatting rules. This is one more reason to talk about it: for instance, the HTTP headers Last-Modified, If-Modified-Since, and If-Unmodified-Since all borrow a subset of the date/time formatting permitted in RFC 5322 to define their own formatting.
Date/Time Format in RFC 5322
Here is the date/time format defined in RFC 5322. I took the liberty of removing certain details that only exist for backward compatibility with older standards like RFC 2822 (one of the predecessors of RFC 5322). Furthermore, I constrained the definition to the subset of formats permitted in HTTP headers.
date-time = day-name "," SP date SP time SP "GMT"
day-name = "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" / "Sun"
date = day SP month SP year
day = 2DIGIT
month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" /
"Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
year = 4DIGIT
time = hour ":" minute ":" second
hour = 2DIGIT
minute = 2DIGIT
second = 2DIGIT
Overall, this format is very strict and should be relatively easy to serialize and parse with Python. For example, the date strings Wed, 20 Feb 1991 17:16:15 GMT and Mon, 01 Jan 2024 13:14:15 GMT comply with this format.
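To make the grammar a bit more tangible, here is a minimal sketch that translates it into a regular expression. The names RFC5322_HTTP_DATE and matches_grammar are made up for this example. Note that this only checks the syntax; it would happily accept a nonsensical value such as day 99, so it's not a replacement for real parsing.

```python
import re

# Hypothetical helper: a regular expression mirroring the simplified
# grammar above, rule by rule (day-name, date, time, literal "GMT").
RFC5322_HTTP_DATE = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), "                       # day-name ","
    r"[0-9]{2} "                                               # day
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "    # month
    r"[0-9]{4} "                                               # year
    r"[0-9]{2}:[0-9]{2}:[0-9]{2} GMT"                          # time "GMT"
)


def matches_grammar(value: str) -> bool:
    """Return True if the whole string has the shape required by the grammar."""
    return RFC5322_HTTP_DATE.fullmatch(value) is not None


print(matches_grammar("Wed, 20 Feb 1991 17:16:15 GMT"))
print(matches_grammar("Mon, 1 Jan 2024 13:14:15 GMT"))  # day has only one digit
```

I deliberately used [0-9] instead of \d because \d also matches non-ASCII digits in Python's str patterns, which the grammar does not allow.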
Using the datetime module
Maybe the most apparent approach is datetime.strftime() and datetime.strptime(). In the following example, I start by instantiating a datetime object without microseconds since RFC 5322 doesn't support fractional seconds. Next, I serialize the datetime object into a valid RFC 5322 string using the strftime() method. Finally, I convert this string back to a datetime object using the class method strptime(). Because GMT is just literal text in the format string, the parsed object is naive; it only equals the original datetime object once its tzinfo is set to UTC again.
Python 3.12.2 (main, Mar 29 2024, 14:30:28) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime, UTC
>>> original_timestamp = datetime.now(UTC).replace(microsecond=0)
>>> original_timestamp.isoformat() # datetime object with tzinfo=UTC
'2024-04-07T15:24:47+00:00'
>>> rfc5322_format = "%a, %d %b %Y %H:%M:%S GMT"
>>> print(rfc5322_timestamp := original_timestamp.strftime(rfc5322_format))
Sun, 07 Apr 2024 15:24:47 GMT
>>> parsed_timestamp = datetime.strptime(rfc5322_timestamp, rfc5322_format)
>>> parsed_timestamp.isoformat() # datetime object with tzinfo=None
'2024-04-07T15:24:47'
>>> assert original_timestamp == parsed_timestamp.replace(tzinfo=UTC)
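Two details of this round trip are easy to get wrong: the format string blindly appends GMT even if the datetime object is in another time zone, and strptime() returns a naive object. Here is a small sketch of how both could be wrapped. The function names format_gmt_timestamp and parse_gmt_timestamp are my own invention, and I use timezone.utc instead of the UTC alias only so that the snippet also runs on Python versions older than 3.11.

```python
from datetime import datetime, timedelta, timezone

# %a and %b only yield English names under the C locale or an English one.
RFC5322_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def format_gmt_timestamp(moment: datetime) -> str:
    """Serialize an aware datetime, converting it to UTC first."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(RFC5322_FORMAT)


def parse_gmt_timestamp(value: str) -> datetime:
    """Parse the string and attach UTC, since strptime() returns a naive object."""
    return datetime.strptime(value, RFC5322_FORMAT).replace(tzinfo=timezone.utc)


# A timestamp two hours ahead of UTC is converted before formatting.
berlin_summer = timezone(timedelta(hours=2))
moment = datetime(2024, 4, 7, 17, 24, 47, tzinfo=berlin_summer)
serialized = format_gmt_timestamp(moment)
assert parse_gmt_timestamp(serialized) == moment
```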
Locale
The approach above works fine as long as you are in control of the environment your program is running in. But as soon as this is not the case, you have a subtle problem because the format codes %a (abbreviated weekday) and %b (abbreviated month) depend on the current locale. This can easily be demonstrated:
Python 3.12.2 (main, Mar 29 2024, 14:30:28) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '') # enable usage of default locale
'de_DE.UTF-8'
>>> from datetime import datetime, UTC
>>> timestamp = datetime(2024, 5, 7, 11, 13, 15, 0, UTC)
>>> rfc5322_format = "%a, %d %b %Y %H:%M:%S GMT"
>>> timestamp.strftime(rfc5322_format)
'Di, 07 Mai 2024 11:13:15 GMT'
First, I changed my default locale to German by setting the environment variable LANG to de_DE.UTF-8 before starting Python. Second, I called locale.setlocale(locale.LC_ALL, '') to make the Python interpreter use the default locale, which is required as explained in the documentation:
According to POSIX, a program which has not called setlocale(LC_ALL, '') runs using the portable 'C' locale. Calling setlocale(LC_ALL, '') lets it use the default locale as defined by the LANG variable.
In any case, as you can see in the last line, the output is clearly not English. Therefore, it's not compliant with the format required by RFC 5322.
You may now be tempted to temporarily change the locale to en_US.UTF-8 before calling strftime() or strptime() and to change it back when you are done. Technically, this would work, but it's quite an expensive operation, and more importantly, it's not thread-safe. Your program's locale is global state; therefore, changing it immediately impacts all threads. Needless to say, this may cause many hard-to-diagnose bugs, and hence, it's not a good approach.
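If you want to keep formatting by hand, one locale-independent option is to avoid %a and %b entirely and look up the English names in fixed tables instead. The following is only a sketch; the names DAY_NAMES, MONTH_NAMES, and format_without_locale are made up for illustration, and the function assumes that it receives a timezone-aware datetime object.

```python
from datetime import datetime, timezone

# Fixed English names taken from the grammar, independent of the locale.
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_without_locale(moment: datetime) -> str:
    """Serialize an aware datetime without any locale-dependent format codes."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    moment = moment.astimezone(timezone.utc)
    return (
        f"{DAY_NAMES[moment.weekday()]}, {moment.day:02d} "  # weekday(): Monday is 0
        f"{MONTH_NAMES[moment.month - 1]} {moment.year:04d} "
        f"{moment.hour:02d}:{moment.minute:02d}:{moment.second:02d} GMT"
    )


# Same timestamp as in the German example above.
print(format_without_locale(datetime(2024, 5, 7, 11, 13, 15, tzinfo=timezone.utc)))
```

This covers serialization only. Parsing by hand would need the reverse lookups plus validation, which is where things start to get tedious.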
Using the email.utils module
By far, the best solution is provided by Python's standard library. The email.utils module provides the format_datetime() function to serialize datetime objects to strings according to RFC 5322. Furthermore, it also provides the parsedate_to_datetime() function to convert such strings back to datetime objects.
Python 3.12.2 (main, Mar 29 2024, 14:30:28) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime, UTC
>>> original_timestamp = datetime.now(UTC).replace(microsecond=0)
>>> original_timestamp.isoformat() # datetime object with tzinfo=UTC
'2024-04-07T17:11:59+00:00'
>>> from email.utils import format_datetime, parsedate_to_datetime
>>> print(rfc5322_timestamp := format_datetime(original_timestamp, usegmt=True))
Sun, 07 Apr 2024 17:11:59 GMT
>>> parsed_timestamp = parsedate_to_datetime(rfc5322_timestamp)
>>> parsed_timestamp.isoformat() # datetime object with tzinfo=UTC
'2024-04-07T17:11:59+00:00'
>>> assert original_timestamp == parsed_timestamp
This last approach has the advantage of being isolated from the outside world. In particular, it doesn't depend on your program's current locale. Therefore, it's clearly the safest approach, and it's also how popular libraries, such as FastAPI, are implemented under the hood.
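To illustrate how these two functions might be used for one of the HTTP headers mentioned at the beginning, here is a rough sketch of an If-Modified-Since check. The function name is_modified_since and the decision to treat unparseable header values as "modified" are my assumptions for this example; they are not taken from any particular library.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_modified_since(last_modified: datetime, header_value: str) -> bool:
    """Compare an aware last_modified against an If-Modified-Since value."""
    try:
        threshold = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        # Invalid input raises ValueError on recent Python versions
        # (older ones raised TypeError). Assumption: ignore a broken header.
        return True
    if threshold.tzinfo is None:
        # Without an offset, the value cannot be compared to an aware datetime.
        return True
    # The header has no sub-second precision, so drop the microseconds.
    return last_modified.replace(microsecond=0) > threshold


last_modified = datetime(2024, 4, 7, 17, 11, 59, tzinfo=timezone.utc)
header = format_datetime(last_modified, usegmt=True)  # value for Last-Modified
print(header, is_modified_since(last_modified, header))
```

Keep in mind that format_datetime() with usegmt=True expects a datetime object whose tzinfo is UTC and raises a ValueError otherwise.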
That's everything for today. Thank you very much for reading, and see you soon!