Why naïve times are local times in Python
Back in 2019, I exhorted everyone to stop using datetime.utcnow() and datetime.utcfromtimestamp() because as of Python 3, naïve datetimes stopped representing abstract calendar dates and started representing the current local time. No longer would datetime.now().timestamp() fail because no mapping exists between a naïve time and the absolute timeline! At the time, I only explained this so that I could tell you why utcnow() was dangerous, which may give the mistaken impression that this change just added a footgun and gave us nothing in exchange. However, over time I have come to the opinion that this may in fact be the most elegant way to represent system local time in Python.
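To make the "naïve means local" rule concrete, here is a minimal sketch. It assumes a Unix-like system where time.tzset() is available; set_local_zone is a hypothetical helper of mine, and the POSIX TZ string stands in for America/New_York so the example does not depend on what your machine is set to.

```python
import os
import time
from datetime import datetime, timezone


def set_local_zone(posix_tz: str) -> None:
    """Hypothetical helper: switch this process's local zone (Unix only)."""
    os.environ["TZ"] = posix_tz
    time.tzset()


# Assumption: US Eastern rules, written as a POSIX TZ string.
set_local_zone("EST5EDT,M3.2.0,M11.1.0")

# A naive datetime is read as local time when mapped onto the timeline.
naive_local = datetime(2021, 4, 1, 12)
aware_utc = datetime(2021, 4, 1, 16, tzinfo=timezone.utc)
print(naive_local.timestamp() == aware_utc.timestamp())  # True

# The naive value utcnow() would have returned at that same instant carries
# UTC wall-clock fields, so timestamp() misreads it as a local time.
naive_utc_fields = aware_utc.replace(tzinfo=None)
error_hours = (naive_utc_fields.timestamp() - aware_utc.timestamp()) / 3600
print(error_hours)  # 4.0
```

The second half is the utcnow() footgun in miniature: the fields are right for UTC, but every conversion treats them as local, so the result is off by the local UTC offset.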
Ideally, we would create a "local time" tzinfo object representing system local times (like dateutil.tz.tzlocal tries to do), but as it turns out, it is not possible to do that while maintaining datetime's guaranteed semantics when the system's time zone changes. It is, however, possible to tack "local time" semantics onto the existing naïve datetime object in a way that gives a lot of the same functionality.
A local timezone object
Early in the process of putting together PEP 615, which added support for IANA time zones to the standard library, I was hoping to broadly solve the problem of time zones in Python. My contention was that nearly all time zone users want one of three types of time zone:
- UTC and fixed offsets
- System local time
- IANA Time Zones
At the time, we already had UTC and fixed offset zones, and I was hoping to create classes that would represent local time and IANA time zones. I knew that naïve times were in a sense local times, but things like subtraction between an aware datetime and a naïve datetime were still not supported. It seemed like there was no "first class" solution to the local time problem, and some analogue of dateutil.tz.tzlocal should be added to the standard library. However, when starting to work out the semantics of what such an object should be, I found that any such object would have very unfortunate and counter-intuitive properties, and that making naïve datetimes represent local time was actually a stroke of genius on the part of Alexander Belopolsky. The reason for this is simple: it's possible to change your system local time zone during the run of a Python program, and datetime objects are not designed to allow tzinfo objects to return different offsets at different points in their lifespan.
Invariants
Important to note is that datetimes are both immutable and hashable. This means that you can use them as, for example, the keys to a dict. Along with hashability also comes some constraints on the equality semantics, most notably the fact that two objects that compare equal must have the same hash; in other words, if a == b it must be the case that hash(a) == hash(b). This is where the problem comes in, because datetime equality semantics say that two aware datetimes in different zones are equal if they represent the same time in UTC, which means that equality depends on the time zone offset, which in turn means that the hash must depend on the time zone offset.
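A small sketch of that invariant, using only the standard library. The fixed -04:00 offset is just an illustrative stand-in for a zone, and the variable names are mine.

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed-offset zone (UTC-4); any other offset works the same way.
eastern = timezone(timedelta(hours=-4))

noon_eastern = datetime(2021, 4, 1, 12, tzinfo=eastern)
four_pm_utc = datetime(2021, 4, 1, 16, tzinfo=timezone.utc)

# Aware datetimes in different zones compare by the instant they represent...
print(noon_eastern == four_pm_utc)  # True

# ...so their hashes have to agree as well,
print(hash(noon_eastern) == hash(four_pm_utc))  # True

# which is what lets either one find the other's dict entry.
schedule = {noon_eastern: "meeting"}
print(schedule[four_pm_utc])  # meeting
```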
Now bringing this back to local times — at any point during the run of a program, the system local time zone could change; if you were to use dateutil.tz.tzlocal or some equivalent, that means that the offsets of existing datetimes can change, for example:
from datetime import datetime, timedelta, timezone
from dateutil.tz import tzlocal

# Local time is America/New_York
dt = datetime(2021, 4, 1, 12, tzinfo=tzlocal())
dt_utc = dt.astimezone(timezone.utc)
print(dt.utcoffset() / timedelta(hours=1)) # -4.0
print(dt == dt_utc) # True
# Change local time to America/Los_Angeles
print(dt.utcoffset() / timedelta(hours=1)) # -7.0
print(dt == dt_utc) # False
This is a major problem! It also means that we must choose between hash immutability and keeping the hash linked to equality, because datetimes that once compared equal no longer compare equal. In the current implementation, datetime.datetime caches its hash value when first calculated to deal with precisely this kind of problem, but that means otherwise-identical datetime objects will have different hashes depending on when hash was first called on them!
from datetime import datetime
from dateutil.tz import tzlocal

# Local time is America/New_York
dt1 = datetime(2021, 4, 1, 12, tzinfo=tzlocal())
dt2 = datetime(2021, 4, 1, 12, tzinfo=dt1.tzinfo)
dt3 = datetime(2021, 4, 1, 12, tzinfo=dt1.tzinfo)
my_dict = {dt1: "Before the change"} # Caches hash(dt1)
print(my_dict[dt2]) # "Before the change" (caches hash(dt2))
print(dt1 == dt2) # True
# Local time is America/Los_Angeles
dt4 = datetime(2021, 4, 1, 12, tzinfo=dt1.tzinfo)
dt5 = datetime(2021, 4, 1, 12, tzinfo=dt1.tzinfo)
my_dict[dt4] = "After the change"
print(my_dict[dt2]) # "Before the change"
print(my_dict[dt3]) # "After the change" (dt3 is first hashed here)
print(my_dict[dt5]) # "After the change"
print(dt1 == dt2 == dt3 == dt4 == dt5) # True
This is slightly better than breaking all existing keys, but it's confusing and still violates some of our invariants. This drives home the fact that there is no way for an aware datetime to satisfy our invariants if its offset can change. What this means is that we cannot simply create a tzinfo object representing local times; we must create some new datetime type with semantics that can survive a change in the system local time.
The solution
The solution to this is quite elegant:[1] change naïve datetimes to represent a system local time for the purposes of conversion to absolute time without altering naïve datetime semantics! Since naïve datetimes originally represented "abstract" datetime objects, they have no UTC offset and both hash and equality are based only on the raw values, so there is no problem if the concrete time represented by a given "system local time" datetime changes.
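Here is a rough sketch of why the naïve version survives a zone change. As before, it assumes a Unix-like system with time.tzset(); set_local_zone is a hypothetical helper, and the two POSIX TZ strings are stand-ins for America/New_York and America/Los_Angeles.

```python
import os
import time
from datetime import datetime


def set_local_zone(posix_tz: str) -> None:
    """Hypothetical helper: switch this process's local zone (Unix only)."""
    os.environ["TZ"] = posix_tz
    time.tzset()


EASTERN = "EST5EDT,M3.2.0,M11.1.0"  # assumed stand-in for America/New_York
PACIFIC = "PST8PDT,M3.2.0,M11.1.0"  # assumed stand-in for America/Los_Angeles

set_local_zone(EASTERN)
naive = datetime(2021, 4, 1, 12)
events = {naive: "lunch"}
ts_eastern = naive.timestamp()

set_local_zone(PACIFIC)
same_fields = datetime(2021, 4, 1, 12)

# Equality and hashing look only at the raw fields, so the dict still works.
print(same_fields == naive)  # True
print(events[same_fields])  # lunch

# Only the mapping onto the absolute timeline moved: noon is now 3 hours later.
shift_hours = (naive.timestamp() - ts_eastern) / 3600
print(shift_hours)  # 3.0
```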
The only thing missing from this is that sometimes people want their local times to be actual aware datetimes — they want to be able to compare them to other aware datetimes, or to print out the actual UTC offset.[2] This is solved neatly with .astimezone(None) / .astimezone(), which takes an aware or naïve datetime and gives it a fixed offset in the current system time zone:
from datetime import datetime
from zoneinfo import ZoneInfo

# Local time is America/New_York
dt = datetime(2021, 4, 1, 12)
print(dt)
# 2021-04-01 12:00:00
print(dt.astimezone())
# 2021-04-01 12:00:00-04:00
dt_la = datetime(2021, 4, 1, 12, tzinfo=ZoneInfo("America/Los_Angeles"))
print(dt_la.astimezone())
# 2021-04-01 15:00:00-04:00
The way this avoids the problems from the previous section is that it requires you to be explicit as to when you query for the offset. The result of any .astimezone() calls will always have the same offset, and naïve datetimes are always "floating" until you convert them into concrete times.
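A sketch of what "explicit as to when you query for the offset" looks like in practice, under the same assumptions as before: a Unix-like system with time.tzset(), a hypothetical set_local_zone helper, and POSIX TZ strings standing in for the two IANA zones.

```python
import os
import time
from datetime import datetime, timedelta


def set_local_zone(posix_tz: str) -> None:
    """Hypothetical helper: switch this process's local zone (Unix only)."""
    os.environ["TZ"] = posix_tz
    time.tzset()


EASTERN = "EST5EDT,M3.2.0,M11.1.0"  # assumed stand-in for America/New_York
PACIFIC = "PST8PDT,M3.2.0,M11.1.0"  # assumed stand-in for America/Los_Angeles

set_local_zone(EASTERN)
naive = datetime(2021, 4, 1, 12)
pinned = naive.astimezone()  # the offset is looked up here, once

set_local_zone(PACIFIC)

# The aware result keeps the offset it was given when it was created...
print(pinned.utcoffset() / timedelta(hours=1))  # -4.0

# ...while the naive datetime floats until the next explicit conversion.
repinned = naive.astimezone()
print(repinned.utcoffset() / timedelta(hours=1))  # -7.0
print(pinned == repinned)  # False
```

Because pinned carries an ordinary fixed-offset tzinfo, its hash and equality never shift underneath you; the only thing that depends on the system zone is the moment you call .astimezone().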
A footgun?
It is probably a bit rich for me to write this post praising "naïve-as-local" as an elegant solution to the problem considering that the solution is not too different from pytz's localize method, which I famously consider to be a footgun. In both cases, we need a specific "localization" step that takes our datetime from a naïve datetime and creates an aware datetime with a fixed offset from it. Neither is particularly discoverable, and people are constantly tripping over it. So why is one of these elegant and the other a footgun?
To be honest, I will not go so far as to say that the naïve-as-local paradigm is not a footgun. People clearly are not expecting this behavior, and they are usually made aware of it when it introduces a bug in their code. It seems unlikely that we will ever get to a world where this is the obvious way to work with local times and I wish there were a better solution. I'd say pytz's biggest crime is that pytz.timezone is a tzinfo subclass that can be passed to the tzinfo= parameter, because it makes it easy to think you've created a correctly localized aware datetime, when in fact you haven't. At least with the naïve-as-local paradigm, you will either think that your datetime object doesn't have a UTC representation, or you've deliberately created the fixed-offset version of it.
At the end of the day, it's not clear that there is a good, intuitive solution that doesn't have these problems, even if we were designing datetime from scratch. I do think that the naïve-as-local solution is about as clean as can be achieved given the constraints, but that may be damning with faint praise.
Takeaways
I have spilled out many words trying to justify why I actually like the naïve-as-local paradigm and why I think the obvious solution — a tzinfo object representing system local time — is not workable, but I imagine most people don't care about why these things are the case. I appreciate you bearing with me despite the high ratio of justification to practical advice. In exchange, I will give you some simple bullet points that you can write down for safe keeping before trying to scrub your brain clean of all the useless information I buried it in:
- The local offset may change during the course of the interpreter run.
- You can use datetime.astimezone with None to convert a naïve time into an aware datetime with a fixed offset representing the current system local time.
- All arithmetic operations should be applied to naïve datetimes when working in system local civil time — only call .astimezone(None) when you need to represent an absolute time, e.g. for display or comparison with aware datetimes.[3]
Footnotes
[1] "Elegant" is a relative term here, of course. A more elegant design would be to have a separate LocalDateTime class that is distinct from the "abstract" datetime class. It would also probably be a good idea to have "aware datetime" and "naïve datetime" reflected in the class hierarchy rather than on a per-instance basis, but it would be very difficult to implement these things in a backwards-compatible fashion.
[2] One may imagine that, for completeness, one would also want a way to convert from a UTC time to a naïve "system local" time, but it seems unlikely that this would be useful in practice. In general, when you are working with system local times, you care about what time it says on the clock locally, independent of what time that is in UTC. If you care about what time something happens at in UTC, then you don't want it shifting around based on what the system local time zone is, so you should keep it as an aware datetime. If for some reason you find you do want this, you can achieve it with dt.astimezone().replace(tzinfo=None), but I am hard pressed to come up with a situation where this would be useful.
[3] This is similar to pytz's arithmetic problem. Once you call .astimezone(None), a fixed offset is attached, and the result of arithmetical operations will always have the same fixed offset, even across DST boundaries. Note that I'm assuming here that "working in system local civil time" means that you want "wall time" arithmetic. So if you want "give me an aware local datetime for when the clock on the wall has advanced 6 hours", you want (naive_dt + timedelta(hours=6)).astimezone(). If what you want instead is "absolute time" arithmetic, meaning, "give me an aware local datetime representing the time when 6 hours have elapsed after the starting datetime", do the math in any fixed-offset zone and make a call to .astimezone(None) afterwards, e.g. (naive_dt.astimezone() + timedelta(hours=6)).astimezone(). My earlier post on the semantics of datetime arithmetic has more details that can help you build an intuition as to what kind of arithmetic you should use.
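The two kinds of arithmetic from footnote [3] can be sketched across a DST transition. This assumes a Unix-like system with time.tzset(); set_local_zone is a hypothetical helper, and the POSIX TZ string stands in for America/New_York, where clocks sprang forward on 2021-03-14.

```python
import os
import time
from datetime import datetime, timedelta


def set_local_zone(posix_tz: str) -> None:
    """Hypothetical helper: switch this process's local zone (Unix only)."""
    os.environ["TZ"] = posix_tz
    time.tzset()


# Assumption: US Eastern rules, written as a POSIX TZ string.
set_local_zone("EST5EDT,M3.2.0,M11.1.0")

# 10 PM on the evening before the spring-forward transition.
naive_dt = datetime(2021, 3, 13, 22)

# Wall time: add to the naive value first, attach the offset last.
wall = (naive_dt + timedelta(hours=6)).astimezone()
print(wall)  # 2021-03-14 04:00:00-04:00

# Absolute time: pin the offset first, add, then re-query the local offset.
absolute = (naive_dt.astimezone() + timedelta(hours=6)).astimezone()
print(absolute)  # 2021-03-14 05:00:00-04:00

# Skipping the final .astimezone() leaves the stale -05:00 offset attached.
stale = naive_dt.astimezone() + timedelta(hours=6)
print(stale)  # 2021-03-14 04:00:00-05:00
```

Note that stale and absolute are the same instant; the final .astimezone() call only changes which offset is displayed.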