Stop using utcnow and utcfromtimestamp
If you are the kind of developer who prefers to work in UTC, you may have seen Python's datetime.utcnow() and datetime.utcfromtimestamp() methods and thought, "Ah, yes, this is what I should do to work in UTC!" But alas, this is not the best way to work with UTC datetimes. In fact, I would say that it is extremely rare that you would want to use either of these functions. Consider the following dangerous code:
from datetime import datetime
ts = 1571595618.0
x = datetime.utcfromtimestamp(ts)
x_ts = x.timestamp()
assert ts == x_ts, f"{ts} != {x_ts}"
When executed with your system time zone set to UTC, this will succeed just fine, but when executed in any time zone where the offset at that particular timestamp is something other than 0, the assertion fails. For example, when executed with the system time zone set to America/New_York, you'll get AssertionError: 1571595618.0 != 1571610018.0.
This is due to an unfortunate quirk of history and a subtle shift in what it means for a datetime to be naïve that took place in the Python 2 to 3 transition. I imagine that these functions would not exist if the datetime library were redesigned today, but at the moment there is a mix of harmful and harmless uses of them out there, and it's not a simple matter to rip them all out. [1]
Rather than make you stick around for a history lesson as to why this problem exists, I'm going to spoil the ending and say that the right thing to do is to pass a UTC object to the tz parameter of now() and fromtimestamp(), respectively, to get a time zone-aware datetime:
from datetime import datetime, timezone
ts = 1571595618.0
x = datetime.fromtimestamp(ts, tz=timezone.utc)
x_ts = x.timestamp()
assert ts == x_ts, f"{ts} != {x_ts}" # This assertion succeeds
The problem with datetime.utcnow() and datetime.utcfromtimestamp() occurs because these return naïve datetimes (i.e. with no timezone attached), and in Python 3, these are interpreted as system-local times. Explicitly specifying a time zone solves the problem.
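One way to see the difference is to inspect tzinfo on both kinds of value. The following is a minimal sketch using only the standard library; the variable names are illustrative, and the naïve value is built by hand to stand in for what utcfromtimestamp() would return.

```python
from datetime import datetime, timezone

ts = 1571595618.0

# Naive: the wall-clock fields are UTC, but nothing on the object says so.
# (Constructed directly here as a stand-in for utcfromtimestamp(ts).)
naive = datetime(2019, 10, 20, 18, 20, 18)

# Aware: the same wall-clock fields, explicitly tagged as UTC.
aware = datetime.fromtimestamp(ts, tz=timezone.utc)

print(naive.tzinfo)       # None
print(aware.tzinfo)       # UTC
print(aware.isoformat())  # 2019-10-20T18:20:18+00:00
```

The two objects carry identical calendar fields; only the aware one records which time zone those fields belong to.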
Naïve datetimes as local time
When originally conceived, naïve datetimes were intended to be abstract, not representing any specific time zone, and it was up to the program to determine what they represent. This is no different from abstract numbers, which can represent mass in kilograms, distance in meters or any other specific quantity according to the programmer's intention. By contrast, aware datetimes represent a specific point in time in a specific time zone. Awareness of the datetime's time zone allows you to do things like arithmetic and comparison between time zones, conversion to other time zones and other operations which require a concrete datetime.
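The operations that awareness enables can be sketched as follows. This example assumes two fixed-offset zones created with datetime.timezone purely for illustration; real zones with daylight saving rules would normally come from a time zone library.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones, chosen only for illustration.
utc_minus_4 = timezone(timedelta(hours=-4))
utc_plus_9 = timezone(timedelta(hours=9))

a = datetime(2019, 10, 20, 14, 20, 18, tzinfo=utc_minus_4)
b = datetime(2019, 10, 21, 3, 20, 18, tzinfo=utc_plus_9)

# Comparison across zones: both values are the same instant.
same_instant = (a == b)

# Arithmetic across zones: the difference is computed between instants.
elapsed = b - a

# Conversion: express the same instant in another zone.
a_in_utc = a.astimezone(timezone.utc)

print(same_instant)          # True
print(elapsed)               # 0:00:00
print(a_in_utc.isoformat())  # 2019-10-20T18:20:18+00:00
```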
In Python 3, two things have changed that make utcnow unnecessary and, in fact, dangerous. The first is that a concrete time zone class, datetime.timezone, was introduced (in Python 3.2), along with a constant UTC object, datetime.timezone.utc. With this change, you now have a clear and unambiguous way to mark which of your datetimes are in UTC without bringing in third-party code or implementing your own UTC class.
The change that made utcnow dangerous is that naïve datetimes underwent a subtle shift in meaning: for certain operations that require interpreting a datetime as a fixed point in time, rather than throwing an error, they now assume that the datetime is expressed in the system's local time zone. So in Python 2, operations like astimezone() will raise an exception when called on a naïve datetime:
>>> from datetime import datetime
>>> from dateutil import tz
>>> datetime(2015, 5, 1).astimezone(tz.UTC)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: astimezone() cannot be applied to a naive datetime
but in Python 3 (3.6 and later) it will use your system's local time zone (on my machine it's America/New_York) and convert accordingly:
>>> from datetime import datetime
>>> from dateutil import tz
>>> datetime(2015, 5, 1).astimezone(tz.UTC)
datetime.datetime(2015, 5, 1, 4, 0, tzinfo=tzutc())
This is why the example that I started this post off with fails. The .timestamp() method gives a representation of a fixed point in time, not a point on the calendar; it returns Unix time, which is the number of seconds since 1970-01-01T00:00:00 UTC, and if you call it on a naïve datetime, Python will assume that that datetime represents your machine's local time, even if you originally intended it to be UTC.
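The size of the error can be checked directly: it is the local UTC offset in effect at that wall-clock time. The following is a sketch under the assumption that the naïve value was meant as UTC; it uses only standard-library calls, and the variable names are illustrative.

```python
from datetime import datetime, timezone

ts = 1571595618.0

# Naive datetime whose fields are meant as UTC
# (a stand-in for what utcfromtimestamp(ts) returns).
naive_utc = datetime(2019, 10, 20, 18, 20, 18)

# .timestamp() reads the naive value as local time, so the result is off
# by the local UTC offset that applies at that wall-clock time.
local_offset = naive_utc.astimezone().utcoffset()
as_local = naive_utc.timestamp()
print(as_local - ts == -local_offset.total_seconds())  # True

# Attaching UTC first makes the result independent of the system time zone.
as_utc = naive_utc.replace(tzinfo=timezone.utc).timestamp()
print(as_utc == ts)  # True
```

On a machine set to America/New_York the offset at that moment is -4 hours, which accounts for the 14400-second gap in the AssertionError from the opening example.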
Conclusions
Even without the change in Python's model of what a naïve datetime means, I would still recommend that you not use utcnow() or utcfromtimestamp() simply because it's the wrong abstraction: to do so would be to represent a concrete point in time as an abstract datetime. [2] You know that your datetime represents UTC, and it's easy to mark that clearly in Python, so there's very little reason not to do it. As it says in the warning recently added to the documentation, you should prefer to use now in place of utcnow and fromtimestamp in place of utcfromtimestamp, so replace:
>>> from datetime import datetime
>>> dt_now = datetime.utcnow()
>>> dt_ts = datetime.utcfromtimestamp(1571595618.0)
with
>>> from datetime import timezone
>>> dt_now = datetime.now(tz=timezone.utc)
>>> dt_ts = datetime.fromtimestamp(1571595618.0, tz=timezone.utc)
or the equivalent using positional arguments.
One last thing to note: the reason that we cannot simply change utcnow() into an alias for now(timezone.utc) in the standard library is that doing so would change the semantics of how those datetimes are treated by their consumers (and as such it would not be backwards-compatible). You should keep this in mind when converting over old code that uses utcnow and utcfromtimestamp: you will need to make sure that any code that consumes your datetimes is expecting an aware datetime. In my experience, this is not a high bar to clear, but you probably don't want to just do a search-and-replace on untested code before deploying to production and leaving work for the weekend.
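To illustrate the kind of breakage a consumer can hit, and one possible way to bridge old and new values during a migration, here is a minimal sketch. The ensure_utc helper is hypothetical (it is not part of the standard library), and it rests on the assumption that every naïve value reaching it was produced by legacy UTC code.

```python
from datetime import datetime, timezone

def ensure_utc(dt: datetime) -> datetime:
    """Return an aware UTC datetime (hypothetical migration helper).

    Assumption: any naive input came from legacy code such as utcnow()
    and therefore already holds UTC wall-clock fields.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Naive: tag it as UTC without shifting the fields.
        return dt.replace(tzinfo=timezone.utc)
    # Aware: convert to UTC, preserving the instant.
    return dt.astimezone(timezone.utc)

legacy = datetime(2019, 10, 20, 18, 20, 18)  # naive, meant as UTC
modern = datetime.fromtimestamp(1571595618.0, tz=timezone.utc)

# A consumer that orders the two directly breaks once one side is aware.
try:
    legacy < modern
    direct_comparison_works = True
except TypeError:
    direct_comparison_works = False

print(direct_comparison_works)                   # False
print(ensure_utc(legacy) == ensure_utc(modern))  # True
```

A helper like this is only safe at boundaries where the "naïve means UTC" assumption is known to hold; applied to naïve values that actually represent local time, it would silently mislabel them.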
Footnotes
[1] This may sound like a familiar story to those who have read my earlier post on pytz.
[2] You could make the case that datetime.now() and datetime.fromtimestamp() suffer from the same problem, since "the current time in the system timezone" is just as concrete as "the current time in UTC" is. I generally do agree with this, and if you want the "current system time", you can use dateutil.tz.tzlocal to get an aware datetime representing this; however, there are edge cases that make defining a clean "local time zone" object less straightforward. For UTC the story is much less complicated. When I first wrote this post, I felt that there was no good way to represent local times in Python. Since then, I have come to believe that using naïve datetimes to represent system local time is probably the best compromise available that leaves all the datetime invariants intact without basically redesigning the whole thing from scratch, which I have written up in Why naïve times are local times in Python.