# A curious case of non-transitive datetime comparison

Published Thu 15 February 2018 (updated Tue 12 October 2021) in programming

In December 2016, a user reported an interesting bug to the dateutil tracker. The bug can be summarized as follows:

```python
from datetime import datetime
from dateutil import tz

LON = tz.gettz('Europe/London')

# Construct a datetime
x = datetime(2007, 3, 25, 1, 0, tzinfo=LON)
ts = x.timestamp()      # Get a timestamp representing the same datetime

# Get the same datetime from the timestamp
y = datetime.fromtimestamp(ts, LON)

# Get the same datetime from the timestamp with a fresh instance of LON
z = datetime.fromtimestamp(ts, tz.gettz.nocache('Europe/London'))

print(x == y)       # False
print(x == z)       # True
print(y == z)       # True
```

To summarize: x, y and z should all represent the same datetime. They all have the same time zone, and y and z are the result of converting x into a timestamp and then back into a datetime, but for some reason x != y. Even more curiously, x == z, even though the only difference between y and z is that z uses a different tzinfo object (representing the same zone). Stranger still, the equality relationship between the three is non-transitive: x != y even though x == z and y == z. What the hell is going on? There are two key facts you need in order to understand what's happening here.

## Imaginary times

The first thing you need to know is that the datetime constructor will not prevent you from creating a datetime that does not exist, which is what happened here:

```python
x = datetime(2007, 3, 25, 1, 0, tzinfo=LON)
print(tz.datetime_exists(x))    # False
```

It turns out that Daylight Saving Time started at 01:00 on 25 March 2007 in London, so the wall times from 01:00:00 to 01:59:59 were skipped that day. Imaginary datetimes like this violate an assumption built into the datetime.fromtimestamp(x.timestamp()) round trip: that every datetime can survive a round trip to and from UTC. In code:

```python
dt.astimezone(tz.UTC).astimezone(dt.tzinfo) == dt
```

This holds for all real datetimes, but it cannot hold for an imaginary datetime, because astimezone is guaranteed to produce a real datetime. Since this datetime never existed, there is no time in UTC that maps back to it, so any trip from an erroneously constructed imaginary datetime to UTC is necessarily one-way. Looking at the actual datetimes produced, you can see why x == y is not obviously True:

```python
print(x)
# 2007-03-25 01:00:00+01:00

print(y)
# 2007-03-25 00:00:00+00:00
```
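To make the one-way trip concrete without dateutil, here is a minimal sketch using only the standard library. It assumes a hypothetical toy zone, ToyLondon2007, with a single transition at 01:00 UTC on the last Sunday of March 2007 (the EU summer-time rule), and a hypothetical exists() helper that checks the round-trip identity above; neither is dateutil's implementation. Like the dateutil output above, the toy zone gives the skipped hour an offset of +01:00.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)


def last_sunday(year, month):
    # Only valid for 31-day months such as March and October.
    d = datetime(year, month, 31)
    return d - timedelta(days=(d.weekday() + 1) % 7)


# EU rule: summer time starts at 01:00 UTC on the last Sunday of March.
BST_START = last_sunday(2007, 3).replace(hour=1)  # 2007-03-25 01:00


class ToyLondon2007(tzinfo):
    """Hypothetical zone: GMT before BST_START, BST (UTC+1) from then on."""

    def utcoffset(self, dt):
        # Wall times at or after 01:00 (including the skipped 01:00-01:59)
        # get +01:00, matching what the dateutil example printed for x.
        return HOUR if dt.replace(tzinfo=None) >= BST_START else ZERO

    def dst(self, dt):
        return self.utcoffset(dt)  # the standard offset is zero

    def tzname(self, dt):
        return "BST" if self.utcoffset(dt) else "GMT"

    def fromutc(self, dt):
        # dt carries UTC fields; in UTC the switch also happens at 01:00.
        return dt + HOUR if dt.replace(tzinfo=None) >= BST_START else dt


def exists(dt):
    """Hypothetical helper: does dt survive a round trip through UTC?"""
    round_trip = dt.astimezone(timezone.utc).astimezone(dt.tzinfo)
    return round_trip.replace(tzinfo=None) == dt.replace(tzinfo=None)


LON = ToyLondon2007()
x = datetime(2007, 3, 25, 1, 0, tzinfo=LON)
print(x, x.astimezone(timezone.utc))  # the imaginary 01:00 lands on 00:00 UTC
print(exists(x))                                        # False
print(exists(datetime(2007, 3, 25, 2, 0, tzinfo=LON)))  # True
```

Coming back from 00:00 UTC lands on 00:00 GMT, not 01:00, which is exactly the mismatch between x and y above.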

But now the question is, if x == y is False, why is x == z True?

## Aware datetime comparison

The next thing you need to know to unravel this mystery is how equality works between timezone-aware datetimes, since there is more than one reasonable way to define it. Python's approach is most recently documented in PEP 495, which divides datetime comparison into "same zone" and "different zone" comparisons. When two datetimes are in the same zone, they are equal if their "wall time" is the same:

```python
dt1 = datetime(2017, 10, 29, 1, 30, tzinfo=LON)
# 2017-10-29 01:30:00+01:00

dt2 = datetime(2017, 10, 29, 1, 30, fold=1, tzinfo=LON)
# 2017-10-29 01:30:00+00:00

print(dt1 == dt2)                               # True
print(dt1.timestamp() == dt2.timestamp())       # False
```

Note that in the above ambiguous time, the wall times are the same, but they resolve to different absolute timestamps because they are two sides of a daylight saving time transition (this is only possible in Python 3.6+, unless you use the dateutil.tz.enfold backport).
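The same-zone rule can be reproduced with the standard library alone (Python 3.6+ for fold). The sketch assumes a hypothetical single-transition zone, ToyLondonFall2017, whose only change is the 2017 fall-back from 02:00 BST to 01:00 GMT on 29 October. It also shows the Python 3.6+ behavior that an inter-zone == involving an ambiguous time is False, even against an equivalent zone object.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)

# Wall-clock end of the repeated hour on 29 October 2017 (hypothetical toy rule).
FOLD_END = datetime(2017, 10, 29, 2)


class ToyLondonFall2017(tzinfo):
    """Hypothetical zone: BST (UTC+1) until 01:00 UTC on 2017-10-29, GMT after."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < FOLD_END - HOUR:
            return HOUR                       # unambiguously BST
        if wall < FOLD_END:
            return ZERO if dt.fold else HOUR  # 01:00-01:59 happens twice
        return ZERO                           # unambiguously GMT

    def dst(self, dt):
        return self.utcoffset(dt)

    def tzname(self, dt):
        return "BST" if self.utcoffset(dt) else "GMT"


LON = ToyLondonFall2017()
dt1 = datetime(2017, 10, 29, 1, 30, tzinfo=LON)          # first 01:30 (BST)
dt2 = datetime(2017, 10, 29, 1, 30, fold=1, tzinfo=LON)  # second 01:30 (GMT)

print(dt1 == dt2)                         # True: same tzinfo, same wall time
print(dt2.timestamp() - dt1.timestamp())  # 3600.0: an hour apart in absolute time

# Same fields, but a different (equivalent) tzinfo object, and dt1 is ambiguous:
print(dt1 == dt1.replace(tzinfo=ToyLondonFall2017()))  # False
```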

For comparisons between different zones, however, two datetimes are equal if they resolve to the same absolute time in UTC:

```python
dt1 = datetime(2017, 10, 28, 1, 30, tzinfo=LON)
dt2 = datetime(2017, 10, 28, 0, 30, tzinfo=tz.UTC)
dt3 = datetime(2017, 10, 28, 1, 30, tzinfo=tz.UTC)

# Resolves to the same timestamp
print(dt1 == dt2)               # True

# Has the same "wall time"
print(dt1 == dt3)               # False
```

The way this relates to our problem is that "same zone" and "different zone" are defined by object identity, not by object equality. In other words, dt1 == dt2 is a same-zone comparison if and only if dt1.tzinfo is dt2.tzinfo; two tzinfo objects that merely compare equal (dt1.tzinfo == dt2.tzinfo) still yield a different-zone comparison. This explains why y and z are treated differently:

```python
print(x.tzinfo is y.tzinfo) # True
print(x.tzinfo is z.tzinfo) # False
print(x.tzinfo == z.tzinfo) # True
```

x == y is a same-zone comparison while x == z and y == z are between-zone comparisons.
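The rules above can be modelled in a few lines. aware_eq is a hypothetical helper written for illustration, not part of the datetime API, and it assumes both arguments are aware. It follows the identity-based same-zone/different-zone split, plus the Python 3.6+ rule that an inter-zone == is False when either side is ambiguous.

```python
from datetime import datetime, timedelta, timezone


def _is_ambiguous(dt):
    # An aware datetime is ambiguous if flipping fold changes its UTC offset.
    return dt.utcoffset() != dt.replace(fold=1 - dt.fold).utcoffset()


def aware_eq(a, b):
    """Hypothetical sketch of == for two aware datetimes."""
    if a.tzinfo is b.tzinfo:
        # Same zone is decided by identity: compare wall times only.
        return a.replace(tzinfo=None) == b.replace(tzinfo=None)
    if _is_ambiguous(a) or _is_ambiguous(b):
        # Python 3.6+ treats inter-zone equality with an ambiguous time as False.
        return False
    # Different zones: compare the absolute (UTC) instants.
    return (a.replace(tzinfo=None) - a.utcoffset()
            == b.replace(tzinfo=None) - b.utcoffset())


BST = timezone(timedelta(hours=1))
other_bst = timezone(timedelta(hours=1))  # equal to BST, but a distinct object

dt1 = datetime(2017, 10, 28, 1, 30, tzinfo=BST)
dt2 = datetime(2017, 10, 28, 0, 30, tzinfo=timezone.utc)
dt3 = datetime(2017, 10, 28, 1, 30, tzinfo=timezone.utc)

print(BST == other_bst, BST is other_bst)  # True False
print(aware_eq(dt1, dt2), dt1 == dt2)      # True True
print(aware_eq(dt1, dt3), dt1 == dt3)      # False False
```

Fixed-offset timezone objects always agree with themselves, so both branches give the same answers here; it takes a zone whose offset varies, like Europe/London, to make the two rules diverge, which is exactly what happens with x and z.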

## Why it was non-transitive

Now looking back at the dates:

```python
print(x)
# 2007-03-25 01:00:00+01:00

print(y)
# 2007-03-25 00:00:00+00:00

print(z)
# 2007-03-25 00:00:00+00:00
```

For x == y, we have a same-zone comparison, so we are only comparing the wall times 2007-03-25 01:00 and 2007-03-25 00:00, which differ. For x == z, we have a between-zone comparison, so both are converted to UTC first and then compared:

```python
print(x.astimezone(tz.UTC))
# 2007-03-25 00:00:00+00:00

print(z.astimezone(tz.UTC))
# 2007-03-25 00:00:00+00:00
```

I don't think Python's specification defines what happens when you map imaginary datetimes to UTC, but since z was constructed by converting x to UTC in order to calculate the timestamp, it's no surprise that these two are equal.

Finally, y == z is True under either semantics, since both the wall time and the offset are the same.
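Putting both facts together, the whole paradox can be reproduced without dateutil. This sketch assumes a hypothetical single-transition zone, ToyLondon2007, that mimics the dateutil behavior shown above: the skipped hour gets +01:00 regardless of fold (so it never counts as ambiguous), and separate instances compare equal without being identical, playing the role of tz.gettz.nocache.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)
BST_START = datetime(2007, 3, 25, 1)  # 01:00 UTC (= 01:00 GMT) on 25 March 2007


class ToyLondon2007(tzinfo):
    """Hypothetical zone: GMT before BST_START, BST (UTC+1) from then on."""

    def utcoffset(self, dt):
        # The skipped 01:00-01:59 gets +01:00 whatever its fold value.
        return HOUR if dt.replace(tzinfo=None) >= BST_START else ZERO

    def dst(self, dt):
        return self.utcoffset(dt)

    def tzname(self, dt):
        return "BST" if self.utcoffset(dt) else "GMT"

    def fromutc(self, dt):
        return dt + HOUR if dt.replace(tzinfo=None) >= BST_START else dt

    # Instances compare equal (as dateutil's zones do), but datetime
    # decides "same zone" by identity, so this equality is never consulted.
    def __eq__(self, other):
        return isinstance(other, ToyLondon2007)

    def __hash__(self):
        return hash(ToyLondon2007)


LON = ToyLondon2007()
x = datetime(2007, 3, 25, 1, 0, tzinfo=LON)      # imaginary wall time
ts = x.timestamp()                               # computed via UTC 00:00
y = datetime.fromtimestamp(ts, LON)              # same tzinfo object as x
z = datetime.fromtimestamp(ts, ToyLondon2007())  # equal, but not identical

print(x, y, z, sep="\n")
print(x == y, x == z, y == z)  # False True True
```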

Note

This post was adapted from a small portion of my 2017 PyBay talk on time zones (slides). If you're interested in this topic, I go into greater detail about time zones in Python in that talk.

Note

This post was updated on 2018-06-11 to use dateutil 2.7.3, which made this particular issue harder to stumble upon. As of version 2.7, tz.gettz('Europe/London') no longer returns a fresh instance of the Europe/London time zone, so the example uses tz.gettz.nocache('Europe/London') instead. I then forgot to upload the fixed version for over 3 years, so the modification only went public on 2021-10-12.
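The caching change matters because identity is what decides the comparison semantics. As a rough, hypothetical analogy (this is not how dateutil implements its cache), a functools.lru_cache-wrapped factory hands back the same object on every call, while an uncached one builds a new, equal object each time:

```python
import functools
from datetime import timedelta, timezone


def make_zone(hours):
    """Hypothetical uncached factory: a new, equal object on every call."""
    # An offset of 0 would return the timezone.utc singleton, so use hours != 0.
    return timezone(timedelta(hours=hours))


# Hypothetical cached factory, loosely analogous to tz.gettz in dateutil 2.7+.
cached_zone = functools.lru_cache(maxsize=None)(make_zone)

print(cached_zone(1) is cached_zone(1))  # True: datetimes using it compare as same-zone
print(make_zone(1) is make_zone(1))      # False: datetimes using them compare as between-zone
print(make_zone(1) == make_zone(1))      # True
```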

1. All the code in this post was executed with Python 3.6 and dateutil==2.7.3.
2. One odd case is that in Python 3.6 (which introduced the fold attribute), inter-zone == comparisons are always False if either object is ambiguous, apparently for backwards-compatibility reasons.
3. More precisely, I don't think there is any specification for what should be returned when calling utcoffset() on an imaginary datetime. The convention is to use the last valid offset and DST values, but a case could be made for returning None, returning the next offset, or raising an error.