Testing Weak References

Recently we added a feature to dateutil (issue #635, PR #672) that involves holding a cache of weak references to objects so that only values that would otherwise still be alive remain in the cache. This is useful for when you aren't necessarily caching for performance but want an invariant like: [1]

if x == y:
    assert cached_func(x) is cached_func(y)

The problem with this was how to test such a thing. There should be two elements to the test: the function should satisfy the invariant and when the original strong reference expires, the item should be removed from the cache. The first part is simple enough, but how do you verify that the weak reference has expired without either digging into the implementation details or holding a strong reference to the original object? I was surprised to find that some cursory googling did not turn up a canonical reference for how to do this, and discussing it with a few people at a sprint did not turn up the answer I eventually landed on, so I thought I would write up my findings here.

Problem

Imagine one wanted to test the following code:

from weakref import WeakValueDictionary

class Example:
    """An example class"""
    def __init__(self, x):
        self.x = x

def get_example(x):
    # Either return the cached value or populate the cache
    return get_example.__cache.setdefault(x, Example(x))

def __clear_cache():
    get_example.__cache = WeakValueDictionary()

get_example.clear_cache = __clear_cache

# Create the initial empty cache
get_example.clear_cache()

I can fairly easily demonstrate that this works in the REPL:

>>> # Get an Example, but don't store the result anywhere
>>> print(get_example('Key'))
<__main__.Example object at 0x7f3cbe059908>

>>> # Calling get_example a second time gets a new instance
>>> print(get_example('Key'))
<__main__.Example object at 0x7f3cbe059978>

>>> # Create a strong reference to an example retrieved this way
>>> key_example = get_example('Key')

>>> get_example('Key') is key_example
True

The problem with this approach is that the Python standard makes no guarantees about the reuse of object IDs, so any test that relies on the fact that the ID tends to be different for different objects is going to be flaky – particularly in a test that will almost by necessity involve deleting an object. A test that checks for the existence of the key in the cache would work, but the location of the cache is an implementation detail. We want to make guarantees about the properties of the code, not its structure. [2]

Solution

In retrospect, the way to test this property seems obvious. The test needs some reference to the original object that was returned by the first function call, but you cannot hold strong references without modifying the behavior of the code under test. What kind of reference has these properties? A weak reference!

So, here's an implementation of tests for our get_example function from above that uses a weak reference: [3]

# Note: get_example_cache_lock is used here for thread safety
#       and test setup. See the code in the footnote below
import weakref
import gc

@get_example_cache_lock
def test_get_example_cache():
    # Tests the invariant we're trying to provide
    assert get_example('Key') is get_example('Key')

@get_example_cache_lock
def test_get_example_cache_weakref():
    # Tests that the implementation doesn't hold a strong reference
    key_example = get_example('Key')
    key_example_ref = weakref.ref(key_example)

    # Make sure this is a weak reference to the right thing
    assert get_example('Key') is key_example_ref()

    # Delete the only strong reference
    del key_example

    # Trigger garbage collection
    gc.collect()

    # key_example_ref() should be None at this point
    assert get_example('Key') is not key_example_ref()

This is just one implementation using weak references, but there are others. One could make the argument that the final line of test_get_example_cache_weakref is actually testing the fact that the key_example_ref weak reference is deleted and using that as a proxy for "all weak references, including those in the cache, have been deleted", in which case it would be more explicit to register a callback that detects deletion of the weak reference:

@get_example_cache_lock
def test_get_example_cache_weakref_callback():
    key_example = get_example('Key')

    weakref_deleted = False
    def callback():
        nonlocal weakref_deleted
        weakref = True

    key_example_ref = weakref.ref(key_example, callback)

    del key_example
    gc.collect()

    assert weakref_deleted

This is a valid point, but I prefer the first example because it is less explicit about the implementation of the test and more explicit about the property under test. That said, these are approximately equivalent implementations, and both should pass just fine:

$ pytest weakref-test.py
weakref-test.py::test_get_example_cache PASSED                      [ 33%]
weakref-test.py::test_get_example_cache_weakref PASSED              [ 66%]
weakref-test.py::test_get_example_cache_weakref_callback PASSED     [100%]

Footnotes

[1]The reason you might provide such an invariant is if the objects you are providing have operations whose semantics depends on the "is" relationship.
[2]

I take it as a general rule to never test implementation details for several reasons. One example is that you may want the freedom to change the implementation details at some point in the future. Additionally, you may have more than one implementation that provides the same guarantees; this is true for many of the modules in the Python standard library, which tend to have both a C and Python implementation in the CPython interpreter alone (not including alternative interpreters like PyPy).

As a side note, in the motivating example, I eventually decided that the fact that the cache holds weak references is enough of an implementation detail that these tests should be considered "smoke tests"; it's the behavior I'm intending for the current version, but it's not guaranteed to hold in future versions. In fact, for performance reasons the behavior will be somewhat different from the original implementation.

[3]

For simplicity of the example, I left out the handling of thread-safety and making sure the cache is clean for each test. You can find the code for this below:

from threading import Lock

CACHE_LOCK = Lock()

def get_example_cache_lock(f):
    # Provide a fresh cache and acquire the cache lock
    def test_inner(*args, **kwargs):
        with CACHE_LOCK:
            get_example.clear_cache()
            f(*args, **kwargs)