A pseudo-TDD workflow using expected failures

After coming to the conclusion that xfail is a good thing, I've also started to occasionally integrate it into my workflow to get some of the benefits of test-driven development (TDD) without needing any sort of discipline or rigor in my work. In TDD, you are supposed to write a failing test, then write as much code as you need to make that test pass, then write another failing test. The idea is that you know each test is working because it was failing and then it passed, and your code is more testable because you designed it with tests in mind.

In practice, I often write some mixture of code and tests in whatever order, then forget that I haven't run the test suite in a while, then find that I've made a mistake and rip a bunch of stuff out, but I leave some of the tests in place — your normal chaotic meandering workflow. However, I like to curate my git history — which is to say that I make heavy use of git add -p, git commit --fixup and interactive rebasing to make it look to any future observers reading through my VCS history like I did things in a disciplined manner.[1] So in the case where I write the code and the tests together (i.e. when I don't already have failing tests in the test suite added as part of the bug report), I can rewrite my git history so that every commit will pass if my tests are testing what I think they are!
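
For concreteness, the history-curation loop I'm describing looks roughly like this; <sha> stands for whichever earlier commit a change logically belongs to, and <base> for the branch point you eventually rebase onto:

# Stage just the hunks that belong with an earlier commit
$ git add -p

# Record them as a fixup of that commit
$ git commit --fixup=<sha>

# Later, fold every fixup commit into its target (reordering as needed)
$ git rebase -i --autosquash <base>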

Reusing the example from my earlier post, imagine I start with a (broken) function for detecting perfect squares:

import math

def is_perfect_square(n: int) -> bool:
    """Determine if any int i exists such that i × i = n."""
    s = math.sqrt(n)
    return s == int(s)
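
A quick check shows the failure mode, since math.sqrt refuses negative input outright (most of the traceback is elided here):

$ python -c "import math; math.sqrt(-4)"
...
ValueError: math domain error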

I realize that this will raise ValueError when n is negative, so I fix this edge case and add a test:

def is_perfect_square(n: int) -> bool:
    """Determine if any int i exists such that i × i = n."""
    if n < 0:
        return False  # No negative numbers are perfect squares

    s = math.sqrt(n)
    return s == int(s)

def test_negative():
    """No negative values are perfect squares."""
    assert not is_perfect_square(-4)

I know the tests are passing now, but I want to make sure they would have failed had I run them without implementing the bugfix, so what I do is add the xfail decorator before committing:[2]

@pytest.mark.xfail(raises=ValueError, strict=True,
                  reason="Domain error when passed negative numbers.")
def test_negative():
    """No negative values are perfect squares."""
    assert not is_perfect_square(-4)

I then use git add -p to commit just the xfail-marked test to the repository, leaving the bugfix out of that first commit, so the first diff looks like this:

index 21684b5..c91953c 100644
--- a/test.py
+++ b/test.py
@@ -7,6 +7,11 @@ def is_perfect_square(n: int) -> bool:
     s = math.sqrt(n)
     return s == int(s)

+@pytest.mark.xfail(raises=ValueError, strict=True,
+                   reason="Domain error when passed negative numbers.")
+def test_negative():
+    """No negative numbers are perfect squares."""
+    assert not is_perfect_square(-4)

 def test_positive():
     assert is_perfect_square(4)

Then I remove the pytest.mark.xfail decorator and use git add -u to stage a second change that introduces both the bugfix and the removal of the xfail mark:

index c91953c..5a82614 100644
--- a/test.py
+++ b/test.py
@@ -4,11 +4,12 @@ import pytest

 def is_perfect_square(n: int) -> bool:
     """Determine if any int i exists such that i × i = n."""
+    if n < 0:
+        return False  # No negative numbers are perfect squares
+
     s = math.sqrt(n)
     return s == int(s)

-@pytest.mark.xfail(raises=ValueError, strict=True,
-                   reason="Domain error when passed negative numbers.")
 def test_negative():
     """No negative numbers are perfect squares."""
     assert not is_perfect_square(-4)
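
The relevant bit of history then ends up as two commits, something like this (the hashes and messages below are made up purely for illustration):

$ git log --oneline -2
9bf2c1a Fix ValueError for negative inputs to is_perfect_square
3e7d4f2 Add expected-failure test for negative inputs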

You can then use one of the various tools for running your test command against every commit on the branch, and make sure that the test suite passes at each one.
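
One option I know of is git rebase with --exec, which replays each commit and runs a command after it, pausing if the command fails; main and the pytest invocation below are just placeholders for your actual base branch and test command:

# Run the test suite after every commit since main; the rebase stops at
# the first commit whose tests fail.
$ git rebase --exec "python -m pytest" main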

This workflow has the desirable feature that later, if someone is curious as to why a test was added or is suspicious that the test might not be exercising the feature it's intended to exercise, they can check out the commit with the xfail mark still in place and run the test suite to verify that it indeed xfails, and they can run with --runxfail to make sure it fails the right way. Additionally, since we've designed it in such a way that "this test failed" is actually the passing condition for the test suite, this doesn't break git bisect! If everything is working right, the test suite should pass on every commit even though we're deliberately introducing intermediate commits with broken tests.
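
Concretely, that verification might look something like this, where <commit-with-xfail> is a placeholder for whatever commit git log points you at:

# Visit the commit where the test exists but the fix doesn't
$ git checkout <commit-with-xfail>

# The suite should pass, with the new test reported as xfailed
$ python -m pytest

# --runxfail ignores the xfail mark, so the test should now fail with the
# original ValueError, confirming it exercises the bug
$ python -m pytest --runxfail

# Return to where you were
$ git checkout -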

Unfortunately, I don't know of any decent mechanism for enforcing this automatically in CI, which means there will likely always be broken commits. It's certainly possible to set up a CI workflow that runs your test suite against every commit, but this may be unnecessarily time-consuming in many situations, and many of your contributors will be confused if asked to edit their git history so that CI passes on every single commit. This tension is one of the reasons that code review systems like Gerrit treat the commit as the atomic unit of review, which makes a workflow in which the test suite is expected to pass on every commit much more natural; but obviously the vast majority of open source activity happens on GitHub or something equivalent, where "every commit passes CI" is not a design goal.

If you are serious about using a workflow like this for your team or project, it may be worth either using a Gerrit-like code review system or spending some time setting up enforcement of CI checks at the level of individual commits, even if only in a final pre-merge check.[3]

Footnotes

[1] I tend to use git on the command line, but the Mercurial version of this is an even nicer UI, in my opinion, and presumably any UI wrappers you use will have similar functionality (e.g. PyCharm's partial commit support).
[2] In practice, I have the strict parameter configured globally, but I've included it here explicitly to make clear that this workflow relies on strict xfail.
[3] One reasonable mechanism for having a separate "pre-merge check" on systems that don't have it as an explicit concept would be to configure a bot to do all merges for your repo, and have that bot always trigger a special required workflow before merge.