Getting Started Testing your Python

Ned Batchelder

@nedbat

http://nedbatchelder/text/starttest.html

Goals

Convince you to test
Show you a way to test your Python code
Remove the mystery

Why test?

The best way to know if your code works
Saves time in the long run
Produces better code
Removes fear from development

Common testing mindset

Most developers' #1 thought about testing:

I AM BAD

Reality

Developers aren't bad (most of them)
Testing is hard
It needs to be given its due
It's real engineering
It's worth it

Starting from first principles

Evolving tests

Stock portfolio class

A simple system under test:

    # portfolio1.py

    class Portfolio(object):
        """A simple stock portfolio"""
        def __init__(self):
            # stocks is a list of lists:
            #   [[name, shares, price], ...]
            self.stocks = []

        def buy(self, name, shares, price):
            """Buy `name`: `shares` shares at `price`."""
            self.stocks.append([name, shares, price])

        def cost(self):
            """What was the total cost of this portfolio?"""
            amt = 0.0
            for name, shares, price in self.stocks:
                amt += shares * price
            return amt

First test: interactive

    $ python
    Python 2.7.2 (default, Jun 12 2011, 15:08:59) 
    >>> from portfolio1 import Portfolio
    >>> p = Portfolio()
    >>> p.cost()
    0.0

    >>> p.buy("IBM", 100, 176.48)
    >>> p.cost()
    17648.0

    >>> p.buy("HPQ", 100, 36.15)
    >>> p.cost()
    21263.0

Good: testing the code
Bad: not repeatable
Bad: labor intensive
Bad: is it right?

Second test: standalone

    # porttest1.py
    from portfolio1 import Portfolio

    p = Portfolio()
    print "Empty portfolio cost: %s" % p.cost()
    p.buy("IBM", 100, 176.48)
    print "With 100 IBM @ 176.48: %s" % p.cost()
    p.buy("HPQ", 100, 36.15)
    print "With 100 HPQ @ 36.15: %s" % p.cost()

    $ python porttest1.py
    Empty portfolio cost: 0.0
    With 100 IBM @ 176.48: 17648.0
    With 100 HPQ @ 36.15: 21263.0

Good: testing the code
Better: repeatable
Better: low effort
Still bad: is it right?

Third test: expected results

    # porttest2.py
    from portfolio1 import Portfolio

    p = Portfolio()
    print "Empty portfolio cost: %s, should be 0.0" % p.cost()
    p.buy("IBM", 100, 176.48)
    print "With 100 IBM @ 176.48: %s, should be 17648.0" % p.cost()
    p.buy("HPQ", 100, 36.15)
    print "With 100 HPQ @ 36.15: %s, should be 21263.0" % p.cost()

    $ python porttest2.py
    Empty portfolio cost: 0.0, should be 0.0
    With 100 IBM @ 176.48: 17648.0, should be 17648.0
    With 100 HPQ @ 36.15: 21263.0, should be 21263.0

Good: testing the code repeatably with low effort
Better: explicit expected results
Bad: have to check the results yourself

Fourth test: check results automatically

    # porttest3.py
    from portfolio1 import Portfolio

    p = Portfolio()
    print "Empty portfolio cost: %s, should be 0.0" % p.cost()
    assert p.cost() == 0.0
    p.buy("IBM", 100, 176.48)
    print "With 100 IBM @ 176.48: %s, should be 17648.0" % p.cost()
    assert p.cost() == 17648.0
    p.buy("HPQ", 100, 36.15)
    print "With 100 HPQ @ 36.15: %s, should be 21263.0" % p.cost()
    assert p.cost() == 21263.0

    $ python porttest3.py
    Empty portfolio cost: 0.0, should be 0.0
    With 100 IBM @ 176.48: 17648.0, should be 17648.0
    With 100 HPQ @ 36.15: 21263.0, should be 21263.0

Good: testing the code repeatably with low effort
Good: explicit expected results
Good: results checked automatically

Fourth test: what failure looks like

    $ python porttest3_broken.py
    Empty portfolio cost: 0.0, should be 0.0
    With 100 IBM @ 176.48: 17600.0, should be 17648.0
    Traceback (most recent call last):
      File "porttest3_broken.py", line 9, in <module>
        assert p.cost() == 17648.0
    AssertionError

Good: testing the code repeatably with low effort
Good: expected results checked automatically
So-so: failure is visible, but output is cluttered with success
Bad: failure prevents further tests

This is starting to get complicated!

Like any code, tests will grow
A test is a real program
Testing deserves real engineering
Common issues can be handled in standard ways
Unique issues can be handled in custom ways

unittest

Writing tests

unittest

Part of the Python standard library
Provides infrastructure for well-structured tests
Provides a test runner
Based on patterns used across many languages

Test classes

Tests are organized into classes
Tests are methods starting with test_

A simple unit test

    # test_port1.py

    import unittest
    from portfolio1 import Portfolio

    class PortfolioTest(unittest.TestCase):
        def test_ibm(self):
            p = Portfolio()
            p.buy("IBM", 100, 176.48)
            assert p.cost() == 17648.0

    if __name__ == '__main__':
        unittest.main()

    $ python test_port1.py
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.000s

    OK

Under the covers

    # unittest runs the tests as if I had written:
    testcase = PortfolioTest()
    try:
        testcase.test_ibm()
    except:
        [record failure]
    else:
        [record success]

Add more tests

    class PortfolioTest(unittest.TestCase):
        def test_empty(self):
            p = Portfolio()
            assert p.cost() == 0.0

        def test_ibm(self):
            p = Portfolio()
            p.buy("IBM", 100, 176.48)
            assert p.cost() == 17648.0

        def test_ibm_hpq(self):
            p = Portfolio()
            p.buy("IBM", 100, 176.48)
            p.buy("HPQ", 100, 36.15)
            assert p.cost() == 21263.0

    $ python test_port2.py
    ...
    ----------------------------------------------------------------------
    Ran 3 tests in 0.000s

    OK

A dot for every passed test

Under the covers

    # unittest runs the tests as if I had written:
    testcase = PortfolioTest()
    try:
        testcase.test_empty()
    except:
        [record failure]
    else:
        [record success]

    testcase = PortfolioTest()
    try:
        testcase.test_ibm()
    except:
        [record failure]
    else:
        [record success]
    
    testcase = PortfolioTest()
    try:
        testcase.test_ibm_hpq()
    except:
        [record failure]
    else:
        [record success]

Test isolation

Every test is run with a new test object
Test isolation is very important
Tests are focused on a small amount of code
Tests can't affect each other
Failure doesn't prevent further tests

What failure looks like

    $ python test_port2_broken.py
    .F.
    ======================================================================
    FAIL: test_ibm (__main__.PortfolioTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_port2_broken.py", line 14, in test_ibm
        assert p.cost() == 17648.0
    AssertionError

    ----------------------------------------------------------------------
    Ran 3 tests in 0.001s

    FAILED (failures=1)

Better: the failed test didn't keep others from running
Bad: what value was returned?

unittest assert helpers

Use self.assertEqual(x, y) instead of assert x == y
You get nicer messages

        def test_ibm(self):
            p = Portfolio()
            p.buy("IBM", 100, 176.48)
            self.assertEqual(p.cost(), 17648.0)

    $ python test_port3_broken.py
    .F.
    ======================================================================
    FAIL: test_ibm (__main__.PortfolioTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_port3_broken.py", line 14, in test_ibm
        self.assertEqual(p.cost(), 17648.0)
    AssertionError: 17600.0 != 17648.0

    ----------------------------------------------------------------------
    Ran 3 tests in 0.001s

    FAILED (failures=1)

unittest assert helpers

There are lots of specialized assert methods

    assertEqual(first, second)
    assertNotEqual(first, second)
    assertTrue(expr)
    assertFalse(expr)
    assertIn(first, second)
    assertNotIn(first, second)
    assertAlmostEqual(first, second)
    assertGreater(first, second)
    assertLess(first, second)
    assertRegexMatches(text, regexp)
    .. etc ..

Major enhancements in 2.7

unittest expanded significantly in 2.7 and 3.2
Some of what I show here is only available then
Test discovery
More asserters

Three possible outcomes

Test raises an exception other than AssertionError
Produces an E outcome

    $ python test_port3_broken2.py
    .E.
    ======================================================================
    ERROR: test_ibm (__main__.PortfolioTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_port3_broken2.py", line 13, in test_ibm
        p.buyxxxxx("IBM", 100, 176.48)
    AttributeError: 'Portfolio' object has no attribute 'buyxxxxx'

    ----------------------------------------------------------------------
    Ran 3 tests in 0.000s

    FAILED (errors=1)

Under the covers

    testcase = PortfolioTest()
    try:
        testcase.test_method()
    except AssertionError:
        [record failure]
    except:
        [record error]
    else:
        [record success]

Testing for failure

What if your code is supposed to raise an exception?

    $ python
    Python 2.7.2 (default, Jun 12 2011, 15:08:59) 
    >>> from portfolio1 import Portfolio
    >>> p = Portfolio()
    >>> p.buy("IBM")
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
    TypeError: buy() takes exactly 4 arguments (2 given)

Can't just call the function

        def test_bad_input(self):
            p = Portfolio()
            p.buy("IBM")

    $ python test_port4_broken.py
    E...
    ======================================================================
    ERROR: test_bad_input (__main__.PortfolioTest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "test_port4_broken.py", line 24, in test_bad_input
        p.buy("IBM")
    TypeError: buy() takes exactly 4 arguments (2 given)

    ----------------------------------------------------------------------
    Ran 4 tests in 0.001s

    FAILED (errors=1)

assertRaises

assertRaises(exc, callable, arg1, arg2, ...)
Have to provide callable and args separately

        def test_bad_input(self):
            p = Portfolio()
            self.assertRaises(TypeError, p.buy, "IBM")

    $ python test_port4.py
    ....
    ----------------------------------------------------------------------
    Ran 4 tests in 0.000s

    OK

Nicer in 2.7:

    def test_bad_input(self):
        p = Portfolio()
        with self.assertRaises(TypeError):
            p.buy("IBM")

Add sell functionality

        def sell(self, name, shares):
            """Sell some number of shares of `name`."""
            for holding in self.stocks:
                if holding[0] == name:
                    if holding[1] < shares:
                        raise ValueError("Not enough shares")
                    holding[1] -= shares
                    break
            else:
                raise ValueError("You don't own that stock")

Testing sell

    class PortfolioSellTest(unittest.TestCase):
        def test_sell(self):
            p = Portfolio()
            p.buy("MSFT", 100, 27.0)
            p.buy("DELL", 100, 17.0)
            p.buy("ORCL", 100, 34.0)
            p.sell("MSFT", 50)
            self.assertEqual(p.cost(), 6450)

        def test_not_enough(self):
            p = Portfolio()
            p.buy("MSFT", 100, 27.0)
            p.buy("DELL", 100, 17.0)
            p.buy("ORCL", 100, 34.0)
            with self.assertRaises(ValueError):
                p.sell("MSFT", 200)

        def test_dont_own_it(self):
            p = Portfolio()
            p.buy("MSFT", 100, 27.0)
            p.buy("DELL", 100, 17.0)
            p.buy("ORCL", 100, 34.0)
            with self.assertRaises(ValueError):
                p.sell("IBM", 1)

Setting up a test

Tests can get repetitive
Many tests may need similar setup
A class can define setUp() to create initial state

    class PortfolioSellTest(unittest.TestCase):
        def setUp(self):
            self.p = Portfolio()
            self.p.buy("MSFT", 100, 27.0)
            self.p.buy("DELL", 100, 17.0)
            self.p.buy("ORCL", 100, 34.0)

        def test_sell(self):
            self.p.sell("MSFT", 50)
            self.assertEqual(self.p.cost(), 6450)

        def test_not_enough(self):
            with self.assertRaises(ValueError):
                self.p.sell("MSFT", 200)

        def test_dont_own_it(self):
            with self.assertRaises(ValueError):
                self.p.sell("IBM", 1)

Under the covers

    testcase = PortfolioTest()
    try:
        testcase.setUp()
    except:
        [record error]
    else:
        try:
            testcase.test_method()
        except AssertionError:
            [record failure]
        except:
            [record error]
        else:
            [record success]
        finally:
            try:
                testcase.tearDown()
            except:
                [record error]

setUp and tearDown: isolation!

Establish context for tests
Use them for common pre-test or post-test work
They ensure isolation, even in the face of failures

How NOT to do it

    class MyBadTestCase(unittest.TestCase):
        def test_a_thing(self):
            old_global = some_global_thing
            some_global_thing = new_test_value

            do_my_test_stuff()
            
            some_global_thing = old_global

Common impulse
But if test fails, globals aren't restored

The right way

    class MyGoodTestCase(unittest.TestCase):
        def setUp(self):
            self.old_global = some_global_thing
            some_global_thing = new_test_value

        def tearDown(self):
            some_global_thing = self.old_global

        def test_a_thing(self):
            do_my_test_stuff()

tearDown gets run no matter what *
Reuse: same setUp and tearDown apply to all the test methods
Test method is small and focused
* unless setUp fails

A better way

    def test_with_special_settings(self):
        with patch_settings(SOMETHING='special', ANOTHER='weird'):
            do_my_test_stuff()

    # From: http://stackoverflow.com/questions/913549
    from somewhere import MY_GLOBALS

    NO_SUCH_SETTING = object()

    @contextlib.contextmanager
    def patch_settings(**kwargs):
        old_settings = []
        for key, new_value in kwargs.items():
            old_value = getattr(MY_GLOBALS, key, NO_SUCH_SETTING)
            old_settings.append((key, old_value))
            setattr(MY_GLOBALS, key, new_value)

        yield

        for key, old_value in old_settings:
            if old_value is NO_SUCH_SETTING:
                delattr(MY_GLOBALS, key)
            else:
                setattr(MY_GLOBALS, key, old_value)

Tests are real code

Test helpers can become significant
All the same techniques you use for product code, you can use for test code
Helper functions, classes, etc.
BTW: These helpers might need tests!

Approaches

crafting tests

What to test?

Any code that you want to work
Start with what worries you
Where are your bugs?

How to start?

Pick a chunk of code
What should it do?
Test that
What shouldn't it do?
Test that

Test granularity

How big is the code chunk?
Creates a hierarchy of tests:
- Unit
- Functional
- Integration
- System
Start small

What should the code do?

This can be difficult!
Tests force you to decide
Be as clear as you can
(Document it in your docstrings)

Test what the code should do

Typical values
Edge values
Combinations of values

Test what the code shouldn't do

Error conditions
Absurd values
Common mistakes in usage
Does it fail the way it should?

Where to put them?

Put tests in their own files
Name them test_*.py
Parallel to the code they're testing
Pick your granularity

Good tests should be...

Automated
Repeatable
Convenient: fast
Unambiguous

How much is enough?

Hard to know
Depends on your needs
Coverage measurement can help

Tests will fail

Could be the code is broken
Could also be the tests are broken
Tests have to be maintained
Refactoring code might mean refactoring tests too

Test-Driven Development

Write tests before code!
Ensures code is testable
Produces better code

Testability

The quality of being testable
Affects the design of your code
Mostly: improved modularity
See my PyCon 2010 talk: Tests & Testability

nose and py.test

Running tests

Test runners

Third-party test runners: nose, py.test, trial
Know how to run your unittest tests
Test discovery
Run tests better
Plugins provide lots of features

Test discovery

Running one file of tests is easy
Pre-2.7 unittest: hard to run many test files
New test runners provide test discovery
They sniff out your tests wherever they are

Test runners

unittest (2.7)
nose, py.test, trial
IDEs, Vim, Emacs

Lots of options

However you want to run your tests, you can
Choose the runner that suits you
Plugins allow further customization

Coverage

Testing tests

What code are you testing?

The goal: tests execute product code
But do they really?
How much of it?

Coverage measurement

Run your tests
Track what parts of product code are executed
Report on covered / not covered
You find code not being tested
Write more tests
The world is better!

HTML report

Coverage can only tell you a few things

What lines were executed
What branches were taken
100% coverage is difficult to reach
100% coverage doesn't tell you everything

What coverage can't tell you

Are you exercising all your data?
Are you exercising all your HTML templates?
Are you checking results properly?
Are you hitting all edge conditions?
Are you testing the right things?
Are you building the right product? ;-)

Test Doubles

Focusing tests

Testing small amounts of code

Systems are built in layers
Components depend on each other
How to test just one component without its dependencies?

Dependencies are bad

More suspect code in each test
Some components are slow
Some components are unpredictable

Test Doubles

Replace a component's dependencies
Focus on one component
Different kinds: Mocks, stubs, fakes, etc

Real-time data!

Hit the web to get current stock prices

        def current_prices(self):
            """Return a dict mapping names to current prices."""
            # http://download.finance.yahoo.com/d/quotes.csv?f=sl1&s=ibm,hpq
            # returns comma-separated values:
            #   "IBM",174.23
            #   "HPQ",35.13
            url = "http://download.finance.yahoo.com/d/quotes.csv?f=sl1&s="
            names = [name for name, shares, price in self.stocks]
            url += ",".join(sorted(names))
            data = urllib.urlopen(url)
            prices = dict((sym, float(last)) for sym, last in csv.reader(data))
            return prices

        def value(self):
            """Return the current value of the portfolio."""
            prices = self.current_prices()
            total = 0.0
            for name, shares, price in self.stocks:
                total += shares * prices[name]
            return total

It works great!

    $ python
    Python 2.7.2 (default, Jun 12 2011, 15:08:59) 
    >>> from portfolio3 import Portfolio
    >>> p = Portfolio()
    >>> p.buy("IBM", 100, 150.0)
    >>> p.buy("HPQ", 100, 30.0)

    >>> p.current_prices()
    {'HPQ': 35.61, 'IBM': 185.21}

    >>> p.value()
    22082.0

But how to test it?

Live data from the web is unpredictable
Could be slowish
Could be unavailable
Question should be:
- "Assuming yahoo.com is working,
- does my code work?"

Fake implementation of current_prices

    # Replace Portfolio.current_prices with a stub implementation.
    # This avoids the web, but also skips all our current_prices
    # code.
    class PortfolioValueTest(unittest.TestCase):
        def fake_current_prices(self):
            return {'IBM': 140.0, 'HPQ': 32.0}

        def setUp(self):
            self.p = Portfolio()
            self.p.buy("IBM", 100, 120.0)
            self.p.buy("HPQ", 100, 30.0)
            self.p.current_prices = self.fake_current_prices

        def test_value(self):
            self.assertEqual(self.p.value(), 17200)

Good: test results are predictable

But some code isn't tested!

    $ coverage run test_port7.py
    ........
    ----------------------------------------------------------------------
    Ran 8 tests in 0.000s

    OK
    $ coverage report -m
    Name         Stmts   Miss  Cover   Missing
    ------------------------------------------
    portfolio3      33      6    82%   57-62
    test_port7      45      0   100%   
    ------------------------------------------
    TOTAL           78      6    92%

        def current_prices(self):
            """Return a dict mapping names to current prices."""
            # http://download.finance.yahoo.com/d/quotes.csv?f=sl1&s=ibm,hpq
            # returns comma-separated values:
            #   "IBM",174.23
            #   "HPQ",35.13
            url = "http://download.finance.yahoo.com/d/quotes.csv?f=sl1&s="
            names = [name for name, shares, price in self.stocks]
            url += ",".join(sorted(names))
            data = urllib.urlopen(url)
            prices = dict((sym, float(last)) for sym, last in csv.reader(data))
            return prices

Fake urllib.urlopen instead

    # A simple fake for urllib that implements only one method,
    # and is only good for one request.  You can make this much
    # more complex for your own needs.
    class FakeUrllib(object):
        """An object that can stand in for the urllib module."""

        def urlopen(self, url):
            """A stub urllib.urlopen() implementation."""
            return StringIO('"IBM",140\n"HPQ",32\n')


    class PortfolioValueTest(unittest.TestCase):
        def setUp(self):
            self.old_urllib = portfolio3.urllib
            portfolio3.urllib = FakeUrllib()

            self.p = Portfolio()
            self.p.buy("IBM", 100, 120.0)
            self.p.buy("HPQ", 100, 30.0)

        def tearDown(self):
            portfolio3.urllib = self.old_urllib

        def test_value(self):
            self.assertEqual(self.p.value(), 17200)

All of our code is executed

    $ coverage run test_port8.py
    ........
    ----------------------------------------------------------------------
    Ran 8 tests in 0.001s

    OK
    $ coverage report -m
    Name         Stmts   Miss  Cover   Missing
    ------------------------------------------
    portfolio3      33      0   100%   
    test_port8      51      0   100%   
    ------------------------------------------
    TOTAL           84      0   100%

Stdlib is stubbed
All our code is run
No web access during tests

Even better: mock objects

Mock objects are automatic chameleons
Can act like any object
Also record what happened to them
You can make assertions after the fact

Automatic chameleons

    class PortfolioValueTest(unittest.TestCase):
        def setUp(self):
            self.p = Portfolio()
            self.p.buy("IBM", 100, 120.0)
            self.p.buy("HPQ", 100, 30.0)

        def test_value(self):
            # Create a mock urllib.urlopen
            with mock.patch('urllib.urlopen') as urlopen:

                # When called, it will return this value
                urlopen.return_value = StringIO('"IBM",140\n"HPQ",32\n')

                # Run the test!
                self.assertEqual(self.p.value(), 17200)

                # We can ask the mock what its arguments were
                urlopen.assert_called_with(
                    "http://download.finance.yahoo.com/d/quotes.csv"
                    "?f=sl1&s=HPQ,IBM"
                    )

Using Foord's Mock: http://www.voidspace.org.uk/python/mock

Test doubles

A powerful way to isolate your code
Focuses tests on one thing at a time
Removes speed bumps and randomness
Lots of libraries to make it easier
Also called Dependency Injection