The 4 Gears of Test-Driven Development

When I explain Test-Driven Development to people who are new to the concept, I try to be clear that TDD is not just about using unit tests to drive design at the internal code level.

Unit tests – and the familiar red-green-refactor micro feedback cycle that we most commonly associate with TDD, thanks to 1,001 TDD katas that focus at that level – are actually just the innermost feedback loop of TDD. There are multiple outer feedback loops that drive the choice of unit tests. Otherwise, how would we know what unit tests we needed to write?

Outside the rapid unit test feedback loop, there’s a slower customer test feedback loop that drives our understanding of what our units need to do in a particular software usage scenario.

Outside the customer test feedback loop, there’s a slower-still feature feedback loop, which may require us to pass multiple customer tests to complete.

And, most important of all, there’s an even slower goal feedback loop that drives our understanding of what features might be required to solve a business problem.

On the Codemanship TDD course, pairs experience these feedback loops first hand. They’re asked to think of a real-world problem they believe might be solved with a simple piece of software. For example, “It’s hard to find good vegan takeaway in my local area.” We’re now in the first feedback loop of TDD – goals.

Then they imagine a headline feature – a proverbial button the user clicks that solves this problem: what would that feature do? Perhaps it displays a list of takeaway restaurants with vegan dishes on their menu that will deliver to my address, ordered by customer ratings. We’re now in the next feedback loop of TDD – features.

Next, we need to think about what other features the software might require to make the headline feature possible. For example, we need to gather details of takeaway restaurants in the area, including their vegan menus and their locations, and whether or not they’ll deliver to the customer’s address. Our headline feature might require a number of such supporting features to make it work.

We work with our customer to design a minimum feature set that we believe will solve their problem. It’s important to keep it as simple as we can, because we want a working prototype – one we can test with real end users in the real world – ready as soon as possible.

Next, for each feature – starting with the most important one, which is typically the headline feature – we drive out a precise understanding of exactly what that feature will do, using examples harvested from the real world. We might go online, or grab a phone book, and start checking out takeaway restaurants, collecting their menus and asking what postcode areas they deliver in. Then we would pick addresses in our local area, and figure out – for each address – which restaurants would be available according to our criteria. We could search on sites like Google and Trip Advisor for reviews of the restaurants, or – if we can’t find reviews – invent some ratings, so we can describe how the result lists should be ordered.

We capture these examples in a format that’s human readable and machine readable, so we can collaborate directly with the customer on them and also pull the same data into automated executable tests.

We’re now in the customer test feedback loop. Working one customer test at a time, we automate execution of that test so we can continuously check our progress in passing it.
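To make that concrete, here’s a minimal sketch of what one of those executable customer tests might look like – in Python with pytest, though any stack would do. The function find_vegan_takeaways, the restaurants, the ratings and the postcode areas are all invented for illustration:

    import pytest

    # Examples harvested from the real world: local restaurants, the postcode
    # areas they deliver to, and ratings (invented where no reviews existed).
    RESTAURANTS = [
        {"name": "Green Leaf", "delivers_to": ["SW1", "SW2"], "rating": 4.5, "vegan_menu": True},
        {"name": "Spice Hut",  "delivers_to": ["SW2"],        "rating": 4.8, "vegan_menu": True},
        {"name": "Meat Feast", "delivers_to": ["SW1", "SW2"], "rating": 4.9, "vegan_menu": False},
    ]

    def find_vegan_takeaways(restaurants, postcode_area):
        """The behaviour under test: vegan-friendly restaurants that deliver
        to the given postcode area, best-rated first."""
        matches = [r for r in restaurants
                   if r["vegan_menu"] and postcode_area in r["delivers_to"]]
        return sorted(matches, key=lambda r: r["rating"], reverse=True)

    @pytest.mark.parametrize("postcode_area, expected_names", [
        ("SW2", ["Spice Hut", "Green Leaf"]),  # ordered by rating; no Meat Feast
        ("SW1", ["Green Leaf"]),
        ("N1",  []),                           # nobody delivers here yet
    ])
    def test_lists_vegan_takeaways_for_address(postcode_area, expected_names):
        results = find_vegan_takeaways(RESTAURANTS, postcode_area)
        assert [r["name"] for r in results] == expected_names

The examples table is exactly the kind of thing the customer can read and correct, while the test runner treats it as data.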

For each customer test, we then test-drive an implementation that will pass the test, using unit tests to drive out the details of how the software will complete each unit of work required. If the happy path for our headline feature requires that we

  • calculate a delivery map location using the customer’s address
  • identify for each restaurant in our list if they will deliver to that location
  • filter the list to exclude the restaurants that don’t
  • order the filtered list by average customer rating

…then that’s a bunch of unit tests we might need to write. We’re now in the unit test feedback loop.
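Here’s the kind of small, single-question unit test we might write inside that loop – a sketch with hypothetical names, driving out just the filtering step:

    def exclude_non_delivering(restaurants, location):
        """Keep only the restaurants that will deliver to the given location."""
        return [r for r in restaurants if r.delivers_to(location)]

    class FakeRestaurant:
        """A test double standing in for a real Restaurant."""
        def __init__(self, name, will_deliver):
            self.name = name
            self._will_deliver = will_deliver

        def delivers_to(self, location):
            return self._will_deliver

    def test_restaurants_that_do_not_deliver_are_excluded():
        near = FakeRestaurant("Green Leaf", will_deliver=True)
        far = FakeRestaurant("Spice Hut", will_deliver=False)
        assert exclude_non_delivering([near, far], "SW2 1AA") == [near]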

Once we’ve completed our units and seen the customer test pass, we can move on to the next customer test, passing them one at a time until the feature is complete.

Many dev teams make the mistake of thinking that we’re done at this point. This is usually because they have no visibility of the real end goal. We’re rarely invited to participate in that conversation, to be fair. Which is a terrible, terrible mistake.

Once all the features – headline and supporting – are complete, we’re ready to test our minimum solution with real end users. We release our simple software to a representative group of tame vegan takeaway diners, who will attempt to use it to find good food. Heck, we can try using it ourselves, too. I’m all in favour of developers eating their own (vegan) dog food, because there’s no substitute for experiencing it for ourselves.

Our end users may report that some of the restaurants in their search results were actually closed, and that they had to phone many takeaway restaurants to find one open. They may report that when they ordered food, it took over an hour to be delivered to their address because the restaurant had been a little – how shall we say? – optimistic about their reach. They may report that they were specifically interested in a particular kind of cuisine – e.g., Chinese or Indian – and that they had to scroll through pages and pages of results they weren’t interested in before finding what they wanted.

We gather this real-world feedback and feed that back into another iteration, where we add and change features so we can test again to see if we’re closer to achieving our goal.

I like to picture these feedback loops as gear wheels. The biggest gear – goals – turns the slowest, and it drives the smaller features gear, which turns faster, driving the smaller and faster customer tests wheel, which drives the smallest and fastest unit tests wheel.

[Image: tdd_gears – the four gears of TDD: goals, features, customer tests, unit tests]

It’s important to remember that the outermost wheel – goals – drives all the other wheels. They should not be turning by themselves. I see many teams where it’s actually the features wheel driving the goals wheel, and teams force their customers to change their goals to fit the features they’re delivering. Bad developers! In your beds!

It’s also very, very important to remember that the goals wheel never stops turning, because there’s actually an even bigger wheel making it turn – the real world – and the real world never stops turning. Things change, and there’ll always be new problems to solve, especially as – when we release software into the world – the world changes.

This is why it’s so very important to keep all our wheels well-oiled so they can keep on turning for as long as we need them to. If there’s too much friction in our delivery processes, the gears will grind to a halt – but the real world will keep on turning whether we like it or not.

 

The 2 Most Critical Feedback Loops in Software Development

When I’m explaining the inner and outer feedback loops of Test-Driven Development – the “wheels within wheels”, if you like – I make the point that the two most important feedback loops are the outermost and the innermost.

[Image: feedbackloops – the inner and outer feedback loops of software development]

The outermost because the most important question of all is “Did we solve the problem?” The innermost because the answer is usually “No”, so we have to go round again. This means that the code we delivered will need to change, which raises the second most important question: “Did we break the code?”

The sooner we can deliver something so we can answer “Did we solve the problem?”, the sooner we can feed back the lessons learned on the next go round. The sooner we can re-test the code, the sooner we can know if our changes broke it, and the sooner we can fix it ready for the next release.

I realised nearly two decades ago that everything in between – requirements analysis, customer tests, software design, and so on – is, at best, guesswork. A far more effective way of building the right thing is to build something, get folk to use it, and feed back what needs to change in the next iteration. Fast iterations accelerate this learning process. This is why I firmly believe these days that fast iterations – with all that entails – are the true key to building the right thing.

Continuous Delivery – done right, with meaningful customer feedback drawn from real use in the real world (or as close as we dare bring our evolving software to the real world) – is the ultimate requirements discipline.

Fast-running automated tests that provide good assurance that our code’s always working are essential to this. How long it takes to build, test and deploy our software will determine the likely length of those outer feedback loops. Typically, the lion’s share of that build time is regression testing.

About a decade ago, many teams told me “We don’t need unit tests because we have integration tests”, or “We have <insert name of trendy new BDD tool here> tests”. Then, a few years later, their managers were crying “Help! Our tests take 4 hours to run!” A 4-hour build-and-test cycle creates a serious bottleneck, leading to code that’s almost continuously broken without teams knowing. In other words, not shippable.

Turn a 4-hour build-and-test cycle into a 40-second build-and-test cycle, and a lot of problems magically disappear. You might be surprised how many other bottlenecks in software development have slow-running tests as their underlying cause – analysis paralysis, for example. That’s usually a symptom of high stakes in getting it wrong, and that’s usually a symptom of infrequent releases. “We better deliver the right thing this time, because the next go round could be 6 months later.” (Those among us old enough to remember might recall just how much more care we had to take over our code because of how long it took to compile. It’s a similar effect, but on a much larger scale with much higher stakes than a syntax error.)
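If you’re wondering what escaping that bottleneck looks like in practice, one common first step – sketched here with pytest, though every test framework has an equivalent – is to split the suite so the rapid cycle runs only the fast, in-memory tests:

    import pytest

    def total_price(items):
        return sum(items)

    def test_price_calculation():
        # Pure in-memory logic: runs in milliseconds on every change.
        assert total_price([10, 20]) == 30

    @pytest.mark.slow  # register the marker in pytest.ini to silence warnings
    def test_order_survives_a_round_trip_to_the_database():
        # Deselected from the rapid cycle. Run the fast suite with:
        #   pytest -m "not slow"
        # and the slow suite, less frequently, with:
        #   pytest -m slow
        pytest.skip("placeholder for a real (slow) integration test")

That doesn’t replace the slow tests; it stops them sitting in the middle of every feedback loop.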

Where developers usually get involved in this process – user stories and backlogs – is somewhere short of where they need to be involved. User stories – and prioritised queues of user stories – are just guesses at what an analyst or customer or product owner believes might solve the problem. To obsess over them is to completely overestimate their value. The best teams don’t guess their way to solving a problem; they learn their way.

Like pennies to the pound, the outer feedback loop of “Does it actually work in the real world?” is made up of all the inner feedback loops, and especially the innermost loop of regression testing after code is changed.

Teams who invest in fast-running automated regression tests have a tendency to out-learn teams who don’t, and their products have a tendency to outlive the competition.

 

 

Overcoming Solution Bias

Just a short post this morning about a phenomenon I’ve seen many times in software development – which, for want of a better name, I’m calling solution bias.

It’s the tendency of developers, once they’ve settled on a solution to a problem, to refuse to let go of it – regardless of what facts may come to light that suggest it’s the wrong solution.

I’ve even watched teams argue with their customer to try to get them to change their requirements to fit a solution design the team have come up with. It seems once we have a solution in our heads (or in a Git repository) we can become so invested in it that – to borrow a metaphor – everything looks like a nail.

The damage this can do is obvious. Remember your backlog? That’s a solution design. And once a backlog’s been established, it has a kind of inertia that makes it unlikely to change much. We may fiddle at the edges, but once the blueprints have been drawn up, they don’t change significantly. It’s vanishingly rare to see teams throw their designs away and start afresh, even when it’s screamingly obvious that what they’re building isn’t going to work.

I think this is just human nature: when the facts don’t fit the theory, our inclination is to change the facts and not the theory. That’s why we have the scientific method: because humans are deeply flawed in this kind of way.

In software development, it’s important – if we want to avoid solution bias – to first accept that it exists, and that our approach must actively take steps to counteract it.

Here’s what I’ve seen work:

  • Testable Goals – sounds obvious, but it still amazes me how many teams have no goals they’re working towards other than “deliver on the plan”. A much more objective picture of whether the plan actually works can help enormously, especially when it’s put front-and-centre in all the team’s activities. Try something. Test it against the goal. See if it really works. Adapt if it doesn’t.
  • Multiple Designs – teams get especially invested in a solution design when it’s the only one they’ve got. Early development of candidate solutions should explore multiple design avenues, tested against the customer’s goals, and selected for extinction if they don’t measure up. Evolutionary design requires sufficiently diverse populations of possible solutions.
  • Small, Frequent Releases – a team that’s invested a year in a solution is going to resist that solution being rejected with far more energy than a team who invested a week in it. If we accept that an evolutionary design process is going to have failed experiments, we should seek to keep those experiments short and cheap.
  • Discourage Over-Specialisation – solution architectures can define professional territory. If the best solution is a browser-based application, that can be good news for JavaScript folks, but bad news for C++ developers. I often see teams try to steer the solution in a direction that favours their skill sets over others. This is understandable, of course. But when the solution to sorting a list of surnames is to write them into a database and use SQL because that’s what the developers know how to do, it can lead to some pretty inappropriate architectures. Much better, I’ve found, to invest in bringing teams up to speed on whatever technology will work best. If it needs to be done in JavaScript, give the Java folks a couple of weeks to learn enough JavaScript to make them productive. Don’t put developers in a position where the choice of solution architecture threatens their job.
  • Provide Safety – I can’t help feeling that a good deal of solution bias is the result of fear. Fear of failure.  Fear of blame. Fear of being sidelined. Fear of losing your job. If we accept that the design process is going to involve failed experiments, and engineer the process so that teams fail fast and fail cheaply – with no personal or professional ramifications when they do – then we can get on with the business of trying shit and seeing if it works. I’ve long felt that confidence isn’t being sure you’ll succeed, it’s not being afraid to fail. Reassure teams that failure is part of the process. We expect it. We know that – especially early on in the process of exploring the solution space – candidate solutions will get rejected. Importantly: the solutions get rejected, not the people who designed them.

As we learn from each experiment, we’ll hopefully converge on the likeliest candidate solution, and the whole team will be drawn in to building on that, picking up whatever technical skills are required as they do. At the end, we may deliver not just a good working solution, but also a stronger team of people who have grown through the process.

 

Wheels Within Wheels Within Wheels

Much is made of the cycles-within-cycles of Test-Driven Development.

At the core, we do micro-iterations with small, single-question unit tests to drive out the details of our internal design.

Surrounding those micro-cycles are the feedback loops provided by customer tests, which may require us to pass multiple unit tests to complete end-to-end.

User stories typically come with multiple customer tests – happy paths and edge cases – providing us with bigger cycles around our customer test feedback loops.

Orbiting those are release loops, where we bundle a set of user stories and await feedback from end users in the real world (or a simulated approximation of it for test purposes).

What’s not discussed, though, are the test criteria for those release loops. If we’ve already established through customer testing that we delivered what we agreed we would in that release, what’s left to test for?

The minority of us who practice development driven by business goals may know the answer: we test to see if what we released achieves the goal(s) of that release.

[Image: feedbackloops – the feedback loops of TDD, including the outer strategic loop]

This is the outer feedback loop – the strategic feedback loop – that most dev teams are missing. If we’re creating software with a purpose, it stands to reason that at some point we must test for its fitness for that purpose. Does it do the job it was designed to do?

When explaining strategic feedback loops, I often use the example of a business start-up who deliver parcels throughout the London area. They have a fleet of delivery vans that go out every day across the city, delivering the parcels that were received into their depot overnight to a list of addresses.

Delivery costs form the bulk of their overheads. They rent the vans. They charge them up with electrical power (it’s an all-electric fleet – green FTW!). They pay the drivers. And so on. It all adds up.

Business is good, and their customer base is growing rapidly. Do they rent more vans? Do they hire more drivers? Do they do longer routes, with longer driver hours, more recharging return-to-base trips, and higher energy bills? Or could the same number of drivers, in the same number of vans, deliver more parcels with the same mileage as before? Could their deliveries be better optimised?

Someone analyses the routes drivers have been taking, and theorises that they could have delivered the same parcels in less time, driving fewer miles. They believe it could be done 35% more efficiently just by optimising the routes.

Importantly, using historical delivery and route data, they show on paper that an algorithm they have in mind would have saved 37% on miles and driver-hours. I, for one, would think twice about setting out to build a software system that implements unproven logic.

But the on-paper execution of it takes far too long. So they hatch a plan for a software system that selects the optimum delivery routes every day using this algorithm.

Taking route optimisation as the headline goal, the developers produce a first release in 2 weeks that takes in delivery addresses from an existing data source and – as a command-line utility initially – produces optimised routes in simple text files to be emailed to the drivers’ smartphones. It’s not pretty, and not a long-term solution by any means. But the core logic is in there, it’s been thoroughly unit and customer tested, and it seems to work.
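To give a flavour of that first release – and this is purely illustrative, since the story doesn’t specify the algorithm – the core of such a command-line utility might be no more than a crude nearest-neighbour heuristic:

    import math

    def distance(a, b):
        """Straight-line distance between two (x, y) map locations."""
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def optimise_route(depot, addresses):
        """Visit every address, always heading to the nearest unvisited one."""
        route, current, remaining = [], depot, list(addresses)
        while remaining:
            nearest = min(remaining, key=lambda addr: distance(current, addr))
            remaining.remove(nearest)
            route.append(nearest)
            current = nearest
        return route

    if __name__ == "__main__":
        # Crude plain-text output, emailed to the drivers' smartphones.
        stops = optimise_route(depot=(0, 0), addresses=[(5, 5), (1, 1), (3, 2)])
        print("\n".join(f"Stop {i + 1}: {stop}" for i, stop in enumerate(stops)))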

While the software developers move on to thinking about how the system could be made more user-friendly with a graphical UI (e.g., a smartphone app), the team – which includes the customer – monitor deliveries for the next couple of weeks very closely. How long are the routes taking? How many miles are vans driving? How much energy is being used on each route? How many recharging pit-stops are drivers making each day?

This is the strategic feedback loop: have we solved the problem? If we haven’t, we need to go around again and tweak the solution (or maybe even scrap it and try something else, if we’re so far off the target that we see no value in continuing down that avenue).
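In code, that strategic check can be as mundane as a test asserting the promised saving, fed with figures pulled from the delivery logs. These numbers are made up:

    BASELINE_MILES_PER_PARCEL = 2.6   # measured before the first release
    TARGET_SAVING = 0.35              # the saving the on-paper analysis promised

    def miles_per_parcel(total_miles, parcels_delivered):
        return total_miles / parcels_delivered

    def test_routes_are_at_least_35_percent_more_efficient():
        this_week = miles_per_parcel(total_miles=5_500, parcels_delivered=3_412)
        saving = 1 - this_week / BASELINE_MILES_PER_PARCEL
        assert saving >= TARGET_SAVING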

This is my definition of “done”; we keep iterating until we hit the target, learning lessons with each release and getting it progressively less wrong.

Then we move on to the next business goal.

How I Do Requirements

The final question of our Twitter code craft quiz seems to have divided the audience.

The way I explain it is that the primary goal of code craft is to allow us to rapidly iterate our solutions, and to sustain the pace of iterating for longer. We achieve that by delivering a succession of production-ready prototypes – tested and ready for use – that are open to change based on the feedback users give us.

(Production-ready because we learned the hard way with Rapid Application Development that when you hit on a solution that works, you tend not to be given the time and resources to make it production-ready afterwards. And also, we discovered that production-ready tends not to cost much more or take much more time than “quick-and-dirty”. So we may as well.)

Even in the Agile community – who you’d think might know better – there’s often too much effort spent on trying to get it right first time. The secret sauce in Agile is that it’s not necessary. Agile is an iterative search algorithm. Our initial input – our first guess at what’s needed – doesn’t need to be perfect, or even particularly good. It might take us an extra couple of feedback loops if release #1 is way off. What matters more are:

  • The frequency of iterations
  • The number of iterations

Code craft – done well – is the enabler of rapid and sustainable iteration.

And, most importantly, iterating requires a clear and testable goal. Which, admittedly, most dev teams lack.

To illustrate how I handle software requirements, imagine this hypothetical example that came up in a TDD workshop recently:

First, I explore with my customer a problem that technology might be able to help solve. We do not discuss solutions at all. It is forbidden at this stage. We work to formulate a simple problem statement.

Walking around my city, there’s a risk of falling victim to crime. How can I reduce that risk while retaining the health and environmental benefits of walking?

The next step in this process is to firm up a goal, by designing a test for success.

A sufficiently large sample of people experience significantly less crime per mile walked than the average for this city.

This is really vital: how will we know our solution worked? How can we steer our iterative ship without a destination? The failure of so very many development efforts seems, in my experience, to stem from the lack of clear, testable goals. It’s what leads us to the “feature factory” syndrome, where teams end up working through a plan – e.g. a backlog – instead of working towards a goal.

I put a lot of work into defining the goal. At this point, the team aren’t envisioning technology solutions. We’re collecting data and refining measures for success. Perhaps we poll people in the city to get an estimate of average miles walked per year. Perhaps we cross-reference that with crime statistics – freely available online – for the city, focusing on crimes that happened outside on the streets, like muggings and assaults. We build a picture of the current reality.

Then we paint a picture of the desired future reality: what does the world look like with our solution in it? Again, no thought yet is given to what that solution might look like. We’re simply describing a solution-shaped hole into which it must fit. What impact do we want it to have on the world?

If you like, this is our overarching Given…When…Then…

Given that the average rate of street crime in our city is currently 1.2 incidents per 1,000 person-miles walked,

When people use our solution,

Then they should experience an average rate of street crime of less than 0.6 incidents per 1,000 person-miles walked

Our goal is to more than halve the risk for walkers who use our solution of being a victim of crime on the streets. Once we have a clear idea of where we’re aiming, only then do we start to imagine potential solutions.
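The measure itself is trivial to compute – the hard work is in gathering honest data. A throwaway sketch, with made-up sample figures:

    def incidents_per_1000_person_miles(incidents, person_miles):
        return incidents / person_miles * 1_000

    # 4,200 street-crime incidents across an estimated 3.5 million person-miles
    baseline = incidents_per_1000_person_miles(incidents=4_200, person_miles=3_500_000)
    print(baseline)        # 1.2 – the city-wide average from our research
    target = baseline / 2  # our goal: less than 0.6 for users of our solution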

I’m of the opinion that the best software development organisations are informed gamblers. So, at this early stage I think it’s a good idea to have more than one idea for a solution. Don’t put all our eggs in one solution’s basket! So I might split the team up into pairs – depending on how big the team is – and ask each pair to envisage a simple solution to our problem. Each pair works closely with the customer while doing this, to get input and feedback on their basic idea.

Imagine I’m in Pair A: given a clear goal, how do we decide what features our solution will need? I always go for the headline feature first. Think of this as “the button the user would press to make their goal happen” – figuratively speaking. Pair A imagines a button that, given a start point and a destination, will show the user the walking route with the least reported street crime.

We write a user story for that:

As a walker, I want to see the route for my planned journey that has the least reported street crime, so I can get there safely.

The headline feature is important. It’s the thread we pull on that reveals the rest of the design. We need a street map we can use to do our search in. We need to know what the start point and destination are. We need crime statistics by street.

All of these necessary features are consequences of the headline feature. We don’t need a street map because the user wants a street map. We don’t need crime statistics because the user wants crime statistics. The user wants to see the safest walking route. As I tend to put it: nobody uses software because they want to log in. Logging in is a consequence of the real reason for using the software.
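To make the headline feature less abstract, here’s a hedged sketch of the kind of search that might sit behind that proverbial button: a standard shortest-path algorithm (Dijkstra’s), run over a street graph where the edge weights are reported crime counts rather than distances. The junctions and crime figures are invented:

    import heapq

    def safest_route(streets, start, destination):
        """Find the route with the least total reported street crime.

        streets maps each junction to a list of (neighbour, crime_count) pairs.
        """
        queue = [(0, start, [start])]
        visited = set()
        while queue:
            crime, here, path = heapq.heappop(queue)
            if here == destination:
                return path
            if here in visited:
                continue
            visited.add(here)
            for neighbour, crimes in streets.get(here, []):
                if neighbour not in visited:
                    heapq.heappush(queue, (crime + crimes, neighbour, path + [neighbour]))
        return None

    streets = {
        "Station": [("Market", 5), ("Park", 1)],
        "Market":  [("Home", 2)],
        "Park":    [("Home", 3)],
    }
    print(safest_route(streets, "Station", "Home"))  # ['Station', 'Park', 'Home']

Notice how the supporting features fall straight out of it: the streets graph is the street map, and the crime counts per street are the crime statistics.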

This splits features into:

  • Headline
  • Supporting

In Pair A, we flesh out half a dozen user stories driven by the headline feature. We work with our customer to storyboard key scenarios for these features, and refine the ideas just enough to give them a sense of whether we’re on the right track – that is, could this solve the problem?

We then come back together with the other pairs and compare our ideas, allowing the customer to decide the way forward. Some solution ideas will fall by the wayside at this point. Some will get merged. Or we might find that none of the ideas is in the ballpark, and go around again.

Once we’ve settled on a potential solution – described as a headline feature and a handful of supporting features – we reform as a team, and we’re in familiar territory now. We assign features to pairs. Each pair works with the customer to drive out the details – e.g., as customer tests and wireframes etc. They deliver in a disciplined way, and as soon as there’s working software the customer can actually try, they give it a whirl. Some teams call this a “Minimum Viable Product”. I call it Prototype #1 – the first of many.

Through user testing, we realise that we have no way of knowing if people got to their destination safely. So the next iteration adds a feature where users “check in” at their destination – Prototype #2.

We increase the size of the user testing group from 100 to 1,000 people, and learn that – while on average they felt safer from crime – some of the recommended walking routes required them to cross some very dangerous roads. We add data on road traffic accidents involving pedestrians for each street – Prototype #3.

With a larger testing group (10,000 people), we’re now gathering enough data to see what the figure is for incidents per 1,000 person-miles, and it’s not as low as we’d hoped. From observing a selected group of suitably incentivised users, we realise that time of day makes quite a difference to some routes. We add that data from the crime statistics, and adapt the search to take time into account – Prototype #4.

And rinse and repeat…

The point is that each release is tested against our primary goal, and each subsequent release tries to move us closer to it by the simplest means possible.

This is the essence of the evolutionary design process described in Tom Gilb’s book Competitive Engineering. When we combine it with technical practices that enable rapid and sustained iteration – with each release being production-ready in case it needs to be (let’s call it “productizing”) – then that, in my experience, is the ultimate form of “requirements engineering”.

I don’t consider features or change requests beyond the next prototype. There’s no backlog. There is a goal. There is a product. And each iteration closes the gap between them.

The team is organised around achieving the goal. Whoever is needed is on the team, and the team works one goal at a time, one iteration at a time, to do what is necessary to achieve that iteration’s goal. Development, UX design, testing, documentation, operations – whatever is required to make the next drop production-ready – are all included, and they all revolve around the end goal.

 

When Are We ‘Done’? – What Iterating Really Means

This week saw a momentous scientific breakthrough, made possible by software. The Event Horizon Telescope – an international project that turned the Earth into a giant telescope – took the first real image of a super-massive black hole in the M87 galaxy, some 55 million light years away.

This story serves to remind me – whenever I need reminding – that the software we write isn’t an end in itself. We set out to achieve goals and to solve problems: even when that goal is to learn a particular technology or try a new technique. (Yes, the point of FizzBuzz isn’t FizzBuzz itself. Somebody already solved that problem!)

The EHT image is the culmination of years of work by hundreds of scientists around the world. The image data itself was captured two years ago, on a super-clear night, coordinated by atomic clocks. Ever since then, the effort has been to interpret and “stitch together” the massive amount of image data to create the photo that broke the Internet this week.

Here’s Caltech computer scientist Katie Bouman, who designed the algorithm that pulled this incredible jigsaw together, explaining the process of photographing M87 last year.

From the news stories I’ve read about this, it sounds like much time was devoted to testing the results to ensure the resulting image had fidelity – and wasn’t just some software “fluke” – until the team had the confidence to release the image to the world.

They weren’t “done” after the code was written (you can read the code on GitHub). They weren’t “done” after the first result was achieved. They were “done” when they were confident they had achieved their goal.

This is a temporary, transient “done”, of course. EHT are done for now. But the work goes on. There are other black holes and celestial objects of interest. They built a camera: ain’t gonna take just the one picture with it, I suspect. And the code base has a dozen active pull requests, so somebody’s still working on it. The technology and the science behind it will be refined and improved, and the next picture will be better. But that’s the next goal.

I encourage teams to organise around achieving goals and solving problems together, working one goal at a time. (If there are two main goals, that’s two teams, as far as I’m concerned.) The team is defined by the goal. And the design process iterates towards that goal.

Iterating is goal-seeking – we’re supposed to be converging on something. When it’s not, then we’re not iterating; we’re just going around in circles. (I call it “orbiting” when teams deliver every week, over and over, but the problem never seems to get solved. The team is orbiting the problem.)

This is one level of team enlightenment above a product focus. Focusing on products tends to produce… well, products. The goal of EHT was not to create a software imaging product. That happened as a side effect of achieving the main goal: to photograph the event horizon of a black hole.

Another really important lesson here is EHT’s definition of “team”: hundreds of people – physicists, astronomers, engineers, computer scientists, software and hardware folk – all part of the same multi-disciplinary team working towards the same goal. I’d be surprised if the software team at MIT referred to the astrophysicists as their “customer”. The “customer” is us – the world, the public, society, civilisation, and the taxpayers who fund science.

That got me to thinking, too: are our “customers” really our customers? Or are they part of the same team as us, defined by a shared end goal or a problem they’re tasked with solving?

Photographing a black hole takes physics, astronomy, optical engineering, mechanical and electrical and electronic engineering, software, computer networks, and a tonne of other stuff.

Delivering – say – print-on-demand birthday cards takes graphic design, copywriting, printing, shipping, and a tonne of other stuff. I genuinely believe we’re not “done” until the right card gets to the right person, and everyone involved in making that happen is part of the team.