What Can We Learn From The Movie Industry About Testing Feedback Loops?

In any complex creative endeavour – and, yes, software development is a creative endeavour – feedback is essential to getting it right (or, at least, less wrong).

The best development approaches tend to be built around feedback loops, and the last few decades of innovation in development practices and processes have largely focused on shrinking those feedback loops so we can learn our way to better software, faster.

When we test our software, that’s a feedback loop, for example. Although far less common these days, there are still teams out there doing it manually. Their testing feedback loops can last days or even weeks. Many teams, though, write fast-running automated tests, and can test the bulk of their code in minutes or even seconds.

What difference does it make if your tests take days instead of seconds?

To illustrate, I’m going to draw a parallel with movie production. Up until the late 1960s, feedback loops in movie making were at best daily. Footage shot on film during the day was processed by a lab and then watched by directors, producers, editors and so on at the end of the day. Hence the movie industry term “dailies”. If a shot didn’t come out right – maybe the performance didn’t fit the context of a previous scene, or there was a technical blunder (the classic “boom microphone in shot” and “character just ran 6 miles but is mysteriously not out of breath” spring to mind) – chances are the production team wouldn’t know until they saw the footage later.

That could well mean going back and reshooting some scenes. That means calling back the actors and the crew, and potentially remounting the whole thing if the sets have already been pulled down. Expensive. Sometimes prohibitively expensive, which is why lower-budget productions had little choice but to keep those shots in their theatrical releases.

In the 1960s, comedy directors like Jerry Lewis and Blake Edwards pioneered the use of Video assist. These were systems that enabled the same footage to be recorded simultaneously on film and on videotape, so actors and directors could watch takes back as soon as they’d been captured, and correct mistakes there and then when the actors, crew, sets and so on were all still there. Way, way cheaper than remounting.

The speed of testing feedback in software development has a similar impact. If I make a change that breaks the code, and my code is tested overnight, say, then I probably won’t know it’s broken until the next day (or the next week, or the next month, or the next year when a user reports the bug).

But I’ve already moved on. The sets have been dismantled, so to speak. To fix a bug long after the fact requires the equivalent of remounting a shoot in movies. Time has to be scheduled, the developer has to wrap their head around that code again, and the bug fix has to go through the whole testing and release process again. Far more expensive. Often orders of magnitude more expensive. Sometimes prohibitively expensive, which is why many teams ship software they know has bugs – they just don’t have the budget to fix them (or, at least, believe they’re not worth fixing).

If my code is tested in seconds, that’s like having Video assist. I can make one change and run the tests. If I broke the code, I’ll know there and then, and can easily fix it while I’m still in the zone.

Just as Video assist helps directors make better movies for less money, fast-running automated tests can help us deliver more reliable software with less effort. This is a measurable effect (indeed, it has been measured), so we know it works.

Stuck In Service-Oriented Hell? You Need Contract Tests

As our system architectures get increasingly distributed, many teams experience the pain of writing code that consumes services through APIs that are changing.

Typically, we don’t know that a non-backwards-compatible change to an external dependency has happened until our own code suddenly stops working. I see many organisations spending a lot of time fixing those problems. I shudder to think how much time and money, as an industry, we’re wasting on it.

Ideally, developers of APIs wouldn’t make changes that break contracts with client code. But this is not an ideal world.

What would be very useful is an early warning system that flags up the fact that a change we’ve made is going to break client code before we release it. As a general rule with bugs, the sooner we know, the cheaper they are to fix.

Contract tests are a good tool for getting that early warning. A contract test is a kind of integration test that focuses specifically on the interaction between our code and an external dependency. When they fail, they pinpoint the source of the problem: namely that something has likely changed at the other end.

There are different ways of writing contract tests, but one of my favourites is to use Abstract Tests. Take this example from my Guitar Shack code:

package com.guitarshack;

import com.guitarshack.net.Network;
import com.guitarshack.net.RESTClient;
import com.guitarshack.net.RequestBuilder;
import com.guitarshack.sales.SalesData;
import org.junit.Test;

import java.util.Calendar;

import static org.junit.Assert.assertTrue;

/*
This Abstract Test allows us to create two versions of the set-up, one with
stubbed JSON and one that actually connects to the web service, effectively
pinpointing whether an error has occurred because of a change to our code
or a change to the external dependency
*/
public abstract class SalesDataTestBase {

    @Test
    public void fetchesSalesData(){
        SalesData salesData = new SalesData(new RESTClient(getNetwork(), new RequestBuilder()), () -> {
            Calendar calendar = Calendar.getInstance();
            calendar.set(2019, Calendar.JANUARY, 31);
            return calendar.getTime();
        });
        int total = salesData.getTotal(811);
        assertTrue(total > 0);
    }

    protected abstract Network getNetwork();
}

The getNetwork() method is left abstract so it can be overridden in subclasses.

One implementation uses a stub implementation of the Network interface that returns hardcoded JSON, so I can unit test most of my SalesData class.

package com.guitarshack.unittests;

import com.guitarshack.net.Network;
import com.guitarshack.SalesDataTestBase;

public class SalesDataUnitTest extends SalesDataTestBase {

    @Override
    protected Network getNetwork() {
        return request -> "{\"productID\":811,\"startDate\":\"7/17/2019\",\"endDate\":\"7/27/2019\",\"total\":31}";
    }
}

Another implementation uses the real implementation of Network, called Web, which connects to a real web service hosted as an AWS Lambda.

package com.guitarshack.contracttests;

import com.guitarshack.*;
import com.guitarshack.net.Network;
import com.guitarshack.net.Web;

/*
If this test fails when the SalesDataUnitTest is still passing, this
indicates a change in the external API
*/
public class SalesDataContractTest extends SalesDataTestBase {

    @Override
    protected Network getNetwork() {
        return new Web();
    }
}

If the contract test suddenly starts failing while the unit test is still passing, that’s a big hint that the problem is at the other end.

To illustrate, I changed the output field ‘total’ to ‘salesTotal’ in the JSON output by the web service. See what happens when I run my tests.

The contract test fails (as does a larger integration test, which wouldn’t pinpoint the source of the problem as effectively), while the unit test version is still passing.

When I change ‘salesTotal’ back to ‘total’, all the tests pass again.

This is very handy for me as the client code developer, writing code that consumes the sales data API. But it would be even more useful for the developer of that API to be able to run my contract tests – perhaps before a release, or as an overnight job – so they could get early warning that their changes have broken the contract.

For teams who are able to access each other’s code (e.g., on GitHub), that’s quite straightforward. I could rig up my Maven project to enable developers to build and run just those tests, for example. Notice that my unit and contract tests are in different packages to make that easy.
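
For example, here’s roughly how that could be wired up with Maven’s Surefire plugin. This is just a sketch of my own – the profile name is invented, and your project layout may differ:

<!-- In pom.xml: a hypothetical profile that runs only the contract tests -->
<profiles>
    <profile>
        <id>contract-tests</id>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <configuration>
                        <!-- Only pick up tests in the contracttests package -->
                        <includes>
                            <include>com/guitarshack/contracttests/*Test.java</include>
                        </includes>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </profile>
</profiles>

The API developers would then run just the contract tests with mvn test -Pcontract-tests.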

For teams who can’t access each other’s repos, it may take a little more ingenuity. But we’re probably used to seeing our code built and tested on other machines (e.g., on cloud CI servers) and getting the results back over the web. It’s not rocket science to offer Contract Testing as a Service. You could then give the API developers exclusive – possibly secured – access to your contract test builds over HTTP.

I’ve seen contract testing – done well – save organisations a lot of blood, sweat and tears. At the very least, it can defend API developers from breaking the First Law of Software Development:

Thou shalt not break shit that was working

Jason Gorman

Don’t Succumb To Illusions Of Productivity

One thing I come across very often is development teams who have adopted processes or practices that they believe are helping them go faster, but that are probably making no difference, or even slowing them down.

The illusion of productivity can be very seductive. When I bash out code without writing tests, or without refactoring, it really feels like I’m getting sh*t done. But when I measure my progress more objectively, it turns out I’m not.

That could be because typing code faster – without all those pesky interruptions – feels like delivering working software faster. But it usually takes longer to get something working when we take less care.

We seem hardwired not to notice how much time we spend fixing stuff later that didn’t need to be broken. We seem hardwired not to notice the team getting bigger and bigger as the bug count and the code smells and the technical debt pile up. We seem hardwired not to notice the merge hell we seem to end up in every week as developers try to get their changes into the trunk.

We just feel like…

Getting sh*t done

Not writing automated tests is one classic example. I mean, of course unit tests slow us down! It’s, like, twice as much code! The reality, though, is that without fast-running regression tests, we usually end up spending most of our time fixing bugs when we could be adding value to the product. The downstream costs typically outweigh the up-front investment in unit tests. Skipping tests is almost always a false economy, even on relatively short projects. I’ve measured myself with and without unit tests on ~1-hour exercises, and I’m slightly faster with them. Typing is not the bottleneck.

Another example is when teams mistakenly believe that working on separate branches of the code will reduce bottlenecks in their delivery pipelines. Again, it feels like we’re getting more done as we hack away in our own isolated sandboxes. But this, too, is an illusion. It doesn’t matter how many lanes the motorway has if every vehicle has to drive on to the same ferry at the end of it. No matter how many parallel dev branches you have, there’s only one branch deployments can be made from, and all those parallel changes have to somehow make it into that branch eventually. And the less often developers merge, the more changes in each merge. And the more changes in each merge, the more conflicts. And, hey presto, merge hell.

Closely related is the desire of many developers to work without “interruptions”. It may feel like sh*t’s getting done when the developers go off into their cubicles, stick their noise-cancelling headphones on, and hunker down on a problem. But you’d be surprised just how much communication and coordination’s required to avoid some serious misunderstandings. I recall working on a team where we ended up with three different architectures and four customer tables in the database, because my colleagues felt that standing around a whiteboard drawing pictures – or, as they called it, “meetings” – was a waste of valuable Getting Sh*t Done time. With just a couple of weeks of corrective work, we were able to save ourselves 20 minutes around a whiteboard. Go us!

I guess my message is simple. In software development, productivity doesn’t look like a blur of fingers hammering out code.

Don’t be fooled by that illusion.

The Jason’s Guitar Shack kata – Part I (Core Logic)

This week, I’ve been coaching developers for an automotive client in Specification By Example (or, as I call it these days, “customer-driven TDD”).

The Codemanship approach to software design and development has always been about solving problems, as opposed to building products or delivering features.

So I cooked up an exercise that starts with a customer with a business problem, and tasked pairs with working with that customer to design a simple system that might solve the problem.

It seems to have gone well, so I thought I’d share the exercise with you for you to try for yourselves.

Jason’s Guitar Shack

I’m a retired international rock legend who has invested his money in a guitar shop. My ex-drummer is my business partner, and he runs the shop, while I’ve been a kind of silent partner. My accountant has drawn my attention to a problem in the business. We have mysterious gaps in the sales of some of our most popular products.

I can illustrate it with some data I pulled off our sales system:

Date       | Time  | Product ID | Quantity | Price Charged
13/07/2019 | 10:47 | 757        | 1        | 549
13/07/2019 | 12:15 | 757        | 1        | 549
13/07/2019 | 17:23 | 811        | 1        | 399
14/07/2019 | 11:45 | 449        | 1        | 769
14/07/2019 | 13:37 | 811        | 1        | 399
14/07/2019 | 15:01 | 811        | 1        | 399
15/07/2019 | 09:26 | 757        | 1        | 549
15/07/2019 | 11:55 | 811        | 1        | 399
16/07/2019 | 10:33 | 374        | 1        | 1199
20/07/2019 | 14:07 | 449        | 1        | 769
22/07/2019 | 11:28 | 449        | 1        | 769
24/07/2019 | 10:17 | 811        | 2        | 798
24/07/2019 | 15:31 | 811        | 1        | 399

Product sales for 4 selected guitar models

Product 811 – the Epiphone Les Paul Classic in worn Cherry Sunburst – is one of our biggest sellers.


We sell one or two of these a day, usually. But if you check out the sales data, you’ll notice that between July 15th and July 24th, we didn’t sell any at all. These gaps appear across many product lines, throughout the year. We could be losing hundreds of thousands of pounds in sales.

After some investigation, I discovered the cause, and it’s very simple: we keep running out of stock.

When we reorder stock from the manufacturer or other supplier, it takes time for them to fulfil our order. Every product has a lead time on delivery to our warehouse, which is recorded in our warehouse system.

Description                                                      | Price (£) | Stock | Rack Space | Manufacturer Delivery Lead Time (days) | Min Order
Fender Player Stratocaster w/ Maple Fretboard in Buttercream     | 549       | 12    | 20         | 14                                     | 10
Fender Deluxe Nashville Telecaster MN in 2 Colour Sunburst       | 769       | 5     | 10         | 21                                     | 5
Ibanez RG652AHMFX-NGB RG Prestige Nebula Green Burst (inc. case) | 1199      | 2     | 5          | 60                                     | 1
Epiphone Les Paul Classic In Worn Heritage Cherry Sunburst       | 399       | 22    | 30         | 14                                     | 20

Product supply lead times for 4 selected guitars

My business partner – the store manager – typically only reorders stock when he realises we’ve run out (usually when a customer asks for it, and he checks to see if we have any). Then we have no stock at all while we wait for the manufacturer to supply more, and during that time we lose a bunch of sales. In this age of the Electric Internet, if we don’t have what the customer wants, they just take out their smartphone and buy it from someone else.

This is the business problem you are tasked with solving: minimise lost sales due to lack of stock.

There are some wrinkles to this problem, of course. We could solve it by cramming our warehouse full of reserve stock. But that would create a cash flow problem for the business, as we have bills to pay while products are gathering dust on our shelves. So the constraint here is, while we don’t want to run out of products, we actually want as few in stock as possible, too.

The second wrinkle we need to deal with is that sales are seasonal. We sell three times as much of some products in December as we do in August, for example. So any solution would need to take that into account to reduce the risk of under- or over-stocking.

So here’s the exercise for a group of 2 or more:

  • Nominate someone in your group as the “customer”. They will decide what works and what doesn’t as far as a solution is concerned.
  • Working with your customer, describe in a single sentence a headline feature – this is a simple feature that solves the problem. (Don’t worry about how it works yet, just describe what it does.)
  • Now, think about how your headline feature would work. Describe up to a maximum of 5 supporting features that would make the headline feature possible. These could be user-facing features, or internal features used by the headline feature. Remember, we’re trying to design the simplest solution possible.
  • For each feature, starting with the headline feature, imagine the scenarios the system would need to handle. Describe each scenario as a simple headline (e.g., “product needs restocking”). Build a high-level test list for each feature.
  • The design and development process now works one feature at a time, starting with the headline feature.
    • For each feature’s scenario, describe in more detail how that scenario will work. Capture the set-up for that scenario, the action or event that triggers the scenario, and the outcomes the customer will expect to happen as a result. Feel free to use the Given…When…Then style. (But remember: it’s not compulsory, and won’t make any difference to the end result.)
    • For each scenario, capture at least one example with test data for every input (every variable in the set-up and every parameter of the action or event), and for every expected output or outcome. Be specific. Use the sample data from our warehouse and sales systems as a starting point, then choose values that fit your scenario.
    • Working one scenario at a time, test-drive the code for its core logic using the examples, writing one unit test for each output or outcome. Organise and name your tests and test fixtures so it’s obvious which feature, which scenario and which output or outcome they refer to. Try as much as possible to choose names that appear in the text you’ve written with your customer. You’re aiming for unit tests that effectively explain the customer’s tests. (There’s a sketch of what such a test might look like after this list.)
    • Use test doubles – stubs and mocks – to abstract external dependencies like the sales and warehouse systems, as well as to Fake It Until You Make it for any supporting logic covered by features you’ll work on later.
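
To give a flavour of that last step, here’s a sketch of what a first unit test for a “product needs restocking” scenario might look like. To be clear, this is my illustration, not the “official” solution – every name in it (Warehouse, ReorderCalculator and so on) is invented, and I’ve included just enough core logic to make it compile and pass:

import org.junit.Test;

import static org.junit.Assert.assertEquals;

// Hypothetical sketch of one scenario's first unit test. All names are
// illustrative, not part of the exercise.
public class ProductNeedsRestockingTest {

    // Minimal stand-in for the warehouse system dependency (a stub)
    interface Warehouse {
        int stockLevel(int productId);
        int leadTimeDays(int productId);
        int minOrder(int productId);
    }

    // Just enough logic to make the sketch complete; your solution will
    // grow richer as you work through the scenarios
    static class ReorderCalculator {
        private final Warehouse warehouse;

        ReorderCalculator(Warehouse warehouse) {
            this.warehouse = warehouse;
        }

        int reorderQuantity(int productId, double averageDailySales) {
            double neededToCoverLeadTime = averageDailySales * warehouse.leadTimeDays(productId);
            int stock = warehouse.stockLevel(productId);
            if (stock >= neededToCoverLeadTime) return 0;
            int shortfall = (int) Math.ceil(neededToCoverLeadTime) - stock;
            return Math.max(warehouse.minOrder(productId), shortfall);
        }
    }

    @Test
    public void reordersWhenStockWontCoverDeliveryLeadTime() {
        // Set-up (Given): product 811 sells about 1 a day, has a 14-day
        // delivery lead time, 10 left in stock and a minimum order of 20
        Warehouse warehouse = new Warehouse() {
            public int stockLevel(int productId) { return 10; }
            public int leadTimeDays(int productId) { return 14; }
            public int minOrder(int productId) { return 20; }
        };
        ReorderCalculator calculator = new ReorderCalculator(warehouse);

        // Action (When): a reorder check is triggered, e.g. after a sale
        int quantity = calculator.reorderQuantity(811, 1.0);

        // Outcome (Then): 10 in stock won't cover 14 days of sales, so we
        // order at least the minimum order quantity
        assertEquals(20, quantity);
    }
}

Notice how the set-up, action and outcome map directly onto the Given…When…Then of the customer’s test.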

And that’s Part I of the exercise. At the end, you should have the core logic of your solution implemented and ready to incorporate into a complete working system.

Here’s a copy of the sample data I’ve been using with my coachees – stick close to it when discussing examples, because this is the data that your system will be consuming in Part II of this kata, which I’ll hopefully write up soon.

Good luck!

Readable Parameterized Tests

Parameterized tests (sometimes called “data-driven tests”) can be a useful technique for removing duplication from test code, as well as potentially buying teams much greater test assurance with surprisingly little extra code.

But they can come at the price of readability. So if we’re going to use them, we need to invest some care in making sure it’s easy to understand what the parameter data means, and to ensure that the messages we get when tests fail are meaningful.

Some testing frameworks make it harder than others, but I’m going to illustrate using some mocha tests in JavaScript.

Consider this test code for a Mars Rover:

it("turns right from N to E", () => {
let rover = {facing: "N"};
rover = go(rover, "R");
assert.equal(rover.facing, "E");
})
it("turns right from E to S", () => {
let rover = {facing: "E"};
rover = go(rover, "R");
assert.equal(rover.facing, "S");
})
it("turns right from S to W", () => {
let rover = {facing: "S"};
rover = go(rover, "R");
assert.equal(rover.facing, "W");
})
it("turns right from W to N", () => {
let rover = {facing: "W"};
rover = go(rover, "R");
assert.equal(rover.facing, "N");
})

These four tests are different examples of the same behaviour, and there’s a lot of duplication (I should know – I copied and pasted them myself!).

We can consolidate them into a single parameterised test:

[{input: "N", expected: "E"}, {input: "E", expected: "S"}, {input: "S", expected: "W"},
{input: "W", expected: "N"}].forEach(
function (testCase) {
it("turns right", () => {
let rover = {facing: testCase.input};
rover = go(rover, "R");
assert.equal(rover.facing, testCase.expected);
})
})

While we’ve removed a fair amount of duplicate test code, arguably this single parameterized test is harder to follow – both at read-time, and at run-time.

Let’s start with the parameter names. Can we make it more obvious what roles these data items play in the test, instead of just using generic names like “input” and “expected”?

[{startsFacing: "N", endsFacing: "E"}, {startsFacing: "E", endsFacing: "S"}, {startsFacing: "S", endsFacing: "W"},
{startsFacing: "W", endsFacing: "N"}].forEach(
function (testCase) {
it("turns right", () => {
let rover = {facing: testCase.startsFacing};
rover = go(rover, "R");
assert.equal(rover.facing, testCase.endsFacing);
})
})

And how about we format the list of test cases so they’re easier to distinguish?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.endsFacing);
        })
    })

And how about we declutter the body of the test a little by destructuring the testCase object?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it("turns right", () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Okay, hopefully this is much easier to follow. But what happens when we run these tests? The test runner just reports “turns right” four times over.

It’s not at all clear which test case is which. So let’s embed some identifying data inside the test name, using a template literal.

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it(`turns right from ${startsFacing} to ${endsFacing}`, () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Now when we run the tests, we can easily identify which test case is which.

With a bit of extra care, it’s possible with most unit testing tools – not all, sadly – to have our cake and eat it with readable parameterized tests.
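
For example, in JUnit 5 the same trick works with a parameterized test’s name template. This is my own sketch – there’s no Java Rover in these posts, so I’ve included a minimal stand-in to make the example self-contained:

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

public class RoverTest {

    // Minimal Java stand-in for the JavaScript rover, purely to make this
    // example compile and run
    static class Rover {
        private static final String COMPASS = "NESW";
        private String facing;

        Rover(String facing) { this.facing = facing; }

        void go(String command) {
            if (command.equals("R")) {
                int index = COMPASS.indexOf(facing);
                facing = String.valueOf(COMPASS.charAt((index + 1) % 4));
            }
        }

        String facing() { return facing; }
    }

    // Each CSV row is one test case; the name template plays the same role
    // as the template literal in the mocha version
    @ParameterizedTest(name = "turns right from {0} to {1}")
    @CsvSource({
            "N, E",
            "E, S",
            "S, W",
            "W, N"
    })
    void turnsRight(String startsFacing, String endsFacing) {
        Rover rover = new Rover(startsFacing);
        rover.go("R");
        Assertions.assertEquals(endsFacing, rover.facing());
    }
}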

Pull Requests & Defensive Programming – It’s All About Trust

A very common thing I see on development teams is reliance on code reviews for every check-in (in this age of Git-Everything, often referred to as “Pull Requests”). This can create bottlenecks in the delivery process, as our peers are pulled away from their own work and we have to wait for their feedback. And, often, the end result is that these reviews are superficial at best, missing a tonne of problems while still holding up delivery.

Pull Request code reviews on a busy team

But why do we do these reviews in the first place?

I think of it in programming terms. Imagine a web service. It has a number of external clients that send it requests via the Web.

Some clients can be trusted, others not

These client apps were not written by us. We have no control over their code, and therefore can’t guarantee that the requests they’re sending will be valid. There’s a need, therefore, to validate these requests before acting on them.

This is what we call defensive programming, and in these situations where we cannot trust the client to call us right, it’s advisable.

Inside our firewall, our web service calls a microservice. Both services are controlled by us – that is, we’re writing both client and server code in this interaction.

Does the microservice need to validate those requests? Not if we can be trusted to write client code that obeys the contract.

In that case, a more appropriate style of programming might be Design By Contract. Clients are trusted to satisfy the preconditions of service calls before they call them: in short, if it ain’t valid, they don’t call, and the supplier doesn’t need to waste time – and code – checking the requests. That’s the client’s job.
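
To make the difference concrete, here’s a little illustration of my own (the class and method names are invented, purely for the example): the same query written defensively for untrusted callers, and contract-style for trusted ones.

// Illustrative sketch only - not from a real codebase. The same operation
// written in a defensive style and in a Design By Contract style.
public class OrderQueries {

    // Defensive programming: we can't trust external clients, so every
    // request is validated before we act on it
    public String findOrderDefensively(String orderId) {
        if (orderId == null || orderId.isEmpty()) {
            throw new IllegalArgumentException("orderId must be provided");
        }
        return lookUp(orderId);
    }

    // Design By Contract: satisfying the precondition is the trusted
    // client's job, so we state it (an assertion that can be switched off
    // in production) instead of writing handling code for it
    public String findOrderByContract(String orderId) {
        assert orderId != null && !orderId.isEmpty() : "precondition: valid orderId";
        return lookUp(orderId);
    }

    private String lookUp(String orderId) {
        return "Order #" + orderId; // stand-in for the real query
    }
}

The contract version has less code to write, test and maintain – but it only pays off if the client really can be trusted.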

Now let’s project these ideas on to code reviews. If a precondition of merging to the main branch is that your code satisfies certain code quality preconditions – test coverage, naming, simplicity etc – then we have two distinct potential situations:

  • The developer checking in can be trusted not to break those preconditions (e.g., they never check in code that doesn’t have passing tests)
  • The developer can’t be trusted not to break them

In an open source code base, we have a situation where potentially anyone can attempt to contribute. The originators of that code base – e.g., Linus – have no control over who tries to push changes to the main branch. So he defends the code base – somewhat over-enthusiastically, perhaps – from bad inputs like our web service defends our system from bad requests.

In a closed-source situation, where the contributors have been chosen and we can exercise some control over who can attempt to check in changes, a different situation may arise. Theoretically, we hired these developers because we believe we can trust them.

I personally do not check in code that doesn’t have good, fast-running, passing automated tests. I personally do not check in spaghetti code (unless it’s for a workshop on refactoring spaghetti code). If we agree what the standards are for our code, I will endeavour not to break them. I may also use tools to help me keep my code clean pre-check-in. I’m the web service talking to the microservice in that diagram. I’m a trusted client.

But not all developers can be trusted not to break the team’s coding standards. And that’s the problem we need to be addressing here. Ask not so much “How do we do Pull Requests?”, but rather “Why do we need to do Pull Requests?” There are some underlying issues about skills and about trust.

Pull Requests are a symptom, not a solution.

Codemanship Code Craft Videos

Over the last 6 months, I’ve been recording hands-on tutorials about code craft – TDD, design principles, refactoring, CI/CD and more – for the Codemanship YouTube channel.

I’ve recorded the same tutorials in JavaScript, Java, C# and (still being finished) Python.

As well as serving as a back-up for the Codemanship Code Craft training course, these series of videos form possibly the most comprehensive free learning resource on the practices of code craft available anywhere.

Each series has over 9 hours of video, plus links to example code and other useful resources.

Codemanship Code Craft videos currently available

I’ve heard from individual developers and teams who’ve been using these videos as the basis for their practice road map. What seems to work best is to watch a video, and then straight away try out the ideas on a practical example (e.g., a TDD kata or a small project) to see how they can work on real code.

In the next few weeks, I’ll be announcing Codemanship Code Craft Study Groups, which will bring groups of like-minded learners together online once a week to watch the videos and pair program on carefully designed exercises with coaching from myself.

This will be an alternative way of receiving our popular training, but with more time dedicated to hands-on practice and coaching, and more time between lessons for the ideas to sink in. It should also be significantly less disruptive than taking a whole team out for 3 days for a regular training course, and significantly less exhausting than 3 full days of Zoom meetings! Plus the price per person will be the same as the regular Code Craft course.

Leading By Example

We’re used to the idea of leaders saying “Do as I say, not as I do”. Politicians, for example, are notorious for their double standards.

But the long-term effect of leaders not being seen to “eat their own dog food” is that it undermines the faith of those being led in their leaders and in their policies.

When we see public servants avoiding tax, we assume “Well, if it’s good enough for them…”, and before you know it you’ve got industrial-scale tax avoidance going on.

When we see government advisors breaking their own lockdown rules, we think “Actually, I quite fancy a trip to Barnard Castle”, and before you know it, lockdown has broken down.

When we see self-proclaimed socialists sending their children to private schools, we think “Obviously, state schools must be a bit crap” and start ordering prospectuses for Eton and Harrow.

And when we see lead developers who don’t follow their own advice, we naturally assume that the advice doesn’t apply to any of us.

If you want your team to write tests first, write your tests first. If you want your team to merge to trunk frequently, merge to trunk frequently. If you want your team to be kind in code reviews, be kind in your code reviews.

As Gandhi once put it: be the change you want to see in the world. Be the developer you want to see on your team.

This means that, among the many qualities that make a good lead developer, the willingness to roll up your sleeves and lead from the front is essential. Teams can see when the rules you impose on them don’t seem to apply to you, and it undermines those rules and your authority. That authority has to be earned.

This is why I’ve made damn sure that every single idea people learn on a Codemanship course – even the more “out there” ideas – is something I’ve applied successfully on real teams, and why I demonstrate rather than just present those ideas whenever possible. You can make any old nonsense seem viable on a PowerPoint slide.

As a trainer and mentor – and mentoring is a large part of leading a development team – I choose to lead by example because, after 3 decades working with developers, I’ve found that to be most effective. Don’t tell them. Show them.

This puts an onus on lead developers to do the legwork when new ideas and unfamiliar technologies need to be explored. If you need to get your legacy COBOL programmers writing unit tests, then it’s time to learn some COBOL and write some COBOL unit tests. This is another kind of leading by example. Getting out of your comfort zone can serve as an example for teams who are maybe just a little too comfortable in theirs.

And this extends beyond programming languages and technical practices. If you believe your team need a better work-life balance, don’t mandate a better work-life balance and then stay at your desk until 8pm every day. Go home at 5:30pm. Show them it’s fine to do that. Show them it’s fine to learn new skills during office hours. Show them it’s fine to switch your phone off and not check your emails when you’re on holiday. Show them that you don’t marginalise people on your team because of, say, their gender or ethnic background. Show them that you don’t act inappropriately at the Christmas party. Show them that you actively consider questions of ethics in your work.

Whatever it is you want the team to be, that’s what you need to be, because there are far too many people saying and not doing in this world.

Slow Tests Kill Businesses

I’m always surprised at how few organisations track some pretty fundamental stats about software development, because if they did then they might notice what’s been killing their business.

It’s a picture I’ve seen many, many times; a software product or system is created, and it goes live. But it has bugs. Many bugs. So, a bigger chunk of the available development time is used up fixing bugs for the second release. Which has even more bugs. Many, many bugs. So an even bigger chunk of the time is used to fix bugs for the third release.

It looks a little like this:

Over the lifetime of the software, the proportion of development time devoted to bug fixing increases until that’s pretty much all the developers are doing. There’s precious little time left for new features.

Naturally, if you can only spare 10% of available dev time for new features, you’re going to need 10 times as many developers. Right? This trend is almost always accompanied by rapid growth of the team.

So the 90% of dev time you’re spending on bug fixing is actually 90% of the time of a team that’s 10x as large – 900% of the cost of your first release, just fixing bugs.

So every new feature ends up in real terms costing 10x in the eighth release what it would have in the first. For most businesses, this rules out change – unless they’re super, super successful (i.e., lucky). It’s just too damned expensive.

And when you can’t change your software and your systems, you can’t change the way you do business at scale. Your business model gets baked in – petrified, if you like. And all you can do is throw an ever-dwindling pot of money at development just to stand still, while you watch your competitors glide past you with innovations you’ll never be able to offer your customers.

What happens to a business like that? Well, they’re no longer in business. Customers defected in greater and greater numbers to competitor products, frustrated by the flakiness of the product and tired of being fobbed off with promises about upgrades and hotly requested features and fixes that never arrived.

Now, this effect is entirely predictable. We’ve known about it for many decades, and we’ve known the causal mechanism, too.

Source: IBM System Science Institute

The longer a bug goes undetected, the more it costs to fix – and that cost grows exponentially. In terms of process, the sooner we test new or changed code, the cheaper the fix is. This effect is so marked that teams actually find that if they speed up testing feedback loops – testing earlier and more often – they deliver working software faster.

This is very simply because they save more time downstream on bug fixes than they invest in earlier and more frequent testing.

The data used in the first two graphs was taken from a team that took more than 24 hours to build and test their code.

Here’s the same stats from a team who could build and test their code in less than 2 minutes (I’ve converted from releases to quarters to roughly match the 12-24 week release cycles of the first team – this second team was actually releasing every week):

This team has nearly doubled in size over the two years, which might sound bad – but it’s more of a rosy picture than the first team, whose costs spiraled to more than 1000% of their first release, most of which was being spent fixing bugs and effectively going round and round in circles chasing their own tails while their customers defected in droves.

I’ve seen this effect repeated in business after business – of all shapes and sizes: software companies, banks, retail chains, law firms, broadcasters, you name it. I’ve watched $billion businesses – some more than a century old – brought down by their inability to change their software and their business-critical systems.

And every time I got down to the root cause, there they were – slow tests.

Every. Single. Time.

Where’s User Experience In Your Development Process?

I ran a little poll through the Codemanship twitter account yesterday, and thought I’d share the result with you.

There are two things that strike me about the results. Firstly, it looks like teams who actively involve user experience experts throughout the design process are very much in the minority. To be honest, this comes as no great surprise. My own observations of development teams over the years tend to see UXD folks getting involved early on – often before any developers are involved, or any customer tests have been discussed – in a kind of Waterfall fashion. “We’re agile. But the user interface design must not change.”

To me, this is as nonsensical as those times when I’ve arrived on a project that has no use cases or customer tests, but somehow magically has a very fleshed-out database schema that we are not allowed to change.

Let’s be clear about this: the purpose of the user experience is to enable the user to achieve their goals. That is a discussion for everybody involved in the design process. It’s also something we’re unlikely to get right first time, so iterating the UXD multiple times with the benefit of end user feedback will almost certainly be necessary.

The most effective teams do not organise themselves into functional silos of requirements analysis, UXD, architecture, programming, security, data management, testing, release, operations & support and so on, throwing some kind of output (a use case, a wireframe, a UML diagram, source code, etc) over the wall to the next function.

The most effective teams organise themselves around achieving a goal. Whoever’s needed to deliver on that should be in the room – especially when those goals are being discussed and agreed.

I could have worded the question in my poll “User Experience Designers: when you explore user goals, how often are the developers involved?” I suspect the results would have been similar. Because it’s the same discussion.

On a movie production, you have people who write scripts, people who say the lines, people who create sets, people who design costumes, and so on. But, whatever their function, they are all telling the same story.

The realisation of working software requires multiple disciplines, and all of them should be serving the story. The best teams recognise this, and involve all of the disciplines early and throughout the process.

But, sadly, this still seems quite rare. I hear lip service being paid, but see little concrete evidence that it’s actually going on.

The second thing I noticed about this poll is that, despite several retweets, the response is actually pretty low compared to previous polls. This, I suspect, also tells a story. I know from both observation and from polls that teams who actively engage with their customers – let alone UXD professionals etc – in their BDD/ATDD process are a small minority (maybe about 20%). Most teams write the “customer tests” themselves, and mistake using a BDD tool like Cucumber for actually doing BDD.

But I also get a distinct sense, working with many dev teams, that UXD just isn’t on their radar. That is somebody else’s problem. This is a major, major miscalculation – every bit as much as believing that quality assurance is somebody else’s problem. Any line of code that doesn’t in some way change the user’s experience – and I use the term “user” in the wider sense that includes, for example, people supporting the software in production, who will have their own user experience – is a line of code that should be deleted. Who is it for? Whose story does it serve?

We are all involved in creating the user experience. Bad special effects can ruin a movie, you know.

We may not all be qualified in UXD, of course. And that’s why the experts need to be involved in the ongoing design process, because UX decisions are being taken throughout development. It only ends when the software ends (and even that process – decommissioning – is a user experience).

Likewise, every decision a UI designer takes will have technical implications, and they may not be the experts in that. Which is why the other disciplines need to be involved from the start. It’s very easy to write a throwaway line in your movie script like “Oh look, it’s Bill, and he’s brought 100,000 giant fighting robots with him”, but writing that line and actually putting 100,000 giant fighting robots on the screen are two very different propositions.

So let’s move on from the days of developers being handed wire-frames and told to “code this up”, and from developers squeezing input validation error messages into random parts of web forms, and bring these – and all the other – disciplines together into what I would call a “development team”.