Inner-Loop Agility (or “Why Your Agile Transformation Failed”)

Over the last couple of decades, I’ve witnessed more than my fair share of “Agile transformations”, and seen most of them produce disappointing results. In this post, I’m going to explain why they failed, and propose a way to beat the trend.

First of all, we should probably ask ourselves: what is an Agile transformation? This might seem like an obvious question, but you’d be surprised just how difficult it is to pin down any kind of accepted definition.

For some, it's a matter of adopting certain processes and practices, like the rituals of Scrum. If we do the rituals, then we're Agile. Right?

Not so fast, buddy!

This is what many call “Cargo Cult Agility”. If we wear the right clothes and make offerings to the right gods, we’ll be Agile.

If we lose the capital “A”, and talk instead about agility, what is the goal of an agile transformation? To enable organisations to change direction quickly, I would argue.

How do we make organisations more responsive to change? The answer lies in that organisation’s feedback loops.

In software development, the most important feedback loop comes from delivering working software and systems to end users. Until our code hits the real world, it’s all guesswork.

So if we can speed up our release cycles to get more feedback sooner, and maintain the pace of those releases for as long as the business needs us to – i.e., for the lifetime of that software – then we can effectively out-learn our competition.

Given how important the release cycle is, then, it’s no surprise that most Agile (with a capital “A”) transformations tend to focus on that feedback loop. But this is a fundamental mistake. The release cycle contains inner loops – wheels within wheels within wheels. If our goal is to speed up this outer feedback loop, we should be focusing most of our attention on the innermost feedback loops.

To understand why, let’s think about how we go about speeding up nested loops in code.

for (Release release : releases) {
    Thread.sleep(10);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(1);
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

Here’s some code that loops through a collection of releases. Each release loops through a list of features, and each feature has a list of scenarios that the system has to handle to implement that feature. For each scenario, it runs a build & test cycle multiple times. It’s a little model of a software development process.

Think of the development process as a set of gears. The largest gear turns the slowest, and drives a smaller, faster gear, which drives an even smaller and faster gear and so on.

In each of the three outer loops, I've built in a delay of 10 ms to approximate the overhead of performing that particular activity (e.g., 10 ms to plan a release), and a 1 ms delay for each build & test cycle.

When I run this code, it takes 1 m 53 s to execute. Our release cycles are slow.

Now, here’s where most Agile transformations go wrong. They focus most of their attention on those outer loops. This produces very modest improvements in release cycle time.

Let’s “optimise” the three outer loops, reducing the delay by 90%.

for (Release release : releases) {
    Thread.sleep(1);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(1);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(1);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(1);
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

When I run this optimised code, it executes in 1 m 44 s. That’s only a 9% improvement in release cycle time, and we had to work on three loops to get it.

This time, let’s ignore those outer loops and just work on the innermost loop – build & test.

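The listing for that version isn't shown, but it would look something like this – a sketch in which only the innermost delay is cut by the same 90%, from 1 ms to roughly 0.1 ms, while the outer loops stay as they were. (Thread.sleep can't pause for less than a millisecond, so the sub-millisecond delay is approximated here with LockSupport.parkNanos, subject to OS timer resolution.)

for (Release release : releases) {
    Thread.sleep(10);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                // build & test overhead reduced by 90%: ~0.1 ms instead of 1 ms
                java.util.concurrent.locks.LockSupport.parkNanos(100_000);
                System.out.println("------BUILD & TEST");
            }
        }
    }
}
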
Now it finished in just 22 seconds. That’s an 81% improvement, just from optimising that innermost loop.

When we look at the output from this code, it becomes obvious why.

RELEASE
--FEATURE
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST

Of course, this is a very simplistic model of a much more complex reality, but the principle works just as well at any scale, and the results I've seen over the years bear it out: to reduce release cycle times, focus your attention on the innermost feedback loops. I call this Inner-Loop Agility.

Think of the micro-iterations of Test-Driven Development, refactoring and Continuous Integration. They all involve one key step – the part where we find out if the software works – which is to build and test it. We test it at every green light in TDD. We test it after every refactoring. We test it before we check in our changes (and afterwards, on a build server to rule out configuration differences with our desktops).

In Agile Software Development, we build and test our code A LOT – many times an hour. And we can only do this if building and testing our code is fast. If it takes an hour, then we can't have Inner-Loop Agility. And if we can't have Inner-Loop Agility, we can't have fast release cycles.

Of course, we could test less often. That always ends well. Here's the thing: the more changes we make to the code before we test it, the more bugs we introduce, and the later we catch them. The later we catch bugs, the more they cost to fix. When we test less often, we tend to end up spending more and more of our cycle time fixing bugs.

It’s not uncommon for teams to end up doing zero-feature releases, where there’s just a bunch of bug fixes and no value-add for the customer in each release.

All too often, the end result of a costly Agile transformation is little more than Agility Theatre. Sure, we do the sprints. We have the stand-ups. We estimate the story points. But it ends up being all work and little useful output in each release. The engine's at maximum revs, but our car's going nowhere.

Basically, the gears of our development process are the wrong way round.

Organisations who optimise their outer feedback loops but neglect the inner loops are operating in a “lower gear”.

There’s no real mystery about why Agile transformations tend to focus most of their attention on the outer feedback loops.

Firstly, the people signing the cheques understand those loops, and can actively engage with them – in the mistaken belief that agility is all about those outer loops.

Secondly, the $billion industry – the "Agile-Industrial Complex" – that trains and mentors organisations during these transformations is largely made up of coaches and consultants who have either a lapsed programming background, or no programming background at all. In a sample of 100 Agile Coach CVs, I found that 70% had no programming background, and a further 20% hadn't programmed for at least a decade. 90% of Agile Coaches can't help you with the innermost feedback loops. Or to put it more bluntly, 90% of Agile Coaches focus on the feedback loops that deliver the least impressive reductions in release cycle time.

Just to be clear, I’m not suggesting these outer feedback loops don’t matter. There’s usually much work to be done at all levels from senior management down to help organisations speed up their cycle times, and to attempt it without management’s blessing is typically folly. Improving build and test cycles requires a very significant investment – in skills, in time, in resource – and that shouldn’t be underestimated.

But to focus almost exclusively on the outer feedback loops produces very modest results, and it’s arguably where Agile transformations have gained their somewhat dismal reputation among business stakeholders and software professionals alike.

Code Craft – The Proof of the Pudding

In extended code craft training, I work with pairs on a multi-session exercise called “Jason’s Guitar Shack”. They envision and implement a simple solution to solve a stock control problem for a fictional musical instrument store, applying code craft disciplines like Specification By Example, TDD and refactoring as they do it.

The most satisfying part for me is that, at the end, there’s a demonstrable pay-off – a moment where we review what they’ve created and see how the code is simple, readable, low in duplication and highly modular, and how it’s all covered by a suite of good – as in, good at catching it when we break the code – and fast-running automated tests.

We don’t explicitly set out to achieve these things. They’re something of a self-fulfilling prophecy because of the way we worked.

Of course all the code is covered by automated tests: we wrote the tests first, and we didn’t write any code that wasn’t required to pass a failing test.

Of course the code is simple: we did the simplest things to pass our failing tests.

Of course the code is easy to understand: we invested time establishing a shared language working directly with our “customer” that subconsciously influenced the names we chose in our code, and we refactored whenever code needed explaining.

Of course the code is low in duplication: we made a point of refactoring to remove duplication when it made sense.

Of course the code is modular: we implemented it from the outside in, solving one problem at a time and stubbing and mocking the interfaces of other modules that solved sub-problems. So all our modules do one job, they hide their internal workings from clients (because, to begin with, there were no internal workings), and they're swappable by dependency injection. Their interfaces were also designed from the client's point of view, because we stubbed and mocked them first so we could test the clients.

Of course our tests fail when the code is broken: we specifically made sure they failed when the result was wrong before we made them pass.

Of course most of our tests run fast: we stubbed and mocked external dependencies like web services as part of our outside-in TDD design process.

All of this leads up to our end goal: the ability to deploy new iterations of the software as rapidly as we need to, for as long as we need to.

With their code in version control, built and tested and potentially deployed automatically when they push their changes to the trunk branch, that process ends up being virtually frictionless.

Each of these pay-offs is established in the final few sessions.

First, after we’ve test-driven all the modules in our core logic and the integration code behind that, we write a single full integration test – wiring all the pieces together. Pairs are often surprised – having never tested them together – that it works first time. I’m not surprised. We test-drove the pieces of the jigsaw from the outside in, explicitly defining their contracts before implementing them. So – hey presto – all the pieces fit.

Then we do code reviews to check if the solution is readable, low in duplication, as simple as we could make it, and that the code is modular. Again, I’m not surprised when we find that the code ticks these boxes, even though we didn’t mindfully set out to do so.

Then we measure the code coverage of the tests – 100% or very near. Again, I'm not surprised, even though that was never the goal. But just because 100% of our code is covered by tests, does that mean it's really being tested? So we perform mutation testing on the code. Again, the coverage is very high. These are test suites that should give us confidence that the code really works.
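
To illustrate what mutation testing adds over line coverage, here's a rough sketch of the kind of check a tool like PIT performs. It deliberately plants a small bug – a "mutant" – and re-runs the tests; if every mutant makes at least one test fail, the suite really is testing the code. (This is illustrative only, a simplified version of the reorder logic, not the tool's actual output.)

// Production logic: reorder when remaining stock falls to or below the reorder level.
private boolean needsReordering(int stock, int quantitySold, int reorderLevel) {
    return stock - quantitySold <= reorderLevel;
}

// A typical mutant (e.g., PIT's "conditionals boundary" mutator) changes <= to <.
// This boundary-case test kills that mutant: it passes against the original code,
// but fails when the mutant is in place.
@Test
public void reorderTriggeredWhenStockFallsExactlyToReorderLevel() {
    assertTrue(needsReordering(11, 1, 10));   // 11 - 1 == 10: exactly on the boundary
}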

The final test is to measure the cycle time from completing a change to seeing it in production. How long does it take to test, commit, push, build & re-test and then deploy changes into the target environment? The answer is minutes. For developers whose experience of this process is that it can take hours, days or even weeks to get code into production, this is a revelation.

It’s also kind of the whole point. Code craft enables rapid and sustained innovation on software and systems (and the business models that rely on them).

Now, I can tell you this in a 3-day intensive training course. But the extended training – where I work with pairs in weekly sessions over 10-12 weeks – is where you actually get to see it for yourself.

If you’d like to talk about extended code craft training for your team, drop me a line.

‘Agility Theatre’ Keeps Us Entertained While Our Business Burns

I train and coach developers and teams in the technical practices of Agile Software Development like Test-Driven Development, Refactoring and Continuous Integration. I’m one of a rare few who exclusively does that. Clients really struggle to find Agile technical coaches these days.

There seems to be no shortage of help on the management practices and the process side of Agile, though. That might be a supply-and-demand problem. A lot of “Agile transitions” seem to focus heavily on those aspects, and the Agile coaching industry has risen to meet that demand with armies of certified bods.

I’ve observed, though, that without effective technical practices, agility eludes those organisations. You can have all the stand-ups and planning meetings and burn-down charts and retrospectives you like, but if your teams are unable to rapidly and sustainably evolve your software, it amounts to little more than Agility Theatre.

Agility Theatre is when you have all the ceremony of Agile Software Development, but none of the underlying technical discipline. It’s a city made of chipboard facades, painted to look like the real thing to the untrained eye from a distance.

In Agile Software Development, there’s one metric that matters: how much does it cost to change our minds? That’s kind of the point. In this rapidly changing, constantly evolving world, the ability to adapt matters. It matters more than executing a plan. Because plans don’t last long in the 21st century.

I’ve watched some pretty big, long-established, hugely successful companies brought down ultimately by their inability to change their software and core systems.

And I’ve measured the difference the technical practices can make to that metric.

Teams who write automated tests after the code they're testing tend to find that the cost of changing their software rises exponentially over its average 8-year lifespan. I know exactly what causes this. Test-after tends to produce a surfeit of tests that hit external dependencies like databases and web services, and test suites that run slow.

If your tests run slow, then you’ll test less often, which means bugs will be caught later, when they’re more expensive to fix.

Teams whose test suites run slow end up spending more and more of their time – and your money – fixing bugs. Until, one day, that’s pretty much all they’re doing.

Teams who write their tests first have a tendency to end up with fast-running test suites. It’s a self-fulfilling prophecy – using unit tests as specifications unsurprisingly produces code that is inherently more unit-testable, as we’re forced to stub and mock those expensive external dependencies.

This means teams that go test-first can test more frequently, catching bugs much sooner, when they’re orders of magnitude cheaper to fix. Teams who go test-first spend a lot less time fixing bugs.

The upshot of all this is that teams who go test-first tend to have a much shallower cost-of-change curve, allowing them to sustain the pace of software evolution for longer. Basically, they outrun the test-after teams.

Now, I’m not going to argue that breaking work down into smaller batch sizes and scheduling deliveries more frequently can’t make a difference. But what I will argue is that if the technical discipline is lacking, all that will do is enable you to observe – in almost real time – the impact of a rising cost of change.

You'll be in a car, focusing on where to go next, while your fuel consumption rises exponentially. You reach a point where the destination doesn't matter, because you ain't going nowhere.

As the cost of change rises, it piles on the risk of building the wrong thing. Trying to get it right first time is antithetical to an evolutionary approach. I've worked with analysts and architects who believed they could predict the value of a feature set, and went to great lengths to specify the Right Thing. In the final reckoning, they were usually out by a country mile. No matter how hard we try to predict the market, ultimately it's all just guesswork until our code hits the real world.

So the ability to change our minds – to learn from the software we deliver and adapt – is crucial. And that all comes down to the cost of change. Over the last 25 years, it’s been the best predictor I’ve personally seen of long-term success or failure of software-dependent businesses. It’s the entropy of tech.

You may be a hugely successful business today – maybe even the leader in your market – but if the cost of changing your code is rising exponentially, all you’re really doing is market research for your more agile competitors.

Agile without Code Craft is not agile at all.

Big Test Set-Ups Don’t Necessarily Point to Design Problems

I was discussing what our test code can tell us about the design of our solutions this morning with a friend. It’s an interesting topic. The received wisdom is that big test set-ups mean that the class or module being tested has too many dependencies and is therefore almost certainly doing too much.

This is often the case, but not always. Let me illustrate with an example. Here’s an integration test for my Guitar Shack solution:

package com.guitarshack.integrationtests;

import com.guitarshack.*;
import com.guitarshack.net.RESTClient;
import com.guitarshack.net.RequestBuilder;
import com.guitarshack.net.Web;
import com.guitarshack.product.ProductData;
import com.guitarshack.sales.SalesData;
import com.guitarshack.sales.ThirtyDayAverageSalesRate;
import org.junit.Test;

import java.util.Calendar;

import static org.mockito.Matchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

/*
    It's a good idea to have at least one test that wires together most or all
    of the implementations of our interfaces to check that we haven't missed anything
*/
public class StockMonitorIntegrationTest {

    @Test
    public void alertShouldBeTriggered(){
        Alert alert = mock(Alert.class);
        StockMonitor monitor = new StockMonitor(
                alert,
                new ProductData(
                        new RESTClient(
                                new Web(),
                                new RequestBuilder())),
                new LeadTimeReorderLevel(
                        new ThirtyDayAverageSalesRate(
                                new SalesData(
                                        new RESTClient(
                                                new Web(),
                                                new RequestBuilder()
                                        ),
                                        () -> {
                                            Calendar calendar = Calendar.getInstance();
                                            calendar.set(2019, Calendar.AUGUST, 1);
                                            return calendar.getTime();
                                        }
                                )
                        )
                )
        );
        monitor.productSold(811, 40);
        verify(alert).send(any());
    }
}

The set-up for this test is pretty big. Does that mean my StockMonitor class has too many dependencies? Let’s take a look.

public class StockMonitor {
    private final Alert alert;
    private final Warehouse warehouse;
    private final ReorderLevel reorderLevel;

    public StockMonitor(Alert alert, Warehouse warehouse, ReorderLevel reorderLevel) {
        this.alert = alert;
        this.warehouse = warehouse;
        this.reorderLevel = reorderLevel;
    }

    public void productSold(int productId, int quantity) {
        Product product = warehouse.fetchProduct(productId);
        if (needsReordering(product, quantity))
            alert.send(product);
    }

    private Boolean needsReordering(Product product, int quantitySold) {
        return product.getStock() - quantitySold <= reorderLevel.calculate(product);
    }
}

That actually looks fine to me. StockMonitor essentially does one job, and collaborates with three other classes in my solution. The rest of the design is hidden behind those interfaces.

In fact, the design is like that all the way through. Each class does only one job. Each class hides its internal workings behind small, client-specific interfaces. Each dependency is swappable by dependency injection. This code is highly modular.

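Those collaborators sit behind small, client-specific interfaces. Their declarations aren't shown in this post, but judging from how StockMonitor uses them they probably look something like this (the int return type on calculate is my assumption):

public interface Warehouse {
    Product fetchProduct(int productId);
}

public interface ReorderLevel {
    int calculate(Product product);
}

Because each has a single method, a test can implement them inline as lambdas, which is exactly what the unit test below does.
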
When we look at the unit test for StockMonitor, we see a much smaller set-up.

public class StockMonitorTest {

    @Test
    public void alertSentWhenProductNeedsReordering() {
        Alert alert = mock(Alert.class);
        ReorderLevel reorderLevel = product1 -> 10;
        Product product = new Product(811, 11, 14);
        Warehouse warehouse = productId -> product;
        StockMonitor monitor = new StockMonitor(alert, warehouse, reorderLevel);
        monitor.productSold(811, 1);
        verify(alert).send(product);
    }
}

The nesting in the set-up for the integration test is a bit of a clue here.

StockMonitor monitor = new StockMonitor(
        alert,
        new ProductData(
                new RESTClient(
                        new Web(),
                        new RequestBuilder())),
        new LeadTimeReorderLevel(
                new ThirtyDayAverageSalesRate(
                        new SalesData(
                                new RESTClient(
                                        new Web(),
                                        new RequestBuilder()
                                ),
                                () -> {
                                    Calendar calendar = Calendar.getInstance();
                                    calendar.set(2019, Calendar.AUGUST, 1);
                                    return calendar.getTime();
                                }
                        )
                )
        )
);

This style of object construction is what I call “Russian dolls”. The objects at the bottom of the call stack are injected into the objects one level up, which are injected into objects another level up, and so on. Each object only sees its direct collaborators, and the lower layers are hidden behind their interfaces.

This is a natural consequence of the way I test-drove my solution: from the outside in, solving one problem at a time and using stubs and mocks as placeholders for sub-solutions.

So the big set-up in my integration test is not a sign of a class that's doing too much, or of a lack of separation of concerns – precisely because it's a "Russian dolls" set-up. If it were a "flat" set-up, where every object is passed in as a direct parameter of StockMonitor's constructor, then that would surely be a sign of StockMonitor doing too much.

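For contrast, here's a sketch of what a "flat" set-up might look like. This is purely hypothetical – it is not how the solution is actually wired – but it shows the smell: every collaborator, from every layer, passed straight into StockMonitor's constructor.

// Hypothetical "flat" wiring, for contrast only - NOT the real StockMonitor,
// whose constructor takes just three collaborators (Alert, Warehouse, ReorderLevel).
Alert alert = mock(Alert.class);
Web web = new Web();
RequestBuilder requestBuilder = new RequestBuilder();
RESTClient restClient = new RESTClient(web, requestBuilder);
ProductData productData = new ProductData(restClient);
SalesData salesData = new SalesData(restClient, () -> Calendar.getInstance().getTime());
ThirtyDayAverageSalesRate salesRate = new ThirtyDayAverageSalesRate(salesData);
LeadTimeReorderLevel reorderLevel = new LeadTimeReorderLevel(salesRate);

StockMonitor monitor = new StockMonitor(
        alert, web, requestBuilder, restClient,
        productData, salesData, salesRate, reorderLevel);

A constructor with eight direct dependencies like that would mean StockMonitor knows about every layer of the solution, which is exactly the problem a flat set-up points to.
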
So, big set-up != lack of modularity in certain cases. What about the other way around? Does a small set-up always mean no problems in the solution design?

Before Christmas I refuctored my Guitar Shack solution to create some practice “legacy code” for students to stretch their refactoring skills on.

public class StockMonitor {
    private final Alert alert;

    public StockMonitor(Alert alert) {
        this.alert = alert;
    }

    public void productSold(int productId, int quantity) {
        String baseURL = "https://6hr1390c1j.execute-api.us-east-2.amazonaws.com/default/product";
        Map<String, Object> params = new HashMap<>() {{
            put("id", productId);
        }};
        String paramString = "?";
        for (String key : params.keySet()) {
            paramString += key + "=" + params.get(key).toString() + "&";
        }
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseURL + paramString))
                .build();
        String result = "";
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpResponse<String> response = null;
        try {
            response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
            result = response.body();
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
        Product product = new Gson().fromJson(result, Product.class);
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(Calendar.getInstance().getTime());
        Date endDate = calendar.getTime();
        calendar.add(Calendar.DATE, -30);
        Date startDate = calendar.getTime();
        DateFormat format = new SimpleDateFormat("M/d/yyyy");
        Map<String, Object> params1 = new HashMap<>() {{
            put("productId", product.getId());
            put("startDate", format.format(startDate));
            put("endDate", format.format(endDate));
            put("action", "total");
        }};
        String paramString1 = "?";
        for (String key : params1.keySet()) {
            paramString1 += key + "=" + params1.get(key).toString() + "&";
        }
        HttpRequest request1 = HttpRequest
                .newBuilder(URI.create("https://gjtvhjg8e9.execute-api.us-east-2.amazonaws.com/default/sales" + paramString1))
                .build();
        String result1 = "";
        HttpClient httpClient1 = HttpClient.newHttpClient();
        HttpResponse<String> response1 = null;
        try {
            response1 = httpClient1.send(request1, HttpResponse.BodyHandlers.ofString());
            result1 = response1.body();
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
        SalesTotal total = new Gson().fromJson(result1, SalesTotal.class);
        if (product.getStock() - quantity <= (int) ((double) (total.getTotal() / 30) * product.getLeadTime()))
            alert.send(product);
    }
}

Yikes!

I think it’s beyond any reasonable doubt that this class does too much. There’s almost no separation of concerns in this design.

Now, I didn’t write any unit tests for this (because “legacy code”), but I do have a command line program I can use the StockMonitor with for manual or shell script testing. Take a look at the set-up.

public class Program {

    private static StockMonitor monitor = new StockMonitor(product -> {
        // We are faking this for now
        System.out.println(
                "You need to reorder product " + product.getId() +
                        ". Only " + product.getStock() + " remaining in stock");
    });

    public static void main(String[] args) {
        int productId = Integer.parseInt(args[0]);
        int quantity = Integer.parseInt(args[1]);
        monitor.productSold(productId, quantity);
    }
}

It’s pretty small. And that’s because StockMonitor‘s dependencies are nearly all hard-wired inside it. Ironically, lack of separation of concerns in this case means a simple interface and a tiny set-up.

So, big set-ups don't always point to a lack of modularity, and small set-ups don't always mean that we have modularity in our design.

Of course, what the big set-up in our integration test does mean is that this test could fail for many reasons, in many layers of our call stack. So if all our tests have big set-ups, that in itself could spell trouble.

Explore the Guitar Shack source code

The Jason’s Guitar Shack kata – Part I (Core Logic)

This week, I’ve been coaching developers for an automotive client in Specification By Example (or, as I call it these days, “customer-driven TDD”).

The Codemanship approach to software design and development has always been about solving problems, as opposed to building products or delivering features.

So I cooked up an exercise that starts with a customer with a business problem, and tasked pairs to work with that customer to design a simple system that might solve the problem.

It seems to have gone well, so I thought I’d share the exercise with you for you to try for yourselves.

Jason’s Guitar Shack

I’m a retired international rock legend who has invested his money in a guitar shop. My ex-drummer is my business partner, and he runs the shop, while I’ve been a kind of silent partner. My accountant has drawn my attention to a problem in the business. We have mysterious gaps in the sales of some of our most popular products.

I can illustrate it with some data I pulled off our sales system:

Date        Time   Product ID  Quantity  Price Charged (£)
13/07/2019  10:47  757         1         549
13/07/2019  12:15  757         1         549
13/07/2019  17:23  811         1         399
14/07/2019  11:45  449         1         769
14/07/2019  13:37  811         1         399
14/07/2019  15:01  811         1         399
15/07/2019  09:26  757         1         549
15/07/2019  11:55  811         1         399
16/07/2019  10:33  374         1         1199
20/07/2019  14:07  449         1         769
22/07/2019  11:28  449         1         769
24/07/2019  10:17  811         2         798
24/07/2019  15:31  811         1         399
Product sales for 4 selected guitar models

Product 811 – the Epiphone Les Paul Classic in worn Cherry Sunburst – is one of our biggest sellers.


We sell one or two of these a day, usually. But if you check out the sales data, you’ll notice that between July 15th and July 24th, we didn’t sell any at all. These gaps appear across many product lines, throughout the year. We could be losing hundreds of thousands of pounds in sales.

After some investigation, I discovered the cause, and it’s very simple: we keep running out of stock.

When we reorder stock from the manufacturer or other supplier, it takes time for them to fulfil our order. Every product has a lead time on delivery to our warehouse, which is recorded in our warehouse system.

Description                                                       Price (£)  Stock  Rack Space  Manufacturer Delivery Lead Time (days)  Min Order
Fender Player Stratocaster w/ Maple Fretboard in Buttercream     549        12     20          14                                       10
Fender Deluxe Nashville Telecaster MN in 2 Colour Sunburst        769        5      10          21                                       5
Ibanez RG652AHMFX-NGB RG Prestige Nebula Green Burst (inc. case)  1199       2      5           60                                       1
Epiphone Les Paul Classic In Worn Heritage Cherry Sunburst        399        22     30          14                                       20
Product supply lead times for 4 selected guitars

My business partner – the store manager – typically only reorders stock when he realises we’ve run out (usually when a customer asks for it, and he checks to see if we have any). Then we have no stock at all while we wait for the manufacturer to supply more, and during that time we lose a bunch of sales. In this age of the Electric Internet, if we don’t have what the customer wants, they just take out their smartphone and buy it from someone else.

This is the business problem you are tasked with solving: minimise lost sales due to lack of stock.

There are some wrinkles to this problem, of course. We could solve it by cramming our warehouse full of reserve stock. But that would create a cash flow problem for the business, as we have bills to pay while products are gathering dust on our shelves. So the constraint here is, while we don’t want to run out of products, we actually want as few in stock as possible, too.

The second wrinkle we need to deal with is that sales are seasonal. We sell three times as much of some products in December as we do in August, for example. So any solution would need to take that into account to reduce the risk of under- or over-stocking.

So here’s the exercise for a group of 2 or more:

  • Nominate someone in your group as the “customer”. They will decide what works and what doesn’t as far as a solution is concerned.
  • Working with your customer, describe in a single sentence a headline feature – this is a simple feature that solves the problem. (Don’t worry about how it works yet, just describe what it does.)
  • Now, think about how your headline feature would work. Describe up to a maximum of 5 supporting features that would make the headline feature possible. These could be user-facing features, or internal features used by the headline feature. Remember, we’re trying to design the simplest solution possible.
  • For each feature, starting with the headline feature, imagine the scenarios the system would need to handle. Describe each scenario as a simple headline (e.g., “product needs restocking”). Build a high-level test list for each feature.
  • The design and development process now works one feature at a time, starting with the headline feature.
    • For each feature’s scenario, describe in more detail how that scenario will work. Capture the set-up for that scenario, the action or event that triggers the scenario, and the outcomes the customer will expect to happen as a result. Feel free to use the Given…When…Then style. (But remember: it’s not compulsory, and won’t make any difference to the end result.)
    • For each scenario, capture at least one example with test data for every input (every variable in the set-up and every parameter of the action or event), and for every expected output or outcome. Be specific. Use the sample data from our warehouse and sales systems as a starting point, then choose values that fit your scenario.
    • Working one scenario at a time, test-drive the code for its core logic using the examples, writing one unit test for each output or outcome. Organise and name your tests and test fixture so it’s obvious which feature, which scenario and which output or outcome they are talking about. Try as much as possible to choose names that appear in the text you’ve written with your customer. You’re aiming for unit tests that effectively explain the customer’s tests. (There's a sketch of one such test just after this list.)
    • Use test doubles – stubs and mocks – to abstract external dependencies like the sales and warehouse systems, as well as to Fake It Until You Make it for any supporting logic covered by features you’ll work on later.

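To give a flavour of what those last few steps produce, here's a sketch of one scenario captured as a unit test, using example data from the tables above. The interfaces (Alert, Warehouse, ReorderLevel) and the Product constructor are borrowed from my own Guitar Shack solution, and the argument order is an assumption; your design will almost certainly differ.

// Scenario: product does not need restocking
// Given product 811 (Epiphone Les Paul Classic) with 22 in stock and a 14-day lead time,
// and a sales rate of roughly 1 per day (so a reorder level of 14),
// when 1 is sold,
// then no reorder alert is sent, because 21 remaining is still above the reorder level.
@Test
public void noAlertWhenStockStaysAboveReorderLevel() {
    Alert alert = mock(Alert.class);
    ReorderLevel reorderLevel = product -> 14;      // stubbed: 14-day lead time x 1 sale per day
    Product product = new Product(811, 22, 14);     // assumed order: id, stock, lead time
    Warehouse warehouse = productId -> product;     // stubbed warehouse
    StockMonitor monitor = new StockMonitor(alert, warehouse, reorderLevel);

    monitor.productSold(811, 1);

    verify(alert, never()).send(any());
}
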
And that’s Part I of the exercise. At the end, you should have the core logic of your solution implemented and ready to incorporate into a complete working system.

Here’s a copy of the sample data I’ve been using with my coachees – stick close to it when discussing examples, because this is the data that your system will be consuming in Part II of this kata, which I’ll hopefully write up soon.

Good luck!

Readable Parameterized Tests

Parameterized tests (sometimes called “data-driven tests”) can be a useful technique for removing duplication from test code, as well as potentially buying teams much greater test assurance with surprisingly little extra code.

But they can come at the price of readability. So if we’re going to use them, we need to invest some care in making sure it’s easy to understand what the parameter data means, and to ensure that the messages we get when tests fail are meaningful.

Some testing frameworks make it harder than others, but I’m going to illustrate using some mocha tests in JavaScript.

Consider this test code for a Mars Rover:

it("turns right from N to E", () => {
    let rover = {facing: "N"};
    rover = go(rover, "R");
    assert.equal(rover.facing, "E");
})

it("turns right from E to S", () => {
    let rover = {facing: "E"};
    rover = go(rover, "R");
    assert.equal(rover.facing, "S");
})

it("turns right from S to W", () => {
    let rover = {facing: "S"};
    rover = go(rover, "R");
    assert.equal(rover.facing, "W");
})

it("turns right from W to N", () => {
    let rover = {facing: "W"};
    rover = go(rover, "R");
    assert.equal(rover.facing, "N");
})

These four tests are different examples of the same behaviour, and there’s a lot of duplication (I should know – I copied and pasted them myself!)

We can consolidate them into a single parameterized test:

[{input: "N", expected: "E"}, {input: "E", expected: "S"}, {input: "S", expected: "W"},
 {input: "W", expected: "N"}].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.input};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.expected);
        })
    })

While we’ve removed a fair amount of duplicate test code, arguably this single parameterized test is harder to follow – both at read-time, and at run-time.

Let’s start with the parameter names. Can we make it more obvious what roles these data items play in the test, instead of just using generic names like “input” and “expected”?

[{startsFacing: "N", endsFacing: "E"}, {startsFacing: "E", endsFacing: "S"}, {startsFacing: "S", endsFacing: "W"},
 {startsFacing: "W", endsFacing: "N"}].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.endsFacing);
        })
    })

And how about we format the list of test cases so they’re easier to distinguish?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.endsFacing);
        })
    })

And how about we declutter the body of the test a little by destructuring the testCase object?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it("turns right", () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Okay, hopefully this is much easier to follow. But what happens when we run these tests?

It’s not at all clear which test case is which. So let’s embed some identifying data inside the test name.

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it(`turns right from ${startsFacing} to ${endsFacing}`, () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Now when we run the tests, we can easily identify which test case is which.

With a bit of extra care, it’s possible with most unit testing tools – not all, sadly – to have our cake and eat it with readable parameterized tests.
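
In Java, for example, JUnit 5's parameterized tests support the same trick of embedding the test data in the test name. Here's a rough sketch only – the Rover class and its go method are hypothetical stand-ins for a Java version of the code above:

// Requires JUnit 5 (org.junit.jupiter.params) and a static import of Assertions.assertEquals.
@ParameterizedTest(name = "turns right from {0} to {1}")
@CsvSource({"N,E", "E,S", "S,W", "W,N"})
void turnsRight(String startsFacing, String endsFacing) {
    Rover rover = new Rover(startsFacing);   // hypothetical Java equivalent of the JS rover
    rover.go("R");
    assertEquals(endsFacing, rover.facing());
}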

Codemanship Code Craft Videos

Over the last 6 months, I’ve been recording hands-on tutorials about code craft – TDD, design principles, refactoring, CI/CD and more – for the Codemanship YouTube channel.

I’ve recorded the same tutorials in JavaScript, Java, C# and (still being finished) Python.

As well as serving as a back-up for the Codemanship Code Craft training course, these video series form possibly the most comprehensive free learning resource on the practices of code craft available anywhere.

Each series has over 9 hours of video, plus links to example code and other useful resources.

Codemanship Code Craft videos currently available

I’ve heard from individual developers and teams who’ve been using these videos as the basis for their practice road map. What seems to work best is to watch a video, and then straight away try out the ideas on a practical example (e.g., a TDD kata or a small project) to see how they can work on real code.

In the next few weeks, I’ll be announcing Codemanship Code Craft Study Groups, which will bring groups of like-minded learners together online once a week to watch the videos and pair program on carefully designed exercises with coaching from myself.

This will be an alternative way of receiving our popular training, but with more time dedicated to hands-on practice and coaching, and more time between lessons for the ideas to sink in. It should also be significantly less disruptive than taking a whole team out for 3 days for a regular training course, and significantly less exhausting than 3 full days of Zoom meetings! Plus the price per person will be the same as the regular Code Craft course.

Slow Tests Kill Businesses

I’m always surprised at how few organisations track some pretty fundamental stats about software development, because if they did then they might notice what’s been killing their business.

It’s a picture I’ve seen many, many times; a software product or system is created, and it goes live. But it has bugs. Many bugs. So, a bigger chunk of the available development time is used up fixing bugs for the second release. Which has even more bugs. Many, many bugs. So an even bigger chunk of the time is used to fix bugs for the third release.

It looks a little like this:

Over the lifetime of the software, the proportion of development time devoted to bug fixing increases until that’s pretty much all the developers are doing. There’s precious little time left for new features.

Naturally, if you can only spare 10% of available dev time for new features, you’re going to need 10 times as many developers. Right? This trend is almost always accompanied by rapid growth of the team.

So the 90% of dev time you’re spending on bug fixing is actually 90% of the time of a team that’s 10x as large – 900% of the cost of your first release, just fixing bugs.

So every new feature ends up in real terms costing 10x in the eighth release what it would have in the first. For most businesses, this rules out change – unless they’re super, super successful (i.e., lucky). It’s just too damned expensive.

And when you can’t change your software and your systems, you can’t change the way you do business at scale. Your business model gets baked in – petrified, if you like. And all you can do is throw an ever-dwindling pot of money at development just to stand still, while you watch your competitors glide past you with innovations you’ll never be able to offer your customers.

What happens to a business like that? Well, they’re no longer in business. Customers defected in greater and greater numbers to competitor products, frustrated by the flakiness of the product and tired of being fobbed off with promises about upgrades and hotly requested features and fixes that never arrived.

Now, this effect is entirely predictable. We’ve known about it for many decades, and we’ve known the causal mechanism, too.

Source: IBM System Science Institute

The longer a bug goes undetected, the more it costs to fix – and that cost grows exponentially. In terms of process, the sooner we test new or changed code, the cheaper the fix is. This effect is so marked that teams find that when they speed up their testing feedback loops – testing earlier and more often – they actually deliver working software faster.

This is very simply because they save more time downstream on bug fixes than they invest in earlier and more frequent testing.

The data used in the first two graphs was taken from a team that took more than 24 hours to build and test their code.

Here are the same stats from a team who could build and test their code in less than 2 minutes (I've converted from releases to quarters to roughly match the 12-24 week release cycles of the first team – this second team was actually releasing every week):

This team has nearly doubled in size over the two years, which might sound bad – but it's a much rosier picture than the first team's, whose costs spiralled to more than 1000% of their first release, most of which was being spent fixing bugs and effectively going round and round in circles chasing their own tails while their customers defected in droves.

I’ve seen this effect repeated in business after business – of all shapes and sizes: software companies, banks, retail chains, law firms, broadcasters, you name it. I’ve watched $billion businesses – some more than a century old – brought down by their inability to change their software and their business-critical systems.

And every time I got down to the root cause, there they were – slow tests.

Every. Single. Time.

Is Your Agile Transformation Just ‘Agility Theatre’?

I’ve talked before about what I consider to be the two most important feedback loops in software development.

When I explain the feedback loops – the “gears” – of Test-Driven Development, I go to great pains to highlight which of those gears matter most, in terms of affecting our odds of success.


Customer or business goals drive the whole machine of delivery – or at least, they should. We are not done because we passed some acceptance tests, or because a feature is in production. We’re only done when we’ve solved the customer’s problem.

That’s very likely going to require more than one go-around. Which is why the second most important feedback loop is the one that establishes if we’re good to go for the next release.

The ability to establish quickly and effectively if the changes we made to the software have broken it is critical to our ability to release it. Teams who rely on manual regression testing can take weeks to establish this, and their release cycles are inevitably very slow. Teams who rely mostly on automated system and integration tests have faster release cycles, but still usually far too slow for them to claim to be “agile”. Teams who can re-test most of the code in under a minute are able to release as often as the customer wants – many times a day, if need be.

The speed of regression testing – of establishing if our software still works – dictates whether our release cycles span months, weeks, or hours. It determines the metabolism of our delivery cycle and ultimately how many throws of the dice we get at solving the customer’s problem.

It’s as simple as that: faster tests = more throws of the dice.

If the essence of agility is responding to change, then I conclude that fast-running automated tests lie at the heart of that.

What’s odd is how so many “Agile transformations” seem to focus on everything but that. User stories don’t make you responsive to change. Daily stand-ups don’t make you responsive to change. Burn-down charts don’t make you responsive to change. Kanban boards don’t make you responsive to change. Pair programming doesn’t make you responsive to change.

It's all just Agility Theatre if you're not addressing the two most fundamental feedback loops – and the majority of organisations simply don't. Their definition of done is "It's in production", as they work their way through a list of features instead of trying to solve a real business problem. And they all too often under-invest in the skills and the time needed to wrap software in good, fast-running tests, seeing that as less important than the index cards and the Post-It notes and the Jira tickets.

I talk often with managers tasked with “Agilifying” legacy IT (e.g., mainframe COBOL systems). This means speeding up feedback cycles, which means speeding up delivery cycles, which means speeding up build pipelines, which – 99.9% of the time – means speeding up testing.

After version control, it’s #2 on my list of How To Be More Agile. And, very importantly, it works. But then, we shouldn’t be surprised that it does. Maths and nature teach us that it should. How fast do bacteria or fruit flies evolve – with very rapid “release cycles” of new generations – vs elephants or whales, whose evolutionary feedback cycles take decades?

There are two kinds of Agile consultant: those who’ll teach you Agility Theatre, and those who’ll shrink your feedback cycles. Non-programmers can’t help you with the latter, because the speed of the delivery cycle is largely determined by test execution time. Speeding up tests requires programming, as well as knowledge and experience of designing software for testability.

70% of Agile coaches are non-programmers. A further 20% are ex-programmers who haven’t touched code for over a decade. (According to the hundreds of CVs I’ve seen.) That suggests that 90% of Agile coaches are teaching Agility Theatre, and maybe 10% are actually helping teams speed up their feedback cycles in any practical sense.

It also strongly suggests that most Agile transformations have a major imbalance; investing heavily in the theatre, but little if anything in speeding up delivery cycles.

Introduction to Test-Driven Development Video Series

Over the last month, I’ve been recording screen casts introducing folks to the key ideas in TDD.

Each series covers 6 topics over 7 videos, with practical demonstrations instead of slides – just the way I like it.

They’re available in the four most used programming languages today:

Of course, like riding a bike, you won’t master TDD just by watching videos. You can only learn TDD by doing it.

On our 2-day TDD training workshop, you’ll get practical, hands-on experience applying these ideas with real-time guidance from me.