‘Agility Theatre’ Keeps Us Entertained While Our Business Burns

I train and coach developers and teams in the technical practices of Agile Software Development like Test-Driven Development, Refactoring and Continuous Integration. I’m one of a rare few who exclusively does that. Clients really struggle to find Agile technical coaches these days.

There seems to be no shortage of help on the management practices and the process side of Agile, though. That might be a supply-and-demand problem. A lot of “Agile transitions” seem to focus heavily on those aspects, and the Agile coaching industry has risen to meet that demand with armies of certified bods.

I’ve observed, though, that without effective technical practices, agility eludes those organisations. You can have all the stand-ups and planning meetings and burn-down charts and retrospectives you like, but if your teams are unable to rapidly and sustainably evolve your software, it amounts to little more than Agility Theatre.

Agility Theatre is when you have all the ceremony of Agile Software Development, but none of the underlying technical discipline. It’s a city made of chipboard facades, painted to look like the real thing to the untrained eye from a distance.

In Agile Software Development, there’s one metric that matters: how much does it cost to change our minds? That’s kind of the point. In this rapidly changing, constantly evolving world, the ability to adapt matters. It matters more than executing a plan. Because plans don’t last long in the 21st century.

I’ve watched some pretty big, long-established, hugely successful companies brought down ultimately by their inability to change their software and core systems.

And I’ve measured the difference the technical practices can make to that metric.

Teams who write automated tests after the code being tested tend to find that the cost of changing their software rises exponentially over the average lifespan of 8 years. I know exactly what causes this. Test-after tends to produce a surfeit of tests that hit external dependencies like databases and web services, and test suites that run slow.

If your tests run slow, then you’ll test less often, which means bugs will be caught later, when they’re more expensive to fix.

Teams whose test suites run slow end up spending more and more of their time – and your money – fixing bugs. Until, one day, that’s pretty much all they’re doing.

Teams who write their tests first have a tendency to end up with fast-running test suites. It’s a self-fulfilling prophecy – using unit tests as specifications unsurprisingly produces code that is inherently more unit-testable, as we’re forced to stub and mock those expensive external dependencies.

This means teams that go test-first can test more frequently, catching bugs much sooner, when they’re orders of magnitude cheaper to fix. Teams who go test-first spend a lot less time fixing bugs.

The upshot of all this is that teams who go test-first tend to have a much shallower cost-of-change curve, allowing them to sustain the pace of software evolution for longer. Basically, they outrun the test-after teams.

Now, I’m not going to argue that breaking work down into smaller batch sizes and scheduling deliveries more frequently can’t make a difference. But what I will argue is that if the technical discipline is lacking, all that will do is enable you to observe – in almost real time – the impact of a rising cost of change.

You’ll be in a car, focusing on where to go next, while your fuel consumption rises exponentially. You reach a point where the destination doesn’t matter, because you ain’t going nowhere.

As the cost of changes rises, it piles on the risk of building the wrong thing. Trying to get it right first time is antithetical to an evolutionary approach. I’ve worked with analysts and architects who believed they could predict the value of a feature set, and went to great lengths to specify the Right Thing. In the final reckoning, they were usually out by a country mile. No matter how hard we try to predict the market, ultimately it’s all just guesswork until our code hits the real world.

So the ability to change our minds – to learn from the software we deliver and adapt – is crucial. And that all comes down to the cost of change. Over the last 25 years, it’s been the best predictor I’ve personally seen of long-term success or failure of software-dependent businesses. It’s the entropy of tech.

You may be a hugely successful business today – maybe even the leader in your market – but if the cost of changing your code is rising exponentially, all you’re really doing is market research for your more agile competitors.

Agile without Code Craft is not agile at all.

The Trouble With Objects

A perennial debate that I enjoy wading into is the classic “Should it be kick(ball) or ball.kick()?”

This seems to reveal a fundamental dichotomy in our shared understanding of Object-Oriented Programming.

It’s a trick question, of course. If the effect is the same – the displacement of the ball – then kick(ball) and ball.kick() mean exactly the same thing. But the debate rages around who is doing the kicking and who is being kicked.

Many programmers quite naturally assign agency to objects, and object (pun intended) to the ball kicking itself. Balls don’t kick themselves! They will often counter with “It should be player.kick(ball)“.

But this can lead us down the rabbit hole to distinctly non-OO code. Taking an example from a Codemanship training course about an online CD warehouse, the same question comes up about whether it should be cd.buy() or warehouse.buy(cd).

Again, the protestation is that “CDs don’t buy themselves!” I can completely understand why students might think this, having had it drummed into us that objects do work. (Although why nobody ever objects that “Warehouses don’t buy CDs!” is one of life’s little mysteries.)

I’m the first person to say that object design should start with the work. Then we figure out what data is required to do that work. Put the data with the work. And, hey presto, you got an object. Assign one job to each object, and get them talking to each other to coordinate bigger jobs, and – hey presto! – you got OOP.

(The art of OOP is really in deciding where to put the work, and that’s what this debate is essentially all about.)

But warehouse.buy(cd) – in the training exercise we do – can lead us into deep water regarding encapsulation. We are told that the effect of buying a CD is that the stock count of that CD goes down, and that the customer’s credit card is charged the price of that CD.

So our test looks a bit like this:

public class WarehouseTest {
    @Test
    public void buyCd(){
        CD cd = new CD(10, 9.99);
        CreditCard card = mock(CreditCard.class);
        Warehouse warehouse = new Warehouse();
        warehouse.buy(cd, 1, card);
        assertEquals(9, cd.getStock());
        verify(card).charge(9.99);
    }
}

The implementation that passes this test suffers from a distinct case of Feature Envy between Warehouse and CD, because buying a CD requires access to a CD’s stock and price.

public class Warehouse {
    public void buy(CD cd, int quantity, CreditCard card) {
        card.charge(cd.getPrice() * quantity);
        cd.setStock(cd.getStock() - quantity);
    }
}

When we refactor this code to eliminate the Feature Envy (i.e., to encapsulate the work)…

…we end up with a CD that – shock, horror! – buys itself!

public class CD {
    private int stock;
    private final double price;

    public CD(int stock, double price) {
        this.stock = stock;
        this.price = price;
    }

    public int getStock() {
        return stock;
    }

    public void buy(int quantity, CreditCard card) {
        card.charge(price * quantity);
        stock = stock - quantity;
    }
}

This refactoring is typically followed by “But… but….”. Placing this behaviour inside the CD class conflicts with our mental model of the world. CDs don’t buy themselves!

And yet we encounter objects apparently doing things to themselves in OO libraries all the time: lists that filter themselves, database connections that open themselves, files that read themselves.

And that’s what’s meant by “object-oriented”. The CD is the thing being bought. It’s the object of the buy action. In OOP, we put the object first, followed by the action. Read cd.buy() not as “the CD buys” but as “buy this CD”.

Millions of people around the world read OO code the wrong way around. The ones who tend to grok that it’s object-oriented are those of us who’ve had to approximate OOP in non-OO languages – particularly C. (Check out previous posts about encapsulating in C and applying SOLID principles to C code.)

Without the benefit of an OO syntax, we resort to defining all the functions that apply to a type of data structure in one place, and the first parameter to every function is a pointer to an instance of that structure, usually named this.

Then we might hide the data definition of the structure – just declaring its type in our .h file – in the same .c implementation file, so only those functions can access the data. Then we might define a table of virtual functions – a “v-table” – that can be applied to that data structure, and attach the data structure to them so that clients can invoke functions on instances of the data structure. Is this all starting to sound familiar?

The set of operations defined by a class are the operations that can be applied to that object.

In reality, objects don’t do work. The CPU does. The object identifies the thing – the record in memory – to or with which the work is to be done. That’s literally how object-oriented programming works. cd.buy() means “apply the buy() function to this CD”. list.filter() means “filter this list”. file.read() means “read this file”.

The idea of objects doing work, and passing messages to each other to coordinate larger pieces of work – “collaborations” – is a metaphor. And it works just fine once you let go of the idea that balls don’t kick themselves.

But words are powerful things, and in programming especially, they can get tangled in our mental models of how the problem domain works in the real world. In the real world, only life has agency (well, maybe). Most things are acted upon. So we have a natural tendency to separate agency from data, and this leads us to oodles and oodles of Feature Envy.

I learned to read object-oriented code as “do that to this” a long time ago, and it therefore has no conflict with my mental model of the world. The CD isn’t buying. The CD is bought.



I’ve been very much enjoying the ensuing furore that suggesting ball.kick() means “kick the ball” inevitably starts. The fun part is reading the “better” designs folk come up with to avoid accepting that.

player.kick(ball) is one of the most popular. Note that we now have two classes instead of one to achieve the same outcome.

Likewise, cd.buy() seems to have offended the design senses of some. It should be cart.add(cd), they say. Again, we now have two classes involved, and also the CD didn’t actually get bought yet. And it also kind of proves my point, because the CD is being added to the cart.

On a more general note, when students go down the warehouse.buy(cd) route, I ask them why the warehouse needs to be involved if we know which CD we’re buying.

object.action() tends to simplify things.

How To (And Not To) Use Interfaces

If you’re writing code in a language that supports the concept of interfaces – or variants on the theme of pure abstract types with no implementation – then I can think of several good reasons for using them.


There are often times when our software needs the ability to perform the same task in a variety of ways. Take, for example, calculating the area of a room. This code generates quotes for fitted carpets based on room area.

double quote(double pricePerSqMtr, Room room) {
    double area = room.area();
    return pricePerSqMtr * Math.ceil(area);
}

Rooms can have different shapes. Some are rectangular, so the area is the width multiplied by the length. Some are even circular, where the area is π r².

We could have a big switch statement that does a different calculation for each room shape, but every time we want to add new shapes to the software, we have to go back and modify it. That’s not very extensible. Ideally, we’d like to be able to add new room shapes without changing our lovely tested existing code.
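For illustration, here’s a minimal sketch of that switch-based version – the Shape enum and ShapedRoom class are made-up names for this post, not part of the training exercise:

```java
// A sketch of the non-extensible, switch-based design described above.
enum Shape { RECTANGULAR, CIRCULAR }

class ShapedRoom {
    Shape shape;
    double width, length, radius;

    double area() {
        // Every new kind of room means reopening and editing this method
        switch (shape) {
            case RECTANGULAR: return width * length;
            case CIRCULAR: return Math.PI * radius * radius;
            default: throw new IllegalArgumentException("Unknown shape: " + shape);
        }
    }
}
```

Adding, say, an L-shaped room means editing area() again – and re-testing everything that depends on it.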

If we define an interface for calculating the area, then we can easily have multiple implementations that our client code binds to dynamically.

public interface Room {
    double area();
}

public class RectangularRoom implements Room {
    private final double width;
    private final double length;

    public RectangularRoom(double width, double length) {
        this.width = width;
        this.length = length;
    }

    public double area() {
        return width * length;
    }
}

public class CircularRoom implements Room {
    private final double radius;

    public CircularRoom(double radius) {
        this.radius = radius;
    }

    public double area() {
        return Math.PI * Math.pow(radius, 2);
    }
}

Hiding Things

Consider a class that has multiple features for various purposes (e.g., for testing, or for display).

public class Movie {
    private final String title;
    private int availableCopies = 1;
    private List<Member> onLoanTo = new ArrayList<>();

    public Movie(String title){
        this.title = title;
    }

    public void borrowCopy(Member member){
        availableCopies -= 1;
        onLoanTo.add(member);
    }

    public void returnCopy(Member member){
        availableCopies += 1;
        onLoanTo.remove(member);
    }

    public String getTitle() {
        return title;
    }

    public int getAvailableCopies() {
        return availableCopies;
    }

    public Boolean isOnLoanTo(Member member) {
        return onLoanTo.contains(member);
    }
}

Then consider a client that only needs a subset of those features.

public class LoansView {
    private Member member;
    private Movie selectedMovie;

    public LoansView(Member member, Movie selectedMovie){
        this.member = member;
        this.selectedMovie = selectedMovie;
    }

    public void borrowMovie(){
        selectedMovie.borrowCopy(member);
    }

    public void returnMovie(){
        selectedMovie.returnCopy(member);
    }
}

We can use client-specific interfaces to hide features for clients who don’t need to (or shouldn’t) use them, simplifying the interface and protecting clients from changes to features they never use.

public interface Loanable {
    void borrowCopy(Member member);
    void returnCopy(Member member);
}

public class Movie implements Loanable {
    private final String title;
    private int availableCopies = 1;
    private List<Member> onLoanTo = new ArrayList<>();

    public Movie(String title) {
        this.title = title;
    }

    public void borrowCopy(Member member) {
        availableCopies -= 1;
        onLoanTo.add(member);
    }

    public void returnCopy(Member member) {
        availableCopies += 1;
        onLoanTo.remove(member);
    }

    public String getTitle() {
        return title;
    }

    public int getAvailableCopies() {
        return availableCopies;
    }

    public Boolean isOnLoanTo(Member member) {
        return onLoanTo.contains(member);
    }
}

public class LoansView {
    private Member member;
    private Loanable selectedMovie;

    public LoansView(Member member, Loanable selectedMovie) {
        this.member = member;
        this.selectedMovie = selectedMovie;
    }

    public void borrowMovie(){
        selectedMovie.borrowCopy(member);
    }

    public void returnMovie(){
        selectedMovie.returnCopy(member);
    }
}

In languages with poor support for encapsulation, like Visual Basic 6.0, we can use interfaces to hide what we don’t want client code to be exposed to instead.

Many languages support classes or modules implementing multiple interfaces, enabling us to present multiple client-specific views of them.
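As a minimal sketch of that idea – the Viewable, Editable and Document names here are mine, purely for illustration – one class might present two client-specific views like this:

```java
// Two client-specific interfaces onto the same class: a viewer client
// binds only to Viewable, an editor client only to Editable.
interface Viewable {
    String view();
}

interface Editable {
    void edit(String text);
}

class Document implements Viewable, Editable {
    private final StringBuilder content = new StringBuilder();

    public String view() {
        return content.toString();
    }

    public void edit(String text) {
        content.append(text);
    }
}
```

Each client sees only the features it needs, and is insulated from changes to the rest.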

Faking It ‘Til You Make It

There are often times when we’re working on code that requires some other problem to be solved. For example, when processing the sale of a CD album, we might need to take the customer’s credit card payment.

Instead of getting caught in a situation where we have to solve every problem to deliver or test a feature, we can instead use interfaces as placeholders for those parts of the solution, defining explicitly what we expect that class or module to do without the need to write the code to make it do it.

public interface Payments {
    Boolean process(double amount, CreditCard card);
}

public class BuyCdTest {
    private Payments payments;
    private CompactDisc cd;
    private CreditCard card;

    @Before
    public void setUp() {
        payments = mock(Payments.class);
        when(payments.process(anyDouble(), any(CreditCard.class))).thenReturn(true); // payment accepted
        cd = new CompactDisc(10, 9.99, payments);
        card = mock(CreditCard.class);
    }

    @Test
    public void saleIsSuccessful(){
        cd.buy(1, card);
        assertEquals(9, cd.getStock());
    }

    @Test
    public void cardIsChargedCorrectAmount(){
        cd.buy(2, card);
        verify(payments).process(19.98, card);
    }
}

Using interfaces as placeholders for parts of the design we’re eventually going to get to – including external dependencies – is a powerful technique that allows us to scale our approach. It also tends to lead to inherently more modular designs, with cleaner separation of concerns. CompactDisc need not concern itself with how payments are actually being handled.

Describing Protocols

In statically-typed languages like Java and C#, if we say that an implementation must implement a certain interface, then there’s no way of getting around that.

But in dynamic languages like Python and JavaScript, things are different. Duck typing allows us to present client code with any implementation of a method or function that matches the signature of what the client invokes at runtime. This can be very freeing, and can cut out a lot of code clutter, as there’s no need to have lots of interfaces defined explicitly.

It can also be dangerous. With great power comes great responsibility (and hours of debugging!) Sometimes it’s useful to document the fact that, say, a parameter needs to look a certain way.

In those instances, experienced programmers might define a class – since Python, for example, doesn’t have interfaces – that has no implementation, which developers are instructed to extend when they create their implementations. Think of an interface in Python as a class that defines only the methods that implementers must override.

A class that processes sales of CD albums might need a way to handle payments through multiple different payment processors (e.g., Apple Pay, PayPal etc). The code that invokes payments defines a contract that any payment processor must fulfil, but we might find it helpful to document exactly what that interface looks like with a base class.

class Payments(object):
    def pay(self, credit_card, amount):
        raise Exception("This is an abstract class")

Type hinting in Python enables us to make it clear that any object passed in as the payments constructor parameter should extend this class and override its method.

class CompactDisc(object):
    def __init__(self, stock, price, payments: Payments):
        self.payments = payments
        self.price = price
        self.stock = stock

    def buy(self, quantity, credit_card):
        self.stock -= quantity
        self.payments.pay(credit_card, self.price * quantity)

You can do this in most dynamic languages, but the usefulness of explicitly defining abstractions in Python is acknowledged by the standard library’s abc module (Abstract Base Classes), which enforces their rules.

from abc import ABC, abstractmethod

class Payments(ABC):
    @abstractmethod
    def pay(self, credit_card, amount):
        ...

So, from a design point of view, interfaces are really jolly useful. They can make our lives easier in a variety of ways, and are very much the key to achieving clean separation of concerns in modular systems, and to scaling our approach to software development.

But they can also have their downsides.

How Not To Use Interfaces

Like all useful things, interfaces can be overused and abused. For every code base I see where there are few if any interfaces, I see one where everything has an interface, regardless of motive.

When is separation of concerns not separation of concerns?

If an interface does not provide polymorphism (i.e., there’s only ever one implementation), does not hide features, is not a placeholder for something you’re Faking Until You’re Making, and describes no protocol that isn’t already explicitly defined by the class that implements it, then all it does is clutter up your code base with useless indirection.

In real code bases of the order of tens or hundreds of thousands, or even millions, of lines of code, classes tend to cluster. As our code grows, we may split out multiple helper classes that are intimately tied together – if one changes, they all change – by the job they collaborate to do.

A better design acknowledges these clusters and packages them together behind a simple public interface. Think of each of these packages as being like an internal microservice. (They may literally be microservices, of course. But even if they’re all released in the same component, we can treat them as internal microservices.)

Hide clusters of classes that change together behind simple interfaces
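Here’s a minimal sketch of that idea – all the names (Pricer, DiscountPolicy, TaxCalculator) and the numbers are illustrative assumptions, not from any real code base:

```java
// One simple public interface hiding a cluster of collaborating helpers.
interface Pricer {
    double quote(int quantity);
}

class StandardPricer implements Pricer {
    private final DiscountPolicy discounts = new DiscountPolicy();
    private final TaxCalculator tax = new TaxCalculator();

    public double quote(int quantity) {
        double net = discounts.apply(10.0 * quantity); // assumed unit price
        return tax.addTo(net);
    }
}

// Package-private helpers that change together: clients bind only to
// Pricer, so changes inside the cluster don't ripple outward.
class DiscountPolicy {
    double apply(double amount) {
        return amount * 0.9; // assumed flat 10% discount
    }
}

class TaxCalculator {
    double addTo(double amount) {
        return amount * 1.2; // assumed 20% tax
    }
}
```

Clients see only Pricer; the helpers behind it are free to be split, merged or rewritten together.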

In practising outside-in Test-Driven Development, I will use interfaces to stub or mock solutions to other problems to separate those concerns from the problem I’m currently solving. So I naturally introduce interfaces within an architecture.

But I also refactor quite mercilessly, and many problems require more than one class or module to solve them. These will emerge through the refactoring process, and they tend to stay hidden behind their placeholder interfaces.

(Occasionally I’ll introduce an interface as part of a refactoring because it solves one of the problems described above and adds value to the design.)

So, interfaces – useful and powerful. But don’t overdo it.

New Refactoring: Remove Feature

If you’re using a modern, full-featured IDE like IntelliJ or Rider, you may have used an automated refactoring called Safe Delete. This is a jolly handy thing that I use often. If, say, I want to delete a Java class file, it will search for any references to that class first. If the class is still being used, then it will warn me.

Occasionally, I want to delete a whole feature, though. And for this, I am imagining a new refactoring which I’m calling – wait for it! – Remove Feature. Say what you see.

Let’s say I want to delete a public method in a class, like rentFor() in this Video class.

public void rentFor(Customer customer) throws CustomerUnderageException {
    if (isUnderAge(customer))
        throw new CustomerUnderageException();
    customer.addRental(this);
}

Like Safe Delete, Remove Feature would first look for any references to rentFor() in the rest of the code. If there are none, it would not only delete rentFor(), but also any other features of this and other classes that rentFor() uses – but only if they’re not being used anywhere else. It’s a recursive Safe Delete.

isUnderAge() is only used by rentFor(), so that would be deleted. CustomerUnderageException is also only used by rentFor(), so that too would be deleted. And finally, the addRental() method of the Customer class is only used here, so that would also go.

This is a recursive refactoring, so we’d also look inside isUnderAge(), CustomerUnderageException and addRental() to see if there’s anything they’re using in our code that can be Safe Deleted.

In addRental(), for example:

public List<Video> getRentedVideos() {
    return rentals;
}

public void addRental(Video video) {
    rentals.add(video);
}

…we see that it uses a rentals collection field. This field is also used by getRentedVideos(), so it can’t be Safe Deleted. Our recursion would end here.

If the method or function we’re removing is the only public (or exported) feature in that class or module, then the module would be deleted also. (Since private/non-exported features can’t be used anywhere else).
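To make the procedure concrete, here’s a very simplified sketch of recursive Safe Delete over a hypothetical feature-dependency map – real IDEs work on syntax trees and reference indexes, not a Map of strings:

```java
import java.util.*;

// Remove Feature as recursive Safe Delete: "uses" maps each feature
// to the features it depends on.
class RemoveFeature {
    static void remove(String feature, Map<String, Set<String>> uses) {
        // Safe Delete check: refuse if anything else still references this feature
        for (Set<String> used : uses.values())
            if (used.contains(feature)) return;
        Set<String> dependencies = uses.remove(feature);
        if (dependencies == null) return; // not in the model
        // Recurse into everything this feature used, deleting what's now orphaned
        for (String dependency : dependencies)
            remove(dependency, uses);
    }
}
```

Run on the video rental example, deleting rentFor would sweep away isUnderAge and addRental, while rentals survives because getRentedVideos still uses it.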

But if customers can’t rent videos, what’s the purpose of this field? Examining my tests reveals that the getter only exists to test rentFor().

@Test(expected = CustomerUnderageException.class)
public void customerMustBeOverTwelveToRentAVideoRatedTwelve() throws Exception {
    Customer customer = new Customer(null, null, "2012-01-01");
    Video video = new Video(null, Rating.TWELVE);
    video.rentFor(customer);
}

@Test
public void videoRentedByCustomerOfLegalAgeIsAddedToCustomersRentedVideos() throws Exception {
    Customer customer = new Customer(null, null, "1964-01-01");
    Video video = new Video(null, Rating.TWELVE);
    video.rentFor(customer);
    assertTrue(customer.getRentedVideos().contains(video));
}

One could argue that the feature isn’t defined by the API, but by the code that uses the API – in this case, the test code reflects the set of public methods that make up this feature.

So I might choose to Remove Feature by selecting one or more tests, identifying what they use from the public API, and recursively Safe Deleting each method before deleting those tests.

Test-Driven Development in JavaScript

I’m in the process of redesigning elements of the Codemanship training workshops, and I’ve been spit-balling new demos in JavaScript on TDD. Rather than taking copious notes, I’ve recorded screencasts of these demos so I can refer back and see what I actually did in each one.

I thought it might be useful to post these screencasts online, so if you’re a JS developer – or have ambitions to be one (TDD is a sought-after skill) – here they are.

I’ve strived for each demonstration to make three key points to remember.

#1 – The 3 Steps of TDD

  • Start by writing a test that fails
  • Write the simplest code to pass the test
  • Refactor to make changing the code easier


#2 – Assert First & Useful Tests

  • Write the test assertion first and work backwards to the setup
  • See the test fail before you make it pass
  • Tests should only have one reason to fail


#3 – What Should We Test?

  • List your tests
  • Test meaningful behaviour and let those tests drive design details, not the other way around
  • When the implementation is obvious, just write it


#4 – Duplication & The Rule of Three

  • Removing duplication to reveal abstractions
  • The Rule Of Three
  • When to leave duplicate code in


#5 – Part I – Inside-Out TDD

  • Advantage: tests pinpoint failures better in the stack
  • Drawbacks
    • Risk the pieces don’t fit together
    • Tests are coupled closely to internal design


#5 – Part II – Outside-In TDD

  • Advantages
    • Pieces guaranteed to fit together
    • Test code more decoupled from internal design
  • Disadvantage: tests don’t pinpoint source of failure easily


#6 – Stubs, Mocks & Dummies

  • Writing unit tests with external dependencies using:
    • Stubs to return test data
    • Mocks to test that messages were sent
    • Dummies as placeholders so we can run the tests
  • Driving complex multi-layered designs from the outside in using stubs, mocks and dummies
    • Advantage: pieces guaranteed to fit and tests pinpoint sources of failure better
    • Risk: (not discussed in video) excessive use of test doubles un-encapsulates details of internal design, tightly coupling test code to implementation
  • More unit-testable code – achieved with dependency injection – tends to lead to more modular architectures


These videos are rough and ready first attempts, but I think you may find them useful as they are if you’re new to TDD.

I’ll be doing versions of these in Python soon.

Code Craft Bootstrapped

I’ll be blogging about this soon, but just wanted to share some initial thoughts on a phenomenon I’ve observed in very many development teams. A lot of teams confuse their tools with associated practices.

“We do TDD” often really means “We’re using JUnit”. “We refactor” often means “We use Resharper”. “We do CI” often means “We’re using Jenkins”. And so on.

As two current polls I’m running strongly suggest, a lot of teams who think they’re doing Continuous Integration appear to develop on long-lived branches (e.g., “feature branches”). But because they’re using the kind of tools we associate with CI, they believe that’s what they’re doing.

This seems to me to be symptomatic of our “solution first” culture in software development. Here’s a solution. Solution to what, exactly? We put the cart before the horse, adopting, say, Jenkins before we think about how often we merge our changes and how we can test those merges frequently to catch conflicts and configuration problems earlier.

Increasingly, I believe that developers should learn the practices first – without the tools. It wasn’t all that long ago when many of these tools didn’t exist, after all. And all the practices predate the tools we know today. You can write automated tests in a main() method, for example, and learn the fundamentals of TDD without a unit testing framework. (Indeed, as you refactor the test code, you may end up discovering a unit testing framework hiding inside the duplication.)
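For example, a first stab at framework-free tests might look something like this – the add() function is just a made-up stand-in for whatever code is under test:

```java
// TDD's fundamentals without a framework: plain checks in a main() method.
class MainMethodTests {
    // The code under test - an illustrative example only
    static int add(int a, int b) {
        return a + b;
    }

    // Report each test's result by name
    static void check(boolean passed, String testName) {
        System.out.println((passed ? "PASS: " : "FAIL: ") + testName);
    }

    public static void main(String[] args) {
        check(add(2, 2) == 4, "adds two positive numbers");
        check(add(-1, 1) == 0, "adds a negative and a positive");
    }
}
```

Refactor the duplication in checks like these and something very like a unit testing framework starts to emerge.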

Talking of refactoring, once upon a time we had no automated refactoring tools beyond cut, copy and paste and maybe Find/Replace. Maybe developers will grok refactoring better if they start learning to do refactorings the old-school way?

And for many years we automated our builds using shell scripts. Worked just fine. We just got into the habit of running the script on the build machine every time we checked in code.

These tools make it easier to apply these practices, and help us scale them up by taking out a lot of donkey work. But I can’t help wondering if starting without them might help us focus on the practices initially, as well as maybe helping us to develop a real appreciation for how much they can help – when used appropriately.

Automated Tests Aren’t Just For The Long-Term

Something I hear worryingly often still is teams – especially managers – saying “Oh, we don’t need to automate our tests because there’s only going to be one release.”

The perceived wisdom is that investing in fast-running automated tests is only worth it if the software’s going to have a long lifespan, with many subsequent releases. (This is a sentiment often expressed about code craft in general.)

The assumption is that fast-running unit tests have less – or zero – value in the short-to-medium term. But this is easily disproved.

Ask ourselves: what do we need fast-running tests for in the first place? To guard against regressions when we change the code. The inexperienced team or manager might argue that “we won’t be changing the code, because there’s only going to be one release”.

Analysis by GitLab’s data sciences team clearly shows that code churn – when classified as code that changes within 2-3 weeks of being checked in – for the average team runs at about 25%. An average team of, say, four developers might check in 10,000 LOC on a 12-week release schedule. 2,500 lines of that code will change within 2-3 weeks. That’s a lot of changes.

And that’s normal. Expect it.

This is before we take into account the many changes a programmer will make to code before they check it in. If I only tested my code when it was time to check it in, I think I’d really struggle.

It’s a question of batch size. If I make one change and then re-test, and I’ve broken something, it’s much, much easier to pinpoint what’s gone wrong. And it’s way, way easier to get back to code that works. If I make 100 changes and re-test, I’m probably going to end up knee-deep in the debugger and up to my neck in print statements, and reverting to the last working copy means losing a tonne of work.

So I test pretty much continuously, and find even on relatively small projects that my hide gets saved multiple times by having these tests.

Change is much easier with fast-running tests, and change is a normal part of delivery.

And then there’s the whole question of whether it really will be the only release of the software. Experience has taught me that if software gets used, it gets changed. The only one-shot deals I’ve experienced in harumpty-twelve years of writing software have been the unsuccessful ones.

Imagine we’re asked to dig out an underground shelter for our customer. They tell us they need a chamber 8 ft x 8 ft x 6 ft – big enough for a bed – and we dutifully start digging. Usually, we would put up wooden supports as we dig, to stop the chamber from caving in. “No need”, says the customer. “It’s only one room, and we’ll only use it once.”

So, we don’t put in any supports. And that makes completing the chamber harder, because it keeps caving in due to the vibrations of our ongoing excavations. For every cubic metre of dirt we excavate, we end up digging out another half a cubic metre from the cave-ins. But we get there in the end, and the customer pays us our money and moves their bed in.

Next week, we get a phone call. “Where do we keep our food supplies?” Turns out, they’ll need another room. Would they like us to put supports up in the main chamber before we start digging again? “No time! We need our food store ASAP.” Okey dokey. We start digging again, and the existing chamber starts caving in again, but we dig out the loose earth and carry on as best we can. We manage to get the food store done, but with a lot more work this time, because both spaces keep caving in, and we keep having to dig them out again and again, recreating spaces we’d already excavated several times.

The customer moves in their food supplies, but their elderly mother now refuses to go into the shelter because she’s not sure it’s safe.

A week later: “Oh hi. Er. Where do we go to the bathroom?” Work begins on a third chamber. Would they like us to put supports in the other two chambers first? “No. Need a bathroom ASAP!!!” they exclaim with a rather pained expression. So we dig and dig and dig, now so tired that we barely notice that most of the space we’re excavating has been excavated before, and most of the earth we’re removing has been coming from the ceilings of the existing chambers as well as from the new bathroom.

This is what it is to work without fast-running tests. Even on small, one-shot deals of just a few days, regressions can become a major expense, quickly outweighing the cost of writing tests in the first place.

When Should We Do Code Reviews?

One question that I get asked often is “When is the best time to do code reviews?” My pithy answer is: now. And now. And now. Yep, and now.

Typically, teams batch up a whole bunch of design decisions for a review – for example, in a pull request. If we’ve learned anything about writing good software, it’s that the bigger the batch, the more slips through the quality control net.

Releasing 50 features at a time, every 12 months, means we tend to bring less focus to testing each feature to see if it’s what the customer really needs. Releasing one feature at a time allows us to really focus in on that feature, see how it gets used, see how users respond to it.

Reviewing 50 code changes at a time gives similarly woolly results. A tonne of code smells tend to make it into production. Reviewing a handful of code changes – or, ideally, just one – at a time brings much more focus to each change.

Unsurprisingly, teams who review code continuously, working in rapid feedback cycles (e.g., doing TDD), tend to produce cleaner code – code that’s easier to understand, simpler, with less duplication and more loosely-coupled modules. (We’ve measured this – for example in this BBC TDD case study.)

One theory about why TDD tends to produce cleaner code is that the short feedback loops – “micro-cycles” – bring much more focus to every design decision. TDD deliberately has a step built in to each micro-cycle to stop, look at the code we just wrote or changed, and refactor if necessary. I strongly encourage developers not to waste this opportunity. The Green Light is our signal to do a mini code-review on the work we just did.
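As a sketch of that micro-cycle – the example problem is invented, and the structure is the point:

```python
# 1. RED: write a failing test first (before fizzbuzz exists, this fails).
def test_three_becomes_fizz():
    assert fizzbuzz(3) == "Fizz"

# 2. GREEN: write the simplest code that makes the test pass.
def fizzbuzz(n):
    return "Fizz" if n % 3 == 0 else str(n)

# 3. REFACTOR: the green light is the signal to pause and run the mini
#    code-review on the code just written, before the next failing test.
test_three_becomes_fizz()
print("green")
```

The review in step 3 happens every few minutes, on a handful of lines at a time – the exact opposite of the 50-changes-at-once pull request.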

I’ve found, through working with many teams, that the most effective code reviews are rigorous and methodical. Check all the code that changed, and check for a list of potential code quality issues every single time. Don’t just look at the code to see if it “looks okay” to you.

In the Codemanship TDD course, I ask developers to run through a check list on every green light:

  • Is the code easy to understand? (Not sure? Ask someone else.)
  • Is there obvious duplication?
  • Is each method/function and class/module as simple as it could be?
  • Do any methods/functions or classes/modules have more than one responsibility?
  • Can you see any Feature Envy – where a method/function (or part of a method/function) of one class/module depends on multiple features of another class/module?
  • Are a class’s/module’s dependencies easily swappable?
  • Is the class/module exposed to things it isn’t using (e.g., methods of a C++ interface it doesn’t call, or unused imports from other modules)?

You may, according to your needs and your team’s coding standards, have a different checklist. What seems to make the difference is that your team has a checklist, and that you are in the habit of applying it whenever you have the luxury of working code.

This is where the relationship exists between code review and Continuous Delivery. If our code isn’t working, it isn’t shippable. If you go for hours at a time with failing automated tests (or no testing at all), code review is a luxury. Your top priority’s to get it working – that’s the most important quality of any software design. If it doesn’t work, and you can’t deploy it, then whether or not there are any, say, long parameter lists in it is rather academic.

Now, I appreciate that stopping on every passing test and going through a checklist for all the code you changed may sound like a real drag. But, once upon a time, writing a unit test, writing the test assertion first and working backwards, remembering to see the test fail, and all the habits of effective TDD felt like a bit of a chore. Until I’d done them 10,000 times. And then I stopped noticing that I was doing them.

The same goes for code review checklists. The more we apply them, the more it becomes “muscle memory”. After a year or two, you’ll develop an intuitive sense of code quality – problems will tend to leap out at you when you look at code, just as one bum note in an entire orchestra might leap out at a conductor with years of listening experience and ear training. You can train your eyes to notice code smells like long methods, large classes, divergent change, feature envy, primitive obsession, data clumps and all the other things that can make code harder to change.

This is another reason why I encourage very frequent code reviews. If you were training your musical ear, one practice session every couple of weeks is going to be far less effective than 20 smaller practice sessions a day. And if each practice session is much more focused – i.e., we don’t ear-train musicians with whole symphonies – then that, too, will speed up the learning curve.

The other very important reason I encourage continuous code review is that when we batch them up, we also tend to end up with batches of remedial actions to rectify any problems. If I add a branch to a method, review that, and decide that method is now too logically complex, fixing it there and then is a doddle.

If I make 50 boo-boos like that, not only will an after-the-horse-has-bolted code review probably miss many of those 50 issues, but the resulting TO-DO list is likely to require an amount of time and effort that will make it a task that has to be scheduled – very possibly by someone who doesn’t understand the need to do them. In the zero-sum game of software development scheduling, the most common result is that the work never gets done.


The Hidden Cost of “Dependency Drag”


The mysterious Sailing Stones of Death Valley are moved by some unseen natural force.

When I demonstrate mutation testing, I try to do it in the programming language my audience uses day-to-day. In most of the popular programming languages, there’s a usable, current mutation testing tool available. But for a long time, the .NET platform had none. That’s not to say there were never any decent mutation testing tools for .NET programs. There have been several. But they had all fallen by the wayside.

Here’s the thing: some community-spirited developer kindly creates a mutation testing tool we can all use. That’s a sizable effort for no financial reward. But still they write it. It works. Folk are using it. And there’s no real need to add to it. Job done.

Then, one day, you try to use it with the new version of the unit testing tool you’ve been working with, and – mysteriously – it stops working. Like the Sailing Stones of Death Valley, the mutation testing tool is inexplicably 100 metres from where you left it, and to get it working again it has to be dragged back to its original position.

This is the hidden cost of a force I might call Dependency Drag. I see it all the time: developers forced to maintain software products that aren’t changing, but that are getting out of step with the environment in which they run, which is constantly changing under their feet.

GitHub – and older OSS repositories – is littered with the sun-bleached skeletons of code bases that got so out of step they simply stopped working, and maintainers didn’t want to waste any more time keeping them operational. Too much effort just to stand still.

Most of us don’t see Dependency Drag, because it’s usually hidden within an overall maintenance effort on a changing product. And the effect is usually slow enough that it looks like the stones aren’t actually moving.

But try and use some code that was written 5 years ago, 10 years ago, 20 years ago, if it hasn’t been maintained, and you’ll see it. The stones are a long way from where you left them.

This effect can include hardware, of course. I hang on to my old 3D TV so that I can play my 3D Blu-rays. One day, that TV will break down. Maybe I’ll be able to find another one on eBay. But 10 years from now? 20 years from now? My non-biodegradable discs may last centuries if kept safe. But it’s unlikely there’ll be anything to play them on 300 years from now.

This is why it will become increasingly necessary to preserve the execution environments of programs as well as the programs themselves. It’s no use preserving the 1960s Fortran compiler if you don’t have the 1960s computer and operating system and punch card reader it needs to work.

And as execution environments get exponentially more complex, the cost of Dependency Drag will multiply.


Architects – Hang Up Your Capes & Go Back To The Code

Software architecture is often framed as a positive career move for a developer. Organisations tend to promote their strongest technical people into these strategic and supervisory roles. The pay is better, so the lure is obvious.

I progressed into lead architecture roles in the early 00s, having “earned my spurs” as a developer and then tech lead in the 1990s. But I came to realise that, from my ivory tower, I was having less and less influence over the code that got written, and therefore less and less influence over the actual design and architecture of the software.

I could draw as many boxes and arrows as I liked, give as many PowerPoint presentations as I liked, write as many architecture and standards documents as I liked: none of it made much difference. It was like trying to direct traffic using my mind.

So I hung up my shiny architect cape and pointy architect wizard hat and went back to working directly with developers on real code as part of the team.

Instead of decreeing “Thou shalt…”, I could – as part of a programming pair (or a programming mob, which was very much my thing) – instead suggest “Maybe we could…” and then take the keyboard and demonstrate what I meant. On the actual code. That actually got checked in and ended up in the actual product, instead of just in a Word document nobody ever reads.

The breakthrough for me was realising that “big design decisions” vs “small design decisions” was an artificial distinction. Most architecture decisions are about dependencies: what uses what? And “big” software dependencies – microservice A uses microservice B, for example – can be traced to “small” design decisions – a class in microservice A uses a class in microservice B – which can be traced to even “smaller” design decisions – a line of code in the class in microservice A needs a data value from the class in microservice B.
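A minimal invented illustration of that chain – the one line below is the “small” decision, and it is also the “big” microservice-A-uses-microservice-B dependency:

```python
# Pricing notionally lives in microservice B; OrderHandler in microservice A.
# (All names are invented for illustration.)

class Pricing:
    def unit_price(self, sku):
        return {"WIDGET": 9.99}.get(sku, 0.0)


class OrderHandler:
    def __init__(self, pricing):
        self.pricing = pricing

    def line_total(self, sku, quantity):
        # This single line is where A's dependency on B really lives:
        return self.pricing.unit_price(sku) * quantity
```

On a whiteboard that dependency is one arrow between two boxes; in the code it’s this line, and the code is where it has to be managed.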

The “big” architecture decisions start in the code. And the code is full of tiny design decisions that have the potential to become “big”. And drawing an arrow pointing from a box labelled “Microservice A” to a box labelled “Microservice B” doesn’t solve the problems.

Try as we might to dictate the components, their roles and their dependencies in a system up-front, the reality often deviates wildly from what the architect planned. This is how “layered architectures” – the work of the devil – permeated software architecture for so long, despite it being a complete falsehood that they “separate concerns”. (Spoiler Alert: they don’t.)

Don’t get me wrong: I’m all for visualisation and for a bit of up-front planning when it comes to software design. But sooner rather than later, we have to connect with the reality as the code emerges and evolves. And the most valuable service a software architect can offer to a dev team is to be right there with them fighting the complexity and the dependencies – and helping them to make sense of it all – on the front line.

You can offer more value in the long term by mentoring developers and helping them to reason about design and ultimately make better design decisions – “big” or “small” – than attempting to direct the whole effort from 30,000 ft.

Plus, it seems utter folly to me to take your most experienced developers and promote them away from the thing you believe they do well. (And paying them more to have less impact just adds insult to injury.)