GitHub Copilot – Productivity Boon or Considered Harmful?

We need to talk about GitHub Copilot. This is the ML-driven programming tool – powered by OpenAI Codex – that Microsoft is promoting as “Your AI pair programmer”, and which they claim “works alongside you directly in your editor, suggesting whole lines or entire functions for you.”

Now, full disclaimer: I’ve not been able to try the Copilot Beta yet – there’s a waiting list – so my thoughts are based purely on what I’ve read about it, and what I’ve seen of it in demonstration videos by people who’ve tried.

At first glance, Copilot looks very impressive. You can, for example, just declare a descriptive function or method name, and it will suggest a matching implementation. Or you can write a comment about what you want the code to do, and it will generate it for you.

All the examples I’ve seen were for well-defined, self-contained problems – “calculate a square root”, “find the lowest number” and so on. I’ve yet to see it handle more complex problems like “send an SMS message to this number when a product is running low on stock”.

Copilot was trained on GitHub’s enormous wealth of other people’s code. This in itself is contentious, because when it autosuggests a solution, that might be your code that it’s reproducing without any license. Much has been made of the legality and the ethics of this in the tech press and on social media, so I don’t want to go into that here.

As someone who trains and coaches teams in code craft, though, I have other concerns about Copilot.

My chief concern is this: what Copilot does, to all intents and purposes, is copy and paste code off the Internet. As the developers of Copilot themselves admit:

GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run. 

https://copilot.github.com/

I warn teams constantly that copying and pasting code verbatim off the Internet is like eating food you found in a dumpster. You don’t know what’s in it. You don’t know where it’s been. You don’t know if it’s safe.

When we buy food in a store, or a restaurant, there are rules and regulations. The food, its ingredients, its preparation, its storage, its transportation are all subject to stringent checks to make sure as best we can that it will be safe to eat. In countries where the rules are more relaxed, incidents of food poisoning – including deaths – are much higher.

Code is like food. When we reuse code, we need to know if it’s safe. The ingredients (the code it reuses), its preparation and its delivery all need to go through stringent checks to make sure that it works. This is why we have a specific package design principle called the Reuse-Release Equivalency Principle – the unit of code reuse is the unit of code release. In other words, we should only reuse code that’s been through a proper, disciplined and predictable release process that includes sufficient testing and no further changes after that.

Maybe that Twinkie you fished out of the dumpster was safe when it left the store. But it’s been in a dumpster, and who knows where else, since then.

So my worry is that prolific use of a tool like Copilot will riddle production software – software that you and I consume – with potentially unsafe code.

My second concern is about understanding and – as a trainer and coach – about learning. I work with developers all the time who rely heavily on copying and pasting to solve problems in their code. Often, they’ll find an example of something in their own code base, and copy and paste it. Or they’ll find an example on the Web and copy and paste that. What I’ve noticed is that the developers who copy and paste a lot tend to pick things up more slowly – if at all.

I can buy a ready-made cake from Marks & Spencer, but that doesn’t make me a baker. I learn nothing about baking from that experience. No matter how many cakes I buy, I don’t get any better at baking.

Of course, when folk copy and paste code, they may change bits of it to suit their specific need. And that’s essentially what Copilot is doing – it’s not an exact copy of existing code. Well, you can also buy plain cake bases and decorate them yourself. But it still doesn’t make you a baker.

Some will argue “Oh, but Jason, you learned to program by copying code examples.” And they’d be right. But I copied them out of books and out of computing magazines. I had to read the code, and then type it in myself. The code had to go through my brain to get into the software.

Just like the code had to go through Copilot’s neural network to get into its repertoire. There’s perhaps an irony here that what Codex has done is automate the part where programmers learn.

So, my fear is that heavy use of Copilot could result in software that’s riddled with code that doesn’t necessarily work and that nobody on the team really understands. This is a restaurant where most of the food comes from dumpsters.

Putting aside other Copilot features I might take issue with (generating tests from implementation code? – shudder), I really feel that it’s a brilliant solution to completely the wrong problem. And I’m not the only one who thinks this.

If we were to observe developers and measure where their time goes, how much of it is spent looking for code examples? How much of it is spent typing code? That’s a pie chart I’d like to see. What we do know from decades of experience is that developers spend most of their time trying to understand code – often code they wrote themselves. (Hands up. Who else hates Monday mornings?)

Copilot’s main selling point is like trying to optimise a database application that does 10 reads for every 1 write by making the writes faster.

Having the code pasted into your project for you doesn’t reduce this overhead. It’s someone else’s code. You have to read it and you have to understand it (and then, ideally, you have to test it.) It breaks the Reuse-Release Equivalency Principle. It’s not safe reuse.

And Copilot isn’t a safe pair programming partner, being as its only skill is fishing Twinkies out of the code dumpster of GitHub.

I think a lot of more experienced developers – especially those of us who’ve lived through both the promise of general A.I. (still 30 years away, no matter when you ask) and of Computer-Aided Software Engineering – have seen it all before in one form or another. We’re not going to lose any sleep over it.

The tagline for Copilot is “Don’t fly solo”, but anyone using it instead of programming with a real human is most definitely flying solo.

Wake me up when Copilot suggests removing the duplication it’s creating, instead of generating more of it.

Wax On, Wax Off. There’s Value In Simple Exercises.

One of the risks of learning software development practices using simple, self-contained exercises is that developers might not see the relevance of them to their day-to-day work.

A common complaint is that exercises like the Mars Rover kata or the Fibonacci Number calculator look nothing like “real” code. They’re too simple. There are no external dependencies. There’s no UI. And so on.

My response to this is that, yes, real code is much more complicated, but if you’re just starting out, you ain’t up to that level of complicated – nowhere near. When students have demanded more complex exercises, the inevitable result is they get stuck and then they get frustrated and they never manage to make much progress. They’re trying to learn to swim in the Atlantic, when what they need is a nice safe shallow pool to get them started in.

So, these simple exercises help students to build their skills and grow their confidence with practices. They also help to build habits. Taking Test-Driven Development as an example, outside of the design thinking that goes on top of the practice, much of it is about habits. Writing a failing test first is a habit. Seeing the test fail before you make it pass is a habit. And so on.

In tackling a simple exercise like the Mars Rover kata, you may apply these habits 20 or more times before you complete the exercise. That repetition reinforces the habits, just like practicing piano scales reinforces muscle memory (as well as building actual muscles, so that you can play faster and more consistently).

As an amateur guitar player, I try to find time every day to repeat some basic exercises. They have nothing to do with real music. But if I don’t do them, I become less capable of playing real music with confidence.

Likewise, as a software developer, I try to find time every day to repeat some basic exercises. Code katas tend to be perfect for this. When it gets more complicated, I can end up bogged down in the complexity – googling APIs, noodling with build scripts, upgrading frameworks and tools (yak shaving, basically). This is also what happens on training courses. As soon as you add, say, React.js, the whole exercise slows to a crawl and the original point of it gets buried under a pile of unshaved yaks.

In music, there are short-form and long-form pieces. To grow as a musician, you do need to expand the scope of the music you play. Not every song can be a 4-bar exercise.

To grow as a software developer, you do need to progress from simple self-contained problems to larger, interconnected systems. But my experience as a developer myself and as a trainer and coach is that it’s a mistake to start with large, complex systems.

It’s also a mistake to think that once you’ve graduated to catching bigger fish, there’s no longer any value in the small ones. Just as it’s a mistake to think that once you’ve learned to play piano concertos, there’s no value in practicing scales any more.

Those habits still need reinforcing, and when I’ve lapsed in daily short-form practice, I find myself getting sloppy on the bigger problems.

Now, here’s the thing: when I teach developers TDD, to begin with they’re focusing on how they’re writing the code far more than what code they’re writing, because that way of working is new to them. They have to remind themselves to write a failing test first. They have to remind themselves to see the test fail. They have to remind themselves to run the tests after every refactoring.

I try to bring them to a point where they don’t need to think about it any more, freeing their minds up to think about requirements and about design. That takes hours and hours of practice, and the need for regular practice never goes away.

Similarly, after thousands of hours of guitar practice, you’ll notice that I don’t even look at what my picking hand is doing most of the time. The pick just hits the right string at the right time to play the note I want to play, even when I’m playing fast.

It’s the same with practices like TDD and refactoring. As long as I maintain those good habits, I don’t have to consciously remind myself to apply them on real code – it just happens. And the end result is code that’s more reliable, simpler, modular, and much easier to change.

So you may be thinking “What have these simple exercises got to do with real software?” But they do have a serious purpose, and they do help build and maintain fundamental habits, freeing our minds to focus on the things that matter.

As Mr. Miyagi in Karate Kid says, ‘Wax On, Wax Off’.

Measuring Inner-Loop Agility

When I teach teams and managers about the feedback loops of software development, I try to stress the two most important loops – the ones that define agility.

Releases are where the value gets delivered, and – more importantly – where the end user feedback starts, so we can learn what works and what doesn’t and adapt for the next release.

The sooner we can release, the sooner we can start learning and adapting. So agile teams release frequently.

But frequency of releases doesn’t really define agility. I see teams who release every day or every week, but feature requests still take months to get into production.

That feature-to-production lead time is our best measure of how responsive to change we’re really being. How soon can we adapt to customer feedback?

For a portfolio of software products, a client of mine plotted average feature-to-production lead times against the average time it took to build and test the product.

We see a correlation between that feature-to-production lead time and the innermost loop of software delivery – build & test time.

Of course, this is a small data set, and all the usual caveats about “lies, damned lies and statistics” apply (I would love to do a bigger study, if anyone’s interested in participating).

But I’ve seen this distribution multiple times, and experienced it – and observed many, many teams experiencing it – in the field.

Products with slow build & test cycles tend to have much older backlogs. Indeed, backlogs themselves are a sign of slow lead times. I explained the causal mechanism for this in a previous post about Inner-Loop Agility. When we want to optimise nested loops, we get the biggest improvements in overall cycle time by focusing on the innermost loop.

Now, here’s the thing: everything that goes on between releases is really just guesswork. The magic happens when real end users get real working software, and we get to see how good our guesses were, and make more educated guesses in the next release cycle. We learn our way to value.

That’s why Inner-Loop Agility is so important, and why I’ve chosen to focus entirely on it as a trainer, coach and consultant. I can’t guarantee that you’re building the right thing (you almost certainly aren’t, no matter how well you plan), but I can offer you more throws of the dice.

Inner-Loop Agility (or “Why Your Agile Transformation Failed”)

Over the last couple of decades, I’ve witnessed more than my fair share of “Agile transformations”, and seen most of them produce disappointing results. In this post, I’m going to explain why they failed, and propose a way to beat the trend.

First of all, we should probably ask ourselves: what is an Agile transformation? This might seem like an obvious question, but you’d be surprised just how difficult it is to pin down any kind of accepted definition.

For some, it’s a process of adopting certain processes and practices, like the rituals of Scrum. If we do the rituals, then we’re Agile. Right?

Not so fast, buddy!

This is what many call “Cargo Cult Agility”. If we wear the right clothes and make offerings to the right gods, we’ll be Agile.

If we lose the capital “A”, and talk instead about agility, what is the goal of an agile transformation? To enable organisations to change direction quickly, I would argue.

How do we make organisations more responsive to change? The answer lies in that organisation’s feedback loops.

In software development, the most important feedback loop comes from delivering working software and systems to end users. Until our code hits the real world, it’s all guesswork.

So if we can speed up our release cycles so we can get more feedback sooner, and maintain the pace of those releases for as long as the business needs us to – i.e., the lifetime of that software – then we can effectively out-learn our competition.

Given how important the release cycle is, then, it’s no surprise that most Agile (with a capital “A”) transformations tend to focus on that feedback loop. But this is a fundamental mistake. The release cycle contains inner loops – wheels within wheels within wheels. If our goal is to speed up this outer feedback loop, we should be focusing most of our attention on the innermost feedback loops.

To understand why, let’s think about how we go about speeding up nested loops in code.

for (Release release : releases) {
    Thread.sleep(10);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(10);
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

Here’s some code that loops through a collection of releases. Each release loops through a list of features, and each feature has a list of scenarios that the system has to handle to implement that feature. For each scenario, it runs a build & test cycle multiple times. It’s a little model of a software development process.

Think of the development process as a set of gears. The largest gear turns the slowest, and drives a smaller, faster gear, which drives an even smaller and faster gear and so on.

In each loop, I’ve built in a delay of 10 ms to approximate the overhead of performing that particular loop (e.g., 10 ms to plan a release).

When I run this code, it takes 1 m 53 s to execute. Our release cycles are slow.

Now, here’s where most Agile transformations go wrong. They focus most of their attention on those outer loops. This produces very modest improvements in release cycle time.

Let’s “optimise” the three outer loops, reducing the delay by 90%.

for (Release release : releases) {
    Thread.sleep(1);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(1);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(1);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(10); // the innermost build & test loop is left untouched
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

When I run this optimised code, it executes in 1 m 44 s. That’s only a 9% improvement in release cycle time, and we had to work on three loops to get it.

This time, let’s ignore those outer loops and just work on the innermost loop – build & test.
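Here’s a sketch of that variant – the same model as before, with only the build & test delay cut from 10 ms to 1 ms:

for (Release release : releases) {
    Thread.sleep(10);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(1); // only the innermost build & test loop is optimised
                System.out.println("------BUILD & TEST");
            }
        }
    }
}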

Now it finished in just 22 seconds. That’s an 81% improvement, just from optimising that innermost loop.

When we look at the output from this code, it becomes obvious why.

RELEASE
--FEATURE
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST

Of course, this is a very simplistic model of a much more complex reality, but the principle works just as well at any scale, and the results I’ve seen over the years bear it out: to reduce release cycle times, focus your attention on the innermost feedback loops. I call this Inner-Loop Agility.

Think of the micro-iterations of Test-Driven Development, refactoring and Continuous Integration. They all involve one key step – the part where we find out if the software works – which is to build and test it. We test it at every green light in TDD. We test it after every refactoring. We test it before we check in our changes (and afterwards, on a build server to rule out configuration differences with our desktops).

In Agile Software Development, we build and test our code A LOT – many times an hour. And we can only do this if building and testing our code is fast. If it takes an hour, then we can’t have Inner-Loop Agility. And if we can’t have Inner-Loop Agility, we can’t have fast release cycles.

Of course, we could test less often. That always ends well. Here’s the thing: the more changes we make to the code before we test it, the more bugs we introduce and then catch later. The later we catch bugs, the more they cost to fix. When we test less often, we tend to end up spending more and more of our cycle time fixing bugs.

It’s not uncommon for teams to end up doing zero-feature releases, where there’s just a bunch of bug fixes and no value-add for the customer in each release.

A very common end result of a costly Agile transformation is often little more than Agility Theatre. Sure, we do the sprints. We have the stand-ups. We estimate the story points. But it ends up being all work and little useful output in each release. The engine’s at maximum revs, but our car’s going nowhere.

Basically, the gears of our development process are the wrong way round.

Organisations who optimise their outer feedback loops but neglect the inner loops are operating in a “lower gear”.

There’s no real mystery about why Agile transformations tend to focus most of their attention on the outer feedback loops.

Firstly, the people signing the cheques understand those loops, and can actively engage with them – in the mistaken belief that agility is all about them.

Secondly, the $billion industry – the “Agile-Industrial Complex” – that trains and mentors organisations during these transformations is largely made up of coaches and consultants who have either a lapsed programming background, or no programming background at all. In a sample of 100 Agile Coach CVs, I found that 70% had no programming background, and a further 20% hadn’t programmed for at least a decade. 90% of Agile Coaches can’t help you with the innermost feedback loops. Or to put it more bluntly, 90% of Agile Coaches focus on the feedback loops that deliver the least impressive reductions in release cycle time.

Just to be clear, I’m not suggesting these outer feedback loops don’t matter. There’s usually much work to be done at all levels from senior management down to help organisations speed up their cycle times, and to attempt it without management’s blessing is typically folly. Improving build and test cycles requires a very significant investment – in skills, in time, in resource – and that shouldn’t be underestimated.

But to focus almost exclusively on the outer feedback loops produces very modest results, and it’s arguably where Agile transformations have gained their somewhat dismal reputation among business stakeholders and software professionals alike.

Solve The Problem You’ve Got, Not The One You Want

So, I’m demonstrating basic principles of modularity with a very simple example of some code that calculates the prices of fitted carpets.

public class CarpetQuote {

    public double calculate(double width, double length, double pricePerSqM) {
        double area = width * length;
        return Math.ceil(area * pricePerSqM);
    }
}

The premise of the demonstration is that our customer has told us that not all rooms are rectangular, and the code will therefore need to handle different room shapes when calculating the area of carpet required.

Now we could handle this with an enum and a switch statement, and have parameters for all the different kinds of room dimensions.

public class CarpetQuote {

    public double calculate(Shape shape,
                            double pricePerSqM,
                            double width,
                            double length,
                            double radius,
                            double side,
                            double a,
                            double b,
                            double height) {
        double area = 0.0;
        switch (shape) {
            case Rectangle:
                area = width * length;
                break;
            case Circle:
                area = Math.PI * radius * radius;
                break;
            case EquilateralTriangle:
                area = (Math.sqrt(3) / 4) * side * side;
                break;
            case Trapezium:
                area = ((a + b) / 2) * height;
                break;
        }
        return Math.ceil(area * pricePerSqM);
    }
}

But the drawback of that approach is that every time we need to add a new room shape, we have to modify code that was – at some point – working, and also change the method signature, breaking client code. Switch statements have a tendency to grow, as do long parameter lists. This design is rigid – difficult to change – and brittle – easy to break.

A refactored, more modular version makes it possible to extend the design much more easily, with no impact on the rest of the code.

public class CarpetQuote {

    public double calculate(double pricePerSqM, Shape shape) {
        return Math.ceil(shape.getArea() * pricePerSqM);
    }
}

Each type of shape calculates its own area, and they all implement a Shape interface. To add a new type of room shape, we just write a new class that implements that interface, and no other code is affected.
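To illustrate, here’s a minimal sketch of what that might look like (the class names here are just for illustration):

public interface Shape {
    double getArea();
}

public class Rectangle implements Shape {

    private final double width;
    private final double length;

    public Rectangle(double width, double length) {
        this.width = width;
        this.length = length;
    }

    @Override
    public double getArea() {
        return width * length;
    }
}

public class Circle implements Shape {

    private final double radius;

    public Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double getArea() {
        return Math.PI * radius * radius;
    }
}

Adding, say, a trapezium-shaped room is now just a matter of writing another class that implements Shape – the CarpetQuote code doesn’t change.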

Now, this might all seem quite reasonable to you. But there’s always one person who will say something like “No, you should put it all in a single static method because it will be faster“, or “The area calculations should run in their own microservices so it will be more scalable“.

And I’ll reply “It prices fitted carpets. How fast or scalable does it need to be?”

And then there are the people who say “What if the rooms can change shape?” or “What if we want to start selling laminate flooring?”

And I’ll reply “What if they can’t?” and “What if we don’t?”

More generally, there’s a tendency for developers to try to reframe the problem to fit their desired solution. One person, for example, asked “What if that code is running inside a loop doing real-time 3D rendering?” It prices fitted carpets, for goodness’ sake!

I strongly encourage developers to solve the problem in front of them, and not to meander off into “Ah, but what if…?” territory.

The carpet quote code is optimised for extension, not for speed, because we know it needs to handle multiple room shapes – the customer specifically requested it.

Indeed, we know that code that gets used almost always gets changed, so we should optimise for easy changes unless we have a genuine need to balance that against other design goals like speed and scalability.

Most code doesn’t need to be super-fast. Most code doesn’t need to scale to Netflix levels. And when it does, that should be explicitly part of its performance requirements, and not up to the whims of developers who would rather be solving those problems than calculating prices for boring old fitted carpets.

Code Craft – The Proof of the Pudding

In extended code craft training, I work with pairs on a multi-session exercise called “Jason’s Guitar Shack”. They envision and implement a simple solution to solve a stock control problem for a fictional musical instrument store, applying code craft disciplines like Specification By Example, TDD and refactoring as they do it.

The most satisfying part for me is that, at the end, there’s a demonstrable pay-off – a moment where we review what they’ve created and see how the code is simple, readable, low in duplication and highly modular, and how it’s all covered by a suite of good – as in, good at catching it when we break the code – and fast-running automated tests.

We don’t explicitly set out to achieve these things. They’re something of a self-fulfilling prophecy because of the way we worked.

Of course all the code is covered by automated tests: we wrote the tests first, and we didn’t write any code that wasn’t required to pass a failing test.

Of course the code is simple: we did the simplest things to pass our failing tests.

Of course the code is easy to understand: we invested time establishing a shared language working directly with our “customer” that subconsciously influenced the names we chose in our code, and we refactored whenever code needed explaining.

Of course the code is low in duplication: we made a point of refactoring to remove duplication when it made sense.

Of course the code is modular: we implemented it from the outside in, solving one problem at a time and stubbing and mocking the interfaces of other modules that solved sub-problems – so all our modules do one job, hide their internal workings from clients – because to begin with, there were no internal workings – and they’re swappable by dependency injection. Also, their interfaces were designed from the client’s point of view, because we stubbed and mocked them first so we could test the clients.

Of course our tests fail when the code is broken: we specifically made sure they failed when the result was wrong before we made them pass.

Of course most of our tests run fast: we stubbed and mocked external dependencies like web services as part of our outside-in TDD design process.

All of this leads up to our end goal: the ability to deploy new iterations of the software as rapidly as we need to, for as long as we need to.

With their code in version control, built and tested and potentially deployed automatically when they push their changes to the trunk branch, that process ends up being virtually frictionless.

Each of these pay-offs is established in the final few sessions.

First, after we’ve test-driven all the modules in our core logic and the integration code behind that, we write a single full integration test – wiring all the pieces together. Pairs are often surprised – having never tested them together – that it works first time. I’m not surprised. We test-drove the pieces of the jigsaw from the outside in, explicitly defining their contracts before implementing them. So – hey presto – all the pieces fit.

Then we do code reviews to check if the solution is readable, low in duplication, as simple as we could make it, and that the code is modular. Again, I’m not surprised when we find that the code ticks these boxes, even though we didn’t mindfully set out to do so.

Then we measure the code coverage of the tests – 100% or very near. Again, I’m not surprised, even though that was never the goal. But just because 100% of our code is covered by tests, does that mean it’s really being tested? To find out, we perform mutation testing on the code. Again, the coverage is very high. These are test suites that should give us confidence that the code really works.

The final test is to measure the cycle time from completing a change to seeing it in production. How long does it take to test, commit, push, build & re-test and then deploy changes into the target environment? The answer is minutes. For developers whose experience of this process is that it can take hours, days or even weeks to get code into production, this is a revelation.

It’s also kind of the whole point. Code craft enables rapid and sustained innovation on software and systems (and the business models that rely on them).

Now, I can tell you this in a 3-day intensive training course. But the extended training – where I work with pairs in weekly sessions over 10-12 weeks – is where you actually get to see it for yourself.

If you’d like to talk about extended code craft training for your team, drop me a line.

‘Agility Theatre’ Keeps Us Entertained While Our Business Burns

I train and coach developers and teams in the technical practices of Agile Software Development like Test-Driven Development, Refactoring and Continuous Integration. I’m one of a rare few who exclusively does that. Clients really struggle to find Agile technical coaches these days.

There seems to be no shortage of help on the management practices and the process side of Agile, though. That might be a supply-and-demand problem. A lot of “Agile transitions” seem to focus heavily on those aspects, and the Agile coaching industry has risen to meet that demand with armies of certified bods.

I’ve observed, though, that without effective technical practices, agility eludes those organisations. You can have all the stand-ups and planning meetings and burn-down charts and retrospectives you like, but if your teams are unable to rapidly and sustainably evolve your software, it amounts to little more than Agility Theatre.

Agility Theatre is when you have all the ceremony of Agile Software Development, but none of the underlying technical discipline. It’s a city made of chipboard facades, painted to look like the real thing to the untrained eye from a distance.

In Agile Software Development, there’s one metric that matters: how much does it cost to change our minds? That’s kind of the point. In this rapidly changing, constantly evolving world, the ability to adapt matters. It matters more than executing a plan. Because plans don’t last long in the 21st century.

I’ve watched some pretty big, long-established, hugely successful companies brought down ultimately by their inability to change their software and core systems.

And I’ve measured the difference the technical practices can make to that metric.

Teams who write automated tests after the code being tested tend to find that the cost of changing their software rises exponentially over the average lifespan of 8 years. I know exactly what causes this. Test-after tends to produce a surfeit of tests that hit external dependencies like databases and web services, and test suites that run slow.

If your tests run slow, then you’ll test less often, which means bugs will be caught later, when they’re more expensive to fix.

Teams whose test suites run slow end up spending more and more of their time – and your money – fixing bugs. Until, one day, that’s pretty much all they’re doing.

Teams who write their tests first have a tendency to end up with fast-running test suites. It’s a self-fulfilling prophecy – using unit tests as specifications unsurprisingly produces code that is inherently more unit-testable, as we’re forced to stub and mock those expensive external dependencies.

This means teams that go test-first can test more frequently, catching bugs much sooner, when they’re orders of magnitude cheaper to fix. Teams who go test-first spend a lot less time fixing bugs.

The upshot of all this is that teams who go test-first tend to have a much shallower cost-of-change curve, allowing them to sustain the pace of software evolution for longer. Basically, they outrun the test-after teams.

Now, I’m not going to argue that breaking work down into smaller batch sizes and scheduling deliveries more frequently can’t make a difference. But what I will argue is that if the technical discipline is lacking, all that will do is enable you to observe – in almost real time – the impact of a rising cost of change.

You’ll be in a car, focusing on where to go next, while your fuel consumption climbs exponentially. You reach a point where the destination doesn’t matter, because you ain’t going nowhere.

As the cost of changes rises, it piles on the risk of building the wrong thing. Trying to get it right first time is antithetical to an evolutionary approach. I’ve worked with analysts and architects who believed they could predict the value of a feature set, and went to great lengths to specify the Right Thing. In the final reckoning, they were usually out by a country mile. No matter how hard we try to predict the market, ultimately it’s all just guesswork until our code hits the real world.

So the ability to change our minds – to learn from the software we deliver and adapt – is crucial. And that all comes down to the cost of change. Over the last 25 years, it’s been the best predictor I’ve personally seen of long-term success or failure of software-dependent businesses. It’s the entropy of tech.

You may be a hugely successful business today – maybe even the leader in your market – but if the cost of changing your code is rising exponentially, all you’re really doing is market research for your more agile competitors.

Agile without Code Craft is not agile at all.

The Trouble With Objects

A perennial debate that I enjoy wading into is the classic “Should it be kick(ball) or ball.kick()?”

This seems to reveal a fundamental dichotomy in our shared understanding of Object-Oriented Programming.

It’s a trick question, of course. If the effect is the same – the displacement of the ball – then kick(ball) and ball.kick() mean exactly the same thing. But the debate rages around who is doing the kicking and who is being kicked.

Many programmers quite naturally assign agency to objects, and object (pun intended) to the ball kicking itself. Balls don’t kick themselves! They will often counter with “It should be player.kick(ball)“.

But this can lead us down the rabbit hole to distinctly non-OO code. Taking an example from a Codemanship training course about an online CD warehouse, the same question comes up about whether it should be cd.buy() or warehouse.buy(cd).

Again, the protestation is that “CDs don’t buy themselves!” I can completely understand why students might think this, having had it drummed into us that objects do work. (Although why nobody ever objects that “Warehouses don’t buy CDs!” is one of life’s little mysteries.)

I’m the first person to say that object design should start with the work. Then we figure out what data is required to do that work. Put the data with the work. And, hey presto, you got an object. Assign one job to each object, and get them talking to each other to coordinate bigger jobs, and – hey presto! – you got OOP.

(The art of OOP is really in deciding where to put the work, and that’s what this debate is essentially all about.)

But warehouse.buy(cd) – in the training exercise we do – can lead us into deep water regarding encapsulation. We are told that the effect of buying a CD is that the stock count of that CD goes down, and that the customer’s credit card is charged the price of that CD.

So our test looks a bit like this:

import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.*;

public class WarehouseTest {

    @Test
    public void buyCd() {
        CD cd = new CD(10, 9.99);
        CreditCard card = mock(CreditCard.class);
        Warehouse warehouse = new Warehouse();

        warehouse.buy(cd, 1, card);

        assertEquals(9, cd.getStock());
        verify(card).charge(9.99);
    }
}

The implementation that passes this test suffers from a distinct case of Feature Envy between Warehouse and CD, because buying a CD requires access to a CD’s stock and price.

public class Warehouse {

    public void buy(CD cd, int quantity, CreditCard card) {
        card.charge(cd.getPrice() * quantity);
        cd.setStock(cd.getStock() - quantity);
    }
}

When we refactor this code to eliminate the Feature Envy (i.e., to encapsulate the work)…

…we end up with a CD that – shock, horror! – buys itself!

public class CD {

    private int stock;
    private final double price;

    public CD(int stock, double price) {
        this.stock = stock;
        this.price = price;
    }

    public int getStock() {
        return stock;
    }

    public void buy(int quantity, CreditCard card) {
        card.charge(price * quantity);
        stock = stock - quantity;
    }
}

This refactoring is typically followed by “But… but….”. Placing this behaviour inside the CD class conflicts with our mental model of the world. CDs don’t buy themselves!

And yet we encounter objects apparently doing things to themselves in OO libraries all the time: lists that filter themselves, database connections that open themselves, files that read themselves.
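A couple of everyday Java examples of that style, as a small sketch (the class name and file name are just placeholders):

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ObjectsActedUpon {

    public static void main(String[] args) throws Exception {
        // "filter this list" - the list is the thing being filtered
        List<Integer> positives = List.of(-1, 2, -3, 4).stream()
                .filter(n -> n > 0)
                .collect(Collectors.toList());
        System.out.println(positives);

        // "read this file" - the path identifies the thing being read
        List<String> lines = Files.readAllLines(Path.of("example.txt"));
        System.out.println(lines.size());
    }
}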

And that’s what’s meant by “object-oriented”. The CD is the thing being bought. It’s the object of the buy action. In OOP, we put the object first, followed by the action. Read cd.buy() not as “the CD buys” but as “buy this CD”.

Millions of people around the world read OO code the wrong way around. The ones who tend to grok that it’s object-oriented are those of us who’ve had to approximate OOP in non-OO languages – particularly C. (Check out previous posts about encapsulating in C and applying SOLID principles to C code.)

Without the benefit of an OO syntax, we resort to defining all the functions that apply to a type of data structure in one place, and the first parameter to every function is a pointer to an instance of that structure, usually named this.

Then we might hide the data definition of the structure – just declaring its type in our .h file – in the same .c implementation file, so only those functions can access the data. Then we might define a table of virtual functions – a “v-table” – that can be applied to that data structure, and attach the data structure to them so that clients can invoke functions on instances of the data structure. Is this all starting to sound familiar?

The set of operations defined by a class are the operations that can be applied to that object.

In reality, objects don’t do work. The CPU does. The object identifies the thing – the record in memory – to or with which the work is to be done. That’s literally how object-oriented programming works. cd.buy() means “apply the buy() function to this CD”. list.filter() means “filter this list”. file.read() means “read this file”.

The idea of objects doing work, and passing messages to each other to coordinate larger pieces of work – “collaborations” – is a metaphor. And it works just fine once you let go of the idea that balls don’t kick themselves.

But words are powerful things, and in programming especially, they can get tangled in our mental models of how the problem domain works in the real world. In the real world, only life has agency (well, maybe). Most things are acted upon. So we have a natural tendency to separate agency from data, and this leads us to oodles and oodles of Feature Envy.

I learned to read object-oriented code as “do that to this” a long time ago, and it therefore has no conflict with my mental model of the world. The CD isn’t buying. The CD is bought.

object.action().

UPDATE

I’ve been very much enjoying the ensuing furore that suggesting ball.kick() means “kick the ball” inevitably starts. The fun part is reading the “better” designs folk come up with to avoid accepting that.

player.kick(ball) is one of the most popular. Note now that we have two classes instead of one to achieve the same outcome.

Likewise, cd.buy() seems to have offended the design senses of some. It should be cart.add(cd), they say. Again, we now have two classes involved, and also the CD didn’t actually get bought yet. And it also kind of proves my point, because the CD is being added to the cart.

On a more general note, when students go down the warehouse.buy(cd) route, I ask them why the warehouse needs to be involved if we know which CD we’re buying.

object.action() tends to simplify things.

How To (And Not To) Use Interfaces

If you’re writing code in a language that supports the concept of interfaces – or variants on the theme of pure abstract types with no implementation – then I can think of several good reasons for using them.

Polymorphism

There are often times when our software needs the ability to perform the same task in a variety of ways. Take, for example, calculating the area of a room. This code generates quotes for fitted carpets based on room area.

double quote(double pricePerSqMtr, Room room) {
    double area = room.area();
    return pricePerSqMtr * Math.ceil(area);
}

Rooms can have different shapes. Some are rectangular, so the area is the width multiplied by the length. Some are even circular, where the area is π r².

We could have a big switch statement that does a different calculation for each room shape, but every time we want to add new shapes to the software, we have to go back and modify it. That’s not very extensible. Ideally, we’d like to be able to add new room shapes without changing our lovely tested existing code.

If we define an interface for calculating the area, then we can easily have multiple implementations that our client code binds to dynamically.

public interface Room {
    double area();
}

public class RectangularRoom implements Room {

    private final double width;
    private final double length;

    public RectangularRoom(double width, double length) {
        this.width = width;
        this.length = length;
    }

    @Override
    public double area() {
        return width * length;
    }
}

public class CircularRoom implements Room {

    private final double radius;

    public CircularRoom(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * Math.pow(radius, 2);
    }
}

Hiding Things

Consider a class that has multiple features for various purposes (e.g., for testing, or for display).

public class Movie {

    private final String title;
    private int availableCopies = 1;
    private List<Member> onLoanTo = new ArrayList<>();

    public Movie(String title) {
        this.title = title;
    }

    public void borrowCopy(Member member) {
        availableCopies -= 1;
        onLoanTo.add(member);
    }

    public void returnCopy(Member member) {
        availableCopies++;
        onLoanTo.remove(member);
    }

    public String getTitle() {
        return title;
    }

    public int getAvailableCopies() {
        return availableCopies;
    }

    public Boolean isOnLoanTo(Member member) {
        return onLoanTo.contains(member);
    }
}

Then consider a client that only needs a subset of those features.

public class LoansView {

    private Member member;
    private Movie selectedMovie;

    public LoansView(Member member, Movie selectedMovie) {
        this.member = member;
        this.selectedMovie = selectedMovie;
    }

    public void borrowMovie() {
        selectedMovie.borrowCopy(member);
    }

    public void returnMovie() {
        selectedMovie.returnCopy(member);
    }
}

We can use client-specific interfaces to hide features for clients who don’t need to (or shouldn’t) use them, simplifying the interface and protecting clients from changes to features they never use.

public interface Loanable {
    void borrowCopy(Member member);
    void returnCopy(Member member);
}

public class Movie implements Loanable {

    private final String title;
    private int availableCopies = 1;
    private List<Member> onLoanTo = new ArrayList<>();

    public Movie(String title) {
        this.title = title;
    }

    @Override
    public void borrowCopy(Member member) {
        availableCopies -= 1;
        onLoanTo.add(member);
    }

    @Override
    public void returnCopy(Member member) {
        availableCopies++;
        onLoanTo.remove(member);
    }

    public String getTitle() {
        return title;
    }

    public int getAvailableCopies() {
        return availableCopies;
    }

    public Boolean isOnLoanTo(Member member) {
        return onLoanTo.contains(member);
    }
}

public class LoansView {

    private Member member;
    private Loanable selectedMovie;

    public LoansView(Member member, Loanable selectedMovie) {
        this.member = member;
        this.selectedMovie = selectedMovie;
    }

    public void borrowMovie() {
        selectedMovie.borrowCopy(member);
    }

    public void returnMovie() {
        selectedMovie.returnCopy(member);
    }
}

In languages with poor support for encapsulation, like Visual Basic 6.0, we can use interfaces to hide what we don’t want client code to be exposed to instead.

Many languages support classes or modules implementing multiple interfaces, enabling us to present multiple client-specific views of them.
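As a sketch of that – with Catalogued and CatalogueView as hypothetical names – Movie could implement a second, display-oriented interface alongside Loanable, so each client binds only to the view it needs:

public interface Catalogued {
    String getTitle();
    int getAvailableCopies();
}

// Movie would then declare: public class Movie implements Loanable, Catalogued

public class CatalogueView {

    private final Catalogued selectedMovie;

    public CatalogueView(Catalogued selectedMovie) {
        this.selectedMovie = selectedMovie;
    }

    public String summary() {
        return selectedMovie.getTitle() + " (" + selectedMovie.getAvailableCopies() + " available)";
    }
}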

Faking It ‘Til You Make It

There are often times when we’re working on code that requires some other problem to be solved. For example, when processing the sale of a CD album, we might need to take the customer’s credit card payment.

Instead of getting caught in a situation where we have to solve every problem to deliver or test a feature, we can instead use interfaces as placeholders for those parts of the solution, defining explicitly what we expect that class or module to do without the need to write the code to make it do it.

public interface Payments {
    Boolean process(double amount, CreditCard card);
}

public class BuyCdTest {

    private Payments payments;
    private CompactDisc cd;
    private CreditCard card;

    @Before
    public void setUp() {
        payments = mock(Payments.class);
        when(payments.process(anyDouble(), any(CreditCard.class))).thenReturn(true); // payment accepted
        cd = new CompactDisc(10, 9.99, payments); // inject the Payments collaborator
        card = new CreditCard(
                "MR P SQUIRE",
                "1234234534564567",
                "10/24",
                567);
    }

    @Test
    public void saleIsSuccessful() {
        cd.buy(1, card);
        assertEquals(9, cd.getStock());
    }

    @Test
    public void cardIsChargedCorrectAmount() {
        cd.buy(2, card);
        verify(payments).process(19.98, card);
    }
}

Using interfaces as placeholders for parts of the design we’re eventually going to get to – including external dependencies – is a powerful technique that allows us to scale our approach. It also tends to lead to inherently more modular designs, with cleaner separation of concerns. CompactDisc need not concern itself with how payments are actually being handled.
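For completeness, here’s a minimal sketch of the kind of CompactDisc implied by that test – assuming the Payments collaborator is injected through the constructor, as in the Python example further down:

public class CompactDisc {

    private int stock;
    private final double price;
    private final Payments payments;

    public CompactDisc(int stock, double price, Payments payments) {
        this.stock = stock;
        this.price = price;
        this.payments = payments;
    }

    public int getStock() {
        return stock;
    }

    public void buy(int quantity, CreditCard card) {
        // delegate payment handling to whatever Payments implementation we were given
        payments.process(price * quantity, card);
        stock -= quantity;
    }
}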

Describing Protocols

In statically-typed languages like Java and C#, if we say that an implementation must implement a certain interface, then there’s no way of getting around that.

But in dynamic languages like Python and JavaScript, things are different. Duck typing allows us to present client code with any implementation of a method or function that matches the signature of what the client invokes at runtime. This can be very freeing, and can cut out a lot of code clutter, as there’s no need to have lots of interfaces defined explicitly.

It can also be dangerous. With great power comes great responsibility (and hours of debugging!) Sometimes it’s useful to document the fact that, say, a parameter needs to look a certain way.

In those instances, experienced programmers might define a class with no implementation – since Python, for example, doesn’t have interfaces – which developers are instructed to extend and override when they create their implementations. Think of an interface in Python as a class that only declares methods, all of which must be overridden.

A class that processes sales of CD albums might need a way to handle payments through multiple different payment processors (e.g., Apple Pay, PayPal etc). The code that invokes payments defines a contract that any payment processor must fulfil, but we might find it helpful to document exactly what that interface looks like with a base class.

class Payments(object):

    def pay(self, credit_card, amount):
        raise Exception("This is an abstract class")

Type hinting in Python enables us to make it clear that any object passed in as the payments constructor parameter should extend this class and override its method.

class CompactDisc(object):

    def __init__(self, stock, price, payments: Payments):
        self.payments = payments
        self.price = price
        self.stock = stock

    def buy(self, quantity, credit_card):
        self.stock -= quantity
        self.payments.pay(credit_card, self.price)

You can do this in most dynamic languages, but the usefulness of explicitly defining abstractions in Python is acknowledged by the widely-used Abstract Base Classes (abc) module in the standard library, which enforces their rules.

from abc import ABC, abstractmethod


class Payments(ABC):

    @abstractmethod
    def pay(self, credit_card, amount):
        pass

So, from a design point of view, interfaces are really jolly useful. They can make our lives easier in a variety of ways, and are very much the key to achieving clean separation of concerns in modular systems, and to scaling our approach to software development.

But they can also have their downsides.

How Not To Use Interfaces

Like all useful things, interfaces can be overused and abused. For every code base I see where there are few if any interfaces, I see one where everything has an interface, regardless of motive.

When is separation of concerns not separation of concerns?

If an interface does not provide polymorphism (i.e., there’s only ever one implementation), does not hide features, is not a placeholder for something you’re Faking Until You’re Making, and describes no protocol that isn’t already explicitly defined by the class that implements it, then all it’s doing is cluttering up your code base with useless indirection.

In real code bases of the order of tens or hundreds of thousands, or even millions, of lines of code, classes tend to cluster. As our code grows, we may split out multiple helper classes that are intimately tied together – if one changes, they all change – by the job they collaborate to do.

A better design acknowledges these clusters and packages them together behind a simple public interface. Think of each of these packages as being like an internal microservice. (They may literally be microservices, of course. But even if they’re all released in the same component, we can treat them as internal microservices.)

Hide clusters of classes that change together behind simple interfaces
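Here’s a rough sketch of that idea in Java, with hypothetical names, using package visibility: only the interface and a small factory are public, and the cluster of collaborating classes stays package-private behind them.

// pricing/Quote.java - the package's one public entry point
package pricing;

public interface Quote {
    double priceFor(double area);
}

// pricing/Quotes.java - public factory, so clients never see the cluster inside
package pricing;

public class Quotes {
    public static Quote standardQuote(double pricePerSqM) {
        return new StandardQuote(new Rounding(), pricePerSqM);
    }
}

// pricing/StandardQuote.java - package-private: free to change without affecting clients
package pricing;

class StandardQuote implements Quote {

    private final Rounding rounding;
    private final double pricePerSqM;

    StandardQuote(Rounding rounding, double pricePerSqM) {
        this.rounding = rounding;
        this.pricePerSqM = pricePerSqM;
    }

    @Override
    public double priceFor(double area) {
        return rounding.roundUp(area * pricePerSqM);
    }
}

// pricing/Rounding.java - another package-private helper in the same cluster
package pricing;

class Rounding {
    double roundUp(double amount) {
        return Math.ceil(amount);
    }
}

Clients depend only on Quote and Quotes; the classes inside the cluster can be split, merged or renamed without any client code noticing.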

In practising outside-in Test-Driven Development, I will use interfaces to stub or mock solutions to other problems to separate those concerns from the problem I’m currently solving. So I naturally introduce interfaces within an architecture.

But I also refactor quite mercilessly, and many problems require more than one class or module to solve them. These will emerge through the refactoring process, and they tend to stay hidden behind their placeholder interfaces.

(Occasionally I’ll introduce an interface as part of a refactoring because it solves one of the problems described above and adds value to the design.)

So, interfaces – useful and powerful. But don’t overdo it.

What Is ‘Leadership’ Anyway?

If you spend any time on LinkedIn you’re likely to bump into content about this thing called “leadership”. Many posters fancy themselves as experts in this mysterious quality. Many promote themselves as professional “leaders”.

I’m sure you won’t be surprised to learn that I think this is nonsense. And now I’m going to tell you why.

Leading Is Not What You Think It Is?

Let’s think of what that word means: “Lead the way”, “Follow my lead”, “Tonight’s leading news story”, “Mo Farah is in the lead”.

When you lead, it usually means that you go first.

Leading is distinctly different from commanding or inspiring, but that’s what many professional “leaders” mistake it for.

Leaders don’t tell people where to go. They show people the way by going first.

I don’t tell people to write their tests first. I write my tests first and show them how. I lead by example.

‘Leader’ Is Not A Job Title

Organisations appoint what they believe to be good leaders into roles where leading by example is difficult, if not impossible. They give them titles like “Head of” and “Director of” and “Chief” and then promote them away from any activity where they would have the time to show rather than tell.

The real leaders are still on the shop floor. It’s the only place they can lead from.

And, as we’ve probably all experienced, promoting the people who could set the best example into roles where they can’t show instead of tell is a very common anti-pattern.

We Are Not Your Flock

Another common mistake is to see leadership as some kind of pastoral care. Now, I’m not going to suggest that organisations shouldn’t take an interest in the welfare of their people. Not just because happy workers make better workers, but because they are people, and therefore it’s the right thing to do.

And executives could set examples – like work-life balance, like the way they treat people at all levels of the corporate ladder, and like how much they pay people (yeah, I’m looking at you, gig economy) – but that’s different to the way many of them perceive that role.

Often, they’re more like religious leaders, espousing principles for their followers to live by, while indulging in drug-fuelled orgies and embezzling the church’s coffers.

And the care that most people need at work is simply to not make their lives worse. If you let them, grown-ups will grown-up. They can buy their own massage chair if they want one. Nothing more disheartening than watching managers impose their ideas about well-being on to actual adults who are allowed to drink and drive and vote.

If people are having problems, and need help and understanding, then be there for that. Don’t make me go to paintball. I don’t need it, thanks.

The Big Bucks

Most developers I know who moved into those “leadership” roles knew it was a mistake at the time – for the organisation and for themselves – but they took the promotion anyway. Because “leadership” is where the big bucks are.

The average UK salary for a CTO is £85,000. For a senior developer, it’s £60,000 (source: itjobswatch.co.uk). But how senior is “senior”? I’m quite a senior developer. Most CTOs are junior by comparison.

And in most cases, CTO is a strategic command – not a leadership – role (something I freely admit I suck at). A CTO cannot lead in the way I can, because I set an example for a living. For all I know, there are teams out there I’ve never even met who’ve been influenced more by me than by their CTO.

‘Leader’ Is A Relative Term

When I’ve been put in charge of development teams, I make a point of not asking developers to do anything I’m not prepared to at least try myself, and this means I’m having to learn new things all the time. Often I’m out of my comfort zone, and in those instances I need leadership. I need someone to show me the way.

Leadership is a relationship, not a role. It’s relative. When I follow you, and do as you do, then you are the leader. When you do as I do, I’m the leader.

In the course of our working day, we may lead, and we may follow. When we’re inexperienced, we may follow more than we lead. But every time you’ve shown someone how you do something and they’ve started to do it too, you’re a leader.

Yes, I know. That sounds like teaching. Funny, that.

But it doesn’t have to be an explicit teacher-student relationship. Every time you read someone’s code and think “Oh, that’s cool. I’m going to try that”, you have been led.

It’s Lonely At The Top

For sure, there are many ways a CxO could lead by example – by working reasonable hours, by not answering emails or taking calls on holidays, by putting their trust in their people, or by treating everyone with respect. That’s a rare (and beautiful) thing. But it’s the nature of hierarchies that those kinds of people tend not to get ahead. And it’s very difficult to lead by example from a higher stratum. If a CTO leaves the office at 5:30pm, but none of her 5,000 employees actually sees it, does it make a sound?

Show, Don’t Tell

So, leadership is a very distinct thing from command. When you tell someone to do something, you’re commanding. When you show them how you do it – when you go first – that’s leading.

“Show, don’t tell” would be – if it had one – Codemanship’s mission statement. Right from the start, I’ve made a point of demonstrating – and not presenting – ideas. The PowerPoint content of Codemanship training courses has diminished to the point of almost non-existent over the last 12 years.

And in that sense, I started Codemanship to provide a kind of leadership: the kind a CTO or VP of Engineering can’t.

Set Your Leaders Free

I come across so many organisations who lack technical leadership. Usually this happens because of the first mistake – the original sin, if you like – of promoting the people who could be setting a good example into roles where they no longer can, and then compounding that mistake by stripping authority and autonomy from people below that pay grade – because “Well, that’s leadership taken care of”.

I provide a surrogate technical leadership service that shouldn’t need to exist. I’m the CTO who never took that promotion and has time – and up-to-date skills – to show you how to refactor a switch statement. I know people who market themselves as an “Interim CTO”. Well, I’m the Interim Old Programmer Who’s Been Around Forever.

I set myself free by taking an alternative career path – starting my own company. I provide the workshops and the brown bag sessions and the mobbing sessions and the screencasts and the blog posts that you could be creating and sharing within your organisation, if only they’d let you.

If only they’d trust you: trust you to manage your own time and organise things the way you think will work best – not just for getting things done, but for learning how to do them better.

People working in silos, keeping their heads down, is antithetical to effective leadership. Good ideas tend to stay in their silos. And my long experience has taught me that broadcasting these ideas from on-high simply changes nothing.

Oh, The Irony

I believe this is a pretty fundamental dysfunction in organisational life. We don’t just have this problem in tech: we see it repeated in pretty much every industry.

Is there a cure? I believe so, and I’ve seen and been involved with companies who’ve managed to open up the idea of leadership and give their people the trust and the autonomy (and the resources) to eventually provide their own internal technical leadership that is self-sustaining.

But they are – if I’m being honest – in the minority. Training and mentoring from someone like me is more likely to lead to your newly inspired, more highly skilled people moving on to a company where they do get trust and autonomy.

This is why I warn clients that “If you water the plant, eventually it’ll need a bigger pot”. And if pushed to describe what I do, I tell them “I train developers for their next job”. Would that it were not so, but I have no control over that.

Because I’m not in charge.