GitHub Copilot – Productivity Boon or Considered Harmful?

We need to talk about GitHub Copilot. This is the ML-driven programming tool – powered by OpenAI Codex – that Microsoft is promoting as “Your AI pair programmer”, and which they claim “works alongside you directly in your editor, suggesting whole lines or entire functions for you.”

Now, full disclaimer: I’ve not been able to try the Copilot Beta yet – there’s a waiting list – so my thoughts are based purely on what I’ve read about it, and what I’ve seen of it in demonstration videos by people who’ve tried.

At first glance, Copilot looks very impressive. You can, for example, just declare a descriptive function or method name, and it will suggest a matching implementation. Or you can write a comment about what you want the code to do, and it will generate it for you.

All the examples I’ve seen were for well-defined, self-contained problems – “calculate a square root”, “find the lowest number” and so on. I’ve yet to see it handle more complex problems like “send an SMS message to this number when a product is running low on stock”.

Copilot was trained on GitHub’s enormous wealth of other people’s code. This in itself is contentious, because when it autosuggests a solution, that might be your code that it’s reproducing without any license. Much has been made of the legality and the ethics of this in the tech press and on social media, so I don’t want to go into that here.

As someone who trains and coaches teams in code craft, though, I have other concerns about Copilot.

My chief concern is this: what Copilot does, to all intents and purposes, is copy and paste code off the Internet. As the developers of Copilot themselves admit:

GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run. 

https://copilot.github.com/

I warn teams constantly that copying and pasting code verbatim off the Internet is like eating food you found in a dumpster. You don’t know what’s in it. You don’t know where it’s been. You don’t know if it’s safe.

When we buy food in a store, or a restaurant, there are rules and regulations. The food, its ingredients, its preparation, its storage, its transportation are all subject to stringent checks to make sure as best we can that it will be safe to eat. In countries where the rules are more relaxed, incidents of food poisoning – including deaths – are much higher.

Code is like food. When we reuse code, we need to know that it's safe. The ingredients (the code it reuses), its preparation and its delivery all need to go through stringent checks to make sure that it works. This is why we have a specific package design principle called the Reuse-Release Equivalency Principle – the unit of code reuse is the unit of code release. In other words, we should only reuse code that's been through a proper, disciplined and predictable release process that includes sufficient testing and no further changes after that.

Maybe that Twinkie you fished out of the dumpster was safe when it left the store. But it’s been in a dumpster, and who knows where else, since then.

So my worry is that prolific use of a tool like Copilot will riddle production software – software that you and I consume – with potentially unsafe code.

My second concern is about understanding and – as a trainer and coach – about learning. I work with developers all the time who rely heavily on copying and pasting to solve problems in their code. Often, they’ll find an example of something in their own code base, and copy and paste it. Or they’ll find an example on the Web and copy and paste that. What I’ve noticed is that the developers who copy and paste a lot tend to pick things up slower – if ever.

I can buy a ready-made cake from Marks & Spencer, but that doesn’t make me a baker. I learn nothing about baking from that experience. No matter how many cakes I buy, I don’t get any better at baking.

Of course, when folk copy and paste code, they may change bits of it to suit their specific need. And that’s essentially what Copilot is doing – it’s not an exact copy of existing code. Well, you can also buy plain cake bases and decorate them yourself. But it still doesn’t make you a baker.

Some will argue “Oh, but Jason, you learned to program by copying code examples.” And they’d be right. But I copied them out of books and out of computing magazines. I had to read the code, and then type it in myself. The code had to go through my brain to get into the software.

Just like the code had to go through Copilot’s neural network to get into its repertoire. There’s perhaps an irony here that what Codex has done is automate the part where programmers learn.

So, my fear is that heavy use of Copilot could result in software that’s riddled with code that doesn’t necessarily work and that nobody on the team really understands. This is a restaurant where most of the food comes from dumpsters.

Putting aside other Copilot features I might take issue with (generating tests from implementation code? – shudder), I really feel that it's a brilliant solution to completely the wrong problem. And I'm not the only one who thinks this.

If we were to observe developers and measure where their time goes, how much of it is spent looking for code examples? How much of it is spent typing code? That’s a pie chart I’d like to see. What we do know from decades of experience is that developers spend most of their time trying to understand code – often code they wrote themselves. (Hands up. Who else hates Monday mornings?)

Copilot’s main selling point is like trying to optimise a database application that does 10 reads for every 1 write by making the writes faster.

Having the code pasted into your project for you doesn’t reduce this overhead. It’s someone else’s code. You have to read it and you have to understand it (and then, ideally, you have to test it.) It breaks the Reuse-Release Equivalency Principle. It’s not safe reuse.

And Copilot isn't a safe pair programming partner, given that its only skill is fishing Twinkies out of the code dumpster that is GitHub.

I think a lot of more experienced developers – especially those of us who’ve lived through both the promise of general A.I. (still 30 years away, no matter when you ask) and of Computer-Aided Software Engineering – have seen it all before in one form or another. We’re not going to lose any sleep over it.

The tagline for Copilot is “Don’t fly solo”, but anyone using it instead of programming with a real human is most definitely flying solo.

Wake me up when Copilot suggests removing the duplication it's creating, instead of generating more of it.

Wax On, Wax Off. There’s Value In Simple Exercises.

One of the risks of learning software development practices using simple, self-contained exercises is that developers might not see the relevance of them to their day-to-day work.

A common complaint is that exercises like the Mars Rover kata or the Fibonacci Number calculator look nothing like “real” code. They’re too simple. There are no external dependencies. There’s no UI. And so on.

My response to this is that, yes, real code is much more complicated, but if you’re just starting out, you ain’t up to that level of complicated – nowhere near. When students have demanded more complex exercises, the inevitable result is they get stuck and then they get frustrated and they never manage to make much progress. They’re trying to learn to swim in the Atlantic, when what they need is a nice safe shallow pool to get them started in.

So, these simple exercises help students to build their skills and grow their confidence with practices. They also help to build habits. Taking Test-Driven Development as an example, outside of the design thinking that goes on top of the practice, much of it is about habits. Writing a failing test first is a habit. Seeing the test fail before you make it pass is a habit. And so on.
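To make those habits concrete, here's a minimal sketch of the habit loop – not taken from any course material – using the Fibonacci kata mentioned earlier:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class FibonacciTest {

    // The habit: write the failing test first, and run it to see it fail...
    @Test
    public void sixthFibonacciNumberIsEight() {
        assertEquals(8, Fibonacci.nth(6));
    }
}

// ...then write just enough code to make it pass, refactor, and repeat with the next test.
// After a few of those cycles, you might end up somewhere like this:
class Fibonacci {
    static int nth(int n) {
        return n < 2 ? n : nth(n - 1) + nth(n - 2);
    }
}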

In tackling a simple exercise like the Mars Rover kata, you may apply these habits 20 or more times before you complete the exercise. That repetition reinforces the habits, just like practicing piano scales reinforces muscle memory (as well as building actual muscles, so that you can play faster and more consistently).

As an amateur guitar player, I try to find time every day to repeat some basic exercises. They have nothing to do with real music. But if I don’t do them, I become less capable of playing real music with confidence.

Likewise, as a software developer, I try to find time every day to repeat some basic exercises. Code katas tend to be perfect for this. When it gets more complicated, I can end up bogged down in the complexity – googling APIs, noodling with build scripts, upgrading frameworks and tools (yak shaving, basically). This is also what happens on training courses. As soon as you add, say, React.js, the whole exercise slows to a crawl and the original point of it gets buried under a pile of unshaved yaks.

In music, there are short-form and long-form pieces. To grow as a musician, you do need to expand the scope of the music you play. Not every song can be a 4-bar exercise.

To grow as a software developer, you do need to progress from simple self-contained problems to larger, interconnected systems. But my experience as a developer myself and as a trainer and coach is that it’s a mistake to start with large, complex systems.

It’s also a mistake to think that once you’ve graduated to catching bigger fish, there’s no longer any value in the small ones. Just as it’s a mistake to think that once you’ve learned to play piano concertos, there’s no value in practicing scales any more.

Those habits still need reinforcing, and when I’ve lapsed in daily short-form practice, I find myself getting sloppy on the bigger problems.

Now, here’s the thing: when I teach developers TDD, to begin with they’re focusing on how they’re writing the code far more than what code they’re writing, because that way of working is new to them. They have to remind themselves to write a failing test first. They have to remind themselves to see the test fail. They have to remind themselves to run the tests after every refactoring.

I try to bring them to a point where they don’t need to think about it any more, freeing their minds up to think about requirements and about design. That takes hours and hours of practice, and the need for regular practice never goes away.

Similarly, after thousands of hours of guitar practice, you’ll notice that I don’t even look at what my picking hand is doing most of the time. The pick just hits the right string at the right time to play the note I want to play, even when I’m playing fast.

It’s the same with practices like TDD and refactoring. As long as I maintain those good habits, I don’t have to consciously remind myself to apply them on real code – it just happens. And the end result is code that’s more reliable, simpler, modular, and much easier to change.

So you may be thinking “What have these simple exercises got to do with real software?”. But they do have a serious purpose, and they do help build and maintain fundamental habits, freeing our minds to focus on the things that matter.

As Mr. Miyagi says in The Karate Kid, ‘Wax On, Wax Off’.

Inner-Loop Agility (or “Why Your Agile Transformation Failed”)

Over the last couple of decades, I’ve witnessed more than my fair share of “Agile transformations”, and seen most of them produce disappointing results. In this post, I’m going to explain why they failed, and propose a way to beat the trend.

First of all, we should probably ask ourselves: what is an Agile transformation? This might seem like an obvious question, but you’d be surprised just how difficult it is to pin down any kind of accepted definition.

For some, it’s a process of adopting certain processes and practices, like the rituals of Scrum. If we do the rituals, then we’re Agile. Right?

Not so fast, buddy!

This is what many call “Cargo Cult Agility”. If we wear the right clothes and make offerings to the right gods, we’ll be Agile.

If we lose the capital “A”, and talk instead about agility, what is the goal of an agile transformation? To enable organisations to change direction quickly, I would argue.

How do we make organisations more responsive to change? The answer lies in that organisation’s feedback loops.

In software development, the most important feedback loop comes from delivering working software and systems to end users. Until our code hits the real world, it’s all guesswork.

So if we can speed up our release cycles so we can get more feedback sooner, and maintain the pace of those releases for as long as the business needs us to – i.e., the lifetime of that software – then we can effectively out-learn our competition.

Given how important the release cycle is, then, it’s no surprise that most Agile (with a capital “A”) transformations tend to focus on that feedback loop. But this is a fundamental mistake. The release cycle contains inner loops – wheels within wheels within wheels. If our goal is to speed up this outer feedback loop, we should be focusing most of our attention on the innermost feedback loops.

To understand why, let’s think about how we go about speeding up nested loops in code.

for (Release release : releases) {
    Thread.sleep(10);                          // 10 ms overhead, e.g. to plan a release
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);                      // 10 ms overhead per feature
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);                  // 10 ms overhead per scenario
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(10);              // 10 ms overhead per build & test cycle
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

Here’s some code that loops through a collection of releases. Each release loops through a list of features, and each feature has a list of scenarios that the system has to handle to implement that feature. For each scenario, it runs a build & test cycle multiple times. It’s a little model of a software development process.

Think of the development process as a set of gears. The largest gear turns the slowest, and drives a smaller, faster gear, which drives an even smaller and faster gear and so on.

In each loop, I’ve built in a delay of 10 ms to approximate the overhead of performing that particular loop (e.g., 10 ms to plan a release).

When I run this code, it takes 1 m 53 s to execute. Our release cycles are slow.

Now, here’s where most Agile transformations go wrong. They focus most of their attention on those outer loops. This produces very modest improvements in release cycle time.

Let’s “optimise” the three outer loops, reducing the delay by 90%.

for (Release release : releases) {
    Thread.sleep(1);                           // outer loops "optimised": 10 ms down to 1 ms
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(1);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(1);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(10);              // build & test cycle left untouched at 10 ms
                System.out.println("------BUILD & TEST");
            }
        }
    }
}

When I run this optimised code, it executes in 1 m 44 s. That’s only a 9% improvement in release cycle time, and we had to work on three loops to get it.

This time, let’s ignore those outer loops and just work on the innermost loop – build & test.
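In code, that looks something like this (a sketch of the same model, with only the innermost delay reduced):

for (Release release : releases) {
    Thread.sleep(10);
    System.out.println("RELEASE");
    for (Feature feature : release.features) {
        Thread.sleep(10);
        System.out.println("--FEATURE");
        for (Scenario scenario : feature.scenarios) {
            Thread.sleep(10);
            System.out.println("----SCENARIO");
            for (BuildAndTest buildAndTest : scenario.buildAndTestCycles) {
                Thread.sleep(1);               // only the innermost loop - build & test - is optimised
                System.out.println("------BUILD & TEST");
            }
        }
    }
}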

When I run this version, it finishes in just 22 seconds. That’s an 81% improvement, just from optimising that innermost loop.

When we look at the output from this code, it becomes obvious why.

RELEASE
--FEATURE
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
----SCENARIO
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST
------BUILD & TEST

Of course, this is a very simplistic model of a much more complex reality, but the principle works just as well at any scale, and the results I’ve seen over the years bear it out: to reduce release cycle times, focus your attention on the innermost feedback loops. I call this Inner-Loop Agility.

Think of the micro-iterations of Test-Driven Development, refactoring and Continuous Integration. They all involve one key step – the part where we find out if the software works – which is to build and test it. We test it at every green light in TDD. We test it after every refactoring. We test it before we check in our changes (and afterwards, on a build server to rule out configuration differences with our desktops).

In Agile Software Development, we build and test our code A LOT – many times an hour. And we can only do this if building and testing our code is fast. If it takes an hour, then we can’t have Inner-Loop Agility. And if we can’t have Inner-Loop Agility, we can’t have fast release cycles.

Of course, we could test less often. That always ends well. Here’s the thing, the more changes we make to the code before we test it, the more bugs we introduce and then catch later. The later we catch bugs, the more they cost to fix. When we test less often, we tend to end up spending more and more of our cycle time fixing bugs.

It’s not uncommon for teams to end up doing zero-feature releases, where there’s just a bunch of bug fixes and no value-add for the customer in each release.

A very common end result of a costly Agile transformation is little more than Agility Theatre. Sure, we do the sprints. We have the stand-ups. We estimate the story points. But it ends up being all work and little useful output in each release. The engine’s at maximum revs, but our car’s going nowhere.

Basically, the gears of our development process are the wrong way round.

Organisations who optimise their outer feedback loops but neglect the inner loops are operating in a “lower gear”.

There’s no real mystery about why Agile transformations tend to focus most of their attention on the outer feedback loops.

Firstly, the people signing the cheques understand those loops, and can actively engage with them – in the mistaken belief that agility is all about them.

Secondly, the billion-dollar industry – the “Agile-Industrial Complex” – that trains and mentors organisations during these transformations is largely made up of coaches and consultants who have either a lapsed programming background, or no programming background at all. In a sample of 100 Agile Coach CVs, I found that 70% had no programming background, and a further 20% hadn’t done it for at least a decade. That means 90% of Agile Coaches can’t help you with the innermost feedback loops. Or, to put it more bluntly, 90% of Agile Coaches focus on the feedback loops that deliver the least impressive reductions in release cycle time.

Just to be clear, I’m not suggesting these outer feedback loops don’t matter. There’s usually much work to be done at all levels from senior management down to help organisations speed up their cycle times, and to attempt it without management’s blessing is typically folly. Improving build and test cycles requires a very significant investment – in skills, in time, in resource – and that shouldn’t be underestimated.

But to focus almost exclusively on the outer feedback loops produces very modest results, and it’s arguably where Agile transformations have gained their somewhat dismal reputation among business stakeholders and software professionals alike.

Code Craft – The Proof of the Pudding

In extended code craft training, I work with pairs on a multi-session exercise called “Jason’s Guitar Shack”. They envision and implement a simple solution to solve a stock control problem for a fictional musical instrument store, applying code craft disciplines like Specification By Example, TDD and refactoring as they do it.

The most satisfying part for me is that, at the end, there’s a demonstrable pay-off – a moment where we review what they’ve created and see how the code is simple, readable, low in duplication and highly modular, and how it’s all covered by a suite of good – as in, good at catching it when we break the code – and fast-running automated tests.

We don’t explicitly set out to achieve these things. They’re something of a self-fulfilling prophecy because of the way we worked.

Of course all the code is covered by automated tests: we wrote the tests first, and we didn’t write any code that wasn’t required to pass a failing test.

Of course the code is simple: we did the simplest things to pass our failing tests.

Of course the code is easy to understand: we invested time establishing a shared language working directly with our “customer” that subconsciously influenced the names we chose in our code, and we refactored whenever code needed explaining.

Of course the code is low in duplication: we made a point of refactoring to remove duplication when it made sense.

Of course the code is modular: we implemented it from the outside in, solving one problem at a time and stubbing and mocking the interfaces of other modules that solved sub-problems – so all our modules do one job, hide their internal workings from clients – because to begin with, there were no internal workings – and they’re swappable by dependency injection. Also, their interfaces were designed from the client’s point of view, because we stubbed and mocked them first so we could test the clients.
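By way of illustration – these names are mine, made up for this post rather than lifted from the actual Guitar Shack solution – the shape looks something like this: the alerting module's interface is designed from its client's (the core logic's) point of view, and real or fake implementations can be swapped in by dependency injection.

// Hypothetical sketch only - not the actual Guitar Shack code.
// The interface is defined by what the client needs...
interface LowStockAlerts {
    void lowStock(int productId);
}

// ...so the core logic can be test-driven against a stub or mock of it...
class StockMonitor {
    private final LowStockAlerts alerts;

    StockMonitor(LowStockAlerts alerts) {       // swappable by dependency injection
        this.alerts = alerts;
    }

    void recordSale(int productId, int stockAfterSale, int reorderThreshold) {
        if (stockAfterSale <= reorderThreshold) {
            alerts.lowStock(productId);         // tell the collaborator; no internal workings exposed
        }
    }
}

// ...and only later do we implement the real thing (e.g., an SMS- or email-based alert).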

Of course our tests fail when the code is broken: we specifically made sure they failed when the result was wrong before we made them pass.

Of course most of our tests run fast: we stubbed and mocked external dependencies like web services as part of our outside-in TDD design process.

All of this leads up to our end goal: the ability to deploy new iterations of the software as rapidly as we need to, for as long as we need to.

With their code in version control, built and tested and potentially deployed automatically when they push their changes to the trunk branch, that process ends up being virtually frictionless.

Each of these pay-offs is established in the final few sessions.

First, after we’ve test-driven all the modules in our core logic and the integration code behind that, we write a single full integration test – wiring all the pieces together. Pairs are often surprised – having never tested them together – that it works first time. I’m not surprised. We test-drove the pieces of the jigsaw from the outside in, explicitly defining their contracts before implementing them. So – hey presto – all the pieces fit.

Then we do code reviews to check if the solution is readable, low in duplication, as simple as we could make it, and that the code is modular. Again, I’m not surprised when we find that the code ticks these boxes, even though we didn’t mindfully set out to do so.

Then we measure the code coverage of the tests – 100% or very near. Again, I’m not surprised, even though that was never the goal. But just because 100% of our code is covered by tests, does that mean it’s really being tested? So we perform mutation testing on the code. Again, the coverage is very high. These are test suites that should give us confidence that the code really works.

The final test is to measure the cycle time from completing a change to seeing it in production. How long does it take to test, commit, push, build & re-test and then deploy changes into the target environment? The answer is minutes. For developers whose experience of this process is that it can take hours, days or even weeks to get code into production, this is a revelation.

It’s also kind of the whole point. Code craft enables rapid and sustained innovation on software and systems (and the business models that rely on them).

Now, I can tell you this in a 3-day intensive training course. But the extended training – where I work with pairs in weekly sessions over 10-12 weeks – is where you actually get to see it for yourself.

If you’d like to talk about extended code craft training for your team, drop me a line.

‘Agility Theatre’ Keeps Us Entertained While Our Business Burns

I train and coach developers and teams in the technical practices of Agile Software Development like Test-Driven Development, Refactoring and Continuous Integration. I’m one of a rare few who exclusively does that. Clients really struggle to find Agile technical coaches these days.

There seems to be no shortage of help on the management practices and the process side of Agile, though. That might be a supply-and-demand problem. A lot of “Agile transitions” seem to focus heavily on those aspects, and the Agile coaching industry has risen to meet that demand with armies of certified bods.

I’ve observed, though, that without effective technical practices, agility eludes those organisations. You can have all the stand-ups and planning meetings and burn-down charts and retrospectives you like, but if your teams are unable to rapidly and sustainably evolve your software, it amounts to little more than Agility Theatre.

Agility Theatre is when you have all the ceremony of Agile Software Development, but none of the underlying technical discipline. It’s a city made of chipboard facades, painted to look like the real thing to the untrained eye from a distance.

In Agile Software Development, there’s one metric that matters: how much does it cost to change our minds? That’s kind of the point. In this rapidly changing, constantly evolving world, the ability to adapt matters. It matters more than executing a plan. Because plans don’t last long in the 21st century.

I’ve watched some pretty big, long-established, hugely successful companies brought down ultimately by their inability to change their software and core systems.

And I’ve measured the difference the technical practices can make to that metric.

Teams who write automated tests after the code being tested tend to find that the cost of changing their software rises exponentially over the software’s average lifespan of around 8 years. I know exactly what causes this. Test-after tends to produce a surfeit of tests that hit external dependencies like databases and web services, and test suites that run slow.

If your tests run slow, then you’ll test less often, which means bugs will be caught later, when they’re more expensive to fix.

Teams whose test suites run slow end up spending more and more of their time – and your money – fixing bugs. Until, one day, that’s pretty much all they’re doing.

Teams who write their tests first have a tendency to end up with fast-running test suites. It’s a self-fulfilling prophecy – using unit tests as specifications unsurprisingly produces code that is inherently more unit-testable, as we’re forced to stub and mock those expensive external dependencies.

This means teams that go test-first can test more frequently, catching bugs much sooner, when they’re orders of magnitude cheaper to fix. Teams who go test-first spend a lot less time fixing bugs.

The upshot of all this is that teams who go test-first tend to have a much shallower cost-of-change curve, allowing them to sustain the pace of software evolution for longer. Basically, they outrun the test-after teams.

Now, I’m not going to argue that breaking work down into smaller batch sizes and scheduling deliveries more frequently can’t make a difference. But what I will argue is that if the technical discipline is lacking, all that will do is enable you to observe – in almost real time – the impact of a rising cost of change.

You’ll be in a car, focusing on where to go next, while your fuel consumption rises exponentially. You reach a point where the destination doesn’t matter, because you ain’t going nowhere.

As the cost of changes rises, it piles on the risk of building the wrong thing. Trying to get it right first time is antithetical to an evolutionary approach. I’ve worked with analysts and architects who believed they could predict the value of a feature set, and went to great lengths to specify the Right Thing. In the final reckoning, they were usually out by a country mile. No matter how hard we try to predict the market, ultimately it’s all just guesswork until our code hits the real world.

So the ability to change our minds – to learn from the software we deliver and adapt – is crucial. And that all comes down to the cost of change. Over the last 25 years, it’s been the best predictor I’ve personally seen of long-term success or failure of software-dependent businesses. It’s the entropy of tech.

You may be a hugely successful business today – maybe even the leader in your market – but if the cost of changing your code is rising exponentially, all you’re really doing is market research for your more agile competitors.

Agile without Code Craft is not agile at all.

Forget SOLID. Say Hello To SHOC Principles for Modular Design.

Yesterday, I ran a little experiment on social media. Like TDD, SOLID “class design principles” have become de rigueur on software developers’ CVs. And, like TDD, most of the CVs SOLID is featured on belong to people who evidently haven’t understood – let alone applied – the principles.

There’s also been much talk in recent years about whether SOLID principles need updating. Certainly, as a trainer and coach in software design practices, I’ve found SOLID to be insufficient and often confusing.

So, with both these things in mind, I threw out my own set of principles – SHOC – to explain how Codemanship teaches principles of modular software design. Note that I’m very deliberately not saying “object-oriented” or “class” design.

Here are the four principles taught on Codemanship Code Craft courses. Good modules should:

  • Be Swappable
  • Hide their internal workings
  • Do One job
  • Have Client-driven interfaces

Now, those of you who do understand SOLID may notice that SHOC covers 4 of the 5 letters.

The Liskov Substitution and Dependency Inversion principles in SOLID are about making modules interchangeable (basically, about polymorphism).

The Single Responsibility Principle is essentially “do one job”, though I’ve long found its stated rationale to be a red herring. The real design benefit of SRP is greater composability (think of UNIX pipes), so I focus on that when I explain it.

The Interface Segregation Principle is a backwards way of saying that interfaces should be designed from the client’s point of view.

So that’s the S, the L, the I and the D of SOLID. SLID.

But SOLID is missing something really rather crucial. It doesn’t explicitly mandate encapsulation. Taken in combination, the principles may imply it, depending on how you interpret and apply them.

But you can easily satisfy every SOLID – or SLID – principle and still have every module expose its internals in a very unseemly manner, with getters and setters galore. Interface Segregation, for example, says that’s fine just as long as only the getters and setters the client is actually using are exposed.

So I find the need to add Tell, Don’t Ask – that objects shouldn’t ask for data to do work, they should tell the objects that contain the data to do the work themselves, enabling us to hide that data – to the list of modular design principles developers need to learn. SLIDT.
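To illustrate the difference, here's a minimal, hypothetical sketch (the shopping basket example is mine, not from any course material):

import java.util.ArrayList;
import java.util.List;

// Asking: the client pulls the data out and does the work itself,
// so Basket is forced to expose its internal list of items.
class AskingClient {
    double total(Basket basket) {
        double total = 0;
        for (Item item : basket.getItems()) {
            total += item.unitPrice() * item.quantity();
        }
        return total;
    }
}

// Telling: the client tells the object that owns the data to do the work,
// so the items can stay hidden behind Basket's interface.
class TellingClient {
    double total(Basket basket) {
        return basket.total();
    }
}

record Item(double unitPrice, int quantity) {}

class Basket {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    List<Item> getItems() {          // only needed to support the "asking" style
        return items;
    }

    double total() {                 // "tell, don't ask"
        return items.stream().mapToDouble(i -> i.unitPrice() * i.quantity()).sum();
    }
}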

And what happened to the Open-Closed Principle of SOLID – that classes should be open to extension and closed to modification? The original rationale for the OCP was the time it took to build and test code. This is a callback to a time when our computers were about 1000x slower than today. I used to take a long coffee break when my code was compiling, and our tests ran overnight. And that was advanced at the time.

Now we can build and test our code in minutes or even seconds – well, if our testing pyramid is the right way up – and modifying existing modules really is no big deal. The refactoring discipline kind of relies on modules being open to modification, for example.

And, as Kevlin Henney has rightly pointed out, we could think of OCP as being more of a language design principle than a software design principle. “Open to extension” in C++ means something quite different in JavaScript.

So I dropped the O in SOLID. It’s a ’90s thing. Technology moved on.

“SLIDT”**, of course, doesn’t exactly trip off the tongue, and I doubt many would use it as their “memorable information” for their online banking log-in.

So I came up with SHOC. Without a K*. Modules should be swappable, hide their internal workings, do one job and have interfaces designed from the client’s point of view.

This is how I teach modular design now – in multiple programming paradigms and at multiple levels of code organisation – and I can report that it’s been far more successful, measured in terms of its impact on code quality.

It’s much easier to understand, and much easier to apply, be it in C++, or Java, or C#, or JavaScript, or Clojure, or COBOL. Yes, you heard me. COBOL.

SHOC is built on the original principles of modular design dating back to the late 1960s and early 70s – namely that modules should be interchangeable, have good separation of concerns, present “well-defined” interfaces (okay, so I interpreted that) and hide their information. Like Dr. Martens boots, these principles aren’t likely to go out of fashion anytime soon.

When I teach software design principles, I teach them as two sets: Simple Design and SHOC. Occasionally, students ask “But what about SOLID?” and we have that conversation – but increasingly less often of late.

So, will you be adding “SHOC” to your CV? Probably not. That was never the point. SOLID as a “must-have” skill will soldier on, despite being out-of-date, rarely correctly applied, and widely misunderstood.

But that’s never stopped us before 😉

*It’s been especially fun watching people try to add an arbitrary K to SHOC. Nature abhors an incomplete mnemonic.

**It just occurred to me that if I’d used “encapsulates its internal workings” instead of “Tell, Don’t Ask”, I could have had SLIDE, but folk would still get confused by the SLID part

Stuck In Service-Oriented Hell? You Need Contract Tests

As our system architectures get increasingly distributed, many teams experience the pain of writing code that consumes services through APIs that are changing.

Typically, we don’t know that a non-backwards-compatible change to an external dependency has happened until our own code suddenly stops working. I see many organisations spending a lot of time fixing those problems. I shudder to think how much time and money, as an industry, we’re wasting on it.

Ideally, developers of APIs wouldn’t make changes that break contracts with client code. But this is not an ideal world.

What would be very useful is an early warning system that flags up the fact that a change we’ve made is going to break client code before we release it. As a general rule with bugs, the sooner we know, the cheaper they are to fix.

Contract tests are a good tool for getting that early warning. A contract test is a kind of integration test that focuses specifically on the interaction between our code and an external dependency. When they fail, they pinpoint the source of the problem: namely that something has likely changed at the other end.

There are different ways of writing contract tests, but one of my favourites is to use Abstract Tests. Take this example from my Guitar Shack code:

package com.guitarshack;

import com.guitarshack.net.Network;
import com.guitarshack.net.RESTClient;
import com.guitarshack.net.RequestBuilder;
import com.guitarshack.sales.SalesData;
import org.junit.Test;

import java.util.Calendar;

import static org.junit.Assert.assertTrue;

/*
This Abstract Test allows us to create two versions of the set-up, one with
stubbed JSON and one that actually connects to the web service, effectively pinpointing
whether an error has occurred because of a change to our code or a change to the external dependency
*/
public abstract class SalesDataTestBase {

    @Test
    public void fetchesSalesData() {
        SalesData salesData = new SalesData(new RESTClient(getNetwork(), new RequestBuilder()), () -> {
            Calendar calendar = Calendar.getInstance();
            calendar.set(2019, Calendar.JANUARY, 31);
            return calendar.getTime();
        });
        int total = salesData.getTotal(811);
        assertTrue(total > 0);
    }

    protected abstract Network getNetwork();
}

The getNetwork() method is left abstract so it can be overridden in subclasses.

One implementation uses a stub implementation of the Network interface that returns hardcoded JSON, so I can unit test most of my SalesData class.

package com.guitarshack.unittests;

import com.guitarshack.net.Network;
import com.guitarshack.SalesDataTestBase;

public class SalesDataUnitTest extends SalesDataTestBase {
    @Override
    protected Network getNetwork() {
        return request -> "{\"productID\":811,\"startDate\":\"7/17/2019\",\"endDate\":\"7/27/2019\",\"total\":31}";
    }
}
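The Network interface itself isn't shown here, but from the way it's used – a lambda that takes a request and returns raw JSON – it's presumably a functional interface along these lines (an assumed shape for illustration; the method name is a guess, not the actual Guitar Shack source):

package com.guitarshack.net;

// Assumed shape only: a single abstract method, so it can be implemented with a lambda
public interface Network {
    String get(String request);
}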

Another implementation returns the real implementation of Network, called Web, and connects to a real web service hosted as an AWS Lambda.

package com.guitarshack.contracttests;

import com.guitarshack.*;
import com.guitarshack.net.Network;
import com.guitarshack.net.Web;

/*
If this test fails when the SalesDataUnitTest is still passing, this indicates a change
in the external API
*/
public class SalesDataContractTest extends SalesDataTestBase {
    @Override
    protected Network getNetwork() {
        return new Web();
    }
}

If the contract test suddenly starts failing while the unit test is still passing, that’s a big hint that the problem is at the other end.

To illustrate, I changed the output field ‘total’ to ‘salesTotal’ in the JSON being outputted from the web service. See what happens when I run my tests.

The contract test fails (as well as a larger integration test, that wouldn’t pinpoint the source of the problem as effectively), while the unit test version is still passing.

When I change ‘salesTotal’ back to ‘total’, all the tests pass again.

This is very handy for me, as a client code developer writing code that consumes the sales data API. But it would be even more useful for the developer of that API to be able to run my contract tests, perhaps before a release, or as an overnight job, so they could get early warning that their changes have broken the contract.

For teams who are able to access each other’s code (e.g., on GitHub), that’s quite straightforward. I could rig up my Maven project to enable developers to build and run just those tests, for example. Notice that my unit and contract tests are in different packages to make that easy.

For teams who can’t access each other’s repos, it may take a little more ingenuity. But we’re probably used to seeing our code built and tested on other machines (e.g., on cloud CI servers) and getting the results back over the web. It’s not rocket science to offer Contract Testing as a Service. You could then give the API developers exclusive – possibly secured – access to your contract test builds over HTTP.

I’ve seen contract testing – done well – save organisations a lot of blood, sweat and tears. At the very least, it can defend API developers from breaking the First Law of Software Development:

Thou shalt not break shit that was working

Jason Gorman

Don’t Succumb To Illusions Of Productivity

One thing I come across very often is development teams who have adopted processes or practices that they believe are helping them go faster, but that are probably making no difference, or even slowing them down.

The illusion of productivity can be very seductive. When I bash out code without writing tests, or without refactoring, it really feels like I’m getting sh*t done. But when I measure my progress more objectively, it turns out I’m not.

That could be because typing code faster – without all those pesky interruptions – feels like delivering working software faster. But it usually takes longer to get something working when we take less care.

We seem hardwired not to notice how much time we spend fixing stuff later that didn’t need to be broken. We seem hardwired not to notice the team getting bigger and bigger as the bug count and the code smells and the technical debt pile up. We seem hardwired not to notice the merge hell we seem to end up in every week as developers try to get their changes into the trunk.

We just feel like…

Getting sh*t done

Not writing automated tests is one classic example. I mean, of course unit tests slow us down! It’s, like, twice as much code! The reality, though, is that without fast-running regression tests, we usually end up spending most of our time fixing bugs when we could be adding value to the product. The downstream costs typically outweigh the up-front investment in unit tests. Skipping tests is almost always a false economy, even on relatively short projects. I’ve measured myself with and without unit tests on ~1 hour exercises, and I’m slightly faster with them. Typing is not the bottleneck.

Another example is when teams mistakenly believe that working on separate branches of the code will reduce bottlenecks in their delivery pipelines. Again, it feels like we’re getting more done as we hack away in our own isolated sandboxes. But this, too, is an illusion. It doesn’t matter how many lanes the motorway has if every vehicle has to drive on to the same ferry at the end of it. No matter how many parallel dev branches you have, there’s only one branch deployments can be made from, and all those parallel changes have to somehow make it into that branch eventually. And the less often developers merge, the more changes in each merge. And the more changes in each merge, the more conflicts. And, hey presto, merge hell.

Closely related is the desire of many developers to work without “interruptions”. It may feel like sh*t’s getting done when the developers go off into their cubicles, stick their noise-cancelling headphones on, and hunker down on a problem. But you’d be surprised just how much communication and coordination’s required to avoid some serious misunderstandings. I recall working on a team where we ended up with three different architectures and four customer tables in the database, because my colleagues felt that standing around a whiteboard drawing pictures – or, as they called it, “meetings” – was a waste of valuable Getting Sh*t Done time. With just a couple of weeks of corrective work, we were able to save ourselves 20 minutes around a whiteboard. Go us!

I guess my message is simple. In software development, productivity doesn’t look like fingers flying over a keyboard, churning out code at top speed.

Don’t be fooled by that illusion.

The Jason’s Guitar Shack kata – Part I (Core Logic)

This week, I’ve been coaching developers for an automotive client in Specification By Example (or, as I call it these days, “customer-driven TDD”).

The Codemanship approach to software design and development has always been about solving problems, as opposed to building products or delivering features.

So I cooked up an exercise that starts with a customer with a business problem, and tasked pairs to work with that customer to design a simple system that might solve the problem.

It seems to have gone well, so I thought I’d share the exercise with you for you to try for yourselves.

Jason’s Guitar Shack

I’m a retired international rock legend who has invested his money in a guitar shop. My ex-drummer is my business partner, and he runs the shop, while I’ve been a kind of silent partner. My accountant has drawn my attention to a problem in the business. We have mysterious gaps in the sales of some of our most popular products.

I can illustrate it with some data I pulled off our sales system:

Date         Time    Product ID   Quantity   Price Charged
13/07/2019   10:47   757          1          549
13/07/2019   12:15   757          1          549
13/07/2019   17:23   811          1          399
14/07/2019   11:45   449          1          769
14/07/2019   13:37   811          1          399
14/07/2019   15:01   811          1          399
15/07/2019   09:26   757          1          549
15/07/2019   11:55   811          1          399
16/07/2019   10:33   374          1          1199
20/07/2019   14:07   449          1          769
22/07/2019   11:28   449          1          769
24/07/2019   10:17   811          2          798
24/07/2019   15:31   811          1          399

Product sales for 4 selected guitar models

Product 811 – the Epiphone Les Paul Classic in worn Cherry Sunburst – is one of our biggest sellers.


We sell one or two of these a day, usually. But if you check out the sales data, you’ll notice that between July 15th and July 24th, we didn’t sell any at all. These gaps appear across many product lines, throughout the year. We could be losing hundreds of thousands of pounds in sales.

After some investigation, I discovered the cause, and it’s very simple: we keep running out of stock.

When we reorder stock from the manufacturer or other supplier, it takes time for them to fulfil our order. Every product has a lead time on delivery to our warehouse, which is recorded in our warehouse system.

Description                                                         Price (£)   Stock   Rack Space   Manufacturer Delivery Lead Time (days)   Min Order
Fender Player Stratocaster w/ Maple Fretboard in Buttercream       549         12      20           14                                        10
Fender Deluxe Nashville Telecaster MN in 2 Colour Sunburst         769         5       10           21                                        5
Ibanez RG652AHMFX-NGB RG Prestige Nebula Green Burst (inc. case)   1199        2       5            60                                        1
Epiphone Les Paul Classic In Worn Heritage Cherry Sunburst         399         22      30           14                                        20

Product supply lead times for 4 selected guitars

My business partner – the store manager – typically only reorders stock when he realises we’ve run out (usually when a customer asks for it, and he checks to see if we have any). Then we have no stock at all while we wait for the manufacturer to supply more, and during that time we lose a bunch of sales. In this age of the Electric Internet, if we don’t have what the customer wants, they just take out their smartphone and buy it from someone else.

This is the business problem you are tasked with solving: minimise lost sales due to lack of stock.

There are some wrinkles to this problem, of course. We could solve it by cramming our warehouse full of reserve stock. But that would create a cash flow problem for the business, as we have bills to pay while products are gathering dust on our shelves. So the constraint here is, while we don’t want to run out of products, we actually want as few in stock as possible, too.

The second wrinkle we need to deal with is that sales are seasonal. We sell three times as much of some products in December as we do in August, for example. So any solution would need to take that into account to reduce the risk of under- or over-stocking.

So here’s the exercise for a group of 2 or more:

  • Nominate someone in your group as the “customer”. They will decide what works and what doesn’t as far as a solution is concerned.
  • Working with your customer, describe in a single sentence a headline feature – this is a simple feature that solves the problem. (Don’t worry about how it works yet, just describe what it does.)
  • Now, think about how your headline feature would work. Describe up to a maximum of 5 supporting features that would make the headline feature possible. These could be user-facing features, or internal features used by the headline feature. Remember, we’re trying to design the simplest solution possible.
  • For each feature, starting with the headline feature, imagine the scenarios the system would need to handle. Describe each scenario as a simple headline (e.g., “product needs restocking”). Build a high-level test list for each feature.
  • The design and development process now works one feature at a time, starting with the headline feature.
    • For each feature’s scenario, describe in more detail how that scenario will work. Capture the set-up for that scenario, the action or event that triggers the scenario, and the outcomes the customer will expect to happen as a result. Feel free to use the Given…When…Then style. (But remember: it’s not compulsory, and won’t make any difference to the end result.)
    • For each scenario, capture at least one example with test data for every input (every variable in the set-up and every parameter of the action or event), and for every expected output or outcome. Be specific. Use the sample data from our warehouse and sales systems as a starting point, then choose values that fit your scenario.
    • Working one scenario at a time, test-drive the code for its core logic using the examples, writing one unit test for each output or outcome. Organise and name your tests and test fixture so it’s obvious which feature, which scenario and which output or outcome they are talking about. Try as much as possible to choose names that appear in the text you’ve written with your customer. You’re aiming for unit tests that effectively explain the customer’s tests.
    • Use test doubles – stubs and mocks – to abstract external dependencies like the sales and warehouse systems, as well as to Fake It Until You Make it for any supporting logic covered by features you’ll work on later.

And that’s Part I of the exercise. At the end, you should have the core logic of your solution implemented and ready to incorporate into a complete working system.
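By way of illustration only – and emphatically not a model answer – a first unit test for a “product needs restocking” scenario might start out something like this, using the Epiphone (product 811) sample data with its 14-day lead time and a stubbed sales rate of roughly one a day. All the names here are hypothetical; your conversation with your customer should drive your own.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class ProductNeedsRestockingTest {

    @Test
    public void productNeedsRestockingWhenStockWillRunOutBeforeDeliveryArrives() {
        ReorderPolicy policy = new ReorderPolicy(productId -> 1.0);   // stubbed sales rate: 1 unit/day
        assertTrue(policy.needsRestocking(811, 10, 14));              // 10 in stock, 14-day lead time
    }
}

// Just enough supporting code to make the sketch self-contained
interface SalesRate {
    double unitsPerDay(int productId);
}

class ReorderPolicy {
    private final SalesRate salesRate;

    ReorderPolicy(SalesRate salesRate) {
        this.salesRate = salesRate;
    }

    boolean needsRestocking(int productId, int stock, int leadTimeDays) {
        return stock < salesRate.unitsPerDay(productId) * leadTimeDays;
    }
}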

Here’s a copy of the sample data I’ve been using with my coachees – stick close to it when discussing examples, because this is the data that your system will be consuming in Part II of this kata, which I’ll hopefully write up soon.

Good luck!

Readable Parameterized Tests

Parameterized tests (sometimes called “data-driven tests”) can be a useful technique for removing duplication from test code, as well as potentially buying teams much greater test assurance with surprisingly little extra code.

But they can come at the price of readability. So if we’re going to use them, we need to invest some care in making sure it’s easy to understand what the parameter data means, and to ensure that the messages we get when tests fail are meaningful.

Some testing frameworks make it harder than others, but I’m going to illustrate using some mocha tests in JavaScript.

Consider this test code for a Mars Rover:

it("turns right from N to E", () => {
let rover = {facing: "N"};
rover = go(rover, "R");
assert.equal(rover.facing, "E");
})
it("turns right from E to S", () => {
let rover = {facing: "E"};
rover = go(rover, "R");
assert.equal(rover.facing, "S");
})
it("turns right from S to W", () => {
let rover = {facing: "S"};
rover = go(rover, "R");
assert.equal(rover.facing, "W");
})
it("turns right from W to N", () => {
let rover = {facing: "W"};
rover = go(rover, "R");
assert.equal(rover.facing, "N");
})

These four tests are different examples of the same behaviour, and there’s a lot of duplication (I should know – I copied and pasted them myself!)

We can consolidate them into a single parameterised test:

[{input: "N", expected: "E"}, {input: "E", expected: "S"}, {input: "S", expected: "W"},
 {input: "W", expected: "N"}].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.input};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.expected);
        })
    })

While we’ve removed a fair amount of duplicate test code, arguably this single parameterized test is harder to follow – both at read-time, and at run-time.

Let’s start with the parameter names. Can we make it more obvious what roles these data items play in the test, instead of just using generic names like “input” and “expected”?

[{startsFacing: "N", endsFacing: "E"}, {startsFacing: "E", endsFacing: "S"}, {startsFacing: "S", endsFacing: "W"},
 {startsFacing: "W", endsFacing: "N"}].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.endsFacing);
        })
    })

And how about we format the list of test cases so they’re easier to distinguish?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function (testCase) {
        it("turns right", () => {
            let rover = {facing: testCase.startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, testCase.endsFacing);
        })
    })

And how about we declutter the body of the test a little by destructuring the testCase object?

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it("turns right", () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Okay, hopefully this is much easier to follow. But what happens when we run these tests?

It’s not at all clear which test case is which. So let’s embed some identifying data inside the test name.

[
    {startsFacing: "N", endsFacing: "E"},
    {startsFacing: "E", endsFacing: "S"},
    {startsFacing: "S", endsFacing: "W"},
    {startsFacing: "W", endsFacing: "N"}
].forEach(
    function ({startsFacing, endsFacing}) {
        it(`turns right from ${startsFacing} to ${endsFacing}`, () => {
            let rover = {facing: startsFacing};
            rover = go(rover, "R");
            assert.equal(rover.facing, endsFacing);
        })
    })

Now when we run the tests, we can easily identify which test case is which.

With a bit of extra care, it’s possible with most unit testing tools – not all, sadly – to have our cake and eat it with readable parameterized tests.
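In JUnit 5, for example, the same trick might look something like this – the Rover class here is a hypothetical Java stand-in for the JavaScript go function, but the naming mechanism is the point:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class RoverTest {

    // The name template embeds the test data, so each case is identifiable when it runs or fails
    @ParameterizedTest(name = "turns right from {0} to {1}")
    @CsvSource({"N,E", "E,S", "S,W", "W,N"})
    void turnsRight(String startsFacing, String endsFacing) {
        assertEquals(endsFacing, Rover.turnRight(startsFacing));
    }
}

// Hypothetical stand-in for the JavaScript go(rover, "R") function
class Rover {
    static String turnRight(String facing) {
        String compass = "NESW";
        return String.valueOf(compass.charAt((compass.indexOf(facing) + 1) % 4));
    }
}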