Automated Tests Aren’t Just For The Long-Term

Something I hear worryingly often still is teams – especially managers – saying “Oh, we don’t need to automate our tests because there’s only going to be one release.”

The received wisdom is that investing in fast-running automated tests is only worth it if the software’s going to have a long lifespan, with many subsequent releases. (This is a sentiment often expressed about code craft in general.)

The assumption is that fast-running unit tests have less – or zero – value in the short-to-medium term. But this is easily disproved.

Ask ourselves: what do we need fast-running tests for in the first place? To guard against regressions when we change the code. The inexperienced team or manager might argue that “we won’t be changing the code, because there’s only going to be one release”.

Analysis by GitLab’s data sciences team clearly shows that code churn – when classified as code that changes within 2-3 weeks of being checked in – for the average team runs at about 25%. An average team of, say, four developers might check in 10,000 LOC on a 12-week release schedule. 2,500 lines of that code will change within 2-3 weeks. That’s a lot of changes.

And that’s normal. Expect it.

This is before we take into account the many changes a programmer will make to code before they check it in. If I only tested my code when it was time to check it in, I think I’d really struggle.

It’s a question of batch size. If I make one change and then re-test, and I’ve broken something, it’s much, much easier to pinpoint what’s gone wrong. And it’s way, way easier to get back to code that works. If I make 100 changes and re-test, I’m probably going to end up knee-deep in the debugger and up to me neck in print statements, and reverting to the last working copy means losing a tonne of work.

So I test pretty much continuously, and find even on relatively small projects that my hide gets saved multiple times by having these tests.

Change is much easier with fast-running tests, and change is a normal part of delivery.

And then there’s the whole question of whether it really will be the only release of the software. Experience has taught me that if software gets used, it gets changed. The only one-shot deals I’ve experienced in harumpty-twelve years of writing software have been the unsuccessful ones.

Imagine we’re asked to dig out an underground shelter for our customer. They tell us they need a chamber 8 ft x 8 ft x 6 ft – big enough for a bed – and we dutifully start digging. Usually, we would put up wooden supports as we dig, to stop the chamber from caving in. “No need”, says the customer. “It’s only one room, and we’ll only use it once.”

So, we don’t put in any supports. And that makes completing the chamber harder, because it keeps caving in due to the vibrations of our ongoing excavations. For every cubic metre of dirt we excavate, we end up digging out another half a cubic metre from the cave-ins. But we get there in the end, and the customer pays us our money and moves their bed in.

Next week, we get a phone call. “Where do we keep our food supplies?” Turns out, they’ll need another room. Would they like us to put supports up in the main chamber before we start digging again? “No time! We need our food store ASAP.” Okey dokey. We start digging again, and the existing chamber starts caving in again, but we dig out the loose earth and carry on as best we can. We manage to get the food store done, but with a lot more work this time, because both spaces keep caving in, and we keep having to dig them out again and again, recreating spaces we’d already excavated several times.

The customer moves in their food supplies, but their elderly mother now refuses to go into the shelter because she’s not sure it’s safe.

A week later: “Oh hi. Er. Where do we go to the bathroom?” Work begins on a third chamber. Would they like us to put supports in to the other two chambers first? “No. Need a bathroom ASAP!!!” they exclaim with a rather pained expression. So we dig and dig and dig, now so tired that we barely notice that most of the space we’re excavating has been excavated before, and most of the earth we’re removing has been coming from the ceilings of the existing chambers as well as from the new bathroom.

This is what it is to work without fast-running tests. Even on small, one-shot deals of just a few days, regressions can become a major expense, quickly outweighing the cost of writing tests in the first place.

When Should We Do Code Reviews?

One question that I get asked often is “When is the best time to do code reviews?” My pithy answer is: now. And now. And now. Yep, and now.

Typically, teams batch up a whole bunch of design decisions for a review – for example, in a pull request. If we’ve learned anything about writing good software, it’s that the bigger the batch, the more slips through the quality control net.

Releasing 50 features at a time, every 12 months, means we tend to bring less focus to testing each feature to see if it’s what the customer really needs. Releasing one feature at a time allows us to really focus in on that feature, see how it gets used, see how users respond to it.

Reviewing 50 code changes at a time gives similarly woolly results. A tonne of code smells tend to make it into production. Reviewing a handful of code changes – or, ideally, just one – at a time brings much more focus to each change.

Unsurprisingly, teams who review code continuously, working in rapid feedback cycles (e.g., doing TDD) tend to produce cleaner code – code that’s easier to understand, simpler, has less duplication and more loosely-coupled modules. (We’ve measured this – for example in this BBC TDD case study.)

One theory about why TDD tends to produce cleaner code is that the short feedback loops – “micro-cycles” – bring much more focus to every design decision. TDD deliberately has a step built in to each micro-cycle to stop, look at the code we just wrote or changed, and refactor if necessary. I strongly encourage developers not to waste this opportunity. The Green Light is our signal to do a mini code-review on the work we just did.

I’ve found, through working with many teams, that the most effective code reviews are rigorous and methodical. Check all the code that changed, and check for a list of potential code quality issues every single time. Don’t just look at the code to see if it “looks okay” to you.

In the Codemanship TDD course, I ask developers to run through a check list on every green light:

  • Is the code easy to understand? (Not sure? Ask someone else.)
  • Is there obvious duplication?
  • Is each method or function and class or module as simple as it could be?
  • Do any methods/functions or classes/modules have more than one responsibility?
  • Can you see any Feature Envy – where a method/function (or part of a method/function) of one class/module depends on multiple features of another class/module?
  • Are a class’s/module’s dependencies easily swappable?
  • Is the class/module exposed to things it isn’t using (e.g., methods of a C++ interface it doesn’t call, or unused imports from other modules)?
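To make one item on that list concrete, here’s a minimal sketch of Feature Envy and its usual fix. The Invoice and Customer classes, and all of their fields, are hypothetical names invented purely for illustration.

```javascript
// Hypothetical example: class names and fields are invented for illustration.

class Customer {
  constructor(name, street, city, postcode) {
    this.name = name;
    this.street = street;
    this.city = city;
    this.postcode = postcode;
  }

  // After refactoring: the formatting logic lives with the data it uses.
  addressLabel() {
    return `${this.name}\n${this.street}\n${this.city}\n${this.postcode}`;
  }
}

class Invoice {
  constructor(customer) {
    this.customer = customer;
  }

  // Feature Envy: this method depends on four features of Customer
  // and on none of Invoice's own.
  enviousAddressLabel() {
    const c = this.customer;
    return `${c.name}\n${c.street}\n${c.city}\n${c.postcode}`;
  }

  // Refactored: Invoice asks Customer to do the work instead.
  addressLabel() {
    return this.customer.addressLabel();
  }
}

const invoice = new Invoice(new Customer("Ada", "1 High St", "Leeds", "LS1 1AA"));
console.log(invoice.addressLabel() === invoice.enviousAddressLabel()); // true
```

Both versions produce the same label. The difference is that a change to how customers store their address now touches only Customer.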

You may, according to your needs and your team’s coding standards, have a different checklist. What seems to make the difference is that your team has a checklist, and that you are in the habit of applying it whenever you have the luxury of working code.

This is where the relationship exists between code review and Continuous Delivery. If our code isn’t working, it isn’t shippable. If you go for hours at a time with failing automated tests (or no testing at all), code review is a luxury. Your top priority’s to get it working – that’s the most important quality of any software design. If it doesn’t work, and you can’t deploy it, then whether or not there are any, say, long parameter lists in it is rather academic.

Now, I appreciate that stopping on every passing test and going through a checklist for all the code you changed may sound like a real drag. But, once upon a time, writing a unit test, writing the test assertion first and working backwards, remembering to see the test fail, and all the habits of effective TDD felt like a bit of a chore. Until I’d done them 10,000 times. And then I stopped noticing that I was doing them.

The same goes for code review checklists. The more we apply them, the more it becomes “muscle memory”. After a year or two, you’ll develop an intuitive sense of code quality – problems will tend to leap out at you when you look at code, just as one bum note in an entire orchestra might leap out at a conductor with years of listening experience and ear training. You can train your eyes to notice code smells like long methods, large classes, divergent change, feature envy, primitive obsession, data clumps and all the other things that can make code harder to change.

This is another reason why I encourage very frequent code reviews. If you were training your musical ear, one practice session every couple of weeks is going to be far less effective than 20 smaller practice sessions a day. And if each practice session is much more focused – i.e., we don’t ear-train musicians with whole symphonies – then that, too, will speed up the learning curve.

The other very important reason I encourage continuous code review is that when we batch them up, we also tend to end up with batches of remedial actions to rectify any problems. If I add a branch to a method, review that, and decide that method is now too logically complex, fixing it there and then is a doddle.

If I make 50 boo-boos like that, not only will an after-the-horse-has-bolted code review probably miss many of those 50 issues, but the resulting TO-DO list is likely to require an amount of time and effort that will make it a task that has to be scheduled – very possibly by someone who doesn’t understand the need to do them. In the zero-sum game of software development scheduling, the most common result is that the work never gets done.

 

The Hidden Cost of “Dependency Drag”

 

The mysterious Sailing Stones of Death Valley are moved by some unseen natural force.

When I demonstrate mutation testing, I try to do it in the programming language my audience uses day-to-day. In most of the popular programming languages, there’s a usable, current mutation testing tool available. But for a long time, the .NET platform had none. That’s not to say there were never any decent mutation testing tools for .NET programs. There have been several. But they had all fallen by the wayside.

Here’s the thing: some community-spirited developer kindly creates a mutation testing tool we can all use. That’s a sizable effort for no financial reward. But still they write it. It works. Folk are using it. And there’s no real need to add to it. Job done.

Then, one day, you try to use it with the new version of the unit testing tool you’ve been working with, and – mysteriously – it stops working. Like the Sailing Stones of Death Valley, the mutation testing tool is inexplicably 100 metres from where you left it, and to get it working again it has to be dragged back to its original position.

This is the hidden cost of a force I might call Dependency Drag. I see it all the time: developers forced to maintain software products that aren’t changing, but that are getting out of step with the environment in which they run, which is constantly changing under their feet.

GitHub – and older OSS repositories – is littered with the sun-bleached skeletons of code bases that got so out of step they simply stopped working, and maintainers didn’t want to waste any more time keeping them operational. Too much effort just to stand still.

Most of us don’t see Dependency Drag, because it’s usually hidden within an overall maintenance effort on a changing product. And the effect is usually slow enough that it looks like the stones aren’t actually moving.

But try and use some code that was written 5 years ago, 10 years ago, 20 years ago, if it hasn’t been maintained, and you’ll see it. The stones are a long way from where you left them.

This effect can include hardware, of course. I hang on to my old 3D TV so that I can play my 3D Blu-rays. One day, that TV will break down. Maybe I’ll be able to find another one on eBay. But 10 years from now? 20 years from now? My non-biodegradable discs may last centuries if kept safe. But it’s unlikely there’ll be anything to play them on 300 years from now.

This is why it will become increasingly necessary to preserve the execution environments of programs as well as the programs themselves. It’s no use preserving the 1960s Fortran compiler if you don’t have the 1960s computer and operating system and punch card reader it needs to work.

And as execution environments get exponentially more complex, the cost of Dependency Drag will multiply.

 

Architects – Hang Up Your Capes & Go Back To The Code

Software architecture is often framed as a positive career move for a developer. Organisations tend to promote their strongest technical people into these strategic and supervisory roles. The pay is better, so the lure is obvious.

I progressed into lead architecture roles in the early 00s, having “earned my spurs” as a developer and then tech lead in the 1990s. But I came to realise that, from my ivory tower, I was having less and less influence over the code that got written, and therefore less and less influence over the actual design and architecture of the software.

I could draw as many boxes and arrows as I liked, give as many PowerPoint presentations as I liked, write as many architecture and standards documents as I liked: none of it made much difference. It was like trying to direct traffic using my mind.

So I hung up my shiny architect cape and pointy architect wizard hat and went back to working directly with developers on real code as part of the team.

Instead of decreeing “Thou shalt…”, I could – as part of a programming pair (and a programming mob, which was quite the thing with me) – instead suggest “Maybe we could…” and then take the keyboard and demonstrate what I meant. On the actual code. That actually got checked in and ended up in the actual product, instead of just in a Word document nobody ever reads.

The breakthrough for me was realising that “big design decisions” vs “small design decisions” was an artificial distinction. Most architecture decisions are about dependencies: what uses what? And “big” software dependencies – microservice A uses microservice B, for example – can be traced to “small” design decisions – a class in microservice A uses a class in microservice B – which can be traced to even “smaller” design decisions – a line of code in the class in microservice A needs a data value from the class in microservice B.

The “big” architecture decisions start in the code. And the code is full of tiny design decisions that have the potential to become “big”. And drawing an arrow pointing from a box labeled “Microservice A” to a box labeled “Microservice B” doesn’t solve the problems.

Try as we might to dictate the components, their roles and their dependencies in a system up-front, the reality often deviates wildly from what the architect planned. This is how “layered architectures” – the work of the devil – permeated software architecture for so long, despite it being a complete falsehood that they “separate concerns”. (Spoiler Alert: they don’t.)

Don’t get me wrong: I’m all for visualisation and for a bit of up-front planning when it comes to software design. But sooner rather than later, we have to connect with the reality as the code emerges and evolves. And the most valuable service a software architect can offer to a dev team is to be right there with them fighting the complexity and the dependencies – and helping them to make sense of it all – on the front line.

You can offer more value in the long term by mentoring developers and helping them to reason about design and ultimately make better design decisions – “big” or “small” – than attempting to direct the whole effort from 30,000 ft.

Plus, it seems utter folly to me to take your most experienced developers and promote them away from the thing you believe they do well. (And paying them more to have less impact just adds insult to injury.)

 

Classes Start With Functions, Not Data

A common mistake developers make when designing classes is to start with a data model in mind and then try to attach functions to that data (e.g., a Zoo has a Keeper, who has a first name and a last name, etc). This data-centred view of classes tends to lead us towards anaemic models, where classes are nothing more than data containers and the logic that uses the data is distributed throughout the system. This lack of encapsulation creates huge amounts of low-level coupling.

Try instead to start with the function you need, and see what data it requires. This can be illustrated with a bit of TDD. In this example, we want to buy a CD. I start by writing the buy function, without any class to hang that on.
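Here’s a minimal sketch of that starting point in JavaScript. The parameters (a stock level, a price, a card number and a payments object with a charge() method) are assumptions made for illustration, as is the hand-rolled stub.

```javascript
// Sketch only: the parameters and the payments stub are assumptions.

// A free-standing function: no class yet, just the behaviour we need.
function buy(stock, price, cardNumber, payments) {
  if (stock < 1) {
    throw new Error("Out of stock");
  }
  payments.charge(cardNumber, price);
  return stock - 1; // the new stock level
}

// A test that drives it, using a hand-rolled stub for payments.
const charges = [];
const stubPayments = {
  charge: (card, amount) => charges.push({ card, amount }),
};

const remaining = buy(10, 9.99, "1234", stubPayments);

console.log(remaining);      // 9
console.log(charges.length); // 1
```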

The parameters for buy() tell us what data this function needs. If we want to encapsulate some of that data, so that clients don’t need to know about all of them, we can introduce a parameter object to group related params.
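One way that grouping might look, as a sketch. CompactDisc, its fields and its getters are assumed names. The card number and payments object stay as separate parameters because they aren’t facts about the CD.

```javascript
// Sketch only: CompactDisc and its members are assumptions.

// Parameter object grouping the data that belongs together.
class CompactDisc {
  #stock;
  #price;

  constructor(stock, price) {
    this.#stock = stock;
    this.#price = price;
  }

  getStock() { return this.#stock; }
  getPrice() { return this.#price; }
  setStock(stock) { this.#stock = stock; }
}

// buy() now takes the cd instead of its individual values.
function buy(cd, cardNumber, payments) {
  if (cd.getStock() < 1) {
    throw new Error("Out of stock");
  }
  payments.charge(cardNumber, cd.getPrice());
  cd.setStock(cd.getStock() - 1);
}

const charges = [];
const stubPayments = {
  charge: (card, amount) => charges.push({ card, amount }),
};

const cd = new CompactDisc(10, 9.99);
buy(cd, "1234", stubPayments);

console.log(cd.getStock()); // 9
```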

This has greatly simplified the signature of the buy() function, and we can easily move buy() to the cd parameter.

Inside the new CompactDisc class…
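A sketch of how the class might look once buy() has moved in, at the point where it still goes through its own getters. All names here are assumptions.

```javascript
// Sketch only: names are assumptions.
class CompactDisc {
  #stock;
  #price;

  constructor(stock, price) {
    this.#stock = stock;
    this.#price = price;
  }

  getStock() { return this.#stock; }
  getPrice() { return this.#price; }
  setStock(stock) { this.#stock = stock; }

  // buy() now lives with the data it uses, but still calls the getters.
  buy(cardNumber, payments) {
    if (this.getStock() < 1) {
      throw new Error("Out of stock");
    }
    payments.charge(cardNumber, this.getPrice());
    this.setStock(this.getStock() - 1);
  }
}

const charges = [];
const stubPayments = {
  charge: (card, amount) => charges.push({ card, amount }),
};

const cd = new CompactDisc(10, 9.99);
cd.buy("1234", stubPayments);

console.log(cd.getStock()); // 9
```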

We have a bunch of getters we don’t need any more. Let’s inline them.
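With the getters inlined, the class might end up looking something like this sketch. The names are assumptions, and using private fields (the # prefix) is my own choice here, to underline that clients no longer see the data at all.

```javascript
// Sketch only: names and the use of private fields are assumptions.
class CompactDisc {
  #stock;
  #price;

  constructor(stock, price) {
    this.#stock = stock;
    this.#price = price;
  }

  // The only thing clients can do with a CD is buy it.
  buy(cardNumber, payments) {
    if (this.#stock < 1) {
      throw new Error("Out of stock");
    }
    payments.charge(cardNumber, this.#price);
    this.#stock -= 1;
  }
}

const charges = [];
const stubPayments = {
  charge: (card, amount) => charges.push({ card, amount }),
};

const cd = new CompactDisc(1, 9.99);
cd.buy("1234", stubPayments); // succeeds: one copy in stock

console.log(charges.length); // 1
```

A second call to cd.buy() would now throw, because the last copy has gone. The stock level is observable only through the behaviour it affects.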

Now, you may argue that you would have come up with this data model for a CD anyway. Maybe. But the point is that the data model is specifically there to support buying a CD.

When we start with the data, there’s a greater risk of ending up with the wrong data (e.g., many devs who try this exercise start by asking “What can we know about a CD?” and give it fields the functions don’t use), or with the right data in the wrong place – which is where we end up with Feature Envy and message chains and other coupling code smells galore.

Refactoring to Functions

While I’ve been porting the Codemanship Software Design Principles code examples to JavaScript – in both OO and FP styles – I’ve been thinking a lot about the relationship between those two programming styles.

Possibly the best way to illustrate might be to refactor an object oriented code example into a functional example that’s logically equivalent. This might also serve to illustrate how we might move from one style to the other in a disciplined way, without breaking the code.

This is the simple class I’m going to start with.

And these are its tests.
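As a minimal sketch of that starting point, assume an account that holds a balance and refuses to go overdrawn. The field name, the overdraft rule and the expectEqual helper (a stand-in for a real test framework) are all assumptions made for illustration.

```javascript
// Sketch only: the balance field and the overdraft rule are assumptions.
class BankAccount {
  constructor(balance) {
    this.balance = balance;
  }

  credit(amount) {
    this.balance += amount;
  }

  debit(amount) {
    if (amount > this.balance) {
      throw new Error("Insufficient funds");
    }
    this.balance -= amount;
  }
}

// Stand-in for whatever assertion library the tests would really use.
function expectEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`Expected ${expected} but got ${actual}`);
  }
}

// Test: crediting an account adds the amount to the balance.
const credited = new BankAccount(0);
credited.credit(50);
expectEqual(credited.balance, 50);

// Test: debiting an account subtracts the amount from the balance.
const debited = new BankAccount(50);
debited.debit(20);
expectEqual(debited.balance, 30);
```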

The first refactoring step might be to make each method of the class properly stateless (i.e., they don’t reference any fields).

To achieve this, we’ll have to add a parameter to each method that accepts an instance of BankAccount. Then we replace this with a reference to that parameter. This will work if the BankAccount we pass in is the exact same object this refers to.

So, in our tests, we pass in the BankAccount object we were invoking credit() and debit() on.
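This intermediate step might look like the sketch below, assuming a BankAccount with a single balance field (an assumption, as is the overdraft rule). It reads oddly, with the account passed to its own methods, but the behaviour is unchanged.

```javascript
// Sketch only: the balance field and the overdraft rule are assumptions.
class BankAccount {
  constructor(balance) {
    this.balance = balance;
  }

  // `this` is no longer referenced: the account arrives as a parameter.
  credit(account, amount) {
    account.balance += amount;
  }

  debit(account, amount) {
    if (amount > account.balance) {
      throw new Error("Insufficient funds");
    }
    account.balance -= amount;
  }
}

// The tests pass in the same object they invoke the methods on.
const account = new BankAccount(50);
account.credit(account, 25);
account.debit(account, 15);

console.log(account.balance); // 60
```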

Now we can pull these instance methods out of BankAccount and turn them into global functions.

The tests can now invoke them directly.
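A sketch of where that leaves us, under the same assumptions (a balance field and an overdraft rule that I’ve invented for the example). BankAccount is now nothing more than a data holder.

```javascript
// Sketch only: the balance field and the overdraft rule are assumptions.
class BankAccount {
  constructor(balance) {
    this.balance = balance;
  }
}

// The former instance methods, now free-standing functions.
function credit(account, amount) {
  account.balance += amount;
}

function debit(account, amount) {
  if (amount > account.balance) {
    throw new Error("Insufficient funds");
  }
  account.balance -= amount;
}

// The tests call the functions directly.
const account = new BankAccount(50);
credit(account, 25);
debit(account, 15);

console.log(account.balance); // 60
```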

One last piece of business: the BankAccount data object. We can replace it in two steps. First, let’s use a JSON-style object literal instead, one that matches the schema credit() and debit() expect. To make this the smallest change possible (so we don’t have to re-write those functions yet), let’s leave these objects mutable.
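The first of those two steps could be sketched like this. The functions are untouched and still mutate whatever they’re given; only the way the account gets created changes. The balance field and the overdraft rule are assumptions.

```javascript
// Sketch only: the balance field and the overdraft rule are assumptions.

// Unchanged: these still modify the account in place.
function credit(account, amount) {
  account.balance += amount;
}

function debit(account, amount) {
  if (amount > account.balance) {
    throw new Error("Insufficient funds");
  }
  account.balance -= amount;
}

// The class has gone: a plain object with the same shape takes its place.
const account = { balance: 50 };
credit(account, 25);
debit(account, 15);

console.log(account.balance); // 60
```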

Then we can re-write credit() and debit() to return mutated copies.

This will require us to re-write the tests to use the mutated copies.
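A sketch of the end state, again assuming a balance field and an overdraft rule of my own invention. Freezing the original object is optional; I’ve done it here only to show that nothing tries to modify it.

```javascript
// Sketch only: the balance field and the overdraft rule are assumptions.

// Each function returns a new account and leaves its input alone.
function credit(account, amount) {
  return { ...account, balance: account.balance + amount };
}

function debit(account, amount) {
  if (amount > account.balance) {
    throw new Error("Insufficient funds");
  }
  return { ...account, balance: account.balance - amount };
}

// The tests now make their assertions about the returned copies.
const original = Object.freeze({ balance: 50 });
const credited = credit(original, 25);
const debited = debit(credited, 15);

console.log(original.balance); // 50
console.log(credited.balance); // 75
console.log(debited.balance);  // 60
```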

So, there you have it: from OO to FP (well, functional-ish, maybe) for a simple class with no collaborators. In the next post, I’ll refactor a code example that involves several related classes so we can examine the relationship between dependency injection and higher-order functions.

 

S.O.L.I.D. JavaScript – OO Version

A little while back I wrote a post on the old blog about how we could apply the same design principles – near enough – to functional programming as we might to object oriented programming, using JavaScript examples.

That encouraged a couple of people to get in touch saying “But we don’t do FP in JavaScript!”, and suggesting therefore that – strangely – these principles don’t apply to them. The mind boggles.

But, for completeness, here’s how I might apply S.O.L.I.D. principles to OO JavaScript code. To make things backwards compatible, I’ve not used the class syntax of later versions of JS.

First of all, the big tamale: swappable dependencies (Dependency Inversion).

Consider this snippet of code for a simplistic shopping basket:
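Something along these lines, as a sketch. PayPalPayments stands in for a real payment client; its pay() method, and the shape of the basket’s items, are assumptions made for illustration.

```javascript
// Sketch only: pay() and the item shape are assumptions.

// Stand-in for a real PayPal client.
function PayPalPayments() {}

PayPalPayments.prototype.pay = function (amount) {
  // A real implementation would call out to PayPal here.
  return "PayPal charged " + amount;
};

function Basket() {
  this.items = [];
  this.payments = new PayPalPayments(); // hard-wired: Basket can't swap this
}

Basket.prototype.add = function (item) {
  this.items.push(item);
};

Basket.prototype.total = function () {
  return this.items.reduce(function (sum, item) {
    return sum + item.price;
  }, 0);
};

Basket.prototype.checkout = function () {
  return this.payments.pay(this.total());
};

var basket = new Basket();
basket.add({ name: "CD", price: 10 });
basket.add({ name: "Book", price: 5.5 });

console.log(basket.checkout()); // PayPal charged 15.5
```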

The problem here is: what happens if we want to change the way we process payments? Maybe we don’t want to use PayPal any more, for example. Or what if we don’t want to use a real payment processor in a unit test? In this design, we’d have to change the Basket class. That breaks the Open-Closed Principle of SOLID (classes should be open to extension, but closed for modification).

If we inject the payment processor, then it becomes easy to swap the implementation for whatever purpose (in this example, to stub the processor for a test.)
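A sketch of the injected version, with a stub standing in for the real processor in a test. The pay() method and the item shape are assumptions.

```javascript
// Sketch only: pay() and the item shape are assumptions.

// Basket no longer knows which payment processor it is talking to.
function Basket(payments) {
  this.items = [];
  this.payments = payments;
}

Basket.prototype.add = function (item) {
  this.items.push(item);
};

Basket.prototype.total = function () {
  return this.items.reduce(function (sum, item) {
    return sum + item.price;
  }, 0);
};

Basket.prototype.checkout = function () {
  return this.payments.pay(this.total());
};

// In a test, any object with a pay() method will do.
var paid = [];
var stubPayments = {
  pay: function (amount) {
    paid.push(amount);
  },
};

var basket = new Basket(stubPayments);
basket.add({ name: "CD", price: 10 });
basket.checkout();

console.log(paid[0]); // 10
```

Swapping PayPal for another processor now means passing a different object to the constructor. Basket itself doesn’t change.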

And there we have it: three fifths of SOLID is about making dependencies swappable – Open-Closed, Liskov Substitution and Dependency Inversion. (Or “OLD”, if you like.)

And can we agree classes should have a Single Responsibility? That’s not really an OO principle. The same’s true of functions and modules and microservices and any other discrete unit of executable code.

Finally, the Interface Segregation Principle: classes should present client-specific interfaces. That is, interfaces should only include the methods a client uses. With duck typing, it doesn’t really matter if a class presents methods a client doesn’t use. This is true whether we’re talking about methods of classes, or functions in modules.

It might help to make the code easier to understand if we document protocols by explicitly defining pure abstract classes that describe what methods any implementation would need to support. But it’s not necessary for our code to compile and run.
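One way to document such a protocol without the class syntax, as a sketch. The Payments and StubPayments names, and the pay() method, are assumptions. JavaScript has no true abstract classes, so the “abstract” method here simply throws if nobody overrides it.

```javascript
// Sketch only: Payments, StubPayments and pay() are assumed names.

// Documents the protocol any payment processor is expected to support.
function Payments() {}

Payments.prototype.pay = function (amount) {
  throw new Error("pay() must be implemented");
};

// An implementation that inherits from the protocol and overrides pay().
function StubPayments() {
  this.paid = [];
}

StubPayments.prototype = Object.create(Payments.prototype);
StubPayments.prototype.constructor = StubPayments;

StubPayments.prototype.pay = function (amount) {
  this.paid.push(amount);
};

var stub = new StubPayments();
stub.pay(10);

console.log(stub instanceof Payments); // true
console.log(stub.paid.length);         // 1
```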

But, as with the functional examples I used, there is a case for saying that modules shouldn’t reference implementations they’re not using. Let’s suppose that after I refactored my Basket to use dependency injection, I forgot to remove the import for PayPalPayments:

It’s important to remember to clean up your imports regularly to avoid situations where changes to things we don’t use could break our code.

So, to sum up: the same principles apply in JavaScript regardless of whether you’re doing FP or OOP.

No excuses!