The Software Design Process

One thing that sadly rarely gets discussed these days is how we design software. That is, how we get from a concept to working code.

As a student (and teacher) of software design and architecture for many years, having experienced first-hand many different methodologies – from rigorous to ad hoc, heavyweight to agile – I can see similarities between all effective approaches.

Whether you’re UML-ing or BDD-ing or Event Storming-ing your designs, when it works, the thought process is the same.

It starts with a goal.

This – more often than not – is a problem that our customer needs solving.

This, of course, is where most teams get the design thinking wrong. They don’t start with a goal – or if they do, most of the team aren’t involved at that point, and subsequently are not made aware of what the original goal or problem was. They’re just handed a list of features and told “build that”, with no real idea what it’s for.

But they should start with a goal.

In design workshops, I encourage teams to articulate the goal as a single, simple problem statement. e.g.,

It’s really hard to find good vegan takeaway in my area.

Jason Gorman, just now

Our goal is to make it easier to order vegan takeaway food. This, naturally, begs the question: how hard is it to order vegan takeaway today?

If our target customer area is Greater London, then at this point we need to hit the proverbial streets and collect data to help us answer that question. Perhaps we could pick some random locations – N, E, S and W London – and try to order vegan takeaway using existing solutions, like Google Maps, Deliveroo and even the Yellow Pages.

Our data set gives us some numbers. On average, it took 47 minutes to find a takeaway restaurant with decent vegan options. They were, on average, 5.2 miles from the random delivery address. The orders took a further 52 minutes to be delivered. In 19% of selected delivery addresses, we were unable to order vegan takeaway at all.

What I’ve just done there is apply a simple thought process known as Goal-Question-Metric.

We ask ourselves, which of these do we think we could improve on with a software solution? I’m not at all convinced software would make the restaurants cook the food faster. Nor will it make the traffic in London less of an obstacle, so delivery times are unlikely to speed up much.

But if our data suggested that to find a vegan menu from a restaurant that will deliver to our address we had to search a bunch of different sources – including telephone directories – then I think that’s something we could improve on. It hints strongly that lack of vegan options isn’t the problem, just the ease of finding them.

A single searchable list of all takeaway restaurants with decent vegan options in Greater London might speed up our search. Note that word: MIGHT.

I’ve long advocated that software specifications be called “theories”, not “solutions”. We believe that if we had a searchable list of all those restaurants we currently have to hunt for across multiple directories, the search would be much quicker, and there would potentially be fewer cases where no option could be found at all.

Importantly, we can compare the before and the after – using the examples we pulled from the real world – to see if our solution actually does improve search times and hit rates.

Yes. Tests. We like tests.

Think about it: we describe our modern development processes as iterative. But what does that really mean? To me – a physics graduate – it implies a goal-seeking process that applies the same steps over and over, feeding the output of each cycle into the input of the next, converging on a stable working solution.

Importantly, if there’s no goal, and/or no way of knowing if the goal’s been achieved, then the process doesn’t work. The wheels are turning, the engine’s revving, but we ain’t going anywhere in particular.

Now, be honest, when have you ever been involved in a design process that started like that? But this is where good design starts: with a goal.

So, we have a goal – articulated in a testable way, importantly. What next?

Next, we imaginate (or is it visionize? I can never keep up with the management-speak) a feature – a proverbial button the user clicks – that solves their problem. What does it do?

Don’t think about how it works. Just focus on visualifying (I’m getting the hang of this now) what happens when the user clicks that magical button.

In our case, we imagine that when the user clicks the Big Magic Button of Destiny, they’re shown a list of takeaway restaurants with a decent vegan menu who can deliver to their address within a specified time (e.g., 45 minutes).

That’s our headline feature. A headline feature is the feature that solves the customer’s problem, and – therefore – is the reason for the system to exist. No, “Login” is never a headline feature. Nobody uses software because they want to log in.

Now we have a testable goal and a headline feature that solves the customer’s problem. It’s time to think about how that headline feature could work.

We would need a complete list of takeaway restaurants with decent vegan menus within any potential delivery address in our target area of Greater London.

We would need to know how long it might take to deliver from each restaurant to the customer’s address.

This would include knowing if the restaurant is still taking orders at that time.

Our headline feature will require other features to make it work. I call these supporting features. They exist only because of the headline feature – the one that solves the problem. The customer doesn’t want a database. They want vegan takeaway, damn it!

Our simple system will need a way to add restaurants to the list. It will need a way to estimate delivery times (including food preparation) between restaurant and customer addresses – and this may change (e.g., during busy times). It will need a way for restaurants to indicate if they’re accepting orders in real time.

At this point, you may be envisaging some fancypants Uber Eats style of solution with whizzy maps showing delivery drivers aimlessly circling your street for 10 minutes because nobody reads the damn instructions these days. Grrr.

But it ain’t necessarily so. This early on in the design process is no time for whizzy. Whizzy comes later. If ever. Remember, we’re setting out here to solve a problem, not build a whizzy solution.

I’ve seen some very high-profile applications go live with data entry interfaces knocked together in MS Access for that first simple release, for example. Remember, this isn’t a system for adding restaurant listings. This is a system for finding vegan takeaway. The headline feature’s always front-and-centre – our highest priority.

Also remember, we don’t know if this solution is actually going to solve the problem. The sooner we can test that, the sooner we can start iterating towards something better. And the simpler the solution, the sooner we can put it in the hands of end users. Let’s face it, there’s a bit of smoke and mirrors to even the most mature software solutions. We should know; we’ve looked behind the curtain and we know there’s no actual Wizard.

Once we’re talking about features like “Search for takeaway”, we should be in familiar territory. But even here, far too many teams don’t really grok how to get from a feature to working code.

But this thought process should be ingrained in every developer. Sing along if you know the words:

  • Who is the user and what do they want to do?
  • What jobs does the software need to do to give them that?
  • What data is required to do those jobs?
  • How can the work and the data be packaged together (e.g., in classes)?
  • How will those modules talk to each other to coordinate the work end-to-end?

This is the essence of high-level modular software design. The syntax may vary (classes, modules, components, services, microservices, lambdas), but the thinking is the same. The user has needs (find vegan takeaway nearby). The software does work to satisfy those needs (e.g., estimate travel time). That work involves data (e.g., the addresses of restaurant and customer). Work and data can be packaged into discrete modules (e.g., DeliveryTimeEstimator). Those modules will need to call other modules to do related work (e.g., address.asLatLong()), and will therefore need “line of sight” – otherwise known as a dependency – to send that message.
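
To make that concrete, here’s a rough sketch in Python of how those responsibilities might be packaged. The DeliveryTimeEstimator and asLatLong() names echo the examples above; everything else – the Address fields, the fixed preparation time, the crude travel-time estimate – is purely illustrative.

    from dataclasses import dataclass


    @dataclass
    class Address:
        street: str
        postcode: str

        def asLatLong(self):
            # A real implementation would geocode the address; this is a placeholder.
            return (51.5074, -0.1278)


    class DeliveryTimeEstimator:
        """Packages the 'estimate delivery time' job with the data it needs."""

        def estimate_minutes(self, restaurant: Address, customer: Address) -> int:
            # 'Line of sight' to Address: we depend on it to turn addresses into coordinates.
            origin = restaurant.asLatLong()
            destination = customer.asLatLong()
            preparation = 20  # illustrative fixed preparation time
            return preparation + self._travel_minutes(origin, destination)

        def _travel_minutes(self, origin, destination) -> int:
            # Crude straight-line approximation, purely for illustration.
            distance = abs(origin[0] - destination[0]) + abs(origin[1] - destination[1])
            return int(distance * 60)

The point is not this particular implementation; it’s the shape – one module, one job, and an explicit dependency it needs “line of sight” to.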

You can capture this in a multitude of different ways – Class-Responsibility-Collaboration (CRC) cards, UML sequence diagrams… heck, embroider it on a tapestry for all I care. The thought process is the same.

This bird’s-eye view of the modules, their responsibilities and their dependencies needs to be translated into whichever technology you’ve selected to build this with. Maybe the modules are Java classes. Maybe they’re AWS lambdas. Maybe they’re COBOL programs.

Here we should be in writing code mode. I’ve found that if your on-paper (or on tapestry, if you chose that route) design thinking goes into detail, then it’s adding no value. Code is for details.

Start writing automated tests. Now that really should be familiar territory for every dev team.

/ sigh /

The design thinking never stops, though. For one, remember that everything so far is a theory. As we get our hands dirty in the details, our high-level design is likely to change. The best laid plans of mice and architects…

And, as the code emerges one test at a time, there’s more we need to think about. Our primary goal is to build something that solves the customer’s problem. But there are secondary goals – for example, how easy it will be to change this code when we inevitably learn that it didn’t solve the problem (or when the problem changes).

Most kitchen designs you can cater a dinner party in. But not every kitchen is easy to change.

It’s vital to remember that this is an iterative process. It only works if we can go around again. And again. And again. So organising our code in a way that makes it easy to change is super-important.

Enter stage left: refactoring.

Half the design decisions we make will be made after we’ve written the code that does the job. We may realise that a function or method is too big or too complicated and break it down. We may realise that names we’ve chosen make the code hard to understand, and rename. We may see duplication that could be generalised into a single, reusable abstraction.

Rule of thumb: if your high-level design includes abstractions (e.g., interfaces, design patterns, etc), you’ve detailed too early.

Jason Gorman, probably on a Thursday

The need for abstractions emerges organically as the code grows, through the process of reviewing and refactoring that code. We don’t plan to use factories or the strategy pattern, or to have a Vendor interface, in our solution. We discover the need for them to solve problems of software maintainability.

By applying organising principles like Simple Design, D.R.Y., Tell, Don’t Ask, Single Responsibility and the rest to the code as it grows, good, maintainable modular designs will emerge – often in unexpected ways. Let go of your planned architecture, and let the code guide you. Face it, it was going to be wrong anyway. Trust me: I know.

Here’s another place that far too many teams go wrong. As your code grows and an architecture emerges, it’s very, very helpful to maintain a bird’s-eye view of what that emerging architecture is becoming. Ongoing visualisation of the software – its modules, patterns, dependencies and so on – is something surprisingly few teams do these days. Working on agile teams, I’ve invested some of my time in creating and maintaining these maps of the actual terrain and displaying them prominently in the team’s area – domain models, UX storyboards, key patterns we’ve applied (e.g., how have we done MVC?). You’d be amazed what gets missed when everyone’s buried in code, neck-deep in details, and nobody’s keeping an eye on the bigger picture. This, regrettably, is becoming a lost skill – the baby Agile threw out with the bathwater.

So we build our theoretical solution, and deliver it to end users to try. And this is where the design process really starts.

Until working code meets the real world, it’s all guesswork at best. We may learn that some of the restaurants are actually using dairy products in the preparation of their “vegan” dishes. Those naughty people! We may discover that different customers have very different ideas about what a “decent vegan menu” looks like. We may learn that our estimated delivery times are wildly inaccurate because restaurants tell fibs to get more orders. We may get hundreds of spoof orders from teenagers messing with the app from the other side of the world.

Here’s my point: once the system hits the real world, whatever we thought was going to happen almost certainly won’t. There are always lessons that can only be learned by trying it for real.

So we go again. And that is the true essence of software design.

When are we done? When we’ve solved the problem.

And then we move on to the next problem. (e.g., “Yeah, vegan food’s great, but what about vegan booze?”)

Refactoring Complex Conditionals To Conditional Look-ups

In a previous post, I demonstrated how you could refactor IF statements with straight x == y conditions to lambda maps. This is useful when we need to retain codes – literal values – and look up the appropriate action for each. (Think, for example, of the Mars Rover kata, where our rover accepts strings of characters representing instructions to turn ‘L’ and ‘R’ or move ‘F’ and ‘B’).

But what happens when the conditions aren’t straightforward? Consider this example that calculates postage for orders based on where the order is to be shipped, the weight of the order and the total price.
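
Something along these lines – the countries, weight bands and rates here are illustrative, not the real rules:

    function calculatePostage(order) {
        if (order.country === "UK") {
            if (order.totalPrice >= 50.0) {
                return 0.0;                 // free postage on bigger UK orders
            }
            if (order.weight > 2.0) {
                return 6.99;
            }
            return 2.99;
        }
        if (order.weight > 2.0) {
            return 19.99;
        }
        return 9.99;
    }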

Could we refactor this into something just as dynamic as a lambda map? Of course we could. All it takes is a loop.
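
Here’s one way that might look. Each rule pairs a condition lambda with an outcome lambda, an outcome can itself be another rule table, and a single loop looks up the first rule that matches (again, the rules themselves are illustrative):

    const uk = [
        [order => order.totalPrice >= 50.0, () => 0.0],
        [order => order.weight > 2.0,       () => 6.99],
        [() => true,                        () => 2.99]
    ];

    const worldwide = [
        [order => order.weight > 2.0,       () => 19.99],
        [() => true,                        () => 9.99]
    ];

    const postageRules = [
        [order => order.country === "UK",   order => lookup(uk, order)],
        [() => true,                        order => lookup(worldwide, order)]
    ];

    function lookup(rules, order) {
        // apply the outcome of the first rule whose condition matches
        for (const [condition, outcome] of rules) {
            if (condition(order)) {
                return outcome(order);
            }
        }
    }

    function calculatePostage(order) {
        return lookup(postageRules, order);
    }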

Using this approach, we can recursively compose conditional logic in a dynamic fashion, and with judicious use of spacing, we can have our mappings read more like a table. It also opens up the possibility of dynamically altering the rules – perhaps loading them from a file, or switching them based on some condition, or triggering a change with an event (e.g., a promotional sale that waives postage on more orders).


New Refactoring: Replace Conditional With Lambda Map

There are tried-and-tested routes to replacing conditional statements that effectively map identities (e.g., if(x == “UK shipping”)) to polymorphic implementations of what to do when an identity is determined (e.g., create a Shipping interface, have a UKShipping class that knows what to do in that situation, and pass an instance into the method).

But sometimes I find – when the literal that represents the identity has to be preserved (for example, if it’s part of an API) – that mapping identities to actions works better.

In these instances, I have found myself converting my conditionals to a map or dictionary instead. Each identity is mapped to a lambda expression that can be looked up and then executed.

Take this example of a function that scores games of Rock, Paper, Scissors in JavaScript:
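
(The listing below is a sketch of that kind of scoring function rather than the exact code; the shape is what matters – three outer IF statements, each containing near-identical inner IF statements.)

    function play(player1, player2, scores) {
        if (player1 === "rock") {
            if (player2 === "scissors") {
                scores.player1++;
            }
            if (player2 === "paper") {
                scores.player2++;
            }
        }
        if (player1 === "paper") {
            if (player2 === "rock") {
                scores.player1++;
            }
            if (player2 === "scissors") {
                scores.player2++;
            }
        }
        if (player1 === "scissors") {
            if (player2 === "paper") {
                scores.player1++;
            }
            if (player2 === "rock") {
                scores.player2++;
            }
        }
    }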

The literals “rock”, “paper” and “scissors” have to be preserved because we have a web service that accepts those parameter values from remote players. (This is very similar to the Mars Rover kata in that respect, where R, L, F and B are inputs.)

First, I’d remove some obvious inner duplication in each outer IF statement.
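
One way to do that is to extract the duplicated inner logic into a single playHand() function that’s told which hand player1 beats and which hand beats it (the parameter names here are illustrative):

    function playHand(player2, winsAgainst, losesTo, scores) {
        if (player2 === winsAgainst) {
            scores.player1++;
        }
        if (player2 === losesTo) {
            scores.player2++;
        }
    }

    function play(player1, player2, scores) {
        if (player1 === "rock") {
            playHand(player2, "scissors", "paper", scores);
        }
        if (player1 === "paper") {
            playHand(player2, "rock", "scissors", scores);
        }
        if (player1 === "scissors") {
            playHand(player2, "paper", "rock", scores);
        }
    }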

Now let’s replace those 3 IF statements that map actions to “rock”, “paper” and “scissors” with an actual map.
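
Something like this – each hand maps to a lambda that plays it:

    const hands = new Map([
        ["rock",     (player2, scores) => playHand(player2, "scissors", "paper", scores)],
        ["paper",    (player2, scores) => playHand(player2, "rock", "scissors", scores)],
        ["scissors", (player2, scores) => playHand(player2, "paper", "rock", scores)]
    ]);

    function play(player1, player2, scores) {
        hands.get(player1)(player2, scores);
    }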

If we take a look inside playHand(), we have an inner conditional.

This could also be replaced with a lambda map.
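
In this sketch, that means a map of outcomes keyed by the hand player2 throws – which is where the extra draws identity comes in (here it’s passed in as an extra parameter, with the outer map supplying the player’s own hand):

    function playHand(player2, winsAgainst, losesTo, draws, scores) {
        const outcomes = new Map([
            [winsAgainst, () => scores.player1++],
            [losesTo,     () => scores.player2++],
            [draws,       () => {}]                 // nothing happens on a draw
        ]);
        outcomes.get(player2)();
    }

    // the outer map now also passes the player's own hand:
    const hands = new Map([
        ["rock",     (player2, scores) => playHand(player2, "scissors", "paper", "rock", scores)],
        ["paper",    (player2, scores) => playHand(player2, "rock", "scissors", "paper", scores)],
        ["scissors", (player2, scores) => playHand(player2, "paper", "rock", "scissors", scores)]
    ]);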

Note that I had to add a draws identity so that the mapping is complete. I’d also have to do this for any default case in a conditional (I suppose draws is the default case here, as nothing happens when there’s a draw – an empty lambda).

For Distributed Teams, Code Craft is Critical

Right now, most software teams all around the world are working from home. Many have not done it before, and are on a learning curve that means last week’s productivity won’t be returning for a while.

I’ve worked on distributed teams many times, and – through Codemanship – trained and mentored dozens of teams remotely. One thing I’ve learned from all that remote development experience is that coding discipline becomes super-important.

Just as distributed systems amplify every design flaw, turning what would be a headache in a monolith into a major outbreak in a service-oriented architecture, distributed working amplifies team dysfunctions as the communication pathways take on extra weight.

Here’s how code craft can help:

  • Unit tests – keeping the software working is Distributed Dev Team 101. Open Source projects rely on suites of fast-running tests to protect against check-ins that break the code.
  • Continuous Integration – is how distributed teams communicate their changes to each other. Teams should merge their changes to the master branch often and stay build-aware, keeping one eye on other people’s merges to see what’s changed. But it’s much easier on co-located teams to keep everyone in step, because we can see and talk to each other about the changes we’re making. If remote developers do infrequent, large merges, integration hell gets amplified tenfold by the extra communication barriers.
  • Test-Driven Development – a lot of the communication between developers, and between developers and their customers, can be handwavy and vague. And if communication is easy – like on a co-located team – we just go around a few more times until we converge on what’s required. But when communication is harder, like in distributed teams, a few more goes around gets very expensive. Using executable tests as specifications removes the ambiguity. It should do exactly this. Also, TDD – done well – produces suites of useful, fast-running automated tests. It’s a win-win.
  • Design Principles – Well-factored code is very important to co-located teams, and super-duper-important to distributed teams. Let’s count the ways:
    • Simple Design
      • Code should work – if it don’t work, we can’t ship it. Any changes that break the code block the team. It’s a big deal on a co-located team, but it’s a really big deal on a distributed team.
      • Code should clearly communicate its intent – code should speak for itself, and this is especially true when developers are working remotely and communicating requires extra effort. The easier code is to understand, the fewer teleconferences required to understand it.
      • Code should be free of duplication – so much duplication in software is duplication of concepts. This often occurs when developers on teams work in isolation, unaware that someone else has already added a module that does what their module also does. Devs need to be aware of duplication in the code – Continuous Integration and merge awareness helps – and clued up to when they should refactor it and when they should leave it alone.
      • Code should be as simple as we can make it – every line of code that has to be maintained is another straw on the camel’s back. When the camel’s back stretches between multiple locations – possibly in multiple time zones – the impact of every additional straw is felt many-fold.
    • Modular Design
      • Modules should do one job – the ability to change the behaviour of a system by just editing one module is critical to a team’s ability to make the changes they need without treading on the toes of other developers. On distributed teams, multiple developers all making changes to one module for multiple reasons can lead to some spectacular merge train wrecks.
      • Modules should hide their internal workings – the more modules are coupled to each other, the more widely the impact of even the smallest changes will be felt. Imagine your distributed team is working precariously balanced on high wires that are all interconnected. What you don’t want is for one person to start violently shaking their wire, sending ripples throughout the network. Or it could all come tumbling down. Again, it’s bad on co-located teams, but it’s Double-Plus-Triple-Word-Score-Bad on distributed teams. Every dependency can bring pain.
      • Modules should not depend directly on implementations of other modules – it’s good architecture generally for modules not to bind directly to implementations of the other modules they use, for a variety of reasons. But it’s especially important when teams aren’t co-located. Taken together, the first three principles of modular design are better known as “Separation of Concerns”. Or, as I like to call it, the Principle of Somebody Else’s Problem. If my module needs to send an email, I shouldn’t need to know how emails are actually sent – all that detail should be hidden from me – and I should be able to work on my code without having to actually send emails when I test it. Sending emails is somebody else’s problem. It’s particularly useful in a test-driven approach to design to be able to write a test for code that has external dependencies – things it uses that other developers are working on – without actually binding directly to the implementation of that external component so that we can swap in a test double that pretends to do that job. That’s how you scale TDD. That’s how you make TDD work in distributed teams, too.
      • Module interfaces should be designed from the client’s point of view – tied together with TDD, we can specify modules very precisely from the outside: this is what it should look like (interface) and this is what it should do (tests). Imagine your distributed team is making a jigsaw: the hard way to do it is to have each person go off and make a piece of the jigsaw and then hope that they all fit together at the end. The smart way to do it is to define the shapes of the pieces as parts of the whole puzzle, and then have people implement the pieces based on the interfaces and tests agreed. You do this by designing systems from the outside in, defining modules by how they will be used from the client code’s POV. This also helps to restrict public interfaces to only what clients need to see, hiding internal details, improving encapsulation and reducing coupling. Coupling on distributed teams can be very, very expensive.
    • Refactoring – the still-rather-too-rare discipline of reshaping code without breaking the software is the means by which we achieve good design. Try as we might to never write code that’s hard to understand, or has duplication, or is overly complex, or too tightly coupled, we’ll always need to clean up our code as we go. If the impact of poor design is amplified on distributed teams, the importance of refactoring must be proportionally amplified. The alternative is relying on after-the-fact code reviews (e.g., in GitFlow), which will become multiple times the bottleneck they already were when your team was co-located and you could just pop over to Mary’s desk and ask.

Underpinning all of this is a need for levels of delivery process automation – automated testing, automated builds, automated deployments, automated code reviews – that the majority of teams are nowhere near.

And then there’s the interpersonal: the communication, the coordination, the planning and tracking, the collaborative design. It takes a big investment to make a distributed Agile team as productive as a co-located team.

All the Jiras and GitHubs and cloud-based build pipelines and remote whiteboards and shared IDEs and Zoom meetings in the world won’t save you if the code craft isn’t up to snuff, though. It’s foundational to delivering as a distributed team.

If you want to know more about code craft, visit www.codemanship.com


Test-Driven Development in JavaScript

I’m in the process of redesigning elements of the Codemanship training workshops, and I’ve been spit-balling new demos in JavaScript on TDD. Rather than taking copious notes, I’ve recorded screencasts of these demos so I can refer back and see what I actually did in each one.

I thought it might be useful to post these screencasts online, so if you’re a JS developer – or have ambitions to be one (TDD is a sought-after skill) – here they are.

I’ve strived for each demonstration to make three key points to remember.

#1 – The 3 Steps of TDD

  • Start by writing a test that fails
  • Write the simplest code to pass the test
  • Refactor to make changing the code easier


#2 – Assert First & Useful Tests

  • Write the test assertion first and work backwards to the setup
  • See the test fail before you make it pass
  • Tests should only have one reason to fail


#3 – What Should We Test?

  • List your tests
  • Test meaningful behaviour and let those tests drive design details, not the other way around
  • When the implementation is obvious, just write it


#4 – Duplication & The Rule of Three

  • Removing duplication to reveal abstractions
  • The Rule Of Three
  • When to leave duplicate code in


#5 – Part I – Inside-Out TDD

  • Advantage: tests pinpoint failures better in the stack
  • Drawbacks
    • Risk the pieces don’t fit together
    • Tests are coupled closely to internal design


#5 – Part II – Outside-In TDD

  • Advantages
    • Pieces guaranteed to fit together
    • Test code more decoupled from internal design
  • Disadvantage: tests don’t pinpoint source of failure easily


#6 – Stubs, Mocks & Dummies

  • Writing unit tests with external dependencies using:
    • Stubs to return test data
    • Mocks to test that messages were sent
    • Dummies as placeholders so we can run the tests
  • Driving complex multi-layered designs from the outside in using stubs, mocks and dummies
    • Advantage: pieces guaranteed to fit and tests pinpoint sources of failure better
    • Risk: (not discussed in video) excessive use of test doubles un-encapsulates details of internal design, tightly coupling test code to implementation
  • More unit-testable code – achieved with dependency injection – tends to lead to more modular architectures


These videos are rough and ready first attempts, but I think you may find them useful as they are if you’re new to TDD.

I’ll be doing versions of these in Python soon.

Codemanship’s Code Craft Road Map

One of the goals behind my training courses is to help developers navigate all the various disciplines of what we these days call code craft.

It helps me to have a mental road map of these disciplines, refined from three decades of developing software professionally.

[Figure: Codemanship’s code craft road map]

When I posted this on Twitter, a couple of people got in touch to say that they find it helpful, but also that a few of the disciplines were unfamiliar to them. So I thought it might be useful to go through them and summarise what they mean.

  • Foundations – the core enabling practices of code craft
    • Unit Testing – is writing fast-running automated tests to check the logic of our code, that we can run many times a day to ensure any changes we’ve made haven’t broken the software. We currently know of no other practical way of achieving this. Slow tests cause major bottlenecks in the development process, and tend to produce less reliable code that’s more expensive to maintain. Some folk say “unit testing” to mean “tests that check a single function, or a single module”. I mean “tests that have no external dependencies (e.g., a database) and run very fast”.
    • Version Control – is seat belts for programmers. The ability to go back to a previous working version of the code provides essential safety and frees us to be bolder with our code experiments. Version Control Systems these days also enable more effective collaboration between developers working on the same code base. I still occasionally see teams editing live code together, or even emailing source files to each other. That, my friends, is the hard way.
    • Evolutionary Development – is what fast-running unit tests and version control enable. It is one or more programmers and their customers collectively solving problems together through a series of rapid releases of a working solution, getting it less wrong with each pass based on real-world feedback. It is not teams incrementally munching their way through a feature list or any other kind of detailed plan. It’s all about the feedback, which is where we learn what works and what doesn’t. There are many takes on evolutionary development. Mine starts with a testable business goal, and ends with that goal being achieved. Yours should, too. Every release is an experiment, and experiments can fail. So the ability to revert to a previous version of the code is essential. Fast-running unit tests help keep changes to code safe and affordable. If we can’t change the code easily, evolution stalls. All of the practices of code craft are designed to enable rapid and sustained evolution of working software. In short, code craft means more throws of the dice.
  • Team Craft – how developers work together to deliver software
    • Pair Programming – is two programmers working side-by-side (figuratively speaking, because sometimes they might not even be on the same continent), writing code in real time as a single unit. One types the code – the “driver” – and one provides high-level directions – the “navigator”. When we’re driving, it’s easy to miss the bigger picture. Just like on a car journey, in the days before GPS navigation. The person at the wheel needs to be concentrating on the road, so a passenger reads the map and tells them where to go. The navigator also keeps an eye out for hazards the driver may have missed. In programming terms, that could be code quality problems, missing tests, and so on – things that could make the code harder to change later. In that sense, the navigator in a programming pair acts as a kind of quality gate, catching problems the driver may not have noticed. Studies show that pair programming produces better quality code, when it’s done effectively. It’s also a great way to share knowledge within a team. One pairing partner may know, for example, useful shortcuts in their editor that the other doesn’t. If members of a team pair with each other regularly, soon enough they’ll all know those shortcuts. Teams that pair tend to learn faster. That’s why pairing is an essential component of Codemanship training and coaching. But I appreciate that many teams view pairing as “two programmers doing the work of one”, and pair programming can be a tough sell to management. I see it a different way: for me, pair programming is two programmers avoiding the rework of seven.
    • Mob Programming – sometimes, especially in the early stages of development, we need to get the whole team on the same page. I’ve been using mob programming – where the team, or a section of it, all work together in real-time on the same code (typically around a big TV or projector screen) – for nearly 20 years. I’m a fan of how it can bring forward all those discussions and disagreements about design, about the team’s approach, and about the problem domain, airing all those issues early in the process. More recently, I’ve been encouraging teams to mob instead of having team meetings. There’s only so much we can iron out sitting around a table talking. Eventually, I like to see the code. It’s striking how often debates and misunderstandings evaporate when we actually look at the real code and try our ideas for real as a group. For me, the essence of mob programming is: don’t tell me, show me. And with more brains in the room, we greatly increase the odds that someone knows the answer. It’s telling that when we do team exercises on Codemanship workshops, the teams that mob tend to complete the exercises faster than the teams who work in parallel. And, like pair programming, mobbing accelerates team learning. If you have junior or trainee developers on your team, I seriously recommend regular mobbing as well as pairing.
  • Specification By Example – is using concrete examples to drive out a precise understanding of what the customer needs the software to do. It is usually practiced at two levels of abstraction: the system, and the internal high-level design of the code.
    • Test-Driven Development – is using tests (typically internal unit tests) to evolve the internal design of a system that satisfies an external (“customer”) test. It mandates discovery of internal design in small and very frequent feedback loops, making a few design decisions in each feedback loop. In each feedback loop, we start by writing a test that fails, which describes something we need the code to do that it currently doesn’t. Then we write the simplest solution that will pass that test. Then we review the code and make any necessary improvements – e.g., to remove some duplication, or make the code easier to understand – before moving on to the next failing test. One test at a time, we flesh out a design, discovering the internal logic and useful abstractions like methods/functions, classes/modules, interfaces and so on as we triangulate a working solution. TDD has multiple benefits that tend to make the investment in our tests worthwhile. For a start, if we only write code to pass tests, then at the end we will have all our solution code covered by fast-running tests. TDD produces high test assurance. Also, we’ve found that code that is test-driven tends to be simpler, lower in duplication and more modular. Indeed, TDD forces us to design our solutions in such a way that they are testable. Testable is synonymous with modular. Working in fast feedback loops means we tend to make fewer design decisions before getting feedback, and this tends to bring more focus to each decision. TDD, done well, promotes a form of continuous code review that few other techniques do. TDD also discourages us from writing code we don’t need, since all solution code is written to pass tests. It focuses us on the “what” instead of the “how”. Overly complex or redundant code is reduced. So, TDD tends to produce more reliable code (studies find up to 90% fewer bugs in production) that can be re-tested quickly, and that is simpler and more maintainable. It’s an effective way to achieve the frequent and sustained release cycles demanded by evolutionary development. We’ve yet to find a better way.
    • Behaviour-Driven Development – is working with the customer at the system level to precisely define not what the functions and modules inside do, but what the system does as a whole. Customer tests – tests we’ve agreed with our customer that describe system behaviour using real examples (e.g., for a £250,000 mortgage paid back over 25 years at 4% interest, the monthly payments should be exactly £1,290) – drive our internal design, telling us what the units in our “unit tests” need to do in order to deliver the system behaviour the customer desires. These tests say nothing about how the required outputs are calculated, and ideally make no mention of the system design itself, leaving the developers and UX folk to figure those design details out. They are purely logical tests, precisely capturing the domain logic involved in interactions with the system. The power of BDD and customer tests (sometimes called “acceptance tests”) is how using concrete examples can help us drive out a shared understanding of what exactly a requirement like “…and then the mortgage repayments are calculated” really means. Automating these tests to pull in the example data provided by our customer forces us to be 100% clear about what the test means, since a computer cannot interpret an ambiguous statement (yet). Customer tests provide an outer “wheel” that drives the inner wheel of unit tests and TDD. We may need to write a bunch of internal units to pass an external customer test, so that outer wheel will turn slower. But it’s important those wheels of BDD and TDD are directly connected. We only write solution code to pass unit tests, and we only write unit tests for logic needed to pass the customer test.
  • Code Quality – refers specifically to the properties of our code that make it easier or harder to change. As teams mature, their focus will often shift away from “making it work” to “making it easier to change, too”. This typically signals a growth in the maturity of the developers as code crafters.
    • Software Design Principles – address the underlying factors in code mechanics that can make code harder to change. On Codemanship courses, we teach two sets of design principles: Simple Design and Modular Design.
      • Simple Design
        • The code must work
        • The code must clearly reveal its intent (i.e., using module names, function names, variable names, constants and so on, to tell the story of what the code does)
        • The code must be low in duplication (unless that makes it harder to understand)
        • The code must be the simplest thing that will work
      • Modular Design (where a “module” could be a class, or component, or a service etc)
        • Modules should do one job
        • Modules should know as little about each other as possible
        • Module dependencies should be easy to swap
    • Refactoring – is the discipline of improving the internal design of our software without changing what it does. More bluntly, it’s making the code easier to change without breaking it. Like TDD, refactoring works in small feedback cycles. We perform a single refactoring – like renaming a class – and then we immediately re-run our tests to make sure we didn’t break anything. Then we do another refactoring (e.g., move that class into a different package) and test again. And then another refactoring, and test. And another, and test. And so on. As you can probably imagine, a good suite of fast-running automated tests is essential here. Refactoring and TDD work hand-in-hand: the tests make refactoring safer, and without a significant amount of refactoring, TDD becomes unsustainable. Working in these small, safe steps, a good developer can quite radically restructure the code whilst ensuring all along the way that the software still works. I was very tempted to put refactoring under Foundation, because it really is a foundational discipline for any kind of programming. But it requires a good “nose” for code quality, and it’s also an advanced skill to learn properly. So I’ve grouped it here under Code Quality. Developers need to learn to recognise code quality problems when they see them, and get hundreds of hours of practice at refactoring the code safely to eliminate them.
    • Legacy Code – is code that is in active use, and therefore probably needs to be updated and improved regularly, but is too expensive and risky to change. This is usually because the code lacks fast-running automated tests. To change legacy code safely, we need to get unit tests around the parts of the code we need to change. To achieve that, we usually need to refactor that code to make it easy to unit test – i.e., to remove external dependencies from that code. This takes discipline and care. But if every change to a legacy system started with these steps, over time the unit test coverage would rise and the internal design would become more and more modular, making changes progressively easier. Most developers are afraid to work on legacy code. But with a little extra discipline, they needn’t be. I actually find it very satisfying to rehabilitate software that’s become a millstone around our customers’ necks. Most code in operation today is legacy code.
    • Continuous Inspection – is how we catch code quality problems early, when they’re easier to fix. Like anything with the word “continuous” in the title, continuous inspection implies frequent automated checking of the code for code quality “bugs” like functions that are too big or too complicated, modules with too many dependencies and so on. In traditional approaches, teams do code reviews to find these kinds of issues. For example, it’s popular these days to require a code review before a developer’s changes can be merged into the master branch of their repo. This creates bottlenecks in the delivery process, though. Code reviews performed by people looking at the code are a form of manual testing. You have to wait for someone to be available to do it, and it may take them some time to review all the changes you’ve made. More advanced teams have removed this bottleneck by automating some or all of their code reviews. It requires some investment to create an effective suite of code quality gates, but the pay-off in speeding up the check-in process usually more than pays for it. Teams doing continuous inspection tend to produce code of a significantly higher quality than teams doing manual code reviews.
  • Software Delivery – is all about how the code we write gets to the operational environment that requires it. We typically cover it in two stages: how does code get from the developer’s desktop into a shared repository of code that could be built, tested and released at any time? And how does that code get from the repository onto the end user’s smartphone, or the rented cloud servers, or the TV set-top box as a complete usable product?
    • Continuous Integration – is the practice of developers frequently (at least once a day) merging their changes into a shared repository from which the software can be built, tested and potentially deployed. Often seen as purely a technology issue – “we have a build server” – CI is actually a set of disciplines that the technology only enables if the team applies them. First, it implies that developers don’t go too long before merging their changes into the same branch – usually the master branch or “trunk”. Long-lived developer branches – often referred to as “feature branches” – that go unmerged for days prevent frequent merging of (and testing of merged) code, and are therefore most definitely not CI. The benefit of frequent tested merges is that we catch conflicts much earlier, and more frequent merges typically means fewer changes in each merge, and therefore fewer merge conflicts overall. Teams working on long-lived branches often report being stuck in “merge hell” where, say, at the end of the week everyone in the team tries to merge large batches of conflicting changes. In CI, once a developer has merged their changes to the master branch, the code in the repo is built and the tests are run to ensure none of those changes has “broken the build”. It also acts as a double-check that the changes work on a different machine (the build server), which reduces the risk of configuration mistakes. Another implication of CI – if our intent is to have a repository of code that can be deployed at any time – is that the code in the master branch must always work. This means that developers need to check before they merge that the resulting merged code will work. Running a suite of good automated tests beforehand helps to ensure this. Teams who lack those tests – or who don’t run them because they take too long – tend to find that the code in their repo is permanently broken to some degree. In this case, releases will require a “stabilisation” phase to find the bugs and fix them. So the software can’t be released as soon as the customer wants.
    • Continuous Delivery – means ensuring that our software is always shippable. This encompasses a lot of disciplines. If there is code sitting on developers’ desktops or languishing in long-lived branches, we can’t ship it. If the code sitting in our repo is broken, we can’t ship it. If there’s no fast and reliable way to take the code in the repo and deploy it as a working end product to where it needs to go, we can’t ship it. As well as disciplines like TDD and CI, continuous delivery also requires a very significant investment in automating the delivery pipeline – automating builds, automating testing (and making those tests run fast enough), automating code reviews, automating deployments, and so on. And these automated delivery processes need to be fast. If your builds take 3 hours – usually because the tests take so long to run – then that will slow down those all-important customer feedback loops, and slow down the process of learning from our releases and evolving a better design. Build times in particular are like the metabolism of your development process. If development has a slow metabolism, that can lead to all sorts of other problems. You’d be surprised how often I’ve seen teams with myriad difficulties watch those issues magically evaporate after we cut their build+test time down from hours to minutes.

Now, most of this stuff is known to most developers – or, at the very least, they know of them. The final two headings caused a few scratched heads. These are more advanced topics that I’ve found teams do need to think about, but usually after they’ve mastered the core disciplines that come before.

  • Managing Code Craft
    • The Case for Code Craft – acknowledges that code craft doesn’t exist in a vacuum, and shouldn’t be seen as an end in itself. We don’t write unit tests because, for example, we’re “professionals”. We write unit tests to make changing code easier and safer. I’ve found it helps enormously to both be clear in my own mind about why I’m doing these things, as well as in persuading teams that they should try them, too. I hear it from teams all the time: “We want to do TDD, but we’re not allowed”. I’ve never had that problem, and my ability to articulate why I’m doing TDD helps.
    • Code Craft Metrics – once you’ve made your case, you’ll need to back it up with hard data. Do the disciplines of code craft really speed up feedback cycles? Do they really reduce bug counts, and does that really save time and money? Do they really reduce the cost of changing code? Do they really help us to sustain the pace of innovation for longer? I’m amazed how few teams track these things. It’s very handy data to have when the boss comes a’knockin’ with their Micro-Manager hat on, ready to tell you how to do your job.
    • Scaling Code Craft – is all about how code craft on a team and within a development organisation just doesn’t magically happen overnight. There are lots of skills and ideas and tools involved, all of which need to be learned. And these are practical skills, like riding a bicycle. You can’t just read a book and go “Hey, I’m a test-driven developer now”. Nope. You’re just someone who knows in theory what TDD is. You’ve got to do TDD to learn TDD, and lots of it. And all that takes time. Most teams who fail to adopt code craft practices do so because they grossly underestimated how much time would be required to learn them. They approach it with such low “energy” that the code craft learning curve might as well be a wall. So I help organisations structure their learning, with a combination of reading, training and mentoring to get teams on the same page, and peer-based practice and learning. To scale that up, you need to be growing your own internal mentors. Ad hoc, “a bit here when it’s needed”, “a smidgen there when we get a moment” simply doesn’t seem to work. You need to have a plan, and you need to invest. And however much you were thinking of investing, it’s not going to be enough.
  • High-Integrity Code Craft
    • Load-Bearing Code – is that portion of code that we find in almost any non-trivial software that is much more critical than the rest. That might be because it’s on an execution path for a critical feature, or because it’s a heavily reused piece of code that lies on many paths for many features. Most teams are not aware of where their load-bearing code is. Most teams don’t give it any thought. And this is where many of the horror stories attributed to bugs in software begin. Teams can improve at identifying load-bearing code, and at applying more exhaustive and rigorous testing techniques to achieve higher levels of assurance when needed. And before you say “Yeah, but none of our code is critical”, I’ll bet a shiny penny there’s a small percentage of your code that really, really, really needs to work. It’s there, lurking in most software, just waiting to send that embarrassing email to everyone in your address book.
    • Guided Inspection – is a powerful way of testing code by reading it. Many studies have shown that code inspections tend to find more bugs than any other kind of testing. In guided inspections, we step through our code line by line, reasoning about what it will do for a specific test case – effectively executing the code in our heads. This is, of course, labour-intensive, but we would typically only do it for load-bearing code, and only when that code itself has changed. If we discover new bugs in an inspection, we feed that back into an automated test that will catch the bug if it ever re-emerges, adding it to our suite of fast-running regression tests.
    • Design By Contract – is a technique for ensuring the correctness of the interactions between components of our system. Every interaction has a contract: a pre-condition that describes when a function or service can be used (e.g., you can only transfer money if your account has sufficient funds), and a post-condition that describes what that function or service should provide to the client (e.g., the money is deducted from your account and credited to the payee’s account). There are also invariants: things that must always be true if the software is working as required (e.g., your account never goes over its limit). Contracts are useful in two ways: for reasoning about the correct behaviour of functions and services, and for embedding expectations about that behaviour inside the code itself as assertions that will fail during testing if an expectation isn’t satisfied. We can test post-conditions using traditional unit tests, but in load-bearing code, teams have found it helpful to assert pre-conditions to ensure that not only do functions and services do what they’re supposed to, but they’re only ever called when they should be. DBC presents us with some useful conceptual tools, as well as programming techniques when we need them. It also paves the way to a much more exhaustive kind of automated testing, namely…
    • Property-Based Testing – sometimes referred to as generative testing, is a form of automated testing where the inputs to the tests themselves are programmatically calculated. For example, we might test that a numerical algorithm works for a range of inputs from 0…1000, at increments of 0.01. Or we might test that a shipping calculation works for all combinations of inputs of country, weight class and mailing class. This is achieved by generalising the expected results in our tests, so instead of asserting that the square root of 4 is 2, we might assert that the square root of any positive number multiplied by itself is equal to the original number. These properties of correct test results look a lot like the contracts we might write when we practice Design By Contract, and therefore we might find experience in writing contracts helpful in building that kind of declarative style of asserting. The beauty of property-based tests is that they scale easily. Generating 1,000 random inputs and generating 10,000 random inputs requires a change of a single character in our test. One character, 9,000 extra test cases. Two additional characters (100,000) yields 99,000 more test cases. Property-based tests enable us to achieve quite mind-boggling levels of test assurance with relatively little extra test code, using tools most developers already know. (There’s a small sketch of this, together with the contract-style assertions described above, just after this list.)
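
To make those last two ideas a little more concrete, here’s a small sketch in Python using the hypothesis library: a square root function with contract-style pre- and post-condition assertions, and a property-based test that runs it against a whole range of generated inputs. The function and the tolerances are purely illustrative.

    import math

    from hypothesis import given, strategies as st


    def square_root(x):
        # Pre-condition: only defined for non-negative input.
        assert x >= 0, "pre-condition violated: x must be non-negative"
        result = math.sqrt(x)
        # Post-condition: the result multiplied by itself gives back (roughly) x.
        assert math.isclose(result * result, x, rel_tol=1e-9, abs_tol=1e-9)
        return result


    # Property-based test: hypothesis generates the inputs, so one test covers many cases.
    @given(st.floats(min_value=0, max_value=1000))
    def test_square_root_multiplied_by_itself_gives_original_number(x):
        result = square_root(x)
        assert math.isclose(result * result, x, rel_tol=1e-9, abs_tol=1e-9)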

So there you have it: my code craft road map, in a nutshell. Many of these disciplines are covered in introductory – but practical – detail in the Codemanship TDD course book.

If your team could use a hands-on introduction to code craft, our 3-day hands-on TDD course can give them a head-start.

Changing Legacy Code Safely

One of the topics we cover on the Codemanship TDD course is one that developers raise often: how can we write fast-running unit tests for code that’s not easily unit-testable? Most developers are working on legacy code – code that’s difficult and risky to change – most of the time. So it’s odd there’s only one book about it.

I highly recommend Michael Feathers’ book for any developer working in any technology, applying any approach to development. On the TDD course, I summarise what we mean by “legacy code” – code that doesn’t have fast-running automated tests, making it risky to change – and briefly demonstrate Michael’s process for changing legacy code safely.

The example I use is a simple Python program for pricing movie rentals based on their IMDB ratings. Average movie rentals cost £3.95. High-rated movies cost an extra pound, low-rated movies cost a pound less.

My program has no automated tests, so I’ve been testing it manually using the command line.

Suppose the business asked us to change the pricing logic; how could we do this safely if we lack automated tests to guard against breaking the code?

Michael’s process goes like this:

  • Identify what code you will need to change
  • Identify where around that code you’d want unit tests
  • Break any dependencies that are stopping you from writing unit tests
  • Write the unit tests that would satisfy you that the changes you’ll make haven’t broken the code
  • Make the change
  • While you’re there, refactor to improve the code that’s now covered by unit tests to make life easier for the next person who changes it (which could be you)

My Python program has a class called Pricer which we’ll need to change to update the pricing logic.

I’ve been testing this logic one level above by testing the Rental class that uses Pricer.
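
They look something like this – a sketch that captures the shape of the code rather than the exact listing (the OMDB URL, the rating thresholds and the formatting are illustrative):

    import json
    import urllib.request

    OMDB_URL = "http://www.omdbapi.com/?apikey=<your-key>&i="


    class Pricer:

        def price(self, imdb_id):
            # Connects directly to the OMDB API - the external dependency we'll deal with shortly.
            with urllib.request.urlopen(OMDB_URL + imdb_id) as response:
                video = json.loads(response.read())
            price = 3.95                                # average movies
            if float(video["imdbRating"]) >= 7.5:
                price += 1.00                           # high-rated movies cost an extra pound
            elif float(video["imdbRating"]) <= 4.0:
                price -= 1.00                           # low-rated movies cost a pound less
            return video["Title"], price


    class Rental:

        def __init__(self, customer, imdb_id):
            self.customer = customer
            self.title, self.price = Pricer().price(imdb_id)

        def __str__(self):
            return ("Video Rental – customer: " + self.customer +
                    ". Video => title: " + self.title +
                    ", price: £" + "{0:.2f}".format(self.price))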

The script I’ve been using for manual testing allows me to create Rental objects and write their data to the command line for different movies, using their IMDB IDs.

I use three example movies – one with a high rating, one low-rated and one medium-rated – to test the code. For example, the output for the high-rated movie looks like this.

C:\Users\User\Desktop\tdd 2.0\python_legacy>python program.py jgorman tt0096754
Video Rental – customer: jgorman. Video => title: The Abyss, price: £4.95

I’d like to reproduce these manual tests as unit tests, so I’ll be writing unittest tests for the Rental class for each kind of movie.

But before I can do that, there’s an external dependency we have to deal with. The Pricer class connects directly to the OMDB API that provides movie information. I want to stub that so I can provide test IMDB ratings without connecting.

Here’s where we have to get disciplined. I want to refactor the code to make it unit-testable, but it’s risky to do that because… there’s no unit tests! Opinions differ on approach, but personally – learned through bitter experience – I’ve found that it’s still important to re-test the code after every refactoring, manually if need be. It will seem like a drag, but we all tend to overlook how much time we waste downstream fixing avoidable bugs. It will seem slower to manually re-test, but it’s often actually faster in the final reckoning.

Okay, let’s do a refactoring. First, let’s get that external dependency in its own method.
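
In this sketch, that means extracting the OMDB call out of price() into its own method:

    class Pricer:

        def price(self, imdb_id):
            video = self.fetch_video_info(imdb_id)
            price = 3.95
            if float(video["imdbRating"]) >= 7.5:
                price += 1.00
            elif float(video["imdbRating"]) <= 4.0:
                price -= 1.00
            return video["Title"], price

        def fetch_video_info(self, imdb_id):
            # The external dependency, now isolated in one place.
            with urllib.request.urlopen(OMDB_URL + imdb_id) as response:
                return json.loads(response.read())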

I re-run my manual tests. Still passing. So far, so good.

Next, let’s move that new method into its own class.
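
Continuing the sketch, the extracted method moves into a VideoInfo class, which Pricer still creates for itself – for now:

    class VideoInfo:

        def fetch_video_info(self, imdb_id):
            with urllib.request.urlopen(OMDB_URL + imdb_id) as response:
                return json.loads(response.read())


    class Pricer:

        def price(self, imdb_id):
            video = VideoInfo().fetch_video_info(imdb_id)
            price = 3.95
            if float(video["imdbRating"]) >= 7.5:
                price += 1.00
            elif float(video["imdbRating"]) <= 4.0:
                price -= 1.00
            return video["Title"], price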

And re-test. All passing.

To make the dependency on VideoInfo swappable, the instance needs to be injected into the constructor of Pricer from Rental.
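
Sketched out, Pricer now receives its VideoInfo through its constructor, and Rental does the wiring:

    class Pricer:

        def __init__(self, video_info):
            self.video_info = video_info

        def price(self, imdb_id):
            video = self.video_info.fetch_video_info(imdb_id)
            price = 3.95
            if float(video["imdbRating"]) >= 7.5:
                price += 1.00
            elif float(video["imdbRating"]) <= 4.0:
                price -= 1.00
            return video["Title"], price


    class Rental:

        def __init__(self, customer, imdb_id):
            self.customer = customer
            self.title, self.price = Pricer(VideoInfo()).price(imdb_id)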

And re-test. All passing.

Next, we need to inject the Pricer into Rental, so we can stub VideoInfo in our planned unit tests.
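
Which, in this sketch, leaves Rental looking like this, with the command-line script doing the wiring:

    class Rental:

        def __init__(self, customer, imdb_id, pricer):
            self.customer = customer
            self.title, self.price = pricer.price(imdb_id)


    # the command-line script now wires the real collaborators together:
    rental = Rental("jgorman", "tt0096754", Pricer(VideoInfo()))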

And re-test. All passing.

Now we can write unit tests to replicate our command line tests.
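
For example – sketched with unittest, with an illustrative stub standing in for VideoInfo (assuming the classes live in program.py, as in the command-line run above):

    import unittest

    from program import Pricer, Rental


    class VideoInfoStub:

        def __init__(self, title, imdb_rating):
            self.title = title
            self.imdb_rating = imdb_rating

        def fetch_video_info(self, imdb_id):
            # Returns canned movie data instead of calling the OMDB API.
            return {"Title": self.title, "imdbRating": self.imdb_rating}


    class RentalTest(unittest.TestCase):

        def test_high_rated_movie_costs_an_extra_pound(self):
            pricer = Pricer(VideoInfoStub("The Abyss", "7.6"))
            rental = Rental("jgorman", "tt0096754", pricer)
            self.assertAlmostEqual(4.95, rental.price)

        def test_average_movie_costs_the_standard_price(self):
            pricer = Pricer(VideoInfoStub("An Average Movie", "6.0"))
            rental = Rental("jgorman", "tt0000001", pricer)
            self.assertAlmostEqual(3.95, rental.price)

        def test_low_rated_movie_costs_a_pound_less(self):
            pricer = Pricer(VideoInfoStub("A Low-Rated Movie", "3.5"))
            rental = Rental("jgorman", "tt0000002", pricer)
            self.assertAlmostEqual(2.95, rental.price)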

These unit tests reproduce all the checks I was doing visually at the command line, but they run in a fraction of a second. The going gets much easier from here.

Now I can make the change to the pricing logic the business requested.

[Figure: the user story describing the requested change to the pricing logic]

We can tackle this in a test-driven way now. Let’s update the relevant unit test so that it now fails.

Now let’s make it pass.

(And, yes – obviously in a real product, the change would likely be more complex than this.)

Okay, so we’ve made the change, and we can be confident we haven’t broken the software. We’ve also added some test coverage and dealt with a problematic dependency in our architecture. If we wanted to get movie ratings from somewhere else (e.g., Rotten Tomatoes), or even aggregate sources, it would be quite straightforward now that we’ve cleanly separated that concern from our business logic.

One last thing while we’re here: there’s a couple of things in this code that have been bugging me. Firstly, we’ve been mixing our terminology: the customer says “movie”, but our code says “video”. Let’s make our code speak the customer’s language.

Secondly, I’m not happy with clients accessing objects’ fields directly. Let’s encapsulate.
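
Sketching those two refactorings against the illustrative code (again, the names are my assumptions – the point is the renaming and the encapsulation, not the specifics):

class MovieInfo:                               # renamed from VideoInfo - "movie" is the customer's word
    def fetch_rating(self, imdb_id):
        ...                                    # unchanged OMDB lookup, omitted here

class Rental:
    def __init__(self, customer, imdb_id, pricer):
        self._customer = customer
        self._price = pricer.price(imdb_id)

    @property
    def customer(self):                        # fields are no longer poked at directly...
        return self._customer

    @property
    def price(self):                           # ...clients go through read-only properties
        return self._price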

With our added unit tests, these extra refactorings were much easier to do, and hopefully that means that changing this code in the future will be much easier, too.

Over time, one change at a time, the unit test coverage will build up and the code will get easier to change. Applying this process over weeks, months and years, I’ve seen some horrifically rigid and brittle software products – so expensive and risky to change that the business had stopped asking – be rehabilitated and become going concerns again.

By focusing our efforts on changes our customer wants, we're less likely to run into a situation where writing unit tests and refactoring gets vetoed by our managers. The results of highly visible "refactoring sprints", or even long refactoring phases – I've known clients freeze requirements for up to a year to "refactor" legacy code – are typically disappointing, and risk getting refactoring and unit testing banned outright by disgruntled bosses.

One final piece of advice: never, ever discuss this process with non-technical stakeholders. If you’re asked to break down an estimate to change legacy code, resist. My experience has been that it often doesn’t take any longer to make the change safely, and the longer-term benefits are obvious. Don’t give your manager or your customer the opportunity to shoot themselves in the foot by offering up unit tests and refactoring as a line item. Chances are, they’ll say “no, thanks”. And that’s in nobody’s interests.

The Test Pyramid – The Key To True Agility

On the Codemanship TDD course, before we discuss Continuous Delivery and how essential it is to achieving real agility, we talk about the Test Pyramid.

It has various interpretations, in terms of exactly how many layers there are and exactly what kinds of testing each layer is made of (unit, integration, service, controller, component, UI, etc.), but the overall sentiment is straightforward:

The longer tests take to run, the fewer of those kinds of tests you should aim to have

test_pyramid

The idea is that the tests we run most often need to be as fast as possible (otherwise we run them less often). These are typically described as “unit tests”, but that means different things to different people, so I’ll qualify: tests that do not involve any external dependencies. They don’t read from or write to databases, they don’t read or write files, they don’t connect with web services, and so on. Everything that happens in these tests happens inside the same memory address space. Call them In-Process Tests, if you like.

Tests that necessarily check our code works with external dependencies have to cross process boundaries when they’re executed. As our In-Process tests have already checked the logic of our code, these Cross-Process Tests check that our code – the client – and the external code – the suppliers – obey the contracts of their interactions. I call these “integration tests”, but some folk have a different definition of integration test. So, again, I qualify it as: tests that involve external dependencies.

These typically take considerably longer to execute than “unit tests”, and we should aim to have proportionally fewer of them and to run them proportionally less often. We might have thousands of unit tests, and maybe hundreds of integration tests.

If the unit tests cover the majority of our code – say, 90% of it – and maybe 10% of our code has direct external dependencies that have to be tested, then on average we'll make about 9 changes that need unit testing for every 1 change that needs integration testing. In other words, we'd need to run our unit tests about 9x as often as our integration tests – which is just as well, since each integration test typically runs many times slower than a unit test.

At the top of our test pyramid are the slowest tests of all. Typically these are tests that exercise the entire system stack, through the user interface (or API) all the way down to the external dependencies. These tests check that it all works when we plug everything together and deploy it into a specific environment. If we’ve already tested the logic of our code with unit tests, and tested the interactions with external suppliers, what’s left to test?

Some developers mistakenly believe that these system-level tests are for checking the logic of the user experience – user "journeys", if you like. This is a mistake. There are usually a lot of user journeys, so we'd end up with a lot of these very slow-running tests and an upside-down pyramid. The trick here is to make the logic of the user experience unit-testable. View models are a simple architectural pattern for logically representing what users see and what users do at that level. At the highest level they may be looking at an HTML table and clicking a button to submit a form, but at the logical level, maybe they're looking at a movie and renting it.

A view model can help us encapsulate the logic of user experience in a way that can be tested quickly, pushing most of our UI/UX tests down to the base of the pyramid where they belong. What’s left – the code that must directly reference physical UI elements like HTML tables and buttons – can be wafer thin. At that level, all we’re testing is that views are rendered correctly and that user actions trigger the correct internal logic (which can easily be done using mock objects). These are integration tests, and belong in the middle layer of our pyramid, not the top.
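
As a rough Python illustration of the pattern (the catalogue and rentals collaborators here are hypothetical, not part of the earlier example):

class RentalViewModel:
    # The logical user experience: what the user sees (a list of movies)
    # and what they can do (rent one). No HTML tables, no buttons.
    def __init__(self, catalogue, rentals):
        self._catalogue = catalogue
        self._rentals = rentals
        self.movies = []

    def load(self):
        self.movies = self._catalogue.all_movies()

    def rent(self, customer, imdb_id):
        self._rentals.add(customer, imdb_id)

Because it holds no references to physical UI elements, this logic can be tested in-process with fakes at the base of the pyramid, leaving only the thin rendering-and-wiring code for the middle layer.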

Another classic error is to check core logic through the GUI. For example, checking that insurance premiums are calculated correctly by looking at what number is rendered on that web page. Some module somewhere does that calculation. That should be unit-testable.

So, if they’re not testing user journeys, and they’re not testing core logic, what do our system tests test? What’s left?

Well, have you ever found yourself saying "It worked on my machine"? The saying goes "There's many a slip 'twixt cup and lip." Just because all the pieces work, and just because they all play nicely together, there's no guarantee that when we deploy the whole system into, say, our EC2 instances, nothing will be different from the environments we tested it in. I've seen roll-outs go wrong because the servers handled dates differently, or had the wrong locale, or a different file system, or security restrictions that weren't in place on dev machines.

The last piece of the jigsaw is the system configuration, where our code meets the real production environment – or a simulation of it – and we find out if it really works, as a whole, where it's intended to work.

We may need dozens of those kinds of tests, and perhaps only need to run them on, say, every CI build, by deploying the outputs to a staging environment that mirrors the production environment (and only if all our unit and integration tests pass first, of course). These are our "good to go?" tests.

The shape of our test pyramid is critical to achieving feedback loops that are fast enough to allow us to sustain the pace of development. Ideally, after we make any change, we should want to get feedback straight away about the impact of that change. If 90% of our code can be re-tested in under 30 seconds, we can re-test 90% of our changes many times an hour and be alerted within 30 seconds if we broke something. If it takes an hour to re-test our code, then we have a problem.

Continuous Delivery means that our code is always shippable. That means it must always be working, or as near to always as possible. If re-testing takes an hour, that means we're an hour away from finding out if the changes we made broke the code. It means we're an hour away from knowing if our code is shippable. And, after an hour's worth of changes without re-testing, chances are high that it is broken and we just don't know it yet.

An upside-down test pyramid puts Continuous Delivery out of your reach. Your confidence that the code’s shippable at any point in time will be low. And the odds that it’s not shippable will be high.

The impact of slow-running test suites on development is profound. I've found many times that when a team invested in speeding up their tests, many other problems magically disappeared. Slow tests – which mean slow builds, which mean slow release cycles – are like a development team's slow metabolism. Many health problems can be caused by a slow metabolism. It really is that fundamental.

Slow tests are pennies to the pound of the wider feedback loops of release cycles. You'd be surprised how much of your release cycles are, at the lowest level, made up of re-testing cycles. The outer feedback loops of delivery are made of the inner feedback loops of testing. Fast-running automated tests – as an enabler of fast release cycles and sustained innovation – are therefore highly desirable.

A right-way-up test pyramid doesn’t happen by accident, and doesn’t come at no cost, though. Many organisations, sadly, aren’t prepared to make that investment, and limp on with upside-down pyramids and slow test feedback until the going gets too tough to continue.

As well as writing automated tests, there’s also an investment needed in your software’s architecture. In particular, the way teams apply basic design principles tends to determine the shape of their test pyramid.

I see a lot of duplicated code that contains duplicated external dependencies, for example. It’s not uncommon to find systems with multiple modules that connect to the same database, or that connect to the same web service. If those connections happened in one place only, that part of the code could be integration tested just once. D.R.Y. helps us achieve a right-way-up pyramid.

I see a lot of code where a module or function that does a business calculation also connects to an external dependency, or where a GUI module also contains business logic, so that the only way to test that core logic is with an integration test. Single Responsibility helps us achieve a right-way-up pyramid.

I see a lot of code where a module in one web service interacts with multiple features of another web service – Feature Envy, but on a larger scale – so there are multiple points of integration that require testing. Encapsulation helps us achieve a right-way-up pyramid.

I see a lot of code where a module containing core logic references an external dependency, like a database connection, directly by its implementation, instead of through an abstraction that could be easily swapped by dependency injection. Dependency Inversion helps us achieve a right-way-up pyramid.

Achieving a design with less duplication, where modules do one job, where components and services know as little as possible about each other, and where external dependencies can be easily stubbed or mocked by dependency injection, is essential if you want your test pyramid to be the right way up. But code doesn’t get that way by accident. There’s significant ongoing effort required to keep the code clean by refactoring. And that gets easier the faster your tests run. Chicken, meet egg.

If we’re lucky enough to be starting from scratch, the best way we know of to ensure a right-way-up test pyramid is to write the tests first. This compels us to design our code in such a way that it’s inherently unit-testable. I’ve yet to come across a team genuinely doing Continuous Delivery who wasn’t doing some kind of TDD.

If you’re working on legacy code, where maybe you’re relying on browser-based tests, or might have no automated tests at all, there’s usually a mountain to climb to get a test pyramid that’s the right way up. You need to write fast-running tests, but you will probably need to refactor the code to make that possible. Egg, meet chicken.

Like all mountains, though, it can be climbed. One small, careful step at a time. Michael Feathers' book Working Effectively With Legacy Code describes a process for making changes safely to code that lacks fast-running automated tests. It goes something like this:

  • Identify what code you need to change
  • Identify where around that code you’d want unit tests to make the change safely
  • Break any dependencies in that code getting in the way of unit testing
  • Write the unit tests
  • Make the change
  • While you’re there, make other improvements that will help the next developer who needs to change that code (the “boy scout rule” – leave the camp site tidier than you found it)

Change after change, made safely in this way, will – over time – build up a suite of fast-running unit tests that will make future changes easier. I’ve worked on legacy code bases that went from upside-down test pyramids of mostly GUI-based system tests, that took hours or even days to run, to right-side-up pyramids where most of the code could be tested in under a minute. The impact on the cost and the speed of delivery is always staggering. It can be done.

But be patient. A code base might take a year or two to turn around, and at first the going will be tough. I find I have to be super-disciplined in those early stages. I manually re-test as I refactor, and resist the temptation to make a whole bunch of changes at a time before I re-test. Slow and steady, adding value and clearing paths for future changes at the same time.

Code Craft’s Value Proposition: More Throws Of The Dice

Evolutionary design is a term that’s used often, not just in software development. Evolution is a way of solving complex problems, typically with necessarily complex solutions (solutions that have many interconnected/interacting parts).

But that complexity doesn't arise in a single step. Evolved designs start very simple, and then become complex over many, many iterations. Importantly, each iteration of the design is tested for its "fitness" – does it work in the environment in which it operates? Iterations that don't work are rejected, iterations that work best are selected, and become the input to the next iteration.

We can think of evolution as being a search algorithm. It searches the space of all possible solutions for the one that is the best fit to the problem(s) the design has to solve.

It’s explained best perhaps in Richard Dawkins’ book The Blind Watchmaker. Dawkins wrote a computer simulation of a natural process of evolution, where 9 “genes” generated what he called “biomorphs”. The program would generate a family of biomorphs – 9 at a time – with a parent biomorph at the centre surrounded by 8 children whose “DNA” differed from the parent by a single gene. Selecting one of the children made it the parent of a new generation of biomorphs, with 8 children of their own.

biomorph
Biomorphs generated by the evolutionary simulation at http://www.emergentmind.com/biomorphs

You can find a recreation and more detailed explanation of the simulation here.

The 9 genes of the biomorphs define a universe of 118 billion possible unique designs. The evolutionary process is a walk through that universe, moving just one space in any direction with each iteration, because just one gene changes with each generation. From simple beginnings, complex forms can quickly arise.

A brute force search might enumerate all possible solutions, test each one for fitness, and select the best out of that entire universe of designs. With Dawkins’ biomorphs, this would mean testing 118 billion designs to find the best. And the odds of selecting the best design at random are 1:118,000,000,000. There may, of course, be many viable designs in the universe of all possible solutions. But the chances of finding one of them with a single random selection – a guess – are still very small.

For a living organism, which has many orders of magnitude more elements in its genetic code and therefore an effectively infinite solution space to search, brute force simply isn't viable. And the chances of landing on a viable genetic code in a single step are effectively zero. Evolution solves problems not by brute force or by astronomically improbable chance, but by small, perfectly probable steps.

If we think of the genes as a language, then it’s not a huge leap conceptually to think of a programming language in the same way. A programming language defines the universe of all possible programs that could be written in that language. Again, the chances of landing on a viable working solution to a complex problem in a single step are effectively zero. This is why Big Design Up-Front doesn’t work very well – arguably at all – as a solution search algorithm. There is almost always a need to iterate the design.

Natural evolution has three key components that make it work as a search algorithm (there's a toy code sketch of them just after this list):

  • Reproduction – the creation of a new generation that has a virtually identical genetic code
  • Mutation – tiny variances in the genetic code with each new generation that make it different in some way to the parent (e.g., taller, faster, better vision)
  • Selection – a mechanism for selecting the best solutions based on some “fitness” function against which each new generation can be tested
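
To make those three components concrete, here's a toy sketch – nothing to do with biomorphs specifically, just the bare algorithm – in which a population of one evolves a string of binary "genes" towards a simple fitness goal:

import random

def fitness(genes):
    return sum(genes)                                  # toy fitness function: count the 1s

def evolve(genome_length=20, generations=1000):
    parent = [0] * genome_length                       # simple beginnings
    for _ in range(generations):
        child = list(parent)                           # reproduction: a near-identical copy
        child[random.randrange(genome_length)] ^= 1    # mutation: exactly one gene changes
        if fitness(child) >= fitness(parent):          # selection: the fitter design survives
            parent = child
    return parent

print(evolve())   # converges on all 1s - a tiny walk through a space of 2^20 possible designs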

The mutations from one generation to the next are necessarily small. A fitness function describes a fitness landscape that can be projected onto our theoretical solution space of all possible programs written in a language. Programs that differ in small ways are more likely to have very similar fitness than programs that are very different. Make one change to a working solution and, chances are, you’ve still got a working solution. Make 100 changes, and the risk of breaking things is much higher.

Evolutionary design works best when each iteration is almost identical to the last, with only one or two small changes. Teams practicing Continuous Delivery with a One-Feature-Per-Release policy, therefore, tend to arrive at better solutions than teams who schedule many changes in each release.

And within each release, there’s much more scope to test even smaller changes – micro-changes of the kind enacted in, say, refactoring, or in the micro-iterations of Test-Driven Development.

Which brings me neatly to the third component of evolutionary design: selection. In nature, the Big Bad World selects which genetic codes thrive and which are marked out for extinction. In software, we have other mechanisms.

Firstly, there’s our own version of the Big Bad World. This is the operating environment of the solution. A Point Of Sale system is ultimately selected or rejected through real use in real shops. An image manipulation program is selected or rejected by photographers and graphic designers (and computer programmers writing blog posts).

Real-world feedback from real-world use should never be underestimated as a form of testing. It’s the most valuable, most revealing, and most real form of testing.

Evolutionary design works better when we test our software in the real world more frequently. One production release a year is way too little feedback, way too late. One production release a week is far better.

Once we’ve established that the software is fit for purpose through customer testing – ideally in the real world – there are other kinds of testing we can do to help ensure the software stays working as we change it. A test suite can be thought of as a codified set of fitness functions for our solution.

One implication of the evolutionary design process is that, on average, more iterations will produce better solutions. And this means that faster iterations tend to arrive at a working solution sooner. Species with long life cycles – e.g., humans or elephants – evolve much more slowly than species with short life cycles like fruit flies and bacteria. (Indeed, bacteria evolve so fast that it's been observed happening in the lab.) This is why health organisations have to guard against new viruses every year, but nobody's worried about new kinds of shark suddenly emerging.

For this reason, anything in our development process that slows down the iterations impedes our search for a working solution. One key factor in this is how long it takes to build and re-test the software as we make changes to it. Teams whose build + test process takes seconds tend to arrive at better solutions sooner than teams whose builds take hours.

More generally, the faster and more frictionless the delivery pipeline of a development team, the faster they can iterate and the sooner a viable solution evolves. Some teams invest heavily in Continuous Delivery, and get changes from a programmer’s mind into production in minutes. Many teams under-invest, and changes can take weeks or months to reach the real world where the most useful feedback is to be had.

Other factors that create delivery friction include the maintainability of the code itself. Although a system may be complex, it can still be built from simple, single-purpose, modular parts that can be changed much faster and more cheaply than complex spaghetti code.

And while many BDUF teams focus on “getting it right first time”, the reality we observe is that the odds of getting it right first time are vanishingly small, no matter how hard we try. I’ll take more iterations over a more detailed requirements specification any day.

When people exclaim of code craft "What's the point of building it right if we're building the wrong thing?", they fail to grasp the real purpose of the technical practices that underpin Continuous Delivery, like unit testing, TDD, refactoring and Continuous Integration. We do these things precisely because we want to increase the chances of building the right thing. The real requirements analysis happens when we observe how users get on with our solutions in the real world, and feed those lessons back into a new iteration. The sooner we get our code out there, the sooner we can get that feedback. The faster we can iterate solutions, the sooner a viable solution can evolve. The longer we can sustain the iterations, the more throws of the dice we can give the customer.

That, ultimately, is the promise of good code craft: more throws of the dice.

 

Refactoring To Closures in Kotlin & IntelliJ

I spent yesterday morning practicing a refactoring in Kotlin that I wanted to potentially demonstrate in a workshop, and after half a dozen unsuccessful attempts, I found a way that seems relatively safe. I thought it might be useful to document it here, both for my future self and for anyone else who might be interested.

My goal here is to encapsulate the data used in this function for calculating quotes for fitted carpets. The solution I’m thinking of is closures.

How do I get from this to closures being injected into quote() safely? Here’s how I did it in IntelliJ.

  1. Use the Function to Scope… refactoring to extract the body of, say, the roomArea() function into an internal function.

  2. As a single step, change the return type of roomArea() to a function signature that matches area(), return a reference to ::area instead of the return value from area(), and change quote() to invoke the returned area() function. (Phew!)

  3. Rename roomArea() to room() so it makes more sense.

  4. In quote(), highlight the expression room(width, length) and use the Extract Parameter refactoring to have that passed into quote() from the tests.

  5. Now we're going to do something similar for carpetPrice(), with one small difference. As with roomArea(), use the Function to Scope refactoring to extract the body of carpetPrice() into an internal function.

  6. Then swap the return value with a reference to the ::price function.

  7. This time, we want the area to be passed in as a parameter to the price() function. Extract Parameter area from price(), change the signature of the returned function, and update quote() to pass it in using the area() function. Again, this must be a single step.

  8. Change the Signature of carpetPrice() to remove the redundant area parameter.

  9. Rename carpetPrice() to carpet() so it makes more sense.

  10. Use the Extract Parameter refactoring on the expression carpet(pricePerSqrMtr, roundUp) in quote(), naming the new parameter price().

 

If you want to have a crack at this yourself, the source code is at https://github.com/jasongorman/kotlin_simple_design/tree/master/fp_modular_design, along with two more examples (an OO/Java version of this, plus another example that breaks all the rules of both Simple Design and modular design in Kotlin).