The Software Design Process

One thing that sadly rarely gets discussed these days is how we design software. That is, how we get from a concept to working code.

As a student (and teacher) of software design and architecture for many years, having experienced first-hand many different methodologies – from rigorous to ad hoc, heavyweight to agile – I can see similarities between all the effective approaches.

Whether you’re UML-ing or BDD-ing or Event Storming-ing your designs, when it works, the thought process is the same.

It starts with a goal.

This – more often than not – is a problem that our customer needs solving.

This, of course, is where most teams get the design thinking wrong. They don’t start with a goal – or if they do, most of the team aren’t involved at that point, and subsequently are not made aware of what the original goal or problem was. They’re just handed a list of features and told “build that”, with no real idea what it’s for.

But they should start with a goal.

In design workshops, I encourage teams to articulate the goal as a single, simple problem statement. e.g.,

It’s really hard to find good vegan takeaway in my area.

Jason Gorman, just now

Our goal is to make it easier to order vegan takeaway food. This, naturally, raises the question: how hard is it to order vegan takeaway today?

If our target customer area is Greater London, then at this point we need to hit the proverbial streets and collect data to help us answer that question. Perhaps we could pick some random locations – N, E, S and W London – and try to order vegan takeaway using existing solutions, like Google Maps, Deliveroo and even the Yellow Pages.

Our data set gives us some numbers. On average, it took 47 minutes to find a takeaway restaurant with decent vegan options. They were, on average, 5.2 miles from the random delivery address. The orders took a further 52 minutes to be delivered. In 19% of selected delivery addresses, we were unable to order vegan takeaway at all.

What I’ve just done there is apply a simple thought process known as Goal-Question-Metric.
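The Goal-Question-Metric breakdown above can be sketched as plain data – the goal at the top, questions beneath it, and a measurable metric answering each question. The structure and all the names here are illustrative, not a prescribed format:

```python
# A minimal sketch of Goal-Question-Metric as plain data.
# The baseline numbers are the ones gathered in the field exercise above.

goal = "Make it easier to order vegan takeaway in Greater London"

questions = {
    "How long does it take to find a suitable restaurant?": {
        "metric": "mean search time (minutes)", "baseline": 47},
    "How far away are the restaurants?": {
        "metric": "mean distance (miles)", "baseline": 5.2},
    "How long does delivery take?": {
        "metric": "mean delivery time (minutes)", "baseline": 52},
    "How often can we not order vegan takeaway at all?": {
        "metric": "failure rate (%)", "baseline": 19},
}

for question, m in questions.items():
    print(f"{question} -> {m['metric']}: {m['baseline']}")
```

The point of writing it down like this is that every metric is now something we can re-measure after releasing a solution.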

We ask ourselves, which of these do we think we could improve on with a software solution? I’m not at all convinced software would make the restaurants cook the food faster. Nor will it make the traffic in London less of an obstacle, so delivery times are unlikely to speed up much.

But if our data suggested that to find a vegan menu from a restaurant that will deliver to our address we had to search a bunch of different sources – including telephone directories – then I think that’s something we could improve on. It hints strongly that lack of vegan options isn’t the problem, just the ease of finding them.

A single searchable list of all takeaway restaurants with decent vegan options in Greater London might speed up our search. Note that word: MIGHT.

I’ve long advocated that software specifications be called “theories”, not “solutions”. Our theory is that if we had a searchable list of all those restaurants we currently have to hunt for across multiple directories, the search would be much quicker, and we’d potentially reduce the number of times no option is found at all.

Importantly, we can compare the before and the after – using the examples we pulled from the real world – to see if our solution actually does improve search times and hit rates.
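That before-and-after comparison can itself be expressed as an executable check. The baseline figures come from the field data above; the post-release figures would come from re-running the same exercise against the new solution (the sample values below are placeholders for illustration, not real results):

```python
# A sketch of "compare the before and the after" as an executable test.

BASELINE = {"search_minutes": 47, "failure_rate": 0.19}

def solution_improves(after: dict, baseline: dict = BASELINE) -> bool:
    """Our theory holds only if both search time and hit rate improve."""
    return (after["search_minutes"] < baseline["search_minutes"]
            and after["failure_rate"] < baseline["failure_rate"])

# Hypothetical post-release measurements:
assert solution_improves({"search_minutes": 5, "failure_rate": 0.02})
assert not solution_improves({"search_minutes": 60, "failure_rate": 0.25})
```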

Yes. Tests. We like tests.

Think about it; we describe our modern development processes as iterative. But what does that really mean? To me – a physics graduate – it implies a goal-seeking process applied over and over: each cycle’s output is fed into the next as input, converging on a stable working solution.

Importantly, if there’s no goal, and/or no way of knowing if the goal’s been achieved, then the process doesn’t work. The wheels are turning, the engine’s revving, but we ain’t going anywhere in particular.

Now, be honest, when have you ever been involved in a design process that started like that? But this is where good design starts: with a goal.

So, we have a goal – articulated in a testable way, importantly. What next?

Next, we imaginate (or is it visionize? I can never keep up with the management-speak) a feature – a proverbial button the user clicks – that solves their problem. What does it do?

Don’t think about how it works. Just focus on visualifying (I’m getting the hang of this now) what happens when the user clicks that magical button.

In our case, we imagine that when the user clicks the Big Magic Button of Destiny, they’re shown a list of takeaway restaurants with a decent vegan menu who can deliver to their address within a specified time (e.g., 45 minutes).
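A sketch of what that magical button does – just the what, not the how. Every name here (the `Restaurant` shape, the 45-minute default) is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class Restaurant:
    name: str
    has_decent_vegan_menu: bool
    estimated_delivery_minutes: int  # to the customer's address

def find_vegan_takeaway(restaurants, max_minutes=45):
    """The Big Magic Button of Destiny: list restaurants with a decent
    vegan menu that can deliver within the specified time."""
    return [r for r in restaurants
            if r.has_decent_vegan_menu
            and r.estimated_delivery_minutes <= max_minutes]

listings = [Restaurant("Green Garden", True, 30),
            Restaurant("Meat Palace", False, 20),
            Restaurant("Vegan Villa", True, 60)]
print([r.name for r in find_vegan_takeaway(listings)])  # only Green Garden qualifies
```

Note how little is pinned down: nothing about databases, maps or delivery drivers. Just the user’s need, expressed as behaviour.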

That’s our headline feature. A headline feature is the feature that solves the customer’s problem, and – therefore – is the reason for the system to exist. No, “Login” is never a headline feature. Nobody uses software because they want to log in.

Now we have a testable goal and a headline feature that solves the customer’s problem. It’s time to think about how that headline feature could work.

We would need a complete list of takeaway restaurants with decent vegan menus serving any potential delivery address in our target area of Greater London.

We would need to know how long it might take to deliver from each restaurant to the customer’s address.

This would include knowing if the restaurant is still taking orders at that time.

Our headline feature will require other features to make it work. I call these supporting features. They exist only because of the headline feature – the one that solves the problem. The customer doesn’t want a database. They want vegan takeaway, damn it!

Our simple system will need a way to add restaurants to the list. It will need a way to estimate delivery times (including food preparation) between restaurant and customer addresses – and this may change (e.g., during busy times). It will need a way for restaurants to indicate if they’re accepting orders in real time.
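Those supporting features can be sketched as a deliberately un-whizzy module – add restaurants, flag availability in real time, list who’s open. All the names here are illustrative assumptions, not a prescribed design:

```python
# A sketch of the supporting features: they exist only to serve the
# headline feature, so they can start out as simple as possible.

class RestaurantDirectory:
    def __init__(self):
        self._restaurants = []

    def add(self, name, address):
        """Supporting feature: add a restaurant to the list."""
        self._restaurants.append({"name": name, "address": address,
                                  "accepting_orders": True})

    def set_accepting_orders(self, name, accepting):
        """Supporting feature: restaurants flag availability in real time."""
        for r in self._restaurants:
            if r["name"] == name:
                r["accepting_orders"] = accepting

    def open_for_orders(self):
        return [r for r in self._restaurants if r["accepting_orders"]]

directory = RestaurantDirectory()
directory.add("Green Garden", "12 Hypothetical Lane")
directory.add("Vegan Villa", "34 Example Road")
directory.set_accepting_orders("Vegan Villa", False)
print([r["name"] for r in directory.open_for_orders()])  # Vegan Villa is closed
```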

At this point, you may be envisaging some fancypants Uber Eats style of solution with whizzy maps showing delivery drivers aimlessly circling your street for 10 minutes because nobody reads the damn instructions these days. Grrr.

But it ain’t necessarily so. This early on in the design process is no time for whizzy. Whizzy comes later. If ever. Remember, we’re setting out here to solve a problem, not build a whizzy solution.

I’ve seen some very high-profile applications go live with data entry interfaces knocked together in MS Access for that first simple release, for example. Remember, this isn’t a system for adding restaurant listings. This is a system for finding vegan takeaway. The headline feature’s always front-and-centre – our highest priority.

Also remember, we don’t know if this solution is actually going to solve the problem. The sooner we can test that, the sooner we can start iterating towards something better. And the simpler the solution, the sooner we can put it in the hands of end users. Let’s face it, there’s a bit of smoke and mirrors to even the most mature software solutions. We should know; we’ve looked behind the curtain and we know there’s no actual Wizard.

Once we’re talking about features like “Search for takeaway”, we should be in familiar territory. But even here, far too many teams don’t really grok how to get from a feature to working code.

But this thought process should be ingrained in every developer. Sing along if you know the words:

  • Who is the user and what do they want to do?
  • What jobs does the software need to do to give them that?
  • What data is required to do those jobs?
  • How can the work and the data be packaged together (e.g., in classes)?
  • How will those modules talk to each other to coordinate the work end-to-end?

This is the essence of high-level modular software design. The syntax may vary (classes, modules, components, services, microservices, lambdas), but the thinking is the same. The user has needs (find vegan takeaway nearby). The software does work to satisfy those needs (e.g., estimate travel time). That work involves data (e.g., the addresses of restaurant and customer). Work and data can be packaged into discrete modules (e.g., DeliveryTimeEstimator). Those modules will need to call other modules to do related work (e.g., address.asLatLong()), and will therefore need “line of sight” – otherwise known as a dependency – to send that message.
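That “line of sight” idea can be made concrete with a small sketch: a `DeliveryTimeEstimator` that does its work by sending messages to the `Address` objects it depends on. The straight-line distance and assumed average speed are stand-ins, not a real estimation model:

```python
from dataclasses import dataclass
import math

@dataclass
class Address:
    lat: float
    long: float

    def as_lat_long(self):
        return (self.lat, self.long)

class DeliveryTimeEstimator:
    AVERAGE_SPEED_MPH = 12  # assumed urban delivery speed

    def estimate_minutes(self, restaurant: Address, customer: Address) -> float:
        # Line of sight: the estimator depends on Address to do related work.
        r_lat, r_long = restaurant.as_lat_long()
        c_lat, c_long = customer.as_lat_long()
        # Crude straight-line distance; ~69 miles per degree of latitude
        miles = 69 * math.hypot(r_lat - c_lat, r_long - c_long)
        return 60 * miles / self.AVERAGE_SPEED_MPH

estimator = DeliveryTimeEstimator()
print(round(estimator.estimate_minutes(Address(51.5, -0.1), Address(51.5, -0.2)), 1))
```

The packaging decision – which work lives with which data – is the design decision; the syntax (class, module, lambda) is incidental.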

You can capture this in a multitude of different ways – Class-Responsibility-Collaboration (CRC) cards, UML sequence diagrams… heck, embroider it on a tapestry for all I care. The thought process is the same.

This bird’s-eye view of the modules, their responsibilities and their dependencies needs to be translated into whichever technology you’ve selected to build this with. Maybe the modules are Java classes. Maybe they’re AWS lambdas. Maybe they’re COBOL programs.

Here we should be in writing code mode. I’ve found that if your on-paper (or on tapestry, if you chose that route) design thinking goes into detail, then it’s adding no value. Code is for details.

Start writing automated tests. Now that really should be familiar territory for every dev team.

/ sigh /

The design thinking never stops, though. For one, remember that everything so far is a theory. As we get our hands dirty in the details, our high-level design is likely to change. The best laid plans of mice and architects…

And, as the code emerges one test at a time, there’s more we need to think about. Our primary goal is to build something that solves the customer’s problem. But there are secondary goals – for example, how easy it will be to change this code when we inevitably learn that it didn’t solve the problem (or when the problem changes).

You can cater a dinner party in most kitchen designs. But not every kitchen is easy to change.

It’s vital to remember that this is an iterative process. It only works if we can go around again. And again. And again. So organising our code in a way that makes it easy to change is super-important.

Enter stage left: refactoring.

Half the design decisions we make will be made after we’ve written the code that does the job. We may realise that a function or method is too big or too complicated and break it down. We may realise that names we’ve chosen make the code hard to understand, and rename. We may see duplication that could be generalised into a single, reusable abstraction.

Rule of thumb: if your high-level design includes abstractions (e.g., interfaces, design patterns, etc), you’ve detailed too early.

Jason Gorman, probably on a Thursday

The need for abstractions emerges organically as the code grows, through the process of reviewing and refactoring that code. We don’t plan to use factories or the strategy pattern, or to have a Vendor interface, in our solution. We discover the need for them to solve problems of software maintainability.
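As a sketch of what “discovering” an abstraction looks like: suppose we later add street-food stalls to our takeaway listings and notice they’re used exactly like restaurants. Refactoring generalises the duplication behind a shared abstraction. The `Vendor` name echoes the interface mentioned above; everything else here is illustrative:

```python
from abc import ABC, abstractmethod

# The Vendor abstraction wasn't in the original plan; it was discovered
# when a second vendor type appeared and the listing code needed to
# treat both uniformly.

class Vendor(ABC):
    @abstractmethod
    def vegan_dishes(self) -> list: ...

class Restaurant(Vendor):
    def __init__(self, dishes):
        self._dishes = dishes
    def vegan_dishes(self):
        return [d for d in self._dishes if d.endswith("(v)")]

class StreetFoodStall(Vendor):
    def __init__(self, dishes):
        self._dishes = dishes
    def vegan_dishes(self):
        return [d for d in self._dishes if d.endswith("(v)")]

def searchable_listing(vendors):
    """Works with any Vendor - the client code no longer cares which kind."""
    return sorted(dish for v in vendors for dish in v.vegan_dishes())

print(searchable_listing([Restaurant(["Dhal (v)", "Lamb Bhuna"]),
                          StreetFoodStall(["Falafel Wrap (v)"])]))
```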

By applying organising principles like Simple Design, D.R.Y., Tell, Don’t Ask, Single Responsibility and the rest to the code as it grows, good, maintainable modular designs will emerge – often in unexpected ways. Let go of your planned architecture, and let the code guide you. Face it, it was going to be wrong anyway. Trust me: I know.

Here’s another place that far too many teams go wrong. As your code grows and an architecture emerges, it’s very, very helpful to maintain a bird’s-eye view of what that emerging architecture is becoming. Ongoing visualisation of the software – its modules, patterns, dependencies and so on – is something surprisingly few teams do these days. Working on agile teams, I’ve invested some of my time in creating and maintaining these maps of the actual terrain and displaying them prominently in the team’s area – domain models, UX storyboards, key patterns we’ve applied (e.g., how have we done MVC?). You’d be amazed what gets missed when everyone’s buried in code, neck-deep in details, and nobody’s keeping an eye on the bigger picture. This, regrettably, is becoming a lost skill – the baby Agile threw out with the bathwater.

So we build our theoretical solution, and deliver it to end users to try. And this is where the design process really starts.

Until working code meets the real world, it’s all guesswork at best. We may learn that some of the restaurants are actually using dairy products in the preparation of their “vegan” dishes. Those naughty people! We may discover that different customers have very different ideas about what a “decent vegan menu” looks like. We may learn that our estimated delivery times are wildly inaccurate because restaurants tell fibs to get more orders. We may get hundreds of spoof orders from teenagers messing with the app from the other side of the world.

Here’s my point: once the system hits the real world, whatever we thought was going to happen almost certainly won’t. There are always lessons that can only be learned by trying it for real.

So we go again. And that is the true essence of software design.

When are we done? When we’ve solved the problem.

And then we move on to the next problem. (e.g., “Yeah, vegan food’s great, but what about vegan booze?”)

Modularity & Readability

One thing I hear very often from less experienced developers is how difficult it can be for them to understand modular code.

The principles of modular design – that modules should:

  • Do one job
  • Hide their inner workings
  • Have swappable dependencies

– tend to lead to code that’s composed of small pieces that bind to abstractions for the other modules they use to do their jobs.

The key to making this work in practice rests on two factors: firstly, that developers get good at naming the boxes clearly enough so that people don’t have to look inside them to understand what they do, and secondly that developers accustom themselves to reading highly-composed code. More bluntly, developers have to learn how to read and write modular code.
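Here’s a small sketch of that first factor – naming the boxes well enough that the top-level function reads almost as a sentence, so you rarely need to look inside the pieces it composes. All function names are illustrative:

```python
# Small, well-named pieces: the reader can follow vegan_takeaway_near
# without opening any of the boxes it composes.

def is_vegan(dish):
    return "(v)" in dish

def delivers_to(restaurant, postcode):
    return postcode in restaurant["areas"]

def sorted_by_delivery_time(restaurants):
    return sorted(restaurants, key=lambda r: r["minutes"])

def vegan_takeaway_near(restaurants, postcode):
    return sorted_by_delivery_time(
        [r for r in restaurants
         if delivers_to(r, postcode) and any(is_vegan(d) for d in r["menu"])])

listings = [
    {"name": "Vegan Villa", "areas": ["SE1"], "menu": ["Dhal (v)"], "minutes": 40},
    {"name": "Green Garden", "areas": ["SE1"], "menu": ["Falafel (v)"], "minutes": 25},
    {"name": "Meat Palace", "areas": ["SE1"], "menu": ["Lamb Bhuna"], "minutes": 10},
]
print([r["name"] for r in vegan_takeaway_near(listings, "SE1")])
```

Reading this fluently – holding several small boxes in your head at once – is precisely the acquired skill the rest of this section is about.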

Schools, universities and code clubs generally don’t get as far as modularity when they teach programming. Well, they may teach the mechanics of declaring and using modules, but they don’t present students with much opportunity to write larger, composed systems. The self-contained nature of programming problems in education typically presents students with algorithms whose implementations are all laid out on the screen in front of them.

Software at scale, though, doesn’t fit on a screen. Systems are jigsaw puzzles, and much more attention to how the pieces fit together is needed. Software design grows from being about algorithms and program flow to being about relationships between parts, at multiple levels of code organisation.

In this sense, young programmers leave school like freshly-minted playwrights who’ve only ever written short monologues. They know nothing of character, or motivation, or dialogue, or plotting, or pacing, or the three-act structure, nor have they ever concerned themselves with the staging and practical considerations that just don’t come up when a single actor reads a single page standing centre-stage under a single spotlight.

Then they get their first job as a script assistant on a production of Noises Off and are all like “What are all these ‘stage directions’? Why are there so many scenes? I can’t follow this plot. Can we have just one character say all the lines?”

Here’s the thing; reading and writing modular code is an acquired skill. It doesn’t just happen overnight. As more and more young developers flood into the industry, I see more and more teams full of people who are easily bamboozled by modular, composed code.

Readability is about the audience. Programmers have a “reading age” defined by their ability to understand code, and code needs to be pitched to the audience’s reading age. This means that we may have to sacrifice some modularity for teams of less experienced developers. They’re not ready for it yet.

Having said all of that, of course, we get better at reading by being challenged. If we only ever read books that contained words we already know, we’d learn no new words.

I learned to read OO code by reading OO code written by more experienced programmers than me. They simultaneously pitched the code to be accessible to my level of understanding, and also just a little out of my current reach, so that I had to stretch to follow the logic.

I know I’m a broken record on this topic, but that’s where mentoring comes in. Yes, there are many, many developers who lack the ability to read and write modular code. But every one of those teams could have someone who has lots of experience writing modular code who can challenge and guide them and bring them along over time – until one day it’s their turn to pay it forward.

The woeful lack of structured mentoring in our profession means that many developers go their entire careers never learning this skill. A lack of understanding combined with a lot of experience can be a dangerous mixture. “It is not that I don’t understand this play. This play is badly written. Good plays have a single character who stands in the centre of the stage under a single spotlight and reads out a 100-page monologue. Always.”

For those developers, a late-career Damascene conversion is unlikely to happen. I wish them luck.

For new developers, though, there’s a balance to be struck between working at a level they’re comfortable with today, and moving them forward to reading and writing more modular code in the future. Every program we write is both a solution to a problem today, and a learning experience to help us write a better solution tomorrow.

Why I Abandoned Business Modeling

So, as you may have gathered, I have a background in model-driven processes. I drank the UML Kool-Aid pretty early on, and by 2000 was a fully paid-up member of the Cult of Boxes And Arrows Solve Every Problem.

The big bucks for us architect types back then – and, probably still – came with a job title called Enterprise Architect. Enterprise Architecture is built on the idea that organisations like businesses are essentially machines, with moving connected parts.

Think of it like a motor car: there was a steering wheel, which executives turned to point the car in the direction they wanted to go. This was connected through various layers of mechanisms – business processes, IT systems, individual applications, actual source code – and segregated into connected vertical slices for different functions within the business, different business locations and so on.

The conceit of EA was that we could connect all those dots and create strategic processes of change where the boss changes a business goal and that decision works its way seamlessly through this multi-layered mechanism, changing processes, reconfiguring departments and teams, rewriting systems and editing code so that the car goes in the desired new direction.

It’s great fun to draw complex pictures of how we think a business operates. But it’s also a fantasy. Businesses are not mechanistic or deterministic in this way at all. First of all, modeling a business of any appreciable size requires us to abstract away all the insignificant details. In complex systems, though, there are no such things as “insignificant details”. The tiniest change can push a complex system into a profoundly different order.

And that order emerges spontaneously and unpredictably. I’ve watched some big businesses thrown into chaos by the change of a single line of code in a single IT system, or by moving the canteen to a different floor in HQ.

2001-2003 was a period of significant evolution of my own thinking on this. I realised that no amount of boxes and arrows could truly communicate what a business is really like.

In philosophy, they have this concept of qualia – individual instances of subjective, conscious experience. Consider this thought experiment: you’re locked in a tower on a remote island. Everything in it is black and white. The tower has an extensive library of thousands of books that describe everything you could possibly need to know about the colour orange. You have studied the entire contents of that library, and are now the world’s leading authority on orange.

Then, one day, you are released from your tower and allowed to see the world. The first thing you do, naturally, is go and find an orange. When you see the colour orange for the first time – given that you’ve read everything there is to know about it – are you surprised?

Two seminal professional experiences I had in 2002-2004 convinced me that you cannot truly understand a business without seeing and experiencing it for yourself. In both cases, we’d had teams of business analysts writing documents, creating glossaries, and drawing boxes and arrows galore to explain the organisational context in which our software was intended to be used.

I speak box-and-arrow fluently, but I just wasn’t getting it. So many hidden details, so many unanswered questions. So, after months of going round in circles delivering software that didn’t fit, I said “Enough’s enough” and we piled into a minibus and went to the “shop floor” to see these processes for ourselves. The mist cleared almost immediately.

Reality is very, very complicated. All we know about conscious experience suggests that our brains are only truly capable of understanding complex things from first-hand experience of them. We have to see them and experience them for ourselves. Accept no substitutes.

Since then, my approach to strategic systems development has been one of gaining first-hand experience of problems, trying simple things we believe might solve them, seeing and measuring what effect they have, and feeding back into the next attempt.

Basically, I replaced Enterprise Architecture with agility. Up to that point, I’d viewed Agile as a way of delivering software. I was already XP’d up to the eyeballs, but hadn’t really looked beyond Extreme Programming to appreciate its potential strategic role in the evolution of a business. There have to be processes outside of XP that connect business feedback cycles to software delivery cycles. And that’s how I do it (and teach it) now.

Don’t start with features. Start with a problem. Design the simplest solution you can think of that might solve that problem, and make it available for real-world testing as soon as you can. Observe (and experience) the solution being used in the real world. Feed back lessons learned and go round again with an evolution of your solution. Rinse and repeat until the problem’s solved (my definition of “done”). Then move on to the next problem.

The chief differences between Enterprise Architecture and this approach are that:

a. We don’t make big changes. In complex adaptive systems, big changes != big results. You can completely pull a complex system out of shape, and over time the underlying – often unspoken – rules of the system (the “insignificant details” your boxes and arrows left out, usually) will bring it back to its original order. I’ve watched countless big change programmes produce no lasting, meaningful change.

b. We begin and end in the real world

In particular, I’ve learned from experience that the smallest changes can have the largest impact. We instinctively believe that to effect change at scale, we must scale our approach. Nothing could be further from the truth. A change to a single line of code can cause chaos at airport check-ins and bring traffic in an entire city to a standstill. Enterprise Architecture gave us the illusion of control over the effects of changes, because it gave us the illusion of understanding.

But that’s all it ever was: an illusion.

Is UML Esperanto for Programmers?

Back in a previous life, when I wore the shiny cape and the big pointy hat of a software architect, I thought the Unified Modeling Language was a pretty big deal. So much, in fact, that for quite a few years, I taught it.

In 2000, there was demand for that sort of thing. But by 2006 demand for UML training – and for UML on teams – had faded away to pretty much nothing. I rarely see it these days, on whiteboards or on developer machines. I occasionally see the odd class diagram or sequence diagram, often in a book. I occasionally draw the odd class diagram or sequence diagram myself – maybe a handful of times a year, when the need arises to make a point that such diagrams are well-suited to explaining.

UML is just one among many visualisation tools in my paint box. I use Venn diagrams when I want to visualise complex rules, for example. I use tables a lot – to visualise how functions should respond to inputs, to visualise state transitions, and to visualise conditional logic (e.g., truth tables). But we fixated on just that one set of diagrams, until UML became synonymous with software visualisation.

I’m a fan of pictures, you see. I’m a very visual thinker. But I’m aware that visual thinkers seem to be in a minority in computing. I often find myself being the only one in the room who gets it when they see a picture. Many programmers want to see code. So, on training courses now, I show them code, and then they get it.

Although UML has withered away, its vestigial limb remains in the world of academia. A lot of universities teach it, and in significant depth. In Computer Science departments around the world, Executable UML is still very much a thing and students may spend a whole semester learning how to specify systems in UML.

Then they graduate and rarely see UML again – certainly not Executable UML. The ones who continue to use it – and therefore not lose that skill – tend to be the ones who go on to teach it. Teaching keeps UML alive in the classroom long after it all but died in the office.

My website parlezuml.com still gets a few thousand visitors every month, and the stats clearly show that the vast majority are coming from university domains. In industry, UML is as dead a language as Latin. Like Latin, it’s taught to people who may go on to teach it, and elements of it live on in what came after (there were a lot of good ideas in UML). But there’s no country I can go to where the population speaks Latin.

Possibly a more accurate comparison for UML might be Esperanto. Like UML, Esperanto was created – perhaps, aside from Klingon, the only example of a completely artificial spoken language – in an attempt to unify people and get everyone “speaking the same language”. As noble a goal as that may be, the reality of Esperanto is that the people who can speak it today mostly speak it to teach it to people who may themselves go on to teach it. Esperanto lives in the classroom – my Granddad Ray taught it for many years – and at conferences for enthusiasts. There’s no country I can go to where the population speaks it.

And these days, I visit vanishingly few workplaces where I see UML being used in anger. It’s the Esperanto of software development.

I guess my point is this: if I was studying to be an interpreter, I would perhaps consider it not to be a good use of my time to learn Esperanto in great depth. For sure, there may be useful transferable concepts, but would I need to be fluent in Esperanto to benefit from them?

Likewise, is it really worth devoting a whole semester to teaching UML to students who may never see it again after they graduate? Do they need to be fluent in UML to learn its transferable lessons? Or would a few hours on class diagrams and sequence diagrams serve that purpose? Do we need to know the UML meta-meta-model to appreciate the difference between composition and aggregation, or inheritance and implementation?

Do I need to understand UML stereotypes to explain the class structure of my Python program, or the component structure of my service-oriented architecture? Or would boxes and arrows suffice?

If the goal of UML is to be understood (and to understand ourselves), then there are many ways beyond UML. How much of the 794-page UML 2.5.1 specification do I need to know to achieve that goal?

And why have they still not added Venn diagrams, dammit?! (So useful!)

So here’s my point: after 38 years programming – 28 of them for money – I know what skills I’ve found most essential to my work. Visualisation – drawing pictures – is definitely in that mix. But UML itself is a footnote. It beggars belief how many students graduate having devoted a lot of time to learning an almost-dead language but somehow didn’t find time to learn to write good unit tests or to use version control or to apply basic software design principles. (No, lecturers, comments are not a design principle.)

Some may argue that such skills are practical, technology-specific and therefore vocational. I disagree. There’s JUnit. And then there’s unit testing. I apply the same ideas about test structure, about test code design, about test organisation and optimisation in RSpec, in Mocha, in xUnit.net etc.

And UML is a technology. It’s an industry standard, maintained by an industry body. There are tools that apply – some more loosely than others – the standard, just like browsers apply W3C standards. Visual modeling with UML is every bit as vocational as unit testing with NUnit, or version control with Git. There’s an idea, and then that idea is applied with a technology.

You may now start throwing the furniture around. Message ends.

The Test Pyramid – The Key To True Agility

On the Codemanship TDD course, before we discuss Continuous Delivery and how essential it is to achieving real agility, we talk about the Test Pyramid.

It has various interpretations, in terms of exactly how many layers there are and exactly what kinds of testing each layer is made of (unit, integration, service, controller, component, UI etc), but the overall sentiment is straightforward:

The longer tests take to run, the fewer of those kinds of tests you should aim to have

[Figure: the test pyramid]

The idea is that the tests we run most often need to be as fast as possible (otherwise we run them less often). These are typically described as “unit tests”, but that means different things to different people, so I’ll qualify: tests that do not involve any external dependencies. They don’t read from or write to databases, they don’t read or write files, they don’t connect with web services, and so on. Everything that happens in these tests happens inside the same memory address space. Call them In-Process Tests, if you like.
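An In-Process Test in this sense looks like the sketch below: pure logic, no files, no databases, no network, so it runs in microseconds. The function and its business rule are illustrative assumptions:

```python
# An In-Process Test: everything happens inside one memory address
# space, so thousands of these can run in seconds.

def delivery_charge(order_total: float) -> float:
    """Free delivery on orders of 25 or more, otherwise a flat 3.50."""
    return 0.0 if order_total >= 25 else 3.5

def test_free_delivery_threshold():
    assert delivery_charge(25.0) == 0.0
    assert delivery_charge(24.99) == 3.5
    assert delivery_charge(0.0) == 3.5

test_free_delivery_threshold()
print("in-process tests passed")
```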

Tests that necessarily check our code works with external dependencies have to cross process boundaries when they’re executed. As our In-Process tests have already checked the logic of our code, these Cross-Process Tests check that our code – the client – and the external code – the suppliers – obey the contracts of their interactions. I call these “integration tests”, but some folk have a different definition of integration test. So, again, I qualify it as: tests that involve external dependencies.
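A Cross-Process Test, by contrast, deliberately touches a real external dependency. The sketch below uses the file system as the simplest possible example of crossing a process boundary; it checks our code honours the contract of the interaction, rather than re-testing logic the in-process tests already covered. All names are illustrative:

```python
import json
import os
import tempfile

# A Cross-Process Test: a real file is written and read back, so this
# test is orders of magnitude slower than an in-process one.

def save_restaurants(path, restaurants):
    with open(path, "w") as f:
        json.dump(restaurants, f)

def load_restaurants(path):
    with open(path) as f:
        return json.load(f)

def test_restaurants_round_trip_through_real_file():
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        save_restaurants(path, [{"name": "Vegan Villa"}])
        assert load_restaurants(path) == [{"name": "Vegan Villa"}]
    finally:
        os.remove(path)

test_restaurants_round_trip_through_real_file()
print("integration test passed")
```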

These typically take considerably longer to execute than “unit tests”, and we should aim to have proportionally fewer of them and to run them proportionally less often. We might have thousands of unit tests, and maybe hundreds of integration tests.

If the unit tests cover the majority of our code – say, 90% of it – and maybe 10% of our code has direct external dependencies that have to be tested, then on average we’ll make about 9 changes that need unit testing for every 1 change that needs integration testing. In other words, we’d need to run our unit tests about 9 times as often as our integration tests – which works out well, given that each integration test may easily be an order of magnitude slower than a unit test.

At the top of our test pyramid are the slowest tests of all. Typically these are tests that exercise the entire system stack, through the user interface (or API) all the way down to the external dependencies. These tests check that it all works when we plug everything together and deploy it into a specific environment. If we’ve already tested the logic of our code with unit tests, and tested the interactions with external suppliers, what’s left to test?

Some developers mistakenly believe that these system-level tests are for checking the logic of the user experience – user “journeys”, if you like. This is a mistake. There are usually a lot of user journeys, so we’d end up with a lot of these very slow-running tests and an upside-down pyramid. The trick here is to make the logic of the user experience unit-testable. View models are a simple architectural pattern for logically representing what users see and what users do at that level. At the highest level they may be looking at an HTML table and clicking a button to submit a form, but at the logical level, maybe they’re looking at a movie and renting it.

A view model can help us encapsulate the logic of user experience in a way that can be tested quickly, pushing most of our UI/UX tests down to the base of the pyramid where they belong. What’s left – the code that must directly reference physical UI elements like HTML tables and buttons – can be wafer thin. At that level, all we’re testing is that views are rendered correctly and that user actions trigger the correct internal logic (which can easily be done using mock objects). These are integration tests, and belong in the middle layer of our pyramid, not the top.
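To make that concrete, here’s a minimal sketch of a view model in Python – the class and method names are my own invention, purely for illustration:

```python
class MovieRentalViewModel:
    """Logical representation of what the user sees and does on a
    movie rental screen - no HTML tables or buttons, just the logic."""

    def __init__(self, catalogue):
        self.catalogue = list(catalogue)  # movies available to rent
        self.rentals = []                 # movies the user has rented

    @property
    def available_movies(self):
        # what the user "sees" - logically, the rows of the HTML table
        return [m for m in self.catalogue if m not in self.rentals]

    def rent(self, movie):
        # what the user "does" - logically, the submit button click
        if movie not in self.catalogue:
            raise ValueError("movie not in catalogue")
        self.rentals.append(movie)


# The user experience logic is now unit-testable, entirely in-process:
view = MovieRentalViewModel(["Alien", "Heat"])
view.rent("Alien")
```

The view code that binds this logic to actual HTML elements can then be wafer thin.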

Another classic error is to check core logic through the GUI. For example, checking that insurance premiums are calculated correctly by looking at what number is rendered on that web page. Some module somewhere does that calculation. That should be unit-testable.
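For instance – with a made-up formula, purely to illustrate – the premium calculation can live in a plain module and be tested directly, no web page required:

```python
def insurance_premium(base_rate, risk_factor, no_claims_years):
    """Hypothetical premium calculation. Because it lives in a plain
    module, it can be unit tested without rendering anything."""
    discount = min(no_claims_years * 0.05, 0.25)  # capped no-claims discount
    return round(base_rate * risk_factor * (1 - discount), 2)


premium = insurance_premium(base_rate=500.0, risk_factor=1.2, no_claims_years=3)
# 500 * 1.2 * (1 - 0.15) = 510.0
```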

So, if they’re not testing user journeys, and they’re not testing core logic, what do our system tests test? What’s left?

Well, have you ever found yourself saying “It worked on my machine”? The saying goes “There’s many a slip ‘twixt cup and lip.” Just because all the pieces work, and just because they all play nicely together, there’s no guarantee that when we deploy the whole system into, say, our EC2 instances, nothing will be different from the environments we tested it in. I’ve seen roll-outs go wrong because the servers handled dates differently, or had the wrong locale, or a different file system, or security restrictions that weren’t in place on dev machines.

The last piece of the jigsaw is the system configuration, where our code meets the real production environment – or a simulation of it – and we find out if it really works where it’s intended to work, as a whole.

We may need dozens of those kinds of tests, and perhaps only need to run them on, say, every CI build by deploying the outputs to a staging environment that mirrors the production environment (and only if all our unit and integration tests pass first, of course.) These are our “good to go?” tests.

The shape of our test pyramid is critical to achieving feedback loops that are fast enough to allow us to sustain the pace of development. Ideally, after we make any change, we should want to get feedback straight away about the impact of that change. If 90% of our code can be re-tested in under 30 seconds, we can re-test 90% of our changes many times an hour and be alerted within 30 seconds if we broke something. If it takes an hour to re-test our code, then we have a problem.

Continuous Delivery means that our code is always shippable. That means it must always be working, or as near as possible always. If re-testing takes an hour, that means that we’re an hour away from finding out if changes we made broke the code. It means we’re an hour away from knowing if our code is shippable. And, after an hour’s-worth of changes without re-testing, chances are high that it is broken and we just don’t know it yet.

An upside-down test pyramid puts Continuous Delivery out of your reach. Your confidence that the code’s shippable at any point in time will be low. And the odds that it’s not shippable will be high.

The impact of slow-running test suites on development is profound. I’ve found many times that when a team invested in speeding up their tests, many other problems magically disappeared. Slow tests – which mean slow builds, which mean slow release cycles – are a development team’s slow metabolism. Many health problems can be caused by a slow metabolism. It really is that fundamental.

Slow tests are pennies to the pound of the wider feedback loops of release cycles. You’d be surprised how much of your release cycles are, at the lowest level, made up of re-testing cycles. The outer feedback loops of delivery are made of the inner feedback loops of testing. Fast-running automated tests – as an enabler of fast release cycles and sustained innovation – are therefore highly desirable.

A right-way-up test pyramid doesn’t happen by accident, and doesn’t come at no cost, though. Many organisations, sadly, aren’t prepared to make that investment, and limp on with upside-down pyramids and slow test feedback until the going gets too tough to continue.

As well as writing automated tests, there’s also an investment needed in your software’s architecture. In particular, the way teams apply basic design principles tends to determine the shape of their test pyramid.

I see a lot of duplicated code that contains duplicated external dependencies, for example. It’s not uncommon to find systems with multiple modules that connect to the same database, or that connect to the same web service. If those connections happened in one place only, that part of the code could be integration tested just once. D.R.Y. helps us achieve a right-way-up pyramid.
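As a sketch of the idea – using Python’s built-in sqlite3 purely for illustration, with invented table and class names – all customer queries funnel through one gateway class, so the database interaction only needs integration testing in one place:

```python
import sqlite3


class CustomerGateway:
    """The one and only place that talks to the customers table."""

    def __init__(self, connection):
        self.connection = connection

    def find_by_postcode(self, postcode):
        cursor = self.connection.cursor()
        cursor.execute("SELECT name FROM customers WHERE postcode = ?",
                       (postcode,))
        return [row[0] for row in cursor.fetchall()]


# One integration test against an in-memory database covers the connection:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, postcode TEXT)")
conn.execute("INSERT INTO customers VALUES ('Alice', 'N1'), ('Bob', 'E2')")
names = CustomerGateway(conn).find_by_postcode("N1")
```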

I see a lot of code where a module or function that does a business calculation also connects to an external dependency, or where a GUI module also contains business logic, so that the only way to test that core logic is with an integration test. Single Responsibility helps us achieve a right-way-up pyramid.

I see a lot of code where a module in one web service interacts with multiple features of another web service – Feature Envy, but on a larger scale – so there are multiple points of integration that require testing. Encapsulation helps us achieve a right-way-up pyramid.

I see a lot of code where a module containing core logic references an external dependency, like a database connection, directly by its implementation, instead of through an abstraction that could be easily swapped by dependency injection. Dependency Inversion helps us achieve a right-way-up pyramid.
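A minimal sketch of what that looks like – names invented for illustration – with the core logic depending on an abstraction that tests can swap out:

```python
from abc import ABC, abstractmethod


class RateProvider(ABC):
    """Abstraction the core logic depends on - not a concrete database."""

    @abstractmethod
    def base_rate(self, currency):
        ...


class CurrencyConverter:
    def __init__(self, rates):
        self.rates = rates  # injected - the converter never creates a connection

    def convert(self, amount, currency):
        return amount * self.rates.base_rate(currency)


class StubRates(RateProvider):
    """In-process stand-in for the real rates database."""

    def base_rate(self, currency):
        return {"USD": 1.25}[currency]


# Core logic unit tested with no database in sight:
converted = CurrencyConverter(StubRates()).convert(100, "USD")
```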

Achieving a design with less duplication, where modules do one job, where components and services know as little as possible about each other, and where external dependencies can be easily stubbed or mocked by dependency injection, is essential if you want your test pyramid to be the right way up. But code doesn’t get that way by accident. There’s significant ongoing effort required to keep the code clean by refactoring. And that gets easier the faster your tests run. Chicken, meet egg.

If we’re lucky enough to be starting from scratch, the best way we know of to ensure a right-way-up test pyramid is to write the tests first. This compels us to design our code in such a way that it’s inherently unit-testable. I’ve yet to come across a team genuinely doing Continuous Delivery who wasn’t doing some kind of TDD.

If you’re working on legacy code, where maybe you’re relying on browser-based tests, or might have no automated tests at all, there’s usually a mountain to climb to get a test pyramid that’s the right way up. You need to write fast-running tests, but you will probably need to refactor the code to make that possible. Egg, meet chicken.

Like all mountains, though, it can be climbed. One small, careful step at a time. Michael Feathers’ book Working Effectively With Legacy Code describes a process for making changes safely to code that lacks fast-running automated tests. It goes something like this:

  • Identify what code you need to change
  • Identify where around that code you’d want unit tests to make the change safely
  • Break any dependencies in that code getting in the way of unit testing
  • Write the unit tests
  • Make the change
  • While you’re there, make other improvements that will help the next developer who needs to change that code (the “boy scout rule” – leave the camp site tidier than you found it)
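
The dependency-breaking step is the unfamiliar one. Here’s a minimal sketch of one of Feathers’ techniques – subclass and override – with class names invented for illustration:

```python
from collections import namedtuple

Line = namedtuple("Line", "amount")
Invoice = namedtuple("Invoice", "lines")


class InvoiceProcessor:
    def process(self, invoice):
        total = sum(line.amount for line in invoice.lines)
        self.send_confirmation(invoice)  # the awkward external dependency
        return total

    def send_confirmation(self, invoice):
        # imagine this talks to a real mail server
        raise RuntimeError("no mail server in the test environment")


class TestableInvoiceProcessor(InvoiceProcessor):
    """Seam: override the dependency so the logic can be unit tested."""

    def send_confirmation(self, invoice):
        self.confirmed = invoice  # record the call instead of emailing


processor = TestableInvoiceProcessor()
total = processor.process(Invoice(lines=[Line(10), Line(5)]))
```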

Change after change, made safely in this way, will – over time – build up a suite of fast-running unit tests that will make future changes easier. I’ve worked on legacy code bases that went from upside-down test pyramids of mostly GUI-based system tests, that took hours or even days to run, to right-side-up pyramids where most of the code could be tested in under a minute. The impact on the cost and the speed of delivery is always staggering. It can be done.

But be patient. A code base might take a year or two to turn around, and at first the going will be tough. I find I have to be super-disciplined in those early stages. I manually re-test as I refactor, and resist the temptation to make a whole bunch of changes at a time before I re-test. Slow and steady, adding value and clearing paths for future changes at the same time.

Action(Object), Object.Action() and Encapsulation

Just a quick post to bookmark an interesting discussion happening in Twitter right now in response to a little tweet I sent out.

Lots of different takes on this, but they tend to fall into three rough camps:

  • Lots of developers prefer action(object) because it reads the way we understand it – buy(cd), kick(ball) etc. Although, of course, this would imply functional programming (or static methods of unnamed classes)
  • Some like a subject, too – customer.buy(cd), player.kick(ball)
  • Some prefer the classic OOP – ball.kick(), cd.buy()

More than a few invented new requirements, I noticed. A discussion about YAGNI is for another time, though, I think.

Now, the problem with attaching the behaviour to a subject (or a function or static method of a different module or class) is you can end up with Feature Envy.

Let’s just say, for the sake of argument, that kicking a ball changes its position along an X-Y vector:

class Player(object):
    @staticmethod
    def kick(ball, vector):
        ball.x = ball.x + vector.x
        ball.y = ball.y + vector.y


class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0


class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


if __name__ == "__main__":
    ball = Ball()
    Player.kick(ball, Vector(5,5))
    print("Ball -> x =", ball.x, ", y =", ball.y)

Player.kick() has Feature Envy for the fields of Ball. Separating agency from data, I’ve observed, tends to lead to data classes – classes that are just made of fields (or getters and setters for fields, which is just as bad from a coupling point of view) – and lots of low-level coupling at the other end of the relationship.

If I eliminate the Feature Envy, I end up with:

class Player(object):
    @staticmethod
    def kick(ball, vector):
        ball.kick(vector)


class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0

    def kick(self, vector):
        self.x = self.x + vector.x
        self.y = self.y + vector.y

And in this example – if we don’t invent any extra requirements – we don’t necessarily need Player at all. YAGNI.

class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0

    def kick(self, vector):
        self.x = self.x + vector.x
        self.y = self.y + vector.y


class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


if __name__ == "__main__":
    ball = Ball()
    ball.kick(Vector(5,5))
    print("Ball -> x =", ball.x, ", y =", ball.y)

So we reduce coupling and simplify the design – no need for a subject, just an object. The price we pay – the trade-off, if you like – is that some developers find ball.kick() counter-intuitive.

It’s a can of worms!

“Stateless” – You Keep Using That Word…

One of the requirements of pure functions is that they are stateless. To many developers, this means simply that the data upon which the function acts is immutable. When dealing with objects, we mean that the object of an action has immutable fields, set at instantiation and then never changing throughout the instance’s life cycle.

In actual fact, this is not what ‘stateless’ means. Stateless means that the result of an action – e.g., a method call or a function call – is always the same given the same inputs, no matter how many times it’s invoked.

The classic stateless function is one that calculates square roots. sqrt(4) is always 2. sqrt(6.25) is always 2.5, and so on.
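The ‘same inputs, same result’ property is easy to check mechanically. A crude sketch (Python here, and the helper name is my own):

```python
import math


def same_result_every_time(f, value, trials=5):
    """Crude statelessness check: call f repeatedly with the same
    input and see whether it ever gives a different answer."""
    return len({f(value) for _ in range(trials)}) == 1


sqrt_is_stateless = same_result_every_time(math.sqrt, 6.25)
```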

The classic stateful function is a light switch. The result of flicking the switch depends on whether the light is on or off at the time. If it’s off, it’s switched on. If it’s on, it’s switched off.

function Light() {
    this.on = false;

    this.flickSwitch = function (){
        this.on = !this.on;
    }
}

let light = new Light();

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

This code produces the output:

{ on: true }
{ on: false }
{ on: true }
{ on: false }

Most domain concepts in the real world are stateful, like our light switch. That is to say, they have a life cycle during which their behaviour changes depending on what has happened to them previously.

This is why finite state machines form a theoretical foundation for all program behaviour. Or, more simply, all program behaviour can be modelled as a finite state machine – a logical map of an object’s life cycle.

[Figure: state machine diagram of the light switch]

Now, a lot of developers would argue that flickSwitch() is stateful because it acts on an object with a mutable field. They would then reason that making on immutable, and producing a copy of the light with its state changed, would make it stateless.

const light = {
    on: false
}

function flickSwitch(light){
    return {...light, on: !light.on};
}

const copy1 = flickSwitch(light)
console.log(copy1);

const copy2 = flickSwitch(copy1);
console.log(copy2);

const copy3 = flickSwitch(copy2);
console.log(copy3);

const copy4 = flickSwitch(copy3);
console.log(copy4);

Technically, this is a pure functional implementation of our light switch. No state changes, and the result of each call to flickSwitch() is entirely determined by its input.

But, is it stateless? I mean, is it really? Technically, yes it is. But conceptually, no it certainly isn’t.

If this code was controlling a real light in the real world, then there’s only one light, its state changes, and the result of each invocation of flickSwitch() depends on the light’s history.

This is functional programming’s dirty little secret. In memory, it’s stateless and pure functional. Hooray for FP! But at the system level, it’s stateful.

While making it stateless can certainly help us to reason about the logic when considered in isolation – at the unit, or component or service level – when the identity of the object being acted upon is persistent, we lose those benefits at the system level.

Imagine we have two switches controlling a single light (e.g., one at the top of a flight of stairs and one at the bottom.)

[Figure: two switches controlling a single light]

In this situation, where a shared object is accessed in two different places, it’s harder to reason about the state of the light without knowing its history.

If I have to replace the bulb, I’d like to know if the light is on or off. With a single switch, I just need to look to see if it’s in the up (off) or down (on) position. With two switches, I need to understand the history. Was it last switched on, or switched off?

Copying immutable objects, when they have persistent identity – it’s the same light – does not make functions that act on those objects stateless. It makes them pure functional, sure. But we still need to consider their history. And in situations of multiple access (concurrency), it’s no less complicated than reasoning about mutable state, and just as prone to errors.
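The two-switch problem shows up directly in code. In this sketch (Python this time, same logic as the JavaScript above), both switches work pure-functionally from copies – and we get a lost update:

```python
def flick_switch(light):
    # pure function: returns a new copy, never mutates
    return {**light, "on": not light["on"]}


light = {"on": False}

upstairs = flick_switch(light)    # upstairs flicks: their copy says on...
downstairs = flick_switch(light)  # ...but downstairs worked from the stale
                                  # original, not from upstairs' copy

# Two flicks should leave the one real light off - yet both copies say on.
```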

When I was knocking up my little code example, my first implementation of the FP version was:

const light = {
    on: false
}

function flickSwitch(light){
    return {...light, on: !light.on};
}

const copy1 = flickSwitch(light)
console.log(copy1);

const copy2 = flickSwitch(copy1);
console.log(copy2);

const copy3 = flickSwitch(copy2);
console.log(copy3);

const copy4 = flickSwitch(copy3);
console.log(copy3);

Do you see the error? When I ran it, it produced this output.

{ on: true }
{ on: false }
{ on: true }
{ on: true }

This is a class of bug I’ve seen many times in functional code. The last console.log uses the wrong copy.

The order – in this case, the order of copies – matters. And when the order matters, our logic isn’t stateless. It has history.

The most common manifestation of this class of bug I come across is in FP programs that have databases where object state is stored and shared across multiple client threads or processes.

One workaround is to push the versioning model of our logical design into the database itself, in the form of event sourcing. This, though, is far from history-agnostic and therefore far from stateless. Each object’s state – rather than being a single record in a single table that changes over time – is now the aggregate of the history of events that mutated it.

Going back to our finite state machine, each object is represented as the sequence of actions that brought it to its current state (e.g., flickSwitch() -> flickSwitch() -> flickSwitch() would produce a light that’s turned on.)
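That folding of an event history into current state can be sketched in a few lines (Python, names mine):

```python
def apply_event(state, event):
    """Replay one event against the state."""
    if event == "flickSwitch":
        return {**state, "on": not state["on"]}
    return state


def current_state(initial, events):
    # the object's state is the aggregate of its entire history
    state = initial
    for event in events:
        state = apply_event(state, event)
    return state


history = ["flickSwitch", "flickSwitch", "flickSwitch"]
light = current_state({"on": False}, history)  # three flicks from off
```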

In reasoning about our logic, despite all the spiffy technological workarounds of FP, event sourcing and so on, if objects conceptually have history then they conceptually have state. And at the system level, we have to get that logic conceptually right.

Yet again, technology – including programming paradigm – is no substitute for thinking.

Overcoming Solution Bias

Just a short post this morning about a phenomenon I’ve seen many times in software development – which, for want of a better name, I’m calling solution bias.

It’s the tendency of developers, once they’ve settled on a solution to a problem, to refuse to let go of it – regardless of what facts may come to light that suggest it’s the wrong solution.

I’ve even watched teams argue with their customer to try to get them to change their requirements to fit a solution design the team have come up with. It seems once we have a solution in our heads (or in a Git repository) we can become so invested in it that – to borrow a metaphor – everything looks like a nail.

The damage this can do is obvious. Remember your backlog? That’s a solution design. And once a backlog’s been established, it has a kind of inertia that makes it unlikely to change much. We may fiddle at the edges, but once the blueprints have been drawn up, they don’t change significantly. It’s vanishingly rare to see teams throw their designs away and start afresh, even when it’s screamingly obvious that what they’re building isn’t going to work.

I think this is just human nature: when the facts don’t fit the theory, our inclination is to change the facts and not the theory. That’s why we have the scientific method: because humans are deeply flawed in this kind of way.

In software development, it’s important – if we want to avoid solution bias – to first accept that it exists, and that our approach must actively take steps to counteract it.

Here’s what I’ve seen work:

  • Testable Goals – sounds obvious, but it still amazes me how many teams have no goals they’re working towards other than “deliver on the plan”. A much more objective picture of whether the plan actually works can help enormously, especially when it’s put front-and-centre in all the team’s activities. Try something. Test it against the goal. See if it really works. Adapt if it doesn’t.
  • Multiple Designs – teams get especially invested in a solution design when it’s the only one they’ve got. Early development of candidate solutions should explore multiple design avenues, tested against the customer’s goals, and selected for extinction if they don’t measure up. Evolutionary design requires sufficiently diverse populations of possible solutions.
  • Small, Frequent Releases – a team that’s invested a year in a solution is going to resist that solution being rejected with far more energy than a team who invested a week in it. If we accept that an evolutionary design process is going to have failed experiments, we should seek to keep those experiments short and cheap.
  • Discourage Over-Specialisation – solution architectures can define professional territory. If the best solution is a browser-based application, that can be good news for JavaScript folks, but bad news for C++ developers. I often see teams try to steer the solution in a direction that favours their skill sets over others. This is understandable, of course. But when the solution to sorting a list of surnames is to write them into a database and use SQL because that’s what the developers know how to do, it can lead to some pretty inappropriate architectures. Much better, I’ve found, to invest in bringing teams up to speed on whatever technology will work best. If it needs to be done in JavaScript, give the Java folks a couple of weeks to learn enough JavaScript to make them productive. Don’t put developers in a position where the choice of solution architecture threatens their job.
  • Provide Safety – I can’t help feeling that a good deal of solution bias is the result of fear. Fear of failure.  Fear of blame. Fear of being sidelined. Fear of losing your job. If we accept that the design process is going to involve failed experiments, and engineer the process so that teams fail fast and fail cheaply – with no personal or professional ramifications when they do – then we can get on with the business of trying shit and seeing if it works. I’ve long felt that confidence isn’t being sure you’ll succeed, it’s not being afraid to fail. Reassure teams that failure is part of the process. We expect it. We know that – especially early on in the process of exploring the solution space – candidate solutions will get rejected. Importantly: the solutions get rejected, not the people who designed them.

As we learn from each experiment, we’ll hopefully converge on the likeliest candidate solution, and the whole team will be drawn in to building on that, picking up whatever technical skills are required as they do. At the end, we may deliver not only a good working solution, but also a stronger team of people who have grown through the process.


Action->Object vs. Object->Action

One of the factors that I see programmers new to objects struggling with is our natural tendency to separate agency from data. Things do things to other things. The VCR plays the video. The toaster heats the toast. The driver drives the taxi. Etc.

I think it’s possibly linguistic, too: in most natural languages, we put the object after the verb – play(video), toast(bread), drive(taxi).

Thing is, though – this isn’t how object oriented programming works. Objects encapsulate agency with the data they work on, producing video.play(), bread.toast() and taxi.drive().

In OOP, the cat kicks its own arse.

You’re absolutely correct if you’re thinking “That isn’t how we’d say or write it in real life”. It isn’t. I suspect this is one of the reasons some programmers find OOP counter-intuitive – it goes against the way we see the world.

Ironically, Object thinking – while not intuitive in that sense – makes discovery of actions much easier. What can I do with a video? What can I do with bread? And so forth. That’s why Object->Action still dominates UI design. Well, good UI design, anyway. Likewise, developers tend to find it easier to discover functions that can be applied to types when they start with the type.

When I wrote code to tell the story of what happens when a member donates a video to a community library, each line started with a function – well, in Java, a static method, which is effectively the same thing. This is not great OOP. Indeed, it’s not OOP. It’s FP.

And that’s fine. Functional Programming works more the way we say things in the real world. Clean the dishes. Set the timer. Kick the cat. I suspect this is one reason why more and more programmers are drawn to the functional paradigm – it works more the way we think, and reads more the way we talk. Or, at least, it can if we’re naming things well.

(There’s a separate discussion about encapsulation in FP. The tendency is for functional programmers not to bother with it, which leads to inevitable coupling problems. That’s not because you can’t encapsulate data in FP. It’s just that, as a concept, it’s not been paid much attention.)

If you’re doing OOP – and I still do much of the time, because it’s perfectly workable, thank you very much – then it goes Object->Action. Methods like play(video) and kick(cat) hint at responsibilities being in the wrong place, leading to the lack of encapsulation I witness in so much OO code.

It’s like they say; give a C programmer C++, and they’ll write you a C program with it.

Do Your Unit Tests Know *Too* Much?

A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.

Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.

First of all, it’s usually unmanaged dependencies in our implementation code that cause changes to ripple out to large numbers of tests.

Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought might be “Why is this method/function referenced in a hundred tests?” Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? That would imply that there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.

The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining in popularity in recent years that we shouldn’t refactor any duplication out of our tests. The thinking is that it can make our tests harder to understand, and there’s some merit to this when it’s done with little thought.

But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts, and it almost always comes down to poor naming.

Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and that’s something I’m working to improve.

There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:

  • Removes duplication
  • Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo(), sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo() )
  • It opens a door to much more exhaustive testing with surprisingly little extra code

Maybe those 100 tests could be a handful of parameterised tests?
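Here’s what that consolidation might look like in Python – a plain data-driven test, though pytest’s parametrize does the same job with less ceremony. (Counting 0 as the first Fibonacci number, so the fourth is 2 and the sixth is 5, matching the test names above.)

```python
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def test_fibonacci_number_is_sum_of_previous_two():
    # one test, many examples - adding a case is one line, not a new test
    cases = [(1, 0), (2, 1), (3, 1), (4, 2), (5, 3), (6, 5), (7, 8)]
    for n, expected in cases:
        assert fibonacci(n) == expected
        if n >= 3:
            assert fibonacci(n) == fibonacci(n - 1) + fibonacci(n - 2)


test_fibonacci_number_is_sum_of_previous_two()
```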

The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.

And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.

At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because when I’ve worked with them it can feel a little like keyhole surgery. My experience with this style of testing is that, for sure, it can decouple your test code from internal complexity, but at a sometimes heavy price when tests fail and we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…

The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g, the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, but also more targeted to a part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine what room the leak is in.)
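Sketching the idea: a stable internal API like this (standard repayment formula, invented class name) is what the tests target – however much the cluster of classes behind it gets refactored:

```python
class MortgageCalculator:
    """Stable internal API encapsulating the repayment calculation.
    Tests written against this interface survive refactoring of
    whatever implementation ends up behind it."""

    def monthly_repayment(self, principal, annual_rate, years):
        r = annual_rate / 12   # monthly interest rate
        n = years * 12         # number of payments
        return round(principal * r / (1 - (1 + r) ** -n), 2)


payment = MortgageCalculator().monthly_repayment(100_000, 0.06, 30)
```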

Even for a monolith, I aim to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests that I’ve seen is that your code quite probably isn’t like that. It’s a Big Ball of Mud. That can be exacerbated by leaving all the duplication in the test code, and/or by over-reliance on mock objects, plus a tendency to try to write tests for every method or function of every class.

You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.