Code Craft’s Value Proposition: More Throws Of The Dice

Evolutionary design is a term that’s used often, not just in software development. Evolution is a way of solving complex problems, typically with necessarily complex solutions (solutions that have many interconnected/interacting parts).

But that complexity doesn’t arise in a single step. Evolved designs start very simple, and then become complex over many, many iterations. Importantly, each iteration of the design is tested for its “fitness” – does it work in the environment in which it operates? Iterations that don’t work are rejected; the iterations that work best are selected and become the input to the next iteration.

We can think of evolution as being a search algorithm. It searches the space of all possible solutions for the one that is the best fit to the problem(s) the design has to solve.

It’s explained best perhaps in Richard Dawkins’ book The Blind Watchmaker. Dawkins wrote a computer simulation of a natural process of evolution, where 9 “genes” generated what he called “biomorphs”. The program would generate a family of biomorphs – 9 at a time – with a parent biomorph at the centre surrounded by 8 children whose “DNA” differed from the parent by a single gene. Selecting one of the children made it the parent of a new generation of biomorphs, with 8 children of their own.

[Image: Biomorphs generated by the evolutionary simulation at http://www.emergentmind.com/biomorphs]

You can find a recreation and more detailed explanation of the simulation here.

The 9 genes of the biomorphs define a universe of 118 billion possible unique designs. The evolutionary process is a walk through that universe, moving just one space in any direction with each iteration, because just one gene changes with each generation. From simple beginnings, complex forms can quickly arise.
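To make the idea concrete, here’s a minimal sketch in Python of the breed-and-pick loop Dawkins describes – with the assumption, for brevity, that the genes are plain integers rather than drawing parameters:

```python
import random

GENES = 9

def children_of(parent):
    """Breed 8 children, each differing from the parent by one gene."""
    children = []
    for _ in range(8):
        child = list(parent)
        gene = random.randrange(GENES)
        child[gene] += random.choice([-1, 1])  # one small mutation
        children.append(child)
    return children

parent = [0] * GENES
for generation in range(100):
    family = children_of(parent)
    # In the original simulation, a human picks the child that becomes
    # the next parent; random.choice stands in for that selection here.
    parent = random.choice(family)
```

Each pick moves exactly one step through the design space, which is why complex forms emerge gradually rather than in a single leap.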

A brute force search might enumerate all possible solutions, test each one for fitness, and select the best out of that entire universe of designs. With Dawkins’ biomorphs, this would mean testing 118 billion designs to find the best. And the odds of selecting the best design at random are 1:118,000,000,000. There may, of course, be many viable designs in the universe of all possible solutions. But the chances of finding one of them with a single random selection – a guess – are still very small.

For a living organism, which has many orders of magnitude more elements in its genetic code and therefore an effectively infinite solution space to search, brute force simply isn’t viable. And the chances of landing on a viable genetic code in a single step are effectively zero. Evolution solves problems not by brute force or by astronomically improbable chance, but by small, perfectly probable steps.

If we think of the genes as a language, then it’s not a huge leap conceptually to think of a programming language in the same way. A programming language defines the universe of all possible programs that could be written in that language. Again, the chances of landing on a viable working solution to a complex problem in a single step are effectively zero. This is why Big Design Up-Front doesn’t work very well – arguably at all – as a solution search algorithm. There is almost always a need to iterate the design.

Natural evolution has three key components that make it work as a search algorithm (a minimal sketch in code follows the list):

  • Reproduction – the creation of a new generation that has a virtually identical genetic code
  • Mutation – tiny variances in the genetic code with each new generation that make it different in some way to the parent (e.g., taller, faster, better vision)
  • Selection – a mechanism for selecting the best solutions based on some “fitness” function against which each new generation can be tested
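Put together, the three components make a simple automated search. Here’s a minimal sketch in Python, assuming a toy fitness function (closeness to a hypothetical target genome) in place of real-world survival:

```python
import random

TARGET = [3, -1, 4, 1, -5, 9, 2, -6, 5]  # a hypothetical "best" design

def fitness(genome):
    """Higher is fitter: negative distance from the target design."""
    return -sum(abs(g - t) for g, t in zip(genome, TARGET))

def mutate(genome):
    """Reproduction + mutation: copy the genome, vary a single gene."""
    child = list(genome)
    child[random.randrange(len(child))] += random.choice([-1, 1])
    return child

genome = [0] * len(TARGET)
for generation in range(10_000):
    offspring = [mutate(genome) for _ in range(8)]
    # Selection: only the fittest of parent and children survives.
    genome = max([genome] + offspring, key=fitness)

print(genome, fitness(genome))  # small steps, many iterations
```

No single generation makes a big leap, yet the search reliably homes in on a fit design.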

The mutations from one generation to the next are necessarily small. A fitness function describes a fitness landscape that can be projected onto our theoretical solution space of all possible programs written in a language. Programs that differ in small ways are more likely to have very similar fitness than programs that are very different. Make one change to a working solution and, chances are, you’ve still got a working solution. Make 100 changes, and the risk of breaking things is much higher.

Evolutionary design works best when each iteration is almost identical to the last, with only one or two small changes. Teams practicing Continuous Delivery with a One-Feature-Per-Release policy, therefore, tend to arrive at better solutions than teams who schedule many changes in each release.

And within each release, there’s much more scope to test even smaller changes – micro-changes of the kind enacted in, say, refactoring, or in the micro-iterations of Test-Driven Development.

Which brings me neatly to the third component of evolutionary design: selection. In nature, the Big Bad World selects which genetic codes thrive and which are marked out for extinction. In software, we have other mechanisms.

Firstly, there’s our own version of the Big Bad World. This is the operating environment of the solution. A Point Of Sale system is ultimately selected or rejected through real use in real shops. An image manipulation program is selected or rejected by photographers and graphic designers (and computer programmers writing blog posts).

Real-world feedback from real-world use should never be underestimated as a form of testing. It’s the most valuable, most revealing, and most real form of testing.

Evolutionary design works better when we test our software in the real world more frequently. One production release a year is way too little feedback, way too late. One production release a week is far better.

Once we’ve established that the software is fit for purpose through customer testing – ideally in the real world – there are other kinds of testing we can do to help ensure the software stays working as we change it. A test suite can be thought of as a codified set of fitness functions for our solution.
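For example, a couple of unit tests – sketched here in Python, with a hypothetical discount() function standing in for real business logic – act as fitness functions that every iteration of the design must pass:

```python
def discount(order_total):
    """Hypothetical business rule: orders of 100 or more get 10% off."""
    return order_total * 0.9 if order_total >= 100 else order_total

def test_orders_of_100_or_more_get_ten_percent_off():
    assert discount(100) == 90

def test_smaller_orders_pay_full_price():
    assert discount(99) == 99

# Any iteration of the design that fails these tests is "unfit" and
# gets rejected before it reaches the real world.
```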

One implication of the evolutionary design process is that, on average, more iterations will produce better solutions. And this means that faster iterations tend to arrive at a working solution sooner. Species with long life cycles – e.g., humans or elephants – evolve much slower than species with short life cycles like fruit flies and bacteria. (Indeed, they evolve so fast that it’s been observed happening in the lab.) This is why health organisations have to guard against new viruses every year, but nobody’s worried about new kinds of shark suddenly emerging.

For this reason, anything in our development process that slows down the iterations impedes our search for a working solution. One key factor in this is how long it takes to build and re-test the software as we make changes to it. Teams whose build + test process takes seconds tend to arrive at better solutions sooner than teams whose builds take hours.

More generally, the faster and more frictionless the delivery pipeline of a development team, the faster they can iterate and the sooner a viable solution evolves. Some teams invest heavily in Continuous Delivery, and get changes from a programmer’s mind into production in minutes. Many teams under-invest, and changes can take weeks or months to reach the real world where the most useful feedback is to be had.

Another source of delivery friction is the maintainability of the code itself. Although a system may be complex, it can still be built from simple, single-purpose, modular parts that can be changed much faster and more cheaply than complex spaghetti code.

And while many BDUF teams focus on “getting it right first time”, the reality we observe is that the odds of getting it right first time are vanishingly small, no matter how hard we try. I’ll take more iterations over a more detailed requirements specification any day.

When people exclaim of code craft “What’s the point of building it right if we’re building the wrong thing?”, they fail to grasp the real purpose of the technical practices that underpin Continuous Delivery like unit testing, TDD, refactoring and Continuous Integration. We do these things precisely because we want to increase the chances of building the right thing. The real requirements analysis happens when we observe how users get on with our solutions in the real world, and feed back those lessons into a new iteration. The sooner we get our code out there, the sooner we can get that feedback. The faster we can iterate solutions, the sooner a viable solution can evolve. The longer we can sustain the iterations, the more throws of the dice we can give the customer.

That, ultimately, is the promise of good code craft: more throws of the dice.


Code Craft is Seat Belts for Programmers

Every so often we all get a good laugh when some unfortunate new hire or intern at a major tech company accidentally “deletes Google” on their first day. It’s easy to snigger (because, of course, none of us has ever messed up like that).

The fact is, though, that pointing and laughing when tech professionals make mistakes doesn’t stop mistakes getting made. It can also breed a toxic work culture, where people learn to avoid mistakes by not taking risks. Not taking risks is anathema to innovation, where – by definition – we’re trying stuff we’ve never done before. Want to stifle innovation where you work? Pointing and laughing is a great way to get there.

One of the things I like most about code craft is how it can promote a culture of safety to try new things and take risks.

A suite of good, fast-running unit tests, for example, makes it easier to spot our boo-boos sooner, so we can un-boo-boo them quickly and without attracting attention.

Continuous Integration offers a level of un-doability that makes it easier and safer to experiment, safe in the knowledge that if we mess things up, we can get back to the last version that worked with a simple hard reset (in Git, for example, git reset --hard).

The micro-cycles of refactoring mean we never stray far from the path of working code. Combine that with fast-running tests and frequent commits, and ambitious and improbable re-architecting of – say – legacy code becomes a sequence of mundane, undo-able and safe micro-rewrites.
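By way of illustration, here’s what one such micro-rewrite might look like in Python (the invoice code is hypothetical): a single extract-function step, with the tests run and the change committed before the next step.

```python
# Before: the tax rule duplicated inline, here and around the code base.
def invoice_total_before(prices):
    total = 0
    for price in prices:
        total += price + price * 0.2  # tax rule repeated everywhere
    return total

# After one small, safe step: extract the tax rule into its own function.
def with_tax(price):
    return price + price * 0.2

def invoice_total(prices):
    return sum(with_tax(price) for price in prices)

# Behaviour unchanged: run the tests, commit, take the next step.
assert invoice_total_before([10, 20]) == invoice_total([10, 20])
```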

And I can’t help feeling – when I see some poor sod getting Twitter Heat for screwing up a system in production – that the real fault lies with the deficiencies in their delivery pipeline that allowed it to happen. The organisation messed up.

Software development’s a learning process. Think about when young children – or people of any age – first learn to use a computer. The fear of “breaking it” often discourages them from trying new things, and that hampers their learning. Never underestimate just how much great innovation happens when someone says “I wonder what happens if I do this…” Remove that fear by fostering a culture of “what if…?” shielded by systems that forgive.

Code craft is seat belts for programmers.

Code Craft is More Throws Of The Dice

On the occasions founders ask me about the business case for code craft practices like unit testing, Continuous Integration and refactoring, we get to a crunch moment: will this guarantee success for my business?

Honestly? No. Nobody can guarantee that.

Unit testing can’t guarantee that. Test-Driven Development can’t guarantee that. Refactoring can’t guarantee it. Automated builds can’t guarantee it. Microservices can’t. The Cloud can’t. Event sourcing can’t. NoSQL can’t. Lean can’t. Scrum can’t. Kanban can’t. Agile can’t. Nothing can.

And that is the whole point of code craft. In the crap game of technology, every release of our product or system is almost certainly not a winning throw of the dice. You’re going to need to throw again. And again. And again. And again.

What code craft offers is more throws of the dice. It’s a very simple value proposition. Releasing working software sooner, more often and for longer improves your chances of hitting the jackpot. More so than any other discipline in software development.

Codemanship Twitter Code Craft Quiz – Answers

Yesterday evening – for fun and larks – I posted 20 quiz questions about code craft as Twitter polls. It’s been fun watching the percentages for each answer emerge, but now it’s time to reveal my answers so you can see how yours compare.

The correct answer is Always Shippable. The goal of CD is to empower our customer to release our software whenever they choose, without having to go through a long testing and release process. Many of the principles and practices of code craft – e.g., unit testing and TDD – contribute to that goal.

Evidently, a lot of folk get Continuous Delivery confused with Continuous Deployment, and that’s understandable because the name kind of implies something similar. Perhaps we should have called it “Continuously Shippable”?

The correct answer is Comment Block. There’s no such refactoring. If you want to remove code, do a Safe Delete (delete code, but only if no other code references it). If you want to keep old code, use version control.

The correct answer is Refactoring. They were separate disciplines in the original description of Extreme Programming practices, but folk quickly realised that refactoring needed to be an explicit step in the TFD process.

The correct answer is Tell, Don’t Ask. The goal of Tell, Don’t Ask is to better encapsulate – hide the data of – modules so that they know less about each other.

The correct answer is Feature Envy. Feature Envy is when a method of one class references the features of another class – typically the data – more than its own. It’s “Ask, Don’t Tell”.
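A minimal sketch in Python, with a hypothetical Order class, shows both sides of the coin:

```python
class Order:
    def __init__(self, unit_price, quantity):
        self.unit_price = unit_price
        self.quantity = quantity

# Feature Envy ("Ask, Don't Tell"): PricingService asks Order for its
# data and then does Order's work for it.
class PricingService:
    def total(self, order):
        return order.unit_price * order.quantity  # envious of Order's data

# Tell, Don't Ask: the behaviour moves to where the data lives, and the
# data can be hidden.
class EncapsulatedOrder:
    def __init__(self, unit_price, quantity):
        self._unit_price = unit_price
        self._quantity = quantity

    def total(self):
        return self._unit_price * self._quantity

assert PricingService().total(Order(5, 3)) == EncapsulatedOrder(5, 3).total()
```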

The best answer is Examples. Yes, it is true that BDD uses executable specifications, but what makes those specifications executable? The thing that makes them executable is the thing that makes them precise and unambiguous – Examples! BDD, TDD and ATDD are all examples of Specification By Example.
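For instance, a business rule pinned down as concrete examples might look like this in Python with pytest (the discount rule is hypothetical):

```python
import pytest

def discount(order_total):
    """Hypothetical rule: orders of 100 or more get 10% off."""
    return order_total * 0.9 if order_total >= 100 else order_total

# The examples are the specification: precise, unambiguous, executable.
@pytest.mark.parametrize("order_total, expected", [
    (99, 99),      # just below the threshold: full price
    (100, 90),     # at the threshold: 10% off
    (200, 180),    # well above the threshold: 10% off
])
def test_discount_rule_by_example(order_total, expected):
    assert discount(order_total) == expected
```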

The correct answer is the Facade pattern.

The correct answer is Property-Based Testing. This is sometimes more descriptively called “Generative Testing”, because we write code to generate potentially very large sets of test inputs automatically (e.g., random numbers, combinations of inputs, etc). It has a similar aim to Exploratory Testing, but isn’t manual like ET, and therefore can scale to mind-boggling numbers of test cases with minimal extra code, and run far, far faster.
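In Python, for example, the Hypothesis library does the generating for us. Here’s a minimal sketch testing properties of the built-in sorted():

```python
from collections import Counter
from hypothesis import given, strategies as st

# Hypothesis generates many input lists per test run, including awkward
# edge cases, and shrinks any failing case to a minimal example.

@given(st.lists(st.integers()))
def test_result_is_in_order(xs):
    result = sorted(xs)
    assert all(a <= b for a, b in zip(result, result[1:]))

@given(st.lists(st.integers()))
def test_result_has_the_same_elements(xs):
    assert Counter(sorted(xs)) == Counter(xs)
```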

The correct answer is Automated Testing. If it takes you 5 hours to manually re-test your software, you can only check in safely every 5 hours at the earliest. Which doesn’t sound very “continuous” to me. Good to see that message getting through.

The best answer is Stubs and Mocks. The challenge in testing multithreaded logic is that thread scheduling – e.g., by the OS or a VM – is usually beyond our control, so we can’t guarantee how operations in separate threads will be interleaved. This can lead to unpredictable test results that are difficult to reproduce – “Heisenbugs” and “flickering builds”. One simple way to reduce this effect is to test as much “multithreaded” logic as possible in a single thread. Test Doubles can be used to pretend to be the other end of a multithreaded conversation. For example, we can use mock objects to test that callbacks were invoked as expected, or we can use stubs that provide synchronous implementations of asynchronous methods. The goal is to get as much of the logic as possible into places where it can be tested synchronously. This is compatible with a goal of good multithreaded code design – which is to have as little of it as possible.
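A minimal sketch in Python (the Downloader class is hypothetical) shows the idea: in production the callback would be invoked from a worker thread, but the logic itself can be exercised synchronously with a stub and a mock.

```python
from unittest.mock import Mock

class Downloader:
    def __init__(self, fetch):
        self._fetch = fetch  # injected, so a synchronous stub can stand in

    def download(self, url, on_done):
        # In production this would run on a worker thread; the logic
        # under test doesn't need to care.
        on_done(self._fetch(url))

def test_notifies_listener_when_download_completes():
    stub_fetch = lambda url: "<html></html>"  # synchronous stub
    on_done = Mock()                          # mock callback
    Downloader(stub_fetch).download("http://example.com", on_done)
    on_done.assert_called_once_with("<html></html>")
```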

The correct answer is Tell, Don’t Ask. I was very surprised by how few people got this. Tell, Don’t Ask is about designing more cohesive classes in order to reduce class coupling. The underlying goal of Common Closure – things that change together belong together – and Common Reuse – things that are reused together belong together – is more cohesive packages, in order to reduce package coupling. They share the goal of improving encapsulation. IMO, package design principles have been historically explained poorly, and this may go some way to explaining why a lot of developers struggle to grok them. In practice, they’re the exact same principles at the class/module and package level. The way I try to explain them attempts to be consistent at every level of code organisation.

The correct answer is 3. This is about the Rule of Three. We wait to see three examples of code duplication before we refactor it into a single generalisation or abstraction. The rule of thumb describes a simple way to balance the risks of refactoring too early, before we’ve seen enough examples to form a good abstraction (the number one cause of “leaky abstractions”), and refactoring too late, when we have more duplication to deal with.
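A minimal sketch with hypothetical validation code: on the third occurrence, the shape of the duplication is clear enough to generalise safely.

```python
# First and second occurrences: live with the duplication for now.
def validate_username(username):
    if username is None or username.strip() == "":
        raise ValueError("username is required")

def validate_email(email):
    if email is None or email.strip() == "":
        raise ValueError("email is required")

# Third occurrence: now we refactor to a single generalisation.
def require(value, field_name):
    if value is None or value.strip() == "":
        raise ValueError(f"{field_name} is required")
```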

The best answer is Identify Change Points. In his book, Working Effectively With Legacy Code, Michael Feathers describes a process for safely making changes – i.e., with the benefit of fast-running automated tests (“unit tests”) – to legacy software. There are two reasons why I wouldn’t start by writing a system test:

  1. How do I know what system tests I’ll need without identifying which features lie on the critical path of the code I’m going to change? Do I write system tests for all of it?
  2. How long do I want to live with those system tests? Is it worth writing them just to have them for as long as it takes to introduce unit tests? My goal is to get fast-running tests for the logic in place ASAP.

If I’m refactoring code that has few or no automated tests, a Golden Master – a test that uses an example output (e.g., a web page) to compare against any potentially broken output – can be a relatively quick way of establishing basic safety. But, again, how do I know what output(s) to use without identifying which features would need to be retested for the change I’m planning to make? And a Golden Master test would effectively be another slow-running system test, which I probably wouldn’t want to live with for long enough to justify writing one in the first place.

After we’ve identified what parts of the code need to change, our goal should be to get fast-running tests around those parts of the code. While we break any dependencies that are getting in our way, I will usually re-test the software manually. Gasp! The point being, I’m not manually testing it for very long before I can add unit tests. It might take me a morning. Is it worth automating system tests that you’re not going to want to rely on going forward, just for a morning?

Having said all that, if I was the only developer on my team writing unit tests on a legacy system, I’d introduce a Golden Master into the build pipeline to protect against obvious regressions. But not on a per change basis. I’d do that before even thinking about changes.
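A minimal sketch of a Golden Master in Python, assuming a hypothetical render_page() function: capture a known-good output once, then compare every subsequent run against it.

```python
from pathlib import Path

def render_page():
    # Stand-in for the legacy system's real output (e.g., a web page).
    return "<html><body>Hello</body></html>"

def test_output_matches_golden_master():
    golden = Path("golden_master.html")
    if not golden.exists():
        golden.write_text(render_page())  # first run captures the master
    assert render_page() == golden.read_text()
```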

The best answer is Check In. I would have hoped that wouldn’t need explaining! A big part of the discipline of Continuous Integration is to try to ensure that the code you have in VCS – the code that is, in theory, always shippable – is never broken. When it is broken – for whatever reason – any changes you push on to it risk being lost if the code has to be reverted. Plus, there’s no way of knowing if your build succeeded. Don’t push on to broken code.

The correct answer is C++. If I change a C++ interface, even clients that aren’t using the methods I changed have to be recompiled, re-tested and re-deployed. C++ clients bind to the whole interface at compile time. In dynamic languages, this generally isn’t the case. Ruby, Python and JavaScript clients bind at runtime, and only to the methods they use. Indeed, the object doesn’t even have to have a specific type, just as long as it implements compatible methods. Much of S.O.L.I.D. is language-dependent in this way.
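A minimal Python sketch of that runtime binding (the classes are hypothetical):

```python
class Dog:
    def speak(self):
        return "Woof"

class RobotDog:  # no shared base class or declared interface required
    def speak(self):
        return "Beep"

def greet(animal):
    # Binds at runtime, and only to the one method it actually uses.
    return animal.speak()

assert greet(Dog()) == "Woof"
assert greet(RobotDog()) == "Beep"
# In C++, greet's clients would bind to a whole interface at compile
# time, and changing that interface would force them to recompile.
```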

The correct answer is See The Test Fail. More specifically, see the test assertion fail. So you know, going forward, that it’s a good test you can rely on to fail when the result is wrong. Test your tests.

The best answer is When The Tests Pass. Refactoring was added as an explicit step in the TDD micro-cycle. But refactor what, exactly? I encourage developers to do a little review on code they’ve added or changed whenever they get to a green light:

  • Is it easy to understand?
  • Is there duplication I should remove?
  • Is it as simple as I can make it?
  • Does each part do one thing?
  • Is there Feature Envy between modules?
  • Are modules exposed to things they’re not using?
  • Are module dependencies easily swappable?

I find from experience and from client studies that code reviews on a less frequent basis tend to be too little, too late. TDD and refactoring and CI/CD are practices specifically aimed at breaking work down into the smallest chunks, so we can get very frequent feedback, and bring more focus to each design decision.

And when we’re programming in pairs, the thinking is that code review is continuous. It’s one of the main reasons we do it.

When we chunk code reviews into pull requests – or even larger batches of design decisions – we tend to miss a whole bunch of things. This is borne out by the resulting quality of the code.

I also see how, for many teams, pull requests become a significant bottleneck, which is usually the consequence of batching feedback. The whole point of Extreme Programming is to turn all the good dials up to 11. PR code reviews set the dial at about 5-6.

If you still feel your merge process needs that last line of defence, consider investing in automating code quality testing in your build pipeline instead.

It’s a hot take for PR fans, I know! You may now start throwing the furniture around.

The best answer is Refactoring. This has been a painful lesson for many, many developers. When we open up discussions about refactoring with people who manage our time, the risk is that we’re inviting them to say “no” to it. And, nine times out of ten, they will. Which is why 9 out of 10 code bases end up too rigid and brittle to accommodate change, and the pace of innovation slows to a very expensive crawl.

Refactoring is an absolutely essential part of code craft. We should be doing it continuously. It’s part of how we write code. End of discussion.

The correct answer is Liskov Substitution. The LSP states that we should be able to substitute an instance of any class with an instance of any of its subclasses. (In modern parlance, we might use the word “type” instead of “class”.) This is all about contracts. If I define an interface for, say, a device driver to be used with my operating system, there are certain rules all device drivers need to obey to function correctly in my OS. I could write a suite of contract tests – tests that are written against that interface, with the actual implementation under test deferred/injected – so that anyone implementing a device driver can assure themselves it will obey the device driver contract. Indeed, this is exactly what Microsoft did for their device driver interfaces.
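A minimal sketch of that contract-test pattern in Python with pytest, assuming a hypothetical device driver interface: the tests are written against the contract, and each implementation is injected by a subclass.

```python
class DriverContract:
    """Tests written against the contract; subclasses inject the driver."""

    def create_driver(self):
        raise NotImplementedError

    def test_initialise_makes_the_driver_ready(self):
        driver = self.create_driver()
        driver.initialise()
        assert driver.is_ready()

class InMemoryDriver:
    def __init__(self):
        self._ready = False

    def initialise(self):
        self._ready = True

    def is_ready(self):
        return self._ready

# Any implementer can verify their driver honours the contract:
class TestInMemoryDriver(DriverContract):
    def create_driver(self):
        return InMemoryDriver()
```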

The best answer is True. Now, this is going to take some explaining…

Firstly, if we include Specification By Example in code craft – which I do – then a good chunk of it is about pinning down what the customer wants. It may not necessarily turn out to be what the customer needs, though. Which is what the rest of code craft is about.

The traditional view of requirements engineering is that we try to specify up-front what the customer needs and then deliver that. We learned that this doesn’t really work almost as soon as people started programming computers.

Our first pass at a solution will almost always be – to some extent – wrong. So we take another pass and get it less wrong. And another. And another. Until the solution is good enough for our customer’s real needs.

In building the right thing, feedback cycles matter more than up-front guesses. The faster we can iterate our design, the sooner we can arrive at a workable solution. Fast customer feedback cycles are enabled by code craft. The whole point of code craft is to help us learn our way to the right solution.

Acting on customer feedback means we’ll be changing the code. If the code is difficult to change, then we can’t sustain the pace of learning. The wrong design gets baked in to code that’s too rigid and brittle to evolve into the right design.

And software can have an operational lifespan that far outlasts the original needs of the customer. Legacy code is a very real and very damaging limiting factor on tens of thousands of businesses. Marketing would love to be able to offer their customers the spiffy new widget the competition just rolled out, but if it’s going to cost millions and take years, it’s not an option.

So, in a very real and direct sense, code craft is all about building the right thing by building it right.