“Productivity”. You Keep Using That Word.

Bill writes a book with about 80,000 words. It takes him 500 hours.

Priti writes a book with about 60,000 words. It takes her 2,000 hours.

Which author is more productive?

It’s a nonsensical question, of course.

Maybe Priti’s book sold 10x what Bill’s did. Maybe it won the Booker Prize. Maybe Priti was commissioned straight away to write another book, with a big advance. Maybe Steven Spielberg bought the rights to Priti’s book and he’s going to make it into a blockbuster movie franchise.

Maybe Bill’s kids, who he wrote the book for, absolutely love it, and he doesn’t care what anybody else thinks. Maybe Bill’s book sold 100 copies and changed 100 lives. Maybe, in writing the book, Bill came to terms with a past trauma and is moving on with his life.

Here’s the thing about “productivity”: until we know what the goals are, it’s meaningless.

And if they have distinctly different goals, it’s also meaningless to measure Priti’s performance against Bill’s. Bill is apples, and Priti is oranges.

The industry’s recent obsession with developer productivity is equally meaningless. More code faster is greater productivity? More features? More Pull Requests?

If Priti’s book has more chapters than Bill’s, is Priti a more productive author?

If we create a software solution with the aim of reducing the cost of deliveries for our business, and the cost goes up, do we look in the repo to see if it was because we didn’t write enough code or do enough commits?

Or do we sit down and have a good long think about what changes we could make that would bring us closer to the goal? Do we observe the system in action to see where costs might be accruing?

And would that good long think and that observing end up as a statistic in some study about “lazy developers”, because we’re not producing more stuff?

Any formula that aims to describe software development productivity should include a term for value created. And teams should be empowered to move that dial – to choose their battles so that their expensive and limited time is better spent.

And “value” is very much in the eye of the beholder. What matters to me might not matter to you. It could be measured in dollars or pounds or yen. It could be measured in cats rehomed. It could be measured in meals on wheels delivered. It could be measured in lives saved.

And it’s usually multi-dimensional. Many businesses have been dashed on the rocks of a blinkered outlook, chasing one target at the expense of all others.

“If you give a manager a numerical target, he’ll make it even if he has to destroy the company in the process.”

– W. Edwards Deming.

Tensions within a business – like the HR team trying to improve employee morale while finance are cutting childcare – can be created by failing to consider the dependencies between outcomes, and to balance the needs of multiple groups of stakeholders.

With any goal, it’s important to explore how it might play out from different perspectives. Sure, we can cut costs on ingredients, but will the customers still enjoy the taste? All very well increasing margins, but it’s all for naught if we’re losing custom.

In the average software organisation, so little thought is given to why we’re building what we’re building. And when it is (usually by business stakeholders), those goals are not often communicated to development teams. It’s one of the most common complaints I hear from developers – nobody’s told them what that feature or change is actually supposed to achieve. What problem does it solve?

And this can easily translate into dysfunctions in the organisation of teams. Teams organised around technology stacks and technical disciplines – “front-end”, “back-end”, “database”, “QA”, “ops”, “architecture”, “UX design” – have about as much chance of achieving a business goal as a sack full of ferrets has of changing a lightbulb.

Smart companies organise around business outcomes, creating cross-functional teams of technical and business experts tasked with solving a business problem. We are on the same team because we share the same goal.

And when we iterate working software into the business, we have a clear idea of what it is we’re iterating towards, and ways to know when we’re getting closer to achieving it. Software development is an iterative, goal-seeking process.

Each release is an experiment. We don’t just push code out the door and move on. We observe as the experiment plays out, and learn for the next iteration.

Don’t optimise for delivery. Optimise for learning.

And yes, that’s going to create gaps in your commit history. But just because you shipped, that doesn’t mean you’re done yet.

The AI-Ready Software Developer #17 – Back To The Future (Again)

In the earliest days of stored-program electronic computers, programs were captured by punching holes into cards, each hole representing an individual binary digit (bit) that could be read into memory.

The people who did this were highly skilled computer programmers.

In 1952, everything changed when Admiral Grace Hopper invented the A-0 compiler, which automatically translated high-level human-readable instructions into binary machine code. “Automatic Programming”… well… completely automated programming.

This was the first time an advance in computer programming tools completely eliminated the need for specialised programmers. As everyone knows, there have been no computer programmers since 1952.

The second time an advance in programming technology completely eliminated the need for programmers was in the late 1950s, when third-generation compiled languages like COBOL and Fortran were invented, enabling users to describe what software they want in high-level language and have the machine automatically generate the low-level machine code.

As I’m sure we all recall from our history class, there have been no computer programmers since 1959.

Then, through the 1960s, high-level “problem-oriented languages” like LISP and ALGOL completely eliminated the need for computer programmers all over again. Now users could simply express the goals of the system in high-level language, and low-level code would be automatically generated by the machine.

That’s why there haven’t been any computer programmers since the 1960s.

Programming was completely eliminated once again in the 1970s and 80s by fourth-generation languages like Informix-4GL and Focus, which enabled users to describe what software they want in high-level language. That’s why you’ll never meet a programmer under the age of 55.

The 1990s saw the rise of visual modeling and Computer-Aided Software Engineering – which, remember, couldn’t have been a thing, because programming died out in the 1950s – and now complex computer systems could be designed by, y’know, cats or horses or whatever just describing at a high level what software they want, with no further need for computer programmers.

This is why the only place you’ll get to see a code editor these days is in a museum. We’ve had no need for them since 1999.

“AI” coding assistants are the latest advance in programming technology that is eliminating the need for computer programmers (who haven’t existed since 1952, remember?)

Users can just describe what software they want in high-level natural language, and the language model, with the aid of some non-AI gubbins on top, will generate a complete working solution for them.

Sound familiar?

I’m being facetious, of course. Programming, as a profession, didn’t die out in 1952, or 1959. Each time, there were just more computer programs, and more computer programmers.

With every previous advance in programming technology, that’s been the result: more software, and more software developers. Making it easier and more accessible has just increased demand. This is an example of Jevons’ Paradox: when something becomes cheaper and easier to do, we end up doing more of it, not less.

The other reason why specialised programmers have always been in demand is the inherent ambiguity of natural human languages. Although 3GLs like COBOL look like English at a glance, they’re actually quite different. A statement written in COBOL can mean one – and only one – thing. A statement written in English might have multiple possible interpretations.

So the creators of compilers necessarily had to invent formal languages to instruct them with.

It turns out that the really hard part of computer programming is expressing ourselves formally and precisely in a way that can be automatically translated into machine instructions, regardless of the level of abstraction.
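To make that concrete, here’s a deliberately trivial sketch – the business rule and the function names are mine, invented purely for illustration. The English instruction “give a 10% discount on orders over £100” sounds obvious, right up until somebody has to write it down formally:

```python
def apply_discount(order_total: float) -> float:
    """One formal reading of 'give a 10% discount on orders over £100'.

    The code is forced to settle what the English leaves open:
    - 'over £100' means strictly greater than 100.00
    - the 10% comes off the whole order
    """
    if order_total > 100.00:
        return round(order_total * 0.90, 2)
    return order_total


def apply_discount_on_excess(order_total: float) -> float:
    """A different, equally plausible reading of the same sentence:
    £100 itself qualifies, and only the portion above £100 is discounted."""
    if order_total >= 100.00:
        return round(100.00 + (order_total - 100.00) * 0.90, 2)
    return order_total
```

Both functions are faithful to the English; they just disagree with each other. Somebody has to decide – precisely – which one the business actually meant, and that act of deciding is programming.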

The people creating the COBOL and Fortran programs had to become programmers. The people creating the Focus programs had to become programmers. The people creating spreadsheet applications had to become programmers. The people dragging and dropping visual components in WYSIWYG editors had to become programmers. The people creating the executable UML models had to become programmers. The people snapping together reusable No-Code/Low-Code widgets had to become programmers. They all had to learn to think like a computer.

And the people creating “AI”-generated software will necessarily have to become programmers.

“This time it’s different, Jason.”

Well, I’ve heard that before. And the folly of believing we can accurately specify software using natural language – together with the evidence we’ve seen so far – suggests that it’s going to be no different this time. Human intent is too nuanced for computers.

Also, if we were to view language models as being the same as compilers, we’d be making a category mistake. LLMs are not deterministic like a compiler. Every time you hit the “Build” button, you get a different computer program (if it’s able to complete the program at all without a human intervening).
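To illustrate the category difference with a toy example – the “model” below is just a weighted random choice over three canned snippets, standing in for sampled LLM output; it isn’t a real assistant:

```python
import random

def compile_source(source: str) -> str:
    """A compiler is a pure function: the same input yields the same output, every time."""
    return source.upper()  # stands in for 'translate to machine code'

def sample_completion(prompt: str) -> str:
    """Stands in for LLM decoding: the output is *sampled* from a probability
    distribution over plausible continuations, so repeated runs can differ."""
    candidates = ["return a + b", "return sum((a, b))", "return b + a  # unreviewed"]
    return random.choices(candidates, weights=[0.5, 0.3, 0.2], k=1)[0]

spec = "add two numbers"
assert compile_source(spec) == compile_source(spec)    # always identical
print({sample_completion(spec) for _ in range(10)})    # usually more than one variant
```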

The uncertainty in our natural languages, coupled with the now-famous unreliability and stochastic nature of LLMs, means that a human programmer will still be required.

The other matter is the central thesis of this series of posts: that most dev teams using “AI” coding assistants are not getting any value out of them. The tools are actually making the bottlenecks in software delivery worse, leading to even longer delays, more problems in production – hello, everyone at Amazon Web Services! – and rapidly increasing maintainability problems.

These teams would actually go faster if they stopped using “AI” to generate or modify code.

And that might give them some breathing room to address the real bottlenecks in their process.

“Ah, but Jason, AI coding assistants are getting better every day.”

Are they, though? We’ve been seeing a very obvious plateauing of the capabilities – in particular, the accuracy – of LLMs for over a year now. Scaling, once touted as the route to Artificial General Intelligence (AGI), has now clearly hit a wall. In many cases, the bigger they’ve tried to make the models recently, the less reliable their performance seems to have become in key areas, like code generation. And LLMs are the engines of these coding assistants (for the time being).

No doubt, the developers of the IDEs, CLIs and “agents” that sit atop the LLM are learning how to work around the technology’s limitations – in many cases, building on principles that are discussed in this series.

But that, too – constrained by an “intelligence” that’s not likely to get much smarter for the foreseeable future – will hit its limits. There’s only so much you can do with a Markdown file and a “while” loop.

So we need to cut our cloth. This, folks, is about as good as it’s gonna get – perhaps in my lifetime.

But let’s not be a Gloomy Gary! With the technology as it is today, some teams are getting – admittedly modest – benefit from it.

The irony is that those teams were already high-performing, according to the DORA data. They’d already addressed the bottlenecks, the blockers and the leaks in their development process.

The key to being effective with “AI” coding assistants is being effective without them.

When they attach the code-generating firehose, they still don’t get the power shower that the AI industry promised, but they can feel a difference: a noticeably stronger jet.

And, with investment in skills, in process, and in automation of the old-fashioned kind, theoretically any team could reap these benefits – just not necessarily today.

So what of the future, then? The real, likely future?

Multiple lines of research now seem to be converging on the benefits – in reliability, cost, energy and so on – of much smaller models than the hyperscale frontier LLMs that we’ve been focusing on up to now.

Models with a few billion parameters – perhaps created by distilling much larger ones – are already becoming more popular. They can run locally on high-end consumer hardware; no need for a data centre with a 100 MW power supply.

And research is delving deeper into even smaller models, with just millions of parameters, targeted at niche applications (like code generation).

Perhaps the future of “AI” coding models will be small, local and application-specific? Artificial neural networks are amenable to a divide-and-conquer approach to training and inference. Half the model might require a quarter of the compute.

Maybe the hyperscale general-purpose models we see today – with all the economic, environmental and societal downsides they’ve proven to bring with them – are the biggest we’ll ever see? Maybe the trend for hyper-scaling and Cloud-based AI will go into reverse?

For sure, at this scale, they’re a massive loss-leader for companies like Anthropic and OpenAI, and certainly not worth it for the modest productivity gains that a small portion of teams are reporting.

It won’t come as a huge surprise if the industry decides that hyper-scaling – for a bunch of reasons – just isn’t worth it.

To sum up, “AI” coding assistants are probably here to stay, but they are pretty much as good as they’re going to get for the foreseeable future. That means all the gains going forward aren’t going to come from the technology itself, but how we use it.

’Twas ever thus, going all the way back to A-0. Development team productivity has always been systemic, and not about individual output. A bad development system will beat a code-generating firehose every time.

And that’s what “The AI-ready Software Developer” is all about.

The AI-Ready Software Developer #16 – A Token of Our eXtreme

Some of you reading the posts in this series might be thinking, “This all sounds a bit familiar”. And you’d be right.

“AI” coding assistants built around Large Language Models may be a relatively new technology, but we’re discovering that the best ways to use them are decades old.

Some teams have been working in small batches – solving one problem at a time – and testing, reviewing, refactoring and integrating their code continuously for decades.

Some teams are cohesive, cross-functional and largely autonomous – adapting and self-organising to address problems in the moment, instead of waiting for permission from above.

Some teams have been using examples to pin down the meaning of system requirements, and driving the design of the software directly from executable interpretations of those examples (you may know them as “tests”), since the 1950s.

One software development methodology in particular encapsulates all of the principles and practices we’ve explored in this series: eXtreme Programming (XP).

XP was born in the mid-1990s, and is most closely associated with Kent Beck. It’s a shining example of what can happen when you get the right people in the room.

It was undoubtedly the main inspiration for the Agile Software Development movement that started at a ski resort – where else? – in 2001, though XP was subsequently somewhat overshadowed by the movement it inspired. But for many of us, when you say “Agile”, we still hear “XP”.

If you look carefully at the group photo on the Agile Manifesto’s home page, and check the original signatories, you’ll see that most of the people attending that summit – like Ron Jeffries and Ward Cunningham – were closely involved with the early evolution of XP.

In this new age of “AI”-assisted programming, XP is experiencing something of a renaissance – although many folks currently rediscovering the approach might not realise it already has a name. So let me fill in the blanks.

eXtreme Programming brought together key lessons – learned by programmers over the preceding 40 years – about what works and what doesn’t in developing software.

Core to the technical practices of eXtreme Programming is a micro-iterative process we now call Test-Driven Development (TDD).

In TDD, we work in small steps – solving one problem at a time. We specify using examples (tests). We test continuously. We review the code continuously. We refactor continuously. And when we’re refactoring, we make one small change at a time, testing and reviewing the results at every step.
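Here’s what one turn of that micro-cycle can look like in Python, with pytest – the shipping-fee rule and the file names are invented for illustration, not taken from any real project. First, one small example, written before the code it describes exists:

```python
# test_shipping.py -- written first; it fails until shipping.py exists and is correct.
from shipping import shipping_fee

def test_orders_of_fifty_pounds_or_more_ship_free():
    assert shipping_fee(order_total=50.00) == 0.00

def test_smaller_orders_pay_a_flat_fee():
    assert shipping_fee(order_total=49.99) == 4.95
```

```python
# shipping.py -- the simplest code that makes those examples pass.
def shipping_fee(order_total: float) -> float:
    return 0.00 if order_total >= 50.00 else 4.95
```

Watch the tests fail for the right reason, make them pass, refactor if needed, and only then move on to the next example.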

Many of us build version control into the TDD micro-cycle, committing changes whenever the tests are all “green”. Some even revert if any tests fail and try again, perhaps taking a smaller, safer step. And many of us push our changes directly to the trunk branch multiple times an hour, rather than waiting until a feature’s been completed.
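One way of wiring that habit up – a minimal sketch in Python around plain git and pytest commands. The script name and commit message are mine, and plenty of people do the same thing with a couple of shell aliases, or with Kent Beck’s “test && commit || revert” idea:

```python
# tdd_step.py -- run after each small change: commit on green, revert on red.
import subprocess
import sys

def run(*cmd: str) -> int:
    return subprocess.call(list(cmd))

if run(sys.executable, "-m", "pytest", "-q") == 0:
    # Green: snapshot the working state.
    run("git", "add", "-A")
    run("git", "commit", "-m", "green: all tests passing")
else:
    # Red: throw the step away and return to the last green commit.
    # (git clean also discards new untracked files -- that's the point
    #  of the practice, but soften it if that feels too brutal.)
    run("git", "reset", "--hard", "HEAD")
    run("git", "clean", "-fd")
```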

XP teams will often work in pairs so that, as well as having an extra brain for problem-solving and providing direction, there’s also an extra pair of eyes reviewing the code as it’s being written.

XP teams tackle architecture in a highly collaborative and ongoing fashion. Always striving for simplicity, XP teams will have short design sessions throughout the day, using simple modeling techniques to visualise, understand, plan and communicate software design. (A very common misconception about XP is that teams “don’t do any design planning”. They’re doing it all the time.)

XP teams tend to be small and cohesive, encapsulating the skills needed to deliver customer requirements end-to-end whenever possible.

They are also highly autonomous and self-organising, making decisions together when they need to be made, instead of sending them up the chain of command. In XP, the team – working closely with the customer – is in command.

eXtreme Programming works by minimising uncertainty, and it does this by minimising the amount of work in progress – solving one problem at a time, maximising focus and minimising cognitive load – and by maximising objective feedback. Basically, XP teams turn the cards over one at a time, as they’re being dealt.

In terms of team productivity, the small batch sizes, fast feedback loops and continuous – rather than phased (blocking) – design, testing, review and integration tend to minimise bottlenecks and maximise the flow of value. Skilled XP teams tend to have very short delivery lead times, produce very stable releases, and are able to sustain the pace of delivery for years on the same product.

When we’re using “AI” coding assistants, uncertainty is even more in play. We can minimise it in pretty much exactly the same kinds of ways: smaller steps with less ambiguity, and faster feedback.

Many “AI”-assisted developers are learning that a process like TDD can significantly reduce the “downstream chaos” that DORA data shows plagues most teams using these tools.

Working one test case (one example) at a time reduces context – the LLM equivalent of cognitive load – and specifying with tests minimises semantic ambiguity, dramatically reducing the risk of models misinterpreting our requirements.
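For example (again, the module name and rule are hypothetical): a prompt like “round prices to the nearest penny” leaves the model to guess what happens to a half-penny – Python’s own default is banker’s rounding – whereas one small test, handed over before any code is generated, removes the guesswork:

```python
# Given to the assistant *before* asking for an implementation.
from decimal import Decimal

from pricing import round_price   # hypothetical module the assistant is asked to write

def test_half_pennies_round_up_not_to_the_nearest_even_penny():
    assert round_price(Decimal("1.005")) == Decimal("1.01")

def test_exact_pennies_are_left_alone():
    assert round_price(Decimal("2.30")) == Decimal("2.30")
```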

It also gives us many opportunities to test (and re-test) the code in smaller feedback cycles, as well as many more opportunities to review generated code and refactor – again, one small step at a time – if there are problems (and there will be, often!). We can use a process like TDD to keep the model on a very tight leash.

Combine TDD with merciless version control – commit on green, revert on red – to keep the code base on the path of working (shippable) software at every small increment, add frequent merges to the release branch, and you have an approach that is, in most key respects, eXtreme Programming.

You can call it “eXtreme Vibing”, if you think that will look better on your CV.