The AI-Ready Software Developer #18 – “Productivity”. You Keep Using That Word.

It’s 20 years since I created a website with the banner “I Care About Software” as part of a loose “post-agile” movement that sought to step back from the tribes and factions that had grown to dominate software development at the time.

Regardless of whether we believed X, Y or Z was the “best way”, could we at least agree that the outcomes matter?

It matters if the software does what the user expects it to do. It matters if it does it reliably. It matters that it does it when they need it. It matters that when they need it to do something else, they don’t have to wait a year or three for us to bring them that change.

Unlike many other professions, and with few exceptions, we’re under no compulsion to produce useful, usable, reliable software or to be responsive to the customer’s needs. It’s largely voluntary.

We don’t usually get fined when we ship bugs. We won’t be sanctioned if the platform goes down for 24 hours. We won’t get struck off some professional register if the lead time on changes is months or years (or never).

(Of course, eventually, if we’re consistently bad, we can go out of business. But historically, another job – where we can screw up another business – hasn’t been difficult to find, even with a long trail of bodies behind us.)

And we don’t usually get a bonus for releases that go without incident, or a promotion for consistently maintaining short lead times.

In this sense, we have less incentive to do a good job than a takeaway delivery driver.

A friend once kindly introduced me to the project managers in her company to give them the old “better, sooner, for longer” pitch. I talked about teams I’d worked with who had built the capability to deliver and deliver and deliver, week after week, year after year, with no drama and no fires to put out.

They actually said the quiet part out loud: “But we get paid to put out the fires!”

For software developers, the carrot and the stick usually have very little to do with actual outcomes that customers and end users might care about. This is evidenced by the fact that so few teams keep even one eye on those outcomes.

The average development team doesn’t actually know how much of their time is spent fixing bugs instead of responding to user needs. They don’t know what their lead times are, or how they might be changing over the lifetime of the product or system. They’re often the last to know when the website’s down.

Most damning of all, the average development team has no idea what the users’ needs or the business goals of the product actually are. And that’s where the value that we all talk about really is, you’d have thought.

And so it’s entirely possible – inevitable, even – for the priorities of dev teams and of the people paying for and using the software to become very misaligned.

I’m always struck by the chasm that can grow between them, with developers genuinely believing they’re doing a great job while users just roll their eyes. You’d be surprised how often teams are blissfully unaware of how dissatisfied their customers are.

So, before you start that 2-year REPLACE ALL THE THINGS WITH RUST project, stop to ask yourselves “What impact would this have on overall outcomes?”

If your goal is to make your software more memory-safe, are there other ways that might be less radical or disruptive? (You might be surprised what you can do with static analysis, for example.)

Is it possible to do it a bit at a time, under the radar, to minimise the impact on customer-perceived value?

Will it really solve any problem the business actually has at all? I’m a fan of asking what the intended business outcomes are. You’d be amazed how often technical initiatives explode on contact with that question.

Which brings me to the topic du jour. The Gorman Paradox asks: if “AI” coding assistants are having the profound impact on development team productivity many report – 2x, 5x, 10x, 100x (!) – why do we see no sign of it in the app stores, on business bottom lines, or in the wider economy? Where’s all this extra productivity going?

I also have to ask why the reports of productivity gains using “AI” vary so widely, with anecdotal reports of increases in excess of 1000%, and measured variances in the range of -20% to +20%.

The words doing all the work here are “anecdotal” and “measured”, I suspect. But the difference also lies in precisely what is being measured.

Optimistic findings are usually based on measurements of things the customer doesn’t care about – lines of code, commits, Pull Requests etc.

The pessimistic – or certainly less sensational – findings are usually based on measurements of things the customer does care about, like lead times, reliability and overall costs.

It’s well-understood why producing more code faster – faster than we can understand it and test it – tends to overwhelm the real bottlenecks in the software development process. So there’s no great mystery about how “AI” code generation can actually reduce overall system performance.
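
A back-of-the-envelope way to see it – a sketch, not a measurement: treat reviewing and testing as a queue with service rate μ, with code changes arriving at rate λ. For a simple M/M/1 queue, the average lead time for a change is

    $$ W \;=\; \frac{1}{\mu - \lambda}, \qquad \lambda < \mu $$

Speed up code production and λ climbs towards μ; lead times don’t grow gently, they blow up. The firehose accelerates the one part of the process that was never the constraint.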

What has been mysterious is why some teams see it, and most teams don’t.

They attach a code-generating firehose to their process and can’t understand why the business is complaining that they’re not getting the power shower they were promised.

There is a candidate for a causal mechanism. Most teams don’t see the impact on systemic outcomes because they’re not looking.

So when a developer tells you that, say, Claude Code has made them 10x more productive, they’re not lying. (Well, okay, maybe some of them are.) They just have a very different understanding of what “productivity” means.

If we’re to survive as professionals in this “age of AI”, I recommend pinning your flag to the mast of user needs and business outcomes.

Most importantly, we should be measuring our success by the business goals of the software, or the feature, or the change. If the goal is to, say, increase our share of the vegan takeaway market, the ultimate test is whether in reality we actually do.

This is the ultimate definition of “Done”.

We claim to develop software iteratively, but that implies we’re iterating towards some goal. If iterations don’t converge, we get (literal) chaos – just a random walk through solution space. Which would be a sadly accurate summary of the majority of efforts, with most teams unable to articulate what the goals actually are. If, indeed, there are any.

Aligning Teams Around Shared Goals (Is A Very Good Idea)

Software development peeps: if I asked you what the ultimate business goal of the software you’re working on is, would you know? Are you sure it even has one?

I’m gonna tell you a story from my early days as a contractor working in London. I took over the lead in a team of 8 developers, and very quickly could see that things were going badly.

Putting aside all the technical obstacles they were wrestling with, like a heavy reliance on manual regression testing, and everybody trying to merge the day before a scheduled release etc, the thing that really struck me was the huge amount of time being spent on arguing.

Arguing about the tech stack. Arguing about the architecture. Arguing about the approach. The team was split into factions, all pulling in different directions, providing no net forward momentum.

In my experience, this is what happens when teams don’t have a clear direction to align on. What this team needs, I thought to myself, is a goal – a magnetic north to get them pointing roughly in the same direction.

So I went back to the business – because our business analysts couldn’t answer the question (and hadn’t asked, evidently) – and asked “What are you hoping to get from this new system?”

I’m going to change their goal to protect the innocent (it’s a bit of a giveaway). Let’s imagine they replied, “We’re looking to grow our vegan customer base”.

Now the team’s arguments had their magnetic north. How will this help grow our vegan customer base? You’d be surprised, in that light, how many of the contentions simply evaporated. (Or maybe you wouldn’t.)

You might be less surprised by the profound shift in the team’s focus. Use cases got dropped. New use cases were explored. The technical architecture got simplified. Communication improved, both within the team and with other dev teams, ops, and the business.

Because now we all had something in common to talk about.

Software products and systems don’t exist in a vacuum. They’re almost always part of something bigger. And if religion teaches us anything, it’s that people like to feel part of something bigger than themselves.

The shift in focus from delivering software to solving a problem can completely rewrite priorities and realign teams. And it completely pulls the rug from under what most developers think of as “productivity”.

Why does this matter more today?

We’re currently seeing our industry go through quite possibly its worst navel-gazing episode, certainly since I’ve been in it. I’ve never seen so many developers obsessing over the “how”, and not giving a moment’s thought to the “what”, let alone the “why”.

Who cares how fast we can climb the wrong mountains?

And finally, let’s pause to reflect on that word, “iterative”. Iteration without convergence is chaos – literally.

That rather begs the question, converging on what, exactly?

Are You Training Your Junior Developers, Or Hazing Them?

One of the ways I feel lucky in my software development career is in how I got started.

I learned programming by building – well, trying to build – programs. Complete working programs – mostly simple games – on computers I had total control over.

I designed the games (remember graph paper?). I composed the music. I wrote the code. I tested the programs – perhaps not as thoroughly or as often as I should have. I copied the C30 cassettes. And I swapped them for other home-produced games on the playground. I even sold a couple.

I was the CEO, CTO, head of sales and marketing, product manager and lead developer, head of distribution, and QA manager of my own micro-tech company… that just happened not to make any real money, but that’s a minor detail.

I did it all, hands-on.

Then, after I stumbled out of university and needed money, I freelanced for a while, working directly with customers to understand their requirements, designing user interfaces, writing code, designing databases, testing the software – perhaps not as thoroughly or as often as I should have – packaging and deploying finished products, and answering the phone when users encountered a problem. Which they did. Often.

So I started my career as a full lifecycle software developer: requirements, design, programming, databases, testing, releases and operations.

Did I screw it up at times? Oh boy, yeah! But I learned fast. I had to, to get paid. And, importantly, I got to see the whole process, work with a range of technologies, and wear a bunch of different hats.

And these were only mini projects. The world didn’t burn down when my SQL corrupted some data. The work was relatively low on risk, but high on learning. It built my competence and my confidence quickly.

When I got my first salaried job as a “software engineer”, I was then given what I would describe as a 2-year apprenticeship where I learned a lot of foundational stuff that would have been damned useful to know when I was freelancing.

And I was encouraged to try my hand at a wide range of things. My many screw-ups just never made it into any releases. The guardrails were very effective.

Importantly, while I was given a fairly free rein, I was closely supervised and mentored by developers with many years more experience. And I was given a lot of training.

Sadly, this is very different to how most developers start their careers these days. Instead of creating a wide range of learning opportunities on low-risk work, entry-level devs are confined to narrow, menial tasks – typically the ones “senior” developers would prefer not to do. “Training”, for too many, looks more like hazing.

It’s not at all uncommon for a junior dev to spend 6-12 months doing little else but fixing bugs on production systems, or working through SonarQube issues, or manning the support hotline. “It’s all they’re good for.”

New features? Product strategy? Talking to customers? Architecture? UX? Process improvement? The interesting stuff? That’s senior work.

Most often, it’s the risk that they’ll make mistakes that deters managers from giving junior developers too much freedom. But that’s a fundamental misjudgment. Mistakes and failure are integral to the learning process. The real risk is that you’ll grow developers who are afraid to try.

And while they’re painting the proverbial fences, it’s rare that they get much structured training or mentoring, either. Most organisations view senior developers as too valuable to “waste” on such things.

I see it differently. I think there comes a point in a developer’s career where the real waste is not letting them share their experience.

In this sense, there are “three ages” of a software developer, as their focus shifts from mostly learning, to mostly doing, to mostly teaching.

The job of a junior developer is to grow into a productive and well-rounded practitioner. And the productivity of a junior developer should be measured not by how much they deliver, but by how fast they grow. Month-on-month, year-on-year, what difference do we see in their capability and their confidence?

Businesses are so obsessed with cooking with green tomatoes that they forget that, given more time and more watering, those tomatoes will ripen into far more versatile red ones.

Keeping them in a narrow lane, blinkered to the wider development process, stunts their growth. I’ve met many devs with decades of experience who were to all intents and purposes still junior developers.

When we frame it in those terms, the emphasis shifts from “What value can we extract from this junior dev today?” to “What potential can we add?”

In that light, it makes sense to structure their work around providing the most valuable learning opportunities. If they create tangible business value along the way, all the better. But that’s not the primary aim. The primary aim is to produce better software developers, and their work is a vehicle for that.

And if they somehow manage to burn the house down, that’s a you problem. How did their mistake make it into production?

Refactoring Is Like Chess

When I’m introducing developers to refactoring, I draw a parallel between this hugely valuable – but much-misunderstood – design discipline and chess.

Primitive refactorings are like the basic moves of the individual pieces on a chess board.

A bishop can move diagonally, a rook can move horizontally or vertically, and so on.

Likewise, there are “pieces” in our code that we can rename, extract things from, introduce things into, inline, move, and so on.

These are the smallest “moves” we can make when we’re refactoring that bring us back to code that works.

At a higher level, there are tactics. These are sequences of basic moves that achieve a specific purpose, with designations like “Clearance Sacrifice” and “Desperado”. Serious players might study hundreds or even thousands of them.

Refactoring, too, has its tactics – sequences of primitive refactorings that achieve a higher-level goal. Many of these have names of their own, like “Replace Conditional With Polymorphism” and “Introduce Method Object”.

Importantly, they’re executed as a sequence of primitive, behaviour-preserving refactorings like Extract Method and Introduce Parameter. So, no matter how long the sequence, we’re never far from working (shippable) code.
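
To make that concrete, here’s a tiny illustrative sketch in Java – the Order and LineItem types and the VAT example are invented for the purpose, not taken from any catalogue – showing two primitives in sequence, Extract Method then Introduce Parameter, with the code compiling and the tests green after each step:

    // A hypothetical walk from "before" to "after" as a chain of primitive,
    // behaviour-preserving refactorings. Each Step class passes the same tests;
    // in a real code base these would be successive states of one class.
    import java.util.List;

    record LineItem(double unitPrice, int quantity) {}
    record Order(List<LineItem> items) {}

    class Step0 {
        // Starting point: the calculation is buried in one method, VAT rate hard-coded.
        double invoiceTotal(Order order) {
            double total = 0;
            for (LineItem item : order.items()) {
                total += item.unitPrice() * item.quantity();
            }
            return total * 1.2;
        }
    }

    class Step1 {
        // Primitive #1: Extract Method. Behaviour unchanged, tests still green.
        double invoiceTotal(Order order) {
            return subtotal(order) * 1.2;
        }

        double subtotal(Order order) {
            double total = 0;
            for (LineItem item : order.items()) {
                total += item.unitPrice() * item.quantity();
            }
            return total;
        }
    }

    class Step2 {
        // Primitive #2: Introduce Parameter. Existing callers are updated to pass 1.2,
        // so behaviour is still unchanged -- but the VAT rate is now explicit.
        double invoiceTotal(Order order, double vatRate) {
            return subtotal(order) * vatRate;
        }

        double subtotal(Order order) {
            double total = 0;
            for (LineItem item : order.items()) {
                total += item.unitPrice() * item.quantity();
            }
            return total;
        }
    }

No matter where we stop in that sequence, we have shippable code.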

Of course, we could spend a lifetime studying tactics, and still not cover even a tiny fraction of the possibilities. It’s an infinite problem space.

At the highest level, chess has strategies. These are the organising principles – the end goals – of tactics:

  • Material Count
  • Piece Activity
  • Pawn Structure
  • Space
  • King Safety

Strategies in chess are about gaining positional advantage in a game going forward.

And, at the highest level, refactoring has its strategies, too – organising principles that make changing code easier going forward. This is the software design equivalent of positional advantage.

You may know them as “software design principles”:

  • Readability
  • Complexity
  • Duplication
  • Coupling & Cohesion
  • (the one we tend to forget) Testability

Each refactoring tactic is designed to gain us “positional advantage” in one or more of these dimensions to:

  • make code easier to understand
  • make it simpler
  • remove duplication (by introducing modularity/generality)
  • reduce coupling (by improving cohesion – 2 sides of the same coin)
  • make it easier to test quickly, which is often a very valuable side-effect – and sometimes the main goal – of the first 4

The most effective refactorers operate seamlessly across all 3 levels.

They’re thinking strategically about their design goals and measuring impact along those dimensions.

They’re thinking tactically, looking several refactorings ahead, to get them safely from A to B.

And they’re working one primitive refactoring at a time, keeping the code working all the way.

And, like chess, this can take a lifetime to master. Expert help is highly recommended if you want to grasp it faster, of course 🙂

The Gorman Paradox: An Explanation?

Yesterday I ruminated on why – if the claims about “AI” productivity gains in programming are to be believed – we see no evidence of significant numbers of “AI”-generated or “AI”-assisted software making it out into the real world (e.g., on app stores).

I get a sense that there may be some kind of “Great Filter” that prevents projects from evolving to that advanced stage. And I have a feeling I might know what it is.

I’m imagining software development as an iterative, goal-seeking algorithm.

What would its time complexity look like? I reckon the factors would be:

  • Batch size – how much changes in each iteration
  • Feedback “tightness” – how much uncertainty is reduced in each iteration
  • Cost of change – how able are we to act on that feedback?

I suspect “coding”, as a factor, would shrink to nothing at scale.

Basically, batch size, feedback loops and cost of change are doing the heavy lifting.

I could go even further. Maybe the cost of change, in the limit, becomes simply a function of how long it takes to understand the code and how long it takes to test it (and I’d include things like code review in testing).
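
Purely as a sketch – the symbols are mine, not a measured model: if converging on the goal takes roughly N/b iterations at batch size b, then the total time looks something like

    $$ T \;\approx\; \frac{N}{b}\,\Big(t_{\text{code}}(b) + t_{\text{understand}}(b) + t_{\text{test}}(b)\Big) $$

Drive the coding term towards zero – the firehose’s promise – and T is dominated by the understanding and testing terms, both of which tend to grow faster than linearly with batch size. That’s the sense in which coding drops out of the Big O.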

Far from helping, attaching a code-generating firehose to development has already proven to work against us in these respects if we loosen our grip on batch sizes to gain the initial benefits.

And if we don’t loosen our grip – if we keep the “AI” on a tight leash – coding, as a factor, still shrinks to nothing in the Big O. Even the most high-performing teams see modest improvements at best in lead times. Most teams slow down.

All this might explain why the productivity gains of “AI” coding assistants vanish at scale, and why we see no evidence of significant numbers of “AI”-assisted projects making it out of the proverbial shed.

When user experience, reliability, security and maintainability matter, we’re forced to drink from the firehose one small mouthful at a time, taking deep breaths between so as not to let it overwhelm us. When you’re drinking from a firehose, the limit isn’t the firehose.

For sure, teams are using this technology on code bases where those things matter, but we’re already seeing what the downstream consequences can be from the tech companies who’ve boasted publicly about how much of their code is “AI”-generated.

So, for real productivity gains, that constrains “AI” coding assistants to projects where those things don’t matter anywhere near as much. Personal projects, prototypes, internal tools, one-offs etc. I don’t think anybody disputes that this technology is great for those kinds of things. But they don’t often make it out of the shed.

At least, I very much hope they don’t.

I’ve done a lot of research and experimentation to try to establish how to get better results using LLMs, but I can’t hand-on-heart promise that the approaches I’ve landed on will do much more than mitigate harms. They’re very much focused on batch sizes, feedback loops and cost of change – the stuff we already know works, “AI” or not.

I have reasons to suspect that teams who are showing modest gains using “AI” have actually tightened up their feedback loops to adapt to the firehose, which could be thought of as a kind of stress test for development processes. It’s entirely possible that this is what’s giving them those small gains, and not the “AI” at all.

The Gorman Paradox: Where Are All The AI-Generated Apps?

In 1950, while discussing the recent wave of flying saucer reports over lunch with colleagues at Los Alamos National Laboratory in New Mexico, physicist Enrico Fermi asked a simple question.

There are hundreds of billions of stars in our Milky Way galaxy, and – presumed at the time – a significant percentage have Earth-like habitable planets orbiting them. The galaxy is billions of years old, and the odds are high that there should be other technological civilisations out there. But we see no convincing sign of them.

So, where is everybody?

This question is now known as the Fermi Paradox.

In the last couple of years, I’ve been seeing another paradox. Many people claim that working software can now be produced for pennies on the pound, in a fraction of the time that it takes humans. Some go so far as to claim that we’re in the age of commoditised software, throwaway software, and hail the end of the software industry as we know it.

Why buy a CRM solution or an ERM system when “AI” can generate one for you in hours or even minutes? Why sign up for a SaaS platform when Cursor can spit one out just as good in the blink of an eye?

But when we look beyond the noise – beyond these sensational flying saucer reports – we see nothing of the sort. No AI-generated Spotify or Salesforce or SAP. No LLM-generated games bothering the charts. No noticeable uptick in new products being added to the app stores.

So, where is everybody?

“AI”-Assisted Refactoring Golf

A very common interaction I see online is people talking about how hard it was to get their “AI” coding assistant to do what they wanted, and someone – inevitably – replying “It works for me. You must be doing it wrong.”

The difficulty with these conversations is knowing whether we’re comparing apples with apples. On the occasions when someone’s offered to try to solve a specific problem that was irking me, in the end, they’ve almost always moved the goalposts and done something else.

In a recent post I talked about whether refactoring is more efficient and effective using “AI” coding assistants or using automated refactorings in an IDE like IntelliJ.

My experience is that, nine times out of ten, automated refactorings win. I make exceptions when the IDE doesn’t have the refactoring I need (e.g., Move Instance Method in PyCharm). But the rest of the time, I’ll take predictable over powerful any day.

But this is, of course, subjective and qualitative. It’s my lived experience, and we all know how reliable that kind of evidence can be.

So, I thought to myself, what might be a more objective test?

It just so happens that there is a game we can play called Refactoring Golf. First run as a workshop at the original Software Craftsmanship 20xx conference by Dave Cleal, Ivan Moore and Mike Hill, this is a game that helps us get more familiar with the automated refactorings and other useful code-manipulating shortcuts in our IDE.

The original rules have been lost to time, but these are the rules I’ve been playing it by:

  • Contestants are given two versions of the same code, a “before” version and a refactored “after” that’s behaviourally identical. It does exactly the same thing.
  • The goal is to refactor the “before” into the “after” such that they have an identical abstract syntax tree (formatting notwithstanding), scoring as few points as possible – hence “golf”.
  • Every code edit made using an automated refactoring or other IDE shortcut (e.g., Find+Replace) costs 1 point.
  • Any code edit made manually costs 2 points. Any time you change a line of code, there’s a penalty.
  • Any edit made while the code isn’t working (tests failing or build failing) costs double the points (so a manual edit with tests failing is 4 points, for example).
  • Any edit that doesn’t change the abstract syntax tree – e.g., reformatting or deleting blank lines – costs 0 points.
  • Needless to say, the tests must be re-run after every change.
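
To make the arithmetic concrete, here’s a minimal sketch of how I tally a round, in Java – my own encoding of the rules above, not part of the original workshop:

    // A minimal scorecard for Refactoring Golf as described above.
    // "codeWasBroken" means the tests or the build were failing when the edit was made.
    class GolfScorecard {
        enum Edit { NO_AST_CHANGE, IDE_ASSISTED, MANUAL }

        private int points = 0;

        void record(Edit edit, boolean codeWasBroken) {
            int base = switch (edit) {
                case NO_AST_CHANGE -> 0; // reformatting, deleting blank lines
                case IDE_ASSISTED  -> 1; // automated refactoring, Find+Replace etc.
                case MANUAL        -> 2; // you typed the change yourself
            };
            points += codeWasBroken ? base * 2 : base;
        }

        int total() { return points; }
    }

The “AI”-assisted variant proposed below keeps exactly the same shape; only what qualifies for the 1-point rate changes – a single interaction with the model instead of an IDE-assisted edit.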

Emily Bache has helpfully curated a small selection of rounds of Refactoring Golf in Java, Kotlin and C#.

Anyhoo, I just happened to be revisiting the game this week for a refactoring deep-dive that I was running for a new client, and it got me to thinking: could this be the “apples with apples” test of my gut feeling about IDE-assisted vs. “AI”-assisted refactoring?

I’m proposing an “AI”-assisted version of the game that has the following rules:

  • Same “before” and “after”. Same goal.
  • Any code edit made in a single interaction with the LLM costs 1 point. Not an interaction with the coding assistant (e.g., Cursor, Codex), but with the actual model itself. So an agent sending 5 requests to the LLM that result in changes that are applied to the code – even if they’re applied in one final step – would cost 5 points. We’re scoring based on code changes generated by the model in a single interaction.
  • Any code edit made by anything other than the LLM (including using automated refactorings and IDE shortcuts) costs 2 points. Basically, every time you change a line of code, there’s a penalty.
  • Any code edit made while the code is broken costs 2x the points. So if you have to use the Extract Method refactoring in your IDE while tests are failing, that’s 4 points.
  • Any code edit made by you or the “AI” that doesn’t change the AST costs 0 points. Formatting costs nothing, basically. Good luck getting an LLM to reformat code without changing it!
  • Again, shouldn’t need saying, but the tests must be re-run after every change. If you’re going the “agentic” route, you will need to enforce this somehow.
  • YOU ARE ONLY ALLOWED TO INCLUDE THE CODE AS IT CURRENTLY IS, OR CODE EXAMPLES FROM A DIFFERENT PROBLEM, IN YOUR INPUTS (PROMPTS OR CONTEXT FILES etc). YOU MUST NOT TELL THE “AI” WHAT THE CODE SHOULD BE (because then you’re the one writing it).

I’ve tried this on the C# version of ROUND_3 in Emily’s repo, and it was a lot of fun trying to get Claude to do exactly what I want. It felt a bit like playing real golf with a bazooka.

I did manage to get one almost-clean round where I didn’t need to edit the code myself much at all, but – by jingo – we went around the houses!

I want to experiment more with this, because I also see it as a useful test of the principles I’ve been converging on for “AI”-assisted development more generally. For sure, smaller steps helped. Prompting with examples of how specific refactorings work helped, and I’ve been doing that for quite some time.

Training & Mentoring is a Common Good

“Why should I train my developers? They’ll just leave.”

“Why is it so hard to find developers with the skills I need?”

Over a thirty-three-year career, I’ve heard variations on these questions many, many times. And, typically, from the exact same people.

Many businesses are reluctant to invest in developers because tenures tend to be shorter than the time it takes for that investment to pay off. By the time a junior turns into a genuinely productive professional developer – someone who can work largely unsupervised and create more value than they cost – chances are they’ll be doing that somewhere else.

The mental leap employers don’t seem able to take is understanding that “somewhere else” is, from a previous employer’s perspective, them.

Where did their productive developers come from? Did they emerge fully-formed from a college or a school or a boot camp? Or are they products of years of learning – on the job, from books, from courses, from each other, and so on – that somebody invested in?

Somebody took that hit. Whether they did it explicitly by paying for training and education or providing mentoring, or implicitly by shouldering the learning curve in everyday work, the only reason there’s any pasta sauce available at all is because somebody grew tomatoes.

I encourage employers to think of professional development in terms of paying it forward. The mindset that it should only be provided when there’s a direct – and often immediate – benefit has led to an industry of perpetual beginners.

Developers whose growth was stunted by a lack of investment in knowledge and skills set the example for inexperienced developers who are about to see their growth stunted in turn.

Because nobody sees it as their responsibility. It’s that patch of grass that doesn’t belong to anybody, so nobody cuts it, even though everybody complains about it.

Developers who are lucky enough to have lots of free time – usually young single men – may proactively hone their skills out of hours. But that’s not a strategy that scales to a $1.5 trillion-a-year profession. That would require more structure and more investment. A lot more. Imagine if your doctor was mostly self-taught in their spare time…

And it excludes a large part of the population who might well make great developers, but have young children, or care for elderly relatives, or volunteer in their communities, and can’t find the time to build skills that – let’s be honest now – are a bigger benefit for employers than for anybody else.

It shouldn’t be expected that the grass mows itself. By all means, if you can, then good for you. But the fact remains that a skilled software developer can be worth a lot of money to a business, and I don’t think it’s at all unreasonable to expect that business to chip in.

I see developers as shared resources. Over their career, they’ll likely bring value to a bunch of different enterprises. It’s rare that we stay in the same place for our entire useful dev life.

In that sense, I see training and mentoring developers as a common good. And I believe that, in the long term, while developers should definitely own their own learning journey, employers should be expected to contribute to it while they’re together.

It should be a collaboration that hopefully brings benefit to both parties directly, but more importantly takes a long-term view and brings wider benefits to a whole range of organisations over their careers.

One benefit in particular is how developers who received proper long-term training and mentoring – it’s no secret that I’m an advocate of structured 3-5-year apprenticeships – can go on to pass their knowledge and skills on to people coming into the profession.

Frankly, if that was the norm, we might be looking at a very different industry. And, for employers, the ripples you start will eventually find their way back to your shores when you’re hiring.

“Why is it so hard to find developers who can do TDD, refactoring and Continuous Integration?” It’s because so few get expert training and mentoring in them. Invest in your developers, and build the capability to rapidly, reliably and sustainably evolve working software to meet rapidly-changing business needs.

Is “AI-First” a Strategy, an Ideology, or a Performance?

I was recently observing a team doing their day-to-day work. Their C-suite had introduced an “AI-first” policy over the summer, mandating that development teams use “AI” as much as possible on their code.

Starting in November, this mandate turned into a KPI for individual developers, and for teams: % of AI-generated code in Pull Requests. (And, no, I have no idea how they measure that. But I understand that tool use is being tracked. More tokens, nurse!)

The underlying threat didn’t need to be said out loud. “Use this technology more, or start looking for a new job.”

Developers are now incentivised to find reasons to use “AI” coding assistants, and they’re doing it at any cost. All other priorities rescinded. Crew expendable.

By now, we probably all know Goodhart’s Law:

When a measure becomes a target, it ceases to be a good measure.

I have a shorter version: be careful what you wish for.

The history of software development is littered with the bones of teams who were given incentives to adopt dysfunctional behaviour.

The classic “Lines of Code”, “Function Points”, “Velocity” and other easily gameable measures of “productivity” have forced thousands upon thousands of teams to take their eyes off the prize – i.e. business outcomes – and focus their efforts on producing more stuff – output.

Introducing mandates about how that stuff must be produced is a step up the dysfunction ladder.

So I had the privilege of watching a Java developer write the following prompt, which I jotted down for posterity.

Please extract the selected block of code into a new method called 'averageDailySales'

Using their IDE, that would have been just Ctrl+Alt+M and a method name. And, importantly, it would have worked first time. They ended up taking a second pass to fix the missing parameter the new method needed.
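
For context, the change in question looked something like this – names invented, not the client’s code – and it’s exactly the kind of mechanical transformation the IDE performs in one keystroke, inferring the parameter and rewiring the call site as it goes:

    // Illustrative only -- invented names, not the client's code.
    class SalesReport {

        // Before: the average calculation sits inline in a bigger reporting method.
        double weeklySummaryBefore(double[] dailySales) {
            double total = 0;
            for (double sale : dailySales) {
                total += sale;
            }
            return total / dailySales.length;
        }

        // After Extract Method (Ctrl+Alt+M): the IDE infers the 'dailySales'
        // parameter and updates the call site -- the step the LLM's first pass missed.
        double weeklySummaryAfter(double[] dailySales) {
            return averageDailySales(dailySales);
        }

        double averageDailySales(double[] dailySales) {
            double total = 0;
            for (double sale : dailySales) {
                total += sale;
            }
            return total / dailySales.length;
        }
    }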

The whole 2-hour session was a masterclass in trying to cook a complete roast dinner in a sandwich toaster. The goal was very clearly not to solve the problem, but to use the tool.

I’m not saying that a tool like Claude Code or Cursor would add no value in the process. I’m saying that developers should be incentivised to use the right tool for the job.

But the “AI-first” mandate has encouraged some of the developers to drop all the other tools. They’ve gone 100% “AI”. No IDE in sight.

An Integrated Development Environment is a Swiss Army Knife of tools for viewing, navigating, manipulating (including refactoring), executing, debugging, profiling, inspecting, testing, version controlling and merging code. Well, the ones I use are, anyway.

Could IDEs be better? For sure. But when it comes to, for example, extracting a method, they are still my go-to. It’s usually much faster, and it’s much, much safer. I’ll take predictable over powerful any day.

Using refactoring as an example, if my IDE doesn’t have the automated refactoring I need – e.g., there’s no Move Instance Method in PyCharm – then I’ll let Claude have a crack at it, with my finger poised over the reset button.

Because my focus is on achieving better outcomes, I’ve necessarily landed on a hybrid approach that uses Claude when that makes sense – and, if you read my blog regularly, you’ll know I’m still exploring that – and uses my IDE or some boring old-fashioned deterministic command line tool when that makes sense. And, right now, that’s most of the time.

I feel no compulsion to drink exclusively from the firehose “just because”.

But then, I’m the only shareholder. And that’s probably what “AI-first” policies are really about: optics. There’s something about this that genuinely feels performative. It’s not about using “AI”, it’s about being seen to use “AI”. Look at us! We’re cutting edge!

There’s no credible evidence that “AI” 10x’s dev team productivity. But there’s plenty of evidence that it can 10x a valuation.

The fact that, according to the more credible data, the technology slows most teams down – less reliable software gets delivered later and costs more – doesn’t seem to matter.

It’s quite revealing, if you think about it. Perhaps it never mattered?

I contracted in a London firm that would proudly announce in each year’s annual report how much they’d invested in technology. It didn’t seem to matter what return they got on that investment, just as long as they spent that £30 million on the latest “cool thing”.

When my team tried to engage with the business on real problems, the push-back came from the IT Director himself. That, apparently, was “not what we do here”. We’re here to chew bubblegum and spend money. And we’re all out of bubblegum.

So, in that sense, t’was ever thus. But, as with all things “AI” these days, it’s a question of scale. Watching team after team after team drop everything to try and tame the code-generating firehose, while real business and real user needs go unaddressed, is quite the spectacle. It’s a hyper-scaled dysfunction.

Of course, eventually, reality’s going to catch up with us. I was interviewed for a Financial Times newsletter, The AI Shift, a few weeks ago, and it was clear that the resetting of expectations has spread far beyond the dev floor. People who aren’t software developers are starting to notice.

If, like me, you’re interested in what’s real and what works in developing software – with or without “AI” – you might want to visit my training and coaching site for details of courses and consulting in principles and practices that are proven to shorten lead times, improve reliability of releases and lower the cost of change.

I mean, if that’s your sort of thing.

And if you’re curious about what really seems to work when we’re using “AI” coding assistants, I’ve brain-dumped my learnings from nearly 3 years of experimenting with and exploring the code-generating firehose. You might be surprised to hear that it has very little to do with code generation, and almost everything to do with the real bottlenecks in development.

Then again, you might not.