Once upon a time, in a magical land far, far way – just north of London Bridge – I wore the shiny cape and pointy hat of a senior software architect.
A big part of my job was to document architectures, and the key decisions that led to them.
Many diagrams. Many words. Many tables. Many, many documents.
Structure, dynamics, patterns, goals, principles, problem domains, business processes. You name it, I documented it. Much long time writing them, much even longer time keeping them up-to-date. Descriptions of things that aren’t the things themselves, it turns out, go stale fast.
It only takes a couple of gigs to become suspicious that maybe, perhaps, nobody actually reads these documents. So I started keeping the receipts. I’d monitor file access to see when a document was last pulled from our DMS or shared server, or when Wiki pages were last viewed.
My suspicions were confirmed. Teams weren’t looking at the documents much, or – usually – at all. Those fiends!
In their defence, as the saying goes:
Documentation’s useful until you need it
(I looked up the source of this anonymous quote, which I’ve used often. Turns out it’s me. LOL. Why didn’t I remember? I probably wrote it in a document.)
A lot of architecture documentation is out-of-date to the point where it becomes misleading. Code typically evolves faster than one person’s ability to keep an accurate record of the changes.
A fair amount was never in-date in the first place because it describes what the architect wanted the developers to do, and not what they actually did.
The upshot is that architecture documentation is rarely an accurate guide to reality. It’s either stale history, or creation myths.
Recent research found that familiarity with a code base is a far better predictor of a developer’s comprehension of the architecture than access to documentation, and the style of the documentation didn’t seem to make any significant difference.
When I think back to my times as developer rather than architect, that rings true. I’d often skim the documentation – if I looked at it at all – then go look at the code. Because the code is the code. It’s the truth of the architecture.
For architecture documentation to be more useful in aiding comprehension, it has to be tightly-coupled to the reality it describes. Changes to the architecture need to be reflected very soon after they’re made – ideally, pretty much immediately.
I experimented with round-trip modeling of architecture quite deeply, reverse-engineering code on every successful build, producing an updated description for every working version.
But that, too, produced web documentation that I know for a fact almost never got looked at. Least of all by me, as one of the developers.
Reverse-engineering code tends to produce static models – models of structure. I do not find purely structural descriptions very helpful in understanding what code does. It only describes what code is.
If a library has documentation generated from JavaDoc, I’ll tend to ignore that and look for examples of how the library can be used.
When I want to understand a modular design, I’ll look for examples of how key components in the architecture interact in specific usage scenarios – how does the architecture achieve user outcomes?
Quick war story: I joined a team in an investment bank who’d been working in isolation on different COM components. The Friday before our launch date, we still hadn’t seen how these very well-specified components – we all got handed detailed Word documents for our parts – would work together to satisfy key use cases.
Long-story-short, turns out they didn’t. Not a single end-to-end use case scenario.
Every part of the system was described in detail. But the system as a whole didn’t work, and we couldn’t see that until it was too late.
This is why I start with usage scenarios. You can keep your documentation – I’m gonna set breakpoints, run the damn thing, and click some buttons so I can step down through the call stack and build a picture of the real architecture, as it is now, in functional slices.
And I would have damned well designed it that way in the first place, starting with user outcomes and working backwards to an architecture that achieves them.
So, a dynamic model driven by usage examples is preferable in aiding my comprehension of architecture to structural models of static elements and their compile-time relationships.
One neat way of capturing usage examples is by writing them as tests. This can serve a dual purpose, both documenting scenarios as well as enabling us to see explicitly if the software satisfies them.
Tests as specifications – specification by example – is a cornerstone of test-driven approaches to software design. And it can also be a cornerstone of test-driven approaches to software comprehension.
When tests describe user outcomes, we can pull on that thread to visualise how the architecture fulfils them. That so few tools exist to do that automatically – generate diagrams and descriptions of key functions, key components, key interactions as the code is running (a sort of visual equivalent of your IDE’s debugger) – is puzzling. Other fields of design and engineering have had them for decades.
As for understanding the architecture’s history, again I’ll defer to what actually happened over what someone says happened. The version history of a code base can reveal all sorts of interesting things about its evolution.
Of course, the problem with software paleontology is that we’re often working with an incomplete fossil record. Some teams will make many design decisions in each commit, and then document them with a helpful message like “Changed some stuff, bro”.
A more fine-grained version history, where each diff solves one problem and every commit explains what that problem was, provides a more complete picture of not just what was done to the design, but why it was done.
e.g., “Refactor: moved Checkout.confirm() to Order to eliminate Feature Envy”
Couple this with a per-commit history of outcomes – test run results, linter issues, etc – and we can get a detailed picture of the architecture’s evolution, as well as the development process itself. You can tell a lot about an animal’s lifestyle from what it excretes. Or, as Jeff Goldblum put it in Jurassic Park, “That is one big pile of shit”.
Naturally, there will be times when the reason for a design decision isn’t clear from the fossil record. Just as with code comments, in those cases we probably need to add an explanation to that record.
This is my approach to architecture decision records – we don’t record every decision, just the ones that need explaining. And these, too, need to be tightly-coupled to the code affected. So if I’m looking at a piece of code and going “Buh?”, there’s an easy way to find out what happened.
Right now, some of you are thinking “But I can just get Claude to do that”. The problem with LLM-generated summaries is that they’re unreliable narrators – just like us.
I’ve been encouraging developers to keep the receipts for “AI”-generated documentation. It’s pretty clear that humans aren’t reading it most of the time, and when we do, it’s at “LGTM-speed”. We’re not seeing the brown M&Ms in the bowl.
Load unreliable summaries into the context alongside actual code, and models can’t tell the difference. Their own summaries are lying to them. Humans, at least, are capable of approaching documentation skeptically.
Whenever possible, contexts – just like architecture descriptions intended for human eyes – should be grounded in contemporary reality and validated history.
And talking of “AI” – because, it seems, we always are these days – one area of machine learning that I haven’t seen applied to software development yet might lie in mining the rich stream of IDE, source file and code-level events we give off as we work to recognise workflows and intents.
A friend of mine, for his PhD in Machine Vision, trained a model to recognise what people were doing – or attempting to do – in kitchens from videos of them doing it.
Such pattern recognition of developer activity might also be useful to classify probable intent, and even predict certain likely outcomes as we work. At the very least, more accurate and meaningful commit messages could be automatically generated. No more “Changed some stuff, bro”.
At the end of the (very long) day, comprehension is the goal here. Not documentation. If a document is not comprehended – and if nobody’s reading it, then that’s a given – then it’s of no help.
And if it’s going to be read, it needs to be helpful in aiding comprehension. Like, duh!
Personally, what I would find useful is not better documentation, but better visualisation – visualisation of static structure and dynamic behaviour, of components and their interactions, and of the software’s evolution.
And visualisation at multiple scales from individual functions and modules all the way up to systems of systems, and the business processes and problem domains they interact with.
And when we want the team to actually read the documentation, we need to take it to them. Diagrams should be prominently displayed (I’ve spent a lot of time updating domain models to be printed on A0 paper and hung on the wall), explanations should be communicated. Ideally, architecture should be a shared team activity – e.g., with teaming sessions, with code reviews, with pair programming – and an active, ongoing, highly-visible collaboration.
Active participation in architecture is essential to better comprehension. Doing > seeing > reading or hearing.
That also goes for legacy architecture – active engagement (clicking buttons, stepping through call stacks in the debugger, writing tests for existing code) tends to build deeper understanding faster than reading documentation.
The fastest way to understand code is to write it. The second fastest is to debug it.
This is especially critical in highly-iterative, continuous approaches to software and system architecture, where decisions are being made in real-time many times a day. Without a comprehensible, bang-up-to-date picture of the architecture, we’re basing our decisions on shaky foundations.
Like an LLM generating code by matching patterns in its own summaries, we risk coming untethered from reality.
Which would be an apt description of most architecture documents, including mine.