In 1965, Moore coined his famous Law, and for decades the hardware architects in their sterile workshops dutifully enacted it. By the turn of the millennium, computer hardware performance had improved by a factor of one million. Yet, somehow, people felt that computers were getting slower. Programming language luminary Niklaus Wirth cut to the heart of the problem:
“Software is getting slower more rapidly than hardware becomes faster”
But why? Where was Moore’s Law for software? Another software legend, Edsger Dijkstra, thought the hardware had become too powerful and complex for software to control, a latter-day permutation of Babel’s Tower. He wrote:
“As long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.”
But it was also in 1965 that I.J. Good speculated that “an ultraintelligent machine could design even better machines”, an idea that has since become known as The Singularity. Now that we have, to some approximation, witnessed the rise of AI agents that can write software, has the software crisis been resolved?
No, emphatically. As evidence, I will cite an article, an anecdote and a symphony. Cue music.
Music?
Hit play, then rewind to one hundred years before Moore and Good. In 1865, the same year that Lincoln died, a new Schubert symphony premiered in Vienna. Never mind that the great composer had been dead for nearly 40 years; when the music began it was as if he were alive once more. One critic remarked:
“When ... clarinet and oboe in unison began their gentle cantilena above the calm murmur of the violins ... 'Schubert' was whispered in the audience”
More remarkable still: the work was unfinished. Did the audience mind? Many who heard it would later argue that Schubert had intentionally ended it as it was: perfect. Nevertheless, there have been many attempts to complete The Unfinished Symphony.
In 2019, three years before ChatGPT, Chinese technology company Huawei produced an AI model which “listened to the first two movements of Schubert’s Symphony… analysed the key musical elements that make it so incredible, then generated the melody for the missing third and fourth movement from its analysis”.
Was it any good? To the untrained ear, yes. It was definitely music of the right style, genre and niche. Disrobed of the Schubert mantle and situated within a film or a video game score, it might even have been called excellent.
Yet, another critic wrote:
“The final two movements communicate profound ignorance of autonomous art or artistic development. Grafted to provoke acclaim and applause, they are impression management at its worst”
The AI went brassy and bold when it should have gone personal and intimate. It quoted earlier figures inappropriately, reducing them to clichés. The structure shifted into a passionate tone when it should have stayed reserved and lyrical. To the expert, it was a disaster.
Soul of a new machine
“At the beginning of any musical composition is the intuition of voice or spirit.”
This was the explanation the critic gave for why Huawei’s AI had not succeeded. Without this intuition, no AI can be expected to produce a completion of Schubert that will convince experts. For a software system, is there an equivalent prerequisite?
Peter Naur, in his seminal 1985 paper “Programming as Theory Building,” argues that software engineering is the act of forming a theory about how certain aspects of the real world can be handled by a software system. It’s the idea of the system in the engineers’ heads that matters: the actual production of the working system is an incidental process that occurs while building this theory.
But music soars untethered in realms of fancy, and as such can be set to page, recorded, or otherwise considered finished. Software deals with the real world, famously capricious and inscrutable, and so is necessarily subject to unending change. How to change a software system?
First, define your requirements. Second, hand them to the engineers. Third, let marinate overnight. Fourth, walk by the R&D conference room. Inside, two seasoned engineers draw complicated figures on a whiteboard and argue intensely. The argument is strange: they are both technically correct in their proposed implementations, because both options fit the new requirements to the letter.
Where they differ is in their mental theory of the system. The argument will be resolved when they have hammered out this discrepancy to their satisfaction. The actual code change that follows is, essentially, trivial. They may even have an intern do it.
If both engineers are correct, why are they arguing? They both sense great danger. The problem is that a technically correct change can still make things worse. The engineers are each worried that the other’s proposed change will degrade the consistency of the program with respect to their mental theory of the system. That is to say, muddy it up, make it needlessly complicated, obscure important properties, add “tech debt”.
To illustrate with a contrived example: suppose one renames all the variables in the code to random words from the dictionary, but otherwise leaves the system to behave in the same way as before. On purely technical and logical grounds, no wrong has transpired. This is only a “bad” change with respect to the damage it does to the theory about the software that lives in someone’s head.
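In code, the contrivance looks something like the following Python sketch (the function and every name in it are invented for illustration). Both versions behave identically; only one of them supports a theory of what the computation is for.

    # Original: the names carry the theory of what the code is for.
    def prorated_refund(days_remaining, days_in_period, monthly_fee):
        # Refund the unused fraction of a monthly fee.
        return round(monthly_fee * days_remaining / days_in_period, 2)

    # After a "technically correct" rename: identical behaviour, theory destroyed.
    def ostrich(kettle, sandwich, parliament):
        return round(parliament * kettle / sandwich, 2)

    # No test suite will object: both return 5.0 for ten unused days of a
    # thirty-day period at a fee of 15.00. But the second version tells the
    # next reader nothing about refunds, periods or fees.
    assert prorated_refund(10, 30, 15.00) == ostrich(10, 30, 15.00)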
AI software engineers
Consider now the current crop of AI software agents that aim to get people to build software without dirtying their eyeballs by viewing actual source code. It has been widely discussed that they suffer from what some call the 70% problem. This is the observation that, given some handy prompting, current AI can produce an impressive demo that realizes roughly 70% of the original idea. In the case of very simple functionality or tool-like utilities, this may be all that is needed and everyone leaves happy. So where’s the problem?
The other 30%: when the user asks the AI to extend the solution to cover a more useful set of inputs and behaviors (to productionize it, in software engineering parlance), they eventually find that every change produces an unexpected and perplexing degradation in some other part of the system. It is one step forward, two steps back. At some point the user faces a choice: give up, or try to read and understand the code themselves (the very thing they hoped the AI would free them from having to do). How to explain this phenomenon?
AI is unquestionably able to rapidly provide code-correct solutions to the explicit requirements and directions, but it (for now) lacks an internal theory of how the program should correspond to reality. Thus the quality of an AI’s changes with respect to any coherent theory of the system is essentially unfocused and random. It’s as if, in addition to the required change, it decided to flip a few arbitrary bits of RAM somewhere.
The consequence is that every new AI change produces a modest degradation in the quality of the code. Bit by bit, the consistency, clarity and harmony of the code base erodes away, like an AM-radio signal fading to static. Eventually, new AI changes don’t work at all no matter how perfect your prompting. Further changes by humans become impossibly mired in this noise, every tweak hard-won through relentless trial and error. At some point, making changes is too costly to contemplate. The system is now dead.
A dead system can still run and do useful work. Potentially even for decades, in the case of the ancient tracts of FORTRAN and COBOL still reverently operated by big banks and government. Nevertheless, a dead system cannot be modified and so can no longer adapt to changes in reality. Its ongoing utility is tightly circumscribed to the bits of life that don’t change very much (death and taxes, etc.).
The owner of a dead system has three options: let it die, keep it running and never change it, or throw it away and start from scratch. Ultimately, if the system matters at all, someone is going to have to read some code.
AI-powered theories
Another thing no one doubts AI can do is read reams of source code and documentation very quickly. Why, then, can it not build up this theory of the system at the instant it is needed? Naur provides the explanation:
“For a new programmer to come to possess an existing theory of a program it is insufficient that he or she has the opportunity to become familiar with the program text and other documentation. What is required is that the new programmer has the opportunity to work in close contact with the programmers who already possess the theory, so as to be able to become familiar with the place of the program in the wider context of the relevant real world situations and so as to acquire the knowledge of how the program works and how unusual program reactions and program modifications are handled within the program theory. This problem of education of new programmers in an existing theory of a program is quite similar to that of the educational problem of other activities where the knowledge of how to do certain things dominates over the knowledge that certain things are the case, such as writing and playing a music instrument.”
If this is hard to believe (after all, software engineers would want others to believe they are indispensable), look no further than the facilities AI software agent tools provide their human users for supplying this theory of the system. Devin.ai calls it “Knowledge”, Cursor.ai calls it “Notepads”, and so on.
By the way, this same logic applies to other intellectual professions under threat from AI. For example, the value of the lawyer is not their ability to regurgitate statutes but rather their understanding of the correspondence between the law and reality, or what is often called interpreting the law.
So, what will it take for AI to become capable of good software engineering? What is needed is a way for humans and AI to collaboratively build a theory of a system. That is, the challenge is to build hybrid human-AI systems that can proactively pull signals about the system as it manifests in reality from its human operators, combine them with detailed analyses and syntheses of the source code and other documentation, and collect the result in a form that both human and AI can read, understand, verify, update and use.
The body of theory collected in this way is a kind of system encyclopedia, where statements are grounded in sources and references, all relevant information on a topic is present, and all relevant topics are covered. The latest information is represented, and updates are ideally automatic. The content is searchable by human operators and indexed for AI operators.
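Purely as an illustration (every field name and the example entry are hypothetical, not any particular tool’s format), a single entry in such an encyclopedia might be as small as this Python sketch:

    # One hypothetical entry in the "system encyclopedia": a claim about how the
    # system relates to reality, grounded in sources, readable by humans and
    # indexable for retrieval by an AI agent.
    from dataclasses import dataclass, field

    @dataclass
    class TheoryEntry:
        topic: str          # the part of the system the entry covers
        claim: str          # the statement connecting the code to the real world
        sources: list       # code paths, documents and tickets that ground the claim
        last_verified: str  # when a human or agent last confirmed the claim still holds
        tags: list = field(default_factory=list)  # index terms for search and retrieval

    entry = TheoryEntry(
        topic="billing/refunds",
        claim="Refunds are pro-rated by day; partial days round in the customer's favour.",
        sources=["billing/refund.py", "docs/refund-policy.md", "TICKET-4821"],
        last_verified="2025-01-15",
        tags=["billing", "refunds", "rounding"],
    )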
With this theory in hand, a kind of Chesterton’s fence for AI can be applied: don’t let an AI modify this code until it can explain how it works first. One could even ask an AI to come up with experimental changes simply to expand and improve the theory of the system.
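A minimal sketch of how such a gate could be wired up, again in Python and again entirely hypothetical: explain stands in for whatever mechanism asks the agent to account for the code it wants to touch, and consistent_with_theory for whatever check compares that account against the recorded entries.

    # A Chesterton's-fence gate for AI-proposed changes. Every name here is an
    # assumption for illustration, not any particular tool's API.
    def gate_ai_change(patch, theory_entries, explain, consistent_with_theory):
        # First make the agent answer "why is this fence here?" for the code it touches.
        explanation = explain(patch["files_touched"])
        if not consistent_with_theory(explanation, theory_entries):
            # The agent cannot account for the code as it stands: reject the patch
            # and escalate to a human operator instead of applying it.
            return "rejected", explanation
        # The explanation itself is useful output: it can be folded back into the
        # encyclopedia to expand the shared theory of the system.
        return "accepted", explanation

A rejection here is not wasted work; it points at a gap in the recorded theory that a human and the agent can then close together.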
It should be noted that even with this shared human-AI theory of the system in hand, the effective rate of changes to the system remains bounded by the ability of the humans to assimilate updates to it.
A closing thought
In 1986, one year after Naur published “Theory Building”, Fred Brooks published “No Silver Bullet”, in which he famously predicted that:
“There is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity”
There are, he argued, two kinds of complexity: the essential and the accidental. Accidental complexity arises from how a problem is solved, but is not inherent in the problem itself. For example, the fact that you get software by going to an app store and downloading it is accidental complexity: it has no bearing on whether or not the software itself solves your problem. Essential complexity is whatever is left over after you remove the accidental complexity.
As a class, software engineers have a fair grasp of accidental complexity, and developments like high-level languages, operating systems, and the Internet have made tremendous inroads against it. Brooks’ prediction in 1986 stemmed from his observation that no one had made any progress whatsoever on tackling essential complexity systematically.
So far, the latest wave of AI has helped whittle down sources of accidental complexity by writing and re-writing significant chunks of tedious, mundane code. Can it be applied to tackle essential complexity too? Perhaps, but then again, it has been observed before that “software systems grow faster in size and complexity than methods to handle complexity are invented”.
If it is the peculiar human affinity for music that protects the composer from the encroachments of AI, perhaps another human trait protects the software engineer: our insatiable urge to expand the scope of our software until it eats the world.
References
Programming as Theory Building, Peter Naur, 1985, https://pages.cs.wisc.edu/~remzi/Naur.pdf
Composers Are Under No Threat From AI, The Conversation, 2019, https://theconversation.com/composers-are-under-no-threat-from-ai-if-huaweis-finished-schubert-symphony-is-a-guide-111630
Symphony No. 8 'Unfinished', I. Allegro moderato in B minor, composed by Franz Schubert, 1822, performed by Fulda Symphonic Orchestra, 2000, https://en.wikipedia.org/wiki/File:Schubert_Symphony_No._8_%27Unfinished%27_-_1-_Allegro_moderato_in_B_minor.ogg
The 70% Problem, Addy Osmani, 2024, https://addyo.substack.com/p/the-70-problem-hard-truths-about
No Silver Bullet, Fred Brooks, 1986, https://www.cs.unc.edu/techreports/86-020.pdf