24 July 2024

On software as an "in-discipline"

The nth-order effects of the recent CrowdStrike fiasco ^[1][2][3] will unfold over time. As it stands, it is apparently the single biggest global “tech outage” ever, which has already disrupted everything from airlines to railways to hospitals to financial systems amongst numerous others—globally.

This particular incident seems to have been an unfortunate combination of bugs, process failures, and a botched remote update, among other unknown factors, that triggered catastrophic behavior in a particular piece of software. Within seconds of the update being pushed from a central location, servers and devices, many running critical systems, dropped like flies the world over. There will be enough written on this particular incident, but here, I would like to discuss some long-lingering thoughts on the nature of software as a discipline, or rather, in-discipline.

What we are witnessing are the obvious perils of having a highly centralized, proprietary piece of software running on millions of devices across industries and sectors around the globe. That too, something that runs in the operating system’s kernel space with root access, is always connected to the internet, can be remotely controlled by a single entity, and also auto-updates silently. What could possibly go wrong? That this is the preferred and widely accepted model for “enterprise endpoint security” globally shows how nascent, naive, and immature software technology as a discipline is. As a software developer myself, I guess I too am, in one way or another, complicit in the overall state of things.

Despite best efforts, every system with increasing complexity will misbehave in unexpected ways. This painful reality is something every reasonably experienced software engineer and developer knows all too well. This is true for every piece of software ever written, and will continue to be so until the day there are magical systems that can formally verify code, examine all possible execution paths, and simulate and test every possible parameter and condition around it. This, unfortunately, is the very nature of complex systems. But should one mistake or a mishap at one organisation somewhere in the world adversely affect, that too instantly, entire sectors and industries across the globe? Is there no other way?

This stuff is going to happen again and again in ways we have not imagined yet for decades to come, thanks to the fundamental nature of software and our mental models of it.

Software as a discipline

“Internet-scale” digitisation and technology, underpinned by a massive globally distributed maze of intertwined systems, has been around for barely two decades. Actually, less than that even. The Web 2.0 era came and went in the mid-2000s. In the last decade and a half, interconnected last-mile dependency on software has exploded exponentially in all directions, even more so with governments pushing for essential citizen services to be not only digital, but online. The global internet penetration graph (displayed below) is a good proxy for understanding this.

Take some other disciplines for example, say, civil engineering and medicine. Humans have been building megastructures like towers and bridges for many millennia. Medicine has been practiced in various (ghastly) forms for even longer periods of time. It’s only in the last century or so, with the application of the scientific method, that civil engineering and medicine have been formalized and structured into their current “modern” forms. This is after innumerable trial-and-error iterations over centuries, and these disciplines continue to evolve and improve on a regular basis. And yet, these disciplines also witness catastrophic errors of varying scales from time to time. There are numerous such disciplines that civilisation depends on dearly for its very existence.

How about software as a discipline, where civilisation has come to depend on increasingly complex globally interconnected systems? How old is this discipline exactly again? And how well refined and time tested is it?

Extrapolating, on a historic scale, software as a technology and a discipline must probably be at the same point as the printing press was in the 16th century. It will be decades, if not centuries, before it is truly mastered and the true depth of its societal and civilisational implications is understood. And like with every other human endeavor, it will be through numerous mistakes and catastrophes. With software, humanity is going with the flow with practically zero foresight, our hallmark trait. I mean, it has only taken a few decades of unbridled industrial “progress” to push the planet beyond the brink of human-made climate change and biosphere collapse. We are surely going to do large-scale digitisation, which seemingly does not have any physical constraints, really thoughtfully.

As I said earlier, as a software developer (who enjoys writing software), as a technologist, by virtue of being involved in the running of a (financial) technology enterprise, I must also be contributing to the overall state of affairs somehow.

Proliferation

Even things with hard physical constraints, everything from megastructures to all kinds of hardware, personal appliances to automobiles, have proliferated exponentially in the last century. More so, in the last few decades. So, it is no surprise that digitisation with software, which seemingly has no physical constraints for development or deployment, has taken over the world in no time. With consumer software and network effects at play, it has reshaped billions of lives practically overnight. Software can be remixed, repurposed, and combined in infinite ways. Specific software is written to create and proliferate more software. Software can be transmitted, copied, and distributed perfectly with zero loss, instantly, and at negligible cost, around the world. What an absolute marvel! Its long-term implications for the world though? I do not think we have even begun to understand them. Bytes do not abate; they just proliferate and proliferate.

Infinite ways

How many ways are there to construct a bridge? Or to perform heart surgery? Or to print a book? Or to make a knife? One? A few? Many? Numerous? How many complex systems out there, apart from software, have infinite valid and practical ways of being constructed? Given a reasonably complex software system, the likelihood of just two developers diverging in their approach to its development is already extremely high. From the choice of programming languages, frameworks, libraries and plugins, dependencies, architectural patterns, ways of modeling and organising data, coding conventions and styles, patterns for organising code and modules, and naming conventions, to the algorithms and logic used to solve specific problems, the possible permutations and combinations are infinite. And to top it off, the said languages, dependencies, frameworks (most software in general) are in a state of constant flux. There are bug fixes, new features and improvements, changes of all kinds. What about the dependencies of those dependencies? They also change. And we are only talking about the lifecycle of a single software system.

Speaking of developers, every time one works on a new project, logical decisions, choice of dependencies, the style and way of organising the system, naming modules, all change ever so slightly (and sometimes, drastically). And this is with projects and systems demanding objective, predictable, and often, very strict outcomes, unlike subjective endeavors such as art and music. If there are infinite inconsistent ways of constructing something complex where it and its ingredients change unpredictably over time, then infinite possibilities of making subtle mistakes with unpredictable consequences exist. Remember, software can be copied, replicated, and distributed instantly and globally, and with it, any mistakes.

And this is considering the best-case scenario, assuming high competency, experience, knowledge, skills, and a good environment for everyone involved.

Infinite opinions

Going back to the same examples, how many opinions would a surgeon get from the general populace on how to perform heart surgery? A civil engineer on the construction of a bridge? Now, how many opinions and suggestions does a piece of software attract? From users, friends, family, technical and non-technical people, people from all walks of life, the management in an organisation, from random strangers—when it comes to software, there are infinite opinions (and demands).

For the developers out there: Why can’t you just move this button up there? Can you add this feature that I really want? Why not just remove this thing and add that new other thing? Too much whitespace. Why can’t you make it grey? When I click a button here, can you make it do X other things in Y other places instantly? Why not just add some AI to make it do X? Why not just fix all the bugs so that there are no new bugs? … ad infinitum. Everybody has an opinion. I do too, but at least reasonably qualified, I tell myself.

It must be the non-physical nature of software that creates this almost universal mental model and perception of software. The assumption that because it is non-physical, anything must be possible, only limited by a developer’s imagination, like the legend at zombo.com once said. I have deep sympathy for tech teams out there who have to work on trying to implement the whims, fancies, and unqualified demands of non-sensible, non-technical management above them. People who, despite knowing the futility and irrationality all too well, are forced to go on wild goose chases, introducing changes to complex software systems. Changes which, in turn, increase the odds of introducing mistakes with unknown consequences. Infinite opinions add significant fuel to this change. It will be eons before our mental model of software and its implications evolves, before software matures as a discipline.

Change in constituents, change in conditions

If changes to the constituents and ingredients of a complex software system do not get to it, then the changes in its conditions most definitely will. A web service that was designed to handle a few hundred hits a minute suddenly gets tens of thousands of hits. A text editor that was built for writing text suddenly has to open a 1 GB file. A photo resizing utility has to now resize a 1 gigapixel photo. A web page that was designed for the desktop is opened on screens of all shapes and sizes.

Code does not have to change. The change in one or more conditions, generally unforeseen, will cause it to misbehave. Is there a framework that allows one to see all possible scenarios that the conditions surrounding a system will undergo? Software as a discipline has no system or methodology to evaluate or simulate most such conditions, often, even obvious ones. It is almost always entirely dependent on the level of experience and perspectives of the people working on the system. There are no formal methods for verifying the correctness of complex systems end to end. Any methods or tools that do attempt this do a very limited job, and are generally only able to evaluate the constituents, but not the conditions. One hopes for magical future systems capable of doing that.

So, what does it all mean? What ought we to do? Where is it all headed? I, for one, do not know. I do not think anybody knows or can know either. We have generally been unable to accurately predict the evolution of physical systems. For instance, how human flight went from the tests at Kitty Hawk, to commercial transatlantic aviation in large-bodied planes, to human spaceflight in no time. Or how the rise of industrialization led to the massive loss of biodiversity and planetary biosphere collapse. How well can we then predict the evolution of physically unconstrained information and software systems that are proliferating exponentially?

What I do know is that being skeptical and sticking to first principles can significantly reduce the probability of future gotchas and catastrophes. Accept that we do not know the future and that there is a high probability of us running off a cliff collectively. And, there are always going to be gotchas and catastrophes. It is inherent in any complex system, including nature, and the universe itself. “Catastrophe”, though, is a purely anthropocentric idea.

It is often difficult to figure out which practices are truly good, until they go horribly wrong. Software as a discipline is already littered with “planet scale” lessons in a short span of time. Even if one is unable to surmise what is truly good, there are common sense approaches that reduce the probability of cascading failures: open source, decentralisation, interoperability, federation, open specs and standards, etc. Conversely, there is enough evidence pointing to what is not good:

Rampant, massive centralisation of systems and dependencies.
Locking humans and systems into proprietary, centralised “walled gardens”.
Forcing, coaxing, and making people’s lives dependent on networked digital systems.
Top-down push for “digitisation” of every random thing. Who is excited about downloading yet another app, giving away private information, and scanning yet another QR code? (╯°□°）╯︵ ┻━┻
Centralised amassing of data on humans and devaluing privacy and agency for convenience, “efficiency”, and vague, nonsensical notions of “personalised experiences”.
The myopic view that technology is the silver bullet solution to all societal problems.
Regulatory and policy stances that force organisations to adopt solutions as a compliance checkbox exercise without technical nuance.
Setting the stage that enables one mistake in one place to ripple and cause catastrophes all across.
Enabling ignorance, incompetence, and intellectual dishonesty to take technical decisions that impact lives.

… and on and on.

I love tinkering with software. But, I also know for certain that “Anything that can go wrong, will go wrong” (Murphy’s law). It is the very nature of complex systems. The absolute minimum one can do is to be cautious and accept the fact that it will be eons before software as a discipline evolves and matures. That, as humanity, we have not even begun to understand software well enough to build such deep-running dependencies with obvious disastrous consequences. Beyond software, engineering, or code, one has to zoom out and take a civilisational view of it for better perspective.