Lehman on Software, Models and Change

The modeled and evolving quality of software comes to the surface when thinking about software maintenance. A classic paper on this is Lehman’s 1980 paper Programs, life cycles, and laws of software evolution, which lays out the territory with great clarity, then confidently strides off in the completely wrong direction.

Model

Lehman introduces a distinction between 

  • S-programs, which implement a formal mathematical (s)pecification, such as a solver for the travelling salesman problem,
  • P-programs, which solve some messy (p)roblem arising in the world, such as actually scheduling real salespeople with their ambiguities and partly-known preferences, and
  • E-programs, which are (e)mbedded in and change the world they directly model, as in air-traffic control.

For P-programs and E-programs, “the acceptability of a solution is determined by the environment in which it is embedded.” The distinction is in the program’s relationship with its origin story: between top-down and bottom-up; transcendence and immanence.

Lehman goes on to note that P-programs are also in a feedback loop arising from their use in the world. Their execution is observed, even lived, by their users, and this results in demand for change.

This is a cybernetic view, though Lehman doesn’t use the terminology. The paper sketches some more complicated loops, particularly where a specification intermediates between the P-program and the world. It is that intermediation, rather than feedback, that is foregrounded in the difficult and famous statement on code:world relations:

Any program is a model of a model within a theory of a model of an abstraction of some portion of the world or of some universe of discourse.

Lehman drops this on page two, before defining S-, P- or E-programs, and never gets around to defining theory or model, or otherwise directly elaborating; disconnected sagelike pronouncements were an expected feature of software engineering papers of the time. Cook (and a team including Lehman) later linked this to the social process of Kuhn’s paradigm shifts – renaming P-programs to (p)aradigm-programs and E-programs to (e)volving-programs.

Weisberg’s work on the crucial role of models in science could also help. For Weisberg, a theory maps a model to the world through (mostly explicit) construals. This plays a similar role to “abstraction” in Lehman’s definition. (Bit more on Weisberg here.)

It’s also worth throwing Naur’s “Programming as Theory Building” into the mix, though his paper does not make much distinction between model-as-code and theory-as-code.

Lehman also introduces “laws” of software evolution, which did have some empirical basis, but appear hard to reproduce. They might be compared to more recent work on meaningful descriptive code metrics, or properties of software as a material.

 

The Rivers and The Lakes That You’re Used To

After accelerating through insight after insight into the fluid and evolving nature of software, Lehman closes off the theory section by casually inventing microservices (in 1980), then taking a screaming left turn at the Process Street T-Junction, crashing through a railing and tumbling over a cliff. For over that cliff flows a process waterfall, and in the structured programming era, there’s nothing more attractive.

Like the rest of the structured programming crowd, he has factory envy: “An assembly line manufacturing process is possible when a system can be partitioned into subsystems that are simply coupled and without invisible links … Unfortunately, present day programming is not like this.” Lehman goes on to emphasize the care and structure needed when writing separate elaborate requirement and technical specifications. You get the idea. The remaining process recommendations I’m just going to skip.

It is easy to be wise after the fact in 2019. Agile practices and literal miniature software assembly lines (continuous build infra) now exist, and have made us much more sensitive to the damage done by scope size and delivery delay in large software systems. Trying to solve complex problems with more upfront planning was a high modernist worldview going out of fashion, but still very much in the intellectual water in 1980: Lehman gave a lecture in 1974 referencing both city planning and the Club of Rome report Limits to Growth. Perhaps it would be fairer to point out that thinkers who advocated short simple changes as a response to complex systems – like Jane Jacobs, or John Boyd and his OODA loop – were always tacking into the intellectual wind.

References

Cook, S., Harrison, R., Lehman, M.M. and Wernick, P.: ‘Evolution in software systems: foundations of the SPE classification scheme’, Journal of Software Maintenance and Evolution: Research and Practice, 2006, 18, (1), pp. 1-35
Lehman, M.M., “Programs, cities, students – Limits to growth?”, Inaugural Lecture, May 14, 1974, ICST Inaugural Lecture Series, vol. 9, pp. 211-229, 1970-1974; also in Programming Methodology, D. Gries, Ed. New York: Springer-Verlag, 1979, pp. 42-69.
Lehman, M. M. (1980). Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, 68(9), 1060-1076.
Naur, P. (1985). Programming as theory building. Microprocessing and microprogramming, 15(5), 253-261.
Weisberg – Simulation and Similarity.

Ancillary SYN-ACK

Ancillary Justice is a cyborg soldiers and AI spaceships novel (with complications) in a space opera setting, built around a soldier called Breq. Think Iain M Banks but with the Roman Empire instead of plush toy communism. The complications are both cool and fundamental to the characters, and I won’t spoil the slow reveal of the first book here, even though it’s all over the web.

The trilogy is completed with Ancillary Sword and Ancillary Mercy. After the galaxy-spanning wandering of the first book, racing towards the capital, the second and third books focus back on a particular system. Breq muscles in as a fleet captain of a capital ship. Aliens, ships and stations join the cast of characters. Interpersonal and gunboat diplomacy ensue.

The heavy Space Roman Empire vibe of the first volume evolves into something a bit more Space Girls Boarding School Naval Academy in the later books. Though both have their virtues, I always tend to favour first books, and the thick vertigo of new ideas is denser in Ancillary Justice than in the other two volumes. I still devoured all three at speed and with pleasure.

The Ancillary trilogy is, at some level, network space opera, about synchronization, replication, latency and packet corruption. The empire exists because it successfully replicates itself over distance and time. And then it stops: packet loss and fragmentation.

An Anatomical Sketch of Software As A Complex System

As intellectually awkward artifacts that open up new capabilities, that are surprising, frustrating and costly in other ways, and that regularly confound our physical intuitions about their behaviour, software systems meet an everyday language definition of complexity. A more systematic comparison, presented here, shows a significant family resemblance. Complexity science studies features and techniques common to a number of fields, and using that framework to analyze software engineering could allow a more precise technical understanding of these software problems.

This isn’t a unique thought. Various approaches, such as David Snowden’s Cynefin framework, have used complexity science as a source of insight on software development. Herbert Simon, in works like “Sciences of the Artificial”, helped build complexity science, with software programs and Good Old Fashioned AI as reference points. Famous papers such as Parnas et al’s “The Modular Structure of Complex Systems” also point the same way. When I was introduced to this material myself, though, I missed having a more recent reference that lined up these features of complex systems with modern software in a brief and systematic way. These notes attempt that in the form of an anatomical sketch.

This note considers software systems of many internal models and at least thousands of lines, rather than shorter programs analysed in formal detail. This places it more under software engineering than formal computer science, without intending any strict break from the latter. Likewise, by default, it addresses consciously engineered software rather than machine learning. This complexity differs from algorithmic processing time complexity as captured by O(x) notation, though there may be interesting formal connections to be explored there too.

 

Anatomical Sketch

Ladyman et al give seven features of complex systems, and I’ve added one more from Crutchfield.

1. Non-linearity

Software exhibits non-linearity in the small and in the large. Every ‘if’ condition, implicit or explicit, is a branch point where a small change in input or state can produce a discontinuously different output. This is most obvious in response to unexpected input or state: error and exit, segmentation fault, stack trace, NullPointerException.
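
As a toy sketch (an invented example, not one from Ladyman et al), a single threshold branch is enough to make output discontinuous in its input, and unexpected input becomes an abrupt exit:

```kotlin
// Invented example: one branch makes output discontinuous in the input,
// and unexpected input turns into an abrupt halt.
fun shippingCost(weightKg: Double): Double {
    require(weightKg >= 0) { "negative weight" }   // unexpected input: error and exit
    return if (weightKg < 20.0) weightKg * 1.5     // one pricing regime...
    else weightKg * 1.5 + 50.0                     // ...and a jump at the threshold
}

fun main() {
    println(shippingCost(19.99))   // ~30.0
    println(shippingCost(20.00))   // 80.0: a tiny change of input, a large jump in output
    println(shippingCost(-1.0))    // throws IllegalArgumentException
}
```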

2. Feedback

From a use perspective, many software systems are part of a feedback loop with their users and the world, and this feedback often involves internal software state.

From an engineering perspective, all software systems beyond a trivial size are built in cycles where the current state of a codebase is a rich input into the next cycle of engineering. This is true whether iterative software development methodologies are used or not. For instance, consider bug fixes resulting from a test phase in waterfall.

3. Spontaneous Order

Spontaneous order is not a feature of large software systems. If anything, the usual condition of engineering large software systems is constantly and deliberately working to maintain order against a tendency for these systems to increase in entropy and decay into complicated disorder. The ideas of ‘software crisis’ and ‘technical debt’ are both reactions to a perceived lack of order in engineered software.

4. Robustness and lack of central control

In the small, or even at the level of the individual system, software tends to brittleness, as noted above. Robustness, being “stable under perturbations of the system” (Ladyman), must be specifically engineered in by considering a wide variety of inputs and states and testing the system under those conditions. However, certain software ecosystems such as the TCP/IP substrate of the Internet display great robustness. Individual websites go down, but the whole Internet or World Wide Web tends not to. This is related to the choice of a highly distributed architecture based on relatively simple, standard, protocols and design guidelines like Postel’s principle (be tolerant in what you accept and strict in what you send). Like a flock of birds, the lack of central control makes the system tolerant of local failure. High availability systems make use of similar principles of redundancy.
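
A minimal sketch of Postel’s principle at the scale of a single function (hypothetical names, nothing to do with TCP itself): tolerate many input spellings, but emit exactly one canonical form.

```kotlin
// Toy, hypothetical sketch of Postel's principle at function scale:
// be tolerant in what you accept, strict in what you send.
fun parseFlag(raw: String): Boolean? =
    when (raw.trim().lowercase()) {      // tolerate case and stray whitespace
        "yes", "y", "true", "1" -> true
        "no", "n", "false", "0" -> false
        else -> null                     // unknown input: refuse to guess
    }

fun renderFlag(value: Boolean): String = // always emit one canonical wire form
    if (value) "true" else "false"

fun main() {
    println(parseFlag("  YES "))              // true
    println(parseFlag("0"))                   // false
    println(renderFlag(parseFlag(" y ")!!))   // "true"
}
```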

5. Emergence

Software systems tend not to exhibit emergent behaviours as highly visible features of the system, in the way that, say, a flock of birds assumes a particular overall shape once each bird follows certain simple rules about its position relative to its neighbours. Certain important non-visible features are emergent, though. Leveson, in Engineering A Safer World, argues that system safety (including software) is an emergent feature: “Determining whether a plant is acceptably safe is not possible, for example, by examining a single valve in the plant. In fact, statements about the ’safety of the valve’ without information about the context in which that valve is used are meaningless.” Difficult bugs in established software systems are often multi-causal and emerge from systemic interactions between components rather than isolated failures.

Conway’s law, the observation that a software system’s internal component structure mirrors the team structure of the organisation that created it, describes system shape emerging from social structure without explicit causal rules.

6. Hierarchical organisation

Formal models of computation did not originally differentiate between parts of a program; for instance, Turing machines and Church’s lambda calculus do not even distinguish between programs and data. Many of the advances in software development have, by contrast, been tools for structuring programs into hierarchies and differing levels of abstraction. A reasonable history of programming could be told simply through differentiated structure, e.g.:

  • Turing machines / Church lambda calculus
  • Von Neumann machine separation of program, data, input, output
  • MIT Summer Session Computer: named instructions
  • Hopper: the A-0 compiler and reusable subroutines
  • Backus: FORTRAN, with distinct control structures such as IF and DO
  • Parnas: module decomposition through information hiding
  • Smalltalk object orientation
  • Codd: relational databases
  • GoF design patterns
  • Beck: xUnit automated unit testing
  • Fowler: refactoring for improved structure
  • Maven: systematic library dependency management

Navigating program hierarchy from user interface through domain libraries to system libraries and services is a significant, even dominant, proportion of modern programming work (from personal observation, though a quantified study should be possible).
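
A compressed sketch of that layering, with invented names throughout: presentation code calls a domain library, which in turn calls down to a stand-in for a system service.

```kotlin
// Invented names throughout; a compressed sketch of the layers a programmer
// traces when following one feature through a system.

// Lowest layer: a stand-in for a system library or storage service.
object RecordStore {
    private val rows = mapOf("p1" to "status=admitted", "p2" to "status=discharged")
    fun read(id: String): String = rows[id] ?: error("no record for $id")
}

// Domain layer: hospital rules in domain vocabulary, no storage details.
object Admissions {
    fun isAdmitted(patientId: String): Boolean =
        RecordStore.read(patientId).contains("status=admitted")
}

// UI layer: presentation only.
fun banner(patientId: String): String =
    if (Admissions.isAdmitted(patientId)) "In hospital" else "Discharged"

fun main() {
    println(banner("p1"))   // In hospital
    println(banner("p2"))   // Discharged
}
```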

7. Numerosity (Many more is different)

The techniques for navigating, designing and changing a codebase of hundreds of classes are different from those for a short script, at least partly due to the limitations of human memory and attention span. An early recognition of this is Benington’s Production of Large Computer Programs; a more recent one is Feathers’ Working Effectively With Legacy Code. Feathers states: “As the amount of code in a project grows, it gradually surpasses understanding”.

8. Historical information storage

“Structural complexity is the amount of historical information that a system stores” according to Crutchfield. This is relevant for both use- and engineering-time views of software systems.

In use, the amount of state stored by a software system is historical information in this sense. An example might be a hospital patient record database. A subtlety here is that suggested measures of complexity based on amounts of information (such as Kolmogorov complexity) tend to specify maximum compression: simply allocating several blank terabytes of disk isn’t enough. This also covers implicit forms of complexity, such as dependencies in code on particular structures in data. Contrast a hospital database alone (just records and basic SQL) with the same database together with software that provides a better user interface and imposes rules on how records may be updated to suit the procedures of the hospital.
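
The compression point can be made crudely concrete (my sketch, and only a proxy, not Crutchfield’s actual measure): deflate a blank buffer, a buffer of structured records, and random noise, and compare the compressed sizes.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.Deflater
import kotlin.random.Random

// Crude sketch only: compressed size as a rough proxy for how much
// information a blob of state actually stores.
fun compressedSize(data: ByteArray): Int {
    val deflater = Deflater(Deflater.BEST_COMPRESSION)
    deflater.setInput(data)
    deflater.finish()
    val out = ByteArrayOutputStream()
    val buffer = ByteArray(4096)
    while (!deflater.finished()) out.write(buffer, 0, deflater.deflate(buffer))
    deflater.end()
    return out.size()
}

fun main() {
    val blank = ByteArray(1_000_000)   // "several blank terabytes", scaled down
    val records = "id=42|name=A. Patient|ward=7;".repeat(30_000).toByteArray()
    val noise = Random(0).nextBytes(1_000_000)

    println(compressedSize(blank))     // tiny: blank space stores almost nothing
    println(compressedSize(records))   // larger: real, if repetitive, structure
    println(compressedSize(noise))     // largest: a reminder the proxy is crude, since noise isn't meaning
}
```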

Source control changes present the same problem of historical information at engineering time. In practice, when extending or maintaining a system, classes are rarely replaced wholesale or deleted; new classes are added or existing classes modified to add functionality. The existing code is always an input to the new state of the code for the programmer making the change, even if the existing code is left untouched. Welsh even declared, in a paper of that name, that “Software is history!”

The result, regardless, is increasing historical information in a codebase over time, and therefore complexity.

 

References
Conway – How Do Committees Invent? Datamation 1968
Crutchfield – Five Questions on Complexity, Responses
Feathers – Working Effectively With Legacy Code, 2006
Ladyman, Lambert, Wiesner – What Is A Complex System?
Leveson – Engineering a Safer World, Chapter 3 p64, 2011
Parnas – On The Criteria To Be Used in Decomposing Systems into Modules, Communications of the ACM, 1972
Parnas, Clements, Weiss – The Modular Structure of Complex Systems, IEEE Transactions on Software Engineering, 1985
Postel –  RFC 761 Transmission Control Protocol https://tools.ietf.org/html/rfc761 https://en.wikipedia.org/wiki/Robustness_principle
Simon – Sciences of the Artificial
Snowden and Boone – A Leader’s Framework For Decision Making (Cynefin)
Welsh – Software Is History!

Accelerationism: A Brief Taxonomy

It is a moment of pause for the theory of accelerationism. The burst of self-identifying activity over the last few years has cycled into something of a bear market, even as the conceptual toolbox is more powerful than ever in navigating our present. Theorists of acceleration are connoisseurs of vertigo, and will insist any snapshot of their thought is dead or out of date. This taxonomy is both. But it’s short.

 

Accelerationism: ACC: Capitalism is a feedback cycle of increasing spiraling power, which it is not possible to comprehend or control from within, and therefore at all. The complexity of this alien system includes hyperstitions and reverse causalities. Capitalism melts and reassembles everything. Fictions become realities through their articulation. Future structures assemble themselves through their conditioning of the past.

Texts:

  • Deleuze and Guattari, Anti-Oedipus; A Thousand Plateaus
  • Land, Meltdown
  • CCRU – Writings 1997-2003
  • Collapse Journal I-VIII

 

Right Accelerationism: R#ACC: Capitalism is modernity, science, intelligence. What is powerful in all these things is one identical force. What is best in the world is represented by this force, the product of sharpening by relentless competition, brutal empiricism and blind peer review, the butcher’s yard of evolution. Historically, it was possible to put a defensive brake on capitalism and intelligence. That possibility is fast receding or likely already gone, and was always undesirable. Ethically and therefore politically, we should align ourselves with the emancipation of the means of production. Artificial intelligence, genetic engineering, corporate microstates, and breeding a cyborg elite are all means for achieving this end.

Right accelerationism is entwined with the techno-capitalist thread of neoreaction.

Texts:

 

Left Accelerationism: L#ACC: The tremendous productive power of capitalism is a world system that is impossible to fully control, but it may be harnessed and steered for progressive ends. Only with the wealth and productivity of capitalism has fully automated luxury gay space communism become possible, and now that it is within reach it can be seized. Only through computational power can the relationship between Homo sapiens and its ecosystem be understood and balanced. The great corporations and financial structures of the early twenty-first century are themselves prototypes of platform planned economies leveraging enormous computational power to fulfill billions of needs and desires across society. By accelerating progressive technological invention, reinvigorating the domesticated industrial state as a platform state, nationalizing data utilities, sharing dividends and redefining work, the system may be made sustainable and wealth shared with all according to their need.

After a surge of activity, many left accelerationists rapidly swerved away from the name a few years ago. This was coincident with Srnicek and Williams’ book Inventing the Future, which is all about left accelerationism, without mentioning it once.

Texts:

 

Unconditional Accelerationism: U#ACC: To erect any political program that pretends to steer, brake, or accelerate this system is folly and human-centric hubris. The system can be studied as a matter of fascination, and of survival. The only politics that makes sense is to embrace fragmentation and create a safe distance from centralized political power. A patchwork of small communities, built across and within networks, societies and geographies, is a means for some to survive and thrive. Many small ships can ride through a storm with a few losses, where one giant raft will be destroyed, dooming all.

Texts:

 

Blaccelerationism: The separation of human and capital is a power structure shell game. Living capital, speculative value, and accumulated time are stored in the bodies of black already-inhuman (non)subjects.

Texts:

 

Gender Accelerationism: G#ACC: Everyone is becoming transsexual lesbian programmer cyborgs. Enjoy it.

Texts:

 

Zero Accelerationism: Z#ACC: The world-system is accelerating off a cliff.

Texts:

 

Accelerating The Contradictions: Capitalism is riven with conflict and contradictions. Revolutionaries should accelerate this destructive process as it hastens the creation of a system beyond capitalism.

No modern accelerationist group has held this position (D/G: “Nothing ever died of contradictions!”), but it’s a common misunderstanding, or caricature, of Left Accelerationism.

Texts:

 

Other introductions: Meta-nomad has a more theory-soaked introduction to accelerationism, which teases out the rhizomatic cross-connections between these threads, and is a good springboard for those diving further down the rabbit hole.

Just Like Reifying A Dinner

Closing the Sorites Door After The Cow Has Ambled

The Last Instance has an interesting, pro-slime response to my recent musings on the sorites paradox. TLI offers a more nuanced herd example in Kotlin, explicitly modelling the particularity of empty herds and herds of one cow as well as herds of two or more cows, along with some good thoughts on what code-wrangling metaphors we should keep to hand.

It’s a better code example, in a number of ways, as it suggests a more deliberate language alignment between a domain jargon and the model captured in code. It includes a compound type with distinct Empty and Singleton subtypes.

But notice that we have re-introduced the sorites paradox by the back-door: the distinction between a proper herd and the degenerate cases represented by the empty and singleton herds is based on a seemingly-arbitrary numeric threshold.

Probably in my rhetorical enthusiasm for the reductive case (herd=[]), the nuance of domain alignment was lost. I don’t agree that this new example brings the sorites paradox in by the back door, though. There is a new ProperHerd type that always has two or more members. By fixing a precise threshold, the ambiguity is removed, and the sorites paradox still disappears. Within this code, you can always work out whether something is a Herd, and which subtype (Empty, Singleton, or ProperHerd) it belongs to. It even hangs a lampshade on the philosophical bullet-biting existence of the empty herd.
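
For concreteness, something like the following sealed hierarchy (my reconstruction; TLI’s actual Kotlin differs in detail) is what is at stake: every value is unambiguously one of the three cases, and a ProperHerd has two or more members by construction.

```kotlin
// A reconstruction for illustration; TLI's actual code differs in detail.
data class Cow(val name: String)

sealed class Herd {
    object Empty : Herd()                                  // the bullet-bitten empty herd
    data class Singleton(val only: Cow) : Herd()           // one cow is not yet a proper herd
    data class ProperHerd(val first: Cow, val second: Cow,
                          val rest: List<Cow> = emptyList()) : Herd()  // two or more, by construction
}

fun herdOf(cows: List<Cow>): Herd = when {
    cows.isEmpty() -> Herd.Empty
    cows.size == 1 -> Herd.Singleton(cows[0])
    else           -> Herd.ProperHerd(cows[0], cows[1], cows.drop(2))
}

fun describe(h: Herd): String = when (h) {
    is Herd.Empty      -> "no herd at all"
    is Herd.Singleton  -> "a lone cow called ${h.only.name}"
    is Herd.ProperHerd -> "a proper herd of ${2 + h.rest.size}"
}

fun main() {
    println(describe(herdOf(emptyList())))                                           // no herd at all
    println(describe(herdOf(listOf(Cow("Bluebell")))))                               // a lone cow called Bluebell
    println(describe(herdOf(listOf(Cow("Bluebell"), Cow("Daisy"), Cow("Clover")))))  // a proper herd of 3
}
```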

Though you can imagine attempts to capture more of this ambiguity in code – overlapping categories of classification, and so on – there would ultimately have to be some series of perhaps very complicated disambiguating rules for formal symbolic processing to work. Insofar as something like deep learning doesn’t fit that, because it holds a long vector of fractional weights against unlabelled categories, it’s not symbolic processing, even though it may be implemented on top of a programming language.

Team Slime

I don’t think a programmer should take too negative a view of ontological slime. Part of this is practical: it’s basically where we live. Learning to appreciate the morning dew atop a causal thicket, or the waves of rippling ambiguity across a pond of semantic sludge, is surely a useful mental health practice, if nothing else.

Part of the power of Wimsatt’s slime term, to me, is the sense of ubiquity it gives. Especially in software, and its everyday entanglement with human societies and institutions, general rules are an exception. Once you find them, they are one of the easy bits. Software is made of both planes of regularity and vast quantities of ontological slime. I would even say ontological slime is one of Harrison Ainsworth’s computational materials, though laying that out requires a separate post.

Wimsatt’s slime just refers to a region of dense, highly local, causally entangled rules. Code can be like that, even while remaining a symbolic processor. Spaghetti code is slimy, and a causal thicket. Software also can be ontological slime because parts of the world are like slime. Beyond a certain point, a particular software system might just need to suck that up and model a myriad of local rules. As TLI says:

The way forward may be to see slime itself as already code-bearing, rather as one imagines fragments of RNA floating and combining in a primordial soup. Suppose we think of programming as refining slime, making code out of its codes, sifting and synthesizing. Like making bread from sticky dough, or throwing a pot out of wet clay.

And indeed, traditionally female-gendered perspectives might be a better way to understand that. Code can often use mending, stitching, baking, rinsing, plucking, or tidying up. (And perhaps you have to underline your masculinity when explaining the usefulness of this: Uncle Bob Martin and the Boy Scout Rule. Like the performative super-blokiness of TV chefs.) We could assemble a team: as well as Liskov, we could add the cyberfeminist merchants of slime from VNS Matrix, and the great oceanic war machinist herself:

“It’s just like planning a dinner,” explains Dr. Grace Hopper, now a staff scientist in system programming for Univac. (She helped develop the first electronic digital computer, the Eniac, in 1946.) “You have to plan ahead and schedule everything so it’s ready when you need it. Programming requires patience and the ability to handle detail. Women are ‘naturals’ at computer programming.”

Hopper invented the first compiler: an ontology-kneading machine. By providing machine-checkable names that correspond to words in natural language, it constructs attachment points for theory construals, stabilizing them, and making it easier for theories to be rebuilt and shared by others working on the same system. Machine code – dense, and full of hidden structure – is a rather slimy artifact itself. Engineering an ontological layer above it – the programming language – is, like the anti-sorites, a slime refinement manoeuvre.
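
A toy contrast (mine, not Hopper’s, with invented names): the same rule written once as bare arithmetic and once as machine-checkable domain vocabulary, the latter giving later readers attachment points for rebuilding the theory.

```kotlin
// Toy contrast, invented names: the same rule as bare arithmetic
// and as machine-checkable domain vocabulary.
fun f(a: Int, b: Int) = a * 100 + b * 25 > 500    // slimy: the structure is hidden

const val ADULT_TICKET_PENCE = 100                 // named attachment points for the theory
const val CHILD_TICKET_PENCE = 25                  // held by the people who work on this system
const val FAMILY_DISCOUNT_THRESHOLD_PENCE = 500

fun qualifiesForFamilyDiscount(adults: Int, children: Int): Boolean =
    adults * ADULT_TICKET_PENCE + children * CHILD_TICKET_PENCE > FAMILY_DISCOUNT_THRESHOLD_PENCE
```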

To end on that note seems too neat, though, too much of an Abstraction Whig History. To really find the full programmer toolbox, we need to learn not just reification, decoupling, and anti-sorites, but when and how to blend, complicate and slimify as well.