Lehman on Software, Models and Change

The modeled and evolving quality of software comes to the surface when thinking about software maintenance. A classic paper on this is Lehman’s 1980 paper Programs, life cycles, and laws of software evolution, which lays out the territory with great clarity, then confidently strides off in the completely wrong direction.

Model

Lehman introduces a distinction between 

  • S-programs, that implement a formal mathematical (s)pecification, such as the travelling salesman problem
  • P-programs, that solve some messy (p)roblem arising in the world, such as actually scheduling real salespeople and their ambiguities and partly-known preferences, and 
  • E-programs, those (e)mbedded in and changing the world they directly model, such as in air-traffic control.

For P-programs and E-programs, “the acceptability of a solution is determined by the environment in which it is embedded.” The distinction is in the program’s relationship with its origin story: between top-down and bottom-up; transcendence and immanence.

Lehman goes on to note P-programs are also in a feedback loop arising from their use in the world. Their execution is observed, even lived, by their users, and this results in demand for change. 

This is a cybernetic view, though Lehman doesn’t use the terminology. The paper sketches some more complicated loops, particularly where a specification intermediates between the P-program and the world. It is that intermediation, rather than feedback, that is foregrounded in the difficult and famous statement on code:world relations:

Any program is a model of a model within a theory of a model of an abstraction of some portion of the world or of some universe of discourse.

Lehman drops this on page two, before defining S-, P- or E-programs, and never gets around to defining theory or model, or otherwise directly elaborating, disconnected sagelike pronouncements being an expected feature of software engineering papers of the time. Cook (and a team including Lehman) later link this to the social process of Kuhn’s paradigm shifts – renaming P-programs to (p)aradigm-programs and E-programs to (e)volving-programs.

Weisberg’s work on the crucial role of models in science could also help. For Weisberg, a theory maps a model to the world through (mostly explicit) construals. This plays a similar role to “abstraction” in Lehman’s definition. (Bit more on Weisberg here.)

It’s also worth throwing Naur’s “Programming as Theory Building” into the mix, though his paper does not make much distinction between model-as-code and theory-as-code.

Lehman also introduces “laws” of software evolution, which did have some empirical basis, but appear hard to reproduce. They might be compared to more recent work on meaningful descriptive code metrics, or properties of software as a material.

 

The Rivers and The Lakes That You’re Used To

After accelerating through insight after insight into the fluid and evolving nature of software, Lehman closes off the theory section by casually inventing microservices (in 1980), then taking a screaming left turn at the Process Street T-Junction, crashing through a railing and tumbling over a cliff. For over that cliff flows a process waterfall, and in the structured programming era, there’s nothing more attractive.

Like the rest of the structured programming crowd, he has factory envy: “An assembly line manufacturing process is possible when a system can be partitioned into subsystems that are simply coupled and without invisible links … Unfortunately, present day programming is not like this.” Lehman goes on to emphasize the care and structure needed when writing separate elaborate requirement and technical specifications. You get the idea. The remaining process recommendations I’m just going to skip.

It is easy to be wise after the fact in 2019. Agile practices and literal miniature software assembly lines (continuous build infra) now exist, and have made us much more sensitive to the damage done by scope size and delivery delay in large software systems. Trying to solve complex problems with more upfront planning was a high modernist worldview going out of fashion, but still very much in the intellectual water in 1980: Lehman gave a lecture in 1974 referencing both city planning and the Club of Rome report Limits to Growth. Perhaps it would be fairer to point out that thinkers who advocated short simple changes as a response to complex systems – like Jane Jacobs, or John Boyd and his OODA loop – were always tacking into the intellectual wind.

References

Cook, S., Harrison, R., Lehman, M.M. and Wernick, P.: ‘Evolution in software systems: foundations of the SPE classification scheme’, Software Maintenance and Evolution Research and Practice, 2006, 18, (1), pp. 1-35  
Lehman, M.M., “Programs, cities, students – Limits to growth?” Inaugural Lecture, May 14, 1974, ICST Inaugural Lecture Series, Ed., vol. 2, pp. 147-163, 1979. vol. 9, pp. 211-229, 1970-1974; and in Programming Methodology, D. Gries, Ed. New York: Springer-Verlag, 1979, pp. 42-69.
Lehman, M. M. (1980). Programs, life cycles, and laws of software evolution. Proceedings of the IEEE, 68(9), 1060-1076.
Naur, P. (1985). Programming as theory building. Microprocessing and microprogramming, 15(5), 253-261.
Weisberg – Simulation and Similarity.

Just Like Reifying A Dinner

Closing the Sorites Door After The Cow Has Ambled

The Last Instance has an interesting, pro-slime response to my recent musings on the sorites paradox. TLI offers a more nuanced herd example in Kotlin, explicitly modelling the particularity of empty herds, herds of one cow, as well as herds of two or more cows, and some good thoughts on what code-wrangling metaphors we should keep to hand.

It’s a better code example, in a number of ways, as it suggests a more deliberate language alignment between a domain jargon and the model captured in code. It includes a compound type with distinct Empty and Singleton subtypes.

But notice that we have re-introduced the sorites paradox by the back-door: the distinction between a proper herd and the degenerate cases represented by the empty and singleton herds is based on a seemingly-arbitrary numeric threshold.

Probably in my rhetorical enthusiasm for the reductive case (herd=[]), the nuance of domain alignment was lost. I don’t agree that this new example brings the sorites paradox in by the back door, though. There is a new ProperHerd type that always has two or more members. By fixing a precise threshold, the ambiguity is removed, and the sorites paradox still disappears. Within this code, you can always work out whether something is a Herd, and which subtype (Empty, Singleton, or ProperHerd) it belongs to. It even hangs a lampshade on the philosophical bullet-biting existence of the empty herd.
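
TLI’s example is in Kotlin and is not reproduced here. As a rough Python sketch of the same shape, the three subtypes and the fixed threshold might look like the following. The names Empty, Singleton and ProperHerd come from the discussion above; herd_of and the use of cow names as strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Empty:
    """The herd with no cows in it."""


@dataclass(frozen=True)
class Singleton:
    """A herd of exactly one cow."""
    cow: str


@dataclass(frozen=True)
class ProperHerd:
    """A herd of two or more cows; the threshold is fixed by fiat."""
    cows: FrozenSet[str]

    def __post_init__(self) -> None:
        # The precise cut-off is what removes the vagueness.
        if len(self.cows) < 2:
            raise ValueError("a ProperHerd needs at least two cows")


Herd = Union[Empty, Singleton, ProperHerd]


def herd_of(*cows: str) -> Herd:
    """Classify any collection of cow names into exactly one subtype."""
    distinct = frozenset(cows)
    if not distinct:
        return Empty()
    if len(distinct) == 1:
        (only,) = distinct
        return Singleton(only)
    return ProperHerd(distinct)
```

Every call to herd_of lands in exactly one of the three cases, so within this code there is never a borderline herd to argue about.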

Though you can imagine attempts to capture more of this ambiguity in code – overlapping categories of classification, and so on – there would ultimately be some series of perhaps very complicated disambiguating rules for formal symbolic processing to work. Insofar as something like deep learning doesn’t fit that, because it holds a long vector of fractional weights against unlabelled categories, it’s not symbolic processing, even though it may be implemented on top of a programming language.

Team Slime

I don’t think a programmer should take too negative a view of ontological slime. Part of this is practical: it’s basically where we live. Learning to appreciate the morning dew atop a causal thicket, or the waves of rippling ambiguity across a pond of semantic sludge, is surely a useful mental health practice, if nothing else.

Part of the power of Wimsatt’s slime term, to me, is the sense of ubiquity it gives. Especially in software, and its everyday entanglement with human societies and institutions, general rules are an exception. Once you find them, they are one of the easy bits. Software is made of both planes of regularity and vast quantities of ontological slime. I would even say ontological slime is one of Harrison Ainsworth’s computational materials, though laying that out requires a separate post.

Wimsatt’s slime just refers to a region of dense, highly local, causally entangled rules. Code can be like that, even while remaining a symbolic processor. Spaghetti code is slimy, and a causal thicket. Software also can be ontological slime because parts of the world are like slime. Beyond a certain point, a particular software system might just need to suck that up and model a myriad of local rules. As TLI says:

The way forward may be to see slime itself as already code-bearing, rather as one imagines fragments of RNA floating and combining in a primordial soup. Suppose we think of programming as refining slime, making code out of its codes, sifting and synthesizing. Like making bread from sticky dough, or throwing a pot out of wet clay.

And indeed, traditionally female-gendered perspectives might be a better way to understand that. Code can often use mending, stitching, baking, rinsing, plucking, or tidying up. (And perhaps you have to underline your masculinity when explaining the usefulness of this: Uncle Bob Martin and the Boy Scout Rule. Like the performative super-blokiness of TV chefs.) We could assemble a team: as well as Liskov, we could add the cyberfeminist merchants of slime from VNS Matrix, and the great oceanic war machinist herself:

“It’s just like planning a dinner,” explains Dr. Grace Hopper, now a staff scientist in system programming for Univac. (She helped develop the first electronic digital computer, the Eniac, in 1946.) “You have to plan ahead and schedule everything so it’s ready when you need it. Programming requires patience and the ability to handle detail. Women are ‘naturals’ at computer programming.”

Hopper invented the first compiler: an ontology-kneading machine. By providing machine checkable names that correspond to words in natural language, it constructs attachment points for theory construals, stabilizing them, and making it easier for theories to be rebuilt and shared by others working on the same system. Machine code – dense, and full of hidden structure – is a rather slimy artifact itself. Engineering an ontological layer above it – the programming language – is, like the anti-sorites, a slime refinement manoeuvre.

To end on that note seems too neat, though, too much of an Abstraction Whig History. To really find the full programmer toolbox, we need to learn not just reification, decoupling, and anti-sorites, but when and how to blend, complicate and slimify as well.

Heaps of Slime

The sorites paradox is a fancy name for a stupid-sounding problem. It’s a problem of meaning, of the kind software developers have to deal with all the time, and also of the kind software generates all the time. It’s a pervasive, emergent property of formal and informal languages.

You have a heap of sand. One grain of sand is not a heap. You take away one grain of sand. One grain of sand makes little difference – so you still have a heap of sand.

You have a grain of sand. You add another grain. Two grains of sand are surely not a heap. You add another. Three grains of sand are not a heap.

If you add only a grain or take away only a grain of sand, since one grain of sand can hardly make a difference, how do you tell when you have a heap?

That’s the paradox. The Stanford Encyclopedia of Philosophy has a more comprehensive historical overview.

 

Slime Baking

To make software, you build a machine out of executable formal logic. Let’s call that code a model, including its libraries and compiler, but excluding the software machinic layers below that.

The model has different elements which we represent in programming language structures, usually with names corresponding to our understanding of the domain of the problem. These correspond to phenomena in two ways: parsing, and delegation to an analogue instrument. Parsing is the process of structuring information using formal rules. An analogue instrument from this perspective is a thermostat, a camera, a human user, a rabbit user, or possibly some statistical or computational processes with emergent effects, like Monte Carlo simulations or machine learning autoencoders.

You can imagine any particular software system as a free-floating machine, just taking in inputs and providing outputs over time. Think of a program where all names of classes, functions, variables, button labels, etc, are replaced with arbitrary identifiers like a1, a2, etc, (which does have some correspondence to the processing happening inside a compiler, or during zip compression). We tether this symbolic system to the world by replacing these arbitrary names with ones that have representational meaning in human language, so that users and programmers can navigate the use and internals of the system, and make new versions of it.

To make it easier to understand, navigate and change this system, we label its interface and internals with names that have meaning in whatever domain we are using it for. Dairy farm systems will have things named after cows and online bookstores will have data structures representing books.
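
A tiny sketch of that tethering, in Python. Both functions below are the same machine; only the labels differ. The dairy-flavoured names are invented for illustration and are not taken from any real system.

```python
def milk_yield_per_cow(total_litres: float, herd_size: int) -> float:
    """Average litres of milk per cow; an empty herd yields nothing."""
    if herd_size == 0:
        return 0.0
    return total_litres / herd_size


def a1(a2: float, a3: int) -> float:
    # The same machine with its labels stripped: it runs identically,
    # but nothing tells a reader what it is for.
    if a3 == 0:
        return 0.0
    return a2 / a3
```

The interpreter treats the two identically; the difference only exists for the people reading, using and changing the code.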

We have then delegated the problem of representation to the user of the system – a human choosing from a dropdown box, on a web form, for example, does the work of identification for the user+software system. But we run slap bang into the problem of vagueness.

Most of the users of our dairy software will not be on quaint farms in the English countryside owning one cow named Britney, so it will be necessary to represent a herd. How many cows do you need to qualify as a herd? Well, in practice, a programmer will pick a useful bucket data structure, like a set or a list, and name that variable “herd”. Nowadays it would probably be a collection in a standard library, like java.util.HashSet. The concept of an empty collection is a familiar one to programmers; furthermore, there is a specific object to point to called “herd” (the new variable), so a herd is defined to be a data structure with zero or more (whole) cows. Sorites paradox solved <dusts hands>. And unwittingly too.

herd = []
# I refute it thus!

The loose, informal, family resemblance definition of a concept (herd) gets forced into a symbolic structure, like an everyday Python variable, to treat it as an object in a software system. This identification of a concept with a specific software structure is called reification. In the case of a herd (or a heap of sand) the formalism is a fairly uncontroversial net win; after getting over the slightly weird idea of the empty herd, the language may converge around this new more formal definition, at least in the context of the system. (Or it may not. It is interesting to note the continuing popularity of the shopping cart usability metaphor, a concrete physical container that can be empty, rather than say, a pile of books that is allowed to have zero books in it.)

The sorites might be thought of as a limiting case of vagueness, due to the deliberate simplicity of the concept involved (one type of thing, one collection of it). There are much messier cases. Keith Braithwaite points out that software is built on a foundation of universal distinguished types, and it is a constant emphasis of training in science and engineering. People without that training tend to instead organize their thinking around representative examples, and categorize by what Wittgenstein called family resemblance, ie, sharing a number of roughly similar properties. Accordingly Braithwaite suggests foregrounding examples as a shared artifact for discussion between programmers and users, and using legible, executable examples, as in Behaviour Driven Development (BDD).

Example-driven reasoning is also a survival technique in an environment lacking clearly distinguishable universal rules. Training in physical sciences emphasizes the wonderful discovery of universal physical laws such as those for gravity or electrical charge. Biologists are more familiar with domains where simple universal laws do not have sufficient explanatory power, and additional, much more local rules, are the only navigational aids possible. Which is to say, non-scientific exemplary reasoning was likely rational in the context it evolved in, and additionally, there are many times in science and engineering when we cannot solve problems using universal rules. William Wimsatt names these conditions of highly localized rules “ontological slime”, and the complex feedback mechanisms that accompany them “causal thickets”. He points out that even if you think an elegant theory of everything is somehow possible, we have to deal with the world today, where there definitely isn’t one to hand, but ontological slime everywhere.

Readers who have built software for organizations may see where this is going. It’s not that (fairly) universal rules are unknown to organizations, but that rules run the gamut from wide generality right down to ontological slime, with people in organizations usually navigating vagueness by rule-of-thumb and exemplar-based categories which don’t form distinguished types. Additionally, well-organized domains of knowledge often intersect in organizations in idiosyncratic ways. For example, a hospital has chemical, electrical and water systems, many different medical domains, radioactive and laser equipment, legal and regulatory codes, and financial constraints to work within. And so the work of software development proceeds, one day accidentally solving custom sorites paradoxes, the next breaking everything by squeezing a twenty-nine sided Escher tumbleweed peg into a square hole.

 

Lunch

Software acts as a model of the world, especially for applications written for a particular domain. This relation even holds for a great deal of “utility” software – software that other software is built on. An operating system needs to both use and provide functions dealing with time, for example, which has a lot more domain quirks than you might at first think.

Model is a specific jargon term in philosophy of science, and the use here is deliberate. For most software, the software : world relation is a close relative of the model : world relation in science. The image of code running without labels, untethered to the world, above, is an adaptation of an image from philosopher Chuang Liu: a map, showing only a selected structure, without labels or legend. We use natural language in all its power and ambiguity to attach labels to structures. This relation is organized according to a theory. Michael Weisberg calls the description, in the light of the theory, of how the world maps and doesn’t map to the model, a construal. Unlike scientific theories, the organizing theory for a software application is rarely carefully stated or specifically taught. So individual users and programmers build their own specific theory of the system as they work, and their own construals to go with them.

Software is not just a model: it’s also an instrument through which users act. The world it models is changed by its use, much more directly than for scientific models. Most observably, the world changes to be more like the model in software. Software also changes frequently. New versions chase changes in the world, including those conditioned by earlier versions of the software, in a feedback spiral. (Donald Mackenzie calls this “Barnesian performativity” when discussing economic models, the CCRU called it “hyperstition” when discussing fiction, and Brian Goetz and friends call it “an adventure in iterative specification discovery” when discussing programming.)

It is this feedback spiral which can eliminate ambiguity in terms by identifying them with exactly their use in software, therefore solving the sorites paradox in a stronger sense. It becomes meaningless to talk about an artifact outside its software context. We don’t argue about whether we have a pile of email, as it is obvious it is a container with one limit at Inbox Zero. This is one sense in which software can be said to be “eating the world”: by realigning the way a community sees and describes it.

There are other forms of software / language / world feedback, including ones that destroy meaning, dissolve formal definitions and create ambiguity. It’s often desirable, but perhaps not always, to collapse definitions into precise model-instrumented formality. Reifying an ambiguous concept by collapsing a sorites paradox into a concrete machine component is simply one process to be aware of when building software; an island of sediment in a river of slime.

References

Braithwaite – Things: how we think of them, what that means for building systems https://www.darkpeakconsulting.co.uk/blog/things-how-we-think-of-them-what-that-means-for-building-systems
Goetz et al – Java Concurrency In Practice
Hyde and Raffman – Sorites Paradox https://plato.stanford.edu/entries/sorites-paradox/
Liu – Fictionalism, Realism, and Empiricism on Scientific Models http://philsci-archive.pitt.edu/11162/
Mackenzie – An Engine, Not A Camera: How Financial Models Shape Markets
Visee – Falsehoods Programmers Believe About Time https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca
Weisberg – Simulation and Similarity
Wimsatt – Re-Engineering Philosophy For Limited Beings

Kafka As Deterritorializing Stream Function

Sometimes he accompanied her on her errands in the city, where everything had to be carried out in the utmost hurry. Then she would almost run to the next subway station, Karl with her bag in his hand, the journey went by in a flash, as if the train were being carried away without any resistance, already they were getting off, clattering up the stairs instead of waiting for the elevator that was too slow for them, the large squares from which the streets flowed out in a starburst emerged and brought a tumult of streamed lines of traffic from all sides, but Karl and Therese hurried, tightly together, to the different offices, cleaners, warehouses and stores which weren’t easy to contact by telephone in order to make orders or complaints, generally trivial things.

– Kafka, Amerika, Hofman trans.

“No one is better than Kafka at differentiating the two axes of the assemblage and making them function together,” say Deleuze and Guattari, and though they are referring to the Czech writer, it applies quite well to the open source distributed message queue too, even though the quote was written thirty years before its invention. Kafka in either form effects both decoupling of components and a disintegration of content. Models, formats and codes are first broken down, then made available to be reassembled in other ways.

I admit that when I first heard of it, naming an information processing system after a writer famous for depictions of absurd, violent and impenetrable bureaucracy did strike me as bold. It evokes The Departure (Der Aufbruch) as router documentation, or An Imperial Message (Eine kaiserliche Botschaft) as a Service Level Agreement. Perhaps we can think of the system developers as finally addressing Franz’s eloquent, frustrated, bug reports. Jay Kreps, the namer of Apache Kafka and one of the co-creators (with Narkhede and Rao), simply explains that he wanted his high performance message queue to be good at writing, so he named it after a favourite prolific writer. Fortunately complete publication and reading doesn’t also require the process dying of tuberculosis, followed by a decades-long legal case. Even if Kreps did have a deeper correspondence in mind, if I were forced to explain the name all the time, I might just smile and point at my Franz fridge magnet too.

The D/G quote is from the Postulates of Linguistics chapter, concerned with the way meaning is imposed on communicating agents and their intertwining systems.

On a first, horizontal, axis, an assemblage comprises two segments, one of content, one of expression. On the one hand it is a machinic assemblage of bodies, of actions and passions, an intermingling of bodies reacting to one another; on the other hand it is a collective assemblage of enunciation, of acts and statements, of incorporeal transformations attributed to bodies. Then on a vertical axis, the assemblage has both territorial sides, or reterritorialized sides, which stabilize it, and cutting edges of deterritorialization, which carry it away. No one is better than Kafka at differentiating the two axes of the assemblage and making them function together.

– Deleuze and Guattari, November 30, 1923: Postulates of Linguistics, A Thousand Plateaus

Deleuze and Guattari are secret pomo management consultants at heart, and as their dutiful intern I have accordingly expressed the great men’s vision as an Ansoff Matrix slide for distribution to valued stakeholders.

Compare this Jay Kreps slide from Strange Loop 2015, itself entirely representative of a million whiteboard sketches accompanying middleware everywhere:

For middleware, what D/G call the cutting edges of deterritorialization, we might call a payload codec. The data structure used within the producing process is disassembled, scrambled into a bucket of bytes, then carried away along a line of flight – in this case a Kafka topic. A Kafka topic is a transactional log, a persistent multi-reader queue where the removal policy is decoupled from reader delivery, and retention is instead controlled by time or storage space windows. (Blockchains are public transaction logs optimised for distributed consensus and no retention limit. Hence their inherent parliamentary slowness.) The consumer then uses its own codec to reterritorialize the data – making it intelligible according to its own data model, and within its own process boundary. Though nowadays the class signature of the messages may match (say both sides use the JVM and import the definition from the same library), at a bare minimum the relationship of those messages to other objects and functions within the process differs.
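
To make the codec and retention points concrete, here is a toy in-memory sketch in Python. It does not use, and is not modelled on, any Kafka client API: TopicLog, Order, encode and decode_for_stock are all hypothetical names, and JSON stands in for whatever payload format a real deployment uses.

```python
import json
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Order:
    """Producer-side model (hypothetical)."""
    order_id: int
    sku: str
    quantity: int


def encode(order: Order) -> bytes:
    """Producer codec: disassemble the object into a bucket of bytes."""
    fields = {"order_id": order.order_id, "sku": order.sku,
              "quantity": order.quantity}
    return json.dumps(fields).encode("utf-8")


class TopicLog:
    """Toy append-only log: reading never removes a record, only retention does."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        # Each record is (offset, timestamp, payload).
        self._records: List[Tuple[int, float, bytes]] = []
        self._next_offset = 0

    def append(self, payload: bytes, now: Optional[float] = None) -> int:
        timestamp = time.time() if now is None else now
        offset = self._next_offset
        self._records.append((offset, timestamp, payload))
        self._next_offset += 1
        return offset

    def read_from(self, offset: int) -> List[Tuple[int, bytes]]:
        """Each reader tracks its own offset, so many readers share one log."""
        return [(o, p) for (o, _, p) in self._records if o >= offset]

    def expire(self, now: float) -> None:
        """Drop records by age, whether or not anyone has read them."""
        cutoff = now - self.retention_seconds
        self._records = [r for r in self._records if r[1] >= cutoff]


def decode_for_stock(payload: bytes) -> Tuple[str, int]:
    """Consumer codec: rebuild the bytes into the consumer's own model."""
    fields = json.loads(payload.decode("utf-8"))
    return fields["sku"], -fields["quantity"]  # an order becomes a stock movement
```

The producer thinks in orders and the consumer in stock movements; the only thing they share is the byte payload and the log it travels on.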

In Anti-Oedipus, D/G call reading a text “productive use of the literary machine”, and it’s along those lines that the quote continues:

No one is better than Kafka at differentiating the two axes of the assemblage and making them function together. On the one hand, the ship-machine, the hotel-machine, the circus-machine, the castle-machine, the court-machine, each with its own intermingled pieces, gears, processes, and bodies contained in one another or bursting out of containment (see the head bursting through the roof). On the other hand, the regime of signs or of enunciation: each regime with its incorporeal transformation, acts, death sentences and judgements, proceedings, “law”. […] On the second axis, what is compared or combined of the two aspects, what always inserts one into the other, are the sequenced or conjugated degrees of deterritorialization, and the operations of reterritorialization that stabilize the aggregate at a given moment. K., the K.-function, designates the line of flight or deterritorialization that carries away all of the assemblages but also undergoes all kinds of reterritorializations and redundancies – redundancies of childhood, village life, bureaucracy, etc.

 – Deleuze and Guattari, ATP ibid.

Cataloguing these correspondences between D/G’s description of Kafka and the software that bears his name is not intended to ignore that their contact is a kind of iconographic car accident. Kafka is a famous, compelling writer, and frequent cultural reference point, after all. The collision of names reveals structural similarities that are usually hidden.

In the Kreps talk above, titled in Deleuzian fashion “Apache Kafka and the Next 700 Stream Processing Systems”, he also introduces a stream processing API to unify the treatment of streams and tables. The team saw this as crucial to Kafka’s identity as a streaming platform rather than just a queue, and delayed calling Kafka 1.0 for years, until this component was ready. Nomadic messages escape through the smooth stream space, before capture and transformation in striated tablespace as rows in data warehouses.
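
The general idea behind treating streams and tables together can be sketched in a few lines of Python: a table is what you get by folding a stream of changes, and a stream of changes is what you get by comparing two states of a table. This is an illustration of the idea only, not the Kafka Streams API; materialize, changelog and the Change type are invented names.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (key, new value); None marks a deletion. A hypothetical record shape.
Change = Tuple[str, Optional[int]]


def materialize(changes: Iterable[Change]) -> Dict[str, int]:
    """Stream to table: fold change events into the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def changelog(before: Dict[str, int], after: Dict[str, int]) -> Iterator[Change]:
    """Table to stream: emit the changes that turn one table state into another."""
    for key in before.keys() - after.keys():
        yield key, None
    for key, value in after.items():
        if before.get(key) != value:
            yield key, value
```

For example, materialize([("milk", 10), ("hay", 4), ("milk", 7), ("hay", None)]) leaves a table holding only the latest milk value.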

First version of Tetris.

This unification of batches and streams echoes a similar call in computational theory by Eberbach. Turing computation is built around batches. Data is available in complete form at input on the Turing machine tape, then the program runs, and if it terminates, a complete output is available on the same tape. The theories of computability and complexity are built around this same encapsulated box of space and time. Much of what computers do in 2018 is actually continual computation – the reacting to events or processing streams of data that have no semantically tied termination point. That is, though the process may terminate, that isn’t particularly relevant to any analysis of computational complexity or performance we want to do. When editing a document on a computer, you care about the responsiveness keystroke to keystroke, not the entire time editing the document as if it were one giant text batch.
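
A small Python sketch of the difference, using a mean as a stand-in computation (my choice of example, not one from the literature). The batch version needs the whole input before it can answer; the streaming version answers after every input, carries its history as state, and never needs the input to end.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: the whole input exists up front; one answer at the end."""
    return sum(values) / len(values)


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream style: one answer per input; count and total carry the history."""
    seen = 0
    total = 0.0
    for value in values:
        seen += 1
        total += value
        yield total / seen


# count(1) never terminates, so only the streaming version can consume it.
first_four = list(islice(running_mean(count(1)), 4))
```

What matters for the streaming version is the cost of each step, not the running time of the whole, which has no meaningful end.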

By contrast, modern computing systems process infinite streams of dynamically generated input requests. They are expected to continue computing indefinitely without halting. Finally, their behavior is history-dependent, with the output determined both by the current input and the system’s computation history.

 – Eberbach, Goldin and Wegner – Turing’s Ideas and Models of Computation

Streams are a computational model of continuation, and therefore infinity. In their wide-ranging 2004 paper, Eberbach and friends go on to argue for models of Super-Turing computation. This includes alternative theoretical models such as the π-calculus and the $-calculus, new programming languages, and hardware architectures.

We conjecture that a single computer based on a parallel architecture is the wrong approach altogether. The emerging field of network computing, where many autonomous small processes (or sensors) form a self-configured network that acts as a single distributed computing entity, has the promise of delivering what earlier supercomputers could not.

 – Eberbach et al, ibid.

These systems are now coming into existence. Through co-ordination with distributed registries (Zookeeper here), and with the improved deployment and configuration baseline devops techs have brought in, spinning up new nodes or failing over existing nodes is autonomously self-configured. Complete autonomy isn’t there, but it seems a high and not particularly desirable bar for many systems. Distributed systems and streams predate Apache Kafka, and it will not be the last such system. Yet it marks a moment where separate solutions for managing streams, tables and failover are concretized in a single technical object, a toolbox for the streaming infrastructure of infinity.

References

CCRU – Ccru Writings 1997-2003
Cremin – Exploring Videogames with Deleuze and Guattari: Towards an affective theory of form
Deleuze and Guattari – Anti-Oedipus
Deleuze and Guattari – A Thousand Plateaus
Eberbach, Goldin and Wegner – Turing’s Ideas and Models of Computation
Kafka, Hofman trans. – Amerika (The Man Who Disappeared)
Kreps, Narkhede and Rao – Kafka – A Distributed Messaging System For Log Processing
Kreps – Apache Kafka and the Next 700 Stream Processing Systems (talk) https://youtu.be/9RMOc0SwRro
Narkhede – Apache Kafka Goes 1.0 https://www.confluent.io/blog/apache-kafka-goes-1-0/
Pajitnov – Tetris (game)
Stopford – The Data Dichotomy: Rethinking the Way We Treat Data and Services https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Thereska – Unifying Stream Processing and Interactive Queries in Apache Kafka https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

Reducing Abstraction in Code

Abstraction is something we are taught to value as programmers, and the process of finding common patterns across parts of a system is one programmers are usually good at. There is another equally important process of improving systems by collapsing redundancy and abstraction. Gilbert Simondon names this “concretization”.

Primitives

Concretization removes parts and makes machines more specific. A simple example is clearing away clutter by replacing it with clearer syntax. Say, in Python:

if available == True:
   reserve()

to

if available:
   reserve()

Or in JUnit:

assertTrue(false);

to

fail();

These are behaviour-preserving design improvements, or in other words, refactorings. They often turn up in novice programmer code or code written by people new to a language or toolset. Other primitive concretizing refactorings might be dead code removals, such as Remove Unused Parameter.
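As a minimal sketch of Remove Unused Parameter, here it is in Python. The function and parameter names are invented for illustration and do not come from any real codebase.

```python
# Before: `currency` is accepted but never read (hypothetical example).
def total_before(prices, currency):
    return sum(prices)


# After: the dead parameter is gone, and the signature states only what
# the function actually depends on.
def total_after(prices):
    return sum(prices)


# Behaviour is preserved, whatever callers used to pass for `currency`.
assert total_before([5, 10], "AUD") == total_after([5, 10])
```

The machine has one fewer part, and nothing that used it can tell the difference.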

Another primitive concretization step is recognizing that a variable with a highly general type can be typed more precisely. A String, a byte[] or a void* are highly general types, in that they can hold pretty much anything. Replacing one with a more specific type usually relies on a precondition, either implicit or explicit.

int age = Integer.parseInt(ageStr);

In this case the potential throwing of NumberFormatException entails an implicit precondition. The concretizing step is the refactoring that introduces the typed variable.
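The same step can be sketched in Python, with both kinds of precondition visible. The function name parse_age and the non-negative rule are assumptions made for the example; the built-in int() raises ValueError much as Integer.parseInt throws NumberFormatException.

```python
def parse_age(age_str: str) -> int:
    # Implicit precondition: int() raises ValueError unless the text
    # is a valid integer literal.
    age = int(age_str)
    # Explicit precondition: a domain rule stated in the code itself
    # (hypothetical rule, chosen for illustration).
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age


age: int = parse_age("42")  # from here on, `age` is a number, not text
```

Everything downstream of parse_age can rely on the narrower type and stop re-checking the raw string.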

Wait, but isn’t the problem with using Strings and primitive objects everywhere that they lack abstraction? Yes. They indicate that the code lacks an explicit model, or in other words, abstractions. They also indicate the code lacks concretizations – specifics from the problem domain that make it a well-focused machine. (Lacking both abstraction and concretization indicates ontological slime, a wonderful term from William Wimsatt, and perhaps the topic of another post.)

For a multi-line example of primitive concretization, consider this refactoring available when going from Java 1.x to Java 5:

Iterator expenseIter = expenses.iterator();
while (expenseIter.hasNext()){
  Expense expense = (Expense)expenseIter.next();
  sum += expense.getExpenseValue();
}

to

for (Expense expense: expenses){
  sum += expense.getExpenseValue();
}

This mirrors the evolution of Java itself as a technical object and iteration as a technical concept. I’ve written about Simondon and the history of looping at greater length elsewhere. Specialization and reduction are near-synonyms more frequently used in programming, but because of the clearer relationship to abstraction, and the connection to Simondon, I stick with concretization here, at the cost of a few more syllables. (Reification is a different concept, in my opinion.)

Interleaving Abstraction and Concretization

The adjunction of a supplementary structure is not a real progress for the technical object unless that structure is concretely incorporated into the ensemble of the dynamic system of its operation. – Simondon, Mode of Existence of Technical Objects, Mellamphy trans.

Design improvements often include both abstracting and concretizing steps. The feeling is of abstraction clearing space and concretization then making better use of it.

Michael Feathers’ use of characterization tests is an example of starting a design process with concretization.

    @Test
    public void removesTextBetweenAngleBracketPairs() {
        assertEquals("", Pattern.formatText(""));
    }

Characterization tests stabilize the function of the machine by pinning down very specific behaviors in the form of facts. This then allows a round of refactorings and rewrites. The immediate next step would often be abstracting refactorings such as Extract Method and Extract Class (naming a clump of things introduces an abstraction and an indirection).
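A rough Python sketch of the same move follows. The function format_text is a hypothetical stand-in for legacy code, not Feathers’ original; the point is that each assertion records what the code does today, including the odd cases.

```python
import re


def format_text(text: str) -> str:
    # Hypothetical stand-in for legacy code whose behaviour we want to pin.
    return re.sub(r"<[^>]*>", "", text)


def test_characterizes_format_text():
    # Facts about current behaviour, not claims about a specification.
    assert format_text("") == ""
    assert format_text("plain") == "plain"
    assert format_text("a<b>c") == "ac"
    # Perhaps surprising, but it is what the code does: an unclosed
    # bracket is left untouched.
    assert format_text("a<b") == "a<b"


test_characterizes_format_text()
```

With these facts pinned, the body of format_text can be rewritten or split up, and any change in behaviour shows up as a failing assertion.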

Arlo Belshee’s Naming Is A Process also interleaves abstracting and concretizing steps.

Missing to Nonsense – Abstraction
Nonsense to Honest – Concretization
Honest to Honest and Complete – Concretization
Honest and Complete to Does the Right Thing – Abstraction
Does the Right Thing to Intent – Concretization
Intent to Domain Abstraction – Abstraction

A number of these steps, especially in the latter half, themselves consist of interleaved abstracting and concretizing sub-steps. E.g. in Honest and Complete:

1/ Use Introduce Parameter Object. Select just the one parameter you want to encapsulate. Name the class Foo and the parameter self. (Abstraction)
2/ Use Convert To Instance Method on the static. Select the parameter you just introduced. (Abstraction)
3/ Improve the class name (Foo) to at least the Honest level. (Concretization)
4/ Go to the caller of the method. Select the creation of the new type. Introduce parameter to push it up to the caller’s caller. (Abstraction)
5/ Convert any other uses of the parameter you are encapsulating to use the field off the new class. (Concretization)
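The five sub-steps above can be sketched in Python, without the IDE automation Belshee relies on. All names here (shipping_cost, ParcelWeight, quote) are hypothetical, and the comments map each piece back to a step number.

```python
from dataclasses import dataclass


# Before: a free function taking a bare primitive.
def shipping_cost_before(weight_kg: float, rate: float) -> float:
    return weight_kg * rate


# Steps 1-2: wrap the one parameter in a class (first called Foo) and
# move the function onto it. Step 3: rename Foo to something Honest.
@dataclass(frozen=True)
class ParcelWeight:
    kg: float

    def shipping_cost(self, rate: float) -> float:
        # Step 5: uses of the old parameter now read the field.
        return self.kg * rate


# Step 4: the caller's caller builds the new type and passes it down.
def quote(weight: ParcelWeight) -> str:
    return f"{weight.shipping_cost(rate=2.5):.2f}"


# The refactoring preserves behaviour.
assert shipping_cost_before(4.0, 2.5) == ParcelWeight(4.0).shipping_cost(2.5)
```

The class is the abstraction; choosing the name ParcelWeight over Foo, and routing every use through its field, are the concretizing moves.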

Belshee’s process, using names as the signposts for improving code, is a wonderful combination of practical walkthrough and a theory of programming. It even seems to put living flesh on my skeletal wish for Name Oriented Software Development, though, e.g., stronger tool and language support for consistent dictionaries is needed to realize the full vision.

Executable Theory

This kind of divergence of functional aims is a residue of abstract design in the technical object, and the progress of a technical object is definable in terms of the progressive reduction of this margin between functions in plurivalent structures. – Simondon, ibid

Every abstraction, even one as small as an extracted method, is also a theory. These little theories then need to be applied and refined to ensure a coherent system. What Simondon saw in the evolution of mechanical engines and other industrial era machines, we can observe at smaller scale and higher frequency when engineering in our more plastic computational material.
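A small Python sketch of an extracted method acting as a theory; the invoice example and the name is_billable are invented for illustration.

```python
# Before: a rule and its use are tangled together.
def invoice_total_before(amounts):
    total = 0
    for amount in amounts:
        if amount > 0:  # unnamed rule: only positive amounts count
            total += amount
    return total


# After Extract Method: the name `is_billable` is a small theory about
# the domain. It can now be tested, questioned and refined on its own.
def is_billable(amount):
    return amount > 0


def invoice_total(amounts):
    return sum(amount for amount in amounts if is_billable(amount))


assert invoice_total_before([10, -3, 5]) == invoice_total([10, -3, 5]) == 15
```

Whether a zero amount or a credit note is billable is now a question with an obvious place to be answered, which is the refinement the theory invites.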

Simondon describes machines as becoming more concrete over time, finally reaching a highly focused state where each part cleanly supports the functions of others in an overall system. He also states that the introduction of a new theory is the invention of a new machine. So perhaps he would disagree that the process is cyclical.

We can, perhaps, reconcile this if we think of each software function or class as a small widget in a larger system. In this sense of the widget = machine = function, every new method is a new Simondonian machine. This also suggests that software rarely progresses to the refined machines he describes, but is more usually an assembly of semi-refined widgets. Which sounds about right.

Once you realise abstraction and concretization are complementary, anti-parallel processes, you start noticing them everywhere. I suspect casual design phrases like “nice abstraction” are actually misleading. Ohm’s Law is a nice abstraction; modern chips that rely on parasitic capacitance in a material context of silicon are well-built machines. In working software, a nice abstraction is also a nice concretization: a well-formed widget within a coherent machine.

All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections. – David Wheeler