Reducing Abstraction in Code

Abstraction is something we are taught to value as programmers, and the process of finding common patterns across parts of a system is one programmers are usually good at. There is another equally important process of improving systems by collapsing redundancy and abstraction. Gilbert Simondon names this “concretization”.

Primitives

Concretization removes parts and makes machines more specific. A simple example is the abbreviation of clutter by replacing with clearer syntax. Say in Python

if available == True:
   reserve()

to

if available:
   reserve()

Or in JUnit:

assertTrue(false);

to

fail();

These are behaviour-preserving design improvements, or in other words, refactorings. They often turn up in novice programmer code or code written by people new to a language or toolset. Other primitive concretizing refactorings might be dead code removals, such as Remove Unused Parameter.

Another primitive concretization step is recognizing that a variable with a highly general type can be typed more precisely. A String, byte[] or a void* are highly general types, in that they can hold pretty much anything. Replacing with a more specific type usually relies on a precondition, either implicitly or explicitly.

int age = Integer.parseInt(ageStr);

In this case the potential throwing of NumberFormatException entails an implicit precondition. The concretizing step is the refactoring that introduces the typed variable.

Wait, but isn’t the problem with using Strings and primitive objects everywhere that they lack abstraction? Yes. They indicate that the code lacks an explicit model, or in other words, abstractions. They also indicate the code lacks concretizations – specifics from the problem domain that make it a well-focused machine. (Lacking both abstraction and concretization indicates ontological slime, a wonderful term from William Wimsatt, and perhaps the topic of another post.)

For a multi-line example of primitive concretization, consider this refactoring available when going from Java 1 to 5:

Iterator expenseIter = expenses.iterator();
while (expenseIter.hasNext()){
  Expense expense = (Expense)expenseIter.next();
  sum += expense.getExpenseValue();
}

to

for (Expense expense: expenses){
  sum += expense.getExpenseValue();
}

This mirrors the evolution of Java itself as a technical object and iteration as a technical concept. I’ve written about Simondon and the history of looping at more length elsewhere. Specialization and reduction are near-synonyms more frequently used in programming, but because of the clearer relationship to abstraction, and the connection to Simondon, I stick with concretization here, at the cost of a few more syllables. (Reification is a different concept, in my opinion.)

Interleaving Abstraction and Concretization

The adjunction of a supplementary structure is not a real progress for the technical object unless that structure is concretely incorporated into the ensemble of the dynamic system of its operation. – Simondon, Mode of Existence of Technical Objects, Mellamphy trans.

Design improvements often include both abstracting and concretizing steps. The feeling is of abstraction clearing space and concretization then making better use of it.

Michael Feathers’ use of characterization tests is an example of starting a design process with concretization.

    @Test
    public void removesTextBetweenAngleBracketPairs() {
        assertEquals("", Pattern.formatText(""));
    }

Characterization tests stabilize the function of the machine by pinning down very specific behaviors in the form of facts. This then allows a round of refactorings and rewrites. The immediate next step would often be abstracting refactorings such as Extract Method and Extract Class (naming a clump of things introduces an abstraction and an indirection).

Arlo Belshee’s Naming Is A Process also interleaves abstracting and concretizing steps.

Missing to Nonsense – Abstraction
Nonsense to Honest – Concretization
Honest to Honest and Complete – Concretization
Honest and Complete to Does the Right Thing – Abstraction
Does the Right Thing to Intent – Concretization
Intent to Domain Abstraction – Abstraction

A number of these steps, especially in the later half, themselves consist of interleaved abstracting and concretizing sub-steps. Eg in Honest and Complete:

1/ Use Introduce Parameter Object. Select just the one parameter you want to encapsulate. Name the class Foo and the parameter self. (Abstraction)
2/ Use Convert To Instance Method on the static. Select the parameter you just introduced. (Abstraction)
3/ Improve the class name (Foo) to at least the Honest level. (Concretization)
4/ Go to the caller of the method. Select the creation of the new type. Introduce parameter to push it up to the caller’s caller. (Abstraction)
5/ Convert any other uses of the parameter you are encapsulating to use the field off the new class. (Concretization)

Belshee’s process, using names as the signposts for improving code, is a wonderful combination of practical walkthrough and a theory of programming. It even seems to put living flesh on my skeletal wish for Name Oriented Software Development, though, eg, stronger tool and language support for consistent dictionaries are needed to realize the full vision.

Executable Theory

This kind of divergence of functional aims is a residue of abstract design in the technical object, and the progress of a technical object is definable in terms of the progressive reduction of this margin between functions in plurivalent structures. – Simondon, ibid

Every abstraction, even one as small as an extracted method, is also a theory. These little theories then need to be applied and refined to ensure a coherent system. What Simondon saw in the evolution of mechanical engines and other industrial era machines, we can observe at smaller scale and higher frequency when engineering in our more plastic computational material.

Simondon describes machines as becoming more concrete over time, finally reaching a highly focused state where each part cleanly supports the functions of others in an overall system. He also states that the introduction of a new theory is the invention of a new machine. So perhaps he would disagree that the process is cyclical.

We can, perhaps, reconcile this if we think of each software function or class as a small widget in a larger system. In this sense of the widget = machine = function, every new method is a new Simondonian machine. This also suggests that software rarely progresses to the refined machines he describes, but is more usually an assembly of semi-refined widgets. Which sounds about right.

Once you realise abstraction and concretization are complementary, anti-parallel processes, you start noticing it everywhere. I suspect casual design phrases like “nice abstraction” are actually misleading. Ohm’s Law is a nice abstraction; modern chips that rely on parasitic capacitance in a material context of silicon are well-built machines. In working software, a nice abstraction is also a nice concretization: a well-formed widget within a coherent machine.

All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections. – David Wheeler

Advertisements

Symbiotic Design

Do we build code, or grow it? I was fortunate enough to attend a Michael Feathers workshop called Symbiotic Design earlier in the year, organized by the good people at Agile Singapore, where he is ploughing biological and psychological sources for ideas on how to manage codebases. There’s also some links to Naur and Simondon’s ideas on technical objects and programming that weren’t in the workshop but are meanderings of mine.

Ernst Haeckel - Trachomedusae, 1904 (wiki commons)

Ernst Haeckel – Trachomedusae, 1904 (wiki commons)

Feathers literally wrote the book on legacy code, and he’s worked extensively on techniques for improving the design of code at the line of code level. Other days in the week focused on those How techniques; this session was about why codebases change the way they do (ie often decaying), and techniques for managing the structures of a large codebase. He was pretty clear these ideas were still a work in progress for him, but they are already pretty rich sources of inspiration.

I found the workshop flowed from two organizing concepts: that a codebase is an organic-like system that needs conscious gardening, and Melvin Conway’s observation that the communication structure of an organization determines the shape of the systems its people design and maintain (Conway’s Law). The codebase and the organization are the symbionts of the workshop title. Some slides from an earlier session give the general flavour.

Feathers has used biological metaphors before, like in the intro to Working Effectively With Legacy Code:

You can start to grow areas of very good high-quality code in legacy code bases, but don’t be surprised if some of the steps you take to make changes involve making some code slightly uglier. This work is like surgery. We have to make incisions, and we have to move through the guts and suspend some aesthetic judgment. Could this patient’s major organs and viscera be better than they are? Yes. So do we just forget about his immediate problem, sew him up again, and tell him to eat right and train for a marathon? We could, but what we really need to do is take the patient as he is, fix what’s wrong, and move him to a healthier state.

The symbiotic design perspective sheds light on arguments like feature teams versus component teams. Feature teams are a new best practice, and for good reasons – they eliminate queuing between teams, work against narrow specialization, and promote a user or whole-product view over a component view. They do this by establishing the codebase as a commons shared by many feature teams. So one great question raised is “how large can the commons of a healthy codebase be?” Eg there is a well known economic effect of the tragedy of the commons, and a complex history of the enclosure movement behind it. I found it easy to think of examples from my work where I had seen changes made in an essentially extractive or short-term way that degraded a common codebase. How might this relate to human social dynamics, effects like Dunbar’s number? Presumably it takes a village to raise a codebase.

Feathers didn’t pretend to have precise answers when as a profession we are just starting to ask these questions, but he did say he thought it could vary wildly based on the context of a particular project. In fact that particularity and skepticism of top down solutions kept coming up as part of his approach in general, and it definitely appeals to my own anti-high-modernist tendency. I think of it in terms of the developers’ theory of the codebase, because as Naur said, programming is theory building. How large a codebase can you have a deep understanding of? Beyond that point is where risks of hacks are high, and people need help to navigate and design in a healthy way.

((You could, perhaps, view Conway’s Law through the lens of Michel Foucault, also writing around 1968: the communication lines of the organization become a historical a priori for the system it produces, so developers promulgate that structure without discussing it explicitly. That discussion deserves space of its own.))

Coming back to feature teams, not because the whole workshop was about that, but because it’s a great example, if you accept an organizational limit on codebase size, this makes feature/component teams a spectrum, not good vs evil. You might even, Feathers suggests, strategically create a component team, to help create an architectural boundary. After all, you are inevitably going to impact systems with your organizational design. You may as well do it consciously.

A discussion point was a recent reaction to all of these dynamics, the microservices approach, of radically shrinking the size of system codebases, multiplying their number and decentralizing their governance. If one component needs changes, the cost of understanding it is not large, and you can, according to proponents, just rewrite it. The organizational complement of this is Fred George’s programmer anarchy (video). At first listen, it sounds like a senior manager read the old Politics Oriented Software Development article and then went nuts with an organizational machete. I suspect that where that can work, it probably works pretty well, and where it can’t, you get mediocre programmers rewriting stuff for kicks while the business paying them drowns in a pool of its own blood.

Another architectural approach discussed was following an explicitly evolutionary approach of progressively splitting a codebase as it grew. This is a technique Feathers has used in anger, with the obvious biological metaphors being cell meiosis and mitosis, or jellyfish reproduction.

The focus on codebases and the teams who work on them brings me back to Gilbert Simondon’s idea of the “theatre of reciprocal causality”. Simondon notes that technical objects’ design improvements have to be imagined as if from the future. They don’t evolve in a pure Darwinian sense of random mutations being winnowed down by environmental survival. Instead, when they improve, especially when they improve by simplification and improved interaction of their components, they do so by steps towards a potential future design, which after the steps are executed, they become. This is described in the somewhat mindbending terms of the potential shape of the future design exerting a reverse causal influence on the present: hence the components interact in a “theatre of reciprocal causality”.

This is exactly what programmers do when they refactor legacy code. Maybe you see some copy-pasted boilerplate in four or five places in a class. So you extract it as a method, add some unit tests, and clean up the original callers. You delete some commented out code. Then you notice that the new method is dealing with a concept you’ve seen somewhere else in the codebase. There’s a shared concept there, maybe an existing class, or maybe a new class that needs to exist, that will tie the different parts of the system together better.

That’s the theatre of reciprocal causality. The future class calling itself into existence.

So, given the symbiosis between organization and codebase, the question is, who and what is in the theatre? Which components and which people? Those are the components that have the chance to evolve together into improved forms. If it gets too large, it’s a stadium where no one knows what’s going on, one team is filming a reality TV show about teddy bears and another one is trying to stage a production of The Monkey King Wreaks Havoc In Heaven. One of the things I’ve seen with making the theatre very small, like some sort of Edinburgh Fringe Festival production with an audience of two in the back of an old Ford Cortina, is you keep that component understandable, but cut off its chance at technical evolution and improvement and consolidation. I’m not sure how that works with microservices. Perhaps the evolution happens through other channels: feature teams working on both sides of a service API, or on opportunistically shared libraries. Or perhaps teams in developer anarchy rewrite so fast they can discard technical evolution. Breeding is such a drag when drivethru immaculate conception is available at bargain basement prices.

Ink and Rubber

To a developer, an unfamiliar system can be like a picture drawn with an Etch-A-Sketch. You can see what is there, and some relationships between the pieces, by location, style or theme. When you have to alter the picture, though, difficulties arise. The Etch-A-Sketch eraser is quite coarse. It is easy to destroy existing parts of the picture. This is if you can find the eraser at all, and it hasn’t fallen behind the couch, or been chewed by the dog. If you really have to remove something, you may as well just pick up the whole thing and shake it back to a blank slate.

If you are very attached to the picture, it is still possible to add detail on existing empty ground. When large areas of white space are still free, you can choose a space to draw something new fairly easily. Once the picture is detailed, or you need to make a change related in a specific way to the existing picture, this is basically reduced to coloring in between the lines.

We code like we are painting with ink.

Our instinct is not to reuse and refactor, but to tweak and rewrite. Why? Fear and pride, perhaps. We aren’t born coding, though, so these emotional reactions are learnt because of some aspect of working with software. I’d suggest both behaviours stem from incomplete or untrusted knowledge of the existing code.

Peter Naur described programming as theory building.

Programming in this sense primarily must be the programmers’ building up knowledge of a certain kind, knowledge taken to be basically the programmers’ immediate possession, any documentation being an auxiliary, secondary product.

Naur is talking of a mental model, not the formal and external symbolic expression of an equation or a Java class. The model may need to be sophisticated, but it is also defined by its working immediacy. Your model of the code today may easily be different to a year ago, even if the code has not changed. Naur doesn’t relate the two, but to me it is also reminiscent of what sociologist Patricia Hill Collins calls everyday theory. Hill Collins is referring to theories of society, and she contrasts everyday theory with High Theory constructed in and for an academic setting, like, say, Marx’s dialectical materialism. Everyday theory is collective and social, may lack scholarly depth at times, but it also has the workaday vigour of something used every day to navigate a system.

If we follow Naur in considering the main activity in programming to be building mental models, even if the deliverables are software artefacts, one implication is the cost of building a theory – the cost of learning – dominates everyday coding. Writing a new class is then easier because the mental model is built by the writer while they are writing. At a larger scale, this suggests why the greenfield myth is so beguiling. Architecture suffers the same costs of understanding as programs. There’s a lot of interest in microservices at the moment, which is basically the rewrite elevated to an architectural ideal. I guess it works very well in certain domains and organisations, but I haven’t used auto-balkanisation of this sort myself, and at any rate there are many running systems not written that way.

Colouring-in behaviour is a bit harder to describe. It’s when a developer goes to painstaking effort to minimise changes to lines of code, especially structural elements like methods and classes. It’s similar to what Michael Feathers recently described as shoving, except he is using a different spatial metaphor.

Colouring-in is mentally cheaper because of scope. The scope of your working theory doesn’t have to include the interaction of class structures and broader parts of the codebase. From the perspective of the developer and that specific fix, it can even seem neater than more structurally impactful solutions, such as factoring out new methods or consolidating emerging redundancy. Poor unit tests exacerbate this, increasing the risk of errors after changes are made, discouraging refactoring.

Unit tests are also interesting for mental models because they are effectively worked examples of the theory the developer was using at the time. Good tests illustrate an idea as well as describing edge cases, and this decreases the cost of learning, and provides quicker feedback on the suitability of the programmer’s working theory.

The computational material of software is often brittle for a user in the way it runs, but the material itself is strikingly plastic at development time. It’s less like ink painted on paper and more like lego fused with rubber bands. Refactoring recognises this plasticity, and works with it. Compared to reinforced concrete, software is easy to change. Sometimes our minds are not.