Refactoring the Argo

Le vaisseau Argo ~ The ship Argo

A frequent image: that of the ship Argo (luminous and white), each piece of which the Argonauts gradually replaced, so that they ended with an entirely new ship, without having to alter either its name or its form. This ship Argo is highly useful: it affords the allegory of an eminently structural object, created not by genius, inspiration, determination, evolution, but by two modest actions (which cannot be caught up in any mystique of creation): substitution (one part replaces another, as in a paradigm) and nomination (the name is in no way linked to the stability of the parts): by dint of combinations made within one and the same name, nothing is left of the origin: Argo is an object with no other cause than its name, with no other identity than its form.

Another Argo: I have two work spaces, one in Paris, the other in the country. Between them there is no common object, for nothing is ever carried back and forth. Yet these sites are identical. Why? Because the arrangement of tools (paper, pens, desks, clocks, calendars) is the same: it is the structure of the space which constitutes its identity. This private phenomenon would suffice to shed some light on structuralism: the system prevails over the very being of objects.

— Roland Barthes, Roland Barthes

The Argo, Constantine Volanakis

The Argo (luminous and white) is a software system and the Argonauts its human components. They trim its sails and move its oars; its rudder steers by the use of a helmsman.

When each piece comes in turn to be replaced, a better part is substituted for it: newer and hence luminous, but in keeping with the shape of the old. The system is rebuilt anew without an act of raw greenfield creation.

The Argonauts are skilled, and far from friendly ports: they repair the ship as they sail it. This too is why one traveller will tell of the grand trireme Argo, and another of a swift catamaran, an Argo with a glorious triangular sail, growing ever swifter, ever smoother as time passes. But most will tell of strange hybrid ships, nautical chimeras whose names and shapes are stretched before they lose now-redundant pieces entirely and re-form around a new coherence.

Some stories tell of another Argo, also sailing and perpetually repaired. When each part eventually decays and breaks, it is replaced with the most perfect imitation, in shape, strength, flex, texture and colour. Some say this Argo has a beautiful unity lacking in the first, where the bastard styles and material of past and future ships are always found together. This Argo is easier to find than the others, for though its Argonauts are strong and brave in battle, they never sail more than a few weeks from Thessaly. It is there, in the forest on the outskirts of Iolcus, that a copse of tall trees grows fast and strong. They are excellent for building triremes, and the Argonauts' only source of timber. Jason died long ago and has been replaced by other princes, each with his own usurped claim. None of them has held the Golden Fleece.

(I learnt of this Barthesian image from Jenny Turner’s review of Maggie Nelson’s The Argonauts.)

Ink and Rubber

To a developer, an unfamiliar system can be like a picture drawn with an Etch-A-Sketch. You can see what is there, and some relationships between the pieces, by location, style or theme. When you have to alter the picture, though, difficulties arise. The Etch-A-Sketch eraser is quite coarse. It is easy to destroy existing parts of the picture. That is, if you can find the eraser at all, and it hasn't fallen behind the couch or been chewed by the dog. If you really have to remove something, you may as well just pick up the whole thing and shake it back to a blank slate.

If you are very attached to the picture, it is still possible to add detail on existing empty ground. When large areas of white space are still free, you can choose a space to draw something new fairly easily. Once the picture is detailed, or you need to make a change related in a specific way to the existing picture, the work is reduced to colouring in between the lines.

We code like we are painting with ink.

Our instinct is not to reuse and refactor, but to tweak and rewrite. Why? Fear and pride, perhaps. We aren’t born coding, though, so these emotional reactions are learnt because of some aspect of working with software. I’d suggest both behaviours stem from incomplete or untrusted knowledge of the existing code.

Peter Naur described programming as theory building.

Programming in this sense primarily must be the programmers’ building up knowledge of a certain kind, knowledge taken to be basically the programmers’ immediate possession, any documentation being an auxiliary, secondary product.

Naur is talking of a mental model, not the formal and external symbolic expression of an equation or a Java class. The model may need to be sophisticated, but it is also defined by its working immediacy. Your model of the code today may easily differ from your model a year ago, even if the code has not changed. Naur doesn't make the connection, but to me this is also reminiscent of what sociologist Patricia Hill Collins calls everyday theory. Hill Collins is referring to theories of society, and she contrasts everyday theory with High Theory constructed in and for an academic setting, like, say, Marx's dialectical materialism. Everyday theory is collective and social, and may lack scholarly depth at times, but it also has the workaday vigour of something used every day to navigate a system.

If we follow Naur in considering the main activity in programming to be building mental models, even if the deliverables are software artefacts, one implication is that the cost of building a theory – the cost of learning – dominates everyday coding. Writing a new class is then easier because the mental model is built by the writer while they are writing. At a larger scale, this suggests why the greenfield myth is so beguiling. Architecture suffers the same costs of understanding as programs. There's a lot of interest in microservices at the moment, which is basically the rewrite elevated to an architectural ideal. I guess it works very well in certain domains and organisations, but I haven't used auto-balkanisation of this sort myself, and at any rate there are many running systems not written that way.

Colouring-in behaviour is a bit harder to describe. It's when a developer goes to great lengths to minimise changes to lines of code, especially structural elements like methods and classes. It's similar to what Michael Feathers recently described as shoving, except he is using a different spatial metaphor.

Colouring-in is mentally cheaper because of scope. The scope of your working theory doesn't have to include the interaction of class structures and broader parts of the codebase. From the perspective of the developer and that specific fix, it can even seem neater than more structurally impactful solutions, such as factoring out new methods or consolidating emerging redundancy. Poor unit tests exacerbate this: they increase the risk of errors after a change is made, and so discourage refactoring.
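To make the contrast concrete, here is a small Java sketch. The shipping rule and every name in it (Shipping, colouredIn, refactored, baseRateCents, applyMinimum) are invented for illustration and do not come from any real codebase. Both methods compute the same prices. The first shows how an express option might be coloured in; the second shows the same rule after factoring out methods and consolidating the duplicated rate and minimum charge.

```java
// Hypothetical shipping rule, invented for illustration: 2 cents per gram,
// express doubles the price, and every parcel costs at least 500 cents.
final class Shipping {
    static final long MINIMUM_CENTS = 500;

    // Colouring-in: the express case was squeezed in beside the original
    // rule, copying the rate and the minimum rather than touching structure.
    static long colouredIn(long grams, boolean express) {
        if (express) {
            return Math.max(grams * 2 * 2, 500);
        }
        return Math.max(grams * 2, 500);
    }

    // Refactored: the rate and the minimum are factored out and named once,
    // so the express rule composes them instead of copying them.
    static long refactored(long grams, boolean express) {
        long cents = baseRateCents(grams);
        if (express) {
            cents = cents * 2;
        }
        return applyMinimum(cents);
    }

    private static long baseRateCents(long grams) {
        return grams * 2;
    }

    private static long applyMinimum(long cents) {
        return Math.max(cents, MINIMUM_CENTS);
    }
}
```

The refactored version asks for more theory up front, because the writer has to see the rate and the minimum as concepts in their own right. In exchange, the next rule (a discount, say) can compose those named pieces rather than copy them a third time.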

Unit tests are also interesting for mental models because they are effectively worked examples of the theory the developer was using at the time. Good tests illustrate an idea as well as describing edge cases; this decreases the cost of learning and gives quicker feedback on the suitability of the programmer's working theory.
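Here is a minimal sketch of tests read as worked examples. It is written in plain Java so it runs without a test framework; in practice these would more likely be JUnit @Test methods. The Initials function and the example method names are hypothetical. Each method name states one piece of the theory, and two of them pin down edge cases.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical function under test: the initials of the words in a name.
final class Initials {
    static String of(String name) {
        return Arrays.stream(name.trim().split("\\s+"))
                .filter(word -> !word.isEmpty()) // "".split(...) yields one empty string
                .map(word -> word.substring(0, 1).toUpperCase(Locale.ROOT))
                .collect(Collectors.joining());
    }
}

// Worked examples: each method name states part of the theory, each body shows it.
final class InitialsExamples {
    static void takesTheFirstLetterOfEachWord() {
        check(Initials.of("Roland Barthes"), "RB");
    }

    static void capitalisesLowerCaseNames() {
        check(Initials.of("maggie nelson"), "MN");
    }

    // Edge case: runs of spaces, and spaces at either end, are not words.
    static void ignoresExtraWhitespace() {
        check(Initials.of("  Jenny   Turner "), "JT");
    }

    // Edge case: a blank name has no initials, rather than throwing.
    static void blankNameHasNoInitials() {
        check(Initials.of("   "), "");
    }

    private static void check(String actual, String expected) {
        if (!actual.equals(expected)) {
            throw new AssertionError("expected \"" + expected + "\" but got \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        takesTheFirstLetterOfEachWord();
        capitalisesLowerCaseNames();
        ignoresExtraWhitespace();
        blankNameHasNoInitials();
        System.out.println("all examples pass");
    }
}
```

A reader new to the code can learn what Initials means from the four method names before reading its implementation.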

The computational material of software is often brittle for a user in the way it runs, but the material itself is strikingly plastic at development time. It's less like ink painted on paper and more like Lego fused with rubber bands. Refactoring recognises this plasticity, and works with it. Compared to reinforced concrete, software is easy to change. Sometimes our minds are not.

VII.1 Reuse

子曰,述而不作,信而好古,窃比于我老彭。 – 论语 七:一

The Master said, ‘I transmit but do not innovate; I am truthful in what I say and devoted to antiquity. I venture to compare myself to your Old P’eng.’ – Analects VII.1 (Lau)

Before contemplating the process implications of this radically static statement, let’s note that from the perspective of designed code itself, it is always true. Code transforms and transmits information. This is the garbage-in garbage-out principle. Designed code (not genetic or evolved code) does not innovate.

Backups of the user directory for the Analects’ source control repository are, alas, lost to antiquity, and though many sophisticated data recovery techniques have been tried, with some success, none have yielded the identity of Old P’eng. Our ignorance of him highlights our relationship with Confucius and with any classical tradition. To us, Confucius founded a philosophical school, but in his own words he merely continued a tradition that we can see indirectly, if at all.

Scholarly consensus is that Confucius is deliberately overstating his lack of innovation for reasons of rhetoric or modesty (see eg DC Lau, AC Graham, or just Wikipedia on this verse). Nevertheless the verse is considered pivotal in understanding Confucius’ traditionalism and conservatism in a time of extraordinary violence and social change.

Existing solutions are useful in at least two ways. 

Firstly, they may capture unintuitive theoretical results in accessible ways. Many algorithm design and data structure results are now in this category, such as sorting algorithms and efficient concurrent maps (eg the lock-striped implementation of java.util.concurrent.ConcurrentHashMap in Java 6). The formal scientific characterization of such solutions in terms of, say, algorithmic complexity and performance benchmarks makes literacy in theoretical computer science crucial. Programmers are unlikely to understand the derivation by reading the code, so they must be able to read the documentation.
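The first kind of reuse is easy to show in Java. This sketch assumes Java 8 or later because it relies on ConcurrentHashMap.merge, which applies each update atomically for its key. The WordCount class is a hypothetical example. The subtle reasoning about concurrent updates stays inside the library, and the caller needs only its documented contract.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: count words across many lines using a thread pool.
final class WordCount {
    static Map<String, Integer> count(List<String> lines, int threads)
            throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String line : lines) {
            pool.execute(() -> {
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        // merge() runs atomically per key on ConcurrentHashMap,
                        // so concurrent increments of the same word are not lost.
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            });
        }
        pool.shutdown(); // accept no new tasks; already submitted ones still run
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("word count did not finish in time");
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts =
                count(Arrays.asList("the ship Argo", "the Argo", "Argo"), 3);
        System.out.println(counts.get("Argo")); // prints 3
    }
}
```

With a plain HashMap and a get followed by a put, two threads can read the same old count and one increment is silently lost. On the Java 6 API the same job needs a retry loop around putIfAbsent and replace, which is exactly the kind of code better borrowed than written.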

Secondly, they may capture highly specific details of the environment and robust solutions to managing it. This will include successful workarounds for under-specified elements of protocols, or for flat-out incorrect but popular implementations. Any user of, say, Ruby on Rails or Tomcat takes advantage of this kind of reuse. Consider too the domain-specific details and tolerances of a fly-by-wire control system for a particular make and model of plane.

These two kinds of reuse may be contrived to lie on a spectrum, but I’ve chosen to distinguish them here for their correspondence to two different categories of knowledge – logos and metis. In classical Greek epistemology logos is theoretic universal knowledge and metis is hard-won cunning, “feel”, or craft knowledge (as an aspect of techne, craft knowledge and theory). James C. Scott describes the Greek hero Odysseus, surgeons and maritime pilots as all relying on metis (Seeing Like A State). Scott also makes the connection between traditional knowledge – which is particular and tied to a society and geography – and common law conservatism in the tradition of Edmund Burke and Michael Oakeshott.

Confucius is claimed as a kind of Burkean conservative, for instance, by James Kalb. Both Confucius and Burke grew up in societies with small literate elites and large impoverished peasantries. They share a sense of the worth of settled convention, the importance of teaching and the literary canon, a paternalistic affection for hereditary power, and a sympathy for the welfare of everyday people. Nor are they reactionaries: both welcome improvement at a humane pace (IX.3).

Seeing Burke and Confucius as similar is not mainstream and deserves a dedicated analysis of its own. (My searches revealed more extant work linking both of them individually to Wittgenstein than to each other, but pointers are always welcome.) In a comprehensive entry for Burke in the Stanford Encyclopedia of Philosophy, Ian Harris argues that despite being more often claimed by the right wing, he does not have a clear modern partisan successor. Nevertheless, distinguished scholars like DC Lau or AC Graham stay well clear of Western political comparisons, while happily comparing classical Chinese figures with Western philosophers. 

Unusually, a software library, and all the hard-won craft knowledge that comes with it, can be imported into another codebase with extraordinary ease compared with other forms of craft knowledge. A pilot is of little advantage outside his home port, and Ruby on Rails is of little use for 3D rendering, but in software we can copy the pilot and use him on innumerable ships entering that port. We can also ultimately read the source code to Ruby on Rails and determine how it tolerates the idiosyncrasies of particular browsers and servers. This is because all code is built on a formal information substrate – the computational medium. (This is Harrison Ainsworth’s term and his note on reuse provided a number of the connections in this post.)

Not all craft knowledge of a codebase is encapsulated in the codebase. There are particularities of the install, workaround scripts, configuration, scheduled jobs and so on, but these are ultimately digital artifacts easily included within a slightly broader view of what a codebase is (this broader view is a premise of DevOps, and of anyone serious about a controlled environment). More problematically, there are conventions of use, design choices, oral traditions of “check here when you change there”, and so on. At the limit, all codebases are incomplete. They depend on co-texts, results and knowledge of the domain that need not be encoded. An air traffic control system does not need a textbook description of Bernoulli’s Principle.

Burke and Scott argue that in an established society important, non-obvious, traditional knowledge is captured in social conventions and established practice, and the practice cannot be simplified without a loss of valuable situational knowledge. Scott additionally points out that such an environment is very difficult for an outsider to navigate and there are strong motivations for central political power to apply simplifications to it.

Yet highly particular, ‘local’ code that requires hands on experience and knowledge of accompanying conventions most frequently has another name in software development: bad. Or: spaghetti. Or: legacy. The sentiment is well captured in Qi’s koan on fear, even if it does riff off an opposing classical Chinese tradition. (In Confucian terms we might note the building is not harmonious.)

In No Silver Bullet, Brooks distinguishes accidental from essential complexity, the latter being inherent in the underlying problem rather than in any specific software or hardware implementation. Complexity due to poor or improvable design is always accidental; that due to the problem domain is by definition essential.

An aesthetic sense of good or poor design becomes crucial when pursuing aggressive reuse (VII.14). Without it you will simply perpetuate junk.

Having argued the link between conservatism and software reuse, it is worth being a little more precise about flavours of conservatism. William F. Buckley famously described it as that which “stands athwart history, yelling Stop, at a time when no one is inclined to do so, or to have much patience with those who so urge it.” Despite its partisan origins, this is a good start, as it shows that certain threads of environmentalism, and the idea of heritage listing, fall easily under the same banner. (It is also useful to think of contemporary US Democrats defending Franklin D. Roosevelt’s New Deal, or opposition to changes to Britain’s NHS, in this frame.)

In its purest form, this can be “return to a golden age” conservatism. There’s certainly an argument that Confucius would have been happy with a reversion to the society of the Western Zhou. We should again temper our interpretation by wondering how much is rhetoric covering adaptation of tradition to new times. In software, certainly, simply reactionary approaches are of little use. Brooks and the founders of eXtreme Programming have both noted that a more effective strategy is to embrace change. Oakeshott argues in On Being Conservative that settings of widespread and enthusiastic change are in particular need of an awareness of the value of what exists now. A traditionalist most often defends the present versus the future, not the past versus the present. This conservative disposition’s usefulness to software is more apparent if taken as an analytic tool rather than an inherent aspect of personality. After all, the greenfield doesn’t exist (see X.18), and any project that pretends to be a greenfield is an interesting lie.

Conservative thought in this vein usually emphasizes working within a tradition and a community – in software we would say platform. This also suggests interesting contours for the breadth of possible reuse; and there are other verses, such as XVI.11, where that might be explored. What is immediately apparent is the narrowness and fragility of an entirely in-house platform due to the smallness of its developer community; and the need for a shared jargon (XIII.3) and perhaps a canon (XVI.13).

Given the corpus of extant code in the form of libraries, to adore antiquity is to know your platform, including its innards, not just thoughtless rote quoting via copy and paste. At this moment in software, to reuse and extend is a greater service than extraneous self-involvement masquerading as innovation.

If you can easily find some code and copy it, you get the result at zero cost. That is an efficiency that cannot be beaten: no amount of programming tool and technique improvements can ever do that. So we want to maximise reuse. – Fred Brooks, No Silver Bullet

X.18 When his lord gave him a gift of a live animal, he invariably reared it

君赐食,必正席先尝之。君赐腥,必熟而荐之。君赐生,必畜之。侍食于君,君祭,先饭。– 论语,十:十三

When his lord gave a gift of cooked food, the first thing he invariably did was to taste it after having adjusted his mat. When his lord gave him a gift of uncooked food, he invariably cooked it and offered it to the ancestors. When his lord gave him a gift of a live animal, he invariably reared it. At the table of his lord, when his lord had made an offering before the meal he invariably started with the rice first. — Analects X.18 (Lau)

We must respect working systems. Working systems are living systems and have a life that should be respected. In these early years of software, being handed a living system is often being handed a legacy system. And legacy systems are off-putting to some of the technical senses. They offend our inner fashion designer with their unfashionableness. They offend our inner engineer with their inelegance, or our inner scientist with their reliance on a dead paradigm.

Or they might simply offend our inner creator, because an adopted system is not our system: we have fallen for the greenfield myth. There is no true greenfield in contemporary software, and that is a mark of our success. In Java: you started a new Eclipse project for this; it depends on fifty other libraries. You wrote your own library set; they depend on Java standard libraries and the virtual machine. You wrote your own compiler or interpreter: it depends on the operating system. You wrote your own operating system: it depends on the hardware. (But you didn’t really write your own operating system, did you? Even Linus Torvalds needed the GNU toolset.)

And even if you doublethink the greenfield into existence, and make your shiny new Eclipse project a tabula rasa, you will leave your desk at the end of that first day. And on the second day, when you unlock the screen and return to your code, you will be doing software maintenance.

When your client, or your corporate lord, gives you the gift of a live system, what is your response? The ceremony exists because the logic of the organisation brought it into being, and this is to be respected (恕). It will probably need to change, because to live is to change. Perhaps it needs to merge with some other system, or to perform its function immediately instead of overnight. This is to the good. When your lord gives you a gift of a live animal, you invariably rear it.