The Napkin Scrawls of Dining Philosophers

Technologists are, by their vocational commitment to new things, manufacturers and early adopters of language. Our commitment to language is, however, generally one of casual incompetence. The artifact being built, or fixed, is the focus of our attention, and the language referring to it is an afterthought for the tormenting of poets, grammarians and the marketing department. Perhaps that’s as it should be, but it’s still a pleasure to read a book like Java Concurrency In Practice, which has a mastery of both language and its topic.

JCIP has already been well reviewed on its technical merits. [NB: These notes are also from 2006 and written about the first edition.] In summary, it’s a great reference. As the jacket copy points out, concurrency in software is both difficult to deal with and of renewed importance. Problems like these put stress not only on software artifacts being developed, but the context in which that artifact is built and used, like collaboration in its construction, or human and computer interfaces, or eventual maintenance. Java Concurrency In Practice also has some insights into this, but it’s not foregrounded, and seems worth exploring.

Form

JCIP is arranged according to well established conventions for concise software references, chapters, glossary and so on, with two variations. Firstly, when code snippets are introduced, they are accompanied by one of three smiley-style faces, smiling, frowning or neutral. These are large and prominently placed beside the code snippet without making it take up more space. The intent is to discuss common pitfalls without those snippets making their way into production via the copy and paste myopia that can be an occupational hazard for programmers. It also serves as direct negative or positive feedback for the reader so aversion-on-sight instincts can be reinforced towards, say, correct use of synchronization for visibility on getter methods. At first I found its appearance in an engineering textbook rather absurd, imagining Victorian bridge engineers scrawling notes on how l33t they were in their blueprint margins. I quickly got used to it though, and given its immediacy it is probably a worthwhile technique.
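For readers who have not met the getter example, here is a minimal sketch of the kind of pair that would earn a frowning and a smiling face respectively. The class names are my own invention rather than the book's listings, and the code only assumes the standard Java rule that a write made while holding a lock is guaranteed visible to a later read made while holding the same lock.

```java
// Hypothetical illustration; these class names are not taken from the book.

// Frowning face: the setter is synchronized but the getter is not, so a
// reading thread has no guarantee of ever seeing the latest value.
class StaleCounter {
    private int value;

    public synchronized void set(int newValue) { value = newValue; }

    public int get() { return value; } // unsynchronized read: may be stale
}

// Smiling face: both accessors lock on the same object, which gives mutual
// exclusion and makes earlier writes visible to later readers.
class VisibleCounter {
    private int value;

    public synchronized void set(int newValue) { value = newValue; }

    public synchronized int get() { return value; }
}

class VisibilityDemo {
    public static void main(String[] args) throws InterruptedException {
        VisibleCounter counter = new VisibleCounter();
        Thread writer = new Thread(() -> counter.set(42));
        writer.start();
        // join() also establishes visibility here; the demo only exercises
        // the API, it does not prove the stale version is broken.
        writer.join();
        System.out.println(counter.get()); // prints 42
    }
}
```

The unsettling part is that the frowning version will usually appear to work in a quick test, which is presumably why the book resorts to faces rather than trusting the reader to run it.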

Secondly, Goetz and colleagues make vivid and precise use of metaphors with non-software reality. When they choose a metaphor, they commit to it, but do not overextend it (except on one occasion when making a joke). The metaphors relate to concurrency conditions in real life – for instance an explanation of race conditions in terms of meeting at Starbucks. They also do not overuse metaphors in a mistaken attempt at accessibility – I count ten in a 300-page book.

It’s also worth noting that when Goetz and colleagues include a graph of performance data, it’s wiggly, rather than impossibly smooth. They are comfortable with reality and how their abstractions relate to it.

Documentation

Metaphors, while useful, are mainly an impressive show of technical writing technique. Most of the insights on language in software engineering are nestled in section 4.5, Documentation.

This three-page essay, which warrants publishing as an article in its own right, begins with the observation that documentation is useful and often awful. Reasons and techniques for good class-level documentation of concurrency policies are dealt with cleanly and concisely on the first page. The remainder describes techniques for coping with poor or incomplete documentation.

While prudence suggests that we not assume behaviors that aren’t part of the specification, we have work to get done, and we are often faced with a choice of bad assumptions. Should we assume an object is thread-safe because it seems that it ought to be? Should we assume that access to an object can be made thread-safe by acquiring its lock first? […] Neither choice is very satisfying.

One way to improve the quality of your guess is to interpret the specification from the perspective of someone who will implement it (such as a container or database vendor), as opposed to someone who will merely use it.

Let’s take a step back here. Six highly regarded experts in Java development – Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes and Doug Lea – luminaries almost, are, in a serious technical reference work, suggesting that programmers need to use techniques from literary criticism to do their job, in this case to interpolate unspecified thread safety elements of class contracts. Considering a text from the viewpoint of different audiences is a critical technique. (If newspapers, films and even concerts can be “texts” then Unicode text files most certainly can.) Like critics and historians, developers are often working with incomplete or secondary sources, as the source code for all libraries may not be available. The passage goes on to suggest ways a writer-centric analysis can yield insight into crucial libraries.

Servlets are always called from a container-managed thread, and it is safe to assume that if there is more than one such thread, the container knows this. The servlet container makes available certain objects that provide services to multiple servlets, such as HttpSession or ServletContext. So the servlet should expect to have those objects accessed concurrently […]

Literary criticism went through a variety of contortions in the 20th century. Previously, the life and intentions of the author, in the sense of the person with their name on the cover, had been seen as the fundamental starting point for analysis of a text. From the 1920s or so an approach unimaginatively called The New Criticism advocated a close reading of the text alone, without a reliance on extra-textual sources such as an author’s biography. Over decades, this eventually spawned a backlash, and attempts at synthesis (maybe we can read the text closely and still think about the author’s circumstances and intentions). So far as I can tell the argument seems to have settled into a post-modern non-consensus where we pick whatever perspective or tradition seems most relevant to us at the time.

It seems to me that’s about the level of sophistication we need for software development anyway. Like attempts in postmodern architecture to form a coherent building on a particular site while drawing on many traditional schools of thought, contemporary software development draws on disparate libraries (java.util, Apache Commons) to provide a particular business solution.

Goetz and friends propose not just the traditional solution of close reading (staring at the code until your eyes bleed) but also considering the author’s intent. Literary critics have already gone through the agony of mapping this turf out for us, and so here, casually sampled, are a number of perspectives to bring to a truly intractable piece of code.

  • Aesthetic. Was the code produced in a context of certain aesthetic or design norms? Does it consciously reference known design approaches, like particular design patterns?
  • Cultural. What was the organisational context in which this library was written? Was it written yesterday? Is it a pre-alpha download off SourceForge, a third party proprietary driver, or a widely used standard library? Are the misspellings in variable names a clue to the location of the author?
  • Economic / Marxist. What were the market conditions under which this library was produced? Was it rushed in order to capitalize on perceived opportunities? Was it outsourced to the lowest cost vendor?
  • Gender / Sexuality. Has the gender of the writer brought certain perspectives with it? This has been discussed elsewhere, under the label “Girl Code”.
  • Post-colonial. Have historically asymmetric power relations between the software stakeholders impacted the design of the code? This might apply to teams spread across global locations, one of which would usually be the corporate headquarters.
  • User. Who was the code originally developed for? What were their needs?
  • Writer. Who wrote the code? What problems would they have to face when designing it? Do you know them? What was their background? Were they a C programmer who converted to Java? Were they a mathematics major? Did they have experience using concurrency?
  • Machine. This is the inescapably strict genre standard that a compiler imposes on a program, which you know plenty about already.

These may seem overly abstract questions, and successful organisations will have processes in place to manage them, but they still come up, and when they do they can be particularly awkward. I work on a globally located team at the moment, and one piece of advice I’ve needed is “If you’re having trouble with timing, try setting the timezone to the one the developer lives in”. In this particular case the problem is rarely one as naive as a developer rolling their own date class, rather it is usually an interaction of rogue constants, configuration and location-specific assumptions about the nature of the business day, in code that was originally designed for one deployment in the same location the developer was working in.
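As a small illustration of how a location-specific assumption sneaks in, here is a sketch using java.time. The helper name businessDate and the sample instant are made up for this post, and the real problems I have seen are messier than this; the point is only that "which business day is it?" has no answer until someone names a time zone.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class BusinessDayDemo {
    // Hypothetical helper: the calendar day an instant falls on in a given zone.
    static LocalDate businessDate(Instant instant, ZoneId zone) {
        return instant.atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        // One moment in time, chosen arbitrarily for the example.
        Instant event = Instant.parse("2006-11-15T22:30:00Z");

        // The same instant belongs to different business days in different offices.
        System.out.println(businessDate(event, ZoneId.of("Australia/Sydney")));  // 2006-11-16
        System.out.println(businessDate(event, ZoneId.of("America/New_York"))); // 2006-11-15

        // The quiet assumption: whatever zone the JVM happens to run in.
        // The result depends on the machine, which is exactly the problem.
        System.out.println(businessDate(event, ZoneId.systemDefault()));
    }
}
```

Code that relies on the last form works perfectly for as long as it is deployed where its author sits, which is why setting your timezone to the developer's is such a useful debugging trick.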

There are environments where costly and intensive documentation production and review is supported; let’s put them aside for a moment. Though certainly relevant, low-documentation environments are common and perhaps illustrate the point more clearly. A developer’s relationship to documentation is often like the relationship a Sunday morning drunkard has with his local priest. Similarly those in software who care about documentation can feel like parish priests in a town of drunkards. Documentation can be cast in moral terms: if you document you are in some sense a good, dutiful programmer, and if not you are a free-riding degenerate. It’s easy to fulminate at the craven ill-discipline of programmers, seeing lack of documentation as a moral failing. Perhaps it is: like so many other moral failings it is also routine. Depending on the project you work on, documentation at the class or method level is largely altruistic, because you are mostly benefiting either someone else, or the shadowy presence of your future self. Documentation in this mode, like charities, doesn’t do very well without a lot of social pressure, and if that pressure is not maintained the contribution winds down to a handful of stoics or temperamental altruists. In these situations, it is not economically rational for developers to document software, so they don’t.

Putting documentation in moral terms is understandable but it’s also an admission of failure. Unless it’s integrated with the process as a whole, or valued as a deliverable by itself, documentation will be awful. For many systems, the accompanying comments and specifications are simply less valuable in dollar terms than the system built around them. This only changes if the API and system are very widely used, or if communication costs otherwise increase. Otherwise the incompleteness of documentation is infuriatingly sensible and your literary critic goggles will remain valuable.

It’s also worth noting that, for instance, the Java API is of a very high documentation standard when compared with most code found in the wild. The documentation, including the database specification, which is without concurrency documentation, was originally written in a time when awareness of concurrency issues was much less prevalent or relevant than today. Such details were, at least originally, left out in ignorant good faith. Their absence from subsequent revisions may be considered calculated malice.

It is entirely possible that in the future issues could arise of similar retrospective importance. For instance, the Collections framework was added in version 1.2 without describing the algorithmic performance of its components. Similar frameworks such as the C++ Standard Library do provide such guarantees. If single CPU performance has plateaued then knowing the precise performance characteristics of these classes may become even more compelling and the absence of documentation more infuriating and apparently immoral. High performance uses of these libraries already require reading the supplied source code. (As it happens, one of the key developers of the framework was Joshua Bloch, one of the authors of JCIP and an occasional documentation sermonizer himself.)
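One place where a performance hint did reach the type system is the java.util.RandomAccess marker interface, which ArrayList implements and LinkedList does not. The sketch below is my own illustration (the class and method names are hypothetical) of code that has to ask the question at run time because the List interface alone does not say what an indexed get costs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class AccessCostDemo {
    // Does this list advertise cheap indexed access via the marker interface?
    static boolean advertisesRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    // Sum by index only when indexed access is advertised as cheap;
    // otherwise iterate, which avoids repeated traversal of a linked list.
    static long sum(List<Integer> values) {
        long total = 0;
        if (advertisesRandomAccess(values)) {
            for (int i = 0; i < values.size(); i++) {
                total += values.get(i);
            }
        } else {
            for (int v : values) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linked = new LinkedList<>(array);

        System.out.println(advertisesRandomAccess(array));  // true
        System.out.println(advertisesRandomAccess(linked)); // false
        System.out.println(sum(array) == sum(linked));      // true
    }
}
```

A marker interface is a very thin form of documentation: it says "fast" or stays silent, and leaves everything else to the reader's powers of interpretation.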

Literary programming

The connection I draw here with literary criticism is perhaps whimsical but it is also sincere. The vast majority of software development is embedded in organisational and user needs that are inevitably expressed in natural language. We live in a time which is not particularly fast moving technologically, compared to say the Industrial Revolution, but where we do deal with an unprecedented quantity of information, and where all sorts of professionals have to use techniques to evaluate the meaning and relevance of that information. So it makes sense to take techniques and terms from people that analyze texts all the time. The humanities have certainly been taking techniques and terms from scientists and technologists for years, to an extent that may surprise. For instance, pick up a recent publication in academic history and you might hit a passage like this:

However, this cannot be a systematic comparative work with a single set of variables applied to each case. The May Fourth women differed so much from Euro-American feminist activists at the turn of the twentieth century that no single set of variables can be identified. — Wang Zheng, Women in The Chinese Enlightenment

Rather than simple appropriation, this is really representative of an intersection between the concerns of a professional historian (“What happened to women during the May Fourth movement in China?”) and the professional software developer (“What crack was this guy smoking when he wrote this?”).

“[I]t is worthwhile to consider every program as a work of literature”. — Literate Programming, Knuth 1983

The problems of a software developer, when grappling with this domain, can actually be much easier than those of a historian or literary critic. We usually work with heavy tool support, automated searching, annotations, automated documentation comments (popularised by Java, following Knuth, but only a little way), version control, a single natural language of discourse. Version control alone would make biblical scholars weep with jealousy: they’ve had to painstakingly reconstruct such things by hand. The authors of Java Concurrency In Practice are well aware of these advantages. JCIP, like 1984, ends with an appendix on language. It concerns concurrency contracts stated as annotations, like @ThreadSafe, which are used to good effect throughout the book.

These annotations, through standardisation, make documentation both easier and more expressive. They therefore lower the transaction costs (to the developer) of documentation while increasing its utility, and so should become more prevalent than natural language comments. More sophisticated tools could be built around this, such as automated checks around changes to variables involved in guard conditions. Ultimately though, as a system will run without them, documentation is always vulnerable to being omitted or obsoleted. Java Concurrency In Practice refers to circumstances like this as “an adventure in iterative specification discovery”. There are plenty of adventures yet to be had.
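To make the idea of tooling concrete, here is a rough sketch of what a very crude automated check might look like. The ThreadSafe and GuardedBy annotations are declared locally as stand-ins so the example compiles on its own; their runtime retention, the PolicyCheck class and its rule (every mutable instance field of a class marked thread-safe must name its lock) are all assumptions of mine, not something the book prescribes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the book's annotations (assumed shape, runtime-visible).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface GuardedBy {
    String value(); // name of the lock that protects the field
}

@ThreadSafe
class Sequence {
    @GuardedBy("this") private int next;

    public synchronized int getNext() { return next++; }
}

@ThreadSafe
class SloppySequence {
    private int next; // mutable state, but no lock is documented

    public synchronized int getNext() { return next++; }
}

class PolicyCheck {
    // Hypothetical lint: names of mutable instance fields in a @ThreadSafe
    // class that carry no @GuardedBy. Deliberately crude: it would also flag
    // volatile fields and says nothing about whether the lock is really held.
    static List<String> undocumentedFields(Class<?> type) {
        List<String> missing = new ArrayList<>();
        if (!type.isAnnotationPresent(ThreadSafe.class)) {
            return missing; // no claim made, nothing to check
        }
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            boolean mutableState = !Modifier.isFinal(mods)
                    && !Modifier.isStatic(mods)
                    && !field.isSynthetic();
            if (mutableState && !field.isAnnotationPresent(GuardedBy.class)) {
                missing.add(field.getName());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(undocumentedFields(Sequence.class));       // []
        System.out.println(undocumentedFields(SloppySequence.class)); // [next]
    }
}
```

Note that the check only polices the documentation, not the locking itself. It is a grammarian rather than a critic, and both sequences above are equally correct at run time.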

4 thoughts on “The Napkin Scrawls of Dining Philosophers”

  1. I’m late to this thread — to this whole blog, actually. I can’t believe I haven’t been following this blog all along, but have happily remedied that oversight and spent a lovely day catching up on the archive. Anyway, back on topic, as they say …

    I remember your recommendation of JCIP from long ago for its linguistic and stylistic merits; I added it to my overflowing Amazon wishlist back then, but this post has made me seriously consider acquiring it again.

    I wish I had more profound things to say in the vein of a lit crit approach to software (except to note that we don’t call them programming “languages” for nothing and that code is absolutely a means of expression and communication). Instead, some scattered notes:

    Your first paragraph triggers a whole cluster of thoughts about technology and the words that go with it — though it seems tangential to your main thesis, many of the challenges I face are not so much in describing concepts and artifacts as in finding the correct labels for them (not to mention identifying the right conceptual chunks that need labelling). There’s an interesting philosophical/aesthetic divide, for example, between abstract or arbitrary product names and more descriptive ones — spreadsheets named after aspirational verbs as opposed to anything called a Noun Verber — but the problem also applies to things at a deeper level that Marketing usually doesn’t care about. It’s sort of an extension of the practice of choosing good method names. I should probably put something in the queue to write about this at further length sometime.

    It is, as you note, usually a chore for the developer to document his own code (at least at the API level), and there is usually very little incentive to do a good job of it. However, I realize now that as someone who’s never (or rarely) been in the position of having to interpret API doc or its underlying code in order to achieve anything practical, I’m not necessarily well equipped to critique such documentation on anything but a purely mechanical level (even presuming that the schedule permits anything more than a cursory review). Also, having seen firsthand the deleterious effects of the lack of documentation about the performance characteristics of the Java Collections framework, but not knowing whether I’d have the insight to identify and correct such a problem personally, I wonder if there’s a better way to use the combined skills of the various people typically employed by a software company to produce more useful information about what they do. To echo the closing of your post, just about everything to do with software is an iterative adventure; one only hopes that with each iteration we actually improve upon things.

    • We’ve been identifying code as expressions of language – as you say, programming languages – for sixty or so years, and yet it seems few except Knuth have taken the identification to its logical extent. And it is an identification, not just a metaphor, and one it’s rare to take issue with.

      On naming, our names at work are just acronym soup; I suspect we get away with it as we’re not retail. Names are absolutely key in software though – I think Kent Beck has called renaming the most powerful refactoring we have.

      There’s a Bertrand Meyer quote I have been using a bit too much lately, “It is crucial to find, as a criterion for decomposition, properties less volatile than the system’s main function.” (It’s from his opus Object Oriented Software Construction which my bookmark has been lodged embarrassingly early in for years.) I think this is one of the reasons naming software is so hard though – not just the newness but the mutability.

      There is also an interesting example on the c2 wiki of discovering domain concepts and new names through emergent design.

      As much as I agree with the ideas of tacit knowledge and meaning through use, there surely must be a better way to bring the intersection of skillsets needed to build software together in a way that produces useful information about it.
