Technologists are, by their vocational commitment to new things, manufacturers and early adopters of language. Our commitment to language is, however, generally one of casual incompetence. The artifact being built, or fixed, is the focus of our attention, and the language referring to it is an afterthought for the tormenting of poets, grammarians and the marketing department. Perhaps that’s as it should be, but it’s still a pleasure to read a book like Java Concurrency In Practice, which has a mastery of both language and its topic.
JCIP has already been well reviewed on its technical merits. [NB: These notes are also from 2006 and written about the first edition.] In summary, it’s a great reference. As the jacket copy points out, concurrency in software is both difficult to deal with and of renewed importance. Problems like these put stress not only on the software artifact being developed, but also on the context in which that artifact is built and used: collaboration in its construction, human and computer interfaces, and eventual maintenance. Java Concurrency In Practice also has some insights into this, but it’s not foregrounded, and seems worth exploring.
JCIP is arranged according to well-established conventions for concise software references (chapters, glossary and so on), with two variations. Firstly, when code snippets are introduced, they are accompanied by one of three smiley-style faces: smiling, frowning or neutral. These are large and prominently placed beside the code snippet without making it take up more space. The intent is to discuss common pitfalls without those snippets making their way into production via the copy-and-paste myopia that can be an occupational hazard for programmers. It also serves as direct negative or positive feedback for the reader, so that at-a-glance instincts are reinforced towards, say, correct use of synchronization for visibility on getter methods. At first I found its appearance in an engineering textbook rather absurd, imagining Victorian bridge engineers scrawling notes on how l33t they were in their blueprint margins. I quickly got used to it though, and given its immediacy it is probably a worthwhile technique.
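For readers who have not met the getter example, here is a minimal sketch of the kind of pair the faces are attached to. The class names are my own, invented for illustration, and are not copied from the book: one holder whose getter may see a stale value written by another thread, and one whose getter and setter synchronize on the same lock.

```java
// Hypothetical classes, named for this sketch only (not taken from JCIP).

/** Frowning face: without synchronization, get() may return a stale value
 *  that another thread has already overwritten. */
class UnsafeHolder {
    private int value;

    public int get() { return value; }
    public void set(int value) { this.value = value; }
}

/** Smiling face: both accessors synchronize on the same lock, so a value
 *  written by one thread is visible to a later get() in another thread. */
class SafeHolder {
    private int value;

    public synchronized int get() { return value; }
    public synchronized void set(int value) { this.value = value; }
}

class VisibilityDemo {
    public static void main(String[] args) throws InterruptedException {
        SafeHolder holder = new SafeHolder();
        Thread writer = new Thread(() -> holder.set(42));
        writer.start();
        // join() is only here to make the demo deterministic; the point of
        // SafeHolder is that it stays correct without this kind of help.
        writer.join();
        System.out.println(holder.get());
    }
}
```

The unsafe version usually appears to work, which is exactly why a frowning face beside it is useful.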
Secondly, Goetz and colleagues make vivid and precise use of metaphors with non-software reality. When they choose a metaphor, they commit to it, but do not overextend it (except on one occasion when making a joke). The metaphors relate to concurrency conditions in real life – for instance an explanation of race conditions in terms of meeting at Starbucks. They also do not overuse metaphors in a mistaken attempt at accessibility – I count ten in a 300-page book.
It’s also worth noting that when Goetz and colleagues include a graph of performance data, it’s wiggly, rather than impossibly smooth. They are comfortable with reality and how their abstractions relate to it.
Metaphors, while useful, are mainly an impressive show of technical writing technique. Most of the insights on language in software engineering are nestled in section 4.5, Documentation.
This three-page essay, which warrants publishing as an article in its own right, begins with the observation that documentation is useful and often awful. Reasons and techniques for good class-level documentation of concurrency policies are dealt with cleanly and concisely on the first page. The remainder describes techniques for coping with poor or incomplete documentation.
While prudence suggests that we not assume behaviors that aren’t part of the specification, we have work to get done, and we are often faced with a choice of bad assumptions. Should we assume an object is thread-safe because it seems that it ought to be? Should we assume that access to an object can be made thread-safe by acquiring its lock first? […] Neither choice is very satisfying.
One way to improve the quality of your guess is to interpret the specification from the perspective of someone who will implement it (such as a container or database vendor), as opposed to someone who will merely use it.
Let’s take a step back here. Six highly regarded experts in Java development – Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes and Doug Lea – luminaries almost, are, in a serious technical reference work, suggesting that programmers need to use techniques from literary criticism to do their job, in this case to interpolate unspecified thread safety elements of class contracts. Considering a text from the viewpoint of different audiences is a critical technique. (If newspapers, films and even concerts can be “texts” then Unicode text files most certainly can.) Like critics and historians, developers are often working with incomplete or secondary sources, as the source code for all libraries may not be available. The passage goes on to suggest ways a writer-centric analysis can yield insight into crucial libraries.
Servlets are always called from a container-managed thread, and it is safe to assume that if there is more than one such thread, the container knows this. The servlet container makes available certain objects that provide services to multiple servlets, such as HttpSession or ServletContext. So the servlet should expect to have those objects accessed concurrently […]
Literary criticism went through a variety of contortions in the 20th century. Previously, the life and intentions of the author, in the sense of the person with their name on the cover, had been seen as the fundamental starting point for analysis of a text. From the 1920s or so an approach unimaginatively called The New Criticism advocated a close reading of the text alone, without a reliance on extra-textual sources such as an author’s biography. Over decades, this eventually spawned a backlash, and attempts at synthesis (maybe we can read the text closely and still think about the author’s circumstances and intentions). So far as I can tell the argument seems to have settled into a post-modern non-consensus where we pick whatever perspective or tradition seems most relevant to us at the time.
It seems to me that’s about the level of sophistication we need for software development anyway. Like attempts in postmodern architecture to form a coherent building on a particular site while drawing on many traditional schools of thought, contemporary software development draws on disparate libraries (java.util, Apache Commons) to provide a particular business solution.
Goetz and friends propose not just the traditional solution of close reading (staring at the code until your eyes bleed) but also considering the author’s intent. Literary critics have already gone through the agony of mapping this turf out for us, and so here, casually sampled, are a number of perspectives to bring to a truly intractable piece of code.
- Aesthetic. Was the code produced in a context of certain aesthetic or design norms? Does it consciously reference known design approaches, like particular design patterns?
- Cultural. What was the organisational context in which this library was written? Was it written yesterday? Is it a pre-alpha download off SourceForge, a third-party proprietary driver, or a widely used standard library? Are the misspellings in variable names a clue to the location of the author?
- Economic / Marxist. What were the market conditions under which this library was produced? Was it rushed in order to capitalize on perceived opportunities? Was it outsourced to the lowest cost vendor?
- Gender / Sexuality. Has the gender of the writer brought certain perspectives with it? This has been discussed elsewhere, under the label “Girl Code”.
- Post-colonial. Have historically asymmetric power relations between the software stakeholders impacted the design of the code? This might apply to teams spread across global locations, one of which would usually be the corporate headquarters.
- User. Who was the code originally developed for? What were their needs?
- Writer. Who wrote the code? What problems would they have to face when designing it? Do you know them? What was their background? Were they a C programmer who converted to Java? Were they a mathematics major? Did they have experience using concurrency?
- Machine. This is the inescapably strict genre standard that a compiler imposes on a program, which you know plenty about already.
These may seem overly abstract questions, and successful organisations will have processes in place to manage them, but they still come up, and when they do they can be particularly awkward. I work on a globally located team at the moment, and one piece of advice I’ve needed is “If you’re having trouble with timing, try setting the timezone to the one the developer lives in”. In this particular case the problem is rarely one as naive as a developer rolling their own date class, rather it is usually an interaction of rogue constants, configuration and location-specific assumptions about the nature of the business day, in code that was originally designed for one deployment in the same location the developer was working in.
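To make the timezone advice concrete, here is a small sketch of how one instant lands on different business days depending on where the code thinks it is. The helper name, the timestamp and the two zones are my own choices for illustration, not details from the systems I work on.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class BusinessDayDemo {
    // Hypothetical helper: the calendar date an instant falls on in a zone.
    static LocalDate businessDate(Instant instant, ZoneId zone) {
        return instant.atZone(zone).toLocalDate();
    }

    public static void main(String[] args) {
        // One event, recorded once, in UTC.
        Instant event = Instant.parse("2006-06-30T23:30:00Z");

        // The zone the developer lived in versus the zone of a later deployment.
        LocalDate inSydney = businessDate(event, ZoneId.of("Australia/Sydney"));
        LocalDate inNewYork = businessDate(event, ZoneId.of("America/New_York"));

        System.out.println("Sydney:   " + inSydney);   // 2006-07-01
        System.out.println("New York: " + inNewYork);  // 2006-06-30
    }
}
```

Code that quietly relies on the JVM default zone behaves like one of these two calls, depending on where it runs, which is why running it in the developer's own zone can make a confusing bug vanish.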
There are environments where costly and intensive documentation production and review is supported; let’s put them aside for a moment. They are certainly relevant, but low-documentation environments are common and perhaps illustrate the point more clearly. A developer’s relationship to documentation is often like the relationship a Sunday morning drunkard has with his local priest. Similarly those in software who care about documentation can feel like parish priests in a town of drunkards. Documentation can be cast in moral terms: if you document you are in some sense a good, dutiful programmer, and if not you are a free-riding degenerate. It’s easy to fulminate at the craven ill-discipline of programmers, seeing lack of documentation as a moral failing. Perhaps it is: like so many other moral failings it is also routine. Depending on the project you work on, documentation at the class or method level is largely altruistic, because you are mostly benefiting either someone else, or the shadowy presence of your future self. Documentation in this mode, like charity, doesn’t do very well without a lot of social pressure, and if that pressure is not maintained the contribution winds down to a handful of stoics or temperamental altruists. In these situations, it is not economically rational for developers to document software, so they don’t.
Putting documentation in moral terms is understandable but it’s also an admission of failure. Unless it’s integrated with the process as a whole, or valued as a deliverable by itself, documentation will be awful. For many systems, the accompanying comments and specifications are simply less valuable in dollar terms than the system built around them. This only changes if the API and system are very widely used, or if communication costs otherwise increase. Otherwise the incompleteness of documentation is infuriatingly sensible and your literary critic goggles will remain valuable.
It’s also worth noting that, for instance, the Java API is of a very high documentation standard when compared with most code found in the wild. The documentation, including the database specification, which is without concurrency documentation, was originally written in a time when awareness of concurrency issues was much less prevalent or relevant than today. Such details were, at least originally, left out in ignorant good faith. Their absence from subsequent revisions may be considered calculated malice.
It is entirely possible that in the future issues could arise of similar retrospective importance. For instance, the Collections framework was added in version 1.2 without describing the algorithmic performance of its components. Similar frameworks such as the C++ Standard Library do provide such guarantees. If single-CPU performance has plateaued then knowing the precise performance characteristics of these classes may become even more compelling and the absence of documentation more infuriating and apparently immoral. High-performance uses of these libraries already require reading the supplied source code. (As it happens, one of the key developers of the framework was Joshua Bloch, one of the authors of JCIP and an occasional documentation sermonizer himself.)
The connection I draw here with literary criticism is perhaps whimsical but it is also sincere. The vast majority of software development is embedded in organisational and user needs that are inevitably expressed in natural language. We live in a time which is not particularly fast moving technologically, compared to say the Industrial Revolution, but where we do deal with an unprecedented quantity of information, and where all sorts of professionals have to use techniques to evaluate the meaning and relevance of that information. So it makes sense to take techniques and terms from people that analyze texts all the time. The humanities have certainly been taking techniques and terms from scientists and technologists for years, to an extent that may surprise. For instance, pick up a recent publication in academic history and you might hit a passage like this:
However, this cannot be a systematic comparative work with a single set of variables applied to each case. The May Fourth women differed so much from Euro-American feminist activists at the turn of the twentieth century that no single set of variables can be identified. — Wang Zheng, Women in The Chinese Enlightenment
Rather than simple appropriation, this is really representative of an intersection between the concerns of a professional historian (“What happened to women during the May Fourth Movement in China?”) and the professional software developer (“What crack was this guy smoking when he wrote this?”).
The problems of a software developer, when grappling with this domain, can actually be much easier than those of a historian or literary critic. We usually work with heavy tool support: automated searching, annotations, automated documentation comments (popularised by Java, following Knuth, but only a little way), version control, and a single natural language of discourse. Version control alone would make biblical scholars weep with jealousy: they’ve had to painstakingly reconstruct such things by hand. The authors of Java Concurrency In Practice are well aware of these advantages. JCIP, like 1984, ends with an appendix on language. It concerns concurrency contracts stated as annotations, like @ThreadSafe, which are used to good effect throughout the book.
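As a rough illustration of what such a contract looks like on the page, here is a sketch. To keep it compilable on its own I declare stand-in @ThreadSafe and @GuardedBy annotations locally rather than importing the book's own library, and the Counter class and the describe helper are hypothetical names of mine. The helper reads the annotations back by reflection, a very small example of the sort of tool that could be built on them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Stand-in annotations, declared here only so the sketch is self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface GuardedBy {
    String value(); // name of the lock that protects the field
}

// Hypothetical class: the annotations state its concurrency policy.
@ThreadSafe
class Counter {
    @GuardedBy("this")
    private long count;

    public synchronized long increment() { return ++count; }
    public synchronized long get() { return count; }
}

class AnnotationDemo {
    // Hypothetical helper: summarise the documented policy of a class.
    static String describe(Class<?> type) {
        StringBuilder summary = new StringBuilder(type.getSimpleName());
        summary.append(type.isAnnotationPresent(ThreadSafe.class)
                ? " is documented thread-safe"
                : " has no documented policy");
        for (Field field : type.getDeclaredFields()) {
            GuardedBy guard = field.getAnnotation(GuardedBy.class);
            if (guard != null) {
                summary.append("; ").append(field.getName())
                       .append(" guarded by ").append(guard.value());
            }
        }
        return summary.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Counter.class));
        // prints: Counter is documented thread-safe; count guarded by this
    }
}
```

Nothing here checks that the lock is actually held; the annotations only record the intent in a form a machine can find.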
These annotations, through standardisation, make documentation both easier and more expressive. They therefore lower the transaction costs (to the developer) of documentation while increasing its utility, and so should become more prevalent than natural language comments. More sophisticated tools could be built around this, such as automated checks around changes to variables involved in guard conditions. Ultimately though, as a system will run without them, documentation is always vulnerable to being omitted or obsoleted. Java Concurrency In Practice refers to circumstances like this as “an adventure in iterative specification discovery”. There are plenty of adventures yet to be had.