Kafka As Deterritorializing Stream Function

Sometimes he accompanied her on her errands in the city, where everything had to be carried out in the utmost hurry. Then she would almost run to the next subway station, Karl with her bag in his hand, the journey went by in a flash, as if the train were being carried away without any resistance, already they were getting off, clattering up the stairs instead of waiting for the elevator that was too slow for them, the large squares from which the streets flowed out in a starburst emerged and brought a tumult of streamed lines of traffic from all sides, but Karl and Therese hurried, tightly together, to the different offices, cleaners, warehouses and stores which weren’t easy to contact by telephone in order to make orders or complaints, generally trivial things.

– Kafka, Amerika, Hofman trans.

“No one is better than Kafka at differentiating the two axes of the assemblage and making them function together,” say Deleuze and Guattari, and though they are referring to the Czech writer, it applies quite well to the open source distributed message queue too, even though the quote was written thirty years before its invention. Kafka in either form effects both decoupling of components and a disintegration of content. Models, formats and codes are first broken down, then made available to be reassembled in other ways.

I admit that when I first heard of it, naming an information processing system after a writer famous for depictions of absurd, violent and impenetrable bureaucracy did strike me as bold. It evokes The Departure (Der Aufbruch) as router documentation, or An Imperial Message (Eine kaiserliche Botschaft) as a Service Level Agreement. Perhaps we can think of the system developers as finally addressing Franz’s eloquent, frustrated, bug reports. Jay Kreps, the namer of Apache Kafka and one of the co-creators (with Narkhede and Rao), simply explains that he wanted his high performance message queue to be good at writing, so he named it after a favourite prolific writer. Fortunately complete publication and reading doesn’t also require the process dying of tuberculosis, followed by a decades-long legal case. Even if Kreps did have a deeper correspondence in mind, if I were forced to explain the name all the time, I might just smile and point at my Franz fridge magnet too.

The D/G quote is from the Postulates of Linguistics chapter, concerned with the way meaning is imposed on communicating agents and their intertwining systems.

On a first, horizontal, axis, an assemblage comprises two segments, one of content, one of expression. On the one hand it is a machinic assemblage of bodies, of actions and passions, an intermingling of bodies reacting to one another; on the other hand it is a collective assemblage of enunciation, of acts and statements, of incorporeal transformations attributed to bodies. Then on a vertical axis, the assemblage has both territorial sides, or reterritorialized sides, which stabilize it, and cutting edges of deterritorialization, which carry it away. No one is better than Kafka at differentiating the two axes of the assemblage and making them function together.

– Deleuze and Guattari, November 30, 1923: Postulates of Linguistics, A Thousand Plateaus

Deleuze and Guattari are secret pomo management consultants at heart, and as their dutiful intern I have accordingly expressed the great men’s vision as an Ansoff Matrix slide for distribution to valued stakeholders.

Compare this Jay Kreps slide from Strange Loop 2015, itself entirely representative of a million whiteboard sketches accompanying middleware everywhere:

For middleware, what D/G call the cutting edges of deterritorialization, we might call a payload codec. The data structure used within the producing process is disassembled, scrambled into a bucket of bytes, then carried away along a line of flight – in this case a Kafka topic. A Kafka topic is a transactional log, a persistent multi-reader queue where the removal policy is decoupled from reader delivery, and retention is instead controlled by time or storage space windows. (Blockchains are public transaction logs optimised for distributed consensus and no retention limit. Hence their inherent parliamentary slowness.) The consumer then uses its own codec to reterritorialize the data – making it intelligible according to its own data model, and within its own process boundary. Though nowadays the class signature of the messages may match (say both sides use the JVM and import the definition from the same library), at a bare minimum the relationship of those messages to other objects and functions within the process differs.

In Anti-Oedipus, D/G call reading a text “productive use of the literary machine”, and it’s along those lines that the quote continues:

No one is better than Kafka at differentiating the two axes of the assemblage and making them function together. On the one hand, the ship-machine, the hotel-machine, the circus-machine, the castle-machine, the court-machine, each with its own intermingled pieces, gears, processes, and bodies contained in one another or bursting out of containment (see the head bursting through the roof). On the other hand, the regime of signs or of enunciation: each regime with its incorporeal transformation, acts, death sentences and judgements, proceedings, “law”. […] On the second axis, what is compared or combined of the two aspects, what always inserts one into the other, are the sequenced or conjugated degrees of deterritorialization, and the operations of reterritorialization that stabilize the aggregate at a given moment. K., the K.-function, designates the line of flight or deterritorialization that carries away all of the assemblages but also undergoes all kinds of reterritorializations and redundancies – redundancies of childhood, village life, bureaucracy, etc.

 – Deleuze and Guattari, ATP ibid.

Cataloguing these correspondences between D/G’s description of Kafka and the software that bears his name is not intended to ignore that their contact is a kind of iconographic car accident. Kafka is a famous, compelling writer, and frequent cultural reference point, after all. The collision of names reveals structural similarities that are usually hidden.

In the Kreps talk above, titled in Deleuzian fashion “Apache Kafka and the Next 700 Stream Processing Systems”, he also introduces a stream processing API to unify the treatment of streams and tables. The team saw this as crucial to Kafka’s identity as a streaming platform rather than just a queue, and delayed calling Kafka 1.0 for years, until this component was ready. Nomadic messages escape through the smooth stream space, before capture and transformation in striated tablespace as rows in data warehouses.

First version of Tetris.

This unification of batches and streams echoes a similar call in computational theory by Eberbach. Turing computation is built around batches. Data is available in complete form at input on the Turing machine tape, then the program runs, and if it terminates, a complete output is available on the same tape. The theory of computability and complexity are built around this same encapsulated box of space and time. Much of what computers do in 2018 is actually continual computation – the reacting to events or processing streams of data that have no semantically tied termination point. That is, though the process may terminate, that isn’t particularly relevant to any analysis of computational complexity or performance we want to do. When editing a document on a computer, you care about the responsiveness keystroke to keystroke, not the entire time editing the document as if it were one giant text batch.  

By contrast, modern computing systems process infinite streams of dynamically generated input requests. They are expected to continue computing indefinitely without halting. Finally, their behavior is history-dependent, with the output determined both by the current input and the system’s computation history.

 – Eberbach, Goldin and Wegner – Turing’s Ideas and Models of Computation

Streams are a computational model of continuation, and therefore infinity. In their wide-ranging 2004 paper, Eberbach and friends go on to argue for models of Super-Turing computation. This includes alternative theoretical models such as the π-calculus and the $-calculus, new programming languages, and hardware architectures.

We conjecture that a single computer based on a parallel architecture is the wrong approach altogether. The emerging field of network computing, where many autonomous small processes (or sensors) form a self-configured network that acts as a single distributed computing entity, has the promise of delivering what earlier supercomputers could not.

 – Eberbach et al, ibid.

These systems are now coming into existence. Through co-ordination with distributed registries (Zookeeper here), and with the improved deployment and configuration baseline devops techs have brought in, spinning up new nodes or failing over existing nodes is autonomously self-configured. Complete autonomy isn’t there, but it seems a high and not particularly desirable bar for many systems. Distributed systems and streams predate Apache Kafka, nor shall it be the last one. Yet is marks a moment where separate solutions for managing streams, tables and failover are concretized in a single technical object, a toolbox for the streaming infrastructure of infinity.

References

CCRU – Ccru Writings 1997-2003
Cremin – Exploring Videogames with Deleuze and Guattari: Towards an affective theory of form
Deleuze and Guattari – Anti-Oedipus
Deleuze and Guattari – A Thousand Plateaus
Eberbach, Goldin and Wegner – Turing’s Ideas and Models of Computation
Kafka, Hofman trans. – Amerika (The Man Who Disappeared)
Kreps, Narkhede and Rao – Kafka – A Distributed Messaging System For Log Processing
Kreps – Apache Kafka and the Next 700 Stream Processing Systems (talk)
https://youtu.be/9RMOc0SwRro
Narkhede – Apache Kafka Goes 1.0 https://www.confluent.io/blog/apache-kafka-goes-1-0/
Pajitnov – Tetris (game)
Stopford – The Data Dichotomy: Rethinking the Way We Treat Data and Services https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
Thereska – Unifying Stream Processing and Interactive Queries in Apache Kafka https://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/

Advertisements

The Bureaucracy of Automatons

An introduction to the notes on Confucian Software.

Software and the Sage

Among the many dissimilarities between software and gentlemen of the classical Chinese Spring and Autumn Period, two in particular stand out. One existed in a pre-scientific feudal society on an agricultural technological and economic base, and the other presupposes the scientific method and a modern (or post-modern) industrial base. Secondly, the concept of virtue or potency (德) is central to The Analects, but software artifacts are, in our day and age, non-sentient. Morality requires some degree of self-awareness – of consciousness – and so software does not itself practice virtue any more than a spoon or a lawnmower.

The immediate relevance, for a developer, of the Analects, are the two other grand concerns of Confucius, which are existential fundaments of software. These are names (名), and the rites (礼).

Continue reading

XIII.3 Name Oriented Software Development

子路日:卫君待子而为政,子将奚先.子日:必也正名乎. 子路日:有是哉!子之迂也.奚其正.子日:野哉由也.君子于其所不知.蓋阙如也.名不正,则言不顺.言不顺,则事不成.事不成,则礼乐不兴.礼乐不兴,则刑罚不中.刑罚不中,则民无所措手足.故君子名知必可言也.言之必可行也.君子于其言,无所苟而已矣. — 论语 十三:三

Tzu-lu said, ‘If the Lord of Wei left the administration (cheng) of his state to you, what would you put first?’ The Master said, ‘If something has to be put first, it is, perhaps, the rectification (cheng) of names.’ Tzu-lu said, ‘Is that so? What a roundabout way you take! Why bring rectification in at all?’ The Master said, ‘Yu, how boorish you are. Where a gentleman is ignorant, one would expect him not to offer any opinion. When names are not correct, what is said will not sound reasonable; when what is said does not sound reasonable, affairs will not culminate in success; when affairs do not culminate in success, rites and music will not flourish; when rites and music do not flourish, punishments will not fit the crimes; when punishments do not fit the crimes, the common people will not know where to put hand and foot. Thus when the gentleman names something, the name is sure to be usable in speech, and when he says something this is sure to be practicable. The thing about the gentleman is that he is anything but casual where speech is concerned.’ — Analects XIII.3 (Lau)

If something has to be put first in programming, it is, perhaps, the rectification of names. Names are what place the system and the not-system in the same reality. They are the bridge between the externalized machine without use and the internalized emotion without expression.

What in Confucian philsophy we call the rectification of names we can in Confucian software call Name Oriented Software Development. This does not yet exist, but it can be defined simply. Name Oriented Software Development uses a toolset that promotes the continuous rectification of names across all the interstices between natural languages and machine languages in the system.

The nominalist toolset should span package, class, variable and method names. It should span library and project names. It should cover layers of code around the core system, including message protocol definitions and protocol dictionaries, configuration entities, unit and performance tests, and run scripts. The aim in managing machine facing names is to enforce consistency and coherence of names while making precision in naming, including type restriction, easy. The same name appearing in different machine-facing contexts, eg a message protocol field in a script and a Java class, should be linked in an automated and machine verified way. This might at first seem to make the internal technical dialect (namespace, one could say, if it were not taken) too rigid. This is not the intent, and if we look at the rename variable refactoring, not its effect in smaller scopes. It is because we can execute refactorings in a reliable, automatic way that it becomes viable as a low-risk change. This is the same across the entire internal technical dialect – enforced formal consistency allows mutability to be low risk.

By contrast, the aim in managing people facing names is to allow a consensus jargon to emerge backed by a literature of interaction and aspiration which is still moored to the technical reality of the system as it exists. This covers specifications, use cases, test documents (even and especially automated acceptance tests), application messages or labels for users or external parties, internationalization, log messages (ie, a UI for support) support and developer faqs, and user manuals and documentation. Changes to this shared understanding should be reflected as immediately as possible and not tied to a long software release cycle. A label, for example, is essentially a piece of simple key-mapped static data; changes to it can therefore be routine and implicit. When treating this sort of text as static data, it is essential to keep it automatically linked to the running widgets themselves – otherwise it is merely the domain of style guides and moral exhortation. A true dialect is owned by a community – a folksonomy – so the entire community must be able to use and contribute to it. The editors for these documents should be as broad as possible within organisational constraints. Where editing constraints exist (organisational not technical), annotation on these documents should be easy. Cross referencing within the docset and outside it to external sources of expertise (for instance on domain jargon) should also be easy to add and maintain.

Or, via Wittgenstein, Tractatus 7: Whereof one cannot speak clearly, one must damn well link to a wikipedia entry.

This approach owes more than a little to Knuth’s Literate Programming. A distinction is that instead of putting documentation with source code in a single artifact to be owned by a single philosopher-developer auteur, it puts editable, often executable content in the hands of a community of expertise.

Historically, the Confucians became increasingly sophisticated in their theory of names. I know of no record of Confucius addressing names changing over time, apart from a general openness to political reform (Analects IX.3). Indeed, as old Kongzi was often conservative, the quote above might reasonably be taken to advocate reverting to old names now in disuse. Xun Zi (荀子), who lived in the century after Confucius and was one of his major intellectual heirs, saw fit to reuse the Mohist (墨家) theory of names.

单足以喻则单,单不足以喻则兼; […] 名无固宜,约之以命,约定俗成谓之宜,异于约则谓之不宜。名无固实,约之以命实,约定俗成,谓之实名。名有固善,径易而不拂,谓之善名 — 荀子 – 正名 6-8

If a single name is enough to communicate, make it single; if not, combine. […] Names have no inherent appropriateness, we name by convention; when the convention is fixed and the custom established, we call them appropriate, and what differs from the convention we call inappropriate. No object belongs inherently to a name, we name by convention, and when the convention is fixed and the custom established, we call it the object’s name. Names do have inherent goodness; when straightforward, easy and not inconsistent, we call them good names. — Xun Zi – Rectification of Names 6-8 , AC Graham translation

AC Graham’s use of the word “combine” (兼) above is in the sense of “compound”. This is the term the Stanford Encyclopedia of Philosophy uses in preference. That entry points out Xunzi’s concern with names is also in rebuttal to the paradoxes of the Chinese sophists. Furthermore, Xunzi is rather more open to mutability and the pragmatic construction of language from a vernacular folksonomy. (The term 俗成, translated as convention, includes 俗 which now spans meanings including custom and vulgar.)

Computer science launched itself out of the western analytical tradition, though. A sometime description and criticism of Object Oriented design is that it is reheated Platonism. See, e.g., blogs by Vlad Tarko and Richard Farrar. ((Should the frequent light analogical treatment of Platonism and software sound a note of alarm for this very project? Perhaps. But there is heavier academic firepower behind us as well. Are we not grappling with code, which Berry and Pawlik call the defining discourse of our postmodernity? In an article that has more code metaphors than a house full of crack-addled Java-monkeys at teatime.))

The platonic analogy has legs. We do construct a parallel world of sorts in code. It is also revealing to think of classes as ideal representations of some external physical element. Isn’t that why we often fail as programmers? We critique the world for being less perfect than our ideal programs, for failing to match up to their strict conditions, for making them ugly, for crashing them. Code, and OO, can have a kind of brittleness that happens when we stop thinking of software as a model of the world and start thinking of it as the true world. We fit our shape to the name.

My suggestion is not to stop doing taxonomy – we could not navigate the world and construct alternative worlds out of software without it. We couldn’t even speak without it. Instead, we should use Xunzi’s advice and fit our names to the shape. We name by convention. We continually rectify names. If a single name is enough to communicate, make it single. If not, combine.

‘RenameClass’ is the most powerful refactoring. — Michael Feathers