Software systems open up new capabilities while being surprising, frustrating and costly in other ways, and they regularly confound our physical intuitions about their behaviour. As intellectually awkward artifacts of this kind, they meet an everyday-language definition of complexity. A more systematic comparison, presented here, shows a significant family resemblance to the systems studied by complexity science. Complexity science studies common techniques across a number of fields, and using that framework to analyse software engineering could allow a more precise technical understanding of these software problems.
This isn’t a unique thought. Various approaches, such as David Snowden’s Cynefin framework, have used complexity science as a source of insight on software development. Herbert Simon, in works like “The Sciences of the Artificial”, helped build complexity science, with software programs and Good Old Fashioned AI as reference points. Famous papers such as Parnas et al’s “The Modular Structure of Complex Systems” also point the same way. When I was first introduced to this material, I could not find a more recent reference that lined up these features of complex systems with modern software in a brief and systematic way. These notes attempt that in the form of an anatomical sketch.
This note considers software systems of many internal models and at least thousands of lines, rather than shorter programs analysed in formal detail. This places it more under software engineering than formal computer science, without intending any strict break from the latter. Likewise, by default, it addresses consciously engineered software rather than machine learning. Complexity in this sense differs from algorithmic time complexity as captured by big-O notation, though there may be interesting formal connections to be explored there too.
Ladyman et al give seven features of complex systems, and I’ve added an eighth from Crutchfield. Each is considered in turn below.
1. Non-linearity
Software exhibits non-linearity in the small and in the large. Every ‘if’ condition, implicit or explicit, is a point where an arbitrarily small change in input or state can select an entirely different output. This is most obvious in responses to unexpected input or state: error and exit, segmentation fault, stack trace, NullPointerException.
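As a minimal sketch of non-linearity in the small, consider the Python function below. The function `shipping_cost`, its 2 kg threshold and its prices are all hypothetical, invented for illustration. It shows how a single branch makes the output jump for a tiny change in input, and how unexpected input produces an abrupt stop rather than a gradual degradation.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical pricing rule; threshold and prices are made up."""
    if weight_kg <= 0:
        # Unexpected input: the program stops instead of degrading smoothly.
        raise ValueError("weight must be positive")
    if weight_kg <= 2.0:
        return 5.0   # flat rate up to and including the threshold
    return 25.0      # a different branch, and a discontinuous jump in output


if __name__ == "__main__":
    print(shipping_cost(2.0))    # at the threshold
    print(shipping_cost(2.001))  # one gram over: a different branch is taken
    try:
        shipping_cost(0)
    except ValueError as err:
        print("error:", err)
```

A physical scale would barely register the extra gram; the program treats it as a different case altogether.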
2. Feedback
From a use perspective, many software systems are part of a feedback loop with users and the world, and this feedback often involves internal software state.
From an engineering perspective, all software systems beyond a trivial size are built in cycles where the current state of a codebase is a rich input into the next cycle of engineering. This is true whether iterative software development methodologies are used or not. Even in waterfall, for instance, bug fixes resulting from the test phase feed back into the code.
3. Spontaneous Order
Spontaneous order is not a feature of large software systems. If anything, the usual condition of engineering large software systems is constant, deliberate work to maintain order against the tendency of these systems to decay into entropy, or complicated disorder. The ideas of ‘software crisis’ and ‘technical debt’ are both reactions to a perceived lack of order in engineered software.
4. Robustness and lack of central control
In the small, or even at the level of the individual system, software tends to brittleness, as noted above. Robustness, being “stable under perturbations of the system” (Ladyman), must be specifically engineered in by considering a wide variety of inputs and states and testing the system under those conditions. However, certain software ecosystems, such as the TCP/IP substrate of the Internet, display great robustness. Individual websites go down, but the whole Internet or World Wide Web tends not to. This is related to the choice of a highly distributed architecture based on relatively simple, standard protocols and design guidelines like Postel’s principle (be tolerant in what you accept and strict in what you send). As with a flock of birds, the lack of central control makes the system tolerant of local failure. High availability systems make use of similar principles of redundancy.
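Postel’s principle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation of any real protocol: the `name: value` header format and the functions `parse_header` and `format_header` are hypothetical. The parser tolerates stray whitespace and any letter case; the formatter always emits one canonical form.

```python
def parse_header(line: str) -> tuple[str, str]:
    """Accept liberally: tolerate extra whitespace and any letter case."""
    name, sep, value = line.partition(":")
    if not sep:
        # Tolerance has limits: a line with no colon is not a header at all.
        raise ValueError(f"not a header line: {line!r}")
    return name.strip().lower(), value.strip()


def format_header(name: str, value: str) -> str:
    """Send strictly: always emit a single canonical spelling and spacing."""
    canonical = "-".join(
        part.capitalize() for part in name.strip().lower().split("-")
    )
    return f"{canonical}: {value.strip()}"


if __name__ == "__main__":
    name, value = parse_header("  CONTENT-type :  text/html  ")
    print(format_header(name, value))
```

Because every participant normalises what it sends, sloppiness introduced at one node is not passed on to the next, which is one small mechanism behind the ecosystem-level robustness described above.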
5. Emergence
Software systems tend not to exhibit emergent behaviours as highly visible features of the system, in the way that, say, a flock of birds assumes a particular overall shape once each bird follows certain simple rules about its position relative to its neighbours. Certain important non-visible features, however, are emergent. Leveson, in Engineering a Safer World, argues that system safety (including software) is an emergent feature: “Determining whether a plant is acceptably safe is not possible, for example, by examining a single valve in the plant. In fact, statements about the ‘safety of the valve’ without information about the context in which that valve is used are meaningless.” Difficult bugs in established software systems are often multi-causal and emerge from systemic interactions between components rather than isolated failure.
Conway’s law, the observation that a software system’s internal component structure mirrors the team structure of the organisation that created it, describes system shape emerging from social structure without explicit causal rules.
6. Hierarchical organisation
Formal models of computation did not originally differentiate between parts of a program; for instance, Turing machines and Church’s lambda calculus do not even distinguish between programs and data. Many of the advances in software development have, by contrast, been tools for structuring programs in hierarchies and at differing levels of abstraction. A reasonable history of programming could be told simply through differentiated structure, e.g.:
- Turing machines / Church lambda calculus
- Von Neumann machine separation of program, data, input, output
- MIT Summer Session Computer: named instructions
- Hopper: A-0 compiler and subroutines
- Backus: FORTRAN distinguished control structures IF and DO
- Parnas: module decomposition through information hiding
- Smalltalk object orientation
- Codd: relational databases
- GoF design patterns
- Beck: xUnit automated unit testing
- Fowler refactoring for improved structure
- Maven systematic library dependency management
Navigating program hierarchy from user interface through domain libraries to system libraries and services is a significant, even dominant, proportion of modern programming work (from personal observation, though a quantified study should be possible).
7. Numerosity (Many more is different)
The techniques for navigating, designing and changing a codebase of hundreds of classes are different from those for a short script, at least partly due to the limitations of human memory and attention span. An early recognition of this is Benington’s Production of Large Computer Programs; a more recent one is Feathers’ Working Effectively With Legacy Code. Feathers states: “As the amount of code in a project grows, it gradually surpasses understanding”.
8. Historical information storage
“Structural complexity is the amount of historical information that a system stores” according to Crutchfield. This is relevant for both use- and engineering-time views of software systems.
In use, the amount of state stored by a software system is historical information in this sense. An example might be a hospital patient record database. A subtlety here is that proposed measures of complexity based on amounts of information (such as Kolmogorov complexity) tend to specify maximum compression, so simply allocating several blank terabytes of disk isn’t enough. Historical information also covers implicit forms of complexity, such as dependencies in code on particular structures in data. Contrast a hospital database alone (just records and basic SQL) with the same database together with software that provides a better user interface and imposes rules on how records may be updated to suit the procedures of the hospital.
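The point about maximum compression can be sketched in Python. Kolmogorov complexity itself is uncomputable, so this example uses zlib-compressed length as a crude, computable stand-in, which is an assumption of the sketch rather than a measure proposed by Crutchfield. The patient records are hypothetical and generated deterministically purely for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length after zlib compression at the highest level: a rough upper
    bound on information content, not true Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


# Hypothetical patient records, generated deterministically for illustration.
records = "\n".join(
    f"patient={i};ward={(i * 7919) % 23};visits={(i * i) % 11}"
    for i in range(20_000)
).encode("ascii")

# A "blank disk" of exactly the same raw size: all zero bytes.
blank = bytes(len(records))

print("raw bytes:         ", len(records))
print("blank compressed:  ", compressed_size(blank))
print("records compressed:", compressed_size(records))
```

Both byte strings occupy the same space on disk, but the blank one compresses to a small fraction of the size of the records, reflecting that it stores almost no information regardless of how much storage was allocated.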
Source control history provides the engineering-time view of historical information. In practice, when extending or maintaining a system, classes are rarely replaced wholesale or deleted. New classes are added or existing classes modified to add functionality. The existing code is always an input to the new state of the code for the programmer making the change, even if existing code is left untouched. Welsh even declared, in a paper of that name, that “Software is history!”
In both views, the result is more historical information over time, and therefore more complexity.
References
Conway – How Do Committees Invent? Datamation, 1968
Crutchfield – Five Questions on Complexity, Responses
Feathers – Working Effectively With Legacy Code, 2004
Ladyman, Lambert, Wiesner – What Is A Complex System?
Leveson – Engineering a Safer World, Chapter 3 p64, 2011
Parnas – On The Criteria To Be Used in Decomposing Systems into Modules, Communications of the ACM, 1972
Parnas, Clements, Weiss – The Modular Structure of Complex Systems, IEEE Transactions on Software Engineering, 1985
Postel – RFC 761 Transmission Control Protocol https://tools.ietf.org/html/rfc761 https://en.wikipedia.org/wiki/Robustness_principle
Simon – The Sciences of the Artificial
Snowden and Boone – A Leader’s Framework For Decision Making (Cynefin)
Welsh – Software Is History!