Emergence and complexity in social programme evaluation

There’s lots of talk of emergence and complexity in the world of social programme evaluation with little clarity about what the terms mean. I thought I’d explore ideas of complexity outside evaluation where the definitions have been formalised.

One is Kolmogorov complexity:

“the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object.”

For example (mildly edited from the Wikipedia article), compare the following two strings:

abababababababababababababababab
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7

The first string has a short description: “ab 16 times” (11 characters). The second has no description shorter than the text itself (32 characters). So the first string is less complex than the second. (The description of the text or other object would usually be written in a programming language.)
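
Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a rough upper bound you can play with. Here's a minimal sketch in Python (my own illustration, using zlib as a stand-in for "shortest program"):

```python
import zlib

# Compressed length is a crude upper bound on Kolmogorov complexity: if the
# compressor finds a short encoding, a short description certainly exists
# (though the converse needn't hold).
s1 = b"ab" * 16                           # the repetitive string (32 characters)
s2 = b"4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"  # the 'patternless' string (32 characters)

for s in (s1, s2):
    print(len(s), len(zlib.compress(s, 9)))

# The repetitive string compresses to well under its raw length; the patternless
# one doesn't compress at all (zlib's overhead even makes it slightly longer).
```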

One of the fun things we can do with Kolmogorov complexity is use it to help make sense of emergence – how complex phenomena can emerge at a macro-level from micro-level phenomena in a way that seems difficult to predict from the micro-level alone.

A prototypical example is how complex patterns emerge from simple rules in Conway’s Game of Life. The Game of Life consists of an infinite 2D array of cells. Each cell is either alive (‘on’) or dead (‘off’). The rules are as follows (a minimal code sketch of one update step appears after the list):

    1. Any ‘on’ cell (at time t-1) with fewer than two ‘on’ neighbours (at t-1) transitions to an ‘off’ state at time t.
    2. Any ‘on’ cell (t-1) with two or three ‘on’ neighbours (t-1) remains ‘on’ at time t.
    3. Any ‘on’ cell (t-1) with more than three ‘on’ neighbours (t-1) transitions to an ‘off’ state at time t.
    4. Any ‘off’ cell (t-1) with exactly three ‘on’ neighbours (t-1) transitions to an ‘on’ state at time t.
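
Here is that sketch of one update step, written by me for illustration (it stores only the set of live cells, so the grid is effectively unbounded):

```python
def step(alive):
    """One Game of Life update; `alive` is the set of (x, y) cells that are 'on'."""
    neighbour_counts = {}
    for (x, y) in alive:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    cell = (x + dx, y + dy)
                    neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    return {
        cell
        for cell, n in neighbour_counts.items()
        # rules 1-3: an 'on' cell survives only with two or three 'on' neighbours;
        # rule 4: an 'off' cell switches on with exactly three 'on' neighbours
        if n == 3 or (n == 2 and cell in alive)
    }

# A 'blinker' oscillates between a row and a column of three cells:
blinker = {(0, 1), (1, 1), (2, 1)}
print(step(blinker))        # the vertical column {(1, 0), (1, 1), (1, 2)}
print(step(step(blinker)))  # back to the original row
```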

Here’s an example of the complexity that can emerge (from the Wikipedia article on Game of Life):

Looking at the animation above, there’s still an array of cells switching on and off, but simultaneously it looks like there’s some sort of factory of (what are known in the genre as) gliders. The challenge is, how do we define the way this macro-level pattern emerges from the micro-level cells?

Start with Mark Bedau’s (1997, p. 378) definition of a particular kind of emergence known as weak emergence:

Macrostate P of S with microdynamic D is weakly emergent iff P can be derived from D and S‘s external conditions but only by simulation.

This captures the idea that it’s difficult to tell just by inspecting the rules (the microdynamic) that the complex pattern will emerge – you have to set up the rules and run them (whether by computer or with pen and paper) to see. However, Nora Berenstain (2022) points out that this definition is also satisfied by random patternlessness at the macro-level, which is generated by the micro-level but can’t be predicted from it without simulation. Patternlessness doesn’t seem to be the kind of thing we think of as emerging, argues Berenstain.

Berenstain (2022) adds a condition of algorithmic compressibility – in other words, the Kolmogorov complexity of the macro-level pattern must be smaller than the length of the pattern itself for it to count as emergence. Here’s Berenstain’s combined definition:

“Where system S is composed of micro-level entities having associated micro-states, and where microdynamic D governs the time evolution of S’s microstates, macrostate P of S with microdynamic D is weakly emergent iff P is algorithmically compressible and can be derived from D and S’s external conditions only by simulation.”

Now I wonder what happens if a macrostate is very simple – so simple it cannot be compressed any further. This is different to incompressibility due to randomness. Also, how should we define simulation outside the world of models, in reality: does evaluating emergence mean observing a complex social system to see what happens, whilst accepting that it’s impossible to predict in advance? This would lead to interesting consequences for evaluating complex social programmes, e.g., how can data dredging be prevented? What should be in a study plan? If you can predict what will emerge, does that mean it’s not emergence…?

References

Bedau, M. (1997). Weak emergence. Philosophical Perspectives, 11, 375–399.

Berenstain, N. (2022). Strengthening weak emergence. Erkenntnis, 87, 2457–2474.

A lovely video about Game of Life, featuring John Conway

Once you’ve watched that, have a play over here.

Death and furniture

Found this paper by Edwards, Ashmore, and Potter (1995) amusing, as I recently tapped a table to make a point about different levels of analysis. From the intro:

“When relativists talk about the social construction of reality, truth, cognition, scientific knowledge, technical capacity, social structure, and so on, their realist opponents sooner or later start hitting the furniture, invoking the Holocaust, talking about rocks, guns, killings, human misery, tables and chairs. The force of these objections is to introduce a bottom line, a bedrock of reality that places limits on what may be treated as epistemologically constructed or deconstructible. There are two related kinds of moves: Furniture (tables, rocks, stones, etc. — the reality that cannot be denied), and Death (misery, genocide, poverty, power — the reality that should not be denied). Our aim is to show how these “but surely not this” gestures and arguments work, how they trade off each other, and how unconvincing they are, on examination, as refutations of relativism.”

And the point about levels is made:

“It is surprisingly easy and even reasonable to question the table’s given reality. It does not take long, in looking closer, at wood grain and molecule, before you are no longer looking at a “table”. Indeed, physicists might wish to point out that, at a certain level of analysis, there is nothing at all “solid” there, down at the (most basic?) levels of particles, strings and the contested organization of sub-atomic space. Its solidity is then, ineluctably, a perceptual category, a matter of what tables seem to be like to us, in the scale of human perception and bodily action. Reality takes on an intrinsically human dimension, and the most that can be claimed for it is an ‘experiential realism'”

Reference

Edwards, D., Ashmore, M., & Potter, J. (1995). Death and furniture: The rhetoric, politics and theology of bottom line arguments against relativism. History of the Human Sciences, 8, 25–49.

Personal and sub-personal

Reading Da Silva Neves et al.’s (2002) An empirical test of patterns for nonmonotonic inference [Annals of Mathematics and Artificial Intelligence, 34: 107–130]. Interesting paragraph (p. 110):

… even if we expect human inference to corroborate these properties, we know of no sufficient reason to think that lay reasoners would recognize any rationality postulate as valid, neither that they would conscientiously use them to guide their reasoning.

Then later (p. 111):

… we assume that human inference is constrained by knowledge organisation in memory and that its formal properties emerge from a spreading activation process operating directly on knowledge structures. We make the hypothesis that this spreading activation process is by and large consistent with TP [a set of properties they provide].

This is wonderful stuff, and an example of where the personal/sub-personal distinction recently exposited by Keith Frankish [link updated 2020] would come in handy.

A Connectionist Computational Model for Epistemic and Temporal Reasoning

Many researchers argue that logics and connectionist systems complement each other nicely. Logics are an expressive formalism for describing knowledge: they expose the common form across a class of content, they often come with pleasant meta-properties (e.g. soundness and completeness), and logic-based learning makes excellent use of knowledge. Connectionist systems are good for data-driven learning and are fault tolerant, and some would argue they’re a good candidate for tip-toe-towards-the-brain cognitive models. I thought I’d give d’Avila Garcez and Lamb (2006) a go [A Connectionist Computational Model for Epistemic and Temporal Reasoning, Neural Computation, 18:7, 1711–1738].

I’m assuming you know a bit of propositional logic and set theory.

The modal logic bit

There are many modal logics which have properties in common, for instance provability logics, logics of tense, and deontic logics. I’ll follow the exposition in the paper. The gist is: take all the usual propositional logic connectives and add the operators □ and ◊. As a first approximation, □P (“box P”) means “it’s necessary that P” and ◊P (“diamond P”) means “it’s possible that P”. Kripke models are used to characterise when a modal logic sentence is true. A model, M, is a triple (Ω, R, v), where:

  • Ω is a set of possible worlds.
  • R is a binary relation on Ω, which can be thought of as describing connectivity between possible worlds, so if R(ω,ω’) then world ω’ is reachable from ω. Viewed temporally, the interpretation could be that ω’ comes after ω.
  • v is a lookup table, so v(p), for an atom p, returns the set of worlds where p is true.

Let’s start with an easy rule:

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional atom p

This says that to check whether p is true in ω, you just look it up. Now a recursive rule:

(M, ω) ⊨ A & B iff (M, ω) ⊨ A and (M, ω) ⊨ B

This lifts “&” up to our natural language (classical logic interpretation thereof) notion of “and”, and recurses on A and B. There are similar rules for disjunction and implication. The more interesting rules:

(M, ω) ⊨ □A iff for all ω’ ∈ Ω such that R(ω,ω’), (M, ω’) ⊨ A

(M, ω) ⊨ ◊A iff there is an ω’ ∈ Ω such that R(ω,ω’) and (M, ω’) ⊨ A

The first says that A is necessarily true in world ω if it’s true for all connected worlds. The second says that A is possibly true if there is at least one connected world for which it is true.
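
To make these clauses concrete, here’s a minimal model-checker sketch in Python (my own illustration, not from the paper): atoms are strings, other formulas are nested tuples, and a model is the triple (Ω, R, v) described above. Only the connectives discussed here are handled.

```python
def holds(model, w, formula):
    """Check whether (M, w) |= formula, for the fragment discussed above."""
    worlds, R, v = model
    if isinstance(formula, str):                 # propositional atom: look it up
        return w in v.get(formula, set())
    op = formula[0]
    if op == '&':                                # (M, w) |= A & B
        return holds(model, w, formula[1]) and holds(model, w, formula[2])
    if op == 'box':                              # A true in all worlds reachable from w
        return all(holds(model, w2, formula[1]) for (w1, w2) in R if w1 == w)
    if op == 'diamond':                          # A true in some world reachable from w
        return any(holds(model, w2, formula[1]) for (w1, w2) in R if w1 == w)
    raise ValueError(f"unknown operator: {op}")

# A toy model: w1 reaches w2 and w3; p is true at w2 and w3 only.
M = ({'w1', 'w2', 'w3'},
     {('w1', 'w2'), ('w1', 'w3')},
     {'p': {'w2', 'w3'}})

print(holds(M, 'w1', ('box', 'p')))                  # True: p holds in every world reachable from w1
print(holds(M, 'w1', ('&', 'p', ('diamond', 'p'))))  # False: p itself is false at w1
```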

A sketch of logic programs and a connectionist implementation

Logic programs are sets of clauses of the form A1 & A2 & … & An → B, where each Ai is a propositional atom or the negation of an atom. Below is a picture of the network that represents the program {B & C & ~D → A, E & F → A, B}.

A network representing a program

The thresholds are configured so that the units in the hidden layer, Ni, are only active when all the literals in the corresponding clause body are true, e.g. N1 is only active when B and C are true and D is false. The output layer’s units are set so that they’re only active when at least one of the hidden-layer units connected to them is active. Additionally, the output feeds back to the inputs. The networks do valuation calculations through the magic of backpropagation, but can’t infer new sentences as such, as far as I can tell. To do so would involve growing new nets and some mechanism outside the net interpreting what the new bits mean.
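
Here’s a toy reconstruction of that idea for the example program, hand-wiring the thresholds rather than using the paper’s translation algorithm, and running the net recurrently until the truth values stop changing (all names and numbers are my own choices for illustration):

```python
atoms = ['A', 'B', 'C', 'D', 'E', 'F']   # the state holds 1 (true) or 0 (false)

def hidden(state):
    """One threshold unit per clause; a unit fires only when its whole body holds."""
    A, B, C, D, E, F = state
    n1 = 1 if B + C + (1 - D) >= 2.5 else 0   # B & C & ~D -> A
    n2 = 1 if E + F >= 1.5 else 0             # E & F -> A
    n3 = 1                                    # the fact B: empty body, always fires
    return n1, n2, n3

def output(n1, n2, n3):
    """A head atom fires when at least one of the clause units for it fires."""
    return {'A': 1 if n1 + n2 >= 0.5 else 0,
            'B': 1 if n3 >= 0.5 else 0}

# Feed the outputs back to the inputs and iterate to a fixpoint.
state = {a: 0 for a in atoms}
while True:
    out = output(*hidden([state[a] for a in atoms]))
    new_state = {**state, **out}
    if new_state == state:
        break
    state = new_state

print({a for a, value in state.items() if value})   # {'B'}: B is a fact, A isn't derivable
```

Under the usual closed-world reading, the fixpoint {B} is just the program’s least model: C, D, E and F are never made true, so neither clause for A ever fires.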

Aside on biological plausibility

Biological plausibility raises its head here. Do the units in this network model – in any way at all – individual neurons in the brain? My gut instinct says, “Absolutely no way”, but perhaps it would be better not even to think this, as (a) the units in the model aren’t intended to characterise biological neurons and (b) we can’t test this particular hypothesis. Mike Page has written in favour of localist nets, of which this is an instance [Behavioral and Brain Sciences (2000), 23: 443–467]. Maybe more on that in another post.

Moving to modal logic programs and nets

Modal logic programs are like the vanilla kind, but the literals may have one of the modal operators. There is also a set of connections between the possible worlds, i.e. a specification of the relation, R. The central idea of the translation is to use one network to represent each possible world and then apply an algorithm to wire up the different networks correctly, giving one unified network. Take the following program: {ω1 : r → □q, ω1 : ◊s → r, ω2 : s, ω3 : q → ◊p, R(ω1,ω2), R(ω1,ω3)}. This wires up to:

A network representing a modal logic program

Each input and output neuron can now represent □A, ◊A, A, □~A, ◊~A, or ~A. The individual networks are connected to maintain the properties of the modality operators, for instance □q in ω1 connects to q in ω2 and ω3 since R(ω1, ω2), R(ω1, ω3), so q must be true in these worlds.
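
To see why that wiring respects the semantics, we can check the □q example with the toy model checker sketched earlier (again my own illustration, nothing to do with the network translation itself): with R(ω1,ω2) and R(ω1,ω3), □q holds at ω1 exactly when q holds at both ω2 and ω3.

```python
# Reusing holds() from the earlier sketch. The inter-network wiring is meant to
# enforce exactly this: box-q at w1 requires q at every world reachable from w1.
M = ({'w1', 'w2', 'w3'},
     {('w1', 'w2'), ('w1', 'w3')},
     {'q': {'w2', 'w3'}})

print(holds(M, 'w1', ('box', 'q')))       # True: q holds at w2 and w3

# If q is dropped from w3, box-q fails at w1 but diamond-q still holds.
M2 = (M[0], M[1], {'q': {'w2'}})
print(holds(M2, 'w1', ('box', 'q')))      # False
print(holds(M2, 'w1', ('diamond', 'q')))  # True
```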

The Connectionist Temporal Logic of Knowledge

Much the same as before, except we now have a set of agents, A = {1, …, n}, and a timeline, T, which is the set of naturals, each of which is a possible world but with a temporal interpretation. Take a model M = (T, R1, …, Rn, π). Ri specifies what bits of the timeline agent i has access to, and π(t) gives a set of propositions that are true at time t.

Recall the following definition from before:

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional letter p

Its analogue in the temporal logic is:

(M, t) ⊨ p iff p ∈ π(t), for a propositional letter p

There are two extra modal operators: O, which intuitively means “at the next time step”, and Ki, which works like □ but relative to agent i’s accessibility relation Ri. More formally:

(M, t) ⊨ OA iff (M, t+1) ⊨ A

(M, t) ⊨ KiA iff for all u ∈ T such that Ri(t,u), (M, u) ⊨ A
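
Here’s a minimal sketch in the same style as the earlier checker, entirely my own illustration: worlds are time points, O looks one step ahead, and Ki quantifies over the time points agent i can access. I use a finite horizon so it actually terminates; in the logic itself, T is the full set of naturals.

```python
# A toy temporal model: time points 0..H, one accessibility relation per agent,
# and pi mapping each time point to the set of atoms true there.

def holds_t(model, t, formula):
    H, R, pi = model
    if isinstance(formula, str):                 # atom p: is p true at time t?
        return formula in pi.get(t, set())
    op = formula[0]
    if op == 'O':                                # "at the next time step"
        return t + 1 <= H and holds_t(model, t + 1, formula[1])
    if op == 'K':                                # ('K', i, A): agent i knows A
        _, i, A = formula
        return all(holds_t(model, u, A) for (s, u) in R[i] if s == t)
    raise ValueError(f"unknown operator: {op}")

# Horizon 3; agent 1 can access times 1 and 2 from time 0; p true at times 1 and 2.
M = (3, {1: {(0, 1), (0, 2)}}, {1: {'p'}, 2: {'p'}})

print(holds_t(M, 0, ('O', 'p')))      # True: p holds at time 1
print(holds_t(M, 0, ('K', 1, 'p')))   # True: p holds at every time agent 1 can access
print(holds_t(M, 2, ('O', 'p')))      # False: p is false at time 3
```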

Now in the translation we have a network for each agent, and a collection of agent networks for each time step, all wired up appropriately.

Pages 1724–1727 give the algorithms for net construction. The proof of soundness of the translation relies on d’Avila Garcez, Broda, and Gabbay (2002), Neural-symbolic learning systems: Foundations and applications.

Some questions I haven’t got around to working out the answers to

  • How can these nets be embedded in a static population-coded network? Is there any advantage to doing so?
  • Where is the learning? In a sense it’s the bit that does the computation, but it doesn’t correspond to the usual notion of “learning”.
  • How can the construction of a network be related to what’s going on in the brain? Really I want a more concrete answer to how this could model the psychology. The authors don’t appear to care, in this paper anyway.
  • How can networks shrink again?
  • How can we infer new sentences from the networks?

Comments

I received the following helpful comments from one of the authors, Artur d’Avila Garcez (9 Aug 2006):

I am interested in the localist v distributed discussion and in the issue of biological plausibility; it’s not that we don’t care, but I guess you’re right to say that we don’t “in this paper anyway”. In this paper – and in our previous work – what we do is to say: take standard ANNs (typically the ones you can apply Backpropagation to). What logics can you represent in such ANNs? In this way, learning is a bonus as representation should precede learning.

The above answers your question re. learning. Learning is not the computation, that’s the reasoning part! Learning is the process of changing the connections (initially set by the logic) progressively, according to some set of examples (cases). For this you can apply Backprop to each network in the ensemble. The result is a different set of weights and therefore a different set of rules – after learning if you go back to the computation you should get different results.