Individuals versus aggregates

“Winwood Reade is good upon the subject,” said Holmes. “He remarks that, while the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician.”

The Sign of Four by Sir Arthur Conan Doyle (hat-tip MP)

Dysrationalia — if you disagree with me, off with your head

Some quotations from Sternberg’s (1994) critique of (an early version of) Stanovich’s theory. First, on theory:

“Why do we need a theory? Because we’ve had too many fly-by-night constructs in the abilities business, and we don’t need more of them. We do need serious new constructs — and the way to present them is via a theory […] — and construct validation of that theory.”

What is an irrational belief?

“In the real world, few problems truly lend themselves to the kind of deductive (rational) reasoning we learn in logic classes. The vast majority of problems are inductive, so that arguments can be stronger or weaker, but not logically valid or invalid. I am afraid that Stanovich has fallen into a trap, that of labeling people as “dysrational” who have beliefs that he does not accept. And therein lies frightening potential for misuse. And if you disagree with me, off with your head. Here, it’s a joke. Historically, it’s not.”

Reference

Sternberg, R. J. (1994). What If the Construct of Dysrationalia Were an Example of Itself? Educational Researcher, 23, 22-23, 27

The rational agent

“… the rational agent is not simply the one who follows the normative canons of logic and probability theory, and neither is she the one who follows adapted heuristics for action choice or ‘somatic markers’. Rather the rational agent is the critically self-aware agent; the one who is aware why she acts, and who modifies her own behaviour according to her self-knowledge. As Karl Popper (1990) wrote, ‘A rationalist is simply someone for whom it is more important to learn than to be proved right’…”

Lambie, J. A. (2008). On the irrationality of emotion and the rationality of awareness. Consciousness and Cognition, 17, 946-971

Prover9 and Mace4

Just found two fantastic programs and a GUI for exploring first-order classical models and doing automated proof: Prover9 and Mace4. There are many other theorem provers and model checkers out there, but this pair is special in coming as a self-contained, easy-to-use package for Windows and Macs.

There are many impressive built-in examples which you can play with. To start easy, I gave it a syllogism:

all B are A
no B are C

with existential presupposition, which is expressed:

exists x a(x).
exists x b(x).
exists x c(x).
all x (b(x) -> a(x)).
all x (b(x) -> -c(x)).

and asked it to find a model. Out popped a model with two individuals, named 0 and 1:

a(0).
- a(1).

b(0).
- b(1).

- c(0).
c(1).

So individual 0 is an A, a B, but not a C. Individual 1 is not an A, nor a B, but is a C.
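The check can also be scripted. Here is a minimal sketch in Python (the set encoding of Mace4’s predicate tables is my own, for illustration):

```python
# Mace4's model transcribed as Python sets: the individuals are 0 and 1,
# and each predicate is the set of individuals for which it holds.
a = {0}  # a(0), -a(1)
b = {0}  # b(0), -b(1)
c = {1}  # -c(0), c(1)

# Existential presuppositions: each predicate is non-empty.
assert a and b and c

# Premise "all B are A": every b is an a.
assert all(x in a for x in b)

# Premise "no B are C": no b is a c.
assert all(x not in c for x in b)

print("model satisfies all premises")
```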

Then I requested a counterexample to the conclusion no C are A:

a(0).
a(1).

b(0).
- b(1).

- c(0).
c(1).

The premises are true in this model, but the conclusion is false.

Finally, does the conclusion some A are not C follow from the premises?

2 (exists x b(x)) [assumption].
4 (all x (b(x) -> a(x))) [assumption].
5 (all x (b(x) -> -c(x))) [assumption].
6 (exists x (a(x) & -c(x))) [goal].
7 -a(x) | c(x). [deny(6)].
9 -b(x) | a(x). [clausify(4)].
10 -b(x) | -c(x). [clausify(5)].
11 b(c2). [clausify(2)].
12 c(x) | -b(x). [resolve(7,a,9,b)].
13 -c(c2). [resolve(10,a,11,a)].
16 c(c2). [resolve(12,b,11,a)].
17 $F. [resolve(16,a,13,a)].

Indeed it does. Unfortunately the proofs aren’t very pretty, as everything is rewritten in normal forms. One thing I want to play with is how non-classical logics may be embedded in this system.

Competence vs. performance

It’s all Chomsky’s fault (Chomsky 1965, p. 4):

“We thus make a fundamental distinction between competence (the speaker-hearer’s knowledge of his language) and performance (the actual use of language in concrete situations). […] A record of natural speech will show numerous false starts, deviations from rules, changes of plan in mid-course, and so on. The problem for the linguist, as well as for the child learning the language, is to determine from the data of performance the underlying system of rules that have been mastered by the speaker-hearer and that he puts to use in actual performance.”

So the idea is that people are trying to do C but only manage P, because of various constraints. We (children, adults, theorists) see (imperfect) P and want to infer C. We go to school and work through various rigmaroles to better approximate C. The same distinction is applied in reasoning, with various options: people are irrational (with respect to C); or C = P, if only we look hard enough; or bright people have P = C; or bright people want P = C.

What fascinates me in reasoning is the role played by small groups of experts who produce particular systems of reasoning—logical calculi, probabilistic machinery—along with proofs that they have properties which they argue are reasonable properties to have. Then others come along to use the systems. Hey, this looks like a good logic to know; maybe it’ll help make my arguments better if I use it. Maybe this probability calculus will make it easier to diagnose illness in my patients. And so forth. Then somebody else comes along and decides whether or not we’re consistent with a competence theory’s judgements; whether we’re interpreting things a different way; or whether another competence theory (or an application thereof), or a different psychological model of the situation, might be more appropriate.

Easy to get tied up in knots.

References

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

“Conscious” reasoning

Read this:

For us, however, a key difference is that only conscious reasoning can make use of working memory to hold intermediate conclusions, and accordingly reason in a recursive way (Johnson-Laird, 2006, p. 69): primitive recursion, by definition, calls for a memory of the results of intermediate computations (Hopcroft & Ullman, 1979). [… example task omitted …] The non-recursive processes of intuition cannot make this inference, but when we deliberate about it consciously, we grasp its validity (Cherubini & Johnson-Laird, 2004). Conscious reasoning therefore has a greater computational power than unconscious reasoning, and so it can on occasion overrule our intuitions.

There’s no evidence that whatever bits of memory intuition uses cannot do recursion. Hunting through semantic memory structures can be viewed as a recursive process, and that process is not (at least not always) accessible to consciousness. Aside from this, you can impose recursion on just about any process you care to analyse, and you can often remove recursion from a process description, depending on what primitives are available. Questioning whether a process “is” or “isn’t” recursive isn’t a healthy activity. Also, the jump from “recursive” to “primitive recursive”, as if they were one and the same, is deeply confusing. See the Stanford Encyclopaedia of Philosophy for details of other flavours of recursion.
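As a toy illustration of that last point (my own, not from the paper): the same function can be described with or without recursion, depending on what primitives are available.

```python
def fact_rec(n):
    # Recursive description: relies on a stack of intermediate results.
    return 1 if n == 0 else n * fact_rec(n - 1)

def fact_iter(n):
    # Iterative description: the same function, no recursion, given a
    # loop and an accumulator as primitives.
    acc = 1
    for k in range(2, n + 1):
        acc *= k
    return acc

print(fact_rec(6), fact_iter(6))  # the same function, two descriptions
```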

Bucciarelli, M.; Khemlani, S. & Johnson-Laird, P. N. (2008). The psychology of moral reasoning. Judgment and Decision Making, 3, 121-139

A good way to think about truth

“My intention was not to deal with the problem of truth, but with the problem of truth-teller or truth-telling as an activity. By this I mean that, for me, it was not a question of analyzing the internal or external criteria that would enable the Greeks and Romans, or anyone else, to recognize whether a statement or proposition is true or not. At issue for me was rather the attempt to consider truth-telling as a specific activity, or as a role.”

Discourse & Truth, Concluding remarks by Foucault. (Spotted here.)

Success

“This series of lectures on proof-theory is a priori dedicated to mathematicians and computer-scientists, physicists, philosophers and linguists ; and, since we are no longer in the XVI—not to speak of the XVIII—century, it is doomed to failure. […] This being said, plain success is not the only possible goal ; mine might simply be the exposition of a disorder in this apparently well-organised universe, in which logic eventually took its place between two beer mugs and the Reader’s Digest, and does not disturb, no longer disturbs—a sort of fat cat purring on the carpet.”

–Jean-Yves Girard, The Blind Spot

Formal and Applied Practical Reasoning

robot_girl.gif

This design, by Lydia Rivlin, was used for an academic conference poster and book cover. Apparently it caused some controversy. About its design, Lydia says (personal communication, 16 Aug 2006):

“When I had to think of something to illustrate the idea of formal (i.e. mechanical) and applied practical reasoning, this image of a robot chatting up a prostitute sprang straight into my mind. […] I have an idea he is asking her how much she would charge for an oil change – but I could be wrong.”

A Connectionist Computational Model for Epistemic and Temporal Reasoning

Many researchers argue that logics and connectionist systems complement each other nicely. Logics are an expressive formalism for describing knowledge, they expose the common form across a class of content, they often come with pleasant meta-properties (e.g. soundness and completeness), and logic-based learning makes excellent use of knowledge. Connectionist systems are good for data-driven learning and they’re fault tolerant; some would also argue that they’re a good candidate for tip-toe-towards-the-brain cognitive models. I thought I’d give d’Avila Garcez and Lamb (2006) a go [A Connectionist Computational Model for Epistemic and Temporal Reasoning, Neural Computation 18:7, 1711-1738].

I’m assuming you know a bit of propositional logic and set theory.

The modal logic bit

There are many modal logics which have properties in common, for instance provability logics, logics of tense, and deontic logics. I’ll follow the exposition in the paper. The gist is: take all the usual propositional logic connectives and add the operators □ and ◊. As a first approximation, □P (“box P”) means “it’s necessary that P” and ◊P (“diamond P”) means “it’s possible that P”. Kripke models are used to characterise when a modal logic sentence is true. A model, M, is a triple (Ω, R, v), where:

  • Ω is a set of possible worlds.
  • R is a binary relation on Ω, which can be thought of as describing connectivity between possible worlds, so if R(ω,ω’) then world ω’ is reachable from ω. Viewed temporally, the interpretation could be that ω’ comes after ω.
  • v is a lookup table, so v(p), for an atom p, returns the set of worlds where p is true.

Let’s start with an easy rule:

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional atom p

This says that to check whether p is true in ω, you just look it up. Now a recursive rule:

(M, ω) ⊨ A & B iff (M, ω) ⊨ A and (M, ω) ⊨ B

This lifts “&” up to our natural language (classical logic interpretation thereof) notion of “and”, and recurses on A and B. There are similar rules for disjunction and implication. The more interesting rules:

(M, ω) ⊨ □A iff for all ω’ ∈ Ω such that R(ω,ω’), (M, ω’) ⊨ A

(M, ω) ⊨ ◊A iff there is an ω’ ∈ Ω such that R(ω,ω’) and (M, ω’) ⊨ A

The first says that A is necessarily true in world ω if it’s true for all connected worlds. The second says that A is possibly true if there is at least one connected world for which it is true.
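These clauses translate almost directly into code. Here is a minimal sketch of a Kripke-model evaluator in Python (the model, the tuple encoding of formulas, and the world names are my own, not the paper’s):

```python
# A model M = (Omega, R, v): worlds, accessibility pairs, and a valuation
# mapping each atom to the set of worlds where it is true.
Omega = {"w1", "w2"}
R = {("w1", "w1"), ("w1", "w2")}
v = {"p": {"w1", "w2"}, "q": {"w2"}}

def holds(w, f):
    """Evaluate formula f at world w. Formulas are tuples:
    an atom "p", ("and", A, B), ("box", A), or ("dia", A)."""
    if isinstance(f, str):                      # atom: just look it up
        return w in v[f]
    op = f[0]
    if op == "and":
        return holds(w, f[1]) and holds(w, f[2])
    if op == "box":                             # true in all accessible worlds
        return all(holds(w2, f[1]) for (w1, w2) in R if w1 == w)
    if op == "dia":                             # true in some accessible world
        return any(holds(w2, f[1]) for (w1, w2) in R if w1 == w)
    raise ValueError(op)

print(holds("w1", ("box", "p")))   # p is true in every world reachable from w1
print(holds("w1", ("dia", "q")))   # q is true in some world reachable from w1
```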

A sketch of logic programs and a connectionist implementation

Logic programs are sets of Horn clauses, A1 & A2 & … & An → B, where Ai is a propositional atom or the negation of an atom. Below is a picture of the network that represents the program {B & C & ~D → A, E & F → A, B}.

A network representing a program

The thresholds are configured so that the units in the hidden layer, Ni, are only active when the antecedents are all true, e.g. N1 is only active when B, C, and ~D all have the truth value true. The output layer’s units have thresholds set so that they are only active when at least one of the hidden-layer connections to them is active. Additionally, the output feeds back to the inputs. The networks do valuation calculations through the magic of backpropagation, but can’t infer new sentences as such, as far as I can tell. To do so would involve growing new nets and some mechanism outside the net interpreting what the new bits mean.
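The threshold arithmetic can be sketched in a few lines of Python (my own sketch of one feedforward pass, not the paper’s construction; weights of +1 for positive literals, -1 for negated ones, and a threshold counting how many literals must be satisfied):

```python
# One feedforward pass for the program {B & C & ~D -> A, E & F -> A, B},
# with truth values in {0, 1}.

def unit(inputs, weights, threshold):
    """A simple threshold unit: fires iff the weighted sum reaches the threshold."""
    return int(sum(i * w for i, w in zip(inputs, weights)) >= threshold)

def program_step(B, C, D, E, F):
    # N1 computes B & C & ~D: needs B=1, C=1, D=0, i.e. weighted sum >= 2.
    n1 = unit([B, C, D], [1, 1, -1], 2)
    # N2 computes E & F.
    n2 = unit([E, F], [1, 1], 2)
    # The fact B gets a hidden unit that is always active.
    n3 = 1
    # Output A fires if at least one clause for A fired; output B from its fact.
    A = unit([n1, n2], [1, 1], 1)
    B_out = unit([n3], [1], 1)
    return A, B_out

print(program_step(B=1, C=1, D=0, E=0, F=0))  # first clause fires: A = 1
print(program_step(B=1, C=1, D=1, E=0, F=0))  # ~D fails: A = 0
```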

Aside on biological plausibility

Biological plausibility raises its head here. Do the units in this network model – in any way at all – individual neurons in the brain? My gut instinct says, “Absolutely no way”, but perhaps it would be better not even to think this as (a) the units in the model aren’t intended to characterise biological neurons and (b) we can’t test this particular hypothesis. Mike Page has written in favour of localist nets, of which this is an instance [Behavioral and Brain Sciences (2000), 23: 443-467]. Maybe more on that in another post.

Moving to modal logic programs and nets

Modal logic programs are like the vanilla kind, but the literals may have one of the modal operators. There is also a set of connections between the possible worlds, i.e. a specification of the relation, R. The central idea of the translation is to use one network to represent each possible world and then apply an algorithm to wire up the different networks correctly, giving one unified network. Take the following program: {ω1 : r → □q, ω1 : ◊s → r, ω2 : s, ω3 : q → ◊p, R(ω1,ω2), R(ω1,ω3)}. This wires up to:

A network representing a modal logic program

Each input and output neuron can now represent □A, ◊A, A, □~A, ◊~A, or ~A. The individual networks are connected to maintain the properties of the modality operators, for instance □q in ω1 connects to q in ω2 and ω3 since R(ω1, ω2), R(ω1, ω3), so q must be true in these worlds.

The Connectionist Temporal Logic of Knowledge

Much the same as before, except we now have a set of agents, A = {1, …, n}, and a timeline, T, which is the set of naturals, each of which is a possible world but with a temporal interpretation. Take a model M = (T, R1, …, Rn, π). Ri specifies what bits of the timeline agent i has access to, and π(t) gives a set of propositions that are true at time t.

Recall the following definition from before:

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional letter p

Its analogue in the temporal logic is

(M, t) ⊨ p iff t ∈ π(p), for a propositional letter p

There are two extra modal operators: O, which intuitively means “at the next time step”, and Ki (one for each agent i), which is the same as □ except relativised to agent i. More formally:

(M, t) ⊨ OA iff (M, t+1) ⊨ A

(M, t) ⊨ KiA iff for all u ∈ T such that Ri(t,u), (M, u) ⊨ A
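These two clauses also translate directly into code. A minimal sketch in Python (the timeline, valuation, and accessibility relation below are invented for illustration):

```python
# Time points 0..3, pi maps each atom to the set of times where it holds,
# and R[i] is agent i's accessibility relation over the timeline.
T = {0, 1, 2, 3}
pi = {"p": {1, 2, 3}}
R = {1: {(0, 1), (0, 2)}}   # agent 1 can "see" times 1 and 2 from time 0

def holds(t, f):
    if isinstance(f, str):                  # (M, t) |= p iff t in pi(p)
        return t in pi[f]
    op = f[0]
    if op == "next":                        # (M, t) |= O A iff (M, t+1) |= A
        return holds(t + 1, f[1])
    if op == "K":                           # (M, t) |= K_i A iff A holds at
        i, A = f[1], f[2]                   # every u with R_i(t, u)
        return all(holds(u, A) for (s, u) in R[i] if s == t)
    raise ValueError(op)

print(holds(0, ("next", "p")))      # p holds at time 1
print(holds(0, ("K", 1, "p")))      # p holds at both times agent 1 can see
```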

Now in the translation we have a network for each agent, and a collection of agent networks for each time step, all wired up appropriately.

Pages 1724-1727 give the algorithms for net construction. The proof of soundness of translation relies on d’Avila Garcez, Broda, and Gabbay (2002), Neural-symbolic learning systems: Foundations and applications.

Some questions I haven’t got around to working out the answers to

  • How can these nets be embedded in a static population-coded network? Is there any advantage to doing so?
  • Where is the learning? In a sense it’s the bit that does the computation, but it doesn’t correspond to the usual notion of “learning”.
  • How can the construction of a network be related to what’s going on in the brain? Really I want a more concrete answer to how this could model the psychology. The authors don’t appear to care, in this paper anyway.
  • How can networks shrink again?
  • How can we infer new sentences from the networks?

Comments

I received the following helpful comments from one of the authors, Artur d’Avila Garcez (9 Aug 2006):

I am interested in the localist v distributed discussion and in the issue of biological plausibility; it’s not that we don’t care, but I guess you’re right to say that we don’t “in this paper anyway”. In this paper – and in our previous work – what we do is to say: take standard ANNs (typically the ones you can apply Backpropagation to). What logics can you represent in such ANNs? In this way, learning is a bonus as representation should precede learning.

The above answers your question re. learning. Learning is not the computation, that’s the reasoning part! Learning is the process of changing the connections (initially set by the logic) progressively, according to some set of examples (cases). For this you can apply Backprop to each network in the ensemble. The result is a different set of weights and therefore a different set of rules – after learning if you go back to the computation you should get different results.