On a relevance criterion

Logicians study logics – plural. There are different logics for different reasoning tasks. Classical logic, the flavour taught to undergraduate students of all persuasions, falls apart when confronted with the kinds of reasoning that people do effortlessly every day. My favourite way to break classical logic involves an innocent “if” and “or”.

Ponder the following sentence (based on an example by Alf Ross, 1944):

If Alex posted the letter [P], then Alex posted the letter [P] or Alex set fire to the letter [F].

If you think this sentence is true, then your interpretation and reasoning are compatible with translating it into classical logic using the material conditional (\(\Rightarrow\)) for the “if” and inclusive disjunction (\(\lor\)) for the “or”. You could write it like this and it’s trivially true: \(P \Rightarrow (P \lor F)\).
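For anyone who would rather check "trivially true" than take it on trust, here is a minimal Python sketch that runs through all four truth assignments. The helper name implies is my own, standing in for the material conditional.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


# Evaluate P => (P or F) under every assignment of truth values to P and F.
rows = [(p, f, implies(p, p or f)) for p, f in product([True, False], repeat=2)]
for p, f, value in rows:
    print(f"P={p!s:5} F={f!s:5} P => (P or F): {value}")

# A tautology is true on every row.
is_tautology = all(value for _, _, value in rows)
print("tautology:", is_tautology)
```

The formula comes out true on every row, which is all that "trivially true" amounts to in classical logic.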

Some people are perfectly content with this interpretation, but many think the sentence is fishy and false.  There are a number of ways to explain what has happened.

One is to assume that the issue is language pragmatics rather than logic. Pragmatics studies the ways in which context and social conventions for communication affect people’s interpretation of language. According to one theory of communication (see Liza Verhoeven’s 2007 explanation), asserting that you posted the letter or burned it under the assumption that you posted it violates principles of cooperativeness. These principles affect the meaning of a sentence and its truth, so in this case the sentence is false.

Another way to make sense of what has gone wrong is using a relevance criterion devised by Gerhard Schurz (1991). The first step we need to take is to transform the “if” into an argument with a single premise and conclusion.

Premise: Alex posted the letter [P].
Conclusion: Alex posted the letter [P] or Alex set fire to the letter [F].

This is an uncontroversial step in classical logic, e.g., application of a rule for introducing an “if” in natural deduction.

Schurz introduces a criterion for conclusion relevance that roughly goes as follows. The starting point is an argument that is valid according to classical logic. That’s the case for the argument above. If there are any terms in the conclusion that can be substituted with arbitrary alternatives without affecting the argument’s validity, then the conclusion is irrelevant. Otherwise the conclusion is relevant.

For our letter example, we can replace “Alex set fire to the letter” with anything and it has no effect on the validity of the argument. Alex opened the letter. Alex scribbled on the letter. Alex swallowed the letter. The letter was a surrealist painting. The letter was the size of a house. And so on. No substitution in the second half of the conclusion can affect the validity of the argument, so the conclusion is irrelevant.

How about an argument where the conclusion is relevant? The trick is to ensure that everything in the conclusion is… relevant. That’s what I like about the criterion: it formalises (and the details are fiddly) an intuitive property of arguments. Here’s an easy example:

Premise: It’s raining and I left my umbrella at home
Conclusion: I left my umbrella at home and it’s raining

This is an example of the conjunction, “and”, being commutative in classical logic: the order of the conjuncts in the sentence (the parts on either side of “and”) doesn’t affect its truth. There are many ways to edit the conclusion so that the argument is no longer valid. For instance replace one or both of the conjuncts with “I posted a letter”. Then the conclusion doesn’t follow from the premise since the premise doesn’t tell us anything about a letter.
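The two examples can be run through a rough propositional sketch of the substitution test. This is a simplification of my own and not Schurz's full definition: formulas are nested tuples, every occurrence of an atom in the conclusion is swapped at once for a fresh atom X (which stands in for "an arbitrary alternative"), and all the function names are made up for illustration.

```python
from itertools import product


def atoms(formula):
    """Collect the atom names occurring in a nested-tuple formula."""
    if formula[0] == "atom":
        return {formula[1]}
    return set().union(*(atoms(part) for part in formula[1:]))


def evaluate(formula, valuation):
    """Classical truth value of a formula under a valuation (dict name -> bool)."""
    op = formula[0]
    if op == "atom":
        return valuation[formula[1]]
    if op == "and":
        return evaluate(formula[1], valuation) and evaluate(formula[2], valuation)
    if op == "or":
        return evaluate(formula[1], valuation) or evaluate(formula[2], valuation)
    raise ValueError(f"unknown connective: {op}")


def valid(premise, conclusion):
    """True if no valuation makes the premise true and the conclusion false."""
    names = sorted(atoms(premise) | atoms(conclusion))
    for values in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, values))
        if evaluate(premise, valuation) and not evaluate(conclusion, valuation):
            return False
    return True


def substitute(formula, name, replacement):
    """Replace every occurrence of the atom `name` with `replacement`."""
    if formula[0] == "atom":
        return replacement if formula[1] == name else formula
    return (formula[0],) + tuple(substitute(p, name, replacement) for p in formula[1:])


def replaceable_atoms(premise, conclusion, fresh="X"):
    """Conclusion atoms that can be swapped for a fresh atom without losing validity."""
    fresh_atom = ("atom", fresh)
    return sorted(
        name
        for name in atoms(conclusion)
        if valid(premise, substitute(conclusion, name, fresh_atom))
    )


P, F = ("atom", "P"), ("atom", "F")   # posted the letter, set fire to the letter
R, U = ("atom", "R"), ("atom", "U")   # it's raining, left my umbrella at home

letter = replaceable_atoms(P, ("or", P, F))
umbrella = replaceable_atoms(("and", R, U), ("and", U, R))
print(letter)    # ['F']
print(umbrella)  # []
```

A non-empty list means some part of the conclusion can be swapped out freely, so the conclusion counts as irrelevant in this simplified sense. An empty list means every part of the conclusion is doing some work.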

Colleagues and I explored people’s interpretations of these kinds of sentence about a decade ago in the context of an alleged paradigm shift in the psychology of reasoning. Read all about it. I was reminded of this again as Google Scholar dutifully notified me that Michał Sikorski recently cited it (thank you kindly Michał!).

Drawing an is-ought

Hume’s (1739) Treatise famously argued that we cannot infer an “ought” from an “is”. This has presented an enduring problem for science: how should we produce a set of recommendations for what should be done following the results of a study? If a new cancer treatment dramatically improves remission rates, should study authors simply shrug, present the results, and leave the recommendations to politicians? What if a treatment causes significant harms – can we recommend that the treatment be banned? Or suppose we have ideas for future studies that should be carried out and want to summarise them in the conclusions…? Even doing this would be ruled out by Hume.

The solution, if it is one, is that any recommendations require a set of premises stating our values. These values necessarily assert something beyond the evidence, for instance that if a treatment is effective then it should be provided by the health service. In practice, such values are often left implicit and assumed to be shared with readers. But there are interesting examples where it is apparently possible to draw an is-ought inference without assuming values.

One example, due to Mavrodes (1964), begins with the premise

If we ought to do A, then it is possible to do A.

This seems reasonable enough. It would, for instance, be horribly dystopian to require that people behave a particular way if it were impossible for them to do so. Games like chess and tennis have rules that are possible to follow – if they were impossible to follow then it would make playing the games challenging. Let’s see what happens if we apply a little logic to this premise.

Sentences of the form

If A, then B

are equivalent to those of the contrapositive form

If not-B, then not-A

This can be seen in the truth table below, where 1 denotes true and 0 denotes false. The values of the last two columns are equivalent:

A | B | not-A | not-B | If A, then B | If not-B, then not-A
1 | 1 | 0     | 0     | 1            | 1
1 | 0 | 0     | 1     | 0            | 0
0 | 1 | 1     | 0     | 1            | 1
0 | 0 | 1     | 1     | 1            | 1
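The four rows of the truth table can be regenerated mechanically. Here is a small Python sketch, with implies as my own helper for the material conditional, that builds the last two columns and checks that they agree.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


# Each row holds: A, B, "If A, then B", "If not-B, then not-A".
table = [
    (a, b, implies(a, b), implies(not b, not a))
    for a, b in product([True, False], repeat=2)
]

for a, b, conditional, contrapositive in table:
    # Print 1 for true and 0 for false, as in the table above.
    print(int(a), int(b), int(conditional), int(contrapositive))

columns_match = all(row[2] == row[3] for row in table)
print("equivalent:", columns_match)
```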

Together, this means that if we accept the premise

If we ought to do A, then it is possible to do A,

and the rules of classical logic, we must also accept

If it is not possible to do A, then it is not the case that we ought to do A.

But here we have an antecedent that is an “is” and a consequent that is an “ought”: logic has licensed an is-ought!

Worry not: there has been debate in the literature… See Gillian Russell (2021) for a recent analysis.

References

Mavrodes, G. I. (1964). “Is” and “Ought.” Analysis, 25(2), 42–44.

Russell, G. (2021). How to Prove Hume’s Law. Journal of Philosophical Logic. In press.

A psychoanalyst walks into a bar(red subject)

A psychoanalyst walks into a bar with a book on logic and set theory. He orders a whisky. And another. Twelve hours and a lock-in later, all he has to show for the evening is a throbbing headache and some indecipherable rubbish scrawled on a napkin.

That’s the only conceivable explanation for these diagrams from The Subversion of the Subject and the Dialectic of Desire in the Freudian Unconscious, by Jacques Lacan (published in the Écrits collection):

But, surely this notation means something? After all, Lacan is famous and academics across the world dedicate their lives to understanding his genius.

Also, the notation f(x) denotes a function, f, applied to an argument, x – that’s recognisable from maths. So the I(A) and s(A) must mean something…?

To see how function notation is used, consider the Fibonacci sequence, which pops up in all kinds of interesting places in nature. It is defined as follows:

f(0) = 0,
f(1) = 1,
f(n) = f(n-1) + f(n-2), for n > 1.

In English, this says that the first two numbers in the sequence are 0 and 1. The numbers following are obtained by summing the previous two, so the sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, … The function notation “does something”. It provides a way of defining and referring to (here, mathematical) concepts.
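The definition translates almost line for line into code. This is a minimal Python sketch: the function is called f to match the notation above, and the caching decorator is only there to stop the recursion recomputing the same values.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n: int) -> int:
    """Fibonacci numbers, following the three-line definition directly."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n == 0:
        return 0
    if n == 1:
        return 1
    return f(n - 1) + f(n - 2)


first_ten = [f(n) for n in range(10)]
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```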

Less well-known, but appearing in university philosophy courses, is the lozenge symbol, ◊, which means “possible” in a particular kind of logic called modal logic. It seems plausible that there is something meaningful here in Lacan’s use of the symbol too.

Here is Lacan, “explaining” his notation for non-mathematicians:

Huh?

Lacan doesn’t try to explain what the notation means; he doesn’t seem to want readers to understand. Maybe he is just too clever and if only we persevered we would get what he means. However, elsewhere in the same text Lacan uses arithmetic to argue that “the erectile organ can be equated with √(-1)”. I’m told this is a joke because √(-1) is an imaginary number. Maybe trainee psychoanalysts learn about complex numbers so get the joke (I doubt it though)? Maybe all Lacanian discourse is dadaist performance.

Alan Sokal and Jean Bricmont have written a book-length critique of Lacan’s maths and others’ similar use of natural science concepts. Having read lots of mathematical texts and seen how authors make an effort to introduce their notation, I think it’s entirely possible Lacan is a fraud, ◊(Lacan is a fraud). That might sound harsh, but forget how famous he is and just look at the pretentious rubbish he writes.

Prover9 and Mace4

Just found two fantastic programs, Prover9 and Mace4, plus a GUI, for automated proof and for exploring models of first-order classical logic.  There are many other theorem provers and model checkers out there.  This package is special as it is self-contained and easy to use on Windows and Macs.

There are many impressive examples built in which you can play with.  To start easy, I gave it a syllogism:

all B are A
no B are C

with existential presupposition, which is expressed:

exists x a(x).
exists x b(x).
exists x c(x).
all x (b(x) -> a(x)).
all x (b(x) -> -c(x)).

and asked it to find a model. Out popped a model with two individuals, named 0 and 1:

a(0).
- a(1).

b(0).
- b(1).

- c(0).
c(1).

So individual 0 is an A, a B, but not a C. Individual 1 is not an A, nor a B, but is a C.

Then I requested a counterexample to the conclusion no C are A:

a(0).
a(1).

b(0).
- b(1).

- c(0).
c(1).

The premises are true in this model, but the conclusion is false.
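Both models are small enough to check by hand, but here is a hedged Python sketch that does the same check with sets. Each predicate is represented by the set of individuals it holds for, and the function names are mine rather than anything Mace4 provides.

```python
# Each predicate is the set of individuals (0 and 1) for which it holds.
first_model = {"a": {0}, "b": {0}, "c": {1}}
counterexample = {"a": {0, 1}, "b": {0}, "c": {1}}


def premises_hold(model):
    """Existential presupposition, all B are A, and no B are C."""
    nonempty = all(model[name] for name in ("a", "b", "c"))
    all_b_are_a = model["b"] <= model["a"]
    no_b_are_c = not (model["b"] & model["c"])
    return nonempty and all_b_are_a and no_b_are_c


def no_c_are_a(model):
    """The candidate conclusion: nothing is both a C and an A."""
    return not (model["c"] & model["a"])


for name, model in [("first model", first_model), ("counterexample", counterexample)]:
    print(name, "| premises:", premises_hold(model), "| no C are A:", no_c_are_a(model))
```

In the first model the premises and "no C are A" all hold. In the second, individual 1 is both a C and an A, so the conclusion fails while the premises stay true.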

Finally, does the conclusion some A are not C follow from the premises?

2 (exists x b(x)) [assumption].
4 (all x (b(x) -> a(x))) [assumption].
5 (all x (b(x) -> -c(x))) [assumption].
6 (exists x (a(x) & -c(x))) [goal].
7 -a(x) | c(x). [deny(6)].
9 -b(x) | a(x). [clausify(4)].
10 -b(x) | -c(x). [clausify(5)].
11 b(c2). [clausify(2)].
12 c(x) | -b(x). [resolve(7,a,9,b)].
13 -c(c2). [resolve(10,a,11,a)].
16 c(c2). [resolve(12,b,11,a)].
17 $F. [resolve(16,a,13,a)].

Indeed it does. Unfortunately the proofs aren’t very pretty as everything is rewritten in normal forms.  One thing I want to play with is how non-classical logics may be embedded in this system.
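As a sanity check on both results, here is a brute-force Python sketch that enumerates every interpretation of a, b and c over domains of up to three individuals. It is nothing like what Prover9 does internally, and failing to find a small counterexample is not a proof. It just agrees with the outputs above. All names are my own.

```python
from itertools import combinations, product


def subsets(domain):
    """Yield every subset of the domain as a set."""
    items = list(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def premises_hold(a, b, c):
    """Existential presupposition, all B are A, and no B are C."""
    return bool(a) and bool(b) and bool(c) and b <= a and not (b & c)


def counterexamples(conclusion, max_size=3):
    """Interpretations where the premises hold but the conclusion fails."""
    found = []
    for n in range(1, max_size + 1):
        for a, b, c in product(list(subsets(range(n))), repeat=3):
            if premises_hold(a, b, c) and not conclusion(a, b, c):
                found.append((n, a, b, c))
    return found


def some_a_are_not_c(a, b, c):
    return bool(a - c)


def no_c_are_a(a, b, c):
    return not (c & a)


print(len(counterexamples(some_a_are_not_c)))          # 0
print(min(n for n, *_ in counterexamples(no_c_are_a)))  # 2
```

The smallest counterexample to "no C are A" needs two individuals, which matches the size of the model that Mace4 reported.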

It’s funny how the same names keep popping up…

I first heard of Per Martin-Löf through his work in intuitionist logic, which turned out to be important in computer science (see Nordström, Petersson, and Smith, 1990).  His name has popped up again (Martin-Löf, 1973), this time in the context of his conditional likelihood ratio test, apparently used by Item Response Theory folk to assess whether two groups of items test the same ability (see Wainer et al, 1980).  Small world.

References

Martin-Löf, P. (1973). Statistiska modeller. Anteckningar fran seminarier lasaret 1969–1970 utarbetade av rolf sundberg. Obetydligt ändrat nytryck, october 1973 (photocopied manuscript). Institutet för Säkringsmatematik och Matematisk Statistik vid Stockholms Universitet.

Nordström, B., Petersson, K., & Smith, J. M. (1990). Programming in Martin-Löf’s Type Theory. Oxford University Press.

Wainer, H., Morgan, A., & Gustafsson, J.-E. (1980). A Review of Estimation Procedures for the Rasch Model with an Eye toward Longish Tests. Journal of Educational Statistics, 5, 35–64.

“Semantics”

“… there can hardly be any question that what ‘semantics’ conveyed and conveys to the mind of the general reader is a theory of meaning, which Tarski’s theory most emphatically was not. By calling his theory ‘semantics,’ Tarski opened the door to endless misunderstandings on this point. There has been significant damage to logic arising from such misunderstandings, from confusion of model theory or ‘semantics’ improperly so-called with meaning theory or ‘semantics’ properly so-called.”
—From Tarski’s Tort by John P. Burgess

Science for the half-wits

A bit from Jean-Yves Girard’s latest rant, The phantom of transparency:

Still under the heading « science for the half-wits », let us mention non monotonic « logics ». They belong in our discussion because of the fantasy of completeness, i.e., of the answer to all questions. Here, the slogan is what is not provable is false : one thus seeks a completion by adding unprovable statements. Every person with a minimum of logical culture knows that this completion (that would yield transparency) is fundamentally impossible, because of the undecidability of the halting problem, in other terms, of incompleteness, which has been rightly named : it denotes, not a want with respect to a preexisiting totality, but the fundamentally incomplete nature of the cognitive process.

Success

“This series of lectures on proof-theory is a priori dedicated to mathematicians and computer-scientists, physicists, philosophers and linguists ; and, since we are no longer in the XVI—not to speak of the XVIII—century, it is doomed to failure. […] This being said, plain success is not the only possible goal ; mine might simply be the exposition of a disorder in this apparently well-organised universe, in which logic eventually took its place between two beer mugs and the Reader’s Digest, and does not disturb, no longer disturbs—a sort of fat cat purring on the carpet.”

–Jean-Yves Girard, The Blind Spot

Logic and Reasoning: do the facts matter?

Had a read of Logic and reasoning: do the facts matter? by Johan van Benthem. Covers much ground in a short space and I found it thought provoking. Here’s a quick sketch of the bits I liked.

Van Benthem mentions the anti-psychologism stance, briefly the idea that human practice cannot tell us what correct reasoning is. He contrasts Frege’s view with that of Wundt; the latter, he argues, was too close to practice; Frege was too far. He argues that if logics were totally inconsistent with real practice then they’d be useless.

Much logic is about going beyond what classical logic has to offer and is driven by real language use. Van Benthem cites Prior’s work on temporal structure, Lewis and Stalnaker’s work on comparative orderings of worlds, work on generalised quantifiers which was driven by the mess of real language and for instance produced formalisations of quantifiers like most and few. Generally, van Benthem argues, “one needs to move closer to the goal of providing more direct and faithful mathematical renderings of what seem to be stable reasoning practices.” You want your logic to be more natural, closer to the phenomena. Conceptions of mathematical logic were driven by the terms that appeared in rigorous proofs, so the linguistic stuff is just widening the set of practices that are modelled.

Correctability in a logic is more important than correctness, he argues. This is consistent with the goals of the non-monotonic logic crowd I know and love. I find this most interesting when looking at individual differences in reasoning processes: perhaps a correctability dimension is out there somewhere, if only we could measure it and its correlates.

Divergences from competence criteria, he argues, suggest new practices. I still see many papers in which people are scored against classical logic. Failure should cause an attempt to work out what practice is being followed by a person rather than the more common concern of what went wrong and how we could bring people back.

Much more in this little paper…

A Connectionist Computational Model for Epistemic and Temporal Reasoning

Many researchers argue that logics and connectionist systems complement each other nicely. Logics are an expressive formalism for describing knowledge, they expose the common form across a class of content, they often come with pleasant meta-properties (e.g. soundness and completeness), and logic-based learning makes excellent use of knowledge. Connectionist systems are good for data-driven learning and they’re fault tolerant. Some would also argue that they’re a good candidate for tip-toe-towards-the-brain cognitive models. I thought I’d give d’Avila Garcez and Lamb (2006) a go [A Connectionist Computational Model for Epistemic and Temporal Reasoning, Neural Computation 18:7, 1711-1738].

I’m assuming you know a bit of propositional logic and set theory.

The modal logic bit

There are many modal logics which have properties in common, for instance provability logics, logics of tense, deontic logics. I’ll follow the exposition in the paper. The gist is: take all the usual propositional logic connectives and add the operators □ and ◊. As a first approximation, □P (“box P”) means “it’s necessary that P” and ◊P (“diamond P”) means “it’s possible that P”. Kripke models are used to characterise when a modal logic sentence is true. A model, M, is a triple (Ω, R, v), where:

  • Ω is a set of possible worlds.
  • R is a binary relation on Ω, which can be thought of as describing connectivity between possible worlds, so if R(ω,ω’) then world ω’ is reachable from ω. Viewed temporally, the interpretation could be that ω’ comes after ω.
  • v is a lookup table, so v(p), for an atom p, returns the set of worlds where p is true.

Let’s start with an easy rule:

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional atom p

This says that to check whether p is true in ω, you just look it up. Now a recursive rule:

(M, ω) ⊨ A & B iff (M, ω) ⊨ A and (M, ω) ⊨ B

This lifts “&” up to our natural language (classical logic interpretation thereof) notion of “and”, and recurses on A and B. There are similar rules for disjunction and implication. The more interesting rules:

(M, ω) ⊨ □A iff for all ω’ ∈ Ω such that R(ω,ω’), (M, ω’) ⊨ A

(M, ω) ⊨ ◊A iff there is an ω’ ∈ Ω such that R(ω,ω’) and (M, ω’) ⊨ A

The first says that A is necessarily true in world ω if it’s true for all connected worlds. The second says that A is possibly true if there is at least one connected world for which it is true.
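The four rules above fit in a few lines of code. Below is a minimal Python sketch of an evaluator, assuming formulas are written as nested tuples and a model is a triple of plain Python sets and dicts. The example model, with worlds w1 to w3, is invented for illustration and does not come from the paper.

```python
def holds(model, world, formula):
    """Return True if (M, world) ⊨ formula, following the rules above."""
    worlds, relation, v = model
    op = formula[0]
    if op == "atom":
        # Look the atom up: v maps each atom to the set of worlds where it is true.
        return world in v.get(formula[1], set())
    if op == "and":
        return holds(model, world, formula[1]) and holds(model, world, formula[2])
    # Worlds reachable from the current one via R.
    successors = [target for (source, target) in relation if source == world]
    if op == "box":
        return all(holds(model, w, formula[1]) for w in successors)
    if op == "dia":
        return any(holds(model, w, formula[1]) for w in successors)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical example: w2 and w3 are reachable from w1, nothing from w2 or w3.
model = (
    {"w1", "w2", "w3"},
    {("w1", "w2"), ("w1", "w3")},
    {"q": {"w2", "w3"}, "s": {"w2"}},
)
q, s, p = ("atom", "q"), ("atom", "s"), ("atom", "p")

print(holds(model, "w1", ("box", q)))  # True: q holds in both reachable worlds
print(holds(model, "w1", ("dia", s)))  # True: s holds in w2
print(holds(model, "w1", ("box", s)))  # False: s fails in w3
print(holds(model, "w2", ("box", p)))  # True: vacuously, w2 reaches no worlds
print(holds(model, "w2", ("dia", p)))  # False: there is no reachable world at all
```

The last two lines show a quirk of the definitions. At a world with no reachable worlds, every □A is true and every ◊A is false.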

A sketch of logic programs and a connectionist implementation

Logic programs are sets of Horn clauses, A1 & A2 & … & An → B, where Ai is a propositional atom or the negation of an atom. Below is a picture of the network that represents the program {B & C & ~D → A, E & F → A, B}.

A network representing a program

The thresholds are configured so that the units in the hidden layer, Ni, are only active when the antecedents are all true, e.g. N1 is only active when B, C, and ~D have the truth value true. The units in the output layer are only active when at least one of the hidden layer connections to them is active. Additionally, the output feeds back to the inputs. The networks do valuation calculations through the magic of backpropagation, but can’t infer new sentences as such, as far as I can tell. To do so would involve growing new nets and some mechanism outside the net interpreting what the new bits mean.
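To make the threshold idea concrete, here is a toy Python sketch of the example program. It is not the paper's construction: I'm assuming binary step units, weights of +1 and -1, and a threshold equal to the number of positive literals in each clause, whereas the paper derives its own weights and activation functions. The clamped argument is also my own addition, used to hold some inputs true from outside.

```python
# Each clause is (body, head); the body maps an atom to True (positive) or False (negated).
PROGRAM = [
    ({"B": True, "C": True, "D": False}, "A"),  # B & C & ~D -> A
    ({"E": True, "F": True}, "A"),              # E & F -> A
    ({}, "B"),                                  # the fact B
]
ATOMS = "ABCDEF"


def hidden_active(body, state):
    """A hidden unit fires only when every literal in its clause body is true."""
    weights = {atom: (1 if positive else -1) for atom, positive in body.items()}
    threshold = sum(1 for positive in body.values() if positive)
    total = sum(weight * state[atom] for atom, weight in weights.items())
    return total >= threshold


def step(state, clamped):
    """One pass from the input layer through the hidden layer to the output layer."""
    hidden = [hidden_active(body, state) for body, _ in PROGRAM]
    new_state = {}
    for atom in ATOMS:
        # An output unit fires when at least one hidden unit feeding it fires.
        incoming = sum(h for h, (_, head) in zip(hidden, PROGRAM) if head == atom)
        new_state[atom] = 1 if incoming >= 1 or atom in clamped else 0
    return new_state


def run(clamped=(), max_steps=20):
    """Feed outputs back to the inputs until the network settles."""
    state = {atom: (1 if atom in clamped else 0) for atom in ATOMS}
    for _ in range(max_steps):
        next_state = step(state, clamped)
        if next_state == state:
            return {atom for atom in ATOMS if state[atom]}
        state = next_state
    raise RuntimeError("no stable state reached")


print(sorted(run()))                    # ['B']
print(sorted(run(clamped={"C"})))       # ['A', 'B', 'C']
print(sorted(run(clamped={"C", "D"})))  # ['B', 'C', 'D']
```

Left alone, the network settles with only B active. Holding C true lets the first clause fire and switches A on, and holding D true as well blocks it again. The step limit is there because programs with negation need not settle at all.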

Aside on biological plausibility

Biological plausibility raises its head here. Do the units in this network model – in any way at all – individual neurons in the brain? My gut instinct says, “Absolutely no way”, but perhaps it would be better not even to think this as (a) the units in the model aren’t intended to characterise biological neurons and (b) we can’t test this particular hypothesis. Mike Page has written in favour of localist nets, of which this is an instance [Behavioral and Brain Sciences (2000), 23: 443-467]. Maybe more on that in another post.

Moving to modal logic programs and nets

Modal logic programs are like the vanilla kind, but the literals may have one of the modal operators. There is also a set of connections between the possible worlds, i.e. a specification of the relation, R. The central idea of the translation is to use one network to represent each possible world and then apply an algorithm to wire up the different networks correctly, giving one unified network. Take the following program: {ω1 : r → □q, ω1 : ◊s → r, ω2 : s, ω3 : q → ◊p, R(ω1,ω2), R(ω1,ω3)}. This wires up to:

A network representing a modal logic program

Each input and output neuron can now represent □A, ◊A, A, □~A, ◊~A, or ~A. The individual networks are connected to maintain the properties of the modality operators, for instance □q in ω1 connects to q in ω2 and ω3 since R(ω1, ω2), R(ω1, ω3), so q must be true in these worlds.
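What the wired-up ensemble should end up computing for the example program can be sketched symbolically, without any networks. The Python below is a loose sketch under my own assumptions and is not the paper's algorithm: literals are plain strings with "[]" for □ and "<>" for ◊, a □A in one world pushes A into every reachable world, and a non-modal A in a reachable world licenses ◊A back in the source world. It does not try to find or create a witnessing world for a ◊ literal such as ◊p in ω3.

```python
# Clauses per world as (body, head); w1, w2, w3 stand for ω1, ω2, ω3.
RULES = {
    "w1": [(["r"], "[]q"), (["<>s"], "r")],
    "w2": [([], "s")],
    "w3": [(["q"], "<>p")],
}
RELATION = {("w1", "w2"), ("w1", "w3")}


def close(rules, relation):
    """Apply the clauses and the two modal links until nothing new is derived."""
    facts = {world: set() for world in rules}
    while True:
        derived = {world: set(known) for world, known in facts.items()}
        # Ordinary clause firing inside each world.
        for world, clauses in rules.items():
            for body, head in clauses:
                if all(literal in facts[world] for literal in body):
                    derived[world].add(head)
        for source, target in relation:
            # Box link: []A in the source world makes A true in each reachable world.
            for literal in facts[source]:
                if literal.startswith("[]"):
                    derived[target].add(literal[2:])
            # Diamond link: a non-modal A in a reachable world gives <>A in the source.
            for literal in facts[target]:
                if not literal.startswith(("[]", "<>")):
                    derived[source].add("<>" + literal)
        if derived == facts:
            return facts
        facts = derived


result = close(RULES, RELATION)
for world in sorted(result):
    print(world, sorted(result[world]))
```

The chain of reasoning is the interesting part: s in ω2 gives ◊s in ω1, which gives r, which gives □q, which puts q in both ω2 and ω3, which finally gives ◊p in ω3.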

The Connectionist Temporal Logic of Knowledge

Much the same as before, except we now have a set of agents, A = {1, …, n}, and a timeline, T, which is the set of naturals, each of which is a possible world but with a temporal interpretation. Take a model M = (T, R1, …, Rn, π). Ri specifies what bits of the timeline agent i has access to, and π(p) gives the set of time points at which the proposition p is true.

Recall the following definition from before

(M, ω) ⊨ p iff ω ∈ v(p), for a propositional letter p

Its analogue in the temporal logic is

(M, t) ⊨ p iff t ∈ π(p), for a propositional letter p

There are two extra modal operators: O, which intuitively means “at the next time step”, and K, which works like □ but is indexed by agent (written Ki for agent i). More formally:

(M, t) ⊨ OA iff (M, t+1) ⊨ A

(M, t) ⊨ KiA iff for all u ∈ T such that Ri(t,u), (M, u) ⊨ A
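The temporal rules can be sketched in the same style as the earlier ones. This Python fragment makes a few assumptions of my own: formulas are nested tuples, π is treated like v (a lookup from a proposition to the time points where it holds), each agent's access relation is a set of (t, u) pairs, and the example model is invented.

```python
def holds(model, t, formula):
    """Return True if (M, t) ⊨ formula for atoms, O (next) and K (per agent)."""
    access, pi = model
    op = formula[0]
    if op == "atom":
        # pi maps each proposition to the set of time points where it is true.
        return t in pi.get(formula[1], set())
    if op == "next":
        # O A holds now if A holds at the next time step.
        return holds(model, t + 1, formula[1])
    if op == "know":
        # K for one agent: A holds at every time point that agent can access from t.
        agent, body = formula[1], formula[2]
        return all(holds(model, u, body) for (s, u) in access[agent] if s == t)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical model with two agents; p is true at time points 1 and 2 only.
model = (
    {1: {(0, 1), (0, 2)}, 2: {(0, 0), (0, 1)}},
    {"p": {1, 2}},
)
p = ("atom", "p")

print(holds(model, 0, ("next", p)))               # True: p holds at time 1
print(holds(model, 0, ("know", 1, p)))            # True: agent 1 sees times 1 and 2
print(holds(model, 0, ("know", 2, p)))            # False: agent 2 also sees time 0
print(holds(model, 0, ("know", 1, ("next", p))))  # False: p fails at time 3
```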

Now in the translation we have a network for each agent, and a collection of agent networks for each time step, all wired up appropriately.

Pages 1724-1727 give the algorithms for net construction. The proof of soundness of translation relies on d’Avila Garcez, Broda, and Gabbay (2002), Neural-symbolic learning systems: Foundations and applications.

Some questions I haven’t got around to working out the answers to

  • How can these nets be embedded in a static population coded network? Is there any advantage to doing so?
  • Where is the learning? In a sense it’s the bit that does the computation, but it doesn’t correspond to the usual notion of “learning”.
  • How can the construction of a network be related to what’s going on in the brain? Really I want a more concrete answer to how this could model the psychology. The authors don’t appear to care, in this paper anyway.
  • How can networks shrink again?
  • How can we infer new sentences from the networks?

Comments

I received the following helpful comments from one of the authors, Artur d’Avila Garcez (9 Aug 2006):

I am interested in the localist v distributed discussion and in the issue of biological plausibility; it’s not that we don’t care, but I guess you’re right to say that we don’t “in this paper anyway”. In this paper – and in our previous work – what we do is to say: take standard ANNs (typically the ones you can apply Backpropagation to). What logics can you represent in such ANNs? In this way, learning is a bonus as representation should precede learning.

The above answers your question re. learning. Learning is not the computation, that’s the reasoning part! Learning is the process of changing the connections (initially set by the logic) progressively, according to some set of examples (cases). For this you can apply Backprop to each network in the ensemble. The result is a different set of weights and therefore a different set of rules – after learning if you go back to the computation you should get different results.