Some scribbles on reasoning

Ideally, to study reasoning we would examine the discussions couples have about their relationships, the arguments people have at work, and the language used by people with mental health problems, and we would characterise the logical and rhetorical mechanisms they use. We would observe participants in naturalistic situations, without our observation affecting their discourse, and extract features of their utterances and interactions. We would note sensitivity to nonverbal behaviour, tone and dysfluency. Using a range of behavioural tests, again with data derived from naturalistic observation, we would seek correlations and build complex models relating these measures to the style of reasoning used.

Analysing observational data is a lot of work. The consensus is that sitting and watching videos of people talking, trying to extract features, is currently too costly, and that the results produced do not justify the cost. Recent research has tried to make observational studies more feasible. Gottman et al. (2002) describe an observational approach in which the participants, couples in various states of marital (un)bliss, both rate videos of themselves talking to each other, for instance by continuously recording their own and their partner’s emotional states by twiddling a knob. Dynamical systems theory is used to model the results and examine interactions between the partners’ ratings. The problem with such unstructured data (two continuous variables) is that it fails to tease apart the detailed structure of the interactions, and so fails to expose the form that characterises how the participants reason.

Now we retreat to the state of the art in reasoning research. This research has focused on lab-based tasks, and much is now known about how the presence of various logical forms, and the presentation of the information to be reasoned about, affect participants’ performance. The tasks may be viewed as microcosms of dialogue, in which the psychological processes we use day to day to interact with people and make decisions are recruited to make sense of (or ignore) the experimenter’s odd demands. In this way we are still studying the processes used outside the lab. We still see conflict: some participants refuse to cooperate yet continue to engage with the tasks, some try to believe everything they are told, and some have strange interpretations of the tasks not shared by the majority. It is also possible to connect more explicitly with the real world of human experience by using self-report questionnaires. Participants are treated as if they have been their own mobile lab during the period of their life before coming to see the psychologist, and are asked to report what they have experienced using a structured folk-psychological vernacular whose constituents, when combined, will (the psychometricians tell us) be reliable, and valid with respect to some other measurement of the phenomena of interest.

The vast majority of researchers in reasoning have presupposed simplistic mappings from the surface form of a task’s presentation to classical logic or probability. For instance, “if P then Q” is mapped to the material conditional P → Q. The dangers of assuming a single competence model have long been known. Henlé (1962, p. 373), for instance, says the following on interpreting performance in reasoning:

(a) While the possibility of fallacy often cannot be excluded it is, in Mill’s words, ‘scarcely ever possible decidedly to affirm that any argument involves a bad syllogism.’ (b) Where error occurs, it need not involve faulty reasoning, but may be a function of the individual’s understanding of the task or the materials presented to him.

Importantly, for decades (centuries?) logicians and linguists have argued that such naive translations are wrong!
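To see what the naive mapping commits a participant to, it helps to write out the truth table of the material conditional. The Python sketch below is a minimal illustration (the function names are invented for this example). It shows that P → Q is false only when P is true and Q is false, so it comes out true in every row where P is false. That is exactly the kind of consequence that critics of the naive translation find at odds with how “if P then Q” is used in ordinary language.

```python
from itertools import product
from typing import List, Tuple


def material_conditional(p: bool, q: bool) -> bool:
    """Classical reading of 'if P then Q': false only when P is true and Q is false."""
    return (not p) or q


def truth_table() -> List[Tuple[bool, bool, bool]]:
    """Return (P, Q, P -> Q) for all four assignments of truth values."""
    return [(p, q, material_conditional(p, q)) for p, q in product([True, False], repeat=2)]


if __name__ == "__main__":
    for p, q, value in truth_table():
        print(f"P={p!s:<5}  Q={q!s:<5}  P -> Q = {value}")
```

The last two rows carry the point: whenever P is false, the classical reading makes the conditional true whatever Q is.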

Stenning & van Lambalgen (2004, 2005) distinguish two main activities in reasoning: reasoning to an interpretation and reasoning from an interpretation, henceforth interpretation and derivation respectively. Viewing reasoning this way may go some way towards dealing with this problem. Interpretation is the understanding of the task. This need not be an explicit awareness of the task, and may include implicit parameters such as sensitivity to the order in which the materials are presented, or assumptions about what is to be done with them. Derivation is then the process of calculation from that interpretation. A fallacy in this framework occurs when, for instance, the derivation phase produces a sentence that is inconsistent with the interpretation. Failing to interpret a task as the experimenter intended could reflect either a problem with the experimenter’s inferences about the participants’ likely interpretations, or a problem with the participant’s interpretation of the experimenter’s intention. An obvious question, then, is whether we can separate someone’s interpretation from their derivation processes. Another is what predicts the interpretations people form, and with what other traits those interpretations correlate.

The immediate inference task is an example of a task that demands of participants, for a series of simple sentences, “Report how you have interpreted me!” The syllogisms task, when paired with the immediate inference task, demands “Reason about combinations of these sentences of which, you may recall, you have just told me your accepted meaning!” Depressingly, there is no obvious relationship between the two tasks. Participants try to set up and communicate their own notion of competence, but then fail horribly to behave predictably with respect to it. A logistic regression model developed by Stenning & Cox (2006) weighs in at 27 independent variables and predicts the term order of responses to the syllogisms. It reveals a relationship between term order and how classically logical a participant is on immediate inference, tempered by effects such as the order in which the information to be reasoned about is presented.

An existence proof of a relationship between tests in one population is not in itself interesting unless the relationship is robust and can be combined with a plausible theoretical account. Robustness has been secured to a certain degree by the training-set/test-set methodology the authors applied, but the study needs to be replicated. The theoretical account comes from the distinction between credulous and sceptical reasoning: building a preferred model, using all the contextual cues we usually have access to, versus adopting a strategy that attempts to draw only inferences that hold in all possible models. It is useful to take a nonjudgemental stance here and ask why some people are particularly sensitive to the order in which information is provided to them, why some people are good at classical reasoning, and why others are good at finding preferred models. Fallacy detection is still possible, but with different notions of fallacy. On the one hand there is the verbal report of what someone believes to be valid reasoning. On the other hand there is the actual, often implicit, specification of rationality that has evolved (as a first approximation, perhaps one could look at success in relationships and employment?).
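The credulous/sceptical distinction can be made concrete with a toy propositional example. The Python sketch below is illustrative only and is not a model from the studies cited here; all names are invented for this example. Sceptical inference accepts a conclusion only if it holds in every model (truth assignment) of the premises, while credulous inference accepts it if it holds in a single preferred model. The preference function `p_agrees_with_q` is a hypothetical stand-in for the contextual cues a reasoner might bring to bear.

```python
from itertools import product
from typing import Callable, Dict, List

Model = Dict[str, bool]            # a truth assignment to the atoms
Formula = Callable[[Model], bool]  # a sentence, evaluated in a model

ATOMS = ["P", "Q"]


def all_models(atoms: List[str]) -> List[Model]:
    """Every truth assignment to the given atoms."""
    return [dict(zip(atoms, values)) for values in product([True, False], repeat=len(atoms))]


def models_of(premises: List[Formula], atoms: List[str] = ATOMS) -> List[Model]:
    """The assignments in which all premises are true."""
    return [m for m in all_models(atoms) if all(f(m) for f in premises)]


def sceptical(premises: List[Formula], conclusion: Formula, atoms: List[str] = ATOMS) -> bool:
    """Accept the conclusion only if it holds in every model of the premises."""
    return all(conclusion(m) for m in models_of(premises, atoms))


def credulous(premises: List[Formula], conclusion: Formula,
              preference: Callable[[Model], float], atoms: List[str] = ATOMS) -> bool:
    """Accept the conclusion if it holds in the most preferred model of the premises.
    Ties are broken by enumeration order (max returns the first maximal model)."""
    candidates = models_of(premises, atoms)
    if not candidates:
        raise ValueError("the premises have no model")
    preferred = max(candidates, key=preference)
    return conclusion(preferred)


# Example sentences over P and Q.
def if_p_then_q(m: Model) -> bool:
    return (not m["P"]) or m["Q"]


def p(m: Model) -> bool:
    return m["P"]


def q(m: Model) -> bool:
    return m["Q"]


def p_agrees_with_q(m: Model) -> float:
    """Hypothetical preference: favour models in which P and Q have the same value,
    as if 'if P then Q' were also heard as suggesting 'if Q then P'."""
    return 1.0 if m["P"] == m["Q"] else 0.0


if __name__ == "__main__":
    premises = [if_p_then_q, q]
    print("sceptical P:", sceptical(premises, p))
    print("credulous P:", credulous(premises, p, p_agrees_with_q))
```

Given “if P then Q” and “Q”, the sceptical reasoner does not conclude P, since classically that would be affirming the consequent. The credulous reasoner with this particular preference does conclude P, because P holds in its preferred model. Whether the inference counts as a fallacy therefore depends on which notion of inference, and which preferred model, is in play.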

The “interpret this and tell me!” approach to discovering how people reason to an interpretation is unlikely to work. Psycholinguistics provides the largest body of work on determining people’s interpretations: asking people to complete sentences or judge their grammaticality, often while recording saccade patterns to infer whether something has gone wrong. Reasoning tasks may be considered an extension of this methodology: “(Implicitly) interpret this, then (at least partially explicitly) do something with it!” It should now be clearer why it is crucial to examine performance within participants across a range of tasks. Since we cannot tap directly into implicit interpretative machinery, the way to proceed is to examine commonality across a range of explicit reasoning tasks. The relationship between performance on different tasks then tells us something about the interpretative processes that carry over between tasks for a given individual.