Being realistic about “realist” evaluation

Realist evaluation (formerly known as realistic evaluation; Pawson & Tilley, 2004, p. 3) is an approach to Theory-Based Evaluation that treats, e.g., burglars and prisons as real as opposed to narrative constructs; follows “a realist methodology” that aims for scientific “detachment” and “objectivity”; and also strives to be realistic about the scope of evaluation (Pawson & Tilley, 1997, pp. xii-xiv).

“Realist(ic)” evaluation proposes something apparently new and distinctive. How does it look in practice? What’s new about it? Let’s have a read of Pawson and Tilley’s (1997) classic to try to find out.

Déjà vu

Open any text on social science methodology, and it will say something like the following about the process of carrying out research:

  1. Review what is known about your topic area, including theories which attempt to explain and bring order to the various disparate findings.
  2. Use prior theory, supplemented with your own thinking, to formulate research questions or hypotheses.
  3. Choose methods that will enable you to answer those questions or test the hypotheses.
  4. Gather and analyse data.
  5. Interpret the analysis in relation to the theories introduced at the outset. What have you learned? Do the theories need to be tweaked? In qualitative research, interpretation and analysis are often interwoven.
  6. Acknowledge the limitations of your study. This will likely include reflection on whether your method or the theory is to blame for any mismatch between theory and findings.
  7. Add your findings to the pool of knowledge (after a gauntlet of peer review).
  8. Loop back to 1.

Realist evaluation follows a similar loop:

Figures 4.1 and 4.2 from Pawson and Tilley (1997), glued together for ease of comparison. The left loop is taken from a 1970s text on sociological method and the right loop is the authors’ revision for “realist” evaluation.

It is scientific method as usual with constraints on what the various stages should include for a study to be certified genuinely “realist”. For instance, the theories should be framed in terms of contexts, mechanisms, and outcomes (more on which in a moment); hypotheses emphasise the “for whom” and circumstances of an evaluation; and instead of “empirical generalisation” there is a “program specification”.

The method of data collection and analysis can be anything that satisfies this broad research loop (p. 85):

“… we cast ourselves as solid members of the modern, vociferous majority […], for we are whole-heartedly pluralists when it comes to the choice of method. Thus, as we shall attempt to illustrate in the examples to follow, it is quite possible to carry out realistic evaluation using: strategies, quantitative and qualitative; timescales, contemporaneous or historical; viewpoints, cross-sectional or longitudinal; samples, large or small; goals, action-oriented or audit-centred; and so on and so forth. [… T]he choice of method has to be carefully tailored to the exact form of hypotheses developed earlier in the cycle.”

This is reassuringly similar to the standard textbook story. However, like the standard story, in practice there are ethical and financial constraints on method. Indeed the UK government’s evaluation bible, the Magenta Book (HM Treasury, 2020), recommends using Theory-Based approaches like “realist” evaluation when experimental and quasi-experimental approaches are not feasible. (See also, What is Theory-Based Evaluation, really?)

More than a moment’s thought about theory

Pawson and Tilley (1997) emphasise the importance of thinking about why social interventions may lead to change and not only looking at outcomes, which they illustrate with the example of CCTV:

“CCTV certainly does not create a physical barrier making cars impenetrable. A moment’s thought has us realize, therefore, that the cameras must work by instigating a chain of reasoning and reaction. Realist evaluation is all about turning this moment’s thought into a comprehensive theory of the mechanisms through which CCTV may enter the potential criminal’s mind, and the contexts needed if these powers are to be realized.” (p. 78)

They then list a range of potential mechanisms. CCTV might make it more likely that thieves are caught in the act. Or maybe the presence of CCTV makes car parks feel safer, which means they are used by more people, whose presence and watchful eyes prevent theft. So other people provide the surveillance rather than the camera bolted to the wall.

Nothing new here – social science is awash with theory (Pawson and Tilley cite Durkheim’s classic study of suicide as an example). Psychological therapies are some of the most evaluated of social interventions and the field is particularly productive when it comes to theory; see, e.g., Whittle (1999, p. 240) on psychoanalysis, a predecessor of modern therapies:

“Psychoanalysis is full of theory. It has to be, because it is so distrustful of the surface. It could still choose to use the minimum necessary, but it does the opposite. It effervesces with theory…”

Power (2010) argues that most effects in modern therapies can be explained by transference (exploring and using how the relationship between therapist and client mirrors relationships outside therapy), graded exposure to situations which provoke anxiety, and challenging dysfunctional assumptions – for each of which there are detailed theories of change.

However, perhaps evaluations of social programmes – therapies included – have concentrated too much on tracking outcomes and neglected getting to grips with potential mechanisms of change, so “realist” evaluation is potentially a helpful intervention. The specific example of CCTV is a joy to read and a great way to bring the sometimes abstract notion of a social mechanism to life.

The structure of explanations in “realist” evaluation

Context-mechanism-regularity (or outcome) – the organisation of explanation in “realist” evaluations

The context-mechanism-outcome triad is a salient feature of the approach. Rather than define each of these (see the original text), here are four examples from Pawson and Tilley (1997) to illustrate what they are. The middle column (New mechanism) describes the putative mechanism that may be “triggered” by a social programme that has been introduced.

Example 1
Context: Poor-quality, hard-to-let housing; traditional housing department; lack of tenant involvement in estate management.
New mechanism: Improved housing and increased involvement in management create increased commitment to the estate, more stability, and opportunities and motivation for social control and collective responsibility.
Outcome: Reduced burglary prevalence.

Example 2
Context: Three tower blocks, occupied mainly by the elderly; traditional housing department; lack of tenant involvement in estate management.
New mechanism: Concentration of elderly tenants into smaller blocks and natural wastage creates vacancies taken up by young, formerly homeless single people inexperienced in independent living. They become the dominant group. They have little capacity or inclination for informal social control, and are attracted to a hospitable estate subterranean subculture.
Outcome: Increased burglary prevalence concentrated amongst the more vulnerable; high levels of vandalism and incivility.

Example 3
Context: Prisoners with little or no previous education and a growing string of convictions – representing a ‘disadvantaged’ background.
New mechanism: Modest levels of engagement and success with the program trigger a ‘habilitation’ process in which the inmate experiences self-realization and social acceptability (for the first time).
Outcome: Lowest levels of reconviction as compared with the statistical norm for such inmates.

Example 4
Context: High numbers of prepayment meters, with a high proportion of burglaries involving cash from meters.
New mechanism: Removal of cash meters reduces the incentive to burgle by decreasing actual or perceived rewards.
Outcome: Reduction in the percentage of burglaries involving meter breakage; reduced risk of burglary at dwellings where meters are removed; reduced burglary rate overall.

This seems a helpful way to organise thinking about the context-mechanism-outcome triad, irrespective of whether the approach is labelled “realist”. Those who are into logframe matrices (logframes) might want to add a column for the “outputs” of a programme.

The authors emphasise that the underlying causal model is “generative” in the sense that causation is seen as

“acting internally as well as externally. Cause describes the transformative potential of phenomena. One happening may well trigger another but only if it is in the right condition in the right circumstances. Unless explanation penetrates to these real underlying levels, it is deemed to be incomplete.” (p. 34)

The “internal” here appears to refer to looking inside the “black box” of a social programme to see how it operates, rather than merely treating it as something that is present in some places and absent in others. Later, there is further elaboration of what “generative” might mean:

“To ‘generate’ is to ‘make up’, to ‘manufacture’, to ‘produce’, to ‘form’, to ‘constitute’. Thus when we explain a regularity generatively, we are not coming up with variables or correlates which associate one with the other; rather we are trying to explain how the association itself comes about. The generative mechanisms thus actually constitute the regularity; they are the regularity.” (p. 67)

We also learn that an action is causal only if its outcome is triggered by a mechanism in a context (p. 58). Okay, but how do we find out if an action’s outcome is triggered in this manner? “Realist” evaluation does not, in my view, provide an adequate analysis of what a causal effect is. Understandable, perhaps, given its pluralist approach to method. So, understandings of causation must come from elsewhere.

Mechanisms can be seen as “entities and activities organized in such a way that they are responsible for the phenomenon” (Illari & Williamson, 2011, p. 120). In “realist” evaluation, entities and their activities in the context would be included in this organisation too – the context supplies the mechanism on which a programme intervenes. So, let’s take one of the example mechanisms from the table above:

“Improved housing and increased involvement in management create increased commitment to the estate, more stability, and opportunities and motivation for social control and collective responsibility.”

To make sense of this, we need a theory of what improved housing looks like, what involvement in management and commitment to the estate, etc., means. To “create commitment” seems like a psychological, motivational process. The entities are the housing, management structures, people living in the estate, etc. To evidence the mechanism, I think it does help to think of variables to operationalise what might be going on and to use comparison groups to avoid mistaking, e.g., regression to the mean or friendlier neighbours for change due to improved housing. And indeed, Pawson and Tilley use quantitative data in one of the “realist” evaluations they discuss (next section). Such operationalisation does not reduce a mechanism to a set of variables; it is merely a way to analyse a mechanism.
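To make that concrete, here is a minimal sketch – entirely simulated data and hypothetical variable names, and my illustration rather than Pawson and Tilley’s analysis – of why the comparison group matters. Estates are typically selected for improvement because burglary is high, so burglary would fall somewhat anyway (regression to the mean); a simple difference-in-differences comparison counts only the change beyond what comparison estates show.

```python
# A minimal sketch (simulated data, hypothetical variable names) of using a
# comparison group so that regression to the mean is not mistaken for the
# housing mechanism's effect on burglary prevalence.
import numpy as np

rng = np.random.default_rng(42)
n = 50  # hypothetical number of estates per group

# Burglary prevalence (% of dwellings burgled) before and after. Both groups
# start high, and both are assumed to drift downwards somewhat regardless.
pre_improved = rng.normal(20, 4, n)
post_improved = 0.6 * pre_improved + rng.normal(0, 3, n)      # assumed effect
pre_comparison = rng.normal(20, 4, n)
post_comparison = 0.9 * pre_comparison + rng.normal(0, 3, n)  # drift only

# Difference-in-differences: programme-group change minus comparison change.
did = (post_improved.mean() - pre_improved.mean()) \
    - (post_comparison.mean() - pre_comparison.mean())
print(f"Difference-in-differences estimate: {did:.1f} percentage points")
```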

Kinds of evidence

Chapter 4 gives a range of examples of the evidence that has been used in early “realist” evaluations. In summary, and confirming the pluralist stance mentioned above, it seems that all methods are relevant to realist evaluation. Two examples:

  1. Interviews with practitioners to try to understand what it is about a programme that might effect change: “These inquiries released a flood of anecdotes, and the tales from the classroom are remarkable not only for their insight but in terms of the explanatory form which is employed. These ‘folk’ theories turn out to be ‘realist’ theories and invariably identify those contexts and mechanisms which are conducive to the outcome of rehabilitation.” (pp. 107-108)
  2. Identifying variables in an information management system to “operationalize these hunches and hypotheses in order to identify, with more precision, those combinations of types of offender and types of course involvement which mark the best chances of rehabilitation. Over 50 variables were created…” (p. 108)

Some researchers have made a case for, and carried out, what they term realist randomised controlled trials (Bonell et al., 2012), which seems eminently sensible to me. The literature subsequently exploded in response. Here’s an illustrative excerpt of the criticisms (Marchal et al., 2013, p. 125):

“Experimental designs, especially RCTs, consider human desires, motives and behaviour as things that need to be controlled for (Fulop et al., 2001, Pawson, 2006). Furthermore, its analytical techniques, like linear regression, typically attempt to isolate the effect of each variable on the outcome. To do this, linear regression holds all other variables constant “instead of showing how the variables combine to create outcomes” (Fiss, 2007, p. 1182). Such designs “purport to control an infinite number of rival hypotheses without specifying what any of them are” by rendering them implausible through statistics (Campbell, 2009), and do not provide a means to examine causal mechanisms (Mingers, 2000).”

Well, what to make of this? Yes, RCTs control for stuff that’s not measured and maybe even unmeasurable. But you can also measure stuff you know about and see if it moderates or mediates the outcome (see, e.g., Windgassen et al., 2016). You might also use the numbers to select people for qualitative interview to try to learn more about what is going on. The comment on linear regression reveals surprising ignorance of how non-linear transformations of and interactions between predictors can be added to models. It is also trivial to calculate marginal outcome predictions for combinations of predictors together, rather than merely identifying which predictors are likely non-zero when holding others fixed. See Bonell et al. (2016) for a very patient reply.
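To illustrate, here is a minimal sketch (simulated data and hypothetical variable names, not from any of the papers cited) of a “linear” regression that includes an interaction and a quadratic term, followed by marginal predictions for combinations of predictors rather than one-variable-at-a-time coefficients.

```python
# A minimal sketch: interactions and non-linear transformations in a linear
# regression, plus marginal predictions for predictor combinations.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "engagement": rng.uniform(0, 10, n),    # e.g., course engagement
    "disadvantage": rng.integers(0, 2, n),  # e.g., background (0/1)
})
# Simulated outcome in which the effect of engagement depends on context.
df["outcome"] = (2 * df["engagement"] * df["disadvantage"]
                 - 0.1 * df["engagement"] ** 2
                 + rng.normal(0, 1, n))

# An interaction plus a quadratic term -- still "linear regression".
fit = smf.ols("outcome ~ engagement * disadvantage + I(engagement**2)",
              data=df).fit()

# Marginal predictions for combinations of predictors acting together.
grid = pd.DataFrame({"engagement": [2, 2, 8, 8],
                     "disadvantage": [0, 1, 0, 1]})
print(grid.assign(predicted=fit.predict(grid)))
```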

Conclusions

The plea for evaluators to spend more time developing theory is welcome – especially in policy areas where “key performance indicators” and little else are the norm (see also Carter, 1989, on KPIs as dials versus tin openers opening a can of worms). It is a laudable aim to help “develop the theories of practitioners, participants and policy makers” of why a programme might work (Pawson & Tilley, 1997, p. 214). The separation of context, mechanism, and outcome also helps structure thinking about social programmes (though there is widespread confusion about what a mechanism is in the “realist” literature; Lemire et al., 2020). But “realist” evaluation is arguably better seen as an exposition of a particular reading of ye olde scientific method applied to evaluation, with a call for pluralist methods. I am unconvinced that it is a novel form of evaluation.

References

Bonell, C., Fletcher, A., Morton, M., Lorenc, T., & Moore, L. (2012). Realist randomised controlled trials: a new approach to evaluating complex public health interventions. Social Science & Medicine, 75(12), 2299–2306.

Bonell, C., Warren, E., Fletcher, A., & Viner, R. (2016). Realist trials and the testing of context-mechanism-outcome configurations: A response to Van Belle et al. Trials, 17(1), 478.

Carter, N. (1989). Performance indicators: “backseat driving” or “hands off” control? Policy & Politics, 17, 131–138.

HM Treasury (2020). Magenta Book.

Illari, P. M., & Williamson, J. (2011). What is a mechanism? Thinking about mechanisms across the sciences. European Journal for Philosophy of Science, 2(1), 119–135.

Lemire, S., Kwako, A., Nielsen, S. B., Christie, C. A., Donaldson, S. I., & Leeuw, F. L. (2020). What Is This Thing Called a Mechanism? Findings From a Review of Realist Evaluations. New Directions for Evaluation, 167, 73–86.

Marchal, B., Westhorp, G., Wong, G., Van Belle, S., Greenhalgh, T., Kegels, G., & Pawson, R. (2013). Realist RCTs of complex interventions – an oxymoron. Social Science & Medicine, 94, 124–128.

Pawson, R., & Tilley, N. (1997). Realistic Evaluation. SAGE Publications Ltd.

Pawson, R., & Tilley, N. (2004). Realist evaluation. Unpublished.

Power, M. (2010). Emotion-focused cognitive therapy. London: Wiley.

Whittle, P. (1999). Experimental psychology and psychoanalysis: What we can learn from a century of misunderstanding. Neuropsychoanalysis, 1, 233–245.

Windgassen, S., Goldsmith, K., Moss-Morris, R., & Chalder, T. (2016). Establishing how psychological therapies work: the importance of mediation analysis. Journal of Mental Health, 25, 93–99.

So, you have pledged allegiance to critical realism – what next?

So, you have pledged allegiance to the big four critical realist axioms (Archer et al., 2016) – what next?

Here are some ideas.

1. Ontological realism

What is it? There is a social and material world existing independently of people’s speech acts. “Reality is real.” One way to think about this slogan in relation to social kinds like laws and identities is that they have a causal impact on our lives (Dembroff, 2018). Saying that reality is real does not mean that reality is fixed. For example, we can eat chocolate (which changes it and us) and change laws.

What to do? Throw radical social constructionism in the bin. Start with a theory that applies to your particular topic and provides ideas for entities and activities to use and possibly challenge in your own theorising.

Those “entities” (what a cold word) may be people with desires, beliefs, and opportunities (or lack thereof) who do things in the world like going for walks, shopping, cleaning, working, and talking to each other (Hedström, 2005). The entities may be psychological “constructs” like kinds of memory and cognitive control and activities like updating and inhibiting prepotent responses. The entities might be laws and activities carried out by the criminal justice system and campaigners. However you decide to theorise reality, you need something.

How an intervention may influence someone’s actions by influencing their desires, beliefs, and/or opportunities (Hedström, 2005, p. 44)

2. Epistemic relativity

What is it? The underdetermination of theories means that two theorists can make a compelling case for two different accounts of the same evidence. Their (e.g., political, moral) standpoint and various biases will influence what they can theorise. Quantitative researchers are appealing to epistemic relativity when they cite George Box’s “All models are wrong” and note the variety of models that can be fit to a dataset.
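As a toy illustration of underdetermination, here is a minimal sketch (simulated data, my example rather than Box’s) in which two structurally different models, expressing two different “theories” of growth, fit the same dataset about equally well.

```python
# A minimal sketch of underdetermination: a logarithmic and a square-root
# model both fit the same data closely, so the data alone cannot decide
# between the theories behind them.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 10, 100)
y = np.log(x) + rng.normal(0, 0.15, 100)  # "true" process is logarithmic

# Theory A: logarithmic growth. Theory B: square-root growth.
coef_a = np.polyfit(np.log(x), y, 1)
coef_b = np.polyfit(np.sqrt(x), y, 1)
rss_a = np.sum((y - np.polyval(coef_a, np.log(x))) ** 2)
rss_b = np.sum((y - np.polyval(coef_b, np.sqrt(x))) ** 2)
print(f"Residual sum of squares, log model:  {rss_a:.2f}")
print(f"Residual sum of squares, sqrt model: {rss_b:.2f}")  # similar fits
```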

What to do? Throw radical positivism in the bin – even if you are running RCTs. Ensure that you foreground your values whether through statements of conflicts of interest or more reflexive articulations of likely bias and prejudice. Preregistering study plans also seems relevant here.

There may be limits to the extent to which an individual researcher can articulate their biases, so help out your colleagues and competitors.

3. Judgemental/judgmental rationality

What is it? Even though theories are underdetermined by evidence, there often are reasons to prefer one theory over another.

What to do? If predictive accuracy does not help choose a theory, you could also compare them in terms of how consistent they are with themselves and other relevant theories; how broad in scope they are; whether they actually bring some semblance of order to the phenomena being theorised; and whether they make novel predictions beyond current observations (Kuhn, 1977).

You might consider the aims of critical theory which proposes judging theories in terms of how well they help eliminate injustice in the world (Fraser, 1985). But you would have to take a political stance.

4. Ethical naturalism

What is it? Although “is” does not imply “ought”, a prior “ought” plus an “is” does imply a posterior “ought”.

What to do? Back to articulating your values. In medical research the following argument form is common (if often implicit): We should prevent people from dying; a systematic review has shown that this treatment prevents people from dying; therefore we should roll out this treatment. We could say something similar for social research that is anti-racist, feminist, LGBTQI+, intersections thereof, and other research. But if your research makes a recommendation for political change, it must also foreground the prior values that enabled that recommendation to be inferred.
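Schematically, the argument form can be written as follows (the notation is mine, not from the critical realist literature):

```latex
% A hedged formalisation of the argument form above. O(G) reads "we ought
% to bring about G"; A => G is the empirical premise that action A brings
% about G; O(A) is the posterior ought.
\[
  \underbrace{O(G)}_{\text{prior ought}}
  \;\wedge\;
  \underbrace{(A \Rightarrow G)}_{\text{is}}
  \;\vdash\;
  \underbrace{O(A)}_{\text{posterior ought}}
\]
```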

In summary

The big four critical realist axioms provide a handy but broad metaphysical and moral framework for getting out of bed in the morning and continuing to do social research. Now we are presented with further challenges that depend on grappling with substantive theory and specific political and moral values. Good luck.

References

Archer, M., Decoteau, C., Gorski, P. S., Little, D., Porpora, D., Rutzou, T., Smith, C., Steinmetz, G., & Vandenberghe, F. (2016). What is Critical Realism? Perspectives: Newsletter of the American Sociological Association Theory Section, 38(2), 4–9.

Dembroff, R. (2018). Real talk on the metaphysics of gender. Philosophical Topics, 46(2), 21–50.

Fraser, N. (1985). What’s critical about critical theory? The case of Habermas and gender. New German Critique, 35, 97–131.

Hedström, P. (2005). Dissecting the Social: On the Principles of Analytical Sociology. Cambridge University Press.

Kuhn, T. S. (1977). Objectivity, value judgment, and theory choice. In The Essential Tension: Selected Studies in Scientific Tradition and Change (pp. 320–339). The University of Chicago Press.

Qual and quant – subjective and objective?

“… tensions between quantitative and qualitative methods can reflect more on academic politics than on epistemology. Qualitative approaches are generally associated with an interpretivist position, and quantitative approaches with a positivist one, but the methods are not uniquely tied to the epistemologies. An interpretivist need not eschew all numbers, and positivists can and do carry out qualitative studies (Lin, 1998). ‘Quantitative’ need not mean ‘objective’. Subjective approaches to statistics, for instance Bayesian approaches, assume that probabilities are mental constructions and do not exist independently of minds (De Finetti, 1989). Statistical models are seen as inhabiting a theoretical world which is separate to the ‘real’ world though related to it in some way (Kass, 2011). Physics, often seen as the shining beacon of quantitative science, has important examples of qualitative demonstrations in its history that were crucial to the development of theory (Kuhn, 1961).”

Fugard and Potts (2015, pp. 671-672)

From methods to goals in social science research

Note. This is quite a ranty blog post – especially the first two paragraphs. Readers may therefore wish to read it in the voice of Bernard Black from the series Black Books to make it more palatable. You may also be interested in this short BMJ comment.


Onwards…

Many of the social science papers I read have long jargon-heavy sections justifying the methods used. This is particularly common in writeups of qualitative studies, though not unheard of in quantitative work. There are reflections on epistemology and ontology – sometimes these must be discussed by doctoral students if they are to acquire a degree.

There is discussion of social constructionism, critical realism, phenomenology, interpretation, intersubjectivity, hermeneutics. “But what is reality, really?” the authors ponder; “What can we know?” Quantitative analysis is “positivist” and to find or construct meaning you need a qualitative analysis (it is claimed).

Although I love philosophy, most of this reflection bores me to tears and seems irrelevant.

I think many differences between methods are exaggerated, clever-sounding -isms are fetishised, and grandiose meta-theories concerning the nature of reality are used to explain away straightforward study limitations such as poor sampling. I bet some researchers feel they have to reel off fancy terminology to play the academic game, even though they think it’s bollocks.

But there are different kinds of research in the social sciences, beyond the dreary qual versus quant distinction as usually discussed. Might it be easiest to see the differences in terms of the goals of the research? Here are three examples of goals, to try to explain what I mean.

Evoke empathy. If you can’t have a chat with someone then the next best way to empathise with them is via a rich description by or about them. There is a bucket-load of pretentiousness in the literature (search for “thick description” to find some). But skip over this and there are wonderful works which are simply stories. I love stories. Biographies which make you long to meet the subject are prime examples. Film documentaries, though not fitting easily into traditional research output, are another. So are anthologies capturing concise, emotive expressions of people’s lived experience. “Interpretative Phenomenological Analyses” manage to include stories too, though you might have to wade through nonsense to get to them.

Classify. This may be the classification of perspectives, attitudes, experiences, processes, organisations, or other stuff-that-happens in society. For example: social class, personality, goals people have in psychological therapy, political orientation, mental health problem, emotional experiences. The goal here is to impose structure on material and reveal patterns, whether it be interview responses, answers on Likert scales, or some other kind of observation. There’s no escaping theory, articulated and debated or unarticulated and unchallenged, when doing this. There may be a hierarchical structure to classifications. Judgments may be categorical or dimensional (or both, where the former is derived from a threshold on the latter): consider Myers-Briggs types or the Big Five personality traits. Dimensions are quantitative things, but there are qualitative differences between them.

Predict. Finally, you often want to make predictions. Do people occupying a particular social class location tend to experience some mental health difficulties more often than others? Does your personality predict the kinds of books you like to read? Do particular events predict an emotion you will feel? Other predictions concern the impact of interventions of various kinds (broadly construed). What would happen if you voted Green and told your friends you were going to do so? What would happen if you funded country-wide access to cognitive behavioural therapy rather than psychoanalysis? Theory matters here too, usually involving a story or model of why variables relate to each other.

These distinctions cannot be straightforwardly mapped onto quantitative and qualitative analysis. As we wrote in 2016:

“Some qualitative research develops what looks like a taxonomy of experiences or phenomena. Much of this isn’t even framed as qualitative. Take for example Gray’s highly-cited work classifying type 1 and type 2 synapses. His labelled photos of cortex slices illustrate beautifully the role of subjectivity in qualitative analysis and there are clear questions about generalisability. Some qualitative analyses use statistical models of quantitative data, for example latent class analyses showing the different patterns of change in psychological therapies.”

People often try to make predictions without using a quantitative model. Others use quantitative approaches to develop qualitatively different groups. Cartoonish characterisations of the different approaches to doing social (and natural) science research stifle creativity and misrepresent how the research is and could actually be done.
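For instance, here is a minimal sketch (simulated data, not from the work quoted above) of a quantitative method producing qualitatively different groups: a Gaussian mixture model, a cousin of the latent class analyses mentioned in the quotation, recovering two distinct patterns of change from weekly outcome scores.

```python
# A minimal sketch: a mixture model recovers two qualitatively different
# trajectories (steady improvers vs non-responders) from quantitative data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
weeks = np.arange(6)

# Two hypothetical trajectories of therapy outcome scores over six weeks.
improvers = 20 - 2.5 * weeks + rng.normal(0, 1, (60, 6))
flat = 20 + 0 * weeks + rng.normal(0, 1, (40, 6))
scores = np.vstack([improvers, flat])  # 100 clients x 6 weekly scores

gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
labels = gmm.predict(scores)
for k in range(2):
    print(f"Class {k}: n={np.sum(labels == k)}, "
          f"mean trajectory={gmm.means_[k].round(1)}")
```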

Statistical models as cognitive models…

“Models of data have a deep influence on the kinds of theorising that researchers do. A structural equation model with latent variables named Shifting, Updating, and Inhibition (Miyake et al. 2000) might suggest a view of the mind as inter-connected Gaussian distributed variables. These statistical constructs are driven by correlations between variables, rather than by the underlying cognitive processes […]. Davelaar and Cooper (2010) argued, using a more cognitive-process-based mathematical model of the Stop Signal task and the Stroop task, that the inhibition part of the statistical model does not actually model inhibition, but rather models the strength of the pre-potent response channel. Returning to the older example introduced earlier of g (Spearman 1904), although the scores from a variety of tasks are positively correlated, this need not imply that the correlations are generated by a single cognitive (or social, or genetic, or whatever) process. The dynamical model proposed by van der Mass et al. (2006) shows that correlations can emerge due to mutually beneficial interactions between quite distinct processes.”

Fugard, A. J. B., & Stenning, K. (2013). Statistical models as cognitive models of individual differences in reasoning. Argument & Computation, 4, 89–102.

Those who want to study what is in front of their eyes

Wise words from Colin Mills:

“I’m seldom interested in the data in front of me for its own sake and normally want to regard it as evidence about some larger population (or process) from which it has been sampled. In saying this I am not saying that quantification is all there is to sociology. That would be absurd. Before you can count anything you have to know what you are looking for, which implies that you have to have spent some time thinking out the concepts that will organize reality and tell you what is important.”

“… the institutionalized and therefore little questioned distinction between qualitative and quantitative empirical research is, to say the least, unhelpful and should be abolished. There is a much bigger intellectual gulf between those who just want to study what is in front of their eyes and those who view what is in front of their eyes as an instantiation of something bigger. Qualitative or quantitative if your business is generalization you have to have some theory of inference and if you don’t then your intellectual project is, in my view, incoherent.”

Linking statistics and qualitative methods

You’ll be aware of the gist. Quantitative statistical models are great for generalizing, and data suitable for the stats tends to be quicker to analyze than qualitative data. More qualitative methods, such as interviewing, tend to provide much richer information, but generalization is very tricky and often involves coding up the data so it can be fitted using the stats. How else can the two (crudely defined here!) approaches to analysis talk to each other?

I like this a lot:

“In the social sciences we are often criticized by the ethnographers and the anthropologists who say that we do not link in with them sufficiently and that we simply produce a set of statistics which do not represent reality.”

“… by using league tables, we can find examples of places which are perhaps not outliers but where we want to look for the pathways of influence on why they are not outliers. For example, one particular Bangladeshi village would have been expected to have high levels of immunization, whereas it was down in the middle of the table with quite a large confidence interval. This seemed rather strange, but our colleagues were able to attribute this to a fundamentalist imam. […] Another example is a village at the top of the league table, which our colleagues could attribute to a very enthusiastic school-teacher.”

“… by connecting with the qualitative workers, by encouraging the fieldworkers to look further at particular villages and by saying to them that we were surprised that this place was good and that one was bad, we could get people to understand the potential for linking the sophisticated statistical methods with qualitative research.” (Ian Diamond and Fiona Steele, from a comment on a paper by Goldstein and Spiegelhalter, 1996, p. 429)
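Here is a minimal sketch (entirely hypothetical data, my illustration rather than Diamond and Steele’s analysis) of the statistical side of that workflow: a league table of immunisation rates with confidence intervals, flagging villages whose intervals sit clear of the overall rate as candidates for qualitative follow-up.

```python
# A minimal sketch: a league table with confidence intervals, flagging
# villages that look surprising enough to merit qualitative fieldwork.
import numpy as np

rng = np.random.default_rng(3)
n_villages = 12
n_children = rng.integers(30, 200, n_villages)  # sample size per village
true_rates = rng.uniform(0.5, 0.95, n_villages)
immunised = rng.binomial(n_children, true_rates)

p_hat = immunised / n_children
se = np.sqrt(p_hat * (1 - p_hat) / n_children)
overall = immunised.sum() / n_children.sum()

for i in np.argsort(p_hat)[::-1]:  # the "league table", best rate first
    lo, hi = p_hat[i] - 1.96 * se[i], p_hat[i] + 1.96 * se[i]
    flag = " <- surprising?" if (hi < overall or lo > overall) else ""
    print(f"Village {i:2d}: {p_hat[i]:.2f} [{lo:.2f}, {hi:.2f}]{flag}")
```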

Also reminds me of a study by Turner and Sobolewska (2009) which split participants on their Systemizing and Empathizing Quotient scores. Participants were asked, “What is inside a mobile phone?” Here’s what someone with high EQ said:

“It flashes the lights, screen flashes, and the buttons lights up, and it vibrates. It comes to life on the inside and it comes to life on the outside, and you talk to the one side and someone is answering on the other side”

And someone with high SQ:

“Many things, circuit boards, chips, transceiver [laughs], battery [pause], a camera in some of them, a media player, buttons, lots of different things. [pause] Well there are lots and lots of different bits and pieces to the phone, there are mainly in … Eh, like inside the chip there are lots of little transistors, which is used, they build up to lots of different types of gates…”

(One possible criticism is that the SQ/EQ just found students of technical versus non-technical subjects… But the general idea is still lovely.)

Would be great to see more quantitative papers with little excerpts of stories. We tried in our paper on spontaneous shifts of interpretation on a probabilistic reasoning task (Fugard, Pfeifer, Mayerhofer & Kleiter, 2011, p. 642), but we only squeezed in a few sentences:

‘Participant 34 (who settled into a conjunction interpretation) said: “I only looked at the shape and the color, and then always out of 6; this was the quickest way.” Participant 37, who shifted from the conjunction to the conditional event, said: “In the beginning [I] always [responded] ‘out of 6,’ but then somewhere in the middle . . . Ah! It clicked and I got it. I was angry with myself that I was so stupid before.” Five participants spontaneously reported when they shifted during the task, for example, saying, “Ah, this is how it works.”’

References

Fugard, A. J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. D. (2011). How people interpret conditionals: Shifts towards the conditional event. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 635–648.

Goldstein, H., & Spiegelhalter, D. J. (1996). League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A (Statistics in Society), 159, 385–443.

Turner, P., & Sobolewska, E. (2009). Mental models, magical thinking, and individual differences. Human Technology, 5, 90–113.

On methodology

“I did not make a deliberate decision to adopt a particular methodology: I had the good fortune to work alongside gifted colleagues from backgrounds in different disciplines, and their various techniques seemed to be producing results. With hindsight, I should describe how one learns from both experiments and intelligent software in terms of the distinction that philosophers draw between the correspondence and the coherence theories of truth. An assertion is true according to the first theory if it corresponds to some state of affairs in the world; true according to the second if it coheres with a set of assertions constituting a general body of knowledge. Experiments provide information about correspondence with the facts, but they exert a dangerous pull in the direction of empirical pedantry, where the only things that count are facts, no matter how limited their purview. Computer programs provide information about the coherence of a set of assumptions, but they exert a dangerous pull in the direction of systematic delusion, where all that counts is internal consistency, no matter how remote from reality. Give up one approach and you turn into a Gradgrind, the teacher in Dickens’s novel Hard Times, whose only concern is with the facts; give up the other and you become an architect for the Flat Earth Society. Those, at least, are the dangers.”

From the prologue to Mental Models by Philip Johnson-Laird

John Fox on SEM

From an Appendix to An R and S-PLUS Companion to Applied Regression:

“A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.”