“… tensions between quantitative and qualitative methods can reflect more on academic politics than on epistemology. Qualitative approaches are generally associated with an interpretivist position, and quantitative approaches with a positivist one, but the methods are not uniquely tied to the epistemologies. An interpretivist need not eschew all numbers, and positivists can and do carry out qualitative studies (Lin, 1998). ‘Quantitative’ need not mean ‘objective’. Subjective approaches to statistics, for instance Bayesian approaches, assume that probabilities are mental constructions and do not exist independently of minds (De Finetti, 1989). Statistical models are seen as inhabiting a theoretical world which is separate to the ‘real’ world though related to it in some way (Kass, 2011). Physics, often seen as the shining beacon of quantitative science, has important examples of qualitative demonstrations in its history that were crucial to the development of theory (Kuhn, 1961).”
Note. This is quite a ranty blog post – especially the first two paragraphs. Readers may therefore wish to read it in the voice of Bernard Black from the series Black Books to make it more palatable. You may also be interested in this short BMJ comment.
Many of the social science papers I read have long jargon-heavy sections justifying the methods used. This is particularly common in writeups of qualitative studies, though not unheard of in quantitative work. There are reflections on epistemology and ontology – sometimes these must be discussed by doctoral students if they are to acquire a degree.
There is discussion of social constructionism, critical realism, phenomenology, interpretation, intersubjectivity, hermeneutics. “But what is reality, really?” the authors ponder; “What can we know?” Quantitative analysis is “positivist” and to find or construct meaning you need a qualitative analysis (it is claimed).
Although I love philosophy, most of this reflection bores me to tears and seems irrelevant.
I think many differences between methods are exaggerated, clever-sounding –isms are fetishised, and grandiose meta-theories concerning the nature of reality are used to explain away straightforward study limitations such as poor sampling. I bet some researchers feel they have to reel off fancy terminology to play the academic game, even though they think it’s bollocks.
But there are different kinds of research in the social sciences, beyond the dreary qual versus quant distinction as usually discussed. Might it be easiest to see the differences in terms of the goals of the research? Here are three examples of goals, to try to explain what I mean.
Evoke empathy. If you can’t have a chat with someone then the next best way to empathise with them is via a rich description by or about them. There is a bucket-load of pretentiousness in the literature (search for “thick description” to find some). But skip over this and there are wonderful works which are simply stories. I love stories. Biographies you read which make you long to meet the subject are prime examples. Film documentaries, though not fitting easily into traditional research output, are another. Anthologies capturing concise, emotive expressions of people’s lived experience are a third. “Interpretative Phenomenological Analyses” manage to include stories too, though you might have to wade through nonsense to get to them.
Classify. This may be the classification of perspectives, attitudes, experiences, processes, organisations, or other stuff-that-happens in society. For example: social class, personality, goals people have in psychological therapy, political orientation, mental health problem, emotional experiences. The goal here is to impose structure on material and reveal patterns, whether the material is interview responses, answers on Likert scales, or some other kind of observation. There’s no escaping theory, articulated and debated or unarticulated and unchallenged, when doing this. There may be a hierarchical structure to classifications. There may be categorical or dimensional judgments (or both, where the former is derived from a threshold on the latter), e.g., consider Myers-Briggs or the Big Five personality types. Dimensions are quantitative things, but there are qualitative differences between them.
Predict. Finally you often want to make predictions. Do people occupying a particular social class location tend to experience some mental health difficulties more often than others? Does your personality predict the kinds of books you like to read? Do particular events predict an emotion you will feel? Other predictions concern the impact of interventions of various kinds (broadly construed). What would happen if you voted Green and told your friends you were going to do so? What would happen if you funded country-wide access to cognitive behavioural therapy rather than psychoanalysis? Theory matters here too, usually involving a story or model of why variables relate to each other.
These distinctions cannot be straightforwardly mapped onto quantitative and qualitative analysis. As we wrote in 2016:
“Some qualitative research develops what looks like a taxonomy of experiences or phenomena. Much of this isn’t even framed as qualitative. Take for example Gray’s highly-cited work classifying type 1 and type 2 synapses. His labelled photos of cortex slices illustrate beautifully the role of subjectivity in qualitative analysis and there are clear questions about generalisability. Some qualitative analyses use statistical models of quantitative data, for example latent class analyses showing the different patterns of change in psychological therapies.”
People often try to make predictions without using a quantitative model. Others use quantitative approaches to develop qualitatively different groups. Cartoonish characterisations of the different approaches to doing social (and natural) science research stifle creativity and misrepresent how the research is and could actually be done.
If you’re going to develop a questionnaire for something resulting in a total “score” — quality of life, feelings, distress, whatever — you’ll want all of the questions for one topic to be related to each other (as a bare minimum). This questionnaire probably wouldn’t be very “internally consistent”:
THE GENERAL STUFF QUESTIONNAIRE
- How often do you sing in the shower?
- What height are you?
- How far do you live from the nearest park?
- What’s your favourite number?
You won’t be able to do much with the result of summing answers to those together to a total score.
This one would:
THE RELIABLE FEELINGS QUESTIONNAIRE
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
- How do you feel?
However, you might wonder if questions 2 to 10 add anything… (So internal consistency isn’t everything.)
There are many ways to test the internal consistency of questionnaires, using the answers that people give. One is to use a formula by Lee Cronbach called Cronbach’s alpha. Values usually run from 0 to 1 (negative values are possible when items are negatively related to each other). Higher is better (but not too high; see the second example above).
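For the curious, here is a minimal sketch of the alpha calculation in Python, using the standard formula: k/(k − 1) × (1 − sum of the item variances / variance of the total scores), where k is the number of questions. The answers are entirely made up for illustration: the "General Stuff" respondents are constructed so that knowing one answer tells you nothing about another, and the "Reliable Feelings" respondents give the same answer to all ten questions. The function and variable names are assumptions of this sketch, not from any particular package.

```python
from itertools import product
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(item) for item in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up data: 16 respondents covering every combination of four yes/no
# answers, so the four items are pairwise unrelated.
general_stuff = [list(row) for row in product([0, 1], repeat=4)]

# Made-up data: 7 respondents who each give one answer ten times over.
reliable_feelings = [[score] * 10 for score in [1, 2, 2, 3, 4, 4, 5]]

alpha_general = cronbach_alpha(general_stuff)       # close to 0
alpha_feelings = cronbach_alpha(reliable_feelings)  # close to 1

print(f"General Stuff alpha: {abs(alpha_general):.2f}")
print(f"Reliable Feelings alpha: {alpha_feelings:.2f}")
```

The second result is the "too high" case: alpha is at its ceiling only because nine of the ten questions are redundant.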
In England, it is now recommended (see p. 12 of Mental Health Payment by Results Guidance) to use scores on a “Mental Health Clustering Tool” to evaluate outcomes. I think there are at least two problems with this:
- It’s completed by clinicians. It’s unclear if service users even get to know how they have been scored, never mind to what extent they can influence the process.
- The questionnaire scores aren’t internally consistent.
The people who proposed the approach write (see p.30 of their report): “As a general guideline, alpha values of 0.70 or above are indicative of a reasonable level of consistency”. Their results: 0.44, 0.58, 0.63, 0.57. They also refer to previous studies showing that this would always be the case, due to “its original intended purpose of being a scale with independent items” (p. 30). So, by design, it’s closer to the General Stuff Questionnaire above: a list of “presenting problems” to be read individually.
Not only are clinicians deciding whether someone has a good outcome (are they really in the best position to decide?), but the questionnaire they’re using to do so is rubbish — as shown by the very people proposing the approach!
Undergraduate psychology students wouldn’t use a questionnaire this poor in their projects. Why is it acceptable for a national mental health programme?
- It’s okay if participants see the logic underlying a self-report questionnaire, e.g., can guess what the subscales are. It’s a self-report questionnaire — how else are they going to complete the thing? (Related: lie scales — too good to be true?)
- Brain geography is not sufficient to make psychology a science.
- Going beyond proportion of variance “explained” probably is necessary for psychology to become a science.
- People learn stuff. It’s worth explicitly thinking about this, especially for complex activities like reasoning and remembering. How much of psychology is the study of culture? (Not necessarily a criticism.)
- Fancy data analysis is nice but don’t forget to look at descriptives.
- We can’t completely know another’s mind, not even with qualitative methods.
- Observation presupposes theory (and unarticulated prejudice is the worst kind of theory).
- Most metrics in psychology are arbitrary, e.g., what are the units of PHQ-9?
- Latent variables don’t necessarily represent unitary psychological constructs. (Related: “general intelligence” isn’t itself an explanation for anything; it’s a statistical re-representation of correlations.)
- Averages are useful but the rest of the distribution is important too.
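A quick illustration of the points about descriptives and distributions, as a sketch with made-up scores for two hypothetical groups. Both groups have the same mean, but a glance at the median, spread, and range shows they look nothing alike. The `describe` helper is just a name chosen for this example.

```python
from statistics import mean, median, stdev

# Made-up scores for two hypothetical groups; both sum to 80.
group_a = [9, 10, 10, 10, 11, 10, 10, 10]
group_b = [0, 0, 1, 1, 2, 2, 37, 37]


def describe(scores):
    """Return a few basic descriptive statistics for a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary_a = describe(group_a)
summary_b = describe(group_b)

for name, summary in [("A", summary_a), ("B", summary_b)]:
    print(name, summary)
```

Comparing only the means would suggest the groups are identical; the medians (10 versus 1.5) tell a different story.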
“Models of data have a deep influence on the kinds of theorising that researchers do. A structural equation model with latent variables named Shifting, Updating, and Inhibition (Miyake et al. 2000) might suggest a view of the mind as inter-connected Gaussian distributed variables. These statistical constructs are driven by correlations between variables, rather than by the underlying cognitive processes […]. Davelaar and Cooper (2010) argued, using a more cognitive-process-based mathematical model of the Stop Signal task and the Stroop task, that the inhibition part of the statistical model does not actually model inhibition, but rather models the strength of the pre-potent response channel. Returning to the older example introduced earlier of g (Spearman 1904), although the scores from a variety of tasks are positively correlated, this need not imply that the correlations are generated by a single cognitive (or social, or genetic, or whatever) process. The dynamical model proposed by van der Maas et al. (2006) shows that correlations can emerge due to mutually beneficial interactions between quite distinct processes.”
Fugard, A. J. B. & Stenning, K. (2013). Statistical models as cognitive models of individual differences in reasoning. Argument & Computation, 4, 89–102.
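To make the last point in that quote concrete, here is a toy simulation. It is only loosely inspired by the mutualism idea and is not the model from the cited paper: the equation, the coupling strength `m = 0.2`, the four processes, and all names are illustrative assumptions. Each simulated person has four processes whose starting capacities are drawn independently. Each process then settles at its own capacity plus a small boost from the levels of the others. There is no common factor anywhere in the setup, yet the settled levels end up positively correlated across people.

```python
import random


def equilibrium(capacities, m, sweeps=200):
    """Solve x_i = K_i + m * sum(x_j for j != i) by repeated substitution.

    Converges when m * (number of processes - 1) is below 1.
    """
    x = list(capacities)
    for _ in range(sweeps):
        total = sum(x)
        x = [k + m * (total - xi) for k, xi in zip(capacities, x)]
    return x


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_people, n_processes = 2000, 4

# Independent capacities: no shared cause is built in.
people = [[rng.gauss(10, 1) for _ in range(n_processes)] for _ in range(n_people)]

uncoupled = [equilibrium(k, m=0.0) for k in people]  # no interactions
coupled = [equilibrium(k, m=0.2) for k in people]    # mutual boosts

r_uncoupled = pearson([p[0] for p in uncoupled], [p[1] for p in uncoupled])
r_coupled = pearson([p[0] for p in coupled], [p[1] for p in coupled])

print(f"correlation without interactions: {r_uncoupled:.2f}")
print(f"correlation with interactions:    {r_coupled:.2f}")
```

A factor analysis of the coupled scores would happily extract a general factor, even though nothing of the sort was put into the simulation.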
Wise words from Colin Mills:
“I’m seldom interested in the data in front of me for its own sake and normally want to regard it as evidence about some larger population (or process) from which it has been sampled. In saying this I am not saying that quantification is all there is to sociology. That would be absurd. Before you can count anything you have to know what you are looking for, which implies that you have to have spent some time thinking out the concepts that will organize reality and tell you what is important.”
“… the institutionalized and therefore little questioned distinction between qualitative and quantitative empirical research is, to say the least, unhelpful and should be abolished. There is a much bigger intellectual gulf between those who just want to study what is in front of their eyes and those who view what is in front of their eyes as an instantiation of something bigger. Qualitative or quantitative if your business is generalization you have to have some theory of inference and if you don’t then your intellectual project is, in my view, incoherent.”
All attempts to capture another’s phenomenological experience, either in a relatively bottom-up manner, through unstructured discourse (“qual”?) or more top-down through a questionnaire (“quant”?) get stuck eventually. You still can’t really know what it feels like to be the other.
Giving people a chance to go outside standardized questions makes it more likely an important experience will be reported. But we all have similar experiences; a lot can, I think, be gained by trying to capture the commonality. Basic questions can be answered, such as how many people (report) feel(ing) a particular way, how frequently, and how many of those enjoy, can cope with, or are bothered by the feeling. Simply knowing this population-level information can be helpful at an individual level.
The “quant” end is as subjective as the “qual” end of research. Data needs interpretation and the stats doesn’t know how to do that. Two people presented with the same ANOVA can and often do come to different conclusions as they think about the context around a study.
You’ll be aware of the gist. Quantitative statistical models are great for generalizing, and data suitable for the stats tends to be quicker to analyze than qualitative data. More qualitative methods, such as interviewing, tend to provide much richer information, but generalization is very tricky and often involves coding up so the data can be fitted using the stats. How else can the two (crudely defined here!) approaches to analysis talk to each other?
I like this a lot:
“In the social sciences we are often criticized by the ethnographers and the anthropologists who say that we do not link in with them sufficiently and that we simply produce a set of statistics which do not represent reality.”
“… by using league tables, we can find examples of places which are perhaps not outliers but where we want to look for the pathways of influence on why they are not outliers. For example, one particular Bangladeshi village would have been expected to have high levels of immunization, whereas it was down in the middle of the table with quite a large confidence interval. This seemed rather strange, but our colleagues were able to attribute this to a fundamentalist imam. […] Another example is a village at the top of the league table, which our colleagues could attribute to a very enthusiastic school-teacher.”
“… by connecting with the qualitative workers, by encouraging the fieldworkers to look further at particular villages and by saying to them that we were surprised that this place was good and that one was bad, we could get people to understand the potential for linking the sophisticated statistical methods with qualitative research.” (Ian Diamond and Fiona Steele, from a comment on a paper by Goldstein and Spiegelhalter, 1996, p. 429)
Also reminds me of a study by Turner and Sobolewska (2009) which split participants on their Systemizing and Empathizing Quotient scores. Participants were asked, “What is inside a mobile phone?” Here’s what someone with high EQ said:
“It flashes the lights, screen flashes, and the buttons lights up, and it vibrates. It comes to life on the inside and it comes to life on the outside, and you talk to the one side and someone is answering on the other side”
And someone with high SQ:
“Many things, circuit boards, chips, transceiver [laughs], battery [pause], a camera in some of them, a media player, buttons, lots of different things. [pause] Well there are lots and lots of different bits and pieces to the phone, there are mainly in … Eh, like inside the chip there are lots of little transistors, which is used, they build up to lots of different types of gates…”
(One possible criticism is that the SQ/EQ just found students of technical versus non-technical subjects… But the general idea is still lovely.)
Would be great to see more quantitative papers with little excerpts of stories. We tried in our paper on spontaneous shifts of interpretation on a probabilistic reasoning task (Fugard, Pfeifer, Mayerhofer & Kleiter, 2011, p. 642), but we only squeezed in a few sentences:
‘Participant 34 (who settled into a conjunction interpretation) said: “I only looked at the shape and the color, and then always out of 6; this was the quickest way.” Participant 37, who shifted from the conjunction to the conditional event, said: “In the beginning [I] always [responded] ‘out of 6,’ but then somewhere in the middle . . . Ah! It clicked and I got it. I was angry with myself that I was so stupid before.” Five participants spontaneously reported when they shifted during the task, for example, saying, “Ah, this is how it works.”’
Fugard, A. J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. D. (2011). How people interpret conditionals: Shifts towards the conditional event. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 635–648.
Goldstein, H. & Spiegelhalter, D. J. (1996). League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society. Series A (Statistics in Society), 159, 385–443.
Turner, P. & Sobolewska, E. (2009). Mental models, magical thinking, and individual differences. Human Technology, 5, 90–113.