Mental testing

“The unfortunate habit in the mental testing field of devising a new test, administering it to some arbitrarily chosen group of subjects, calling these ‘the standardization population’, and then leaving it at that, does not seem to call for comment.” (Ehrenberg, 1955, p. 26, footnote 1)

Ehrenberg, A. S. C. (1955). Measurement and mathematics in psychology. British Journal of Psychology, 46(1), 20–29.


“We think that we know about uncertainty, and that when we have added a standard error or a confidence interval to a point estimate we have increased knowledge in some way or other. To many people, it does not look like that; they think that we are taking away their certainties – we are actually taking away information, and, if that is all that we can do, we are of no use to them. This was brought home to me forcibly when Peter Moore and I appeared before the Employment Select Committee of the House of Commons – which is not a random sample of the population at large. Our insistence that we could not deliver certainties was regarded as a sign of weakness, if not downright incompetence. One may laugh at that, but that is the way it was – and that is what we are up against. We must persist…” (David Bartholomew, discussion of Goldstein and Spiegelhalter 1996, p. 428).

Those who want to study what is in front of their eyes

Wise words from Colin Mills:

“I’m seldom interested in the data in front of me for its own sake and normally want to regard it as evidence about some larger population (or process) from which it has been sampled. In saying this I am not saying that quantification is all there is to sociology. That would be absurd. Before you can count anything you have to know what you are looking for, which implies that you have to have spent some time thinking out the concepts that will organize reality and tell you what is important.”

“… the institutionalized and therefore little questioned distinction between qualitative and quantitative empirical research is, to say the least, unhelpful and should be abolished. There is a much bigger intellectual gulf between those who just want to study what is in front of their eyes and those who view what is in front of their eyes as an instantiation of something bigger. Qualitative or quantitative if your business is generalization you have to have some theory of inference and if you don’t then your intellectual project is, in my view, incoherent.”

Computing number needed to treat from control group recovery rates and Cohen’s d

Furukawa and Leucht (2011) give a formula for calculating the number needed to treat (NNT), i.e. (p. 1),

“the number of patients one would need to treat with the intervention in question in order to have one more success (or one less failure) than if treated in the control intervention”

based on the control group event rate (CER; for instance proportion of cases showing recovery) and Cohen’s d – an effect size in standard deviation units.

R code below:

NNT <- function(d, CER) {
  1 / (pnorm(d - qnorm(1 - CER)) - CER)
}
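For illustration (repeating the function so the snippet is self-contained; the numbers below are made up, not from the paper), a control group event rate of 20% combined with d = 0.5 gives an NNT of about 6: you would need to treat about six patients with the intervention to get one more success than under control.

```r
NNT <- function(d, CER) {
  1 / (pnorm(d - qnorm(1 - CER)) - CER)
}

NNT(d = 0.5, CER = 0.2)  # approximately 6
NNT(d = 0,   CER = 0.2)  # Inf: no effect means no benefit over control
```

As a sanity check, d = 0 gives an infinite NNT, as it should: if the intervention has no effect, no number of treated patients yields an extra success.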

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen’s d: comparison of two methods. PLoS ONE, 6(4), e19070.

Monitoring patients using control charts

Interesting collection of studies using control charts to monitor measures from individual patients.


Tennant, R., Mohammed, M. A., Coleman, J. J., & Martin, U. (2007). Monitoring patients using control charts: a systematic review. International Journal for Quality in Health Care, 19(4), 187–194.


Summary of included studies (authors, year, sample size, and results):

Hayati et al. [18], 2006 (n = 45): Control charts, based on peak flow readings taken at work, had a sensitivity of 86% and a specificity of 88% compared with a gold standard measure (Specific Inhalation Challenge, SIC). 2/3 individuals with a positive diagnosis based on SIC had lower peak flow readings at work than at home, suggesting potential errors with the gold standard measure.

Alemi and Neuhauser [19], 2004 (n = 3): Control charts for all three asthmatic patients in the study showed special cause variation on at least one occasion. One patient showed no attacks after changes in their asthma care regime. One patient showed special cause variation (a decrease in attacks), which was associated with a reduction in exposure to irritants at home.

Boggs et al. [20], 1998 (n = 3):
Patient 1: Peak flow readings ranged between 92% and 76% of personal best. The patient’s control chart was in statistical control: future peak flow readings were likely to continue to fall within a safe range.
Patient 2: Peak flow readings ranged between 86% and 54% of personal best, indicating that the patient was at high risk of severe asthma. Changes in the patient’s treatment regime brought readings into statistical control.
Patient 3: Peak flow readings ranged between 17% and 101% of personal best, indicating that peak flow readings were not in statistical control. Changes in the patient’s treatment regime brought readings into statistical control.

Gibson et al. [21], 1995 (n = 35): Exacerbations identified using 9 action points (3 based on control chart exceedances, 6 based on action points taken from published guidelines) were compared with exacerbations identified by clinical assessment (using retrospective data collected by patients). The two methods with the highest sensitivity and specificity (peak flow rate < 80% of personal best; 2/3 successive measures between 2 and 3 lower sigma) were compared. True positive rate: peak flow rate < 80% = 88%; control chart (2/3 successive measures 2–3 lower sigma) = 91% (P = NS). False positive rate: peak flow rate < 80% = 47%; control chart (2/3 successive measures two- to three-sigma) = 23% (P = 0.002). An action point of a single measure > 3 lower sigma detected 72% of exacerbations before they were clinically identified. An action point of 2/3 points 2–3 lower sigma identified 19% of exacerbations earlier. An action point of 4/5 points between 1 and 2 lower sigma identified 60% of exacerbations earlier.

Hebert and Neuhauser [22], 2004 (n = 1): In the first period of observation, mean systolic blood pressure was 131.1 mmHg (upper and lower control limits 146.3 and 115.9 mmHg, respectively). In the second period of observation, the control chart indicated a significant drop in blood pressure (mean = 126.1 mmHg; upper and lower control limits 143.3 and 109 mmHg, respectively). Qualitative interviews showed a high level of patient acceptability (satisfaction in observing improvements in blood pressure, improved knowledge of own blood pressure measurements).

Solodky et al. [23], 1998 (n = 3):
Case series: In both patients, all seven systolic blood pressure readings taken after treatment fell below the mean for the seven pre-treatment values.
Case study: The control chart for the period before treatment showed a mean blood sugar level of 130 mg/dL; upper control limits were exceeded on two occasions. The control chart for the period after treatment showed a drop in mean blood sugar levels to 97 mg/dL; upper control limits were exceeded on two occasions.

Piccoli et al. [24], 1987 (n = 38): CUSUM charts of serum creatinine following kidney transplant had a sensitivity of 85% and a specificity of 94% in identifying positive or negative changes in renal function compared with gold standard measures (full clinical assessment). There was no significant difference in the time taken to detect a change in renal function using either detection method.
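Most of these studies chart repeated measurements from a single patient. As a concrete illustration (the review does not say which chart type each study used, so the choice here is an assumption): control limits for an individuals (XmR) chart, a common choice for single-patient data, can be computed from the mean moving range.

```r
# Individuals (XmR) chart limits: centre line at the mean, sigma estimated
# from the mean moving range divided by the d2 constant (1.128 for ranges
# of two consecutive points). This is an illustrative sketch, not the
# method of any particular study in the review.
xmr_limits <- function(x) {
  centre <- mean(x)
  sigma_hat <- mean(abs(diff(x))) / 1.128
  c(lower  = centre - 3 * sigma_hat,
    centre = centre,
    upper  = centre + 3 * sigma_hat)
}

# Example: simulated systolic blood pressure readings
set.seed(1)
bp <- rnorm(20, mean = 131, sd = 5)
xmr_limits(bp)  # points outside these limits signal special cause variation
```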

Simple cluster bargraphs with error bars in R

Here you go. (Edited to add: These days you should use ggplot2.)

library(sciplot)  # provides bargraph.CI and lineplot.CI

bargraph.CI(dose, len, group = supp, data = ToothGrowth,
            xlab = "Dose", ylab = "Growth", x.leg = 1,
            col = c("white", "grey"), legend = TRUE,
            ci.fun = function(x) t.test(x)$conf.int)

lineplot.CI(dose, len, group = supp, data = ToothGrowth,
            xlab = "Dose", ylab = "Growth", x.leg = 1,
            legend = TRUE,
            ci.fun = function(x) t.test(x)$conf.int)
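Following the parenthetical advice above, here is one way to draw the same clustered bar chart in ggplot2 (a sketch, not a drop-in replacement for sciplot’s defaults): compute the t-based 95% intervals per cell first, then plot dodged bars with error bars.

```r
library(ggplot2)

# Mean and t-based 95% CI for each dose-by-supplement cell of ToothGrowth
summ <- do.call(rbind, lapply(
  split(ToothGrowth, ToothGrowth[c("dose", "supp")]),
  function(d) {
    ci <- t.test(d$len)$conf.int
    data.frame(dose  = factor(d$dose[1]), supp = d$supp[1],
               len   = mean(d$len),
               lower = ci[1], upper = ci[2])
  }))

p <- ggplot(summ, aes(x = dose, y = len, fill = supp)) +
  geom_col(position = position_dodge(width = 0.9), colour = "black") +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = 0.9), width = 0.2) +
  scale_fill_manual(values = c("white", "grey")) +
  labs(x = "Dose", y = "Growth")
p
```

Precomputing the summary keeps the CI calculation explicit and identical to the `t.test` intervals used in the sciplot version.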

Slides from Computing for Graphical models (16 December 2011)


Slides from Computing for Graphical models (16 December 2011) — looks interesting:

Graphical models has expanded substantially over the past decade with the analysis of large data sets, in particular from bioinformatics and retail, with developments of inference in relation to causality, and with applications involving complex data structures. The ubiquitous nature of conditional independence has meant these models are applied in many different subjects. Computing for graphical models has always been difficult but recently user friendly open source software has become available.  This meeting provided a platform to review the current provision and to elucidate remaining challenges in making graphical modelling more accessible to the wider scientific community.


Multiple imputation: special issue of Journal of Statistical Software

This issue looks good for folk wanting to do multiple imputation, with some goodies for useRs in particular.
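For a taste of the workflow, here is a minimal sketch using the mice package (one of the R packages covered in the issue), run on its built-in nhanes example data: impute several complete data sets, fit the analysis model in each, then pool the estimates with Rubin’s rules.

```r
library(mice)

# Create 5 imputed versions of the incomplete nhanes data
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Fit the analysis model within each imputed data set
fit <- with(imp, lm(bmi ~ age + hyp))

# Pool the 5 sets of estimates using Rubin's rules
summary(pool(fit))
```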

Linking statistics and qualitative methods

You’ll be aware of the gist. Quantitative statistical models are great for generalizing, and data suitable for the stats tends to be quicker to analyze than qualitative data. More qualitative methods, such as interviewing, tend to provide much richer information, but generalization is very tricky and often involves coding up the data so it can be fitted using the stats. How else can the two (crudely defined here!) approaches to analysis talk to each other?

I like this a lot:

“In the social sciences we are often criticized by the ethnographers and the anthropologists who say that we do not link in with them sufficiently and that we simply produce a set of statistics which do not represent reality.”

“… by using league tables, we can find examples of places which are perhaps not outliers but where we want to look for the pathways of influence on why they are not outliers. For example, one particular Bangladeshi village would have been expected to have high levels of immunization, whereas it was down in the middle of the table with quite a large confidence interval. This seemed rather strange, but our colleagues were able to attribute this to a fundamentalist imam. […] Another example is a village at the top of the league table, which our colleagues could attribute to a very enthusiastic school-teacher.”

“… by connecting with the qualitative workers, by encouraging the fieldworkers to look further at particular villages and by saying to them that we were surprised that this place was good and that one was bad, we could get people to understand the potential for linking the sophisticated statistical methods with qualitative research.” (Ian Diamond and Fiona Steele, from a comment on a paper by Goldstein and Spiegelhalter, 1996, p. 429)

Also reminds me of a study by Turner and Sobolewska (2009) which split participants on their Systemizing and Empathizing Quotient scores. Participants were asked, “What is inside a mobile phone?” Here’s what someone with high EQ said:

“It flashes the lights, screen flashes, and the buttons lights up, and it vibrates. It comes to life on the inside and it comes to life on the outside, and you talk to the one side and someone is answering on the other side”

And someone with high SQ:

“Many things, circuit boards, chips, transceiver [laughs], battery [pause], a camera in some of them, a media player, buttons, lots of different things. [pause] Well there are lots and lots of different bits and pieces to the phone, there are mainly in … Eh, like inside the chip there are lots of little transistors, which is used, they build up to lots of different types of gates…”

(One possible criticism is that the SQ/EQ just found students of technical versus non-technical subjects… But the general idea is still lovely.)

Would be great to see more quantitative papers with little excerpts of stories. We tried in our paper on spontaneous shifts of interpretation on a probabilistic reasoning task (Fugard, Pfeifer, Mayerhofer & Kleiter, 2011, p. 642), but we only squeezed in a few sentences:

‘Participant 34 (who settled into a conjunction interpretation) said: “I only looked at the shape and the color, and then always out of 6; this was the quickest way.” Participant 37, who shifted from the conjunction to the conditional event, said: “In the beginning [I] always [responded] ‘out of 6,’ but then somewhere in the middle . . . Ah! It clicked and I got it. I was angry with myself that I was so stupid before.” Five participants spontaneously reported when they shifted during the task, for example, saying, “Ah, this is how it works.”’


Fugard, A. J. B., Pfeifer, N., Mayerhofer, B., & Kleiter, G. D. (2011). How people interpret conditionals: Shifts towards the conditional event. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 635–648.

Goldstein, H., & Spiegelhalter, D. J. (1996). League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A (Statistics in Society), 159, 385–443.

Turner, P., & Sobolewska, E. (2009). Mental models, magical thinking, and individual differences. Human Technology, 5, 90–113.