PHQ-9 “over-diagnosis” paper shows that arithmetic works

A recent paper by Levis et al. (2020) systematically reviews studies looking at depression prevalence in two ways: one using a structured assessment completed by a professional (SCID) and the other using a questionnaire completed by study participants (PHQ-9). The authors conclude that “PHQ-9 ≥10 substantially overestimates depression prevalence.” But this was entirely predictable.

Mean SCID-prevalence was 12.1%.

Mean PHQ-9 prevalence (using a score of 10 or above to decide that someone has depression) was 24.6%.

This is almost exactly what arithmetic predicts; my back-of-envelope estimate of what PHQ-9 would say (see below) gives 23.8%, using estimates of PHQ-9’s sensitivity and specificity from a meta-analysis (88% and 85%, respectively) and the SCID-prevalence found in the review (12.1%).

So the paper’s results are unsurprising.

A positive result on the PHQ-9 (or any other screening questionnaire) is more likely to be correct in groups with higher rates of depression, such as people who have asked for a GP appointment because they are worried about their mental health.
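To illustrate how much the base rate matters, here is a minimal Python sketch of positive and negative predictive value (PPV and NPV). It assumes the same sensitivity (88%) and specificity (85%) used in this post; the function names are my own, and the 50% prevalence is a made-up figure for a higher-risk group, not a value from the paper.

```python
def ppv(prevalence, sensitivity=0.88, specificity=0.85):
    """Probability that someone scoring positive really has the condition."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity=0.88, specificity=0.85):
    """Probability that someone scoring negative really does not have it."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# 0.121 is the SCID prevalence from the review; 0.5 is a hypothetical
# higher-prevalence group, included only for comparison.
for prevalence in (0.121, 0.5):
    print(f"prevalence {prevalence:.1%}: "
          f"PPV {ppv(prevalence):.1%}, NPV {npv(prevalence):.1%}")
```

Under these assumptions, fewer than half of positive results are correct at 12.1% prevalence (PPV is about 45%), rising to about 85% in the hypothetical 50% group, while NPV moves in the opposite direction.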

No clinical decisions – such as whether to accept someone for treatment – should be made on the basis of nine tick-box answers alone. Questionnaires can also miss people who need treatment.

Screening questionnaires are often designed to over-diagnose rather than risk missing people who need treatment, under the assumption that a proper follow-up assessment will be carried out.

When reporting condition prevalence, the psychometric properties of measures should be provided, including what “gold standard” they have been validated against, and the chosen clinical threshold.

Explore positive and negative predictive value (PPV and NPV) using this app.

 

Back of envelope

P(SCID) = .121
P(PHQ | SCID) = .88
P(not-PHQ | not-SCID) = .85
P(PHQ | not-SCID) = 1 – P(not-PHQ | not-SCID) = .15

P(PHQ & SCID) = P(PHQ | SCID) * P(SCID)
= .88 * .121
= .10648

P(PHQ & not-SCID) = P(PHQ | not-SCID) * P(not-SCID)
= (1 – .85) * (1 – .121)
= .13185

P(PHQ) = P(PHQ & SCID) + P(PHQ & not-SCID)
= .10648 + .13185
= .23833, or about 23.8%
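The same calculation can be sketched in a few lines of Python, which makes it easy to try other prevalences or thresholds. The function and variable names are mine, chosen for illustration; the three input values are the ones used above.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Proportion expected to score positive on the questionnaire.

    This is P(PHQ) = P(PHQ & SCID) + P(PHQ & not-SCID).
    """
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives


p_scid = 0.121       # P(SCID), mean SCID prevalence from the review
sensitivity = 0.88   # P(PHQ | SCID)
specificity = 0.85   # P(not-PHQ | not-SCID)

p_phq = apparent_prevalence(p_scid, sensitivity, specificity)
print(f"Predicted PHQ-9 prevalence: {p_phq:.5f}")
```

With these inputs the result agrees with the hand calculation (.23833) to five decimal places.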

 

Thanks Chris, for pointing out the typo!

Mental testing

“The unfortunate habit in the mental testing field of devising a new test, administering it to some arbitrarily chosen group of subjects, calling these ‘the standardization population’, and then leaving it at that, does not seem to call for comment.” (Ehrenberg, 1955, p. 26, footnote 1)

Ehrenberg, A. S. C. (1955). Measurement and mathematics in psychology. British Journal of Psychology, 46(1), 20–9. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/23957389