Texts I like for learning statistics and using R


I’ll be updating this, but first thoughts:

Fitting regression models, GLMs, etc.

Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). London: SAGE Publications Ltd.

See also online material, including free appendices and R code.
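The companion is built around the car package. A minimal sketch of the workflow it teaches, using the Prestige data from the accompanying carData package (model choice here is just for illustration):

```r
# Fit a linear regression and get Type II tests with the car package.
library(car)

m <- lm(prestige ~ education + income + type, data = carData::Prestige)
summary(m)
Anova(m)   # Type II tests, as used throughout the book
```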

Data transformation and visualisation

Healy, K. (2019). Data Visualization: A Practical Introduction. Princeton University Press. (Free online version.)

Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly. (Free online version.)

Chang, W. (2020). R Graphics Cookbook (2nd ed.). Sebastopol, CA: O’Reilly. (Free online version.)

Lüdecke, D. (2018). ggeffects: Tidy data frames of marginal effects from regression models. Journal of Open Source Software, 3(26), 772. doi: 10.21105/joss.00772

This is very handy for getting predictions from models, focusing on the effect of predictors of interest whilst holding covariates at fixed values such as the mean or (for factors) the mode.

See also the package website for illustrative examples.
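A minimal sketch of what this looks like in practice, using the built-in mtcars data (the variables are chosen purely for illustration):

```r
library(ggeffects)

m <- lm(mpg ~ hp + wt, data = mtcars)

# Predicted mpg across values of hp, holding wt fixed (at its mean)
ggpredict(m, terms = "hp")

# The predictions come back as a tidy data frame, so plotting is easy:
plot(ggpredict(m, terms = "hp"))
```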

Gelman, A. (2011). Tables as graphs: The Ramanujan principle. Significance, 8, 183.

Missing data imputation

Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC. (Free online version.)

See also the package website.
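Van Buuren's package is mice. A minimal sketch using its bundled nhanes data (the settings here are illustrative defaults, not recommendations):

```r
library(mice)

# Create 5 imputed data sets (seed fixed for reproducibility)
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# Fit the analysis model in each imputed data set,
# then pool the estimates using Rubin's rules
fit <- with(imp, lm(bmi ~ age))
pool(fit)
```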

P-values

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337–350.

“… correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists.”

Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1, 140216. doi: 10.1098/rsos.140216

This generated lots of debate – I like how it attempts to use Bayes' rule to turn p-values into something useful, and the explanation in terms of diagnostic test properties. See also this on PPV and NPV.
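The diagnostic-test framing reduces to simple arithmetic. A sketch of the kind of calculation involved (the prior, alpha, and power values below are illustrative choices, not quantities from any particular study):

```r
# False discovery rate when "significance" is treated like a diagnostic test
prior <- 0.1    # proportion of tested hypotheses that are true effects
alpha <- 0.05   # type I error rate
power <- 0.8    # probability of detecting a true effect

false_pos <- (1 - prior) * alpha   # 0.9 * 0.05 = 0.045
true_pos  <- prior * power         # 0.1 * 0.8  = 0.08

fdr <- false_pos / (false_pos + true_pos)
fdr   # 0.36: over a third of "significant" results are false discoveries
```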

Rafi, Z., & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20(1), 244. doi: 10.1186/s12874-020-01105-9

Interesting proposal to use s-values, calculated from p-values as s = −log₂(p). The interpretation is simple: p is the probability of getting all heads in s tosses of a fair coin. For example, if p = 0.5 then s = 1: toss a coin once and the probability of a head is 0.5. If p = 0.03125 then s = 5: toss a coin 5 times and the probability of all heads is 0.03125. The s-value is supposedly easier to think about. I’m not sure if it really is, but I like the idea!
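The coin-toss reading is easy to check in R:

```r
# s-value (Shannon surprisal, in bits) from a p-value
s_value <- function(p) -log2(p)

s_value(0.5)       # 1: as surprising as one head in one fair toss
s_value(0.03125)   # 5: as surprising as five heads in a row (0.5^5 = 0.03125)
s_value(0.05)      # about 4.3 bits of surprise
```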