Today is World Radio Day, so I thought I’d post something about analysing amateur radio data using R.
The radio bit
The ionosphere is a series of layers of the atmosphere, at heights between 50 and 1000 km, that are ionised by solar radiation. One of the things amateur radio operators do is experiment with how to use the ionosphere to bend the path of their radio waves and see where signals end up.
Because the ionosphere is formed by solar radiation, its state depends on how much radiation the sun is flinging at it. This is partly driven by which bit of earth is currently pointing at the sun, so time of day is an important factor. There is also an 11-year cycle of solar activity which has an enormous influence on ionisation; we’re currently on our way to a solar maximum next year, meaning that the ionosphere is particularly good at bending radio waves. The radio frequency used also influences how high a radio wave travels before it is refracted by the ionosphere. Some frequencies pass straight through into space whereas others are easily absorbed; the trick is to choose the right frequency for the season and time of day so that the wave is bent back to earth.
Digital modes of transmission are increasingly popular. They sound like robotic beeps and are produced and decoded by free software developed in the amateur radio community. These modes are used to explore different ways to encode information so that even if parts of a message are lost in the noise en route, there are ways to digitally reconstruct it at the receiving side.
One digital approach is called WSPR (pronounced “whisper”), which stands for “Weak Signal Propagation Reporter”. This is specifically designed for low power transmission, and transmitters and receivers are automatically controlled by computer. One challenge is how far you can get a signal with an absurdly low power transmitter and amateur antenna. All signal reports around the world are automatically logged on a web-based database, so it’s possible to analyse how far signals have travelled and what factors affect this.
The R bit
I had a play with WSPR data last year to see if I could find a way to visualise the impact of time of day and radio frequency on how far a signal travels. See my record of attempts, including many that turned out to be useless.
My favourite is below, showing reports of signals sent from an area around London. The colour of data points indicates how far a signal has travelled. The x-axis is date and time and the y-axis is signal strength. One of the striking effects is how transmission over short distances is unaffected by time of day since the waves travel by line of sight (look for the horizontal lines). For longer distances, different parts of the world fade in and fade out as the earth spins and the sun’s effect on the ionosphere waxes and wanes. The three different graphs show results for three different frequencies.
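The skeleton of this kind of plot is straightforward in ggplot2. Here is a minimal sketch using made-up stand-in data; the column names (`timestamp`, `snr`, `distance`, `band`) are illustrative, not the actual WSPR database schema.

```r
library(ggplot2)

# Toy stand-in for WSPR spot reports; the real data comes from the WSPR
# database and will have different column names
spots <- data.frame(
  timestamp = rep(seq(as.POSIXct("2023-02-01", tz = "UTC"),
                      by = "hour", length.out = 48), 3),
  snr       = rnorm(144, mean = -20, sd = 5),       # signal-to-noise ratio, dB
  distance  = sample(c(50, 800, 9000), 144, TRUE),  # km travelled
  band      = rep(c("7 MHz", "10 MHz", "14 MHz"), each = 48)
)

# Time on x, signal strength on y, colour showing distance travelled,
# and one panel per frequency band, as in the plot described above
p <- ggplot(spots, aes(timestamp, snr, colour = distance)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ band, ncol = 1) +
  labs(x = "Date and time (UTC)", y = "SNR (dB)", colour = "Distance (km)")
```

The faceting is what makes the frequency comparison easy: each band gets its own panel on a shared time axis.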
Venture over here for my approach to solving/cheating at Wordle, using the tidyverse in R.
Updated: the Russian version is here.
For ease of copy-paste:
Venture over to this repo for a collection of examples showing how to query the WsprDaemon Timescale Database using R and SQL and visualise the results using the tidyverse.
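The general shape of those queries looks like the sketch below. Everything here is a placeholder: the table, column names, host, and credentials are illustrative only; the repo has the working queries and real connection details for the WsprDaemon database.

```r
# Illustrative SQL: table and column names are placeholders, not the
# actual WsprDaemon schema (see the repo for working queries)
query <- "
  SELECT time, tx_call, rx_call, distance, band, snr
  FROM spots
  WHERE time > now() - interval '1 day'
"

# Connection details are placeholders too; only attempt the query when the
# required packages are available and we are at an interactive console
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RPostgres", quietly = TRUE) &&
    interactive()) {
  con <- DBI::dbConnect(RPostgres::Postgres(),
                        host = "example.org", dbname = "wspr",
                        user = "reader", password = "secret")
  spots <- DBI::dbGetQuery(con, query)   # returns a data frame
  DBI::dbDisconnect(con)
}
```

Once the result lands in a data frame, it pipes straight into the tidyverse for the visualisation step.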
Example pics below:
R has a handy package for accessing Google Scholar data, scholar.
I had a play around – here’s the R code.
And a picture of cumulative citations:
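The cumulative step is just a `cumsum` on the yearly counts. In practice the data would come from `scholar::get_citation_history("SCHOLAR_ID")` (the ID is a placeholder), which returns a data frame of years and citation counts; toy numbers are used below so the sketch runs offline.

```r
# Toy stand-in for scholar::get_citation_history() output
cites <- data.frame(
  year  = 2015:2020,
  cites = c(3, 10, 18, 25, 30, 41)
)

# Running total of citations per year
cites$cumulative <- cumsum(cites$cites)

plot(cites$year, cites$cumulative, type = "b",
     xlab = "Year", ylab = "Cumulative citations")
```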
Data has recently been published on the percentage of civil servants who are LGB or other, broken down by profession and department, as of 31 March 2020. Here are pictures, created using R (code here).
This year as part of Covid-enforced “digital transformation” I ended up writing longer tutorial notes than usual so that students could work at their own pace. The module I teach assumes that students have already taken an intro stats course using software other than R, covering up to regression, but that they are likely to have forgotten how to do the latter.
The core texts I use are Fox and Weisberg (2019), An R Companion to Applied Regression (Third Edition) and Healy (2019) Data Visualization: A Practical Introduction – both excellent.
These notes add explanations where students were likely to be struggling and exercises with solutions.
I’ll be putting them all online over here.
Fitting regression models, GLMs, etc.
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). London: SAGE Publications Ltd.
See also online material, including free appendices and R code.
Data transformation and visualisation
Healy, K. (2019). Data Visualization: A Practical Introduction. Princeton University Press. (Free online version.)
Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. Sebastopol, CA: O’Reilly. (Free online version.)
Chang, W. (2020). R Graphics Cookbook (2nd ed.). Sebastopol, CA: O’Reilly. (Free online version.)
Lüdecke, D. (2018). ggeffects: Tidy Data Frames of Marginal Effects from Regression Models. Journal of Open Source Software, 3(26), 772. doi: 10.21105/joss.00772
This is very handy for getting predictions from models, focusing on the effect of predictors of interest whilst holding covariates at some fixed values like a mean or (for factors) mode.
See also the package website for illustrative examples.
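A minimal sketch of the idea, using the built-in mtcars data: predictions for mpg across one predictor while the other is held at its mean. The `ggpredict` call is guarded in case ggeffects isn’t installed.

```r
# Model with one predictor of interest (wt) and one covariate (hp)
m <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("ggeffects", quietly = TRUE)) {
  # Predicted mpg across wt, with hp held at a fixed value,
  # which is what ggeffects handles for you
  preds <- ggeffects::ggpredict(m, terms = "wt")
  print(preds)
}
```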
Gelman, A. (2011). Tables as graphs: The Ramanujan principle. Significance, 8, 183.
Missing data imputation
Van Buuren, S. (2018). Flexible Imputation of Missing Data (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC. (Free online version.)
See also the package website.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31, 337–350.
“… correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists.”
Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1, 140216. doi: 10.1098/rsos.140216
This generated lots of debate – I like how it attempts to use Bayes’ rule to turn p-values into something useful and the explanation in terms of diagnostic test properties. See also this on PPV and NPV.
Rafi, Z., & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20(1), 244. doi: 10.1186/s12874-020-01105-9
Interesting proposal to use s-values, calculated from p-values as s = −log₂(p). The interpretation is simple: s is the number of fair coin tosses for which getting all heads has probability p. For example, if p = 0.5 then s = 1: toss a coin once and the probability of a head is 0.5. If p = 0.03125 then s = 5: toss a coin 5 times and the probability of all heads is 0.03125. The s-value is supposedly easier to think about. I’m not sure it really is, but I like the idea!
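The transformation and its coin-toss interpretation take two lines of base R:

```r
# s-value: the surprisal of a p-value in bits, s = -log2(p)
s_value <- function(p) -log2(p)

s_value(0.5)      # 1 bit: as surprising as one head in one fair toss
s_value(0.03125)  # 5 bits: as surprising as five heads in five tosses

# And back again: the probability of s heads in s fair tosses
2^(-s_value(0.03125))  # recovers p = 0.03125
```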
The UK’s Intelligence and Security Committee’s report into Russian activity in the UK was finally released a few days ago.
Here’s my exploration of redactions in the report, using R. Some highlights below.
One of the best predictors of whether a sentence will have a redaction is what organisations are mentioned in the sentence:
According to a sentiment analysis, the angriest sentences are on page 11 (PDF page 18):
Here’s a word cloud of sentences with a redaction, against the organisation(s) mentioned…
Choropleth maps use shading to represent quantities and are common in the press. I gave them a go in R, using the rvest package to scrape the results of the 2020 Russian constitutional referendum and the raster package piped through tidyverse tools to map them.
The code is on my GitHub repo.
Some of the fun I encountered along the way (details in the repo):
- The CRAN version of raster didn’t work, but the latest on GitHub was fine and it’s easy to install this directly from R.
- The Russian regions names in the raster map of Russia didn’t always match those on the Wikipedia article. I tried fuzzy matching by edit distance, which did a pretty good job but I still had to match some manually (e.g., “Sakha” and “Yakutia” are different names for the same place and a long edit distance from each other). I suspect it would have been easier just to sort both lists alphabetically and match manually from the start!
- This warning is a worry: “support for gpclib will be withdrawn from maptools at the next major release” – I hope something comes along to replace it.
- Lots of the examples of maps online are for the US and one basic problem is what projection to use. The mapproj package is fab for this.
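The fuzzy matching mentioned above can be done with base R’s `adist()`, which computes Levenshtein edit distances. A small sketch with illustrative names (not the full region lists):

```r
# Two spellings of the same regions, as might come from the raster map
# and the Wikipedia results table (names here are illustrative)
raster_names <- c("Moskva", "Sakha", "Tatarstan")
wiki_names   <- c("Moscow", "Tatarstan", "Yakutia")

# Matrix of edit distances: rows are raster names, columns Wikipedia names
d <- adist(raster_names, wiki_names)

# Match each raster name to its closest Wikipedia name
matches <- data.frame(
  raster = raster_names,
  wiki   = wiki_names[apply(d, 1, which.min)],
  dist   = apply(d, 1, min)
)
matches
# "Sakha" has a large minimum distance to every candidate, which is
# exactly the kind of case that needed matching by hand
```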