Harvey at PwC

LLM-driven text analysis is becoming the norm, allowing people to process volumes of text they wouldn’t otherwise have the capacity to handle. Although outputs can be checked, the sheer volume of inputs means there are fundamental limits on how comprehensively an analysis can be checked.

PwC announced yesterday that it is trialling the use of Harvey, built on ChatGPT, to “help generate insights and recommendations based on large volumes of data, delivering richer information that will enable PwC professionals to identify solutions faster.”

They say that “All outputs will be overseen and reviewed by PwC professionals.” But what about how the data was processed in the first place…?

“Randomista mania”, by Thomas Aston

Thomas Aston provides a helpful summary of RCT critiques, particularly in international evaluations.

Waddington, Villar, and Valentine (2022), cited therein, provide a handy review of comparisons between RCT and quasi-experimental estimates of programme effect.

Aston also cites examples of unethical RCTs. One vivid example is an RCT in Nairobi with an arm that involved threatening to disconnect water and sanitation services if landlords didn’t settle debts.

Migration and the Value of Social Networks

I haven’t read this working paper yet – just struck by this dataset:

“We leverage a rich new source of ‘digital trace’ data to provide a detailed empirical perspective on how social networks influence the decision to migrate. These data capture the entire universe of mobile phone activity in Rwanda over a five-year period. Each of roughly one million individuals is uniquely identified throughout the dataset, and every time they make or receive a phone call, we observe their approximate location, as well as the identity of the person they are talking to. From these data, we can reconstruct each subscriber’s 5-year migration trajectory, as well as a detailed picture of their social network before and after migration.”

An infinite trolley problem

Remember the trolley problem? There are now thousands of variants of this ethical conundrum. There’s a curious infinite variant, where there are as many people as integers on the top track and as many people as real numbers on the bottom.

My first thought on seeing this was, aha, finally a chance to apply Cantor’s diagonal argument to something useful: to solve the problem of whether or not to pull the lever.

Let’s start with an easier version. Suppose the segment of the lower rail where people are tied is bounded in length to a whisker short of 100 metres. Each person is tied at a position on that segment somewhere greater than or equal to 0 metres and less than 100 metres from the beginning of the segment. Another way to write that range of positions is [0, 100).

All the infinitely many positions in [0, 100) have to be used. So there’s somebody at exactly 0 metres, someone at 43.54377239879432 metres, someone at 3.5 metres, and so on for all the real number positions in [0, 100).

Number the people 0, 1, 2, 3, … starting at the end of the track from which the train is approaching. Now let’s work along that infinite, philosophically imagined mound of people and, using where each person lies along the rail (in metres), construct a new position on the track as follows. Assume you have a very precise measuring tape.

From person 0, take the whole number to the left of the decimal point in their measurement and compute 99 minus that number. From person 1, take 9 minus the 1st digit to the right of the decimal point. From person 2, take 9 minus the 2nd digit, and so on. So from person i (for i ≥ 1), take 9 minus the ith decimal digit (pad out the digits with zeros where necessary).

Here are some examples of positions:

Person 0: 0.1455487...
Person 1: 0.5534524...
Person 2: 1.2364765...
Person 3: 2.4500000...
Person 4: 3.6273692...
...

From these, we would calculate the following:

  • \(99 - 0 = 99\)
  • \(9 - 5 = 4\)
  • \(9 - 3 = 6\)
  • \(9 - 0 = 9\)
  • \(9 - 3 = 6\)

So now we have a position on the track, 99.4696… m along.
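The construction can be sketched in a few lines of Python. This is only an illustration on the finite list of example positions: the function name `diagonal_gap` and the choice to hold positions as decimal strings (to avoid floating-point rounding) are my own assumptions, and a real diagonal argument of course runs over infinitely many people.

```python
def diagonal_gap(positions):
    """Build a position that differs from every listed one.

    positions: decimal strings such as "0.1455487", each in [0, 100).
    Person 0 contributes 99 minus their whole-number part; person i
    (i >= 1) contributes 9 minus their ith decimal digit.
    """
    parts = []
    for i, pos in enumerate(positions):
        whole, _, frac = pos.partition(".")
        if i == 0:
            parts.append(str(99 - int(whole)))
        else:
            # Pad with zeros if this person has fewer than i decimal digits.
            frac = frac.ljust(i, "0")
            parts.append(str(9 - int(frac[i - 1])))
    return parts[0] + "." + "".join(parts[1:])


example_positions = [
    "0.1455487",
    "0.5534524",
    "1.2364765",
    "2.4500000",
    "3.6273692",
]
print(diagonal_gap(example_positions))
```

For the five example positions this prints 99.4696, matching the hand calculation.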

Think about what we have done here. We have worked along all the people and calculated a new position in [0, 100) where nobody is tied to the track. This is because the new position differs from each existing position in at least one place, by construction: from person 0’s position in the whole-number part, and from person i’s position in the ith decimal digit. But we are supposed to have one person at each of the infinitely many real number positions on the track. Here we have found a gap, a real number that isn’t being used.
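For anyone who prefers symbols, here is one way to write the construction down. The notation is mine, not from any source: \(x_i\) is person i’s position, \(n_i\) its whole-number part, \(d_{i,k}\) its kth decimal digit, and \(y\) the new position.

\[
x_i = n_i + \sum_{k \ge 1} d_{i,k}\,10^{-k}, \qquad n_i \in \{0, \dots, 99\}, \quad d_{i,k} \in \{0, \dots, 9\}
\]

\[
y = (99 - n_0) + \sum_{k \ge 1} (9 - d_{k,k})\,10^{-k}
\]

Since \(99 - n_0 \ne n_0\) and \(9 - d_{k,k} \ne d_{k,k}\) (neither 49.5 nor 4.5 is a whole number), the expansion of \(y\) differs from that of \(x_0\) in the whole-number part and from that of \(x_k\) in the kth decimal digit. One caveat for the careful: some numbers have two decimal expansions (0.4999… equals 0.5), so a fully rigorous version would pick replacement digits that avoid long runs of 9s. That detail doesn’t change the conclusion.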

We could add someone else at 99.4696… m. But if we did that, we could just follow the procedure above again to find another gap.

It follows that our original assumption is false: it is not possible to number people 0, 1, 2, 3, … and have them occupy every real-numbered position along an almost 100 metre long segment of track. Assuming we could has led to a contradiction.

We can’t do it for [0, 100). That means there’s no hope of doing it for all real numbers, since [0, 100) is a subset of the reals.

So it is not possible to tie heaps of people to a track so that there are as many people as there are real numbers. People are rather discrete, countable beings, and I could have stopped at “Number the people 0, 1, 2, 3, …”.

It turns out that the two tracks must have the same countably infinite number of people, so it doesn’t matter whether you pull the lever.