Why is evaluation so white?

Useful resources to explore (work in progress):

‘I’m sometimes asked, “Why are there so few people of color in evaluation?” I flip the question: “Why is evaluation so white?” And answer: “Because our labor is actively erased.”’

Further resources, e.g.,

It’s striking how issues discussed in the 1970s remain relevant today: for example, the impossibility of using IQ tests (and covert proxies for them) to improve outcomes, rather than simply to blame the child and excuse education systems for poor outcomes.

‘At its core, evaluation is value laden and imbued with and responsive to a larger social political order, and evaluators are situated within contexts of study and within interactions of the setting that shape the evaluation study’s logic, structure, and practices (Hopson, Greene, Bledsoe, Villegas, & Brown, 2007). The question of “Who evaluates and why?” highlights the contexts, agendas, and intentions of the evaluation and the evaluator and so raises questions about practices—sometimes commonly accepted ones—and the structures of power and the uses of those power structures for or against hegemony.’ [p. 418]

This cites The oral history of evaluation, part 3: The professional evolution of Michael Scriven, which provides a clue – hiding in plain sight – to why the official history of evaluation is as it is:

“Now, there was the May 12th group, which was ahead of the game. The May 12th group was so called after the first date on which they met [1968, says Glass –AF]—but the general feeling was if we call it the May 12th group, that will have absolutely zero cachet, and so no one will be able to argue that they were entitled to join the May 12th group because it’s called something generic. And so the idea was you got invited to the May 12th group, and if you weren’t invited, then you weren’t in, and so there was no official stuff. So, they would meet in somebody’s house once a year. […] But some of us felt that we needed to do something that was slightly more official, and we’d got to start making this more than the intellectual elite group.”

The May 13 Group formed on that date in 2020 to challenge this.

‘Evaluation is political. At its simplest, evaluation is the systematic “process of determining the merit, worth and value of things” (Scriven, 1991, p. 1). Who gets to decide, the questions, the process, and the criteria for determining merit, worth, value, or significance—all of these matter.’ [p. 534]

‘As professionals and practitioners, we can no longer sit on the sidelines wearing the cape of objectivity and neutrality, a cape that shields beliefs and assumptions about knowledge, rigor, and evidence and which elevate a Western White worldview. […] Everyday narratives that continue to marginalize, minimize, and disrespect people of color and those with less privilege could be replaced with ones that do not demonize and place blame on the individual. They could instead lift up the historical, contextual, and powerful dynamics that create and sustain oppression and shed light on the strategies and solutions which can shift the “rules of the game,” so that equity is achievable.’ [p. 538]

“Advisors of evaluation graduate students of colour should create spaces for students to express their feelings and, if they choose, be vulnerable and open about the stressors of simply being a person of colour in a world with white supremacy woven into its very fabric.”

“Whenever a prospective student emails me, I put them in touch with current students in my department. I find this is especially important for international students; I am unable to speak to how the culture in North Carolina and in our department differs from their home culture. I also aim to introduce students to faculty across campus who have similar cultures and backgrounds.”

“Advisors of evaluation graduate students of colour can research or have conversations about the norms and dates associated with the holidays and events that their students observe. […] While I can’t know all the traditions observed by my students, I encourage them to inform me about their cultural and religious traditions as appropriate.”

“… advisors and mentors should also practice giving microvalidations […], small acts and words that validate who graduate students believe they can be. My post-doctoral advisor always praised me in public and raised concerns in private. I regularly let my advisees know that I am proud of them, see their potential, and believe in them. I learn every student’s name and work to pronounce their names correctly. And I make a concerted effort to refer to my advisees as my colleagues.”

Here’s a summary table of examples:

| Tired Narrative | Potential New Narrative | The Difference |
| --- | --- | --- |
| We should have more people of color on staff. | The evaluation field needs to connect to and invite in talent coming from a broader range of lived experiences and expertise types to be relevant and useful. | A singular focus on ethnic diversity promotes tokenism and an “unstated standard of whiteness.” New narratives should explicitly acknowledge how diversity of experience and expertise strengthens rigor and contributes to better evaluation. |
| Diverse applicants don’t meet our standard qualifications. | Implicit bias and white-dominant norms constrain our ability to recognize valuable expertise and support talented people. | Perceptions of knowledge, experience, and credibility are culturally based and tied to the establishment of cultural hegemony. It is important to recognize – and actively mitigate – how this plays out in workplaces and the evaluation profession overall. |
| If we supply individuals with knowledge and networks, they will be successful in our field. | We need to remove structures and norms that prevent the flourishing of valuable expertise and talented people across our field. | Individuals cannot be successful if the ecosystem of organizations in which they practice evaluation are inequitable and non-inclusive. We can do a better job breaking down white-dominant norms and creating the conditions that allow talented professionals to thrive. |
| We need experts to guide us in diversifying evaluation talent. | We need to do a better job listening to people whose insights and talent have been historically marginalized within evaluation and philanthropy. | Narrowly defined ideas about expertise and who is considered an expert aren’t getting us where we need to go. We must expand our conceptions of expertise and do a much better job of listening to and learning from people whose voices have been left out of conversations about talent in our field. |
| If individual organizations improve their hiring and management practices, the evaluation field will make progress. | Making progress will only happen if we prioritize equity and inclusion across the evaluation ecosystem and work collectively on solutions. | Focusing only on the organizational level prevents us from seeing and addressing larger narratives and systems at play. We need to prioritize this as a field and invest in collective solutions that help shift our outdated narratives. |

“… evaluators of color noted that the burden of addressing DEI and calling out racism is often placed on them as they are assumed to be experts…”

“… evaluators of color cited examples of being tapped to join an evaluation project when philanthropic clients asked for demographics of staff in their RFPs, yet not feeling meaningfully included in the subsequent work…”

“When organizations have difficulty retaining staff of color, they often perceive the person of color as the problem, not the ecosystem that reinforces inequities. Persistent challenges with retention should signal a need for the organization to self-reflect on its culture and make changes…”

“I have been in too many meetings where a racialised person has felt they’ve had to speak about their lived experience, at great personal cost […]. Sometimes, the individual’s point is directly challenged or downplayed. In a head-spinning moment of gaslighting, they are left isolated and disbelieved, despite (or, perhaps, because) they are the racialised person specifically invited to the meeting to explain why the racist thing is racist.”

The misuses of “biological sex”

‘It is long overdue that we understand sex not as an essential property of individuals but as a set of biological traits and social factors that become important only in specific contexts, such as medicine, and even then complexity persists. If we are concerned with certain cancers, for example, knowing whether someone has a prostate or ovaries is what’s important, not their “sex” per se. If reproduction is the interest, what matters is whether one produces sperm or eggs, whether one has a uterus, a vaginal opening, and so on.’

Karkazis, K. (2019, p. 1899). The misuses of “biological sex.” The Lancet, 394, 1898–1899.

Wisdom(?) from the 1918 Dadaist manifesto by Tristan Tzara

The 1st and 2nd DADA Art Manifestos are online over there.

  • “Psychoanalysis is a dangerous disease, it deadens man’s anti-real inclinations and systematises the bourgeoisie.”
  • “Dialectics is an amusing machine that leads us (in banal fashion) to the opinions which we would have held in any case.”
  • “People observe, they look at things from one or several points of view, they choose them from amongst the millions that exist. Experience too is the result of chance and of individual abilities.”
  • “Logic is a complication. Logic is always false. It draws the superficial threads of concepts and words towards illusory conclusions and centres.”
  • “What we need are strong straightforward, precise works which will be forever misunderstood.”

Grand Hotel Abyss

“A considerable part of the leading German intelligentsia, including Adorno, have taken up residence in the ‘Grand Hotel Abyss’ which I described in connection with my critique of Schopenhauer as ‘a beautiful hotel, equipped with every comfort, on the edge of an abyss, of nothingness, of absurdity. And the daily contemplation of the abyss between excellent meals or artistic entertainments, can only heighten the enjoyment of the subtle comforts offered.’ (Die Zerstörung der Vernunft, Neuwied 1962, p. 219).”

—György Lukács (1962), Preface to The Theory of the Novel

“Schopenhauer’s philosophy rejects life in every form and confronts it with nothingness as a philosophical perspective. [… N]othingness as the pessimist outlook […] is quite unable, according to Schopenhauer’s ethics […], to prevent or even merely to discourage the individual from leading an enjoyable contemplative life. On the contrary: the abyss of nothingness, the gloomy background of the futility of existence, only lends this enjoyment an extra piquancy. Further heightening it is the fact that the strongly accented aristocratism of Schopenhauer’s philosophy lifts its adherents (in imagination) way above the wretched mob that is short-sighted enough to fight and to suffer for a betterment of social conditions. So Schopenhauer’s system, well laid out and architecturally ingenious in form, rises up like a modern luxury hotel on the brink of the abyss, nothingness and futility. And the daily sight of the abyss, between the leisurely enjoyment of meals or works of art, can only enhance one’s pleasure in this elegant comfort.

“This, then, fulfils the task of Schopenhauer’s irrationalism: the task of preventing an otherwise dissatisfied sector of the intelligentsia from concretely turning its discontent with the ‘established order’, i.e., the existing social order, against the capitalist system in force at any given time.”

—György Lukács (1962/1981, pp. 242-243). The destruction of reason. London: Merlin Press Ltd.

Two incontrovertible facts about RCTs

“… the following are two incontrovertible facts about a randomized clinical trial:

1. over all randomizations the groups are balanced;

2. for a particular randomization they are unbalanced.

Now, no [statistically] ‘significant imbalance’ can cause 1 to be untrue and no lack of a significant balance can make 2 untrue. Therefore the only reason to employ such a test must be to examine the process of randomization itself. Thus a significant result should lead to the decision that the treatment groups have not been randomized…”

– Senn (1994, p. 1716)

Senn, S. (1994). Testing for baseline balance in clinical trials. Statistics in Medicine, 13, 1715–1726.
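Senn’s two facts can be checked exactly on a toy trial. The sketch below is my own illustration, not anything from the paper: the eight ages are made up, and the function name `baseline_differences` is hypothetical. It enumerates every way of allocating four of the eight participants to treatment and records the treated-minus-control difference in age totals (with equal group sizes, this is four times the difference in means).

```python
from itertools import combinations


def baseline_differences(covariate, n_treated):
    """Treated-minus-control difference in covariate totals for every possible allocation."""
    total = sum(covariate)
    diffs = []
    for treated in combinations(range(len(covariate)), n_treated):
        treated_sum = sum(covariate[i] for i in treated)
        control_sum = total - treated_sum
        diffs.append(treated_sum - control_sum)
    return diffs


# Hypothetical baseline ages for eight participants (assumed values, for illustration only).
ages = [23, 31, 35, 42, 47, 52, 58, 65]

diffs = baseline_differences(ages, n_treated=4)

# Fact 1: summed (and so averaged) over all randomizations, the imbalance is exactly zero.
print("allocations:", len(diffs))
print("sum of differences over all allocations:", sum(diffs))

# Fact 2: in these data, no single allocation is perfectly balanced.
print("smallest absolute imbalance:", min(abs(d) for d in diffs))
```

The ages were chosen so that their total is odd, which guarantees that no split can have equal totals. With real continuous covariates, an exactly balanced allocation is similarly a measure-zero event.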

Parametric versus non-parametric statistics

There is no such thing as parametric or non-parametric data. There are parametric and non-parametric statistical models.

“The term nonparametric may have some historical significance and meaning for theoretical statisticians, but it only serves to confuse applied statisticians.”

– Noether, G. E. (1984, p. 177)

“. . . the distribution functions of the various stochastic variables which enter into their problems are assumed to be of known functional form, and the theories of estimation and of testing hypotheses are theories of estimation of and of testing hypotheses about, one or more parameters, finite in number, the knowledge of which would completely determine the various distribution functions involved. We shall refer to this situation for brevity as the parametric case, and denote the opposite situation, where the functional forms of the distributions are unknown, as the non-parametric case.”

– Wolfowitz, J. (1942, p. 264)
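To make the point concrete, here is a minimal sketch in which the same data are analysed under both kinds of model. The reaction times are invented and the function names are hypothetical. The parametric model assumes a normal distribution, so two parameters (mean and standard deviation) completely determine the distribution function, as in Wolfowitz’s definition. The non-parametric model assumes no functional form and uses the empirical distribution function instead.

```python
from statistics import NormalDist

# Hypothetical reaction times in milliseconds (assumed values, for illustration only).
reaction_times = [312, 298, 350, 405, 287, 330, 376, 298, 341, 420]


def parametric_cdf(data, x):
    """P(X <= x) under a normal model whose two parameters are estimated from the data."""
    model = NormalDist.from_samples(data)
    return model.cdf(x)


def nonparametric_cdf(data, x):
    """P(X <= x) from the empirical distribution function, with no assumed functional form."""
    return sum(value <= x for value in data) / len(data)


p_param = parametric_cdf(reaction_times, 350)
p_nonparam = nonparametric_cdf(reaction_times, 350)

print("normal model:   P(X <= 350) =", round(p_param, 3))
print("empirical model: P(X <= 350) =", p_nonparam)
```

The data are identical in both calls; only the model changes, and with it the estimate.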

References

Noether, G. E. (1984). Nonparametrics: The early years—impressions and recollections. American Statistician, 38(3), 173–178.

Wolfowitz, J. (1942). Additive Partition Functions and a Class of Statistical Hypotheses. The Annals of Mathematical Statistics, 13(3), 247–279.

“The tendency of empiricism, unchecked, is always anti-realist…”

“The tendency of empiricism, unchecked, is always anti-realist; it has a strong tendency to degenerate into some form of verificationism: to treat the question of what there is (and even the question of what we can – intelligibly – talk about) as the same question as the question of what we can find out, or know for certain; to reduce questions of metaphysics and ontology to questions of epistemology.”
—Strawson, G. (1987, p. 267)

Strawson, G. (1987). Realism and causation. The Philosophical Quarterly, 37, 253–277.

Theories explain phenomena, not data (Bogen and Woodward, 1988)

“The positivist picture of the structure of scientific theories is now widely rejected. But the underlying idea that scientific theories are primarily designed to predict and explain claims about what we observe remains enormously influential, even among the sharpest critics of positivism.” (p. 304)

“Phenomena are detected through the use of data, but in most cases are not observable in any interesting sense of that term. Examples of data include bubble chamber photographs, patterns of discharge in electronic particle detectors and records of reaction times and error rates in various psychological experiments. Examples of phenomena, for which the above data might provide evidence, include weak neutral currents, the decay of the proton, and chunking and recency effects in human memory.” (p. 306)

“Our general thesis, then, is that we need to distinguish what theories explain (phenomena or facts about phenomena) from what is uncontroversially observable (data).” (p. 314)

Bogen, J., & Woodward, J. (1988). Saving the phenomena. The Philosophical Review, XCVII(3), 303–352.

“A mechanism is one of the processes in a concrete system that makes it what it is”

What a lovely paper! Here are some excerpts:

‘A mechanism is one of the processes in a concrete system that makes it what it is—for example, metabolism in cells, interneuronal connections in brains, work in factories and offices, research in laboratories, and litigation in courts of law. Because mechanisms are largely or totally imperceptible, they must be conjectured. Once hypothesized they help explain, because a deep scientific explanation is an answer to a question of the form, “How does it work, that is, what makes it tick—what are its mechanisms?”’ (p. 182; abstract)

‘Consider the well-known law-statement, “Taking ‘Ecstasy’ causes euphoria,” which makes no reference to any mechanisms. This statement can be analyzed as the conjunction of the following two well-corroborated mechanistic hypotheses: “Taking ‘Ecstasy’ causes serotonin excess,” and “Serotonin excess causes euphoria.” These two together explain the initial statement. (Why serotonin causes euphoria is of course a separate question that cries for a different mechanism.)’ (p. 198)

‘How do we go about conjecturing mechanisms? The same way as in framing any other hypotheses: with imagination both stimulated and constrained by data, well-weathered hypotheses, and mathematical concepts such as those of number, function, and equation. […] There is no method, let alone a logic, for conjecturing mechanisms. […] One reason is that, typically, mechanisms are unobservable, and therefore their description is bound to contain concepts that do not occur in empirical data.’ (p. 200)

‘Even the operations of a corner store are only partly overt. For instance, the grocer does not know, and does not ordinarily care to find out, why a customer buys breakfast cereal of one kind rather than another. However, if he cares he can make inquiries or guesses—for instance, that children are likely to be sold on packaging. That is, the grocer may make up what is called a “theory of mind,” a hypothesis concerning the mental processes that end up at the cash register.’ (p. 201)

Bunge, M. (2004). How Does It Work?: The Search for Explanatory Mechanisms. Philosophy of the Social Sciences, 34(2), 182–210.