It is all too easy to say data holds a monopoly over the truth. Who’s to argue against numbers and statistics, statements of fact that are not opinionated but wholly objective?
We see this monopoly rear its head in our everyday rhetoric: “It’s just a fact,” an impassioned (or disconcertingly smug) student may declare in a virtual seminar. “That’s just what the numbers say.”
According to the numbers, the total undergraduate enrollment in degree-granting post-secondary institutions was 16.6 million students in 2018, marking a welcome 25% increase from 2000. Standing alone, this statistic projects a forward-looking trajectory for education equality.
What happens when another set of numbers disrupts the former’s narrative? Student loans have increased more than 26% in the past decade. Despite rising high school graduation rates, undergraduates must brave unprecedented price tags for a few years of higher education. While the average undergraduate expects to pay back their loans in six years, the reality is closer to twenty.
Then there are occasions when the numbers themselves are false, but the very whispering of an integer — in the mind of the listener — offers some semblance of legitimacy.
On multiple occasions, President Trump stated that “Coronavirus numbers [were] looking much better, going down almost everywhere,” and that cases were going “way down.” As fact-checked by The Atlantic, when Trump orated these numerical assertions back in May, American coronavirus cases were swelling upwards or at a standstill.
In a status quo riddled with political divisiveness, hot-headed debates, and a dizzying number of presidential tweets, it can seem as though numbers are our only true friend, enduring bastions of a long-forgotten rationality in the American socio-political landscape.
How can data possibly be subjective?
Rather than asking what the numbers say, let us turn to a more complicated inquiry. Whom do the numbers favor, and whom do they leave out?
Before data is collected, analysts must address the business of collecting, which is to say they must frame certain questions that will guide their research and fulfill what they seek to know or prove. For the sake of example, if I wanted to ascertain how a certain Harvard policy (say, a requirement that all students learn a programming language) would impact the lives of undergraduates, I would begin by asking my friends, peers, and even strangers how they felt about the policy.
Let us understand, however, how much power lies in the hands of the inquirer, the data collector. The very architecture of my curiosity and the very means of data collection, whether the mode of surveying or the framing of the questions, will regulate the responses of the surveyed and, as a result, will critically inform my understanding of the data collected, which in turn critically informs the opinions of the beholder.
Say I asked questions predominantly relating to future employability, and students’ responses indicated the mandatory programming policy was favorable. This would be recorded, and I would proceed to publish a study that presented data largely supporting this claim.
The questions themselves may have been fair, and I could have collected data in the most representative manner possible. What I have disregarded, however, is a slew of other inquiries that are just as important, including mental health and overall academic performance.
The consequence? I’ve just published a study that readers implicitly understand to be a holistic, objective representation of a policy’s effects. It is all too clear that the study is not that at all. Rather, it is a series of numbers and survey responses that have been designed by the hand of subjectivity. Perhaps as a disgruntled Harvardian facing the encroaching recruitment season, I fashioned my survey in a way that prioritized employability without even realizing it. Perhaps as a privileged undergraduate student, I did not stop to think about the financial implications of the policy I had so rigorously supported in my research.
This hypothetical study is a microcosm of the United States and its love affair with data-driven policy. A 2016 ProPublica publication titled “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks,” details the story of 18-year-old African American Brisha Borden and 41-year-old Vernon Prater.
While rushing to pick up her god-sister from school, Borden and a friend took a bike and scooter from a street in Fort Lauderdale. A neighbor called the police, and the two girls were arrested and charged with burglary and petty theft. Together, the bike and scooter were worth $80.
Prater had been arrested after shoplifting from a local Home Depot store. What he had stolen totaled $86.35.
The cases were remarkably similar, with a monetary difference of less than $10. “Yet something odd happened when Borden and Prater were booked into jail,” the ProPublica study explains: “A computer program spat out a score predicting the likelihood of each committing a future crime.”
“Borden – who is black – was rated a high risk. Prater – who is white – was rated a low risk.”
Upon eliminating the constants — the nature of the crime and the monetary sum of what was stolen — a probable independent variable emerges. Borden’s race, factored into the cryptic mechanics of the algorithm, helped produce a high-risk rating.
Aside from juvenile misdemeanors, Borden had no prior record. Prater, on the other hand, had been convicted of armed robbery and had served five years in prison.
It would be deeply remiss to pass over the clear disparities between the hypothetical Harvard policy and the very real implications of the recidivism algorithm. Whereas the undergraduate survey failed to be representative of students’ holistic needs, the ProPublica story illuminates an algorithmic tendency to perpetuate Black incarceration. These statistical mechanisms are used to inform and execute American policy every day.
Looking forward, how best can policymakers wield statistics as a champion, not an obscurer, of the truth? A welcome first step would be to recognize data collection’s relationship to personal biases, and the fact that no one study will be fully representative of all people who should be considered. Policy necessitates an ability to recognize nuance and value human stories on top of referencing numerical evaluations. Ethical, representative data collection necessitates an acknowledgment of historical and present bias; beyond mere acknowledgment, policymakers must have the resolve to act, to discontinue the institutionalized, racist calculus of culpability in our criminal justice system and beyond.
Shifting from the White House to the classroom, teachers should promote healthy skepticism of data amongst students, particularly given the rapidly technologizing landscape. Rather than encourage an immediate correlation between numbers and certainty, educators should highlight the biases that arise from statistics and their real-world consequences.
Data has long been regarded as the bastion of objectivity in a particularly political era. Looking closer at the numbers, however, one finds that there is arguably nothing more subjective than the cold, hard facts.
Image Credit: “Immersed in numbers” by Chris Khamken is licensed under CC BY-NC 2.0