A Quick Note
It’s been a while since I last posted on this site, but not to worry, I’m still very dedicated to producing more! My next project is proving to be a bit more intensive than my last few, so stay tuned as I dive into the MIMIC-III database for analysis and tutorials.
So what have I been doing in my spare time? Recently I got around to picking up Cathy O’Neil’s acclaimed data science book Weapons of Math Destruction which, as the cover explains, explores “How Big Data Increases Inequality and Threatens Democracy.” As a budding member of the data science community, I’ve heard a lot of speakers, podcasts, and critics cite O’Neil’s work and wanted to take a dive myself. One thing I’ve learned is that anyone involved in data, or working on its fringes, owes it to themselves and everyone around them to understand the impacts of data-driven decision making and to become fluent in data ethics. There is a weighty responsibility that comes with drawing conclusions from the algorithms we’ve written into the framework of our society, and we’re only just beginning to see their effects.
The TL;DR: I would highly recommend this book to anyone who works with data at any level of experience (analysts, engineers, scientists, statisticians, etc.), as well as to any team where data makes up even a portion of the work. It is not code- or math-heavy, and it will absolutely benefit readers of different backgrounds and levels of exposure.
What This Book Nails
O’Neil gives a number of examples of what can be called a “Weapon of Math Destruction” (or as she puts it, a “WMD”), and in each of these vignettes she explores the avenues that led to it and its effects on those on the receiving end. According to the book, an algorithm can be considered a WMD if it has the following three traits:
It is opaque, i.e. a black box where data goes in and results come out with little to no explanation of how
It is unregulated and difficult to contest
It is scalable, i.e. like a virus it can easily affect greater and greater populations
What O’Neil does expertly is show, with brutal honesty, that a mixture of hindsight, greed, and misconduct leads to the continued disenfranchisement, poverty, and exploitation of vulnerable populations. Of course, not all algorithms are created with nefarious intent, and not all are created equal. Some are genuinely meant to serve the public good, but the consequences aren’t felt until after they have already inflicted damage.
One of the examples she touches on that comes first to mind is the use of PredPol, a policing algorithm shrouded behind a self-proclaimed “decade of detailed academic research.” PredPol is meant to assist officers by highlighting and targeting areas more likely to experience crime. On the surface, this is a great implementation of machine learning, allowing police to focus their efforts in the places that need it most. What it does not do is address what becomes a mainstay in much of O’Neil’s book: self-fulfilling prophecies. By targeting areas of high crime, you wind up ramping up the incarceration rate of minorities and individuals of low socioeconomic status. This leads to jarring statistics such as the Pew Research Center’s report that “In 2017, blacks represented 12% of the U.S. adult population but 33% of the sentenced prison population. Whites accounted for 64% of adults but 30% of prisoners. And while Hispanics represented 16% of the adult population, they accounted for 23% of inmates.” So from here we see that minorities make up an alarmingly disproportionate share of our country’s incarcerated population. This is a much, much more complex problem than simply saying “the numbers show minorities commit more crime” when we are in fact directing disproportionate attention to petty crimes rather than to “white collar” crimes like tax evasion and fraud. By confining the goal of stopping crime to a vacuum of generalized locale, we produce the very statistics we seek to eliminate.
One of the most interesting takeaways here requires stepping back and realizing that PredPol, and many of the algorithms in the book, really do produce the results they promise. It’s the design and implementation of the algorithm that are fraught with incongruous assumptions, motives, and oversight. Other examples that WMD touches on include:
2016 polling algorithms that led candidates to misallocate campaign effort across states, potentially costing valuable votes
For-profit colleges preying on low-credit individuals
Loan companies targeting poverty-stricken individuals with high-interest payday loans
And many others. Another that really caught my attention was the use of health data in the workplace. Many employers offer incentives to employees in exchange for their health data, often monetary ones like cutting health insurance costs. And while this comes voluntarily, few are in a position to pass it up. At the same time, this arms companies with the ability to profile, analyze, and target their own workers using the statistics they’re given. Pay specific attention to the use of “body mass index” (BMI), a pretty bogus approximation of a person’s health in relation to their weight, developed by an astronomer in the 1830s. This equation (a person’s weight in kilograms divided by the square of their height in meters) was calibrated to the “average man,” not the average woman or the average African American or anyone who isn’t specifically a male (and likely a white one). How can we possibly expect to equitably apply such faulty logic sweepingly across the board to all races and genders? It is through examples like these that WMD really hammers home a healthy skepticism of algorithmic engineering and implementation. By the end of the journey you may even put down the book wondering if there isn’t a stream of data being misused or mishandled behind an opaque door.
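For reference, the standard BMI formula uses height in meters (not centimeters), which is part of why a one-line population statistic gets treated as an individual health measure so easily. A minimal Python sketch of the calculation:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters, squared.

    Note: as the book argues, this is a population-level summary from the
    1830s, calibrated on a narrow group; it is a crude proxy for any one
    person's health.
    """
    return weight_kg / height_m ** 2

# Example: 70 kg at 1.75 m
print(round(bmi(70, 1.75), 1))  # 22.9
```

The simplicity of the formula is exactly the point: there is no model of age, sex, body composition, or ancestry anywhere in it, yet it gets applied sweepingly to all of them.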
Where It Could Have Been Better
Don’t get me wrong: I was hesitant to even include a section of negatives, and I wouldn’t really portray this as a negative (I only have one point to make).
While reading, I wished O’Neil had touched a bit more on the raw concepts of data integrity, mishandling, and human error. While I understand that the focus of this book is on social impacts and ripple effects, I personally would have found it beneficial to see some attention to the effects of misreported or misinterpreted data in more scientific subject areas.
For example, in my own line of work in medicine, hospital data can be messy in everything from electronic medical records, to medical device data streams, to natural language processing of notes. I fear for any proclaimed machine learning algorithm that touts the ability to accurately predict the onset of disease, expertly understand a doctor’s intentions, or profile patients. Data science in health care is entering a golden age of innovation, which means it is also prone to error. What if we deployed an incorrect medication dose based on the promise of AI? What if it harmed patients? How would you even approach that from a litigation standpoint? To me these scenarios are additional considerations of responsible data science and algorithmic execution. They do not take advantage of or prey on vulnerable populations like the examples in WMD do, but I think they are important to consider nonetheless.
What I most appreciate about Weapons of Math Destruction is its call to action, imploring the reader to think skeptically, to tread carefully with data, and not only to reach equitable, ethical, logical conclusions but also to accept that logic won’t always be as impartial or unbiased as we’ve been led to believe. O’Neil expertly delivers complicated, technical ideas in a way that non-technical readers can understand. I wouldn’t consider it daunting for anyone who has at least an interest in data and its use in the real world. Please give it a read and let me know what you think below!