# Data imbalance and algorithmic fairness
The potential for bias in machine learning models is an existential problem. Even the perception of bias has the potential to limit the adoption of this technology and relegate its use to hand-written digit recognition or scouring social media for cat images. In his 2020 NeurIPS address, Charles Isbell wrote:
> Successful technological fields have a moment when they become pervasive, important, and noticed. They are deployed into the world and, inevitably, something goes wrong. A badly designed interface leads to an aircraft disaster. A buggy controller delivers a lethal dose of radiation to a cancer patient. The field must then choose to mature and take responsibility for avoiding the harms associated with what it is producing. Machine learning has reached this moment. In this talk, I will argue that the community needs to adopt systematic approaches for creating robust artifacts that contribute to larger systems that impact the real human world. I will share perspectives from multiple researchers in machine learning, theory, computer perception, and education; discuss with them approaches that might help us to develop more robust machine-learning systems; and explore scientifically interesting problems that result from moving beyond narrow machine-learning algorithms to complete machine-learning systems.
Charles is a character; I urge you to watch his talk even if you are not interested in this project. There are many ways to proceed, but below I sketch a couple of possibilities.
- There is no fixed definition of algorithmic bias. Begin by reading this survey focused on quantitative definitions of fairness.
- My favorite method of assessing fairness in two-class classification tasks uses the detection error trade-off (DET) curve. As we have seen, such problems involve a trade-off between false positive and false negative error rates, which is what the DET curve encodes. DET curves are easy to implement in Python. Most importantly, one way to measure whether a classification algorithm treats two populations disparately is to compare the DET curves computed separately for each group; a minimal sketch follows this list.
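Here is one way such a comparison might look, using scikit-learn's `det_curve`. The labels, scores, and group assignments below are synthetic placeholders standing in for whatever your trained classifier produces on held-out data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import det_curve

rng = np.random.default_rng(0)

# Synthetic stand-ins: replace y_true, y_score, and group with your model's
# held-out labels, scores, and the protected attribute of each example.
n = 5000
group = rng.choice(["A", "B"], size=n, p=[0.8, 0.2])
y_true = rng.integers(0, 2, size=n)
noise = np.where(group == "A", 1.0, 1.5)       # give group B noisier scores
y_score = y_true + noise * rng.normal(size=n)

# One DET curve per group; disparate treatment shows up as separated curves.
for g in ["A", "B"]:
    mask = group == g
    fpr, fnr, _ = det_curve(y_true[mask], y_score[mask])
    plt.plot(fpr, fnr, label=f"group {g}")

plt.xlabel("false positive rate")
plt.ylabel("false negative rate")
plt.legend()
plt.title("Per-group DET curves")
plt.show()
```

If you prefer not to plot by hand, scikit-learn's `DetCurveDisplay` produces the same kind of plot directly.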
I have two possible project ideas:
Project A. Find a data set, train a classifier, and construct separate DET curves for each of the populations present in the data. Is your algorithm fair, at least from this perspective? Look at Section 3 of this paper for some commonly studied data sets, or examine this site for another large collection. A sketch of one such pipeline appears below.
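As an illustration only, here is how that pipeline might look under two assumptions that are mine, not requirements: the OpenML "adult" census data set (a common choice in the fairness literature) and "sex" as the group attribute. Only the numeric features are used to keep the sketch short; a serious attempt would encode the categorical features as well.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import det_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed data set: OpenML "adult"; column names and labels may differ by version.
adult = fetch_openml("adult", version=2, as_frame=True)
X = adult.data.select_dtypes("number")      # numeric features only, for brevity
y = (adult.target == ">50K").astype(int)    # 1 = high income
group = adult.data["sex"]                   # protected attribute, used only for evaluation

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# One DET curve per group, exactly as in the synthetic sketch above.
for g in g_te.unique():
    m = (g_te == g).to_numpy()
    fpr, fnr, _ = det_curve(y_te[m], scores[m])
    plt.plot(fpr, fnr, label=str(g))
plt.xlabel("false positive rate")
plt.ylabel("false negative rate")
plt.legend()
plt.show()
```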
Project B. An oft-cited cause of algorithmic bias is data imbalance: a group with fewer members will tend to be more poorly served by an algorithm that relies on a large volume of training data. In this project, evaluate the impact of data imbalance on fairness. Train a model on a data set consisting of two groups and artificially vary the balance between the two in the training data. At what point does a difference emerge? As above, you can use DET curves to assess fairness, or choose your own favorite metric. A sketch of such an experiment follows this paragraph.
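One way the experiment might be structured, again with synthetic data standing in for whatever data set you choose: progressively thin out group "B" in the training set and track the gap between the groups' false negative rates at a fixed false positive rate, read off each group's DET curve.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import det_curve
from sklearn.model_selection import train_test_split

def fnr_at_fpr(y_true, scores, target_fpr=0.05):
    """Read the false negative rate off the DET curve at a fixed FPR."""
    fpr, fnr, _ = det_curve(y_true, scores)
    order = np.argsort(fpr)                 # np.interp needs increasing x values
    return np.interp(target_fpr, fpr[order], fnr[order])

rng = np.random.default_rng(0)

# Synthetic two-group data; replace with the data set you chose above.
n = 20000
g = rng.choice(["A", "B"], size=n, p=[0.5, 0.5])
X = rng.normal(size=(n, 5)) + (g == "B")[:, None] * 0.5   # groups differ slightly
y = (X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, g, test_size=0.3, random_state=0)

for keep_frac in [1.0, 0.5, 0.2, 0.1, 0.05]:
    # Keep all of group A and only a random keep_frac of group B for training.
    keep = (g_tr == "A") | (rng.random(len(g_tr)) < keep_frac)
    clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    s = clf.predict_proba(X_te)[:, 1]
    gap = (fnr_at_fpr(y_te[g_te == "B"], s[g_te == "B"])
           - fnr_at_fpr(y_te[g_te == "A"], s[g_te == "A"]))
    print(f"group B keep fraction {keep_frac:4.2f}  FNR gap at 5% FPR: {gap:+.3f}")
```

The fixed-FPR summary is just one convenient scalar; comparing the full DET curves at each imbalance level, as in Project A, tells a richer story.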