AI Ethics, Bias, and Fairness

In Short

Algorithmic bias causes AI systems to produce systematically worse outcomes for certain groups, most often because they learn from historical data that encodes human prejudice. Fairness is a contested concept with multiple mathematical definitions that provably cannot all be satisfied simultaneously, which means every deployed system reflects an implicit ethical choice.

01. What It Is

Algorithmic bias is a systematic, repeatable error in an AI system that produces unfair outcomes, typically disadvantaging a protected class or demographic group. Unlike a random error that might affect anyone, bias consistently skews results in the same direction for the same groups. The bias is almost never intentional; it usually flows from the training data, the framing of the problem, or the choice of optimization target.

Fairness, in the technical sense, is a family of mathematical criteria used to measure whether an AI system treats groups equitably. The problem is that these criteria conflict with each other: satisfying one definition of fairness can make it impossible to satisfy another, a result proven formally by researchers Chouldechova (2017) and Kleinberg et al. (2016).

02. Why It Matters

AI systems now influence bail and sentencing decisions, hiring pipelines, credit scoring, medical diagnosis support, and benefit eligibility. When these systems are biased, the consequences are not statistical abstractions. People are denied jobs, held in pre-trial detention, denied loans, or misdiagnosed at higher rates based on their race, gender, age, or other characteristics. The scale of automated decision-making amplifies any bias present: a biased algorithm can affect millions of people simultaneously in ways a single biased human decision-maker cannot.

The EU AI Act (in force from 2024, major obligations phasing in 2025-2026) classifies many of these use cases as high-risk AI and mandates bias testing, transparency, and human oversight. Regulators, legislators, and courts in the US and EU remain active in this space as of 2026.

03. How It Works

Sources of bias

Historical bias in training data:
If the training data reflects past human discrimination, the model learns to replicate it. The COMPAS recidivism algorithm was trained on criminal justice data in which Black defendants were arrested and incarcerated at higher rates because of structural inequities in policing, not because of higher actual recidivism. The model reproduced this disparity.

Unrepresentative data:
MIT researcher Joy Buolamwini found that commercial facial recognition systems trained on datasets that were more than 75% male and more than 80% white produced error rates above 34% for darker-skinned women, compared to under 1% for lighter-skinned men. The training sets simply contained far more examples of one group.

Proxy variables:
An algorithm can produce racially discriminatory outcomes even if it never receives race as an input. Zip code is a well-documented proxy for race in the US because of residential segregation. Amazon's same-day delivery service initially excluded predominantly Black neighborhoods while optimizing on zip-code-level profitability. Barocas and Selbst describe such variables as "mere stand-ins for protected groups."
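The proxy effect can be sketched with a few lines of Python. Everything here is hypothetical: the two zip codes, the groups X and Y, the resident counts, and the rule that only `zip_1` is served are invented for illustration and do not describe any real system. The rule never looks at group membership, yet the service rates it produces differ sharply by group because zip code tracks group.

```python
# Hypothetical residents as (zip_code, group) pairs.
# Residential segregation is modeled by making each zip code mostly one group.
residents = (
    [("zip_1", "X")] * 9 + [("zip_1", "Y")] * 1
    + [("zip_2", "X")] * 2 + [("zip_2", "Y")] * 8
)


def serves(zip_code):
    """Group-blind rule: serve only the zip code assumed to be profitable."""
    return zip_code == "zip_1"


def service_rate_by_group(people):
    """Share of each group that ends up served by the group-blind rule."""
    served, total = {}, {}
    for zip_code, group in people:
        total[group] = total.get(group, 0) + 1
        served[group] = served.get(group, 0) + int(serves(zip_code))
    return {group: served[group] / total[group] for group in total}


rates = service_rate_by_group(residents)
for group, rate in sorted(rates.items()):
    print(f"group {group}: served {rate:.0%}")
```

In this toy data, 9 of 11 members of group X are served against 1 of 9 members of group Y, even though the rule only ever sees a zip code.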

Label bias:
If the ground-truth labels used in training were themselves the product of biased human judgment, the model learns to replicate that judgment. A hiring model trained on which candidates were hired in the past learns the hiring manager's preferences, not an objective measure of job fitness.

Feedback loops:
A biased prediction shapes the world it predicts. Predictive policing tools send more police to areas they flag as high-risk, which produces more arrests in those areas, which confirms the algorithm's prediction in the next training cycle. The bias amplifies over time.
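A minimal simulation can make the loop concrete. This is a sketch under stated assumptions, not a model of any real policing tool: there are two hypothetical districts with the same true offense rate, the district with more recorded arrests is flagged as the hot spot and receives a fixed 70% of patrols, and new recorded arrests are proportional to patrol presence. The function name and all parameter values are invented for illustration.

```python
def simulate_hotspot_feedback(recorded, rounds=10, patrols=100,
                              hot_share=0.7, true_rate=0.1):
    """Return district 0's share of recorded arrests after each round."""
    recorded = list(recorded)
    shares = []
    for _ in range(rounds):
        # The district with more recorded arrests is flagged as the hot spot.
        hot = 0 if recorded[0] >= recorded[1] else 1
        allocation = [patrols * (1 - hot_share)] * 2
        allocation[hot] = patrols * hot_share
        # Both districts share the same true rate, so new recorded arrests
        # depend only on where the patrols were sent.
        for district in range(2):
            recorded[district] += allocation[district] * true_rate
        shares.append(recorded[0] / sum(recorded))
    return shares


# District 0 starts with only slightly more recorded arrests (11 vs 10).
shares = simulate_hotspot_feedback([11, 10])
for round_number, share in enumerate(shares, start=1):
    print(f"round {round_number}: district 0 share = {share:.3f}")
```

Under these assumptions, district 0 starts with about 52% of recorded arrests and its share rises every round toward the 70% patrol allocation, although the underlying rates are identical. The data increasingly reflects where police were sent, not where offenses occurred.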

Fairness definitions and their conflicts

Demographic parity (equal selection rates across groups): The system selects applicants, defendants, or patients at the same rate regardless of group. This ignores genuine differences in base rates.

Equalized odds (equal true positive and false positive rates across groups): The system makes the same kinds of errors at the same rates for every group. ProPublica used this framing to criticize COMPAS, showing Black defendants were more likely to be falsely flagged as high risk.

Calibration (equal predictive accuracy at a given score across groups): A risk score of 70 means the same probability of reoffending regardless of race. Northpointe used this framing to defend COMPAS.
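The three definitions can be compared on the same predictions. The sketch below is illustrative only: the function name `group_fairness_report` and the toy labels are hypothetical, and positive predictive value (PPV) is used as a simplified binary stand-in for calibration, which properly applies to scores and not to yes/no predictions.

```python
from collections import defaultdict


def group_fairness_report(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and PPV for binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "tp" if truth == 1 else "fp"
        else:
            key = "fn" if truth == 1 else "tn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    report = {}
    for group, c in counts.items():
        total = sum(c.values())
        report[group] = {
            # Demographic parity compares this value across groups.
            "selection_rate": ratio(c["tp"] + c["fp"], total),
            # Equalized odds compares both of these across groups.
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
            # Binary stand-in for calibration (predictive parity).
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),
        }
    return report


# Toy data: two hypothetical groups of four people each.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = group_fairness_report(y_true, y_pred, groups)
for group, metrics in sorted(report.items()):
    print(group, {name: round(value, 2) for name, value in metrics.items()})
```

On this toy data the two groups have the same true positive rate but different selection rates, false positive rates, and PPVs, so the same predictions pass or fail depending on which criterion is applied.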

The formal impossibility result states that calibration and equalized odds cannot both be satisfied when base rates differ between groups, except in the degenerate case of a perfect predictor. Any choice between them is a value judgment, not a technical decision.
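The arithmetic behind the result can be checked directly. For a binary classifier, the confusion-matrix definitions imply the identity FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR), where p is the group's base rate; this is the relation Chouldechova (2017) uses. The sketch below plugs in hypothetical numbers (base rates of 0.3 and 0.5, PPV of 0.7, FNR of 0.4) that are chosen for illustration and are not COMPAS figures.

```python
def implied_fpr(base_rate, ppv, fnr):
    """False positive rate forced by the base rate, PPV and FNR of a group."""
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hold PPV and FNR equal across two hypothetical groups; vary only base rate.
fpr_low_base = implied_fpr(base_rate=0.3, ppv=0.7, fnr=0.4)
fpr_high_base = implied_fpr(base_rate=0.5, ppv=0.7, fnr=0.4)

print(f"base rate 0.3 -> FPR {fpr_low_base:.3f}")
print(f"base rate 0.5 -> FPR {fpr_high_base:.3f}")
```

With predictive value and false negative rate held equal, the group with the higher base rate is forced to a false positive rate of about 0.26 against about 0.11 for the other group. Equalizing the false positive rates would require giving up equality on one of the other two quantities.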

04. Key Terms

Disparate impact:
A facially neutral policy that disproportionately harms a protected group. Legally cognizable under the US Fair Housing Act, Equal Credit Opportunity Act, and Title VII.

Disparate treatment:
Explicit use of a protected characteristic in a decision.

Algorithmic impact assessment (AIA):
A structured review of an algorithm's potential harms, modeled on environmental impact statements. Advocated by NYU's AI Now Institute.

Responsible AI:
An umbrella term for practices, frameworks, and governance structures aimed at ensuring AI is fair, transparent, accountable, and safe. IBM, Microsoft, Google, and others publish responsible AI principles. The OECD AI Principles (adopted 2019, updated 2024) were the first intergovernmental standard on AI.

Explainability / XAI:
The practice of making AI decisions interpretable to affected parties and auditors. Directly related to bias detection, since an unexplainable decision cannot be audited for bias.
See also: Explainability and Interpretable AI (XAI).

05. Examples and Cases

COMPAS (2016):
ProPublica's analysis found that Black defendants who did not go on to reoffend were labeled high risk (false positives) at nearly twice the rate of white defendants who did not reoffend. Northpointe contested this by citing a different fairness metric, calibration. Both findings can be true at once, which illustrates the impossibility result directly.

Amazon hiring algorithm (2014-2018):
Amazon trained a resume-screening model on 10 years of hiring data from a predominantly male engineering workforce. The algorithm learned to penalize the word "women's" and to downgrade graduates of all-women's colleges. Amazon discontinued it in 2018.

Facial recognition and law enforcement:
Georgetown Law found in 2016 that roughly 117 million American adults were in facial recognition databases used by law enforcement, with Black Americans overrepresented due to their overrepresentation in mug-shot records. Combined with higher error rates for darker-skinned faces, this creates compounding bias risk.

Advertising targeting:
Latanya Sweeney at Harvard found that searches for distinctively Black names returned arrest-record ads at significantly higher rates than searches for distinctively white names, even though no race field was passed to the ad system.

06. Common Pitfalls and Misconceptions

"Removing the protected attribute eliminates the bias."
Proxy variables mean this almost never works. A model that does not receive race can still be racially biased through zip code, name patterns, or other correlated features.

"Fairness is a technical problem with a technical solution."
It is partly technical, but the choice of which fairness definition to optimize is a normative, political, and legal judgment. Engineers should not make this choice in isolation.

"A more accurate model is a fairer model."
Higher overall accuracy can coexist with severe disparities in error rates across groups. Accuracy is an aggregate; fairness is about distribution.
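A small worked example shows how the aggregate hides the distribution. The data is invented: a hypothetical majority group of 90 people and a minority group of 10, each represented as (true label, predicted label) pairs.

```python
def accuracy(pairs):
    """Fraction of (true_label, predicted_label) pairs that match."""
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


# Hypothetical outcomes: 2 errors among 90 majority-group members,
# 4 errors among 10 minority-group members.
majority = [(1, 1)] * 44 + [(0, 0)] * 44 + [(0, 1)] * 2
minority = [(1, 1)] * 3 + [(0, 0)] * 3 + [(0, 1)] * 4

overall_accuracy = accuracy(majority + minority)
majority_accuracy = accuracy(majority)
minority_accuracy = accuracy(minority)

print(f"overall:  {overall_accuracy:.1%}")
print(f"majority: {majority_accuracy:.1%}")
print(f"minority: {minority_accuracy:.1%}")
```

The model in this toy example is 94% accurate overall but only 60% accurate for the smaller group, because the larger group dominates the aggregate.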

"Audit it once at launch."
Bias can emerge or worsen over time as the world changes, as the model continues to learn, or as feedback loops build up. Regular auditing is required.

"Open-source data or widely used benchmarks are neutral."
Many widely used benchmark datasets replicate biases from their sources. ImageNet has documented issues with how it labels people.
