202009101158

Framework for Fairness

tags: [ proj:fairness ]

The general framework (population level) proceeds as follows:

Goal: write out examples within our framework. This helps us see how our framework differs from others, or where we might need to change the current version.

Mouzannar et al. 2018

Let’s consider (Mouzannar, Ohannessian, and Srebro 2018, “From Fair Decision Making to Social Equality,” http://arxiv.org/abs/1812.02952v1), as they probably get the closest to what our framework is trying to achieve, though with very important differences.

  • \(X\) is given by a score \(\theta\) (e.g. [GPA, SAT]) and \(G\in\left\{ A,B \right\}\), with \(X,G\) lying in the same sample space.
    • The first thing (that differs from our framework) is that they assume that all the information is encapsulated in the covariates, and there is no ambiguity. Thus, they define \(\widetilde{F}: \theta\to V = \left\{ 0,1 \right\}\) to be a deterministic evaluation of qualification.
      • example: if \(\theta\) is [GPA, SAT], then it says that this is sufficient to determine the eventuality of a student’s success, and that it is deterministic!
    • it’s subtly different from the way we think about this. They don’t actually care about \(X\): everything below acts on the population through \(\widetilde{F}(X)\), so they can basically ignore \(X\) itself.
    • instead of looking at the map \(\widetilde{F}(X)\), let us instead just focus on the distribution of success per group: \(\pi(V \given G) \in [0,1]\) collapses everything and just considers the success probability.
      • it’s really just \(P(\widetilde{F}(X) \given G)\).
  • \(Z\) is a little bit complicated in this case.
    • this next function is the ML “model”, which is given by \(\tau(V,G): \left\{ 0,1 \right\}\times\left\{ A,B \right\} \to [0,1]\), assigning individuals (completely determined by their covariates, remember) a probability of selection. All of this boils down to \(Z = F(X,G) = \tau(\widetilde{F}(X), G)\).
      • the flexibility comes in the \(\tau\), choosing what probability to assign individuals.
    • in some sense \(Y=Z\) here, at least from a utility perspective, since there’s no ambiguity about an individual’s success.
  • \(\phi\): with their deterministic success function, they are able to basically measure success directly. (Originally, I thought their measure was actually based on the ML model, i.e., the institution that performs the selection, but it turns out that’s not the case.)
    • This metric, \(\pi(\,\cdot \given G)\), is essentially what they track. And, as I said earlier, that’s the whole population encapsulated/reduced into four probabilities: \(\pi(v \given G)\) for \(v \in \left\{ 0,1 \right\}\) and \(G \in \left\{ A,B \right\}\).
  • \(D\): the metric allows us to write down the feedback mechanism
    • \(\pi^{next}(1) = \pi(1) f_1(\beta(0), \beta(1)) + \pi(0) f_0(\beta(0), \beta(1))\), where the \(\beta\)’s are the selection rates (convolving success rates with institution selection). Here is where I first got confused, because this doesn’t say anything about how the \(X\) or \(\theta\)’s change. But they don’t care about the nitty-gritty of that: the covariates can change in complicated ways, and all of the changes are expressed through the one crucial quantity, the probability of success.
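To make the feedback loop concrete, here is a minimal simulation of this per-group dynamic. The selection policy \(\tau\) and the feedback functions \(f_0, f_1\) below are illustrative assumptions of my own, not the paper’s specific choices.

```python
# Sketch of the per-group success dynamics from Mouzannar et al. (2018).
# The functional forms of tau, f1, and f0 are illustrative assumptions.

def step(pi1, tau, f1, f0):
    """One population update for a single group.

    pi1    : P(V=1 | G), the group's current success probability.
    tau    : selection policy, tau[v] = P(select | V=v, G).
    f1, f0 : probability that a currently qualified / unqualified
             individual is qualified in the next generation, as a
             function of the selection rates (beta(0), beta(1)).
    Implements pi^next(1) = pi(1) f1(b0, b1) + pi(0) f0(b0, b1).
    """
    b1 = pi1 * tau[1]          # beta(1): selected mass among the qualified
    b0 = (1 - pi1) * tau[0]    # beta(0): selected mass among the unqualified
    return pi1 * f1(b0, b1) + (1 - pi1) * f0(b0, b1)

# Illustrative feedback: selecting qualified people encourages
# qualification; selecting the unqualified discourages it slightly.
f1 = lambda b0, b1: min(1.0, 0.85 + 0.3 * b1 - 0.1 * b0)
f0 = lambda b0, b1: max(0.0, 0.05 + 0.5 * b1)

tau = {0: 0.1, 1: 0.9}         # institution mostly selects on V

pi_A, pi_B = 0.7, 0.3          # groups start unequal
for _ in range(50):
    pi_A = step(pi_A, tau, f1, f0)
    pi_B = step(pi_B, tau, f1, f0)

print(round(pi_A, 3), round(pi_B, 3))
```

Iterating the update for two groups that start unequal shows how a given \(\tau\) can (or cannot) drive the per-group success probabilities \(\pi(\,\cdot \given G)\) toward equality, which is exactly the quantity the paper tracks.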

Additional things that they consider:

  • (average) institutional utility, given by \(U(\tau)\).

Question: what are the pros/cons of this formulation of the social-aware machine learning problem?

Table of Examples

| Example    | Z                                     |
|------------|---------------------------------------|
| Hiring     | P(success)                            |
| Recidivism | P(reoffend)                           |
| Credit     | P(default)                            |
| Ads        | relevance/stickiness score -> ranking |
| CS Majors  | rank in class \| P(graduate)          |

Classical / Standard Machine Learning

Pre-fairness, the standard framework for ML is essentially that of a prediction problem: learn a map from covariates to outcomes that minimizes expected loss.

Online Learning and Bandit Theory are different frameworks that apply to sequential settings, where data arrives as a stream.
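As a contrast with the one-shot prediction setting, here is a minimal bandit loop (\(\epsilon\)-greedy over two Bernoulli arms); the payoff probabilities and \(\epsilon\) are made-up values for illustration.

```python
import random

random.seed(0)

# Minimal epsilon-greedy bandit loop: a sketch of the sequential
# setting, with two Bernoulli arms whose payoff probabilities are
# unknown to the learner.
true_p = [0.3, 0.7]            # arm payoff probabilities (hidden)
counts = [0, 0]                # pulls per arm
values = [0.0, 0.0]            # running mean reward per arm
epsilon = 0.1

for t in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(2)                  # explore
    else:
        arm = 0 if values[0] >= values[1] else 1   # exploit current best
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(counts, [round(v, 2) for v in values])
```

The point of the contrast: decisions here influence which data we observe next, so the learner faces an explore/exploit trade-off that the one-shot prediction framework has no room for.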