#adversarial_training

The Perils of Explainability

via Michelle.

This is a little confusing, so let’s go over it slowly.

The rise in the usage of black-box models and algorithms across all aspects of human life has raised concerns and red flags about their credibility and fairness, especially at the junction between the law and society. Concurrently, there has been a push to ensure that these algorithms are both fair and interpretable.1 These goals go hand in hand: it is easier to determine whether an interpretable model is fair. Of particular note (and relevance to this discussion) is the push from legal academics for some standard of explainability when using machine learning in the courtroom.

It’s important to note that this is nothing new. For the longest time, we have had statisticians as expert witnesses bring their wares into the courtroom, and I imagine on some occasions the methods they’ve used have been at the edge of interpretability.

So I think it’s helpful to take a step back and ask: why are the warning bells only being rung now?

  1. I think part of it stems from the fact that ML/AI has successfully seeped into the mainstream (probably due to FAANG being such an integral part of our lives now). This isn’t particularly interesting.
  2. More substantive is the fact that we, the people, have been sold this as some form of Artificial Intelligence, which comes parcelled with its own baggage of ideas. This makes the prospect of “having one’s fate decided by an AI” seem decidedly more nefarious than “a regression model was used in the analysis.”
  3. Perhaps the most reasonable (though not necessarily the most likely) reason is that the scope of problems tackled by our methods has grown, to the point where we’re now at the high table, so to speak, and making important decisions (and oftentimes being the arbiter, or a crucial component).

It seems to me that part of the issue here is the notion of autonomy, and where humans sit in the chain of decision-making. I would expect that, provided there’s a human in the loop, people will not be so worried.

At the outset, this feels like a very reasonable ask: in fact, it feels like a moral imperative for any black box used in deciding the fate of human lives to at least be explainable. Why do I say that? It somehow feels wrong for such decision-making to rest on the virtue of the machines.2 Perhaps this is two different biases coming into play.

Outline/Plan

  • xAI, what it is, why people are pushing it
  • what are the potential pitfalls?
  • solutions?
    • we should at least propose some solutions
    • make it robust to such effects: perhaps that’s the overarching theme in all this. We have all these potential feedback loops; how do we go about making sure that our systems are robust to them?
  • literature review (difficult)

Quantification (or is it Pontification)

In a similar spirit to #differential_privacy, let’s see if we can systematize or quantify this problem. It helps to think in terms of extremes: suppose everyone has access to the whole model, so they can reproduce all the results and play around with tweaking the inputs. Two problems immediately come to mind:

  • perhaps the model has blind spots (similar in spirit to #adversarial_training), allowing adversaries to create profiles whose model output would be deemed surprising
    • while this is an important consideration, we leave it to those who work in that area of research to robustify models.
  • even with a completely robust model, free of blind spots, one still has the potential problem of actors curating their covariates in such a way as to game the model (see the sketch after this list)
    • for instance, they might know how to answer the questionnaires behind the COMPAS dataset so as to minimize their supposed probability of reoffending
    • incentivized to move cities, or to have more (or fewer) children
    • anti-causal, perverse incentives/selection
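
To make this gaming concern concrete, here is a minimal sketch (a strawman, not a real attack) of how an actor with full access to the model could brute-force their most favourable presentation. Everything here is a hypothetical illustration: the function, the grid of mutable answers, and the assumption of a scikit-learn-style classifier exposing predict_proba are mine, not anything from the literature.

```python
import itertools

import numpy as np


def game_model(model, x, mutable_values):
    """Search a small grid of game-able covariates for the presentation of the
    same individual that minimizes the model's predicted risk.

    model: any fitted classifier exposing predict_proba.
    x: the individual's true covariate vector.
    mutable_values: maps a column index to the answers the actor could
        plausibly report (e.g. questionnaire items); all other covariates
        are held fixed.
    """
    x = np.asarray(x, dtype=float)
    columns = sorted(mutable_values)
    best_x = x
    best_score = float(model.predict_proba(x.reshape(1, -1))[0, 1])
    for combo in itertools.product(*(mutable_values[c] for c in columns)):
        candidate = x.copy()
        candidate[columns] = combo  # only the game-able covariates change
        score = float(model.predict_proba(candidate.reshape(1, -1))[0, 1])
        if score < best_score:
            best_x, best_score = candidate, score
    return best_x, best_score
```

Even this naive grid search only touches the covariates we would intuitively call mutable, which is exactly the distinction the weighting scheme below tries to capture.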

Let’s try to make this notion of gaming more formal:

  • we can categorize/quantify the degree to which a covariate is game-able, or mutable (mutability, taking a leaf from CS). Obviously it would be easier for everyone if we could do this automatically, but I feel it is important to distinguish things like race/gender from, say, how many books you read a night.
  • having weighted each covariate by its mutability, what we do next depends on the explainability method
    • we could adopt a sort of differential privacy framework, where we obfuscate any queries to the model with noise (i.e. if we don’t want to reveal the actual model but don’t mind allowing people to query it). The mutability weights would then somehow factor into the obfuscation, analogous to data privacy leakage, though here the leakage is of how game-able the model is (see the sketch after this list).
    • alternatively, when we create the surrogate models, we might want to make the approximation more vague for covariates with higher mutability
      • the problem here is that unless it’s specifically noted that this model is an approximation, people will still try to game it; they’ll just not succeed (whether that’s a problem, I can’t tell).
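
As a strawman for the noisy-query option, here is a minimal sketch of a mutability-weighted query interface. It is only differential-privacy-flavoured: there is no privacy budget or formal guarantee, and the mutability scores, the Laplace mechanism, and the choice to scale the noise by how far a what-if query drifts from the individual’s recorded covariates along mutable dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical per-covariate mutability scores: near 0 means effectively
# immutable (e.g. age), near 1 means trivially game-able (e.g. a
# self-reported habit such as books read per night).
MUTABILITY = np.array([0.05, 0.2, 0.9])


def noisy_query(model, x_query, x_recorded, rng, base_scale=0.02):
    """Answer a what-if query with Laplace noise whose scale grows with the
    mutability-weighted distance between the query and the individual's
    recorded covariates, so probes along game-able dimensions come back
    blurrier than probes along immutable ones."""
    x_query = np.asarray(x_query, dtype=float)
    x_recorded = np.asarray(x_recorded, dtype=float)
    gamed_distance = float(MUTABILITY @ np.abs(x_query - x_recorded))
    scale = base_scale * (1.0 + gamed_distance)
    score = float(model.predict_proba(x_query.reshape(1, -1))[0, 1])
    return float(np.clip(score + rng.laplace(0.0, scale), 0.0, 1.0))


# Usage would look something like:
#   rng = np.random.default_rng(0)
#   noisy_query(fitted_model, probe_vector, recorded_vector, rng)
```

The surrogate-model route would instead bake the vagueness into the explanation itself, e.g. by binning highly mutable covariates before fitting the surrogate; the noisy-query route keeps the underlying model exact but degrades the adversary’s view of it.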

Next Steps for Interpolation

Let’s record the kinds of experimental data that we are interested in for [[project-interpolation]].

  1. Replicating what’s been shown in the literature
  2. New Ideas