#adversarial_training

The Perils of Explainability

via Michelle.

This is a little confusing, so let’s go over it slowly.

The rise in the usage of black-box models and algorithms across all aspects of human life has raised concerns and red flags about their credibility and fairness, especially at the junction between the law and society. Concurrently, there has been a push to ensure that these algorithms are both fair and interpretable.1 These goals go hand in hand: it is easier to determine whether an interpretable model is fair. Of particular note (and relevance to this discussion) is the push from legal academics for some standard of explainability when using machine learning in the courtroom.

It’s important to note that this is nothing new. For the longest time, we have had statisticians as expert witnesses bring their wares into the courtroom, and I imagine on some occasions the methods they’ve used have been at the edge of interpretability.

So I think it’s helpful to take a step back and ask: why are the warning bells only being rung now?

  1. I think part of it stems from the fact that ML/AI has successfully seeped into the mainstream (probably due to FAANG being such an integral part of our lives now). This isn’t particularly interesting.
  2. More substantive is the fact that we, the people, have been sold this as some form of Artificial Intelligence, which comes parcelled with its own baggage of ideas. This makes the prospect of “having one’s fate decided by an AI” seem decidedly more nefarious than “a regression model was used in the analysis.”
  3. Perhaps the most reasonable (though not necessarily the most likely) reason is that the scope of problems tackled by our methods has grown, to the point where we’re now at the high table, so to speak, and making important decisions (and oftentimes being the arbiter, or a crucial component).

It seems to me that part of the issue here is the notion of autonomy, and where humans sit in the chain of decision-making. I would expect that, provided there’s a human in the loop, people will not be so worried.

At the outset, this feels like a very reasonable ask: in fact, it feels like a moral imperative for any black box used in deciding the fate of human lives to at least be explainable. Why do I say that? It somehow feels wrong for such decision-making to rest on the virtue of the machines.2 Perhaps this is two different biases coming into play.

Outline/Plan

  • xAI, what it is, why people are pushing it
  • what are the potential pitfalls?
  • solutions?
    • we should at least propose some solutions
    • make it robust to such effects: perhaps that’s the overarching theme in all this. We have all these potential feedback loops; how do we go about making sure that our systems are robust to them?
  • literature review (difficult)

Quantification (or is it Pontification)

In a similar spirit to #differential_privacy, let’s see if we can systematize or quantify this problem. It helps to think in terms of extremes: suppose everyone has access to the whole model, so they can reproduce all the results and play around with tweaking the inputs. Two problems immediately come to mind:

  • perhaps the model has blind spots (similar in spirit to #adversarial_training), allowing adversaries to create profiles whose model output would be deemed surprising
    • while this is an important consideration, we leave it to those who work in that area of research to robustify models.
  • even with a completely robust model, free of blind spots, one still has the potential problem of actors curating their covariates in such a way as to game the model (see the sketch after this list)
    • for instance, they might know how to answer the questionnaires behind the COMPAS dataset so as to minimize their supposed probability of reoffending
    • incentivized to move cities, or to have more (or fewer) children
    • anti-causal, perverse incentives/selection
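
To make this gaming concern concrete, here is a minimal sketch (a strawman, not a real attack) of how an actor with full access to the model could brute-force their most favourable presentation. Everything here is a hypothetical illustration: the function, the grid of mutable answers, and the assumption of a scikit-learn-style classifier exposing predict_proba are mine, not anything from the literature.

```python
import itertools

import numpy as np


def game_model(model, x, mutable_values):
    """Search a small grid of game-able covariates for the presentation of the
    same individual that minimizes the model's predicted risk.

    model: any fitted classifier exposing predict_proba.
    x: the individual's true covariate vector.
    mutable_values: maps a column index to the answers the actor could
        plausibly report (e.g. questionnaire items); all other covariates
        are held fixed.
    """
    x = np.asarray(x, dtype=float)
    columns = sorted(mutable_values)
    best_x = x
    best_score = float(model.predict_proba(x.reshape(1, -1))[0, 1])
    for combo in itertools.product(*(mutable_values[c] for c in columns)):
        candidate = x.copy()
        candidate[columns] = combo  # only the game-able covariates change
        score = float(model.predict_proba(candidate.reshape(1, -1))[0, 1])
        if score < best_score:
            best_x, best_score = candidate, score
    return best_x, best_score
```

Even this naive grid search only touches the covariates we would intuitively call mutable, which is exactly the distinction the weighting scheme below tries to capture.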

Let’s try to make this notion of gaming more formal:

  • we can categorize/quantify the degree to which a covariate is game-able, or mutable (mutability, taking a leaf from CS). Obviously it would be easier for everyone if we could do this automatically, but I feel it is important to distinguish things like race/gender from, say, how many books you read a night.
  • having weighted each covariate by its mutability, what we do next depends on the explainability method
    • we could adopt a sort of differential privacy framework, where we obfuscate any queries to the model with noise (i.e. if we don’t want to reveal the actual model but don’t mind allowing people to query it). The mutability weights would then somehow factor into the obfuscation, analogous to data privacy leakage, though here the leakage is of how game-able the model is (see the sketch after this list).
    • alternatively, when we create the surrogate models, we might want to make the approximation more vague for covariates with higher mutability
      • the problem here is that unless it’s specifically noted that this model is an approximation, people will still try to game it; they’ll just not succeed (whether that’s a problem, I can’t tell).
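
As a strawman for the noisy-query option, here is a minimal sketch of a mutability-weighted query interface. It is only differential-privacy-flavoured: there is no privacy budget or formal guarantee, and the mutability scores, the Laplace mechanism, and the choice to scale the noise by how far a what-if query drifts from the individual’s recorded covariates along mutable dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical per-covariate mutability scores: near 0 means effectively
# immutable (e.g. age), near 1 means trivially game-able (e.g. a
# self-reported habit such as books read per night).
MUTABILITY = np.array([0.05, 0.2, 0.9])


def noisy_query(model, x_query, x_recorded, rng, base_scale=0.02):
    """Answer a what-if query with Laplace noise whose scale grows with the
    mutability-weighted distance between the query and the individual's
    recorded covariates, so probes along game-able dimensions come back
    blurrier than probes along immutable ones."""
    x_query = np.asarray(x_query, dtype=float)
    x_recorded = np.asarray(x_recorded, dtype=float)
    gamed_distance = float(MUTABILITY @ np.abs(x_query - x_recorded))
    scale = base_scale * (1.0 + gamed_distance)
    score = float(model.predict_proba(x_query.reshape(1, -1))[0, 1])
    return float(np.clip(score + rng.laplace(0.0, scale), 0.0, 1.0))


# Usage would look something like:
#   rng = np.random.default_rng(0)
#   noisy_query(fitted_model, probe_vector, recorded_vector, rng)
```

The surrogate-model route would instead bake the vagueness into the explanation itself, e.g. by binning highly mutable covariates before fitting the surrogate; the noisy-query route keeps the underlying model exact but degrades the adversary’s view of it.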

Next Steps for Interpolation

Let’s record the kinds of experimental data that we are interested in for [[project-interpolation]].

  1. Replicating what’s been shown in the literature
  2. New Ideas