A/B Testing in Networks
source: h.
Summary
- Typical A/B testing (an RCT) randomizes subjects uniformly across the whole population.
- the \(t\)-test comparing treatment and control assumes that each subject's response depends only on its own treatment assignment, not on anyone else's (SUTVA)
- this assumption is usually taken for granted, but it breaks down when subjects can interact
- example: a peer effect, where if enough of your friends have the treatment, you respond positively
- when you have (social) networks, or just people interacting on the same system, what do you do?
- expand your unit of experiment: instead of the individual subject, use the connected components of your graph. to be explicit, this simply means assigning the same treatment to everyone in a component.
- however, in most social network settings, we have a very large component (and you get small diameter for free)
- then you’re basically stuck. you could partition into subgroups, but there will always be edges between groups
- this could become a graph-cut (aka community detection) problem, where you look for a partition that minimizes the number of edges between groups.
- in certain settings, the networks are collections of small components. then you’re good to go.
- my feeling/intuition is that academics will try to measure this network effect. getting into #causal_inference territory.
- oftentimes, this is done post-hoc, so rerunning the experiment is not possible.
- so, if you widen your unit of experiment from subjects to components (or clusters), you lose power
- fewer experimental units
- differences in units (large vs small, different behaviours)
- it's like the propensity score matching problem: you want similar units, so that it's easier to isolate the treatment effect
- solution, part (a): stratify, i.e. group similar units into strata and randomize proportionately within each stratum
- solution, part (b): expand your model
- the usual model is \(y_i \sim \mu + \tau t_i\), where \(t_i\) is the treatment indicator and \(\tau\) is our (average) treatment effect.
- because we have these mini-units, we can model further, so that the response is a function not just of the treatment \(t_i\) but also of the group structure within the unit.
- spill-over effects:
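- \(y_i \sim \mu + \tau t_i + \gamma\, a_i \cdot T\), where \(a_i\) is the \(i\)-th column of the adjacency matrix \(A\) and \(T\) is the vector of treatment indicators, so \(a_i \cdot T\) is just a cute way to count the number of treated neighbours
- when the unit is a connected component, every neighbour of \(i\) has the same treatment as \(i\) by construction, so \(a_i \cdot T = t_i\,(a_i \cdot \mathbf{1}) = t_i d_i\), where \(d_i\) is the degree of \(i\); the model becomes \(y_i \sim \mu + \tau t_i + \gamma\, t_i d_i\).
- this is a first-order spill-over effect (only direct neighbours count), and the model assumes it is simply additive and linear in the number of treated neighbours.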
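A minimal sketch of the summary's two ideas together, under assumptions that are mine rather than the source's: a made-up network of 400 disjoint paths (1 to 5 nodes each), one fair coin per connected component, and hypothetical true parameters \(\mu = 1\), \(\tau = 0.5\), \(\gamma = 0.3\). It randomizes at the component level, checks that \(a_i \cdot T = t_i d_i\), and fits \(y_i \sim \mu + \tau t_i + \gamma\, a_i \cdot T\) by ordinary least squares. The helper names (`connected_components`, `cluster_randomize`, `ols`) are illustrative only, and it uses the standard library so it runs anywhere.

```python
import random
from collections import deque


def connected_components(n, edges):
    """Label nodes 0..n-1 with a component id (BFS); also return adjacency lists."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = [-1] * n
    next_id = 0
    for start in range(n):
        if comp[start] != -1:
            continue
        comp[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = next_id
                    queue.append(v)
        next_id += 1
    return comp, adj


def cluster_randomize(comp, rng, p=0.5):
    """One coin flip per component; every node inherits its component's arm."""
    arm = {c: int(rng.random() < p) for c in sorted(set(comp))}
    return [arm[c] for c in comp]


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


# Hypothetical network: 400 disjoint paths of 1-5 nodes (many small components).
rng = random.Random(0)
edges, n = [], 0
for _ in range(400):
    size = rng.randint(1, 5)
    edges += [(n + i, n + i + 1) for i in range(size - 1)]
    n += size

comp, adj = connected_components(n, edges)
t = cluster_randomize(comp, rng)
exposure = [sum(t[j] for j in adj[i]) for i in range(n)]  # a_i . T
# Component-level assignment: a_i . T == t_i * deg(i) for every node.
assert all(exposure[i] == t[i] * len(adj[i]) for i in range(n))

# Simulated outcomes under assumed parameters mu=1.0, tau=0.5, gamma=0.3.
y = [1.0 + 0.5 * t[i] + 0.3 * exposure[i] + rng.gauss(0, 0.1) for i in range(n)]
mu_hat, tau_hat, gamma_hat = ols([[1.0, t[i], exposure[i]] for i in range(n)], y)
print(f"mu={mu_hat:.2f} tau={tau_hat:.2f} gamma={gamma_hat:.2f}")
```

The estimates should land close to the assumed parameters. Note that \(\gamma\) is only identifiable here because treated nodes have varying degree; if every node in the treated components had the same degree, the \(t_i d_i\) column would be collinear with \(t_i\).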
Highlights
One particular area involves experiments in marketplaces or social networks where users (or randomized samples) are connected and treatment assignment of one user may influence another user’s behavior.
Typical A/B experiments assume that the response or behavior of an individual sample depends only on its own assignment to a treatment group. This is known as the Stable Unit Treatment Value Assumption (SUTVA). However, this assumption no longer holds when samples interact with each other, such as in a network. An example of this is when the effects of exposed users can spill over to their peers.
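To make the SUTVA violation concrete, here is a small simulation under assumed conditions (none of them from the source): a ring-lattice network where each user is linked to their 3 nearest users on each side, the linear spillover model from the summary with hypothetical parameters \(\tau = 0.5\), \(\gamma = 0.3\), and ordinary per-user coin-flip randomization. It compares the naive difference in means with the "global" effect of launching to everyone versus no one.

```python
import random


def ring_lattice(n, k):
    """Neighbour lists for a ring where each node links to its k nearest nodes on each side."""
    return [[(i + d) % n for d in range(-k, k + 1) if d != 0] for i in range(n)]


def outcomes(adj, t, mu, tau, gamma):
    """Assumed spillover model: y_i = mu + tau * t_i + gamma * (number of treated neighbours)."""
    return [mu + tau * t[i] + gamma * sum(t[j] for j in adj[i]) for i in range(len(adj))]


def diff_in_means(y, t):
    """The usual A/B estimate: mean outcome in treatment minus mean outcome in control."""
    treated = [yi for yi, ti in zip(y, t) if ti]
    control = [yi for yi, ti in zip(y, t) if not ti]
    return sum(treated) / len(treated) - sum(control) / len(control)


n, k = 10_000, 3
mu, tau, gamma = 1.0, 0.5, 0.3  # hypothetical parameters
adj = ring_lattice(n, k)
rng = random.Random(1)
t = [int(rng.random() < 0.5) for _ in range(n)]  # per-user coin flips

naive = diff_in_means(outcomes(adj, t, mu, tau, gamma), t)
# Global effect: launch to everyone vs. launch to no one.
global_effect = (sum(outcomes(adj, [1] * n, mu, tau, gamma))
                 - sum(outcomes(adj, [0] * n, mu, tau, gamma))) / n
print(f"naive A/B estimate: {naive:.2f}, global effect: {global_effect:.2f}")
```

Because each user's neighbours are equally likely to be treated in either arm, the spillover term roughly cancels out of the comparison: the naive estimate comes out near \(\tau = 0.5\), while the assumed global effect is \(\tau + 6\gamma = 2.3\).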
Experiments in social networks must account for “spillover” or influence effects from peers.
We lose experimental power. This comes from two factors: fewer experimental units, and greater inherent difference across experimental units.
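One standard way to quantify this loss, assuming equal-sized clusters of \(m\) users and an intracluster correlation \(\rho\) of the outcome, is Kish's design effect for cluster sampling:

\[
\mathrm{DEFF} = 1 + (m - 1)\rho, \qquad n_{\text{eff}} = \frac{n}{\mathrm{DEFF}}
\]

For example, with \(n = 10{,}000\) users in clusters of \(m = 50\) and an assumed \(\rho = 0.1\), \(\mathrm{DEFF} = 1 + 49 \times 0.1 = 5.9\), so the experiment has roughly the precision of \(10{,}000 / 5.9 \approx 1{,}695\) independently randomized users.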
Cluster samples into more homogeneous strata and sample proportionately from each stratum.
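A minimal sketch of stratified cluster assignment, assuming the clusters (e.g. connected components or graph-cut communities) are already formed and that cluster size is the stratification variable; a real design might stratify on pre-period metrics instead. The function name, its parameters, and the Pareto-distributed cluster sizes are hypothetical.

```python
import random


def stratified_cluster_assignment(cluster_sizes, n_strata, treat_frac, rng):
    """Assign whole clusters to treatment, randomizing separately within size strata.

    Clusters are sorted by size and cut into n_strata contiguous groups of
    (nearly) equal count, so each stratum holds clusters of similar size.
    Within each stratum, round(treat_frac * len(stratum)) clusters are treated.
    """
    order = sorted(cluster_sizes, key=cluster_sizes.get)
    strata = [order[s * len(order) // n_strata:(s + 1) * len(order) // n_strata]
              for s in range(n_strata)]
    assignment = {}
    for stratum in strata:
        treated = set(rng.sample(stratum, round(treat_frac * len(stratum))))
        for c in stratum:
            assignment[c] = int(c in treated)
    return assignment, strata


rng = random.Random(42)
# Hypothetical clusters with a skewed (heavy-tailed) size distribution.
sizes = {f"c{i}": int(rng.paretovariate(1.5)) + 1 for i in range(200)}
assignment, strata = stratified_cluster_assignment(sizes, n_strata=4, treat_frac=0.5, rng=rng)
for s in strata:
    print(len(s), sum(assignment[c] for c in s))  # 50 25 for each stratum
```

Each stratum gets the same treated fraction, so a few very large clusters cannot all land in one arm by chance.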
Another aspect of concern is that treatment can in theory affect not just experiment metrics but also the graph topology itself. Thus graph evolution events also need to be tracked over the course of the experiment.
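A minimal sketch of such tracking, assuming periodic snapshots of an undirected edge list and a node-to-arm map with hypothetical labels `"T"` and `"C"`. Counting how many edges form or dissolve within and across arms gives a first signal of whether treatment is reshaping the graph. All names here are illustrative.

```python
from collections import Counter


def edge_key(u, v):
    """Undirected edge as an order-independent tuple."""
    return (u, v) if u <= v else (v, u)


def arm_pair(u, v, arm):
    """Label an edge by the arms of its endpoints, e.g. 'C-T' for a cross-arm edge."""
    return "-".join(sorted((arm[u], arm[v])))


def graph_changes(before, after, arm):
    """Count edges added and removed between two snapshots, broken down by arm pair."""
    b = {edge_key(u, v) for u, v in before}
    a = {edge_key(u, v) for u, v in after}
    added = Counter(arm_pair(u, v, arm) for u, v in a - b)
    removed = Counter(arm_pair(u, v, arm) for u, v in b - a)
    return added, removed


arm = {1: "T", 2: "T", 3: "C", 4: "C"}
before = [(1, 2), (2, 3)]
after = [(2, 1), (3, 4), (1, 3)]
added, removed = graph_changes(before, after, arm)
print(dict(added), dict(removed))
```

In this toy example one control-control edge and one cross-arm edge appear, and one cross-arm edge disappears; `(2, 1)` is recognized as the same edge as `(1, 2)`.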