#semisupervised_learning

Pediatric Transfer Learning

A colleague is a pediatrician. A (very) common problem is: there are plenty of studies (RCTs) of a particular procedure/drug/intervention in adults, but very little data on how it works in children. Right now, you simply abandon all information from the adult clinical trials and run prohibitively expensive (and limited) trials on children. It seems like there should be some way to utilize those adult trials.

I guess this isn’t particular to pediatric research: it comes up anytime you have some more common/majority group of individuals and you want to study an under-represented group. (The more stupid thing would be to think that models trained on the majority group are somehow universal, when in fact they fail completely when applied to a minority: a problem I came across in the [[honduras-face-project]].)

Overall, this is very much like [[transfer-learning]].
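To make the analogy concrete, here is a minimal sketch of one simple transfer idea: fit a model on the plentiful adult data, then fit the child model with a penalty that shrinks it toward the adult coefficients. Everything here is an assumption for illustration (the simulated data, the linear model, the function names `fit_ols` and `fit_transfer_ridge`, and the penalty weight `lam`); it is not a method from the trials literature, just one way the "borrow from adults" intuition could be written down.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares coefficients (no intercept, for brevity)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_transfer_ridge(X_child, y_child, w_adult, lam):
    """Minimise ||y - X w||^2 + lam * ||w - w_adult||^2 in closed form.

    lam = 0 ignores the adult model entirely; a very large lam returns
    the adult coefficients almost unchanged.
    """
    d = X_child.shape[1]
    A = X_child.T @ X_child + lam * np.eye(d)
    b = X_child.T @ y_child + lam * w_adult
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
d = 5

# Assumption: the child effect is related to, but not identical to, the adult one.
w_true_adult = rng.normal(size=d)
w_true_child = w_true_adult + 0.2 * rng.normal(size=d)

# Simulated data: many adults, very few children.
X_adult = rng.normal(size=(2000, d))
y_adult = X_adult @ w_true_adult + rng.normal(size=2000)
X_child = rng.normal(size=(15, d))
y_child = X_child @ w_true_child + rng.normal(size=15)

w_adult = fit_ols(X_adult, y_adult)          # majority-group model
w_child_only = fit_ols(X_child, y_child)     # discard adult information
w_transfer = fit_transfer_ridge(X_child, y_child, w_adult, lam=5.0)

for name, w in [("adult only", w_adult),
                ("child only", w_child_only),
                ("transfer", w_transfer)]:
    err = np.linalg.norm(w - w_true_child)
    print(f"{name:>10}: coefficient error = {err:.3f}")
```

The knob `lam` encodes how much you believe the adult results carry over: the two extremes recover "abandon the adult trials" and "assume the adult model is universal."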

To a lesser extent, it’s like #semisupervised_learning (Wiki). There, though, the point is that you can use large quantities of unlabeled data to perform supervised learning better.

Figure 1: Classic semi-supervised data example
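A rough sketch of that classic setup, assuming the usual "two moons" toy data with only a handful of labeled points: labels are spread to unlabeled points through a nearest-neighbour graph. The helper names (`make_two_moons`, `knn_affinity`, `propagate_labels`), the noise level, the neighbour count, and the number of labeled points are all illustrative choices, and this is a bare-bones label propagation rather than any particular library's implementation.

```python
import numpy as np

def make_two_moons(n_per_class, noise, rng):
    """Two interleaved half-circles with Gaussian noise (toy data)."""
    t = np.linspace(0.0, np.pi, n_per_class)
    upper = np.column_stack([np.cos(t), np.sin(t)])
    lower = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])
    X = np.vstack([upper, lower])
    X = X + noise * rng.normal(size=X.shape)
    y = np.repeat([0, 1], n_per_class)
    return X, y

def knn_affinity(X, k):
    """Symmetric 0/1 matrix linking each point to its k nearest neighbours."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(sq_dists, axis=1)[:, :k]
    W = np.zeros((len(X), len(X)))
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)

def propagate_labels(W, y_partial, n_classes, n_iter=500):
    """Average neighbours' label scores, re-clamping the known labels each step.

    Entries of y_partial equal to -1 are treated as unlabeled.
    """
    labeled_idx = np.flatnonzero(y_partial >= 0)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalised transition matrix
    F = np.zeros((len(y_partial), n_classes))
    F[labeled_idx, y_partial[labeled_idx]] = 1.0
    clamp = F[labeled_idx].copy()
    for _ in range(n_iter):
        F = P @ F
        F[labeled_idx] = clamp
    return F.argmax(axis=1)

rng = np.random.default_rng(0)
X, y_true = make_two_moons(100, noise=0.05, rng=rng)

# Hide all labels except three per class.
y_partial = np.full(len(y_true), -1)
for c in (0, 1):
    keep = rng.choice(np.flatnonzero(y_true == c), size=3, replace=False)
    y_partial[keep] = c

W = knn_affinity(X, k=7)
y_pred = propagate_labels(W, y_partial, n_classes=2)
accuracy = (y_pred == y_true).mean()
print(f"labeled points: {(y_partial >= 0).sum()}, accuracy on all points: {accuracy:.2f}")
```

The unlabeled points do the real work here: they reveal the shape of each cluster, so a few labels are enough to name them.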

For some reason, it reminded me of a problem that a student was working on in causal inference. There, you have some trial/study with multiple resolutions/outcomes (e.g. some childhood intervention can affect high school grades, and also average salary down the road), and they were trying to come up with a causal inference framework to basically “fill-in-the-blank,” or “fill-out-the-square,” whatever the better phrase is. I forget the exact details of what’s available, but in some sense it’s just a missing data/imputation problem.

At a high level: we’re basically trying to learn functions under various structural assumptions (or, at the very least, learn that there’s no fixed, stable function).