#CV

Neural Code for Faces

src: Chang, Le, and Doris Y. Tsao. 2017. “The Code for Facial Identity in the Primate Brain.” Cell 169 (6): 1013–1028.e14; also covered in a Nature news feature.

Remark. Their section “Computational Advantages of an Axis Metric over a Distance Metric” doesn’t make much sense to me: they compare a distance metric (distance to an exemplar face) against an axis metric, i.e. a non-linear model against a linear one, but that particular non-linear model is just a bad one. What’s more confusing is that their model already has non-linearities in the feature-extraction step; it’s only this last step that’s a projection, so really it’s the combination of the non-linear features and the linear projection that gets the job done.
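To make the contrast concrete, here’s a minimal numpy sketch (all data synthetic, nothing from the paper’s actual recordings) of an axis code versus an exemplar-distance code, plus a demonstration that a population of linear axis projections is fully decodable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 50-d "face space" features (i.e. post non-linear feature extraction).
faces = rng.normal(size=(100, 50))

# Axis metric: a cell responds with a linear projection onto a preferred axis.
axis = rng.normal(size=50)
axis /= np.linalg.norm(axis)
axis_responses = faces @ axis  # signed, varies linearly along the axis

# Distance metric: a cell responds according to distance from an exemplar face.
exemplar = rng.normal(size=50)
distance_responses = -np.linalg.norm(faces - exemplar, axis=1)

# A population of 50 independent axis cells is linearly invertible:
axes = rng.normal(size=(50, 50))       # one preferred axis per cell
population = faces @ axes.T            # responses of all cells to all faces
decoded = population @ np.linalg.inv(axes.T)
print(np.allclose(decoded, faces))     # True: the linear code loses no information
```

The decoding step is the point: a bank of axis cells preserves the full feature vector, whereas distance-to-exemplar responses discard direction information.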

Thoughts:

Robustness of Facial Embeddings

Facial embeddings (or, more precisely, the methods that produce them) seem very susceptible to image artefacts and other distortions of the image data.1 This is a common refrain from industry folks: many of the algorithms touted by academics are run on very clean, systematic images, and once you move away from those clinical datasets and venture into real life, there’s little guarantee your algorithms will recover anything useful. At least, that’s my impression; I do wish I had some data/evidence at my fingertips for this. Even more troubling, since these models are usually trained on Caucasian datasets, the lack of coverage makes one question the fidelity of their outputs even further.

Concretely, we have been working on a social-network/computer-vision problem, the gist of which is the following: we want to extract facial features from photographs of our cohort (rural villagers in Honduras) and test, for instance, whether friends look more similar. As noted above, the quality of the photographs varies drastically: we have photographs of driver’s licenses, partially lit faces, people captured in the background, and blurry photos. Of course, we did our best to salvage, normalize, or simply discard anything too poor in quality. On top of all that, we have no idea how the ethnicity of the individuals affects the model (facial embeddings). All in all, it makes one very wary of using such models.

Update

Clarification on Face Networks

Firstly, face embeddings (or vectors) are very different from word embeddings (language is its own special domain), but they’re also slightly different from generic vector representations of images. The embedding is still the penultimate layer of a network (e.g. VGG); the key difference is that you’re trying to capture the notion of a face, which is different from image classification or object detection.2 You could say it’s image classification where the number of categories is huge (individual faces), each with only slight variation and multiple samples per category, but that framing doesn’t quite fit. Thus the training process is different: in particular, they use siamese networks (feeding two different images of the same face into the same network), with a loss that minimizes the distance between congruous pairs of faces and maximizes the distance between incongruous ones. In fact, in a similar spirit to Word2Vec’s SGNS (skip-gram with negative sampling), you can do better: for a congruous pair \((a,b)\) and an incongruous face \(c\), push \(a\) closer to \(b\) than to \(c\). The key point is that the loss is different, and the hope is that the learned projections capture something fundamental about people’s faces.
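That triplet idea can be sketched in a few lines of plain numpy (toy 2-d embeddings; a FaceNet-style margin loss, not the exact loss of any particular paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull the anchor embedding toward the positive
    (same face) and push it away from the negative (different face),
    until the positive is closer by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-d embeddings (in practice these come out of the shared network).
a = np.array([1.0, 0.0])                  # anchor: image 1 of person X
p = np.array([0.9, 0.1])                  # positive: image 2 of person X
p /= np.linalg.norm(p)
n = np.array([0.0, 1.0])                  # negative: an image of person Y

print(triplet_loss(a, p, n))  # 0.0: the positive is already closer by > margin
```

During training the loss is averaged over many such triplets and backpropagated through the shared network; the `max(0, ...)` means well-separated triplets contribute nothing, so hard-triplet mining matters in practice.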

Goal

There are certain diagnostics one may use to check the output of such models: one involves looking at which pixels light up (saliency), but I think that’s usually only applicable when you have some sort of classification problem.

This points at a more fundamental goal: I would like the embedding model to come equipped with a confidence band, telling me how confident I should be that a given embedding is actually useful. Framed this way, it naturally relates to porting a notion of statistical significance to the outputs of neural networks.
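One crude proxy I could imagine (my own assumption, not something from the literature above): embed several perturbed copies of the same image and treat the stability of the embedding as a confidence score. A sketch with a stand-in embedding function:

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(image):
    """Stand-in for a real face-embedding network (purely hypothetical)."""
    v = image.mean(axis=(0, 1))              # fake 3-d "embedding": channel means
    return v / (np.linalg.norm(v) + 1e-9)

def embedding_confidence(image, n_perturbations=20, noise=0.05):
    """Proxy confidence: embed several noisy copies of the image and report
    the mean pairwise cosine similarity. An embedding that jumps around
    under small perturbations probably shouldn't be trusted."""
    embs = np.stack([embed(image + rng.normal(scale=noise, size=image.shape))
                     for _ in range(n_perturbations)])
    sims = embs @ embs.T                     # cosine sims (rows are unit vectors)
    iu = np.triu_indices(n_perturbations, k=1)
    return sims[iu].mean()

img = rng.uniform(size=(32, 32, 3))
conf = embedding_confidence(img)             # near 1.0 for this toy embed()
```

This gives no formal significance guarantee, but it at least flags images (blurry, badly lit) whose embeddings are artefact-driven rather than face-driven.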

Honduras Face Project

Given a signed social network, various demographics, and faces, what interesting questions can be answered?

Existing Literature

  • The Faces of Group Members Share Physical Resemblance (doi):
    • similar-looking people are more likely to be friends
    • not particularly surprising; the main question is one of causal direction
  • Multidimensional Homophily in Friendship Networks (link):
    • this paper doesn’t actually seem that interesting. not sure what linear model they’re using, but ignoring that, what they show is that the coefficients for two variables (same sex, same ethnicity) are positive while the coefficient on their interaction is negative.
    • but (I’m pretty sure) all that tells you is that the relationship isn’t additive, so you get diminishing returns for homophily.
    • this seems very plausible (they do mention it in the discussion): there are redundancies involved.
  • Attractiveness and Symmetry: much existing work shows that people rate averageness as more attractive
    • i.e. deviation from the norm is penalized.1 Except when dealing with high-fashion models, where singularity is prized; this was mentioned in a recent episode of Tyler’s podcast.
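The homophily-interaction point above is easy to illustrate with a toy logistic model (all coefficients invented, not the paper’s actual fit): positive main effects plus a negative interaction yield diminishing returns from the second shared trait.

```python
import numpy as np

def friendship_prob(same_sex, same_ethnicity,
                    b0=-2.0, b_sex=1.0, b_eth=1.0, b_int=-0.5):
    """Logistic model with positive main effects for each shared trait
    and a negative interaction between them (coefficients made up)."""
    logit = (b0 + b_sex * same_sex + b_eth * same_ethnicity
             + b_int * same_sex * same_ethnicity)
    return 1.0 / (1.0 + np.exp(-logit))

p00 = friendship_prob(0, 0)   # share neither trait
p10 = friendship_prob(1, 0)   # share sex only
p11 = friendship_prob(1, 1)   # share both

# Each shared trait raises the probability, but the negative interaction
# means the second shared trait adds less than the first did:
print(p11 > p10 > p00)                  # True
print((p11 - p10) < (p10 - p00))        # True: diminishing returns
```

Note the sub-additivity here comes from the interaction term on the logit scale; the sigmoid alone can also compress increments, which is why the paper’s additive-vs-interaction claim needs the coefficient sign, not just the probabilities.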