#embeddings

Linguistic Neuroscience

Wild Conjectures

As somewhat of an interdisciplinary nut (like a gun-nut, except with interdisciplinary research instead of guns, duh!), it comes as no surprise that I think the advent of computational linguistics has really helped us better understand language. Something I try to emphasize in class is the uniqueness of language and words as a data form. In some (highly reductionist) sense, it’s just categorical data. For one thing, that’s definitely not how we would think about it at the outset — yet the moment you start thinking statistically, you realize it’s a very natural formulation. For another, there’s all this interesting structure: rules (i.e. grammar), meaning, relationships, logic, intelligence. In fact, as we’ve sort of discovered from [[gpt3]], it is quite the proxy for general intelligence. And in between all this are the distributed-semantic-representation lessons from #embeddings. But I digress.

A natural question to ask is: are all these things we’ve learned about language and words back-propagatable to the way our brains understand and organize language? It is rather tempting to hope that word-embedding representations might be similar to the way our brains encode words. And it turns out there is empirical evidence to suggest that something interesting is going on, though I need to read the literature more carefully. This piece is an attempt to take a skeptical view of all this.

Firstly, I’m pretty sure that we have no idea how our brains handle words/language. What we can do is the next best thing, which is to see what our brain patterns (e.g. fMRI scans) look like when we hear particular words: that is, we can equate words to “brain activation patterns.” I’m being particularly vague here because there are many modalities of such patterns, from magnetic fields to blood-oxygen levels, the details of which I haven’t bothered to commit to memory (this reminds me of conversations from #vbw). Probably many millions of human-brain-hours have been spent tackling this problem, so I don’t think this little brain will make any particularly groundbreaking inroads here.

A few remarks though:

  • I think you understand the mechanics of a system best when it’s under stress/strain: in my case, when doing things like word problems/crossword puzzles, or whenever a word is at the tip of my tongue.
    • I’m not really sure what space I’m traversing (if any) when I’m reaching for a particular word that’s at the tip of my tongue, but in any case I feel it must be a confluence of sounds/muscle-memory/meaning/memory.
    • There must be individual differences in terms of degrees of abstract thought.
  • I find that Chinese and English are very different systems in my head (which can be partially explained by my not-so-great Chinese). In particular, I think rather phonetically when it comes to Chinese, which has the funny byproduct that I’m much better at making puns in Chinese than my innate ability would otherwise suggest.
  • I just find it hard to believe that the way I reach for words in my head is by traversing some space: in particular, that would suggest I’d often mistake words that are very close in said space (which rarely happens).

All this is to say: my default position is that the distributed representations of words might well be correlated with the brain’s, but that correlation is almost tautological: that is, if you have any good representation of words, then it must necessarily be correlated with the way our brains process words. Having written that statement out, it actually feels like quite a strong one, and I’m not even sure it’s true. I think what my intuition is getting at is this: the act of finding a representation is like collapsing something down to a finite-dimensional vector space, and the process of doing so makes any potential equivalence/correlation moot.

The problem is that what I’ve described above is almost impossible to falsify!

Another way to frame this is that the sheer complexity of language means that if you’re squeezing it down (a little like a compression algorithm) into a small space such as \(\mathbb{R}^{300}\) (funny how quickly things become small), then can any two reasonable representations really be uncorrelated?
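To make the “squeezing down” concrete, here is a toy sketch (pure Python, entirely synthetic data — all dimensions and names are my own, not anything from the literature) of a Gaussian random projection: compressing points from a 400-dimensional space into 100 dimensions while largely preserving their pairwise distances, in the spirit of the Johnson–Lindenstrauss lemma. The point is that compression per se keeps a lot of the geometry intact.

```python
import math
import random

random.seed(0)

def rand_vec(d, scale=1.0):
    """A random Gaussian point in R^d."""
    return [random.gauss(0.0, scale) for _ in range(d)]

def project(points, d, k):
    """Gaussian random projection R^d -> R^k, with entries scaled by
    1/sqrt(k) so pairwise distances are preserved in expectation."""
    R = [[random.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)]
         for _ in range(d)]
    return [[sum(p[i] * R[i][j] for i in range(d)) for j in range(k)]
            for p in points]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

d, k, n = 400, 100, 30
# Give each point its own overall scale so pairwise distances genuinely vary.
points = [rand_vec(d, scale=random.uniform(0.5, 2.0)) for _ in range(n)]
low = project(points, d, k)

orig = [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]
proj = [math.dist(low[i], low[j]) for i in range(n) for j in range(i + 1, n)]
# High correlation: the distance structure largely survives the squeeze.
print(round(pearson(orig, proj), 3))
```

Of course, word embeddings are learned rather than randomly projected, so this only illustrates the compression half of the framing, not the semantics.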

Literature

Huth, Alexander G., Wendy A. de Heer, Thomas L. Griffiths, Frédéric E. Theunissen, and Jack L. Gallant. 2016. “Natural speech reveals the semantic maps that tile human cerebral cortex.” Nature 532 (7600): 453–58. Looks like there are dedicated regions of the brain that correspond to various semantic types (and it’s pretty consistent across individuals). They do this by forming a generative model: you have local regions (hidden states) that are disjoint and provide full support, which you then learn.

Remark. This supports my hypothesis (!): if there are just distinct regions that trigger depending on the semantic classification, then that alone should be sufficient to produce a correlation with the distributed word vectors.

Fereidooni, Sam, Viola Mocz, Dragomir Radev, and Marvin Chun. 2020. “Understanding and Improving Word Embeddings through a Neuroscientific Lens.” bioRxiv, September 2020, 2020.09.18.304436. They first show that the correlation is significant (but it’s 0.1 (!), which is like…I guess not 0…?). They then find a way to inform word-embedding models with features from brain scans (I didn’t look at this that carefully, as I don’t really trust some of the authors of this paper).
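For a sense of what a correlation between an embedding space and brain responses even measures, here is a toy representational-similarity-style sketch (entirely synthetic data; every dimension and name is hypothetical, and the paper’s actual pipeline is surely more involved): you compare the pairwise-distance structure of the embedding space with that of the “brain response” space, rather than comparing the spaces directly.

```python
import math
import random

random.seed(1)

def gauss_vec(d):
    return [random.gauss(0.0, 1.0) for _ in range(d)]

def rdm(reps):
    """Condensed 'representational dissimilarity matrix':
    all pairwise distances between items, as one flat list."""
    n = len(reps)
    return [math.dist(reps[i], reps[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

n_words, d_emb, d_brain = 40, 50, 120
emb = [gauss_vec(d_emb) for _ in range(n_words)]

# Hypothetical "brain responses": a noisy linear readout of the embeddings.
W = [gauss_vec(d_brain) for _ in range(d_emb)]
brain = [[sum(e[i] * W[i][j] for i in range(d_emb)) + random.gauss(0.0, 2.0)
          for j in range(d_brain)] for e in emb]
# Control: responses with no relation to the embeddings at all.
control = [gauss_vec(n_words) and gauss_vec(d_brain) for _ in range(n_words)]

linked = pearson(rdm(emb), rdm(brain))      # shared structure: clearly positive
unlinked = pearson(rdm(emb), rdm(control))  # no shared structure: near zero
print(round(linked, 2), round(unlinked, 2))
```

Note that the control correlation hovers near zero here, which cuts against my “any two compressed representations must be correlated” intuition: second-order correlation only appears when the two spaces actually share structure.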