202006161811
Explaining Word Embeddings
tags: [ src:article , nlp ]
- blog:
- gives an “intuitive” explanation for why Word Embeddings should satisfy analogies
- which is pretty straightforward
- Words close in this space are often synonyms (e.g. happy and delighted), antonyms (e.g. good and evil) or other easily interchangeable words (e.g. yellow and blue).
- this relates to a project that has been at the back of my mind, [[signed-word-embeddings]]
- this is an important concept: what you should be getting is “interchangeability”, not closeness in meaning, though each is a proxy for the other
- this feels like an open question, right?
- a is to b as A is to B gives \(\frac{P(w|a)}{P(w|b)} = \frac{P(w|A)}{P(w|B)}\)
- dog is to puppy as cat is to kitten
- \(\frac{P(w|dog)}{P(w|puppy)} = \frac{f(w\vert age=adult)}{f(w\vert age=cub)} = \frac{P(w|cat)}{P(w|kitten)}\)
- you’re sort of decomposing as follows:
- \[\displaylines{P(w\vert dog) = f(w\vert species=dog) \times f(w\vert age=adult) \times P(w\vert is\_a\_pet) \\ P(w\vert puppy) = f(w\vert species=dog) \times f(w\vert age=cub) \times P(w\vert is\_a\_pet) \\ P(w\vert cat) = f(w\vert species=cat) \times f(w\vert age=adult) \times P(w\vert is\_a\_pet) \\ P(w\vert kitten) = f(w\vert species=cat) \times f(w\vert age=cub) \times P(w\vert is\_a\_pet)}\]
- that’s how the ratios end up being the same, because you’re basically cancelling the shared “hidden variables”
- this makes it feel like a hidden variable type thing?
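
A quick numerical sanity check of the cancellation argument above. This is only a toy sketch: the factor arrays are random made-up numbers (not estimated from any corpus), the names `f_species`, `f_age`, `p_pet` and `score` are hypothetical, and the product is treated as an unnormalised stand-in for \(P(w\vert word)\), as in the decomposition in this note.

```python
import numpy as np

rng = np.random.default_rng(0)
n_context = 6  # hypothetical number of context words w

# Made-up positive factors over context words, one array per hidden attribute value.
f_species = {name: rng.uniform(0.1, 1.0, n_context) for name in ("dog", "cat")}
f_age = {name: rng.uniform(0.1, 1.0, n_context) for name in ("adult", "cub")}
p_pet = rng.uniform(0.1, 1.0, n_context)  # shared "is_a_pet" factor


def score(species, age):
    """Unnormalised stand-in for P(w | word): product of the three factors."""
    return f_species[species] * f_age[age] * p_pet


dog, puppy = score("dog", "adult"), score("dog", "cub")
cat, kitten = score("cat", "adult"), score("cat", "cub")

# The species and is_a_pet factors cancel, leaving only the age ratio.
age_ratio = f_age["adult"] / f_age["cub"]
ratios_match = bool(
    np.allclose(dog / puppy, cat / kitten) and np.allclose(dog / puppy, age_ratio)
)

# In log space the same statement becomes an equality of differences,
# which is the shape the vector-offset analogy takes.
log_diffs_match = bool(
    np.allclose(np.log(dog) - np.log(puppy), np.log(cat) - np.log(kitten))
)

print(ratios_match, log_diffs_match)
```

If the products were normalised into proper distributions, the two ratios would agree only up to a constant that does not depend on \(w\), so the exact equality here relies on the unnormalised reading.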
Todo
- recent paper exploring this further: blog
Backlinks
- [[relational-learning]]
- taking the diff of two vectors does surprisingly well in determining analogies (and was the selling point of Mikolov et al. 2013, “Distributed representations of words and phrases and their compositionality”, in Advances in Neural Information Processing Systems); also see [[explaining-word-embeddings]] for a principled reason why taking the difference should work