#idea
One Stock to Rule Them All
A small, stupid, simple idea (that’s been bouncing around my head): the price of a stock is a function of all the information about that particular stock, since that information is what dictates its movement. If you believe the efficient market hypothesis (EMH) (which, granted, most people do not), then by extension a stock’s price must also capture all the information in the world. Of course, most other information takes up a vanishingly small part of that stock’s price (the less relevant, the smaller the part), but in some sense, the whole world is captured in every single stock.
Now, even if you don’t believe in the EMH, such a view might be instructive when considering the real world: essentially, each stock is like a view of an infinite-dimensional representation of the world. Under the EMH, a view (a projection) does not reduce the dimensions (i.e. no zeroed coordinates), whereas in real life, you’re most likely getting a projection into a fairly compact finite-dimensional space.
What’s the end result of all this? It sort of feels like the word embedding stuff, but not quite. The key difference is that word embeddings assume words have some latent representation in a shared space, whereas here we assume all stocks derive from the same latent representation, but simply take on different views. The conjecture that comes out of this is that, essentially, there’s a very high-dimensional vector that corresponds to the world, and each stock is simply one view into that world. The goal is then to learn this vector, as well as the projection associated with each stock. The question is then: is this a solvable problem?
More formally, let \(Z_t\) be this latent representation of the world (at time \(t\)). For simplicity, let’s assume that \(Z_t\) is finite-dimensional, say \(Z_t \in \mathbb{R}^{n}\). Then, each stock \(X^{i}_{t}\) corresponds to a view of \(Z_{t}\), namely \(X^{i}_{t} = P^{i}(Z_{t})\), where \(P^{i}\) is some projection matrix (possibly varying with \(t\), but if it does, we probably need to assume it varies slowly). We are given the \(\left\{ X^{i}_{t} \right\}_{i, t}\), and our goal is to estimate \(\left\{ Z_t \right\}_{t}\) and \(\left\{ P^{i} \right\}_{i}\).
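As a sanity check on this formulation, here is a minimal sketch on synthetic data (all dimensions and the use of SVD are my assumptions, not part of the original note): in the noiseless linear case with fixed \(P^{i}\), stacking the observations into a time-by-stocks matrix and taking a truncated SVD recovers the latent trajectory up to an invertible linear transform.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, num_stocks = 8, 500, 20  # latent dim, time steps, number of stocks

# Synthetic "world": a latent trajectory Z_t in R^n
Z = rng.normal(size=(T, n))

# Each stock i gets a fixed linear view P^i (here a 1 x n row of P)
P = rng.normal(size=(num_stocks, n))

# Observed panel: X[t, i] = P^i @ Z_t, so X = Z P^T has rank at most n
X = Z @ P.T  # shape (T, num_stocks)

# With more stocks than latent dimensions, the latent subspace is
# recoverable (only up to an invertible linear transform) from the
# top-n singular directions of X.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z_hat = U[:, :n] * S[:n]  # estimated latent trajectory
P_hat = Vt[:n].T          # estimated views

# In the noiseless case the rank-n reconstruction is numerically exact
assert np.allclose(Z_hat @ P_hat.T, X)
```

Note the caveat in the comment: without extra structure, \(Z_t\) and the \(P^i\) are only identifiable up to a change of basis, which is one reason the "solvable?" question is subtler than it first appears.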
The above formulation still lives in a linear world, whereas nowadays we want to express things as non-linear functions, allowing us to take full advantage of deep learning. However, it’s unclear to me whether it’s even possible to frame this in a way that’s conducive to deep learning. Of course, every time there’s talk of a latent representation, it feels like you should be able to solve it with neural networks.
Signed Word Embeddings
The basis for word embeddings lies in the distributional hypothesis, which states that “a word is characterized by the company it keeps”.
- [[idea]]
- I’ve thought about this before, but for some reason I can’t quite find where I wrote about this. basically it follows from two pieces of intuition:
- that word embeddings cannot distinguish between synonyms and antonyms, because all they care about is co-occurrence
  - as a result, oftentimes you’ll see that antonyms will appear close to each other in the embedding space, which doesn’t really make much sense if we think of the embedding space as a proxy for “meaning”
- that I’ve worked on signed graphs, and so have some intuition about dealing with positive/negative “ties”
- I think that an interesting place to start will be thinking in terms of the SVD of the PMI matrix (even though that’s not exactly the same thing as SGNS/word2vec)
- How does this relate to Signed Graphs?
- The one time I was thinking of embedding positive and negative edges, it was when considering the spectral decomposition (?) of the graph
  - I vaguely remember a paper that tried to use spectral methods to do some sort of visualization (though it was very unwieldy)
  - the problem is that when you do things geometrically, you end up enforcing transitivity indirectly (if A is near B and B is near C, then A is near C), just like when you’re trying to model signed graphs. Thus, I think that even though a geometric interpretation seems innocuous, it carries this unconscious bias
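The PMI-plus-SVD starting point mentioned above can be sketched concretely. This is a toy version (the corpus, window size, and rank are all illustrative choices of mine): build a positive-PMI co-occurrence matrix and factorize it with a truncated SVD to get one low-dimensional vector per word.

```python
import numpy as np

# Toy corpus; in practice this would be a large corpus
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Co-occurrence counts within a +/-2 word window
C = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Positive PMI: max(0, log( p(w,c) / (p(w) p(c)) ))
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore"):          # log(0) -> -inf for zero counts
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.maximum(pmi, 0.0)                 # clips the -inf entries to 0

# Rank-d truncated SVD of the PPMI matrix gives the embeddings
d = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :d] * np.sqrt(S[:d])      # one d-dim vector per word
```

Because the PPMI matrix is nonnegative by construction, this setup literally cannot represent the negative "ties" that antonyms would need, which is exactly the gap a signed variant would have to fill.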
Backlinks
- [[explaining-word-embeddings]]
- this relates to a project that has been at the back of my mind, [[signed-word-embeddings]]
- [[relational-learning]]
- they also have problems with antonyms, which was the impetus for my project on [[signed-word-embeddings]].
Concurrent Face and Word Embeddings
- crazy [[idea]]:
- we have face embeddings, which on their own are just a way to represent faces efficiently in a low-dimension
- would it be possible to associate faces with adjectives?
- the goal, unsurprisingly, is to be able to put to text the biases inherent in these models
- the problem is that, on their own, face embeddings aren’t currently biased in the more insidious way
- but what if it’s the case that, if you were to do some type of alignment, then by doing so, you’re actually inheriting all the biases that the word embeddings have?
- in other words, whenever you have some sort of process that does concurrent learning, the final model will inherit the biases of both. but that doesn’t feel particularly interesting.
Personalized Art
Pitch: an art installation that is personalized to the viewer. That is, it has facial recognition, so the piece of art relates to you. But more importantly, it has sentiment analysis, so it trains your art on your facial reactions to the current generation. Thus, it is a constantly evolving piece of art that is individualized and unique to the eye of the beholder, now literally.