Besides helping with interpretability, my immediate thought is that this might let us pretrain models faster: add regularization terms to the objective that push representations of distinct categories into mutually orthogonal subspaces, and representations of subcategories into orthogonal subspaces that form polytopes. The data needed for this is readily available: WordNet synsets. Induce representations of synsets to be orthogonal to each other, and representations of hierarchically related synsets to be arranged in polytopes. There's already some evidence that WordNet synsets can be leveraged to pretrain some models faster. See https://news.ycombinator.com/item?id=40160728 for example.
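To make the suggestion concrete, here is a minimal sketch (in NumPy, with made-up names like `orthogonality_penalty`) of what such a regularizer could look like, assuming we have already grouped token embeddings by synset; in practice this term would be added to the pretraining loss and the groupings would come from WordNet:

```python
import numpy as np

def orthogonality_penalty(synset_embeddings):
    """Hypothetical regularizer: penalize overlap between the mean
    direction of each synset's representations.

    synset_embeddings: list of (n_i, d) arrays, one per synset.
    Returns the sum of squared pairwise cosines between synset
    centroid directions (0 when all centroids are orthogonal).
    """
    dirs = []
    for emb in synset_embeddings:
        c = emb.mean(axis=0)                    # centroid of this synset
        dirs.append(c / (np.linalg.norm(c) + 1e-8))  # unit-normalize
    D = np.stack(dirs)                          # (k, d) direction matrix
    G = D @ D.T                                 # Gram matrix of cosines
    off = G - np.eye(len(dirs))                 # drop self-similarity
    return float((off ** 2).sum())

# Two synsets along orthogonal axes incur ~zero penalty...
a = np.array([[1.0, 0.0], [2.0, 0.0]])
b = np.array([[0.0, 1.0], [0.0, 3.0]])
print(orthogonality_penalty([a, b]))   # ~0.0

# ...while two synsets along the same axis are penalized.
c = np.array([[1.0, 0.0]])
print(orthogonality_penalty([a, c]))   # 2.0 (two off-diagonal cosines of 1)
```

The polytope constraint for hierarchically related synsets would need an additional term; this only captures the orthogonality part of the idea.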
Thank you for sharing this on HN.
It's satisfying that the structure is precisely the one you would hope for.
I think everyone _knew_ in some sense that categorical information about these vectors must be hierarchical and have this general kind of structure, but they managed to formalize that intuition into just a few theorems that seem inevitable only in retrospect.
Is this result useful only for basic concepts backed by huge numbers of cases in the training data, or is it more general than that?
Comments?
A type theory corresponds to a complex diagram, as outlined in topos theory. (Note: complex as in CW-complexes.)
I think it's fascinating that LLMs ended up with a similar structure, though perhaps it's not entirely surprising. There have been similar results; e.g., a topological covering can generate an ML model.
https://ncatlab.org/nlab/show/William+Lawvere#RelationToPhil...
From a 10,000-foot view, I think nailing down a more "objective" understanding of dialectics (idealist, material, whatever) is a promising direction for ameliorating this meta-problem. People arguing in journals is pretty much a dialectical process, so understanding that can go a long way toward understanding issues beyond it.