Since I joined, we've gone from <1k hours to >10k hours, and I've been really excited by how much our whole setup has changed. I've been implementing lots of improvements to the whole data pipeline and the operations side. Now that we train lots of models on the data, the model results also inform how we collect data (e.g. we care a lot less about noise now that we have more data).
We're definitely still improving the whole system, but at this point, we've learned a lot that I wish someone had told us when we started, so we thought we'd share it in case any of you are doing human data collection. We're all also very curious to get any feedback from the community!
I have dreamed many times about the same story but with Apple or Epic Games. But they have millions of human beings testing their products FOR FREE in every part of the world, hahahaha
But it feels eerie to read a detailed story of how they built and improved their setup and what obstacles they encountered, complete with photos, without any mention of who is doing the things we are reading about. There is no mention of the staff or even the founders on the whole website.
I had a hard time judging how large this project even is. The homebuilt booths and trial-and-error workflow sound like a "three-person garage startup", but the booking schedule suggests a larger team.
(At least there is an author line on that blog post. Had to google the names to get some background on this company)
You should consider an "about us" page :)
Though, I suppose if the model had LLM-like context where it kept track of brain data and speech/typing from earlier in the conversation then it could perform in-context learning to adapt to the user.
We only got any generalization to new users after we had >500 individuals in the dataset, fwiw. There are some interesting MRI studies finding a similar thing: once you have enough individuals in the dataset, you start seeing generalization.
Have you played at all with thought-to-voice? Intuitively I’d think EEG readout would be more reliable for spoken rather than typed words, especially if you’re not controlling for keyboard fluency.
It does generalize between typed and spoken, i.e. it does much better on spoken decoding if we've also trained on the typing data, which is what we were hoping to see.
Both of these modes are incredibly slow thinking. Consciously shifting from thinking in concepts to thinking in words is like slamming on brakes for a school zone on an autobahn.
I've gathered most people think in words they can "hear in their head", most people can "picture a red triangle" and literally see one, and so on. Many folks who are multi-lingual say they think in a language, or dream in that language, and know which one it is.
Meanwhile, some people think less verbally or less visually, perhaps not verbally or visually at all, and there is no language (words).
A blog post shared here last month discussed a person trying to access this conceptual mode, which he thinks is like "shower thoughts" or physicists solving things in their heads while staring into space, except "under executive function". He described most of his thoughts as words he can hear in his head, with these concepts more like vectors. I agree with that characterization.
I'm curious what % of folks you've scanned may be in this non-word mode, or if the text and voice requirement forces everyone into words.
One thing that's particularly exciting here is that the model often gets the high-level idea correct, without getting any words correct (as in some of the examples above), which suggests that it is picking up the idea rather than the particular words.
Are you pursuing an idea of how to help people like this author* access this mode that some of us are always in unless kicked out of it by the need for words?
Very needed right now — the opposite of the YouTube-ization of idea transfer.
It doesn't seem clear this is accessible without other changes in wiring? The inability to "picture" things as visuals seems to swap out for "conceptualizing" things in -- well, I don't have words for this.
An attempt from that essay:
This is not what Hadamard is talking about when he describes the wordless thought of the mathematicians and researchers he has surveyed. Instead, what they seem to be doing is something similar to this subconscious, parallelized search, except they do it in a “tensely” focused way.
The impression I get is that Hadamard loads a question into his mind (either in a non-verbal way, or by reading a mathematical problem that has been written by himself or someone else), and then he holds the problem effortfully centered in his mind. Effortfully, but wordlessly, and without clear visualizations. Describing the mental image that filled his mind while working on a problem concerning infinite series for his thesis, Hadamard writes that his mind was occupied by an image of a ribbon which was thicker in certain places (corresponding to possibly important terms). He also saw something that looked like equations, but as if seen from a distance, without glasses on: he was unable to make out what they said.
I’m not sure what is going on here.
* https://www.henrikkarlsson.xyz/p/wordless-thought
A couple of this author's speculations aren't how I'd say it works when this is one's default mode, but most are in the neighborhood. He comes the closest of what I've read by people who do think the way the author thinks — which seems to be most people.
What you are trying to do is BIG, I love it. And I hope you can get to more than 1M in a few months!
Keep pushing team!!!
That said, the way to 10-20x data collection would be to open a couple other data collection centers outside SF, in high-population cities. Right now, there's a big advantage in just having the data collection totally in-house, because it's so much easier to debug/improve it because we're so small. But now we've mostly worked out the process, it should also be very straightforward for us to just replicate the entire ops/data pipeline in 3-4 parallel data collection centers.
* A ceiling-based pulley system could help take the physical load off the users and may allow for increased sensor density. Some large/public VR setups do this.
* I'm sure you considered it, but a double-converting UPS might reduce the noise floor of your sensors and could potentially support multiple booths. Expensive though, and it's already mentioned that data quantity > quality at this stage. Maybe a future fine-tuning step could leverage this.
Cool write up and hope to see more in the future!
“the room seemed colder” -> “ there was a breeze even a gentle gust”
Very interesting!
A couple of questions: What's the relationship between the number of hours of neurodata you collect and the quality of your predictions? Does it help to get less data from more people, or more data from fewer people?
For a given amount of data, is it better to have more people with less data per person or fewer people with more data per person?
For a given amount of data, whether you want more or less data per person really depends on what you're trying to do. The thing we want is for it to be good at zero-shot, that is, for it to decode well on people who have zero hours in the train set. So for that, we want less data per person. If instead we wanted to make it do as well as possible on one individual, then we'd want way more data from that one person. (So, e.g., when we make it into a product at first, we'll probably finetune on each user for a while)
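To make the zero-shot point concrete, here is a minimal sketch of a participant-level holdout split, where nobody in the eval set contributes any data to the train set. This is only an illustration: the function name `split_by_participant`, the `(participant_id, session_data)` record layout, and the 20% holdout fraction are all made up here and are not the actual pipeline.

```python
import random


def split_by_participant(sessions, holdout_fraction=0.2, seed=0):
    """Split sessions so held-out participants have zero data in train.

    `sessions` is a list of (participant_id, session_data) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Collect the distinct participants; sort so the shuffle is reproducible.
    participants = sorted({pid for pid, _ in sessions})
    rng = random.Random(seed)
    rng.shuffle(participants)

    # Hold out whole participants, not individual sessions.
    n_holdout = max(1, int(len(participants) * holdout_fraction))
    holdout_ids = set(participants[:n_holdout])

    train = [(pid, data) for pid, data in sessions if pid not in holdout_ids]
    test = [(pid, data) for pid, data in sessions if pid in holdout_ids]
    return train, test


if __name__ == "__main__":
    # Toy data: 10 participants with 3 sessions each.
    toy = [(f"p{i}", f"session-{j}") for i in range(10) for j in range(3)]
    train, test = split_by_participant(toy)
    print(len(train), "train sessions,", len(test), "zero-shot test sessions")
```

Splitting per session instead would leak each person's data into the train set, so the eval number would measure per-user fit rather than zero-shot decoding.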
I wonder if there will be medical applications for this tech, for example identifying people with brain or neurological disorders based on how different their "neural imaging" looks from normal.
If you mean the text quality scoring system, then when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we filter which participants we have return based on their text quality scores)
We tried Google/Facebook/Instagram ads, and we tried paying for some video placements. Basically none of the explicit advertisement worked at all and it wasn't worth the money. Though for what it's worth, none of us are experts in advertising, so we might have been going about it wrong -- we didn't put loads of effort into iterating once we realized it wasn't working.
Those predictions sound good enough to get you CIA funding.
[see https://news.ycombinator.com/item?id=45988611 for explanation]