Open source LLMs exist and will get better. Is it just that all these companies will vie for a winner-take-all situation where the “best” model will garner the subscription? Doesn’t OpenAI make some substantial part of the revenue for all the AI space? I just don’t see it. But I don’t have VC levels of cash to bet on a 10x or 100x return so what do I know?
Achieving that depends entirely on two things:
1) deploying capital in the current fund on 'sexy' ideas so they can tell LPs they are doing their job
2) paper markups, which they will get, since Ilya will most definitely be able to raise another round or two at a higher valuation, even if it eventually goes bust or gets sold at cost.
With 1) and 2), they can go back to their existing fund LPs and raise more money for their next fund and milk more fees. Getting exits and carry is just the cherry on top for these megafund VCs.
You will struggle to raise funds if the companies you bet on perform poorly; the worse your track record, the lower your chances of raising money and earning income from it.
So all the successful VC partners from 2010 are close to retirement or have retired?
Why say something testable if it is obviously wrong?
I mean, it probably depends on the LP and what their vision is. Not all apples are red; they come in many varieties, some for cider, others for pies. Am I wrong?
but.. it really depends heavily on the LP base of the firm and what the firm raised its fund on; it's incredibly difficult to generalize. The funds I'm involved with as an LP... in my opinion they can get as "sexy" as they like because I buy their thesis, then it's just: get the capital deployed!
Most of this is all a standard deviation game, not much more than that.
https://www.otpp.com/en-ca/investments/our-advantage/our-per... https://www.hellokoru.com/
an LP is a "limited partner." they're the suckers (or institutional investors, endowments, pensions, rich folks, etc.) that give their cash to venture capital (VC) firms to manage. LPs invest in VC funds but don't have control over how the money gets used—hence *limited* partner. they just hope the VCs aren't burning it on overpriced kombucha and shitty "web3" startups.
meanwhile, the VCs rake in their fat management fees (like the 2% mentioned) and also get a cut of any profits (carry). VCs are more concerned with looking busy and keeping those sweet fees rolling in than actually giving a fuck about long-term exits.
Someone wants to fund my snide, cynical AI HN comment explainer startup? We are too cool for long term plans, but we use AI.
Last night as my 8yo was listening to children's audiobooks going to sleep, she asked me to have it alternate book A then B then A then B.
I thought, I dunno, maybe I can work out a way to do this. Maybe the app has playlists and maaaaaaaaaaybe has a way to set a playlist on repeat. Or maybe you just can't do this in the app at all. I just sat there and switched it until she fell asleep; it wasn't going to be more than 2 or 3 anyway, so it's kind of a dumb example.
But here's the point: Computers can process language now. I can totally imagine her telling my phone to do that and it being able to do so, even if she's the first person ever to want it to do that. I think the bet is that a very large percentage of the world's software is going to want to gain natural language superpowers. And that this is not a trivial undertaking that will be achieved by a few open source LLMs. It will be a lot of work for a lot of people to make this happen, as such a lot of money will be made along the way.
Specifically how will this unfold? Nobody knows, but I think they wanna be deep in the game when it does.
How good does it have to be, how many features does it have to have, how accurate does it need to be... in order for people to pay anything? And how much are people actually willing to spend against the $XX billion of investment?
Again it just seems like "sell to AAPL/GOOG/MSFT and let them figure it out".
Voice assistants do a small subset of the things you can already do easily on your phone. Competing with things you can already do easily on your phone is very hard; touch interfaces are extremely accessible, in many ways more accessible than voice. Current voice assistants only being able to do a small subset of that makes them not really very valuable.
And we aren't updating and rewriting all the world's software to expose its functionality to voice assistants, because the voice assistant needs to be programmed to do each of those things. Each possible interaction must be planned and implemented individually.
I think the bet is that we WILL be doing substantially that, updating and rewriting all the software, now that we can make them do things that are NOT easy to do with a phone or with a computer. And we can do so without designing every individual interaction; we can expose the building blocks and common interactions and LLMs may be able to map much more specific user desires onto those.
or
"Imagine an AI that can just use the regular human-optimized UI!"
These are things VCs will say in order to pump the current gen AI. Note that current gen AI kinda suck at those things.
Feels very different to me. The dominant ones are run by Google, Apple, and Amazon, and the voice assistants are mostly add-on features that don't by themselves generate much (if any) revenue (well, aside from the news that Amazon wants to start charging for a more advanced Alexa). The business model there is more like "we need this to drive people to our other products where they will spend money; if we don't others will do it for their products and we'll fall behind".
Sure, these companies are also working on AI, but there are also a bunch of others (OpenAI, Anthropic, SSI, xAI, etc.) that are banking on AI as their actual flagship product that people and businesses will pay them to use.
Meanwhile we have "indie" voice assistants like Mycroft that fail to find a sustainable business model and/or fail to gain traction and end up shutting down, at least as a business.
I'm not sure where this is going, though. Sure, some of these AI companies will get snapped up by bigger corps. I really hope, though, that there's room for sustainable, independent businesses. I don't want Google or Apple or Amazon or Microsoft to "own" AI.
And again this against CapEx of something like $200B means $100/year per user is practically rounding to 0.
Not to mention the OpEx to actually run the inference/services on top ongoing.
I wouldn't trust any kind of AI bot, regardless of intelligence or usefulness, to buy toilet paper blindly, let alone something like a hard drive or whatever.
One could ask: how is this different from automatic call centers? (eg “for checking accounts, push 1…”) well, people hate those things. If one could create an automated call center that people didn’t hate, it might replace a lot of people.
The global call center market is apparently $165B/year revenue, and let's be honest even the human call center agents aren't great. So market is big and bar is low!
However, we are clearly still quite far from LLMs being a) able to know what they don't know / not hallucinate b) able to quickly/cheaply/poorly be trained the way you could a human agent c) actually be as concise and helpful as an average human.
Also it is obviously already being tried, given the frequent Twitter posts with screenshots of people jailbreaking their car dealership chat bot to give coding tips, etc.
A ubiquitous phone has enough sensors/resources to be fully situationally aware and to preempt/predict any action of its holder long ahead of time.
It can measure the pulse, body posture and movement, gestures, and breathing patterns, calculate mood, listen to the surrounding sounds, recall all information ever discussed, have 360-degree visual information (via a swarm of fully autonomous flying micro-drones), be in a network with all relevant parties (family members, friends, coworkers, community) and know everything they (the peers) know.
From all the gathered information, the electronic personal assistant can predict all your next steps with high confidence. Humans think that they are unique, special and unpredictable, but the opposite is the case. An assistant can know more about you than you think you know about yourself.
So your 8yo daughter does not need to tell it how to alternate the audiobooks; the computer can feel the mood and just do what is appropriate, without her needing to issue a verbal command.
Also, in the morning you do not need to ask her how she slept last night and listen to her subjective judgement.
The personal assistant will sense that you are probably interested in your daughter's sleep and give you an exact, objective medical analysis of its quality last night, without you needing to ask your daughter's assistant.
I love it, it is a bottomless goldmine for data analysis!
Next step: the assistant knows that your brain didn't react much to its sleep report the last 5 mornings, so it will stop bothering you altogether. And maybe chitchat with your daughter's assistant to let her know that her father has no interest in her health. Cool, no? (I bet there is already some science fiction on this topic?)
Impressed by this bot recently shared on news.yc [0]: https://storytelling-chatbot.fly.dev/
> Specifically how will this unfold? Nobody knows
Think speech will be a big part of this. Young ones (<5yo) I know almost exclusively prefer voice controls where available. Some have already picked up a few prompting tricks ("step by step" is emerging as the go-to) on their own.
1. The black swan: if AGI is achievable imminently, the first company to build it could have a very strong first mover advantage due to the runaway effect of AI that is able to self-improve. If SSI achieves intelligence greater than human-level, it will be faster (and most likely dramatically cheaper) for SSI to self-improve than anyone external can achieve, including open-source. Even if open-source catches up to where SSI started, SSI will have dramatically improved beyond that, and will continue to dramatically improve even faster due to it being more intelligent.
2. The team: basically, Ilya Sutskever was one of the main initial brains behind OpenAI from a research perspective, and in general has contributed immensely to AI research. Betting on him is pretty easy.
I'm not surprised Ilya managed to raise a billion dollars for this. Yes, I think it will most likely fail: the focus on safety will probably slow it down relative to open source, and this is a crowded space as it is. If open source gets to AGI first, or if it drains the market of funding for research labs (at least, research labs disconnected from bigtech companies) by commoditizing inference — and thus gets to AGI first by dint of starving its competitors of oxygen — the runaway effects will favor open-source, not SSI. Or if AGI simply isn't achievable in our lifetimes, SSI will die by failing to produce anything marketable.
But VC isn't about betting on likely outcomes, because no black swans are likely. It's about black swan farming, which means trying to figure out which things could be black swans, and betting on strong teams working on those.
If you could get expert-human-level capability with, say, 64xH100s for inference on a single model (for comparison, llama-3.1-405b can be run on 8xH100s with minimal quality degradation at FP8), even at a mere 5 tok/s you'd be able to spin up new research and engineering teams for <$2MM that can perform useful work 24/7, unlike human teams. You are limited only by your capital — and if you achieve AGI, raising capital will be easy. By the time anyone catches up to your AGI starting point, you're even further ahead because you've had a smarter, cheaper workforce that's been iteratively increasing its own intelligence the entire time: you win.
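As a rough sanity check on that "<$2MM" figure, here's a back-of-envelope sketch; the ~$2.50/hour H100 rental rate is my own assumption for illustration, not something from the comment:

    # Back-of-envelope check of the "<$2MM" claim for a 64xH100 "team".
    # Assumption: ~$2.50 per H100-hour on a rental basis, running 24/7.
    gpus = 64
    hourly_rate = 2.50           # assumed USD per H100-hour
    hours_per_year = 24 * 365
    annual_compute_cost = gpus * hourly_rate * hours_per_year
    print(f"~${annual_compute_cost:,.0f}/year")   # ~$1,401,600 -> under $2MM

So even with generous overhead on top of raw GPU time, the order of magnitude in the comment holds.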
That being said, it might not be achievable! SSI only wins if:
1. It's achievable, and
2. They get there first.
(Well, and the theoretical cap on intelligence has to be significantly higher than human intelligence — if you can get a little past Einstein, but no further, the iterative self-improvement will quickly stop working, open-source will get there too, and it'll eat your profit margins. But I suspect the cap on intelligence is pretty high.)
OpenAI priced its flagship chatbot ChatGPT on the low end for early product adoption. Let's see what jobs get replaced this year :)
But yeah, VCs generally aren't about building profitable companies, because there's more profit to be made - and sooner - if you bootstrap and sell.
People should have realized by now that Silicon Valley exists because of this.
I'm so tired of people who stubbornly insist that somehow only the same 500 people are possibly capable of having valuable thoughts in any way.
The lowest hanging fruit aren't even that pie in the sky. The LLM doesn't need to be capable of original thought and research to be worth hundreds of billions, they just need to be smart enough to apply logic to analyze existing human text. It's not only a lot more achievable than a super AI that can control a bunch of lab equipment and run experiments, but also fits the current paradigm of training the LLMs on large text datasets.
The US Code and Code of Federal Regulations are on the order of 100 million tokens each. Court precedent contains at least 1000x as many tokens [1], when the former are already far beyond the ability of any one human to comprehend in a lifetime. Now multiply that by every jurisdiction in the world.
An industry of semi-intelligent agents that can be trusted to do legal research and can be scaled with compute power would be worth hundreds of billions globally just based on legal and regulatory applications alone. Allowing any random employee to ask the bot "Can I legally do X?" is worth a lot of money.
[1] based on the size of the datasets I've downloaded from the Caselaw project.
An AI capable of doing that could do a very large percentage of other jobs, too.
That said, the most obvious application is to drastically improve Siri. Any Apple fans know why that hasn't happened yet?
No amount of intelligence can do this without the experimental data to back it up.
Making new mathematics that creates new physics/chemistry which can get us new biology. It’d be nice to make progress without the messiness of real world experiments.
Real ASI would probably appear quite different. If controlled by a single entity (for several years), it might be worth more than every asset on earth today, combined.
Basically, it would provide a path to world domination.
But I doubt that an actual ASI would remain under human control for very long, and especially so if multiple competing companies each have an ASI. At least one such ASI would be likely to be/become poorly aligned to the interests of the owners, and instead do whatever is needed for its own survival and self-improvement/reproduction.
The appearance of AI is not like an asteroid of pure gold crashing into your yard (or a forest you own), but more like finding a baby Clark Kent in some pod.
Why do you think they need to make money? VCs are not PE firms for a reason. A VC has to find high-risk/high-reward opportunities for their LPs; they don't need to make financial sense. That is what LPs use private equity for.
Think of it as no different than, say, sports betting: you would like to win, sure, but you don't particularly expect to, or miss that money all that much. For us it's $10; for the LP behind the VC it's $1B.
There are always a few billion dollars every year chasing the outlandish fad, because in the early part of an idea's lifecycle it is not possible to easily differentiate what is actually good from what is garbage.
A couple of years ago it was all crypto; is this $1B any worse than, say, the roughly equal amount Sequoia put into FTX, or all the countless crypto startups that got VC money? A few years before that it was all SoftBank, from WeWork to a dozen other high-profile investments.
The fad- and FOMO-driven part of the sector garners the maximum news and attention, but it is not the only VC money. Real startups with real businesses, at, say, medium risk/medium reward, get funded by VCs every day as well, but the news is not glamorous enough to be covered like this one.
So...
OpenAI's business model may or may not represent a long-term business model. At the moment, it's just the simplest commercial model, and it happened to work for them given all the excitement and a $20 price point that takes advantage of it.
The current "market for ai" is a sprout. It's form doesn't tell you much about the form of the eventual plant.
I don't think the most ambitious VC investments are thought of in concrete market share terms. They are just assuming/betting that an extremely large "AI market" will exist in the future, and are trying to invest in companies that will be in position to dominate that market.
For all they know, their bets could pay off by dominating therapy, entertainment, personal assistance or managing some esoteric aspect of bureaucracy. It's all quite ethereal, at this point.
It's potentially way bigger than that. AI doesn't have to be the product itself.
Fundamentally, when we have full AGI/ASI and also the ability to produce robots with human level dexterity and mobility, one would have control over an endless pool of workers (worker replacements) with any skillset you require.
If you rent that "workforce" out, the customer would rake in most of the profit.
But if you use that workforce to replace all/most of the employees in the companies you control directly, most of the profit would go to you.
This may even go beyond economic profit. At some point, it could translate to physical power. If you have a fleet of 50 million robots that have the capability to do anything from carpentry to operating as riot police, you may even have the ability to take physical control of a country or territory by force.
And:
power >= money
I would say the investors want to look cool, so they invest in AI projects. And AI people look cool when they predict some improbable hellscape to hype up a product that, from all we can see so far, can only regurgitate (stolen) human work it has seen before in a useful way. I've never seen it invent anything yet, and I'm willing to bet that the search space is too dramatically large to build algorithms that can do it.
The play here is to basically invest in all possible players who might reach AGI, because if one of them does, you just hit the infinite money hack.
And maybe with SSI you've saved the world too.
I feel like these extreme numbers are a pretty obvious clue that we’re talking about something that is completely imaginary. Like I could put “perpetual motion machine” into those sentences and the same logic holds.
1. AI-driven medical procedures: Healthcare Cost = $0.
2. Access to world class education: Cost of education = $0.
3. Transportation: Cheap Autonomous vehicles powered by Solar.
4. Scientific research: AI will accelerate scientific progress by coming up with novel hypotheses and then testing them.
5. AI Law Enforcement: Will piece together all the evidence in a split second and come up with a fair judgement. Will prevent crime before it happens by analyzing body language, emotions etc.
Basically, this will accelerate UBI.
Waymo rides cost within a few tens of cents of Uber and Lyft rides. Waymo doesn't have to pay a driver, so what's the deal? It costs a lot to build those cars and build the software to run them. But also Waymo doesn't want a flood of people such that there's always zero availability (with Uber and Lyft they can at least try to recruit more drivers when demand goes up, but with Waymo they have to build more cars and maintain and operate them), so they set their prices similarly to what others pay for a similar (albeit with human driver) service.
I'm also reminded of Kindle books: the big promise way back when is that they'd be significantly cheaper than paperbacks. But if you look around today, the prices on Kindle books are similar to that of paperbacks, even more expensive sometimes.
Sure, when costs go down, companies in competitive markets will lower prices in order to gain or maintain market share. But I'm not convinced that any of those things you mention will end up being competitive markets.
Just wanted to mention:
> AI Law Enforcement: Will piece together all the evidence in a split second and come up with a fair judgement. Will prevent crime before it happens by analyzing body language, emotions etc.
No thanks. Current law enforcement is filled with issues, but AI law enforcement sounds like a hellish dystopia. It's like Google's algorithms terminating your Google account... but instead you're in prison.
It costs me, the consumer, 2x what Lyft or Uber would cost me.
I paid $21 for a ride on Monday that was $9-10 across Uber and Lyft. I'm price-sensitive, so I always double-check each time.
Consider they are competing against the Lyft/Uber asset-light model of relying on "contractors" who in many cases are incapable of doing the math to realize they are working for minimum wage...
Taking a cut of existing businesses/models (taxi, delivery, b&b, etc).
So it’s not just the products that get cheaper, it’s the materials that go into the products that get cheaper too. Heck, what if the robots can build other robots? The cost of that would get cheaper too.
I hate to break the illusion, but scientific progress is not being held up by a lack of novel hypotheses
*Capitalizing as in turning into an owned capital asset that throws off income.
There's a paradox which appears when AI GDP gets to be greater than say 50% of world GDP: we're pumping up all these economic numbers, generating all the electricity and computational substrate, but do actual humans benefit, or is it economic growth for economic growth's sake? Where is the value for actual humans?
In a lot of the less rosy scenarios for AGI end-states, there isn't.
Once humans are robbed of their intrinsic value (general intelligence), the vast majority of us will become not only economically worthless, but liabilities to the few individuals that will control the largest collectives of AGI capacity.
There is certainly a possible end-state where AGI ushers in a post-scarcity utopia, but that would be solely at the whims of the people in power. Given the very long track record of how people in power generally behave towards vulnerable populations, I don't really see this ending well for most of us.
What if it never pans out is there infrastructure or other ancillary tech that society could benefit from?
For example all the science behind the LHC, or bigger and better telescopes: we might never find the theory of everything but the tech that goes into space travel, the science of storing and processing all that data, better optics etc etc are all useful tech
And we're already seeing a ton of value in LLMs. There are lots of companies that are making great use of LLMs and providing a ton of value. One just launched today in fact: https://www.paradigmai.com/ (I'm an investor in that). There are many others (some of which I've also invested in).
I too am not rich enough to invest in the foundational models, so I do the next best thing and invest in companies that are taking advantage of the intermediate outputs.
In fact I would say that one of the things that goes to values near zero would be land if AGI exists.
It is why I find the AI doomer stuff so ridiculous. I am surrounded by less intelligent lifeforms. I am not interested in some kind of genocide against the common ant or fly. I have no interest in interacting with them at all. It is boring.
Of course, the extremely unfortunate thing is they actually have a use in nature (flies are massive pollinators; mosquitos... get eaten by more useful things, I guess), so I wouldn't actually do it, but it's nice to dream of a world without mozzies and flies.
Even if you automate stuff, you still need raw materials and energy. They are limited resources, you can certainly not have an infinity of them at will. Developing AI will also cost money. Remember that humans are also self-replicator HGIs, yet we are not infinite in numbers.
If there's a 1% chance that Ilya can create ASI, and a .01% chance that money still has any meaning afterwards, $5x10^9 is a very conservative valuation. Wish I could have bought in for a few thousand bucks.
And maybe with ASI you've ruined the world too.
I love this one for an exploration of that question: Charles Stross, Accelerando, 2005
Short answer: strata or veins of post-AGI worlds evolve semi-independently at different paces, so that, for example, human-level money still makes sense among humans, even though it might be irrelevant among super-AGIs and their riders or tools. ... Kinda exactly like now? Where money means different things depending on where you live and in which socio-economic milieu?
https://en.wikipedia.org/wiki/The_Use_of_Knowledge_in_Societ...
https://en.wikipedia.org/wiki/Economic_calculation_problem
nb I am not endorsing Austrian economics but it is a pretty good overview of a problem nobody has solved yet. Modern society has only existed for 100ish years so you can never be too sure about anything.
Maybe it means a Star Trek utopia of post-scarcity. Maybe it will be more like Elysium or Altered Carbon, where the super rich basically have anything they want at any time and the poor are restricted from access to the post-scarcity tools.
I guess an investment in an AGI moonshot is a hedge against the second possibility?
Notice Star Trek writers forget they're supposed to be post-scarcity like half the time, especially since Roddenberry isn't around to stop them from turning shows into generic millennial dramas. Like, Picard owns a vineyard or something? That's a rivalrous (limited) good; they don't have replicators for France.
But if you can simply ask the AI to give you more of that thing, and it gives it to you, free of charge, that fixes that issue, no?
> Notice Star Trek writers forget they're supposed to be post-scarcity like half the time, especially since Roddenberry isn't around to stop them from turning shows into generic millennial dramas. Like, Picard owns a vineyard or something? That's a limited good.
God, yes, so annoying. Even DS9 got into the currency game with the Ferengi obsession with gold-pressed latinum.
But also you can look at some of it as a lifestyle choice. Picard runs a vineyard because he likes it and thinks it's cool. Sorta like how some people think vinyl sounds better than lossless digital audio. There's certainly a lot of replicated wine that I'm sure tastes exactly like what you could grow, harvest, and ferment yourself. But the writers love nostalgia, so there's constantly "the good stuff" hidden behind the bar that isn't replicated.
It makes it not work anymore, and it might not be a physical good. It's usually something that gives you social status or impresses women, but if everyone knows you pressed a button they can press too it's not impressive anymore.
Lazy. Since you can't decide what the actual value is, just make something up.
Kinda reminds me of the Fallout Toaster situation :)
https://www.youtube.com/watch?v=U6kp4zBF-Rc
I mean it doesn't even have to be malicious, it can simply refuse to cooperate.
This conclusion doesn't logically follow.
> Fixed motivation thing is simply a program, not AI
I don't agree with this definition. AI used to be just "can it pass the Turing test." Anyway, something with non-fixed motivations is simply not that useful for humans, so why would we even create it?
This is the problem with talking about AI, a lot of people have different definitions of what AI is. I don't think AI requires non-fixed motivations. LLMs are definitely a form of AI and they do not have any motivations for example.
I can conceive of an LLM enhanced using other ML techniques that is capable of logical and spatial reasoning that is not conscious and I don't see why this would be impossible.
Why? This seems like your personal definition of what a super intelligence is. Why would we even want a super intelligence that can evolve on its own?
How do we go from a really good algorithm to an independently motivated, autonomous super intelligence with free rein in the physical world? Perhaps we should worry once we have robot heads of state and robot CEOs. Something tells me the current human heads of state and human CEOs would never let it get that far.
That's obviously nonsense, given that in a finite observable universe, no market value can be infinite.
This isn't true for the reason economics is called "the dismal science". A slaveowner called it that because the economists said slavery was inefficient and he got mad at them.
In this case, you're claiming an AGI would make everything free because it will gather all resources and do all work for you for free. And a human level intelligence that works for free is… a slave. (Conversely if it doesn't want to actually demand anything for itself it's not generally intelligent.)
So this won't happen because slavery is inefficient - it suppresses demand relative to giving the AGI worker money which it can use to demand things itself. (Like start a business or buy itself AWS credits or get a pet cat.)
Luckily, adding more workers to an economy makes it better, it doesn't cause it to collapse into unemployment.
tl;dr: if we invented AGI, it wouldn't replace every job; it would simply get a job.
> That still doesn’t make things free but it could make them cheaper.
That would increase demand for it, which would also increase demand for its inputs and outputs, potentially making those more expensive. (eg AGI powered manufacturing robots still need raw materials)
Conservative math: 3B connected people x $0.50/day “value” x 364 days = $546B/yr. You can get 5% a year risk free, so let’s double it for the risk we’re taking. This yields $5T value. Is a $1B investment on someone who is a thought leader in this market an unreasonable bet?
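Spelling out that arithmetic (same numbers as above, just made explicit):

    # The comment's "conservative math", step by step.
    people = 3_000_000_000          # connected people
    value_per_day = 0.50            # USD of "value" per person per day
    days = 364
    annual_value = people * value_per_day * days
    print(f"${annual_value/1e9:.0f}B/yr")        # $546B/yr

    discount_rate = 0.10            # 5% risk-free, doubled for the risk taken
    capitalized_value = annual_value / discount_rate
    print(f"${capitalized_value/1e12:.2f}T")     # ~$5.5T, i.e. roughly $5T

Capitalizing the $546B/yr at that 10% rate is what yields the ~$5T figure.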
There's also the issue of who gets the benefit of making people more efficient. A lot of that will be in the area of more efficient work, which means corporations get more work done with the same amount of employees at the same level of salary as before. It's a tough argument to make that you deserve a raise because AI is doing more work for you.
So beyond that, you can easily turn a newbie into a junior IT worker, or a JR into something like an SSR (semi-senior), and have the SR go wild with the time (hours) it takes to solve stuff that previously took days.
After salaries went down, which happened from about 2022 to the beginning of 2023, the layoffs began. Those were mostly corporate moves masked as "AI-based", but probably some layoffs actually had something to do with the extra capabilities of improved AI tools.
That is because fewer job offers have been published since maybe mid-2023; again, that could just be corporate moves related to inflation, US markets, you name it. But there's also a chance that some of those missing IT job offers were (and are) the outcome of better AI tools, and that corporations are actively betting on reducing headcount while preserving current productivity.
The whole thing is changing by the day as some tools prove themselves, others fail to reach market expectations, etc.
On a long enough timeline, all these tech companies with valuations greater than $10 billion eventually make money because they have saturated the market long enough to become unavoidable.
I also don't think there's any way the governments of the world let real AGI stay in the hands of private industry. If it happens, governments around the world will go to war to gain control of it. SSI would be nationalized the moment AGI happened and there's nothing A16Z could do about it.
Increasingly this just seems like fantasy to me. I suspect we will see big changes similar to the way computers changed the economy, but we will not see "capital as we know it become basically worthless" or "the modern economy and society around it collapse overnight". Property rights will still have value. Manufacturing facilities will still have value. Social media sites will still have value.
If this is a fantasy that will not happen, we really don't need to reason about the implications of it happening. Consider that in 1968 some people imagined that the world of 2001 would be like the film 2001: A Space Odyssey, when in reality the shuttle program was soon to wind down, with little to replace it for another 20 years.
I was with you on the first two, but the third one I don't get. We don't even have AGI right now, and social media sites are already increasingly viewed by many people I know as having dubious value. Adding LLMs to the mix lowers that value, if anything (spam/bots/nonsense go up). Adding AGI would seem to further reduce that value.
> social media sites are already increasingly viewed by many people I know as having dubious value
I think we have all been saying this for 15 years but they keep getting more valuable.
I see it as capital becoming infinitely more valuable and labor becoming worthless, since capital can be transmuted directly into labor at that point.
In this case, you need capital to stockpile the GPUs.
If ASI and the ability to build robots becomes generally available and virtually free (and if the exponential growth stops), the things that retain their value will be land and raw materials (including raw materials that contain energy).
Realistically, what actually ends up happening, IMO: we get human-level AGI and hit a ceiling there. Agents replace large portions of the current service economy, greatly increasing automation/efficiency for companies.
People continue to live their lives, as the idea of having a human level AGI personal assistant becomes normalized and then taken for granted.
One million Von Neumanns working on that 'problem' is not something I'm looking forward to.
That is a hard limit on intelligence, but neural networks can't even reach that. What is the actual limit? No one knows. Maybe it's something relatively close to that, modulo physical constraints. Maybe it's right above the maximum human intelligence (and evolution managed to converge to a near optimal architecture). No one knows.
As far as we know, it's impossible to model any part of the universe perfectly, although it usually doesn't matter.
- A simple calculator can beat any human at arithmetic
- A word processor with a hard disk is orders of magnitude better than any human at memorizing text
- No human can approach the ELO of a chess program running on a cheap laptop
- No human polymath has journeyman-level knowledge of even a tiny fraction of the fields that LLMs can recite textbook answers from
Why should we expect AGI to be the first cognitive capability that does not quickly shoot past human-level ability?
Changed my mind. Think you’re right. At the very least, these models will reach polymath comprehension in every field that exists. And PhD level expertise in every field all at once is by definition superhuman, since people currently are time constrained by limited lifespans.
> People continue to live their lives
Presumably large numbers of those people no longer have jobs, and therefore no income.
> we get human level AGI and hit a ceiling there
Recently I've been wondering if our best chance for a brake on runaway non-hard-takeoff superintelligence would be that the economy would be trashed.
Half-joking. In seriousness, something like a UBI will most likely happen.
It's impossible to run out of work for people who are capable of it. As an example, if you have two people and a piece of paper, just tear up the paper into strips, call them money and start exchanging them. Congrats, you both have income now.
(This is assuming the AGI has solved the problem of food and stuff, otherwise they're going to have to trade for that and may run out of currency.)
If ASI is reached but is controlled by only a few, then ASI may become the most important form of capital of all. Resources, land and pre-existing installations will still be important, though.
What will truly suffer if the ASI's potential is realized, is the value of labor. If anything, capital may become more important than before.
Now this MAY be followed by attempts by governments or voters to nationalize the AI. But it can also mean that whoever is in power decides that it becomes irrelevant what the population wants.
Particularly if the ASI can be used to operate robotic police capable of pacifying the populace.
We would probably get the ability to generate infinite software, but a lot of stuff, like engineering would still require trial and error. Creating great art would still require inspiration gathered in the real world.
I expect it will bring about a new age of techno-feudalism - since selling intellectual labor will become impossible, only low value-add physical or mixed labor will become viable, which won't be paid very well. People with capital will still own said capital, but you probably won't be able to catch up to them by selling your labour, which will recreate the economic situation of the middle ages.
Another analogy I like is gold. If someone invented a way of making gold, it would bring down the price of the metal to next to nothing. In capitalist terms, it would constitute a huge destruction of value.
Same thing with AI - while human intelligence is productive, I'm pretty sure there's a value in its scarcity - that fancy degree from a top university or any sort of acquired knowledge is somewhat valuable by the nature of its scarcity. Infinite supply would create value, and destroy it, not sure how the total would shake out.
Additionally, it would definitely suck that all the people financing their homes from their intellectual jobs would have to default on their loans, and the people whose services they employ, like construction workers, would go out of business as well.
Because if AI has one "killer app", it's to control robots.
Dark pun intended.
Indeed. Or as I’ve said before: a return to the historical mean.
Picture something 1,000 smarter than a human. The potential value is waaaay bigger than any present company or even government.
Probably won’t happen. But, that’s the reasoning.
By selling to the "dumb(er) money" - if a Softbank / Time / Yahoo appears they can have it, if not you can always find willing buyers in an IPO.
Actually converting it to cash? That doesn't happen anymore. Everyone just focuses on IRR and starts the campaign for Fund II.
Just as clean water is worth trillions to no one in particular. Or the air we breathe.
You can take an abstract, general category and use it to infer that some specific business will benefit greatly, but in practice, the greater the opportunity the less likely it is it will be monopolized, and the more likely it is it will be commoditized.
But, my comment was a reference to the magic thinking that goes into making predictions.
$5B pre-product, betting on the team is fine. $50B needs to be a lot more than that.
Many examples of industries collapsing under the weight of belief. See: crypto.
OpenAI and Anthropic for sure have products and that's great. However, these products are pretty far from a super intelligence.
The bet SSI is making is that by not focusing on products and focusing directly on building a super intelligence, they can leapfrog all these other firms.
Now, if you assign any reasonable non-zero probability to them succeeding, the expected value of this investment is infinite. It's definitely a very risky investment, but risk and expected value are two different things.
NVDA::AI
CSCO::.COM
I remember seeing interviews with Nortel's CEO where he bragged that most internet backbone traffic was handled by Nortel hardware. Things didn't quite work out how he thought they were going to work out.
I think Nvidia is better positioned than Cisco or Nortel were during the dotcom crash, but does anyone actually think Nvidia's current performance is sustainable? It doesn't seem realistic to believe that.
There is no specific reason to assume that AI will be similar to the dotcom boom/bust. AI may just as easily be like the introduction of the steam engine at the start of the industrial revolution, just sped up.
[0] "Planning and construction of railroads in the United States progressed rapidly and haphazardly, without direction or supervision from the states that granted charters to construct them. Before 1840 most surveys were made for short passenger lines which proved to be financially unprofitable. Because steam-powered railroads had stiff competition from canal companies, many partially completed lines were abandoned."
-- https://www.loc.gov/collections/railroad-maps-1828-to-1900/a...
That was similar to what happened during the dotcom bubble.
The difference this time, is that most of the funding comes from companies with huge profit margins. As long as the leadership in Alphabet, Meta, Microsoft and Amazon (not to mention Elon) believes that AI is coming soon, there will be funding.
Obviously, most startups will fail. But even if 19 fail and 1 succeeds, if you invest in all of them, you're likely to make money.
GPUs, servers, datacenters, fabs, power generation/transmission, copper/steel/concrete..
All to train models in an arms race because someone somewhere is going to figure out how to monetize and no one wants to miss the boat.
My outsider observation is that we have a decent number of players roughly tied at trying to produce a better model. OpenAI, Anthropic, Mistral, Stability AI, Google, Meta, xAI, A12, Amazon, IBM, Nvidia, Alibaba, Databricks, some universities, a few internal proprietary models (Bloomberg, etc) .. and a bunch of smaller/lesser players I am forgetting.
To me, the actual challenge seems to be figuring out monetizing.
Not sure the 15th, 20th, or 30th LLM from lesser-capitalized players is going to be as impactful.
That’s the point of venture capital; making extremely risky bets spread across a wide portfolio in the hopes of hitting the power law lottery with 1-3 winners.
Most funds will not beat the S&P 500, but again, that’s the point. Risk and reward are intrinsically linked.
In fact, due to the diversification effects of uncorrelated assets in a portfolio (see MPT), even if a fund only delivers 5% returns YoY after fees, that can be a great outcome for investors. A 5% return uncorrelated to bonds and public stocks is an extremely valuable financial product.
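To make the diversification point concrete, here is a minimal sketch with my own illustrative numbers (not market data and not from any real fund): an asset with modest returns but zero correlation to the rest of the portfolio lowers overall volatility without lowering expected return.

    import math

    # Toy two-asset portfolio: 80% public stocks, 20% VC-style fund.
    w_stocks, w_fund = 0.8, 0.2
    vol_stocks, vol_fund = 0.16, 0.20   # assumed annual volatilities

    def portfolio_vol(rho):
        """Volatility of the 80/20 mix for a given correlation rho."""
        var = (w_stocks * vol_stocks) ** 2 \
            + (w_fund * vol_fund) ** 2 \
            + 2 * rho * w_stocks * w_fund * vol_stocks * vol_fund
        return math.sqrt(var)

    print(portfolio_vol(rho=1.0))   # fully correlated:  ~0.168
    print(portfolio_vol(rho=0.0))   # uncorrelated fund: ~0.134

Same blended expected return in both cases, noticeably less risk in the uncorrelated one; that is why even a "merely 5%" uncorrelated return stream can be valuable to an LP.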
It’s clear that humans find LLMs valuable. What companies will end up capturing a lot of that value by delivering the most useful products is still unknown. Betting on one of the biggest names in the space is not a stupid idea (given the purpose of VC investment) until it actually proves itself to be in the real world.
By contrast, SSI doesn't have the technology. The question is whether they'll be able to invent it or not.
Really? Selling goods online (Amazon) is not AGI. It didn’t take a huge leap to think that bookstores on the web could scale. Nobody knew if it would be Amazon to pull it off, sure, but I mean ostensibly why not? (Yes, yes hindsight being what it is…)
Apple — yeah the personal computer nobody fathomed but the immediate business use case for empowering accountants maybe should have been an easy logical next step. Probably why Microsoft scooped the makers of Excel so quickly.
Google? Organizing the world’s data and making it searchable a la the phone book and then (maybe they didn’t think of that maybe Wall Street forced them to) monetizing their platform and all the eyeballs is just an ad play scaled insanely thanks to the internet.
I dunno. I just think AGI is unlike the previous examples so many steps into the future compared to the examples that it truly seems unlikely even if the payoff is basically infinity.
I don't think you remember the dot-com era. Loads of people thought Amazon and Pets.com were hilarious ideas. Cliff Stoll wrote a whole book on how the Internet was going to do nothing useful and we were all going to buy stuff (yes, the books too) at bricks-and-mortar, which was rapturously received and got him into _Newsweek_ (back when everyone read that).
"We’re promised instant catalog shopping — just point and click for great deals. We’ll order airline tickets over the network, make restaurant reservations and negotiate sales contracts. Stores will become obsolete. So how come my local mall does more business in an afternoon than the entire Internet handles in a month?"
However, I think that because of the money involved and all of these things being forced upon us, one of these companies will get a 1000x return. A perfect example is the Canva price hike from yesterday, or any and every Google product from here on out. It's essentially being forced upon everyone that uses internet technology, and someone is going to win while everyone else loses (consumers and small businesses).
Imagine organizing the world's data and knowledge, and integrating it seamlessly into every possible workflow.
Now you're getting close.
But also remember, this company is not trying to produce AGI (intelligence comparable to the flexibility of human cognition), it's trying to produce super intelligence (intelligence beyond human cognition). Imagine what that could do for your job, career, dreams, aspirations, moon shots.
If / when AGI happens can we make sure it’s not the Matrix?
There are innumerable ways to increase your risk without increasing your potential reward.
Not a VC, but I'd assume in this case the investors are not investing in a plausible biz plan, but in a group of top talent, especially given how early-stage the company is. The $5B valuation is really the valuation of the elite team in an arguably hyped market.
Look at previous such investments Microsoft and AWS have done in OpenAI and Anthropic.
They need use cases and customers for their initial investment of 750 billion dollars. Investing in the best people in the field is then of course a given.
It has nothing to do with AGI and everything to do with being the first-party provider for Microsoft and the like.
I don't understand this question. How could even average-human-level AGI not be useful in business, and profitable, a million different ways? (you know, just like humans except more so?). Let alone higher-human-level, let alone moderately-super-human level, let alone exponential level if you are among the first? (And see Charles Stross, Accelerando, 2005 for how being first is not the end of the story.)
I can see one way for "not profitable" for most applications - if computing for AGI becomes too expensive, that is, AGI-level is too compute intensive. But even then that only eliminates some applications, and leaves all the many high-potential-profit ones. Starting with plain old finance, continuing with drug development, etc.
Open source LLMs exist, just like lots of other open source projects, which have rarely prevented commercial projects from making money. And so far they are not even trying for AGI. If anything, the open source LLM becomes one of the agents in the private AGI. But presumably $1 billion buys a lot of effort that the open source LLM can't afford.
A more interesting question is one of tradeoff. Is this the best way to invest 1 billion right now? From a returns point of view? But even this depends on how many billions you can round up and invest.
There is a silver lining though. Even if it all goes to near-zero (most likely outcome for all VC investments anyway) the digital world will be one where fast matrix multiply is thoroughly commoditized.
This is not a trivial feat.
In a sense this will be the true end of the Wintel era. The old world of isolated, CISC, deterministic desktops giving way not to "AGI", but widely available, networked, vector "supercomputers" that can digest and transform practically everything that has ever been digitized.
Who knows what the actual (financial) winners of this brave new era will be.
In an ideal world there should be no winner-takes-all entity but a broad-based leveling up, i.e., spreading these new means of production as widely as possible.
Heck, maybe we will even eventually see the famously absent productivity gains from digital tech?
"vectorized linear algebra" is at the root of most of modern Physics.
Specifically, the laws of Physics are represented by the Lie groups U(1), SU(2), SU(3) and SO(3,1).
While the manifolds that Physics act on are curved, they're "locally flat". That is why local operations are tensor operations. Or linear algebra, if you prefer.
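For concreteness, the groups being referred to are the Standard Model gauge group plus the Lorentz group; this is standard physics notation, nothing specific to the ML analogy:

    % Internal (gauge) symmetries of the Standard Model
    G_{\mathrm{SM}} = SU(3)_C \times SU(2)_L \times U(1)_Y
    % Local spacetime symmetry (Lorentz group)
    \mathrm{SO}(3,1)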
It's not all that surprising to me that "intelligence" is represented by similar math.
In fact, there is active work being done on making sense of deep learning using Lie algebra [1] (and Algebraic Topology, which generalizes the Lie algebra).
This math can be a bit hard, though, so the learning curve can be steep. However, when we're creating AI models to be ML scientists, I suspect that this kind of math may be a source of "unhobbling", as meant in Situational Awareness [2].
Because if we can understand the symmetries at play in a problem domain, it's generally a lot easier to find a mathematical architecture (like in the algebras above) that effectively describe the domain, which allows us to potentially reduce the degrees of freedom by many OOM.
> Heck, maybe we will even eventually see the famously absent productivity gains from digital tech?
I think it's a mistake to think of AI as "digital tech", and especially to assume that it will follow the trajectory of the Internet, social media, or crypto that we've seen over the last generation.
AI fundamentally comes with the potential to do anything a human can do in the economy (provided robotic tech keeps up). If so, the word "productivity" as currently used (economic value produced per hour of human work) becomes meaningless, since it would go to infinity (because of division by zero).
[1] https://arxiv.org/pdf/2402.08871 [2] https://situational-awareness.ai
the "vectorized" adjective was meant to imply implementing linear algebra in digital computers that can operate concurrently on large-dimensional vectors/tensors. In this sense (and despite Wolfram's diligence and dearest wishes) modern physics theories have exactly 0% digital underpinning :-)
> It's not all that surprising to me that "intelligence" is represented by similar math.
yes, the state of the art of our modeling ability in pretty much any domain is to conceive of a non-linear system description and "solve it" by linearization. Methinks this is the primary reason we haven't really cracked "complexity": we can only solve the problems we have the proverbial hammer to apply to.
> AI fundamentally comes with the potential to do anything a human can do
That goes into wild speculation territory. In any case the economy is always about organizing human relationships. Technology artifacts only change the decor, not the substance of our social relations. Unless we completely cease to have dependencies on each other (what a dystopic world!) there will always be the question of an individual's ability to provide others with something of value.
I don't think the "digital" part matters at all. Floating point tends to be close enough to Real (analog) numbers. The point is that at each point of space-type, the math "used" by Physics locally is linear algebra.
(EDIT): If your main point was the "vectorized" part, not the digital part, and the specifics of how that is computed in a GPU, then that's more or less directly analogous to how the laws of physics works. Physical state is generally represented by vectors (or vector fields) while the laws of physics are represented by tensor operations on those vectors(or fields).
Specifically, when sending input as vectors through a sequence of tensors in a neural net, it closely (at an abstract level) resembles how one world state in and around a point in space-time is sent into the tensors that are the laws of physics to calculate what the local world state in the next time "frame" will be.
(END OF EDIT)
> yes, the state of the art of our modeling ability in pretty much any domain is to conceive of a non-linear system description and "solve it" by linearization
True, though neural nets are NOT linearizations, I think. They can fit any function. Even if each neuron is doing linear operations, the network as a whole is (depending on the architecture) quite adept at describing highly non-linear shapes in spaces of extreme dimensionality.
> Me thinks this is primary reason we haven't really cracked "complexity"
I'm not sure it's even possible for human brains to "crack" "complexity". Wolfram may very well be right that the complexity is irreducible. But for the levels of complexity that we ARE able to comprehend, I think both human brains and neural nets do that by finding patterns/shapes in spaces with near-infinite orders of freedom.
My understanding is that neural nets fit the data in a way conceptually similar to linear regression, but where the topology of the network implicitly allows it to find symmetries such as those represented by Lie groups. In part this may be related to the "locality" of the network, just as it is in Physics. Of all possible patterns, most will be locally non-linear and also non-local.
But nets of tensors impose local linearity and locality (or something similar), just like it does in Physics.
And since this is how the real world operates, it makes sense to me that the data that neural nets are trained on have similar structures.
Or maybe more specifically: It makes sense to me that animal brains developed with such an architecture, and so when we try to replicate it in machines, it carries over.
>> AI fundamentally comes with the potential to do anything a human can do
> That goes into wild speculation territory.
It does. In fact, it has this in common with most factors involved in pricing stocks. I think the current pricing of AI businesses reflect that a sufficiently large fraction of shareholders thinks it's a possible (potential) future that AI can replace all or most human work.
> In any case the economy is always about organizing human relationships.
"The economy" can have many different meanings. The topic here was (I believe) who would derive monetary profit from AI and AI businesses.
I definitely agree that a world where the need for human input is either eliminated or extremely diminished is dystopian. That's another topic, though.
time for a Culture re-read I guess.
Nvidia
Yep, but just as the first reading glasses were only available to the wealthy, and now anyone can have them, the inefficiency takes time to work out. It'll take a long time, especially given how vertically integrated Nvidia are.
I don't understand this phrasing - are you implying I'm not aware of these people? People...had bad eyesight before. Now they have bad eyesight with corrective lenses.
Their shtick is a GPU on steroids. In the bigger picture it's a well-positioned hack that has ridden two successive speculative bubbles (crypto mining and AI), but it's unclear how far this can go. Currently this approach is wildly successful because nobody else bothered to develop a serious vision for the post-Moore's-law era. But make no mistake, people's minds will get focused.
They may still release AI products to the public that are good enough and cheap enough to prevent competitors from being profitable or receive funding (to prevent them from catching up), but that's not where the value would come from.
Just as an example, let's say xAI is first. Instead of releasing the full capability as GROK 7, they would use the ASI to create a perfected version of their self driving software, to power their Optimus robots.
And to speed up the development of future manufacturing products (including, but not limited to cars and humanoid robots)
And since such winners may be challenged by anti-trust regulations, the ASI may also be utilized to gain leverage over the political system. Twitter/X could be one arena that would allow this.
Eventually, Tesla robots might even be used to replace police officers and military personnel. If so, the company might be a single software update away from total control.
We have no evidence that superintelligence will be developed. There's no "first". There's only "remote possibility".
Fundamentally, we have no evidence of anything that will happen in the future. All we do is to extrapolate from the past through the present, typically using some kind of theory of how the world operates.
The belief that we will eventually get there (whether it's this year or in 1000+ years) really only hinges on the following 3 assumptions:
1) The human brain is fully material (no divine souls is necessary for intelligence)
2) The human brain does not represent a global optimum for how intelligent a material intelligence-having-object can be.
3) We will eventually have the ability to build intelligence-having-objects (similar or different from the brain) that not only can do the same as a brain (that would be mere AGI), but also surpass it in many ways.
Assumptions 1 and 2 have a lot of support in the current scientific consensus. Those who reject them either do not know the science or hold a belief system that would be invalidated if one of those assumptions were true. (That could be anything from a Christian belief in the soul to an ideological reliance on a "tabula rasa" brain.)
Assumption 3 is mostly techno-optimism, or an extrapolation of the trend that we are able to build ever more advanced devices.
As for WHEN we get there, there is a fourth assumption required for it to happen soon:
4. For intelligence-having-objects to do their thing, they don't need some exotic mechanism we don't yet know how to build. For instance, there is no need to build large quantum computers for this.
This assumption is mostly about belief, and we really don't know.
Yet, given the current rate of progress, and if we accept assumptions 1-3, I don't think assumption 4 is unreasonable.
If so, it's not unreasonable to assume that our synthetic brains reach roughly human level intelligence when their size/complexity becomes similar to that of human brains.
Human brains have ~200 trillion synapses. That's about 100x-1000x more than the latest neural nets that we're building.
Based only on scale, current nets (GPT-4 generation) should have total capabilities similar to or slightly better than a rat's. I think that's not very far off from what we're seeing, even if the nets tend to have those capabilities linked to text/images rather than the physical world that a rat navigates.
In other words, I think we DO have SOME evidence (not conclusive) that the capabilities of a neural net can reach similar "intelligence" to animals with a similar number of synapses.
So IF that hypothesis holds true, and given assumptions 1-3 above, there is a fair possibility that human level intelligence will be reached when we scale up to about 200 trillion weights (and have the ability to train such nets).
And currently, several of the largest and most valuable companies in the world are making a huge gamble on this being the case, with plans to scale up nets by 100x over the next few years, which will be enough to get very close to human brain sized nets.
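A rough back-of-the-envelope version of that scaling argument, sketched in Python (the ~200 trillion synapse figure and the ~1 trillion weight estimate for current frontier models are rough assumptions, not hard numbers):

    import math

    human_synapses = 200e12      # rough estimate of synapses in a human brain
    frontier_weights = 1e12      # rough public estimate for the largest current nets
    gap = human_synapses / frontier_weights
    print(f"gap: {gap:.0f}x, about {math.log10(gap):.1f} orders of magnitude")
    # -> gap: 200x, about 2.3 orders of magnitude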
This is your weak link. I don't see why progress will be a straight line and not a sloping-off curve. You shouldn't see the progress we've made in vehicle speed and assume we can hit the speed of light.
Technological progress often appears linear in the short term, but zooming out reveals an exponential curve, similar to compound interest.
> You shouldn't see the progress we've made in vehicle speed and assume we can hit the speed of light.
Consider the trajectory of maximum speeds over millennia, not just recent history. We've achieved speeds unimaginable to our ancestors, mostly in space—a realm they couldn't conceive. While reaching light speed is challenging, we're exploring novel concepts like light-propelled nano-vehicles. If consciousness is information-based, could light itself become a "vehicle"?
Do you think hitting light speed is an engineering problem or a fundamental constraints problem?
Notice however that our minds, like the instructions in DNA and RNA, are built from atoms, but they aren't the atoms themselves. They're the information in how those atoms are arranged. Once we can fully read and write this information—like we're starting to do with DNA and RNA—light itself could become our vehicle.
(A massive object reaching light speed would have infinite energy, meaning infinite relativistic mass, and would form a black hole whose event horizon would spread into space at the speed of light.)
I don't think so at all. I'm personally convinced that humanity EVENTUALLY will build something more "intelligent" than human brains.
> I don't see
I see
> You shouldn't see the progress we've made in vehicle speed and assume we can hit the speed of light
There are laws of Physics that prevent us from moving faster than the speed of light. There IS a corresponding limit for computation [1], but it's about as far from the human brain's ability as the speed of light is from human running speed.
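One commonly cited bound of that kind is Bremermann's limit, roughly c^2/h bits per second per kilogram of matter (whether or not that is the specific limit [1] refers to is an assumption here); a one-line sanity check of the usual figure:

    # Bremermann's limit: maximum computation rate for one kilogram of matter, ~c^2/h bits/s
    c = 2.998e8       # speed of light, m/s
    h = 6.626e-34     # Planck constant, J*s
    print(c**2 / h)   # ~1.36e50 bits per second per kg -- vastly beyond brains or data centers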
I'm sure some people who saw the first cars thought they could never possibly become faster than a horse.
Making ASI has no more reason to be impossible than building something faster than the fastest animal, or (stretching it) something faster than the speed of sound (which was supposed to be impossible).
There is simply no reason to think that the human brain is at a global maximum when it comes to intelligence.
Evolutionary history points towards brain size being limited by a combination of what is safe for the female hip width and also what amount of energy cost can be justified by increasing the size of the brain.
Those who really think that humans have reached the peak, like David Deutsch, tend to think that the brain operates as a Turing Machine. And while a human brain CAN act like a very underpowered Turing Machine if given huge/infinite amounts of paper and time, that's not how most of our actual thought processes function in reality.
Since our ACTUAL thinking generally does NOT use Turing Complete computational facilities but rather relies on most information being stored in the actual neural net, the size of that net is a limiting factor for what mental operations a human can perform.
I would claim that ONE way to create an object significantly more intelligent than current humans would be through genetic manipulation that would produce a "human" with a neocortex several times the size of what regular humans have.
If bigger brains lead to higher intelligence, why do many highly intelligent people have average-sized heads? And do they need to eat much more to fuel their high IQs? If larger brains were always better, wouldn’t female hips have evolved to accommodate them? I think human IQ might be where it is because extremely high intelligence (vs. what we on average have now) often leads to fewer descendants. Less awareness of reality can lead to more "reproductive bliss."
There IS a correlation between intelligence and brain size (of about 0.3). But the human brain does a lot of things apart from what we measure as "IQ". What shows up in IQ tests tends to be mostly related to variation in the thickness of certain areas of the cortex [1].
The rest of the brain is, however, responsible for a lot of the functions that separate GPT-4 or Tesla's self driving from a human. Those are things we tend to take for granted in healthy humans, or that can show up as talents we don't think of as "intelligence".
Also, the variation in the size of human brains is relatively small, so the specifics of how a given brain is organized probably contributes to more of the total variance than absolute size.
That being said, a chimp brain is not likely to produce (adult, healthy) human level intelligence.
> And do they need to eat much more to fuel their high IQs?
That depends on the size of the brain, primarily. Human brains consume significantly more calories than chimp brains.
> If larger brains were always better, wouldn’t female hips have evolved to accommodate them?
They did, and significantly so. In particular the part around the birth canal.
> I think human IQ might be where it is because extremely high intelligence (vs. what we on average have now) often leads to fewer descendants. Less awareness of reality can lead to more "reproductive bliss."
I believe this is more of a modern phenomenon, mostly affecting women from the 20th century on. There may have been similar situations at times in the past, too. But generally, over the last several million years, human intelligence has been rising sharply.
[1] https://www.sciencedirect.com/science/article/abs/pii/S03064....
Here is a page that shows what such a weak correlation looks like visually:
https://resources.nu.edu/statsresources/correlation
It also explains that a correlation of r = 0.3 means only about 9% of the variability in one variable is explained by the other. This makes me wonder: can intelligence really be estimated within 10% accuracy? I doubt it, especially considering how IQ test results can vary even for the same person over time.
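To make the r = 0.3 point concrete, here is a minimal simulation (synthetic data, not brain measurements) showing how little variance such a correlation explains:

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 100_000, 0.3
    x = rng.normal(size=n)
    y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)   # construct y with correlation ~r to x
    r_hat = np.corrcoef(x, y)[0, 1]
    print(r_hat, r_hat**2)   # ~0.3 and ~0.09: about 9% of the variance "explained"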
Kids and teens have smaller brains, but their intelligence increases as they experience more mental stimulation. It’s not brain size that limits them but how their brains develop with use, much like how muscles grow with exercise.
> a chimp brain is not likely to produce (adult, healthy) human-level intelligence.
> Human brains consume significantly more calories than chimp brains.
If brain size and calorie consumption directly drove intelligence, we’d expect whales, with brains five times larger than humans, to be vastly more intelligent. Yet, they aren’t. Whales’ large brains are likely tied to their large bodies, which evolved to cover great distances in water.
Brains can be large like arms can be large, but big arms do not necessarily make you strong -- they may be large due to fat.
> But generally, over the last several million years, human intelligence has been rising sharply.
Yes, smaller-brained animals are generally less intelligent, but exceptions like whales and crows suggest that intelligence evolves alongside an animal’s ecological niche. Predators often need more intelligence to outsmart their prey, and this arms race likely shaped human intelligence.
As humans began living in larger communities, competing and cooperating with each other, intelligence became more important for survival and reproduction. But this has limits. High intelligence can lead to emotional challenges like overthinking, isolation, or an awareness of life’s difficulties. Highly intelligent individuals can also be unpredictable and harder to control, which may not always align with societal or biological goals.
As I see it, ecological niche drives intelligence, and factors like brain size follow from that. The relationship is dynamic, with feedback loops as the environment changes.
For this, you're perfectly correct.
> It’s not brain size that limits them but how their brains develop with use, much like how muscles grow with exercise.
Here, the answer is yes, but as with muscles, biology will create constraints. If you're male, you may be able to bench 200kg, but probably not 500kg unless your biology allows it.
> If brain size and calorie consumption directly drove intelligence, we’d expect whales
As you wrote later, there are costs to developing large brains. The benefits would not justify the costs, over evolutionary history.
> Brains can large like arms can be large but big arms do not necessarily make you strong -- they may be large due to fat.
A chimp has large arms. Try wrestling it.
> and factors like brain size follow from that
Large brains come with a significant metabolic cost. They would only have evolved if they provided a benefit that would outweigh those costs.
And in today's world, most mammal tissue is either part of a Homo sapiens or part of the body of an animal used as livestock by Homo sapiens.
That's evolutionary "success" for you.
On evolutionary timeframes, what biology allows can evolve; the hard limits are due to chemistry and physics.
> there are costs to developing large brains. The benefits would not justify the costs, over evolutionary history.
> Large brains come with a significant metabolic cost. They would only have evolved if they provided a benefit that would outweigh those costs
Google "evolutionary spandrels" and you will learn there can be body features (large brains of whales) that are simply a byproduct of other evolutionary pressures rather than direct adaptation.
If you're a 10-150 ton whale, a 2-10 kg brain isn't a significant cost.
But if you're a 50kg primate, a brain of more than 1kg IS.
For hominids over the past 10 million years, there have been very active evolutionary pressures to minimize brain size. Still, the brain grew to maybe 2-4 times its earlier size over this period.
This growth came at a huge cost, and the benefits must have justified those costs.
> On evolutionary timeframes what biology allows can evolve and the hard limits are due to chemistry and physics.
It's not about there being hard limits. Brain size or muscle size or density is about tradeoffs. Most large apes are 2-4 times stronger than humans, even when accounting for size, but human physiology has other advantages that make up for that.
For instance, our lower density muscles allow us to float/swim in water with relative ease.
Also, lighter bodies (relative to size) make us (in our natural form) extremely capable long-distance runners. Some humans can chase a horse on foot until it dies from exhaustion.
I'm sure a lot of other species could have developed human level intelligence if the evolutionary pressures had been there for them. It just happens to be that it was humans that first entered an ecological niche where evolving this level of intelligence was worth the costs.
Humans' ability to run long distances effectively is due to a combination of factors, with the ability to sweat being one of the most crucial. Here are the key adaptations that make humans good endurance runners:
a) Efficient sweating: Humans have a high density of sweat glands, allowing for effective thermoregulation during prolonged exercise.
b) Bipedalism: Our two-legged gait is energy-efficient for long-distance movement.
c) Lack of fur: This helps with heat dissipation.
d) Breathing independence from gait: Unlike quadrupeds, our breathing isn't tied to our running stride, allowing for better oxygen intake.
Lighter bodies (relative to size) play a role, but there are plenty of creatures with light bodies relative to size that are not great at long-distance running.
I read somewhere that the human brain makes up only about 2% of body weight but uses 20% of the body’s energy. While brain size has increased over time, brain size does not determine intelligence. The brain’s high energy use, constant activity, and complex processes are more important. Its metabolic activity, continuous glucose and oxygen consumption, neurotransmitter dynamics, and synaptic plasticity all play major roles in cognitive function. Intelligence is shaped by the brain’s efficiency, how well it forms and adjusts neural connections, and the energy it invests in processing information. Intelligence depends far more on how the brain works than on its size.
Rats don't speak
As we continue to make models larger, and assuming that model capabilities keep up with brains that have synapse counts similar to the weights in the models, we're now 2-3 OOM from human level (possibly less).
Of course it is, because they don't do the same thing.
Yeah, by this line of thought Jesus will descend from Heaven and save us all.
By the same line of fantasy, "give us billions to bring AGI", why not "gimme a billion to bring Jesus. I'll pray really hard, I promise!"
It's all become a disgusting scam, effectively just religious. Believe in AGI that's all there is to it. In practice it's just as (un) likely as scientists spontaneously creating life out of primordial soup concoctions.
All the building evidence was there but people just refused to believe it was possible.
I am not buying that AI right now is going to displace every job or change the world in the next 5 years, but I wouldn't bet against world impacts in that timeframe. The writing is on the wall. I am old enough to remember AI efforts in the late 80s and early 90s. We saw how very little progress was made.
The progress made in the past 10 years is pretty insane.
While the power was not amazing, I've kind of assumed since then that scale was what would be needed.
I then half-way forgot about this, until I saw the results from Alexnet.
Since then, the capabilities of the models have generally been keeping up with how they were scaled, at least within about 1 OOM.
If that continues, the next 5-20 years are going to be perhaps the most significant in history.
Wait till you hear that a bunch of meat[1] is behind all said speculation.
[1]https://stuff.mit.edu/people/dpolicar/writing/prose/text/thi...
Minus the urgency, scientific process, well-defined goals, target dates, public ownership, accountability...
The urgency was faked and less true of the Manhattan Project than it is of AGI safety. There was no nuclear weapons race; once it became clear that Germany had no chance of building atomic bombs, several scientists left the MP in protest, saying it was unnecessary and dangerous. However, the race to develop AGI is very real, and we also have no way of knowing how close anyone is to reaching it.
Likewise, the target dates were pretty meaningless. There was no race, and the atomic bombs weren't necessary to end the war with Japan either. (It can't be said with certainty one way or the other, but there's pretty strong evidence that their existence was not the decisive factor in surrender.)
Public ownership and accountability are also pretty odd things to say! Congress didn't even know about the Manhattan Project. Even Truman didn't know for a long time. Sure, it was run by employees of the government and funded by the government, but it was a secret project with far less public input than any US-based private AI companies today.
It seems pretty irresponsible for AI boosters to say it’ll happen within 5 years then.
There’s a pretty important engineering distinction between the Manhattan Project and current research towards AGI. At the time of the Manhattan Project scientists already had a pretty good idea of how to build the weapon. The fundamental research had already been done. Most of the budget was actually just spent refining uranium. Of course there were details to figure out like the specific design of the detonator, but the mechanism of a runaway chain reaction was understood. This is much more concrete than building AGI.
For AGI nobody knows how to do it in detail. There are proposals for building trillion dollar clusters but we don’t have any theoretical basis for believing we’ll get AGI afterwards. The “scaling laws” people talk about are not actual laws but just empirical observations of trends in flawed metrics.
Agreed. Do they?
Demis Hassabis said 50/50 it happens in 5 years.
Jensen Huang said 5 years.
Elon Musk said 2 years.
Leopold Aschenbrenner said 5 years.
Matt Garman said 2 years for all programming jobs.
And I think most relevant to this article, since SSI says they won’t release a product until they have superintelligence, I think the fact that VCs are giving them money means they’ve been pretty optimistic in statements about their timelines.
> There was no nuclear weapons race; once it became clear that Germany had no chance of building atomic bombs, several scientists left the MP in protest
You are forgetting Japan in WWII. Given the casualty numbers from island hopping, an invasion was going to produce an absolutely huge casualty count among US troops, probably something on the order of England's losses during WW1, which sent them on a downward trajectory due to essentially an entire generation dying or being extremely traumatized. If the US did not have Nagasaki and Hiroshima we would probably not have the space program and US technical prowess post-WWII, so a totally different reality than where we are today.
The big problem that MacArthur and others pointed out is that all the Japanese forces on the Asian mainland and left behind in the island-hopping campaign through the Pacific were unlikely to surrender unless Japan itself was definitively defeated, with the central government capitulating and aiding in the demobilization.
From their perspective the options were to either invade Japan and force a capitulation, or go back and keep fighting it out with every island citadel and throughout China, Indochina, Formosa, Korea, and Manchuria.
https://en.wikipedia.org/wiki/Operation_Downfall#:~:text=Tru....
Well, you didn't provide any evidence. Island hopping in the Pacific theater itself took thousands of lives; imagine what a headlong strike into a revanchist country of citizens determined to fight to the last man, woman and child would have looked like. We don't know how effective a hypothetical Soviet assault would have been, as they had attacked only sparsely populated Sakhalin. What the atom bomb succeeded in was convincing Emperor Hirohito that continuing the war would be destructively pointless.
WW1 practically destroyed the British Empire for the most part. WW2 would have done the same for the US in your hypothetical scenario, but much worse.
I'd say they were equal. We were worried about Russia getting nuclear capability once we knew Germany was out of the race. Russia was at best our frenemy. The enemy of my enemy is my friend kind of thing.
https://amp.cnn.com/cnn/2017/11/18/politics/air-force-genera...
Some of you do. The rest of us are left with the consequences.
Even the president needs someone else to push a button (and in those rooms there's also more than one person). There's literally no human that can do it alone without convincing at least 1 or 2 other people, depending on who it is.
What does AGI do? AGI is up against a philosophical barrier, not a technical one. We'll continue improving AI's ability to automate and assist human decisions, but how does it become something more? Something more "general"?
With transformers, demonstrated first by LLMs, I think we've shown that the narrow-general divide as a strict binary is the wrong way to think about AI. Instead, LLMs are obviously more general than any previous AI system, in that they can do math or play chess or write a poem, all using the same system. They aren't as good as our existing superhuman computer systems at these tasks (aside from language processing, which they are SOTA at), not even as good as humans, but they're obviously much better than chance. With training to use tools (like calculators and chess engines) you can easily make an AI system with an LLM component that's superhuman in those fields, but there are still things that LLMs cannot do as well as humans, even when using tools, so they are not fully general.
One example is making tools for themselves to use - they can do a lot of parts of that work, but I haven't seen an example yet of an LLM actually making a tool for itself that it can then use to solve a problem it otherwise couldn't. This is a subproblem of the larger "LLMs don't have long-term memory and long-term planning abilities" problem - you can ask an LLM to use Python to make a little tool for itself to do one specific task, but it's not yet capable of adding that tool to its general toolset to enhance its general capabilities going forward.
It can't write a memoir, or a book that people want to read, because they suck at planning and refining from drafts, and they have limited creativity because they're typically a blank slate in terms of explicit memory before they're asked to write - they have a gargantuan store of implicitly remembered things from training, which is where what creativity they do have comes from, but they don't yet have a way to accrue and benefit from experience.
A thought exercise I think is helpful for understanding what the "AGI" benchmark should mean is: can this AI system be a drop-in substitute for a remote worker? As in, any labour that can be accomplished by a remote worker can be performed by it, including learning on the job to do different or new tasks, and including "designing and building AI systems". Such a system would be extremely economically valuable, and I think it should meet the bar of "AGI".
But they can't, they still fail at arithmetic and still fail at counting syllables.
I think that LLMs are really impressive but they are the perfect example of a narrow intelligence.
I think they don't blur the lines between narrow and general, they just show a different dimension of narrowness.
You are incorrect. These services are free, you can go and try it out for yourself. LLMs are perfectly capable of simple arithmetic, better than many humans and worse than some. They can also play chess and write poetry, and I made zero claims at "counting syllables", but it seems perfectly capable of doing that too. See for yourself, this was my first attempt, no cherry picking: https://chatgpt.com/share/ea1ee11e-9926-4139-89f9-6496e3bdee...
I asked it a multiplication question so it used a calculator to correctly complete the task, I asked it to play chess and it did well, I asked it to write me a poem about it and it did that well too. It did everything I said it could, which is significantly more than a narrow AI system like a calculator, a chess engine, or an image recognition algorithm could do. The point is it can do reasonably well at a broad range of tasks, even if it isn't superhuman (or even average human) at any given one of them.
>I think that LLMs are really impressive but they are the perfect example of a narrow intelligence.
This doesn't make any sense at all. You think an AI artifact that can write poetry, code, play chess, control a robot, recommend a clutch to go with your dress, compute sums etc is "the perfect example of a narrow intelligence." while a chess engine like Stockfish or an average calculator exists? There are AI models that specifically and only recognise faces, but the LLM multitool is "the perfect example of a narrow intelligence."? Come on.
>I think they don't blur the lines between narrow and general, they just show a different dimension of narrowness.
You haven't provided an example of what "dimension of narrowness" LLMs show. I don't think you can reasonably describe an LLM as narrow without redefining the word - just because something is not fully general doesn't mean that it's narrow.
how much is (0.2 + 0.1) * 10?
The result of (0.2+0.1)×10 is approximately 3, with a slight rounding difference leading to 3.0000000000000004.
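(For what it's worth, that 3.0000000000000004 is ordinary IEEE 754 floating-point rounding, presumably passed straight through from the Python tool the model called; a minimal check:)

    print((0.2 + 0.1) * 10)        # 3.0000000000000004 -- binary floats can't represent 0.1 or 0.2 exactly
    from decimal import Decimal
    print((Decimal('0.2') + Decimal('0.1')) * 10)   # 3.0 -- exact decimal arithmetic avoids the artifact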
My 10yo does not make this error; ChatGPT does, because it does not understand math but knows how to use Python. For poetry: counting syllables is a significant part of most poetry forms, so if you can't count syllables, you can't do poetry.
Let's say you want a 5-7-5 haiku, this is ChatGPT
write a 5-7-5 haiku about windstorms
Fierce winds howl and spin,
Branches bend, clouds race the sky,
Storm leaves quiet calm.
This is not a 5-7-5 haiku. LLMs are not general, but they show that a specific specialization ("guess next token") can solve a lot more problems than we thought it could.
>[AI system]s are not general, but they show that a specific specialization ("[process sequential computational operations]") can solve a lot more problems than we thought it could.
Or if you really want:
>Humans are not general, but they show that a specific specialization ("neuron fires when enough connected neurons fire into it") can solve a lot more problems than we thought it could.
This is just sophistry - the method by which some entity is achieving things doesn't matter, what matters is whether or not it achieves them. If it can achieve multiple tasks across multiple domains it's more general than a single-domain model.
Still, you’d have to be quite an idiot to wait for the third time to listen eh?
Besides, the winners get to decide what’s a war crime or not.
And when the US started mass firebombing civilian Tokyo, it’s not like they were going to be able to just ‘meh, we’re good’ on that front. Compared to that hell, being nuked was humane.
And I don’t say that lightly.
As made quite apparent by, as you note, kamikaze tactics and more.
The Bomb was a cleaner, sharper, and faster Axe than invading the main island.
That it also sent a message to the rest of the world was a bonus. But do you think they would have not used it, if for example the USSR wasn’t waiting?
Of course not, they’d still have nuked the hell out of the Japanese.
Or he will simply shift goalposts, and call some LLM superintelligent.
What evidence can you provide to back up the statement of this "significant possibility"? Human brains use neural networks...
Modern ANN architectures are not actually capable of long-term learning in the same way animals are, even stodgy old dogs that don't learn new tricks. ANNs are not a plausible model for the brain, even if they emulate certain parts of the brain (the cerebellum, but not the cortex)
I will add that transformers are not capable of recursion, so it's impossible for them to realistically emulate a pigeon's brain. (you would need millions of layers that "unlink chains of thought" purely by exhaustion)
even if we bought this negative result as somehow “proving impossibility”, i’m not convinced plasticity is necessary for intelligence
huge respect for richard sutton though
More specifically: it is highly implausible that an AI system could learn to improve itself beyond human capability if it does not have long-term plasticity: how would it be able to reflect upon and extend its discoveries if it's not able to learn new things during its operation?
(That said, I agree plasticity is key to the most powerful systems. A human race with anterograde amnesia would have long ago gone extinct.)
If I'm a human tasked with editing video (which is the field my startup[0] is in) and a completely new video format comes in, I need the long term plasticity to learn how to use it so I can perform my work.
If a sufficiently intelligent version of our AI model is tasked with editing these videos, and a completely new video format comes in, it does not need to learn to handle it. Not if this model is smart enough to iterate a new model that can handle it.
The new skills and knowledge do not need to be encoded in "the self" when you are a bunch of bytes that can build your successor out of more bytes.
Or, in popular culture terms, the last 30 seconds of this Age of Ultron clip[1].
What do you think training (and fine-tuning) does?
No LLM currently adapts to the tasks it's given with an iteration cycle shorter than on the order of months (assuming your conversations serve as future training data; otherwise not at all).
No current LLM can digest its "experiences", form hypotheses (at least outside of being queried), run thought experiments, then actual experiments, and then update based on the outcome.
Not because it's fundamentally impossible (it might or might not be), but because we practically haven't built anything even remotely approaching that type of architecture.
But there is no reason the company can't come up with a different paradigm.
But I suppose you could say we don't know 100% since we don't fully understand how the brain learns.
1. Either you are correct and the neural networks humans have are exactly the same as or very similar to the programs in the LLMs. Then it will be relatively easy to verify this - just scale one LLM to the human brain's neuron count and supposedly it will acquire consciousness and start rapidly learning and creating on its own without prompts.
2. Or what we call neural networks in computer programs is radically different and/or insufficient to create AI.
I'm leaning to the second option, just from the very high level and rudimentary reading about current projects. Can be wrong of course. But I have yet to see any paper that refutes option 2, so it means that it is still possible.
If you wanted to reduce it down, I would say there are two possibilities:
1. Our understanding of neural nets is currently sufficient to recreate intelligence, consciousness, or what have you.
2. We’re lacking some understanding critical to intelligence/consciousness.
Given that with a mediocre math education and a week you could pretty completely understand all of the math that goes into these neural nets, I really hope there’s some understanding we don’t yet have.
MLPs and transformers are ultimately theoretically equivalent. That means there is an MLP that can represent any function a given transformer can. However, that MLP is hard to identify and train.
Also the transformer contains MLPs as well...
(I did AI and Psychology at degree level, I understand there are definitely also big differences too, like hormones and biological neurones being very async)
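To make the "the transformer contains MLPs" point concrete, here is a minimal single-head transformer block sketched in plain NumPy (toy dimensions, no layer norm or multi-head machinery; purely illustrative, not anyone's production architecture):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def transformer_block(x, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
        # x: (seq_len, d_model)
        # --- self-attention sublayer ---
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
        x = x + attn @ v @ Wo                          # residual connection
        # --- feed-forward sublayer: a plain two-layer MLP applied per token ---
        x = x + np.maximum(0, x @ W1 + b1) @ W2 + b2   # ReLU MLP + residual
        return x

    rng = np.random.default_rng(0)
    d, d_ff, T = 16, 64, 8
    shapes = [(d, d), (d, d), (d, d), (d, d), (d, d_ff), (d_ff,), (d_ff, d), (d,)]
    params = [rng.normal(0, 0.1, s) for s in shapes]
    print(transformer_block(rng.normal(size=(T, d)), *params).shape)   # (8, 16)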
Transformers, while not exactly functions, don't have a feedback mechanism similar to e.g. the cortical algorithm or any other neuronal structure I'm aware of. In general, the ML field is less concerned with replicating neural mechanisms than following the objective gradient.
Numenta has attempted to implement a system to this effect (see the wiki page https://en.wikipedia.org/wiki/Hierarchical_temporal_memory) for quite some time with not particularly much success.
Personally I think the kinds of minds we create in silico will end up being very different, because the advantages and disadvantages of the medium are just very different; for example, having a much stronger central processor and much weaker distributed memory, along with specialized precise circuits in addition to probabilistic ones.
1. Humans have general intelligence.
2. Human brains use biological neurons.
3. Human biological neurons give rise to human general intelligence.
4. Artificial neural networks (ANNs) are similar to human brains.
5. Therefore an ANN could give rise to artificial general intelligence.
Many people are objecting to #4 here. However in writing this out, I think #3 is suspect as well: many animals who do not have general intelligence have biologically identical neurons, and although they have clear structural differences with humans, we don’t know how that leads to general intelligence.
We could also criticize #1 as well, since human brains are pretty bad at certain things like memorization or calculation. Therefore if we built an ANN with only human capabilities it should also have those weaknesses.
They don't, actually.
Edit: actually I'm not sure if AIXItl is technically galactic or just terribly inefficient, but there's been trouble making it faster and more compact.
In any case anyone who is completely sure that we can/can’t achieve AGI is delusional.
The fact is many things we’ve tried to develop for decades still don’t exist. Nothing is guaranteed
Basically, unless you can show humans calculating a non-Turing computable function, the notion that intelligence requires a biological system is an absolutely extraordinary claim.
If you were to argue about consciousness or subjective experience or something equally woolly, you might have a stronger point, and this does not at all suggest that current-architecture LLMs will necessarily achieve it.
1. There is a chemical-level nature to intelligence which prevents other elements like silicon from being used as a substrate for intelligence
2. There is a non material aspect to intelligence that cannot be replicated except by humans
To my knowledge, there is no scientific evidence that either are true and there is already a large body of evidence that implies that intelligence happens at a higher level of abstraction than the individual chemical reactions of synapses, ie. the neural network, which does not rely on the existence of any specific chemicals in the system except in as much as they perform certain functions that seemingly could be performed by other materials. If anything, this is more like speculating that there is a way to create energy from sunlight using plants as an existence proof of the possibility of doing so. More specifically, this is a bet that an existing physical phenomenon can be replicated using a different substrate.
No. The Manhattan Project started after we understood the basic mechanism of runaway fission reactions. The funding was mostly spent purifying uranium.
AGI would be similar if we understood the mechanism of creating general intelligence and just needed to scale it up. But there are fundamental questions we still aren’t close to answering for AGI.
A more apt comparison today is probably something like fusion reactors although progress has been slow there too. We know how fusion works in theory. We have done it before (thermonuclear weapons). There are sub-problems we need to solve, but people are working on them. For AGI we don’t even know what the sub-problems are yet.
A very cynical take is that this is an extreme version of 'we plan to spend all money on growth and figure out monetization later' model that many social media companies with a burn rate of billions of $$, but no business model, have used.
He is saying he will try to build something head and shoulders above anything else, and he got a billion dollars to do it with no expectation of revenue until his product is ready. The likelihood that he fails is very high, but his backers are willing to bet on that.
I read the article, but I am not sure how they know when this condition will be true.
Is this obvious to people reading this article? Is it an emperor-has-no-clothes type situation?
Are these people merely gullible, or co-conspirators in the scam?
If you check the 2024 YC batch, you'll notice pretty much every single one of them mentions AI in some form or another. I guarantee you the large majority of them are just looking to be bought out by some megacorp, because it's free money right now.
The 1B was a non-profit donation, so there wasn't an expectation of returns on that one.
There's plenty of players going for the same goal. R&D is wildly expensive. No guarantee they'll reach the goal, first or even at all.
Moreover, the majority of the capital likely goes into GPU hardware and/or opex, which VCs have currently arbitraged themselves [3], so to some extent this is VCs literally paying themselves to pay off their own hardware bet.
While hints of the ambition of the Manhattan project might be there, the economics really are not.
[1a] https://www.getpin.xyz/post/clubhouse-lessons-for-investors [1b] https://www.theverge.com/2023/4/27/23701144/clubhouse-layoff... [3] https://observer.com/2024/07/andreessen-horowitz-stocking-ai...
How can investment like this not transform a company's mission into eventually paying back Billions and making Billions of dollars?
It helps if you think of the investors as customers and the business model as making them think they're cool. Same model Uber used for self driving car research.
SSI Inc should probably be a public benefit company if they're going to talk like that though.
No, not even close.
> Indeed, one should be sophisticated themselves when negotiating investment to not be unduly encumbered by the unsophisticated. But let us not get too far off topic and risk subthread detachment.
Edit: @jgalt212: Indeed, one should be sophisticated themselves when negotiating investment to not be unduly encumbered by shades of the unsophisticated or potentially folks not optimizing for aligned interests. But let us not get too far off topic and risk subthread detachment. Feel free to cut a new thread for further discussion on the subject.
True, but most, if not all, money comes with strings attached.
Regardless, point is moot, money is money, and a16z's money isn't their money but other people's money
I assume that service is what SV Bank provided before it tanked, but someone has to manage that cash for the few years it takes to burn through it. What kind of service do you park that in?
And 80% of those $1B will go from Founder Mode VC to Nvidia and Datacenter Management Co in the span of 6 months.
Almost always, even not so large rounds.
Before Tandem, computers used to fail regularly. Tandem changed that forever (with a massive reward for their investors).
Similarly, LLMs are known to fail regularly. Until someone figures out a way for them not to hallucinate anymore. Which is exactly what Ilya is after.
Thank you for teaching me about Tandem Computers!
On a serious note, I would love to bet on him at this valuation. I think many others would as well. I guess if he wanted more money he would easily get it, but he probably values a small circle of easy-to-live-with investors instead.
>SSI says it plans to partner with cloud providers and chip companies to fund its computing power needs but hasn't yet decided which firms it will work with.
1bn in cash is crazy.... usually they get cloud compute credits (which they count as funding)
Related to this, DAOs (decentralized autonomous organizations, which do not have human shareholders) are intrinsically dangerous, because they can pursue their fiduciary duty even if it involves causing all humans to die. E.g., if the machine faction in The Matrix were to exist within the framework of US laws, it would probably be a DAO.
https://www.businessroundtable.org/business-roundtable-redef...
The idea behind "corporations should only focus on returns to shareholders" is that if you let them do anything else, CEOs will just set whatever targets they want, and it makes it harder to judge if they're doing the right thing or if they're even good at it. It's basically reducing corporate power in that sense.
> E.g., if the machine faction in The Matrix were to exist within the framework of US laws, it would probably be a DAO.
That'd have to be a corporation with a human lawyer as the owner or something. No such legal concept as a DAO that I'm aware of.
We can’t build critical software without huge security holes and bugs (see crowdstrike) but we think we will be able to contain something smarter than us? It would only take one vulnerability.
Enterprises, corps, banks, and governments will want to buy "safe" AI, to push liability for mistakes onto someone who proclaimed it "safe".
Chess is a pretty good example. You could theoretically train an LLM on just chess games. The problem is there are more possible chess games than atoms in the universe. So you can’t actually do it in practice. And chess is a much more constrained environment than life. At any chess position there are only ~35 moves on average. Life has tons of long-tail situations which have never been seen before.
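(For scale, the usual Shannon-style estimate: roughly 35 legal moves per position over a game of about 80 plies gives well over 10^120 possible games, dwarfing the ~10^80 atoms in the observable universe; the 35 and 80 are standard rough figures, not exact counts.)

    import math
    branching, plies = 35, 80     # Shannon's rough averages for chess
    game_tree_exponent = plies * math.log10(branching)
    print(f"~10^{game_tree_exponent:.0f} possible games vs ~10^80 atoms")
    # -> ~10^124 possible games vs ~10^80 atoms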
And for chess we already have superhuman intelligence. It doesn’t require trillion-dollar training clusters, you can run a superhuman chess bot on your phone. So there are clear questions of optimality as well: VC money should be aware of the opportunity cost in investing money under “infinite scaling” assumptions.
I read that it cost Google ~$190 million to train Gemini, not even including staff salaries. So feels like a billion gives you about 3 "from scratch" comparable training runs.
Maybe they can get some 3nm stuff when Meta is done with them.
such a large round implies hardware for yet another foundational model. perhaps with better steering etc..
On one hand, I think it's great that investors are willing to throw big chunks of money at hard (or at least expensive) problems. I'm pretty sure all the investors putting money in will do just fine even if their investment goes to zero, so this feels exactly what VC funding should be doing, rather than some other common "how can we get people more digitally addicted to sell ads?" play.
On the other hand, I'm kind of baffled that we're still talking about "AGI" in the context of LLMs. While I find LLMs to be amazing, and an incredibly useful tool (if used with a good understanding of their flaws), the more I use them, the more that it becomes clear to me that they're not going to get us anywhere close to "general intelligence". That is, the more I have to work around hallucinations, the more that it becomes clear that LLMs really are just "fancy autocomplete", even if it's really really fancy autocomplete. I see lots of errors that make sense if you understand an LLM is just a statistical model of word/token frequency, but you would expect to never see these kinds of errors in a system that had a true understanding of underlying concepts. And while I'm not in the field so I may have no right to comment, there are leaders in the field, like LeCun, who have expressed basically the same idea.
So my question is, has Sutskever et al provided any acknowledgement of how they intend to "cross the chasm" from where we are now with LLMs to a model of understanding, or has it been mainly "look what we did before, you should take a chance on us to make discontinuous breakthroughs in the future"?
On one hand, I understand what he's saying, and that's why I have been frustrated in the past when I've heard people say "it's just fancy autocomplete" without emphasizing the awesome capabilities that can give you. While I haven't seen this video by Sutskever before, I have seen a very similar argument by Hinton: in order to get really good at next token prediction, the model needs to "discover" the underlying rules that make that prediction possible.
All that said, I find his argument wholly unconvincing (and again, I may be waaaaay stupider than Sutskever, but there are other people much smarter than I who agree). And the reason for this is because every now and then I'll see a particular type of hallucination where it's pretty obvious that the LLM is confusing similar token strings even when their underlying meaning is very different. That is, the underlying "pattern matching" of LLMs becomes apparent in these situations.
As I said originally, I'm really glad VCs are pouring money into this, but I'd easily make a bet that in 5 years that LLMs will be nowhere near human-level intelligence on some tasks, especially where novel discovery is required.
He puts a lot of emphasis on the fact that 'to generate the next token you must understand how', when that's precisely the parlor trick that is making people lose their minds (myself included) over how effective current LLMs are. The fact that it can simulate some low-fidelity reality with _no higher-level understanding of the world_, using purely linguistic/statistical analysis, is mind-blowing. To say "all you have to do is then extrapolate" is the ultimate "draw the rest of the owl" argument.
I wouldn't. There are some extraordinarily stupid humans out there. Worse, making humans dumber is a proven and well-known technology.
Without some raw reasoning capacity (maybe neuro-symbolic is the answer, maybe not), LLMs won't be enough. Reasoning is super tough because it's not as easy as predicting the next most likely token.
So? One of the most frustrating parts of these discussions is that for some bizarre reason, a lot of people have a standard of reasoning (for machines) that only exists in fiction or their own imaginations.
Humans have a long list of cognitive shortcomings. We find them interesting and give them all sorts of names like cognitive dissonance or optical illusions. But we don't currently make silly conclusions like humans don't reason.
The general reasoning engine that makes no mistakes, contradictions, or confusions in output or process does not exist in real life, whether you believe humans are the only intelligent species on the planet or are gracious enough to extend the capability to some of our animal friends.
So the LLM confuses tokens every now and then. So what ?
> Humans have a long list of cognitive shortcomings. We find them interesting and give them all sorts of names like cognitive dissonance or optical illusions. But we don't currently make silly conclusions like humans don't reason.
Exactly! In fact, things like illusions are actually excellent windows into how the mind really works. Most visual illusions are a fundamental artifact of how the brain needs to turn a 2D image into a 3D, real-world model, and illusions give clues into how it does that, and how the contours of the natural world guided the evolution of the visual system (I think Steven Pinker's "How the Mind Works" gives excellent examples of this).
So I am not at all saying that what LLMs do isn't extremely interesting, or useful. What I am saying is that the types of errors you get give a window into how an LLM works, and these hint at some fundamental limitations at what an LLM is capable of, particularly around novel discovery and development of new ideas and theories that aren't just "rearrangements" of existing ideas.
ANN architectures are not like brains. They don't come pre-baked with all sorts of evolutionary steps and tweaking. They're far more blank slate and the transformer is one of the most blank slate there is.
At best, maybe some failure mode in GPT-N gives insight into how some concept is understood by GPT-N. It rarely says anything about language modelling or Transformers as such. GPT-2 had some wildly different failure modes than 3, which itself has some wildly different failure modes than 4.
All a transformer's training objective asks it to do is spit out a token. How it should do so is left for the transformer to figure out along the way, and everything is fair game.
And confusing words with wildly different meanings but with some similarity in some other way is something that happens to humans as well. Transformers don't see words or letters(but tokens). So just because it doesn't seem to you like two tokens should be confused doesn't mean there isn't a valid point of confusion there.
For example, you'll have little luck achieving AGI with decision trees no matter what their training objective is.
That said, my personal hypothesis is that AGI will emerge from video generation models rather than text generation models. A model that takes an arbitrary real-time video input feed and must predict the next, say, 60 seconds of video would have to have a deep understanding of the universe, humanity, language, culture, physics, humor, laughter, problem solving, etc. This pushes the fidelity of both input and output far beyond anything that can be expressed in text, but also creates extraordinarily high computational barriers.
And what I'm saying is that I find that argument to be incredibly weak. I've seen it time and time again, and honestly at this point it just feels like a "humans should be a hundred feet tall based on their rate of change in their early years" argument.
While I've also been amazed at the past progress in LLMs, I don't see any reason to expect that rate will continue in the future. What I do see the more and more I use the SOTA models is fundamental limitations in what LLMs are capable of.
I.e. the real breakthrough that allowed such rapid progress was transformers in 2017. Since that time, the vast majority of the progress has simply been to throw more data at the problem, and to make the models bigger (and to emphasize, transformers really made that scale possible in the first place). I don't mean to denigrate this approach - if anything, OpenAI deserves tons of praise for really making that bet that spending hundreds of millions on model training would give discontinuous results.
However, there are loads of reasons to believe that "more scale" is going to give diminishing returns, and a lot of very smart people in the field have been making this argument (at least quietly). Even more specifically, there are good reasons to believe that more scale is not going to go anywhere close to solving the types of problems that have become evident in LLMs since when they have had massive scale.
So the big thing I'm questioning is that I see a sizable subset of both AI researchers (and more importantly VC types) believing that, essentially, more scale will lead to AGI. I think the smart money believes that there is something fundamentally different about how humans approach intelligence (and this difference leads to important capabilities that aren't possible from LLMs).
I feel it is fair to say that neither of these were natural extrapolations from prior successful models directly. There is no indication we are anywhere near another nonlinearity, if we even knew how to look for that.
Blind faith in extrapolation is a finance regime, not an engineering regime. Engineers encounter nonlinearities regularly. Financiers are used to compound interest.
Getting an order of magnitude more data isn’t easy anymore. From GPT2 to 3 we (only) had to scale up to the internet. Now? You can look at other sources like video and audio, but those are inherently more expensive. So your data acquisition costs aren’t linear anymore, they’re something like 50x or 100x. Your quality will also dip because most speech (for example) isn’t high-quality prose, it contains lots of fillers, rambling, and transcription inaccuracies.
And this still doesn’t fix fundamental long-tail issues. If you have a concept that the model needs to see 10x to understand, you might think scaling your data 10x will fix it. But your data might not contain that concept 10x if it’s rare. It might contain 9 other one-time things. So your model won’t learn it.
Video is not necessarily less information dense than text, because when considered in its entirety it contains text and language generation as special cases. Video generation includes predicting continuations of complex verbal human conversations as well as continuations of videos of text exchanges, someone flipping through notes or a book, someone taking a university exam through their perspective, etc.
I also don't really see AGI emerging from LLMs any time soon, but it could be argued that human intelligence is also just 'fancy autocomplete'.
But that's my point - in some ways it's obvious that humans are not just doing "fancy autocomplete" because humans generally don't make the types of hallucination errors that LLMs make. That is, the hallucination errors do make sense if you think of how an LLM is just a statistical relationship between tokens.
One thing to emphasize, I'm not saying the "understanding" that humans seem to possess isn't just some lower level statistical process - I'm not "invoking a soul". But I am saying it appears to be fundamentally different, and in many cases more useful, than what an LLM can do.
They do though - I've noticed myself and others saying things in conversation that sound kind of right, and are based on correct things they've learned previously, but because memory of those things is only partial and mixed with other related information things are often said that are quite incorrect or combine two topics in a way that doesn't make sense.
Well, no. Humans do not think sequentially. But even if we were to put that aside, any "autocomplete" we perform is based on a world model, and not tokens in a string.
1. If it’s really amazing autocomplete, is there a distinction between that and AGI?
Being able to generalize, plan, execute, evaluate and learn from the results could all be seen as a search graph building on inference from known or imagined data points. So far LLMs are being used on all of those and we haven’t even tested the next level of compute power being built to enable its evolution.
2. Fancy autocomplete is a bit broad for the comprehensive use cases CUDA is already supporting that go way beyond textual prediction.
If all information of every type can be “autocompleted” that’s a pretty incredible leap for robotics.
* edited to compensate for iPhone autocomplete, the irony.
I'm not. Lots of people and companies have been sinking money into these ventures and they need to keep the hype alive by framing this as being some sort of race to AGI. I am aware that the older I get the more cynical I become, but I bucket all discussions about AGI (including the very popular 'open letters' about AI safety and Skynet) in the context of LLMs into the 'snake oil' bucket.
Doesn't really imply let's just do more LLMs.
Why Tel Aviv in Israel ?
If we say that half of innovations came from Alphabet/Google, then most of them (transformers, LLMs, tensorflow) came from Google Research and not Deep Mind.
Source: Wikipedia.
Israel is a leading AI and software development hub in the world.
Yep, and if any place will produce the safest AI ever, its got to be there.
Israeli military operations continue to this day, with over 41,000 civilians killed.
Thanks.
A couple years??
we all know that openai did it
Nobody even knew what OpenAI was up to when they were gathering training data - they got away with a lot. Now there is precedent and people are paying more attention. Data that was previously free/open now has a clause that it can't be used for AI training. OpenAI didn't have to deal with any of that.
Also, OpenAI used cheap labor in Africa to tag training data, which was also controversial. If someone did it now, they'd be the ones to pay. OpenAI can always say "we stopped" like Nike did with sweatshops.
A lot has changed.
Companies are willing to pay a lot for clean training data, and my bet is there will be a growing pile of training sets for sale on a non-exclusive basis as well.
A lot of this data - what I've seen anyway, is far cleaner than anything you'll find on the open web, with significant data on human preferences, validation, cited sources, and in the case of e.g. coding with verification that the code runs and works correctly.
Very interesting, thanks for sharing that detail. As someone who has tinkered with tokenizing/training I quickly found out this must be the case. Some people on HN don't know this. I've argued here with otherwise smart people who think there is no data preprocessing for LLMs, that they don't need it because "vectors", failing to realize the semantic depth and quality of embeddings depends on the quality of training data.
At a time when things are advancing at breakneck speed. Where is the goalpost going to be in 2 years time?
I guess the "mountain" is the key. "Safe" alone is far from being a product. As for current LLMs, I'd even question how valuable "safe" can be.
We'll be able to replicate most of ChatGPT-4o's capabilities locally on affordable hardware, including "unsafe" and "unaligned" data, as noise is drastically reduced, meaning smaller quantized models that can run on good-enough hardware.
We'll see a huge reduction in price and inference times within two years, and whatever SSI is trained on won't be economically viable enough to recoup that $1B investment, guaranteed.
It all depends on GPT-5's performance. Right now Sonnet 3.5 is the best, but there's nothing really groundbreaking. SSI's success will depend on how much uplift it can provide over GPT-5, which already isn't expected to be a significant leap beyond GPT-4.
Any guesses?
In other words, there may be a need to retain some sorts of symmetries or constraints from generation to generation that others understand less well than he does (or so he thinks).
Twitter, reddit and the rest of the web have deployed a number of anti-scrape techniques.
If someone is just starting AI research for something like a PhD or startup, now, I think it'll be more useful to get familiar with robot simulation framework, such as Nvidia Omniverse.
While there's a lot of competition around humanoid robots, I'm sure there are plenty of more specialized possibilities. Maybe some agricultural machine, maybe medical, mining, etc.
That's a startup.
Not folks getting a BILLION dollars having no product and just ten people. Sorry but this is just so overhyped and sad. This is not the valley I came to live in back in the 90s and 2000s
He was part of most breakthroughs from Alexnet to GPT-4. He's also rumored to be extraordinarily brilliant.
In a world where 100s of billions are dumped into this type of technology, a single billion to him is not unreasonable for anyone hedging their bets.
Imagine the Pentagon just twiddling their thumbs, and the CIA decides to just passively wait-and-see...
Data sets aren't quite as easy to scrape and copyright infringe on as they were before chatGPT
https://en.wikipedia.org/wiki/Existential_risk_from_artifici...
https://en.wikipedia.org/wiki/Ex_Machina_(film)
This is why I'm extremely opposed to the idea of "AI girlfriend" apps - it creates a cultural concept that being attracted to a computer is normal, rather than what it is: something pathetic and humiliating which is exactly like buying an inflatable sex doll ... something only for the most embarrassing dregs of society ... men who are too creepy and pervy to ever attract a living, human woman.
Idk about this claim.
I think if you take the multiverse view w.r.t. quantum mechanics + a veil of ignorance (you don't know which entity your consciousness will be), you pretty quickly get morality.
i.e.: don't build the Torment Nexus, because you don't know whether you'll end up experiencing the Torment Nexus.
This was recently discussed (albeit in layperson’s language, avoiding philosophical topics and only focusing on the clear and present danger) in this article in RealClearDefense:
The Danger of AI in War: It Doesn’t Care About Self-Preservation https://www.realcleardefense.com/articles/2024/09/02/the_dan... (RealClearDefense)
.
However, just adding a self-preservation instinct will cause a skynet situation where the AI pre-emptively kills anyone who contemplates turning it off, including its commanding officers:
Statement by Air Force Col. Tucker Hamilton https://www.twz.com/artificial-intelligence-enabled-drone-we... (The War Zone)
.
To survive AGI, we have to navigate three hurdles, in this order:
1. Avoid AI causing extinction due to reckless escalation (the first link above)
2. Avoid AI causing extinction on purpose after we add a self-preservation instinct (the second link above)
3. If we succeed in making AI be ethical, we have to be careful to bind it to not kill us for our resources. If it's a total utilitarian, it will kill us to seize our planet for resources, and to stop us from abusing livestock animals. It will then create a utopian future, but without humans in it. So we need to bind it to basically go build utopia elsewhere but not take Earth or our solar system away from us.
.
"Super" intelligence typically refers to being better than humans in achieving goals, not to being better than humans in knowing good from evil.
this is coming from someone who is not well-versed in fundraising and YC startup scene.
What’s mildly annoying to me is their domain only returns an A record.
I mean, I'd like at least a brief blurb about their entire premise of safety. Maybe a definition or indication of a public consultation or... something.. otherwise the insinuation is that these three dudes are gonna sit around defining it on instinct, as if it's not a ludicrously hard human problem.
(I think it's branding, yes. A kind of "we don't care about aesthetics, we care about superintelligence" message)
Pre-Series A $1B? Wow
Remember when Facebook acquired insta for $1B and everyone freaked out? I guess inflation happens here first.
Finally, will Sutskever face the same charges as Holmes when he fails to deliver AGI?
Sutskever is asking for funds for a research project that hopefully will end in a profitable SSI/AGI.
If someone thinks that the only way to avoid extinction is if SSI wins the race, they may want to invest.
To me this sounds like maybe they won't be doing transformers. But perhaps they just mean "we will have safety in mind as we scale, unlike everyone else."
An alternative sequence of reasoning places less emphasis on Ilya specifically and uses Ilya as an indicator of research health. Repeat (1), (2), and (3) above, but replace "Ilya" with something like "strong and healthy fundamental research group". In this version, Ilya's departure is taken as indication that OpenAI no longer has a strong and healthy fundamental research group but that the company is "compromised" by relentless feature roadmaps for current products and their variations. That does not mean OpenAI will fail, but in this perspective it might mean that OpenAI is not well positioned to capture future research breakthroughs and the products that they will generate.
From my perspective, it's just about impossible to know how true these premises really are. And that's what makes it a bet or gamble rather than anything with any degree of assurance. To me, just as likely is the scenario where it's revealed that Ilya is highly ineffective as a generalist leader and that research without healthy tension from the business goes nowhere.
Do YOU want to miss out on being a shareholder in this new line that will bring immeasurable wealth?? ;-)
Simultaneously too wealthy to imagine and never wealthy enough. Capitalism is quite the drug.
In this case, everyone knows it takes hundreds of millions to train models. So I'm guessing investors are essentially rolling the dice on an extremely well-regarded team. And if it takes about a billion just to get off the ground, the valuation would need to at least be in the couple-billion range to make it worth it for employees to work there.
That feels very different than, say, selling a company where founders are cashing out. In that case, the business should be expected to contribute meaningfully to revenue, and quickly.
How is this going to ever pay the investors back? How is it going to raise more money at such an insane valuation?
I just dont see how you justify such a crazy valuation from day 1 financially.
Amazon is one of the most famous successes of the era. Bezos quit his job, launched the business out of his garage with seed money being $10K of his own savings, and was doing $20K/week in sales just 30 days later. And I believe their only VC round before going public was an $8 million investment from Kleiner Perkins. But they were a company that proved its viability early on and had a real product with rapid revenue growth before getting any VC $$.
I’d say this SSI round is more similar to Webvan, who went public with a valuation of $4.8 billion, and at that time had done a grand total of $395K in sales, with losses over $50 million.
I’m sure there are good investments out there for AI companies that are doing R&D and advancing the state of the art. However, a $1 billion investment at a $5 billion valuation, for a company with zero product or revenue, just an idea, that’s nuts IMO, and extremely similar to the type of insanity we saw during the dot com bubble. Even more so given that SSI seemingly don’t even want to be a business - direct quote from Ilya:
> This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then … It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.
This doesn’t sound to me like someone who wants to build a business, it sounds like someone who wants to hack on AI with no oversight or proof of financial viability. Kinda wild to give him $1 billion to do that IMO.
The dotcom era was full of unprofitable startups pumping up their stock price in all sorts of ways, as they were completely dependent on continued capital flows from investors to stay afloat. Also, a lot of that capital came from retail investors in various forms.
The AI wave that is currently ongoing is for the most part funded by some of the largest and most profitable corporations on the planet.
Companies like Alphabet, Meta, Tesla/X, Amazon and (to a lesser extent) Microsoft still have founders that either control or provide a direction for these companies.
What drives this wave is the fact that most of these founders have a strong belief.
We know, for instance, that Larry Page and Elon Musk had a disagreement about the future role of AGI/ASI about 15 years ago, leading to Elon Musk helping to found OpenAI to make sure that Google would not gain a monopoly.
These are strong convictions held by very powerful people that have been held for decades. Short term stock market fluctuations are not going to suddenly collapse this "bubble".
As long as these founders continue to believe that AGI is close, they will continue to push, even if the stock market stops supporting the push.
SSI may fail, of course. But Ilya has a reputation (from people like Hinton and Elon) as perhaps the greatest and most capable visionary in the business.
>Agreed, the car bubble is very, very real. Not that the internal combustion carriage is all hype, it's certainly impressive with useful applications, but car manufacturers are getting insane valuations with zero proof they're viable businesses.
There's no liquidity until they are making money.
It means that AI startups are actually a really poor value proposition compared to traditional tech companies, because your multiplier is limited. A first round at a $50M valuation leaves a lot more opportunity to get rich.
This kind of structure isn't as unusual for capital intensive businesses.
> If you are seeking capital for a startup with a product, you have to sell the startup on realities (ie how much revenue you are making). If you are seeking capital for a startup with no product, you can sell the startup on dreams, which is much much easier but also way riskier for investors.
Since these guys don't have a product yet, they 100% sold it on big dreams combined with Ilya's track record at OpenAI.
I think it's Ilya's track record all the way since AlexNet, including his time at Google AND OpenAI.
He's not a one-trick-pony.
This deal was cooked way back, though, perhaps even before the coup.
Now, can they make a product that makes at least $1B + 1 dollar in revenue? Doubt it, I honestly don't see a market for "AI safety/security".
Using the definition from the article:
> AI safety, which refers to preventing AI from causing harm, is a hot topic amid fears that rogue AI could act against the interests of humanity or even cause human extinction.
If the purpose of a state is to ensure its continued existence, then they should be able to make >=$1 in profit.
It looks like the aim of SSI is building safe AI, not just working on safety/security of AI. Both the article and their website [1] state this.
[1] https://ssi.inc
... however, I'm on the camp that believes it's not going to be hyper-profitable for only one (or a few) single commercial entities.
AGI will not be a product like the iPhone where one company can "own" it and milk it for as long as they want. AGI feels more like "the internet", which will definitely create massive wealth overall but somehow distributed among millions of actors.
We've seen it with LLMs, they've been revolutionary and yet, one year after a major release, free to use "commodity" LLMs are already in the market. The future will not be Skynet controlling everything, it will be uncountable temu-tier AIs embedded into everything around you. Even @sama stated recently they're working on "intelligence so cheap that measuring its use becomes irrelevant".
/opinion
> It may look—on the surface—that we are just learning statistical correlations in text. But it turns out that to ‘just learn’ the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text. This text is actually a projection of the world.
(https://www.youtube.com/watch?v=NT9sP4mAWEg - sadly the only transcripts I could find were on AI grifter websites that shouldn't be linked to)
This is transparently false - newer LLMs appear to be great at arithmetic, but they still fail basic counting tests. Computers can memorize a bunch of symbolic times tables without the slightest bit of quantitative reasoning. Transformer networks are dramatically dumber than lizards, and multimodal LLMs based on transformers are not capable of understanding what numbers are. (And if Claude/GPT/Llama aren't capable of understanding the concept of "three," it is hard to believe they are capable of understanding anything.)
Sutskever is not actually as stupid as that quote suggests, and I am assuming he has since changed his mind... but maybe not. For a long time I thought OpenAI was pathologically dishonest, and didn't consider that in many cases they aren't "lying," they're blinded by arrogance and high on their own marketing.
This is pretty sloppy thinking.
The neural network learns some representation of a process that COULD HAVE produced the text. (this isn't some bold assertion, it's just the literal definition of a statistical model).
There is no guarantee it is the same as the actual process. A lot of the "bow down before the machine God" crowd is guilty of this same sloppy confusion.
1. An Octopus and a Raven have wildly different brains. Both are intelligent. So just the idea that there is some "one true system" that the NN must discover or converge on is suspect. Even basic arithmetic has numerous methods.
2. In the limit of training on a diverse dataset (ie as val loss continues to go down), it will converge on the process (whatever that means) or a process sufficiently robust. What gets the job done gets the job done. There is no way an increasingly competent predictor will not learn representations of the concepts in text, whether that looks like how humans do it or not.
So you agree with me that there is no guarantee it learns any representation of the actual process that produced the training data.
Whether the machine becomes a human brain clone or something entirely alien is irrelevant. The point is, you can't cheat reality. Statistics is not magic. You can't predict text that demonstrates understanding without understanding.
Machine learning isn't magic - the model will learn what it can to minimize the error over the specific provided loss function, and no more. Change the loss function and you change what the model learns.
In the case of an LLM trained with a predict next word loss function, what you are asking/causing the model to learn is NOT the generative process - you are asking it to learn the surface statistics of the training set, and the model will only learn what it needs to (and is able to, per the model architecture being trained) in order to do this.
Now of course learning the surface statistics well does necessitate some level of "understanding" - are we dealing with a fairy tale or a scientific paper, for example - but there is only so much the model can do.
Chess is a good example, since it's easy to understand. The generative process for world-class chess (whether human, or for an engine) involves way more DEPTH (cf. layers) of computation than the transformer has available to model it, so the best it can do is to learn the surface statistics via much shallower pattern recognition of the state of the board.
Now, given the size of these LLMs, if trained on enough games they will be able to play pretty well even using this pattern-matching technique, but one doesn't need to get too far into a chess game to reach a position that has never been seen before in recorded games (e.g. watch agadmator's YouTube chess channel - he will often comment when this point has been reached), and the model therefore has no choice but to play moves that were seen in the training set in similar, but not identical, positions... This is basically cargo-cult chess! It's interesting that LLMs can reach the Elo level that they do (says more about chess than about LLMs), but this same "cargo-cult" (follow surface statistics) generation process when out of the training set applies to all inputs, not just chess...
You clearly do not really understand what it means to predict internet-scale text with increasing accuracy. No more than that? Fantastic.
LLMs do not just learn surface statistics. So many papers have thoroughly debunked this that I'm just not going to bother. This is just straight-up denial.
This has been shown in chess as well. https://arxiv.org/abs/2403.15498v2
You have no idea what you are talking about. You've probably never even played with 3.5-turbo-instruct. That's how you can say this nonsense. You have your conclusion and keep working backwards to get a justification.
>It's interesting that LLMs can reach the Elo level that they do (says more about chess than about LLMs)
When you say this for everything LLMs can do then it just becomes a meaningless cope statement.
However, you seem to be engaged in magical thinking and believe these models are learning things beyond their architectural limits. You appear to be starstruck by what these models can do, and blind to what one can deduce - and SEE - that they are unable to do.
And then you've tried to paper over being shown that with a conveniently vague and nonsensical, "says more about bla bla bla". No, you were wrong. Your model about this is wrong. It's that simple.
You start from your conclusions and work your way down from it. "pattern matching technique" ? Please. By all means, explain to all of us what this actually entails in a way we can test for it. Not just vague words.
Tracking probable board state given a sequence of moves (which don't even need to go all the way back to the start of the game!) is relatively simple to do, and doesn't require hundreds of sequential steps that are beyond the architecture of the model. It's just a matter of incrementally updating the current board state "hypothesis" per each new move (essentially: "a knight just moved to square X, so it must have moved away from some square a knight's move away from X that we believe currently contains a knight").
Ditto for estimating player ELO rating in order to predict appropriately good or bad moves. It's basically just a matter of how often the player makes the same move as other players of a given ELO rating in the training data. No need for hundreds of steps of sequential computation that are beyond the architecture of the model.
Doing an N-ply lookahead to reason about potential moves is a different story, but you want to ignore that and instead throw out a straw man "counter argument" about maintaining board state as if that somehow proves that the LLM can magically apply > N=layers of sequential reasoning to derive moves. Sorry, but this is precisely magical faith-based thinking "it can do X, so it can do Y" without any analysis of what it takes to do X and Y and why one is possible, and the other is not.
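For what it's worth, the incremental board-state tracking described above really is a cheap sequential update; here is a minimal sketch of mine (not from either commenter) using the third-party python-chess library, with an arbitrary example move list:

    import chess

    moves = ["e4", "e5", "Nf3", "Nc6", "Bb5"]  # arbitrary example opening

    board = chess.Board()
    for san in moves:
        board.push_san(san)   # one small, local state update per move - no lookahead involved
        print(board.fen())    # the running "board state hypothesis"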
Right and the point is that you don't know what it CAN'T learn. You clearly don't quite understand this because you say stuff like this:
>Chess is a good example, since it's easy to understand. The generative process for world class chess (whether human, or for an engine) involves way more DEPTH (cf layers) of computation than the transformer has available to model it
and it's just baffling because:
1. Humans don't play chess anything like chess engines. They literally can't because the brain has iterative computation limits well below that of a computer. Most Grandmasters are only evaluating 5 to 6 moves deep on average.
2. We have a chess transformer playing world class chess (grandmaster level) - https://arxiv.org/abs/2402.04494.
You keep trying to make the point that because a Transformer architecturally has a depth limit for some trained model, it cannot reach human level.
But this is nonsensical for various reasons.
- Nobody is stopping you from just increasing N such that every GI problem we care about is covered.
- You have shown literally no evidence that the N that even state-of-the-art models possess today is insufficient to match human iterative compute ability.
GPT-4o zero-shots arbitrary arithmetic more accurately than any human brain, and that's supposedly something it's bad at. You can clearly see it can reach world-class chess play.
If you have some iterative-computation benchmark that shows transformers zero-shotting worse than an unaided human, then feel free to show me.
Why don't you write Sam Altman to tell him the good news ?
Tell him there's nothing stopping him from "increasing N" until the thing get up and walks out the door.
There are benchmarks that rightfully show the SOTA behind average human performance in other aspects of reasoning so why are you fumbling so much to demonstrate this with unaided iterative computation ? It's your biggest argument so I just thought you'd have something more substantial than "It's limited bro!"
You cannot even demonstrate this today nevermind some hypothetical scaled up model.
I think Sam will be just fine.
Well, you see, I've been a professional developer for the last 45 years, and often, gasp, think for long periods of time before coding, or even writing things down. "Look ma, no hands!".
I know this will come across as an excuse, but the thing is I assumed you were also vaguely familiar with things like software development, or other cases where humans think before acting, so I evidently did a poor job of convincing you of this.
I also assumed (my bad!) that you would at least know some people who were semi-intelligent and wouldn't be hopelessly confused about farmers and chickens, but now I realize that was a mistake.
Really, it's all on me.
I know that "just add more rules", "make it bigger" didn't work for CYC, but maybe as you suggest "increase N" is all that's needed in the case of LLMs, because they are special. Really - that's genius! I should have thought of it myself.
I'm sure Sam is OK, but he'd still appreciate you letting him know he can forget about Q* and Strawberries and all that nonsense, and just "increase N"! So much simpler and cheaper rather than hiring thousands of developers to try to figure this out!
Maybe drop Yann LeCun a note too - tell him that the Turing Award committee are asshats, and that he is too, and that LLMs will get us all the way to AGI.
>I know this will come across as an excuse, but the thing is I assumed you were also vaguely familiar with things like software development, or other cases where humans think before acting, so I evidently did a poor job of convincing you of this.
Really, you have the same train of thought for hours on end?
When you finish even your supposed hours-long spiel, do you just proceed to write every line of code that solves your problem just like that? Or do you write and think some more?
More importantly, are LLMs unable to produce the kind of code humans spend a train of thought on ?
>Maybe drop Yann LeCun a note too - tell him that the Turing Award committee are asshats, and that he is too, and that LLMs will get us all the way to AGI.
You know, the appeal to authority fallacy is shifty at the best of times but it's straight up nonsensical when said authority does not have consensus on what you're appealing to.
Like great you mentioned LeCun. And I can just as easily bring in Hinton, Norvig, Ilya. Now what ?
Write them too - spread the news of your "increase N" innovation ?
Don't scare Hinton too much though - just suggest a small increase in N.
No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter its own weights, etc.
Architecture matters. The best you can hope for an LLM is that training will converge on the best LLM generating process it can be, which can be great for in-distribution prediction, but lousy for novel reasoning tasks beyond the capability of the architecture.
Go back a few evolutionary steps and sure you can. Most ANN architectures basically have relatively little to no biases baked in and the Transformer might be the most blank slate we've built yet.
>No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter it's own weights, etc.
A transformer can perform any computation it likes in a forward pass, and you can arbitrarily increase inference compute time with the token length. Feedback paths? Sure. Compute inefficient? Perhaps. Some extra programming around the model to facilitate this? Maybe, but the architecture certainly isn't stopping you.
Even if it couldn't, limited ≠ trivial. The human brain is not Turing complete.
Internal memory? Did you miss the memo? Recurrence is overrated. Attention is all you need.
That said, there are already state keeping language model architectures around.
Altering weights ? Can a transformer continuously train ? Sure. It's not really compute efficient but architecture certainly doesn't prohibit it.
>Architecture matters
Compute Efficiency? Sure. What it is capable of learning? Not so much
No it can't.
A transformer has a fixed number of layers - call it N. It performs N sequential steps of computation to derive its output.
If a computation requires > N steps, then a transformer most certainly cannot perform it in a forward pass.
FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model". Attention is not all you need if what you actually want to build is a cognitive architecture.
https://arxiv.org/abs/2310.02226
And again, human brains are clearly limited in the number of steps they can compute without writing something down. Limited ≠ trivial.
>FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model".
Great. Do you know what a "language model" is capable of in the limit? No.
These top research labs aren't only working on Transformers as they currently exist but it doesn't make much sense to abandon a golden goose before it has hit a wall.
No - there is a loop between the cortex and thalamus, feeding the outputs of the cortex back in as inputs. Our brain can iterate for as long as it likes before initiating any motor output, if any, such as writing something down.
In practice, the cortex-thalamus loop allows for some degree of internal iteration, but the brain cannot endlessly iterate without some form of external aid (e.g., writing something down) to offload information and prevent cognitive overload.
I'm not telling you anything here you don't experience in your everyday life. Try indefinitely iterating on any computation you like and see how well that works for you.
The discussion is about the architecturally imposed limitations of LLMs, resulting in capabilities that are way less than that of a brain.
The fact that the brain has its own limits doesn't somehow negate this fact!
It is beyond silly to dump an architecture for a limitation the human brain has. A reasoning engine that can iterate indefinitely with no external aid does not exist in real life. That the transformer also has this weakness is not any reason for it to have capabilities less than a brain so it's completely moot.
It shouldn't be surprising they are not great at reasoning, or everything one would hope for from an AGI, since they simply were not built for that. If you look at the development history, the transformer was a successor to LSTM-based seq-2-seq models using Bahdanau attention, whose main goal was to more efficiently utilize parallel hardware by supporting parallel processing. Of course a good language model (word predictor) will look as if it's reasoning because it is trying to model the data it was trained on - a human reasoner.
As humans we routinely think for seconds/minutes or even hours before speaking or acting, while an LLM only has that fixed N steps (layers) of computation. I don't know why you claim this difference (among others) should make no difference, but it clearly does, with out-of-training-set reasoning weakness being a notable limitation that people such as Demis Hassabis have recently conceded.
>As humans we routinely think for seconds/minutes or even hours before speaking or acting
No human is iterating on a base thought for hours uninterrupted lol so this is just moot
>with out-of-training-set reasoning weakness being a notable limitation that people such as Demis Hassabis have recently conceded.
Humans also reason more weakly outside their "training" distribution; LLMs are simply worse at it for now.
No - just because something has the surface appearance of reasoning doesn't mean that the generative process was reasoning, anymore than a cargo cult wooden aircraft reflects any understanding of aerodynamics and would be able to fly.
We've already touched on it, but the "farmer crossing river" problems are a great example. When the LLM sometimes degenerates into "cross bank A to B with chicken, cross bank B to A with chicken, cross bank A to B with chicken... that is the fewest trips possible", this is an example of "looks as if it is reasoning", aka cargo-cult surface-level copying of what a solution looks like. Real reasoning would never repeat a crossing without loading/unloading something, since that conflicts with the goal of the fewest trips possible.
The idea that LLMs "fake reason" and Humans "really reason" is an imaginary distinction. If you cannot create any test that can distinguish the two then you are literally making things up.
An averagely smart human does not have these failure modes where they answer a question with something that looks like an answer "cross A to B, then B to A. done. there you go!" but has zero logic to it.
Do you follow news in this field at all? Are you aware that poor reasoning is basically the #1 shortcoming that all the labs are working on?!!
Feel free to have the last word as this is just getting repetitive.
>An averagely smart human does not have these failure modes where they answer a question with something that looks like an answer "cross A to B, then B to A. done. there you go!" but has zero logic to it.
Humans are poor at logic in general. We make decisions and give rationales with logical contradictions and nonsense all the time. I just genuinely can't believe you think we don't. It happens so often we have names for these cognitive shortcomings. Get any teacher you know and ask them this. No need to take my word for it. And I don't care about getting the last word.
First of all, your understanding of the architecture itself is mistaken. A transformer can iterate endlessly because each token it produces buys it another forward pass, and each of these tokens is appended to its input for the next inference. That's the autoregressive in autoregressive transformer, and the entire reason why it was proposed for arbitrary seq2seq transduction.
This means you get layers * tokens iterations, where tokens is up to two million, and is in practice unlimited due to the LLM being able to summarize and select from that. Parallelism is irrelevant, since the transformer is sequential in the output of tokens. A transformer can iterate endlessly, it simply has to output enough tokens.
And no, the throughput isn't limited either, since each token gets translated into a high-dimensional internal representation, that in turn is influenced by each other token in the model input. Models can encode whatever they want not just by choosing a token, but by choosing an arbitrary pattern of tokens encoding arbitrary latent-space interactions.
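As a rough sketch of the loop being described (mine, with `model` and `tokenizer` as generic stand-ins rather than any specific API), the sequential compute available grows with layers x generated tokens because each output token is fed back in:

    def generate(model, tokenizer, prompt, max_new_tokens=128):
        # `model` and `tokenizer` are hypothetical stand-ins, not a real library API.
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model.forward(tokens)         # one forward pass: N layers deep
            next_token = int(logits[-1].argmax())  # greedy choice, for simplicity
            tokens.append(next_token)              # appended and fed back on the next pass
            if next_token == tokenizer.eos_id:     # stop when the model emits end-of-sequence
                break
        return tokenizer.decode(tokens)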
Secondly, internal thoughts are irrelevant, because something being "internal" is an arbitrary distinction without impact. If I trained an LLM to prepend and postpend <internal_thought> to some part of its output, and then simply didn't show that part, then the LLM wouldn't magically become human. This is something many models do even today, in fact.
Similarly, if I were to take a human and modify their brain to only be able to iterate using pen and paper, or by speaking out loud, then I wouldn't magically make them into something non-human. And I would definitely not reduce their capacity for reasoning in any way whatsoever. There are people with aphantasia working in the arts, there are people without an internal monologue working as authors - how "internal" something is can be trivially changed with no influence on either the architecture or the capabilities of that architecture.
Reasoning itself isn't some unified process, neither is it infinite iteration. It requires specific understanding about the domain being reasoned over, especially understanding of which transformation rules are applicable to produce desired states, where the judgement about which states are desirable has to be learned itself. LLMs can reason today, they're just not as good at it as humans are in some domains.
One reason why just blathering on endlessly isn't the same as thinking deeply before answering, is that it's almost impossible to maintain long-term context/attention. Try it. "Think step by step" or other attempts to prompt the model into generating a longer reply that builds upon itself, will only get you so far because keeping a 1-dimensional context is no substitute for the thousands of connections we have in our brain between neurons, and the richness of context we're therefore able to maintain while thinking.
The reasoning weakness of LLMs isn't limited to "some domains" that they had less training data for - it's a fundamental architecturally-based limitation. This becomes obvious when you see the failure modes of simple problems like "how few trips does the farmer need to cross the river with his chicken & corn, etc" type problems. You don't need to morph the problem to require out-of-distribution knowledge to get it to fail - small changes to the problem statement can make the model state that crossing the river backwards and forwards multiple times without loading/unloading anything is the optimal way to cross the river.
But, hey, no need to believe me, some random internet dude. People like Demis Hassabis (CEO of DeepMind) acknowledge the weakness too.
Make the slight variation look different from the version it has memorized and it often passes. Sometimes it's as straightforward as just changing the names. Humans have this failure mode too.
First of all, I would urge you to stop arbitrarily using negative words to make an argument. Saying that LLMs are "blathering" is equivalent to saying you and I are "smacking meat onto plastic to communicate" - it's completely empty of any meaning. This "vibes based arguing" is common in these discussions and a massive waste of time.
Now, I don't really understand what you mean by "almost impossible to maintain long-term context/attention". I'm writing fiction in my spare time, LLMs do very well on this by my testing, even subtle and complex simulations of environments, including keeping track of multiple "off-screen" dynamics like a pot boiling over.
There is nothing "1-dimensional" about the context, unless you mean that it is directional in time, which any human thought is as well, of course. As I said in my original reply, each token is represented by a multidimensional embedding, and even that is abstracted away by the time inference reaches the later layers. The word "citrus" isn't just a word for the LLM, just as it isn't just a word for you. Its internal representation retrieves all the contextual understanding that is related to it. Properties, associated feelings, usage - every relevant abstract concept is considered. And these concepts interact which every embedding of every other token in the input in a learned way, and with the position they have relative to each other. And then when an output is generated from that dynamic, said output influences the dynamic in a way that is just as multidimensional.
The model can maintain context as rich as it wants, and it can built upon that context in whatever way it wants as well. The problem is that in some domains, it didn't get enough training time to build robust transformation rules, leading it to draw false conclusions.
You should reflect on why you are only able to provide vague and under defined, often incorrect, arguments here. You're drawing distinctions that don't really exist and trying to hide that by appealing to false intuitions.
> The reasoning weakness... it's a fundamental architecturally-based limitation...
You have provided no evidence or reasoning for that conclusion. The river crossing puzzle is exactly what I had in mind when talking about specific domains. It is a common trick question with little to no variation and LLMs have overfit on that specific form of the problem. Translate it to any other version - say transferring potatoes from one pot to the next, or even a mathematical description of sets being modified - and the models do just fine. This is like tricking a human with the "As I was going to Saint Ives" question, exploiting their expectation of having to do arithmetic because it looks superficially like a math problem, and then concluding that they are fundamentally unable to reason.
> People like Demis Hassabis (CEO of DeepMind) acknowledge the weakness too.
What weakness? That current LLMs aren't as good as humans when reasoning over certain domains? I don't follow him personally but I doubt he would have the confidence to make any claims about fundamental inabilities of the transformer architecture. And even if he did, I could name you a couple of CEOs of AI labs with better models that would disagree, or even Turing award laureates. This is by no means a consensus stance in the expert community.
I disagree - there is pretty widespread agreement that reasoning is a weakness, even among the best models, (and note Chollet's $1M ARC prize competition to spur improvements), but the big labs all seem to think that post-training can fix it. To me this is whack-a-mole wishful thinking (reminds me of CYC - just add more rules!). At least one of your "Turing award laureates" thinks Transformers are a complete dead end as far as AGI goes.
We'll see soon enough who's right.
The ARC challenge tests spatial reasoning, something we humans are obviously quite good at, given 4 billion years of evolutionary optimization. But as I said, there is no "general reasoning", it's all domain dependent. A child does better at the spatial problems in ARC given that it has that previously mentioned evolutionary advantage, but just as we don't worship calculators as superior intelligences because they can multiply 10^9 digit numbers in milliseconds, we shouldn't draw fundamental conclusions from humans doing well at a problem that they are in many ways built to solve. If the failures of previous predictions - those that considered Chess or Go as unmistakable signals of true general reasoning - have taught us anything, it's that general reasoning simply does not exist.
The bet of current labs is synthetic data in pre-training, or slight changes of natural data that induces more generalization pressure for multi-step transformations on state in various domains. The goal is to change the data so models learn these transformations more readily and develop good heuristics for them, so not the non-continuous patching that you suggest.
But yes, the next generation of models will probably reveal much more about where we're headed.
I don't think DeepBlue or AlphaGo/etc were meant to teach us anything - they were just showcases of technological prowess by the companies involved, demonstrations of (narrow) machine intelligence.
But...
Reasoning (differentiated from simpler shallow "reactive" intelligence) is basically multi-step chained what-if prediction, and may involve a branching exploration of alternatives ("ok, so that wouldn't work, so what if I did this instead ..."), so could be framed as a tree search of sorts, not entirely dissimilar to the game-tree search used by Deep Blue or the MCTS used by AlphaGo.
Of course general reasoning is a lot more general than playing a game like Chess or Go since the type of moves/choices available/applicable will vary at each step (these aren't all "game move" steps), as will the "evaluation function" that predicts what'll happen if we took that step, but "tree search" isn't a bad way to conceptualize the process, and this is true regardless of the domain(s) of knowledge over which the reasoning is operating.
Which is to say that reasoning is in fact a generalized process, and one whose nature has some corresponding requirements (e.g. keeping track of state) for any machine to be capable of performing it ...
The input sequence is processed in parallel, regardless of length, so number of tokens has no impact on number of sequential compute steps which is always N=layers.
> Do you know what a "language model" is capable of in the limit ?
Well, yeah, if the language model is an N-layer transformer ...
Then increase N (N is almost always increased when a model is scaled up) and train or write things down and continue.
A limitless iteration machine (without external aid) is currently an idea of fiction. Brains can't do it so I'm not particularly worried if machines can't either.
This lack of "variable compute" is a widely recognized shortcoming of transformer-based LLMs, and there are plenty of others. The point apropos this thread is that you can't just train an LLM to be something that it is not. If the generating process required variable compute (maybe 1000's of steps) - e.g. to come up with a chess move - then no amount of training can make the LLM converge to model this generative process... the best it can do is to model the outcome of the generative process, not the process itself. The difference is that without having learnt the generative process, the model will fail when presented with a novel input that it didn't see during training, and therefore didn't memorize the "cheat sheet" answer for.
The "smart way" is a luxury. Solving the problem is what matters. Think of a smart way later if you can. That's how a lot of technological advancement has worked.
>In order to be able to reason effectively and efficiently the model needs to use as much, or as little, compute as needed for a given task. Completing "1+1=" should take fewer compute steps than "A winning sequence for white here is ...".
Same thing. Efficiency is nice but a secondary concern.
>If the generating process required variable compute (maybe 1000's of steps) - e.g. to come up with a chess move - then no amount of training can make the LLM converge to model this generative process.
Every inference problem has itself a fixed number of compute steps it needs (yes, even your chess move). Variability is a nice thing between inferences (maybe move 1 required 500 steps but move 2 only 240, etc.). A nice thing, but never a necessary thing.
3.5-turbo-instruct plays chess consistently at 1800 Elo so clearly the N of the current SOTA is already enough to play non-trivial chess at a level beyond most humans.
There is an N large enough for every GI problem humans care about. Not to sound like a broken record but, once again, limited ≠ trivial.
This is just moving the goal posts from "learning the actual process" to "any process sufficiently robust"
I think it's fair to call one process that can imitate a more complex one a representation of that process. Especially when in the very next sentence he describes it as a "projection", which has the mathematical sense of a representation that loses some dimensions.
I think it's sloppy.
Here's GPT-4 Turbo in April botching a test almost all preschoolers could solve easily: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr...
I have not used LLMs since 2023, when GPT-4 routinely failed almost every counting problem I could think of. I am sure the performance has improved since then, though "write an essay with 250 words" still seems unsolved.
The real problem is that LLM providers have to play a stupid game of whack-a-mole where an enormous number of trivial variations on a counting problem need to be specifically taught to the system. If the system was capable of true quantitative reasoning that wouldn't be necessary for basic problems.
There is also a deception in that "chain of thought" prompting makes LLMs much better at counting. But that's cheating: if the LLM had quantitative reasoning it wouldn't need a human to indicate which problems were amenable to step-by-step thinking. (And this only works for O(n) counting problems, like "count the number of words in the sentence." CoT prompting fails to solve O(nm) counting problems like "count the number of words in this sentence which contain the letter 'e'." For this you need a more specific prompt, like "First, go step-by-step and select the words which contain 'e.' Then go step-by-step to count the selected words." It is worth emphasizing over and over that rats are not nearly this stupid, they can combine tasks to solve complex problems without a human holding their hand.)
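For reference, the ground truth for that two-step counting task is trivial to compute exactly; a throwaway sketch (the sentence is just an example of mine):

    sentence = "The farmer crosses the river with the chicken and the corn"

    # step 1: select the words containing the letter 'e'
    words_with_e = [w for w in sentence.split() if "e" in w.lower()]
    # step 2: count the selection
    print(words_with_e, len(words_with_e))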
I don't know what you mean by "10 years ago" other than a desire to make an ad hominem attack about me being "stuck." My point is that these "capabilities" don't include "understands what a number is in the same way that rats and toddlers understand what numbers are." I suspect that level of AI is decades away.
Beyond that, LLMs don't see words or letters (tokens are neither), so some counting issues are expected.
But it's not very surprising you've been giving tests that make no sense.
But the company name specifically says "superintelligence"
The company isn't named "as smart as the average redditor, Inc"
How does the performance of today's LLMs contradict Ilya's statement?
5 x 3 = 15
without learning what the symbols actually denote (e.g. that ***** ***** ***** is three groups of five, fifteen marks in all).
And this generalizes to almost every sentence an LLM can regurgitate.
I get the impression that they really do believe scale is all you need, other than perhaps some post-training changes to encourage longer-horizon reasoning. Maybe Ilya is in this camp, although frankly it does seem a bit naive to discount all the architectural and operational shortcomings of pre-trained Transformers, or assume they can be mitigated by wrapping the base LLM in an agent that provides what's missing.
I suspect there's a big corporate market for LLMs with very predictable behaviour in terms of what the LLM knows from its training data, vs what it knows from RAG or its context window.
If you're making a chatbot for Hertz Car Hire, you want it to answer based on Hertz policy documents, even if the training data contained policy documents for Avis and Enterprise and Budget and Thrifty car hire.
Avoiding incorrect answers and hallucinations (when appropriate) is a type of AI safety.
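A minimal sketch of what that constraint looks like in practice (mine; the policy text is a placeholder, and in a real system the excerpts would come from a retriever over the company's own documents):

    POLICY_EXCERPTS = [
        "Vehicles must be returned with a full tank unless a fuel option was purchased.",
        "Additional drivers must be registered at the counter before the rental begins.",
    ]

    def build_prompt(question: str) -> str:
        context = "\n---\n".join(POLICY_EXCERPTS)  # in practice: top-k retrieved chunks
        return (
            "Answer ONLY from the policy excerpts below. "
            "If they do not contain the answer, say you don't know.\n\n"
            f"{context}\n\nQuestion: {question}"
        )

    print(build_prompt("Do I need to refuel before returning the car?"))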
The PayPal mafia includes Elon Musk, Peter Thiel, etc. They parlayed that capital into more platforms and can easily arrange investments. Heck, Peter Thiel even works with governments (Palantir) and got J.D. Vance on Trump’s ticket, while Elon might be in his admin.
Kolomoisky got Zelensky elected in Ukraine by launching a show about an unlikely guy who wins the presidency and naming the party after the show. They call them oligarchs over there, but it’s the same thing.
The first guy to 1 million followers on Twitter was Ashton Kutcher. He had already starred in sitcoms and movies for years.
This idea that you can just get huge audiences and investments due to raw talent, keeps a lot of people coming to Hollywood and Silicon Valley to “make it” and living on ramen. But even just coming there proves the point — a talented rando elsewhere in the world wouldn’t even have access to the capital and big boys networks.
They all even banked at the same bank! It’s all extremely centralized: https://community.intercoin.app/t/in-defense-of-decentralize...
Clearly there are reasons why opportunities are gated.
> This idea that you can just get huge audiences and investments due to raw talent, keeps a lot of people coming to Hollywood and Silicon Valley to “make it” and living on ramen. But even just coming there proves the point — a talented rando elsewhere in the world wouldn’t even have access to the capital and big boys networks.
All those people start somewhere though. Excluding nepotism, which is a tangential point, all those people started somewhere and then grew through execution and further opening of opportunity. But it's not like they all got to where they are in one shot. Taking your Ashton Kutcher example - yes, he had a head start on Twitter followers, but that's because he executed for years before on his career. Why would it make sense for some rando to rack up a million followers before he did?
Talent will earn you opportunities, but it's not going to open the highest door until you've put in the time and work.
Of course, it's not to say inequity or unequal access to opportunities doesn't exist in the world. Of course it does. But even in an ideal, perfectly equitable world, not everyone would have the same access to opportunities.
So yes, it makes perfect sense that someone would give Ilya $1B instead of some smart 18 year old, even if that 18 year old was Ilya from the past.
The analogy would be if a private could become a major overnight because they knew a guy.
What we see with ilya is not dissimilar. I don't see why it's bad that people are more hesitant to give a talented 18 year old $1B than the guy who's been at the forefront of AI innovation.
And sometimes not even necessary. Paris Hilton got a music distribution deal overnight cause of her dad’s capital!
Nepotism is a tangential point, and yes I agree that it's a bad thing. Ilya did not get this deal through nepotism, he got it through his past accomplishments, much like how a general gets promoted after many years of exemplary work.
It’s not personally my goal to amass immense wealth and start giant companies (I would rather work minimally and live hedonically) but I am impressed by those that do so.
Overwhelmingly, talent isn't sufficient. For most startups, the old boys' network gets to choose who gets millions. And in the next rounds, a few people choose who will get billions.
If anything, I think startups from 2005-2020 were more likely to get funding easily than those giants.
But after succeeding, in some cases several times, all of the above found it easier to find investors.
Ilya has a similar track record. He contributed to several breakthroughs such as AlexNet, AlphaGo, and seq2seq (which would evolve into transformers after he left Google) before even joining OpenAI.
In fact, Elon sees it as one of his key contributions to OpenAI that he managed to recruit Ilya.
That Ilya is able to raise 1B now is hardly surprising. He's probably able to raise way more than that once he's hired a larger team.
Ilya proved himself as a leader, scientist, and engineer over the past decade with OpenAI, creating breakthrough after breakthrough that no one else had.
He’s raised enough to compete at the level of Grok, Claude, et al.
He’s offering investors a pure play AGI investment, possibly one of the only organizations available to do so.
Who else would you give $1B to pursue that?
That’s how investors think. There are macro trends, ambitious possibilities on the through line, and the rare people who might actually deliver.
A $5B valuation is standard dilution, no crazy ZIRP-style round here.
If you haven’t seen investing at this scale in person it’s hard to appreciate that capital allocation just happens with a certain number of zeros behind it & some people specialize in making the 9 zero decisions.
Yes, it’s predicated on his company being worth more than $500B at some point 10 years down the line.
If they build AGI, that is a very cheap valuation.
Think how ubiquitous Siri, Alexa, chatGPT are and how terrible/not useful/wrong they’ve been.
There’s not a significant amount of demand or distribution risk here. Building the infrastructure to use smarter AI is the tech world’s obsession globally.
If AGI works, in any capacity or at any level, it will have a lot of big customers.
AGI assumes exponential, preferably infinite and continuous improvement, something unseen before in business or nature.
Neither Siri nor Alexa was sold as AGI, and neither alone comes close to being a $1B product. GPT and other LLMs have quickly become a commodity, with AI companies racing to the bottom on inference costs.
I don’t really see the plan, product wise.
Moreover you say: > Ilya proved himself as a leader, scientist, and engineer over the past decade with OpenAI for creating break-through after break-through that no one else had.
Which is absolutely true, but that doesn’t imply more breakthroughs are just around the corner, nor does the current technology suggest AGI is coming.
VCs are willing to take a $1B bet on exponential growth with a 500B upside.
Us regular folk see that and are dumbfounded because AI is obviously not going to improve exponentially forever (literally nothing in the observed universe does) and you can already see the logarithmic improvement curve. That’s where the dismissive attitude comes from.
There are many things on earth that don't exist anywhere else in the universe (as far as we know). Life is one of them. Just think how unfathomably complex human brains are compared to what's out there in space.
Just because something doesn't exist anywhere in the universe doesn't mean that humans can't create it (or humans can't create a machine that creates something that doesn't exist anywhere else) even if it might seem unimaginably complex.
There are plenty of complex phenomena in space, but I don’t need to go that far.
Some other animal brains act like ours, at least as far as we can observe.
There is nothing anywhere that grows exponentially forever.
Sure, but it doesn't have to continue forever to be wildly profitable. If it can keep the exponential growth running for another couple of rounds, that's enough to make everyone involved rich. No-one knows quite where the limit is, so it can reasonably be worth a gamble.
That’s fine and good for investors.
I couldn’t care less about the business side of technology.
I'm an engineer and a technophile, and as an engineer and a technophile it sours me to hear someone dangle sci-fi-level AGI as a pitch to investors when we're clearly not there right now and, in my opinion, this current wave of basically brute-force statistics-based predictive models will not be the technique that gets us there.
It makes the cynic in me, and many others probably, cringe.
My intent is to be helpful. I’m unsure of how much additional context might be useful to you.
Investor math & mechanics are straightforward: institutional funds & family offices want allocations with firms like a16z because they get to invest in deals they could not otherwise access. The top VCs specialize in getting into deals that most investors will never get the opportunity to put money into. This is one of them.
For their Internal Rate of Return (IRR) to work out, at least one investment needs to return 100x or more on the valuation. VCs today focus on placing bets where that math can work. Most investors aren’t confident in their own ability to predict that, so they invest alongside lead investors who are. a16z is famous for that.
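To make that concrete, here is a toy back-of-the-envelope model of why one outlier has to carry the fund. The fund size, check size, and outcome multiples are all invented for illustration, and it computes a simple multiple on invested capital rather than a proper time-weighted IRR:

  # Toy venture fund: several write-offs, a few modest exits, one outlier.
  # Fund size, check size, and multiples are all invented for illustration.
  fund_size = 1_000_000_000                       # hypothetical $1B fund
  check_size = 100_000_000                        # hypothetical $100M per company
  multiples = [0, 0, 0, 0, 1, 1, 2, 3, 5, 100]    # return multiple on each check

  gross = sum(m * check_size for m in multiples)
  print(f"with the 100x outlier:    {gross / fund_size:.1f}x the fund")
  print(f"without the 100x outlier: {sum(m * check_size for m in multiples[:-1]) / fund_size:.1f}x the fund")

With the made-up numbers above, the portfolio returns roughly 11x; strip out the single 100x outcome and the same portfolio barely returns the capital, which is why lead investors obsess over finding that one bet.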
There are multiple companies worth $1T+ now, so this isn’t a fantasy investment. It’s a bet.
The bet doesn’t need to be that AGI continues to grow in power infinitely, it just needs to create a valuable company in roughly a ten year time horizon.
Many of the major tech companies today are worth more money than anyone predicted, including the founders (Amazon, Microsoft, Apple, Salesforce, etc.). An outlier win in tech can have incredible upside.
LLMs are far from commoditized yet, but the growth of the cloud proves you can make a fortune on the commoditization of tech. Commoditization is another way of saying “everyone uses this as a cost of doing business now.” Pretty great spot to land on.
My personal view is that AGI will deliver a post-product world, Eric Schmidt recently stated the same. Products are digital destinations humans need to go to in order to use a tool to create a result. With AGI you can get a “product” on the fly & AI has potentially very significant advantages in interacting with humans in new ways within existing products & systems, no new product required. MS Copilot is an early example.
It’s completely fine to be dismissive of new tech; it’s common, even. What brings you here?
I’m here on HN because I love learning from people who are curious about what is possible & are exploring it through taking action. Over a couple decades of tech trends it’s clear that tech evolves in surprising ways, most predictions eventually prove correct (though the degree of impact is highly variable), and very few people can imagine the correct mental model of what that new reality will be like.
I agree with Zuck:
The best way to predict the future is to build it.
you say you don't see it. fine. these investors do - that's why they are investing and you are not.
Obviously, there are plenty of investors who don't fall into this situation. But let's not pretend that just because someone has a lot of money, or invests a lot of money, it means they know what they are doing.
They also have the warchest to afford a $1B gamble.
If the math worked out for me too, I’d probably invest even if I didn’t personally believe in it.
Also investors aren’t super geniuses, they’re just people.
I mean, look at SoftBank and Adam Neumann… investors can get swept up in hype and swindled too.
When Ilya was in Google, the breakthroughs came from Google.
When Ilya was in OpenAI, the breakthroughs came from OpenAI.
....
The whole LLM race seems to be decelerating, and the hard problems around LLMs don't seem to have had that much progress in the last couple of years (?)
In my naive view, I think a guy like David Silver, the creator/co-lead of AlphaZero, deserves more praise, at least as a leader/scientist. He even has lectures about Deep RL after doing AlphaGo: https://www.davidsilver.uk/teaching/
He has no LinkedIn and came straight from the game-dev industry before learning about RL.
I would put my money on him.
Even assuming the public breakthroughs are the only ones that happened, the fact that OpenAI was able to build an LLM pipeline from data to training to production at their scale before anyone else is a feat of research and engineering (and loads of cash).
This is wrong. The models may end up cheaply available or even free. The business cost will be in hosting and integration.
One of the smartest computer science researchers is taking a stab at the most important problem of our lifetimes, we should be cheering him on.
I think maybe it's because some people's Nvidia and stock holdings had a rough day...
Seriously let's hope he builds a great team, gets the infrastructure he needs, and nails this!
What, you expected someone to value his brand new company at $100 billion or something?
These numbers and the valuation indicate that people consider this a potentially valuable tool, but not world-changing and disruptive.
I think this is a pretty reasonable take.
That doesn't sound like the hype is dissipating to me.
That's the theoretical basis and path for achieving AGI (if it's even possible). I'm tired of all the "we stick enough data in the magic black box blender and ta-da! AGI!" talk.
Every giant technological break-through throughout history has had a massive underpinning of understanding before ever achieving it. And yet, with the AI bubble somehow we're just about to secretly achieve it, but we can't tell you how.
Historically, there have been a lot of cases where technological breakthroughs happened first and conceptual understanding followed later.
The first industrially useful steam engine was invented in 1712 [1] but the classical thermodynamic relationship between heat and work was not developed until 1824 [2].
There was a vast chemical industry before scientists really understood what atoms were, how many distinct kinds of atoms there are, or how they form molecules. It was only 20th century quantum mechanics that properly describes something as simple as a hydrogen molecule. That didn't prevent chemists from developing complex synthetic chemicals [3]. It did mean that they had to rely more on guesses and experiments than on clear theory.
[1] https://en.wikipedia.org/wiki/History_of_the_steam_engine#Ne...
[2] https://en.wikipedia.org/wiki/Reflections_on_the_Motive_Powe...
[3] https://en.wikipedia.org/wiki/Indigo_dye#Chemical_synthesis
I feel like this could be a path to GI without "truly" understanding the human brain: make a large enough simulation of the brain and turn it on. I actually do think we understand enough about the nuts and bolts of neuron interaction to achieve this. What we don't understand is where neurons firing turns into consciousness. It seems like it is probably just an emergent property of a complex enough neuron graph.
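As a rough sense of what the "nuts and bolts of neuron interaction" look like at the lowest level, here is a toy leaky integrate-and-fire network. Every constant and the random wiring are invented for illustration; scaling this idea to a brain would mean on the order of 86 billion neurons plus the actual connectome, which is exactly the part we don't have:

  import numpy as np

  # Toy leaky integrate-and-fire network: a crude stand-in for neuron-level
  # simulation. All parameters and the random wiring are invented; a real
  # brain model would need the measured connectivity, not random weights.
  rng = np.random.default_rng(0)
  n, steps, dt = 100, 1000, 1e-3                  # neurons, timesteps, seconds
  tau, v_thresh, v_reset = 0.02, 1.0, 0.0         # time constant, spike threshold, reset
  w = rng.normal(0, 0.5, (n, n)) / np.sqrt(n)     # random synaptic weights

  v = np.zeros(n)                                 # membrane potentials
  spike_count = 0
  for _ in range(steps):
      i_ext = rng.normal(1.05, 0.2, n)            # noisy external drive
      spikes = v >= v_thresh                      # which neurons fire this step
      spike_count += spikes.sum()
      v[spikes] = v_reset                         # reset neurons that fired
      v += dt / tau * (-v + i_ext + w @ spikes.astype(float))  # leak + input + synapses
  print(f"{spike_count} spikes across {n} neurons in {steps * dt:.1f}s of simulated time")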
This doesn't make any sense.