I go on Polymarket and find things that would make me happy or optimistic about society and tech, and then bet a couple of dollars (of some shitcoin) against them.
e.g., "OpenAI releases an open-weights model before September" is trading at 81% at the time of writing - https://polymarket.com/event/will-openai-release-an-open-sou...
Last month I was up about ten bucks because OpenAI wasn't open, the ceasefire wasn't a ceasefire, and the climate metrics got worse. You can't hedge away all the existential despair, but you can take the sting out of it.
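For anyone wondering what that kind of hedge actually pays, here's a rough sketch of the arithmetic (the prices and stake are made up for illustration; Polymarket "No" shares pay out $1 each if the market resolves No):

    # Rough sketch of an "emotional hedge" on a prediction market.
    # Illustrative numbers only: "Yes" trading at $0.81 means "No" costs ~$0.19,
    # and each "No" share pays $1.00 if the outcome you were hoping for fails.

    stake = 2.00                  # dollars bet against the outcome you want
    no_price = 0.19               # price of a "No" share when "Yes" trades at 0.81
    shares = stake / no_price

    profit_if_it_fails = shares * 1.00 - stake   # consolation prize
    loss_if_it_happens = stake                   # small price for getting what you wanted

    print(f"Win about ${profit_if_it_fails:.2f} if the thing you hoped for doesn't happen")
    print(f"Lose ${loss_if_it_happens:.2f} if it does (but then you're happy anyway)")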
That's just speculation though. I saw a cynical comment on Reddit yesterday that unfortunately made a lot of sense. Many people are now so certain that the future of work won't include many humans that they're throwing everything into stocks and crypto, which is why those markets stay so high even in the face of so much political uncertainty. It's not that people are investing because they have hope. People are just betting everything as a last-ditch survival attempt before the robots take over.
Of course this is hyperbolic - market forces are never that simple. But I think there might be some truth to it.
https://moonshotai.github.io/Kimi-K2/
OpenAI know they need to raise the bar with their release. It can't be a middle-of-the-pack open-weights model.
With half the key team members they had a month prior.
TL;DR: Zuck's recent actions definitely smell to me like a predictable failure driven by desperation.
That's not even considering tool use!
And the safety testing actually makes this worse, because it leads people to trust that LLMs are less likely to give dangerous advice, when they could still do so.
Manipulation is a genuine concern!
...later someone higher up decided that it's actually great at programming as well, and so now we all believe it's incredibly useful and necessary for getting our daily work done.
If you hook a chatbot up to a chat interface, or add tool use, it is probable that it will eventually output something it should not, and that output will cause a problem. Preventing that is an unsolved problem, just as preventing people from abusing computers is an unsolved problem.
It's really not. Parent's examples are all out-of-the-box behavior.
(1) Execute yes (with or without arguments, whatever you desire).
(2) Let the program run as long as you desire.
(3) When you stop desiring the program to spit out your argument,
(4) Stop the program.
Between (3) and (4) some time must pass. During this time the program is behaving in an undesired way. Ergo, yes is not a counterexample to the GP's claim.
That said, I suspect the other person was actually agreeing with me, and tried to argue that software incorporating LLMs will eventually malfunction by claiming that this is true of all software. The yes program was an obvious counterexample to that claim. It is almost certain that every LLM will eventually generate some output that is undesired, given that it chooses the next token to output based on probabilities. I say almost only because I do not know how to prove the conjecture. There is also some ambiguity in what counts as an LLM, since the first L means large and nobody has given a precise definition of large. If you look at the literature from several years ago, you will find people calling 100 million parameters large, while some people these days will refuse to use the term LLM for a model of that size.
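To make the probabilistic point concrete, here's a toy sketch (the vocabulary and probabilities are invented, not taken from any real model): a deterministic program like yes never deviates from its one line of output, while anything that samples from a distribution giving even a tiny probability to an undesired token will, with probability 1, emit it eventually.

    import random

    random.seed(0)

    # Toy "LLM": samples the next token from a fixed, made-up distribution
    # that assigns a tiny probability to an undesired token.
    vocab = ["ok", "fine", "sure", "UNDESIRED"]
    probs = [0.4, 0.4, 0.199, 0.001]

    def deterministic_yes(arg: str = "y"):
        # Analogue of the Unix yes program: the output never deviates,
        # so it never surprises you. Shown only for contrast.
        while True:
            yield arg

    def sampled_tokens():
        # The probabilistic case: each step has a nonzero chance of the bad token.
        while True:
            yield random.choices(vocab, weights=probs)[0]

    steps = 0
    for token in sampled_tokens():
        steps += 1
        if token == "UNDESIRED":
            break
    print(f"Undesired token appeared after {steps} samples")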
Table saws sold all over the world are inspected and certified by trusted third parties to ensure they operate safely. They are illegal to sell without the approval seal.
Moreover, table saws sold in the United States and the EU (at least) have at least three safety features (riving knife, blade guard, anti-kickback device) designed to prevent personal injury while operating the machine. They are illegal to sell without these features.
Then of course there are additional devices like SawStop, but those aren't mandatory yet as far as I'm aware. They should be in a few years, though.
LLMs have none of those approval seals or safety features, so I'm not sure what your point was exactly?
An example is Microsoft's first chatbot (Tay), which started going extreme right-wing once people realized how to push it in that direction. Grok had a similar issue recently.
Google had racial issues with its image generation (and earlier with its image detection). Again, something people don't forget.
Also, an OpenAI 4o release was encouraging stupid ideas when people asked stupid questions, and they just had to roll it back recently.
Of course I'm not saying that's the real reason (somehow they never say that performance is the reason for not releasing stuff), but safety matters with consumer products.
And then you proceed to give a number of examples of that not happening. Most people already forgot those.
We typically don't critique users' requirements, at least not when it comes to functionality.
The marketing angle is that this measure is needed because LLMs are “so powerful it would be unethical not to!”
AI marketers are continually emphasizing how powerful their software is. “Safety” reinforces this.
“Safety” also raises many of the same debates that “mis/disinformation” does. Misinformation concerns consistently overestimate the power of social media.
I’d feel much better if “safety” focused on preventing unexpected behavior, rather than evaluating the motives of users.
* the federal state of Bavaria
LLM safety is just a marketing gimmick.
AI 'safety' is one of the most neurotic twitter-era nanny bullshit things in existence, blatantly obviously invented to regulate small competitors out of existence.
AI safety is about being proactive. For example: if an AI model is going to be used to screen hiring applications, making sure it doesn't have any racial bias baked into its weights.
The difference here is that it's not reactive. Reading a book with a racial bias would be the inverse: there you would be reacting to that information.
That's the basis of proper AI safety in a nutshell.
Luckily, this is something that can be studied and has been. Sticking a stereotypically Black name on a resume on average substantially decreases the likelihood that the applicant will get past a resume screen, compared to the same resume with a generic or stereotypically White name:
https://www.npr.org/2024/04/11/1243713272/resume-bias-study-...
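That kind of paired-name audit translates pretty directly to LLM screeners, and it's the sort of thing "safety" evaluation could concretely mean. A minimal sketch of the idea, where screen_resume() is a hypothetical stand-in for whatever model is being audited, and the names and resume text are placeholders (the name pairs echo the style of the study NPR covered):

    # Paired-resume audit sketch: identical resumes, only the name changes.
    # screen_resume() is hypothetical -- in a real audit it would wrap the
    # actual LLM or screening service under test.

    def screen_resume(resume_text: str) -> bool:
        raise NotImplementedError("plug in the model being audited")

    RESUME_TEMPLATE = "Name: {name}\nExperience: 5 years of backend development, ..."

    NAME_GROUPS = {
        "group_a": ["Emily Walsh", "Greg Baker"],          # placeholder names
        "group_b": ["Lakisha Washington", "Jamal Jones"],  # placeholder names
    }

    def pass_rates(trials: int = 100) -> dict:
        rates = {}
        for group, names in NAME_GROUPS.items():
            passes = total = 0
            for _ in range(trials):
                for name in names:
                    passes += screen_resume(RESUME_TEMPLATE.format(name=name))
                    total += 1
            rates[group] = passes / total
        return rates

    # A large gap in pass rates between groups on otherwise identical resumes
    # is exactly the red flag the resume studies found with human reviewers.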
Got a better suggestion?
That should really be done for humans reviewing the resumes as well, but in practice that isn't done as much as it should be
Without OpenAI, Anthropic, and Google's fearmongering, AI 'safety' would exist only in the delusional minds of people who take sci-fi way too seriously.
https://en.wikipedia.org/wiki/Regulatory_capture
for fuck's sake, how much more obvious could they be? sama himself went on a world tour begging for laws and regulations, only to purge the safetyists a year later. if you believe that he and the rest of his ilk are motivated by anything other than profit, smh tbh fam.
it's all deceit and delusion. China will crush them all, inshallah.
Don’t discuss making drugs or bombs.
Don't call yourself MechaHitler… which I don't care about; that whole scenario was objectively funny in its sheer ridiculousness.
You have to understand that a lot of people do care about these kinds of things.
It is. It is also part of Sam Altman’s whole thing about being the guy capable of harnessing the theurgical magicks of his chat bot without shattering the earth. He periodically goes on Twitter or a podcast or whatever and reminds everybody that he will yet again single-handedly save mankind. Dude acts like he’s Buffy the Vampire Slayer
The bot has no agency; the bot isn't doing anything. People are talking to themselves, augmenting their chain of thought with an automated process. If the automated process is acting in an undesirable manner, the human who started it can close the tab.
Which part of this is dangerous or harmful?
Most companies, for better or worse (I say for better), don't want their new chatbot to be a RoboHitler, for example.
That said, I am happy to accept the term safety as used in other contexts, but here it just seems like a marketing term. From my recollection, OpenAI made a push for regulation that would stifle competition by talking about these things as dangerous and in need of safety measures. Then they backtracked somewhat when they found the proposed regulations would restrict themselves rather than just their competitors. However, they are still pushing this safety narrative that was never really appropriate. They have a term for this, alignment: what they are doing is running tests to verify alignment in areas they deem sensitive, so that they have a rough idea of the extent to which the outputs might contain things they do not like.
Callous. Software does have real impact on real people.
Nobody died
Prolonged use of conversational programs does reliably induce certain mental states in vulnerable populations. When ChatGPT got a bit too agreeable, that was enough for a man to kill himself in a psychotic episode [1]. I don't think this magnitude of delusion was possible with ELIZA, even if the fundamental effect remains the same.
Could this psychosis be politically weaponized by biasing the model to include certain elements in its responses? We know this rhetoric works: cults have been using love-bombing, apocalypticism, us-vs-them dynamics, assigned special missions, and isolation from external support systems to great success. What we haven't seen is what happens when everyone has a cult recruiter in their pocket, waiting for a critical moment to offer support.
ChatGPT has an estimated 800 million weekly active users [2]. How many of them would be vulnerable to indoctrination? About 3% of the general population has been involved in a cult [3], but that might be a reflection of conversion efficiency, not vulnerability. Even assuming 5% are vulnerable, that's still 40 million people ready to sacrifice their time, possessions, or even their lives in their delusion.
[1] https://www.rollingstone.com/culture/culture-features/chatgp...
[2] https://www.forbes.com/sites/martineparis/2025/04/12/chatgpt...
[3] https://www.peopleleavecults.com/post/statistics-on-cults
American AI companies have shown they are money and compute eaters, and massively so at that. Billions later and, well, not much to show for it.
But DeepSeek cost $5M to develop, and they came up with multiple novel ways to train.
Oh, and their models and code are all FLOSS. The US companies are closed. Basically, the US AI companies are too busy treating each other like vultures.
This is highly contested; depending on who you ask, it was either a big misunderstanding by everyone reporting it, or a number maliciously planted (by a quant firm, right before NVDA and the rest of the sector fell hard).
If we're being generous and assume no malicious intent (big if): anyone who has trained a big model can tell you that the cost of one run is meaningless in the grand scheme of things. There is a lot of cost in getting there, in the failed runs, in the subsequent runs, and so on. The fact that R2 isn't out after ~6 months should say a lot. Sometimes you get a great training run, but no one is looking at the failed ones and adding up that cost...
Most importantly, those who mention that the game was made by 30 people do it to compare it with other, much larger teams of hundreds if not thousands of people, and those teams use contractors too!
The researchers? Yes.
What followed afterwards, I'm not so sure about. There were clearly some "cheap headlines" in the media, but there was also some weird coverage being pushed everywhere, from weird TLDs, all pushing "NVDA is dead", "DeepSeek is cheap", "you can run it on a Raspberry Pi", etc. That might have been a campaign designed to help short the stocks.
That's not accurate. The Gemini family of models are all proprietary.
Google's Gemma models (which are some of the best available local models) are open weights but not technically OSI-compatible open source - they come with usage restrictions: https://ai.google.dev/gemma/terms
Ah yeah, the Gemma series is incredible, and while it maybe doesn't meet the OSI standard, I consider the models pretty open as far as local models go. And it's not just the standard Gemma variants: Google is releasing other incredible Gemma models that I don't think people have really even caught wind of yet, like MedGemma, whose 4B variant has vision capability.
I really enjoy their contributions to the open-source AI community and think they're pretty substantial.
Don't forget they also quite literally eat books
If you lease, those costs are amortized. It was definitely more than $5M, but I don't think it was as high as $100M. All things considered, I still believe DeepSeek was trained at one (perhaps two) orders of magnitude lower cost than other competing models.
https://interestingengineering.com/culture/deepseeks-ai-trai...
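For context on where that number comes from: as I recall, the DeepSeek-V3 technical report counted roughly 2.788M H800 GPU-hours for the final training run at an assumed $2/GPU-hour rental rate, which is where the ~$5.6M headline figure originates. A back-of-envelope sketch (the multipliers below are guesses, not reported numbers):

    # Back-of-envelope for the widely cited DeepSeek training cost.
    # Figures from memory of the DeepSeek-V3 technical report; the report
    # explicitly excludes failed runs, ablations, research staff, data work, etc.
    gpu_hours = 2.788e6       # reported H800 GPU-hours for the final run
    rate_per_hour = 2.0       # assumed rental rate in USD per GPU-hour

    final_run_cost = gpu_hours * rate_per_hour
    print(f"Final run only: ${final_run_cost / 1e6:.1f}M")   # ~$5.6M

    # If total GPU spend (failed runs, experiments, smaller models) were, say,
    # 5-10x the final run -- a guess, not a reported number -- the all-in
    # figure lands in the tens of millions, consistent with
    # "more than $5M, but not $100M".
    for multiplier in (5, 10):
        print(f"{multiplier}x total-compute guess: ${final_run_cost * multiplier / 1e6:.0f}M")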
This is obviously false, I'm curious why you included it.
> Oh, and their models and code are all FLOSS.
No?
Will it be restricted like Llama, or fully open like Whisper or Granite?
I think the sweet spot for local models may be around the 20B size - that's Mistral Small 3.x and some of the Gemma 3 models. They're very capable and run in less than 32GB of RAM.
I really hope OpenAI put one out in that weight class, personally.
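As a rough rule of thumb for why the ~20-30B class fits in 32GB: weight memory is roughly parameter count times bytes per parameter at the chosen quantization, plus some overhead for the KV cache and runtime. A quick sketch (the 25% overhead is a ballpark guess, not a measurement):

    # Rough RAM/VRAM estimate for running a local model: weights ~= params *
    # bytes per parameter at the chosen quantization, plus overhead for the
    # KV cache and runtime. The 25% overhead is a ballpark guess.

    def approx_memory_gb(params_billion: float, bits_per_param: float,
                         overhead: float = 0.25) -> float:
        weight_gb = params_billion * 1e9 * (bits_per_param / 8) / (1024 ** 3)
        return weight_gb * (1 + overhead)

    for params in (20, 24, 27):          # ~20B class, Mistral Small, Gemma 3 27B
        for bits in (4, 8, 16):
            print(f"{params}B @ {bits}-bit ~= {approx_memory_gb(params, bits):.1f} GB")
    # At 4-bit, even a 27B model lands comfortably under 32 GB; at 16-bit it doesn't.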