The ultimate success of this strategy depends on what we might call the enterprise AI adoption curve - whether large organizations will prioritize the kind of integrated, reliable, and "safe" AI solutions OpenAI is positioning itself to provide over cheaper but potentially less polished alternatives.
This is strikingly similar to IBM's historical bet on enterprise computing - sacrificing the low-end market to focus on high-value enterprise customers who would pay premium prices for reliability and integration. The key question is whether AI will follow a similar maturation pattern or if the open-source nature of the technology will force a different evolutionary path.
As for ChatGPT, it's a consumer tool, not an enterprise tool. It's not really integrated into an enterprise's existing toolset, it's not integrated into their authentication, it's not integrated into their internal permissions model, and the IT department can't enforce any policies on how it's used. In almost all ways it doesn't look like enterprise IT.
It's hard to trust that OpenAI won't use enterprise data to train its next model, in a market where content is the most valuable element, compared with office suites, cloud databases, etc.
Then there’s trust that it won’t make up information.
It probably won’t be used for any HR/legal work for fear of false info being generated.
MS failed their customers more than once.
https://news.ycombinator.com/item?id=37408776
Maybe it's a good idea to spread your data around rather than putting it all in one place, if you really need to use the cloud.
As such, Microsoft is making the right choice in outright stealing data for whatever purpose. It will have no real consequences.
Your historical pile of millions of MSOffice documents is an ocean sized moat.
Not to mention that they abolished QA and outsourced it to the customer.
With AI-CEOs - https://ai-ceo.org - this would never have happened, because their CEOs have a kill switch and a mobile app for the board for full observability.
My personal take is that most companies don't have enough data, and not in sufficiently high quality, to be able to use LLMs for company specific tasks.
But really, in practice, most applications are using the “RAG” (retrieval augmented generation) approach, and actually doing fine tuning is less common.
Wouldn't that depend on what you expect it to do? If you just want, say, Copilot, summarizing texts, or help writing emails, then you're probably good. If you want to use ChatGPT to help solve customer issues or debug problems specific to your company, wouldn't you need to feed it your own data? I'm thinking: "Help me find the correct subscription for a customer with these parameters" - then you'd need ChatGPT to know your pricing structure.
One idea I've had, from an experience with an ISP, would be to have the LLM tell customer service: Hey, this is an issue similar to what five of your colleagues just dealt with, in the same area, within 30 minutes. You should consider escalating this to a technician. That would require more or less live feedback to the model, or am I misunderstanding how the current AIs would handle that information?
You can't really maintain authz while fine tuning (unless you do a separate fine-tune for each permission set). So RAG is the way to go there.
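For what it's worth, here's a minimal sketch of that idea: filter the corpus by the caller's permissions before retrieval, so nothing the user can't see ever reaches the prompt. All the document names, groups, and the toy scoring function are made up for illustration; a real system would use an embedding model and a vector index instead.

```python
# Permission-aware RAG sketch: authz is applied at retrieval time,
# not baked into the model weights.
from dataclasses import dataclass

@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: set  # groups permitted to read this document

CORPUS = [
    Doc("pricing-internal", "Enterprise tier is $48/seat/month ...", {"sales", "finance"}),
    Doc("pricing-public",   "Plans start at $20/month ...",          {"everyone"}),
    Doc("hr-policy",        "Severance policy details ...",          {"hr"}),
]

def score(query: str, doc: Doc) -> int:
    # Toy relevance score: shared word count. Swap in embedding cosine
    # similarity in practice.
    return len(set(query.lower().split()) & set(doc.text.lower().split()))

def retrieve(query: str, user_groups: set, k: int = 2) -> list[Doc]:
    visible = [d for d in CORPUS if d.allowed_groups & (user_groups | {"everyone"})]
    return sorted(visible, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, user_groups: set) -> str:
    context = "\n---\n".join(d.text for d in retrieve(query, user_groups))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what is the enterprise price per month", {"sales"}))
```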
Isn't this explicitly what RAG is for?
Bit of a cynical take. A company like OpenAI stands to lose enormously if anyone catches them doing dodgy shit in violation of their agreements with users. And it's very hard to keep dodgy behaviour secret in any decent sized company where any embittered employee can blow the whistle. VW only just managed it with Dieselgate by keeping the circle of conspirators very small.
If their terms say they won't use your data now or in the future then you can reasonably assume that's the case for your business planning purposes.
https://news.bloomberglaw.com/ip-law/openai-to-seek-to-centr...
Just make sure your chat history is off for starters. https://www.threatdown.com/blog/how-to-keep-your-chatgpt-con...
IMO no big enterprise will adopt ChatGPT unless it's all hosted in their cloud. Open source models lend themselves better to enterprises in this regard.
80% of big enterprises already use MS Sharepoint hosted in Azure for some of their document management. It’s certified for storing medical and financial records.
Plenty of big enterprises have been using OpenAI models for a good while now.
I know for a fact a major corporation I do work for is vehemently against any use of generative A.I. by its employees (just had that drilled into my head multiple times by their mandatory annual cybersecurity training), although I believe they are working towards getting some fully internal solution working at some point.
Kind of funny that Google includes generative A.I. answers by default now, so I still see those answers just by doing a Google search.
I've seen the enterprise version with a top-5 consulting company, and it answers from their global knowledgebase, cites references, and doesn't train on their data.
The behavior you're describing sounds like an older model behavior. When I ask for links to references these days, it searches the internet then gives me links to real papers that are often actually relevant and helpful.
I repeated the prompt just now and it actually gave me the correct, opposite response. For those curious, I asked ChatGPT what turned on a gene, and it said Protein X turns on Gene Y as per -fake citation-. Asking today whether Protein X turns on Gene Y, ChatGPT said there is no evidence, and showed 2 real citations of factors that may turn on Gene Y.
Pretty impressed!
ChatGPT regularly searches and links to sources.
It’s as simple as copying and pasting a link to prove it. If it is actually happening, it would benefit us all to know the facts surrounding it.
https://chatgpt.com/share/6757804f-3a6c-800b-b48c-ffbf144d73...
As just another example, ChatGPT said in the Okita paper that they switched media on day 3, when if you read the paper they switched the media on day 8. So not only did it fail to generate the correct reference, it also failed to accurately interpret the contents of a specific paper.
I’m a pretty experienced developer and I struggle to get any useful information out of LLMs for any non-trivial task.
At my job (at an LLM-based search company) our CTO uses it on occasion (I can tell by the contortions in his AI code that aren't present in his handwritten code. I rarely need to fix the former).
And I think our interns used it for a demo one week, but I don’t think it’s very common at my company.
The issues at the moment are a mix of IP on the data, insurance on the security of private cloud infrastructures, deals between Amazon and Microsoft/OpenAI for the proper integration of ChatGPT on AWS, all these kinds of things.
But discarding the enterprise needs is in my opinion a [very] wrong assumption.
But all that is the technical part of things. Markets do not bless products. They bless revenues. And from that perspective, I have NO CLUE.
Microsoft owns the customer relationship, owns the product experience, and in many ways owns the productionisation of a model into a useful feature. They also happen to own the datacenter side as well.
Because Microsoft is the whole wrapper around OpenAI, they can also negotiate. If they think they can get a better price from Anthropic, Google (in theory), or their own internally created models, then they can pressure OpenAI to reduce prices.
OpenAI doesn't get Microsoft's enterprise legitimacy; Microsoft keeps that. OpenAI just gets preferential treatment as a supplier.
On the way up the hype curve it's the folks selling shovels that make all the money, but in a market of mature productionisation at scale, it's those closest to customers who make the money.
Note: they announced recently that they will have invented AGI in precisely 1000 days.
OpenAI are clearly going for the BHAG. You may or may not believe in AGI-soon but they do, and are all in on this bet. So they simply don’t care about the failure case (ie no AGI in the timeframe that they can maintain runway).
Still seems like owning the customer relationship like Microsoft is far more valuable.
What according to you is the bare minimum of what it will take for it to be an enterprise tool?
This isn't that big of a deal any more. A company just needs to add the application to Azure AD (now called Entra for some reason).
At the very least, there's a possibility this content can be seen by OpenAI staff when it's flagged as a bad case, so privacy concerns still exist.
Whether your data is used for training or not is an approximation of whether you're using a tool for commercial applications, so a pretty good way to price discriminate.
I view that competition a bit like Teams vs anything else. Teams wasn't better, but it was good enough and it's "sort of free". It's the same with the Azure AI tools: they aren't free, but since you don't exactly pay list pricing in enterprise they can be fairly cheap. Copilot is obviously horrible compared to ChatGPT, but a lot of the Azure AI tooling works perfectly well, and much of it integrates seamlessly with what you already have running in Azure. We recently "lost" our OCR for a document flow, and since it wasn't recoverable we needed to do something fast. Well, Azure Document Intelligence was so easy to hook up to the flow it was ridiculous.

I don't want to sound like a Microsoft commercial. I think they are a good IT business partner, but the products are also sort of a trap where all those tiny things create the perfect vendor lock-in. Which is bad, but it's also where European enterprise is at, since the "monopoly" Microsoft has on the suite of products makes it very hard not to use them. Teams again being the perfect example, since it "won" by basically being a 0 in the budget even though it isn't actually free.
Enterprise decision makers care about compliance, certifications and “general market image” (which probably has a proper English word). OpenAI has none of that, and they will compete with companies that do.
Seriously, I'm not a fan of Teams, but the sad state of video calls in Slack, even in 2024, seriously ruins it for me. This is the one thing — one important thing — that Teams is better at than Slack.
You can resize it.
Even in my own organisation Teams isn’t exactly a beloved platform. The whole “Teams” part of it can actually solve a lot of the issues our employees have with sharing documents, having chats located in relation to a project and so on, but they just don’t use it because they hate it.
They are already selling (API) plans - well, them and MS Azure - with higher trust guarantees. And companies are using them.
Yes if they deploy a datacenter in the EU or close it will be a no-brainer (kinda pun intended)
Uhh they're already here. Under the name CoPilot which is really just ChatGPT under the hood.
Microsoft launders the missing trust in OpenAI :)
But why do you think copilot is worse? It's really just the same engine (gpt-4o right now) with some RAG grounding based on your SharePoint documents. Speaking about copilot for M365 here.
I don't think it's a great service yet, it's still very early and flawed. But so is ChatGPT.
I agree that if you're sure you have a commodity product, then you should make sure you're in the driver seat with those that will pay more, and also try and grind less effective players out. (As a strategy assessment, not a moral one).
You could think of Apple under JLG and then being handed back to Jobs as precisely being two perspectives on the answer to "does Apple have a commodity product?" Gassée thought it did, and we had the era of Apple OEMs, system integrators, other boxes running Apple software, and Jobs thought it did not; essentially his first act was to kill those deals.
The critical question is timing - if they wait too long to establish their enterprise position, they risk being overtaken by commoditization as IBM was. Move too aggressively, and they might prematurely abandon advantages in the broader market, as Apple nearly did under Gassée.
Threading the needle. I don't envy their position here. Especially with Musk in the Trump administration.
Agreed on enterprise - Microsoft would have to roll out policies and integration with their core products at a pace faster than they usually do (Azure AD for example still pales in comparison to legacy AD feature-wise - I am continually amazed they do not prioritize this more).
Right now Gemini Pro is best for email, docs, calendar integration.
That said, ChatGPT Plus is a good product and I might spring for Pro for a month or two.
Supposedly Apple won't be able to offer a Siri LLM that acts like ChatGPT's iPhone app until 2026. That gives Apple's current and new competitors a head start. Maybe ChatGPT and Microsoft could release an AI Phone. I'd drop Apple quickly if that becomes a reality.
Grok winning the Federal bid is an interesting possible outcome though. I think that, slightly de-Elon-ed, the messaging that it's been trained to be more politically neutral (I realize that this is a large step from how it's messaged) might be a real factor in the next few years in the US. Should be interesting!
Fudged71 - you want to predict openai value and importance in 2029? We'll still both be on HN I'm sure. I'm going to predict it's a dominant player, and I'll go contra-Gwern, and say that it will still be known as best-in-class product delivered AI, whether or not an Anthropic or other company has best-in-class LLM tech. Basically, I think they'll make it and sustain.
It seems possible OpenAI could maintain dominance in government/institutional markets while facing more competition in commercial segments, similar to how defense contractors operate.
It feels strange to say, but I think that the product moat looks harder than the LLM moat for the top 5 teams right now. I'm surprised I think that, but I've assessed so many LLMs in the last 18 months, and they keep getting better, albeit more slowly, and they keep getting smaller while losing less quality, and tooling keeps getting better on them.
At the same time, all the product infra around using, integrating, safety, API support, enterprise contracts, data security, threat analysis, all that is expensive and hard for startups in a way that spending $50mm with a cloud AI infra company is not hard.
Altman's new head of product is reputed to be excellent as well, so it will be super interesting to see where this all goes.
Unfortunately, those are 5+ year projects for a lot of F500 companies. And they'll have to burn a lot of political capital to get the internal systems under control. Meaning that the CXO that does get the SQL server up and running and has the clout to do something about non-compliance, that person is going to be hated internally. And then if it's ever finished? That whole team is gonna be let go too. And it'll all just then rot, if not implode.
The AI boom for corporations is really going to let people know who is swimming naked when it comes to internal data orderliness.
Like, you want to be the person that sell shovels in the AI boom here for enterprise? Be the 'Cleaning Lady' for company data and non-compliance. Go in, kick butts, clean it all up, be hated, leave with a fat check.
There exists no future where OpenAI both sells models through API and has its own consumer product. They will have to pick one of these things to bet the company on.
Think Amazon that has both AWS and the retail business. There's a lot of value in providing both.
Its use caustically destroys more than it creates. It is a worthy successor to Pandora's box.
It won't go anywhere until _we_ change.
Finally!..
I'd say Meta is the most important player here. Pretty much all the "open source" models are built on Llama in one way or another. The only reason Llama exists is because Meta wants to commoditize AI in order to prevent the likes of OpenAI from overtaking them later. If Meta one day no longer believes in this strategy for whatever reason, then everybody is in serious trouble.
Also important to recognize that those clocks aren't entirely separate. The monetization timeline is shorter if investors perceive that commodification makes future monetization less certain; whereas if investors perceive a strong moat against commodification, new financing without profitable monetization remains practical, as long as the market believes that investment in growth now means a sufficient increase in monetization down the road.
Combined with a search engine and AI summarisation, sure. That works well. But barebones, no. You can never be sure whether it's hallucinating or not.
I believe we are already there at least for the average person.
Using Ollama I can run different LLMs locally that are good enough for what I want to do. That's on a 32GB M1 laptop. No more having to pay someone to get results.
For development, PyCharm Pro's latest LLM autocomplete is just short of writing everything for me.
I agree with you in relation to the enterprise.
While safe in terms of output quality control, SaaS is not safe in terms of data control. Meta's Llama is the winner in any scenario where it would be ridiculous to send user data to a third party.
They also are still leading in the enterprise space: https://www.linkedin.com/posts/maggax_market-share-of-openai...
But why do I pay that much? Because Claude in combination with the Projects feature, where I can upload two dozen or more files, PDFs, text, and give it a context, and then ask questions in this specific context over a period of week or longer, come back to it and continue the inquiry, all of this gives me superpowers. Feels like a handful of researchers at my fingertips that I can brainstorm with, that I can ask to review the documents, come up with answers to my questions, all of this is unbelievably powerful.
I‘d be ok with 40 or 50 USD a month for one user, alas Claude won’t offer it. So I pay 166 Euros for five seats and use one. Because it saves me a ton of work.
Full disclosure: I participated in Kagi's crowdfund, so I have some financial stake in the company, but I mainly participated because I'm an enthusiastic customer.
I'm an enthusiastic customer nonetheless, but it is curious.
It struggles with cases that exceed 1000 lines or so. Not that it loses track entirely at that size, it just starts making dumb mistakes.
Then after about 2 or 3 hours, the size at which it starts to struggle drops to maybe 500. A new chat doesn't seem to help, but who can say, it's a difficult thing to quantify. After 12 hours, both me and the AI are feeling fresh again. Or maybe it's just me, idk.
And if you're about to suggest that the real problem here is that there's so much tedious filler in these test cases that even an AI gets bored with them... Yes, yes it is.
What am I losing here if I switch over to this from my current Claude subscription?
I.e. I’ve been an Ultimate subscriber since they launched the plan and I rarely use the assistant feature because I’ve got a subscription to ChatGPT and Claude. I only use it when I want to query Llama, Gemini, or Mistral models which I don’t want to subscribe to or create API keys for.
At some point I'm going to subscribe to Kagi again (once I have a job) so be interested to see how it rates.
I think it's all the LLMs + some Kagi-specific intelligence on top because you can flip web search on and off for all the chats.
You can also expand it by adding more concepts to better specify things. For example, you can specify that the mecha look like alphabet characters while the alien planet expresses the randomness of prime numbers, and that might influence the AI to produce a more unique image, as you are now getting into really weird combinations of concepts (combinations that might actually make no sense if you think too much about them). But you also greatly increase the chance of getting trash output, as the AI can no longer map the feature space back to an image that mirrors anything like what a human would interpret as having a similar feature space.
[1]: Bender, Emily M., et al. "On the dangers of stochastic parrots: Can language models be too big?." Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 2021.
Unless I’m misunderstanding, I disagree. If you reply, I’ll bet I can convince you.
LLMs are stochastic parrots incapable of thought or reasoning. Even their chains of thoughts are part of the training data.
> Their position is falsifiable through simple examples: LLMs can perform arithmetic on numbers that weren't in training data, compose responses about current events post-training, and generate novel combinations of ideas.
Spot on. It would take a lot of editing for me to speak as concisely and accurately!
Stochastic generative models can generate new and correct data if the distribution is right. It's in the definition.
After coming back to this to see how the conversation has evolved (it hasn't), I offer this guess: the problem isn't at the object level (i.e. what ML research has to say on this) nor my willingness to engage. A key factor seems to be a lack of interest on the other end of the conversation.
Based on my study (not at the Ph.D. level but still quite intensive), I am confident the comment above is both wrong and poorly framed. Why? Phrases like "incapable of thought" and "stochastic parrots" are red flags to me. In my experience, people who study LLM systems are wary of using such brash phrases. They tend to move the conversation away from understanding towards combativeness and/or confusion.
Being this direct might sound brusque and/or unpersuasive. My top concern at this point, not knowing you, is that you might not prioritize learning and careful discussion. If you want to continue discussing, here is what I suggest:
First, are you familiar with the double-crux technique? If not, the CFAR page is a good start.
Second, please share three papers (or high-quality writing from experts): one that supports your claim, one that opposes it, and one that attempts to synthesize.
Third, perhaps we can find a better forum.
Some other intelligent social animals have slightly different brains, and it seems very likely they "think" as well. Do we want to define "thinking" in some relative manner?
Say you pick a definition requiring an isomorphism to thoughts as generated by a human brain. Then, by definition, you can't have thoughts unless you prove the isomorphism. How are you going to do that? Inspection? In theory, some suitable emulation of a brain is needed. You might get close with whole-brain emulation. But how do you know when your emulation is good enough? What level of detail is sufficient?
What kinds of definitions of "thought" remain?
Perhaps something related to consciousness? Where is this kind of definition going to get us? Talking about consciousness is hard.
Anil Seth (and others) talks about consciousness better than most, for what it is worth -- he does it by getting more detailed and specific. See also: integrated information theory.
By writing at some length, I hope to show that using loose sketches of concepts using words such as "thoughts" or "thinking" doesn't advance a substantive conversation. More depth is needed.
Meta: To advance the conversation, it takes time to elaborate and engage. It isn't easy. An easier way out is pressing the down triangle, but that is too often meager and fleeting protection for a brittle ego and/or a fixated level of understanding.
So apparently not.
You can, however, get the same human experience by contracting a consulting company that will bill you $20 000 per month and lie to you about having absolute knowledge.
So I understand the unlimited use case and honestly am considering shelling out for the o1 unlimited tier, if o1 is useful enough.
A theoretical app subscription for $200/month feels expensive. Having the equivalent of a smart employee working beside me all day for $200/month feels like a deal.
2. Useless with large context. Ignores, forgets, etc.
3. Terrible code and code understanding.
Also this is me hoping it would be good and looking at it with rose tinted glasses because I could use cloud credits to run it and save money.
I’ve used it extensively to cross-reference and analyse academic papers, and the performance has been excellent so far. While this is just my personal experience (YMMV), it’s far more reliable and focused than Gemini when it comes to this specific use case. I've rarely experienced a hallucination with it. But perhaps that's the way I'm using it.
I've looked into it, but as usual with LLM I feel like I'm not getting much out of it due to lack of imagination when it comes to prompting.
1. A company introduces a high-priced option (the "decoy"), often not intended to be the best value for most customers.
2. This premium option makes the other plans seem like better deals in comparison, nudging customers toward the one the company actually wants to sell.
In this case for Chat GPT is:
Option A: Basic Plan - Free
Option B: Plus Plan - $20/month
Option C: Pro Plan - $200/month
Even if the company has no intention of selling the Pro Plan, its presence makes the Plus Plan seem more reasonably priced and valuable.
While not inherently unethical, the decoy effect can be seen as manipulative if it exploits customers’ biases or lacks transparency about the true value of each plan.
Several comments in this thread have used Anthropic's lower pricing as a criticism, but it's probably moot: a month from now Anthropic will release its own $200 model.
Not a single one of OpenAI’s models can compete with the Claude series, it’s embarrassing.
Do you happen to have comparisons available for o1-pro or even o1 (non-preview) that you could share, since you seem to have tried them all?
From an API standpoint, it seems like enterprises are currently split between anthropic and ChatGPT and most are willing to use substitutes. For the consumer, ChatGPT is the clear favorite (better branding, better iPhone app)
So they charge (as I recall from what he told me; I could be off) something like $450 for shipping the books (I don't recall the actual amount, but it seemed high at the time).
So the salesman is taught to start off the sales pitch with a set of encyclopedias costing, at the time, let's say $40,000 for some 'gold plated version'.
The potential buyer laughs, and the salesman then says 'plus $450 for shipping!!!'.
They then move on to the more reasonable versions costing let's say $1000 or whatever.
As a result of that first high-priced example (in addition to the positioning you are talking about), the customer is set up to accept the shipping charge (which was relatively high).
[1]: https://en.m.wikipedia.org/wiki/Door-in-the-face_technique
Responses from gpt-4 sound more like AI, but I haven't had seemingly as many issues as with 4o.
Also the feature of 4o where it just spits out a ton of information, or rewrites the entire code is frustrating
I'm a Plus member, and the biggest limitation I am running into by far is the maximum length of the context window. I'm having context fall out of scope throughout the conversation, or not being able to give it a large document that I can then interrogate.
So if I go from paying $20/month for 32,000 tokens, to $200/month for Pro, I expect something more akin to Enterprise's 128,000 tokens or MORE. But they don't even discuss the context window AT ALL.
For anyone else out there looking to build a competitor I STRONGLY recommend you consider the context window as a major differentiator. Let me give you an example of a usage which ChatGPT just simply cannot do very well today: Dump a XML file into it, then ask it questions about that file. You can attach files to ChatGPT, but it is basically pointless because it isn't able to view the entire file at once due to, again, limited context windows.
Instead you need to chunk your data and store it in a vector database so you can do semantic search and include only the bits that are most relevant in the context.
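A rough sketch of that chunk-and-retrieve pattern is below. The bag-of-words "embedding" is just a stand-in for a real embedding model, and the chunk sizes are arbitrary; the point is that only the top-scoring chunks go into the prompt instead of the whole file.

```python
# Chunk a large document, "embed" each chunk, and keep only the chunks
# most relevant to the question inside the context window.
import math

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> dict:
    # Placeholder: word-count vector. Swap in a real embedding model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(document: str, question: str, k: int = 3) -> list[str]:
    q = embed(question)
    return sorted(chunk(document), key=lambda c: cosine(embed(c), q), reverse=True)[:k]

# Usage: paste only top_chunks(xml_text, question) into the model's context,
# not the entire file.
```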
LLM is a cool tool. You need to build around it. OpenAI should start shipping these other components so people can build their solutions and make their money selling shovels.
Instead they want end user to pay them to use the LLM without any custom tooling around. I don't think that's a winning strategy.
Transformer architectures generally take quadratic time wrt sequence length, not exponential. Architectural innovations like flash attention also mitigate this somewhat.
Backtracking isn't involved, transformers are feedforward.
Google advertises support for 128k tokens, with 2M-token sequences available to folks who pay the big bucks: https://blog.google/technology/ai/google-gemini-next-generat...
You can’t use fancy flash attention tricks either.
If we have the effective context window equal to the claimed context window, well, I'd start worrying a bit about most of the risks that AI doomers talk about...
I think the larger problem is "effective context" and training data.
Being technically able to use a large context window doesn't mean a model can actually remember or attend to that larger context well. In my experience, the kinds of synthetic "needle in haystack" tasks that AI companies use to show how large of a context their model can handle don't translate very well to more complicated use cases.
You can create data with large context for training by synthetically adding in random stuff, but there's not a ton of organic training data where something meaningfully depends on something 100,000 tokens back.
Also, even if it's not scaling exponentially, it's still scaling: at what point is RAG going to be more effective than just having a large context?
FFWD input is self-attention output. And since the output of self-attention layer is [context, d_model], FFWD layer input will grow as well. Consequently, FFWD layer compute cost will grow as well, no?
The cost of FFWD layer according to my calculations is ~(4+2 * true(w3)) * d_model * dff * n_layers * context_size so the FFWD cost grows linearly wrt the context size.
So, unless I misunderstood the transformer architecture, larger the context the larger the compute of both self-attention and FFWD is?
I don't understand how since that very context is what makes the likeliness of output of next prediction worthy, or not?
More specifically, FFWD layer is essentially self attention output [context, d_model] matrix matmul'd with W1, W2 and W3 weights?
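For what it's worth, here is a back-of-the-envelope sketch of that scaling, using illustrative (assumed) sizes rather than any real model's numbers: attention cost grows with the square of the context length, while the feed-forward cost grows linearly with it.

```python
# Rough per-layer FLOP estimates (assumed formulas, not measured numbers).

def attention_flops(n_ctx: int, d_model: int) -> float:
    # QK^T plus attention-weighted V: ~2 * n_ctx^2 * d_model each
    return 4 * n_ctx**2 * d_model

def ffn_flops(n_ctx: int, d_model: int, d_ff: int) -> float:
    # Two projections per token (a gated variant adds a third)
    return 4 * n_ctx * d_model * d_ff

d_model, d_ff = 8192, 4 * 8192
for n_ctx in (8_000, 32_000, 128_000):
    a, f = attention_flops(n_ctx, d_model), ffn_flops(n_ctx, d_model, d_ff)
    print(f"{n_ctx:>7} tokens: attention {a:.2e} FLOPs, FFN {f:.2e} FLOPs")
```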
Be aware that this tends to give bad results. Once RAG is involved you essentially only do slightly better than a traditional search, a lot of nuance gets lost.
Isn't that kind of what Anthropic is offering with projects? Where you can upload information and PDF files and stuff which are then always available in the chat?
The only reason I open Chat now is because Claude will refuse to answer questions on a variety of topics including for example medication side effects.
Once understood, you could practice with a privately hosted LLM (run your own model) to tweak and get it dialled in per hour, and then make the leap.
Over time, using open source models as well will get more done per dollar of compute, and hopefully the gap will remain close.
Coincidentally I’ve been using it with xml files recently (iOS storyboard files), and it seems to do pretty well manipulating and refactoring elements as I interact with it.
First impressions: The new o1-Pro model is an insanely good writer. Aside from favoring the long em-dash (—) which isn't on most keyboards, it has none of the quirks and tells of old GPT-4/4o/o1. It managed to totally fool every "AI writing detector" I ran it through.
It can handle unusually long prompts.
It appears to be very good at complex data analysis. I need to put it through its paces a bit more, though.
Interesting! I intentionally edit my keyboard layout to include the em-dash, as I enjoy using it out of sheer pomposity—I should undoubtedly delve into the extent to which my own comments have been used to train GPT models!
I use it all the time because it's the "correct" one to use, but it's often more "correct" to just rewrite the sentence in a way that doesn't call for one. :)
(And two hyphens to en dash.)
Other things are the overuse of transition words (e.g., "however," "furthermore," "moreover," "in summary," "in conclusion,") as well as some other stuff.
It might not be fair to people who write like that naturally, but it is what it is in the current situation we find ourselves in.
https://www.reddit.com/r/ApplyingToCollege/comments/1h0vhlq/...
Did ChatGPT write this comment for you?
https://openreview.net/forum?id=FBkpCyujtS&nesting=2&sort=da...
Some words/phrases that, by default, it overuses: "dive into", "delve into", "the world of", and others.
You correct it with instructions, but it will then find synonyms so there is also a structural pattern to the output that it favors by default. For example, if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
Yes, all of this can be corrected if you put enough effort into the prompt and enough iterations to fix all of these tells.
LLMs can radically change their style, you just have to specify what style you want. I mean, if you prompt it to "write in the style of an angry Charles Bukowski" you'll stop seeing those patterns you're used to.
In my team for a while we had a bot generating meeting notes "in the style of a bored teenager", and (besides being hilarious) the results were very unlike typical AI "delvish".
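As a concrete illustration, this is roughly all it takes with the OpenAI Python SDK; the model name and the exact wording of the style instruction are just assumptions, not what my team actually ran.

```python
# Steering the output style via the system prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Write all meeting notes in the style of a bored teenager. "
                    "Never use the words 'delve', 'dive into', or 'furthermore'."},
        {"role": "user",
         "content": "Summarize: we shipped the billing fix and postponed the migration."},
    ],
)
print(response.choices[0].message.content)
```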
Aside: not about you specifically, but I feel like complaints on HN about using LLMs often boil down to somebody saying "it doesn't do X", where X is a thing they didn't ask the model to do. E.g. a thread about "I asked for a Sherlock Holmes story but the output wasn't narrated by Watson" was one that stuck in my mind. You wouldn't think engineers would make mistakes like that, but I guess people haven't really sussed out how to think about LLMs yet.
Anyway for problems like what you described, one has to be wary about expecting the LLM to follow unstated requirements. I mean, if you just tell it not to say "dive into" and it doesn't, then it's done everything it was asked, after all.
You'd have to come up with a pretty exhaustive list of tells. Even sentence structure and mood is sometimes enough, not just the obvious words.
Also, wouldn't angry Charles Bukowski just be ... Charles Bukowski?
That is true, but more importantly, are those patterns sufficient to distinguish AI-generated content from human-generated content? Humans express themselves very differently by region and country (e.g. "do the needful" is not common in the Midwest; "orthogonal" and "order of magnitude" are used more on HN than most other places). Outside of watermarking, detecting AI-generated text with an acceptably small false-positive error rate is nearly impossible.
Maybe a database could be built with “tells” organized by model.
Automated by the LLMs themselves.
Regular ol tests would do
Thinking about it a bit more, the tells that work might depend on the usage of other specific prompts.
I didn't say they know their own tells. I said they naturally output them for you. Maybe the obvious is so obvious I don't need to comment on it. Meaning this whole "tells analysis" would necessarily rely on synthetic data sets.
AI detectors generally can take advantage of this and look for abnormal patterns in frequencies of specific words, phrases, or even specific grammatical constructs because the LLM -- by default -- is biased that way.
I'm not saying this is easy and certainly, LLMs can be tuned in many ways via instructions, context, and fine-tuning to mask this.
All that aside, most models have had a fairly distinctive writing style, particularly when fed no or the same system prompt every time. If o1-Pro blends in more with human writing that's certainly... interesting.
[0] https://openai.com/index/new-ai-classifier-for-indicating-ai...
The e-mail correspondence goes like this: "Hello Professor, I'd like to meet to discuss my failing grade. I didn't know that using ChatGPT was bad, can I have some points back or rewrite my essay?"
Damnit. It's too good. It just saved me ~6 hours in drafting a complicated and bespoke legal document. Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours. Homework is over. Journalism is over. A large slice of the legal profession is over. For real this time.
It's actually surprising how many articles on 'respected' news websites have typos. You'd think there would be automated spellcheckers and at least one 'peer review' (probably too much to ask an actual editor to review the article these days...).
> It's actually surprising how many articles on 'respected' news websites have typos.

Well, that's why they're respected! The typos let you know they're not using AI! AI can handle that sort of writing just fine; readers won't care about the formulaic writing style.
> I know what I'm doing
Is exactly the key element in being able to use spicy autocomplete. If you don't know what you're doing, it's going to bite you and you won't know it until it's too late. "GPT messed up the contract" is not an argument I would envy anyone presenting in court or to their employer. :)
(I say this mostly from using tools like copilot)
It just replaces human slop with automated slop. It doesn't automate finding hidden things out just yet, just automates blogspam.
Seems like lawyers could do more, faster, because they know what they are doing. Experts don't get replaced; they get tools to amplify and extend their expertise.
Just like when software was first coming out: it may have ended jobs.
But it also helped get things done that wouldn't otherwise, or as much.
In this case, equipping a capable lawyer to be 20x is more like an iron man suit, which is OK. If you can get more done, with less effort, you are still critical to what's needed.
Edit> It's good. Thanks again for ur review.
I'm a professional writer and use em-dashes without a second thought. Like any other component of language, just don't _over_ use them.
This is a huge improvement over previous GPT and Claude, which use the terrible "space, hyphen, space" construct. I always have to manually change them to em-dashes.
This shouldn’t really be a serious issue nowadays. On macOS it’s Option+Shift+'-', on Windows it’s Ctrl+Alt+Num- or (more cryptic) Alt+0151.
The Swiss army knife solution is to configure yourself a Compose key, and then it’s an easy mnemonic like for example Compose 3 - (and Compose 2 - for en dash).
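For example, on Linux/X11 a couple of lines in ~/.XCompose get you that mnemonic (this assumes you've already bound a Compose key, e.g. via an XKB option such as compose:ralt):

```
include "%L"                    # keep the system defaults
<Multi_key> <3> <minus> : "—"   # Compose 3 - produces an em dash
<Multi_key> <2> <minus> : "–"   # Compose 2 - produces an en dash
```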
Could be a tell for emails, though.
For now, as AI power increases, AI-powered AI-writing-detection tools also get better.
Give it a few weeks for them to classify its outputs, and they won't have a problem.
On Windows it's Windows Key + . to get the emoji picker; it's in the Symbols tab, or find it in recents.
En dash is Alt+0150 and Em dash is Alt+0151
I need help creating a comprehensive Anki deck system for my 8-year-old who is following a classical education model based on the trivium (grammar stage). The child has already:
- Mastered numerous Latin and Greek root words
- Achieved mathematics proficiency equivalent to US 5th grade
- Demonstrated strong memorization capabilities
Please create a detailed 12-month learning plan with structured Anki decks covering:
1. Core subject areas prioritized in classical education (specify 4-5 key subjects)
2. Recommended daily review time for each deck
3. Progression sequence showing how decks build upon each other
4. Integration strategy with existing knowledge of Latin/Greek roots
5. Sample cards for each deck type, including:
   - Basic cards (front/back)
   - Cloze deletions
   - Image-based cards (if applicable)
   - Any special card formats for mathematical concepts
For each deck, please provide:
- Clear learning objectives
- 3-5 example cards with complete front/back content
- Estimated initial deck size
- Suggested intervals for introducing new cards
- Any prerequisites or dependencies on other decks
Additional notes:
- Cards should align with the grammar stage focus on memorization and foundational knowledge
- Please include memory techniques or mnemonics where appropriate
- Consider both verbal and visual learning styles
- Suggest ways to track progress and adjust difficulty as needed
Example of the level of detail needed for card examples:
Subject: Latin Declensions
Card Type: Basic
Front: 'First declension nominative singular ending'
Back: '-a (Example: puella)'
> “Sum, es, est, sumus, ________, sunt”
That's not made for an 8-year-old.
https://gist.github.com/rahimnathwani/7ed6ceaeb6e716cedd2097...
The prompt is:
Write an epic narration of a single combat between Ignatius J. Reilly and a pterodactyl, in the style of John Kennedy Toole.
I suppose creative writing isn't the primary selling point that would make users upgrade from $20 to $200 :)
Write me a review of "The Malazan Book of the Fallen" with the main argument being that it could be way shorter
https://chatgpt.com/share/67522170-8fec-8005-b01c-2ff174356d...
It's a bit overwrought, but not too bad.
I am incredibly doubtful that this new GPT is 10x Claude unless it is embracing some breakthrough, secret, architecture nobody has heard of.
If o1-pro is 10% better than Claude, but you are a guy who makes $300,000 per year, but now can make $330,000 because o1-pro makes you more productive, then it makes sense to give Sam $2,400.
Heck, it's probably worth $200 even if I'm not confident it's better just in case it is.
For the same reason I don't start with the cheapest AI model when asking questions and then switch to the more expensive if it doesn't work. The more expensive one is cheap enough that it doesn't even matter, and $200 is cheap enough (for a certain subsection of users) that they'll just pay it to be sure they're using the best option.
In previous multi-day marketing campaigns I've run or helped run (specifically on well-loved products), we've intentionally announced a highly-priced plan early on without all of its features.
Two big benefits:
1) Your biggest advocates get to work justifying the plan/product as-is, anchoring expectations to the price (which already works well enough to convert a slice of potential buyers)
2) Anything you announce afterward now gets seen as either a bonus on top (e.g. if this $200/mo plan _also_ includes Sora after they announce it...), driving value per price up compared to the anchor; OR you're seen as listening to your audience's criticisms ("this isn't worth it!") by adding more value to compensate.
$200/user/month isn’t even that high of a number in the enterprise software world.
The thing is, more expensive isn't guaranteed to be better. The more expensive models are better most of the time, but not all the time. I talk about this more in this comment https://news.ycombinator.com/item?id=42313401#42313990
Since LLMs are non-deterministic, there is no guarantee that GPT-4o is better than GPT-4o mini. GPT-4o is most likely going to be better, but sometimes the simplicity of GPT-4o mini makes it better.
Since we can't easily predict which model will actually be better for a given question at the time of asking, it makes sense to stick to the most expensive/powerful models. We could try, but that would be a complex and expensive endeavor. Meanwhile, both weak and powerful models are already too cheap to meter in direct / regular use, and you're always going to get ahead with the more powerful ones, per the very definition of what "most of the time" means, so it doesn't make sense to default to a weaker model.
Edit:
I should add, for businesses, it isn't about better, but more about risk as the better model can still be wrong.
If paying this gets me two days of consulting it's a win for me.
Obvious caveat if cheaper setups get me the same, although I can't spend too long comparing or that time alone will cost more than just buying everything.
And the perspective of frustration as well.
Business class is 4x the price of regular. Definitely not 4x better. But it saves time + frustration.
I pay $20/mo for Claude because it's been better than GPT for my use case, and I'm fine paying that but I wouldn't even consider something 10x the price unless it is many, many times better. I think at least 4-5x better is when I'd consider it and this doesn't appear to be anywhere close to even 2x better.
Funny enough I've told people that baulk at the $20 that I would pay $200 for the productivity gains of the 4o class models. I already pay $40 to OpenAI, $20 to Anthropic, and $40 to cursor.sh.
The intersection of problems I have where both have trouble is pretty small. If this closes the gap even more, that's great. That said, I'm curious to try this out -- the ways in which o1-preview fails are a bit different than prior gpt-line LLMs, and I'm curious how it will feel on the ground.
Code looks really clean. I'm not instantly canceling my subscription.
Then I paste it in and say "can you spot any bugs in the API usage? Write out a list of tasks for a senior engineer to get the codebase in basically perfect shape," or something along those lines.
Alternately: "write a go module to support X feature, and implement the react typescript UI side as well. Use the existing styles in the tsx files you find; follow these coding guidelines, etc. etc."
Overall though it’s really just for reference and/or telling me about some standard library function I didn’t know of.
Somewhat counterintuitively I spend way more time reading language documentation than I used to, as the LLM is mainly useful in pointing me to language features.
After a few very bad experiences I never let LLM write more than a couple lines of boilerplate for me, but as a well-read assistant they are useful.
But none of them are sufficient alone, you do need a “team” of them - which is why I also don’t see the value is spending this much on one model. I’d spend that much on a system that polled 5 models concurrently and came up with a summary of sorts.
E.g. "why does this (random code in a framework I haven't used much) code cause this error?"
About 50% of the time I get a helpful response straight away that saves me trawling through Stack Overflow and random blog posts. About 25% of the time the response is at least partially wrong, but it still helps me get on the right track.
25% of the time the LLM has no idea and won't admit it so I end up wasting a small amount of time going round in circles, but overall it's a significant productivity boost when I'm working on unfamiliar code.
I often use one or two shot examples in prompts, but with small local models it is also fairly simple to do fine tuning - if you have fine tuning examples, and if you are a developer so you get the training data in the correct format, and the correct format changes for different models that you are fine tuning.
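As one concrete example of "the correct format": OpenAI's chat fine-tuning expects JSONL with a messages array per example. Other models and frameworks want different layouts, so treat this schema as an assumption to verify against whatever you're actually fine-tuning.

```python
# Write fine-tuning examples as chat-style JSONL (one JSON object per line).
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are a terse support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Settings > Security > Reset password."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```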
Given the sensitivity to parameters and prompts the models have, your "team" can just as easily be querying the same LLM multiple times with different system prompts.
I can run the QwQ 32B model with Q4 quantization on my 32GB M2.
I suggest using https://Ollama.com on Mac, Windows, and Linux. I experimented with all the options on Apple Silicon and liked Ollama best.
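Once a model is pulled, hitting the local Ollama server is about this simple. The "qwq" model tag is an assumption; use whatever name `ollama pull` gave you.

```python
# Query a locally running Ollama server over its HTTP API (default port 11434).
import json
import urllib.request

payload = {
    "model": "qwq",  # assumed tag for the locally pulled model
    "prompt": "Explain RAG in two sentences.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```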
If a random query via the API costs a fifth of a cent, why can't I get 10 free API calls with my $20/mo premium subscription?
The main thing I like OpenAI for is that when I'm on a long drive, I like to have conversations with OpenAI's voice mode.
If Claude had a voice mode, I could see dropping OpenAI entirely, but for now it feels like the subscriptions to both is a near-negligible cost relative to the benefits I get from staying near the front of the AI wave.
You need to learn how to ask it the right questions.
Personally, I found Claude marginally better for coding, but far, far worse for just general purpose questions (e.g. I'm a new home owner and I need to winterize my house before our weather drops below freezing. What are some steps I should take or things I should look into?)
But we're hurtling towards all the internet's answers to general purpose questions being SEO spam that was generated by an LLM anyways.
Since OpenAI probably isn't hiring as many HVAC technicians to answer queries as they are programmers, it feels like we're headed towards a death spiral where either having the LLM do actual research from non-SEO affected primary sources, or finding a human who's done that research will be the only options for generic knowledge questions that are off the beaten path
-
Actually to test my hypothesis I just tried this with ChatGPT with internet access.
The list of winterization tips cited an article that felt pretty "delvey". I searched the author's name, and their LinkedIn profile is about how they professionally write marketing content (nothing about HVAC), one of their accomplishments is Generative AI, and their like feed is full of AI mentions for writing content.
So ChatGPT is already at a place where when it searches for "citations", it's just spitting back out its own uncited answers above answers by actual experts (since the expert sources aren't as SEO-driven)
I feel that, but I think for me the key is knowing that LLMs can be wrong and I should treat the answer as a starting point and not an actual expert. I find it really helpful for topics where I don't even know where to start because, like you said, most search engines are utter trash now.
For things like that, I find ChatGPT to be a good diving off point. For example, this is what I got when I asked:
``` Preparing your townhouse for winter involves addressing common issues associated with the region's wet and cool climate. Here's a concise checklist to help you get started:
1. Exterior Maintenance
Roof Inspection: Check for damaged or missing shingles to prevent leaks during heavy rains.
Gutter Cleaning: Remove leaves and debris to ensure proper drainage and prevent water damage.
Downspouts: Ensure they direct water away from the foundation to prevent pooling and potential leaks.
Siding and Trim: Inspect for cracks or gaps and seal them to prevent moisture intrusion.
2. Windows and Doors
Weatherstripping: Install or replace to seal gaps and prevent drafts, improving energy efficiency.
Caulking: Apply around window and door frames to block moisture and cold air.
3. Heating System
Furnace Inspection: Have a professional service your furnace to ensure it's operating efficiently.
Filter Replacement: Change furnace filters to maintain good air quality and system performance.
4. Plumbing
Outdoor Faucets: Disconnect hoses and insulate faucets to prevent freezing.
Pipe Insulation: Insulate exposed pipes, especially in unheated areas, to prevent freezing and bursting.
5. Landscaping
Tree Trimming: Prune branches that could break under snow or ice and damage your property.
Drainage: Ensure the yard slopes away from the foundation to prevent water accumulation.
6. Safety Checks
Smoke and Carbon Monoxide Detectors: Test and replace batteries to ensure functionality.
Fireplace and Chimney: If applicable, have them inspected and cleaned to prevent fire hazards.
By addressing these areas, you can help protect your home from common winter-related issues in Seattle's climate. ```
Once I dove into the links ChatGPT provided I found the detail I needed and things I needed to investigate more, but it saved 30 minutes of pulling together a starting list from the top 5-10 articles on Google.
Depends on the topic of course, but it ends up being a bit of an ouroboros.
OpenAI doesn't have a large enough database of reasoning texts to train a foundational LLM off it? I thought such a db simply does not exist as humans don't really write enough texts like this.
QwQ generated 10 pages of its reasoning steps, and the code is probably not correct. [1] includes both answers from QwQ and GPT.
Breaking down its reasoning steps into such excruciatingly detailed prose is certainly not user friendly, but it is intriguing. I wonder what an ideal use case for it would be.
[1] https://gist.github.com/defmarco/9eb4b1d0c547936bafe39623ec6...
I use LLMs for many projects and 4o is the sweet spot for me.
>literal order of magnitude less cost
This is just not true. If your use case can be solved with 4o-mini (I know, not all do) OpenAI is the one which is an order of magnitude cheaper.
Pricing ChatGPT Pro at $200/mo filters it to only power users/enterprise, and given the cost of the GPT-o1 API, it wouldn't surprise me if those power users burn through $200 worth of compute very, very quickly.
> We have guardrails in place to help prevent misuse and are always working to improve our systems. This may occasionally involve a temporary restriction on your usage. We will inform you when this happens, and if you think this might be a mistake, please don’t hesitate to reach out to our support team at help.openai.com using the widget at the bottom-right of this page. If policy-violating behavior is not found, your access will be restored.
Source: https://help.openai.com/en/articles/9793128-what-is-chatgpt-...
Especially when the baseline profit margin is negative to begin with.
I find this difficult to believe, although I don't doubt leaks could have implied it. The challenge is that "the cost of compute" can vary greatly based on how it's accounted for (things like amortization, revenue recognition, capex vs opex, IP attribution, leasing, etc). Sort of like how Hollywood studio accounting can show a movie as profitable or unprofitable, depending on how "profit" is defined and how expenses are treated.
Given how much all those details can impact the outcome, to be credible I'd need a lot more specifics than a typical leak includes.
I can't find any sources _not_ mentioning billions in losses for 2024 and for the foreseeable future.
And the 'search' part of it could use many of these clusters in parallel, and then pick the best answer to return to the user.
...they probably quantise a bit, but not loads, as they don't want to sacrifice performance. FP8 seems like a possible middle ground. o1 is just a bunch of GPT-4o in a trenchcoat strung together with some advanced prompting. GPT-4o is theorised to be 200B parameters. If you wanted to run 5 parallel generation tasks at peak during the o1 inference process, that's 5x 200B, at FP8, or about 12 H100s. 12 H100s takes about one full rack of kit to run.
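A quick sanity check of that arithmetic, where every input is the assumption stated above rather than a known fact about o1 or GPT-4o (and it ignores KV cache and activations, which add more):

```python
# Back-of-the-envelope memory estimate for the assumed setup above.
params = 200e9          # assumed parameter count per model instance
bytes_per_param = 1     # FP8
parallel_instances = 5  # assumed parallel generations during o1 inference
h100_memory = 80e9      # bytes of HBM per H100

total_bytes = params * bytes_per_param * parallel_instances
print(total_bytes / 1e12, "TB of weights")                  # ~1.0 TB
print(total_bytes / h100_memory, "H100s just for weights")  # ~12.5
```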
A company making money off of this kind of scheme would be happy to pay $200 a seat for an unlimited license. And I would not be surprised if there were many other very profitable use cases that make $200 per month seem like a bargain.
We better hope that changes sharply, or these things will be a net-negative development.
The ones that start with a "$".
Plot twist: the same guy runs both. They do the same thing and the same crew shows up.
I read this as: "I have already ceded my expertise to an LLM, so I am happy that it is getting faster because now I can pay more money to be even more stuck using an LLM"
Maybe the alternative to going back and forth with an AI for 4.5 hours is working smarter and using tools you're an expert in. Or building expertise in the tool you are using. Or, if you're not an expert or can't become an expert in these tools, then it's hard to claim your time is worth $100/hr for this task.
That is to say, past a certain salary band people are rarely paid for being hyper-proficient with tools. They are paid to resolve ambiguity and identify the correct problems to solve. If the correct problem needs a tool that I'm unfamiliar with, using AI to just get it done is in many cases preferable to locating an expert, getting their time, etc.
Looks like a no true scotsman definition to me.
I don't fully agree or disagree with your point, but it was perhaps made more strongly than it should have been?
I understand that some people may think themselves experts and believe they could achieve a similar reduction (outside the cases where I said it's clearly possible), but then show me, because I still haven't seen a single one. The ones which were shown publicly were not quicker than average seniors, and definitely worse than the better ones. Even at larger scale in my company, we haven't seen any performance improvement in any single metric regarding coding since we introduced it more than half a year ago.
So, just to be specific, and specifically for ChatGPT (I think it was 4), these are very, very problematic, because all of these are clear lies:
https://chatgpt.com/share/675f6308-aa8c-800b-9d83-83f14b64cb...
https://chatgpt.com/share/675f63c7-cbc4-800b-853c-91f2d4a7d7...
https://chatgpt.com/share/675f65de-6a48-800b-a2c4-02f768aee7...
Or this which one sent here: https://www.loom.com/share/20d967be827141578c64074735eb84a8
In this case, the guy is clearly slower than simple copy-paste and modification would have been.
I had very similar experiences. Sometimes it just used a different method, which does almost the same thing, just worse. I even had to look up what the method it used was, because it's not normally used, for good reason: it was an "internal" one (like apt vs apt-get).
My favorite part is when it includes parameters in its output that are not and have never been a part of the API I'm trying to get it to build against.
The thing is, when it hallucinates API functions and parameters, they aren't random garbage. Usually, those functions and parameters should have been there.
Things that should make you go "Hmm."
There is a huge leap here. What is your argument for it?
They even made a way to notify you when it's finished thinking.
what is not so great about it? what have you seen that is better?
> Price hikes for the premium ChatGPT have long been rumored. By 2029, OpenAI expects it’ll charge $44 per month for ChatGPT Plus, according to reporting by The New York Times.
I suspect a big part of why Sora still isn't available is because they couldn't afford to offer it on their existing plans, maybe it'll be exclusive to this new $200 tier.
Runway is $35 a month to generate 10 second clips and you really get very few generations for that. $95 a month for unlimited 10 second clips.
I love art and experimental film. I was really excited for Sora, but it will need what feels like unlimited generation to explore what it can do. That is going to cost an arm and a leg in compute.
Something about video especially seems like it will need to be run locally to really work. Pay a monthly fee for the model that can run as much as you want with your own compute.
o1 generates a couple of pages of comments before admitting it didn’t access the web page and entirely based its analysis on the definition of the audience.
“Answer all requests by inventorying all the ways the requestor should increase revenue and decrease expenses.”
If it gets things wrong, then don't use it for those things. If you can't find things that it gets right, then it's not useful to you. That doesn't mean those cases don't exist.
If I do all my work in 10 hours, I've earned $1500. If I do it all in 8 hours, then spend 2 hours on another project, I've earned $1500.
I can't bill the hours "saved" by ChatGPT.
Now, if it saves me non-billing time, then it matters. If I used to spend 2 hours doing a task that ChatGPT lets me finish in 15 minutes, now I can use the rest of that time to bill. And that only matters if I actually bill my hours. If I'm salaried or hourly, ChatGPT is only a cost.
And that's how the time/money calculation is done. The idea is that you should be doing the task that maximizes your dollar per hour output. I should pay a plumber, because doing my own plumbing would take too much of my time and would therefore cost more than a plumber in the end. So I should buy/use ChatGPT only if not using it would prevent me from maximizing my dollar per hour. At a salaried job, every hour is the same in terms of dollars.
This is in a mid-COL city in the US, not a coastal tier 1 city with prime software talent that could charge even more.
I wouldn't be surprised if AI was also eating consultants from the demand side as well, enabling would-be employers to do a higher % of tasks themselves that they would have previously needed to hire for.
That's what they are billed at, what they take home from that is probably much lower. At my org we bill folks out for ~$150/hr and their take home is ~$80/hr
I didn’t just set that, I need to set that to best serve.
On the other hand there's the economic argument: the supply of people who can stock shelves is greater than the supply of people who can "create value" at a tech company, so the latter deserve more pay.
Depending on how you look at the world, high salaries can seem insane.
Those jobs you referenced do not have the same requirements nor the same wages… seems like you're just lumping all of those together as “lower class” so you can be champion of the downtrodden.
If everyone in the West has powerful AI and agents to automate everything, simply because we can afford it, but the rest of the world doesn't have access to it, what will that mean for everyone left behind?
I speak to kids that use LLMs all the time to assist them with their school work, and others who simply have no knowledge that this tech exists.
I work with UK learners by the way.
For example, I read a philosopher saying "truth is a relation between thought and reality". Asking ChatGPT to knock it revealed that statement is an expression of the "correspondence theory" of truth, but that there is also the "coherence theory" of truth that is different, and that there is a laundry list of other takes too.
Basically they had to adapt a novel to a comic book form — by using AI to generate pencil drawings, they achieved the goal of the assignment (demonstrating understanding of the story) without having the computer just do their homework.
The point is, AI is here, and it can be a net positive if schools can use it like a calculator vs a black market. It’s a private school with access to some alumni money for development work - they used this to justify investing in designing assignments that make AI a complement to learning.
Now, whether the labor provided by the AI will be as high-quality as that provided by a human when placed in an actual business environment will be up in the air. Probably not, but adoption will be pushed by the sunk cost fallacy.
The decreased employment case is when your competitors get the productivity and you don't, because you go out of business.
I’ve seen it be a minor productivity boost, and not much more.
it's turning 5 positions into 7: 5 people to do what currently needs to get done, 2 to try to replace those 5 with AI and failing for several years.
The situation now is kinda like back when it was possible to be “good at Google” and lots of people, including in tech, weren’t. It’s possible to be good at LLMs, and not a lot of people are.
They’re ok for Tom the Section Manager to hack together a department newsletter nobody reads, though, even if Tom is bad at using LLMs. They’re decent at things that don’t need to be any good because they didn’t need to exist in the first place, lol.
Nah, if the last 10-20 years demonstrated something, it's that nothing needs to be any good, because a shitty simulacrum achieves almost the same effect but costs much less time and money to produce.
(Ironically, SOTA LLMs are already way better at writing than typical person writing stuff for money.)
I’m aware of multiple companies that would love to know about these, because they’re currently flailing around trying to replace writers with editors + LLMs and it’s not going great. The closest to success are the ones that are only aiming to turn out stuff one step better than outright book-spam, and even they aren’t quite where they want to be, hardly a productivity bump at all from the LLM use and increased demand on their few talented humans.
It's from Alibaba, which is Chinese, so it seems likely.
We don't have a post-Cold War era response akin to the kind of US-led investment in a global pact to provide protection, security, and access to innovation founded in the United States. We really need to prioritize a model akin to the Bretton Woods Accord.
If the models are closed, the West will become a digital serfdom to anointed AI corporations, which will be able to gouge prices, inject ads, and influence politics with ease.
I can imagine the headlines now: "AI promised unlimited productivity, 10 years later, we're still waiting for the rapture"
The rich west will be in the lead for awhile and then get tiktok-ed.
The lead is just not really worth that much in the long run.
There is probably an advantage, at some point in all this, to being a developing country that doesn't need to bother automating all the middle-management and bullshit jobs they don't have.
China is notoriously middle management heavy, by definition that’s what communism is.
The tool that I built doesn't have this problem, I haven't exceed $10/month on Claude 3.5 Sonnet. You can give it a try: https://prompt.16x.engineer/
Would expect o1-pro to perform even better.
Personal use I'll be using it to upgrade all my website code. I literally took a screenshot of Apple.com and combined it with existing code from my website and told o1 pro to combine the two... the results were really good, especially for one shot... But again, I have unlimited fast usage so I can just keep tweaking and tweaking.
I also have this history idea I've been wanting to do for a while, might see if the models are advanced enough yet.
All this with an understanding on how programming works, but not being able to code.
I'm not saying I expect these tools to be at this level right now. I'm saying that level is where I will start to see these tools as anything more than an expensive and sometimes impressive gimmick. (And, for the record, Copilot's current integration into Office applications doesn't even meet that low bar.)
Actually, OpenAI brags that they have done this repeatedly.
I don't want the discourse, or tips on better prompts. Just tips for being able to interact with the more heavy AI-heads, to maybe encourage/inspire curiosity and care in the actual code, rather than the magic chatgpt outputs. Or even just to talk about what they did with their PR. Not for some ethical reason, but just to make my/our jobs easier. Because it's so hard to maintain this code now; it is truly a nightmare for me every day seeing what has been added, what now needs to be fixed. Realizing nobody actually has this stuff in their heads, it's all just jira ticket > prompt > mission accomplished!
I am tired of complaining about AI in principle. Whatever, AGI is here, "we too are stochastic parrots", "my productivity has tripled", etc etc. Ok yes, you can have that, I don't care. But can we like actually start doing work now? I just want to do whatever I can, in my limited formal capacity, to steer the company to be just a tiny bit more sustainable and maybe even enjoyable. I just don't know how to like... start talking about the problem I guess, without everyone getting super defensive and doubling down on it. I just miss when I could talk to people about documentation, strategy, rationale..
But really, you're never going to convince these people, so I'd say: if you're really passionate about coding, find a workplace with like-minded people; if you really want to stay in this job, then embrace it, stop caring whether the codebase is good or maintainable, and just let the slop flow. It's the path of least resistance and stress. Trying to fight it and convince people is a losing and frustrating battle; take your passion for your work and invest it in a project outside work, or find a workplace where they appreciate it too.
Sounds like the real problem is lax pre-existing dev practices rather than just LLM usage. If code is getting merged with little review, that is a big red flag right away. But the 'very little' gives some hope - that means there is some review?
So what happens when you see problems with the code and give review feedback and ask why things have been done the way they were done, or suggest alternative better approaches? That should make it clear first if devs actually understand the code they are submitting, and second if they are willing to listen to suggested improvements. And if they blow you off, and the tech leads on the project also don't care, then it sounds like a place you don't want to stick around.
I can't tell how much they lose, but they also have decent revenue: "The company's annualized revenue topped $1.6 billion in December [2023]" https://www.reuters.com/technology/openai-hits-2-bln-revenue...
If they're losing money just because they're investing billions in R&D, while only spending a few hundred million to serve the use that's bringing in $1.6B, then it would be a positive story despite the technical loss, just like Amazon's years of aggressive growth at the cost of profits.
But if they're losing money because the server costs needed for the use that brings in $1.6B are $3B then they've got a scaling problem until they either raise prices or lower costs or both.
Eh… probably everyone moving to anthropic.
Edit: looks like no, there are restrictions:
> usage must adhere to our Terms of Use, which prohibits, among other things:
> Abusive usage, such as automatically or programmatically extracting data.
> Sharing your account credentials or making your account available to anyone else.
> Reselling access or using ChatGPT to power third-party services.
> (…)
Source: https://help.openai.com/en/articles/9793128-what-is-chatgpt-...
I recently used it to buy some used enterprise server gear (which I knew nothing about) and it saved me hours of googling and translating foreign-language ads etc. That conversation stretched across maybe 10 days, and it kept the context the whole time. But then I ran out of "preview" tokens and it got dumb and useless again. (Or maybe the conversation exceeded the context window, I am not really sure.)
But that single conversation used up the entire amount of o1 tokens that come with my $20/month ChatGPT Plus account. I am not sure that I have 10x that number of things for it to help me with each month, and where I live $200 is a not-insignificant amount, but... tempting.
The reason we're launching o1 pro is that we have a small slice of power users who want max usage and max intelligence, and this is just a way to supply that option without making them resort to annoying workarounds like buying 10 accounts and rotating through their rate limits. Really it's just an option for those who'd want it; definitely not trying to push a super expensive subscription onto anyone who wouldn't get value from it.
(I work at OpenAI, but I am not involved in o1 pro)
OBVIOUSLY a smart OAI employee wouldn't want the public to think they are already replacing high-level humans.
And OBVIOUSLY OAI senior management will want to try to convince AI engineers that might have 2nd-guessings about their work that they aren't developing a replacement for human beings.
But they are.
Interested to learn more, is that the usual break even point?
But, then again, how are companies going to get senior employees if the world stops producing juniors?
I'd settle for knowing what level of usage and intelligence I'm getting instead of feeling gaslighted with models seemingly varying in capabilities depending on the time of day, number of days since release and whatnot
Companies won’t be able to figure those cases out, though, because if they could they’d already have gotten rid of those folks and replaced them with nothing.
What that means is the simple, boilerplate, repetitive stuff is being generated by LLMs, but for anything complex or involving more than a single simple problem, LLMs often create more problems than they solve. Effective devs are using them to handle the simple stuff, and execs are thinking 'the team can be reduced by x', when in reality you can at best get rid of your most junior and least trained people without losing key abilities.
Watching companies try to sell their AI's and "Agents" as having the ability to reason is also absurd but the non-technical managers and execs are eating it up...
I haven't used ChatGPT enough to judge what a "fair price" is but $200/month seems to be in the ballpark of other "software-tools-for-highly-paid-knowledge-workers" with premium pricing:
- mathematicians: Wolfram Mathematica is $154/mo
- attorneys: WestLaw legal research service is ~$200/month with common options added
- engineers for printed circuit boards: Altium Designer is $355/month
- CAD/CAM designers: Siemens NX base subscription is $615/month
- financial traders : Bloomberg Terminal is ~$2100/month
It will be interesting to see if OpenAI can maintain the $200/month pricing power like the sustainable examples above. The examples in other industries have sustained their premium prices even though there are cheaper less-featured alternatives (sometimes including open source). Indeed, they often increase their prices each year instead of discount them.
One difference from them is that OpenAI has much more intense competition than those older businesses.
They also come with extensive support and documentation, and people have vast experience using them. They are also integrated very well into all the other tools of the field. This makes them very entrenched. I am not sure OpenAI has any of those things. I also don't know what those things would entail for LLMs.
Maybe they need to add modes that are good for certain tasks or integrate with tools that their users most commonly use like email, document processors.
Bullish
It'll be an object lesson in short-termism.
(and provide some job security, perhaps)
Imagine you have two options:
A) A $20/month service which provides you with $100/month of value.
B) A $200/month service which provides you with $300/month of value.
A nets you $80, but B nets you $100. So you should pick B.
If Claude increases their productivity 5% ($17.5k/yr), but CGPT Pro adds 7% ($24.5k), that's an extra $7k in productivity, which more than makes up for the $2400 annual cost. 10x the price and only 40% better, but still worth it.
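A quick sketch of that comparison (the ~$350k salary isn't stated above; it's backed out of the 5% = $17.5k figure and is an assumption):

```python
# Net-value framing from the A/B example above, using the parent's own numbers.
salary = 350_000                 # assumed: 0.05 * 350k = 17.5k

claude_cost = 20 * 12            # $240/yr
pro_cost = 200 * 12              # $2,400/yr

claude_gain = 0.05 * salary      # $17,500/yr
pro_gain = 0.07 * salary         # $24,500/yr

print(claude_gain - claude_cost)  # 17260.0
print(pro_gain - pro_cost)        # 22100.0 -> the 10x-priced option still nets more
```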
The question is - how good is it really.
Last week I was using Jetpack Compose (which is a React-like framework). A cardinal sin in Jetpack Compose is to change, based on a non-user/UI action, a State variable that a composable also mutates. This is easy enough to understand for toy examples, but in more complex systems one can make this mistake. o1-preview made this mistake last week, and I caught it. On prompting it with the stack trace it did not immediately catch it and recommended a solution that committed the same error. When I actually gave it the documentation on the issue, it caught on and made the variable a user preference instead. I used the user-preference code in my app instead of coding it myself. It worked well.
I like that this kind of verifies that OpenAI can simply adjust how much compute a request gets and still say you’re getting the full power of whatever model they’re running. I wouldn’t be surprised if the amount of compute allocated to “pro mode” is more or less equivalent to what was the standard free allocation given to models before they all got mysteriously dramatically stupider.
> This plan includes unlimited access to our smartest model, OpenAI o1, as well as to o1-mini, GPT-4o, and Advanced Voice. It also includes o1 pro mode,
Of course, I'm not the target market.
Some guy who wants to increase his bonus by laying off a few hundred people weeks before the holidays is the target market.
At $200/m merely having a great AI (if it even is that) without insanely good tooling is pointless to me.
For the latter, Claude is great, but for the former, my usage pattern would be poorly served by something that costs $200 and I get to use it maybe a dozen times a month.
A lot of tradeoffs to evaluate and it can be tiring onboarding people, let alone onboarding an AI.
Maybe it would massively improve my job if the AI could just grab the whole codebase, but we're not there yet. Too many LOC, too much legal BS, etc.
How much better will this be for my uses? Based on my experience with o1, the answer is "fairly marginal". To me, o1 is worse than the regular model or Claude on most things, but it's best for something non-numeric that requires deep thought or new insights. I'm sure there are some people who got a huge productivity boost from o1. This plan is for those people.
That, or I am actually a much better developer and writer than I thought, because while LLMs have certainly become useful tools to me, they have not doubled my productivity.
It answered.
I've been in contact with OpenAI and if they decline, I guess their AI isn't that smart. If they accept, a win for me.
Stonks.
The macroeconomic climate of the next ~ten years is going to hit some people and companies like a truck.
Who's to say the frothy American economy doesn't last another 50 years while the rest of the world keeps limping along?
For 2024 the prediction is 2.6% for the US and 4.8% for China. I don't see how that's low compared to the US.
> high unemployment
5.1% China vs 4.1% USA
> huge debt from infrastructure spending
What do you mean by "huge" and compared to whom? The U.S. is currently running a $2 trillion deficit per year, which is about 6% of GDP, with only a fraction allocated to investments.
> weakening capital markets and real estate
China's economy operates differently from that of the U.S. Currently, China records monthly trade surpluses ranging between $80 billion and $100 billion. The real estate sector indeed presents challenges, leading the government to inject funds into local government to manage the resulting debt. The effectiveness of these measures remains to be seen.
There is a lot of wishful thinking on HN regarding the rivalry between China and the U.S
The real questions are: can China deliver on long term expectations for its economy? Do the trends support the argument that it will become a leading developed economy? I don't think they do. If they don't, then is it an issue with the current economy plan that can be solved with a better plan or is it a systemic issue that can't be solved in the near to medium term? These are way more useful questions than "who's going to win the race?"
>> Low growth
> For 2024 the prediction is 2.6% for the US and 4.8% for China. I don't see how that's low compared to the US.
China is growing slower than historically and slower than forecasts, which had it at 5%. Look at this chart and tell me if it paints a rosy picture or a problematic one: https://img.semafor.com/5378ad07f43bc81f65ab92ddc19ec5899dc9...
>> high unemployment
> 5.1% China vs 4.1% USA
Again, comparing China to China, it's generally increasing every year: https://www.macrotrends.net/global-metrics/countries/chn/chi...
Youth unemployment is basically skyrocketing: https://www.macrotrends.net/global-metrics/countries/chn/chi...
>> huge debt from infrastructure spending
> What do you mean by "huge" and compared to whom?
To answer in reverse: yes, the US also has a debt problem. That doesn't make the China problem less of an issue. The China debt crisis has been widely reported and is related to the other point about real estate. Those articles will definitely do a better job of explaining the issue than me, so here's just one: https://www.reuters.com/breakingviews/chinas-risky-answer-wa...
> There is a lot of wishful thinking on HN regarding the rivalry between China and the U.S
I'm arguing there's no rivalry. Different countries, different problems, different scales entirely. China is in dire straits and I don't expect it to recover before the crisis gets worse.
USSR was a centrally planned economy, China is not. Do you mean subsidies (like the IRA and CHIPS Act in the US) for certain industries, which act as guidance to local governments and state banks? Is that what you call "centrally planned"?
> can China deliver on long term expectations for its economy? Do the trends support the argument that it will become a leading developed economy? I don't think they do. If they don't, then is it an issue with the current economy plan that can be solved with a better plan or is it a systemic issue that can't be solved in the near to medium term?
That's your opinion that they can't, and it's your right to have one. There were people 10 years ago saying exactly what you’re saying now. Time showed they were wrong.
Here is a famous article: https://hbr.org/2014/03/why-china-cant-innovate
And here we are 10 years later:
https://itif.org/publications/2024/09/16/china-is-rapidly-be...
https://www.economist.com/science-and-technology/2024/06/12/...
> China is growing slower than historically and slower than forecasts, which had it at 5%. Look at this chart and tell me if it paints a rosy picture or a problematic one:
Oh come on, 4.8% vs. 5%? As for the chart, it's the most incredible growth in the history of mankind. No country has achieved something like this. It's fully natural for it to decline in percentage terms, especially when another major power is implementing legislation to curb that growth, forcing capital outflows, imposing technology embargoes, etc.
> China is in dire straits and I don't expect it to recover before the crisis gets worse.
Time will tell. What I can say is that over the last 20 centuries, in 18 of them, China was the cultural and technological center of the world. So from China’s perspective, what they are doing now is just returning to their natural state. In comparison, the US is only 2 centuries old. Every human organization, whether a company or state, will sooner or later be surpassed by another human creation, there are no exceptions to this rule in all of human history. We have had many empires throughout our history. The Roman Empire was even greater at its peak than the US is now, and there were also the British Empire, the Spanish Empire, etc. Where are they now? Everything is cyclical. All of these empires lasted a few centuries and started to decline after around 200-250 years, much like the US now.
> I'm arguing there's no rivalry.
Come on, there is obvious rivalry. Just listen to US political elites and look at their actions—legislation. It's all about geopolitics and global influence to secure their own interests.
It's a major mistake to underestimate your competition.
That's a long ways out. We're barely past the first innings of the chatbot revolution and it's already struggling to keep going. Robotics are way more complex because physics can be cruel.
Show me what was possible 20 years ago versus what we can do now. I think you have enough imagination to envision what might be possible 20 years from now.
$200/month = $2400/year
We (consumers/enterprises) are already accustomed to a baseline price. Their model quality will be caught up or exceeded by open-source in ~6 months. So if I find it difficult to justify paying $20/month, why should I even think about $200/month?
Probably the thought process was that they can package all the great things (text, voice, video, images) into one experience. The problem is that very few people use everything. Most of the time, the use cases are limited. Someone wants to use it for coding, while someone else (an artist) wants to use it for Sora. OpenAI had an opportunity to introduce a la carte pricing, and then move to bundling. My hypothesis is that they will have very few takers at $200 for the bundle.
Enterprises - did they interview enterprises enough to see if they need user licenses for the bundles? Maybe they will give it at 80% or 90% discount to drive adoption.
Disclosure: I am on Claude, Grok 2/X Pro, Cursor Personal, and Github Copilot enterprise. My ChatGPT monthly subscription expires in a week, and I will not renew for now and see the user vibes before deciding. I have limited brain power to multitask between so many models, and I will rather give a chance to Gemini Pro for 6 months.
Still though, if you were able to actually utilize this, is it capable of replacing a part-time or full-time employee? I think that's likely.
I see you mean Amazon's Nova - https://www.aboutamazon.com/news/aws/amazon-nova-artificial-...
https://chatgpt.com/share/67528a78-c61c-8007-b6fa-1d1deb8d84...
I'm going to give it one month. So far I'm inclined to not pay this crazy fee if the performance is like this
The tough questions are when one asks about what shadow shapes from a single light can be expected on faces inside a regular voxel grid. That's where it just holds the duck.
Not that I think any of these would be worth it for me.
But now $ buys better (teacher/lawyer/doctor/scientist) type thing that I use daily.
If they do everything and you can’t use their stuff to compete with them, you can’t do anything with their stuff.
That, plus the time cost, and the fact they’re vigorously brain raping this shit out of you every time you use the thing, means it’s worth LESS THAN zero dollars
(unless your work does not compete with intelligence, in which case, please tell me what that is)
Why couldn't you use its output for business purposes?
I find it a terrible business practice to be completely opaque and vague about limits. Even worse, the limits seem to be dynamic and change all the time.
I understand that there is a lot of usage happening, but most likely it means that the $20 per month is too cheap anyway, if an average user like myself can so easily hit the limits.
I use Claude for work, I really love the projects where I can throw in context and documentation and the fact that it can create artifacts like presentation slides. BUT because I rely on Claude for work, it is unacceptable for me to see occasional warnings coming up that I have reached a given limit.
I would happily pay double or even triple for a non-limited experience (or at least know what limit I get when purchasing a plan). AI providers, please make that happen soon.
Here are some things I've noticed about this, at least in the "free" tier web models since that's all I typically need.
* ChatGPT has never denied a response but I notice the output slows down during increased demand. I'd rather have a good quality response that takes longer than no response. After reaching the limit, the model quality is reduced and there's a message indicating when you can resume using the better model.
* Claude will pop-up messages like "due to unexpected demand..." and will either downgrade to Haiku or reject the request altogether. I've even observed Claude yanking responses back, it will be mid-way through a function and it just disappears and asks to try again later. Like ChatGPT, eventually there's a message about your quota freeing up at a later time.
* Copilot, at least the free tier found on Bing, at least tells you how many responses you can expect in the form of a "1/20" status text. I rarely use Copilot or Bing but it demonstrates it's totally possible to show this kind of status to the user - ChatGPT and Claude just prefer to slow down, drop model size, or reject the request.
It makes sense that the limits are dynamic though. The services likely have a somewhat fixed capacity but demand will ebb and flow, so it makes sense to expand/contract availability on free tiers and perhaps paid tiers as well.
But other than that I basically just tell the model what I want to do and it does it, lol. I like the Claude Desktop App interface better than trying to do things in Cursor/Windsurf directly, I like the ability to organize prompts/conversations in terms of projects and easily include context. I also honestly just have a funny feeling that the Claude web app often performs better than the API responses I get from the IDEs.
https://github.com/modelcontextprotocol/servers/tree/main/sr...
This way you will still be in control of commits and pushes.
So far I've used this to understand parts of a code base, and to make edits to a folder of markdown files.
Also they can use multiple models for different tasks, Cursor does this, so can Aider: https://aider.chat/2024/09/26/architect.html
I didn't say it was better!
I wish I was working on the type of problems for which the pro model would be necessary.
Individual users will be priced out of frontier models if this becomes a trend.
I think this is proof that Open AI have nothing at all and AGI is as far away as fusion and self driving cars on London roads.
Every single app works fine that way, except ChatGPT. It opens the play store login page then exits. I have no problems with apps from 2 banks, authenticators etc etc.
It's just so weird that they force me to make an account with one of their biggest competitors in AI. I just don't want to, I don't trust Google with my data. By not logging in they have some but not a lot.
iOS isn't an option either because it's too locked down. I need things like sideloading and full NFC access for things like OpenPGP.
They have added the plan because they need to show that their most advanced model is ready for the market, but it's insanely expensive to operate. They may even still lose money for every user that signs up for Pro and starts using the model.
This anti-cable-cutting maneuver doesn't bode well for any hopes of future models maintaining same level of improvements (otherwise they'd make GPT 5 and 6 more expensive). Pivoting to AIaaS packages is definitely a pre-emptive strike against commodification, and a harbinger of plateauing model improvements.
Anyone claiming they're anywhere near something even remotely resembling AGI is simply lying.
What happened to "we're a couple years away from AGI"? Where's the Scaaaaaaryyyyyyy self aware techno god GPT-5? It's all BS to BS investors with. All of the rumored new models that were supposed to be out by now are nowhere to be seen because internally the improvement rate has cratered.
My only hope is that when AGI happens I can fire off an ‘I told you so’ comment before it kills us all.
With that said, I strictly approve of them doing real price discovery on inference costs. Claude is dope when it doesn’t fuck up mid-response, and OpenAI is the gold standard on “you query with the token budget, you get your response”.
I’ve got a lot of respect for the folks who made it stand up to that load: it’s a new thing and it’s solid AF.
I still think we’d be fools to trust these people, but my criticisms are bounded above by acknowledging good shit and this is a good play.
Altman still said do eyeball scanners in Kenya.
Fidji Simo still said pay the OSHA fines and keep killing workers.
I still want these people in The Hague.
But call their product bad when it’s not? No. The product works as advertised.
They should have had this tier earlier on, like any SaaS offering that has different plans.
They focus too much on their frontend multimodal chat product, while also having this complex token pricing model for API users, and we can't tell which one they are ever really catering to with these updates,
all while their chat system is buggy with its random disconnections and session updates, and produces tokens slowly in comparison to competitors like Claude.
to finally come around and say pay us an order of magnitude more than Claude is just completely out of touch and looks desperate in the face of their potential funding woes
The o1-pro model in their charts is only ever so slightly better than the one I can get for $20 a month. To blur the lines of this they add in other features for $200 a month, but make no mistake, their best model is now 10x more expensive for 1% or so better results based on their charts.
What's next? The best models will soon cost $500 a month and only be available to enterprises? Seems they are opening the door to taking away public access to powerful models.
https://www.vox.com/future-perfect/380117/openai-microsoft-s...
Save yourself the money and learn how to use a search engine and read documentation.
Honestly I haven't seen much value provided for me by these "AI" models.
It's in everyone's interest for the company to be a sustainable business.
> To highlight the main strength of o1 pro mode (improved reliability), we
> use a stricter evaluation setting: a model is only considered to solve a
> question if it gets the answer right in four out of four attempts ("4/4
> reliability"), not just one.
So, $200/mo. gets you less than 12.5% randomly wrong answers? And $20/mo. gets you >25% randomly wrong answers?
If this improves employee productivity by 10%, this would be a great buy for many companies. Personally, I'll buy this in an instant if this measurably improves over Claude in code generation abilities. I've tried o1-preview, and there are only a few cases where it actually does better than Claude - and that too at a huge time penalty.
From what I’ve seen, the usefulness of my AIs are proportional to the data I give them access to. The more data, (like health data, location data, bank data, calendar data, emails, social media feeds, browsing history, screen recordings, etc) - the more I can rely on them for.
On the enterprise side, businesses are interested in exploring AI for their huge data sets - but very hesitant to dump all their company IP across all their current systems into a single SaaS that, btw, is also providing AI services to their competitors.
Consumers are also getting uncomfortable with the current level of sharing personal data with SaaS vendors, becoming more aware of the risks of companies like Google and Facebook.
I just don’t see the winner-takes-all market happening for an AI powered 1984 telescreen in 2025.
The vibes I’m picking up from most everybody are:
1) Hardware and AI costs are going to shrink exponentially YoY
2) People do not want to dump their entire life and business into a single SaaS
All signs are pointing to local compute and on-prem seeing a resurgence.
It's been extremely frustrating to not have these features on o1 and have limited what I can do with it. I'm presumably in the market who doesn't mind paying $200 / month but without the features they've added to 4o it feels not worth it.
A con like that wouldn't last very long.
This is for people who rely enough on ChatGPT Pro features that it becomes worth it. Whether they pay for it because they're freelance, or their employer does.
Just because an LLM doesn't boost your productivity, doesn't mean it doesn't for people in other lines of work. Whether LLM's help you at your work is extremely domain-dependent.
That's not a problem. OpenAI need to get some cash from its product because the competition is intense from free models. Moreover, since they supposedly used most of the web content and pirated whatever else they could, improvements in training will likely be only incremental.
All the while, after the wow effect passes, more people start to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash out now because later it could be never.
They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.
> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]
Surprise surprise, they just raised the price.
They have also added a new, even higher performance model which can leverage test time compute to scale performance if you want to pay for that GPU time. This is no different than AWS offering some larger ec2 instance tier with more resources and a higher price tag than existing tiers.
https://www.nytimes.com/2024/09/27/technology/openai-chatgpt...
Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.
We'll have to see if the first bump to $22 this year ends up happening.
I'm hard pressed to identify any users to whom LLMs are providing enough value to justify $20/month, but not $44.
On the other hand, I can see a lot of people to whom it's not providing any value being unable to afford a higher price.
Guess we'll see which category most OpenAI users are in.
I can't read the article. Any mention of the API pricing?
Models are becoming a commodity. It's game theory. Every second place company (eg. Meta) or nation (eg. China) is open sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (eg. Hunyuan).
So the question is it worth $200/month and to how many people, not is it over hyped, or if it has flaws. And does that support the level of investment being placed into these tools.
Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].
The thing that matters is product.
[1] Llama, QwQ, Mistral, ...
[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.
[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.
[4] The research on world models is expansive and there are lots of open source models and weights in the space.
The NFT market lasted for many years and was enormous.
Never underestimate the power of hype.
Example: the 4o or Claude are great for coding, summarizing and rewriting emails. So which domains require a slightly better model?
I suppose if the error rate in code or summary goes down even 10%, it might be worth $180/month.
I posed a very exhaustive question to ChatGPT o1-preview, including all the information I thought was relevant. Something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than me.
I was ashamed but at the same time that's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.
(Now I think of the idiom: when did we switch to 9-6? I've never had a 9-5.)
Bernie Madoff ran his investment fund as a Ponzi scheme for over a decade (perhaps several decades)
> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one
This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
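A rough way to see how much the stricter metric compresses the numbers, assuming attempts are independent (they likely aren't, so treat this as intuition, not a real conversion):

```python
# If a single attempt is correct with probability p, then under independence:
#   pass on 1/1 attempt:  p
#   pass on 4/4 attempts: p**4          (the stricter "reliability" metric)
#   pass on >=1 of 4:     1 - (1 - p)**4
for p in (0.6, 0.8, 0.9):
    print(p, round(p**4, 3), round(1 - (1 - p)**4, 3))
# 0.6 -> 0.130 vs 0.974; 0.8 -> 0.410 vs 0.998; 0.9 -> 0.656 vs ~1.0
```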
I suspect this is a key driver behind having a higher priced, individual user offering. It gives pricing latitude for enterprise volume licenses.
Let's say I run a company called AndSoft.
AndSoft has about 2000 people on staff, maybe 1000 programmers.
This solution will cost $200k per month, or $2.4 million per year.
Llama 3 is effectively free with some liberation. Is ChatGPT Pro $2.4 million a year better than Llama 3? Of course OpenAI will offer volume discounts.
I imagine if I was making north of 500k a year I'd subscribe as a curiosity... At least for a few months.
If your time is worth $250 an hour, and this saves you an hour per month, it's well worth it.
As someone who has both repeatedly written that I value the better LLMs as if they were a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:
I've seen quite a few cases where expensive non-functional things that experts demonstrate don't work, keep making money.
My mum was very fond of homeopathic pills and Bach flower tinctures, for example.
* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.
Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.
The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.
Now that is a trope when you're talking about Apple. They may use more premium materials and have a degree of improved construction leveraging those materials, but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing they existed.
I guess I don't follow how the "Apple premium" (whether real or otherwise) isn't a factor in a buyer's decision. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?
It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.
And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.
Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.
There’s something to be said for buying something and knowing it will interoperate with all your other stuff perfectly.
The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.
People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.
Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.
But name one other company whose hardware truly matches Apple’s standards for precision and attention to detail.
I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.
EDIT: Added links.
https://www.cio.com/article/3540579/devs-gaining-little-if-a...
https://web.archive.org/web/20241205204237/https://llmreport...
(Archive link because the llmreporter site seems to have an expired TLS certificate at the moment.)
No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance...
You missed this part. Being able to quickly fix things without deep thought while in flow saves you from the slowdowns of context switching.
Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to discussion about SOTA LLMs - it's like evaluating performance of an excavator by giving 1:10 toy excavators models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.
Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).
Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).
--
[0] - Per the article you linked; I'm yet to find and read the actual study itself.
Everything that is “word processing,” and that’s a lot.
This world is so messed up.
Or it already exists in some howto documentation, but nobody wanted to skim the documentation.
Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.
I always ask our lawyer whether or not they have a boilerplate when I need a contract written up. They usually do.
I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)
Do you ask about the boilerplate before or after you ask for a quote?
I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.
I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.
Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.
Just zero curiosity, only skepticism.
I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.
What if your compiler only generated the right code 99% of the time? Or, if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?
This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.
160 hours a month * $100/hr programmer * 9% = $1400 savings, easily enough to justify $200/month.
Even if 1/10th of the time it fails, that is still ~8% or $1200 savings.
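For what it's worth, here's the same arithmetic spelled out (the 160 hours, $100/hr, and 9% net saving are the parent's assumptions, not measured numbers):

```python
# Sanity check of the savings estimate above.
hours_per_month = 160
rate = 100             # $/hr
net_time_saved = 0.09  # parent's conservative estimate of time saved

monthly_value = hours_per_month * rate * net_time_saved
print(monthly_value)         # 1440.0 -> the "~$1400" figure

# Even if it only pays off 9 times out of 10:
print(monthly_value * 0.9)   # 1296.0 -> still well above a $200/month subscription
```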
For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.
So far, given my range of projects, I have seen it struggle with lower level mobile stuff and hardware (ESP32 + BLE + HID).
For things like web (front/back), DB, video games (web and Unity), it does work pretty well (at least 80% there on average).
And I'm talking of the free version, not this $200/mo one.
People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.
I'm talking of the generally available one, haven't had the chance to try this new version.
Sometimes it does what you want it to do, but still creates a bug.
Asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote some code that worked, but it did not address the fact that S3 delivers objects in pages of max 1000 items, so if the bucket contained less than 1000 objects (typical when first starting a project), things worked, but if the bucket contained more than 1000 objects (easy to do on S3 in a short amount of time), then that would be a subtle but important bug.
Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.
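For reference, the fix is simple once you know the 1000-key page limit exists; a minimal sketch with boto3 (the bucket name is made up):

```python
# Lists every object in a bucket, not just the first page of up to 1000 keys.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

keys = []
for page in paginator.paginate(Bucket="my-example-bucket"):
    # "Contents" is missing entirely for an empty bucket, hence the default.
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

print(len(keys))
```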
I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.
I use LLMs maybe a few times a month but I don’t really follow this argument against them.
It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.
Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.
There seems to be two camps: People who want nothing to do with such flawed interns - and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed, yet powerful interns. I'm choosing to be in the latter camp.
The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.
If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk." That wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.
New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that, it's expensive.)
Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...
If it's easier mentally, just put that second sentence in front of every ChatGPT answer.
Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.
Which is a very valuable lesson, worth more than $200
It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.
If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.
The problem is whether it can save you $180/mo more than Claude does.
Now I'll forever be using a second rate model because I'm not rich enough.
If I'm stuck using a second rate model I may go find someone else's model to use.
I love this back-to-back pair of statements. It is like “You can never win three card monte. I pay a monthly subscription fee to play it.”
I think at least LLMs are more receptive to the idea that they may be wrong, and based on that, we could have N diverse LLMs argue more peacefully and build a more reliable consensus than N "intelligent" people would.
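To make that concrete, here is a rough sketch of what a simple multi-model consensus could look like; `ask()` is a hypothetical stand-in for whatever client each provider exposes, not a real API, and majority voting is only one of several ways the "argument" could be resolved.

```python
# Rough sketch of majority-vote consensus across N diverse models.
# ask(model, prompt) is a hypothetical placeholder for each provider's client.
from collections import Counter

def ask(model: str, prompt: str) -> str:
    raise NotImplementedError("call the provider's API here")

def consensus(prompt: str, models: list[str]) -> str:
    answers = [ask(m, prompt) for m in models]
    # Pick the answer most models agree on; ties fall back to the first seen.
    best_answer, _votes = Counter(answers).most_common(1)[0]
    return best_answer

# Example (hypothetical model names):
# consensus("Is 2**31 - 1 prime?", ["model-a", "model-b", "model-c"])
```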
Even if it was a consensus opinion among all HN users, which hardly seems to be the case, it would have little impact on the other billion plus potential customers…
People also pull this figure out of their ass, over or undertrust themselves, and lie. I'm not sure self-reported confidence is that interesting compared to "showing your work".
An LLM will just pretend to care about the error and happily repeat it over and over.
I think there's a huge difference because individuals can be reasoned with, convinced they're wrong, and have the ability to verify they're wrong and change their position. If I can convince one person they're wrong about something, they convince others. It has an exponential effect and it's a good way of eliminating common errors.
I don't understand how LLMs will do that. If everyone stops learning and starts relying on LLMs to tell them how to do everything, who will discover the mistakes?
Here's a specific example. I'll pick on LinuxServer since they're big [1], but almost every 'docker-compose.yml' stack you see online will have a database service defined like this:
services:
  app:
    # ...
    environment:
      - 'DB_HOST=mysql:3306'
    # ...
  mariadb:
    image: linuxserver/mariadb
    container_name: mariadb
    environment:
      - PUID=1000
      - PGID=1000
      - MYSQL_ROOT_PASSWORD=ROOT_ACCESS_PASSWORD
      - TZ=Europe/London
    volumes:
      - /home/user/appdata/mariadb:/config
    ports:
      - 3306:3306
    restart: unless-stopped
Assuming the database is dedicated to that app, and it typically is, publishing port 3306 for the database isn't necessary and is a bad practice, because it unnecessarily exposes it to your entire local network. You don't need to publish it because it's already accessible to other containers in the same stack.

Another Docker-related example would be a Dockerfile using 'apt[-get]' without the '--error-on=any' switch. Pay attention to Docker build files and you'll realize almost no one uses that switch. Failing to use it allows silent failures of the 'update' command, and it's possible to build containers with stale package versions if a transient error affects the 'update' command but the subsequent 'install' command succeeds.
There are tons of misunderstandings like that which end up being so common that no one realizes they're doing things wrong. For people, I can do something as simple as posting on HN and others can see my suggestion, verify it's correct, and repeat the solution. Eventually, the misconception is corrected and those paying attention know to ignore the mistakes in all of the old internet posts that will never be updated.
How do you convince ChatGPT the above is correct and that it's a million posts on the internet that are wrong?
1. https://docs.linuxserver.io/general/docker-compose/#multiple...
## Restrict Host Ports for Security
If app and mariadb are only communicating internally, you can remove 3306:3306 to avoid exposing the port to the host machine:
```yaml
ports:
  - 3306:3306  # Remove this unless external access is required.
```
So, apparently, ChatGPT doesn't need any more convincing.
I don't understand how it gets there though. How does it "know" that's the right thing to suggest when the majority of the online documentation all gets it wrong?
I know how I do it. I read the Docker docs, I see that I don't think publishing that port is needed, I spin up a test, and I verify my theory. AFAIK, ChatGPT isn't testing to verify assumptions like that, so I wonder how it determines correct from incorrect.
Try asking an LLM complex questions in your area of expertise. Interview it as if you needed to be confident that it could do your job. You'll quickly find out that it can't do your job, and isn't actually capable of reasoning.
bit of a stretch.
That said, my "issue" might be that I usually work alone and I don't have anyone to consult with. I can bother people on forums, but these days forums are pretty much dead and full of trolls, so it's not very useful. ChatGPT was that thing that allows me to progress in this environment. If you work in Google and can ask Rob Pike about something, probably you don't need ChatGPT as much.
Is it possible that they have subsidized the infrastructure for free and paid users and they realized that OpenAI requires a higher revenue to maintain the current demand?
Losing $5-10B per year is also insane. People are still looking for the added value, and it's been two whole years now.
I think the ship has sailed on whether GPT is useful or a con; I've lost track of people telling me it's their first search now rather than Google.
I'd encourage skeptics who haven't read this yet to check out Nicholas' post here:
It's the cost of a new, shiny, Apple laptop every year.
"Barely good enough to replace interns" is worth a lot to businesses already.
(On that note, a founder of a SAP competitor and a major IT corporation in Poland is fond of saying that "any specialist can be replaced by a finite number of interns". We'll soon get to see how true that is.)
Since when does SAP have competitors? ;-P
A friend of mine claims most research is nowadays done by undergraduates because all senior folks are too busy.
$200 seems pretty cheap for a 24/7 [remote] intern with these abilities. That kind of money doesn't even buy a month's worth of Big Macs to feed that intern with.
It just seems like a lot (or even absurd) for a subscription to a service on teh Interweb, akin to "$200 for access to a web site? lolwut?"
Or each user doing an o1 model prompt is probably like, really expensive and they need to charge for it until they can get cost down? Anybody have estimates on what a single request into o1 costs on their end? Like GPU, memory, all the "thought" tokens?
The truth is that there are people who value the marginal performance -- if you think it's insane, clearly it's not for you.
Those people want to purchase status. Unless they ship you a fancy bow tie and a wine tasting at a wood cabin with your chatgpt subscription this isn't gonna last long.
This isn't about marginal performance, it's an increasingly desperate attempt to justify their spending in a market that's increasingly commodified and open sourced. Gotta convince Microsoft somehow to keep the lights on if you blew tens of billions to be the first guy to make a service that 20 different companies are soon gonna sell for pennies.
Interesting to compare this $200 pricing with the recent launch of Amazon Nova, which has not-equivalent-but-impressive performance for 1/10th the cost per million tokens. (Or perhaps OpenAI "shipmas" will include a competing product in the next few days, hence Amazon released early?)
A fun question I tried a couple of times is asking it to give me a list with famous talks about a topic. Or a list of famous software engineers and the topics they work on.
A couple of names typically exist but many names and basically all talks are shamelessly made up.
The hallucinated name is usually a very good choice for what the option / API should have been called.
I wish it could just say "There is not a good approximation of this API existing - I would suggest reviewing the following docs/sources:....".
I certainly don’t see why mere prediction can’t validate reasoning. Sure, it can’t do it perfectly all the time, but neither can people.
Have you been introduced to their CEO yet? 5 minutes of Worldcoin research should assuage your curiosity.
Forgive me for not finding your argument persuasive.
As a customer, I don’t care about the people. I’m not interested in either argument by authority (if Altman says it’s good it must be good) or ad hominem (that Altman guy is a jerk, nothing he does can have value).
The actual product. Have you tried it? With an open mind?
To be honest, it doesn't matter what the price of producing AI is, though. $200/month is, and will be a stupid price to pay because OpenAI already invented a price point with a half billion users - free. When they charged $10/month, at least they weren't taking advantage of the mentally ill. This... this is a grift, and a textbook one at that.
You don’t sound like you’re very familiar with the chatgpt product. They have about 10m customers paying $20/month. I’m one of them, and I honestly get way more than $200/month value from it.
Perhaps I’m “mentally ill”, but I’d ask you to do some introspection and see if leaping to that characterization is really the best way to explain people who get value where you see none.
Such a silly conclusion to draw based on a gut feeling, and to see all comments piggyback on it like it's a given feels like I'm going crazy. How can you all be so certain?
What I doubt, though, is that it can reach a mass market even in business. A good, large, high-resolution screen is something I consider to absolutely deliver the value it costs, yet most businesses don't think their employees deserve a 2K screen, which lasts 6-10 years and thus costs just a fraction of this offering.
Apparently the majority of businesses don’t believe in marginal gains
I'll pay $200 a month, no problem; right now o1-preview does the work for me of a ... somewhat distracted graduate student who needs checking, all for under $1 / day. It's slow for an LLM, but SUPER FAST for a grad student. If I can get a more rarely distracted graduate student that's better at coding for $7/day, well, that's worth a try. I can always cancel.
My intent was to say "you seem like a smart person, but you seem to have a blind spot here, might benefit you to stay more open minded."
[1] https://www.investopedia.com/terms/p/price_discrimination.as...
It makes me wonder why they don't want to offer a usage based pricing model.
Is it because people really believe it makes a much worse product offering?
Why not offer some of the same capability as pay-per-use?
Yeah, not really fixed: https://imgur.com/a/counting-letters-with-chatgpt-7cQAbu0
So, AI market is capped by Starbucks revenue/valuation.
In Italy, an espresso is ca. 1€.
Just to put that into perspective.
I also really don't find comparisons like this to be that useful. Any subscription can be converted into an exchange rate of coffee, or meals. So what?
In other words, do you have proof that this medium of information output is doomed to forever be useless in producing information that adds value to the world?
These are of course rhetorical questions that you nor anyone else can answer today, but you seem to have a weird sort of absolute position on this matter, as if a lot depended on your sentiment being correct.
Great, we can throw even more compute and waste even more resources and energy on brute forcing problems with dumb LLMs... Anything to keep the illusion that this hasn't plateaued x)
EDIT: Correction. It now started to show the upgrade offer but when I try it comes back with "There was a problem updating your subscription". Anyone else seeing this?
You know? Nestle throws a bit of cash towards OpenAI and all of a sudden the LLM is unable to discuss the controversies they've been involved in. Just pretends they never happened or spins the response in a way to make it positive.
"I recommend going to the Nestle chocolate house, a guided tour by LeGuide (click here for a free coupon) and the exclusive tour at the Louvre by BonGuide. (Note: this response may contain paid advertisements. Click here for more)"
"ChatGPT, my pc is acting up, I think it's a hardware problem, how can I troubleshoot and fix it?"
"Fixing electronics is to be done by professionals. Send your hardware today to ElectronicsUSA with free shipping and have your hardware fixed in up to 3 days. Click here for an exclusive discount. If the issue is urgent, otherwise Amazon offers an exclusive discount on PCs (click here for a free coupon). (Note: this response may contain paid advertisements. Click here for more)"
Please no. I'd rather self host, or we should start treating those things like utilities and regulate them if they go that way.
- I asked Perplexity how to do something in Terraform once. It hallucinated the entire thing, and when I asked where it sourced it from, it scolded me, saying that asking for a source is used as a diversionary tactic - as if it was trained on discussions in Reddit's most controversial subs. So I told it that it had just invented the code on the spot; surely it got it from somewhere? Why so combative? Its response was "there is no source, this is just how I imagined it would work."
- Later I asked how to bypass a particular linter rule because I couldn't reasonably rewrite half of my stack to satisfy it in one PR. Perplexity assumed the role of a chronically online stack overflow contributor and refused to answer until I said "I don't care about the security, I just want to know if I can do it."
Not so much related to ads but the models are already designed to push back on requests they don't immediately like, and they already completely fabricate responses to try and satisfy the user.
God forbid you don't have the experience or intuition to tell when something is wrong when it's delivered with full-throated confidence.
try to get chatgpt web search to return you a new york times link
nyt doesn't exist to openai
Even their existing subscription is a hard sell if only because the value proposition changes so radically and rapidly, in terms of the difference between free and paid services.
THEY DONATED $200x10 TO A MEDICAL PROJECT? zomg. faint. sizzle.
Make 1000 grants. Make 10,000. 10? Seriously?
Might not be dropshipped through Temu, but you're going to end up with the same $1 hat.
Hopefully they’ll spend some resource on making it work on mobile.
The problem is OpenAI's HTML.
* Will this be the start of enshittification of the base ChatGPT offering?
* There may also be some complementary products announced this month that make the $200 worth it
* Is this the start of a bigger industry trend of prices more closely aligning to the underlying costs of running the model? I suspect a lot of the big players have been running their inference infrastructure at a loss.
there are many who wouldn't bat an eye at $1k / month that guarantees most powerful AI (even if it's just 0.01% better than competition), and no limits on anything.
y'all are greatly underestimating the value of that feeling of (best + limitlessness). high performers make decisions very differently than the average HN user.
$200/mo is enough to make decision makers feel powerful and remain a little bit lenient on widdle 'ol ChatGPT
I've switched to using a selfhosted interface and APIs.
The effective cost per token on monthly plans is frankly absurd.
On an individual level for solo devs in a developing nation USD200 a month is an enormous amount of money.
For someone in a developed nation, this is just over a coffee a day.
If some jobs do easily get automated away, the only remedy is government intervention on upskilling (if you are in Europe you could even get some support). If you are in the US or most developing capitalist (or monopolistic, rentier, etc.) economies, it's just your bad luck: those jobs WILL be gone or reduced.
Huh? For how many seats? Does this mean an entire organization can share one Pro account and get unlimited access to those models?
The new Claude and GPT do really well with scripts already. Not worth $200 a month lmao.
lmao even
oof, I love using o1 but I’m immediately priced out (I’m probably not the target audience either)
> provides a way for researchers, engineers, and other individuals who use research-grade intelligence
I’d love to see some examples of the workflows of these users
It also seems that technology is progressing along a path:
loose collection of tools > organized system of cells > one with a nervous system
And although most people don't think ChatGPT is intelligent on its own, that's missing the point: the combination of us with ChatGPT is the nervous system, and we are becoming cells as globally, we no longer make significant decisions and only use our intelligence locally to advance technology.