Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions that correspond to a "Feynman diagram expansion" whose complexity grows superexponentially in n. No one had been able to greatly reduce these expressions to much simpler forms, and from those base cases no one had been able to spot a pattern and posit a formula valid for all n. GPT did that.
Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.
I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes
> I think this was all already figured out in 1986 though
They cite that paper in the third paragraph...
> Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.
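For reference (not from the paper, just the standard textbook form as I recall it), that single-term Parke-Taylor expression for the colour-ordered MHV tree amplitude, with gluons i and j carrying negative helicity and all others positive, is, up to coupling and colour factors,

    A_n^{\mathrm{MHV}}(1^+,\dots,i^-,\dots,j^-,\dots,n^+) \;\propto\; \frac{\langle i\,j\rangle^4}{\langle 1\,2\rangle\,\langle 2\,3\rangle\cdots\langle n\,1\rangle}

One term in spinor-helicity notation, versus the order-n! Feynman diagrams it resums; that contrast is what people mean by "hidden simplicity" here.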
It also seems to be a main talking point. I think this is a prime example of how easy it is to think something is solved when looking at it from a high level, and to draw an erroneous conclusion from a lack of domain expertise. Classic "Reviewer 2" move. Though I'm not a domain expert either, so if there really is no novelty over Parke and Taylor, I'm pretty sure this will get thrashed in review.
Sorry, but I just have to point out how this field of maths reads like Star Trek technobabble to me.
This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.
Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.
From my reading yes, but I think I am likely reading the statement differently than you are.
> from first principles
Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.
For an LLM to follow a first-principles strategy, I would expect it to take in a body of research, come up with some first principles or guess at them, then iteratively construct a tower of reasonings/findings/experiments.
Constructing a solid tower is where things are currently improving for existing models, in my mind, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, not independently at least.
Humans also often have a hard time with this. In general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.
You could nitpick a rebuttal, but no matter how many people you give credit, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.
> In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.
https://en.wikipedia.org/wiki/History_of_special_relativity
Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)
As for general relativity, he spent several years working to learn differential geometry (which was well-developed mathematics at the time, but looked like abstract nonsense to most physicists). I’m not sure how he was turned on to this theory being applicable to gravity, but my guess is that it was motivated by some symmetry ideas. (It always comes down to symmetry.)
> Critique of absolute time and space of Newtonian physics was already well underway
This only means Einstein was not alone; it does not mean the results were in distribution.
> Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory.
And this comes about because people are looking at edge cases and trying to solve things. Sometimes people come up with wild and crazy solutions. Sometimes those solutions look obvious after they're known (though not prior to being known, otherwise they would have already been known...) and others don't.

Your argument really makes the claim that since there are others pursuing similar directions, this means it is in distribution. I'll use a classic statistics-style framing. Suppose we have a bag with n red balls and p blue balls. Someone walks over and says "look, I have a green ball", someone else walks over and says "I have a purple one", and someone else comes over and says "I have a pink one!". None of those balls were from the bag we have. There are still n+p balls in our bag, and they are still all red or blue, despite there being n+p+3 balls that we know of.
> I am not a [...] physicist
I think this is probably why you don't have the resolution to see the distinctions. Without a formal study of physics it is really hard to differentiate these kinds of propositions. It can be very hard even with that education. So be careful not to overly abstract and simplify concepts. It'll only deprive you of a lot of beauty and innovation.
> The quintic was almost proven to have no general solutions by radicals by Paolo Ruffini in 1799, whose key insight was to use permutation groups, not just a single permutation.
Thing is, I am usually the kind of person who defends the idea of a lone genius. But I also believe there is a continuous spectrum, no gaps, from the village idiot to Einstein and beyond.
Let me introduce, just for fun, not for the sake of any argument, another idea from math which I think came really out of the blue, to the degree that it's still considered an open problem to write an exposition of it, since you cannot smoothly link it to anything else: forcing.
I'm not sure about GR, but I know that it is built on the foundations of differential geometry, which Einstein definitely didn't invent (I think that's the source of his "I assure you whatever your difficulties in mathematics are, that mine are much greater" quote because he was struggling to understand Hilbert's math).
And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...
"Since the mathematicians have invaded the theory of relativity, I do not understand it myself anymore."
:)
The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.
So it depends on if you’re comparing individual steps or just the starting/ending distributions.
> Can humans actually do that?
Yes.
Seriously, think about it for a second...
If that were true then science should have accelerated a lot faster. Science would have happened differently, and researchers would have optimized for ingesting as many papers as they could.
Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.
I think the problem is that when teaching science we generally teach it very linearly, as if things easily follow. But in reality there are generally constant iterative improvements that look more like a plateau, and then there are these leaps. They happen for a variety of reasons, but no paradigm shift would be contentious if it were obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements are met with: "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If it were always in distribution, progress would be nearly frictionless.

Again, in how we teach the history of science we make an error in teaching things like Galileo, as if The Church was the only opposition. There were many scientists who objected, and on reasonable grounds. It is also a problem we continually make in how we view the world. If you stick with "it works" you'll end up with a geocentric model rather than a heliocentric model. It is true that the geocentric model had limits, but so did the original heliocentric model, and that's the reason it took time to be adopted.
By viewing things at too high a level we often fool ourselves. While I'm criticizing how we teach, I'll also admit it is a tough thing to balance. It is difficult to get nuanced, and in teaching we must be time-effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that it is hard to learn how to actually do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).
> We are always building on the shoulders of giants.
And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now, but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.
https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...
You are greater than the sum of your parts
Probably not something that the average GI Joe would be able to prompt their way to...
I am skeptical until they show the chat log leading up to the conjecture and proof.
Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?
Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions?
These aren't my personal views, but there is some handwaving about the process in a way that reads as if this was all spontaneous involvement on GPT's end.
But regardless, a result is a result so I'm content with it.
SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.
The optimization algorithm was used by human experts to solve the problem.
Is this so different?
I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.
>I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.
Sure
Do you really want to be treated like an old PC (dismembered, stripped for parts, and discarded) when your boss is done with you (i.e. not treated specially compared to a computer system)?
But I think if you want a fuller answer, you've got a lot of reading to do. It's not like you're the first person in the world to ask that question.
Not an uncommon belief.
Here you are saying you personally value a computer program more than people
It exposes a value that you personally hold and that's it
That is separate from the material reality that all this AI stuff is ultimately just computer software... It's an epistemological tautology in the same way that, say, a plane, a car and a refrigerator are all just machines - they can break, need maintenance, take expertise, can be dangerous...
LLMs haven't broken the categorical constraints - you've just been primed to think such a thing is supposed to be different through movies and entertainment.
I hate to tell you but most movie AIs are just allegories for institutional power. They're narrative devices about how callous and indifferent power structures are to our underlying shared humanity
(In the hands of leading experts.)
What's the distinction between "first principles" and "existing things"?
I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (one that is quite rare for humans too).
Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.
People have been downplaying LLMs since the first buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.
By all means, keep betting against them.
IOW respect the trend line.
And the same practitioners said right after Deep Blue that Go was NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...
I can claim some knowledge of physics from my degree. Typically the easy part is coming up with complex, dirty equations that work under special conditions; the hard part is the simplification into something elegant, 'natural' and general.
Also "LLM’s can make new things when they are some linear combination of existing things"
Doesn't really mean much, what is a linear combination of things you first have to define precisely what a thing is?
(This is deep)
We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.
One way I gauge the significance of a theory paper is by the measured quantities and physical processes it would contribute to. I see none discussed here, which should tell you how deep into math it is. I personally would not have stopped to read it on my arxiv catch-up.
https://arxiv.org/list/hep-th/new
Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.
Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.
How much precedent is there for machines or tools getting an author credit in research? Genuine question, I don't actually know. Would we give an author credit to e.g. a chimpanzee if it happened to circle the right page of a text book while working with researchers, leading them to a eureka moment?
For a datum of one, the mathematician Doron Zeilberger gives credit to his computer Shalosh B. Ekhad on select papers.
https://medium.com/@miodragpetkovic_24196/the-computer-a-mys...
https://sites.math.rutgers.edu/~zeilberg/akherim/EkhadCredit...
That usually comes up with some support.
Well, what do you think? Do the authors (or a single symbolic one) of pytorch or numpy or insert <very useful software> typically get credits on papers that utilize them heavily? Well, clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.
>Would we give an author credit to e.g. a chimpanzee if it happened to circle the right page of a text book while working with researchers, leading them to a eureka moment?
Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.
I don't know! That's why I asked.
> Well, clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.
Contribution is a fitting word, I think, and well chosen. I'm sure OpenAI's contribution was quite large, quite green and quite full of Benjamins.
> Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.
It was a genuine question. What's the difference between a chimpanzee and a computer? Neither are humans and neither should be credited as authors on a research paper, unless the institution receives a fat stack of cash I guess. But alas Jane Goodall wasn't exactly flush with money and sycophants in the way OpenAI currently is.
If you don't read enough papers to immediately realize it is an extremely rare occurrence, then what are you even doing? Why are you making comments like you have the slightest clue what you're talking about, including insinuating the credit was what... the result of bribery?
You clearly have no idea what you're talking about. You've decided to accuse prominent researchers of essentially academic fraud with no proof because you got butthurt about a credit. You think your opinion on what should and shouldn't get credited matters? Okay.
I've wasted enough time talking to you. Good Day.
I have no problem with the former and agree that authors/researchers must note when they use AI in their research.
For this particular paper it seems the humans were stuck, and only AI thinking unblocked them.
In your eyes maybe there's no difference. In my eyes, big difference. Tools are not people, let's not further the myth of AGI or the silly marketing trend of anthropomorphizing LLMs.
― C.S. Lewis, The Last Battle
— Carl Sagan
I have no real way to demonstrate that I'm telling the truth, but I am ¯\_(ツ)_/¯
If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?
I think it's more fair to say that recombining ideas is an efficient way to quickly explore a very complex, hyperdimensional space. In some cases that's enough to land on new, useful ideas, but not always. A) The new, useful idea might be _near_ the area you land on, but not exactly at it. B) There are whole classes of new, useful ideas that cannot be reached by any combination of existing "idea vectors".
Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.
All this to say: Every new thing is a combination of existing things + sweat and tears.
The question everyone has is whether current LLMs are capable of the latter component. Historically the answer has been _no_, because they had no real capacity to iterate. Without iteration you cannot explore. But now that they can reliably iterate, and to some extent plan their iterations, we are starting to see their first meaningful, fledgling attempts at the "sweat and tears" part of building new ideas.
But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.
And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.
I heard this from people who know more than me
For some extra context, pre-training is ~1/3 of the training, where it gains the basic concepts of how tokens go together. Mid & late training are where you instill the kinds of anthropic behaviors we see today. I expect pre-training to increasingly become a lower percentage of overall training, putting aside any shifts of what happens in each phase.
So to me, it is plausible they can take the 4.x pre-training and keep pushing in the later phases. There are a lot of results out there showing scaling laws (limits) have not peaked yet. I would not be surprised to learn that Gemini 3 Deep Research had 50% late-training / RL.
I am generally very skeptical about work at this level of abstraction. The result emerges only after choosing Klein signature instead of physical spacetime, complexifying momenta, restricting to a "half-collinear" regime that doesn't exist in our universe, and picking a specific kinematic sub-region. Then they check the result against internal consistency conditions of the same mathematical system.

This pattern should worry anyone familiar with the replication crisis. The conditions this field operates under are a near-perfect match for what psychology has identified as maximising systematic overconfidence: extreme researcher degrees of freedom (choose your signature, regime, helicity, ordering until something simplifies), no external feedback loop (the specific regimes studied have no experimental counterpart), survivorship bias (ugly results don't get published, so the field builds a narrative of "hidden simplicity" from the survivors), and tiny expert communities where fewer than a dozen people worldwide can fully verify any given result.
The standard defence is that the underlying theory — Yang-Mills / QCD — is experimentally verified to extraordinary precision. True. But the leap from "this theory matches collider data" to "therefore this formula in an unphysical signature reveals deep truth about nature" has several unsupported steps that the field tends to hand-wave past.
Compare to evolution: fossils, genetics, biogeography, embryology, molecular clocks, observed speciation — independent lines of evidence from different fields, different centuries, different methods, all converging. That's what robust external validation looks like. "Our formula satisfies the soft theorem" is not that.
This isn't a claim that the math is wrong. It's a claim that the epistemic conditions are exactly the ones where humans fool themselves most reliably, and that the field's confidence in the physical significance of these results outstrips the available evidence.
I wrote up a more detailed critique in a substack: https://jonnordland.substack.com/p/the-psychologists-case-ag...
When I used GPT 5.2 Thinking Extended, it gave me the impression that it's consistent enough / has a low enough rate of errors (or enough error-correcting ability) to autonomously do math/physics for many hours if it were allowed to [but I guess the Extended time cuts off around the 30-minute mark and Pro maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists/mathematicians at large will be able to play with tools which think at this time-scale soon and see how much capability these machines really have.
CEOs/decision makers would rather give all their labour budget to tokens if they could just to validate this belief. They are bitter that anyone from a lower class could hold any bargaining chips, and thus any influence over them. It has nothing to do with saving money, they would gladly pay the exact same engineering budget to Anthropic for tokens (just like the ruling class in times past would gladly pay for slaves) if it can patch that bitterness they have for the working class's influence over them.
The inference companies (who are also from this same class of people) know this, and are exploiting this desire. They know if they create the idea that AI progress is at an unstoppable velocity decision makers will begin handing them their engineering budgets. These things don't even have to work well, they just need to be perceived as effective, or soon to be for decision makers to start laying people off.
I suspect this is going to backfire on them in one of two ways.
1. French Revolution V2: they all get their heads cut off in 15 years, or an early retirement on a concrete floor.
2. Many decision makers will make fools of themselves, destroy their businesses and come begging to the working class for our labor, giving the working class more bargaining chips in the process.
Either outcome is going to be painful for everyone; let's hope people wake up before we push this dumb experiment too far.
The reality is: "GPT 5.2 found a more general and scalable form of an equation, after crunching for 12 hours supervised by 4 experts in the field".
Which is equivalent to taking some of the countless niche algorithms out there and having a few experts in that algorithm let LLMs crunch tirelessly till they find a better formula, after those same experts prompted it in the right direction and with the right feedback.
Interesting? Sure. Speaks highly of AI? Yes.
Does it suggest that AI is revolutionizing theoretical physics on its own like the title does? Nope.
Yet, if some student or child achieved the same – under equal supervision – we would call him the next Einstein.
One of my best friends, in her bachelor's thesis, had solved a difficult mathematical problem in planet orbits or something, and it was just yet another random day in academia.
And she didn't solve it because she was a genius but because there are a bazillion such problems out there and little time to look at them and focus. Science is huge.
https://www.math.columbia.edu/~woit/wordpress/?p=15362
Let's wait a couple of days whether there has been a similar result in the literature.
This result reminded me of the C compiler case that Anthropic posted recently. Sure, agents wrote the code for hours, but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work, etc. In general, making sure the output actually works and that it's a story worth sharing with others.
The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value but also does a disservice to the actual researchers, engineers and humans in general, who do the hard work of problem formulation, validation and at the end, solving the problem using another tool in their toolbox.
>[...]
>The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.
You're sort of acting like it's all or nothing. What about the humans that used to be that "force multiplier" on a team with the person guiding the research?
If a piece of software required a team of ten people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.
For a more current example: do you think all the displaced Uber/Lyft drivers aren't going to think "AI took my job" just because there's a team of people in a building somewhere handling the occasional Waymo low confidence intervention, as opposed to being 100% autonomous?
It's also a legitimate concern. We happen to be in a place where humans are needed for that "last critical 10%," or the first critical 10% of problem formulation, and so humans are still crucial to the overall system, at least for most complex tasks.
But there's no logical reason that needs to be the case. Once it's not, humans will be replaced.
When the systems turn into something trivial to manage with the new tooling, humans build more complex ones or add more layers on top of the existing systems.
What I said in my original comment is that AI delivers when it's used by experts. In this case there was someone who was definitely not a C compiler expert; what would happen if there were a real expert doing this?
I worry we're not producing as many of those as we used to
https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful
If I'm wrong, please let me know which previously unsolved problem was solved, I would be genuinely curious to see an example of that.
So I would read this (with more information available) with less emphasis on the LLM discovering a new result. The title is a little bit misleading, but "derives" is actually the operative word here, so it would be technically correct for people in the field.
Not saying they're lying, but I'm sure it's exaggerated in their own report.
"Couldn't" is an immensely high bar in this context; "didn't" seems more appropriate and renders this whole thing slightly less exciting.
Okay, read it: yep, induction. It already had the answer.
Don't get me wrong, I love induction... but we aren't having any revolutions in understanding with induction.
I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but no one had put them together).
In this case, the human authors did the thinking and also used the LLM, but this could happen without the original human author too (some guy posts some partial result on the internet, no one realizes it is novel knowledge, and it gets reused by AI later). It would be tremendously nice if credit was kept in such scenarios.
Theoretical physics is throwing a lot of stuff at the wall and theory crafting to find anything that might stick a little. Generation might actually be good there, even generation that is "just" recombining existing ideas.
I trust physicists and mathematicians to mostly use tools because they provide benefit, rather than because they are in vogue. I assume they were approached by OpenAI for this, but glad they found a way to benefit from it. Physicists have a lot of experience teasing useful results out of probabilistic and half broken math machines.
If LLMs end up being solely tools for exploring some symbolic math, that's a real benefit. Wish it didn't involve destroying all progress on climate change, platforming truly evil people, destroying our economy, exploiting already disadvantaged artists, destroying OSS communities, enabling yet another order of magnitude increase in spam profitability, destroying the personal computer market, stealing all our data, sucking the oxygen out of investing into real industry, and bold faced lies to all people about how these systems work.
Also, last I checked, MATLAB wasn't a trillion dollar business.
Interestingly, the OpenAI wrangler is last in the list of authors and acknowledgements. That somewhat implies the physicists don't think it deserves much credit. They could be biased against LLMs, like me.
When Victor Ninov (fraudulently) analyzed his team's accelerator data using an existing software suite to find a novel superheavy element, he got first billing on the author list. Probably he contributed to the theory and some practical work, but he alone was literate in the GOOSY data tool. Author lists are often a political game as well as credit, but Victor got top billing above people like his bosses, who were famous names. The guy who actually came up with the idea of how to create the element, in an innovative recipe that a lot of people doubted, was credited 8th.
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...
New Honda Civic discovers Pacific Ocean!
New F150 discovers Utah Salt Flats!
Sure it took humans engineering and operating our machines, but the car is the real contributor here!