> We applied the AI co-scientist to assist with the prediction of drug repurposing opportunities and, with our partners, validated predictions through computational biology, expert clinician feedback, and in vitro experiments.
> Notably, the AI co-scientist proposed novel repurposing candidates for acute myeloid leukemia (AML). Subsequent experiments validated these proposals, confirming that the suggested drugs inhibit tumor viability at clinically relevant concentrations in multiple AML cell lines.
and,
> For this test, expert researchers instructed the AI co-scientist to explore a topic that had already been subject to novel discovery in their group, but had not yet been revealed in the public domain, namely, to explain how capsid-forming phage-inducible chromosomal islands (cf-PICIs) exist across multiple bacterial species. The AI co-scientist system independently proposed that cf-PICIs interact with diverse phage tails to expand their host range. This in silico discovery, which had been experimentally validated in the original novel laboratory experiments performed prior to use of the AI co-scientist system, are described in co-timed manuscripts (1, 2) with our collaborators at the Fleming Initiative and Imperial College London. This illustrates the value of the AI co-scientist system as an assistive technology, as it was able to leverage decades of research comprising all prior open access literature on this topic.
The model came up with new scientific hypotheses that were subsequently confirmed in the lab, which is quite significant.
When Google publishes papers, they tend to juice the significance of their results (Google is not the only group that does this, but they are pretty egregious). You need to be skilled in the field of the paper to be able to pare away the exceptional claims. A really good example is https://spectrum.ieee.org/chip-design-controversy. While I think Google did some interesting work there, and it's true they included some of the results in their chip designs, their comparison claims are definitely over-hyped, and they did not react well when they got called out on it.
https://github.com/google-research/circuit_training
https://deepmind.google/discover/blog/how-alphachip-transfor...
Yes, I know it's in TPUs and I said exactly that.
You simply can't take Google press at face value.
I think that's true for virtually every company and also for most people (in the context of published work)
(academic burnout resembles creator burnout for similar reasons)
Wishful mnemonics in the field were called out by Drew McDermott in the mid-1970s, and they are still a problem today.
https://www.inf.ed.ac.uk/teaching/courses/irm/mcdermott.pdf
And:
> As a field, I believe that we tend to suffer from what might be called serial silver bulletism, defined as follows: the tendency to believe in a silver bullet for AI, coupled with the belief that previous beliefs about silver bullets were hopelessly naive.
(H. J. Levesque. On our best behaviour. Artificial Intelligence, 212:27–35, 2014.)
But yes, it's normally: "science paper says an experiment in mice shows promising results in cancer treatment", then "university PR says a new treatment for cancer is around the corner", and "media says cure for all cancers".
On the contrary: it's their advantage. They know it, and they can make outlandish claims that no one will disprove.
Eating, drinking, sleeping apply to absolutely everyone. Deception varies greatly by person and situation. I know people who are painfully honest and people I don't trust on anything, and many in between.
That sounds good enough for a start, considering you can massively parallelize the AI co-scientist workflow, compared to the timescale and physical scale it would take to do the same thing with human high school sophomores.
And every now and then, you get something exciting and really beneficial coming from even inexperienced people, so if you can increase the frequency of that, that sounds good too.
The regulator is not only there to protect the public; it also protects the VCs from responsibility.
Regulations around clinical trials represent the floor of what's ethically permissible, not the ceiling. As in, these guidelines represent the absolute bare minimum required when performing drug trials to prevent gross ethical violations. Not sure what corners you think are ripe for cutting there.
Disagree. The US FDA especially is overcautious to the point of doing more harm than good - they'd rather ban hundreds of lifesaving drugs than allow one thalidomide to slip through.
But off-label use is legal, so it's ok to use a drug that's safe but not proven effective (to the FDA's high standards) for that ailment... but only if it's been proven effective for some other random ailment. That makes no sense.
> What's your level of exposure to the pharma industry?
Just an interested outsider who read e.g. the Omegaven story on https://www.astralcodexten.com/p/adumbrations-of-aducanumab .
When I demo'd my scope (which is similar to a 3d printer, using low-cost steppers and other hobbyist-grade components) the CEO gave me feedback which was very educational. They couldn't build a system that used my style of components because a failure due to a component would bring the whole system down and require an expensive service call (along with expensive downtime for the user). Instead, their mech engineer would select extremely high quality components that had a very low probability of failure to minimize service calls and other expensive outages.
Unfortunately, the cost curve for reliability is not pretty: reducing mechanical failures to close to zero costs close to infinity dollars.
One of the reasons Google's book scanning was so scalable was their choice to build fairly simple, cheap, easy-to-maintain machines, then build a lot of them and train the people doing the scanning to work with those machines' quirks. Just like with their clusters, they tolerate a much higher failure rate and build all sorts of engineering solutions where other groups would just buy one expensive device with a service contract.
Could they not make the scope easily replaceable by the user and just supply a couple of spares?
Just thinking of how cars are complex machines but a huge variety of parts could be replaced by someone willing to spend a couple of hours learning how.
Maybe the next startup idea is biochemistry as a service, centralised to a large lab facility with hundreds of each device, maintained by a dedicated team of on-site professionals.
We have a couple automation systems that are semi-custom - the robot can handle operation of highly specific, non-standard instruments that 99.9% of labs aren't running. Systems have to handle very accurate pipetting of small volumes (microliters), moving plates to different stations, heating, shaking, tracking barcodes, dispensing and racking fresh pipette tips, etc. Different protocols/experiments and workflows can require vastly different setups.
See something like:
[1] https://www.hamiltoncompany.com/automated-liquid-handling/pl...
[2] https://www.revvity.com/product/fontus-lh-standard-8-96-ruo-...
I've been interested in this kind of stuff, watching it from afar, and now I may need to buy or build a machine that does this kind of thing for work.
It is trained on human prose; human prose is primarily a representation of ideas; it synthesizes ideas.
There are very few uses for a machine to create ideas. We have a wealth of ideas and people enjoy coming up with ideas. It’s a solution built for a problem that does not exist.
And that their generation of impressive high-school-sophomore ideas is faster, more reliable, better communicated, and can continue 24/7 (given matching collaboration), relative to their biological high school sophomore counterparts.
I don’t believe any natural high school sophomore as impressive on those terms has ever existed. Not close.
We humans (I include myself) are awful at judging things or people accurately (in even a loose sense) across more than one or two dimensions.
This is especially true when the mix of ability across several dimensions is novel.
(I also think people underestimate the degree to which we, as users and “commanders” of AI, bottleneck their potential. I don’t suggest they are ready to operate without us, but our relative lack of energy, persistence and focus all limit what we get from them in those dimensions, hiding significant value.
We famously do this with each other, so it's not surprising. But it is worth keeping in mind when judging limits: whose limits are we really seeing?)
We all have our idiosyncratically distributed areas of high intuition, expertise and fluency.
None of us need apprentice level help there, except to delegate something routine.
Lower quality ideas there would just gum things up.
And then we all have vast areas of increasingly lesser familiarity.
I find that the more we grow our strong areas, the more those areas benefit from contact, as efficient as possible, with as many other areas as possible, in both trivial and deeper ways.
The better developer I am, in terms of development skill, tool span, novel problem recognition and solution vision, the more often and valuable I find quick AI tutelage on other topics, trivial or non-trivial.
If you know a bright high school student highly familiar with a domain that you are not, but have reason to think that area might be helpful, don’t you think instant access to talk things over with that high schooler would be valuable?
Instant non-trivial answers, perspective and suggestions? With your context and motivations taken into account?
Multiplied by a million bright high school students over a million domains.
—
We can project the capability vector of these models onto one dimension, like “school level idea quality”. But lower dimension projections are literally shadows of the whole.
If we use them in the direction of their total ability vector (and given they can iterate, it is actually a compounding eigenvector!), their value goes way beyond “a human high schooler with ideas”.
It does take time to get the most out of a differently calibrated tool.
I don't quite understand the argument here. The future hasn't happened yet. What does it mean to demonstrate the future developments now?
Or imagine this one: a computer maps the whole world and suggests a route to get to any destination?!
You just described a basic search engine.
An LLM is kind of a search engine for language.
A child learns how to eat solid food and how to walk. That a square peg fits into a square hole. This has nothing to do with language.
People who are deaf and mute and cannot read can still reason and solve problems.
In this case they actually tested a drug probably because Google is paying for them to test whatever the AI came up with.
On the level of suggesting suitable alternative ingredients in fruit salad.
We should really stop insulting the intelligence of people to sell AI.
It's quite a natural next step to take to consider the tails and binding partners to them, so much so that it's probably what I would have done and I have a background of about 20 minutes in this particular area. If the co-scientist had hypothesised the novel mechanism to start with, then I would be impressed at the intelligence of it. I would bet that there were enough hints towards these next steps in the discussion sections of the referenced papers anyway.
What's a bit suspicious is in the Supplementary Information, around where the hypothesis is laid out, it says "In addition, our own preliminary data indicate that cf-PICI capsids can indeed interact with tails from multiple phage types, providing further impetus for this research direction." (Page 35). A bit weird that it uses "our own preliminary data".
I think the potential of LLM-based analysis is sky high, given the amount of concurrent research happening and the high context load required to understand the papers. However, there is a lot of pressure to show how amazing AI is, and we should be vigilant. So my first thought was: could it be that the training data / context / RAG had access to a file it should not have, contaminating the result? This is indirect evidence that maybe something was leaked.
If it ends up being more the case that AI can help us discover new stuff, that's very optimistic.
Done the way it was in this test, it's very tricky to rule out the hypothesis that the AI is just combining statements from the Discussion / Future Outlook sections of previous work in the field.
Two problems with this:
1. AI systems hallucinate stuff. If it comes up with some statement, how will you know that it did not just hallucinate it?
2. Human researchers don't work just on their own knowledge, they can use a wide range of search engines. Do we have any examples of AI systems like these that produce results that a third-year grad student couldn't do with Google Scholar and similar instructions? Tests like in TFA should always be compared to that as a baseline.
> new science should be discoverable in fields where human knowledge breadth is the limiting factor
What are these fields? Can you give one example? And what do you mean by "new science"?
The way I see it, at best the AI could come up with a hypothesis that human researchers could subsequently test. Again, you risk that the hypothesis is hallucination and you waste a lot of time and money. And again, researchers can google shit and put facts together from different fields than their own. Why would the AI be able to find stuff the researchers can't find?
https://www.independent.co.uk/news/science/super-diamond-b26...
The automobile was a useful invention. I don't know if back then there was a lot of hype around how it can do anything a horse can do, but better. People might have complained about how it can't come to you when called, can't traverse stairs, or whatever.
It could do _one_ thing a horse could do better: Pull stuff on a straight surface. Doing just one thing better is evidently valuable.
I think AI is valuable from that perspective; you provide a good example there. I might well be disappointed if I expected it to be better than humans at anything humans can do. It doesn't have to be. But with wording like "co-scientist", I see where that expectation comes from.
I would say the doubters were right, and the results are terrible.
We redesigned the world to suit the car, instead of fixing its shortcomings.
Navigating a car centric neighbourhood on foot is anywhere between depressing and dangerous.
I hope the same does not happen with AI. But I expect it will. Maybe in your daily life AI will create legal contracts that are thousands of pages long, and you will need an AI of your own to summarise and process them.
As a prototype for a "robot scientist", Adam is able to perform independent experiments to test hypotheses and interpret findings without human guidance, removing some of the drudgery of laboratory experimentation.[11][12] Adam is capable of:
* hypothesizing to explain observations
* devising experiments to test these hypotheses
* physically running the experiments using laboratory robotics
* interpreting the results from the experiments
* repeating the cycle as required[10][13][14][15][16]
While researching yeast-based functional genomics, Adam became the first machine in history to have discovered new scientific knowledge independently of its human creators.[5][17][18]

https://en.wikipedia.org/wiki/Robot_Scientist

A lot of them have to do things on computers which have nothing to do with their expertise, like coding a small tool for working with their data, small tools for crunching results, formatting text data, and searching for and finding the right materials.
An LLM which helps a scientist code something in an hour instead of a week makes this research A LOT faster.
And we know from another paper that we now have so much data that you need to use systems to find the right information for you. The study estimated how much additional critical information a research paper missed.
[1] https://marginalrevolution.com/marginalrevolution/2025/02/dw... [2] https://x.com/dwarkesh_sp/status/1888164523984470055
I don't know his @ but I'm sure he is on here somewhere
Oh I don’t like that. I don’t like that at all.
As a person who is literally doing his PhD on AML, implementing molecular subtyping and ex-vivo drug predictions, I find this super random.
I would truly suggest our pipeline instead of random drug repurposing :)
https://celvox.co/solutions/seAMLess
edit: Btw we’re looking for ways to fund/commercialize our pipeline. You could contact us through the site if you’re interested!
And drug repurposing is also used for conditions with no known molecular basis, like autism. You’re not suggesting its usefulness is limited in those cases, right?
However, this is still not how you treat a patient. There are standard practices in the clinic. Usually the first line treatment is induction chemo with hypomethylating agents (except elderly who might not be eligible for such a treatment). Otherwise the options are still very limited, the “best” drug in the field so far is a drug called Venetoclax, but more things are coming up such as immuno-therapy etc. It’s a very complex domain, so drug repurposing on an AML cell line is not a wow moment for me.
In other fields, when models are wrong, the discussion is around 'errors'. How large the errors are, their structural nature, possible bounds, and so forth. But when it's AI it's a 'hallucination'. Almost as if the thing is feeling a bit poorly and just needs to rest and take some fever-reducer before being correct again.
It bothers me. Probably more than it should, but it does.
In the Monte Carlo Tree Search part, the outcome distribution at the leaves is informed by a neural network trained on data instead of a so-called playout. Sure, part of the algorithm does invoke a random() function, but by no means is the result akin to the flip of a coin.
There is indeed randomness in the process, but making it sound like a random walk is doing a disservice to nuance.
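For illustration, here is a minimal Python sketch of that idea (not the actual AlphaGo/AlphaZero code; Node, the state API, and value_net are hypothetical placeholders): the randomness only steers which branches get explored, while the leaf evaluation comes from a model fit to data.

    import math

    class Node:
        """One node of the search tree; the state API is a placeholder."""
        def __init__(self, state, parent=None):
            self.state, self.parent = state, parent
            self.children, self.visits, self.value_sum = [], 0, 0.0

        def ucb(self, c=1.4):
            # Unvisited nodes are explored first; otherwise trade off
            # average value (exploitation) against uncertainty (exploration).
            if self.visits == 0:
                return float("inf")
            return (self.value_sum / self.visits
                    + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def mcts(root, value_net, iterations=800):
        for _ in range(iterations):
            # 1. Selection: descend by UCB until reaching an unexpanded node.
            node = root
            while node.children:
                node = max(node.children, key=lambda n: n.ucb())
            # 2. Expansion: add one child per legal move.
            if not node.state.is_terminal():
                node.children = [Node(node.state.apply(m), parent=node)
                                 for m in node.state.legal_moves()]
            # 3. Evaluation: a trained value network scores the leaf.
            #    A "classic" MCTS would run a random playout here instead.
            value = value_net(node.state)
            # 4. Backpropagation: update statistics up to the root.
            while node is not None:
                node.visits += 1
                node.value_sum += value
                node = node.parent
        # Recommend the most-visited move (simplified: no sign flipping
        # between players and no prior policy term as in PUCT).
        return max(root.children, key=lambda n: n.visits)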
I feel many people are too ready to dismiss the results of LLMs as "random", and I'm afraid there is some element of seeing what one wants to see (i.e. believing LLMs are toys, because if they are not, we will lose our jobs).
On top of that, you are looking at results in cell lines, which might not reflect the true nature of what would happen in vivo (in a mouse model or a human).
So there is domain-specific knowledge which one would like to take into account for decision-making. For me, I would trust a Molecular Tumor Board with hematologists, clinicians, and possibly computational biologists :) over a language model's random tree search for treating my acute myeloid leukemia, but this is a personal choice.
You'd forsake an amazing future based on copes like the precautionary principle or worse yet, a belief that work is good and people must be forced into it.
The tears of butthurt scientists, or artists who are automated out of existence because they refused to leverage or use AI systems to enhance themselves will be delicious.
The only reason that these companies aren't infinitely better than what Aaron Swartz tried to do is that they haven't open-accessed everything. Deepseek is pretty close (sans the exact dataset), and so is Mistral and apparently Meta?
Y'all talked real big about loving "actual" communism until it came for your intellectual property, now you all act like copyright trolls. Fuck that!
In any case, I don’t think I’m a Luddite. I use many AI tools in my research, including for idea generation. So far I have not found it to be very useful. Moreover, the things it could be useful for, such as automated data pipeline generation, it doesn’t do. I could imagine a series of agents where one designs pipelines and one fills in the code, etc., per node in the pipeline, but so far I haven't seen anything like that. If you have some kind of constructive recommendations in that direction, I’m happy to hear them.
Choosing a hypothesis to test is actually a hard problem, and one that a lot of humans do poorly, with significant impact on their subsequent career. From what I have seen as an outsider to academia, many of the people who choose good hypotheses for their dissertation describe it as having been lucky.
https://arxiv.org/abs/2407.11004
In essence, LLMs are quite good at writing the code to properly parse large amounts of unstructured text, rather than what a lot of people seem to be doing which is just shoveling data into an LLM's API and asking for transformations back.
That said, I requested early access.
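To make that concrete (a toy sketch; the log format and field names below are made up, not from the paper above): the LLM writes a small deterministic parser once, and that parser then chews through the bulk data without any further API calls.

    import re

    # Hypothetical example of the kind of parser an LLM might write once,
    # instead of every record being shoveled through a model API.
    LINE_RE = re.compile(
        r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
        r"(?P<sample>[A-Z]{2}\d{4})\s+"
        r"(?P<value>-?\d+(?:\.\d+)?)"
    )

    def parse(lines):
        for line in lines:
            m = LINE_RE.search(line)
            if m:
                yield {"date": m["date"],
                       "sample": m["sample"],
                       "value": float(m["value"])}

    records = list(parse([
        "2024-11-02  AB1234   3.14   ok",
        "this line is malformed",
        "2024-11-03  CD5678  -0.5    retest",
    ]))
    print(records)  # two structured records; the malformed line is skipped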
This feels like hubris to me. The idea here isn't to assist you with menial tasks; the idea is to give you an AI generalist that might be able to alert you to things outside of your field that may be related to your work. It's not going to reduce your workload; in fact, it'll probably increase it, but the result should be better science.
I have a lot more faith in this use of LLMs than I do for it to do actual work. This would just guide you to speak with another expert in a different field and then you take it from there.
> In many fields, this presents a breadth and depth conundrum, since it is challenging to navigate the rapid growth in the rate of scientific publications while integrating insights from unfamiliar domains.
No, any scientist has hundreds of ideas they would like to test. It's just part of the job. The hard thing is to do the rigorous testing itself.
This. Rigorous testing is hard and it requires a high degree of intuition and intellectual humility. When I'm evaluating something as part of my research, I'm constantly asking: "Am I asking the right questions?" "Am I looking at the right metrics?" "Are the results noisy, to what extent, and how much does it matter?" and "Am I introducing confounding effects?" It's really hard to do this at scale and quickly. It necessarily requires slow, measured thought, which computers really can't help with.
That might be a good goal. It doesn't seem to be the goal of this project.
“A groundbreaking new study of over 1,000 scientists at a major U.S. materials science firm reveals a disturbing paradox: When paired with AI systems, top researchers become extraordinarily more productive – and extraordinarily less satisfied with their work. The numbers tell a stark story: AI assistance helped scientists discover 44% more materials and increased patent filings by 39%. But here's the twist: 82% of these same scientists reported feeling less fulfilled in their jobs.”
Quote from https://futureofbeinghuman.com/p/is-ai-poised-to-suck-the-so...
Referencing this study https://aidantr.github.io/files/AI_innovation.pdf
AI chat is a massive productivity enhancer, but, when coding via prompts, I'm not able to hit the super satisfying developer flow state that I get into via normal coding.
Copilot is less of a productivity boost, but also less of a flow state blocker.
These are scientists that have cultivated a particular workflow/work habits over years, even decades. To a significant extent, I'm sure their workflow is shaped by what they find fulfilling.
That they report less fulfillment when tasked with working under a new methodology, especially one that they feel little to no mastery over, is not terribly surprising.
I only recently started using aider[1].
My experience with it can be described in 3 words.
Wow!
Oh wow!
It was amazing. I was writing a throwaway script for one-time use (not for work). It wrote it for me in under 15 minutes (and that includes my time getting familiar with the tool!). No bugs.
So I decided to see how far I could take it. I added command line arguments, logging, and a whole bunch of other things. After a full hour, I had a production ready script - complete with logs, etc. I had to debug code only once.
I may write high quality code for work, but for personal throwaway scripts, I'm sloppy. I would not put a command line parser, nor any logging. This did it all for me for very cheap!
There's no going back. For simple scripts like this, I will definitely use aider.
And yeah, there was definitely none of the satisfaction one would derive from coding. It was truly addictive. I want to use it more and more. And no matter how much I use it and like the results, it doesn't scratch my programmer's itch. It's nowhere near the fun/satisfaction of SW development.
They're very effective at making changes for the most part, but boy you need to keep them on a leash if you care about what those changes are.
I think I could accept an AI prompting me instead of the other way around. Something to ask you a checklist of problems and how you will address them.
I’d also love to have someone apply AI techniques to property based testing. The process of narrowing down from 2^32 inputs to six interesting ones works better if it’s faster.
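For reference, here is what the non-AI version of that looks like today with Hypothesis in Python (the encode/decode pair is a made-up stand-in for whatever property you actually care about): the library already searches the 2^32-value space and shrinks any failure down to a small counterexample, so the AI angle would be making that search smarter and faster.

    from hypothesis import given, strategies as st

    def encode(n: int) -> bytes:
        # Hypothetical system under test: 32-bit big-endian encoding.
        return n.to_bytes(4, "big")

    def decode(b: bytes) -> int:
        return int.from_bytes(b, "big")

    # Round-trip property over all 2^32 inputs; Hypothesis samples the space
    # and, if the property ever fails, shrinks to a minimal counterexample.
    @given(st.integers(min_value=0, max_value=2**32 - 1))
    def test_round_trip(n):
        assert decode(encode(n)) == n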
For example, in this Google essay they make the claim that CRISPR was a transdisciplinary endeavor, "which combined expertise ranging from microbiology to genetics to molecular biology", and this is the basis of their argument that an AI co-scientist will be better able to integrate multiple fields at once to generate novel and better hypotheses. For one, what they fail to understand as computer scientists (I suspect due to not being intimately familiar with biomedical research) is that microbio/genetics/mol bio are more closely linked than you might expect as a layperson. There is no large leap between microbiology and genetics that would slow down someone like Doudna or even myself; I use techniques from multiple domains in my daily work. These all fall under the general broad domain of what I'll call "cellular/micro biology". As another example, Dario Amodei of Anthropic wrote something similar in his essay Machines of Loving Grace: that the limiting factor in biomedicine is a lack of "talented, creative researchers", for which AI could fill the gap [1].
The problem with both of these ideas is that they misunderstand the rate-limiting factor in biomedical research. Which to them is a lack of good ideas. And this is very much not the case. Biologists have tons of good ideas. The rate limiting step is testing all these good ideas with sufficient rigor to either continue exploring that particular hypothesis or whether to abandon the project for something else. From my own work, the hypothesis driving my thesis I came up with over the course of a month or two. The actual amount of work prescribed by my thesis committee to fully explore whether or not it was correct? 3 years or so worth of work. Good ideas are cheap in this field.
Overall I think these views stem from field-specific nuances that don't necessarily translate. I'm not a computer scientist, but I imagine that in computer science the rate-limiting factor is not actually testing out hypotheses but generating good ones. It's not like the code you write will take multiple months to run before you get an answer to your question (maybe it will? I'm not educated enough about this to make a hard claim. In biology, it is very common for one experiment to take multiple months before you know the answer to your question, or even whether the experiment failed and you have to do it again). But I'm happy to hear from a CS PhD or researcher about this.
All this being said, I am a big fan of AI. I try to use ChatGPT all the time: I ask it research questions, ask it to search the literature and summarize findings, etc. I even used it literally yesterday to make a deep dive into a somewhat unfamiliar branch of developmental biology easier (and I was very satisfied with the result). But for scientific design, hypothesis generation? At the moment, useless. AI and other LLMs at this point are a very powerful version of Google and a code writer. And it's not even correct 30% of the time to boot, so you have to be extremely careful when using it. I do think that wasting less time exploring hypotheses that are incorrect or bad is a good thing. But the problem here is that we can pretty easily identify good and bad hypotheses already. We don't need AI for that; what takes time is the actual testing of these hypotheses, which slows down research. Oh, and politics, which I doubt AI can magic away for us.
[1] https://darioamodei.com/machines-of-loving-grace#1-biology-a...
It's generally very easy to marginally move the needle in drug discovery. It's very hard to move the needle enough to justify the cost.
What is challenging is culling ideas, and having enough SNR in your readouts to really trust them.
Maybe this kind of AI-based exploration would lower the costs. The more something is automated, the cheaper it should be to test many concepts in parallel.
But no one is going to bring it to market because it costs millions and millions to synthesize, get through PK, ADMET, mouse, rat and dog tox, clinicals, etc. And the FDA won't approve marginal drugs, they need to be significantly better than the SoC (with some exceptions).
Point is, coming up with new ideas is cheap, easy, and doesn't need help. Synthesizing and testing is expensive and difficult.
If you could run that on a few thousand targets and a few million molecules in a month, you'd be able to make a compelling argument to the committee that approves molecules to go into development (probability of approval * revenue >> cost of approval)
Which seems a hard thing to disprove.
In which case, if some rival of his had done the same search a month earlier, could he have claimed priority? And would the question of whether the idea had leaked then have been a bit more salient to him? (Though it seems the decade of work might be the important bit, not the general idea.)
[1] https://jdstillwater.blogspot.com/2012/05/i-put-toaster-in-d...
> Generating hypotheses is the fun, exciting part that I doubt scientists want to outsource to AI
The latter doesn’t imply the former
mechanical turk, but for biology
It's mind-blowing to think that AI can now collaborate with scientists to accelerate breakthroughs in various fields.
This collaboration isn't just about augmenting human capabilities, but also about redefining what it means to be a scientist. By leveraging AI as an extension of their own minds, researchers can tap into new areas of inquiry and push the boundaries of knowledge at an unprecedented pace.
Here are some key implications of this development:
• AI-powered analysis can process vast amounts of data in seconds, freeing up human researchers to focus on high-level insights and creative problem-solving.
• This synergy between humans and AI enables a more holistic understanding of complex systems and phenomena, allowing for the identification of new patterns and relationships that might have gone unnoticed otherwise.
• The accelerated pace of discovery facilitated by AI co-scientists will likely lead to new breakthroughs in fields like medicine, climate science, and materials engineering.
But here's the million-dollar question: as we continue to integrate AI into scientific research, what does this mean for the role of human researchers themselves? Will they become increasingly specialized and narrow-focused, or will they adapt by becoming more interdisciplinary and collaborative?
This development has me thinking about my own experiences working with interdisciplinary teams. One thing that's clear is that the most successful projects are those where individuals from different backgrounds come together to share their unique perspectives and expertise.
I'm curious to hear from others: what do you think the future holds for human-AI collaboration in scientific research? Will we see a new era of unprecedented breakthroughs, or will we need to address some of the challenges that arise as we rely more heavily on AI to drive innovation?