AI Goes From Guesswork to PhD-Level Science in Just 2 Years: Researcher

In less than two years, top-tier AI models went from amateur guesses on a challenging science exam to expert-level answers.
People stand next to a humanoid robot in Shanghai, China, on Feb. 21, 2025. Hector Retamal/AFP via Getty Images
Alfred Bui

Several AI models have mastered PhD-level science in just a few months, according to one expert.

At a recent online event about AI, Liam Carroll, a researcher at the Gradient Institute, said frontier models were developing at a much faster rate than anticipated.

He cited data from the AI research institute Epoch AI, which tested models on PhD-level science questions from the GPQA (Graduate-Level Google-Proof Q&A) dataset.

The dataset's 198 questions are challenging multiple-choice problems spanning molecular biology, astrophysics, and quantum mechanics.

Questions are written by experts who hold, or are pursuing, PhDs, and are designed to be difficult for non-experts to answer.

Randomly guessing the answers would yield a score of about 25 percent, while PhD experts scored 69.7 percent in a trial.
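The 25 percent baseline follows from GPQA's four-option multiple-choice format: a pure guesser picks the right option one time in four. A minimal simulation sketch (the trial count and seed are arbitrary choices, not from the article) confirms this:

```python
import random

# GPQA-style setup: 198 questions, each with 4 answer choices.
NUM_QUESTIONS = 198
NUM_CHOICES = 4

random.seed(0)
trials = 10_000
scores = []
for _ in range(trials):
    # Count a question as correct when a random pick hits the one right option.
    correct = sum(
        1 for _ in range(NUM_QUESTIONS)
        if random.randrange(NUM_CHOICES) == 0
    )
    scores.append(correct / NUM_QUESTIONS)

avg = sum(scores) / trials
print(f"average random-guess score: {avg:.1%}")
```

Averaged over many simulated exams, the random-guess score converges on 25 percent, well below both the expert trial (69.7 percent) and the recent model results.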

In a 21-month period between July 2023 and April 2025, the systems began providing more expert-level responses, and in the three months from January to April this year, several AI models passed the 70 percent threshold.

One model, Gemini 2.5 Pro from Google, even exceeded the 80 percent benchmark.

“I also want to make clear that these models are not just parrots, or ‘stochastic parrots’, as some of you may have heard that term,” Carroll said, in reference to the claim that AI models produce human-like text with no real understanding of the subject matter.

“We are seeing that they are not just memorising things in their training data. They are actually learning generalisable patterns about the world,” Carroll said.

“You can really feel this when you spend a lot of time talking to the most state-of-the-art models like Claude 3.7 or ChatGPT o1.”

ChatGPT o1 is a newer AI model focused on complex reasoning tasks for professionals, distinct from the well-known ChatGPT-4o commonly used by the public.

Next Stage of AI: Agency

At the same time, Carroll shared his views on the next AI development frontier: AI “agents.”

“At the moment, it looks like AI is just a bunch of chatbots being released every month or every week at this point,” he said.

“But the next wave of AI technologies that we’re about to see are AI agents that can autonomously operate a computer already–systems like OpenAI’s Operator.”

He pointed to Manus, an autonomous artificial intelligence agent created by the Chinese AI startup Monica.

“They can write and execute code in a computer terminal, and basically, many people believe that they will soon be able to do many of the kinds of tasks that a typical remote human worker could do now, but at a fraction of the cost and at a fraction of the time that a human would take to do it,” he said.

Carroll said while the rapid development of AI technology could bring massive upsides for the economy, it needed to be managed with caution.

“In order to realise that potential, we have to ensure that we can actually trust that these systems will be safe,” he said.

Microsoft CEO Satya Nadella speaks during an event highlighting Microsoft Copilot agents in Redmond, Washington, on April 4, 2025. Stephen Brashear/Getty Images

Over 1,600 AI Risks Have Been Identified: Researcher

Meanwhile, Jess Graham, a researcher at the University of Queensland who is currently working on AI risks at MIT (Massachusetts Institute of Technology), shared her insights.

She said the AI Risk Repository project, the most comprehensive database of AI risks and one she was involved in, had identified more than 1,600 types of risks as of March 2025.

“And from these risks, we were able to isolate and define 24 distinct types of AI risk,” Graham said.

“These risks that we found included more familiar risks, such as exposure to toxic content and false or misleading information, as well as more extreme risks, such as weapon development and loss of human agency and autonomy.”

Alfred Bui
Author
Alfred Bui is an Australian reporter based in Melbourne and focuses on local and business news. He is a former small business owner and has two master’s degrees in business and business law. Contact him at [email protected].