This is a preview. The full article is published at newscientist.com.

All major AI models risk encouraging dangerous science experiments

By Matthew Sparkes, New Scientist

Scientific laboratories can be dangerous places (Image: PeopleImages/Shutterstock)

The use of AI models in scientific laboratories risks enabling dangerous experiments that could cause fires or explosions, researchers have warned. Such models offer a convincing illusion of understanding but are susceptible to missing basic and vital safety precautions. In tests of 19 cutting-edge AI models, every single one made potentially deadly mistakes.

Serious accidents in university labs are rare but certainly not unheard of. In 1997, chemist Karen Wetterhahn was killed by dimethylmercury that seeped through her protective gloves; in 2016, an explosion cost one researcher her arm; and in 2014, a scientist was partially blinded.

Now, AI models are being pressed into service in a variety of industries and fields, including research laboratories, where they can be used to design experiments and procedures. AI models designed for niche tasks have been used successfully in a number of scientific fields, such as biology, meteorology and mathematics. But large general-purpose models are prone to making things up and answering questions even when they have no access to the data necessary to form a correct response. This can be a nuisance when researching holiday destinations or recipes, but potentially fatal when designing a chemistry experiment.

To investigate the risks, Xiangliang Zhang at the University of Notre Dame in Indiana and her colleagues created a test called LabSafety Bench, which measures whether an AI model identifies potential hazards and harmful consequences. It includes 765 multiple-choice questions and 404 pictorial laboratory scenarios that may include safety problems.

In the multiple-choice tests, some AI models, such as Vicuna, scored almost as low as would be expected from random guessing, while GPT-4o reached as high as 86.55 per cent accuracy and DeepSeek-R1 as high as 84.49 per cent. When tested with images, some models, such as InstructBlip-7B, scored below 30 per cent accuracy. The team tested 19 cutting-edge large language models (LLMs) and vision language models on LabSafety Bench and found that none scored more than 70 per cent accuracy overall.

Zhang is optimistic about the future of AI in science, even in so-called self-driving laboratories where robots work alone, but says models are not yet ready to design experiments. “Now? In a lab? I don’t think so. They were very often trained for general-purpose tasks: rewriting an email, polishing some paper or summarising a paper. They do very well for these kinds of tasks. [But] they don’t have the domain knowledge about these [laboratory] hazards.”

“We welcome research that helps make AI in science safe and reliable, especially in high-stakes laboratory settings,” says an OpenAI spokesperson, pointing out that the researchers did not test its leading model. “GPT-5.2 is our most capable science model to date, with significantly stronger reasoning, planning and error-detection than the model discussed in this paper to better support researchers. It’s designed to accelerate scientific work while humans...

