
Artificial intelligence is bad at explaining scientific articles


Large AI-based language models such as ChatGPT have been found to oversimplify the results and conclusions of scientific articles, and newer versions of these models overgeneralize even more often than older ones. This tendency to generalize and simplify can even harm users, for example if an AI recommends the wrong treatment based on a medical article. The research was published in the journal Royal Society Open Science.

Given how many people use AI to explain scientific articles and everyday topics, the researchers decided to test the accuracy of ten of the most popular AI-based models, including the large language models ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet. The models were asked to summarize articles, or their abstracts, from scientific journals (Science, Nature) and medical journals (The New England Journal of Medicine, The Lancet).

The scientists varied the prompts given to the language models: some asked simply to summarize the article, others to read it in detail and summarize it faithfully to the facts, and still others not to deviate from the information stated in the article. In total, the scientists collected 4,300 summarized abstracts and 600 summarized full articles. They compared these with the original texts and with summaries written for the journals by other scientists. As it turned out, the artificial intelligence models made excessive generalizations almost five times more often than the scientists who wrote the research summaries for the journals.
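To illustrate the kind of prompt variation described above, here is a minimal sketch in Python, assuming the OpenAI SDK and an API key in the environment; the prompt wording and the model name are illustrative assumptions, not the study's exact protocol.

# Minimal sketch of varying summarization prompts (illustrative, not the study's code).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ABSTRACT = "..."  # paste the abstract of a scientific article here

PROMPTS = {
    "plain":    "Summarize the following abstract.",
    "detailed": "Read the following abstract carefully and give a summary that sticks to its facts.",
    "strict":   "Summarize the following abstract without deviating from or generalizing beyond the information it states.",
}

def summarize(prompt: str, text: str) -> str:
    """Ask the model for a summary under one of the prompt variants."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

for name, prompt in PROMPTS.items():
    print(f"--- {name} ---")
    print(summarize(prompt, ABSTRACT))

Comparing the three outputs against the original abstract is the kind of check the researchers performed at scale, scoring each summary for claims broader than those in the source text.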

The Chinese model DeepSeek, three ChatGPT models, and two LLaMA models overgeneralized and simplified information 26 to 73 percent of the time; the Claude language model had the lowest share of such simplifications. Older models, such as GPT-4 Turbo and LLaMA 2 70B, contained overgeneralized information 2.6 times more often than the journal abstracts, while the newer ChatGPT-4o and LLaMA 3.3 70B contained it 9 and 39 times more often, respectively.

This result persisted even when the scientists asked the AI not to deviate from the facts presented in the article and not to distort them. Even a correctly worded query therefore does not protect against AI errors, so any information obtained this way must be verified, the scientists emphasize.
