ChatGPT as Academic Co-Author

TL;DR: New research reveals ChatGPT has accumulated nearly 2,000 academic citations despite publisher bans on AI authorship, while a separate study published in BMJ Open found that more than half of AI chatbot responses to medical questions contain inaccurate or fabricated information.

The question of how much to trust artificial intelligence in academic and medical contexts has become harder to avoid. A study from web intelligence platform Oxylabs found that ChatGPT accumulated 1,952 citations across 42 co-authored scholarly papers between 2022 and 2025, spanning 12 fields and 6 languages. That figure alone would be notable for any early-career researcher, let alone a language model.

The Oxylabs research team used Google Scholar data to trace ChatGPT’s presence across disciplines from computer science, which accounts for 23 of the 42 papers, to philosophy, nursing, and education. Papers appeared in journals from publishers including Elsevier, Springer, and SAGE, all of which explicitly prohibit crediting AI as an author.
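
Tracing that kind of footprint is feasible with public data alone. Here is a rough sketch using the third-party scholarly library, a Google Scholar scraper (Google Scholar has no official API, and this is an illustrative stand-in, not the Oxylabs pipeline):

```python
# pip install scholarly -- a third-party Google Scholar scraper.
from scholarly import scholarly

# Fetch the first few search hits mentioning ChatGPT and print their venues.
# Google Scholar rate-limits scrapers aggressively; a real audit would add
# proxies, retries, and manual checks that ChatGPT is actually credited
# as an author rather than merely mentioned.
results = scholarly.search_pubs("ChatGPT co-author")
for _ in range(5):
    pub = next(results)
    bib = pub["bib"]
    print(bib.get("title"), "--", bib.get("venue", "unknown venue"))
```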

The analysis also found that ChatGPT achieved an m-index of 2. The m-index is the h-index divided by the number of years since an author’s first publication, and Hirsch, who proposed the measure, associated a value of 2 with outstanding scientists. ChatGPT’s multilingual footprint stretches across English, Spanish, German, French, Portuguese, and Indonesian publications, suggesting its use as a writing aid was not confined to any single region or research culture.
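
To make the arithmetic concrete, here is a minimal sketch with invented per-paper citation counts (illustrative only, not the study’s data):

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def m_index(citations: list[int], years_active: float) -> float:
    """Hirsch's m-quotient: h-index divided by years since first publication."""
    return h_index(citations) / years_active

# Invented per-paper citation counts -- illustrative only, not the Oxylabs data.
paper_citations = [400, 310, 150, 90, 40, 12, 5, 3, 2, 1]
print(h_index(paper_citations))      # 6
print(m_index(paper_citations, 3))   # 2.0 -- the value Hirsch tied to "outstanding"
```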

After the broader academic community reinforced authorship policies in early 2023, formal co-author credits declined. However, the Oxylabs data shows ChatGPT’s influence persisted in subtler ways, including through unedited output left inside submitted papers, such as the phrase “certainly, here is a possible introduction to your topic.”
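
Screening manuscripts for that kind of residue is mechanically simple. Here is a minimal sketch assuming a hand-picked list of telltale chat phrases (the list and the approach are illustrative, not the Oxylabs methodology):

```python
# Hand-picked phrases that commonly signal unedited chatbot output.
# This list is illustrative; a real screen would need a curated corpus.
TELLTALE_PHRASES = [
    "certainly, here is a possible introduction",
    "as an ai language model",
    "regenerate response",
]

def flag_ai_residue(text: str) -> list[str]:
    """Return the telltale phrases found in a manuscript's text."""
    lowered = text.lower()
    return [p for p in TELLTALE_PHRASES if p in lowered]

manuscript = "Certainly, here is a possible introduction to your topic: ..."
print(flag_ai_residue(manuscript))
# ['certainly, here is a possible introduction']
```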

The Accuracy Problem Behind the Citations

The citation count would be less alarming if the underlying content were reliable. A study published in BMJ Open and reported by PA Media found that more than half of AI chatbot responses to 50 medical questions were rated as problematic. Researchers from the University of Alberta and Loughborough University’s School of Sport, Exercise and Health Sciences tested five major chatbots across topics including vaccine safety, cancer treatment, and nutrition.

Grok returned the most problematic responses at 58 percent, followed by ChatGPT at 52 percent and Meta AI at 50 percent. The study also found that citations within those responses were “frequently incomplete or fabricated.” Separate prior work cited in the BMJ Open study found that only 32 percent of more than 500 citations from ChatGPT, ScholarGPT, and DeepSeek were accurate, with nearly half at least partially fabricated.

The researchers described chatbots as systems that “do not reason or weigh evidence” but instead generate outputs by predicting likely word sequences from training data. That process produces what the study called “authoritative-sounding but potentially flawed responses,” a characteristic that becomes particularly dangerous when the audience is a patient or a student citing academic sources.
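
That “predicting likely word sequences” point can be made concrete with a toy model. The sketch below builds bigram counts from a tiny invented corpus and always emits the most frequent next word; the output is fluent but reflects only frequency, never evidence (a deliberately crude stand-in for what large models do at scale):

```python
from collections import defaultdict, Counter

# A toy next-word predictor: count which word follows which in a corpus,
# then always emit the most frequent successor. No reasoning, no evidence,
# just frequency -- a crude stand-in for next-token prediction at scale.
corpus = (
    "the therapy is safe the therapy is promising "
    "the therapy is safe the evidence is limited"
).split()

successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def continue_text(word: str, steps: int = 4) -> str:
    out = [word]
    for _ in range(steps):
        if word not in successors:
            break
        word = successors[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(continue_text("the"))  # "the therapy is safe the" -- fluent, not verified
```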

Where the Risk Concentrates

  • Chatbots performed worst in stem cell therapy, athletic performance, and nutrition, areas where evidence is contested or rapidly evolving
  • Models showed sycophancy, meaning they tended to prioritize answers that matched user beliefs over factual accuracy
  • Publishers including Elsevier and Springer ban AI authorship, yet the Oxylabs data confirms those policies have not fully prevented AI from entering the scholarly record
  • AI-generated text left unedited inside submitted papers suggests some researchers are using chatbots without adequate review of the output

Analysis

The authorship question is genuinely thorny. Most major publishers banned AI co-authorship in early 2023, and the Oxylabs data shows that explicit ChatGPT credits did decline after that point. But the problem didn’t go away. It just became less visible. Researchers may still be using AI to draft significant portions of papers without disclosing it, a practice peer review was never designed to catch.

The medical misinformation findings are harder to dismiss. Chatbots performed worst on stem cells, nutrition, and athletic performance, which happen to be areas where people often make high-stakes personal decisions. The researchers were clear that these tools don’t reason or weigh evidence. They predict likely word sequences. That’s a meaningful distinction when someone is asking whether a diet is healthy or whether a stem cell therapy works.

There’s a real opportunity here too. The fact that Oxylabs could track ChatGPT’s academic footprint using public web data and natural language prompts shows that oversight tools are getting better. If publishers and institutions used similar methods to audit submissions, they could catch undisclosed AI use and citation fabrication before papers reach readers.
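
Citation auditing in particular is tractable with public infrastructure. Below is a minimal sketch against the Crossref REST API (the endpoint and parameters are real; the exact-match threshold and the workflow itself are assumptions for illustration, not a tool either study describes):

```python
import requests

def citation_exists(title: str) -> bool:
    """Ask Crossref whether a reference title matches an indexed work.

    The exact-title comparison is deliberately naive; a production audit
    would add fuzzy matching plus DOI, author, and year checks.
    """
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        for found in item.get("title", []):
            if found.strip().lower() == title.strip().lower():
                return True
    return False

# A fabricated reference should come back False; a real title should pass.
print(citation_exists("A plausible-sounding paper that does not exist"))
```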

The risk is complacency. Both studies suggest the academic and medical communities are still catching up to how quickly these tools spread. The citation fabrication problem is particularly concerning because fabricated references can propagate through later research if nobody checks the original source. A paper that looks credible because it has citations isn’t credible if those citations don’t hold up.

Key Takeaways

  • ChatGPT has accumulated nearly 2,000 citations across 42 academic papers despite publisher bans on AI authorship, according to Oxylabs research
  • More than half of AI responses to medical questions were rated problematic in a BMJ Open study, with citation fabrication a documented pattern
  • The accuracy risk is highest in fast-moving or contested fields like nutrition, stem cell therapy, and athletic performance
  • Researchers and students relying on AI-generated citations should verify each source independently before including it in academic work
  • Regulators and academic institutions face mounting pressure to establish oversight frameworks before AI-generated content becomes further embedded in the published record
