By Dylan Marks
With artificial intelligence (AI) detection programs lagging behind the surge in AI usage amongst post-secondary students, experts say universities are struggling to effectively identify cases of academic misconduct involving it.
AI usage is on the rise as students rely on different generative artificial intelligence (GAI) models like ChatGPT, Quillbot, Grammarly and DALL-E to complete graded assignments. In a survey conducted by the Digital Education Council, 86 per cent of students said they use artificial intelligence in their studies and do so regularly.
Soroush Sabbaghan, an associate professor at the University of Calgary’s Werklund School of Education and an expert in generative AI, shared his thoughts on the current state of AI detection. “We simply do not have the technology to detect AI-generated content,” he explained.
“What we have now is not accurate at all. I think [we] need to move away from [current detection applications] as it is simply not possible, at the moment, to detect generative AI,” said Sabbaghan.
He also explained that, even before false positives on detection reports, a large issue with AI detection lies within the unclear parameters surrounding what constitutes AI-based academic misconduct. “At the moment, all AI tools have a risk towards academic integrity, but at the same time, we have to define what academic integrity actually is,” he said.
“We simply do not have the technology to detect AI-generated content”
According to Toronto Metropolitan University’s (TMU) academic integrity webpage, academic misconduct includes any action that prevents the university from accurately assessing a student’s academic performance. It is defined as “any behaviour that a student knew, or reasonably ought to have known, [that] could gain them or others unearned academic advantage.”
When it comes to how academic integrity cases are treated at TMU specifically, it is up to the academic integrity office to decide what is and is not fair use of AI.
According to the TMU academic integrity office webpage, “unless explicitly stated by the instructor, students should assume that using AI to complete assessments is prohibited.”
When it comes to finding AI-generated work within submitted class assignments, a recent study published in the International Journal for Educational Integrity determined that not all AI detection tools are completely accurate. The study’s findings carry major implications for plagiarism detection, stressing the need for detection programs to keep pace with the progress of AI text generation.
In a test article containing 15 AI-generated paragraphs and only five human-written paragraphs, the study found OpenAI’s AI text classifier, created specifically to detect AI-generated work, misidentified nearly 10 per cent of the human-written work as AI.
TMU, however, doesn’t endorse any AI detection software. According to an emailed statement from the Centre for Excellence in Learning & Teaching sent to The Eyeopener, academic misconduct cases are handled on a case-by-case basis without relying on quantitative data from AI-detection software.
“A fundamental part of the Policy 60 process is a discussion between the instructor and the student during which the student will have an opportunity to speak about their work and the process by which it was completed,” read the statement.
The school did not specify how accurate its methods are for finding AI in student work.
“These large language models can be configured to generate either a safe kind of text that looks pretty much like human text”
With TMU’s broad regulations surrounding academic misconduct and a lack of quantitative AI detection, one TMU student worries they are in the process of being flagged for academic misconduct as they await further feedback from their professor.
When asked about being flagged for AI use, a first-year urban planning student* said that although they did not use AI in their assignment and properly cited their sources, their low grade may be the result of their professor falsely suspecting AI usage.
“I should have gotten a pretty good mark because I provided all my sources and everything. I got a pretty bad mark and didn’t really get any feedback on why,” they said.
The student explained that this worry is rooted in their past experiences with the professor, who has been extremely adamant about AI restrictions in the class following previous instances of usage.
According to Majid Komeili, an associate professor at Carleton University’s School of Computer Science, detecting AI use means identifying whether an image or piece of text was created by an artificial intelligence large language model (LLM) or by a human.
Komeili also stressed that the detection process is usually done by analyzing the specific style of a text to see if it has features commonly produced by AI. These LLMs include any language models that can produce or detect AI-generated text, such as ChatGPT, Grammarly and Turnitin.
When it comes to finding AI-generated work, Komeili said, “There is a kind of competition between the generator that generates text, and the detectors.”
“We have a generator that tries to fool the detector, and the detector that tries to detect the real images or real text versus the fake synthesized ones. So generally speaking, the task of the detector is simpler,” he said.
“These large language models can be configured to generate either a safe kind of text that looks pretty much like human text, or it can be used to generate a more diverse kind of text,” Komeili said. “So then eventually I don’t think this reliance on text detectors is going to work in academic settings.”
According to findings from a recent study available through the National Library of Medicine, inaccurate AI detection doesn’t stop at the post-secondary level and has also been found in a recent scientific manuscript.
The study found that AI detectors misidentified chunks of human-written text as AI-generated content. The authors of the article reported that the AI text detector mistakenly flagged up to 8 per cent of genuine writing as AI-generated, emphasizing the current limitations of these detection tools.
“I’ve never personally been flagged, [but] I do know of friends that have been”
With heightened incidents of false positives in AI detection, TMU’s broad case-by-case regulations have some students avoiding AI altogether. This includes first-year computer engineering student Adam Horani, who tries to steer clear of using AI in his academic work to avoid being flagged for any amount of plagiarism.
“I don’t take the AI work and put it into my own work, that’s plagiarism. Even building off of it is a bit shaky, so I prefer not to do that [at all],” he said. “I’ve never personally been flagged, [but] I do know of friends that have been.”
Horani did mention, however, that he occasionally uses AI to summarize certain concepts to help him better understand what he’s learning, especially for courses with large readings such as physics. Within his academic work, AI-based content ends there.
As generative AI platforms like ChatGPT learn from user interactions and previously uploaded work, their rapidly advancing language models can produce realistic, personalized responses that are difficult for detection programs to distinguish from genuine student writing.
However, Sabbaghan explained how the ability for AI to replicate one’s natural writing style and fool detectors may be an indefinite problem. “It will become a cat and mouse game. So even if there are technologies that are coming in the horizon, somebody is just going to write a new model that’s going to break that system,” he said.
Both Sabbaghan and Komeili believe the solution could be as simple as reassessing academic testing and moving back to forms of assessment where students have to produce more authentic content.
“I think the easiest solution for [unfair AI usage] is to go back to a simple evaluation approach. In person exams for assignments and doing demonstrations rather than relying only on reports,” said Komeili.
*This source has been granted anonymity due to risks associated with Senate Policy 60. The Eyeopener has verified this source.