Johns Hopkins’ new AI detector exposes machine-written essays and fake news. Here’s how

The scientists believe their research could lead to better oversight of AI systems and identify which ones are at greater risk of misuse.


Representational image. Credit: Mikhail Rudenko/iStock

As artificial intelligence programs get smarter and more sophisticated, it’s getting harder to tell whether you’re talking to a human online, or whether AI was involved in writing that particularly brilliant essay you’ve come across. That’s a boon for cheaters and peddlers of misinformation.

A new tool from researchers at Johns Hopkins University offers a way to tell whether a piece of writing was created using AI, and can also pinpoint which particular large language model (LLM) was involved. It’s based on the idea that a person’s writing style exhibits unique, identifiable idiosyncrasies, and that the same is true of writing produced by an AI program.

According to Nicholas Andrews, a senior research scientist at Johns Hopkins’ Human Language Technology Center of Excellence, his team was the first to demonstrate that AI-generated text contains the same identifiable features found in human writing. These features allow researchers to detect AI text and match it to a specific language model.                           

The scientists believe their work could lead to stronger controls over AI programs and could pinpoint which ones are most likely to be abused. In an age of rampant online fakes and spam and increasing plagiarism in schools, identifying AI-generated text could be invaluable.

Detecting “fingerprints” 

Andrews first became interested in the subject in 2016, driven by concerns over online misinformation and foreign influence on social media, especially during US election cycles. Even before the advent of ChatGPT and similar LLMs, Andrews worked to create a “fingerprint” of a person’s online writing, which could be used to detect fakes.

“The big surprise was we built the system with no intention to apply it to machine writing and the model was trained before ChatGPT existed,” the researcher shared. “But the very features that helped distinguish human writers from each other were very successful at detecting machine writing’s fingerprints.”

The program he developed can determine whether ChatGPT, Gemini, or LLaMA was used to create a piece of writing by focusing on each model’s distinct linguistic fingerprint, which sets it apart both from human writers and from other models.

The machine-text detector GPTZero misidentifies the first review as human-written; in fact, all three reviews were machine-generated using GPT-4. Credit: Nicholas Andrews/Johns Hopkins University

The detection tool Andrews and his team developed was trained on anonymous writing samples obtained from Reddit. It works in any language and can be downloaded online for free.

While other AI detection tools like Turnitin or GPTZero exist, the team believes their approach is the most flexible and accurate. 

As Andrews explained, the concept they utilized originated with law enforcement, which learned to parse ransom notes and writings by suspected criminals to connect them to individuals. Andrews and his team scaled this up. They used neural networks and “lots of data” rather than humans to decide what writing features to identify.

Besides Andrews, project authors included Johns Hopkins doctoral student Aleem Khan, Kailin Koch and Barry Chen of the Lawrence Livermore National Laboratory, and Marcus Bishop of the U.S. Department of Defense. 

How AI models create “fingerprints”

Interesting Engineering reached out to Nicholas Andrews for exclusive insight on their work. In an email exchange, Andrews shared some examples of the linguistic fingerprints that allow their tool to identify AI writing.

“The neural network extracts 512-dimensional vectors that characterize each writing sample it is provided with (the ‘fingerprints’),” he wrote. This allows their model to capture very complex properties of the writing that human forensic linguists would find very hard, or nearly impossible, to describe.
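To make the comparison step concrete, here is a minimal sketch of how such fingerprints can be extracted and compared. The team’s released style-representation model is not shown here; a general-purpose sentence encoder ("all-MiniLM-L6-v2", which produces 384-dimensional vectors rather than 512) stands in for it so the example runs as written.

```python
# A minimal sketch of fingerprint extraction and comparison.
# NOTE: "all-MiniLM-L6-v2" is a stand-in general-purpose encoder,
# not the Johns Hopkins team's style-representation model.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def fingerprint(samples: list[str]) -> np.ndarray:
    """Embed each writing sample and average into one style vector."""
    vecs = encoder.encode(samples, normalize_embeddings=True)
    v = vecs.mean(axis=0)
    return v / np.linalg.norm(v)  # unit-length "fingerprint"

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two unit-length fingerprints."""
    return float(a @ b)

human = fingerprint(["I reckon the movie dragged a bit, but the ending landed."])
machine = fingerprint(["The film is a masterful exploration of resilience and hope."])
print(similarity(human, machine))
```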

He mentioned that this “deep learning” approach has a downside: the fingerprints are not immediately interpretable. However, the researchers have done additional work to understand more precisely what the models are learning. One example he shared is that the fingerprints turned out to be largely insensitive to content words (like nouns), suggesting that the models focus on writing style rather than topic.
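One simple way to sanity-check that kind of claim, offered here as an illustrative probe rather than the team’s own method, is to swap the topic words in a sentence and confirm that its fingerprint barely moves. This snippet reuses the fingerprint() and similarity() helpers from the sketch above.

```python
# Hypothetical style-vs-topic probe: same sentence structure, topic
# nouns swapped. If the fingerprints encode style rather than subject
# matter, these two samples should still score as highly similar.
original = "The senator praised the committee for its report on the economy."
swapped = "The chef praised the kitchen for its menu in the spring."
print(similarity(fingerprint([original]), fingerprint([swapped])))
```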

Can AI avoid detection? 

As Andrews elaborated, in their paper “Few-Shot Detection of Machine-Generated Text using Style Representations,” the team considered two “attacks” on their method: in the first, they prompted the AI to write in the style of a real example of human text; in the second, they paraphrased portions of the AI-generated document, a well-known way to degrade the performance of machine-text detectors.

To their surprise, the researchers found that the first approach, writing in the style of a human, was not particularly successful at confusing the detector, suggesting that LLMs can imitate human styles only superficially. The second, paraphrasing, was more effective at degrading the tool’s performance, which motivated them to strengthen it.

In response, they created a “fingerprint” for the paraphrasing model itself, which helps identify text altered in that way. The researchers acknowledge, however, that manual paraphrasing, in which a human takes an AI-generated text and rewords it by hand, remains a possible way to evade detection, since defending against it requires educated guesses about how potential “adversaries” might choose to paraphrase the text.

One way to address this issue, especially in a classroom where a teacher wants to ensure students aren’t turning in AI-generated material and defeating detection, is to train the tool on students’ past writing so it learns their individual styles, as sketched below.
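A minimal version of that classroom idea, again reusing the helpers from the first sketch, might calibrate on a student’s past essays and flag a new submission whose style drifts too far. The sample texts and the 0.5 threshold are illustrative guesses, not values from the paper.

```python
# Build a reference fingerprint from a student's known past work,
# then compare new submissions against it.
past_essays = [
    "My first lab report, written back in September...",
    "An essay on the causes of the French Revolution...",
]
student_style = fingerprint(past_essays)

def looks_like_student(new_essay: str, threshold: float = 0.5) -> bool:
    """Flag essays whose style similarity falls below the threshold."""
    return similarity(fingerprint([new_essay]), student_style) >= threshold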

Potential applications

As Andrews shared, the model they released can be used by anyone with some basic Python programming knowledge to extract “fingerprints” from any writing sample. These fingerprints can accurately pinpoint writing by AI models like GPT-4.

“What’s new about our approach is that it makes it very easy for end-users to build specialized detectors for their particular settings,” he stated, adding, “For example, a university professor could preemptively prompt GPT-4, or other models they suspect students could plagiarize from, to build a “fingerprint” tailored to their particular setting.” The team’s experiments show that this can produce very “robust” detectors with “very low false alarm rates (<1%).”
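As a rough illustration of that preemptive idea, again reusing the helpers from the first sketch, a professor could fingerprint a handful of GPT-4 completions of their own essay prompt alongside known human work, then score new submissions by which reference they sit closer to. The sample texts and the nearest-centroid decision rule here are illustrative assumptions, not the paper’s exact procedure.

```python
# Fingerprint a few machine-written and human-written reference
# samples, then classify new essays by the closer centroid.
machine_samples = [
    "Throughout history, innovation has served as a catalyst for change...",
    "In conclusion, the themes explored above underscore the importance...",
]
human_samples = [
    "Honestly, I wasn't sure where to start with this prompt, so...",
    "My grandmother used to say that history repeats itself, and...",
]
machine_style = fingerprint(machine_samples)
human_style = fingerprint(human_samples)

def probably_machine(essay: str) -> bool:
    """Classify by whichever reference fingerprint is closer."""
    f = fingerprint([essay])
    return similarity(f, machine_style) > similarity(f, human_style)
```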

Interestingly, when the team recently presented their work at the International Conference on Learning Representations, the lead author, Rafael Rivera Soto, a first-year Johns Hopkins PhD student advised by Andrews, created a demo of the tool with some telling results. When he ran all the peer reviews from the conference through their detector, it found that about 10 percent of the reviews were likely machine-generated.


ABOUT THE AUTHOR

Paul Ratner is a writer, award-winning filmmaker, and educator. He has written for years for Interesting Engineering, Big Think, Huffington Post, and other publications, focusing on stories of paradigm shifts in science, technology, and history. Paul lives in sunny Sarasota, Florida.