How did we turn an AI plagiarism detector idea into a NeurIPS paper? A brief review of HULLMI
Introduction
In recent months, the emergence of artificial intelligence (AI) and its application across domains has raised significant concerns, especially in educational settings. One notable challenge is determining whether a piece of writing was authored by a human or generated by AI, such as large language models (LLMs). To explore this issue, we published a paper titled “HULLMI” (short for “Human versus LLM Identification with Explainability”), which aims to detect AI-generated text.
The Background of the Research
Many educators have expressed concerns regarding academic integrity, particularly in light of students potentially using AI tools to complete assignments. Traditional concerns focused on whether an assignment was written by the student or by someone else, such as a parent, but the advent of AI has added a new dimension to this problem. During school visits, we were repeatedly asked for a solution that could help discern whether a student’s assignment was genuinely their own work or produced by AI.
Recognizing that existing AI text detectors did not fully address this issue, we embarked on a research project to build a more robust model for identifying human- and AI-generated text. Our research is not limited to education; it also carries implications for journalism, cybersecurity, and politics. As LLMs become increasingly convincing at generating human-like text, the stakes for accurate detection grow higher.
Research Methodology
Dataset Generation
We recognized that the quality and representativeness of our dataset were vital to the success of our model. Early attempts yielded misleadingly high accuracy due to biased datasets, so we focused on creating a balanced dataset. This required thorough checks and cleaning to ensure that our AI- and human-generated texts reflected genuine writing characteristics without undue bias.
We utilized data from open-source platforms such as Hugging Face and Kaggle while also generating custom datasets. Our custom data came from five distinct domains, including English literature, cooking recipes, and IMDb user reviews. We implemented a unique "hourglass approach" when creating the AI-generated equivalents of this data. By summarizing human-written content and having AI expand on those summaries, we minimized biases introduced by original human text styles.
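To make the hourglass idea concrete, here is a minimal sketch of a summarize-then-expand pipeline. The model choices, prompt wording, and length limits are illustrative assumptions rather than the exact setup used in the paper.

```python
# Sketch of the "hourglass" approach: compress a human-written text into a
# short summary, then have an LLM expand that summary back into a full text.
# Model names and prompt wording here are assumptions for illustration.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
expander = pipeline("text2text-generation", model="google/flan-t5-base")

def hourglass_pair(human_text: str) -> tuple[str, str]:
    """Return (human_text, ai_text), where ai_text is regenerated from a summary."""
    summary = summarizer(human_text, max_length=60, min_length=15,
                         do_sample=False)[0]["summary_text"]
    prompt = "Expand the following summary into a full passage:\n" + summary
    ai_text = expander(prompt, max_length=300)[0]["generated_text"]
    return human_text, ai_text
```

Because the AI text is generated from a neutral summary rather than from the original wording, each pair shares topic and content while the AI side carries its own style, which is what keeps the stylistic bias of the human original out of the generated half.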
Model Training
Our testing covered a range of traditional machine learning approaches (Logistic Regression, Naive Bayes, Random Forest, XGBoost, and an LSTM) alongside advanced NLP models such as RoBERTa, T5, and Sentinel. After hyperparameter tuning, we evaluated each model’s performance on both accuracy and false positive rate.
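As a concrete illustration of the traditional end of that spectrum, the baseline sketch below trains a Logistic Regression classifier on TF-IDF features. The file name, column names, and split parameters are assumptions for illustration, not the paper’s exact configuration.

```python
# Baseline sketch: TF-IDF features + Logistic Regression for human-vs-AI text
# classification. The CSV file and its "text"/"label" columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("human_vs_llm.csv")  # hypothetical dataset: text, label (0=human, 1=AI)
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=50_000),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```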
One of the critical aspects of our research involved the implementation of Local Interpretable Model-agnostic Explanations (LIME). This technique allowed us to highlight the words contributing most to the model’s predictions, providing insights into the rationale behind its classifications.
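For readers unfamiliar with LIME, the sketch below shows how a single prediction can be explained with the lime package, continuing from the baseline pipeline above; the class names and their ordering are assumptions about how the labels are encoded.

```python
# Sketch: explaining one prediction with LIME's text explainer.
# `clf` and `X_test` come from the baseline sketch above; `clf` must expose
# predict_proba over raw strings. Label order (0=human, 1=AI) is assumed.
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["human", "ai"])
sample = X_test.iloc[0]
exp = explainer.explain_instance(sample, clf.predict_proba, num_features=10)

# Positive weights push the prediction toward class 1 ("ai" under the assumed
# encoding); negative weights push it toward "human".
for word, weight in exp.as_list():
    print(f"{word:>15s}  {weight:+.3f}")
```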
Results
Our findings confirmed that advanced NLP models generally outperformed traditional machine learning approaches, though the traditional models still achieved commendable accuracy. Importantly, we wanted to minimize false positives, that is, cases in which human-written text is incorrectly classified as AI-generated. Accuracy and false positive rates varied significantly across models, with the advanced NLP models achieving better results in most instances.
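Since false positives were central to how we judged the models, it is worth spelling out how that rate is computed. The sketch below continues from the baseline above and assumes that label 1 means AI-generated.

```python
# Computing accuracy and false positive rate from a confusion matrix.
# Assumed convention: label 1 = AI-generated (positive class), 0 = human.
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
false_positive_rate = fp / (fp + tn)  # human text wrongly flagged as AI
print(f"accuracy={accuracy:.3f}  FPR={false_positive_rate:.3f}")
```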
Future Step: From Paper to Product
Going forward, we aim to translate our research into a practical product, tailoring our findings to create a robust detection tool specifically for educational use. The goal is to highlight the sections of writing most likely to be human-generated versus AI-generated, enhancing educators’ ability to accurately assess student work.
We maintained an open approach throughout our research, making our findings and code accessible for fellow researchers. Anyone interested in our methodology or with ideas for enhancements is welcome to engage with our work.
Conclusion
The challenge of differentiating between AI and human-generated text is intricate and multi-faceted, but our research provides a foundational step towards developing reliable solutions in this area. We hope to contribute to the discourse around academic integrity and AI in education, providing tools that empower teachers and institutions.
Keywords
AI detection, plagiarism detection, LLM identification, NLP models, education, academic integrity, explainability.
FAQ
Q: What is the purpose of the HULLMI paper?
A: The purpose of the HULLMI paper is to develop a robust model capable of detecting whether a given text was generated by AI or authored by a human, particularly in educational settings.
Q: How was the dataset for the HULMI paper created?
A: The dataset combined open-source data from platforms such as Hugging Face and Kaggle with custom data generated from several domains, balanced between human-written and AI-generated texts to ensure representation across different writing styles.
Q: What models were used in the research?
A: The research utilized a mix of traditional machine learning algorithms (such as Logistic Regression, Naive Bayes, Random Forest, and XGBoost) and advanced NLP models (including RoBERTa, T5, and Sentinel).
Q: What does the LIME technique do?
A: The LIME technique highlights the specific words or phrases within a paragraph that contribute most to the model's prediction, helping to interpret and explain the model's decisions.
Q: What are the future plans for the research?
A: The future plans involve developing a product based on the research findings, specifically focused on assisting educators in detecting AI-generated texts in student submissions.