A guide to handling academic integrity violations across your institution

Traditional plagiarism detection tools were built for a different problem: catching students who copied from existing sources. They were not designed for a world where AI can generate plausible, original-sounding academic text in seconds. In the AI era, relying on text-matching software or probabilistic AI detectors to enforce academic integrity leaves institutions with probability scores instead of proof, and faculty with suspicion instead of evidence. A more defensible approach tracks the writing process itself, documenting how work was produced, not just what the final document contains.

When the tool and the challenge no longer match

A faculty member at a mid-sized university recently flagged a student essay for potential AI use. The detection tool returned a 73% probability score that the content was AI-generated. The student, an international student writing in her third language, contested the finding. No review process existed. The faculty member had no other evidence. The case stalled.

This scenario is not unusual. It points to a structural mismatch that has quietly undermined academic integrity policy across higher education: the tools at the center of institutional integrity infrastructure were designed to catch a different kind of problem. And institutions have not yet caught up.

Plagiarism detection in its traditional form assumes that dishonest work leaves a trace. Copy from a journal article, and the tool finds the match. Copy from a classmate, and similarity flags the overlap. That logic worked well for decades. It does not work for AI-generated text, because AI does not copy. It composes.

What text-matching tools were actually built to do

Text-matching systems, the foundation of most institutional plagiarism detection infrastructure, operate on a straightforward premise: compare a submitted document against a database of existing sources and flag high similarity. A systematic survey published in Frontiers in Computer Science (2025) notes that early detection methods such as string-matching algorithms were effective at identifying verbatim plagiarism but struggle against paraphrasing and, crucially, against AI-generated text.
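To make the mechanism concrete, the sketch below shows the kind of n-gram overlap check that sits at the heart of text matching. The function names and the five-word window are illustrative assumptions rather than any vendor's actual implementation, but the core logic is the same: exact overlap with known sources is what gets scored.

```python
# Minimal sketch of the n-gram overlap logic behind text-matching tools.
# Illustrative only: real systems use indexed source databases, document
# fingerprinting, and far more normalization than this.

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(submission: str, source: str, n: int = 5) -> float:
    """Share of the submission's n-grams that also appear in a known source."""
    sub, src = ngrams(submission, n), ngrams(source, n)
    return len(sub & src) / len(sub) if sub else 0.0
```

Run a verbatim copy through a check like this and the score approaches 1.0; run an AI-composed essay on the same topic and it approaches 0.0, which is exactly the gap described in the next paragraph.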

AI-generated content bypasses text-matching entirely because it does not replicate source material. A student who submits a ChatGPT-generated essay may receive a near-zero similarity score, because the text is, technically, original. Research published in the International Journal for Educational Integrity (Weber-Wulff et al., 2023) argued that applying text-matching software to LLM-generated content makes little sense given the stochastic way language models produce text. Originality, in the traditional sense, is not a useful metric here.

This is not a criticism of plagiarism detection vendors. They built tools that solved the problem that existed at the time. The problem has changed.

The reliability problem with AI detectors

The natural institutional response has been to layer AI detection tools on top of existing plagiarism checkers. But these tools carry their own significant reliability problems, and the evidence base for caution is now substantial.

A Stanford University study found that while AI detectors achieved near-perfect accuracy on essays written by U.S.-born eighth-graders, they misclassified more than 61% of TOEFL essays written by non-native English speakers as AI-generated. At least one detector flagged 97% of those essays as AI-written. The core issue is that detectors rely on measures such as text perplexity, and non-native speakers often write in the predictable, low-perplexity style those measures associate with AI-generated prose.
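For readers who want to see why that failure mode is baked in, here is a minimal sketch of the perplexity heuristic, using GPT-2 via the Hugging Face transformers library as the scoring model. The threshold is an illustrative assumption rather than a value any real detector publishes, and production detectors layer additional signals on top; the point is only to show what is actually being measured.

```python
# Sketch of the perplexity heuristic many AI detectors rely on, with GPT-2
# as the scoring model. The threshold is illustrative, not calibrated.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return float(torch.exp(loss))

def looks_ai_generated(text: str, threshold: float = 40.0) -> bool:
    # Low perplexity = highly predictable text. The problem: careful,
    # formulaic prose from non-native writers is also highly predictable,
    # which is exactly how the false positives described above arise.
    return perplexity(text) < threshold
```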

The equity implications of this are serious. A Common Sense Media report (2024) found that Black students are more likely to be falsely accused of AI-generated writing by their teachers. Neurodiverse students also face higher false-positive rates. For institutions that have invested in AI detection as a frontline enforcement mechanism, this is not a minor calibration issue. It is a structural fairness problem embedded in the tool itself.

Even OpenAI, the organization behind ChatGPT, discontinued its own AI detector after finding it correctly identified only 26% of AI-written text while falsely flagging 9% of human writing. That admission from the creator of the most widely used generative AI tool should inform how institutions think about AI detection as an evidentiary standard.
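A quick back-of-the-envelope calculation shows why those figures make a poor evidentiary standard. The base rate below is an assumption chosen purely for illustration; OpenAI reported only the detection and false-positive rates.

```python
# Worked example: how often is a flagged essay actually AI-written?
sensitivity = 0.26      # share of AI-written text the detector caught
false_positive = 0.09   # share of human writing it wrongly flagged
base_rate = 0.30        # ASSUMED prevalence of AI use, not a reported figure

flagged_ai = sensitivity * base_rate
flagged_human = false_positive * (1 - base_rate)
ppv = flagged_ai / (flagged_ai + flagged_human)
print(f"Probability a flagged essay is actually AI-written: {ppv:.0%}")  # ~55%
```

Under that assumption, nearly half of the essays the detector flags would have been written by a human. A coin flip is not a basis for a misconduct finding.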

The policy gaps these tools leave open

The deeper problem is not just detection accuracy. It is that detection-first approaches answer the wrong question. Institutions using these tools are asking: “Is this document AI-generated?” The question that actually matters for academic integrity is: “Did this student do the learning that this assignment was supposed to generate?”

Those are not the same question. A student can write an essay entirely in their own words and still engage in no meaningful learning. A student can use AI as a drafting scaffold, revise it substantially, and demonstrate deep understanding of the material. A probability score distinguishes neither case.

EDUCAUSE’s 2024 AI Landscape Study, drawing on a survey of more than 900 higher education technology professionals, documented the gap clearly: institutional AI policies are largely permissive or neutral, but enforcement mechanisms have not kept pace. The result is that institutions with clear policy intent have no reliable way to operationalize it. Detection tools offer a sense of enforcement activity without providing the underlying evidentiary infrastructure.

What process-based integrity looks like in practice

A growing number of institutions are beginning to reframe the question. Rather than asking what a finished document looks like, they are asking how it was produced. The University of Oxford’s Academic Integrity Framework, revised in 2024, explicitly shifted from detection-first approaches toward assessment redesign and transparent disclosure policies. Stanford’s Hasso Plattner Institute of Design has piloted process documentation approaches in select courses, where students submit drafts, annotations, and reflective journals alongside final work.

Process documentation takes a different posture from detection. Instead of scanning the finished product for anomalies, it captures the writing journey: how drafts evolved, where revisions occurred, what the pacing and engagement looked like across the session. This creates a verifiable record of authorship that does not rely on probabilistic inference.
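To ground what such a record might contain, the sketch below outlines one possible structure for a writing-session log. The field names and summary metrics are assumptions for illustration; they are not any particular product's schema.

```python
# Illustrative sketch of a process-level authorship record.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class RevisionEvent:
    timestamp: datetime
    kind: str            # "insert", "delete", "paste", "idle"
    chars_changed: int
    position: int        # character offset in the draft

@dataclass
class WritingSession:
    student_id: str
    assignment_id: str
    events: list[RevisionEvent] = field(default_factory=list)

    def summary(self) -> dict:
        """The kind of reviewable evidence an integrity reviewer would see."""
        pasted = sum(e.chars_changed for e in self.events if e.kind == "paste")
        typed = sum(e.chars_changed for e in self.events if e.kind == "insert")
        total = pasted + typed
        return {
            "total_events": len(self.events),
            "chars_typed": typed,
            "chars_pasted": pasted,
            "paste_ratio": pasted / total if total else 0.0,
        }
```

A record like this does not accuse anyone of anything. It simply makes the writing process reviewable, which is what turns a contested suspicion into a discussable set of facts.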

This approach also changes what misconduct investigation looks like. When a faculty member has a concern, they review a structured record of the writing session rather than a percentage score that the student can plausibly dispute. The burden of the investigation shifts from accusation to documentation. That shift matters enormously for student due process and for the institutional defensibility of any resulting disciplinary action.

The institutional cost of staying with detection-only approaches

There is an underappreciated operational cost to detection-first approaches that is worth surfacing. When a faculty member receives a high-probability AI flag on a piece of work, the investigation that follows is entirely manual. They must gather supporting evidence, confront the student, manage the appeal process, and ultimately make a judgment call on evidence that is probabilistic at best.

This is time-consuming. It is also often inconclusive, because a probability score is not proof. Cases stall, students appeal, and institutions face reputational and legal risk if a false accusation reaches formal disciplinary proceedings. The very tool meant to streamline integrity enforcement creates its own downstream burden.

Faculty are already stretched. Adding unresolvable misconduct investigations to their workload does not serve academic integrity; it erodes confidence in the system. For integrity officers, the inability to produce actionable evidence in contested cases is a recurring operational problem that detection tools were never designed to solve.

From detecting outputs to documenting the process

The question academic integrity was always trying to answer was not “Is this text original?” It was “Is this student’s work genuinely theirs?” For most of the last two decades, those questions pointed to the same tool. In the AI era, they do not.

Genuine authorship lives in the process of writing: in the decisions made mid-draft, the revisions that show understanding deepening, the engagement with the material over time. The finished essay is an artefact. The process that produced it is where integrity can actually be verified.

Institutions building out their authorship validation workflows may find that DocuMark by Trinka provides the kind of process-layer evidence that transforms integrity review from probabilistic suspicion to structured, reviewable documentation. The shift from outcome-scanning to process transparency is where institutions are finding workable, defensible paths forward.

Sources and references

Weber-Wulff, D. et al. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19(1), 26. https://link.springer.com/article/10.1007/s40979-023-00146-z

Liang, W. et al. (2023). GPT detectors are biased against non-native English writers. Patterns / Stanford HAI. https://hai.stanford.edu/news/ai-detectors-biased-against-non-native-english-writers

Giray, L. (2024). The problem with false positives: AI detection unfairly accuses scholars of AI plagiarism. The Serials Librarian, 85(5-6). https://www.tandfonline.com/doi/abs/10.1080/0361526X.2024.2433256

EDUCAUSE. (2024). 2024 AI Landscape Study / Action Plan: AI Policies and Guidelines. https://www.educause.edu/research/2024/2024-educause-action-plan-ai-policies-and-guidelines

Frontiers in Computer Science. (2025). Plagiarism types and detection methods: a systematic survey. https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1504725/full

The Education Magazine. (2026). AI governance in higher education: the 2026 framework for policy and risk. https://www.theeducationmagazine.com/ai-governance-in-higher-education/

CITL, Northern Illinois University. (2024). AI detectors: an ethical minefield. https://citl.news.niu.edu/2024/12/12/ai-detectors-an-ethical-minefield/
