
De-AIGC: How AI-Text Detection Works, Its Limits, and Responsible Paper Revision

AI detectors judge "does this look machine-written?" from statistical signals like perplexity, burstiness, and probability curvature — but those signals are neither stable nor robust to rewriting. This page explains how detection works and where it breaks, and reframes "humanizing" as rewriting a draft with content you genuinely understand, in your own voice, with verifiable facts, and with AI use disclosed per journal policy — not as a how-to for deception.

This page follows the agentic empirical-research workflow. The core idea: once you understand how AI detectors work (perplexity / burstiness / DetectGPT / watermarks), you see why they misfire — and why the real solution is genuine authorship, fact-checking, and transparent disclosure, not a cat-and-mouse game with detectors.

Schematic

The principle at a glance

AI-text detection as one pipeline: compute perplexity and burstiness from the text, check probability curvature with DetectGPT, run a statistical test for a watermark, and produce a score. Because human and machine distributions overlap heavily, that score is a clue, not a verdict; the responsible path is genuine authorship, line-by-line verification, and disclosing AI use per venue policy.

Start Here

What you should be able to do

Understand three detection signals: perplexity and burstiness, DetectGPT probability curvature, and generation watermarks.

Know why detectors are unreliable: high false-positive rates, systematic bias against non-native writing, and easy weakening by rewriting.

Read "humanizing" as improving authenticity, accuracy, and readability — not as circumventing academic integrity.

Know the AI-use disclosure policies of major journals and conferences, and where and how to make the disclosure.

Learning Path

Learning path: perplexity → burstiness → curvature → watermark → limits

Read AI-text detection along this path: start with plain perplexity and burstiness, move to DetectGPT curvature and watermarks, and end by recognizing the fundamental limit from distributional overlap.

Step 1: Perplexity. Machine text is less surprising on average, so perplexity is lower. Signal: PPL = exp(−mean log p)
Step 2: Burstiness. Human sentence surprise varies more; machine text is smoother. Signal: std / mean
Step 3: Curvature. DetectGPT: machine text sits at a local log-probability maximum. Signal: d(x) > 0
Step 4: Watermark. Generation biases a green list; testable but easily weakened by rewriting. Signal: z-score
Step 5: Limits. Distributional overlap makes false positives unavoidable; a score is only a clue. Signal: AUC < 1

01 / Intuition

Core Intuition

LLMs tend to generate high-probability, low-surprise token sequences, so machine text has lower average perplexity and smaller sentence-level surprisal variance (burstiness) — the statistical basis of most detectors.

DetectGPT uses one insight: machine text usually sits near a local maximum of the model log-probability, so small paraphrases tend to lower the log-probability; human text need not have this curvature.

Watermarking biases token choice toward a pseudo-random "green list" at generation time, testable after the fact; but translation, rewriting, or switching models weakens it — showing that all detection signals rest on fragile distributional assumptions.

02 / Math

The statistics of detection signals and their fundamental limits

01 / Perplexity

Perplexity measures how "surprised" a model is by a text on average. Machine-generated text is low-perplexity to the model that wrote it — but carefully polished, conventional human text can also be low-perplexity, a key source of false positives.

PPL = exp(−(1/N) Σ_i log p(w_i | w_<i)),  N = number of tokens

02 / Burstiness

Human writing varies more in sentence-level surprise (long complex sentences mixed with short ones); machine text is smoother. Burstiness captures this variation via the dispersion of per-sentence surprisal.

burstiness = std(s_j) / mean(s_j),  s_j = mean surprisal of sentence j
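The definition above can be sketched as follows: a minimal example, assuming per-token log-probabilities (in nats) and sentence lengths are already available from some scoring model. The function names and the sample numbers are illustrative, not taken from any specific detector.

```python
import numpy as np

def sentence_surprisals(token_log_probs, sentence_lengths):
    """Mean surprisal (negative log-prob, in nats) of each sentence."""
    lp = np.asarray(token_log_probs, dtype=float)
    if sum(sentence_lengths) != lp.size:
        raise ValueError("sentence lengths must sum to the number of tokens")
    # Split the flat token array at the cumulative sentence boundaries.
    bounds = np.cumsum(sentence_lengths)[:-1]
    return np.array([-chunk.mean() for chunk in np.split(lp, bounds)])

def burstiness(s):
    """Coefficient of variation of per-sentence surprisal: std(s_j) / mean(s_j)."""
    s = np.asarray(s, dtype=float)
    return float(s.std() / s.mean())

# Hypothetical log-probs for three sentences of 3, 2 and 4 tokens.
token_log_probs = [-1.0, -2.0, -3.0, -0.5, -1.5, -2.0, -2.0, -2.0, -2.0]
s = sentence_surprisals(token_log_probs, [3, 2, 4])
print("s_j:", s)
print("burstiness:", round(burstiness(s), 3))
```

Here the per-sentence surprisals are 2.0, 1.0 and 2.0 nats, so the burstiness is about 0.283. Note that `std()` is the population standard deviation (NumPy's default); a detector could equally use the sample version.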

03 / DetectGPT probability curvature

Perturb the text many times (for example, by paraphrasing short spans) and compare the log-probability of the original with the mean log-probability of the perturbed versions. Machine text usually sits near a local maximum, so the gap is clearly positive; human text need not show this gap.

d(x) = log p(x) − E_{x̃}[log p(x̃)],  x̃ = a perturbed copy of x;  d > 0 → machine-leaning
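As a rough illustration of the discrepancy d(x), the sketch below computes it from log-probabilities that are assumed to be already scored. All numbers are invented for the example; in a real setup they would come from a scoring language model, with perturbations produced by a mask-filling or paraphrasing model. The optional normalization by the standard deviation of the perturbed scores is one common variant and is included here as an assumption.

```python
import numpy as np

def perturbation_discrepancy(logp_original, logp_perturbed, normalize=False):
    """d(x) = log p(x) - mean_k log p(x_tilde_k), optionally divided by the
    standard deviation of the perturbed log-probabilities."""
    perturbed = np.asarray(logp_perturbed, dtype=float)
    d = float(logp_original - perturbed.mean())
    if normalize:
        d /= float(perturbed.std())
    return d

# Hypothetical total log-probabilities (nats) under a scoring model.
machine_like = perturbation_discrepancy(
    -120.0, [-131.0, -128.0, -133.0, -129.0, -129.0])
human_like = perturbation_discrepancy(
    -180.0, [-181.0, -178.0, -183.0, -179.0, -179.0])
print("machine-like d:", machine_like)
print("human-like d:", human_like)
```

In this toy data every perturbation of the first text scores lower than the original (d = 10.0), while perturbations of the second text scatter around the original (d = 0.0). Real texts fall between these extremes, which is why any cutoff on d is a judgment call.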

04 / Generation watermarking

At generation time, a hash of the previous token seeds a pseudo-random split of the vocabulary into green and red lists, and sampling is biased toward green. Detection counts green tokens and applies a z-test. The upside is a well-defined statistical test whose false-positive rate is set by the z threshold; the downside is that rewriting or translation quickly weakens it.

z = (|green| − γT) / sqrt(T γ(1−γ)),  T = number of scored tokens, γ = green-list fraction of the vocabulary
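The z-test can be tried on a toy example. The sketch below is not a real watermarking implementation: it assumes a 50-token integer vocabulary, seeds Python's `random.Random` with the previous token in place of a keyed hash, and uses a "hard" watermark that only ever emits green tokens. All names and constants are illustrative.

```python
import math
import random

VOCAB_SIZE = 50   # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5       # fraction of the vocabulary placed on the green list

def green_list(prev_token):
    """Pseudo-random green list seeded by the previous token."""
    rng = random.Random(prev_token)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def z_score(tokens):
    """One-proportion z-test on the number of green tokens."""
    scored = len(tokens) - 1  # the first token has no predecessor
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))

def generate(length, rng, watermark):
    """Sample a toy token sequence, with or without the hard watermark."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        if watermark:
            # Hard watermark: only green tokens are allowed.
            candidates = sorted(green_list(tokens[-1]))
        else:
            candidates = range(VOCAB_SIZE)
        tokens.append(rng.choice(candidates))
    return tokens

rng = random.Random(0)
marked = generate(201, rng, watermark=True)
plain = generate(201, rng, watermark=False)
print("z (watermarked):", round(z_score(marked), 2))
print("z (unmarked):", round(z_score(plain), 2))
```

With 200 scored tokens that are all green, the watermarked z is 100 / sqrt(50) ≈ 14.14, while the unmarked sequence should land near zero. A softer watermark that only nudges probabilities toward green would give a smaller but still positive z.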

05 / Why rewriting weakens every signal

Synonym substitution and reordering raise perplexity, scatter the watermark, and flatten the curvature. This is not an evasion guide but a statement of fragility: signals depend on the specific model and generation process, so a different writing style shifts the distribution.
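The fragility of the watermark signal can be made concrete with a back-of-the-envelope model. It rests on strong assumptions that are not from the source: a hard watermark (every original token is green), token replacements at independent random positions, and a replaced token, or a token whose predecessor was replaced, being green only by chance with probability γ. Under those assumptions the expected z-score shrinks with the square of the untouched fraction.

```python
import math

def expected_z(rho, T=200, gamma=0.5):
    """Approximate expected z-score after a fraction rho of tokens is replaced.

    Assumes a hard watermark and independent replacements (illustrative model).
    """
    intact = (1.0 - rho) ** 2  # both tokens of a scored pair are untouched
    green_fraction = intact + (1.0 - intact) * gamma
    return (green_fraction - gamma) * T / math.sqrt(T * gamma * (1.0 - gamma))

for rho in (0.0, 0.25, 0.5, 0.75):
    print(f"replaced {rho:.0%}: expected z = {expected_z(rho):.2f}")
```

For T = 200 and γ = 0.5 this gives roughly 14.1, 8.0, 3.5 and 0.9: replacing half of the tokens already brings the statistic down to a range where a conservative threshold would no longer fire. The same logic explains why the signal says little about text that has been substantially revised by its author.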

06 / The fundamental limit: distributional overlap

Human and machine text distributions overlap heavily, and "human-written then AI-polished" is a continuum, not a binary. So any threshold trades false positives against misses and the ROC cannot be perfect — which is why a detection score can only be a clue, never proof.

TPR and FPR cannot both be ideal (overlap → AUC<1)
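The trade-off can be illustrated with a deliberately simple model: assume detector scores for human text follow N(0, 1) and scores for machine text follow N(separation, 1). Both the Gaussian shape and the separation value are assumptions for the sketch, not measurements of any real detector.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def auc_equal_variance(separation):
    """AUC when human and machine scores are N(0, 1) and N(separation, 1)."""
    return normal_cdf(separation / math.sqrt(2.0))

def rates_at_threshold(threshold, separation):
    """Flag a text as machine-written when its score exceeds the threshold."""
    fpr = 1.0 - normal_cdf(threshold)               # humans wrongly flagged
    tpr = 1.0 - normal_cdf(threshold - separation)  # machine text caught
    return tpr, fpr

separation = 1.0  # hypothetical gap between the two score means, in std units
print("AUC:", round(auc_equal_variance(separation), 3))
for t in (0.0, 0.5, 1.0, 2.0):
    tpr, fpr = rates_at_threshold(t, separation)
    print(f"threshold {t:.1f}: TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

With a separation of one standard deviation the AUC is about 0.76. Raising the threshold lowers the false-positive rate only by giving up true positives, and no threshold reaches TPR = 1 with FPR = 0.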

03 / Code

Code case: perplexity and burstiness from log-probs (the detection side)

This shows only the detection-side statistics: given per-token log-probabilities for a text, compute perplexity and sentence-level burstiness, and see why polished human text can be misflagged.

Case 1: how perplexity is computed

Perplexity is the exponential of the mean negative log-likelihood; lower means less "surprised".

import numpy as np
log_probs = np.array([-1.2, -0.9, -1.5, -0.7, -1.1])  # nats per token
ppl = np.exp(-log_probs.mean())
print("perplexity:", round(float(ppl), 3))

Expected output

perplexity: 2.945

How to read this code

Machine-generated text is low-perplexity to the model itself.
But conventional, clear human text can also be low-perplexity.
So low perplexity does not mean "machine-written".

Case 2: the intuition behind burstiness

Human sentences vary more in surprise; machine text is smoother.

import numpy as np

def burstiness(s):
    # coefficient of variation of per-sentence surprisal: std / mean
    return round(float(s.std() / s.mean()), 3)

human = np.array([3.1, 0.8, 2.9, 1.0, 3.4])    # varied sentence surprisal
machine = np.array([1.8, 1.9, 1.7, 2.0, 1.8])  # smooth
print("human burstiness :", burstiness(human))
print("machine burstiness:", burstiness(machine))

Expected output

human burstiness : 0.494
machine burstiness: 0.055

How to read this code

Human text is bursty: sentence surprise varies a lot.
Machine text is smoother, with low burstiness.
But rewriting or mixed authorship makes the two converge.

Case 3: a false-positive demonstration

A non-native writer using short, regular sentences can be misflagged as AI.

import numpy as np
# a careful non-native writer: short, regular sentences -> low PPL, low burstiness
lp = np.array([-0.8, -0.7, -0.9, -0.6, -0.8, -0.7])
ppl = np.exp(-lp.mean())
sent = np.array([0.75, 0.80, 0.70])   # smooth per-sentence surprisal
burst = sent.std() / sent.mean()
print("PPL:", round(float(ppl), 2), "| burstiness:", round(float(burst), 3),
      "-> may be flagged, wrongly")

Expected output

PPL: 2.12 | burstiness: 0.054 -> may be flagged, wrongly

How to read this code

Concise, regular human writing is also low-perplexity and low-burstiness.
This is exactly the mechanism behind bias against non-native authors.
Takeaway: a score is a clue, never a verdict.
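One way to see why a flag is only a clue is a base-rate calculation. The sketch below applies Bayes' rule to a detector with a given true-positive and false-positive rate. All three rates are hypothetical and chosen only to illustrate the arithmetic; they do not describe any real detector or submission pool.

```python
def posterior_machine(prevalence, tpr, fpr):
    """P(machine-written | flagged) by Bayes' rule."""
    flagged_machine = prevalence * tpr
    flagged_human = (1.0 - prevalence) * fpr
    return flagged_machine / (flagged_machine + flagged_human)

# Hypothetical rates: 90% of machine text flagged, 10% of human text flagged.
for prevalence in (0.05, 0.20, 0.50):
    p = posterior_machine(prevalence, tpr=0.90, fpr=0.10)
    print(f"prevalence {prevalence:.0%}: P(machine | flagged) = {p:.2f}")
```

Under these assumed rates, if only 5% of submissions are machine-written, about two in three flagged texts are human-written (posterior ≈ 0.32). The posterior reaches 0.90 only when half of all submissions are machine-written.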

04 / Case

Case: a non-native researcher misflagged after LLM language polishing

Scenario: a non-native English researcher uses an LLM to polish the language of a paper and a detector flags it as "highly likely AI-generated".
Problem: detection signals cannot separate "ghost-written by a machine" from "human-written then machine-polished", and they are systematically biased toward false positives for clean, concise non-native writing.
Responsible path: keep versions and drafts of the writing process; restate the contribution and arguments in your own words so every sentence maps to content you truly understand; verify every citation, datum, and number.
Transparent disclosure: following the target venue's policy, state the scope of AI-tool use (e.g., "language polishing only") in the methods, acknowledgments, or cover letter, putting integrity and verifiability first.

05 / Risks

Common Pitfalls

Treating a detection score as proof: false positives are common enough to matter, and they fall disproportionately on non-native writing.

Using "de-AIGC" tools for deception: this violates academic integrity, and mechanical rewriting often introduces factual and citation errors.

Failing to re-verify facts, data, and citations after rewriting, amplifying hallucination and error propagation.

Ignoring the target venue's AI-disclosure policy (many venues require a statement in the methods section or the cover letter).

Assuming watermarks are reliable: open models may add none, and translation / rewriting quickly weakens the signal.
