Breeze in Busan

Independent journalism on the politics, economy, and society shaping Busan.



Numbers Don’t Understand Meaning

Confidence scores look clinical. Precise. Coldly rational. But in practice? They often say more than they know—and less than they claim.

Jun 30, 2025
Features Team

The Features Team produces in-depth, long-form stories, offering thorough investigations and narratives on issues that impact societies worldwide, beyond the headlines.

“95.3% AI-Generated”: When Detection Becomes Judgment

It didn’t read like a robot wrote it. If anything, the essay felt careful—borderline hesitant in its conclusions, with oddly specific references to UN Security Council transcripts from the 1960s. Yet, within minutes of uploading, the university’s AI detection system flagged the file: “95.3% likelihood of AI-generated content.” There was no accompanying explanation. Just the percentage. The professor hesitated, then escalated.

Scenarios like this, once theoretical, now unfold daily. Across academic departments and editorial rooms alike, institutions have embraced text classification tools that promise to tell machine from mind. Their names vary—GPTZero, ZeroGPT, Originality.ai—but their mission is singular: certainty.

The problem is, certainty rarely lives in language.

Most of these tools rely on algorithmic proxies: perplexity, which measures how hard each next word is to predict, and burstiness, which looks for natural oscillations in sentence rhythm. High perplexity implies unpredictability. High burstiness suggests a human pulse. But what if a student simply writes with rhythm? Or if English isn’t their native language, and their phrasing falls between patterns?

Studies have begun to question these detectors’ fairness and precision. Some show that rewritten AI output can easily bypass the tests. Others highlight the inverse: human-authored work labeled as synthetic.

The border between human and machine is no longer bold. It's fraying—quietly, and fast.

They weren’t designed to be referees.

Most AI detectors began as side projects—patches built by graduate researchers, freelance coders, and startup teams scrambling to respond to an explosion of generative text. When ChatGPT went public in late 2022, it wasn’t just students or bloggers typing prompts. Institutions panicked. Schools, publishers, HR departments—they all wanted a way to know what was “real.”

So detection systems arrived—quickly, unevenly, and in numbers.

GPTZero, perhaps the most visible among them, launched in early 2023. It promised transparency, not punishment: “We detect AI writing to support human creativity,” read its homepage. Others followed. Copyleaks marketed itself to teachers. Turnitin integrated detection into its plagiarism platform. ZeroGPT claimed a 98% accuracy rate—though the company never released peer-reviewed backing for that figure.

Within a year, the software became institutional. School districts signed contracts. Universities updated honor codes. Some newsrooms quietly embedded detectors into their editorial workflows. Submissions were scored—privately. Few writers knew.

The tools, however, were never standardized. One might flag a sentence as suspicious while another gives it a clean bill. Two identical essays could score differently based on something as small as a paraphrased phrase or a shuffled clause. And none could fully explain how or why they reached their verdict. They only pointed.

In trying to trace the mechanical footprint of language, the detectors created something else: a kind of statistical suspicion, calibrated in decimals but felt in reputations.

How These Tools Work

They don’t read. Not the way people do—not with intent or intuition or any feel for irony. What these tools do is measure. They convert text into a string of probabilities, chew it through a statistical lens, and spit out a verdict. Human. Or not.

Their logic hinges on two concepts with names that sound like academic riddles: perplexity and burstiness. You could read papers on them. Or, picture this: imagine asking someone to guess the next word in your sentence. If they always guess right, your writing has low perplexity. If they hesitate, miss, blink at the odd turn of phrase—perplexity goes up.
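That guessing game is roughly what perplexity formalizes: the exponential of the average negative log-probability a language model assigns to each next word. Commercial detectors score text with large language models; as a minimal sketch under that assumption, the toy bigram model below (all names here are illustrative, not any real detector’s code) shows the mechanic.

```python
import math
from collections import Counter, defaultdict

# Toy sketch only: real detectors score text with a large language model.
# Here a tiny add-one-smoothed bigram model plays that role.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = defaultdict(Counter)  # counts for P(next | prev)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1
vocab_size = len(set(corpus))

def bigram_prob(prev, nxt):
    counts = bigrams[prev]
    # Add-one smoothing so unseen word pairs still get nonzero probability.
    return (counts[nxt] + 1) / (sum(counts.values()) + vocab_size)

def perplexity(text):
    words = text.split()
    log_probs = [math.log(bigram_prob(p, n)) for p, n in zip(words, words[1:])]
    # Perplexity = exp of the average negative log-probability per word.
    return math.exp(-sum(log_probs) / len(log_probs))

# A sentence the model has "seen" scores as more predictable (lower
# perplexity) than a grammatical but unfamiliar rearrangement.
assert perplexity("the cat sat on the mat .") < perplexity("the mat sat on the dog .")
```

A detector then compares such a score against thresholds tuned on known human and machine text; an unusually low perplexity pushes the verdict toward “AI.”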

AI-generated text, especially from earlier models, was smooth. Sometimes eerily so. Each word flowed into the next like train cars on a track—neat, logical, and forgettable. Detection tools pounced on this predictability. They flagged texts that lacked “surprise.” Ironically, so do many student essays.

Burstiness adds another layer. It's the rhythm of variance—the scatter of short, choppy fragments mixed with dense, meandering explanations that twist, double back, or spiral. Humans do this all the time. Machines, until recently, didn’t.
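Burstiness has no single agreed formula. One common reading, sketched below under that assumption (the function names are illustrative, not any vendor’s API), treats it as the spread of sentence lengths relative to their mean: uniform rhythm scores near zero, a mix of fragments and sprawl scores high.

```python
import statistics

def sentence_lengths(text):
    # Naive sentence split on periods; real tools use proper tokenizers.
    return [len(s.split()) for s in text.split(".") if s.strip()]

def burstiness(text):
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    # Coefficient of variation: std dev of sentence length over the mean,
    # so short and long documents stay comparable.
    return statistics.stdev(lengths) / statistics.mean(lengths)

flat = ("The model writes a sentence. The model writes a sentence. "
        "The model writes a sentence.")
varied = ("Short. Then a much longer, winding sentence that doubles back "
          "on itself before finally stopping. Enough said.")

# Uniform rhythm scores zero; mixed short and long sentences score high.
assert burstiness(flat) == 0.0
assert burstiness(varied) > burstiness(flat)
```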

That’s changed. Models like GPT-4 no longer write in a single tone. They mimic. They improvise. They pause—mid-thought—and start again. So when a detector claims it “knows” what AI sounds like, it’s really making a guess based on what AI used to sound like six months ago.

One sentence might trick the algorithm. Another might not. And somewhere in that statistical back-and-forth, meaning gets lost. Or worse—mistrusted.


A box pops up. “91.7% AI-generated.” No explanation. No traceable path. Just the number. That’s all a professor sees. Sometimes, that’s all they need.

But what are they seeing? Most detectors don’t explain. They don’t cite. They don’t contextualize. They simply flag. A line. A paragraph. An entire essay. One moment, it's scholarship material; the next, it's a red flag. And beneath that flag: an algorithm guessing at authorship through linguistic residue.

Recent studies increasingly tell the same uneasy story. At Stanford, one team fed a corpus of student essays—manually verified, unquestionably human—into three major detection tools. GPTZero misclassified 28% of them. ZeroGPT fared worse: 35% false positives. The tools marked simplicity as suspicion, restraint as risk. Clear prose, it turns out, can be too clear.

For non-native writers, the numbers jump. A 2024 PeerJ study found ESL submissions misidentified as AI at more than double the rate of native equivalents. The reason? Predictability. Learners often follow patterns—grammatical templates, safer syntax. The very traits AI imitates. Ironically.

And yet, flip the experiment—feed in AI text, slightly tweaked—and most systems miss it completely. A shuffled sentence here. A synonym swap there. One burst of artificial burstiness, and detection crumbles. Humanizers—tools like Undetectable.ai—make it easier still. Just paste and click.

So the system misfires in both directions. It suspects too much when there’s little to fear. It misses the obvious when dressed in new skin. It punishes caution. Rewards manipulation. And, in the middle of all this, a real person—writer, student, editor—is left holding a number they don’t know how to argue with.

No appeal. No footnote. Just math, mistaken for meaning.

Bias and Fairness

Bias in algorithms isn’t always loud. Sometimes it’s built in—quiet, statistical, unintentional. But still sharp. In the case of AI detection, the lines of inequity are increasingly visible. False positives are not evenly distributed. They fall harder—more often, more arbitrarily—on those writing outside dominant norms. And the system doesn’t flinch when it happens.

Start with non-native speakers. Writers whose first language isn’t English tend to follow structure more closely, lean on templates, avoid idioms, play it safe. Their writing, ironically, reads more “AI-like” to the detectors, because it is cautious, patterned, and grammatical. The very traits academic writing often demands.

A study by researchers in Singapore tested 400 university essays written by fluent ESL students. More than 40% were marked as “likely AI.” Every essay was human-written. Every one.

Then there’s neurodivergent writing—people who think in loops, speak in rhythm, write with flare-ups and halts. ADHD. Autism. Nonlinear minds. Their prose often breaks form. It doesn’t match training data. It spikes and sags and spins. So the detectors flag it, again and again.

Fairness isn’t baked into the math. The algorithms don’t know what it means to be multilingual. Or dyslexic. Or poetic. They don’t know that a sentence with five clauses might still be real. Or that a short one—just four words—might matter more than any of them.

Instead, they judge by probability. Probability shaped by training sets. Training sets shaped by... who, exactly? Models don’t create bias. People do. And yet, when the verdict comes—"AI-generated"—there is no disclaimer, no margin of error, no footnote that says: This tool might not understand how you write. Only a number. Cold. Certain. Wrong.
