Methodology

How Is It AI? actually works, what it can and cannot do, and the rules we apply to keep results fair.

The honest position first

AI writing detection is probabilistic. It is not a definitive test. No detector available today, including ours, can tell you with certainty whether a piece of writing was produced by a human or by a model. Universities and schools that have treated AI detection scores as proof have already had to walk that back, and the academic literature on detector accuracy is consistent on this point.

We built Is It AI? because the alternative, ignoring AI risk entirely, leaves teachers without any signal at all. Our tool gives you a signal, an explanation of what triggered it, and an honest framing of how to use it. It does not give you a verdict.

How the detection works

1. Multi-model ensemble

Each piece of submitted text is run through more than one detector. The default ensemble combines a Claude-based contextual analysis with a statistical pattern detector. The two methods look for different kinds of evidence, and combining them reduces the false-positive rate that any single method would produce on its own.
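As a rough illustration, here is a minimal sketch of how two layer scores might be combined. The weights, threshold behaviour, and function names are assumptions made for the example, not our production values.

    # Illustrative only: combining two per-passage AI-likelihood scores.
    # The real ensemble weights and logic are internal to Is It AI?.
    def combine_scores(contextual: float, statistical: float,
                       w_contextual: float = 0.6,
                       w_statistical: float = 0.4) -> float:
        """Weighted average of two scores in [0, 1]."""
        return w_contextual * contextual + w_statistical * statistical

    # A passage flagged by only one layer lands mid-range rather than
    # at the top, which is where the false-positive reduction comes from:
    print(combine_scores(0.9, 0.2))   # 0.62: layers disagree, uncertain
    print(combine_scores(0.9, 0.85))  # 0.88: both layers agree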

2. Contextual analysis (Claude)

The contextual layer asks an LLM to assess characteristics common in AI-generated text: sentence-length uniformity, vocabulary distribution, structural predictability, and patterns of hedging and qualification. The LLM does not “decide” whether the text is AI. It produces a structured probability assessment per passage, with reasons.
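For illustration, here is a minimal sketch of what such a structured request could look like using the Anthropic Python SDK. The prompt wording, model choice, and JSON schema are assumptions for the example, not our production configuration.

    # Sketch only: the production prompt, model, and schema are internal.
    import json
    import anthropic  # the official Anthropic Python SDK

    PROMPT = (
        "Assess this passage for characteristics common in AI-generated "
        "text: sentence-length uniformity, vocabulary distribution, "
        "structural predictability, and hedging patterns. Return JSON "
        'with keys "probability" (0 to 1) and "reasons" (a list of '
        "strings). Do not return a verdict.\n\nPassage: {passage}"
    )

    def assess_passage(passage: str) -> dict:
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative model choice
            max_tokens=512,
            messages=[{"role": "user",
                       "content": PROMPT.format(passage=passage)}],
        )
        # A probability with reasons, not a yes/no decision.
        return json.loads(response.content[0].text)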

3. Statistical pattern detection

The statistical layer measures features that AI-generated text tends to express more uniformly than human writing: token-length variance, sentence-rhythm distribution, and word-frequency entropy. These features are computed on the raw text and require no model inference. They serve as an independent check on the contextual layer.
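To make the features concrete, here is a minimal sketch of how they can be computed on raw text. The exact formulas and feature names are illustrative; the production feature set is broader.

    # Illustrative feature computation; assumes non-trivial English text
    # (at least one word and one sentence).
    import math
    import re
    from collections import Counter

    def statistical_features(text: str) -> dict:
        words = re.findall(r"[A-Za-z']+", text.lower())
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

        # Token-length variance: AI text clusters around mid-length words.
        lengths = [len(w) for w in words]
        mean_len = sum(lengths) / len(lengths)
        token_len_var = sum((n - mean_len) ** 2 for n in lengths) / len(lengths)

        # Sentence-rhythm distribution: variance of sentence lengths.
        # Low variance ("low burstiness") is one weak AI signal.
        s_lens = [len(s.split()) for s in sentences]
        s_mean = sum(s_lens) / len(s_lens)
        rhythm_var = sum((n - s_mean) ** 2 for n in s_lens) / len(s_lens)

        # Word-frequency entropy: how evenly the vocabulary is spread.
        counts = Counter(words)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())

        return {"token_length_variance": token_len_var,
                "sentence_rhythm_variance": rhythm_var,
                "word_frequency_entropy": entropy}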

4. Per-passage flagging

Where most detectors return only a single document-level score, Is It AI? highlights the specific passages that triggered detection and shows why each was flagged in plain English. This matters because most “AI-flagged” essays are partial: a student may have used AI assistance for some sections and written others themselves. Treating the whole document as one score loses that information.
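A sketch of the result shape this implies (field names here are illustrative, not our actual API):

    # Illustrative result shape; field names are not the actual API.
    from dataclasses import dataclass, field

    @dataclass
    class PassageResult:
        text: str           # the passage itself
        score: float        # AI-likelihood, 0 to 1
        reasons: list[str]  # plain-English explanation of the flag

    @dataclass
    class DocumentResult:
        overall_score: float
        passages: list[PassageResult] = field(default_factory=list)

        def flagged(self, threshold: float = 0.7) -> list[PassageResult]:
            # Keeps the per-passage information a single document-level
            # score would throw away.
            return [p for p in self.passages if p.score >= threshold]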

5. Confidence calibration

We expose confidence scores at the passage level, not just the document level. A high overall score driven by one short flagged passage is treated and reported differently from one driven by uniformly high scores across every passage. The distinction matters when teachers act on the result.
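One way to picture the distinction, with made-up thresholds (the production calibration is internal):

    # Illustrative thresholds only.
    def characterize(passage_scores: list[float],
                     threshold: float = 0.7) -> str:
        flagged = [s for s in passage_scores if s >= threshold]
        if not flagged:
            return "no passages flagged"
        coverage = len(flagged) / len(passage_scores)
        if coverage < 0.25:
            # Score driven by one short section: weakest evidence.
            return "isolated flag - review the specific passage"
        if coverage > 0.75:
            return "uniformly high scores - consistent signal"
        return "mixed document - possible partial AI assistance"

    print(characterize([0.9, 0.1, 0.2, 0.15, 0.1]))  # isolated flag
    print(characterize([0.85, 0.9, 0.8, 0.88]))      # uniformly high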

What we can and cannot detect

Reasonably reliable

  • Long passages of unedited output from major models (ChatGPT, Claude, Gemini, LLaMA) on standard essay-length prompts
  • AI-generated text with minimal human revision
  • Passages with unusually low burstiness and variance
  • Identifying which sections of a mixed document are most suspicious

Unreliable or impossible

  • Heavily edited or rewritten AI output
  • AI text run through paraphrasing tools
  • Very short passages (under ~80 words)
  • Writing by non-native English speakers, whose prose often shows features that detectors mistake for AI output
  • Certain writing styles (e.g. clinical academic prose) that share characteristics with AI output
  • Distinguishing “human writing assisted by AI” from “human writing alone” with confidence

False positives

False positives are real. Published research has shown several detectors flag legitimate writing by non-native English speakers at materially higher rates than native-speaker writing. We treat this as a known limitation, not a bug.

Mitigations we apply:

  • Confidence is exposed at the passage level so reviewers can see whether the score is driven by one short section or by uniformly suspicious writing
  • Flag explanations describe the specific feature that triggered detection, allowing the reviewer to assess whether the feature is plausibly explained by writing style
  • Multi-model ensemble cross-checks each layer’s output, reducing the rate at which any single feature triggers a false positive
  • Result framing across the product reinforces that detection is one signal, not a verdict

If you believe Is It AI? has flagged a piece of writing incorrectly, contact us at hello@isitai.co.uk with the text and the result. We log challenged results, review them, and use them to refine the system.

Privacy in the detection pipeline

Submitted text is processed in real time. It is sent to Anthropic, the provider hosting Claude, for the contextual layer, processed locally by the statistical layer, and the result is returned. The submitted text is then discarded. We do not store student work, do not retain the text after the result returns, and do not use submitted text to train any model.
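In pipeline terms, the guarantee is structural: nothing persists. Reusing the layer sketches above (function names remain illustrative):

    # Illustrative pipeline shape. The point is that there is no write
    # to disk or database at any step.
    def detect(text: str) -> dict:
        contextual = assess_passage(text)         # sent to Anthropic, not stored
        statistical = statistical_features(text)  # computed locally
        # When this function returns, the submitted text goes out of
        # scope and is discarded.
        return {"contextual": contextual, "statistical": statistical}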

Full details on data handling, retention, and your rights are on our privacy page and terms of service.

How we update the system

New language models ship regularly, and detection accuracy against each one needs to be re-established. We track releases of major models (OpenAI, Anthropic, Google, Meta) and adjust the contextual and statistical layers as needed. This is iterative work, not a fixed system.

We do not publish a single “accuracy percentage” because the figure depends entirely on the test set, the model that produced the AI text, the writer who produced the human text, and how heavily either has been edited. Publishing a single number would be misleading. Instead, we publish the methodology, the limitations, and the confidence framework, and we encourage reviewers to act on the structure of the result rather than on a number.

What this is, and is not

Is It AI? is a screening tool. It is for the moment when a teacher needs a structured second opinion before opening a conversation with a student. It is for the student who wants to check how their work might look before submitting. It works in either direction.

It is not a replacement for editorial judgement. It is not evidence sufficient to fail or expel a student. It is not a polygraph. It is one signal, with explanations, designed to help a person make a fair decision faster.

Corrections and feedback

If you have a concern about a result, a question about how the tool works, or evidence of a systematic false positive in a particular kind of writing, contact us. We respond to all substantive correction queries.

Email hello@isitai.co.uk or use our contact page.

Try it yourself

The honest way to evaluate any detector is to test it. Run a piece you wrote, run a piece an AI wrote, see what the tool actually says.

Try it now