I Tested AI “humanizers” to See How Well They Disguise AI Text

Artificial intelligence (AI) can’t do everything (or at least it can’t do everything well), but generative AI tools built on large language models are very good at generating text. If you bombed the verbal portion of the SAT and find writing anything longer than a text message terrifying, the whole experience can feel like magic; being able to produce an email, essay, or cover letter without staring at a blank page for hours and agonizing over every word choice is powerful. That’s why it’s estimated that nearly 20% of adults in the U.S. have used AI to write emails or essays.
But once an email or essay is polished (and the facts are checked, right?), there’s a major hurdle: AI detectors, which range from humans spotting telltale “clues” in AI-generated text to online tools that claim to scan a piece of writing and determine whether a human or an AI wrote it. The accuracy of these detectors is questionable, but people use them, so they’re worth factoring in if you plan to pass off an AI-generated cover letter (or any other piece of writing) as your own.
Meet the AI “humanizer,” a tool designed to turn AI-generated text into something that reads as human by removing or rephrasing common AI tropes and phrasing. The idea is appealing: you task an AI with writing an essay, run it through the humanizer, and the result looks like it was written from scratch by a human (presumably you). But do these tools actually work?
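To make the idea concrete, here’s a minimal sketch of the crudest possible version of such a tool: a find-and-replace over phrases often associated with AI prose. The trope list is my own invention for illustration, and real humanizers go well beyond this, paraphrasing whole sentences and varying rhythm rather than swapping words.

```python
import re

# Toy illustration only: a naive "humanizer" that swaps out a few phrases
# often associated with AI-generated prose. The trope list is my own
# assumption, not any real tool's rules.
AI_TROPES = {
    r"\bdelve into\b": "dig into",
    r"\bmoreover\b": "also",
    r"\bit is important to note that\b": "note that",
    r"\bfurthermore\b": "plus",
}

def naive_humanize(text: str) -> str:
    """Replace common AI-flavored phrases with plainer alternatives."""
    for pattern, plain in AI_TROPES.items():
        # (A real tool would also preserve capitalization; this one doesn't.)
        text = re.sub(pattern, plain, text, flags=re.IGNORECASE)
    return text

print(naive_humanize("Moreover, it is important to note that we must delve into the data."))
# -> "also, note that we must dig into the data."
```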
Test
To find out, I ran a little experiment. While it’s not a definitive study, it certainly gave me a solid idea of whether any of these tools are worth using if you want an AI to secretly write all your correspondence, school assignments, or heartwarming letters to old friends.
First, I asked ChatGPT to generate an essay on… how to make AI-written text more human. It generated the essay in a few seconds, and the result was perfectly coherent. I didn’t fact-check it or edit it in any way; its only purpose was to test humanization tools.
I then ran the essay through several AI detectors to confirm it was a decent specimen of mediocre AI writing. The results were as expected: QuillBot rated it 94% AI, ZeroGPT 97%, and Copyleaks a confident 100%. The AI detector world agreed: this essay from ChatGPT reads like it was written by ChatGPT.
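If you’re testing more than a couple of samples, this kind of check is easy to script rather than pasting text into each site by hand. The sketch below is entirely hypothetical: the endpoint URL, the response format, and the filename are placeholders I made up, since each detector (QuillBot, ZeroGPT, Copyleaks) has its own real interface and terms of service. It only shows the shape of a batch check.

```python
from pathlib import Path
import requests

# Hypothetical sketch only: the endpoint URL and response shape below are
# placeholders I invented, not any real detector's API.
DETECTORS = {
    "example-detector": "https://detector.example.com/api/score",  # placeholder
}

def score_text(text: str) -> dict[str, float]:
    """Collect each detector's 'percent AI' verdict for one piece of text."""
    results = {}
    for name, url in DETECTORS.items():
        resp = requests.post(url, json={"text": text}, timeout=30)
        resp.raise_for_status()
        # Assumes the service answers with JSON like {"ai_probability": 0.94}.
        results[name] = resp.json()["ai_probability"] * 100
    return results

essay = Path("chatgpt_essay.txt").read_text()  # the ChatGPT essay from above
print(score_text(essay))  # e.g. {"example-detector": 94.0}
```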
Results
So, can AI humanization tools fix this? There are plenty of them out there; the explosion of AI chatbots has sparked an arms race between detectors and the tools designed to fool them. So I picked out a few popular ones to test.
But first, I needed a little more calibration, so I did the obvious thing: I fed the essay back to ChatGPT and asked it to “humanize” the text. After all, these tools are all AI-powered, so maybe the easiest approach is to just ask ChatGPT to sound less like itself.
I then took the original text generated by ChatGPT and ran it through four other humanization tools: Paraphraser.io, StealthWriter, Grammarly, and GPTHuman.
I now had five “humanized” versions of the essay that three AI detectors had rated as clearly AI. Would they fare any better? Mostly not, although one tool showed what could generously be called “promise”:
- Paraphraser.io: Busted. QuillBot rated its version 83% AI, Copyleaks a solid 100%, and ZeroGPT a suspiciously specific 99.94%.
- ChatGPT: A failure, although to be fair it’s not exactly a humanizer, and a more detailed prompt might have yielded better results. QuillBot and Copyleaks both rated it 100% AI, while ZeroGPT rated it 87.77%.
- Grammarly: Also a clear failure, with QuillBot, Copyleaks, and ZeroGPT rating its version 99%, 97.1%, and 99.97% AI, respectively.
- GPTHuman: Mixed results. QuillBot was completely duped, giving it a 0% AI score, and ZeroGPT hedged at just 60.96%. But Copyleaks had no doubts and rated it 100%.
- StealthWriter: The most effective of the tools tested here. ZeroGPT stayed suspicious with an (again, oddly specific) 64.89% AI score, but Copyleaks rated it just 3%, and QuillBot was completely fooled at 0%.
One aspect of StealthWriter that may have contributed to its effectiveness is that you can apply its humanize tool to the same text multiple times. On the first pass, StealthWriter scored its own output 65% human, so I ran it a second time and the score jumped to 80%. A third pass got it to 95%. After that, additional passes didn’t change the score.
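If you wanted to automate that repeat-until-it-plateaus routine, the logic is a simple loop. In the sketch below, the humanize and score callables are hypothetical stand-ins for whatever rewrite tool and scoring method you use; StealthWriter doesn’t expose this exact interface as far as I know.

```python
from typing import Callable

def humanize_until_plateau(
    text: str,
    humanize: Callable[[str], str],   # rewrite step, e.g. a humanizer's rewrite call
    score: Callable[[str], float],    # self-reported "human" score, 0-100
    max_rounds: int = 5,
) -> str:
    """Re-apply the humanizer until its human score stops improving."""
    best = score(text)
    for _ in range(max_rounds):
        candidate = humanize(text)
        candidate_score = score(candidate)
        if candidate_score <= best:   # plateaued; keep the best version so far
            break
        text, best = candidate, candidate_score  # 65 -> 80 -> 95 in my test run
    return text
```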
All of these tools are pretty upfront about the need to check their output and make your own adjustments. I didn’t check the quality or accuracy of the humanized text; I just wanted to see whether these tools could fool AI detectors, and the answer is: probably not, though StealthWriter can help.
Finally, keep in mind that there are many AI text detectors out there, which makes the variability of results (even with StealthWriter) a concern: you can’t always know which detector will be used on your text. If someone runs your essay through a detector I didn’t try here, one that’s better at spotting what StealthWriter does, you’ll still get caught. If you’re worried about your AI-generated text being recognized as such, your best option is to write it yourself, or at least rework it very carefully.