
How Will Google Potentially Detect AI-Generated Text?


The recent June 2022 Google core update made many people lose sleep, confidence, and hair when it finished rolling out. 

Among the many explanations you have probably heard are AI-generated text, topical authority gaps, and crappy backlinks.

Some could be true, and maybe all could be true. 

To understand what Google is going to do next, consider the long list of problems it has to solve to deal with backlink scammers and, now, AI-generated content. This is on top of everything else it has to do to improve its search algorithm through constant data analysis.

We can try to reverse-engineer how Google will potentially target websites that have AI-generated content.

The first question to answer is, can AI-generated text be identified? If humans can’t do this, how can an AI?

Yes, you can identify AI-written text, both with your eyes (and brain) and with tools that already exist. Before we unleash the tool, let’s understand how we can actually do this on our own.

Complete the following sentence: “Text generation is …”

What would you write next? 

You can probably think of “a, an, known, for, not” and a few other thin or stop words, right? 

See how we can predict the next word in that unfinished sentence. Similarly, you could also try to predict the next word that would be written after that. 

When you generate text using AI, every word written is picked from the most probable words for that position. Let’s say, for example, that each word is picked from a list of the 100–1000 most probable words that come to mind (or to an AI’s mind) for that position in the sentence.

Every word that is written is based on probability. Educated marketers have started to refer to the resulting text as fluff content.

While this is a simplified breakdown, and the structure of sentences quickly gets complicated, there will always be a high probability that content generated completely by AI will follow this predictive writing pattern (maybe a slightly tougher one to spot in the future).
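To make the idea concrete, here is a minimal sketch of picking the next word from only the most probable candidates (often called top-k sampling). The word list and probabilities are made up for illustration and do not come from any real model; `NEXT_WORD_PROBS` and `top_k_sample` are hypothetical names.

```python
import random

# Hypothetical next-word probabilities for the prompt "Text generation is ...".
# These numbers are invented for illustration, not taken from a real model.
NEXT_WORD_PROBS = {
    "a": 0.30, "an": 0.12, "not": 0.10, "known": 0.08, "the": 0.07,
    "for": 0.05, "used": 0.04, "complex": 0.03, "witchcraft": 0.001,
}

def top_k_sample(probs, k, rng):
    """Pick one word at random from the k most probable candidates."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    words = [word for word, _ in top]
    weights = [prob for _, prob in top]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [top_k_sample(NEXT_WORD_PROBS, k=3, rng=rng) for _ in range(20)]
print(samples)
```

With `k=3`, only "a", "an", and "not" can ever be chosen. A rare word like "witchcraft" never appears, which is exactly the missing surprise this article is about.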

Let’s look at a few examples of sentences completed by AI from that same prompt.

  1. Text generation is a process.
  2. Text generation is a complex task, and the above approach is just one way to do it.
  3. Text generation is not a new concept.

What does this mean in a larger context?

What if there was a way to check a text to see what percentage seems to be generated based on probability?

How many words generated after the preceding word seem like they fall into this predictive pattern?

Here is an example of an open-source tool that predicts whether a text was generated by AI. While this is still in the early stages, what’s more important here is that it is possible.

Let’s look at the sentence that I started above in more depth.

The image below was taken from the open-source tool, determining if each word falls in the top 10, 100, or 1000 words predicted for that position.

  1. Green = top 10 predicted words
  2. Yellow = top 100 predicted words
  3. Orange = top 1000 predicted words
  4. Purple = a quite surprising word (outside the top 1000)

[Image: tool output for text that was generated by AI]

Not a single word falls outside the top 100 predicted words (we added the first two, so they do not count). When humans write content, the text includes an element of surprise, which breaks this continuous pattern of green and yellow.
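The color scheme described above can be sketched as a simple lookup on each word's predicted rank. This is only an illustration of the bucketing, not the tool's actual code; `color_for_rank` is a hypothetical name, and the ranks below are made up, since a real tool would get them from a language model.

```python
def color_for_rank(rank):
    """Map a word's predicted rank (1 = most likely) to a highlight color."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "orange"
    return "purple"  # outside the top 1000: a quite surprising word

# Hypothetical (word, rank) pairs, invented for illustration.
ranked_words = [("process", 4), ("complex", 37), ("concept", 120), ("witchcraft", 4800)]
for word, rank in ranked_words:
    print(f"{word}: {color_for_rank(rank)}")
```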

Here’s an example of text written by an actual human. In the image below, that text was analyzed by the same utility.

[Image: tool output for text written by an actual human]

How many times did we break the pattern, with words appearing that were not in the top 100 guesses (and even beyond the top 1000)? Quite a few!

So, what if you could score the content and come up with a number that says, for example, that 90 percent of the text falls into this predictive pattern?
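A score like that could be computed as the share of words whose predicted rank falls within some cutoff. The sketch below assumes you already have a rank for each word; the function name `predictability_score`, the cutoff of 100, and both rank lists are assumptions made up for illustration.

```python
def predictability_score(ranks, threshold=100):
    """Percentage of words whose predicted rank is within `threshold`."""
    if not ranks:
        return 0.0
    within = sum(1 for rank in ranks if rank <= threshold)
    return 100.0 * within / len(ranks)

# Hypothetical per-word ranks, invented for illustration.
ai_like_ranks = [1, 3, 2, 15, 7, 1, 40, 2, 9, 5]
human_like_ranks = [1, 250, 4, 3100, 12, 1, 870, 2, 45, 5200]

print(predictability_score(ai_like_ranks))     # every rank is within 100
print(predictability_score(human_like_ranks))  # four ranks are above 100
```

The higher the percentage, the more the text follows the predictive pattern. Where to draw the line between "probably AI" and "probably human" is a judgment call that this sketch does not try to answer.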

What is the lesson here?

AI-generated text lacks the “element of surprise” that a human would add when writing in a way that often breaks the predictive pattern. 

Some signs are very obvious and some are a little more difficult to spot.

So the next time you read content that is subpar, it’s possible that it was generated by poor use of AI, which has enabled the generation of tons of unreliable content. 

If you’re not careful, Google is likely to notice and react—if not all at once, little by little.

Is GPT-3 any smarter?

It makes me laugh every time someone says that GPT-3 text can’t be identified. 

The text in the above example that looks like it could be flagged, where almost everything is green and yellow, was generated by GPT-3. Yes, that’s correct.

Here’s the same image again for the GPT-3 sentence. Make sure that you aren’t publishing sentences like this one.

[Image: tool output for the GPT-3 sentence]

GPT-3 is not smarter than the humans sitting at Google trying to track it down. The AI was still created by humans. If it has not happened yet, it will happen later. 

When that happens, what will this look like?

How will Google potentially approach this?

For big software development projects (mentioning this for those who write but don’t know the nuances of code and development), it takes months to plan, coordinate, build, test, and iterate to make sure everything goes according to the blueprint.

Major updates are often rolled out in bits and pieces to learn how they affect a huge number of people, either positively or negatively.

If you see a pattern in Google updates, it’s that usually a very small percentage of websites are heavily affected (good or bad), some see a noticeable effect, and most will see little or no effect.

The point is that updates are rolled out in batches and improved upon over time.

So what could Google do to find low-value websites, and how could it adapt to pinpoint content that adds no value (missing the element of surprise)?

  1. Use a much smarter version of the tool I mentioned earlier and decide to:
    • Not index the page.
    • Index it and rank it until it can find better content to replace it.
  2. Process the text in a queue, so some pages see immediate keyword ranking followed by a drop or disappearance. 
  3. Constantly update its algorithm to find and score pages based on the element of surprise in your text. 

Even if you’re not convinced that this is currently possible, we may see more proof of this with the next Google update.

What can you do, and how should you use AI to pass the check every time?

We know now that for your text to be legitimate, it needs many surprise elements. How can you insert these surprise elements into the generated text? 

  1. Use summarization techniques like those offered by Outranking to infuse facts. Facts like “it’s 70 degrees outside,” “SERP analysis usually consists of analyzing 20 pages,” and “Outranking advises writers to use ‘write for me’ at their own risk.” These facts can be clearly picked out as meaningful and fairly unique. They are not something an AI would predict unless it was a fluke or it used a source with those words.
  2. Heavily edit text for these surprises. You can’t have filler content in your post and hope that you will make it to a million impressions. You need to read every sentence and make sure your content is something that your audience would want to read badly enough to drop everything else.
  3. For niche websites maintained by a single person, you can get away with publishing tons of content and then eventually coming back to improve it. This isn’t possible in an enterprise or business environment where there is a review cycle and many people are involved.

There will be folks that read this and say “No, my AI is better than yours, damn right it’s better than yours.” Let’s reflect on this after a few more Google updates.

Pankil Shah
Co-founder @ Outranking.io