The June 2022 Google core update made many people lose sleep, confidence, and hair by the time it finished rolling out.
Among the many explanations you have probably heard are AI-generated text, topical authority gaps, and low-quality backlinks.
Some could be true; maybe all of them are.
To anticipate what Google will do next, consider the long list of problems it has to solve to deal with backlink scammers and, now, AI-generated content. That is on top of everything else it does to improve its search algorithm through constant data analysis.
We can try to reverse-engineer how Google will potentially target websites that have AI-generated content.
Yes, you can identify AI-written text, both with your eyes (and brain) and with tools that already exist. Before we unleash the tool, let’s understand how to do this on our own.
Complete the following sentence: “Text generation is …”
What would you write next?
You can probably think of “a,” “an,” “known,” “for,” “not,” and a few other filler or stop words, right?
See how easily we can predict the next word in that unfinished sentence? Similarly, you could predict the word that comes after that, and so on.
When AI generates text, every word is picked from among the most probable candidates. Say, for example, that each word is chosen from a list of the 100 to 1,000 most likely words for that position in the sentence (in the AI’s “mind”).
Every word that is written is based on probability. Educated marketers have started to refer to this as fluff content.
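To see where those next-word predictions come from, here is a minimal sketch using a toy bigram model. Real systems such as GPT-3 use far larger neural networks, but the underlying principle of ranking candidate next words by probability is the same. The tiny corpus below is invented purely for illustration:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the web-scale data a real language
# model is trained on (invented for illustration).
corpus = (
    "text generation is a process . "
    "text generation is an application of language models . "
    "text generation is not new . "
    "text generation is known for producing fluent output ."
).split()

# Count which words follow which: a bigram "language model".
next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def most_probable(prev, k=5):
    """Return up to k most probable words after `prev`, best first."""
    return [w for w, _ in next_words[prev].most_common(k)]

print(most_probable("is"))  # ['a', 'an', 'not', 'known']
```

A detector simply runs this logic in reverse: given a finished text, it asks how highly the model would have ranked each word that actually appears.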
While this is a simplified breakdown, and sentence structure gets complicated quickly, content generated entirely by AI will always have a high probability of following this predictive writing pattern (perhaps a slightly harder one to spot in the future).
Let’s look at a few examples of other sentences completed by AI for the same prompt.
What if there was a way to check a text to see what percentage seems to be generated based on probability?
How many words generated after the preceding word seem like they fall into this predictive pattern?
Here is an example of an open-source tool that predicts whether text was generated by AI. While it is still in the early stages, what’s more important here is that detection is possible.
Let’s look at the sentence that I started above in more depth.
The image below was taken from that open-source tool; it shows whether each word falls within the top 10, 100, or 1,000 words predicted for its position.
Not a single word falls outside the top 100 predictions (we supplied the first two, so they do not count). When humans write, an element of surprise creeps into the text and breaks this continuous pattern of green and yellow.
Here’s an example of text written by an actual human. In the image below, the same utility analyzed that sentence.
How many times did we break the pattern, with words appearing that were not in the top 100 guesses (and even beyond the top 1000)? Quite a few!
So, what if you could score the content and come up with a number that says, for example, that 90 percent of the text falls into this predictive pattern?
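Such a score is straightforward to compute once a model has assigned each word a rank (1 being the model’s single most likely guess for that position). A minimal sketch, with made-up rank lists standing in for real detector output:

```python
def predictability_score(ranks, top_k=100):
    """Percentage of words whose rank falls within the top_k most
    probable predictions for their position (rank 1 = most likely)."""
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r <= top_k)
    return 100.0 * hits / len(ranks)

# Hypothetical per-word ranks for two ten-word sentences:
machine_like = [1, 3, 2, 8, 1, 5, 40, 2, 9, 1]
human_like = [1, 250, 4, 1800, 7, 90, 3000, 2, 600, 12]

print(predictability_score(machine_like))  # 100.0
print(predictability_score(human_like))    # 60.0
```

A real detector would run a language model to obtain the ranks; the scoring step itself is this simple.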
AI-generated text lacks the “element of surprise” that a human would add when writing in a way that often breaks the predictive pattern.
Some signs are very obvious, and some are a little more difficult to spot.
So the next time you read subpar content, it may well have been generated through careless use of AI, which has made it trivial to churn out unreliable content at scale.
If you’re not careful, Google is likely to notice and react, if not all at once, then little by little.
It makes me laugh every time someone says that GPT-3 text can’t be identified.
The text in the example above that looks ripe for flagging, where almost everything is green and yellow, was generated by GPT-3. Yes, that’s correct.
Here’s the same image again for the GPT-3 sentence. Make sure that you aren’t publishing sentences like this one.
GPT-3 is not smarter than the humans at Google trying to track it down; the AI was, after all, created by humans. If reliable detection has not happened yet, it will.
And when it does, what will it look like?
For big software projects (a note for those who write but don’t know the nuances of code and development), it takes months to plan, coordinate, build, test, and iterate to make sure everything goes according to the blueprint.
Major updates are often rolled out in stages to learn how they affect a huge number of people, positively or negatively.
If you see a pattern in Google updates, it’s that usually a very small percentage of websites are heavily affected (good or bad), some see a noticeable effect, and most will see little or no effect.
The point is that updates are rolled out in batches and improved upon over time.
So what could Google do to find low-value websites, and how could it adapt to pinpoint content that adds no value (missing the element of surprise)?
Even if you’re not convinced that this is currently possible, we may see more proof of this with the next Google update.
We now know that to pass as legitimate, your text needs plenty of surprise elements. How can you insert those surprise elements into generated text?
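One generic lever, not specific to any tool mentioned here, is sampling temperature: instead of always emitting the model’s top-ranked word, sample from a flattened version of its probability distribution so that lower-probability words surface more often. A minimal sketch, with an invented next-word distribution:

```python
import math
import random

# Hypothetical next-word distribution for "Text generation is ...";
# the words and probabilities are invented for illustration.
next_word_probs = {"a": 0.50, "an": 0.30, "not": 0.15, "astonishing": 0.05}

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Pick a next word from `probs`.

    temperature=1.0 keeps the model's own probabilities; higher values
    flatten the distribution so rarer ("surprising") words get picked
    more often; values near 0 approach always taking the top word.
    """
    words = list(probs)
    # Rescale log-probabilities by the temperature, then re-exponentiate
    # (subtracting the max first for numerical stability).
    logits = [math.log(probs[w]) / temperature for w in words]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(words, weights=weights, k=1)[0]

cautious = sample_with_temperature(next_word_probs, temperature=0.1)
surprising = sample_with_temperature(next_word_probs, temperature=5.0)
```

Push the temperature too high and the text becomes incoherent, so in practice this is a trade-off between fluency and surprise.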
There will be folks that read this and say “No, my AI is better than yours, damn right it’s better than yours.” Let’s reflect on this after a few more Google updates.