The RankBrain Survival Guide

June 09th 2016 // SEO

This is a guide to surviving RankBrain. I created it, in part, because there’s an amazing amount of misinformation about RankBrain. And the truth is there is nothing you can do to optimize for RankBrain.

I’m not saying RankBrain isn’t interesting or important. I love learning about how search works whether it helps me in my work or not. What I am saying is that there are no tactics to employ based on our understanding of RankBrain.

So if you’re looking for optimization strategies, beware of the clickbait RankBrain content being pumped out by fly-by-night operators and impression-hungry publishers.

You Can’t Optimize For RankBrain

I’m going to start out with this simple statement to ensure as many people as possible read, understand and retain this fact.

You can’t optimize for RankBrain.

You’ll read a lot of posts to the contrary. Sometimes they’re just flat-out wrong, sometimes they’re using RankBrain as a vehicle to advocate for SEO best practices, and sometimes they’re just connecting dots that aren’t there.

Read on if you want proof that RankBrain optimization is a fool’s errand and that you should instead focus on other, vastly more effective strategies and tactics.

What Is RankBrain?

RankBrain is a deep learning algorithm developed by Google to help improve search results. Deep learning is a form of machine learning and can be classified somewhere on the Artificial Intelligence (AI) spectrum.

I think of deep learning as a form of machine learning where the algorithm can adapt and learn without further human involvement. One of the more interesting demonstrations of deep learning was the identification of cats (among other things) in YouTube thumbnails (pdf).

How Does RankBrain Work?

Knowing how RankBrain works is important because it determines whether you can optimize for it or not. Despite what you might read, there are only a handful of good sources of information about RankBrain.

Greg Corrado

The first is the October 26, 2015 Bloomberg RankBrain announcement, which included statements from, and summaries of a chat with, Google Senior Research Scientist Greg Corrado.

RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

This makes it pretty clear that RankBrain uses vectors to better understand complex language.
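To make the vector idea concrete, here is a minimal sketch of how an embedding lets a system guess what an unfamiliar term means: compare its vector with the vectors of known words and pick the closest by cosine similarity. The three-dimensional vectors and the `nearest` helper are invented for illustration. Real embeddings such as Word2Vec’s are learned from large corpora and have hundreds of dimensions, and nothing here describes how RankBrain is actually built.

```python
import math

# Hypothetical toy embeddings; real ones are learned from text, not hand-written.
EMBEDDINGS = {
    "walkthrough": [0.90, 0.10, 0.00],
    "guide":       [0.70, 0.30, 0.20],
    "speedrun":    [0.10, 0.90, 0.20],
    "recipe":      [0.00, 0.10, 0.95],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(vector, k=2):
    """Return the k known words whose vectors point in the most similar direction."""
    ranked = sorted(EMBEDDINGS, key=lambda w: cosine(vector, EMBEDDINGS[w]), reverse=True)
    return ranked[:k]

# Hypothetical vector for an unfamiliar term that happens to land near "walkthrough".
unfamiliar = [0.85, 0.15, 0.05]
print(nearest(unfamiliar))
```

Cosine similarity compares the direction of two vectors rather than their length, which is why it is a common choice for comparing embeddings.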

Word2Vec is most often referenced when talking about vectors. And it should be noted that Jeff Dean, Greg Corrado and many others were part of this effort. You’ll see these same names pop up time and again surrounding vectors and deep learning.

I wrote a bit about vectors in my post on Hummingbird. In particular I like the quote from a 2013 Jeff Dean interview.

I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

I was really intrigued by the idea of Google knowing that two different sentences meant the same thing. And they’ve made a fair amount of progress in this regard with research around paragraph vectors (pdf).

Paragraph Vector Paper

It’s difficult to say exactly what type of vector analysis RankBrain employs. I think it’s safe to say it works with vectors built from variable-length text and leave it at that.
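Paragraph vectors are learned jointly with word vectors, which is more than a short example can show. A much cruder stand-in, averaging word vectors, still illustrates the property that matters here: text of any length maps to a vector of fixed size that can be compared with other text. The embeddings and function names below are hypothetical, and averaging is a swapped-in technique, not the method from the paragraph vector paper or anything Google has described for RankBrain.

```python
import math

# Hypothetical 2-D word embeddings, invented for illustration.
WORD_VECTORS = {
    "cheap":   [0.9, 0.1],
    "budget":  [0.8, 0.2],
    "flights": [0.1, 0.9],
    "airfare": [0.2, 0.8],
}

def text_vector(text):
    """Average the vectors of known words; words without a vector are skipped."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # nothing recognizable, so no meaningful vector
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

short = text_vector("cheap flights")
longer = text_vector("budget airfare for cheap flights")  # "for" has no vector
print(len(short), len(longer))       # both texts become 2-D vectors
print(round(cosine(short, longer), 3))
```

Averaging also throws away word order, one of the weaknesses that learned paragraph vectors were designed to address.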

So what else did we learn from the Corrado interview? Later in the piece there are statements about how much Google relies on RankBrain.

The system helps Mountain View, California-based Google deal with the 15 percent of queries a day it gets which its systems have never seen before, he said.

That’s pretty clear. RankBrain is primarily used for queries Google hasn’t seen before, though it seems likely that its reach has grown based on that initial success.

Unfortunately the next statement has caused a whole bunch of consternation.

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

This provoked the all-too-typical reactions from the SEO community. #theskyisfalling The fact is we don’t know how Google is measuring ‘importance,’ nor do we know whether it applies to just that 15 percent or to all queries.

Andrey Lipattsev

To underscore the ‘third-most important’ signal boondoggle we have statements by Andrey Lipattsev, Search Quality Senior Strategist at Google, in a Q&A with Ammon Johns and others.

In short, RankBrain might have been ‘called upon’ for many queries but may not have materially impacted results.

Or if you’re getting technical, RankBrain might not have caused a reordering of results. So ‘importance’ might have been measured by frequency and not impact.

Later on you’ll find that RankBrain has access to a subset of signals, so RankBrain could function more like a meta-signal. It kind of feels like comparing apples and oranges.

But more importantly, why does it matter? What will you do differently knowing it’s the third most important signal?
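The gap between being ‘called upon’ and actually changing results is easy to see with a toy log. The records and helper functions below are hypothetical: each record notes whether a component was invoked for a query and whether the final ranking differed from the ranking without it. The same component can score high on one measure and low on the other, so a claim of ‘importance’ says little until you know which measure was used.

```python
from dataclasses import dataclass

@dataclass
class QueryLog:
    query: str
    invoked: bool         # was the component called for this query?
    ranking_before: list  # results without the component
    ranking_after: list   # results with the component

def invocation_rate(logs):
    """Share of queries where the component was called at all."""
    return sum(log.invoked for log in logs) / len(logs)

def reorder_rate(logs):
    """Share of queries where the component was called AND the order changed."""
    changed = sum(log.invoked and log.ranking_before != log.ranking_after for log in logs)
    return changed / len(logs)

# Hypothetical log: invoked on 3 of 4 queries, but only one ranking changed.
logs = [
    QueryLog("q1", True,  ["a", "b", "c"], ["a", "b", "c"]),
    QueryLog("q2", True,  ["d", "e", "f"], ["e", "d", "f"]),
    QueryLog("q3", True,  ["g", "h", "i"], ["g", "h", "i"]),
    QueryLog("q4", False, ["j", "k", "l"], ["j", "k", "l"]),
]

print(invocation_rate(logs))  # 0.75
print(reorder_rate(logs))     # 0.25
```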

Gary Illyes

Another source of RankBrain information is Gary Illyes’s conversation with Eric Enge. In particular, Gary has been able to provide some examples of RankBrain in action.

I mean, if you think about, for example, a query like, “Can you get a 100 percent score on Super Mario without a walk-through?” This could be an actual query that we receive. And there is a negative term there that is very hard to catch with the regular systems that we had, and in fact our old query parsers actually ignored the “without” part.

And RankBrain did an amazing job catching that and actually instructing our retrieval systems to get the right results.

Gary’s statements lend clear support to the idea that RankBrain helps Google to better understand complex natural language queries.
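Illyes doesn’t describe how the old query parsers worked, but a hypothetical keyword matcher that discards common function words shows how a term like “without” can silently vanish. The stopword list and `keywords` function below are assumptions for illustration only; the point is that once “without” is dropped, a query asking to avoid walkthroughs looks identical to one asking for them.

```python
# Hypothetical stopword list that, like a naive parser, treats "with"/"without" as noise.
STOPWORDS = {"can", "you", "get", "a", "on", "with", "without"}

def keywords(query, stopwords=STOPWORDS):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = [t.strip('?,."').lower() for t in query.split()]
    return [t for t in tokens if t and t not in stopwords]

q_without = "Can you get a 100 percent score on Super Mario without a walk-through?"
q_with = "Can you get a 100 percent score on Super Mario with a walk-through?"

# The negation disappears, so opposite questions look identical.
print(keywords(q_without) == keywords(q_with))  # True

# Keeping "without" preserves the distinction.
negation_aware = STOPWORDS - {"without"}
print(keywords(q_without, negation_aware) == keywords(q_with, negation_aware))  # False
```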

Paul Haahr

Paul Haahr Speaking at SMX West 2016

Perhaps the most interesting statements about RankBrain were made by Paul Haahr, a Google Ranking Engineer, at SMX West during his How Google Works: A Google Ranking Engineer’s Story presentation and Q&A.

I was lucky enough to see this presentation live and it is perhaps the best and most revealing look at Google search. (Seriously, if you haven’t watched this you should turn in your SEO card now.)

It’s in the Q&A that Haahr discusses RankBrain.

RankBrain gets to see some subset of the signals and it’s a machine learning or deep learning system that has its own ideas about how you combine signals and understand documents.

I think we understand how it works but we don’t understand what it’s doing exactly.

It uses a lot of the stuff that we’ve published on deep learning. There’s some work that goes by Word2Vec or word embeddings that is one layer of what RankBrain is doing. It actually plugs into one of the boxes, one of the late post retrieval boxes that I showed before.

Danny Sullivan then asks how RankBrain might work to ascertain document quality or authority.

This is all a function of the training data that it gets. It sees not just web pages but it sees queries and other signals so it can judge based on stuff like that.

These statements are by far the most important because they provide a plethora of information. First and foremost, Haahr states that RankBrain plugs in late post-retrieval.

This is an important distinction because it means that RankBrain doesn’t rewrite the query before Google goes looking for results but instead does so afterwards.

So Google retrieves results using the raw query but then RankBrain might rewrite the query or interpret it differently in an effort to select and reorder the results for that query.
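Here is a minimal sketch of that ordering, under loud assumptions: candidates are first retrieved by literal term overlap with the raw query, and only then does a hypothetical vector-based scorer reorder those candidates. The documents, vectors, and function names are invented; Google hasn’t published how its late post-retrieval stages score or combine signals.

```python
import math

# Hypothetical corpus: each document has its text and a precomputed toy vector.
DOCS = {
    "d1": ("mario walkthrough full guide", [0.9, 0.1]),
    "d2": ("mario 100 percent no walkthrough tips", [0.2, 0.9]),
    "d3": ("zelda speedrun record", [0.5, 0.5]),
}

def retrieve(query):
    """Stage 1: keep documents sharing at least one term with the raw query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, (text, _) in DOCS.items() if terms & set(text.split())]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rerank(candidates, query_vector):
    """Stage 2 (post-retrieval): reorder only the retrieved candidates by similarity."""
    return sorted(candidates, key=lambda d: cosine(query_vector, DOCS[d][1]), reverse=True)

query = "mario 100 percent without walkthrough"
query_vector = [0.1, 0.95]  # hypothetical vector capturing "avoid walkthroughs"

candidates = retrieve(query)
print(candidates)                        # ['d1', 'd2']; d3 is never considered
print(rerank(candidates, query_vector))  # ['d2', 'd1']
```

Because the second stage only sees what retrieval returned, it can promote d2 over d1 but can never surface d3, which fits the idea that RankBrain selects and reorders results rather than changing what gets retrieved.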

In addition, Haahr makes it clear that RankBrain has access to a subset of signals and the query. As I mentioned this makes RankBrain feel more like a meta-signal instead of a stand-alone signal.

What we don’t know are the exact signals that make up that subset. Many will take this statement to theorize that it uses link data, click data or any number of other signals. The fact is we have no idea which signals RankBrain has access to, what weight RankBrain gives them, or whether they’re used evenly across all queries.

The inability to know the variables makes any type of regression analysis of RankBrain a non-starter.

Of course there’s also the statement that they don’t know what RankBrain is doing. That’s because RankBrain is a deep learning algorithm performing unsupervised learning. It’s creating its own rules.

More to the point, if a Google Ranking Engineer doesn’t know what RankBrain is doing, do you think that anyone outside of Google suddenly understands it better? The answer is no.

You Can’t Optimize For RankBrain

You can’t optimize for RankBrain based on what we know about what it is and how it works. At its core, RankBrain is about better understanding language, whether that’s within documents or queries.

So what can you do differently based on this knowledge?

Google is looking at the words, sentences and paragraphs and turning them into mathematical vectors. It’s trying to assign meaning to that chunk of text so it can better match it to complex query syntax.

The only thing you can do is to improve your writing so that Google can better understand the meaning of your content. But that’s not really optimizing for RankBrain; that’s just doing proper SEO and delivering a better user experience (UX).

By improving your writing and making it clearer, you’ll wind up earning more links and, over time, be seen as an authority on that topic. So you’ll be covered no matter what other signals RankBrain is using.

The one thing you shouldn’t do is think that RankBrain will figure out your poor writing or that you now have the license to, like, write super conversationally you know. Strong writing matters more now than it ever has before.

TL;DR

RankBrain is a deep learning algorithm that plugs in post-retrieval and relies on variable-length text vectors and other signals to make better sense of complex natural language queries. While fascinating, there is nothing one can do to specifically optimize for RankBrain.


Comments About RankBrain Survival Guide

// 34 comments so far.

Mark Traphagen // June 09th 2016

This post is SO good and SO needed, I want to have my vasectomy reversed, have another child, and marry it off to this post. It’s that good. Thanks AJ.

AJ Kohn // June 09th 2016

LOL Mark. Thank you very much for the kind words. But please, hold off on the vasectomy reversal!

Rick Bucich // June 09th 2016

Thanks again for being the source of reason in an industry in dire need of one.

AJ Kohn // June 09th 2016

Thanks Rick. I had a client send me a post the other day that just … it was the straw that broke the camel’s back. I just had to get it out of my system.

Bryson Meunier // June 09th 2016

OK so how do I optimize for RankBrain again?

J/K. Best post on RankBrain yet. Nice work.

AJ Kohn // June 09th 2016

Thanks so much Bryson. I appreciate the kind words.

Victor Pan // June 09th 2016

“While fascinating, there is nothing one can do to specifically optimize for RankBrain…” because the subset of variables and weights assigned for each variable very likely changes query by query, iteration by iteration…

So… playing devil’s advocate here… What if I’m an SEO and I flood the index with a never-before-seen query like “flubbub bubbub booba baboo,” then create other make-believe sentences that no person in their right mind would ask, across multiple domains, similar to “flubbub bubbub booba baboo” but different… can I create enough data points to figure out RankBrain for that one query?

The only other areas Google could draw from would come from my sites which use the make-believe language (alternatively I think it would be interesting if you created queries based off of Ido or Esperanto)… so wouldn’t that be a nice “Keyword zero” test case for RankBrain? As you add in more sites or pages with better keywords that match closer to “flubbub bubbub booba baboo” but never quite the same, I would hope that RankBrain is what’s being used to have the new page rank higher (if it does).

While we can’t ever isolate out weights and sub-signals used in RB, wouldn’t finding clues as to the next closest keyword/semantic need help? Maybe the question isn’t about cracking RB, but where long-tail queries are going in the future and which result they will be mapped to, so we can change our keyword volume estimates.

Yeah we need “broad match” back for long tail queries.

AJ Kohn // June 09th 2016

Victor,

While that’s an interesting idea I think it has two flaws. First is that you have no idea whether the results that are returned, even for a long-tail query like you suggest, are attributed to RankBrain.

And in relation to that is the second problem. The training data for such a query probably isn’t large enough or at all meaningful. It might not create any sort of meaningful vector and might simply be matching on similar text.

Again, even if you did think you’d triggered a RankBrain result, what does that help you do for the content that isn’t nonsensical? How do you apply that? Particularly since we have no idea if RankBrain treats queries uniformly. Frankly, that seems unlikely so … the learning isn’t transferable, nor is it reliable.

Laura Crest // June 09th 2016

Excellent read, yet again, AJ. Thanks for ending the confusion around RankBrain so definitively — this was so refreshing and so needed!

AJ Kohn // June 09th 2016

Thanks so much Laura. What I hope I’m able to do is convey complex ideas in a very simple manner. And when it comes to RankBrain that’s doubly important in my opinion.

Doc Sheldon // June 09th 2016

Great stuff, AJ! I’ve been explaining this issue to my clients this way: “You can’t optimize for RankBrain because RankBrain is only concerned with understanding the search query… it has NOTHING to do with the content on your pages… ONLY the query. Unless YOU’RE the one doing the searching, it has nothing to do with you or your site!”

AJ Kohn // June 09th 2016

Thanks Doc. But … it is concerned with the content on pages to a large degree. From my understanding it takes a variable-length of text and converts it into a vector that has certain meanings. Essentially, text is represented by the mathematical equivalent of x.

Then when a query comes in they determine if it’s one they’ve seen and understand or one that might benefit from RankBrain interpretation. At that point it likely turns the query into a vector and sees which documents might match that vector, and if that makes it a better result for that query.

All that said, it’s not like you can know that using a certain word in a certain context will create the right vector, one that makes it more probable for RankBrain to select or promote your result for a specific query. #phew

Doc Sheldon // June 09th 2016

That’s totally different from what I understand (about it being concerned with content on the pages), AJ. If that were true, then we WOULD be able to optimize for it, to a degree.

What I’ve gleaned is that RB is applied against all queries (and RB’s interpretations of those queries that weren’t already well understood are filtered for review, before being firmed up in the algo.) But RB is only applied against the query, in order to gain a better understanding (using vectors, as you said) of what the searcher is looking for.

But at this point, RB is only looking at the query. Once it has derived what it thinks the query is looking for, it then communicates that and the search algos retrieve results from the index, based upon RB’s output. At this point in time, there’s no indication of whether any vectors are being utilized on the ranking/retrieval side of the equation (although personally, I think that’s inevitable – and the sooner, the better). I wouldn’t be surprised if we see that within the year, though.

AJ Kohn // June 10th 2016

Doc,

If you listen to Paul Haahr he says that RankBrain plugs in late post-retrieval. So it does not seem to do anything to the query before it looks to source candidates. He also mentions that it has access to the query, documents and a subset of other signals.

At first I too believed that RankBrain was likely just rewriting the query to improve retrieval but the comments made about how it works indicate otherwise. Combined with the idea of variable-length vectors it seems likely that documents are represented by vectors and when the query isn’t satisfied by traditional results RankBrain turns the query into a vector and determines if any of the candidates should be promoted.

Vishal // June 10th 2016

Yeah RankBrain is powerful I admit. It IS awesome. But there would be ways around it… check this out “The Death of RankBrain?“

AJ Kohn // June 10th 2016

Vishal,

I don’t like to censor so I’m letting this comment through. I also don’t like being mean, but … posts like yours are one of the reasons I felt it necessary to write this piece.

Gianluca Fiorelli // June 10th 2016

I can only say: Amen

Doc Sheldon // June 10th 2016

“If you listen to Paul Haahr he says that RankBrain plugs in late post-retrieval.” It’s been some time since I watched that, and I must’ve missed it… an important thing to miss, for sure! Thanks, I’ll give it another watch – may have to rethink things.

Vishal // June 10th 2016

Hi AJ, thanks for letting it go through. Good article BTW. But what if someone were to develop an AI algorithm that could also take huge data sets, figure out patterns inside each niche and keyword, and counter RankBrain by “blending in”? Also, the data for RankBrain could be skewed by real humans hired to do it. I’m aware Google is using Chrome to track behavior and engagement on sites (they have patents on this), people have used packet sniffers to see data flowing to Google, and various SEOs have conducted experiments to deduce this… so if they’re doing it and already have a baseline of data, we just need to blend into that baseline. No?

AJ Kohn // June 10th 2016

Vishal,

There is no counter to RankBrain. It’s simply understanding language better and thereby able to match complex queries with documents by using variable-length vectors and a subset of other signals.

What are you blending into? Are you suddenly going to know that adding one specific word to a certain sentence is going to change the vector and ensure that for an unknown number or amount of complex queries structured in a certain way you’ll get ranked slightly higher?

So … no.

Victor Pan // June 10th 2016

Hey AJ, you’re right that figuring out RB is fruitless but let’s try to dig in deeper.

“First is that you have no idea whether the results that are returned, even for a long-tail query like you suggest, are attributed to RankBrain”

I think that’s a great question to ask a Googler – on whether all queries that have never been seen before go through RankBrain during post-retrieval. As you’ve noted, there’s no way to prove otherwise.

“The training data for such a query probably isn’t large enough or at all meaningful. It might not create any sort of meaningful vector and might simply be matching on similar text.”

I agree it probably isn’t meaningful, at a small scale, but at least we’ll have a starting point as to how RB deals with small vectors with a set of controlled text. As you build up more vectors of words, size no longer is an issue, and actually that’s how searches with 0 search interest grow, which I’d argue is natural. What I’m curious about is the tipping point where repetitive spam becomes the best match because it is the majority of data that we have. Basically, can you create enough spam to make a test set that becomes the training set for an unknown query (and therefore RB fails).

When you find that point… would the result change if certain types of signals (say links) are introduced which point to what is spam and what is not spam?

“Again, even if you did think you’d triggered a RankBrain result, what does that help you do for the content that isn’t nonsensical?”

If I were a spammer, I would “get in early” if I know “supplemental” signals used for a particular never-before-seen query could be biased.

In terms of applications… I agree there are none, since unknown queries become known queries with volume through time because scoring happens before post-retrieval so…

The journey’s the destination.

Thanks for reading and entertaining the thought exercise!

AJ Kohn // June 10th 2016

Victor,

Perhaps I’m not making it clear. If your nonsensical text is on one site or even a handful, then that vector is essentially meaningless. I’m not even sure it could be expressed as a meaningful vector! So I can’t imagine RankBrain would think it was useful, particularly in combination with that subset of signals.

One doesn’t know when RankBrain is involved, what it’s weighting, nor the material impact on the SERP when invoked. I’m just not seeing how any of this type of testing would provide verifiable insight.

Doc Sheldon // June 10th 2016

What I find fascinating and intriguing is the use of W2V, P2V down the road. Using vectors has two major advantages over the more conventional methods of content to query matching:
1. it can operate at huge scales (and very rapidly, as a bonus);
2. the larger the corpus, the more accurate it becomes, whereas matching words/synonyms in the “old” methodology actually becomes somewhat less accurate as the corpus increases in size.
So when this sort of tech is applied to everything Google does to interpret content, then AI stops being an elusive dream and suddenly becomes a toddler, learning to walk.
Fun times we’re headed for!

Ken Ashe // June 14th 2016

“Strong writing matters more now than it ever has before.” Incredible post with a great conclusion. Now I have to become a better writer.

AJ Kohn // June 15th 2016

Thanks for the kind words Ken.

Tom Lambert // June 20th 2016

I have been thinking of writing something like this, but my version would not have been half, or even a quarter, this good. Thank you for saving the world from my lame version.

My drumbeat for many, many years has been that you have to assume search engines are moving toward the Star Trek computer. That’s a stated goal, and everything Google has done, especially the Knowledge Graph and RankBrain, seems to be moving in that direction. I remember years ago trying to tell people that one thing Google gets from Google Books is a vetted repository of writing of (generally) known quality that they will eventually use to train AI to understand quality.

The way I put it to people is that in 30 years when you explain to the college students in the class of 2038 that back in the day, you used to type in keywords and try to find a document that had the same words in it as the ones in your query, they will be incredulous. It will be like explaining punch cards to them. They will marvel at how we were able to find any information at all. Only old people with grey hair and missing teeth will remember that.

So then I say that the only way to optimize for that search engine in 2038 will be to have the best answer to your query (and that will require massively greater personalization because your personal search bot really needs to know not only the definitive answer to the question, but your level of expertise and what you can actually digest).

When you say you can’t optimize for RankBrain, you can only improve your writing, I would say that improving your writing (broadly conceived to include style, clarity, but also depth of understanding, quality of information and so forth) is how you optimize for the search engine of the future.

Sure “that’s just doing proper SEO and delivering better user experience (UX),” but the reality is that the goal of the big search engines is to return the same result that a top expert who is also an excellent teacher and knows you well would return. So when search engines get good enough, there’s really no difference between better UX and better SEO.

AJ Kohn // June 21st 2016

Tom,

Thanks so much for the thoughtful comment. I mostly agree with you, though the idea that Google Books is the right template might not be correct. Web writing is still very different from other forms of writing in my view. And it’s not just what you say but how you present it.

A great and insightful piece may never reach an audience if it’s all 10 point grey text on a black background with only four paragraphs to separate 3,000 words. So those who are truly able to communicate through the web will be those of prominence in 2038.

Tom Lambert // June 21st 2016

Hi AJ,

I didn’t mean that Google was using Books in a simplistic way like that so that it would say that if a web page looks like a book page, that makes it a good web page.

Rather, that Books provides Google with a linguistic corpus of vetted quality in that everything there has gone through editorial review. It includes technical, turgid text with four paragraphs for 3,000 words, but it also includes huge amounts of dialog and slang and more. If you want to turn language into vectors and teach an AI how to handle this, you need a lot of text that actually makes sense before you can teach it how to handle poorly written text. Books provides that quality text.

Once you see what text that has gone through editorial control looks like, you can then get much better at recognizing spam.

Once you are able to construct vectors and get closer to understanding the meaning of sentences, you can go beyond simple synonym matching and you can eventually distinguish homonyms. You can, as this paper shows, impute missing words.

http://arxiv.org/pdf/1511.06349.pdf

Eventually, you can feed 2635 romance novels into your AI so it can become more conversational and so it can start writing poetry… as Google has done.

http://www.androidauthority.com/google-ai-poetry-692231/
http://www.androidauthority.com/google-ai-romance-novels-691082/

The important point here is that when training the AI, they were NOT using web pages, nor were they using books with 3,000 words in 4 paragraphs (though, having never read a romance novel, I guess I don’t know that).

That’s sort of what I always thought was the big payoff for Google Books in terms of providing a resource for the core search business (I know Google Books has its own value for selling books, not discounting that).

The thing that I noticed with Google Books is that early on the transcriptions were terrible because it was “dumb” OCR. At a certain point, though, the transcriptions got really, really good. It was at that point that I realized that they were using Google Books to learn language and that would feed back in all sorts of ways.

AJ Kohn // June 22nd 2016

Interesting Tom. I hadn’t thought about the idea that the Books corpus could be the training data for sentence and/or paragraph vectors. That would be an interesting way to develop a strong baseline, particularly since that corpus is labeled by topic(s) and of a quality that would produce valuable vectors.

Many thanks for your contributions here.

Frank Sandtmann // June 27th 2016

Hi AJ,

Thanks for another piece of truly insightful content. I also especially enjoyed Tom’s comment, as it points to the general approach Google is using. They hardly ever invest money in a business area that won’t serve their overall goals in the future. And using content from books, which have proven to be of high quality – covering a myriad of topics – as training material for their AI activities seems to be 100% in line.

Google has so often reiterated the claim that “content is king” that SEOs simply no longer hear that message. But now we might finally be nearing a time when what you write truly determines whether you have any chance of being found.

There can be no way to optimize for RankBrain. The only way is to write helpful, insightful and engaging content – as you just did with this post.

AJ Kohn // June 27th 2016

Thank you for the kind words and comment Frank.

Content is absolutely essential. I think many people just don’t understand what type of content is going to win. Some think in terms of volume. Others think in terms of formulas. If it were that easy we’d all be cranking out amazing content. But it’s just not paint by numbers.

The real shame is that it’s not just what you say but how you present it. This is where I see so many people going wrong. Readability isn’t how easy it is to read something but whether you want to read it. Very few grok the importance of presentation or engagement as a means to promotion and distribution.

And you’re 100% right. Google rarely does anything that doesn’t tie back to their core business. Thinking otherwise is, in my view, naive.

Jeannie Hill // June 27th 2016

Thanks for a great article.

What you write, and how and where you push it out, contributes a great deal.

Seems like it is important to know much about the type of content niche users want, what form of content they prefer to consume, where they look for it, and how to help search engines know your site offers relevant solutions and answers. Paid marketing and additional promotional efforts to the channels they visit should follow, in my experience.

Why do you think we see so many sites with well-written content that no-one or few seem to find and read?

AJ Kohn // August 21st 2016

Jeannie,

I think many of those sites don’t do the appropriate outreach and marketing of that content. It’s not enough to just have the content and then hope Google sends you traffic. In addition, some of the content might be well written but not readable. Readability isn’t just that it’s easy to read but that people want to read it.

So I agree with everything you said regarding what’s important and can only add that I believe that the time spent creating content should always be matched by time spent marketing that content.

Jonathan // July 28th 2016

I had no idea things were so evolved when it comes to AI! I was guessing Google started using AI for a small part of their queries but they seem to be using it for a lot more than that. I wonder how many queries I’ve made recently that were served using RankBrain? Will have to learn more about this but right now I tend to agree with you: you can’t really optimize for this.

Blind Five Year Old
