The fact that Google frequently uses synonyms to boost search quality is nothing new. But Dan Petrovic brought an interesting example to my attention via Google+ which spawned a dialog that included Bill Slawski, Wissam Dandan and Steven Baker, Principal Software Engineer on the Search Ranking team.
It is conversations like these that make search so enjoyable. Hopefully you agree.
Dan’s question revolved around the query ‘the dreaming void plot’.
This query returned results for The Temporal Void as well as The Dreaming Void, both books by Peter F. Hamilton. The question was: why?
First things first. Bold words in search results usually reflect the query terms. It’s one of the strongest signals of relevance that Google can provide to the user. Your eye naturally gravitates to those bolded words and they reinforce the fact that the result(s) matched your query.
However, Google has also been bolding synonyms when they’re returned in search results. The easiest way to see this is to combine a synonym operator (~) with a negative operator (-).
Here it’s easy to see that fantasy and sleep are bolded and are thus synonyms for dream according to Google. This makes complete sense.
Here’s where it gets interesting. The terms dreaming and temporal are not … regular synonyms. By that I mean that if you try the operator scenario above for dreaming you will not see temporal in bold.
A cursory look at your favorite dictionary will also tell you that these are not ‘grammatical’ synonyms.
The next thing I did was conduct a search using the root query: The Dreaming Void. That search did not return any results for The Temporal Void. I then looked at related searches, one of my favorite search features.
Lo and behold the ‘first’ related search is ‘temporal void’. This tells me that Google sees a very strong relationship between these two terms based on query patterns.
The related search for the full ‘the dreaming void plot’ query does not yield any temporal void terms. That’s not entirely unexpected for reasons I won’t go into here for the sake of brevity. Finally, I remove the related filter and then test the query using the new verbatim search.
Poof. All results for ‘The Temporal Void’ disappear. Though obvious, this confirms that the results for ‘The Temporal Void’ were returned through synonym or similar-term matching rather than a literal match on the query.
This is what I refer to as a query synonym. The science behind these is actually incredibly interesting and complex, because synonyms are not just about simple grammar. They’re about language, syntax and context as well.
Wissam Dandan offered this excerpt from a recent Google blog post on search quality changes.
Related query results refinements: Sometimes we fetch results for queries that are similar to the actual search you type. This change makes it less likely that these results will rank highly if the original query had a rare word that was dropped in the alternate query. For example, if you are searching for [rare red widgets], you might not be as interested in a page that only mentions “red widgets.”
Could this be related to Dan’s query? It might. The idea behind related queries is similar to synonyms. (Irony, huh?) The example provided by Google is that it will return results for ‘floral delivery’ when you search for ‘flower shops’. The change above will reduce the likelihood of false positives which may allow Google to increase the use of related query results refinements.
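To make the ‘rare word’ idea concrete, here is a minimal sketch of how an alternate query might be down-weighted when it drops a rare term. This is not Google's method. The document frequencies, the rarity threshold and the penalty value are all numbers I made up for illustration, and the function names are hypothetical.

```python
import math

# Made-up document frequencies for a toy corpus of 1,000 pages (assumption).
DOC_FREQ = {"rare": 5, "red": 400, "widgets": 300}
CORPUS_SIZE = 1000


def idf(term):
    """Inverse document frequency: the rarer the term, the higher the score."""
    return math.log(CORPUS_SIZE / DOC_FREQ.get(term, 1))


def dropped_terms(original, alternate):
    """Terms from the original query that the alternate query left out."""
    alt = set(alternate.split())
    return [t for t in original.split() if t not in alt]


def alternate_weight(original, alternate, rare_idf=3.0, penalty=0.25):
    """Down-weight the alternate query if any dropped term counts as rare.

    rare_idf and penalty are arbitrary illustrative values.
    """
    if any(idf(t) >= rare_idf for t in dropped_terms(original, alternate)):
        return penalty
    return 1.0


print(alternate_weight("rare red widgets", "red widgets"))   # dropped a rare term
print(alternate_weight("rare red widgets", "rare widgets"))  # dropped a common term
```

Under these assumptions, dropping ‘rare’ is penalized while dropping ‘red’ is not, which is the behavior the excerpt describes.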
In the case of ‘the dreaming void plot’ there don’t seem to be any rare query terms. In fact, most documents in the content corpus contain all of these words and the word ‘temporal’ as well. There’s a high degree of co-occurrence for the terms ‘dreaming’ and ‘temporal’ which makes sense since they are part of a series of books.
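Co-occurrence is easy to illustrate on a small scale. The sketch below measures how often two terms appear in the same documents using a simple Jaccard overlap. The four ‘documents’ are invented, and Jaccard is just one convenient measure I picked; I have no evidence it is what Google uses.

```python
def cooccurrence(docs, a, b):
    """Jaccard overlap: docs containing both terms / docs containing either."""
    has_a = {i for i, d in enumerate(docs) if a in d.lower().split()}
    has_b = {i for i, d in enumerate(docs) if b in d.lower().split()}
    either = has_a | has_b
    return len(has_a & has_b) / len(either) if either else 0.0


# Invented snippets standing in for pages about the series (assumption).
DOCS = [
    "the dreaming void plot summary and the temporal void sequel",
    "review of the temporal void follow-up to the dreaming void",
    "the dreaming void by peter f. hamilton",
    "dreaming of a holiday",
]

print(cooccurrence(DOCS, "dreaming", "temporal"))
print(cooccurrence(DOCS, "dreaming", "holiday"))
```

In this toy corpus ‘dreaming’ overlaps with ‘temporal’ more than with ‘holiday’, which is the kind of signal that could tie the two book titles together.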
But that’s the thing: what seems easy and straightforward to us is actually quite difficult for a machine.
The Science of Synonyms
Then the always smart Bill Slawski joined the conversation providing more examples of why synonyms are so difficult.
For instance, while we may often consider the words “auto” and “car” to be synonyms, that’s not the case when you set an alarm on “auto.” Even within longer phrases, words that we might consider to be synonyms might not be. So, “automobile” and “car” are synonyms when we search for a [ford car], but not when we search for a [railroad car].
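Bill's [ford car] versus [railroad car] example can be sketched as a synonym lookup keyed on the neighboring word rather than on the word alone. The table and the expand function are hypothetical and hand-built for illustration; a real system would have to learn these relationships from data.

```python
# Hypothetical table: (word, neighboring word) -> allowed substitutes.
# A pair with no entry means no substitution is considered safe.
CONTEXT_SYNONYMS = {
    ("car", "ford"): {"automobile"},
}


def expand(query):
    """Return the substitutes allowed for each word, given its neighbors."""
    words = query.lower().split()
    expanded = {}
    for i, word in enumerate(words):
        neighbors = words[:i] + words[i + 1:]
        subs = set()
        for n in neighbors:
            subs |= CONTEXT_SYNONYMS.get((word, n), set())
        if subs:
            expanded[word] = subs
    return expanded


print(expand("ford car"))      # 'car' may be swapped for 'automobile'
print(expand("railroad car"))  # no substitution in this context
```

The point of keying on the pair is that the same word gets different treatment depending on what sits next to it.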
Bill went on to reference a number of patents that describe how Google might approach synonyms and related query refinement, five of which list Steven Baker as a co-inventor.
While Bill and I sought out other science fiction series that might display this same behavior, Steven joined the conversation. He wasn’t able to provide much detail, but he did reference his blog post on synonyms.
An irony of computer science is that tasks humans struggle with can be performed easily by computer programs, but tasks humans can perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we can’t write a program to identify objects in a photo or understand a sentence with anywhere near the precision of even a child.
The last statement is an odd sort of synonym for my own SEO philosophy and the name of this blog. The post also answered my question as to whether query synonyms are given the same bold treatment. (They are.)
Google is actively using complex methods to identify synonyms and related queries to improve search results. While this type of query results refinement is usually spot on and unnoticeable, it can sometimes be flawed. In those instances, you can remove these results using the verbatim search tool.