Over the last two and a half years I've become convinced that Google performs rich categorization of sites and queries. The signs are plain as day and the impact is substantial. Categorization allows Google to apply algorithmic changes by category as well as deliver more relevant and diverse search results.
In late 2008 I wrote about how taxonomic search could be used to create diversity in search engine results. Later, in May 2009, Google launched search options that included the ability to filter your results by Forums and Reviews.
Rich Snippets stole the show during this launch but it should have been crystal clear that Google was able to distinguish site or page type. How else would it deliver results in these filters?
Google made this even more transparent when they tested the ability to see more or fewer shopping sites. Obviously Google was able to tell (with relative accuracy) which sites were eCommerce enabled.
You may also come across search results for Q&A sites or forums that carry a pseudo-rich snippet showing the number of posts and authors.
This is not structured markup. This isn't RDFa or microdata. Instead, Google has identified patterns in these sites, extracting the relevant information and creating their own rich snippet. More recently, Google demonstrated that it could determine if a page contained search results.
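To make the idea of pattern-based extraction concrete, here is a minimal sketch in Python. It is not Google's method, which isn't public; the regular expressions, the function name `extract_forum_snippet` and the sample text are all assumptions made up for illustration. It only shows how post and author counts could be pulled from a page with no structured markup at all.

```python
import re

# Hypothetical patterns for illustration only; real forum markup varies widely.
POSTS_RE = re.compile(r"(\d+)\s+posts?", re.IGNORECASE)
AUTHORS_RE = re.compile(r"(\d+)\s+authors?", re.IGNORECASE)


def extract_forum_snippet(page_text):
    """Return post and author counts if both patterns are found, else None."""
    posts = POSTS_RE.search(page_text)
    authors = AUTHORS_RE.search(page_text)
    if not (posts and authors):
        # The page does not look like a forum thread under these toy patterns.
        return None
    return {"posts": int(posts.group(1)), "authors": int(authors.group(1))}


# Made-up sample text standing in for a forum thread page.
sample = "Thread: Best hiking boots? 12 posts by 5 authors, last reply 2 days ago"
print(extract_forum_snippet(sample))
```

A page that matches neither pattern returns `None`, which is the toy version of deciding that a page is not a forum in the first place.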
These are interesting applications but it's what is at the heart of it all that is really remarkable - the ability to algorithmically identify and categorize a page or site.
If it were a snake it would have bit you already. Of course Google categorizes queries! You see this when local results are triggered, or when the Onebox is presented. Enter a query that is product based and the Shopping Onebox is likely to appear. Enter a query that is health related and the Health Onebox may appear.
It's not a matter of if, but how deeply Google categorizes queries.
Matching Queries to Categories
Matching queries to categories can help deliver the right information to users. You can already see this happening as Google changes the left hand search options based on the query.
You won't get a Recipes option if you search for 'baked new jersey'.
You won't get a Books option if you search for 'crying of lot 23'.
These examples could be delivered by simply accessing relevant databases of recipes or books, but I'm guessing the relation is far more dynamic.
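The simple database version of this could look something like the sketch below. Everything in it is an assumption for illustration: the two sets stand in for databases of recipes and books, their entries are examples I picked, and `search_options` is a made-up function name. A more dynamic system would presumably go well beyond exact lookups.

```python
# Hypothetical stand-ins for databases of known recipes and book titles.
KNOWN_RECIPES = {"baked ziti", "baked alaska"}
KNOWN_BOOKS = {"crying of lot 49"}


def search_options(query):
    """Return the extra search options a query would trigger in this toy model."""
    # Normalize case and whitespace before the exact-match lookup.
    normalized = " ".join(query.lower().split())
    options = []
    if normalized in KNOWN_RECIPES:
        options.append("Recipes")
    if normalized in KNOWN_BOOKS:
        options.append("Books")
    return options


print(search_options("baked new jersey"))
print(search_options("Baked Ziti"))
```

Queries that aren't in either set, like 'baked new jersey' or 'crying of lot 23', simply get no extra options.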
Matching Documents to Queries
The question then becomes how Google matches web pages (aka documents) to queries. What documents (or results) does Google return based on the query categorization?
A recent patent dissected by Bill Slawski shows how Google might think about matching documents, queries, keywords and categories. It indicates that what Google thinks your site is about could influence how and when it is returned for certain queries.
If you were to take all of that information that Google provides for your site, and try to guess at a category or categories that Google might assign for your site, could you?
That is an interesting question.
Google Ad Planner
One unconventional place I've looked recently to get a sense of what Google might think a site is about is Google Ad Planner.
The content categories are usually a mixture of accurate and bizarre, particularly around geography. Now, I'm not saying that the content categories here influence Google's categorization. But I have found it illuminating to look at Ad Planner results when comparing competitors.
Beyond the content categories are affinity scores for sites also visited and audience interests. The latter is another interesting data point when thinking about how Google might categorize sites.
The good news is Google has a compelling reason for this data to be accurate (it's attached to advertising) and the data seems to be updated frequently. As an example, the Ad Planner thumbnail for this blog shows my most recent blog post.
Why Google Ad Planner thinks I'm in Canada is a bit of a mystery. I did write blog posts about Hockey Memories and The Flyers 1987 Stanley Cup Playoffs, but I'm not in Canada (as lovely a country as it may be).
After verifying site ownership you can change your categories, which reveals a fairly robust taxonomy. Is this the same taxonomy Google uses in their algorithm? Probably not. But it might help inform or update the one used for search. So, I've gone ahead and changed my categories and description.
It's in my can't-hurt-and-might-help category.
Google continues to work on matching categories to queries and queries to web pages to improve search quality. Understanding how your site might be perceived by Google is an important new step in search engine optimization.