Crawl Optimization

July 29 2013 // SEO // 80 Comments

Crawl optimization should be a priority for any large site looking to improve their SEO efforts. By tracking, monitoring and focusing Googlebot you can gain an advantage over your competition.

Crawl Budget

Ceiling Cat

It's important to cover the basics before discussing crawl optimization. Crawl budget is the time or number of pages Google allocates to crawl a site. How does Google determine your crawl budget? The best description comes from an Eric Enge interview of Matt Cutts.

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank. The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often.

In other words, your crawl budget is determined by authority. This should not come as a shock. But that was pre-Caffeine. Have things changed since?

Caffeine

Percolator

What is Caffeine? In this case it's not the stimulant in your latte. But it is a stimulant of sorts. In June of 2010, Google rebuilt the way they indexed content. They called this change 'Caffeine' and it had a profound impact on the speed at which Google could crawl and index pages. The biggest change, as I see it, was incremental indexing.

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Essentially, Caffeine removed the bottleneck for getting pages indexed. The system they built to do this is aptly named Percolator.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

The speed at which Google can crawl is now matched by the speed of indexation. So did crawl budgets increase as a result? Some did, but not as much as you might suspect. And here's where it gets interesting.

Googlebot seems willing to crawl more pages post-Caffeine but it's often crawling the same pages (the important pages) with greater frequency. This makes a bit of sense if you think about Matt's statement along with the average age of documents benchmark. Pages deemed to have more authority are given crawl priority.

Google is looking to ensure the most important pages remain the 'freshest' in the index.

Time Since Last Crawl

Googlebot's Google Calendar

What I've observed over the last few years is that pages that haven't been crawled recently are given less authority in the index. To be more blunt, if a page hasn't been crawled recently, it won't rank well.

Last year I got a call from a client about a downward trend in their traffic. Using advanced segments it was easy to see that there was something wrong with their product page traffic.

Looking around the site I found that, unbeknownst to me, they'd implemented pagination on their category results pages. Instead of all the products being on one page, they were spread out across a number of paginated pages.

Products that were on the first page of results seemed to be doing fine but those on subsequent pages were not. I started to look at the cache date on product pages and found that those that weren't crawled (I'm using cache date as a proxy for crawl date) in the last 7 days were suffering.

Undo! Undo! Undo!

Depagination

That's right, I told them to go back to unpaginated results. What happened?

Depagination

You guessed it. Traffic returned.

Since then I've had success with depagination. The trick here is to think about it in terms of progressive enhancement and 'mobile' user experiences.

The rise of smartphones and tablets has made click based pagination a bit of an anachronism. Revealing more results by scrolling (or swiping) is an established convention and might well become the dominant one in the near future.

Can you load all the results in the background and reveal them only when users scroll to them without crushing your load time? It's not always easy and sometimes there are tradeoffs but it's a discussion worth having with your team.

Because there's no better way to get those deep pages crawled than by having links to all of them on that first page of results.

CrawlRank

Was I crazy to think that the time since last crawl could be a factor in ranking? It turns out I wasn't alone. Adam Audette (a smart guy) mentioned he'd seen something like this when I ran into him at SMX West. Then at SMX Advanced I wound up talking with Mitul Gandhi, who had been tracking this in more detail at seoClarity.

seoClarity graph

Mitul and his team were able to determine that content not crawled within ~14 days receives materially less traffic. Not only that, but getting those same pages crawled more frequently produced an increase in traffic. (Think about that for a minute.)

At first, Google clearly crawls using PageRank as a proxy. But over time it feels like they're assigning a self-referring CrawlRank to pages. Essentially, if a page hasn't been crawled within a certain time period then it receives less authority. Let's revisit Matt's description of crawl budget again.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank.

The pages that aren't crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages.

You win if you get your low PageRank pages crawled more frequently than the competition.

Now what CrawlRank is really saying is that document age is a material ranking factor for pages with little to no PageRank. I'm still not entirely convinced this is what is happening, but I'm seeing success using this philosophy.

Internal Links

One might argue that what we're really talking about is internal link structure and density. And I'd agree with you!

Not only should your internal link structure support the most important pages of your site, it should make it easy for Google to get to any page on your site in a minimum of clicks.

One of the easier ways to determine which pages are deemed most important (based on your internal link structure) is by looking at the Internal Links report in Google Webmaster Tools.

Google Webmaster Tools Internal Links

Do the pages at the top reflect the most important pages on your site? If not, you might have a problem.

I have a client whose blog was receiving 35% of Google's crawl each day. (More on how I know this later on.) This is a blog with 400 posts amid a total content corpus of 2 million+ URLs. Googlebot would crawl blog content 50,000+ times a day! This wasn't where we wanted Googlebot spending its time.

The problem? They had menu links to the blog and each blog category on nearly all pages of the site. When I went to the Internal Links report in Google Webmaster Tools you know which pages were at the top? Yup. The blog and the blog categories.

So, we got rid of those links. Not only did it change the internal link density but it changed the frequency with which Googlebot crawls the blog. That's crawl optimization in action.

Flat Architecture

Flat Architecture

Remember the advice to create a flat site architecture? Many ran out and got rid of subfolders thinking that if the URL didn't have subfolders then the architecture was flat. Um ... not so much.

These folks destroyed the ability for easy analysis, potentially removed valuable data in assessing that site, and did nothing to address the underlying issue of getting Google to pages faster.

How many clicks from the home page is each piece of content? That's what was, and remains, important. It doesn't matter if the URL is domain.com/product-name if it takes Googlebot (and users) 8 clicks to get there.

Is that mega-menu on every single page really doing you any favors? Once you get someone to a leaf level page you want them to see similar leaf level pages. Related product or content links are the lifeblood of any good internal link structure and are, sadly, frequently overlooked.

Depagination is one way to flatten your architecture but a simple HTML sitemap, or specific A-Z sitemaps can often be very effective hacks.

Flat architecture shortens the distance between authoritative pages and all other pages, which increases the chances of low PageRank pages getting crawled on a frequent basis.
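
To make 'clicks from the home page' concrete, here's a minimal sketch that computes click depth with a breadth-first search. The link graph and URLs are hypothetical; in practice you'd build the graph from a crawl of your own site.

```python
from collections import deque


def click_depths(links, start):
    """Return the minimum number of clicks from `start` to each reachable URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/category", "/sitemap"],
    "/category": ["/category?page=2", "/product-a"],
    "/category?page=2": ["/product-b"],
    "/sitemap": ["/product-a", "/product-c"],
}

depths = click_depths(links, "/")
for url, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, url)
```

In this made-up graph the product that only appears on the second paginated page sits one click deeper than the products linked from the HTML sitemap, which is the kind of gap depagination or a sitemap is meant to close.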

Tracking Googlebot

"A million dollars isn’t cool. You know what’s cool? A billion dollars."

Okay, Sean Parker probably didn't say that in real life but it's an apt analogy for the difference in knowing how many pages Googlebot crawled versus where Googlebot is crawling, how often and with what result.

The Crawl Stats graph in Google Webmaster Tools only shows you how many pages are crawled per day.

Google Webmaster Tools Crawl Stats

For nearly five years I've worked with clients to build their own Googlebot crawl reports.

Googlebot Crawl Reporting That's Cool

That's cool.

And it doesn't always have to look pretty to be cool.

Googlebot Crawl Report by Page Type and Status

Here I can tell there's a problem with this specific page type. More than 50% of the crawl on that page type is producing a 410. That's probably not a good use of crawl budget.

All of this is done by parsing or 'grepping' log files (a line by line history of visits to the site) looking for Googlebot. Here's a secret. It's not that hard, particularly if you're even half-way decent with Regular Expressions.

I won't go into details (this post is long enough as it is) but you can check out posts by Ian Lurie and Craig Bradford for more on how to grep log files.
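
For a rough idea of what that parsing looks like, here's a sketch in Python rather than grep. It assumes your server writes the common 'combined' access log format, and the sample lines are invented. It also matches on the user agent string alone, which can be spoofed, so treat it as a starting point rather than a verified Googlebot filter.

```python
import re

# Assumed format: Apache/Nginx "combined" access log.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Yield (time, path, status) for lines whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            yield match.group("time"), match.group("path"), int(match.group("status"))


# Invented sample lines for illustration only.
sample = [
    '66.249.66.1 - - [29/Jul/2013:06:25:11 +0000] "GET /product/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [29/Jul/2013:06:25:12 +0000] "GET /product/123 HTTP/1.1" 200 5120 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [29/Jul/2013:06:25:14 +0000] "GET /old-page HTTP/1.1" 410 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits = list(googlebot_hits(sample))
for hit in hits:
    print(hit)
```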

In the end I'm interested in looking at the crawl by page type and response code.

Googlebot Crawl Report Charts

You determine page type using RegEx. That sounds mysterious but all you're doing is bucketing page types based on pattern matching.
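
Here's a sketch of that bucketing. The patterns and page type names are hypothetical and you'd swap in whatever matches your own URL structure. The input is assumed to be (path, status) pairs already pulled out of the Googlebot lines in your logs.

```python
import re
from collections import Counter

# Hypothetical URL patterns; the first pattern that matches wins.
PAGE_TYPES = [
    ("product", re.compile(r"^/product/")),
    ("category", re.compile(r"^/category/")),
    ("blog", re.compile(r"^/blog(/|$)")),
]


def page_type(path):
    """Bucket a URL path into a page type by pattern matching."""
    for name, pattern in PAGE_TYPES:
        if pattern.search(path):
            return name
    return "other"


def crawl_report(hits):
    """Count Googlebot requests per (page type, response code)."""
    return Counter((page_type(path), status) for path, status in hits)


# Invented (path, status) pairs for illustration.
hits = [
    ("/product/123", 200),
    ("/product/456", 410),
    ("/product/789", 410),
    ("/blog/crawl-optimization", 200),
    ("/email-a-friend?id=123", 200),
]

report = crawl_report(hits)
for (ptype, status), count in sorted(report.items()):
    print(ptype, status, count)
```

Anything that lands in 'other' is worth a look, since that's often where the parameter-based junk hides.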

I want to know where Googlebot is spending time on my site. As Mike King said, Googlebot is always your last persona. So tracking Googlebot is just another form of user experience monitoring. (Referencing it like this might help you get this project prioritized.)

You can also drop the crawl data into a database so you can query things like time since last crawl, total crawl versus unique crawl or crawls per page. Of course you could also give seoClarity a try since they've got a lot of this stuff right out of the box.
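
As a sketch of the database route, here's what those queries might look like with SQLite. The table layout, column names and rows are all assumptions for illustration, and the 14 day cutoff simply borrows the rough threshold mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per Googlebot request.
conn.execute("CREATE TABLE crawl (url TEXT, crawled_at TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO crawl VALUES (?, ?, ?)",
    [
        ("/product/123", "2013-07-01", 200),
        ("/product/123", "2013-07-28", 200),
        ("/product/456", "2013-07-10", 200),
        ("/blog/post", "2013-07-29", 200),
    ],
)

as_of = "2013-07-29"  # the date the report is run for

# Total crawl versus unique crawl.
total, unique = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT url) FROM crawl"
).fetchone()

# Crawls per page and days since last crawl, stalest pages first.
rows = conn.execute(
    """
    SELECT url,
           COUNT(*) AS crawls,
           CAST(julianday(?) - julianday(MAX(crawled_at)) AS INTEGER) AS days_since
    FROM crawl
    GROUP BY url
    ORDER BY days_since DESC
    """,
    (as_of,),
).fetchall()

# Pages that have gone more than 14 days without a crawl.
stale = [url for url, crawls, days_since in rows if days_since > 14]

print(total, unique)
print(rows)
print(stale)
```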

If you're not tracking Googlebot then you're missing out on the first part of the SEO process.

You Are What Googlebot Eats

Cookie Monster Fruit

What you begin to understand is that you're assessed based on what Googlebot crawls. So if they're crawling a whole bunch of parameter based, duplicative URLs or you've left the email-a-friend link open to be crawled on every single product, you're giving Googlebot a bunch of empty calories.

It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget.

The crawl spent on junk could have been spent crawling low PageRank pages instead. So managing your URL Parameters and using robots.txt wisely can make a big difference.

Many large sites will also have robust external link graphs. I can leverage those external links, rely less on internal link density to rank well, and can focus my internal link structure to ensure low PageRank pages get crawled more frequently.

There's no patent right or wrong answer. Every site will be different. But experimenting with your internal link strategies and measuring the results is what separates the great from the good.

Crawl Optimization Checklist

Here's a quick crawl optimization checklist to get you started.

Track and Monitor Googlebot

I don't care how you do it but you need this type of visibility to make any inroads into crawl optimization. Information is power. Learn to grep, perfect your RegEx. Be a collaborative partner with your technical team to turn this into an automated daily process.

Manage URL Parameters

Yes, it's confusing. You will probably make some mistakes. But that shouldn't stop you from using this feature and changing Googlebot's diet.

Use Robots.txt Wisely

Stop feeding Googlebot empty calories. Use robots.txt to keep Googlebot focused and remember to make use of pattern matching.
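
Before shipping new rules it helps to check which URLs a pattern would actually catch. This is a simplified sketch of Google-style pattern matching, where * matches any run of characters and a trailing $ anchors the end of the URL. The Disallow patterns are hypothetical, and the sketch ignores Allow rules and rule precedence, so it's a sanity check and not a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path from its start."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical Disallow patterns.
disallow = ["/email-a-friend", "/*?sort=", "/*.pdf$"]

for path in [
    "/email-a-friend?id=1",
    "/category/shoes?sort=price",
    "/manual.pdf",
    "/product/123",
]:
    print(path, is_blocked(path, disallow))
```

Run your most important URLs through a check like this too, so you know you're not about to block something you want crawled.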

Don't Forget HTML Sitemap(s)

Seriously. I know human users might not be using these, but Googlebot is a different type of user with slightly different needs.

Optimize Your Internal Link Structure

Whether you try depagination to flatten your architecture, re-evaluate navigation menus, or play around with crosslink modules, find ways to optimize your internal link structure to get those low PageRank pages crawled more frequently.

Keywords Still Matter

June 05 2013 // SEO // 59 Comments

As content marketing becomes the new black I'm starting to hear people talk about how keywords don't matter anymore. This sentiment appears in more than a few posts and the general tenor seems to be that keyword focused strategies are a thing of the past - a relic from a dark time.

The problem? You need keywords to produce successful content.

Dwight Meme Keywords

Keyword Syntax

How do people search for something? That's what keywords are all about. It's vital to ensuring your content will be found and resonate with your users.

keyword syntax

Are people searching for 'all weather fluid displacement sculptures' or 'outdoor water fountains'? That's an extreme example but it makes an important point.

You need to understand the user and the words they use to find your content.

Keyword Intent

Keywords can also tell you a lot about the intent of a search. Look (well) beyond informational, navigational and transactional intent and start thinking about how you can map keywords to the various stages of your site's conversion funnel.

For instance, what does a query like 'majestic seo vs open site explorer' tell you? This user is probably further along in the purchase funnel. They're aware of their choices and may have even narrowed it down to these two options. The keyword (yes, keyword) 'vs' makes it clear that they're looking for comparison data.

Google SERP for Comparison Intent

Sure enough, most of the results returned are posts that compare these two tools. Those pieces of content squarely meet that intent, in part because they're paying attention to keywords.

Majestic SEO has a result but ... it's the home page. Is that going to satisfy the desire to compare? Probably not. And where's SEOMoz? Missing in action.

Each could rely on the blog posts presented to deliver this comparison. Or they could also develop content that met that keyword and intent, allowing them to tell their story and frame the debate.

I know some will shriek, "Are you crazy? You don't want to promote your competition by mentioning them so prominently!" But that's denying reality. Users are searching with this syntax and intent.

Now, I'm not saying you have to put content that meets this particular intent prominently on the site or in the normal conversion flow. But if you know someone is on the fence and comparing products, why wouldn't you want a chance to engage that user on your own terms?

Keywords let you create content that matches user intent.

Magic Questions

Oh-O It's Magic!

There's also a lot of meta information that comes along with a keyword. I'm fond of using a term like 'eureka 313a manual' as an example. It's a query for a vacuum cleaner manual.

On the one hand it's pretty simple. There's explicit intent. Someone is looking for the manual to their vacuum cleaner. The content to meet that informational search would be ... the manual. But, what's really going on?

If you're searching for the manual, odds are that something is wrong with your vacuum. There's an implied intent at work. The vacuum is either not working right or is flat out broken. You have the opportunity to anticipate and answer magic questions.

How can I fix my vacuum? Where can I buy replacement parts? Are there repair shops near me? What vacuum should I get to replace this one if it can't be fixed?

By decoding the keyword you can create a relevant and valuable page that meets explicit and implied intent.

Keyword Frequency

Keyword frequency is important. Yes, really. One of my favorite examples of this is LinkedIn. How did they secure their place in the competitive 'name' query space?

LinkedIn Keyword Frequency

LinkedIn wanted to make it clear what (or who) these pages were about. That's what keyword frequency is about, making it easy for search engines and users to understand what that page is about.

LinkedIn doesn't just do it with their headers either, but uses the name frequently elsewhere on the page. The result?

Marshall Simmonds Wordle

There's no question what this page is about.
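
If you want a rough, Wordle-style read on one of your own pages, a simple term count gets you most of the way there. This is only a sketch: the page text and the name in it are made up, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric terms in a block of page text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Made-up page text for illustration.
page_text = (
    "Jane Doe | LinkedIn. View the professional profile of Jane Doe. "
    "Jane Doe has 3 jobs listed on their profile."
)

freq = term_frequencies(page_text)
print(freq.most_common(5))
```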

Keywords are Steve Krug for Googlebot.

Readability

This Is Not A Pipe

The reaction I get from many when I press on this issue is that it produces a poor user experience. Really? I've never heard anyone complain about LinkedIn and most never realize that it's even going on.

Using the keywords people expect to see can only help make your content more readable, which is still a tremendously undervalued aspect of SEO. Because people scan text and rarely read word for word.

And what do you think they're scanning for? What do you think is rattling around in their brain when they're scanning your content? It's not something random like 'bellhop poodle duster', it's probably the keyword that brought them there.

You may think Google is smart enough to figure it out. You'll claim that Google's gotten far more sophisticated in the application of synonyms and topical modeling. And you'd be right to a degree. But why take the chance? Particularly since users crave the repetition and consistency.

They don't want you to use four different ways to say the same thing and the hard truth is they're probably only going to read one of those words anyway. You'll create better content for users if you write for search engines.

Make sure you're using the words users expect to see.

TL;DR

Keywords aren't going away, they're becoming more important. Query syntax and user intent are vital in producing relevant and valuable content that resonates with users and answers both explicit and implicit questions.

Google Removes Related Searches

April 19 2013 // Rant + SEO // 45 Comments

This morning I went to use one of my go-to techniques for keyword research and found it was ... missing.

Related Searches Gone

Related Searches Option Gone

It was bad enough that the new Search tools interface was this awkward double-click menu but I understood that decision. Because most mainstream users don't ever refine their results.

But to remove related searches from that menu altogether? In less than a year related searches went from being a search tip to being shuffled off to Buffalo?

WTF!

Out of Insight

Clooney is Pissed

Google needs to understand that there are SEOs, or digital marketing professionals if that makes it easier, who are helping to make search results better. We're helping sites understand the syntax and intent of their users and creating relevant and valuable experiences to match and satisfy those queries.

I wasn't happy but wasn't that upset when Google introduced (not provided). But as the amount of (not provided) traffic increases I see no reason why Google shouldn't implement my (not provided) drill down suggestion. Seriously, get on that.

But then Google merged Google Trends with Google Insights for Search and in the process removed its most useful feature. That's right, knowing what percentage of the traffic was attributed to each category let SEOs better understand the intent of that query.

Now Google's taking away the interface for related searches? Yeah, you've gone too far now. Hulk mad.

Stop Ignoring Influencers

You Wouldn't Like Me When I'm Angry

Just like the decision to terminate Google Reader, Google doesn't seem to understand that they need to address influencers. And believe it or not Google, SEOs are influencers. We're demystifying search so that sites don't fall for get-rank-quick schemes. And you need us to do that because you're dreadful at SEO. Sites aren't finding much of your educational content. They're not. Really.

In the last year Google's made it more and more difficult for SEOs to do good work. And you know who ultimately suffers? Google. Because the content coming out won't match the right syntax and intent. It'll get tougher for Google, over time, to find the 'right' content and users will feel the slow decline in search quality. You know, garbage in, garbage out.

Any good marketer understands that they have to serve more than one customer segment. Don't like to think of SEOs as influencers? Fine. Call us power users and put us back on your radar and stop removing value from the search ecosystem.

Time To Long Click

April 17 2013 // SEO // 63 Comments

The internal metric Google uses to determine search success is time to long click. Understanding this metric is important for search marketers in assessing changes to the search landscape and developing better optimization strategies.

Short Clicks vs Long Clicks

Longcat

Back in 2009 I wrote about the difference between short clicks and long clicks. A long click occurs when a user performs a search, clicks on a result and remains on that site for a long period of time. In the optimal scenario they do not return to the search results to click on another result or reformulate their query.

A long click is a proxy for user satisfaction and success.

On the other hand, a short click occurs when a user performs a search, clicks on a result and returns to the search results quickly to click on another result or reformulate their query. Short clicks are an indication of dissatisfaction.

Google measures success by how fast a search result produces a long click.

Bounce Rate vs Pogosticking

Before I continue I want to make sure we're not conflating short clicks with bounce rate. While many bounces could be construed as short clicks, that's not always the case. The bounce rate on Stack Overflow is probably very high. Users search for something specific, click through to a Stack Overflow result, get the answer they needed and move on with their life. This is not a bad thing. That's actually a long click.

You can gain greater clarity on this by configuring an adjusted bounce rate or something even more advanced that takes into account the amount of time the user spent on the page. In the example above you'd likely see that users spent a material amount of time on that one page which would be a positive indicator.

The behavior you want to avoid is pogosticking. This occurs when a user clicks through on a result, returns quickly to the search results and clicks on another result. This indicates, to some extent, that the user was not satisfied with the original result.

Two problems present themselves with pogosticking. The first is that it's impossible for sites to measure this metric. That sort of sucks. We can only look at short bounces as a proxy and even then can't be sure that the user pogosticked to another result.

The second is that some verticals will naturally produce pogosticking behavior. Health related queries will show pogosticking behavior since users want to get multiple points of view (or opinions if you will) on that ailment or issue.

This could be overcome by measuring the normal pogosticking behavior for a vertical or query class and then determining which results produce lower and higher than normal pogosticking rates. I'm not sure Google is doing this but it's not out of the question since they already have a robust understanding of query and vertical mapping.

But I digress.

Speed

Part of the way Google works on reducing the time to long click is by improving the speed of search results and the Internet in general. Their own research showed the impact of speed on search results.

All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!

Remember that while usage was the metric used, they were trying to measure satisfaction. Making it faster to get to information made people happier and more likely to use search for future information requests. Google's simply reducing the friction of searching.

But it's not just the speed of presenting results but in how quickly Google gets someone to that long click that matters. Search results that don't produce long clicks are bad for business as are those that increase the time selecting a result. And pogosticking blows up the query timeline as users loop back and tack on additional seconds worth of selection and page load time.

Google Query Timeline

Make no mistake. Google wants to reduce every portion of this timeline they presented at Inside Search in 2011.

Answers

42

One of the ways in which we've seen Google reduce time to long click is through various 'answers' initiatives. Whether it's a OneBox or a Knowledge Graph result the idea is that answers can often reduce the time to long click. It's immediate gratification and in line with Amit Singhal's Star Trek computer ideal.

In some cases a long click is measured by the absence of both a click and a reformulated query. If I search for weather, don't click and don't take any further action, that should register as a long click.

Ads

John Henry Man vs Machine

You'll also hear Google (and Bing) talk about the fact that ads are answers. Of course ads are what fill the coffers but they also provide another way to get people to a long click. Arguing the opposite (that ads aren't contributing to satisfaction) is a lot like arguing that marketers and advertisers aren't valuable.

Not only that, but Google has features in place to help ensure that good ad answers rise to the top. The auction model coupled with quality score and keyword level bidding all produce relevant ads that lead to long clicks.

The analysis of pixel space on search results is often used to show how Google is marginalizing organic search. Yet, the other way to look at it is that advertisers are getting better at delivering results (with the help of new Google ad extensions). Isn't it, in some ways, man versus machine? The advertiser being able to deliver a better result than the algorithm?

Without doubt Google benefits financially from having more space dedicated to paid results but they still must result in long clicks for Google to optimize long-term use, which leads to long-term revenues and profits.

I would be very surprised if changes to search results (both paid and organic) weren't measured by the impact they had in time to long click.

Hubs

Bow Tie

All of this is interesting but what does the time to long click metric mean for SEO? More than you might suspect.

When I started in the SEO field I read everything I could get my hands on (which is not altogether different from now). At the time there was advice about becoming a hub.

There was a good deal of hand waving about the definition of a hub but the general idea was that you wanted to be at the center of a topic by providing value and resources. People would link to you and the traffic you received would often go on to the resources you provided. About.com is a good example.

Funny thing is, this isn't some well kept secret. Marshall Simmonds spells it out pretty clearly in this 2010 Whiteboard Friday video where he discusses bow tie theory (hubs) and link journalism. (I just watched this again while writing this and, man, this is an awesome video.)

Most people focus on the fact that hubs receive a lot of backlinks. They do because of the value they provide, which is often in the aggregation of and links to other content. In the end, the real value of hubs is that they play an important part in getting people to content and that long click.

Search is a multi-site experience.

This is what search marketers must realize. You will get credit for a long click if you're part of the long click. If you ensure that the user doesn't return to search results, even by sending them to another site, then you're going to be rewarded.

Too often sites won't link out. I regularly run into this as my clients navigate business development deals with partners. It's frustrating. They think linking out is a sign of weakness and reduces their ability to consolidate PageRank.

While PageRank math might support not linking out, that strategy ultimately limits success.

Link Out!

Local Maxima Graph

Limiting your outlinks creates a local maxima problem. You'll optimize only up to a certain ceiling based on constrained PageRank math. Again, not a real secret. Cyrus Shepard talked about this in a 2011 Whiteboard Friday video (though I wouldn't stress too much about the anchor text myself).

Linking out can help you break through that local maxima by delivering more long clicks. Suddenly, your page is a sort of mini-hub. People search, get to your page and then go on to other relevant information.

Google wants to include results that contribute to reducing the time to long click for that query. 

I'm not advocating that you vomit up pages with a ton of links. What I'm recommending is that you link to other valuable sources of information when appropriate so that you fully satisfy that user's query. In doing so you'll generate more long clicks and earn more links over time, both of which can have profound and positive impact on your rankings.

Stop thinking about optimizing your page and think about optimizing the search experience instead. 

I ran into someone at SMX West who inherited a vast number of low quality sites. These sites used the old technique of being relevant enough to get someone to the page but not delivering enough value to answer their query. The desired result was a click on an ad. Simple arbitrage when you get down to it.

In a test, placing prominent links to relevant content on a sub-set of these pages had a material and positive impact on their ranking. It's certainly not conclusive, but it showed the potential impact of being part of a multi-site long click search result.

As an aside, it's not that those ad clicks were bad. Some of those probably resulted in long clicks. Just not enough of them. The majority either pogosticked to another result or wound up back at the search result after an ad click. And we already know this as search marketers by looking at the performance of search versus display campaigns.

Impact On Domain Diversity

If you believe time to long click is the way in which Google is measuring search success then you start to see some of the changes in a new light. I've been disappointed by the lack of domain diversity on many search results.

Yelp Dominating Search Results for Haircut in Concord CA

Sadly, this type of result hasn't been that rare within the last year. Pete Myers has been doing amazing work on this topic.

For a while I just thought this was Google being stupid. But then it dawned on me. The lack of domain diversity may be reducing the time to long click. It might actually be improving the overall satisfaction metrics Google uses to optimize search!

In some ways this makes a bit of sense, even if only from a straight up Paradox of Choice perspective. Selecting from 5 different domains instead of 10 might reduce cognitive strain. Too many choices overwhelm people, reducing both action and satisfaction. So perhaps Google's just reflecting that in their results with both domain diversity (or lack thereof) and more instances of 7 result pages.

Downsides to Time To Long Click?

MC Escher Relativity Stairs

Are these long clicks truly a sign of satisfaction? The woman who had been cutting my hair for nearly 10 years retired. So I actually did need to find someone new. I hated the search result but did wind up clicking through and using Yelp to locate someone. So from Google's perspective I was satisfied but in reality ... not so much.

I wonder how long a time frame Google uses in assessing the value of long clicks. I abandoned my haircut search a number of times over the course of a month. In many of those instances I'm sure it looked like I was satisfied with the result. It looked like a long click. Yet, if you looked over a longer period of my search history it would become clear I wasn't. I think this is a really difficult problem to solve. Is it satisfaction or abandonment?

The other danger here is that Google is training people to use another service. Now, I don't particularly like Yelp but what this result tells me is that if I wanted to find something like this again I should just skip Google and go right to Yelp instead.

The same could be said of reflecting our own bias toward brands. While users may respond better to brands and the time to long click might be reduced, the long term implications could be that Google is training users to visit those brands directly. Why start my product search on Google when all they're doing is giving me links to Amazon 90% of the time?

Of course, Google could argue that it will remain the hub for information requests because it continues to deliver value. (See what I did there?)

TL;DR

Google is using time to long click to measure the effectiveness of search results. Understanding this puts many search changes and initiatives into perspective and gives sites renewed reason to link out and think of search as a multi-site experience.

Tracking Image Search In Google Analytics

March 27 2013 // Analytics + SEO // 51 Comments

(This post has been updated as of 4/5/14 to reflect refinements to the filters as well as new caveats about Chrome.)

The Internet is becoming increasingly visual but the standard Google Analytics default lumps image search traffic in with organic traffic. The problem with that is these two types of traffic have radically different behaviors.

Google Analytics Y U No Track Image Search

So here's a quick way for you to track image search in Google Analytics to gain insight into how images are performing for your business.

Image Search Referrers

After the last big image search update I was asked by Annie Cushing if I'd figured out a way to track images in Google Analytics. I'd meant to but hadn't yet. Her reminder led me to find out what was possible. I fired up Firefox and used Live HTTP Headers to look at the referrers for image search traffic.

I found that there were two distinct referrers for Google, one from Google images and one from images that showed up via universal search results.

Here's what the referrer looks like from Google image search.

Google Image Search Referrer

The parts to note here are the /url? and the source=images parameter. Now let's look at what the referrer looks like from an image via universal search.

Google Image Referrer via Universal Search

The part to note here is that the URL doesn't use /url? but /imgres? instead. This means you can track traffic from each source!

But there's another wrinkle I discovered over time. Many of the international versions of Google use the old image search UX which also produces the /imgres? referrer.

google.fr image search for ruby red slippers

In addition, most of these wind up being passed in the Google cookie with a medium of 'referral' and not 'organic'. So you might be seeing Google domains cropping up in your referral reports (annoying!). Adding Full Referrer as a secondary dimension shows where the majority of these are coming from: imgres.

Google Referring Traffic in Google Analytics Reports

This means two things. First, we're going to have to create a special case for universal search on google.com so that it isn't mixed up with image search from international properties. Second, we're going to have to change the medium on the international image search traffic so that it is properly attributed to organic.

Finally, let's take a look at Bing.

Bing Image Search Referrer

This is pretty straightforward and doesn't change based on whether it's from image search proper or via a universal result.

Google Analytics Image Search Filters

If you know the referrer patterns you can set up some Google Analytics filters to capture and reclassify this traffic into the appropriate buckets. Here's the step-by-step way to do that.

From Google Analytics click Admin.

Google Analytics Admin

That takes you to a list of profiles.

Google Analytics Select or Create a Profile

Here you can either create a new profile or select a current one. I'd suggest creating a new profile to test this out before you decide to integrate it into your primary profile, because you might screw it up, may not like the detail or may not want the break in continuity. That said, I've created these filters so they'll have the least amount of impact on your reporting while still delivering added insight.

Next you'll reach the profile navigation pane where you'll want to click on Filters.

Google Analytics Filters 2014

At that point you'll want to go ahead and click the red New Filter button.

Google Analytics Red New Filter Button

That's when the real fun begins and you construct a new advanced filter.

Creating a Google Analytics Google Image Search Filter

The first step is to name this filter. This won't show up in your reports and is simply a way for you to know what that filter is doing. So make it descriptive and obvious.

Next you'll want to select the Custom filter button (2) which then reveals a list of options. From that list you'll want to select Advanced (3). This is where it gets a bit tricky.

In step 4 you'll select Referral from the menu of options and then apply some RegEx to match the pattern we've identified. In this instance the RegEx I'm using is:

.*google\.(.*)/url.*source=images.*

I love RegEx, which stands for Regular Expression, but I don't always get it right the first time and regularly rely on this RegEx cheat sheet to remind and guide me. In this instance I'm looking for all Google domains (including any international domain using the new image search) with /url and source=images within the referrer.
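If you want to sanity check the pattern before saving the filter, a minimal Python sketch like this can help. The referrer URLs are hypothetical examples shaped like the referrers described above, not captured traffic, and Python's `re` module is only a stand-in for the Google Analytics filter engine, which may behave slightly differently.

```python
import re

# The pattern from the filter above, compiled as-is.
GOOGLE_IMAGES = re.compile(r".*google\.(.*)/url.*source=images.*")

# Hypothetical referrers, for illustration only.
referrers = [
    "http://www.google.com/url?sa=i&source=images&cd=&docid=abc",    # image search
    "http://www.google.co.uk/url?sa=i&source=images&cd=",            # international, new UX
    "http://www.google.com/url?sa=t&source=web&cd=1",                # regular web search
    "http://www.google.com/imgres?imgurl=http://example.com/a.jpg",  # /imgres referrer
]

# True where the filter would rewrite the source to 'google images'.
matches = [bool(GOOGLE_IMAGES.match(r)) for r in referrers]

for referrer, matched in zip(referrers, matches):
    print("MATCH" if matched else "skip ", referrer)
```

Only the first two referrers should match, since they contain both /url and source=images.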

In step five you're selecting what you're going to do when a referrer matches your RegEx. I've chosen Campaign Source from the menu and then created a new source called 'google images'. You can name these whatever you like but I keep them lowercase to match the other sources.

You'll note that the 'Override Output Field' is set to Yes which means that I'm going to change the Campaign Source for those that match this referrer pattern from what it is currently to 'google images'. The great part about this is that you retain the fact that the medium is 'organic'. So all those reports remain completely valid.

Finally, you click Save and then you wait for the filter to be applied to traffic coming into the site. Depending on the amount of traffic you get from these sources, it may take a few hours to a few days to see the filter working in your reports.

Next we have to put into place a filter for Google universal images, Google images from international properties not using the current image search UX as well as Bing images.

The RegEx for Google universal search images is:

.*google\.com/imgres.*

Note that I'm only looking to match referrers coming from google.com so that I'm not mixing international image search with US universal image search.

The RegEx for Google international search is crazy long and didn't really work pasted here. So instead you can click here to copy and paste the Google 'International' image search filter RegEx.

Now, many of the domains won't match because they're using the new version of image search, which will match the first filter we created. But I figured I'd just be as inclusive as possible instead of validating the current image search UX on each domain. (I mean, it's wicked time consuming too.)

Finally, the RegEx for Bing images is:

.*bing\.(.*)/images/search.*
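Here's a rough Python sketch of how the three short patterns sort referrers into buckets. A few assumptions: the source names are the ones I use (yours may differ), the referrer URLs are made up for illustration, the dots in the universal pattern are escaped so they match literally, and "first match wins" is a simplification of how Google Analytics applies filters. The long international pattern is left out because it isn't reproduced in this post.

```python
import re

# (source name, pattern) pairs based on the filters described above.
FILTERS = [
    ("google images", re.compile(r".*google\.(.*)/url.*source=images.*")),
    ("google universal images", re.compile(r".*google\.com/imgres.*")),
    ("bing images", re.compile(r".*bing\.(.*)/images/search.*")),
]

def classify(referrer):
    """Return the first matching source name, or None if no image filter matches."""
    for source, pattern in FILTERS:
        if pattern.match(referrer):
            return source
    return None

# Hypothetical referrers, for illustration only.
examples = [
    "http://www.google.com/url?sa=i&source=images&cd=",
    "http://www.google.com/imgres?imgurl=http://example.com/a.jpg",
    "http://www.bing.com/images/search?q=wifi+logo",
    "http://www.google.fr/imgres?imgurl=http://example.com/a.jpg",
]

for referrer in examples:
    print(classify(referrer), "<-", referrer)
```

The google.fr referrer falls through all three patterns here, which is exactly the traffic the international filter is meant to pick up.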

But we're not done! Close, but not quite.

Changing Google Analytics Medium Filters

So after having these filters in place for a while I noticed that some of the new sources I created were showing up with a medium of 'referral' instead of 'organic'. That means you're still short-changing your organic efforts because Google is passing the wrong medium in their cookie.

So you have to create two new filters that change the medium of Google universal images and Google international images.

Google Analytics Filter to Change Medium

This is another Advanced filter. It's much simpler than the last one but it must be very precise. In Field A you're looking for the Campaign Source that exactly matches the source you created in the filter. For me, that means 'google international images' and 'google universal images'. For you, it's whatever you named the new sources.

Then you're simply outputting and overriding the Campaign Medium to organic. Remember, you'll create two of these. One for the 'international' images and one for 'universal images'. My guess is that you might only need the one but I want to cover my bases.

To simplify, all you're doing here is looking for the sources you created and then making sure that the medium associated with those sources is changed to organic.

Image Search Filter Order

The final step is to make sure that your filters are in the right order. The last two filters that change the medium based on a specific campaign source (that you created) must come at the end.

Google Analytics Google Image Search Filter Order

This makes sense right? You couldn't match a source that you hadn't already created, right? Stick to this order and you'll ensure image search traffic is tracked appropriately.
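To see why order matters, here's a small Python sketch that treats each filter as a function applied in sequence. The hit dictionary, the function names and the referrer are all hypothetical, and this is a simplified model rather than how Google Analytics is implemented internally.

```python
import re

def source_filter(hit):
    # Sketch of the source filter: rewrite the source for universal image referrers.
    if re.match(r".*google\.com/imgres.*", hit["referrer"]):
        hit["source"] = "google universal images"
    return hit

def medium_filter(hit):
    # Sketch of the medium filter: exact match on the source created above.
    if hit["source"] == "google universal images":
        hit["medium"] = "organic"
    return hit

def apply_filters(hit, filters):
    hit = dict(hit)  # work on a copy so the raw hit is left untouched
    for f in filters:
        hit = f(hit)
    return hit

# Hypothetical raw hit that arrives with the wrong medium.
raw_hit = {
    "referrer": "http://www.google.com/imgres?imgurl=http://example.com/a.jpg",
    "source": "google.com",
    "medium": "referral",
}

right_order = apply_filters(raw_hit, [source_filter, medium_filter])
wrong_order = apply_filters(raw_hit, [medium_filter, source_filter])

print(right_order["source"], "/", right_order["medium"])
print(wrong_order["source"], "/", wrong_order["medium"])
```

With the medium filter first there's no 'google universal images' source to match yet, so the medium never gets changed to organic.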

Image Search Reports

So what do you get to see in the reports?

Image Filters Create Better Google Analytics Reports

This is data from a client site where I've had all the filters in place for a few days. The medium for all of these is still organic but I've now got new sources for google images, universal images and bing images. (Update on 4/5/14) I've been using these filters successfully for a year now.

What you should see right away is the very large difference in how this traffic performs. Image search traffic in this instance has 1.5 Pages/Visit and a 3:00 Avg. Visit Duration while the web based organic traffic has 6 Pages/Visit and a 6:00 Avg. Visit Duration.

Most importantly, the conversion rate on these two types of traffic is different as well. Segmenting your image search traffic can bring more clarity to your analysis and help you make the right decisions on what's working, how to allocate resources and what to optimize.

Image Search Filter Validation

So how do I know this is really working? I drill down into one of these new sources and then select keyword as the secondary dimension. Did I forget to mention that the keyword data remains intact?

Google Analytics Universal Images Keyword Report

Yup, sure does! So the next step here is to see if there really is a universal result for these keywords.

Google Search Result for Badass Over Here Real Pic

Sure enough, I'm the second result in this universal search result. Now let's see if the filter for normal image search is working.

Google Analytics Google Images Keyword Report

I'll use 'wifi logo' as my target term and first go to make sure that I'm not showing up in universal search results.

Google Search Result for Wifi Logo

Nope, not showing up there. But am I showing up in Google image search?

Google Images Search Results for Wifi Logo

Sure enough I'm there just inside the top 100 results from what I can tell. So I'm pretty confident that the filter is catching things and bucketing them appropriately. I've also validated this with very robust client data but can't share that level of detail publicly.

What Is images.google?

You might have noticed the images.google source above. What's that you ask? I don't know. But I don't think it's traditional image search traffic since the user behavior of that source doesn't conform to the other three image based sources. It's also a small source of traffic so while my OCD senses are tingling I'm currently ignoring the urge to figure out exactly what images.google represents.

Tell me if you figure it out.

Caveats

You Raise a Valid Point Ice Cream

The big question is why I wouldn't just use the Google Webmaster Tools queries report and filter by image, right? Well first off, the integration into Google Analytics still isn't where I'd like it to be, making any type of robust reporting near impossible.

In addition, I don't like mixing image search traffic with web search traffic in my normal reports because they're so different. It makes any analysis you do using that mixed data less precise and prone to unintentional error.

More problematic is the fact that the data between Google Webmaster Tools and Google Analytics doesn't match up.

I started looking at specific keywords via my filters versus what was reported in Google Webmaster Tools. There were just too many times when Google Webmaster Tools reported material amounts of traffic that wasn't showing up in my Google Analytics reports.

Google Webmaster Tools Clicks

Here you can see that the top term received 170 clicks in this time frame. Yet during the same time frame here's what the Google Analytics filter based method reports.

Google Analytics Image Based Clicks

170 versus 24! Even if I factor in the (not provided) percentage (which runs about 35% for this client) and add that back in I only get close to 40 visits.
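Here's one way to get to a number like that, sketched in Python. It assumes the 35% (not provided) share applies evenly to this term, which is a simplification on my part, and the variable names are just for illustration.

```python
tracked_visits = 24         # visits the filters report for the top term
not_provided_share = 0.35   # approximate (not provided) share for this client
gwt_clicks = 170            # clicks Google Webmaster Tools reports for the same term

# If 35% of visits hide their keyword, the 24 visible visits are the other 65%.
estimated_visits = tracked_visits / (1 - not_provided_share)

# Whatever is left over is the gap between the two tools.
unexplained_clicks = gwt_clicks - estimated_visits

print(round(estimated_visits, 1))
print(round(unexplained_clicks, 1))
```

That works out to roughly 37 estimated visits, which still leaves well over 100 clicks unaccounted for.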

But that's when the lightbulb went on. Maybe Google Analytics is reporting Visits while Google Webmaster Tools is reporting Clicks?

While I can't confirm this I'm guessing that Google Webmaster Tools is counting all clicks on a result. Many of those clicks are going directly to the image and not the page the image resides on. That's important since direct clicks to the image (i.e. - .jpg files and the like) aren't going to be tracked in Google Analytics as a visit. There is no Google Analytics code on these files. The delta between the two could be the number of users who clicked directly to the image.

In addition, this method doesn't catch any of the mobile clicks and visits since no image search visits (and very few universal images) show up using this filter when looking at mobile traffic. I'm pretty sure that the referrers are just getting stripped and these wind up going into direct instead which is part of the iOS and Android 4+ search attribution issue. (If someone else has an explanation here or finds a different referrer for mobile image search please let me know.)

Finally, there's something funky with Chrome. When I look at the distribution of traffic to each bucket Chrome is an outlier for Google images.

Image Filters Browser Distribution

That 3.7% is just way out of proportion. And it's not related to the amount of (not provided) traffic since Firefox actually has a higher percentage of (not provided) (72%) than Chrome (64%) in this instance. So I can only conclude that there's some amount of data loss going on with Chrome. Maybe that also contributes to the discrepancy I see between Google Analytics and Google Webmaster Tools.

This got even worse as of January when Chrome stopped passing rich referrer information.

Image Search by Browser

I can only guess that this is part of Google's security and privacy efforts. Sadly, it means you're capturing a lot less detail about image search and your data will be less accurate because of it.

Despite all of these caveats I love having the additional detail on image traffic which has wildly different intent and user behavior. Some insight is better than none.

TL;DR

Apply a few simple Google Analytics filters to gain insight into how much traffic you're getting through image search. This is increasingly important as the Internet becomes more visual and the user behavior of these visits differs in material ways from traditional search traffic.

Bing People Snippets

March 19 2013 // SEO // 10 Comments

This morning (thanks to a tip from Search Engine Roundtable) I began researching what looked like authorship snippets on Bing. While it's only been an hour or so here's what I've seen and what I think I've figured out.

People Snippets

The new faces (for the most part) showing up in Bing search results are not authorship snippets per se but are people snippets derived from entities. It's about who the content is about rather than who created the content.

If you haven't seen them already here's what one looks like when you search for Lauren Cohan.

Bing People Snippets for Lauren Cohan

They look remarkably like the authorship snippets that Google has implemented but they're most certainly different in their application.

Structured Data?

The first assumption here is that Bing might be using structured data to present these new snippets. Perhaps they're using the person attribute in schema.org markup?

Structured Data Results for Lauren Cohan page

Not so much. There's no structured data on this page and I've found plenty of others getting the people snippet that are devoid of mark-up. So if Bing isn't using structured data, what are they using to match and identify people?

People Pages

Clearly they rely heavily on sources such as Wikipedia, LinkedIn and Freebase. But they seem to be expanding their data sources on people to other sites and specific pages.

Simon Le Bon People Snippets on Bing

Searching for Simon Le Bon you'll find that a people snippet appears for Wikipedia, IMDb and Biography. Wikipedia is a no-brainer and IMDb makes a good deal of sense too. Biography was the surprising one.

I noted that both IMDb and Biography had namespaces or folders (underlined in red) that seemed to be easy identifiers for entities. So I decided to look for more sources and I found them. Lots of them.

CrunchBase

CrunchBase People Snippet for Jason Calacanis

MySpace

MySpace People Snippet for Pete Myers

NBA.com

NBA.com People Snippet for Pete Myers

Quora

Quora People Snippet for Jessica Guynn

TED

TED People Snippet for Seth Godin

ESPN

ESPN People Snippet for Claude Giroux

The Canadian Encyclopedia

Canadian Encyclopedia People Snippet for Douglas Coupland

Amazon

Amazon People Snippet for Toni Basil

MTV

MTV People Snippet for Paula Abdul

Last.fm

Last.fm People Snippet for Kim Carnes

Forbes

Forbes People Snippet for Mark Cuban

NNDB

NNDB People Snippet for Alan Greenspan

Facebook

Facebook People Snippet for Matthew Inman

Twitter

Twitter People Snippet for Neil deGrasse Tyson

Yahoo! Movies

Yahoo Movies People Snippet for Will Ferrell

Hollywood.com

Hollywood.com People Snippet for Will Ferrell

AskMen

AskMen People Snippet for Will Smith

FriendFeed

FriendFeed People Snippet for Louis Gray

TV Guide

TV Guide People Snippet for Andrew Lincoln

Comedy Central

Comedy Central People Snippet for Daniel Tosh

Most of these either have a namespace that makes it easy to identify as a person or are clear profiles in the case of MySpace and FriendFeed. Whether it's 'player', 'artist', 'celebrities', 'person', 'profiles' or 'speakers' it seems like Bing has determined pages that match these specific entities.
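To make the namespace idea concrete, here's a toy Python sketch that flags a URL when one of its folders matches a person-style namespace. This only illustrates the pattern I noticed. It is not how Bing works, the namespace list is just the handful named above and the example.com URLs are hypothetical.

```python
from urllib.parse import urlparse

# Namespaces seen in the examples above; illustrative, not Bing's actual list.
PERSON_NAMESPACES = {"player", "artist", "celebrities", "person", "profiles", "speakers"}

def looks_like_person_page(url):
    """Return True if any folder in the URL path is a person-style namespace."""
    segments = [s.lower() for s in urlparse(url).path.split("/") if s]
    return any(s in PERSON_NAMESPACES for s in segments)

# Hypothetical URLs, for illustration only.
print(looks_like_person_page("http://www.example.com/speakers/jane_doe.html"))
print(looks_like_person_page("http://www.example.com/blog/2013/03/some-post"))
```

A real system would need far more than a folder name to reach high confidence, but it shows why these URL structures make easy identifiers.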

About Pages

People snippets show up far more often on about pages which supports the idea that Bing is looking for high confidence entity pages and not assigning real authorship.

Blind Five Year Old People Snippet for AJ Kohn

As you can see I get a people snippet on my about page but not on my site as a whole. Nor do I get it returned on any of my content. Here's another example.

0at People Snippet for Matthew Inman

Again, the about page on Matthew Inman's now defunct site is given a people snippet while the site as a whole isn't. The people snippet is showing pages about that entity, not authored by that entity. It just so happens that there's some overlap in those areas.

Sorta Structured Data

Many of these pages have a rich amount of data on them. While they aren't marked-up with any structured data per se, search engines can clearly parse and use that information. Here's a people snippet via Green Day Authority.

Green Day Authority People Snippet for Billie Joe Armstrong

That page has no structured data mark-up but it has structure.

Green Day Authority Page

Characters?

Further pushing on the choice of pages to use to apply the people snippet I began to search for characters. First Harry Potter and then Derek Zoolander.

Derek Zoolander Bing Results

No people snippets are applied even though it's still pulling from IMDb. The difference here is that it's plucking out a title page and a character page instead. Maybe that's not how it works but that's how my pattern matching mind sees it right now.

[Update 3/21/13]

ChaosSEO noted that he could get character names to render people snippets. Sure enough, you can.

People Snippet for Olivia Dunham on Bing

And ...

People Snippets for Jean-Luc Picard on Bing

I tend to think that there's some special casing going on with IMDb so that it only applies the snippet to the name pages, but if you get a snippet to render for an IMDb character page please let me know.

Going through characters was actually really instructive. First I began to see that there were associations between the entities of person and character.

People Snippets for Hermione Granger on Bing

A search for Hermione Granger produces people snippets and a result for Emma Watson. Clearly there's some understanding that the two are related. You can get that same dynamic for a number of character searches such as Gandalf or Chewbacca.

Gandalf People Snippet on Bing

Chewbacca Result on Bing

Finally, I found a result that makes me very confident that this is not authorship at all but entity detection.

Han Solo People Snippets on Bing

Clearly Harrison Ford (or Han Solo) is not the author of these pieces but the subject of them.

Decentralized Images

There are few instances where a site will get a people snippet. This seems to be rare and only occurs when Bing has high confidence that they have the right person.

Bing Seth Godin Results - Different Faces

Here we can see that a people snippet is applied to Seth's site but that the image is pulled from that site and not from some central database. This provides some variety in what is displayed but also leads to some errors from time to time.

Bing Result for Jason Calacanis

So Bing seems confident that they have the right person associated with that site but the image they pulled is not Jason. It's Wesley Chan.

Authorship?

Is this a form of Authorship? Sorta, kinda, not really. Sometimes you'll see what looks like a people snippet pop up on a content page.

Tim Gunn People Snippet on HuffPo Article

Tim is the author of that piece so there's a chance that they've identified and are trying to present Authorship based on that fact. But it's more likely they just identified him as an entity. Because the images are decentralized it pulls what it can from that article.

Mathew Ingram People Snippet

The same thing happens with this piece by Mathew Ingram. In both cases there is structured data on the page that would indicate that each is the author of that piece (though neither has Google Authorship working).

So it's not true authorship and very few pieces of content have a people snippet right now. But if Bing decides to follow this path and makes the connections with all of their datasets (including their social sidebar results) then you could see Bing being a legitimate authorship platform.

Right now it seems like the snippets on content results are more like a side effect of people identification.

TL;DR

Bing has introduced people snippets that look like Google's Authorship snippets but are more focused on identifying people as entities through a variety of sources rather than assigning authorship of content. For now.

Build Your Authority Not Your Author Rank

March 18 2013 // SEO // 49 Comments

It's been a frustrating few weeks of discussion about Authorship and Author Rank.

Are We There Yet?

Here I will present a few things that may give you some more context overall and, in particular, my point of view on things.

Social Computing Research

Just the other day Google revealed that it gave $1.2 million in awards to those undertaking social computing research.

We know that interactions on the Web are diverse and people-centered. Google now enables social interactions to occur across many of our products, from Google+ to Search to YouTube. To understand the future of this socially connected web, we need to investigate fundamental patterns, design principles, and laws that shape and govern these social interactions.

We envision research at the intersection of disciplines including Computer Science, Human-Computer Interaction (HCI), Social Science, Social Psychology, Machine Learning, Big Data Analytics, Statistics and Economics. These fields are central to the study of how social interactions work, particularly driven by new sources of data, for example, open data sets from Web2.0 and social media sites, government databases, crowdsourcing, new survey techniques, and crisis management data collections. New techniques from network science and computational modeling, social network and sentiment analysis, application of statistical and machine learning, as well as theories from evolutionary theory, physics, and information theory, are actively being used in social interaction research.

We’re pleased to announce that Google has awarded over $1.2 million dollars to support the Social Interactions Research Awards, which are given to university research groups doing work in social computing and interactions. Research topics range from crowdsourcing, social annotations, a social media behavioral study, social learning, conversation curation, and scientific studies of how to start online communities.

What this says to me is that Google is intensely interested in understanding how to use social interaction data. But they're not there yet. And why should they be? They've been working on link based signals and refinement for over 10 years but haven't delved into social data until the last few.

This is a discipline that they are far from fully understanding. I can't help but pick out words like 'investigate', 'envision' and 'new'. This is a post about the exploration of the effects of social interaction on a host of fields. These are not papers as to their conclusions.

But we do have a few of those papers, areas where Google has begun to learn about how social interactions or signals might impact search. Let's take social annotations as an example.

Social Annotations in Web Search

Social Annotations and Snippet Length Chart

This research was presented at the 2012 Conference on Human Factors in Computing Systems.

Remember when our SERPs had a whole bunch of smaller faces in them and other various social gestures? Well, Google found that those didn't work. We hardly noticed them and when we did we didn't always believe they added value.

In fact, the only thing that really did was the Authorship snippet. It's a very interesting read if you're interested in design and authority. The way we see results today is clearly influenced by this research and you can see Google learning more about how social connections and expertise work within search.

This study revealed a counter-intuitive result. Despite having the names and faces of familiar people, and despite being intended to be noticeable to searchers, subjects for the most part did not pay attention to the social annotations.

Our questions about contact closeness, expertise, and topic were answered by the reactions captured during the retrospective interviews. These interviews revealed the importance of contact expertise and closeness, and the importance of the search topics in determining whether social signals are useful, thus echoing past findings on the role of expertise in social search.

I walk away thinking that all of this is much tougher than we believe peering in from the outside. That and Google is at the start of this research, not the end.

Knowing that they aspire to understand these dynamics also makes the closure of Google Reader odd since there is a substantial amount of data that could be mined there, all tied back to identity and, by extension, topical expertise.

Whisper Down The Lane

I had a chance to speak on a panel at SMX West with Mike Arnesen and Lisa Weinberger about Authorship, Author Rank and Authority.

Overall, authorship and the potential for Author Rank was a hot topic that spilled out into multiple other sessions. Both Matt Cutts and Duane Forrester were asked about link based signals versus social signals. You could tell they are both tired of this question. Paraphrasing, they essentially said that while social signals are intriguing they're not nearly as far along as we in the industry might believe (or want).

When prodded about the collapse of the link graph they noted that the link graph was just fine thank you very much. Link manipulation, the intent behind linking that we feel is so perverted, is not nearly as rampant as we assume. The mainstream blogger or site owner is linking for the right reasons. In short, the link graph is still valuable and with lower friction to producing digital content it may actually improve as more laypeople become content producers.

That's not to say that social signals aren't important but they will be a complement to or a refinement of the link graph, not a replacement. This is something I discussed in my original Author Rank post.

If we believe that search engines still view the link graph as viable there may be ways to simply use Authorship to make the link graph more accurate. Think of Authorship as meta information passed on every link. When looking for information on cancer the link given to an article from an established oncologist at a world renowned hospital would likely confer more value than a link to an article from 'screwcancer888' at a Q&A site.

In some ways this reminds me of delegating authority which Bill Slawski (always insightful) wrote about back in late 2010. What we're really talking about is identifying expertise and allowing those experts to help curate our view of those topics where it matters - in search results.

Parsing Statements

So You're Telling Me There's A Chance?

It's enticing to pick apart responses and statements by Googlers when they are asked to comment on Author Rank. The fact is that they're not going to divulge much or commit one way or the other (at least publicly). They've been burned before by saying something that is true but interpreted in different ways.

So when asked, of course they're going to reply that it's something they're experimenting with (because they do aspire to use the data) but that it is currently not a direct ranking signal and nothing to worry about now.

Of course that leads everyone to look for the experiments, to look for indirect ranking signals and to take the 'now' as a declaration of sorts for future implementation.

Authorship could be an indirect signal if you believe (like I do) that the click through rate (CTR) on a result can provide a positive feedback signal. And we know the CTR on authored results disrupts the normal click distribution of a SERP. Of course Google could take into account the Authorship snippet and normalize the CTR impact. So perhaps it isn't having that indirect impact. See how confusing it can get?

Just for fun, let us think what would transpire if a Googler simply said there is no such thing as Author Rank without any hedging or caveats. People would start to conflate that with Authorship, potentially reducing the adoption rate. Many would interpret it to mean that Google had abandoned author based weighting completely. Thus, when Google did figure it out and apply it the industry would point to the statement and shout 'liar' at the top of their lungs.

We've trained Google to provide us with these elliptical statements. I choose to view them through this lens.

What To Look For?

That's not to say that we shouldn't be interested in the topic. I like the testing Terry Simmonds is doing on the mechanics of Authorship because it documents how Google is trying to extend the mark-up to more of the content on the web. And that's a constraint as far as I can tell right now. Conversations about the inability to roll out updates because of low adoption are not uncommon.

You can't begin to rank results based on topical expertise if many of the experts aren't included in the selection criteria. The participation rate in Authorship has to be such that using it would provide a materially better ranking of content. Reports have Authorship coverage as low as 9% and as high as 17%. That's not a lot really and both studies are limited based on the relatively small data sets analyzed.

The problem? If you were to want information on astrophysics you'd probably want to include Neil deGrasse Tyson in those results. Yet, he's not on Google+ (as far as I can tell) and isn't part of the Authorship program.

Looking at how Google is trying to assign Authorship is important.

The mechanics and the indirect Authorship Google often grants are particularly intriguing. I noted that Jonathon Colman was receiving a bounce back Authorship link on a SlideShare URL for which no direct Authorship mark-up was present.

Indirect Authorship

I recall seeing this in the past on URLs from Quora, FriendFeed and Flickr. I swear some of these used to show up in Author Stats but I haven't seen them lately (except for FriendFeed which I see at the tail end of my list.)

In fact, the bug that took Author Stats down might have been the exposure of indirect Authorship based on high confidence in matching public social graph data to Google+ profiles. Rapleaf got the brunt of the ire for crawling the public social graph but Google clearly has and continues to use this information even though the social circles feature has been retired.

Looking today I see another interesting URL showing up in Author Stats - Twitter.

Twitter Discussion Gets Authorship

There's quite a lot of evidence that Twitter is a fairly well trusted source of indirect Authorship, but that's a post for another day. However, we can also look at the verbiage in the Structured Data Testing Tool, which has changed within the last few weeks.

Authorship rel=author Structured Data Testing Tool Results

The points of interest here are the '(direct or indirect)' verbiage as well as the fact that the tool only checks the first rel=author link listed on a webpage. The former certainly makes me believe that assigning Authorship based on indirect links is important to Google.

The latter tells me two things. First that the tool should not be trusted as the final arbiter of whether the correct Authorship is or will be applied. Second that Google obviously sees multiple authors or entities (or agents) on the page.

Let's go a step further. Google's new Social Sign-In can be construed as a portable digital signature which might allow Google to rely on comments and other content produced outside of Google+. So how this is rolled out, and whether the reviews that now flow under your profile are also granted Authorship, are developments worth tracking.

I've been eager to see Author Rank implemented since I first saw Matt Cutts interview Steven Levy.

This actually predates Authorship and the follow-up question by Matt (along with a bit of body language) makes it clear that Google was thinking about this seriously. While I absolutely do look for connections and patterns that might paint a picture of the future I'm not looking for it behind every corner and trying to fit Author Rank into each and every odd result or anecdote.

Authority

You Will Respect My Authority!

I prefer to talk about how people might build authority rather than how they would build Author Rank. Just as links are the result and not the goal, Author Rank will be the result and not the goal of your efforts.

Discussions about what makes someone an authority and how Google might want to translate that into math are fascinating. What makes someone authoritative versus popular? Is there a difference? If so, how would you go about separating the two?

How do you map the decline of authority? Of someone who is no longer really an expert and just mailing it in? Can you identify this even if they remain popular? How can you tell if someone is endorsing content based on merit or friendship? Is it what you know or who you know?

Furthermore, you could find that one was popular for the wrong reasons. Would you want to rank someone highly who simply fanned the flames of dissent and created controversy? The tone and type of interaction will be important so sentiment analysis and other processes will need to determine how to use social interaction as a reliable signal.

Influence

We're Dealing With A Badass Over Here

And how does influence fit into this equation? One can be influential without being popular, but clearly being popular gives you a better chance of being influential just by sheer reach. Can you be influential without being an authority? I think so. Just look at Jenny McCarthy and her influence within the anti-vaccine movement.

The latter clearly strays into the subjective nature of quality, relevance and authority that I touched on after the Panda update. Personalization helps to ensure that your subjective view of authority is reflected back to you. That's why search results are changed based on who you follow on Google+. And personalization of search results is the most important thing about Google+ in my view.

But in discussing how Google might identify authority and expertise, we're dealing with the aggregate. So the question isn't really about your personal view (which is reflected back in Search+ results) but how the aggregate views different figures and authorities.

Of course, being likable is part of the way you can obtain authority. And it is often not what you say, but how you say it (or present it) that gets you noticed. So part of building authority is in ensuring that you can communicate in a way that conveys that expertise but also makes it accessible and ... memorable.

Yes, I see all of this as being related because the same content presented in comic sans without any images or paragraph breaks wouldn't have nearly the same impact and would not, ultimately, convey authority. Even though the actual words are the same!

I had a similar conversation with Dan Shure where he wondered about the impact of publishing content from Rand Fishkin under somebody else's name. Would it get as much 'play' and be received as well? I doubt it. So what does that say about the connection of authority, popularity and quality assessment?

These are just a few of the things that make this topic so incredible.

TL;DR

I believe Google wants to use Author Rank but I also believe that it's far more difficult than we think. Focusing solely on Author Rank may blind us to tracking Google's progress and building what is truly important. Authority.

New Ways To Track Keyword Rank

January 13 2013 // Analytics + SEO // 83 Comments

Tracking keyword rank is as old as the SEO industry itself. But how you do (and use) it is changing. Are you keeping up?

This post covers how I create and use rank indexes and introduces a new and improved way to track rank in Google Analytics.

Rankaggedon

In December of 2012 both Raven and Ahrefs made the decision to shut down their rank tracking features because they violated Google's Terms of Service. The reaction from the SEO industry was predictable.

WTF LOLcat

The debate about why Google began to enforce the TOS (I think it has to do with the FTC investigation) and the moaning about how unfair it is doesn't interest me. Both SEOmoz and Authority Labs still offer this service and the way many use rank needs to change anyway.

Every obstacle is an opportunity. Trite but true.

Is Rank Important?

To be honest, I don't use rank that much in my work. This has to do with a combination of the clients I choose to work with and my philosophy that increasing productive traffic is the true goal.

Yet, you'd have to be soft in the head not to understand that securing a higher rank does produce more traffic. Being on the first page matters. Getting in the top three results can produce significant traffic. Securing the first position is often a huge boon to a business. Duh!

But rank is the extrinsic measurement of your activities. It's a Google grade. Rank isn't the goal but the result.

Unfortunately, too many get obsessed with rank for a specific keyword and spend way too much time trying to move it just one position up by any means necessary. They want to figure out what the teacher is going to ask instead of just knowing the material cold.

Rank Indexes

So how do I use rank? I create rank indexes.

A rank index is the aggregate rank of a basket of keywords that represent a type of query class that have an impact on your bottom line. For an eCommerce client you might have a rank index for products and for categories. I often create a rank index for each modifier class I identify for a client.

Usually a rank index will contain between 100 and 200 keywords that represent that query class. The goal is to ensure that those keywords reflect the general movement of that class and that changes in rank overall will translate into productive traffic. There's no sense in measuring something that doesn't move your business.

If that rank index moves down (lower is better) then you know your efforts are making a difference.

Executives Love Indexes

Business Cat

A rank index is also a great way to report to C Level executives. These folks understand index funds from an investment perspective. They get this approach and you can steer them away from peppering you with 'I did this search today and we're number 4 and I want to be number 1' emails.

It becomes not about any one term but the aggregate rank of that index. That's a better conversation to have in my opinion. A rank index keeps the conversation on how to move the business forward instead of moving a specific keyword up. 

Getting Rank Index Data

If you're using SEOmoz you can export the entire keyword ranking history to CSV.

SEOmoz Export Full Keyword History to CSV

After a bit of easy clean up you should have something that looks like this in Excel.

SEOmoz Keyword History Raw Data

At this point I simply copy and paste this data into my prior framework. I've already configured the data ranges in that framework to be inclusive (i.e. - 50,000 rows) so I know that I can just refresh my pivot table and everything else will automagically update.

If you're using Authority Labs you'll want to export a specific date and simply perform the export each week.

Authority Labs Keyword Ranking Export

There's a bit more clean up for Authority Labs data but in no time you get a clean four column list.

Authority Labs Keyword Data

Unlike the SEOmoz data where you replace the entire data in your framework, you simply append this to the bottom of your data. Once again, you know the pivot table will update because the data range has been configured to be quite large.

Creating The Rank Index Pivot Table

You can review my blow by blow of how to create a pivot table (though I'm now using a new version of Excel so it all looks different anyway.) It's actually a lot easier now than it was previously which is something of a miracle for Microsoft in my view.

Keyword Rank Index Pivot Table

You'll use the keyword as your row label, date as the column label and the Average of rank as the values. It's important to use a label so you can create different indexes for different query classes. Even if you only have one index, use a label so you can use it as a filter and get rid of the pesky blank column created by the empty cells in your data range.

You may notice that there are a lot of 100s and that is by design.

Keyword Rank Index Pivot Table Options

All those non-ranked terms need to be counted somehow right? I chose to use 100 because it was easy and because Authority Labs reports up to (and sometimes beyond) that number.
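For anyone who prefers a script to a spreadsheet, here is a rough Python sketch of the same pivot. It takes four column rows (date, keyword, tag, rank), averages rank per keyword and date, and fills the gaps with 100. The row layout, the sample keywords and the function name `build_rank_pivot` are assumptions for illustration; they are not what SEOmoz or Authority Labs actually export.

```python
from collections import defaultdict

UNRANKED = 100  # placeholder rank for terms that do not rank, as in the post


def build_rank_pivot(rows, tag=None):
    """Return {keyword: {date: average rank}} with gaps filled by UNRANKED.

    rows: iterable of (date, keyword, tag, rank) tuples (assumed layout).
    tag:  optional filter so each query class gets its own index.
    """
    sums = defaultdict(lambda: [0, 0])  # (keyword, date) -> [total, count]
    dates, keywords = set(), set()
    for date, keyword, row_tag, rank in rows:
        if tag is not None and row_tag != tag:
            continue
        dates.add(date)
        keywords.add(keyword)
        cell = sums[(keyword, date)]
        cell[0] += rank
        cell[1] += 1

    pivot = {}
    for keyword in sorted(keywords):
        pivot[keyword] = {}
        for date in sorted(dates):
            # Check membership first so the defaultdict is not mutated.
            if (keyword, date) in sums:
                total, count = sums[(keyword, date)]
                pivot[keyword][date] = total / count
            else:
                pivot[keyword][date] = UNRANKED
    return pivot


# Hypothetical sample data, not real rankings.
rows = [
    ("2013-01-06", "blue widgets", "products", 4),
    ("2013-01-13", "blue widgets", "products", 3),
    ("2013-01-06", "red widgets", "products", 12),
    ("2013-01-13", "widget reviews", "categories", 7),
]
pivot = build_rank_pivot(rows, tag="products")
```

Filtering by tag plays the same role as the label filter in the pivot table: each query class gets its own index, and a keyword with no data for a given week simply shows up as 100.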

Turning Rank Data Into A Rank Index

Now that you have all the rank data it's time to create the rank index and associated metrics.

Keyword Rank Index Calculated Data

Below the pivot table it's easy to use a simple AVERAGE function as well as various COUNTIF functions to create these data points. Then you can create pretty dashboard reports.

Keyword Rank Index Reports

Average Rank is the one I usually focus on but the others are sometimes useful as well and certainly help clients better understand the situation. A small caveat about the Average Rank. Because you're tracking non-ranking terms and assigning them a high rank (100) the average rank looks a bit goofy and the movement within that graph can sometimes be quite small. Because of this you may wind up using the Average of Ranking Terms as your presentation graph.

Average of Ranking Terms Graph

I don't care much about any individual term as long as the index itself is going in the right direction.
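The AVERAGE and COUNTIF step can be sketched in a few lines of Python as well. This is a minimal sketch under some assumptions: the input is one date column of the index (a plain list of ranks), anything at 100 or above is treated as non-ranking, and the top 3 and top 10 counts are illustrative choices rather than the exact metrics in the screenshots.

```python
UNRANKED = 100  # same placeholder used for non-ranking terms


def rank_index_metrics(ranks):
    """Summarize one date column of a rank index (a list of ranks)."""
    # Assumption: ranks at or beyond the placeholder count as non-ranking.
    ranking = [r for r in ranks if r < UNRANKED]
    return {
        "average_rank": sum(ranks) / len(ranks),
        "average_of_ranking_terms": (
            sum(ranking) / len(ranking) if ranking else None
        ),
        "ranking_terms": len(ranking),
        "top_3": sum(1 for r in ranks if r <= 3),
        "top_10": sum(1 for r in ranks if r <= 10),
    }


# Hypothetical week: three ranking terms and two that do not rank.
week = [2, 5, 11, 100, 100]
metrics = rank_index_metrics(week)
```

In this made-up week the average rank is dragged up to 43.6 by the two placeholder values while the average of ranking terms is 6, which is why the second number often makes the better presentation graph.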

Projecting Traffic

I can always look at the details if I want and I've also created a separate tab which includes the expected traffic based on the query volume and rank for each term.

Rank Index Traffic Projections

This simply requires you to capture the keyword volume (via Google Adwords), use a click distribution table of your choosing and then do a VLOOKUP.

IFERROR(([Google Adwords Keyword Volume])*(VLOOKUP([Weekly Rank],[SERP Click Distribution Table],2,0)),0)

You'll need to divide by 4 to get the weekly volume but at that point you can match that up to real traffic in Google Analytics by creating a regex based advanced segment using the keywords in that index.
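The projection itself is simple arithmetic, so here is a hedged Python sketch of it. The click shares in `CLICK_DISTRIBUTION` are made-up placeholders (use whichever click distribution table you trust), and the function name is an assumption for illustration.

```python
# Hypothetical click distribution: rank -> share of clicks.
# These shares are placeholders, not measured values.
CLICK_DISTRIBUTION = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}


def projected_weekly_visits(monthly_volume, weekly_rank,
                            distribution=CLICK_DISTRIBUTION):
    """Monthly volume x click share for the rank, divided by 4 for a week.

    Ranks missing from the table project zero visits, like IFERROR(..., 0).
    """
    click_share = distribution.get(weekly_rank, 0)
    return monthly_volume * click_share / 4


# Hypothetical keyword: 8,000 monthly searches, ranked 2 this week.
weekly = projected_weekly_visits(8000, 2)
```

With these placeholder numbers a term with 8,000 monthly searches at rank 2 projects to roughly 300 visits a week, and a term ranked outside the table projects to zero.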

Of course, you have to adjust for (not provided) and the iOS attribution issue so this is very far from perfect. And that's what got me really thinking about whether rank and rank indexes could be relied on as a stable indicator.

What is Rank?

What Is Love Night at the Roxbury

The rise in (not provided) and the discrepancies often seen between reported rank volume and the traffic that shows up point to the increase in personalization. SERPs are no longer as uniform as they once were and personalization is only going to increase over time.

So you might have a 'neutral' rank of 2 but your 'real' rank (including context and personalization) might be more like a 4 or 5.

That's why Google Analytics rank tracking seems so attractive, because you can get real world ranking data based on user visits. But that method is limited and makes reporting a huge pain in the ass. The data is there but you can't easily turn it into information ... until now.

Improved Google Analytics Rank Tracking

I got to talking to Justin Cutroni (a really nice and smart guy) about the difficulties around tracking rank in Google Analytics. I showed him how I use rank indexes to better manage SEO efforts and over the course of a conversation (and a number of QA iterations) he figured out a way to deliver keyword rank the way I wanted in Google Analytics.

Keyword Rank Tracking In Google Analytics with Events

Using Events and the value attached to it, we've been able to create real keyword rank tracking in Google Analytics.

The Avg. Value is calculated by dividing the Event Value by Total Events. You could change this calculation once you do the export to be Event Value by Unique Events if you're concerned about those users who might refresh the landing page and trigger another Event. I haven't deployed this on a large site yet to know whether this is a real concern or not. Even if it is, you can always change it in the export.

Keyword Rank Tracking Data via Analytics Events

So you can just make Avg. Value a calculated field and then continue to tweak the exported data so that it's in a pivot table friendly format. That means adding a date column, retaining the Event Action column but renaming it keyword, adding a Tag column, and retaining the Avg. Value column.

You essentially want it to mimic the four column exports from other providers. I suppose you could keep a bunch of this stuff in there and not use it in the pivot table too. I just like it to be clean.

Event Based Rank Tracking Code

Start tracking rank this way on any Google Analytics enabled site by dropping the following code into your header.

Google Analytics Rank Tracking Code

To make it easier, the code can be found and copied at jsFiddle. Get it now!

Just like the old method of tracking rank in Google Analytics, this method relies on finding the cd parameter (which is the actual rank of that clicked result) in the referring URL. This time we're using Event Tracking to record rank and putting it in a field which treats it as a value.
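The real tracking code is the JavaScript on jsFiddle. Purely to illustrate the parsing step, here is a Python sketch that pulls the cd value out of a referring URL. The sample referrer, the loose `google.` host check and the function name are assumptions, and this is not a port of the actual snippet.

```python
from urllib.parse import parse_qs, urlparse


def rank_from_referrer(referrer):
    """Return the cd value (clicked rank) from a Google referrer, or None."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    # Assumption: any host containing "google." counts as a Google referrer.
    if "google." not in host:
        return None
    values = parse_qs(parsed.query).get("cd")
    if not values or not values[0].isdecimal():
        return None  # cd is missing or not a number
    return int(values[0])


# Illustrative referrer, not copied from real traffic.
sample = "http://www.google.com/url?sa=t&rct=j&q=rank+tracking&source=web&cd=3"
rank = rank_from_referrer(sample)
```

Returning None when cd is absent mirrors the coverage gap: no cd parameter in the referrer means no rank can be recorded for that visit.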

The code has also been written in a way to ensure it does not impact your bounce rate. So there's no downside to implementation. You will find the data under the Content > Events section of Google Analytics.

Where To Find Average Rank in Google Analytics

Just click on Content, Top Events and then RankTracker and you'll find keyword ranking data ready for your review.

Google Analytics Rank Indexes

I've been working at applying my index approach using this new Event based Google Analytics rank tracking data. The first thing you'll need to do is create an advanced segment for each index. You do this by creating a regex of the keywords in that index.
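Typing a regex for 100 to 200 keywords by hand is error prone, so here is a small Python sketch that builds one from a keyword list. It assumes you want exact keyword matches and escapes only the usual regex metacharacters; the function name and sample keywords are hypothetical, and any length limit in the segment builder is not handled.

```python
import re

# Characters with special meaning in most regex flavors.
SPECIAL = set(r"\.^$*+?()[]{}|")


def escape_keyword(keyword):
    """Backslash-escape regex metacharacters, leaving spaces untouched."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in keyword)


def keyword_index_regex(keywords):
    """Build an exact-match alternation for the keywords in one index."""
    cleaned = [k.strip().lower() for k in keywords if k.strip()]
    return "^(" + "|".join(escape_keyword(k) for k in cleaned) + ")$"


# Hypothetical index keywords.
pattern = keyword_index_regex(["Blue Widgets", "c++ widgets", " "])
matches_exact = re.search(pattern, "blue widgets") is not None
matches_longer = re.search(pattern, "blue widgets cheap") is not None
```

The anchors keep the segment limited to the exact terms in the index, so a longer query that merely contains one of the keywords is left out.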

Rank Index Regex Advanced Segment

Sometimes you might not get a click on a term that is ranked 20th and certainly not those that are ranked 50th. That's a constraint of this method but you can still populate an entire list of keywords in that index by doing a simple VLOOKUP.

IFERROR(VLOOKUP(A1,'Export Event Data'!$A$1:$E$5000,5,FALSE),100)

The idea is to find the keyword in your export data and report the rank for that keyword. If the keyword isn't found, return a value of 100 (or any value you choose). From there it's just about configuring the data so you can create the pivot table and downstream reports.
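The same lookup with a fallback can be sketched in Python. The dictionary of exported ranks stands in for the 'Export Event Data' sheet, and the names and sample values are assumptions for illustration.

```python
def index_ranks(index_keywords, exported_ranks, default=100):
    """Return a rank for every keyword in the index.

    Keywords missing from the export (no clicks recorded) get the default,
    like IFERROR(VLOOKUP(...), 100).
    """
    return {kw: exported_ranks.get(kw, default) for kw in index_keywords}


# Hypothetical export: only keywords that received clicks appear.
exported = {"blue widgets": 3.2, "red widgets": 11.0}
index = ["blue widgets", "red widgets", "green widgets"]
ranks = index_ranks(index, exported)
```

Every keyword in the index ends up with a value, so the downstream pivot table and averages work the same way they do with third party rank data.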

Caveats

You Raise a Valid Point Ice Cream

This new way of tracking is different and has some limitations. So let's deal with those head on instead of creating a grumble-fest.

The coverage isn't as high as I'd like because of (not provided) and the fact that the cd parameter is still only delivered in about half of the referrers from Google. I'm trying to find out why this is the case and hope that Google decides to deliver the cd parameter in all referrers.

Full coverage would certainly increase the adoption of rank tracking in Google Analytics and reduce those seeking third party scraped solutions, something Google really doesn't like. It's in their self-interest to increase the cd parameter coverage.

As an aside, you can get some insight into the rank of (not provided) terms and match those to landing pages, which could be pretty useful.

Rank of Not Provided Terms by Landing Page

The other limitation is that you only get the rank for those queries that received clicks. So if you're building a rank index of terms you want to rank for but aren't and track it over time it becomes slightly less useful. Though as I've shown above you can track the average of ranking terms and of the index as a whole at the same time.

One of the better techniques is to find terms that rank at 11 to 13 and push them up to the front page, usually with some simple on-page optimization. (Yes, seriously, it's way more effective than you read about.) So this type of tracking might miss a few of these since few people get to page 2 of results. Then again, if you see a rank of 11 for a term with this tracking that's an even higher signal that getting that content to the front page could be valuable.

Finally, the data configuration is, admittedly, a bit more difficult so you're working a tad harder to get this data. But on the other hand you're seeing ranking data from real users. This could get really interesting as you apply geographic based advanced segments. Larger organizations with multiple locations might be able to determine which geographies they rank well in versus those where they're struggling.

And not Or

At this point I can't say that I'd scrap traditional rank tracking techniques altogether, though I'm sure Google would like me to say as much. Instead, I think you should use the new Google Analytics Event Based Rank Tracking in conjunction with other ranking tools.

First off, it's free. So there's no reason not to start using it. Second, you get to see real world rank, which while limited in scope can be used to compare against neutral rank offerings. Lastly, if you're trying to future proof your efforts you need to be prepared for the potential end of traditional ranking tools, or for variation in personalization so high that it makes them unreliable.

Did I mention this new rank tracking method is free?

I'm looking forward to putting this into practice and comparing one tracking method to the other. Then we'll see the potential variance between personalized ranking versus anonymized ranking.

TL;DR

The closure of recent third-party rank tracking services is an opportunity to think about rank in a different way. Using a rank index can help keep you focused on moving the business forward instead of a specific keyword. To future proof your efforts you should implement improved Google Analytics rank tracking for free.

2013 Internet, SEO and Technology Predictions

December 31 2012 // Advertising + Marketing + SEO + Social Media + Technology // 15 Comments

I've made predictions for the past four years (2009, 2010, 2011, 2012) and think I've done pretty well as a prognosticator.

I'm sometimes off by a year or two, and many of the predictions I got wrong were more like personal wishes. But it's interesting to put a stake in the ground so you can look back later.

2013 Predictions

2013 Predictions Crystal Ball

Mobile Payment Adoption Soars

If you follow my Marketing Biz column you know I'm following the mobile payments space closely. Research seems to indicate that adoption of mobile payments will take some time in the US based on current attitudes.

I believe smartphone penetration and the acceptance of other similar payments such as app store purchases and Amazon Video on Demand will smooth the way for accelerated mobile payment adoption. Who wins in this space? I'm still betting on Google Wallet.

Infographics Jump The Shark

Frankly, I think this has already happened but perhaps it's just me. So I'm going to say I'm the canary in the coal mine and in 2013 everyone else will get sick and tired of the glut of bad Infographics.

Foursquare Goes Big

The quirky gamification location startup that was all about badges and mayorships is growing up into a mature local search portal. I expect to see Foursquare connect more dots in 2013, making Yelp very nervous and pissing off Facebook who will break their partnership when they figure out that Foursquare is eating their local lunch.

Predictive Search Arrives

Google Now is a monster. The ability to access your location and search history, combined with personal preferences allows Google to predict your information needs. Anyone thinking about local optimization should be watching this very closely.

Meme Comments

A new form of comments and micro-blogging will emerge where the entire conversation is meme based. Similar to BuzzFeed's reactions, users will be able to access a database of meme images, perhaps powered by Know Your Meme, to respond and converse.

Search Personalization Skyrockets

Despite the clamor from filter bubble and privacy hawks, Google will continue to increase search personalization in 2013. They'll do this through context, search history, connected accounts (Gmail field trial) and Google+.

The end result will be an ever decreasing uniformity in search results and potential false positives in many rank tracking products.

Curation Marketing

Not content with the seemingly endless debate of SEO versus Inbound Marketing versus Content Marketing versus Growth Hacking we'll soon have another buzzword entering the fray.

Curation marketing will become increasingly popular as a way to establish expertise and authority. Like all things, only a few will do it the right way and the rest will be akin to scraped content.

Twitter Rakes It In 

I've been hard on Twitter in the past and for good reason. But in 2013 Twitter will finally become a massive money maker as it becomes the connection in our new multi-screen world. As I wrote recently, Twitter will win the fight for social brand advertising dollars.

De-pagination

After years and literally hundreds of blog posts about the proper way to paginate, we'll see a trend toward de-paginating in the SEO community. The change will be brought on by the advent of new interfaces and capabilities. (Blog post forthcoming.)

Analytics 3.0 Emerges

Pulling information out of big data will be a trend in 2013. But I'm even more intrigued by Google's Universal Analytics and location analytics services like Placed. Marketers are soon going to have a far more complete picture of user behavior, Minority Report be damned!

Ingress Becomes Important

I'm a bit addicted to Ingress. At first you think this is just a clever way for Google to further increase their advantage on local mapping. And it is.

But XM is essentially a map of Android usage. You see some in houses, large clusters at transit stops, movie theaters and doctor's offices, essentially anywhere there are lines. You also see it congregate at intersections and a smattering of it on highways.

Ingress shows our current usage patterns and gives Google more evidence that self-driving cars could increase Internet usage, which is Google's primary goal these days.

Digital Content Monetization

For years we've been producing more and more digital content. Yet, we still only have a few scant ways to monetize all of it and they're rather inefficient when you think about it. Someone (perhaps even me) will launch a new way to monetize digital content.

I Will Interview Matt Cutts

No, I don't have this lined up. No, I'm not sure I'll be able to swing it. No, I'm not sure the Google PR folks would even allow it. But ... I have an idea. So stay tuned.

Reclaiming Lost iOS Search Traffic

December 19 2012 // Analytics + SEO // 29 Comments

Have you noticed that direct traffic year over year is through the roof? Maybe you scratched your head, wrinkled your brow and chalked it up to better brand recognition. In reality, no such thing happened. What is happening is search traffic from iOS is being attributed to direct traffic instead.

Your organic search numbers are being mugged.

[Update] Frank Zimper notes that this problem also exists for those running Android 4.0 and higher. I've confirmed this via the same process you'll read below. The only saving grace is that Android is usually a smaller traffic driver and the version migration is far more gradual. Yet, it'll clearly continue to syphon search traffic off over time unless Google addresses this problem.

iOS 6 Search Theft

Stolen Search Traffic LOLcat

The reason these visits are being mis-attributed is a decision by Apple to move Safari search to secure (SSL) in iOS 6. The result of this decision is that the referrer isn't passed. In the absence of a referrer Google Analytics defaults those visits to (none) which shows up in direct traffic.

The web browser on iOS 6 switched to use SSL by default and our web servers don’t yet take that fact into account. Searching still works fine, but in some situations the HTTP referer header isn’t passed on to the destination page. We’re investigating different options to address this issue.

As Google investigates different options to address this we're left dealing with a serious data problem. Personally, I think Google Analytics should have a message within the interface that warns people of this issue until it's fixed.

RKG did a nice job of tracking this and showing how to estimate the hidden search traffic. But for some reason this issue doesn't seem to be getting as much traction as it should so I wanted to demonstrate the problem and show exactly how you can fight back. Because it's tough enough being an SEO.

Organic Search Traffic Graph 2012

At a glance it looks like this has been a decent year for this client. But it's actually better than it looks in October and November. Follow along to see just how much better.

Create iOS Advanced Segments

The first step is to create two Advanced Segments, one for iOS and one for iOS 6.

iOS Advanced Segment in Google Analytics

In May the labeling of Apple Operating Systems changed from specific devices to iOS. So include all four so you can see your iOS traffic for the entire year.

iOS 6 Advanced Segment in Google Analytics

The iOS 6 segment is straightforward and will only be used to demonstrate and prove the problem. Also, if you want to perform this analysis on multiple analytics properties be sure to save these segments to any profile.

The Scene Of The Crime

Once you have your advanced segments you want to apply them as you look at direct traffic by month.

Search Theft Underway

This plainly shows that direct traffic suddenly jumped from traditional levels upon the release of iOS 6 in late September.

Reclaiming Stolen Search Traffic

Every SEO should be reclaiming this stolen traffic to ensure they (and their clients) are seeing the real picture. Here's my simple method of figuring out how much you should take back.

Three Month iOS Direct Search Ratio

I've taken a three month slice of iOS traffic composed of April, May and June. From there I'm looking to see direct traffic as a percentage of the sum of direct and organic. The reason I'm not doing direct as a percentage of the total is to reduce any noise from referral spikes, paid search campaigns or other channel specific fluctuations.

In this instance direct comprises 10.5%. If you want to go the extra mile and quell the OCD demons in your head (or is that just me) you can do this for every month to ensure you've got the right percentage. I did and am confident that the percentage for this site is 10.5%.

Be aware, it will be different for each site.

Next I look at November and perform the same calculation just to confirm that it's out of whack. At 46.6% it's clearly departed from the established baseline.

November Direct and Search Traffic for iOS

I simply apply the proper direct traffic percentage (10.5% in this case) to the sum of direct and organic traffic. That's the real amount of direct traffic. I then subtract that from the reported direct traffic to find the lost search traffic number.

The equation is none-((organic+none)*percentage). In this case I just reclaimed 79,080 search visits!
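Here is the same calculation as a short Python sketch, where `direct` is the reported (none) traffic for the iOS segment. The visit counts below are hypothetical round numbers chosen to mirror the 10.5% baseline and 46.6% November share; they are not this client's data, and the function names are assumptions.

```python
def direct_share(direct, organic):
    """Direct traffic as a share of direct plus organic."""
    return direct / (direct + organic)


def reclaimed_search_visits(direct, organic, baseline_share):
    """none - ((organic + none) * percentage), as in the equation above."""
    expected_direct = (organic + direct) * baseline_share
    return direct - expected_direct


# Hypothetical iOS segment for one month (not real client numbers).
direct, organic = 4660, 5340
share = direct_share(direct, organic)  # well above a 10.5% baseline
reclaimed = reclaimed_search_visits(direct, organic, 0.105)
corrected_organic = organic + reclaimed
```

With these made-up numbers direct is 46.6% of the combined total, the baseline says it should be 1,050 visits, and the remaining 3,610 visits get moved back to organic.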

Better SEO Results

Get the credit you deserve and apply those stolen search visits to organic traffic.

November Search Lift from iOS Search

A very quick calculation shows that reclaiming iOS search traffic produced a 4.6% bump in organic traffic for this client. That's the best 32 minutes I've spent in a long time. Now it's your turn.

TL;DR

Changes in how Safari searches are passed to Google Analytics are causing organic searches to be listed under direct traffic. Give clients the real picture and get the credit you deserve by properly attributing iOS traffic.