Authorship Is Dead, Long Live Authorship

October 24 2013 // SEO // 58 Comments

Google's Authorship program is still a hot topic. A constant string of blog posts, conference sessions and 'research' projects about Authorship and the idea that it can be used as a ranking signal fill our community.

I Do Not Think It Means What You Think It Does

Yet, the focus on the actual markup and clock-watching for when AuthorRank might show up may not be the best use of time.

Would it surprise you to learn that the Authorship Project at Google has been shuttered? Or that this signals not the death of Authorship but a different method of assigning Authorship?

Here's my take on where Authorship stands today.

RIP Authorship Project

The Authorship Project at Google was headed up by Othar Hansson. He's an incredibly smart and amiable guy, who from time to time was kind enough to provide answers and insight into Authorship. I was going to reach out to him again the other day and discovered something.

Othar Hansson Google+ About

Othar no longer works on the Authorship Project. He's now a principal engineer on the Android search team, which is a pretty sweet gig. Congratulations!

Remember that it was Othar who announced the new markup back in June of 2011 and then appeared with Matt Cutts in the Authorship Markup video. His departure is meaningful. More so because I can't locate a replacement. (That doesn't mean there isn't one but ... usually I'm pretty good at connecting with folks.)

Not only that but there was no replacement for Sagar Kamdar, who left as product manager of Authorship (among other things) in July of 2012 to work at Google X and, ultimately, Project Loon.

At the time I thought the writing was on the wall. The Authorship Project wasn't getting internal resources and wasn't a priority for Google.

Authorship Adoption

Walter White with his Pontiac Aztek

The biggest problem with Authorship markup is adoption. Not everyone is participating. Study after study after study shows that there are material gaps in who is and isn't using the markup. Even the most rosy study of Authorship adoption by technology writers isn't anything to write home about.

Google is unable to use Authorship as a ranking signal if important authors aren't participating.

That means people like Neil Gaiman and Kevin Kelly wouldn't rank as well since they don't employ Authorship markup. It doesn't take a lot of work to find important people who aren't participating and that makes any type of AuthorRank that relies on markup a non-starter.

Authorship SERP Benefits

Search Result Heatmap For Authorship Snippet

Don't get me wrong. Google still supports Authorship markup and there are clear click-through rate benefits to having an Authorship snippet on a search result. Even if you don't believe me or Cyrus Shepard, you should believe Google and the research they've done on social annotations in 2012 (PDF) and 2013 (PDF).

So if you haven't implemented Google Authorship yet it's still a good idea to do so. You'll receive a higher click-through rate and will build authority (different from AuthorRank), both of which may help you rank better over time.

Google knows users respond to Authorship.

Inferred Authorship

I Know What You Did Last Summer

It's clear that Google still wants to do something about identifying authority and expertise. Any monkey with a keyboard can add content to the Internet. So increasingly it's about who is creating that content and why you should trust and value their opinion.

One of the first ways Google was able to infer identity (aka authorship) was by crawling the public social graph. Rapleaf took the brunt of the backlash for this but Google was quietly mapping all of your social profiles as well.

So even if you don't have Authorship markup on a Quora or Slideshare profile Google probably knows about it and could assign Authorship. All this data used to be available via social circles but Google removed this feature a few years ago. But that doesn't mean Google isn't mining the social graph.

Heck, Google could even employ usernames as a way to identify accounts from the same person. What we're really talking about here is how Google can identify people and their areas of expertise.

Authors are People are Entities

But what if Google took another approach to identifying authors? Instead of looking for specific markup what if they looked for entities that happen to be people?

Authors are people are entities.

This would solve the adoption issue. And that's what the Freebase Annotations of the ClueWeb Corpora (FACC) seems to indicate.

Identifying Authors in Text

The picture makes it pretty clear in my mind. Here we're seeing that Google has been able to identify an entity (a person in this instance) within the text of a document and match it to a Freebase identifier.

Based on review of a sample of documents, we believe the precision is about 80-85%, and recall, which is inherently difficult to measure in situations like this, is in the range of 70-85%. Not every ClueWeb document is included in this corpus; documents in which we found no entities were excluded from the set. A document might be excluded because there were no entities to be found, because the entities in question weren’t in Freebase, or because none of the entities were resolved at a confidence level above the threshold.

At a glance you might think this means that Google still has a 'coverage' problem if they were to use entities as their approach to Authorship. But think about who is and isn't in Freebase (or Wikipedia). In some ways, these repositories are biased towards those who have achieved some level of notoriety.

Would Google prefer to rely on self referring markup or a crowd based approach to identifying experts?

Google+ Is An Entity Platform

AJ Kohn Cheltenham High School ID

While Google might prefer to use a smaller set of crowd sourced entities to assign Authorship initially I think they'd ultimately like to have a larger corpus of Authors. That's where Google+ fits into the puzzle.

I think most people understand that Google+ is an identity platform. But if people are entities (and so are companies) then Google+ is a huge entity platform, a massive database of people.

Google+ is the knowledge graph of everyday people.

And if we then harken back to social circles, to mapping the social graph and to measuring engagement and activity, we can begin to see how a comprehensive Authorship program might take shape.

Extract, Match and Measure

Concentration Board Game

Authorship then becomes about Google's ability to extract entities from documents, match those entities to a corpus that contains descriptors of that entity (i.e., social profiles, official page(s), subjects) and then measure the activity around that entity.

Perhaps Google could even go so far as to understand triples on a very detailed (document) level, noting which documents I might have authored as well as the documents in which I've been mentioned.

The presence of Authorship markup might increase the confidence level of the match but it will likely play a supporting and refining role instead of the defining role in the process.

Trust and Authority

Trust Me Sign

I'm reminded that Google talks frequently about trust and authority. For years that was about how it assessed sites but that same terminology can (and should) be applied to people as well.

Authorship markup is but one part of the equation but that alone won't translate into some magical silver bullet of algorithmic success. Building authority is what will ultimately matter and be reflected in any related ranking signal.

Are the documents you author well regarded by your peers? Are they shared? By whom? How often? With what velocity? And are you mentioned (or cited) by other documents? Do they sit on respected sites? Who are they authored by? What text surrounded your mention?

So part of this is doing the hard work of producing memorable content, marketing yourself and engaging with your community. The other part will be ensuring that your entity information is both comprehensive and up-to-date. That means filling out your entire Google+ profile and potentially finding ways to add yourself to traditional entity resources such as Wikipedia and Freebase.

Just as links are the result and not the goal of your efforts, any sort of AuthorRank will be the result of building your own trust and authority through content and engagement.

TL;DR

The Authorship Project at Google has been abandoned. But that doesn't mean Authorship is dead. Instead it signals a change in tactics from Authorship markup to entity extraction as a way to identify experts and a pathway to using Authorship as a ranking signal.

Crawl Optimization

July 29 2013 // SEO // 74 Comments

Crawl optimization should be a priority for any large site looking to improve their SEO efforts. By tracking, monitoring and focusing Googlebot you can gain an advantage over your competition.

Crawl Budget

Ceiling Cat

It's important to cover the basics before discussing crawl optimization. Crawl budget is the time or number of pages Google allocates to crawl a site. How does Google determine your crawl budget? The best description comes from an Eric Enge interview of Matt Cutts.

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank. The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often.

In other words, your crawl budget is determined by authority. This should not come as a shock. But that was pre-Caffeine. Have things changed since?

Caffeine

Percolator

What is Caffeine? In this case it's not the stimulant in your latte. But it is a stimulant of sorts. In June of 2010, Google rebuilt the way they indexed content. They called this change 'Caffeine' and it had a profound impact on the speed at which Google could crawl and index pages. The biggest change, as I see it, was incremental indexing.

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Essentially, Caffeine removed the bottleneck for getting pages indexed. The system they built to do this is aptly named Percolator.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

The speed at which Google can crawl is now matched by the speed of indexation. So did crawl budgets increase as a result? Some did, but not as much as you might suspect. And here's where it gets interesting.

Googlebot seems willing to crawl more pages post-Caffeine but it's often crawling the same pages (the important pages) with greater frequency. This makes a bit of sense if you think about Matt's statement along with the average age of documents benchmark. Pages deemed to have more authority are given crawl priority.

Google is looking to ensure the most important pages remain the 'freshest' in the index.

Time Since Last Crawl

Googlebot's Google Calendar

What I've observed over the last few years is that pages that haven't been crawled recently are given less authority in the index. To be more blunt, if a page hasn't been crawled recently, it won't rank well.

Last year I got a call from a client about a downward trend in their traffic. Using advanced segments it was easy to see that there was something wrong with their product page traffic.

Looking around the site I found that, unbeknownst to me, they'd implemented pagination on their category results pages. Instead of all the products being on one page, they were spread out across a number of paginated pages.

Products that were on the first page of results seemed to be doing fine but those on subsequent pages were not. I started to look at the cache date on product pages and found that those that weren't crawled (I'm using cache date as a proxy for crawl date) in the last 7 days were suffering.

Undo! Undo! Undo!

Depagination

That's right, I told them to go back to unpaginated results. What happened?

Depagination

You guessed it. Traffic returned.

Since then I've had success with depagination. The trick here is to think about it in terms of progressive enhancement and 'mobile' user experiences.

The rise of smartphones and tablets has made click based pagination a bit of an anachronism. Revealing more results by scrolling (or swiping) is an established convention and might well become the dominant one in the near future.

Can you load all the results in the background and reveal them only when users scroll to them without crushing your load time? It's not always easy and sometimes there are tradeoffs but it's a discussion worth having with your team.

Because there's no better way to get those deep pages crawled than by having links to all of them on that first page of results.

CrawlRank

Was I crazy to think that the time since last crawl could be a factor in ranking? It turns out I wasn't alone. Adam Audette (a smart guy) mentioned he'd seen something like this when I ran into him at SMX West. Then at SMX Advanced I wound up talking with Mitul Gandhi, who had been tracking this in more detail at seoClarity.

seoClarity graph

Mitul and his team were able to determine that content not crawled within ~14 days receives materially less traffic. Not only that, but getting those same pages crawled more frequently produced an increase in traffic. (Think about that for a minute.)

At first, Google clearly crawls using PageRank as a proxy. But over time it feels like they're assigning a self-referring CrawlRank to pages. Essentially, if a page hasn't been crawled within a certain time period then it receives less authority. Let's revisit Matt's description of crawl budget again.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank.

The pages that aren't crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages.

You win if you get your low PageRank pages crawled more frequently than the competition.

Now what CrawlRank is really saying is that document age is a material ranking factor for pages with little to no PageRank. I'm still not entirely convinced this is what is happening, but I'm seeing success using this philosophy.

Internal Links

One might argue that what we're really talking about is internal link structure and density. And I'd agree with you!

Not only should your internal link structure support the most important pages of your site, it should make it easy for Google to get to any page on your site in a minimum of clicks.

One of the easier ways to determine which pages are deemed most important (based on your internal link structure) is by looking at the Internal Links report in Google Webmaster Tools.

Google Webmaster Tools Internal Links

Do the pages at the top reflect the most important pages on your site? If not, you might have a problem.

I have a client whose blog was receiving 35% of Google's crawl each day. (More on how I know this later on.) This is a blog with 400 posts amid a total content corpus of 2 million+ URLs. Googlebot would crawl blog content 50,000+ times a day! This wasn't where we wanted Googlebot spending its time.

The problem? They had menu links to the blog and each blog category on nearly all pages of the site. When I went to the Internal Links report in Google Webmaster Tools you know which pages were at the top? Yup. The blog and the blog categories.

So, we got rid of those links. Not only did it change the internal link density but it changed the frequency with which Googlebot crawls the blog. That's crawl optimization in action.

Flat Architecture

Flat Architecture

Remember the advice to create a flat site architecture? Many ran out and got rid of subfolders thinking that if the URL didn't have subfolders then the architecture was flat. Um ... not so much.

These folks destroyed the ability for easy analysis, potentially removed valuable data in assessing that site, and did nothing to address the underlying issue of getting Google to pages faster.

How many clicks from the home page is each piece of content? That's what was, and remains, important. It doesn't matter if the URL is domain.com/product-name if it takes Googlebot (and users) 8 clicks to get there.
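To make the click-count question concrete, here is a minimal sketch that measures click depth with a breadth-first search over an internal link graph. It assumes you already have a crawl export mapping each page to the pages it links to; the `click_depths` function and the sample URLs are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page.

    `links` maps a URL to the list of URLs it links to (hypothetical crawl export).
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: the product URL looks flat, but it sits three clicks deep.
site = {
    "/": ["/category"],
    "/category": ["/category?page=2"],
    "/category?page=2": ["/product-name"],
}
depths = click_depths(site, "/")
```

Pages that never appear in `depths` are not reachable from the home page through the links you supplied, which is worth knowing in its own right.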

Is that mega-menu on every single page really doing you any favors? Once you get someone to a leaf level page you want them to see similar leaf level pages. Related product or content links are the lifeblood of any good internal link structure and are, sadly, frequently overlooked.

Depagination is one way to flatten your architecture but a simple HTML sitemap, or specific A-Z sitemaps can often be very effective hacks.

Flat architecture shortens the distance between authoritative pages and all other pages, which increases the chances of low PageRank pages getting crawled on a frequent basis.

Tracking Googlebot

"A million dollars isn’t cool. You know what’s cool? A billion dollars."

Okay, Sean Parker probably didn't say that in real life but it's an apt analogy for the difference in knowing how many pages Googlebot crawled versus where Googlebot is crawling, how often and with what result.

The Crawl Stats graph in Google Webmaster Tools only shows you how many pages are crawled per day.

Google Webmaster Tools Crawl Stats

For nearly five years I've worked with clients to build their own Googlebot crawl reports.

Googlebot Crawl Reporting That's Cool

That's cool.

And it doesn't always have to look pretty to be cool.

Googlebot Crawl Report by Page Type and Status

Here I can tell there's a problem with this specific page type. More than 50% of the crawl on that page type is producing a 410. That's probably not a good use of crawl budget.

All of this is done by parsing or 'grepping' log files (a line by line history of visits to the site) looking for Googlebot. Here's a secret. It's not that hard, particularly if you're even half-way decent with Regular Expressions.

I won't go into details (this post is long enough as it is) but you can check out posts by Ian Lurie and Craig Bradford for more on how to grep log files.
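For a rough idea of what that parsing can look like, here is a minimal sketch in Python. It assumes your server writes the common Apache/Nginx "combined" log format and simply checks the user agent string for "Googlebot"; the sample log lines and the `googlebot_hits` name are made up for illustration. Matching on user agent alone will also count anything pretending to be Googlebot, so treat the numbers as approximate.

```python
import re
from collections import Counter

# Assumes the Apache/Nginx "combined" log format; adjust the pattern to your logs.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Yield (path, status) for each log line whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            yield match.group("path"), int(match.group("status"))


# Invented log lines for illustration only.
sample_log = [
    '66.249.66.1 - - [29/Jul/2013:10:00:00 -0700] "GET /product/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [29/Jul/2013:10:00:01 -0700] "GET /product/123 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [29/Jul/2013:10:00:02 -0700] "GET /old-page HTTP/1.1" 410 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
hits = list(googlebot_hits(sample_log))
status_counts = Counter(status for _, status in hits)
```

In practice you would pass an open log file instead of `sample_log`, since the function only needs an iterable of lines.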

In the end I'm interested in looking at the crawl by page type and response code.

Googlebot Crawl Report Charts

You determine page type using RegEx. That sounds mysterious but all you're doing is bucketing page types based on pattern matching.
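As a small illustration of that bucketing, the sketch below assigns each crawled path to the first pattern it matches and counts hits by page type and response code. The URL patterns, page type names and sample data are all hypothetical; your own site will need its own patterns.

```python
import re
from collections import Counter

# Hypothetical URL patterns; first match wins, so order from specific to general.
PAGE_TYPES = [
    ("product", re.compile(r"^/product/\d+")),
    ("category", re.compile(r"^/category/")),
    ("blog", re.compile(r"^/blog/")),
]


def page_type(path):
    """Return the name of the first pattern matching `path`, else 'other'."""
    for name, pattern in PAGE_TYPES:
        if pattern.search(path):
            return name
    return "other"


# (path, status) pairs as they might come out of a Googlebot log parse.
crawl = [
    ("/product/123", 200),
    ("/product/456", 410),
    ("/product/789", 410),
    ("/category/shoes", 200),
    ("/email-a-friend?id=123", 200),
]
report = Counter((page_type(path), status) for path, status in crawl)
```

Anything that lands in the "other" bucket is a prompt to either add a pattern or ask why Googlebot is crawling it at all.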

I want to know where Googlebot is spending time on my site. As Mike King said, Googlebot is always your last persona. So tracking Googlebot is just another form of user experience monitoring. (Referencing it like this might help you get this project prioritized.)

You can also drop the crawl data into a database so you can query things like time since last crawl, total crawl versus unique crawl or crawls per page. Of course you could also give seoClarity a try since they've got a lot of this stuff right out of the box.
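Here is a minimal sketch of those queries using Python's built-in `sqlite3` module. The table layout, sample rows and report date are assumptions for illustration; the 14-day cutoff borrows the figure from the seoClarity observation above and is something to tune, not a rule.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawl (url TEXT, crawled_on TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO crawl VALUES (?, ?, ?)",
    [
        ("/product/1", "2013-07-01", 200),
        ("/product/1", "2013-07-28", 200),
        ("/product/2", "2013-07-05", 200),
        ("/blog/post", "2013-07-27", 200),
        ("/blog/post", "2013-07-28", 200),
    ],
)

# Total crawl versus unique crawl.
total, unique = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT url) FROM crawl"
).fetchone()

# Days since each URL was last crawled, relative to a fixed report date.
report_date = date(2013, 7, 29).isoformat()
days_since = dict(
    conn.execute(
        "SELECT url, CAST(julianday(?) - julianday(MAX(crawled_on)) AS INTEGER) "
        "FROM crawl GROUP BY url",
        (report_date,),
    )
)

# URLs not crawled in the last 14 days (threshold is an assumption to tune).
stale = sorted(url for url, days in days_since.items() if days > 14)
```

Crawls per page is the same idea with `COUNT(*)` grouped by `url`.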

If you're not tracking Googlebot then you're missing out on the first part of the SEO process.

You Are What Googlebot Eats

Cookie Monster Fruit

What you begin to understand is that you're assessed based on what Googlebot crawls. So if they're crawling a whole bunch of parameter based, duplicative URLs or you've left the email-a-friend link open to be crawled on every single product, you're giving Googlebot a bunch of empty calories.

It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget.

The crawl spent on junk could have been spent crawling low PageRank pages instead. So managing your URL Parameters and using robots.txt wisely can make a big difference.

Many large sites will also have robust external link graphs. I can leverage those external links, rely less on internal link density to rank well, and can focus my internal link structure to ensure low PageRank pages get crawled more frequently.

There's no patent right or wrong answer. Every site will be different. But experimenting with your internal link strategies and measuring the results is what separates the great from the good.

Crawl Optimization Checklist

Here's a quick crawl optimization checklist to get you started.

Track and Monitor Googlebot

I don't care how you do it but you need this type of visibility to make any inroads into crawl optimization. Information is power. Learn to grep, perfect your RegEx. Be a collaborative partner with your technical team to turn this into an automated daily process.

Manage URL Parameters

Yes, it's confusing. You will probably make some mistakes. But that shouldn't stop you from using this feature and changing Googlebot's diet.

Use Robots.txt Wisely

Stop feeding Googlebot empty calories. Use robots.txt to keep Googlebot focused and remember to make use of pattern matching.
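Before shipping new rules it can help to test them against real URLs from your logs. The sketch below is a simplified take on the wildcard matching Google documents for robots.txt, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. It only handles Disallow patterns and ignores Allow rules and precedence, and the function names and sample rules are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each * into "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches from the start of `path`."""
    return any(robots_pattern_to_regex(p).search(path) for p in disallow_patterns)


# Hypothetical Disallow rules for sort parameters, email-a-friend links and PDFs.
disallow = ["/*?sort=", "/email-a-friend", "/*.pdf$"]
```

Running your top crawled URLs through `is_blocked` is a cheap way to catch a rule that blocks more (or less) than you intended.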

Don't Forget HTML Sitemap(s)

Seriously. I know human users might not be using these, but Googlebot is a different type of user with slightly different needs.

Optimize Your Internal Link Structure

Whether you try depagination to flatten your architecture, re-evaluate navigation menus, or play around with crosslink modules, find ways to optimize your internal link structure to get those low PageRank pages crawled more frequently.

Keywords Still Matter

June 05 2013 // SEO // 58 Comments

As content marketing becomes the new black I'm starting to hear people talk about how keywords don't matter anymore. This sentiment appears in more than a few posts and the general tenor seems to be that keyword focused strategies are a thing of the past - a relic from a dark time.

The problem? You need keywords to produce successful content.

Dwight Meme Keywords

Keyword Syntax

How do people search for something? That's what keywords are all about. It's vital to ensuring your content will be found and resonate with your users.

keyword syntax

Are people searching for 'all weather fluid displacement sculptures' or 'outdoor water fountains'? That's an extreme example but it makes an important point.

You need to understand the user and the words they use to find your content.

Keyword Intent

Keywords can also tell you a lot about the intent of a search. Look (well) beyond informational, navigational and transactional intent and start thinking about how you can map keywords to the various stages of your site's conversion funnel.

For instance, what does a query like 'majestic seo vs open site explorer' tell you? This user is probably further along in the purchase funnel. They're aware of their choices and may have even narrowed it down to these two options. The keyword (yes, keyword) 'vs' makes it clear that they're looking for comparison data.

Google SERP for Comparison Intent

Sure enough, most of the results returned are posts that compare these two tools. Those pieces of content squarely meet that intent, in part because they're paying attention to keywords.

Majestic SEO has a result but ... it's the home page. Is that going to satisfy the desire to compare? Probably not. And where's SEOMoz? Missing in action.

Each could rely on the blog posts presented to deliver this comparison. Or they could also develop content that met that keyword and intent, allowing them to tell their story and frame the debate.

I know some will shriek, "Are you crazy? You don't want to promote your competition by mentioning them so prominently!" But that's denying reality. Users are searching with this syntax and intent.

Now, I'm not saying you have to put content that meets this particular intent prominently on the site or in the normal conversion flow. But if you know someone is on the fence and comparing products, why wouldn't you want a chance to engage that user on your own terms?

Keywords let you create content that matches user intent.

Magic Questions

Oh-O It's Magic!

There's also a lot of meta information that comes along with a keyword. I'm fond of using a term like 'eureka 313a manual' as an example. It's a query for a vacuum cleaner manual.

On the one hand it's pretty simple. There's explicit intent. Someone is looking for the manual to their vacuum cleaner. The content to meet that informational search would be ... the manual. But, what's really going on?

If you're searching for the manual, odds are that something is wrong with your vacuum. There's an implied intent at work. The vacuum is either not working right or is flat out broken. You have the opportunity to anticipate and answer magic questions.

How can I fix my vacuum? Where can I buy replacement parts? Are there repair shops near me? What vacuum should I get to replace this one if it can't be fixed?

By decoding the keyword you can create a relevant and valuable page that meets explicit and implied intent.

Keyword Frequency

Keyword frequency is important. Yes, really. One of my favorite examples of this is LinkedIn. How did they secure their place in the competitive 'name' query space?

LinkedIn Keyword Frequency

LinkedIn wanted to make it clear what (or who) these pages were about. That's what keyword frequency is about, making it easy for search engines and users to understand what that page is about.

LinkedIn doesn't just do it with their headers either, but uses the name frequently elsewhere on the page. The result?

Marshall Simmonds Wordle

There's no question what this page is about.
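If you want to check your own pages the same way, a quick count is enough. This is a minimal sketch that counts how often a phrase appears in a block of page text and lists the most common words; the sample copy is invented and not taken from any real profile, and the function names are hypothetical.

```python
import re
from collections import Counter


def term_frequency(text, phrase):
    """Count case-insensitive whole-phrase occurrences of `phrase` in `text`."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def top_words(text, n=3):
    """Return the `n` most common lowercase words of three or more letters."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return Counter(words).most_common(n)


# Hypothetical profile page copy, not taken from any real page.
page = (
    "Jane Doe | Professional Profile. View Jane Doe's profile. "
    "Jane Doe has 3 recommendations. Contact Jane Doe directly."
)
name_count = term_frequency(page, "Jane Doe")
```

The word list from `top_words` is essentially what a word cloud visualizes.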

Keywords are Steve Krug for Googlebot.

Readability

This Is Not A Pipe

The reaction I get from many when I press on this issue is that it produces a poor user experience. Really? I've never heard anyone complain about LinkedIn and most never realize that it's even going on.

Using the keywords people expect to see can only help make your content more readable, which is still a tremendously undervalued aspect of SEO. Because people scan text and rarely read word for word.

And what do you think they're scanning for? What do you think is rattling around in their brain when they're scanning your content? It's not something random like 'bellhop poodle duster', it's probably the keyword that brought them there.

You may think Google is smart enough to figure it out. You'll claim that Google's gotten far more sophisticated in the application of synonyms and topical modeling. And you'd be right to a degree. But why take the chance? Particularly since users crave the repetition and consistency.

They don't want you to use four different ways to say the same thing and the hard truth is they're probably only going to read one of those words anyway. You'll create better content for users if you write for search engines.

Make sure you're using the words users expect to see.

TL;DR

Keywords aren't going away, they're becoming more important. Query syntax and user intent are vital in producing relevant and valuable content that resonates with users and answers both explicit and implicit questions.

Google Removes Related Searches

April 19 2013 // Rant + SEO // 45 Comments

This morning I went to use one of my go to techniques for keyword research and found it was ... missing.

Related Searches Gone

Related Searches Option Gone

It was bad enough that the new Search tools interface was this awkward double-click menu but I understood that decision. Because most mainstream users don't ever refine their results.

But to remove related searches from that menu altogether? In less than a year related searches went from being a search tip to being shuffled off to Buffalo?

WTF!

Out of Insight

Clooney is Pissed

Google needs to understand that there are SEOs, or digital marketing professionals if that makes it easier, who are helping to make search results better. We're helping sites understand the syntax and intent of their users and creating relevant and valuable experiences to match and satisfy those queries.

I wasn't happy but wasn't that upset when Google introduced (not provided). But as the amount of (not provided) traffic increases I see no reason why Google shouldn't implement my (not provided) drill down suggestion. Seriously, get on that.

But then Google merged Google Trends with Google Insights for Search and in the process removed its most useful feature. That's right, knowing what percentage of the traffic was attributed to each category let SEOs better understand the intent of that query.

Now Google's taking away the interface for related searches? Yeah, you've gone too far now. Hulk mad.

Stop Ignoring Influencers

You Wouldn't Like Me When I'm Angry

Just like the decision to terminate Google Reader, Google doesn't seem to understand that they need to address influencers. And believe it or not Google, SEOs are influencers. We're demystifying search so that sites don't fall for get-rank-quick schemes. And you need us to do that because you're dreadful at SEO. Sites aren't finding much of your educational content. They're not. Really.

In the last year Google's made it more and more difficult for SEOs to do good work. And you know who ultimately suffers? Google. Because the content coming out won't match the right syntax and intent. It'll get tougher for Google, over time, to find the 'right' content and users will feel the slow decline in search quality. You know, garbage in, garbage out.

Any good marketer understands that they have to serve more than one customer segment. Don't like to think of SEOs as influencers? Fine. Call us power users and put us back on your radar and stop removing value from the search ecosystem.

Time To Long Click

April 17 2013 // SEO // 60 Comments

The internal metric Google uses to determine search success is time to long click. Understanding this metric is important for search marketers in assessing changes to the search landscape and developing better optimization strategies.

Short Clicks vs Long Clicks

Longcat

Back in 2009 I wrote about the difference between short clicks and long clicks. A long click occurs when a user performs a search, clicks on a result and remains on that site for a long period of time. In the optimal scenario they do not return to the search results to click on another result or reformulate their query.

A long click is a proxy for user satisfaction and success.

On the other hand, a short click occurs when a user performs a search, clicks on a result and returns to the search results quickly to click on another result or reformulate their query. Short clicks are an indication of dissatisfaction.

Google measures success by how fast a search result produces a long click.
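To make the definition concrete, here's a minimal sketch of how time to long click could be computed from a single query session. Everything in it is an assumption for illustration: the 60 second threshold, the Click record and the function names are mine, not anything Google has published.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold: the post never says how long a "long" click is.
LONG_CLICK_SECONDS = 60.0


@dataclass
class Click:
    """One click on a search result within a single query session."""
    clicked_at: float             # seconds since the query was issued
    returned_at: Optional[float]  # seconds since the query; None if the user never came back


def is_long_click(click: Click, threshold: float = LONG_CLICK_SECONDS) -> bool:
    """A click is 'long' if the user never returned or stayed past the threshold."""
    if click.returned_at is None:
        return True
    return (click.returned_at - click.clicked_at) >= threshold


def time_to_long_click(clicks: List[Click]) -> Optional[float]:
    """Seconds from the query to the first long click, or None if there was none."""
    for click in sorted(clicks, key=lambda c: c.clicked_at):
        if is_long_click(click):
            return click.clicked_at
    return None


# Two short clicks (pogosticking) followed by a click the user never returned from.
session = [
    Click(clicked_at=4.0, returned_at=9.0),
    Click(clicked_at=12.0, returned_at=20.0),
    Click(clicked_at=25.0, returned_at=None),
]
print(time_to_long_click(session))  # 25.0
```

The two short clicks don't count as success on their own, but they do push the long click out to the 25 second mark.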

Bounce Rate vs Pogosticking

Before I continue I want to make sure we're not conflating short clicks with bounce rate. While many bounces could be construed as short clicks, that's not always the case. The bounce rate on Stack Overflow is probably very high. Users search for something specific, click through to a Stack Overflow result, get the answer they needed and move on with their life. This is not a bad thing. That's actually a long click.

You can gain greater clarity on this by configuring an adjusted bounce rate or something even more advanced that takes into account the amount of time the user spent on the page. In the example above you'd likely see that users spent a material amount of time on that one page which would be a positive indicator.
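As a rough sketch of the idea, the snippet below compares a standard bounce rate with an adjusted one. It assumes you already have time on page for single-page visits (which in Google Analytics generally means firing a timed event), and the 30 second cutoff and sample visits are hypothetical.

```python
# Assumed cutoff: a single-page visit shorter than this counts as a "real" bounce.
ENGAGED_SECONDS = 30

# Hypothetical visits as (pages_viewed, seconds_on_site).
visits = [(1, 5), (1, 240), (3, 180), (1, 12), (1, 95)]


def bounce_rate(visits):
    """Standard bounce rate: share of visits that viewed exactly one page."""
    bounces = sum(1 for pages, _ in visits if pages == 1)
    return bounces / len(visits)


def adjusted_bounce_rate(visits, engaged_seconds=ENGAGED_SECONDS):
    """Only count single-page visits that also left before the cutoff."""
    bounces = sum(
        1 for pages, seconds in visits
        if pages == 1 and seconds < engaged_seconds
    )
    return bounces / len(visits)


print(bounce_rate(visits))           # 0.8
print(adjusted_bounce_rate(visits))  # 0.4
```

Half of the single-page visits in this made-up sample stuck around long enough to look like long clicks, which the standard bounce rate hides.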

The behavior you want to avoid is pogosticking. This occurs when a user clicks through on a result, returns quickly to the search results and clicks on another result. This indicates, to some extent, that the user was not satisfied with the original result.

Two problems present themselves with pogosticking. The first is that it's impossible for sites to measure this metric. That sort of sucks. We can only look at short bounces as a proxy and even then can't be sure that the user pogosticked to another result.

The second is that some verticals will naturally produce pogosticking behavior. Health related queries will show pogosticking behavior since users want to get multiple points of view (or opinions if you will) on that ailment or issue.

This could be overcome by measuring the normal pogosticking behavior for a vertical or query class and then determining which results produce lower and higher than normal pogosticking rates. I'm not sure Google is doing this but it's not out of the question since they already have a robust understanding of query and vertical mapping.
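Here's a minimal sketch of what that normalization might look like. To be clear, this is my own illustration and not a description of anything Google does. The rows, URLs and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (vertical, result_url, clicks, pogostick_returns).
rows = [
    ("health", "site-a.example/flu", 200, 120),
    ("health", "site-b.example/flu", 200, 80),
    ("recipes", "site-c.example/pie", 100, 20),
    ("recipes", "site-d.example/pie", 100, 40),
]


def baseline_by_vertical(rows):
    """Pogosticking rate across every result in a vertical."""
    clicks = defaultdict(int)
    returns = defaultdict(int)
    for vertical, _, n_clicks, n_returns in rows:
        clicks[vertical] += n_clicks
        returns[vertical] += n_returns
    return {v: returns[v] / clicks[v] for v in clicks}


def relative_pogosticking(rows):
    """Each result's rate divided by its vertical's baseline (1.0 = normal)."""
    baseline = baseline_by_vertical(rows)
    return {
        url: (n_returns / n_clicks) / baseline[vertical]
        for vertical, url, n_clicks, n_returns in rows
    }


for url, ratio in relative_pogosticking(rows).items():
    print(f"{url}: {ratio:.2f}")
```

In this sample the health result with a 40% pogosticking rate scores better than normal for its vertical, while the recipe result with the same 40% rate scores worse than normal for its vertical.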

But I digress.

Speed

Part of the way Google works on reducing the time to long click is by improving the speed of search results and the Internet in general. Their own research showed the impact of speed on search results.

All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!

Remember that while usage was the metric used, they were trying to measure satisfaction. Making it faster to get to information made people happier and more likely to use search for future information requests. Google's simply reducing the friction of searching.

But it's not just the speed of presenting results that matters. It's how quickly Google gets someone to that long click. Search results that don't produce long clicks are bad for business, as are those that increase the time spent selecting a result. And pogosticking blows up the query timeline as users loop back and tack on additional seconds' worth of selection and page load time.

Google Query Timeline

Make no mistake. Google wants to reduce every portion of this timeline they presented at Inside Search in 2011.

Answers

42

One of the ways in which we've seen Google reduce time to long click is through various 'answers' initiatives. Whether it's a OneBox or a Knowledge Graph result the idea is that answers can often reduce the time to long click. It's immediate gratification and in line with Amit Singhal's Star Trek computer ideal.

In some cases a long click is measured by the absence of a click and reformulated query. If I search for weather, don't click but don't take any further actions, that should register as a long click.

Ads

John Henry Man vs Machine

You'll also hear Google (and Bing) talk about the fact that ads are answers. Of course ads are what fill the coffers but they also provide another way to get people to a long click. Arguing the opposite (that ads aren't contributing to satisfaction) is a lot like arguing that marketers and advertisers aren't valuable.

Not only that, but Google has features in place to help ensure that good ad answers rise to the top. The auction model coupled with quality score and keyword level bidding all produce relevant ads that lead to long clicks.

The analysis of pixel space on search results is often used to show how Google is marginalizing organic search. Yet, the other way to look at it is that advertisers are getting better at delivering results (with the help of new Google ad extensions). Isn't it, in some ways, man versus machine? The advertiser being able to deliver a better result than the algorithm?

Without doubt Google benefits financially from having more space dedicated to paid results but they still must result in long clicks for Google to optimize long-term use, which leads to long-term revenues and profits.

I would be very surprised if changes to search results (both paid and organic) weren't measured by the impact they had in time to long click.

Hubs

Bow Tie

All of this is interesting but what does the time to long click metric mean for SEO? More than you might suspect.

When I started in the SEO field I read everything I could get my hands on (which is not altogether different from now). At the time there was advice about becoming a hub.

There was a good deal of hand waving about the definition of a hub but the general idea was that you wanted to be at the center of a topic by providing value and resources. People would link to you and the traffic you received would often go on to the resources you provided. About.com is a good example.

Funny thing is, this isn't some well kept secret. Marshall Simmonds spells it out pretty clearly in this 2010 Whiteboard Friday video where he discusses bow tie theory (hubs) and link journalism. (I just watched this again while writing this and, man, this is an awesome video.)

Most people focus on the fact that hubs receive a lot of backlinks. They do because of the value they provide, which is often in the aggregation of and links to other content. In the end, the real value of hubs is that they play an important part in getting people to content and that long click.

Search is a multi-site experience.

This is what search marketers must realize. You will get credit for a long click if you're part of the long click. If you ensure that the user doesn't return to search results, even by sending them to another site, then you're going to be rewarded.

Too often sites won't link out. I regularly run into this as my clients navigate business development deals with partners. It's frustrating. They think linking out is a sign of weakness and reduces their ability to consolidate Page Rank.

While Page Rank math might support not linking out, that strategy ultimately limits success.

Link Out!

Local Maxima Graph

Limiting your outlinks creates a local maxima problem. You'll optimize only up to a certain ceiling based on constrained Page Rank math. Again, not a real secret. Cyrus Shepard talked about this in a 2011 Whiteboard Friday video (though I wouldn't stress too much about the anchor text myself).

Linking out can help you break through that local maxima by delivering more long clicks. Suddenly, your page is a sort of mini-hub. People search, get to your page and then go on to other relevant information.

Google wants to include results that contribute to reducing the time to long click for that query. 

I'm not advocating that you vomit up pages with a ton of links. What I'm recommending is that you link to other valuable sources of information when appropriate so that you fully satisfy that user's query. In doing so you'll generate more long clicks and earn more links over time, both of which can have profound and positive impact on your rankings.

Stop thinking about optimizing your page and think about optimizing the search experience instead. 

I ran into someone at SMX West who inherited a vast number of low quality sites. These sites used the old technique of being relevant enough to get someone to the page but not delivering enough value to answer their query. The desired result was a click on an ad. Simple arbitrage when you get down to it.

In a test, placing prominent links to relevant content on a sub-set of these pages had a material and positive impact on their ranking. It's certainly not conclusive, but it showed the potential impact of being part of a multi-site long click search result.

As an aside, it's not that those ad clicks were bad. Some of those probably resulted in long clicks. Just not enough of them. The majority either pogosticked to another result or wound up back at the search result after an ad click. And we already know this as search marketers by looking at the performance of search versus display campaigns.

Impact On Domain Diversity

If you believe time to long click is the way in which Google is measuring search success then you start to see some of the changes in a new light. I've been disappointed by the lack of domain diversity on many search results.

Yelp Dominating Search Results for Haircut in Concord CA

Sadly, this type of result hasn't been that rare within the last year. Pete Myers has been doing amazing work on this topic.

For a while I just thought this was Google being stupid. But then it dawned on me. The lack of domain diversity may be reducing the time to long click. It might actually be improving the overall satisfaction metrics Google uses to optimize search!

In some ways this makes a bit of sense, if even from a straight up Paradox of Choice perspective. Selecting from 10 different domains versus 5 might reduce cognitive strain. Too many choices overwhelm people, reducing both action and satisfaction. So perhaps Google's just reflecting that in their results with both domain diversity (or lack thereof) and more instances of 7 results pages.

Downsides to Time To Long Click?

MC Escher Relativity Stairs

Are these long clicks truly a sign of satisfaction? The woman who had been cutting my hair for nearly 10 years retired. So I actually did need to find someone new. I hated the search result but did wind up clicking through and using Yelp to locate someone. So from Google's perspective I was satisfied but in reality ... not so much.

I wonder how long a time frame Google uses in assessing the value of long clicks. I abandoned my haircut search a number of times over the course of a month. In many of those instances I'm sure it looked like I was satisfied with the result. It looked like a long click. Yet, if you looked over a longer period of my search history it would become clear I wasn't. I think this is a really difficult problem to solve. Is it satisfaction or abandonment?

The other danger here is that Google is training people to use another service. Now, I don't particularly like Yelp but what this result tells me is that if I wanted to find something like this again I should just skip Google and go right to Yelp instead.

The same could be said of results that reflect our own bias toward brands. While users may respond better to brands and the time to long click might be reduced, the long term implications could be that Google is training users to visit those brands directly. Why start my product search on Google when all they're doing is giving me links to Amazon 90% of the time?

Of course, Google could argue that it will remain the hub for information requests because it continues to deliver value. (See what I did there?)

TL;DR

Google is using time to long click to measure the effectiveness of search results. Understanding this puts many search changes and initiatives into perspective and gives sites renewed reason to link out and think of search as a multi-site experience.

Tracking Image Search In Google Analytics

March 27 2013 // Analytics + SEO // 49 Comments

(This post has been updated as of 4/5/14 to reflect refinements to the filters as well as new caveats about Chrome.)

The Internet is becoming increasingly visual but the standard Google Analytics default lumps image search traffic in with organic traffic. The problem with that is these two types of traffic have radically different behaviors.

Google Analytics Y U No Track Image Search

So here's a quick way for you to track image search in Google Analytics to gain insight into how images are performing for your business.

Image Search Referrers

After the last big image search update I was asked by Annie Cushing if I'd figured out a way to track images in Google Analytics. I'd meant to but hadn't yet. Her reminder led me to find out what was possible. I fired up Firefox and used Live HTTP Headers to look at the referrers for image search traffic.

I found that there were two distinct referrers for Google, one from Google images and one from images that showed up via universal search results.

Here's what the referrer looks like from Google image search.

Google Image Search Referrer

The parts to note here are the /url? and the source=images parameter. Now let's look at what the referrer looks like from an image via universal search.

Google Image Referrer via Universal Search

The part to note here is that the URL doesn't use /url? but /imgres? instead. This means you can track traffic from each source!

But there's another wrinkle I discovered over time. Many of the international versions of Google use the old image search UX which also produces the /imgres? referrer.

google.fr image search for ruby red slippers

In addition, most of these wind up being passed in the Google cookie as a 'referring' medium and not 'organic'. So you might be seeing Google domains cropping up in your referring reports (annoying!). Adding Full Referrer as a secondary dimension shows where the majority of these are coming from: imgres.

Google Referring Traffic in Google Analytics Reports

This means two things. First, we're going to have to create a special case for universal search on google.com so that it isn't mixed up with image search from international properties. Second, we're going to have to change the medium on the international image search traffic so that it is properly attributed to organic.

Finally let's take a look at Bing.

Bing Image Search Referrer

This is pretty straightforward and doesn't change based on whether it's from image search proper or via a universal result.

Google Analytics Image Search Filters

If you know the referrer patterns you can set up some Google Analytics filters to capture and reclassify this traffic into the appropriate buckets. Here's the step-by-step way to do that.

From Google Analytics click Admin.

Google Analytics Admin

That takes you to a list of profiles.

Google Analytics Select or Create a Profile

Here you can either create a new profile or select a current one. I'd suggest creating a new profile to test this out before you decide to integrate it into your primary profile. Because you might screw it up or just may not like the detail or may not want to have the change in continuity. That said, I've created these filters so they'll have the least amount of impact on your reporting while still delivering added insight.

Next you'll reach the profile navigation pane where you'll want to click on Filters.

Google Analytics Filters 2014

At that point you'll want to go ahead and click the red New Filter button.

Google Analytics Red New Filter Button

That's when the real fun begins and you construct a new advanced filter.

Creating a Google Analytics Google Image Search Filter

The first step is to name this filter. This won't show up in your reports and is simply a way for you to know what that filter is doing. So make it descriptive and obvious.

Next you'll want to select the Custom filter button (2) which then reveals a list of options. From that list you'll want to select Advanced (3). This is where it gets a bit tricky.

In step 4 you'll select Referral from the menu of options and then apply some RegEx to match the pattern we've identified. In this instance the RegEx I'm using is:

.*google\.(.*)/url.*source=images.*

I love RegEx, which stands for Regular Expression, but I don't always get it right the first time and regularly rely on this RegEx cheat sheet to remind and guide me. In this instance I'm looking for all Google domains (including any international domain using the new image search) with /url and source=images within the referrer.
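If you want to sanity check the pattern before putting it in a filter, a few lines of Python will do it. This is just a sketch: the referrers below are hypothetical stand-ins (real ones carry many more parameters) and Python's regex engine is not identical to the one Google Analytics uses, though this pattern only relies on basic syntax.

```python
import re

# The pattern from the post, unchanged.
GOOGLE_IMAGES = re.compile(r".*google\.(.*)/url.*source=images.*")

# Hypothetical referrers for illustration, mapped to whether they should match.
referrers = {
    "http://www.google.com/url?sa=i&source=images&cd=&docid=abc": True,
    "http://www.google.co.uk/url?sa=i&source=images&cd=": True,
    "http://www.google.com/url?sa=t&source=web&cd=1": False,
    "http://www.google.com/imgres?imgurl=http://example.com/a.jpg": False,
    "http://www.bing.com/images/search?q=wifi+logo": False,
}

for referrer, expected in referrers.items():
    matched = GOOGLE_IMAGES.search(referrer) is not None
    status = "ok" if matched == expected else "UNEXPECTED"
    print(f"{status}: matched={matched} {referrer}")
```

Every line should print ok. If you tweak the pattern and one flips to UNEXPECTED you've either broken the filter or found a referrer shape I haven't accounted for.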

In step five you're selecting what you're going to do when a referrer matches your RegEx. I've chosen Campaign Source from the menu and then created a new source called 'google images'. You can name these whatever you like but I keep them lowercase to match the other sources.

You'll note that the 'Override Output Field' is set to Yes which means that I'm going to change the Campaign Source for those that match this referrer pattern from what it is currently to 'google images'. The great part about this is that you retain the fact that the medium is 'organic'. So all those reports remain completely valid.

Finally, you click Save and then you wait for the filter to be applied to traffic coming into the site. Depending on the amount of traffic you get from these sources, it may take a few hours to a few days to see the filter working in your reports.

Next we have to put into place a filter for Google universal images, Google images from international properties not using the current image search UX as well as Bing images.

The RegEx for Google universal search images is:

.*google\.com/imgres.*

Note that I'm only looking to match referrers coming from google.com so that I'm not mixing international image search with US universal image search.

The RegEx for Google international search is crazy long and didn't really work pasted here. So instead you can click here to copy and paste the Google 'International' image search filter RegEx.

Now, many of the domains won't match because they're using the new version of image search, which will match the first filter we created. But I figured I'd just be as inclusive as possible instead of validating the current image search UX on each domain. (I mean, it's wicked time consuming too.)

Finally, the RegEx for Bing images is:

.*bing\.(.*)/images/search.*
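Put together, the source filters behave roughly like the sketch below. A few assumptions to flag: the referrers are hypothetical, the google.com pattern is written with the dot escaped so it only matches a literal dot, the 'first match wins' loop is a simplification of how Google Analytics applies filters in sequence, and the international pattern is left out because it isn't reproduced here.

```python
import re

# (source name, pattern) pairs, checked in order; first match wins.
# The international pattern is not reproduced in the post, so it is left out.
RULES = [
    ("google images", re.compile(r".*google\.(.*)/url.*source=images.*")),
    ("google universal images", re.compile(r".*google\.com/imgres.*")),
    ("bing images", re.compile(r".*bing\.(.*)/images/search.*")),
]


def classify(referrer):
    """Return the image-search source for a referrer, or None if nothing matches."""
    for source, pattern in RULES:
        if pattern.search(referrer):
            return source
    return None


# Hypothetical referrers for illustration.
samples = [
    "http://www.google.de/url?sa=i&source=images&cd=",
    "http://www.google.com/imgres?imgurl=http://example.com/a.jpg",
    "http://www.bing.com/images/search?q=wifi+logo",
    "http://www.google.fr/imgres?imgurl=http://example.com/a.jpg",
]
for referrer in samples:
    print(classify(referrer), "<-", referrer)
```

The google.fr /imgres referrer comes back as None in this sketch. That's the traffic the international filter is meant to pick up.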

But we're not done! Close, but not quite.

Changing Google Analytics Medium Filters

So after having these filters in place for a while I noticed that some of the new sources I created were showing up as a medium of 'referring' instead of 'organic'. That means you're still short-changing your organic efforts because Google is passing the wrong medium in their cookie.

So you have to create two new filters that change the medium of Google universal images and Google international images.

Google Analytics Filter to Change Medium

This is another Advanced filter. It's much simpler than the others but it must be very precise. In Field A you're looking for the Campaign Source that exactly matches the source you created in the filter. For me, that means 'google international images' and 'google universal images'. For you, it's whatever you named the new sources.

Then you're simply outputting and overriding the Campaign Medium to organic. Remember, you'll create two of these. One for the 'international' images and one for 'universal images'. My guess is that you might only need the one but I want to cover my bases.

To simplify, all you're doing here is looking for the sources you created and then making sure that the medium associated with those sources is changed to organic.

Image Search Filter Order

The final step is to make sure that your filters are in the right order. The last two filters that change the medium based on a specific campaign source (that you created) must come at the end.

Google Analytics Google Image Search Filter Order

This makes sense right? You couldn't match a source that you hadn't already created, right? Stick to this order and you'll ensure image search traffic is tracked appropriately.
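If the ordering isn't intuitive, this toy model might help. It is not Google Analytics code, just a sketch that treats each filter as a function applied in sequence to a hit. The hit, the helper names and the 'referral' medium value are assumptions for illustration.

```python
import re


def set_source(pattern, new_source):
    """Filter that overrides the source when the referrer matches."""
    regex = re.compile(pattern)

    def apply(hit):
        if regex.search(hit["referrer"]):
            hit["source"] = new_source
        return hit
    return apply


def set_medium(exact_source, new_medium):
    """Filter that overrides the medium when the source matches exactly."""
    def apply(hit):
        if hit["source"] == exact_source:
            hit["medium"] = new_medium
        return hit
    return apply


def run(filters, hit):
    """Apply filters in order to a copy of the hit."""
    hit = dict(hit)
    for f in filters:
        hit = f(hit)
    return hit


source_filter = set_source(r".*google\.com/imgres.*", "google universal images")
medium_filter = set_medium("google universal images", "organic")

# Hypothetical hit that arrived tagged as a referral from google.com.
hit = {
    "referrer": "http://www.google.com/imgres?imgurl=http://example.com/a.jpg",
    "source": "google.com",
    "medium": "referral",
}

print(run([source_filter, medium_filter], hit)["medium"])  # organic
print(run([medium_filter, source_filter], hit)["medium"])  # referral
```

With the medium filter first there's no 'google universal images' source to match yet, so the medium never gets changed.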

Image Search Reports

So what do you get to see in the reports?

Image Filters Create Better Google Analytics Reports

This is data from a client site where I've had all the filters in place for a few days. The medium for all of these is still organic but I've now got new sources for google images, universal images and bing images. (Update on 4/5/14) I've been using these filters successfully for a year now.

What you should see right away is the very large difference in how this traffic performs. Image search traffic in this instance has 1.5 Pages/Visit and a 3:00 Avg. Visit Duration while the web based organic traffic has 6 Pages/Visit and a 6:00 Avg. Visit Duration.

Most importantly, the conversion rate on these two types of traffic is different as well. Segmenting your image search traffic can bring more clarity to your analysis and help you make the right decisions on what's working, how to allocate resources and what to optimize.

Image Search Filter Validation

So how do I know this is really working? I drill down into one of these new sources and then select keyword as the secondary dimension. Did I forget to mention that the keyword data remains intact?

Google Analytics Universal Images Keyword Report

Yup, sure does! So the next step here is to see if there really is a universal result for these keywords.

Google Search Result for Badass Over Here Real Pic

Sure enough, I'm the second result in this universal search result. Now let's see if the filter for normal image search is working.

Google Analytics Google Images Keyword Report

I'll use 'wifi logo' as my target term and first go to make sure that I'm not showing up in universal search results.

Google Search Result for Wifi Logo

Nope, not showing up there. But am I showing up in Google image search?

Google Images Search Results for Wifi Logo

Sure enough I'm there just inside the top 100 results from what I can tell. So I'm pretty confident that the filter is catching things and bucketing them appropriately. I've also validated this with very robust client data but can't share that level of detail publicly.

What Is images.google?

You might have noticed the images.google source above. What's that you ask? I don't know. But I don't think it's traditional image search traffic since the user behavior of that source doesn't conform to the other three image based sources. It's also a small source of traffic so while my OCD senses are tingling I'm currently ignoring the urge to figure out exactly what images.google represents.

Tell me if you figure it out.

Caveats

You Raise a Valid Point Ice Cream

The big question is why I wouldn't just use the Google Webmaster Tools queries report and filter by image, right? Well first off, the integration into Google Analytics still isn't where I'd like it to be, making any type of robust reporting near impossible.

In addition, I don't like mixing image search traffic with web search traffic in my normal reports because they're so different. It makes any analysis you do using that mixed data less precise and prone to unintentional error.

More problematic is the fact that the data between Google Webmaster Tools and Google Analytics doesn't match up.

I started looking at specific keywords via my filters versus what was reported in Google Webmaster Tools. There were just too many times when Google Webmaster Tools reported material amounts of traffic that wasn't showing up in my Google Analytics reports.

Google Webmaster Tools Clicks

Here you can see that the top term received 170 clicks in this time frame. Yet during the same time frame here's what the Google Analytics filter based method reports.

Google Analytics Image Based Clicks

170 versus 24! Even if I factor in the (not provided) percentage (which runs about 35% for this client) and add that back in I only get close to 40 visits.
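Here's the back-of-the-envelope math, written out as a quick sketch. The variable names are mine and it assumes one reading of 'add that back in': the 24 visible visits are the 65% of visits that still pass a keyword.

```python
gwt_clicks = 170           # clicks reported by Google Webmaster Tools
ga_visits = 24             # visits with a visible keyword in Google Analytics
not_provided_share = 0.35  # approximate share of (not provided) for this client

# If 35% of visits hide their keyword, the 24 visible visits are the other 65%.
estimated_visits = ga_visits / (1 - not_provided_share)
unexplained = gwt_clicks - estimated_visits

print(round(estimated_visits, 1))  # 36.9
print(round(unexplained, 1))       # 133.1
```

So even the generous estimate leaves well over 100 clicks unaccounted for.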

But that's when the lightbulb went off. Maybe Google Analytics is reporting Visits while Google Webmaster Tools is reporting Clicks?

While I can't confirm this I'm guessing that Google Webmaster Tools is counting all clicks on a result. Many of those clicks are going directly to the image and not the page the image resides on. That's important since direct clicks to the image (i.e. - .jpg files and the like) aren't going to be tracked in Google Analytics as a visit. There is no Google Analytics code on these files. The delta between the two could be the number of users who clicked directly to the image.

In addition, this method doesn't catch any of the mobile clicks and visits since no image search visits (and very few universal images) show up using this filter when looking at mobile traffic. I'm pretty sure that the referrers are just getting stripped and these wind up going into direct instead which is part of the iOS and Android 4+ search attribution issue. (If someone else has an explanation here or finds a different referrer for mobile image search please let me know.)

Finally, there's something funky with Chrome. When I look at the distribution of traffic to each bucket Chrome is an outlier for Google images.

Image Filters Browser Distribution

That 3.7% is just way out of proportion. And it's not related to the amount of (not provided) traffic since Firefox actually has a higher percentage of (not provided) traffic (72%) than Chrome (64%) in this instance. So I can only conclude that there's some amount of data loss going on with Chrome. Maybe that also contributes to the discrepancy I see between Google Analytics and Google Webmaster Tools.

This got even worse as of January when Chrome stopped passing rich referrer information.

Image Search by Browser

I can only guess that this is part of Google's security and privacy efforts. Sadly, it means you're capturing a lot less detail about image search and your data will be less accurate because of it.

Despite all of these caveats I love having the additional detail on image traffic which has wildly different intent and user behavior. Some insight is better than none.

TL;DR

Apply a few simple Google Analytics filters to gain insight into how much traffic you're getting through image search. This is increasingly important as the Internet becomes more visual and the user behavior of these visits differs in material ways from traditional search traffic.

Bing People Snippets

March 19 2013 // SEO // 10 Comments

This morning (thanks to a tip from Search Engine Roundtable) I began researching what looked like authorship snippets on Bing. While it's only been an hour or so here's what I've seen and what I think I've figured out.

People Snippets

The new faces (for the most part) showing up in Bing search results are not authorship snippets per se but are people snippets derived from entities. It's about who the content is about rather than who created the content.

If you haven't seen them already here's what one looks like when you search for Lauren Cohan.

Bing People Snippets for Lauren Cohan

They look remarkably like the authorship snippets that Google has implemented but they're most certainly different in their application.

Structured Data?

The first assumption here is that Bing might be using structured data to present these new snippets. Perhaps they're using the person attribute in schema.org markup?

Structured Data Results for Lauren Cohan page

Not so much. There's no structured data on this page and I've found plenty of others getting the people snippet that are devoid of mark-up. So if Bing isn't using structured data, what are they using to match and identify people?

People Pages

Clearly they rely heavily on sources such as Wikipedia, LinkedIn and Freebase. But they seem to be expanding their data sources on people to other sites and specific pages.

Simon Le Bon People Snippets on Bing

Searching for Simon Le Bon you'll find that a people snippet appears for Wikipedia, IMDb and Biography. Wikipedia is a no-brainer and IMDb makes a good deal of sense too. Biography was the surprising one.

I noted that both IMDb and Biography had namespaces or folders (underlined in red) that seemed to be easy identifiers for entities. So I decided to look for more sources and I found them. Lots of them.

CrunchBase

CrunchBase People Snippet for Jason Calacanis

MySpace

MySpace People Snippet for Pete Myers

NBA.com

NBA.com People Snippet for Pete Myers

Quora

Quora People Snippet for Jessica Guynn

TED

TED People Snippet for Seth Godin

ESPN

ESPN People Snippet for Claude Giroux

The Canadian Encyclopedia

Canadian Encyclopedia People Snippet for Douglas Coupland

Amazon

Amazon People Snippet for Toni Basil

MTV

MTV People Snippet for Paula Abdul

Last.fm

Last.fm People Snippet for Kim Carnes

Forbes

Forbes People Snippet for Mark Cuban

NNDB

NNDB People Snippet for Alan Greenspan

Facebook

Facebook People Snippet for Matthew Inman

Twitter

Twitter People Snippet for Neil deGrasse Tyson

Yahoo! Movies

Yahoo Movies People Snippet for Will Ferrell

Hollywood.com

Hollywood.com People Snippet for Will Ferrell

AskMen

AskMen People Snippet for Will Smith

FriendFeed

FriendFeed People Snippet for Louis Gray

TV Guide

TV Guide People Snippet for Andrew Lincoln

Comedy Central

Comedy Central People Snippet for Daniel Tosh

Most of these either have a namespace that makes it easy to identify as a person or are clear profiles in the case of MySpace and FriendFeed. Whether it's 'player', 'artist', 'celebrities', 'person', 'profiles' or 'speakers' it seems like Bing has determined pages that match these specific entities.
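To show the kind of pattern matching I mean, here's a minimal sketch that flags URLs with one of those namespaces in the path. This is my guess at a heuristic, not a description of how Bing actually works, and the URLs and function name are hypothetical.

```python
from urllib.parse import urlparse

# Path segments named in the post as apparent person identifiers.
PERSON_SEGMENTS = {"player", "artist", "celebrities", "person", "profiles", "speakers"}


def looks_like_person_page(url):
    """True if any path segment is one of the person-style namespaces."""
    segments = [s.lower() for s in urlparse(url).path.split("/") if s]
    return any(segment in PERSON_SEGMENTS for segment in segments)


# Hypothetical URLs for illustration; these are not taken from Bing's results.
urls = [
    "http://sports.example.com/nhl/player/_/id/1234/claude-giroux",
    "http://talks.example.com/speakers/seth_godin.html",
    "http://news.example.com/2013/03/19/some-article",
]
for url in urls:
    print(looks_like_person_page(url), url)
```

The first two URLs are flagged and the article URL isn't. A real system would obviously need far more than a folder name to reach high confidence.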

About Pages

People snippets show up far more often on about pages, which supports the idea that Bing is looking for high confidence entity pages and not assigning real authorship.

Blind Five Year Old People Snippet for AJ Kohn

As you can see I get a people snippet on my about page but not on my site as a whole. Nor do I get it returned on any of my content. Here's another example.

0at People Snippet for Matthew Inman

Again, the about page on Matthew Inman's now defunct site is given a people snippet while the site as a whole isn't. The people snippet is showing pages about that entity, not authored by that entity. It just so happens that there's some overlap in those areas.

Sorta Structured Data

Many of these pages have a rich amount of data on them. While they aren't marked-up with any structured data per se, search engines can clearly parse and use that information. Here's a people snippet via Green Day Authority.

Green Day Authority People Snippet for Billie Joe Armstrong

That page has no structured data mark-up but it has structure.

Green Day Authority Page

Characters?

Further pushing on the choice of pages to use to apply the people snippet I began to search for characters. First Harry Potter and then Derek Zoolander.

Derek Zoolander Bing Results

No people snippets are applied even though it's still pulling from IMDb. The difference here is that it's plucking out a title page and a character page instead. Maybe that's not how it works but that's how my pattern matching mind sees it right now.

[Update 3/21/13]

ChaosSEO noted that he could get character names to render people snippets. Sure enough, you can.

People Snippet for Olivia Dunham on Bing

And ...

People Snippets for Jean-Luc Picard on Bing

I tend to think that there's some special casing going on with IMDb so that it only applies the snippet to the name pages, but if you get a snippet to render for an IMDb character page please let me know.

Going through characters was actually really instructive. First I began to see that there were associations between the entities of person and character.

People Snippets for Hermione Granger on Bing

A search for Hermione Granger produces people snippets and a result for Emma Watson. Clearly there's some understanding that the two are related. You can get that same dynamic for a number of character searches such as Gandalf or Chewbacca.

Gandalf People Snippet on Bing

Chewbacca Result on Bing

Finally, I found a result that makes me very confident that this is not authorship at all but entity detection.

Han Solo People Snippets on Bing

Clearly Harrison Ford (or Han Solo) is not the author of these pieces but the subject of them.

Decentralized Images

There are a few instances where a site will get a people snippet. This seems to be rare and only occurs when Bing has high confidence that they have the right person.

Bing Seth Godin Results - Different Faces

Here we can see that a people snippet is applied to Seth's site but that the image is pulled from that site and not from some central database. This provides some variety in what is displayed but also leads to some errors from time to time.

Bing Result for Jason Calacanis

So Bing seems confident that they have the right person associated with that site but the image they pulled is not Jason. It's Wesley Chan.

Authorship?

Is this a form of Authorship? Sorta, kinda, not really. Sometimes you'll see what looks like a people snippet pop up on a content page.

Tim Gunn People Snippet on HuffPo Article

Tim is the author of that piece so there's a chance that they've identified and are trying to present Authorship based on that fact. But it's more likely they just identified him as an entity. Because the images are decentralized it pulls what it can from that article.

Mathew Ingram People Snippet

The same thing happens with this piece by Mathew Ingram. In both cases there is structured data on that page that would indicate that each is the author of that piece (though neither has Google Authorship working).

So it's not true authorship, and very few pieces of content have a people snippet right now. But if Bing decides to follow this path and makes the connections with all of their datasets (including their social sidebar results), then you could see Bing being a legitimate authorship platform.

Right now it seems like the snippets on content results are more like a side effect of people identification.

TL;DR

Bing has introduced people snippets that look like Google's Authorship snippets but are more focused on identifying people as entities through a variety of sources rather than assigning authorship of content. For now.

Build Your Authority Not Your Author Rank

March 18 2013 // SEO // 49 Comments

It's been a frustrating few weeks of discussion about Authorship and Author Rank.

Are We There Yet?

Here I will present a few things that may give you some more context overall and, in particular, my point of view on things.

Social Computing Research

Just the other day Google revealed that it gave $1.2 million in awards to those undertaking social computing research.

We know that interactions on the Web are diverse and people-centered. Google now enables social interactions to occur across many of our products, from Google+ to Search to YouTube. To understand the future of this socially connected web, we need to investigate fundamental patterns, design principles, and laws that shape and govern these social interactions.

We envision research at the intersection of disciplines including Computer Science, Human-Computer Interaction (HCI), Social Science, Social Psychology, Machine Learning, Big Data Analytics, Statistics and Economics. These fields are central to the study of how social interactions work, particularly driven by new sources of data, for example, open data sets from Web2.0 and social media sites, government databases, crowdsourcing, new survey techniques, and crisis management data collections. New techniques from network science and computational modeling, social network and sentiment analysis, application of statistical and machine learning, as well as theories from evolutionary theory, physics, and information theory, are actively being used in social interaction research.

We’re pleased to announce that Google has awarded over $1.2 million dollars to support the Social Interactions Research Awards, which are given to university research groups doing work in social computing and interactions. Research topics range from crowdsourcing, social annotations, a social media behavioral study, social learning, conversation curation, and scientific studies of how to start online communities.

What this says to me is that Google is intensely interested in understanding how to use social interaction data. But they're not there yet. And why should they be? They've been working on link based signals and refinement for over 10 years but haven't delved into social data until the last few.

This is a discipline that they are far from fully understanding. I can't help but pick out words like 'investigate', 'envision' and 'new'. This is a post about exploring the effects of social interaction on a host of fields. These are not papers presenting conclusions.

But we do have a few of those papers, areas where Google has begun to learn about how social interactions or signals might impact search. Let's take social annotations as an example.

Social Annotations in Web Search

Social Annotations and Snippet Length Chart

This research was presented at the 2012 Conference on Human Factors in Computing Systems.

Remember when our SERPs had a whole bunch of smaller faces in them and other various social gestures? Well, Google found that those didn't work. We hardly noticed them and when we did we didn't always believe they added value.

In fact, the only thing that really did work was the Authorship snippet. It's a very interesting read if you're interested in design and authority. The way we see results today is clearly influenced by this research and you can see Google learning more about how social connections and expertise work within search.

This study revealed a counter-intuitive result. Despite having the names and faces of familiar people, and despite being intended to be noticeable to searchers, subjects for the most part did not pay attention to the social annotations.

Our questions about contact closeness, expertise, and topic were answered by the reactions captured during the retrospective interviews. These interviews revealed the importance of contact expertise and closeness, and the importance of the search topics in determining whether social signals are useful, thus echoing past findings on the role of expertise in social search.

I walk away thinking that all of this is much tougher than we believe peering in from the outside. That and Google is at the start of this research, not the end.

Knowing that they aspire to understand these dynamics also makes the closure of Google Reader odd since there is a substantial amount of data that could be mined there, all tied back to identity and, by extension, topical expertise.

Whisper Down The Lane

I had a chance to speak on a panel at SMX West with Mike Arnesen and Lisa Weinberger about Authorship, Author Rank and Authority.

Overall, authorship and the potential for Author Rank was a hot topic that spilled out into multiple other sessions. Both Matt Cutts and Duane Forrester were asked about link based signals versus social signals. You could tell they are both tired of this question. Paraphrasing, they essentially said that while social signals are intriguing they're not nearly as far along as we in the industry might believe (or want).

When prodded about the collapse of the link graph they noted that the link graph was just fine thank you very much. Link manipulation, the intent behind linking that we feel is so perverted, is not nearly as rampant as we assume. The mainstream blogger or site owner is linking for the right reasons. In short, the link graph is still valuable and with lower friction to producing digital content it may actually improve as more laypeople become content producers.

That's not to say that social signals aren't important but it will be a complement to or a refinement of the link graph, not a replacement. This is something I discussed in my original Author Rank post.

If we believe that search engines still view the link graph as viable there may be ways to simply use Authorship to make the link graph more accurate. Think of Authorship as meta information passed on every link. When looking for information on cancer the link given to an article from an established oncologist at a world renowned hospital would likely confer more value than a link to an article from 'screwcancer888' at a Q&A site.
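One way to picture that idea is a link graph where each link's value is scaled by the authority of the person who authored the linking content. The sketch below is purely illustrative: the author names, the scores, the default value and the simple additive weighting are all assumptions of mine, not anything Google has described.

```python
from collections import defaultdict

# Hypothetical topical authority scores per author (0.0 to 1.0).
AUTHOR_AUTHORITY = {"established_oncologist": 0.9, "screwcancer888": 0.1}

# Assumed value for a link whose author is unknown or unverified.
DEFAULT_AUTHORITY = 0.5


def link_value(links):
    """Sum link value per target URL, weighting each link by its author.

    `links` is a list of (target_url, linking_author) pairs, where
    linking_author may be None when no Authorship is attached.
    """
    totals = defaultdict(float)
    for target, author in links:
        totals[target] += AUTHOR_AUTHORITY.get(author, DEFAULT_AUTHORITY)
    return dict(totals)


values = link_value([
    ("example.com/treatment", "established_oncologist"),
    ("example.com/miracle-cure", "screwcancer888"),
    ("example.com/treatment", None),
])
```

The point is only that Authorship refines the existing link signal rather than replacing it: every link still counts, just not equally.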

In some ways this reminds me of delegating authority which Bill Slawski (always insightful) wrote about back in late 2010. What we're really talking about is identifying expertise and allowing those experts to help curate our view of those topics where it matters - in search results.

Parsing Statements

So You're Telling Me There's A Chance?

It's enticing to pick apart responses and statements by Googlers when they are asked to comment on Author Rank. The fact is that they're not going to divulge much or commit one way or the other (at least publicly). They've been burned before by saying something that is true but interpreted in different ways.

So when asked, of course they're going to reply that it's something they're experimenting with (because they do aspire to use the data) but that it is currently not a direct ranking signal and nothing to worry about now.

Of course that leads everyone to look for the experiments, to look for indirect ranking signals and to take the 'now' as a declaration of sorts for future implementation.

Authorship could be an indirect signal if you believe (like I do) that the click through rate (CTR) on a result can provide a positive feedback signal. And we know the CTR on authored results disrupts the normal click distribution of a SERP. Of course Google could take into account the Authorship snippet and normalize the CTR impact. So perhaps it isn't having that indirect impact. See how confusing it can get?
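To make the normalization point concrete, here is a minimal sketch. The expected CTR by position and the snippet lift factor are numbers I invented for illustration, and nothing here reflects how Google actually measures or uses CTR.

```python
# Hypothetical baseline CTR by SERP position (illustrative numbers only).
EXPECTED_CTR = {1: 0.30, 2: 0.15, 3: 0.10}

# Hypothetical multiplier for the extra clicks an Authorship snippet attracts.
AUTHORSHIP_LIFT = 1.5


def ctr_signal(position, clicks, impressions, has_authorship):
    """Return observed CTR relative to what the position would predict.

    A value above 1.0 means the result outperforms expectations.
    """
    observed = clicks / impressions
    expected = EXPECTED_CTR[position]
    if has_authorship:
        # Normalize: expect more clicks when the snippet is shown.
        expected *= AUTHORSHIP_LIFT
    return observed / expected


# The same clicks at position 3, judged without and with normalization.
raw = ctr_signal(3, 150, 1000, has_authorship=False)
normalized = ctr_signal(3, 150, 1000, has_authorship=True)
```

Without normalization the result looks like it is beating its position. With the assumed lift factored in, it looks merely average, and the indirect boost disappears.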

Just for fun, let us think what would transpire if a Googler simply said there is no such thing as Author Rank without any hedging or caveats. People would start to conflate that with Authorship, potentially reducing the adoption rate. Many would interpret it to mean that Google had abandoned author based weighting completely. Thus, when Google did figure it out and apply it the industry would point to the statement and shout 'liar' at the top of their lungs.

We've trained Google to provide us with these elliptical statements. I choose to view them through this lens.

What To Look For?

That's not to say that we shouldn't be interested in the topic. I like the testing Terry Simmonds is doing on the mechanics of Authorship because it documents how Google is trying to extend the mark-up to more of the content on the web. And that's a constraint as far as I can tell right now. Conversations about the inability to roll out updates because of low adoption are not uncommon.

You can't begin to rank results based on topical expertise if many of the experts aren't included in the selection criteria. The participation rate in Authorship has to be such that using it would provide a materially better ranking of content. Reports have Authorship coverage as low as 9% and as high as 17%. That's not a lot really and both studies are limited by the relatively small data sets analyzed.

The problem? If you were to want information on astrophysics you'd probably want to include Neil deGrasse Tyson in those results. Yet, he's not on Google+ (as far as I can tell) and isn't part of the Authorship program.

Looking at how Google is trying to assign Authorship is important.

The mechanics and the indirect Authorship Google often grants are particularly intriguing. I noted that Jonathon Colman was receiving a bounce back Authorship link on a SlideShare URL for which no direct Authorship mark-up was present.

Indirect Authorship

I recall seeing this in the past on URLs from Quora, FriendFeed and Flickr. I swear some of these used to show up in Author Stats but I haven't seen them lately (except for FriendFeed which I see at the tail end of my list.)

In fact, the bug that took Author Stats down might have been the exposure of indirect Authorship based on high confidence in matching public social graph data to Google+ profiles. Rapleaf got the brunt of the ire for crawling the public social graph but Google clearly has and continues to use this information even though the social circles feature has been retired.

Looking today I see another interesting URL showing up in Author Stats - Twitter.

Twitter Discussion Gets Authorship

There's quite a lot of evidence that Twitter is a fairly well trusted source of indirect Authorship, but that's a post for another day. However, we can also look at the verbiage in the Structured Data Testing Tool, which has changed within the last few weeks.

Authorship rel=author Structured Data Testing Tool Results

The points of interest here are the '(direct or indirect)' verbiage as well as the fact that the tool only checks the first rel=author link listed on a webpage.  The former certainly makes me believe that assigning Authorship based on indirect links is important to Google.

The latter tells me two things. First that the tool should not be trusted as the final arbiter of whether the correct Authorship is or will be applied. Second that Google obviously sees multiple authors or entities (or agents) on the page.

Let's go a step further. Google's new Social Sign-In can be construed as a portable digital signature which might allow Google to rely on comments and other content produced outside of Google+. So how this is rolled out, and whether the reviews that now flow under your profile are also granted Authorship, are developments worth tracking.

I've been eager to see Author Rank implemented since I first saw Matt Cutts interview Steven Levy.

This actually predates Authorship and the follow-up question by Matt (along with a bit of body language) makes it clear that Google was thinking about this seriously. While I absolutely do look for connections and patterns that might paint a picture of the future I'm not looking for it behind every corner and trying to fit Author Rank into each and every odd result or anecdote.

Authority

You Will Respect My Authority!

I prefer to talk about how people might build authority rather than how they would build Author Rank. Just as links are the result and not the goal, Author Rank will be the result and not the goal of your efforts.

Discussions about what makes someone an authority and how Google might want to translate that into math are fascinating. What makes someone authoritative versus popular? Is there a difference? If so, how would you go about separating the two?

How do you map the decline of authority? Of someone who is no longer really an expert and just mailing it in? Can you identify this even if they remain popular? How can you tell if someone is endorsing content based on merit or friendship? Is it what you know or who you know?

Furthermore, you could find that one was popular for the wrong reasons. Would you want to rank someone highly who simply fanned the flames of dissent and created controversy? The tone and type of interaction will be important so sentiment analysis and other processes will need to determine how to use social interaction as a reliable signal.

Influence

We're Dealing With A Badass Over Here

And how does influence fit into this equation? One can be influential without being popular, but clearly being popular gives you a better chance of being influential just by sheer reach. Can you be influential without being an authority? I think so. Just look at Jenny McCarthy and her influence within the anti-vaccine movement.

The latter clearly strays into the subjective nature of quality, relevance and authority that I touched on after the Panda update. Personalization helps to ensure that your subjective view of authority is reflected back to you. That's why search results are changed based on who you follow on Google+. And personalization of search results is the most important thing about Google+ in my view.

But in discussing how Google might identify authority and expertise, we're dealing with the aggregate. So the question isn't really about your personal view (which is reflected back in Search+ results) but how the aggregate views different figures and authorities.

Of course, being likable is part of the way you can obtain authority. And it is often not what you say, but how you say it (or present it) that gets you noticed. So part of building authority is in ensuring that you can communicate in a way that conveys that expertise but also makes it accessible and ... memorable.

Yes, I see all of this as being related because the same content presented in comic sans without any images or paragraph breaks wouldn't have nearly the same impact and would not, ultimately, convey authority. Even though the actual words are the same!

I had a similar conversation with Dan Shure where he wondered about the impact of publishing content from Rand Fishkin under somebody else's name. Would it get as much 'play' and be received as well? I doubt it. So what does that say about the connection of authority, popularity and quality assessment?

These are just a few of the things that make this topic so incredible.

TL;DR

I believe Google wants to use Author Rank but I also believe that it's far more difficult than we think. Focusing solely on Author Rank may blind us to tracking Google's progress and building what is truly important. Authority.

Closing Google Reader Is Dangerous

March 14 2013 // Social Media + Technology // 39 Comments

I'm a dedicated Google Reader user, spending hours each day using it to keep up on any number of topics. So my knee-jerk reaction to the news that Google will close the service as of July 1, 2013 was one of shock and anger.

I immediately Tweeted #savegooglereader and posted on Google+ in hopes of getting it to trend or go hot. These things are silly in the scheme of things. But what else is there to do?

I've written previously that the problem with RSS readers is marketing. I still believe that (it's TiVo for web content people!) but in the end that's not why closing Google Reader is so dangerous. And it is dangerous.

Google Reader Fuels Social

Google Reader Is The Snowpack of Social

Photo via double-h

The announcement indicates that, while having a loyal following, usage has declined. That's a rather nebulous statement, though I don't truly expect Google to provide the exact statistics. But it's who is still using Google Reader that is important, is it not?

Participation inequality, often called the 90-9-1 principle, should be an important factor in analyzing Google Reader usage. Even if you believe that the inequality isn't as pronounced today, those that are contributing are still a small bunch.

Studies on participation on Twitter have shown this to be true, both from what content is shared and who is sharing it. That means that the majority of the content shared is still from major publications and that we get that information through influencers. But where do they get it?

Google Reader.

RSS readers are the snowpack of social networks.

Organizing Information

Jigsaw Puzzle Pieces

Google's mission is to organize the world's information and make it universally accessible and useful. By extension that is what Google Reader lets power-users do. Make no mistake, Google Reader is not a mainstream product. Google (and many others) have screwed up how to market time-shifted online reading.

The result is that those using Google Reader are different. They're the information consumers. They're the ones sifting through the content (organizing) and sharing it with their community (accessible) on platforms like Twitter, Facebook and Google+ (useful).

Google Reader allows a specific set of people to help Google fulfill their mission.

Losing Identity

AJ Kohn Cheltenham High School ID

There are replacements for Google Reader such as Feedly. So you can expect that the people who fuel social networks will find other ways to obtain and digest information so they can filter it for their followers. Problem solved, right? Wrong.

Why exactly does Google want to hand over this important part of the ecosystem to someone else? With Google Reader they know who I am, what feeds I subscribe to, which ones I read and then which ones I wind up sharing on Google+.

Wouldn't knowing that dynamic, of understanding how people evaluate content and determine what is worthy of sharing, be of interest to Google? It should be. It's sort of what they want to excel at.

Not only that but because Google Reader has product market fit (see how I got that buzzword in there) with influencers or experts, you're losing an important piece of the puzzle if you're thinking about using social sharing and Authorship as search signals.

Data Blind

Data Blind

In the end, I'm surprised because it makes Google data blind. As I look at Unicorn, Facebook's new inverted-index system, I can't help but think that Facebook would love to have this information. Mining the connections and activity between these nodes seems messy but important.

What feeds do I subscribe to? That social gesture could be called a Like in some ways. What feeds do I read? That's a different level of engagement and could even be measured by dwell time. What feeds and specific content do I share? These are the things that I am endorsing and promoting.

By having Google Reader integrated into the Google+ ecosystem, they can tell when I consumed that information and when I then shared it, not just on Google+ but on other platforms if Google is following the public social graph (which we all know they are.)

Without Google Reader, Google loses all of that data and only sees what is ultimately shared publicly. Never mind the idea that Google Reader might be powering dark social which could connect and inform influencers. Gone is that bit of insight too.

Multi-Channel Social

Daft Punk Discovery

As a marketer I'm consumed with attribution and Google Analytics clearly understands the importance of multi-channel modeling. We even see the view-through metric in Google AdWords display campaigns.

The original source and exposure of content is of huge importance. Google might have Ripples but that only tells them how the content finally entered Google+, not how that content was discovered.

I'm certain that users will find alternatives because there is a need for this service. Google just won't know what new sites influencers might be reading more of or which sites might be waning with subject matter experts. Google will only see the trailing indicators, not the leading ones.

TL;DR

Google Reader allows information consumers - influencers and subject matter experts - to fuel social networks and help fulfill Google's core mission. Closing Google Reader will put that assistance in the hands of another company or companies and blind Google to human evaluation data for an important set of users.

What I Learned In 2012

February 14 2013 // Career + Life // 53 Comments

2012 was a fantastic year for Blind Five Year Old. I met most of my goals, came to a few epiphanies but often found it difficult to juggle everything at once. In all, this is what I learned.

Stop Comparing

Comparison Is The Thief Of Joy

There are a number of 'names' in the SEO community and there's a growing trend to share your journey - to open the kimono so to speak. (Sort of like what I'm doing here which is going to be strange given my next statement.) The odd thing about this transparency is that it puts a bit of pressure on others. Or maybe that's just me.

I had a chance to sit and chat with Wil Reynolds. I talked with Rhea Drysdale. They were generous with their time and gave a lot of excellent advice. Yet for a brief while those conversations also made me feel pretty lousy.

I started wondering. Was I doing enough to build my company? Was I falling behind? After establishing myself and building my brand was I frittering it away? Would I just be a 'lifestyle business'? Shouldn't I get bigger and build an agency? Are they so much better at this stuff than me?

I came to realize that I wasn't enjoying my success. And that sucked, particularly because I was doing really well. So I decided to stop comparing my journey to those of others.

I am not Wil or Rhea or Rand. They all provide amazing advice based on their journey and personal situations. Mine will be different because I'm different. Hopefully I'll learn from their insight and experiences but I'm sure I'll make some of the same mistakes they made as well as others as I find my way. And that's okay.

Take Risks

Take Risks

One of my goals was to speak at two conferences in 2012. Mind you, I'd never spoken at a conference before and while I've done plenty of trainings in front of large groups this would definitely be out of my comfort zone. I'm still an introvert at heart.

I snagged a spot at SMX West 2012 to speak about Authorship. I worked on that deck for ages. I obsessed over it. Then I found out that the presenter notes wouldn't be available. Yikes! I was incredibly nervous but there were people like Aaron Friedman and Nick Roshon who were eager to see me present and gave me encouragement.

I was up there on stage with Dennis Goedegebuure. He's done a lot of speaking and seemed ... unflappable. "Do you still get nervous?" I asked him. "Oh yeah, every time" he replied. That made me feel better and helped me more than he probably realized. The presentation was ... okay. I think I read a bit too much, had slide problems and went long, which meant Vanessa was about ready to shove me off the stage.

It was done. It had gone well enough. People didn't heckle me and there weren't any Tweets about how much I sucked. The world kept spinning. I needed that experience because the next time I presented was at MozCon 2012 in front of about 800 or so people. Crazy! I'm not sure whether Rand knew this would be only my second presentation but I'm very thankful for the opportunity he provided.

With the help of some amazing advice I was able to build a much better deck this time. I was a total and complete wreck before I presented. So if you met me there before my slot I might have seemed a bit preoccupied. (A thank you to Mackenzie Fogelson, Pete Meyers and Cyrus Shepard for distracting me with interesting conversations.)

I think I did well. It felt ... good, which was an odd sensation for me. And the feedback and score I received validated my effort.

I've always taken risks throughout my career and that has to continue if I'm going to grow.

Retain Confidence

Have Confidence

I've had a crisis of confidence a few times in the past, mostly brought on by my own harsh criticism. That didn't happen this year but between comparing myself to others and working myself up into a lather about presenting, I may have had a few doubts here and there.

But you have to kick those gremlins out of your head. Confidence is so important. Don't confuse that with being a cocky douchebag. Confidence simply means that you know you've done everything you can do and that you're comfortable with what you're putting out there. It's also acknowledging that you're not always going to be right. That's life so get used to it and move on.

This piece from Todd Mintz was brave and worth reading. Todd's a smart and talented guy but he gets smarter and more skilled as time goes by. The post shows that we can only be confident about where we are at any given point in time. Will we make errors? Sure. But we learn from them and get better. Don't look back and let mistakes sap your confidence, let them fuel it instead.

Keep Learning

Keep Going

In this industry you simply must keep learning. My definition of SEO is quite broad, which means that I need to know a little bit about everything.

Everything is a lot! Some of it you're not going to understand at first but you have to keep pushing. Ask questions, even dumb ones. Just keep picking up new skills and experimenting. I cannot stress enough how beneficial experiential learning is in this business. Don't just take my word or some expert's word on how something works, try it yourself.

Because we're in a postmodern SEO era.

Postmodern SEO develop strategies and tactics based on individual context, not on preconceived “Best Practices,” or some blogger’s interpretation of “standards.” Instead we consider things like business goals, risk, longevity, audience and others to build individual strategies.

Do. The. Work.

Watch The Clock

Time Slipping Away

There are simply not enough hours in the day. Success has been great but it also means I'm juggling a lot more. I've got more clients. I've got a part-time writing gig at Marketing Land. I'm speaking at conferences. I'm keeping up on industry news. And the email just never stops.

I don't expect anyone to feel sorry for me. That would be ludicrous! These are good problems. But I haven't quite mastered how to balance everything yet. I've contemplated stopping my #ididnotwakeupin series. I've missed out on requests to contribute to articles. Sometimes things just fall through the cracks. And I hate that.

Through it all I have guarded my personal time. I'm still working more than I ever have, but I don't pull that many crazy hours. I take the time to build Legos with my daughter, play family games of Ticket To Ride, watch episodes of Nikita or just have an afternoon off with my wife.

Love Your Calendar

Mayan Calendar

The primary way I began to take back control of my time was to rely on my calendar. I started to put everything in my Google Calendar, including all those 'tentative' meetings. Because the worst thing that can happen is you tell three people you're available on a certain day and within the span of a few hours they all try to book the same time.

Not only were there fewer missed connections but I was able to see the time I had available for other work. It became more and more clear that I had to book hours to do the work too.

Keep Fit

Let's Get Physical

I also made time to work out. I lost 30 pounds and kept it off by counting calories and working out regularly. I admit, part of this was driven by pure vanity. I didn't want to stand up in front of a lot of people and look bad.

Besides the obvious health benefits, the other reason was also selfish. Staying fit made me a better thinker. Working out let me clear my head and afterwards I was definitely sharper. I think of working out a little like being organized. It takes a bit of time each week but it makes me a lot more productive.

Ditch False Modesty

Grumpy Cat

I ran into Marty Weintraub at both SMX West and MozCon. It was at the latter that he basically called me out. He complimented me on my presentation and I did the 'aw shucks, thank you, just trying my best' routine and he told me to stop with the false modesty and instead simply say thank you and accept the praise.

That doesn't come naturally to me but it was a turning point. I needed to embrace those who appreciated me. I mean, there are going to be plenty of folks who try to tear you down in life so when you're recognized as being good at something just run with it.

Overcome Guilt

The More You Care

That image will give you a headache, right? And that's the point I'm making here. Guilt is awful but I've got a lot of it.

I don't have much guilt about 'making it'. I worked hard and put in a lot of time and effort. But I recognize that I didn't do it alone. I was helped by many many people along the way. So I try to do the same. But that's not always easy. I despair when I don't get back to someone's email or Google+ post.

I even have some guilt about mentioning some people in this post but not others. How can I leave out people like Matt McGee, Anthony Pensabene, Jon Henshaw, Bill Sebald, Zeph Snapp, Max Minzer and Tadeusz Szewczyk?

And I'm leaving a ton of other people out here! I don't want to slight anyone. I want to acknowledge their contribution and worth. I value my Google+ community. I care. A lot. Yet it's nearly impossible for me to communicate that. So I'm letting go of that guilt little by little.

Yet, I doubt I'll get rid of all my guilt because I think it makes me a better person.

Battle Perfectionism

Done Is Better Than Perfect

Am I a perfectionist? If you have to ask yourself that question I think you're likely closer to one than you might think. I have very high standards and I like to present things when I have pulled on every little thread and packaged it up into something that is appealing as well as informative.

This wreaks havoc with my time management and I try to live by the 'done is better than perfect' mantra. I nod my head when Jonathon Colman talks about it and often give this exact advice to others. Yet, I find it tough to follow in practice.

The reason why is that my quest for superior quality at all costs has netted me a really nice referral business. I know I should give myself a break but I fear the slippery slope of sloppy work.

Yet I'm beginning to see a light at the end of the tunnel as I work on some other projects and collaborate in different ways. That said, don't expect this to become a high volume blog ... ever.

Embrace The Unknown

Embrace The Unknown

I remember when I would interview for a job and I'd get that 'where do you see yourself in 5 years' question. Based on my life experience I was usually honest in telling people I had no idea. Shit happened and you just never could know how things would turn out. You can only open the door right in front of you and see where it goes.

So I don't know how Blind Five Year Old will grow, though I think it will. I don't know what new things I'll be doing this year. Maybe I'll build a product. Maybe I'll do more writing. Maybe I'll write a book. I just don't know yet and I'm okay with that.

It's not that I'm not ambitious or that I don't have goals. I am and I do. It's just a matter of figuring out which direction to go and opening that door.