Do 404 Errors Hurt SEO?

February 01 2016 // SEO // 25 Comments

Do 404 errors hurt SEO? It’s a simple question. However, the answer is far from simple. Most 404 errors don’t have a direct impact on SEO, but they can eat away at your link equity and user experience over time.

There’s one variety of 404 that might be quietly killing your search rankings and traffic.

404 Response Code

Abandoned Building

What is a 404 exactly? A 404 response code is returned by the server when there is no matching URI. In other words, the server is telling the browser that the content is not found.

404s are a natural part of the web. In fact, link rot studies show that links regularly break. So what’s the big deal? It’s … complicated.

404s and Authority

Evaporation Example

One of the major issues with 404s is that they stop the flow of authority. It just … evaporates. At first, this sort of bothered me. If someone linked to your site but that page or content is no longer there, the citation is still valid. At that point in time the site earned that link.

But when you start to think it through, the dangers begin to present themselves. If authority passed through a 404 page I could redirect that authority to pages not expressly ‘endorsed’ by that link. Even worse, I could purchase a domain and simply use those 404 pages to redirect authority elsewhere.

And if you’re a fan of conspiracies then sites could be open to negative SEO, where someone could link toxic domains to malformed URLs on your site.

404s don’t pass authority and that’s probably a good thing. It still makes sense to optimize your 404 page so users can easily search and find content on your site.

Types of 404s

Google is quick to say that 404s are natural and not to obsess about them. On the other hand, they’ve never quite said that 404s don’t matter. The 2011 Google post on 404s is strangely convoluted on the subject.

The last line of the first answer seems to be definitive but why not answer the question simply? I believe it’s because there’s a bit of nuance involved. And most people suck at nuance.

While the status code remains the same, there are different varieties of 404s: external, outgoing and internal. These are my own naming conventions so I’ll make it clear in this post what I mean by each.

Because some 404s are harmless and others are downright dangerous.

External 404s

External 404s occur when someone else links to a broken page on your site. Even here there’s a distinction: sometimes the content has legitimately been removed, and other times someone is simply linking improperly.

External 404 Diagram

Back in the day many SEOs recommended that you 301 all of your 404s so you could reclaim all the link authority. This is a terrible idea. I have to think Google looks for sites that employ 301s but have no 404s. In short, a site with no 404s is a red flag.

A request for domain.com/foobar should return a 404. Of course, if you know someone is linking to a page incorrectly, you can apply a 301 redirect to get them to the right page, which benefits both the user and the site’s authority.
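
To make that policy concrete, here’s a rough sketch in Python: a small map of URLs you know are mislinked gets a 301 to the right page, and everything else falls through to an honest 404. The example paths are hypothetical.

```python
# A sketch of the targeted approach described above: 301 only the URLs you
# know are mislinked, and let everything else return an honest 404.
# The example paths are hypothetical.

KNOWN_REDIRECTS = {
    "/old-seo-guide": "/seo-guide",   # page that moved, with links in the wild
    "/seo-guide/": "/seo-guide",      # common trailing-slash mistake
}

def resolve(path):
    """Return (status_code, location) for an incoming request path."""
    if path in KNOWN_REDIRECTS:
        return 301, KNOWN_REDIRECTS[path]
    return 404, None

print(resolve("/old-seo-guide"))  # (301, '/seo-guide')
print(resolve("/foobar"))         # (404, None) -- and that's fine
```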

External 404s don’t bother me a great deal. But it’s smart to periodically look to ensure that you’re capturing link equity by turning the appropriate 404s into 301s.

Outgoing 404s

Outgoing 404s occur when a link from your site to another site breaks and returns a 404. Because we know how often links evaporate this isn’t uncommon.

Outgoing 404 Diagram

Google would be crazy to penalize sites that link to 404 pages. Mind you, it’s about scale to a certain degree. If 100% of the external links on a site were going to 404 pages then perhaps Google (and users) would think differently about that site.

They could also be looking at the age of the link and making a determination based on that. Or perhaps it’s fine as long as Google saw that the link was at one time a 200 and is now a 404.

Overall these are the least concerning of 404 errors. It’s still a good idea, from a user experience perspective, to find those outgoing 404s in your content and remove or fix the link.
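
If you want to spot-check a page yourself, something like this rough Python sketch (using the requests library, with deliberately simplistic regex link extraction) will flag outgoing links that now return a 404:

```python
# Rough sketch: flag outgoing (external) links on a page that 404 or fail.
# Uses the requests library and deliberately simplistic regex link extraction.
import re
from urllib.parse import urlparse

import requests

def find_broken_outgoing_links(page_url, timeout=10):
    own_host = urlparse(page_url).netloc
    html = requests.get(page_url, timeout=timeout).text
    links = set(re.findall(r'href="(https?://[^"#]+)"', html))
    broken = []
    for link in links:
        if urlparse(link).netloc == own_host:
            continue  # internal link, not an outgoing one
        try:
            resp = requests.head(link, allow_redirects=True, timeout=timeout)
            if resp.status_code == 404:
                broken.append(link)
        except requests.RequestException:
            broken.append(link)  # unreachable counts as broken for our purposes
    return broken

# Hypothetical example:
# print(find_broken_outgoing_links("https://example.com/some-post/"))
```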

Internal 404s

The last type of 404 is an internal 404. This occurs when a site links to a ‘not found’ page on its own site. In my experience, internal 404s are very bad news.

Internal 404 Diagram

Over the past two years I’ve worked on squashing internal 404s for a number of large clients. In each instance I believe that removing these internal 404s had a positive impact on rankings.

Of course, that’s hard to prove given all the other things going on with the site, with competitors and with Google’s algorithm. But all things being equal eliminating internal 404s seems to be a powerful piece of the puzzle.

Why Internal 404s Matter

If I’m Google I might look at the number of internal 404s as a way to determine whether the site is well cared for and shows attention to detail.

Does a high-quality site have a lot of internal 404s? Unlikely.

Taken a step further, could Google determine the odds of a user encountering a 404 on a given site and then use that to demote sites in search? I think it’s plausible. Google doesn’t want their users having a poor experience, so they might steer folks away from a site they know has a high probability of ending in a dead end.

That leads me to think about the user experience when encountering one of these internal 404s. When a user hits one of these they blame the site and are far more likely to leave the site and return to the search results to find a better result for their query. This type of pogosticking is clearly a negative signal.

Internal 404s piss off users.

The psychology is different with an outgoing 404. I believe most users don’t blame the site for these but the target of the link instead. There’s likely some shared blame, but the rate of pogosticking shouldn’t be as high.

In my experience internal 404s are generally caused by bugs and absolutely degrade the user experience.

Finding Internal 404s

You can find 404s using Screaming Frog or Google Search Console. I’ll focus on Google Search Console here because I often wind up finding patterns of internal 404s this way.

In Search Console you’ll navigate to Crawl and select Crawl Errors.

404s in Google Search Console

At that point you’ll select the ‘Not found’ tab to find the list of 404s Google has identified. Click on one of these URLs and you get a pop-up where you can select the ‘Linked from’ tab.

Linked from Details on 404 Error

I was actually trying to get Google to recognize another internal 404 but they haven’t found it yet. Thankfully I muffed a link in one of my posts and the result looks like an internal 404.

Malformed Link Causes Internal 404

What you’re looking for are instances where your own site appears in the ‘Linked from’ section. On larger sites it can be easy to spot a bug that produces these types of errors by just checking a handful of these URLs.

In this case I’ll just edit the malformed link and everything will work again. It’s usually not that easy. Most often I’m filing tickets in a client’s project tracking system and making engineers groan.
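
Screaming Frog or a deep crawl will do this at scale, but a bare-bones crawler along these lines (a sketch, not production code) surfaces the same source page and broken URL pairs that the ‘Linked from’ tab shows:

```python
# Bare-bones sketch of an internal 404 crawl: walk internal links and record
# which page linked to each URL that returns a 404.
import re
from collections import deque
from urllib.parse import urljoin, urlparse

import requests

def crawl_internal_404s(start_url, max_pages=200):
    """Crawl internal links from start_url and return (source_page, broken_url) pairs."""
    host = urlparse(start_url).netloc
    status_cache = {}                       # url -> status code (or None on failure)
    seen, queue = {start_url}, deque([start_url])
    broken = []

    def status_of(url):
        if url not in status_cache:
            try:
                resp = requests.head(url, allow_redirects=True, timeout=10)
                status_cache[url] = resp.status_code
            except requests.RequestException:
                status_cache[url] = None
        return status_cache[url]

    while queue and len(status_cache) < max_pages:
        page = queue.popleft()
        try:
            html = requests.get(page, timeout=10).text
        except requests.RequestException:
            continue
        for href in re.findall(r'href="([^"#]+)"', html):
            url = urljoin(page, href)
            if urlparse(url).netloc != host:
                continue                    # external link, not an internal one
            if status_of(url) == 404:
                broken.append((page, url))  # your own page links to a missing page
            elif url not in seen:
                seen.add(url)
                queue.append(url)
    return broken

# Hypothetical example:
# for source, url in crawl_internal_404s("https://example.com/"):
#     print(f"{source} links to missing page {url}")
```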

Correlation vs Causation

Not This Again

Some of you are probably shrieking that internal 404s aren’t the problem and that Google has been clear on this issue and that it’s something else that’s making the difference. #somebodyiswrongontheinternet

You’re right and … I don’t care.

You know why I don’t care? Every time I clean up internal 404s, it produces results. I’m not particularly concerned about exactly why it works. Mind you, from an academic perspective I’m intrigued but from a consulting perspective I’m not.

In addition, if you’re in the new ‘user experience optimization’ camp, then eliminating internal 404s fits very nicely, doesn’t it? So is it the actual internal 404s that matter or the behavior of users once they are eliminated that matters or something else entirely? I don’t know.

Not knowing why eliminating internal 404s works isn’t going to stop me from doing it.

This is particularly true since 404 maintenance is entirely in our control. That doesn’t happen much in this industry. It’s shocking how many people ignore 404s that are staring them right in the face, whether it’s not looking at Google Search Console or not tracking down the 404s that crop up in weblog reports or deep crawls.

Make it a habit to check and resolve your Not found errors via Search Console or Screaming Frog.

TL;DR

404 errors themselves may not directly hurt SEO, but they can indirectly. In particular, internal 404s can quietly tank your efforts, creating a poor user experience that leads to a low-quality perception and pogosticking behavior.

Acquisition SEO and Business Crowding

January 20 2016 // SEO // 25 Comments

There’s an old saying that if you can’t beat ’em, join ’em. But in search, that saying is often turning into something different.

If you can’t beat ’em, buy ’em.

Acquisition search engine optimization is happening more often as companies acquire or merge, effectively taking over shelf space on search results. Why settle for having the top result on an important term when I can have the first and second result?

They Live Movie Scene

Should this trend continue you could find search results where only a handful of companies are represented on the first page. That undermines search diversity, one of the fundamentals of Google’s algorithm.

This type of ‘business crowding’ creates false choice and is vastly more dangerous than the purported dread brought on by a filter bubble.

Acquisition SEO

SEO stands for search engine optimization. Generally, that’s meant to convey the idea that you’re working on getting a site to be visible and rank well in search engines.

However, you might see diminishing returns when you’re near the top of the results in important query classes. Maybe the battle with a competitor for those top slots is so close that the effort to move the needle is essentially ROI negative.

In these instances, more and more often, the way to increase search engine traffic, and continue on a growth trajectory, is through an acquisition.

Example of Acquisition SEO

That’s not to say that Zillow or Trulia is doing anything wrong. But it brings up a lot of thorny questions.

Search Shelf Space

Gary Larson False Choice Cartoon

About seven years ago I had an opportunity to see acquisition SEO up close and personal. Caring.com acquired Gilbert Guide and suddenly we had two results on the first page for an important query class in the senior housing space.

It’s hard not to get Montgomery Burns at that point and look at how you can dominate search results by having two sites. All roads lead to Rome as they say.

I could even rationalize that the inventory provided on each platform was different. A Venn diagram would show a substantial overlap but there were plenty of non-shaded areas.

But who wants to maintain two sets of inventory? That’s a lot of operational and technical overhead. Soon you figure out that it’s probably better to have one set of inventory and syndicate it across both sites. Cost reduction and efficiency are powerful business tenets.

At that point the sites are, essentially, the same. They offer the same content (the inventory of senior housing options) but with different wrappers. The idea was awesome but it also made my stomach hurt.

(Please note that this is not how these two sites are configured today.)

Host Crowding

Manspreading Example

The funny thing is that if I’d tried to do this with a subdomain on Caring.com I’d have run afoul of something Google calls host crowding.

Matt Cutts wrote about this back in 2007 in a post about subdomains and subdirectories.

For several years Google has used something called “host crowding,” which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen.

In essence, you shouldn’t be able to crowd out competitors on a search result through the use of multiple subdomains. Now, host crowding or clustering as it’s sometimes called has seen an ebb and flow over time.

In 2010 Google loosened host crowding constraints when a domain was included in the query.

For queries that indicate a strong user interest in a particular domain, like [exhibitions at amnh], we’ll now show more results from the relevant site.

At the time Google showed 7 from amnh.org. Today they show 9.

In 2012 they tweaked things again to improve diversity but that didn’t make much of a dent and Matt was again talking about changes to host clustering in 2013. I think a good deal of the feedback was around the domination of Yelp.

I know I was complaining. My test Yelp query is [haircut concord ca], which currently returns 6 results from Yelp. (It’s 8 if you add the ‘&filter=0’ parameter on the end of the URL.)

I still maintain that this is not useful and that it would be far better to show fewer results from Yelp and/or place many of those Yelp results as sitelinks under one canonical Yelp result.

But I digress.

Business Crowding

Freedom of Choice by Devo

The problem here is that acquisition SEO doesn’t violate host crowding in the strict sense. The sites are on completely different domains. So a traditional host crowding algorithm wouldn’t group or cluster those sites together.

But make no mistake, the result is essentially the same. Except this time it’s not the same site. It’s the same business.

Business crowding is the advanced form of host crowding.

It can actually be worse since you could be getting the same content delivered from the same company under different domains.

The diversity of that result goes down and users probably don’t realize it.

Doorway Pages

When you think about it, business crowding essentially meets the definition of a doorway page.

Doorways are sites or pages created to rank highly for specific search queries. They are bad for users because they can lead to multiple similar pages in user search results, where each result ends up taking the user to essentially the same destination.

When participating in business crowding you do have similar pages in search results where the user is taken to the same content. It’s not the same destination but the net result is essentially the same. One of the examples cited lends more credence to this idea.

Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page

In business crowding you certainly have multiple domain names but there’s no funnel necessary. The content is effectively the same on those multiple domains.

Business crowding doesn’t meet the letter of the doorway page guidelines but it seems to meet the spirit of them.

Where To Draw The Line?

Fry Not Sure If ...

This isn’t a cut-and-dried issue. There’s quite a bit of nuance involved if you were to address business crowding. Let’s take my example above from Caring.

If the inventory of Caring and Gilbert Guide were never syndicated, would that exempt them from business crowding? If the inventories became very similar over time, would it still be okay?

In essence, if the other company is run independently, then perhaps you can continue to take up search shelf space.

But what prevents a company from doing this multiple times and owning 3, 4 or even 5 sites ranking on the first page for a search result? Even if they’re independently run, over time it will make it more difficult for others to disrupt that space since the incumbents have no real motivation to improve.

With so many properties they’re very happy with the status quo and are likely not too concerned with any one site’s position in search as long as the group of sites continues to squeeze out the competition.

Perhaps you could determine whether the functionality and features of the sites were materially different. But that would be pretty darn difficult to do algorithmically.

Or is it simply time based? You get to have multiple domains and participate in business crowding for up to, say, one year after the acquisition. That would be relatively straightforward but would have a tremendous impact on the mergers and acquisitions space.

If Zillow knew that they could only count on the traffic from Trulia for one year after the acquisition they probably wouldn’t have paid $3.5 billion (yes that’s a ‘b’) for Trulia. In fact, the deal might not have gotten done at all.

So when we start talking about addressing this problem it spills out of search and into finance pretty quickly.

What’s Good For The User?

At the end of the day Google wants to do what is best for the user. Some of this is altruistic. Trust me, if you talk to some of the folks on Google’s search quality team, they’re serious about this. But obviously if the user is happy then they return to Google and perform more searches that wind up padding Google’s profits.

Doing good by the user is doing good for the business.

My guess is that most users don’t realize that business crowding is taking place. They may pogostick from one site to the other and wind up satisfied, even if those sites are owned by the same company. In other words, search results with business crowding may wind up producing good long click and time to long click metrics.

It sounds like an environment ripe for a local maximum.

If business crowding were eliminated then users would see more options. While some of the metrics might deteriorate in the short-term would they improve long-term as new entrants in those verticals provided value and innovation?

There’s only one way to find out.

Vacation Rentals

One area where this is currently happening is within the vacation rentals space.

Business Crowding Example

In this instance two companies (TripAdvisor and HomeAway) own the first six results across five domains. This happens relatively consistently in this vertical. (Please note that I do have a dog in this fight. Airbnb is a client.)

Are these sites materially different? Not really. HomeAway makes the syndication of your listing a selling point.

HomeAway Business Crowding

Not only that but if you look at individual listings on these sites you find that there are rel=canonicals in place.

Rel Canonical to VRBO

In this instance the property listings on VacationRentals and HomeAway point to the one on VRBO.
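
You can verify that kind of cross-domain canonical yourself. Here’s a quick Python sketch that pulls the rel=canonical target out of a page; the example URL is a placeholder and the regex assumes a conventionally ordered link tag.

```python
# Quick sketch: pull the rel=canonical target out of a page. The regex assumes
# a conventionally ordered <link rel="canonical" href="..."> tag.
import re

import requests

def get_canonical(url):
    html = requests.get(url, timeout=10).text
    match = re.search(
        r'<link[^>]+rel=["\']canonical["\'][^>]+href=["\']([^"\']+)["\']',
        html, re.IGNORECASE)
    return match.group(1) if match else None

# Hypothetical listing URL:
# print(get_canonical("https://www.homeaway.com/vacation-rental/p1234567"))
```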

The way the inventory is sorted on each of these sites is different but it doesn’t seem like the inventory itself is all that different at the end of the day.

TripAdvisor doesn’t do anything with canonicals but they do promote syndication as a feature.

Syndication Selling Point on Vacation Home Rentals

A Venn diagram of inventory across TripAdvisor properties would likely show a material overlap but with good portions unshaded. They seem to have a core set of inventory that appears on all properties but aren’t as aggressive with full-on syndication.

Let me be clear here. I don’t blame these companies for doing what they’re doing. It’s smart SEO and it’s winning within the confines of Google’s current webmaster guidelines.

My question is whether business crowding is something that should be addressed. What happens if this practice flourishes?

Is the false choice being offered to users ultimately detrimental to users and, by proxy, to Google?

The Mid-Life Crisis of Search Results

Thoughtful Cat is Thoughtful

Search hasn’t been around for that long in the scheme of things. As the Internet evolved we saw empires rise and fall as new sites, companies and business models found success.

Maybe you remember Geocities or Gator or Lycos or AltaVista or Friendster. Now, none of these fall into the inventory-based sites I’ve referenced above but I use them as proxies. When it comes to social, whether you’re on Facebook or Instagram or WhatsApp, one company is still in control there.

Successful companies today are able to simply buy competitors and upstarts to solidify their position. Look no further than online travel agencies where Expedia now owns both Travelocity and Orbitz.

The days in which successful sites could rise and fall – and I mean truly fall – seem to be behind us.

The question is whether search results should reflect and reinforce this fact or whether they should instead continue to reflect diversity. It seems like search is at a crossroads of sorts as the businesses that populate results have matured.

Can It Be Addressed Algorithmically?

The next question that comes to mind is whether Google could actually do anything about business crowding. We know Google isn’t going to do anything manual in nature. They’d want to implement something that dealt with this from an algorithmic perspective.

I think there’s a fairly straightforward way Google could do this via the Knowledge Graph. Each business is an entity, and it would be relatively easy to map each site to its parent company as a parent-child relationship.

Some of this can be seen in the remnants of Freebase and their scrape of CrocTail, though the data probably needs more massaging. But it’s certainly possible to create and maintain these relationships within the Knowledge Graph.

Once done, you can attach a parent company to each site and apply the same sort of host crowding algorithm to business crowding. This doesn’t seem that farfetched.
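
As a thought experiment, the clustering step is simple once each domain maps to a parent entity. Here’s a sketch; the domain-to-parent map is illustrative, not actual Knowledge Graph data.

```python
# Thought experiment: apply a host-crowding-style limit at the parent-company
# level. The domain-to-parent map is illustrative, not real Knowledge Graph data.

PARENT_OF = {
    "vrbo.com": "HomeAway",
    "homeaway.com": "HomeAway",
    "vacationrentals.com": "HomeAway",
    "tripadvisor.com": "TripAdvisor",
    "vacationhomerentals.com": "TripAdvisor",
    "airbnb.com": "Airbnb",
}

def limit_business_crowding(ranked_domains, max_per_parent=2):
    """Keep at most max_per_parent results per parent company, preserving rank order."""
    counts, kept = {}, []
    for domain in ranked_domains:
        parent = PARENT_OF.get(domain, domain)  # unknown domains are their own parent
        counts[parent] = counts.get(parent, 0) + 1
        if counts[parent] <= max_per_parent:
            kept.append(domain)
    return kept

serp = ["vrbo.com", "homeaway.com", "tripadvisor.com",
        "vacationrentals.com", "vacationhomerentals.com", "airbnb.com"]
print(limit_business_crowding(serp))
# ['vrbo.com', 'homeaway.com', 'tripadvisor.com', 'vacationhomerentals.com', 'airbnb.com']
```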

But the reality of implementing this could have serious implications and draw the ire of a number of major corporations. And if users really don’t know that it’s all essentially the same content I’m not sure Google has the impetus to do anything about it.

Too Big To Fail (at Search)

Having made these acquisitions under the current guidelines, could Google effectively reduce business crowding without creating a financial meltdown for large corporate players?

Organic Traffic via Similar Web for Trulia

SimilarWeb shows that Trulia gets a little over half of its traffic from organic search. Any drastic change to that channel would be a material event for the parent company.

Others I’ve mentioned in this post are less dependent on organic search to certain degrees but a business crowding algorithm would certainly be a bitter pill to swallow for most.

Selfishly, I’d like to see business crowding addressed because it would help one of my clients, Airbnb, to some degree. They’d move up a spot or two and gain additional exposure and traffic.

But there’s a bigger picture here. False diversity is creeping into search. If you extrapolate this trend search results become little more than a corporate shell game.

On the other hand, addressing business crowding could dramatically change the way sites deal with competitors and how they approach mergers and acquisitions. I can’t predict how that would play out in the short or long-term.

What do you think? I’m genuinely interested in hearing your thoughts on this topic so please jump in with your comments.

Is Click Through Rate A Ranking Signal?

June 24 2015 // SEO // 60 Comments

Signs Point To Yes!

Are click-through rates on search results a ranking signal? The idea is that if the third result on a page is clicked more often than the first, it will, over time, rise to the second or first position.

I remember this question being asked numerous times when I was just starting out in the industry. Google representatives employed a potent combination of tap dancing and hand waving when asked directly. They were so good at doing this that we stopped hounding them and over the last few years I rarely hear people talking about, let alone asking, this question.

Perhaps it’s because more and more people aren’t focused on the algorithm itself and are instead focused on developing sites, content and experiences that will be rewarded by the algorithm. That’s actually the right strategy. Yet I still believe it’s important to understand the algorithm and how it might impact your search efforts.

Following is an exploration of why I believe click-through rate is a ranking signal.

Occam’s Razor

Pile of Pennies

Though the original principle wasn’t as clear cut, today’s interpretation of Occam’s Razor is that the simplest answer is usually the correct one. So what’s more plausible? That Google uses click-through rate as a signal or that the most data driven company in the world would ignore direct measurement from their own product?

It just seems like common sense, doesn’t it? Of course, we humans are often wired to make poor assumptions. And don’t get me started on jumping to conclusions based on correlations.

The argument against is that even Google would have a devil of a time using click-through rate as a signal across the millions of results for a wide variety of queries. Their resources are finite and perhaps it’s just too hard to harness this valuable but noisy data.

The Horse’s Mouth

It gets more difficult to make the case against Google using click-through rate as a signal when you get confirmation right from the horse’s mouth.

That seems pretty close to a smoking gun, doesn’t it?

Now, perhaps Google wants to play a game of semantics. Click-through rate isn’t a ranking signal. It’s a feedback signal. It just happens to be a feedback signal that influences rank!

Call it what you want, at the end of the day it sure sounds like click-through rate can impact rank.

[Updated 7/22/15]

Want more? I couldn’t find this quote the first time around but here’s Marissa Mayer in the FTC staff report (pdf) on antitrust allegations.

According to Marissa Mayer, Google did not use click-through rates to determine the position of the Universal Search properties because it would take too long to move up on the SERP on the basis of user click-through rate.

In other words, they ignored click data to ensure Google properties were slotted in the first position.

Then there’s former Google engineer Edmond Lau in an answer on Quora.

It’s pretty clear that any reasonable search engine would use click data on their own results to feed back into ranking to improve the quality of search results. Infrequently clicked results should drop toward the bottom because they’re less relevant, and frequently clicked results bubble toward the top. Building a feedback loop is a fairly obvious step forward in quality for both search and recommendations systems, and a smart search engine would incorporate the data.

So is Google a reasonable and smart search engine?    

The Old Days

Steampunk Google Logo

There are other indications that Google has the ability to monitor click activity on a query by query basis, and that they’ve had that capability for dog years.

Here’s an excerpt from a 2007 interview with Marissa Mayer, then VP of Search Products, on the implementation of the OneBox.

We hold them to a very high click through rate expectation and if they don’t meet that click through rate, the OneBox gets turned off on that particular query. We have an automated system that looks at click through rates per OneBox presentation per query. So it might be that news is performing really well on Bush today but it’s not performing very well on another term, it ultimately gets turned off due to lack of click through rates. We are authorizing it in a way that’s scalable and does a pretty good job enforcing relevance.

So way back in 2007 (eight years ago folks!) Google was able to create a scalable solution to using click-through rate per query to determine the display of a OneBox.

That seems to poke holes in the idea that Google doesn’t have the horsepower to use click-through rate as a signal.

The Bing Argument

Anything You Can Do I Can Do Better

Others might argue that if Bing is using click-through rate as a signal that Google surely must be as well. Here’s what Duane Forrester, Senior Product Manager for Bing Webmaster Outreach (or something like that) said to Eric Enge in 2011.

We are looking to see if we show your result in a #1, does it get a click and does the user come back to us within a reasonable timeframe or do they come back almost instantly?

Do they come back and click on #2, and what’s their action with #2? Did they seem to be more pleased with #2 based on a number of factors or was it the same scenario as #1? Then, did they click on anything else?

We are watching the user’s behavior to understand which result we showed them seemed to be the most relevant in their opinion, and their opinion is voiced by their actions.

This and other conversations I’ve had make me confident that click-through rate is used as a ranking signal by Bing. The argument against is that Google is so far ahead of Bing that they may have tested and discarded click-through rate as a signal.

Yet as other evidence piles up, perhaps Google didn’t discard click-through rate but simply uses it more effectively.

Pogosticking and Long Clicks

Duane’s remarks also tease out a little bit more about how click-through rate would be used and applied. It’s not a metric used in isolation but measured in terms of time spent on that clicked result, whether they returned to the SERP and if they then refined their search or clicked on another result.

When you really think about it, if pogosticking and long clicks are real measures then click-through rate must be part of the equation. You can’t calculate the former metrics without having the click-through rate data.
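
A toy example makes the dependency obvious. Whatever form these metrics take internally, they sit on top of the same per-result impression and click counts; the numbers and log format below are invented purely for illustration.

```python
# Toy illustration: pogosticking and long-click style metrics sit on top of the
# same per-result impression and click counts used for click-through rate.
# The numbers and log format are invented.

impressions = {"result_a": 1000, "result_b": 1000}
clicks = {"result_a": 320, "result_b": 140}
returned_to_serp = {"result_a": 40, "result_b": 90}  # clicked, then bounced back

for result in impressions:
    ctr = clicks[result] / impressions[result]
    pogostick_rate = returned_to_serp[result] / clicks[result]
    long_click_rate = ctr * (1 - pogostick_rate)  # share of impressions that 'stuck'
    print(f"{result}: CTR={ctr:.1%}, pogostick={pogostick_rate:.1%}, "
          f"long clicks={long_click_rate:.1%} of impressions")
```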

And when you dig deeper, Google does talk about ‘click data’ and ‘click signals’ quite a bit. So once again perhaps it’s all a game of semantics, the equivalent of Bill Clinton clarifying the meaning of ‘is’.

Seeing Is Believing

A handful of prominent SEOs have tested whether click-through rate influences rank. Rand Fishkin has been leading that charge for a number of years.

Back in May of 2014 he performed a test with some interesting results. But it was a long-tail term and other factors might have explained the behavior.

But just the other day he ran another version of the same test.

However, critics will point out that the result in question is once again at #4, indicating that click-through rate isn’t a ranking signal.

But clearly the burst of searches and clicks had some sort of effect, even if it was temporary, right? So might Google have developed mechanisms to combat this type of ‘bombing’ of click-through rate? Or perhaps the system identifies bursts in query and clicks and reacts to meet a real time or ‘fresh’ need?

Either way it shows that the click-through behavior is monitored. Combined with the admission from Udi Manber it seems like the click-through rate distribution has to be consistently off of the baseline for a material amount of time to impact rank.

In other words, all the testing in the world by a small band of SEOs is a drop in the ocean of the total click stream. So even if we can move the needle for a small time, the data self-corrects.

But Rand isn’t the only one testing this stuff. Darren Shaw has also experimented with this within the local SEO landscape.

User Behavior and Local Search – State of Search 2014

Darren’s results aren’t foolproof either. You could argue that Google representatives within local might not be the most knowledgeable about these things. But it certainly adds to a drumbeat of evidence that clicks matter.

But wait, there’s more. Much more.

Show Me The Patents

She Blinded Me With Science

For quite a while I was conflicted about this topic because of one major stumbling block. You wouldn’t be able to develop a click-through rate model based on all the various types of displays on a result.

A result with a review rich snippet gets a higher click-through rate because the eye gravitates to it. Google wouldn’t want to reward that result from a click-through rate perspective just because of the display.

Or what happens when the result has an image result or an answer box or a video result or any number of different elements? There seemed to be too many variations to create a workable model.

But then I got hold of two Google patents titled Modifying search result ranking based on implicit user feedback and Modifying search result ranking based on implicit user feedback and a model of presentation bias.

The second patent seems to build from the first with the inventor in common being Hyung-Jin Kim.

Hyung-Jin Kim

Both of these are rather dense patents and it reminds me that we should all thank Bill Slawski for his tireless work in reading and rendering patents more accessible to the community.

I’ll be quoting from both patents (there’s a tremendous amount of overlap) but here’s the initial bit that encouraged me to put the headphones on and focus on decoding the patent syntax.

The basic rationale embodied by this approach is that, if a result is expected to have a higher click rate due to presentation bias, this result’s click evidence should be discounted; and if the result is expected to have a lower click rate due to presentation bias, this result’s click evidence should be over-counted.

Very soon after this the patent goes on to detail a number of different types of presentation bias. So this essentially means that Google saw the same problem but figured out how to deal with presentation bias so that it could rely on ‘click evidence’.
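
In plainer terms, the adjustment looks something like this sketch: compare the observed click rate to the click rate you’d expect given how the result was displayed, and use the ratio as the de-biased click evidence. The expected-CTR figures are made up.

```python
# Sketch of the presentation-bias idea: discount click evidence for results
# expected to draw extra clicks because of their display, and over-count it for
# results expected to draw fewer. The expected-CTR figures are made up.

def debiased_click_evidence(observed_ctr, expected_ctr_for_presentation):
    """Ratio > 1: out-performed its presentation. Ratio < 1: under-performed."""
    return observed_ctr / expected_ctr_for_presentation

# A result with a review rich snippet is expected to draw more clicks ...
print(debiased_click_evidence(observed_ctr=0.30, expected_ctr_for_presentation=0.28))  # ~1.07
# ... while a plain blue link in the same slot is expected to draw fewer.
print(debiased_click_evidence(observed_ctr=0.24, expected_ctr_for_presentation=0.18))  # ~1.33
# The second result shows stronger click evidence despite the lower raw CTR.
```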

Then there’s this rather nicely summarized 10,000 foot view of the issue.

In general, a wide range of information can be collected and used to modify or tune the click signal from the user to make the signal, and the future search results provided, a better fit for the user’s needs. Thus, user interactions with the rankings presented to the users of the information retrieval system can be used to improve future rankings.

Again, no one is saying that click-through rate can be used in isolation. But it clearly seems to be one way that Google has thought about re-ranking results.

But it gets better as you go further into these patents.

The information gathered for each click can include: (1) the query (Q) the user entered, (2) the document result (D) the user clicked on, (3) the time (T) on the document, (4) the interface language (L) (which can be given by the user), (5) the country (C) of the user (which can be identified by the host that they use, such as www-google-co-uk to indicate the United Kingdom), and (6) additional aspects of the user and session. The time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result. Moreover, an assessment can be made about the time (T) regarding whether this time indicates a longer view of the document result or a shorter view of the document result, since longer views are generally indicative of quality for the clicked through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.

Here we see clear references to how to measure long clicks and later on they even begin to use the ‘long clicks’ terminology. (In fact, there’s mention of long, medium and short clicks.)
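
The thresholds below are invented, but a simple classifier along these lines shows how dwell time would turn raw clicks into long, medium and short clicks:

```python
# Sketch: bucket clicks into short, medium and long by dwell time (the time
# between clicking a result and returning to the results page).
# The thresholds are invented for illustration.

def classify_click(dwell_seconds, short_cutoff=30, long_cutoff=120):
    if dwell_seconds < short_cutoff:
        return "short"   # quick bounce back to the SERP
    if dwell_seconds < long_cutoff:
        return "medium"
    return "long"        # longer views are generally indicative of quality

for dwell in (8, 45, 300):
    print(dwell, "->", classify_click(dwell))
```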

But does it take into account different classes of queries? Sure does.

Traditional clustering techniques can also be used to identify the query categories. This can involve using generalized clustering algorithms to analyze historic queries based on features such as the broad nature of the query (e.g., informational or navigational), length of the query, and mean document staytime for the query. These types of features can be measured for historical queries, and the threshold(s) can be adjusted accordingly. For example, K means clustering can be performed on the average duration times for the observed queries, and the threshold(s) can be adjusted based on the resulting clusters.

This shows that Google may adjust what they view as a good click based on the type of query.

But what about types of users? That’s when it all goes to hell in a handbasket, right? Nope. Google figured that out.

Moreover, the weighting can be adjusted based on the determined type of the user both in terms of how click duration is translated into good clicks versus not-so-good clicks, and in terms of how much weight to give to the good clicks from a particular user group versus another user group. Some user’s implicit feedback may be more valuable than other users due to the details of a user’s review process. For example, a user that almost always clicks on the highest ranked result can have his good clicks assigned lower weights than a user who more often clicks results lower in the ranking first (since the second user is likely more discriminating in his assessment of what constitutes a good result).

Users are not created equal and Google may weight the click data it receives accordingly.

But they’re missing the boat on topical expertise, right? Not so fast!

In addition, a user can be classified based on his or her query stream. Users that issue many queries on (or related to) a given topic (e.g., queries related to law) can be presumed to have a high degree of expertise with respect to the given topic, and their click data can be weighted accordingly for other queries by them on (or related to) the given topic.

Google may identify topical experts based on queries and weight their click data more heavily.

Frankly, it’s pretty amazing to read this stuff and see just how far Google has teased this out. In fact, they built in safeguards for the type of tests the industry conducts.

Note that safeguards against spammers (users who generate fraudulent clicks in an attempt to boost certain search results) can be taken to help ensure that the user selection data is meaningful, even when very little data is available for a given (rare) query. These safeguards can include employing a user model that describes how a user should behave over time, and if a user doesn’t conform to this model, their click data can be disregarded. The safeguards can be designed to accomplish two main objectives: (1) ensure democracy in the votes (e.g., one single vote per cookie and/or IP for a given query-URL pair), and (2) entirely remove the information coming from cookies or IP addresses that do not look natural in their browsing behavior (e.g., abnormal distribution of click positions, click durations, clicks_per_minute/hour/day, etc.). Suspicious clicks can be removed, and the click signals for queries that appear to be spammed need not be used (e.g., queries for which the clicks feature a distribution of user agents, cookie ages, etc. that do not look normal).

As I mentioned, I’m guessing the short-lived results of our tests are indicative of Google identifying and then ‘disregarding’ that click data. Not only that, they might decide that the cohort of users who engage in this behavior won’t be used (or their impact will be weighted less) in the future.

What this all leads up to is a rank modifier engine that uses implicit feedback (click data) to change search results.

How Google Uses Click Data To Modify Rank

Here’s a fairly clear description from the patent.

A ranking sub-system can include a rank modifier engine that uses implicit user feedback to cause re-ranking of search results in order to improve the final ranking presented to a user of an information retrieval system.

It tracks and logs … everything and uses that to build a rank modifier engine that is then fed back into the ranking engine proper.
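
Conceptually it’s a re-ranking layer. The sketch below (weights and scores invented) shows the general shape: a base relevance score nudged by a click-derived modifier before the final ordering.

```python
# Conceptual sketch of a rank modifier layer: a base relevance score nudged by
# a click-derived modifier before the final ordering. Weights and scores invented.

def rerank(results, click_weight=0.2):
    """results: list of (url, base_score, click_modifier roughly in -1..1)."""
    adjusted = [(url, base + click_weight * modifier, base)
                for url, base, modifier in results]
    return sorted(adjusted, key=lambda r: r[1], reverse=True)

results = [
    ("page-a", 0.82, -0.5),  # strong base score, weak click evidence
    ("page-b", 0.78, 0.6),   # slightly weaker base, strong long-click record
    ("page-c", 0.60, 0.1),
]
for url, final, base in rerank(results):
    print(f"{url}: base={base:.2f} final={final:.2f}")
# page-b edges out page-a once the click modifier is applied.
```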

But, But, But

Castle Is Speechless

Of course this type of system would get tougher as more of the results were personalized. Yet, the way the data is collected seems to indicate that they could overcome this problem.

Google seems to know the inherent quality and relevance of a document, in fact of all documents returned on a SERP. As such they can apply and mitigate the individual user and presentation bias inherent in personalization.

And it’s personalization where Google admits click data is used. But they still deny that it’s used as a ranking signal.

Perhaps it’s a semantics game and if we asked if some combination of ‘click data’ was used to modify results they’d say yes. Or maybe the patent work never made it into production. That’s a possibility.

But looking at it all together and applying Occam’s Razor I tend to think that click-through rate is used as a ranking signal. I don’t think it’s a strong signal but it’s a signal nonetheless.

Why Does It Matter?

You might be asking, so freaking what? Even if you believe click-through rate is a ranking signal, I’ve demonstrated that manipulating it may be a fool’s errand.

The reason click-through rate matters is that you can influence it with changes to your title tag and meta description. Maybe it’s not enough to tip the scales but trying is better than not isn’t it?

Those ‘old school’ SEO fundamentals are still important.

Or you could go the opposite direction and build your brand equity through other channels to the point where users would seek out your brand in search results irrespective of position.

Over time, that type of behavior could lead to better search rankings.

TL;DR

The evidence suggests that Google does use click-through rate as a ranking signal. Or, more specifically, Google uses click data as an implicit form of feedback to re-rank and improve search results.

Despite their denials, common sense, Google testimony and interviews, industry testing and patents all lend credence to this conclusion.

Do You Even Algorithm, Google?

June 19 2015 // SEO // 18 Comments

It has been 267 days since the last Panda update. That’s 8 months and 25 days.

Adventure Time Lemongrab Unacceptable

Where’s My Panda Update?

Obviously I’m a bit annoyed that there hasn’t been a Panda update in so long because I have a handful of clients who might (fingers crossed) benefit from having it deployed. They were hit and they’ve done a great deal of work cleaning up their sites so that they might get back into Google’s good graces.

I’m not whining about it (much). That’s the way the cookie crumbles and that’s what you get when you rely on Google for a material amount of your traffic.

Google shouldn’t be concerned about specific sites caught in limbo based on their updates. The truth, hard as it is to admit, is that very few sites are irreplaceable.

You could argue that Panda is punitive and that not providing an avenue to recovery is cruel and unusual punishment. But if you can’t do the time, don’t do the crime.

Do You Even Algorithm, Google?

Why haven’t we seen a Panda update in so long? It seemed to be one of Google’s critical components in ensuring quality search results, launched in reaction to a rising tide of complaints from high-profile (though often biased) individuals.

Nine months is a long time. I’m certain there are sites in Panda jail right now that shouldn’t be and other sites that may be completely new or have risen dramatically in that time that deserve to be Pandalized.

In an age of agile development and the two-week sprint cycle, nine months is an eternity. Heck, we’ve minted brand spanking new humans in that span of time!

Fewer Panda updates equal lower quality search results.

Google should want to roll out Panda updates because without them search results get worse. Bad actors creep into the results and reformed sites that could improve results continue to be demoted.

The Panda Problem

Does the lack of Panda updates point to a problem with Panda itself? Yes and no.

My impression is that Panda continues to be a very resource intensive update. I have always maintained that Panda aggregates individual document scores on a site.

panda document scores

The aggregate score determines whether you are below or above the Panda cut line.

As Panda evolved I believe the cut line has become dynamic based on the vertical and authority of a site. This would ensure that sites that might look thin to Google but are actually liked by users avoid Panda jail. This is akin to ensuring the content equivalent of McDonald’s is still represented in search results.
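
Here’s that mental model in code form, with entirely hypothetical scores and cut lines: per-document quality scores roll up to a site-level score that’s compared against a cut line that can vary by vertical.

```python
# My mental model of Panda, in code form. The scores and cut lines are
# entirely hypothetical.

def panda_status(document_scores, cut_line):
    """Aggregate per-document quality scores and compare to the vertical's cut line."""
    site_score = sum(document_scores) / len(document_scores)
    verdict = "above the cut line" if site_score >= cut_line else "below the cut line"
    return site_score, verdict

# A site with lots of thin pages drags its aggregate score down ...
print(panda_status([0.9, 0.3, 0.2, 0.25, 0.3], cut_line=0.5))   # (0.39, 'below the cut line')
# ... while a site with consistently strong documents clears the same cut line.
print(panda_status([0.8, 0.7, 0.75, 0.9], cut_line=0.5))        # (0.7875, 'above the cut line')
```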

But think about what that implies. Google would need to crawl, score and compute every site across the entire web index. That’s no small task. In May John Mueller related that Google was working to make these updates faster. But he said something very similar about Penguin back in September of 2014.

I get that it’s a big task. But this is Google we’re talking about.

Search Quality Priorities

I don’t doubt that Google is working on making Panda and Penguin faster. But it’s clearly not a priority. If it was, well … we’d have seen an update by now.

Because we’ve seen other updates. There’s been Mobilegeddon (the Y2K of updates), a Doorway Page Update, The Quality Update and the Colossus Update just the other day. And there’s a drumbeat of advancements and work to leverage entities for both algorithmic ranking and search display.

The funny thing is, the one person who might have helped boost Panda as a priority is no longer there. That’s right, Matt Cutts no longer attends the weekly search quality meeting.

Google Search Quality Meeting Screencap

As the industry’s punching bag, Matt was able to bring our collective ire and pain to the Googleplex.

Now, I’m certain John Mueller and Gary Illyes both get an earful and are excellent ambassadors. But do they have the pull that Matt had internally? No way.

Eating Cake

Cat Eating Cake

We keep hearing that these updates are coming soon. That they’ll be here in a month or a few weeks. There are only so many times you can hear this before you start to roll your eyes and silently say ‘I’ll believe it when I see it.’

What’s more, if Panda still improves search quality then the lack of an update means search quality is declining. Other updates may have helped stem the tide but search quality isn’t optimized.

You can quickly find a SERP that has a thin content site ranking well. (In fact, I encourage you to find and post links to those results in the comments.)

Perhaps Google wants to move away from Panda and instead develop other search quality signals that better handle this type of content. That would be fine, yet it’s obvious that Panda is still in effect. So logically that means other signals aren’t strong enough yet.

At the end of the day it’s not about my own personal angst or yours. It’s not about personal stories of Panda woe as heartbreaking as some of them may be. This is about search quality and putting your money (resources) where your mouth is.

You can’t have your cake and eat it too.

TL;DR

It’s been nearly nine months since the last Panda update. If Panda improves search quality then the prolonged delay means search quality is declining.

My Favorite SEO Tool

March 24 2015 // SEO // 35 Comments

My favorite SEO tool isn’t an SEO tool at all. Don’t get me wrong, I use and like plenty of great SEO tools. But I realized that I was using this one tool all the time.

Chrome Developer Tools how I love thee, let me count the ways.

Chrome Developer Tools

The one tool I use countless times each day is Chrome Developer Tools. You can find this handy tool under the View -> Developer menu in Chrome.

chrome-developer-tools

Or you can simply right click and select Inspect Element. (I suppose the latter is actually easier.) Here’s what it looks like (on this site) when you open Chrome Developer Tools.

Chrome Developer Tools In Action

There is just an incredible amount of functionality packed into Chrome Developer Tools. Some of it is super technical and I certainly don’t use all of the features. I’m only going to scratch the surface with this post.

But hopefully you’re not overwhelmed by it all because there are some simple features that are really helpful on a day-to-day basis.

Check Status Codes

One of the simplest things to do is to use the Network tab to check the status code of a page. For instance, how does a site handle domain-level canonicalization?

Chrome Developer Tools Network Tab

With the Network tab open I go directly to the non-www version of this site and I can see how it redirects to the www version. In this case it’s doing exactly what it’s supposed to do.

If I want more information I can click on any of these line items and see the header information.

Chrome Developer Tools Network Detail

You can catch some pretty interesting things by looking at what comes through the Network tab. For instance, soon after a client transitioned from http to https I noted the following response code chain.

An https request for a non-www URL returned a 301 to the www http version (domain level canonicalization) and then did another 301 to the www https version of that URL.

The double 301 and routing from https to http and back again can (and should) be avoided by doing the domain level canonicalization and https redirect at the same time. So that’s what we did … in the span of an hour!
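
When I want to check that kind of chain outside the browser, a few lines of Python (a sketch using the requests library) will print every hop:

```python
# Sketch: print every hop in a redirect chain so double redirects like
# https non-www -> http www -> https www stand out immediately.
import requests

def show_redirect_chain(url):
    resp = requests.get(url, allow_redirects=True, timeout=10)
    for hop in resp.history + [resp]:
        print(hop.status_code, hop.url)
    if len(resp.history) > 1:
        print(f"Warning: {len(resp.history)} redirects before the final URL")

# Hypothetical example:
# show_redirect_chain("https://domain.com/some-page")
```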

I won’t get into the specifics of what you can tease out of the headers here because it would get way too dense. But suffice it to say they can be a treasure trove of information.

Of course there are times I fire up something more detailed like Charles or Live HTTP Headers, but I’m doing so less frequently given the advancements in Chrome Developer Tools.

Check Mobile

There was a time when checking to see how a site would look on mobile was a real pain in the ass. But not with Chrome Developer Tools!

Chrome Developer Tools Viewport Rendering

The little icon that looks like a mobile phone is … awesome. Click it!

Chrome Developer Tools Select Mobile Device

Now you can select a Device and reload the page to see how it looks on that device. Here’s what this site looks like on mobile.

Chrome Developer Tools Mobile Test

The cool thing is you can even click around and navigate on mobile in this interface to get a sense of what the experience is really like for mobile users without firing up your own phone.

A little bonus tip here is that you can clear the device by clicking the icon to the left and then use the UA field to do specific User Agent (UA) testing.

Chrome Developer Tools Override

For instance, without a Device selected, what happens when Googlebot Smartphone hits my site? All I have to do is use the UA override and put in the Googlebot Smartphone User Agent.

Chrome Developer Tools UA Override Example

Sure enough it looks like Googlebot Smartphone will see the page correctly. This is increasingly important as we get closer to the 4/21/15 mopocalypse.

You can copy and paste from the Google Crawlers list or use one of a number of User Agent extensions (like this one) to do this. However, if you use one of the User Agent extensions you won’t see the UA show up in the UA field. But you can confirm it’s working via the headers in the Network tab.
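
You can double-check the same thing outside the browser. This Python sketch sends a request with a Googlebot Smartphone User Agent; copy the current string from Google’s crawler documentation, since the one shown here may be out of date.

```python
# Sketch: request a page with a Googlebot Smartphone User Agent to see what the
# server returns. The UA string below may be out of date; copy the current one
# from Google's crawler documentation.
import requests

GOOGLEBOT_SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

def fetch_as_googlebot_smartphone(url):
    resp = requests.get(url, headers={"User-Agent": GOOGLEBOT_SMARTPHONE_UA},
                        timeout=10)
    return resp.status_code, len(resp.text)

# Hypothetical example:
# print(fetch_as_googlebot_smartphone("https://example.com/"))
```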

Show Don’t Tell

The last thing I’ll share is how I use Chrome Developer Tools to show instead of tell clients about design and readability issues.

If you go back to some of my older posts you’ll find that they’re not as readable. I had to figure this stuff out as I went along.

Show Don't Tell Irony

This is a rather good post about Five Foot Web Design, which pretty much violates a number of the principles described in the piece. I often see design and readability issues and it can be difficult for a client to get that feedback, particularly if I’m just pointing out the flaws and bitching about it.

So instead I give them a type of side-by-side comparison by editing the HTML in Chrome Developer Tools and then taking a screen capture of the optimized version I’ve created.

You do this by using the Elements tab (1) and then using the Inspect tool (2) to find the area of the code you want to edit.

Chrome Developer Tools Elements Tab

The inspect tool is the magnifying glass if you’re confused and it just lets you sort of zero in on the area of that page. It will highlight the section on the page and then show where that section resides in the code below.

Now, the next step can be a bit scary because you’re just wading into the HTML to tweak what the page looks like.

Chrome Developer Tools Edit HTML

A few things to remember here. You’re not actually changing the code on that site or page. You can’t hurt that site by playing with the code here. Trust me, I screw this up all the time because I know just enough HTML and CSS to be dangerous.

In addition, if you reload this page after you’ve edited it using Chrome Developer Tools all of your changes will vanish. It’s sort of like an Etch-A-Sketch. You doodle on it and then you shake it and it disappears.

So the more HTML you know the more you can do in this interface. I generally just play with stuff until I get it to look how I want it to look.

Chrome Developer Tools HTML Edit Result

Here I’ve added a header of sorts and changed the font size and line height. I do this sort of thing for a number of clients so I can show them what I’m talking about. A concrete example helps them understand and also gives them something to pass on to designers and developers.

TL;DR

Chrome Developer Tools is a powerful suite of tools that any SEO should be using to make their lives easier and more productive.

Non-Linking URLs Seen As Links

March 20 2015 // SEO // 27 Comments

(This post has been updated so make sure you read all the way to the bottom.)

Are non-linking URLs (pasted URLs) seen as links by Google? There has long been chatter and rumor among various members of the SEO community that they are. I found something the other day that seems to confirm this.

Google Webmaster Tools Crawl Errors

I keep a close eye on the Crawl Errors report in Google Webmaster Tools with a particular focus on ‘Not found’ errors. I look to see if they’re legitimate and whether they’re linked internally (which is very bad) or externally.

The place to look for this information is in the ‘Linked from’ tab of a specific error.

Linked From Tab on 404 Error

Now, all too often the internal links presented here are woefully out-of-date (and that’s being generous). You click through, search for the link in the code and don’t find it. Again and again and again. Such was the case here. This is extremely annoying but is a topic for another blog post.

Instead let’s focus on that one external link. Because I figured this was the reason Google continued to return the page as an error even though 1stdibs had stopped linking to it ages ago.

Pasted URL Seen As Link?

That’s not a link! It’s a pasted URL but it’s not a link. (Ignore the retargeted ad.) Looking at the code there’s no <a> tag. Maybe it was there and then removed but that … doesn’t seem likely. In addition, I’ve seen a few more examples of this behavior but didn’t capture them at the time and have since marked those errors as fixed. #kickingmyself

Google (or a tool Google provides) is telling me that the page in question links to this 404 page.

Non-Linking URLs Treated As Links?

This Is Not A Link

It’s not a stretch to think that Google would be able to recognize the pattern of a URL in text and, thus, treat it as a link. And there are good reasons why they might want to since many unsophisticated users botch the HTML.

By treating pasted URLs as links Google can recover those citations, acknowledge the real intent and pass authority appropriately. (Though it doesn’t look like they’re doing that but instead using it solely for discovery.)
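
Detecting pasted URLs is trivial, which is part of why the behavior is believable. Here’s a rough sketch of the idea (a crude approximation, not a claim about Google’s actual implementation): strip out anchor elements, then pattern-match bare URLs in what’s left.

```python
# Rough sketch: find bare (non-linked) URLs in HTML by stripping <a> elements
# first, then pattern-matching what's left. A crude approximation of the idea,
# not a claim about Google's actual implementation.
import re

def find_pasted_urls(html):
    without_anchors = re.sub(r"<a\b[^>]*>.*?</a>", " ", html,
                             flags=re.IGNORECASE | re.DOTALL)
    return re.findall(r"https?://[^\s<>\"']+", without_anchors)

sample = ('See <a href="http://example.com/linked">this</a> and also '
          'http://example.com/pasted-not-linked for details.')
print(find_pasted_urls(sample))  # ['http://example.com/pasted-not-linked']
```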

All of this is interesting from an academic perspective but doesn’t really change a whole lot in the scheme of things. Hopefully you’re not suddenly thinking that you should go out and try to secure non-linking URLs. (Seriously, don’t!)

What’s your take? Is this the smoking gun proof that Google treats non-linking URLs as links?

[Update]

Apparently John Mueller confirmed this in a Google+ Hangout back in September of 2013. So while seeing it in Google Webmaster Tools might be new(ish), Google clearly acknowledges and crawls non-linked URLs. Thanks to Glenn Gabe for pointing me to this information.

In addition, Dan Petrovic did a study to determine if non-linking URLs influenced rankings and found it likely that they did not. This makes a bit of sense since you wouldn’t be able to nofollow these pasted URLs, opening the door to abuse via blog comments.

Aggregating Intent

March 13 2015 // SEO // 16 Comments

Successful search engine optimization strategies must aggregate intent. This is something I touched on in my What Is SEO post and also demonstrated in my Rich Snippets Algorithm piece. But I want to talk about it in depth because it’s that important.

Aggregating Intent

Many of Google’s Knowledge Cards aggregate intent. Here’s the Knowledge Card displayed when I search for ‘va de vi’.

Knowledge Card Aggregates Intent

Google knows that Va de Vi is a restaurant. But they don’t quite know what my intent is behind such a broad query. Before Knowledge Cards Google would rely on providing a mixture of results to satisfy different intents. This was effective but inefficient and incomplete. Knowledge Cards make aggregating intent a breeze.

What type of restaurant is it? Is it expensive? Where is it? How do I get there? What’s their phone number? Can I make a reservation? What’s on the menu? Is the food good? Is it open now? What alternatives are nearby?

Just look at that! In one snapshot this Knowledge Card satisfies a multitude of intents and does so quickly.

It’s not just restaurants either. Here’s a Knowledge Card result for ‘astronautalis’.

Aggregating Intent in Google Knowledge Cards

Once again you can see a variety of intents addressed by this Knowledge Card. Who is Astronautalis? Can I listen to some of his music? Where is he playing next? What are some of his popular songs? How can I connect with him? What albums has he released?

Google uses Knowledge Cards to quickly aggregate multiple intents and essentially cover all their bases when it comes to entity based results. If it’s good enough for Google shouldn’t it be good enough for you?

Active and Passive Intent

Aggregating Intent

So how does this translate into the search strategies you and I can implement? The easiest way to think about this is to understand that each query comes with active and passive intent.

Active intent is the intent that is explicitly described by the query syntax. A search for ‘bike trails in walnut creek’ is explicitly looking for a list of bike trails in walnut creek. (Thank you captain obvious.)

You must satisfy active intent immediately.

If a user doesn’t immediately see that their active intent has been satisfied they’re going to head back to search results. Trust me, you don’t want that. Google doesn’t like pogosticking. This means that at a glance users must see the answer to their active intent.

One of the mistakes I see many people make is addressing active and passive intent equally, or simply not paying attention to query syntax and decoding intent properly. More than ever, your job as an SEO is to extract intents from query syntax.

Passive intent is the intent that is implicitly described by the query syntax. A search for ‘bike trails in walnut creek’ is implicitly looking for trail maps, trail photos, trail reviews and attributes about those trails such as difficulty and length to name a few.

You create value by satisfying passive intent.

When you satisfy passive intent you’ll see page views per session and time on site increase. You’re ensuring that your site generates long clicks, which is incredibly important from a search engine perspective. It also happens to be the way you build your brand, convert users and wean yourself from being overly dependent on search engine traffic.

I think one of the best ways to think about passive intent is to ask yourself what the user would search for next … over and over again.

Intent Hierarchy

First You Looked Here, Then Here

It’s essential to understand the hierarchy of intent so you can deliver the right experience. This is where content and design collide with “traditional” search. (I use the quotes here because I’ve never really subscribed to search being treated as a niche tactic.)

SEO is a user centric activity in this context. The content must satisfy active and passive intent appropriately. Usually this means that there is ample content to address active intent and units or snippets to satisfy passive intent.

The design must prominently feature active intent content while providing visual cues or a trail of sorts to show that passive intent can also be satisfied. These things are important to SEO.

We can look at Google’s Knowledge Cards to see how they prioritize intent. Sometimes it’s the order in which the content is presented. For instance, usually the ‘people also search for’ is at the bottom of the card. These alternatives always represent passive intent.

For location-based entities the map and directions are clearly given more priority by being at the top (and having a strong call to action). While the reviews section is often presented later on, it takes up a material amount of real estate, signaling higher (and potentially active) intent. Elements with more passive intent (address, phone, hours etc.) are still available but are not given as high a weight visually.

For an artist (such as Astronautalis) you’ll see that listening options are presented first. Yes, it’s an ad-based unit but it also makes sense that this would be an active intent around these queries.

It’s up to us to work with content and design teams to ensure the hierarchy of intent is optimized. Simply putting everything on the page at once or with equal weight will distract or overwhelm the user and chase them back to search results or a competitor.

Decoding Intent

Decoding Intent

While the days of having one page for every variant of query syntax are behind us, we’re still not at the point where one page can address every query syntax and the intents behind them.

If I search for ‘head like a hole lyrics’ the page I reach should satisfy my active intent and deliver the lyrics to this epic NIN song. To serve passive intent I’d want to see a crosslink unit to other songs from Pretty Hate Machine as well as other NIN albums. Maybe there’s another section with links to songs with similar themes.

But if I search for ‘pretty hate machine lyrics’ the page I reach should have a list of songs from that album with each song linking to a page with its lyrics. The crosslink unit on this page would be to other NIN albums and potentially other similar artists’ albums.

By understanding the query syntax (and in this case query classes) you can construct different page types that address the right hierarchy of intent.
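To make that concrete, here’s a rough sketch (illustrative Python, with made-up song and album lists standing in for a real catalog) of mapping query syntax to a query class and, from there, to the page type that should satisfy the active intent:

    # Illustrative only: the song/album lists are stand-ins and the matching
    # is deliberately naive.
    SONGS = {"head like a hole", "terrible lie", "sin"}
    ALBUMS = {"pretty hate machine", "the downward spiral"}

    def classify_lyrics_query(query):
        query = query.lower().strip()
        if not query.endswith(" lyrics"):
            return None
        root = query[:-len(" lyrics")]        # the '[root] lyrics' query class
        if root in SONGS:
            return "song-lyrics-page"         # lyrics plus crosslinks to the album
        if root in ALBUMS:
            return "album-lyrics-page"        # track list linking to each song's lyrics
        return "unknown-root"

    print(classify_lyrics_query("head like a hole lyrics"))     # song-lyrics-page
    print(classify_lyrics_query("pretty hate machine lyrics"))  # album-lyrics-page

The classification logic is trivial on purpose. The hard part is deciding what content and crosslink units each page type needs so it also satisfies passive intent.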

Target the keyword, optimize the intent.

TL;DR

Aggregating intent and understanding how to decode, identify and present active and passive intent from query syntax is vital to success in search and beyond.

We Want To Believe

January 20 2015 // Marketing + SEO + Social Media // 5 Comments

Fake news and images are flourishing and even Snopes can’t hold back the tide of belief. Marketers should be taking notes, not to create their own fake campaigns but to understand the psychology that makes it possible and connect that to digital marketing trends.

We Want To Believe

I Want To Believe Poster

Agent Mulder, of the X-Files, was famous for his desire to believe in aliens and all sorts of other phenomena. The truth is, we all want to believe. Maybe not in aliens but a host of other things. It’s not that we’re gullible, per se, but that we are inherently biased and seek what feels like the truth.

One of the pieces that fooled some of my colleagues was the story about a busy restaurant that had commissioned research on why its service ratings had declined over time.

Restaurant Service Cellphones Fake Research

This was a post on Craigslist in the rants & raves section. Think about that for a moment. This is not a bastion of authenticity. But the post detailed patrons’ obsession with their phones and the inordinate amount of time they took texting and taking pictures of their food.

This self-absorbed, technology-obsessed customer was the real problem. Many reported this ‘research’ as fact because the findings were ones that people wanted to believe. Too many of us have witnessed something similar. We have experience that creates a bias to believe.

We wanted the story to be true because it felt right and matched our preconceptions and beliefs.

The Subversive Jimmy Kimmel

While Jimmy Fallon may be the more affable late night host, Jimmy Kimmel has been doing what I think is some ground-breaking work. His most popular pranks have exposed our desire to believe.

Twerking was jumping the shark and his viral twerking fail video tapped into our collective eye-roll at the practice. But less than a week later Kimmel revealed that it was all a hoax.

He didn’t stop there though. The next time he enlisted Olympian Kate Hansen to post a video that purportedly showed a wolf wandering the halls at the Sochi Olympics.

Once again, Kimmel revealed that while it was a wolf, it wasn’t anywhere near Russia. I’m not sure people give Kimmel enough credit. He made people believe there were wolves roaming the halls at the Olympics!

Now, why did we believe? We believed because the narrative was already set. Journalists were complaining about the conditions at Sochi. So when the wolf video appeared and it was endorsed by an Olympic athlete no less, well, we fell for it. It matched our expectations.

It’s not about the truth, it’s about it making sense.

Experience, Belief and Marketing

Adventure Time Demon Cat

So how does our desire to believe connect to marketing? Marketers should be figuring out how to create a narrative and set expectations.

Content marketing is popular right now because it provides us the opportunity to shape expectations.

I’ve written quite a bit about how to maximize attention. If you only remember one thing from that piece it’s that we’re constantly rewriting our memory.

Every interaction we have with a site or brand will cause us to edit that entry in our head, if even just a little. Each time this happens marketers have an opportunity to change the narrative and reset expectations.

For a restaurant this means that a bad meal, even after having numerous good ones in the past, can have a serious impact on that patron’s perception and propensity to return. I used to love eating at Havana, a nearby Cuban restaurant. My wife and I had many great meals (and Mojitos) there. But about a year ago we had a sub-par dinner.

Because we’d had so many good meals before we wrote it off as an aberration. This is an important thing to understand. Because what it really means is that we felt like our experience didn’t match our expectation. But instead of changing our expectation we threw away that experience. You should get a raise if you’re able to pull this off as a marketer.

We returned a few months later and it was another clunker. This time we came to the conclusion that the food quality had simply taken a nose dive. We haven’t been back since. Our perception and expectation changed in the span of two bad experiences.

Content, in any form, follows the same rules. Consistently delivering content that reinforces or rewrites a positive brand expectation is vital to success.

Know The Landscape

Beached Whale Revealed In Painting

Our experiences create context and a marketer needs to understand that context, the landscape, before constructing a content strategy. Because it’s not about the truth. It’s about what people are willing to believe.

All too often I find companies who struggle with this concept. They have the best product or service out there but people are beating a path to their competitor(s) instead. It’s incomprehensible. They’re indignant. Their response is usually to double-down on the ‘but we’re the best’ meme.

Nearly ten years ago I was working at Alibris, a used, rare and out-of-print book site. Within the bookselling community the Alibris name was mud. The reason could be traced back to when Alibris entered the market. The Alibris CEO was blunt, telling booksellers that they would be out of business if they didn’t jump on the bandwagon.

He was right. But the way the message was delivered, among other things, led to a general negative perception of the brand among booksellers, a notoriously independent bunch. #middlefingersraised

How could I change this negative brand equity? Did I just tell sellers that we were awesome? No. Instead I figured out the landscape and used content and influencer marketing to slowly change the perception of the brand.

Our largest competitor was Abebooks. So I signed up as a seller there, which also gave me access to their community forum. It was here that I carefully read seller complaints about the industry and about Abebooks itself. What I came to realize was that many of their complaints were actually areas where Alibris excelled. Sellers just weren’t willing to see it because of their perception (or expectation) of the brand.

So every month in our seller newsletter I would talk about an Alibris feature that I knew would hit a nerve. I knew that it was a pain point for the industry or an Abebooks pet peeve. Inevitably, these newsletter items were talked about in the forums. At first the response went a little like this. “Alibris is still evil, but at least they’re doing something about this one thing.”

At the same time I identified vocal detractors of our brand and called them on the phone. I wanted them to vent and asked them what it would take for them to give Alibris a try. My secret goal was to change their perception of the brand, to humanize it, and neutralize their contribution to the negative narrative in the community.

It didn’t happen overnight but over the course of a year the narrative did change. Booksellers saw us as a brand trying to do right by them, perhaps ‘seeing the error of our ways’ and forging a new path. They gave us the benefit of the doubt. They grudgingly told stories about how sales on Alibris were similar to those on Abebooks.

I’d changed the narrative about the brand.

I didn’t do this through cheerleading. Instead, I led the community to content that slowly rewrote their expectations of Alibris. I never told them Alibris was better, I simply presented content that made them re-evaluate their perception of ‘Abebooks vs. Alibris’.

Influencer Marketing

Why do some of these fake stories take hold so quickly? The Sochi wolf had a respected Olympic athlete in on the gag. She was a trusted person, an influencer, with no real reason to lie.

Fake NASA Weightless Tweet

People wouldn’t have believed this false weightless claim if it hadn’t been delivered as a (spoofed) Tweet from NASA’s official Twitter account. Our eyes told us that someone in authority, the ultimate authority in this case, said it was true. That and we wanted to believe. Maybe this time in something amazing. Not aliens exactly but close.

So when we talk about influencer marketing we’re talking about enlisting others who can reinforce the narrative of your brand. These people can act as a cementing agent. It’s not so much about their reach (though that’s always nice) but the fact that it suddenly makes sense for us to believe because someone else, someone we trust or respect, agrees.

At that point we’re more willing to become evangelizers of the brand. That’s the true value of influencer marketing. People will actively start passing along that positive narrative to their friends, family and colleagues. If you’re familiar with the Net Promoter concept you can think of influencer marketing as a way to get people from passives (7-8) to promoters (9-10).

Influencer marketing converts customers into evangelizers who actively spread your brand narrative.

Justin Timberlake Is A Nice Guy?

Dick In a Box Screenshot

Take my opinion (and probably yours) of Justin Timberlake. He seems like a really nice guy, right? But I don’t know Justin. I’ve never met him and odds are neither have you. For all we know, he could be a raging asshole. But I think he isn’t because of a constant drip of content that has shaped my opinion of him.

He’s the guy who is willing to do crazy stuff and poke fun at himself on SNL. He goes on prom dates. He’s the sensitive guy who encourages a musician in a MasterCard commercial. He celebrates at Taco Bell. I don’t even like his music but I like him.

The next thing I want to say is that it probably helps that he really is a nice guy. But I honestly don’t know that! I want to believe that but I’m also sure he has a very savvy PR team.

Uber Is Evil?

Skepticism Intensifies

Uber is a great example of when you lose control of the narrative. A darling of the ‘sharing economy’, Uber might torpedo that movement because they’re suddenly seen as an uber-villain. (Sorry, I couldn’t help it.)

Once again, it’s about consistency. It’s about rewriting that perception. So taking a brand down doesn’t happen, generally, with just one gaffe. You have to step in it over and over again.

Uber’s done that. From placing fake orders and other dirty tricks against competitors, to threatening journalists, to violating user privacy, to surge pricing, to sexual assault, to verbal abuse of a cancer patient.

Suddenly, every Uber story fits a new narrative and expectation. Uber is evil. Is that the truth? Not really. Is it what we want to believe? Yup.

Uber screwed up numerous times but their negative brand equity is partly due to the landscape. There are enough people (me included) who aren’t keen on the sharing economy and who took Uber’s missteps as an opportunity to float an alternate narrative, attacking the sharing economy by proxy.

Either way, it became increasingly easy to get stories published that met this new expectation and increasingly difficult for positive experiences to see the light of day. This is explained incredibly well in a case study on Internet celebrity brought to my attention by Rand Fishkin.

The video is 19 minutes long, which is usually an eternity in my book. But this video is worth it. Every marketer should watch it all the way through.

A Content Marketing Framework

I realize that I use a number of terms almost interchangeably throughout this piece. In truth, there are wrinkles and nuance to these ideas. If they weren’t confusing then everyone would be a marketing god. But I want to provide a strawman framework for you to remember and try out.

Why Content Marketing Works

Our experience with content creates context or bias that changes our belief or perception of a brand, resulting in a new expectation when we encounter that brand again.

At any point in this journey a person can be exposed to a competitor’s content which can change context and bias. In addition, influencer marketing and social proof can help reinforce context and cement belief.

I’d love to hear your feedback on this framework and whether it helps you to better focus your marketing efforts.

TL;DR

The lesson marketers should be taking from the proliferation of fake news and images isn’t to create our own fake stories or products. Instead we should be deciphering why people believe and use that knowledge to construct more effective digital marketing campaigns.

Google Autocomplete Query Personalization

January 14 2015 // SEO // 22 Comments

The other day a friend emailed me asking if I’d ever seen behavior where Google’s autocomplete suggestions would change based on a prior query.

Lucifer from Battlestar Galactica

I’ve seen search results change based on prior queries but I couldn’t recall the autocomplete suggestions changing in the way he detailed. So I decided to poke around and see what was going on. Here’s what I found.

Query Dependent Autocomplete Example

Here’s the example I was sent. The individual was cleaning up an old computer and didn’t quite know the purpose of a specific program named ‘WineBottler’.

Search Result for WineBottler

Quickly understanding that he didn’t need this program anymore he began to search for ‘uninstall winebottler’ but found that Google’s autocomplete had beat him to it.

Query Dependent Google Autocomplete

There it was already listed as an autocomplete suggestion. This is very different from doing the uninstall query on a fresh session.

Normal Autocomplete Suggestions

I was intrigued. So I started to try other programs in hopes of replicating the query dependent functionality displayed. I tried ‘SnagIt’ and ‘Photoshop’ but each time I did I got the same generic autocomplete suggestions.

Query Class Volume

Coincidentally I was also chatting with Barbara Starr about an old research paper (pdf) that Bill Slawski had brought to my attention. The subject of the paper was identifying what I call query classes, templates of sorts expressed as a root term plus a modifier. Easy examples might be ‘[song] lyrics’ or ‘[restaurant] menu’.

So what does this have to do with autocomplete suggestions? Well, my instinct told me that there might be a query class of ‘uninstall [program]’. I clicked over to Ubersuggest to see if I just hadn’t hit on the popular ones but the service was down. Instead I landed on SERPs Suggest which was handy since it also brought in query volume for those autocomplete suggestions.
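If you want to pull these suggestions yourself, here’s a bare-bones sketch using the unofficial suggest endpoint that tools in this space appear to rely on. It isn’t a supported API, so treat it as illustrative and subject to change:

    import json
    import urllib.parse
    import urllib.request

    def autocomplete_suggestions(query):
        # Unofficial, undocumented endpoint; it may change or disappear.
        url = ("https://suggestqueries.google.com/complete/search"
               "?client=firefox&q=" + urllib.parse.quote(query))
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read().decode("utf-8", errors="replace"))
        return data[1]  # the second element holds the suggestion strings

    for suggestion in autocomplete_suggestions("uninstall"):
        print(suggestion)

Note that this returns generic, non-personalized suggestions, so it won’t reproduce the session-based behavior described in this post.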

I searched for ‘uninstall’ and scrolled to where the results were making the most sense to me.

SERPs Suggests Keyword Tool

Quite obviously there is a query class around ‘uninstall [program]’. Now it was time to see if those with high volume (aka intent) would trigger the query class based autocomplete suggestions.

Query Class Based Autocomplete Suggestions

The scourge of the pop-under world, MacKeeper, jumped out at me so I gave that one a try.

MacKeeper Search Result

Google Autocomplete for Uninstall after MacKeeper query

Sure enough the first autocomplete suggestion is uninstall mackeeper. It’s also interesting to note the prior query is kept as a reference in the URL. This isn’t new. It’s been like that for quite some time but it makes this type of scenario far easier to explain.

At random I tried another one from my target list.

Parallels Search Results

Uninstall Autocomplete after Parallels Query

Yup. Same thing.

Classes or Attributes?

It got me thinking, though, whether this was about query classes or just attributes of an entity. So I poked around a bit more and was able to find examples in the health field. (Sorry to be a Debbie Downer.) Here’s a search for lymphoma.

Lymphoma Search Results

Followed by a search for treatment.

Autocomplete for Treatment after Lymphoma Query

This differs from a clean search for ‘treat’.

Treat Autocomplete Suggestions

Treatment is an attribute of the entity Lymphoma. Then again ‘treatment of [ailment]’ is also a fairly well-defined query class. So perhaps I’m splitting hairs in trying to pry apart classes from attributes.

It Doesn’t Always Work

I figured I could find more of these quickly and selected a field that I thought had many query classes: music. Search for a band, then search for something like ‘tour dates’ or ‘tickets’ and see if I could get the query dependent autocomplete suggestions to fire.

I tried Kasabian.

Kasabian Search Results

And then tour dates.

Tour Dates Autocomplete Suggestions

Nothing about Kasabian at all. Just generic tour dates autocomplete suggestions. I tried this for many other artists including the ubiquitous Taylor Swift and got the same results, or lack thereof.

I had a few theories of why music might be exempted but it would all just be conjecture. But it did put a bit of a dent into my next leap in logic, which would have been to conversational search.

Not Conversational Search

One of the bigger components of Hummingbird was the ability to perform conversational search that, often, wouldn’t require the user to reference the specific noun again. The classic example being ‘How tall is the Eiffel Tower?’ ‘Who built it?’

Now in the scheme of things conversational search is, in part, built upon identifying query classes and how people string them together in a query session. So it wouldn’t be a shock if this started showing up in Google’s autocomplete suggestions. Yet that’s not what appears to be happening.

Because you can do a voice search using Google Now for ‘Kasabian’ and then follow up with ‘tickets for them’ and get a very different and relevant set of results. They figure out the pronoun reference and substitute appropriately to generate the right query: ‘Kasabian Tickets’.
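As a toy illustration of that substitution (and only a toy; Google’s actual coreference handling is obviously far more sophisticated), imagine swapping a pronoun in the follow-up query for the entity from the prior query:

    # Toy example only. Google's conversational search is far more
    # sophisticated than a pronoun swap.
    PRONOUNS = {"it", "them", "they", "he", "she", "him", "her"}

    def rewrite_followup(previous_query, followup_query):
        words = followup_query.lower().split()
        rewritten = [previous_query if word in PRONOUNS else word for word in words]
        return " ".join(rewritten)

    print(rewrite_followup("Kasabian", "tickets for them"))  # tickets for Kasabian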

What Does Google Say?

Of course it pays to see what Google says about their autocomplete predictions.

About Google Autocomplete Predictions

I find it interesting that they call them predictions and not suggestions. It’s far more scientific. More Googly. But I’m not changing my references throughout this piece!

But here we can see a probable mash-up of “search activity of users” (aka query classes) and “relevant searches you’ve done in the past” (aka query history). Previously, the query history portion was more about ensuring that my autocomplete for ‘smx’ might start with ‘smx east’.

Personalized Autocomplete

While the autocomplete for someone unaffiliated with search wouldn’t get that suggestion.

Nonpersonalized Autocomplete

So I’m left to think that this session-based autocomplete personalization is relatively new, though it may have been going on for quite some time without many people noticing.

There’s a lot more research that could be done here so please let me know if and when you’ve noticed this feature as well as any other examples you might have of this behavior.

For Google the reason for doing this is easy. It’s just one more way that they can reduce the time to long click.

TL;DR

Google is personalizing autocomplete suggestions based on a prior query when it matches a defined query class or entity attribute.

Image Blind

December 16 2014 // Analytics + SEO // 16 Comments

Images are an increasingly important part of the Internet landscape. Yet marketers are provided very little in the way of reliable metrics to allow us to understand their power and optimize accordingly. This is doubly strange given the huge amount of research going on regarding images within search engine giants such as Google.

Image Tracking In Google Analytics

There is none. Or at least there is no image search tracking in Google Analytics unless you create filters based on referrers. I wrote about how to track image search in Google Analytics in March of 2013 and updated that post in April of 2014.

The problem with this method is that it is decreasing in usefulness. I still use it and recommend it because some visibility is better than none. But when Chrome removed the referrer completely from these clicks earlier this year it really hurt the accuracy of the filter.
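For reference, the filter approach boils down to classifying referrers. Here’s a simplified sketch; the patterns are examples rather than an exhaustive list, and as noted above, many of these clicks (notably from Chrome) now arrive with no referrer at all:

    # The referrer hints below are examples, not an exhaustive list.
    IMAGE_SEARCH_HINTS = ("imgres", "images.google.")

    def is_image_search_referrer(referrer):
        referrer = (referrer or "").lower()
        return any(hint in referrer for hint in IMAGE_SEARCH_HINTS)

    referrers = [
        "https://www.google.com/imgres?imgurl=example",
        "https://www.google.com/search?q=cosmic+cat",
        "",  # no referrer at all
    ]
    print([is_image_search_referrer(r) for r in referrers])  # [True, False, False]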

Who cares, you might be asking. I care because image search intent and the resulting user behavior are often wildly different from web search.

Google Image Search Traffic Behavior

The users coming to the site above via web search have vastly different behavior metrics than those coming from image search. I’ve highlighted the dramatic pages per visit and time on site metrics. Shouldn’t we be building user stories and personas around this type of user?

For a while I explained away the reasons for not providing image search tracking in Google Analytics under the umbrella of privacy. I understand that Google was pretty much forced to move to ‘not provided’ because of lawsuits, Gaos v. Google Inc. in particular. I get it.

But I’m with Chris Messina. Privacy shouldn’t be a four letter word. And the one company who has the best chance of changing the conversation about it is Google. But let’s not go down the privacy rabbit hole. Because we don’t have to.

Right now Google Analytics provides other data on how people search. They break things down by mobile or tablet. We can even get down to the device level.

Google Analytics by Device

Are we really saying that knowing the user came in via image search is more identifiable than what device they were using? Both are simply different pieces of metadata about how a user searched.

Furthermore, on both web and image search I can still drill down and see what page they landed on. In both instances I can make some inferences on what term was used to get them to that page.

There is no inherent additional data being revealed by providing image search as a source.

Image Clicks in Google Webmaster Tools

I wouldn’t be as frothed up about this if it was just Google Analytics. Because I actually like Google Analytics a lot and like the people behind it even more.

But then we’ve got to deal with Google Webmaster Tools data on top and that’s an even bigger mess. First let’s talk about the dark pattern where looking at your search queries data automatically applies the Web filter. #notcool

Default Web Filter for Search Queries in GWT

I’m sure there’s an argument that it’s prominent enough and might even draw the user’s attention. I could be persuaded. But defaults are dangerous. I’d hazard there are plenty of folks who don’t even know that you can see this data with other filters.

And a funny thing happens with sites that have a lot of images (think eCommerce) when you look at this data. It doesn’t make an ounce of sense.

What happens if I take a month’s worth of image filtered data and a month’s worth of web filtered data and then compare that to the actual data reported in Google Analytics?

Here’s the web filtered data which is actually from November 16 to December 14. It shows 369,661 Clicks.

GWT Web Filter Example

Now here’s the image filtered data from the same time frame. It shows 965,455 Clicks.

GWT Image Filter Traffic Graph

Now here’s what Google Analytics reports for the same timeframe.

Google Analytics Traffic Comparison

For those of you slow on the uptake, the image click data from Google Webmaster Tools is more than the entire organic search traffic reported! Not just Google but organic search in total. Put web and image together and we’re looking at 1.3 million according to Google Webmaster Tools.
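If you want to run the same sanity check on your own site, it’s just a matter of comparing the filtered Google Webmaster Tools totals to the organic sessions Google Analytics reports for the same window (the GA figure below is a placeholder; plug in your own export):

    gwt_web_clicks = 369_661
    gwt_image_clicks = 965_455
    ga_organic_sessions = None  # fill in from your own GA export

    gwt_total = gwt_web_clicks + gwt_image_clicks
    print(f"GWT web + image clicks: {gwt_total:,}")  # 1,335,116

    if ga_organic_sessions:
        ratio = gwt_total / ga_organic_sessions
        print(f"GWT clicks are {ratio:.1f}x GA organic sessions")
        # Anything far from ~1x for a Google-dominated site deserves scrutiny.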

I’m not even going to get into the ratio of image clicks versus web clicks and how they don’t have any connection to reality when looking at the ratio in Google Analytics. Even taking the inaccuracy of the Google Analytics filters into account it points to one very clear truth.

The image click data in Google Webmaster Tools is wonky.

So that begs the question. What exactly is an image click? It doesn’t seem to be limited to clicks from image search to that domain. So what does it include?

This blog is currently number three for the term ‘cosmic cat’ in image search (#proud) so I’ll use that as an example.

What Is an Image Click?

Do image clicks include clicks directly to the image, which are generally not on that domain and not counted in most traffic packages including Google Analytics? Maybe. But that would mean a lot of people were clicking on a fairly small button. Not impossible but I’d put it in the improbable bucket.

Or do image clicks include any time a user clicks to expand that image result? This makes more sense given what I’m seeing.

But that’s lunacy. That’s comparing apples to oranges. How does that help a marketer? How can we trust the data in Google Webmaster Tools when we encounter such inconsistencies?

Every webmaster should be inquiring about the definition of an image click.

The definition (of sorts) provided by Google in their support documentation doesn’t help.

GWT Search Queries FAQ

The first line is incorrect and reflects that this document hasn’t been updated for some time. (You know, I hear care and attention to detail might be a quality signal these days.) There’s a line under the devices section that might explain the image click bloat, but it’s attributed to devices rather than to image search.

Long story short, the documentation Google Webmaster Tools provides on this point isn’t helpful. (As an aside, I’d be very interested in hearing from others who have made the comparison of image filter and web filter clicks to Google Analytics traffic.)

Images During HTTPS Conversion

These problems came to a head during a recent HTTP to HTTPS conversion. Soon after the conversion the client involved saw a sizable decline in search traffic. Alarm bells went off and we all scrambled to figure out what was going on.

This particular client has a material amount of images so I took the chart data from both HTTP and HTTPS for web and image clicks and graphed them together.
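For anyone who wants to replicate that view, here’s roughly how the combined chart could be assembled, assuming you’ve exported daily clicks for each property as CSVs. The file and column names are assumptions for illustration, not anything Google Webmaster Tools enforces:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical export files, one per property/filter combination.
    files = {
        "http_web": "http_web.csv",
        "http_image": "http_image.csv",
        "https_web": "https_web.csv",
        "https_image": "https_image.csv",
    }

    series = []
    for label, path in files.items():
        df = pd.read_csv(path, parse_dates=["date"])
        series.append(df.set_index("date")["clicks"].rename(label))

    combined = pd.concat(series, axis=1).fillna(0)
    combined["web_total"] = combined["http_web"] + combined["https_web"]
    combined["image_total"] = combined["http_image"] + combined["https_image"]

    combined[["web_total", "image_total"]].plot(
        title="Web vs. image clicks across the HTTPS conversion")
    plt.show()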

Exasperated Picard

In doing so the culprit in the decline post conversion was clearly image traffic! Now, some of you might be thinking that this shows how the Google Webmaster Tools data is just fine. You’d be wrong! The data there is still incorrect. It’s just wrong consistently enough for me to track fluctuations. I’m glad I can do it but relying on consistently bad data isn’t something I’m cheering about.

The conclusion here seems to be that it takes a long time to identify HTTPS images and match them to their new HTTPS pages. We’re seeing traffic starting to return but it’s slower than anyone would like. If Google wants sites to convert to HTTPS (which they do) then fixing this image search bottleneck should be a priority.

Image Blind?

I'm Mad as Hell And ...

The real problem here is that I was blindsided due to my lack of visibility into image search. Figuring out the cause took a fair amount of man hours because the metrics that would have told us what was going on weren’t readily available.

Yet in another part of the Googleplex they’re spending crazy amounts of time on image research.

Google Image Advancements

I mean, holy smokes Batman, that’s some seriously cool work going on. But then I can’t tell image search traffic from web search traffic in Google Analytics and the Google Webmaster Tools data often shows more ‘image clicks’ to a site than total organic traffic to the site in the same time period. #wtf

Even as Google is appropriately moving towards the viewable impressions metric for advertisers (pdf), we marketers can’t make heads or tails of images, one of the most important elements on the web. This needs to change.

Marketers need data that they can both rely on and trust in to make fact based decisions.

TL;DR

Great research is being done by Google on images but they are failing marketers when it comes to image search metrics. The complete lack of visibility in Google Analytics coupled with ill-defined image click data in Google Webmaster Tools leaves marketers in the dark for an increasingly important type of Internet content.
