
Tracking Hidden Long-Tail Search Traffic

January 25 2018 // Analytics + SEO // 9 Comments

A lot of my work is on large consumer-facing sites. As such, they get a tremendous amount of long-tail traffic. That’s right, long-tail search isn’t dead. But you might think so when you look at Google Search Console.

Hidden Search Traffic

I’ve found there’s more data in Google Search Console than you might believe. Here’s what I’m doing to track hidden long-tail search traffic.

Traffic Hazards

The first step in understanding how to track long-tail search is to make sure you’re not making mistakes in interpreting Google Search Console data.

Last year I wrote about the dangers of using the position metric. You can only use it reliably when looking at it on the query level and not the page level.

Today, I’m going the other direction. I’m looking at traffic by page but will be doing so to uncover a new type of metric – hidden traffic.

Page Level Traffic

The traffic for a single page in Google Search Console is comprehensive. That’s all the traffic to a specific page in that time frame.

Page Level Metrics from Google Search Console

But a funny thing happens when you look at the query level data below this page level data.

Query Level Data for a Page in Google Search Console

The numbers by query do not add up to the page level total. I know the first reaction many have is to curse Google and write off the data as being bad. But that would actually be a bad idea.

The difference between these two numbers is the set of queries that Google is suppressing because they are either too small or personally identifiable. The difference between the page total and the visible total is your hidden long-tail traffic.

Calculating Hidden Traffic

Finding the amount of hidden long-tail traffic turns out to be relatively easy. First, download the query level data for that page. You’ll need to make sure that you don’t have more than 1,000 rows or else you won’t be able to properly count the visible portion of your traffic.

Once downloaded you calculate the visible total for those queries.

Visible Total for Page Level Queries

So you’ll have a sum of clicks, a sum of impressions, a calculated clickthrough rate and a weighted average for position. The last one seems to trip a lot of folks up, so here’s that calculation in detail.

Weighted Position = SUM(Impressions * Position) / SUM(Impressions)
What this means is you’re getting the sum product of impressions and rank and then dividing that by the sum of impressions.
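The math above can be sketched in a few lines of Python. The rows here are hypothetical stand-ins for a query-level export from Search Console, not real data.

```python
# Visible totals for a page's query-level export (hypothetical rows).
visible = [
    {"clicks": 120, "impressions": 4000, "position": 2.1},
    {"clicks": 15, "impressions": 900, "position": 5.4},
    {"clicks": 3, "impressions": 350, "position": 8.9},
]

visible_clicks = sum(r["clicks"] for r in visible)
visible_impressions = sum(r["impressions"] for r in visible)
visible_ctr = visible_clicks / visible_impressions

# Weighted average position: sum of (impressions * position)
# divided by the sum of impressions.
visible_position = (
    sum(r["impressions"] * r["position"] for r in visible) / visible_impressions
)
print(visible_clicks, visible_impressions, round(visible_position, 2))
```

The weighting matters because a query with 4,000 impressions should pull the blended position toward its rank far more than one with 350.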

Next you manually put in the page total data we’ve been provided. Remember, we know this represents all of the data.

The clicks are easy. The impressions are rounded in the new Search Console. I don’t like that and I hope it changes. For now, you could revert to the old version of Search Console if you’re only looking at data from the last 90 days.

(Important! The current last 7 days option in Search Console Beta is actually representative of only 6 days of data. WTF!)

From there I calculate and validate the CTR. Last is the average position.

To find the hidden long-tail traffic, all you have to do is subtract the visible total from the page total. You only do that for clicks and impressions. Do not do that for CTR, folks. You do the CTR calculation on the hidden click and impression numbers.
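Here’s a minimal sketch of that subtraction, again with hypothetical totals. Note that CTR is recomputed from the hidden clicks and impressions rather than subtracted.

```python
# Hidden traffic = page-level totals minus visible query totals.
# All numbers are hypothetical stand-ins for Search Console data.
page_clicks, page_impressions = 180, 7100        # from the page-level report
visible_clicks, visible_impressions = 138, 5250  # summed from the query export

hidden_clicks = page_clicks - visible_clicks
hidden_impressions = page_impressions - visible_impressions

# Never subtract CTRs directly; derive CTR from the hidden numbers.
hidden_ctr = hidden_clicks / hidden_impressions
print(hidden_clicks, hidden_impressions, round(hidden_ctr, 4))
```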

Finally, you calculate the weighted position for the hidden traffic. It’s just a bit of algebra at the end of the day. Here’s the equation.

Hidden Position = (Total Impressions * Total Position - Visible Impressions * Visible Position) / Hidden Impressions
What this is doing is taking (page total impressions * page total position) minus (visible impressions * visible position) and dividing that by the hidden impressions to arrive at the hidden position.
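That algebra looks like this in Python, continuing with the same hypothetical totals as before.

```python
# Total impressions * total position must equal the visible piece plus
# the hidden piece, so solve for the hidden position. Hypothetical inputs.
page_impressions, page_position = 7100, 4.0
visible_impressions, visible_position = 5250, 3.12
hidden_impressions = page_impressions - visible_impressions

hidden_position = (
    page_impressions * page_position
    - visible_impressions * visible_position
) / hidden_impressions
print(round(hidden_position, 2))
```

With these inputs the hidden queries sit around position 6.5, noticeably worse than the visible average of 3.12, which is exactly the kind of gap you’d expect from long-tail terms.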

The last thing I’ve done here is determine the percentage of clicks and impressions that are hidden for this page.

Hidden Traffic Total for Page Level Traffic

In this instance you can see that 26% of the traffic is hidden and … it doesn’t perform particularly well.

Using The Hidden Traffic Metric

This data alone is interesting and may lead you to investigate whether you can increase your long-tail traffic in raw numbers and as a percentage of total traffic. It can be good to know which pages rely on the narrower set of visible queries and which pages draw from a larger number of hidden queries.

In fact, when we had full keyword visibility there was a very predictable metric around number of keywords per page that mapped to increases in authority. It still happens today, we just can’t easily see when it happens.

But one of the more interesting applications is in monitoring these percentages over time.

Comparing Visible and Hidden Traffic Over Time

What happens to these metrics when a page loses traffic? I took two time periods of equal length and then determined the percentage loss for visible, total and hidden traffic.
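One way to sketch that comparison, using hypothetical click totals for the two periods:

```python
# Percentage loss per cohort between two equal-length periods (hypothetical data).
period_1 = {"total": 10000, "visible": 7400}
period_2 = {"total": 8300, "visible": 5750}

# Hidden traffic is whatever the visible queries don't account for.
for p in (period_1, period_2):
    p["hidden"] = p["total"] - p["visible"]

loss = {
    k: (period_1[k] - period_2[k]) / period_1[k]
    for k in ("total", "visible", "hidden")
}
print({k: f"{v:.0%}" for k, v in loss.items()})
```

With these numbers the visible cohort drops about 22% while the hidden cohort barely moves, which is the first of the two patterns described below.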

In this instance the loss was almost exclusively in visible traffic. The aggregate position number (dangerous to rely on for specificity but good for finding the scent of a problem) leads me to believe it’s a ranking problem for visible keywords. So my job is to look at specific keywords to find which ones dropped in rank.

What really got me curious was when the opposite happens.

Hidden Traffic Loss

Here the page suffered a 29% traffic loss but nearly all of it was in hidden traffic. My job at that point is to figure out what type of long-tail queries suddenly evaporated. This isn’t particularly easy but there are clues in the visible traffic.

When I figured it out things got very interesting. I spent the better part of the last three months doing additional analysis along with a lot of technical reading.

I’ll cover the implications of changes to hidden traffic in my next post.

Caveats and Traps

Slow Your Roll

This type of analysis is not particularly easy and it does come with a fair number of caveats and traps. The first is the assumption that the page level data we get from Google Search Console is accurate and comprehensive. I’ve been told it is and it seems to line up to Google Analytics data. #ymmv

The second is that the data provided at the query level is consistent. In fact, we know it isn’t since Google made an update to the data collection and presentation in July of 2017.

Google Search Analytics Data Changes

Mind you, there were some other things that happened during that time and if you were doing this type of analysis then (which is when I started in earnest) you learned quite a bit.

You also must select a time period for that page that doesn’t have more than 1,000 visible queries. Without knowing the full visible total you can’t calculate your hidden total. Finding the right timeframe can sometimes be difficult when looking at high-volume pages.

One of the traps you might fall into is assuming that the queries in each bucket remain stable. That’s not always the case. Sometimes the composition of visible queries changes. And it’s hard to know whether hidden queries were promoted to visible or vice versa.

There are ways to control for some of this in terms of the total number of visible terms along with looking at not just the raw change in these cohorts but the percentage changes. But it can get messy sometimes.

In those situations it’s down to interpretation. Use that brain of yours to figure out what’s going on.

Next Steps and Requests

Shia Labeouf Just Do It

I’ve been playing with this metric for a while now but I have yet to automate the process. Adjacent to automation is the 1,000 visible query limit, which can be eliminated by using the API or tools like Supermetrics and/or Data Studio.
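For anyone taking up the automation challenge, here’s a sketch of how paging past the 1,000-row limit might look with the Search Analytics API (webmasters v3). The request body fields (startDate, endDate, dimensions, rowLimit, startRow) are real API parameters, but the fetch callable here is a hypothetical stand-in for the real service.searchanalytics().query(...).execute() call.

```python
def query_body(page_url, start_date, end_date, start_row, row_limit=25000):
    """Build a Search Analytics request body for one page's queries."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["query"],
        "dimensionFilterGroups": [{
            "filters": [{"dimension": "page", "expression": page_url}]
        }],
        "rowLimit": row_limit,
        "startRow": start_row,
    }

def all_rows(fetch, page_url, start_date, end_date, row_limit=25000):
    """Yield every query row by advancing startRow until a short page returns."""
    start_row = 0
    while True:
        rows = fetch(query_body(page_url, start_date, end_date, start_row, row_limit))
        yield from rows
        if len(rows) < row_limit:
            break
        start_row += row_limit
```

Once you have every row, the visible totals can be summed without worrying about the truncation, and the hidden math works the same way.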

While performing this analysis on a larger set of pages would be interesting, I’ve found enough through this manual approach to keep me busy. I’m hopeful that someone will be excited to do the work to automate these calculations now that we have access to a larger data set in Google Search Console.

Of course, none of that would be necessary if Google simply provided this data. I’m not talking about the specific hidden queries. We know we’re never getting that.

Just give us a simple row at the end of the visible query rows that provides the hidden traffic aggregate metrics. An extra bonus would be to tell us the number of keywords that compose that hidden traffic.

After publishing this, John Mueller reminded me that this type of presentation is already integrated into Google Analytics if you have the Search Console integration.

The presentation does most of what is on my wishlist.

Other term in Google Analytics Search Console Integration

Pretty cool, right? But it would be nice if (other) instead said (167 other search queries). The real problem with this is the data. It’s not comprehensive. Here’s the downloaded data for the page above, including the (other) row.

GA Search Console Data Incomplete

It’s an interesting subset of the hidden queries but it’s incomplete. So fix the data discrepancy or port the presentation over into Search Console and we’re good. :-)


You can track hidden long-tail search traffic using Google Search Console data with some straightforward math. Understanding and monitoring hidden traffic can help diagnose ranking issues and other algorithmic shifts.

What I Learned in 2017

January 18 2018 // Career + Life + SEO // 36 Comments

(This is a personal post so if that isn’t your thing then you should move on.)

2017 was a lot like 2016, but on steroids. That meant a 40% increase in the business, which unfortunately came with a lot more stress and angst. I did figure some things out and managed to make some decisions that I plan to put into practice in 2018.

Nothing Succeeds Like Success 

How Did I Get Here?

Last year I was finally comfortable calling Blind Five Year Old a success. I’d made it. But that came with a lot of strange baggage that I wasn’t entirely sure how to handle.

It was uncomfortable to write about how success can be difficult when you know that others are struggling. But I can only write about my own experience and acknowledge that some might take my words the wrong way.

Trust me, I understand that these are good problems. But they are problems nonetheless. In 2017 those problems grew. The very healthy income I had maintained for the past four years rose by 40%.

I stared at the run rate throughout the year kind of dumbfounded. For real? That much! It’s not that I lacked confidence and didn’t think I’d make it. The number was just beyond what I expected.

Money and Happiness

ABC 12 inch Art

Money is a strange beast. One of my favorite pieces last year was When life changing money, isn’t by Wil Reynolds. He captured a great deal of what I’ve struggled with over the past few years.

I’m at a place where bills aren’t a problem and I can essentially do what I want to do. My daughter needs a new tennis racquet, I buy one. Should we go out for dinner tonight? Why not. Want to vacation on the beach in Maui? Book it!

The ability to do these things makes me very different from a majority of people and that scares me.

The thing is, I don’t need a whole lot more. I’m not looking to get a better house or a better car. I don’t have a need to buy crazy expensive clothing. Hell, I spend most of my days in sweats behind this computer.

More money isn’t inherently bad. I mean, I do live in one of the most expensive areas in the country and I am all about putting more towards retirement and college. But both of those are now on track so the extra money doesn’t actually do that much more.

More money hasn’t made me happier.

Time and Stress

The additional work created a lot more pressure. There’s less time and more expectations. That combination doesn’t translate into more happiness. Not at all.

Not Enough Time In The Day

It might if I just wanted to coast on reputation and churn out the minimum amount of work required to keep the money rolling in. But I’m not wired like that.

I’m not looking to mollify and appease, I’m looking to transform and build. Each client is different and requires research and due diligence to determine how to best tackle their search and business issues.

I feel the obligation of being a good partner and of delivering results. I don’t like cashing checks when a client’s business isn’t moving in the right direction.


Cool Hand Luke Failure To Communicate

I find it hard to respond quickly to something I believe requires greater thought. That means I’m slow and frequently don’t communicate well. I’ve come to the conclusion that this is a feature and not a bug.

Can I get better at telling people when I’m taking more time than they want? Yes. But I know it won’t go away completely. I’ll often slip into a cycle of not responding and then putting off responding until I have something more material and when I don’t the guilt increases and the response then must be that much better so I delay … again.

I do this less now than I used to. But I know it’ll still happen from time to time and I’m tired of feeling bad about that. Some clients just aren’t a match for my work style. And that’s okay.

Referrals and Relief

Bruce Sutter and Rollie Fingers Baseball Card

Much of what I describe above is why I continue to receive referrals. Good work gets noticed and in an industry rife with pretenders people happily promote those who truly get the work done.

I love referrals. But they also come loaded with additional stress, because you don’t want to let the person referring you down. It’s not lost on me that they have enough confidence in me to trust me with one of their own connections.

What I’ve found in the last year is that more of these people understand the bind I’m in. I have only so much time and I’m not always the right person for a business. I specialize in large scale B2C sites like Pinterest and Genius. It’s not that I can’t do B2B. I just don’t enjoy it as much.

So they tell me up front that it might not be a match or they might even ask if further referrals are helping me or not. I tell you, it’s an incredible relief when referrals are put in this context.

I usually still take those calls though. I learned that just having a conversation with a referred lead is valuable. I don’t have to be the solution. I can help determine what they really need and can sometimes connect them with colleagues who I trust will do a good job on their behalf.

I become a link on a chain of expertise and trust. This is a highly valuable and scarce commodity.

Expert or Prima Donna

Separated M&Ms

The crux for me was in understanding my value. Not only understanding it but believing in it. Do I deserve that lawyer-like hourly rate? I don’t do a lot of hourly work now but I find it a good way to help more folks without the overhead of stress.

Lawyers have a defined set of expertise that many others don’t. Hopefully they also have a track record of success. So how does that compare to my business? The law is relatively stable and transparent. But search is the opposite. It changes and it is not transparent in the slightest.

Of course two lawyers can interpret the law differently, just as two SEOs can interpret search differently. But more so today than ever, the lack of information in our industry – or pure disinformation – puts a premium on connecting with true experts.

It’s not just finding someone who can help you figure out your search issues. It’s preventing them from following bad advice and throwing good money after bad.

My default is to say that I’m lucky to be in a position where I have more business than I can handle. But it’s not really luck. I put in the time and I get the results. I work hard and am constantly looking to keep my edge. What is it that I’m not seeing?

I use this as context to explain why I’m not willing to relinquish my work style. And I’m trying to recognize that it doesn’t make me a prima donna. It simply acknowledges that I’m an expert in my field and that I want to be happy.

It’s uncomfortable to charge a high rate and dictate specific terms of engagement. It’s like the Van Halen rider where they demanded M&Ms but no brown ones. I guess you can do worse than being the search equivalent of David Lee Roth. Particularly if you know the history around that famous rider.

Letting Angst Go

Let It Go

2017 was about embracing my value and believing in my expertise. It was about letting my own misgivings and angst go so that I can do the work I enjoy and be happy doing it.

Perhaps this sounds easy to some. But it hasn’t come easily for me. While I don’t gain validation from others, I don’t want to be one of those people who are out of touch and difficult to work with.

I absolutely dropped the ball on some leads and some clients in 2017. Never to the point where it hurt their business. But people were annoyed. I am truly sorry for that but … I no longer feel (overly) guilty about it.

I wanted to do the best work. I took on too much. I tried my best. I’ll wake up and try my best tomorrow.

I’ve learned to say no more often and not feel guilty about it or feel like it’s a missed opportunity. I’m not looking to build an agency and scale up. I’m a high-touch consultant with limited time.

Raising Rates and Changing Retainers

Based on this I raised my rates. It’s the second time I’ve done that in the last three years. And I did it because one of my clients told me I should. It’s nice when clients are looking out for you as much as you are for them.

I also decided to remove the hourly maximum in my retainer agreements. In the past, I had a clause that essentially ensured that a client wouldn’t monopolize my time under a retainer agreement. I built in an hourly maximum just in case.

The problem was that by having that hourly maximum they were always thinking of the retainer in terms of the number of hours worked. That wasn’t what I was about. It isn’t about time. It’s about expertise and results.

This video on How To Price Design Services spoke to me so clearly.

I didn’t watch the entire video. I mean, who has 36 minutes! But that one segment was enough for me to know that it wasn’t the hours people should be paying for but the expertise.

This made a huge difference because I no longer have dreary conversations about whether I dedicated enough hours to support the retainer. I hate those conversations. They make me angry. So now I don’t have them.

Advisor Gigs


I also sought out more advisor positions in 2017. I didn’t quite nail down how to best structure these engagements. And I did a lousy job of juggling those relationships versus my traditional relationships.

But that’s how you figure this stuff out. You stub your toe and move on trying not to make those same mistakes again. 2018 already looks good on this front with a number of interesting relationships where I can leverage my expertise in search and marketing.

I built most of my long term client relationships on trust and adding business value beyond traditional search. And while I may take advising positions based on my primary expertise I’m looking for those that value my larger knowledge set and insight from scores of clients over the past ten years.

I’ve learned quite a bit about what makes one start-up succeed where others fail.

Continuous Education

Change is always a constant in search. And I’d say that the rate of change is increasing. I’m lucky to work with some incredible technical teams. So when they say something I don’t quite understand I don’t just nod along.

I ask them to explain it. I tell people when I don’t know something. I’ll tell people I know enough to know something is off but not enough to tell them exactly what’s wrong. This is how you build expertise and gain trust.

And in 2018 I’ve asked a few developers I trust to take an afternoon to talk to me like a five year old about JavaScript frameworks and how they deliver content to the page. Now, I understand the topic. But I want to learn more.

One of my assets has been to have enough technical knowledge to know when someone is blowing smoke up my nether regions. A lot of what I ask people to do (instrumentation) is boring. As such, many developers inflate the complexity of those tasks. Asking a few pointed questions quickly reduces that inflation and gets the work done.

I don’t feel like I have that level of confidence on JavaScript frameworks. I can tell half of the developers I work with have a similar level of knowledge to my own. And when a developer admits as much we can easily collaborate, debate difficult questions and figure things out. But many developers aren’t going to admit to ‘good enough’ knowledge.

Learning more is always a priority.


Ain't Nobody Got Time For That

On the other hand, I can’t do everything. I sometimes want to but there’s simply not enough time in the day. This blog needs a makeover and I’ll have to get someone else to do it. I have to let my tinkering ways go so I can grow and focus on other projects.

And there are other projects in the works. In the past I’ve had ideas, purchased domains and thought about building one thing or another. Great ideas! But they never went anywhere. A constant flow of domain renewal email notices reminds me of the missed opportunities.

The biggest obstacle in those projects was … me. I wanted to do it all. I wanted to build the actual site, which might require learning a new programming language and database. And then I’d need to actually write all the content and do all the marketing and promotion.

Ain’t nobody got time for that.

Well, maybe some people do but I’m not one of them. Even though I could, and part of me thinks it would be fun if I did, I shouldn’t spend my time that way. So I’m working with folks to spin up two sites and one potential tool.

Risk and Danger

Old School Risk Board and Pieces

I expect that it will be difficult for me to let go of some details. I’m guessing the projects will be messy, confusing, aggravating and hopefully rewarding in one way or the other. But honestly, there are specific lyrics from Contrails by Astronautalis that remain my guiding star.

The real risk is not a slipped grip at the edge of the peak
The real danger is just to linger at the base of the thing

Every time I take a risk I am happy I did so. I can’t tell you that it always worked out. But in some ways … it did, with enough time and perspective.

In each failure, I can pick out how that helped get me to where I am today. I’m not saying things couldn’t have been easier. They could have. I just decide to find the positive out of those situations.

That’s not some saccharine ‘everything happens for a reason’ tripe. Screw that. I can just tell a story where the ending is … happy. I have cancer but it’s one that’s easily treatable. That’s a win in my book.

Telling myself those stories and deciding that I’d rather dwell on what turned out right instead of wrong helps me take the next risk. It’s my job to listen to that restless itch and move my story forward knowing I may need to do some editing in post production.


Observation Deck Viewer

There were a lot of industry changes last year that had a meaningful impact on my business. I made a resolution to criticize less, so I wavered on adding these observations because they’re not particularly rosy.

But the following things shaped my year from how I approach search analysis, to how I gain additional knowledge to how I educate clients.

The Google we knew is not the Google we’re dealing with today

I’ve been lucky to meet and talk with a number of Googlers throughout the years. They are overwhelmingly good people trying to do the right thing by users. The energy and passion they have around search is … inspiring.

But Matt Cutts left and Amit Singhal was replaced by John Giannandrea as the head of search. That doesn’t seem like a lot. But if you put your ear to the tracks and read the tea leaves you recognize that this was a massive change in direction for Google.

Machine learning is front and center and it’s an essential part of Google’s algorithm.

It’s not that good, passionate people aren’t still at Google. They are. But the environment is certainly different. We’re talking about people, experts in their field, given new direction from a new boss. How do you think you’d feel?

I believe understanding the people who work on search is an asset to understanding search. That’s more true today than ever.

Industry Content Is Lacking

I struggle to find good content to read these days. We lost our best investigative journalist last year along with another passionate and smart editor. Danny Sullivan and Matt McGee are sorely missed.

I used to take great pride in curating the industry and Tweeting out the best I could find each day. It was a steady stream of 2 or 3 Tweets a day. Now … it’s maybe twice a week. Maybe I’m just over-the-hill and not finding the new voices? Maybe I’m not dedicating enough time to combing Feedly?

But I’m discouraged when I open up a top trends of 2018 post (which I know is a mistake) and see ‘water is wet’ statements like ‘featured snippets will be important’ and ‘voice search is on the rise’.

Instead of bemoaning the bad, I would like to point out folks like Paul Shapiro for great technical content and Cyrus Shepard, who seems to have taken up the mantle of curating our industry. There are other great specialists like Bill Slawski and Bartosz Góralewicz out there contributing but … there are too few of them for my taste.

And there are others who clearly have knowledge but aren’t sharing right now. I’m not going to call them out. Hell, I’d be calling myself out too. I think they’re all busy with work and life. Being industry famous doesn’t make their lives better. In fact, it causes more problems. I get it, but I wish we all had more time to move the conversation forward.

More data isn’t the problem, it’s the lack of interpretation and analysis. 

The conversations I see happening in the industry are often masturbatory and ego driven. Someone has to be right and someone has to be wrong. Real debate and true exploration seem like an endangered species.

For instance, knowing that Google is relying heavily on machine learning, shouldn’t the industry be looking at analyzing algorithmic changes in a different way?

Today, changes in rank are often tied to an update in the mapping of vectors to intent that renders a different mix of content on results. One can watch over many months as they test, learn and adapt on query classes in pursuit of optimal time to long click metrics.

I find the calcification of search truth to be dangerous given the velocity of changes inherent in our vertical. At the same time, the newest things don’t replace the tried and true. It’s these contradictions that make our industry interesting!

Beyond that, many are working off of a very limited data set. The fact that something worked for you on the one site that you tried it on might not mean much. Of course, we’ve also seen people with much larger data sets make mistakes in interpretation.

And that’s where things seem to have gone off the tracks. I don’t mind correlation studies. They provide another point of data for me to consider among a large number of other data points. I assign the findings from each correlation study a weight based on all of my other knowledge.

That means that some will receive very little weight and others more based on my understanding of how they were conducted and what I see in practice across my client base. We don’t need less data, less content or fewer tactics. We need to better understand the value of each and how they combine to help achieve search success.

As a result I see far more appetite for hiring growth engineers over SEOs largely because they’re willing to test and adapt instead of proselytize.

The Things That Matter

I’m cancer free! It’s been nearly three years now. And in 2017 I couldn’t use recovery as an excuse for my eating habits. So I lost 25 pounds.

For those interested, there’s no real magic to losing weight. Journal your food and take in fewer calories than you burn. It’s not always fun or easy but it works.

I gained 10 of that back in the last few months of the year. This was partly because I lost my tennis partners, which meant no calorie-burning exercise cushion to allow me a few days of indulgence each week.

Thankfully, my daughter is now finally getting back to tennis after physical therapy for a patellar subluxation, which is a partial dislocation of the kneecap. Her second in two years.

It turns out her thigh bone doesn’t have as deep a divot for her kneecap. It’s nearly flat, which means she’s prone to dislocations. The orthopedist mentioned that this also meant that when it did slip out it wouldn’t hurt nearly as much. Seems I’m not the only one who can tell a story that relies on the positive versus the negative. #callback

My wife, on the other hand, has tennis elbow, which is far more painful than she or I realized. She’ll be undergoing a procedure soon in hopes that it helps her tendon to bounce back and heal fully.

Things are actually quite good despite all this and the fact that my daughter is a teenager (yikes) and my wife just had sinus surgery. I’m around and I’m happier, which I hope is as infectious as this year’s flu.

Google Index Coverage Report

October 23 2017 // Analytics + SEO // 16 Comments

Google’s new Index Coverage report lets you “Fix problems that prevent your URLs from being optimally indexed by Google Search.”

As it stands the report delivers a huge increase in visibility, creates a host of new metrics to track and requires new sitemap configurations. But the real treasures are what you learn when you dig into the data.

Index Coverage Report

The Index Coverage report is a Google Search Console public beta that provides details on the indexation status for pages on a site or in a specific sitemap or sitemap index. It’s essentially a mashup of Index status and Sitemaps on steroids.

You’ll know if you have access if you have a ‘Try the new Search Console’ link at the top of the left hand navigation in Search Console.

A handful of my clients are part of this public beta. I wish more were. I asked for additional client access but was turned down. So if you don’t have this link, I can’t help you gain access to the beta.

Instead, I hope to provide a decent overview of the functionality that may or may not wind up being launched. And later on I’ll show that the data this report contains points to important optimization strategies.

Clicking on that ‘Try’ link sends you to the new look Search Console.

Index Coverage Report Entry Page

Clicking on the Index Coverage line gives you the full report. The top of the page provides a general trend in a stacked bar graph form for each status as defined by Google.

Index Coverage Full Report

The bottom of the page gives you the details within each status.

Index Coverage Full Report Bottom

Clicking on any of those rows provides you with a sample list of 1,000 pages.

Index Coverage Sample Pages

You can download this data, which I did as you’ll see later. You can also filter these pages by ‘Page’ or ‘Last crawled’ date.

Index Coverage Download and Filter Options

This is particularly handy if you have named folders or even patterned syntax (e.g. – condos-for-rent vs houses-for-rent) that you filter on and determine the ratio of content within the sample provided.

You can choose to see this data for all known pages, all submitted pages or for an individual sitemap or sitemap index that is at the top level in your Google Search Console account.

Index Coverage Report by Sitemap

One thing to note here is that you must click the Excluded tab to add that to the report. And you’ll want to since there’s some interesting information in that status.

Indexation Status

The first thing to know here is that you get a lot of new terminology regarding the status of your URLs. Frankly, I think this is overkill for the vast majority of site owners. But I’m thrilled that the search community might get this level of detail.

Google classifies the status of a page into four major categories.

Index Coverage Status Definition Key

The Error and Warning areas are fairly straightforward so I’m not going to go into much detail there. Instead I want to cover the two major sub-status definitions for Valid pages.

Index Coverage Valid Definitions

Indexed, Low interest. Well hello there! What is this? It felt very much like a low or thin content signal. Visions of Pandas danced in my head.

I spent a lot of time looking at the sample pages in the Indexed, Low interest status. Sometimes the sample pages for this status made sense and other times they didn’t. I couldn’t quite figure out what made something low interest.

One client looked at the traffic to these two cohorts using the sample data across a number of sitemaps. The results for a seven day period were stunning.

The pages in Submitted and Indexed delivered 4.64 visits per page.

The pages in Indexed, Low interest delivered 0.04 visits per page.
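
That comparison is simple to reproduce. A minimal sketch, assuming you’ve downloaded the sample pages for each status and pulled a seven-day visit total for those URLs from your analytics export (the page lists and visit counts below are hypothetical stand-ins):

```python
# Sketch: visits-per-page for each index status cohort.
# Cohort URL lists and visit totals are hypothetical.

def visits_per_page(total_visits, pages):
    """Average visits per page across a cohort of sample URLs."""
    return round(total_visits / len(pages), 2)

# Hypothetical sample-page downloads, 1,000 URLs per cohort.
submitted_and_indexed = ["/page-%d" % i for i in range(1000)]
low_interest = ["/page-%d" % i for i in range(1000, 2000)]

# Hypothetical seven-day visit totals for each cohort.
print(visits_per_page(4640, submitted_and_indexed))  # 4.64
print(visits_per_page(40, low_interest))             # 0.04
```

The point isn’t the arithmetic; it’s that joining the sample-page download to your analytics data lets you quantify how differently each status cohort performs.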

Kirk Jazz Hands

It’s pretty clear that you want to avoid the Indexed, Low interest status. I imagine Google holding their nose while indexing it and keeping it around just in case they need to resort to it for some ultra long-tail query.

In contrast, the Submitted and Indexed status is the VIP of index status and content. If your content falls into this status it will translate into search success.

The other status that drew my attention was Excluded.

Index Coverage Report Excluded Sub Status Definitions

There are actually a lot more than pictured but the two most often returned are Crawled and Discovered – currently not indexed.

Reading the definitions of each it’s essentially Google giving the single bird and double bird to your content respectively. Crawled means they crawled it but didn’t index it with a small notation to ‘don’t call us, we’ll call you’.

Discovered – currently not indexed seems to indicate that they see it in your sitemap but based on how other content looks they’re not even going to bother crawling it. Essentially, “Ya ugly!” Or, maybe it’s just a representation of poor crawl efficiency.

Frankly, I’m not entirely sure that the definition of Discovered is accurate since many of the sample URLs under this status have a Last crawled date. That seems to contradict the definition provided.

And all of this is complicated by the latency in the data populating these reports. As of the writing of this post the data is 20 days behind. No matter the specific meaning, content with this status is bad news.

Indexation Metrics

New data leads to new calculated metrics. Sure you can track the trend of one status or another. But to me the real value is in using the data to paint a picture of health for each type of content.

Index Coverage Metrics

Here I have each page type as a separate sitemap index allowing me to compare them using these new metrics.

The ‘Valid Rate’ here is the percentage of pages that met that status. You can see the first has a massive Valid Rate while the others don’t. Not by a long shot.

But the metric I really like is the percentage Submitted and Indexed in relation to total Valid pages. In other words, of the pages that earn the Valid status, how many of them are the ‘good’ kind?

Here again, the first page type not only gets indexed at a high rate but the pages that do get indexed are seen as valuable. It’s the next two page types that show why this type of analysis is valuable.

Both of those page types have the same Valid Rate, but one has a better chance of being seen as valuable than the other based on its percentage Submitted and Indexed.

I can then look at the percentage Discovered and see that there’s a large number of pages that might be Valid if they were crawled. With this in mind I’d work on getting the page type with the higher percentage Submitted and Indexed crawled more frequently, since I have a 1 in 4 chance of those being ‘good’ pages.
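
The two calculated metrics can be sketched in a few lines. The counts per page type below are hypothetical stand-ins for the Index Coverage report numbers:

```python
# Sketch: derive 'Valid Rate' and the share of Valid pages that are
# Submitted and Indexed for each page type. Counts are hypothetical.

def indexation_metrics(submitted, valid, submitted_and_indexed):
    valid_rate = valid / submitted                 # pages indexed at all
    pct_good = submitted_and_indexed / valid       # share of Valid that's the 'good' kind
    return round(valid_rate, 3), round(pct_good, 3)

# page type: (submitted, valid, submitted_and_indexed) -- hypothetical
page_types = {
    "type_a": (10000, 9200, 8800),
    "type_b": (10000, 4100, 2900),
    "type_c": (10000, 4100, 1025),  # same Valid Rate as type_b, worse quality mix
}

for name, counts in page_types.items():
    valid_rate, pct_good = indexation_metrics(*counts)
    print(name, valid_rate, pct_good)
```

Note how type_b and type_c share a Valid Rate yet diverge sharply on the second metric, which is exactly the situation described above.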

Here’s an alternate way one client used to look at each sitemap and determine the overall value Google sees in each.

Index Coverage Metrics Matrix

It’s the same general principle but they’re using a ratio of Submitted and Indexed to Low interest to determine general health for that content.

It remains to be seen exactly what metrics will make the most sense. But the general guidance here is to measure the rate at which content is indexed at all and once indexed what percentage is seen as valuable.

Sitemap Configuration

I’ve long been a proponent of optimizing your sitemaps to gain more insight into indexation by page type. That usually meant having a sitemap index with a number of sitemaps underneath all grouped by page type.

The current Index Coverage report will force changes to this configuration if you want to gain the same level of insight. Instead of one sitemap index with groups of sitemaps representing different page types you’ll need a separate sitemap index for each page type. For smaller sites you can have a separate sitemap at the top level for each page type.

This is necessary since there is no drill down capability from a sitemap index to individual sitemap within the tool. And even if there were, it would be difficult to aggregate all of this data across multiple sitemaps.

Instead, you’ll use the sitemap index to do all of the aggregation for you. So you’d have a sitemap index for each page type and might even make them more granular if you thought there was a material difference on the same page type (e.g. – rap lyrics versus rock lyrics).

Don’t worry, you can have multiple sitemap index files in your account (at least up to 500 I believe) so you’ll have plenty of room for whatever scheme you can cook up.
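
Generating one sitemap index per page type is straightforward to automate. A minimal sketch following the sitemaps.org protocol; the domain and file names are hypothetical:

```python
# Sketch: emit a separate sitemap index per page type so the Index
# Coverage report can be segmented by content type.
# Domain and sitemap file names below are hypothetical.

SITEMAP_INDEX_TMPL = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{entries}
</sitemapindex>"""

def build_sitemap_index(base_url, sitemap_files):
    """Build a sitemap index referencing the given sitemap files."""
    entries = "\n".join(
        "  <sitemap><loc>%s/%s</loc></sitemap>" % (base_url, f)
        for f in sitemap_files
    )
    return SITEMAP_INDEX_TMPL.format(entries=entries)

# One index per page type, e.g. rap lyrics vs rock lyrics.
xml = build_sitemap_index(
    "https://example.com",
    ["rap-lyrics-1.xml", "rap-lyrics-2.xml"],
)
print(xml)
```

You’d run this once per page type and submit each resulting index separately in Search Console so each gets its own Index Coverage breakdown.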

Defining Low Interest

I got very interested in determining why a page would wind up in the low interest bucket. At first glance I figured it might just be about content. Essentially a Panda signal for thin or low value content.

But the more I dug the more I realized it couldn’t just be a content signal. I kept seeing pages that were very similar showing up in both Indexed, Low Interest and Submitted and Indexed. But I needed a more controlled set of content to do my analysis.

And then I found it.

Index Coverage Report Example

This sitemap contains state level pages for nursing homes. There are 54 in total because of Washington D.C., Guam, Puerto Rico and The Virgin Islands.

These pages are essentially navitorial pages meant to get users to the appropriate city of choice. What that means is that they are nearly identical.

Index Coverage Submitted and Indexed Example

Index Coverage Low Interest Example

Which one do you think is the low interest page? Because one of them is and … one of them is not. Do you think you could figure that out simply from the text on the page?

This defined set of content allowed me to easily compare each cohort to see if there were any material differences. I downloaded the pages for each cohort and used a combination of Google Keyword Planner, ahrefs and SEMrush to compile metrics around query volume, backlinks and keyword difficulty.

The query class I used to calculate these metrics is ‘nursing homes in [state]’.

Query Metrics

Query Metrics for Index Coverage Comparison

The volume is slightly higher for the Submitted and Indexed group but that’s skewed by Google grouping ‘va nursing homes’ into the Virginia query. This means folks potentially looking for Veterans Affairs nursing homes would fall into this query.

Low volume and high volume queries fall into both cohorts so I tend to think query volume isn’t a material difference. I added number of results to the mix after seeing the discrepancy between the two cohorts.

I found it a bit odd that there were fewer results for higher volume queries. I’m not sure what to make of this. Could there be a higher bar for content where there is a larger number of results? Further investigation is necessary but it didn’t jump to the top of my list.

Link Metrics

Index Coverage Comparison Link Metrics

The link metrics from ahrefs show no material difference. Not only that but when I look at the links they’re all rather similar in nature. So I find it hard to believe that one set had better topical links or more trusted links than another from a Google perspective.

Keyword Difficulty Metrics

Index Coverage Comparison Difficulty Metrics

Here again there wasn’t a material difference. Even more so if I account for the fact that Texas spiked higher at the time because of the flooding of nursing homes due to Hurricane Harvey.

Now, I wouldn’t be taking you down this road if I didn’t find something that was materially different. Because I did.

Crawl Metrics

I’ve long been a proponent of crawl efficiency and crawl optimization. So it was interesting to see a material difference in the reported last crawl for each cohort.

Index Coverage Comparison Crawl Date Metrics

That’s a pretty stark difference. Could crawl date be a signal? Might the ranking team think so highly of the crawl team that pages that aren’t crawled as often are deemed less interesting? I’ve often thought something like this existed and have had offline talks with a number of folks who see similar patterns.
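
The comparison itself is easy to run from the sample-page downloads, which include a ‘Last crawled’ column. A sketch with hypothetical dates:

```python
# Sketch: average days since last crawl for each cohort, using the
# 'Last crawled' column from the Index Coverage sample-page downloads.
# All dates below are hypothetical.
from datetime import date

def avg_days_since_crawl(last_crawled, as_of):
    """Mean staleness (in days) of a cohort's last-crawl dates."""
    deltas = [(as_of - d).days for d in last_crawled]
    return sum(deltas) / len(deltas)

as_of = date(2017, 10, 1)
submitted_and_indexed = [date(2017, 9, 25), date(2017, 9, 28), date(2017, 9, 30)]
low_interest = [date(2017, 8, 1), date(2017, 8, 20), date(2017, 9, 2)]

print(avg_days_since_crawl(submitted_and_indexed, as_of))  # crawled recently
print(avg_days_since_crawl(low_interest, as_of))           # materially staler
```

If the Low interest cohort consistently shows a much higher average, that’s the stark crawl-date gap described here.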

But that’s still just scuttlebutt really. So what did I do? I took one of the pages that was in the Low interest cohort and used Fetch as Google to request indexing of that page.

Sure enough when the data in the Index Coverage report was updated again that page moved from Low interest to Submitted and Indexed.

So, without any other changes Google was now reporting that a page that had previously been Low interest was now Submitted and Indexed (i.e. – super good page) based solely on getting it crawled again.

I'm Intrigued

Now, the data for the Index Coverage report has been so woefully behind that I don’t yet know if I can repeat this movement. Nor do I know how long that page will remain in Submitted and Indexed. I surmise that after a certain amount of time it will return back to the Low interest cohort.

Time will tell.

[Updated on 10/24/17]

The Index Coverage report data updated through October 6th. The update revealed that my test to get another page moved from Indexed, Low interest to Submitted and Indexed through a Fetch as Google request was successful. The prior page I moved also remains in Submitted and Indexed.

Strangely, a third page moved from Indexed, Low interest to Submitted and Indexed without any intervention. It’s interesting to see that this particular state was an outlier in that Low interest cohort in terms of engagement.

Good Engagement Moves Content

[Updated on 11/9/17]

On October 20, the first page I fetched moved back from Submitted and Indexed to Indexed, Low Interest. That means it took approximately 22 days for the ‘crawl boost’ (for lack of a better term) to wear off.

On October 31, the second page I fetched moved back from Submitted and Indexed to Indexed, Low Interest. That means it took approximately 26 days for the ‘crawl boost’ to wear off.

It’s hard to get an exact timeframe because of how infrequently the data is updated. Each update covers a span of days that all take on the same data point. If that span is 7 days I have no clear idea of when that page truly moved down.

From the data, along with some history with crawl analysis, it seems like the ‘crawl boost’ lasts approximately three weeks.

It should be noted that both URLs did not seem to achieve higher rankings nor drive more traffic during that ‘crawl boost’ period. My assumption is that other factors prevented these pages from fully benefitting from the ‘crawl boost’.

Further tests would need to be done with content that didn’t have such a long and potentially negative history. In addition, testing with a page where you’ve made material changes to the content would provide further insight into whether the ‘crawl boost’ can be used to rehabilitate pages.

[Updated on 11/20/17]

The data is now current through November 11th and a new wrinkle has emerged. There are now 8 URLs in the Excluded status.

Index Coverage Trend November 11, 2017

One might think that they were all demoted from the Indexed, Low Interest section. That would make sense. But that’s not what happened.

Of the 6 URLs that are now in the Crawled status, three are from Indexed, Low Interest but three are from Submitted and Indexed. I’m not quite sure how you go from being super awesome to being kicked out of the index.

And that’s pretty much what Excluded means when you look at the information hover for that status.

Index Coverage Report Excluded Hover Description

The two other URLs that dropped now have the status Submitted URL not selected as canonical. Sure enough, it’s represented by one from Indexed, Low Interest and one from Submitted and Indexed.

There’s what I believe to be new functionality as I try to figure out what URL Google has selected as the canonical.

Index Coverage Page Details

None of it actually helps me determine which URL Google thinks is better than the one submitted. It’s interesting that they’ve chosen to use the info: command given that the functionality of this operator was recently reduced.

And that’s when I realize that they’ve changed the URLs for these pages from /local/nursing-homes-in-[state] to /local/states/nursing-homes-in-[state]. They did this with a 301 (yay!) but didn’t update the XML sitemap (boo!).

This vignette is a prime example of what it means to be an SEO.

It also means using these pages as a stable set of data has pretty much come to an end. However, I’ll poke the client to update the XML sitemaps and see what happens just to see if I can replicate the original breakdown between Submitted and Indexed and Indexed, Low Interest.

Internal Links

How did Google decide not to crawl the low interest cohort group as frequently? Because while the crawl might be some sort of recursive signal there are only a few ways it could arrive at that decision in the first place.

We know the content is the same, the links are the same and the general query volume and keyword difficulty are the same. Internal links could come into play but there are breadcrumbs back to the state page on every city and property page.

So logically I’d hazard that a state like California would have far more cities and properties, which would mean that the number of internal links would be higher for that state than for others. The problem? California is in the Low interest cohort. So unless having more links is worse I don’t think this is material.

But, when in doubt you keep digging.

The internal links report doesn’t show all of the state pages but what it does show is certainly interesting. Of the 22 state pages that do show up on this report only 2 of them fall into the Low interest cohort.

So that means 20 of the original 30 Submitted and Indexed (66%) had reported internal link density while only 2 of the original 24 Low interest (8%) had reported internal link density. That’s certainly a material difference!

By comparison, a Screaming Frog crawl shows that the actual internal link counts differ in the way I expected, with larger states having more links than smaller ones.

Index Coverage Screaming Frog Internal Links

Those highlighted fall into the Low interest cohort. So there doesn’t seem to be a connection based on internal link density.

But let’s return to that Internal links report. It’s always been a frustrating, though valuable, report because you’re never quite sure what it’s counting and how often the data is updated. To date I only knew that making that report look right correlated highly with search success.

This new information gives rise to a couple of theories. Is the report based on the most recent crawl of links on a site? If so, the lower crawl rate for those in the Low interest cohort would produce the results seen.

Or could the links to those Low interest pages be deemed less valuable based on the evaluation of that page? We already know that Google can calculate the probability that a person will click on a link and potentially assign value based on that probability. So might the report be a reflection of Google’s own valuation of the links they find?

Unfortunately there are few definitive answers though I tend to think the Internal links report oddity is likely driven by the crawl date discrepancy between the two cohorts.

Engagement Metrics

So I’m again left with the idea that Google has come to some conclusion about that cohort of pages that is then informing crawl and potentially internal link value.

Some quick regex and I have Google Analytics data for each cohort back to 2009. Yeah, I’ve got 8 years of data on these suckers.

Index Coverage Comparison Engagement Metrics

The engagement metrics on the Low interest cohort are materially worse than those on the Submitted and Indexed cohort.
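
Bucketing landing pages into the two cohorts for that analysis takes little more than a regex over the analytics export. A sketch using the state-page URL pattern from this post; the cohort assignments below are illustrative (the post notes California and Indiana landed in the Low interest cohort), not the full lists:

```python
# Sketch: assign landing pages to index-status cohorts with a regex so
# engagement metrics can be aggregated per cohort from an analytics
# export. The low-interest state list here is illustrative only.
import re

low_interest_states = {"indiana", "california"}
pattern = re.compile(r"/local/nursing-homes-in-([a-z-]+)$")

def cohort(landing_page):
    """Return the cohort for a landing page, or None if it's not a state page."""
    m = pattern.search(landing_page)
    if not m:
        return None
    state = m.group(1)
    return "low_interest" if state in low_interest_states else "submitted_and_indexed"

print(cohort("/local/nursing-homes-in-indiana"))  # low_interest
print(cohort("/local/nursing-homes-in-texas"))    # submitted_and_indexed
```

With each row tagged this way, averaging bounce rate, time on page and the like per cohort is a one-line group-by.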

Engagement, measured as some composite of adjusted click rate combined with a long click measurement, may be a factor in determining whether a page is of Low interest. It’s not the only factor but we’ve just ruled out a whole bunch of other factors.

“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

Now, you might make the case that ranking lower produces lower metrics. That’s possible but … I’m always wary when pretzel logic is introduced. Sure, sometimes our brain gets lazy and we make the easy (and wrong) connection, but we also often work too hard to explain away the obvious.

Here’s what I do know. Pages in the Low interest cohort are clearly being demoted.

Query Based Demotion

The first page returned for a search for ‘nursing homes in indiana’ is on page three and it isn’t the state page.

Query Example for Demoted Content

Google knows that this query is targeted toward the state of Indiana. There’s a local unit with Indiana listings and every other result on page one references the state of Indiana.

Now let’s do the same search but with the site: operator.

Index Coverage Site Query Example

Suddenly Google has the state page as the first result. Of course the site: query isn’t a perfect tool to identify the most relevant content for a given query. But I tend to believe it provides a ballpark estimate.

If the site: operator removes other signals and simply returns the most relevant content on that site for a given term the difference between what is returned with and without is telling.

Any way you look at it, Google has gone out of their way to demote this page and others in the Low interest cohort for this query class. Yet for pages in the Submitted and Indexed cohort these state pages rank decently on page one (4th or 5th generally).

Click Signals

Electric Third Rail Sign

The third rail of SEO these days is talking about click signals and their influence on rankings. I’ve written before about how the evidence seems to indicate Google does integrate this data into the algorithm.

There’s more I could add to that post and subsequent tests clients have done that I, unfortunately, can’t share. The analysis of these state pages provides further evidence that click data is employed. Even then, I acknowledge that it’s a small set of data and there could be other factors I’m missing.

But even if you don’t believe, behaving like you do will still help you succeed.

Other Index Coverage Findings

There are a number of other conclusions I’ve reached based on observing the data from multiple client reports.

Google will regularly choose a different canonical. Remember that rel=canonical is a suggestion and Google can and will decide to ignore it when they see fit. Stop canonical abuse and use 301 redirects (a directive) whenever possible.

Google sucks at dealing with parameters. I’ve said it over and over. Parameters are the devil. Googlebot will gorge itself on parameter-based URLs to the detriment of the rest of your corpus.

Google will ignore hreflang targeting for a country or language. The markup itself is brittle and many have struggled with the issue of international mismatches. You can actively see them doing this by analyzing the Index Coverage report data.

One of the more frustrating situations is when the local version of your home page isn’t selected for that localized search. For instance, you might find that your .com home page is displayed instead of your .br home page in Brazil.

If you believe that engagement is a signal this actually might make sense. Many home pages either give users an easy way to switch to a local domain or automatically redirect users based on geo-IP or browser language. If this is the case, clicks on a mismatched domain would still provide positive engagement signals.

Those clicks would still be long clicks!

The feedback loop would be telling Google that the .com home page was doing just swell in Brazil. So there’s no reason for Google to trust your hreflang markup and make the switch.

I’m not 100% convinced this is what is happening but … it’s a compelling argument.

Get Ready

There are a few things you can do to get ready for the full rollout of the Index Coverage report. The first is to reorganize your sitemap strategy so your sitemaps or sitemap index files are all at the top level, broken down by page type or whatever other scheme delivers value.

The second is to begin or refine tracking of engagement metrics such as modified bounce rate and specific event actions that may indicate satisfaction. I’m still working to determine what baseline metrics make sense. Either way, SEO and UX should be working together and not against each other.


The new Index Coverage report provides a new level of insight into indexation issues. Changes to your sitemap strategy will be required to take full advantage of the new data and new metrics will be needed to better understand how your content is viewed by Google.

Data from the Index Coverage report confirms the high value of crawl efficiency and crawl optimization. Additional analysis also provides further evidence that click signals and engagement are important in the evaluation and ranking of content.

Analyzing Position in Google Search Console

July 18 2017 // Analytics + SEO // 16 Comments

Clients and even conference presenters are using Google Search Console’s position wrong. It’s an easy mistake to make. Here’s why you should only trust position when looking at query data and not page or site data.


Google has a lot of information on how they calculate position and what it means. The content here is pretty dense and none of it really tells you how to read and when to rely on the position data. And that’s where most are making mistakes.

Right now many look at the position as a simple binary metric. The graph shows it going down, that’s bad. The graph shows it going up, that’s good. The brain is wired to find these shortcuts and accept them.

Search Analytics Site Level Trend Lies

As I write this there is a thread about there being a bug in the position metric. There could be. Maybe new voice search data was accidentally exposed? Or it might be that people aren’t drilling down to the query level to get the full story.

Too often, the data isn’t wrong. The error is in how people read and interpret the data.

The Position Problem

The best way to explain this is to actually show it in action.

Search Analytics Position Example

A week ago a client got very concerned about how a particular page was performing. The email I received asked me to theorize why the rank for the page dropped so much without them doing anything. “Is it an algorithm change?” No.

Search Analytics Position Comparison Day over Day

If you compare the metrics day over day it does look pretty dismal. But looks can be deceiving.

At the page level you see data for all of the queries that generated an impression for the page in question. A funny thing happens when you select Queries and look at the actual data.

Search Analytics Position Term Expansion

Suddenly you see that on July 7th the page received impressions for queries that were not well ranked.

It doesn’t take a lot of these impressions to skew your average position.
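
You can see why with a quick impression-weighted average, which is how Search Console aggregates position across queries. The figures here are hypothetical:

```python
# Sketch: how a burst of low-ranked impressions drags down a page's
# average position even when head terms haven't moved.
# All impression and position figures are hypothetical.

def avg_position(rows):
    """rows: list of (impressions, position) pairs for one page."""
    total_impressions = sum(i for i, _ in rows)
    weighted = sum(i * p for i, p in rows)
    return round(weighted / total_impressions, 1)

core_terms = [(900, 3.0), (100, 5.0)]   # stable head terms
print(avg_position(core_terms))          # 3.2

# Same page, next day, plus low-ranked long-tail impressions.
with_long_tail = core_terms + [(200, 45.0)]
print(avg_position(with_long_tail))      # 10.2
```

The head terms never moved, yet the page-level average ‘dropped’ from 3.2 to 10.2. That’s the position problem in miniature.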

A look at the top terms for that page shows some movement but nothing so dramatic that you’d panic.

Top Terms for a Page in Search Analytics

Which brings us to the next flaw in looking at this data. One day is not like the other.

July 6th is a Thursday and July 7th is a Friday. Now, usually the difference between weekdays isn’t as wide as it is between a weekday and a weekend but it’s always smart to look at the data from the same day in the prior week.

Search Analytics Position Week over Week

Sure enough it looks like this page received a similar expansion of low ranked queries the prior Friday.

There’s a final factor that influences this analysis. Seasonality. The time in question is right around July 4th. So query volume and behavior are going to be different.

Unfortunately, we don’t have last year’s data in Search Analytics. These days I spend most of my time doing year over year analysis. It makes analyzing seasonality so much easier. Getting this into Search Analytics would be extremely useful.

Analyzing Algorithm Changes

User Error

The biggest danger comes when there is an algorithm change and you’re analyzing position with a bastardized version of regex. Looking at the average position for a set of pages (i.e. – a folder) before and after an algorithm change can be tricky.

The average position could go down because those pages are now being served to more queries. And in those additional queries those pages don’t rank as high. This is actually quite normal. So if you don’t go down to the query level data you might make some poor decisions.

One easy way to avoid making this mistake is to think hard when you see impressions going up but position going down.

When this type of query expansion happens the total traffic to those pages is usually going up so the poor decision won’t be catastrophic. It’s not like you’d decide to sunset that page type.

Instead, two things happen. First, people lose confidence in the data. “The position went down but traffic is up! The data they give just sucks. You can’t trust it. Screw you Google!”

Second, you miss opportunities for additional traffic. You might have suddenly broken through at the bottom of page one for a head term. If you miss that you lose the opportunity to tweak the page for that term.

Or you might have appeared for a new query class. And once you do, you can often claim the featured snippet with a few formatting changes. Been there, done that.

Using the average position metric for a page or group of pages will lead to sub-optimal decisions. Don’t do it.

Number of Queries Per Page

Princess Unikitty Boardroom

This is all related to an old metric I used to love and track religiously.

Back in the stone ages of the Internet before not provided one of my favorite metrics was the number of keywords driving traffic to a page. I could see when a page gained enough authority that it started to appear and draw traffic from other queries. Along with this metric I looked at traffic received per keyword.

These numbers were all related but would ebb and flow together as you gained more exposure.

Right now Google doesn’t return all the queries. Long-tail queries are suppressed because they’re personally identifiable. I would love to see them add something that gave us a roll-up of the queries they aren’t showing.

124 queries, 3,456 impressions, 7.3% CTR, 3.4 position

I’d actually like a roll-up of all the queries that are reported along with the combined total too. That way I could track the trend of visible queries, “invisible” queries and the total for that page or site.
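
In the meantime you can approximate the ‘invisible’ roll-up yourself by subtracting the visible query rows from the page-level totals, the same hidden-traffic calculation described earlier. A sketch with hypothetical numbers:

```python
# Sketch: roll up visible vs hidden (suppressed) query data for a page
# from the page-level totals and the downloaded query rows.
# All figures below are hypothetical.

def query_rollup(page_clicks, page_impressions, visible_rows):
    """visible_rows: list of (clicks, impressions) per reported query."""
    vis_clicks = sum(c for c, _ in visible_rows)
    vis_impressions = sum(i for _, i in visible_rows)
    return {
        "visible_queries": len(visible_rows),
        "hidden_clicks": page_clicks - vis_clicks,
        "hidden_impressions": page_impressions - vis_impressions,
    }

# Page-level totals vs the downloaded (visible) query rows.
visible = [(120, 1500), (60, 900), (20, 400)]
print(query_rollup(250, 3456, visible))
```

Tracked over time, the hidden share gives you the trend of long-tail traffic that Google won’t itemize.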

The reason the number of queries matters is that as that page hits on new queries you rarely start at the top of those SERPs. So when Google starts testing that page on an expanded number of SERPs you’ll find that position will go down.

This doesn’t mean that the position of the terms you were ranking for goes down. It just means that the new terms you rank for were lower. So when you add them in, the average position declines.

Adding the roll-up data might give people a visual signpost that would prevent them from making the position mistake.


Google Search Console position data is only stable when looking at a single query. The position data for a site or page will be accurate but is aggregated by all queries.

In general, be on the lookout for query expansion, where a site or page receives additional impressions on new terms where it doesn’t rank well. When the red line goes up and the green goes down that could be a good thing.

Ignoring Link Spam Isn’t Working

July 06 2017 // SEO // 37 Comments

Link spam is on the rise again. Why? Because it’s working. The reason it’s working is that demand is up based on Google’s change from penalization to neutralization.

Google might be pretty good at ignoring links. But pretty good isn’t good enough.

Neutralize vs Penalize

For a very long time Google didn’t penalize paid or manipulative links but instead neutralized them, which is a fancy way of saying they ignored those links. But then there was a crisis in search quality and Google switched to penalizing sites for thin content (Panda) and over optimized links (Penguin).

The SEO industry underwent a huge transformation as a result.

Google Trends for Content Marketing

I saw this as a positive change despite having a few clients get hit and seeing the industry throw the baby (technical SEO) out with the bathwater. The playing field evened and those who weren’t allergic to work had a much better chance of success.

Virtually Spotless

Cascade Print Ad

This Cascade campaign and claim is one of my favorites as a marketer. Because ‘virtually spotless’ means those glasses … have spots. They might have fewer spots than the competition but make no mistake, they still have spots.

This was Gary’s response to a Tweet about folks peddling links from sites like Forbes and Entrepreneur. I like Gary. He’s also correct. Unfortunately, none of that matters.

Pretty good is the same as virtually spotless.

Unless neutralization is wildly effective within the first month those links are found, it will ultimately lead to more successful link spam. And that’s what I’m seeing. Over the last year link spam has been working far more often, in more verticals and for more valuable keywords.

So when Google says they’re pretty good at ignoring link spam that means some of the link spam is working. They’re not catching 100%. Not by a long shot.


Lighting a Cigar with a 100 Dollar Bill

One of the issues is that, from a Google perspective, the difference might seem small. But to sites and to search marketing professionals, the differences are material.

I had a similar debate after Matt Cutts said there wasn’t much of a difference between having your blog in a subdomain versus having it in a subfolder. The key to that statement was ‘much of’, which meant there was a difference.

It seemed small to Matt and Google but if you’re fighting for search traffic, it might turn out to be material. Even if it is small, do you want to leave that gain on the table? SEO success comes through a thousand optimizations.

Cost vs Benefit

Perhaps Google neutralizes 80% of the link spam. That means that 20% of the link spam works. Sure, the overall cost for doing it goes up but here’s the problem. It doesn’t cost that much.

Link spam can be done at scale and be done without a huge investment. It’s certainly less costly than the alternative. So the idea that neutralizing a majority of it will help end the practice is specious. Enough of it works and when it works it provides a huge return.

It’s sort of like a demented version of index investing. The low fee structure and broad diversification mean you can win even if many of the stocks in that index aren’t performing.

Risk vs Reward

Get Out Jail Free Monopoly Card

Panda and Penguin suddenly made thin content and link spam risky. Sure it didn’t cost a lot to produce. But if you got caught, it could essentially put your site six feet under.

Suddenly, the reward for these practices had to be a lot higher to offset that risk.

The SEO industry moaned and bellyached. It’s their default reaction. But penalization worked. Content got better and link spam was severely marginalized. Those who sold the links were now offering link removal services. Because the folks who might buy links … weren’t in the market anymore.

The risk of penalty took demand out of the market.

Link Spam

I’m sure many of you are seeing more and more emails peddling links showing up in your inbox.

Paid Link Outreach Email

Some of them are laughable. Yet, that’s what makes it all the more sad. It shows just how low the bar is right now for making link spam work.

There are also more sophisticated link spam efforts, including syndication spam. Here, you produce content once with rich anchor text (often on your own site) and then syndicate that content to other platforms that will provide clean followed links. I’ve seen both public and private syndication networks deliver results.

I won’t offer a blow-by-blow of this or other link manipulation techniques. There are better places for that and others who are far more versed in the details.

However, a recent thread in the Google Webmaster Help forum around a PBN is instructive.

Black Hat Private Blog Networks Thread

The response by John Mueller (another guy I like and respect) is par for the course.

The tricky part about issues like these is that our algorithms (and the manual webspam team) often take very specific action on links like these; just because the sites are still indexed doesn’t mean that they’re profiting from those links.

In short, John’s saying that they catch a lot of this and ignore those links. In extreme cases they will penalize but the current trend seems to rely on neutralization.

The problem? Many of us are seeing these tactics achieve results. Maybe Google does catch the majority of this spam. But enough sneaks through that it’s working.

Now, I’m sure many will argue that there are other reasons a site might have ranked for a specific term. Know what? They might be right. But think about it for a moment. If you were able to rank well for a term, why would you employ this type of link spam tactic?

Even if you rationalize that a site is simply using everything at their disposal to rank, you’d then have to accept that fear of penalty was no longer driving sites out of the link manipulation market.

Furthermore, by letting link manipulation survive ‘visually’ it becomes very easy for other site owners to come to the conclusion (erroneous or not) that these tactics do work. The old ‘perception is reality’ adage takes over and demand rises.

So while Google snickers thinking spammers are wasting money on these links it’s the spammers who are laughing all the way to the bank. Low overhead costs make even inefficient link manipulation profitable in a high demand market.

I’ve advised clients that I see this problem getting worse in the next 12-18 months until it reaches a critical mass that will force Google to revert back to some sort of penalization.


Link spam is falling through the cracks and working more often as Google’s shift to ignoring link spam versus penalizing it creates a “seller’s market” that fuels link spam growth.

The Future of Mobile Search

August 29 2016 // SEO + Technology + Web Design // 17 Comments

What if I told you that the future of mobile search was swiping?

Google Mobile Search Tinderized

I don’t mean that there will be a few carousels of content. Instead I mean that all of the content will be displayed in a horizontal swiping interface. You wouldn’t click on a search result, you’d simply swipe from one result to the next.

This might sound farfetched but there’s growing evidence this might be Google’s end game. The Tinderization of mobile search could be right around the corner.

Horizontal Interface

Google has been playing with horizontal interfaces on mobile search for some time now. You can find it under certain Twitter profiles.

Google Twitter Carousel

There’s one for videos.

Google Video Carousel

And another for recipes.

Google Recipe Carousel

There are plenty of other examples. But the most important one is the one for AMP.

Google AMP Carousel

The reason the AMP example is so important is that AMP is no longer going to be served just in a carousel but will be available to any organic search result.

But you have to wonder how Google will deliver this type of AMP carousel interface with AMP content sprinkled throughout the results. (They already reference the interface as the ‘AMP viewer’.)

What if you could simply swipe between AMP results? The current interface lets you do this already.

Google AMP Swipe Interface

Once AMP is sprinkled all through the results wouldn’t it be easier to swipe between AMP results once you were in that environment? They already have the dots navigation element to indicate where you are in the order of results.

I know, I know, you’re thinking about how bad this could be for non-AMP content but let me tell you a secret. Users won’t care and neither will Google.

User experience trumps publisher whining every single time.

In the end, instead of creating a carousel for the links, Google can create a carousel for the content itself.


Accelerated Mobile Pages Project

For those of you who aren’t hip to acronyms, AMP stands for Accelerated Mobile Pages. It’s an initiative by Google to create near instantaneous availability of content on mobile.

The way they accomplish this is by having publishers create very lightweight pages and then caching them on Google servers. So when you click on one of those AMP results you’re essentially getting the cached version of the page directly from Google.

The AMP initiative is all about speed. If the mobile web is faster it helps with Google’s (not so) evil plan. It also has an interesting … side effect.

Google could host the mobile Internet.

That’s both amazing and a bit terrifying. When every piece of content in a search result is an AMP page Google can essentially host that mobile result in its entirety.

At first AMP was just for news content but as of today Google is looking to create AMP content for everything including e-commerce. So the idea of an all AMP interface doesn’t seem out of the question.

Swipes Not Clicks


Swipes Not Clicks

Why make users click if every search result is an AMP page? Seriously. Think about it.

Google is obsessed with reducing the time to long click, the amount of time it takes to get users to a satisfactory result. What better way to do this than to remove the friction of clicking back and forth to each site.

No more blue links.

Why make users click when you can display that content immediately? Google has it! Then users can simply swipe to the next result, and the next, and the next and the next. They can even go back and forth in this way until they find a result they wish to delve into further.

Swiping through content would be a radical departure from the traditional search interface but it would be vastly faster and more convenient.

This would work with the numerous other elements that bubble information up further in the search process such as Knowledge Panels and Oneboxes. Dr. Pete Meyers showed how some of these ‘cards’ could fit together. But the cards would work equally well in a swiping environment.

How much better would it be to search for a product and swipe through the offerings of those appearing in search results?

New Metrics of Success

Turn It On Its Head

If this is where the mobile web is headed then the game will completely change. Success won’t be tied nearly as much to rank. When you remove the friction of clicking the number of ‘views’ each result gets will be much higher.

The normal top-heavy click distribution will disappear, replaced by a more even ‘view’ distribution across the top 3-5 results. I’m assuming most users will swipe at least three times if not more but that there will be a severe drop off after that.

When a user swipes to your result you’ll still get credit for a visit by implementing Google Analytics or another analytics package correctly. But users aren’t really on your site at that point. It’s only when they click through on that AMP result that they wind up in your mobile web environment.

So the new metric for mobile search success might be getting users to stop on your result and, optimally, click-through to your site. That’s right, engagement could be the most important metric. Doesn’t that essentially create alignment between users, Google and publishers?

Funny thing is, Google just launched the ability to do A/B testing for AMP pages. They’re already thinking about how important it’s going to be to help publishers optimize for engagement.

Hype or Reality?

Is this real or is this fantasy?

Google, as a mobile first company, is pushing hard to reduce the distance between search and information. I don’t think this is a controversial statement. The question is how far Google is willing to go to shorten that distance.

I’m putting a bunch of pieces together here, from horizontal interfaces, to AMP to Google’s obsession with speed to come up with this forward looking vision of mobile search.

I think it’s in the realm of possibility, particularly since the growth areas for Google are in countries outside of the US where mobile is vastly more dominant and where speed can sometimes be a challenge.


When every search result is an AMP page there’s little reason for users to click on a result to see that content. Should Google’s AMP project succeed, the future of mobile search could very well be swiping through content and the death of the blue link.

RankBrain Survival Guide

June 09 2016 // SEO // 37 Comments

This is a guide to surviving RankBrain. I created it, in part, because there’s an amazing amount of misinformation about RankBrain. And the truth is there is nothing you can do to optimize for RankBrain.

I’m not saying RankBrain isn’t interesting or important. I love learning about how search works whether it helps me in my work or not. What I am saying is that there are no tactics to employ based on our understanding of RankBrain.

So if you’re looking for optimization strategies you should beware of the clickbait RankBrain content being pumped out by fly-by-night operators and impression hungry publishers.

You Can’t Optimize For RankBrain

You Can't Optimize For RankBrain

I’m going to start out with this simple statement to ensure as many people as possible read, understand and retain this fact.

You can’t optimize for RankBrain.

You’ll read a lot of posts to the contrary. Sometimes they’re just flat out wrong, sometimes they’re using RankBrain as a vehicle to advocate for SEO best practices and sometimes they’re just connecting dots that aren’t there.

Read on if you want proof that RankBrain optimization is a fool’s errand and you should instead focus on other vastly more effective strategies and tactics.

What Is RankBrain?

RankBrain is a deep learning algorithm developed by Google to help improve search results. Deep learning is a form of machine learning and can be classified somewhere on the Artificial Intelligence (AI) spectrum.

I think of Deep Learning as a form of machine learning where the algorithm can adapt and learn without further human involvement. One of the more interesting demonstrations of deep learning was the identification of cats (among other things) in YouTube thumbnails (pdf).

How Does RankBrain Work?

Knowing how RankBrain works is important because it determines whether you can optimize for it or not. Despite what you might read, there are only a handful of good sources of information about RankBrain.

Greg Corrado

The first is from the October 26, 2015 Bloomberg RankBrain announcement that included statements and summaries of a chat with Google Senior Research Scientist, Greg Corrado.

RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.

This makes it pretty clear that RankBrain uses vectors to better understand complex language.

Word2Vec is most often referenced when talking about vectors. And it should be noted that Jeff Dean, Greg Corrado and many others were part of this effort. You’ll see these same names pop up time and again surrounding vectors and deep learning.

I wrote a bit about vectors in my post on Hummingbird. In particular I like the quote from a 2013 Jeff Dean interview.

I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

I was really intrigued by the idea of Google knowing that two different sentences meant the same thing. And they’ve made a fair amount of progress in this regard with research around paragraph vectors (pdf).

Paragraph Vector Paper

It’s difficult to say exactly what type of vector analysis RankBrain employs. I think it’s safe to say it’s a variable-length vector analysis and leave it at that.
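We don’t know what vector analysis RankBrain actually uses, but the general idea of comparing text as vectors can be illustrated with cosine similarity. This is a toy sketch: the four-dimensional vectors below are made up by hand, whereas real embeddings have hundreds of dimensions learned from data.

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real systems learn these from data.
sentence_a = [0.9, 0.1, 0.4, 0.8]    # "how do I fix a flat bike tire"
sentence_b = [0.85, 0.15, 0.5, 0.75] # "repairing a punctured bicycle tyre"
sentence_c = [0.1, 0.9, 0.2, 0.05]   # "best chocolate cake recipe"

print(cosine_similarity(sentence_a, sentence_b))  # close to 1.0
print(cosine_similarity(sentence_a, sentence_c))  # much lower
```

Two differently worded sentences about the same thing land near each other in the vector space, which is exactly the capability Jeff Dean described: telling that two sentences mean the same thing even when written very differently.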

So what else did we learn from the Corrado interview? Later in the piece there are statements about how much Google relies on RankBrain.

The system helps Mountain View, California-based Google deal with the 15 percent of queries a day it gets which its systems have never seen before, he said.

That’s pretty clear. RankBrain is primarily used for queries not previously seen by Google, though it seems likely that its reach may have grown based on the initial success.

Unfortunately the next statement has caused a whole bunch of consternation.

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

This provoked the all-too-typical reactions from the SEO community. #theskyisfalling The fact is we don’t know how Google is measuring ‘importance’ nor do we understand whether it’s for just that 15 percent or for all queries.

Andrey Lipattsev

To underscore the ‘third-most important’ signal boondoggle we have statements by Andrey Lipattsev, Search Quality Senior Strategist at Google, in a Q&A with Ammon Johns and others.

In short, RankBrain might have been ‘called upon’ in many queries but may not have materially impacted results.

Or if you’re getting technical, RankBrain might not have caused a reordering of results. So ‘importance’ might have been measured by frequency and not impact.

Later on you’ll find that RankBrain has access to a subset of signals so RankBrain could function more like a meta signal. It kind of feels like comparing apples and oranges.

But more importantly, why does it matter? What will you do differently knowing it’s the third most important signal?

Gary Illyes

Another source of RankBrain information is from statements by Gary Illyes in conversation with Eric Enge. In particular, Gary has been able to provide some examples of RankBrain in action.

I mean, if you think about, for example, a query like, “Can you get a 100 percent score on Super Mario without a walk-through?” This could be an actual query that we receive. And there is a negative term there that is very hard to catch with the regular systems that we had, and in fact our old query parsers actually ignored the “without” part.

And RankBrain did an amazing job catching that and actually instructing our retrieval systems to get the right results.

Gary’s statements lend clear support to the idea that RankBrain helps Google to better understand complex natural language queries.

Paul Haahr

Paul Haahr Speaking at SMX West 2016

Perhaps the most interesting statements about RankBrain were made by Paul Haahr, a Google Ranking Engineer, at SMX West during his How Google Works: A Google Ranking Engineer’s Story presentation and Q&A.

I was lucky enough to see this presentation live and it is perhaps the best and most revealing look at Google search. (Seriously, if you haven’t watched this you should turn in your SEO card now.)

It’s in the Q&A that Haahr discusses RankBrain.

RankBrain gets to see some subset of the signals and it’s a machine learning or deep learning system that has its own ideas about how you combine signals and understand documents.

I think we understand how it works but we don’t understand what it’s doing exactly.

It uses a lot of the stuff that we’ve published on deep learning. There’s some work that goes by Word2Vec or word embeddings that is one layer of what RankBrain is doing. It actually plugs into one of the boxes, one of the late post retrieval boxes that I showed before.

Danny then asks about how RankBrain might work to ascertain document quality or authority.

This is all a function of the training data that it gets. It sees not just web pages but it sees queries and other signals so it can judge based on stuff like that.

These statements are by far the most important because they provide a plethora of information. First and foremost, Haahr states that RankBrain plugs in late post-retrieval.

This is an important distinction because it means that RankBrain doesn’t rewrite the query before Google goes looking for results but instead does so afterwards.

So Google retrieves results using the raw query but then RankBrain might rewrite the query or interpret it differently in an effort to select and reorder the results for that query.

In addition, Haahr makes it clear that RankBrain has access to a subset of signals and the query. As I mentioned this makes RankBrain feel more like a meta-signal instead of a stand-alone signal.

What we don’t know are the exact signals that make up that subset. Many will take this statement and theorize that it uses link data or click data or any number of other signals. The fact is we have no idea which signals RankBrain has access to, what weight RankBrain might give them, or whether they’re used evenly across all queries.

The inability to know the variables makes any type of regression analysis of RankBrain a non-starter.

Of course there’s also the statement that they don’t know what RankBrain is doing. That’s because RankBrain is a deep learning algorithm performing unsupervised learning. It’s creating its own rules.

More to the point, if a Google Ranking Engineer doesn’t know what RankBrain is doing, do you think that anyone outside of Google suddenly understands it better? The answer is no.

You Can’t Optimize For RankBrain

You can’t optimize for RankBrain based on what we know about what it is and how it works. At its core RankBrain is about better understanding of language, whether that’s within documents or queries.

So what can you do differently based on this knowledge?

Google is looking at the words, sentences and paragraphs and turning them into mathematical vectors. It’s trying to assign meaning to that chunk of text so it can better match it to complex query syntax.

The only thing you can do is improve your writing so that Google can better understand the meaning of your content. But that’s not really optimizing for RankBrain; that’s just doing proper SEO and delivering a better user experience (UX).

By improving your writing and making it more clear you’ll wind up earning more links and, over time, be seen as an authority on that topic. So you’ll be covered no matter what other signals RankBrain is using.

The one thing you shouldn’t do is think that RankBrain will figure out your poor writing or that you now have the license to, like, write super conversationally you know. Strong writing matters more now than it ever has before.


RankBrain is a deep learning algorithm that plugs in post-retrieval and relies on variable-length text vectors and other signals to make better sense of complex natural language queries. While fascinating, there is nothing one can do to specifically optimize for RankBrain.

Query Classes

February 09 2016 // SEO // 9 Comments

Identifying query classes is one of the most powerful ways to optimize large sites. Understanding query classes allows you to identify both user syntax and intent.

I’ve talked for years about query classes but never wrote a post dedicated to them. Until now.

Query Classes

Ron Burgundy Stay Classy

What are query classes? A query class is a set of queries that are well defined in construction and repeatable. That sounds confusing but it really isn’t when you break it down.

A query class is most often composed of a root term and a modifier.

vacation homes in tahoe

Here the root term is ‘vacation homes’ and the modifier is ‘in [city]’. The construction of this query is well defined. It’s repeatable because users search for vacation homes in a vast number of cities.

Geography is often a dynamic modifier for a query class. But query classes are not limited to just geography. Here’s another example.

midday moon lyrics

Here the root term is dynamic and represents a song, while the modifier is the term ‘lyrics’. A related query class is ‘[song] video’ expressed as ‘midday moon video’.

Another simple one that doesn’t contain geography is ‘reviews’. This modifier can be attached to both products or locations.

Query Class Example for Reviews

Recently Glen Allsopp (aka Viperchill) blogged about a numeric modifier that creates a query class: [year].

best science fiction books 2015

This often happens as part of query reformulation: when people want the most up-to-date information on a topic, adding the year is the easiest way to get it.

Sometimes a query class doesn’t have a modifier. LinkedIn and Facebook (among others) compete for a simple [name] query class. Yelp and Foursquare and others compete for the [venue name] query class.

Query Class Example for Venues

Or how about food, glorious food.

Query Class Example for Recipe

That’s right, there’s a competitive ‘[dish] recipe’ query class up for grabs. Then there are smaller but important query classes that are further down the purchase funnel for retailers.

Query Class Example for VS

You can create specific comparison pages for the query class of ‘[product x] vs [product y]’ and capture potential buyers during the end of the evaluation phase. Of course you don’t create all of these combinations, you only do so for those that have legitimate comparisons and material query volume.

If it isn’t obvious by now there are loads of query classes out there. But query classes aren’t about generating massive amounts of pages but instead are about matching and optimizing for query syntax and intent.

User Syntax

One reason I rely on query classes is that it provides a window to understanding user syntax. I want to know how they search.

Query classes represent the ways in which users most often search for content. Sure there are variations and people don’t all query the same way but the majority follow these patterns.

Do you want to optimize for the minority or the majority?

Here are just a few of the ‘[dish] recipe’ terms I thought of off the top of my head.

Query Class Query Volume Example

Look at that! And that’s just me naming three dishes off the top of my head. Imagine the hundreds if not thousands of dishes that people are searching for each day. You’re staring at a pile of search traffic based on a simple query class.

It’s super easy when you’re dealing with geography because you can use a list of top cities in the US (or the world) and then with some simple concatenation formulas can generate a list of candidates.
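The concatenation approach can be sketched in a few lines. The root term and the small city list here are just examples; swap in your own query class and a full list of target geographies:

```python
# Generate query-class candidates by concatenating a root term with
# geographic modifiers. The city list is a small example placeholder.
cities = ["san francisco", "los angeles", "san diego", "sacramento"]
root = "vacation homes"

# Two common syntax variants of the same query class.
variant_a = [f"{root} in {city}" for city in cities]  # 'vacation homes in [city]'
variant_b = [f"{city} {root}" for city in cities]     # '[city] vacation homes'

for query in variant_a + variant_b:
    print(query)
```

Feed both lists of candidates into your keyword tool of choice and the volume numbers will tell you which syntax variant dominates.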

Sometimes you want to know the dominant expression of that query class. Here’s one for bike trails by state.

Query Class User Syntax

Here I have a list of the different variants of this query class. One using ‘[state] bike trails’ and the other ‘bike trails in [state]’. Using Google’s keyword planner I see that the former has twice the query volume of the latter. Yes, it’s exact match but that’s usually directionally valid.

I know there’s some of you who think this level of detail doesn’t matter. You’re wrong. When users parse search results or land on a page they want to see the phrase they typed. It’s human nature and you’ll win more if you’re using the dominant syntax.

Once you identify a query class the next step is to understand the intent of that query class. If you’ve got a good head on your shoulders this is relatively easy.

Query Intent

Bath LOLCat

Not only do we want to know how they search, we want to know why.

The person searching for ‘vacation homes in tahoe’ is looking for a list of vacation rentals in Lake Tahoe. The person searching for ‘midday moon lyrics’ is looking for lyrics to the Astronautalis song. The person looking for ‘samsung xxx’ vs ‘sony xxx’ is looking for information on which TV they should purchase.

Knowing this, you can provide the relevant content to satisfy the user’s active intent. But the sites and pages that wind up winning are those that satisfy both active and passive intent.

The person looking for vacation homes in tahoe might also want to learn about nearby attractions and restaurants. They may want to book airfare. Maybe they’re looking for lift tickets.

The person looking for midday moon lyrics may want more information about Astronautalis or find lyrics to his other songs. Perhaps they want concert dates and tickets. The person looking for a TV may want reviews on both, a guide to HDTVs and a simple way to buy.

Satisfying passive intent increases the value of your page and keeps users engaged.

Sometimes the query class is vague, such as [name] or [venue], and you’re forced to provide answers to multiple types of intent. When I’m looking up a restaurant name I might be looking for the phone number, directions, menu, reviews or to make a reservation, to name but a few.

Query classes make it easier to aggregate intent.


On larger sites the beauty of query classes is that you can map them to a page type and then use smart templates to create appropriate titles, descriptions and more.

This isn’t the same as automation but is instead about ensuring that the page type that matches a query class is well optimized. You can then also do A/B testing on your titles to see if a slightly different version of the title helps you perform across the entire query class.

Sometimes you can play with the value proposition in the title.

Vacation Homes in Tahoe vs Vacation Homes in Tahoe – 1,251 Available Now

It goes well beyond just the Title and meta description. You can establish consistent headers, develop appropriate content units that satisfy passive intent and ensure you have the right crosslink units in place for further discovery.

The wrinkle usually comes with term length. Take city names for instance. You’ve got Rancho Santa Margarita clocking in at 22 characters and then Ada with a character length of 3.

So a lot of the time you’re coming up with business logic that delivers the right text, in multiple places, based on the total length of the term. This can get complex, particularly if you’re matching a dynamic root term with a geographic modifier.

Smart templates let you scale without sacrificing quality.
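That business logic can be as simple as a length check. This is a minimal sketch: the 55-character threshold and the copy are hypothetical, so tune them against your own title pixel limits and templates:

```python
# Sketch of title business logic keyed off term length. The threshold
# and copy are hypothetical; tune them to your own SERP title limits.
def build_title(root, city, count):
    base = f"{root.title()} in {city.title()}"
    value_prop = f" - {count:,} Available Now"
    # Long city names (e.g. Rancho Santa Margarita) blow past title
    # limits, so drop the value proposition when the base is long.
    if len(base + value_prop) <= 55:
        return base + value_prop
    return base

print(build_title("vacation homes", "tahoe", 1251))
# Vacation Homes in Tahoe - 1,251 Available Now
print(build_title("vacation homes", "rancho santa margarita", 348))
# Vacation Homes in Rancho Santa Margarita
```

The same pattern extends to headers, meta descriptions and crosslink labels: one function per page element, each deciding which variant fits the term it’s handed.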

Rank Indices

The other reason why query classes are so amazing, particularly for large sites, is that you can create rank indices based on those query classes and determine how you’re performing as a whole across that query class.

Query Class Rank Indices

Here I’ve graphed four similar but distinct query class rank indices. Obviously something went awry there in November of 2015. But I know exactly how much it impacted each of those query classes and can then work on ways to regain lost ground.

Query classes usually represent material portions of traffic that impact bottomline business metrics such as user acquisition and revenue. When you get the right coverage of query classes and create rank indices for each you’re able to hone in on where you can improve and react when the trends start to go in the wrong direction.
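One simple way to build such an index is a query-volume-weighted average rank across every tracked keyword in the query class. The keyword data below is hypothetical:

```python
# Rank index sketch: volume-weighted average rank for a query class.
# Keyword data is hypothetical illustration.
keywords = [
    {"query": "bike trails in california", "volume": 2400, "rank": 3},
    {"query": "bike trails in texas",      "volume": 1900, "rank": 5},
    {"query": "bike trails in florida",    "volume": 1600, "rank": 12},
]

def rank_index(keywords):
    """Average rank weighted by query volume (lower is better)."""
    total_volume = sum(k["volume"] for k in keywords)
    return sum(k["rank"] * k["volume"] for k in keywords) / total_volume

print(round(rank_index(keywords), 2))  # 6.08
```

Weighting by volume means a rank drop on a high-volume term moves the index far more than the same drop on a low-volume term, which is exactly what you want when the index is a proxy for traffic and revenue.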

I won’t go into the details now but read up if you’re interested in how to create rank indices.

Identifying Query Classes

Hopefully you’ve already figured out how to identify query classes. But if you haven’t here are a few tips to get you started.

First, use your head. Some of this stuff is just … right there in front of you. Use your judgement and then validate it through keyword research.

Second, look at what comes up in Google’s autocomplete suggestions for root terms. You can also use a tool like Ubersuggest to do this at scale and generate more candidates.

Third, look at the traffic coming to your pages via Search Analytics within Google Search Console. You can uncover patterns there and identify the true syntax bringing users to those pages.

Fourth, use paid search, particularly the report that shows the actual terms that triggered the ad, to uncover potential query classes.

Honestly though, you should really only need the first and second to identify and hone in on query classes.


Query classes are an enormously valuable way to optimize larger sites so they meet and satisfy patterns of query syntax and intent. Query classes let you understand how and why people search. Pages targeted at query classes that aggregate intent will consistently win.

Do 404 Errors Hurt SEO?

February 01 2016 // SEO // 25 Comments

Do 404 errors hurt SEO? It’s a simple question. However, the answer is far from simple. Most 404 errors don’t have a direct impact on SEO, but they can eat away at your link equity and user experience over time.

There’s one variety of 404 that might be quietly killing your search rankings and traffic.

404 Response Code

Abandoned Building

What is a 404 exactly? A 404 response code is returned by the server when there is no matching URI. In other words, the server is telling the browser that the content is not found.

404s are a natural part of the web. In fact, link rot studies show that links regularly break. So what’s the big deal? It’s … complicated.

404s and Authority

Evaporation Example

One of the major issues with 404s is that they stop the flow of authority. It just … evaporates. At first, this sort of bothered me. If someone linked to your site but that page or content is no longer there the citation is still valid. At that point in time the site earned that link.

But when you start to think it through, the dangers begin to present themselves. If authority passed through a 404 page I could redirect that authority to pages not expressly ‘endorsed’ by that link. Even worse, I could purchase a domain and simply use those 404 pages to redirect authority elsewhere.

And if you’re a fan of conspiracies then sites could be open to negative SEO, where someone could link toxic domains to malformed URLs on your site.

404s don’t pass authority and that’s probably a good thing. It still makes sense to optimize your 404 page so users can easily search and find content on your site.

Types of 404s

Google is quick to say that 404s are natural and not to obsess about them. On the other hand, they’ve never quite said that 404s don’t matter. The 2011 Google post on 404s is strangely convoluted on the subject.

The last line of the first answer seems to be definitive but why not answer the question simply? I believe it’s because there’s a bit of nuance involved. And most people suck at nuance.

While the status code remains the same there are different varieties of 404s: external, outgoing and internal. These are my own naming conventions so I’ll make it clear in this post what I mean by each.

Because some 404s are harmless and others are downright dangerous.

External 404s

External 404s occur when someone else is linking to a broken page on your site. Even here, there is a small difference since there can be times when the content has legitimately been removed and other times when someone is linking improperly.

External 404 Diagram

Back in the day many SEOs recommended that you 301 all of your 404s so you could reclaim all the link authority. This is a terrible idea. I have to think Google looks for sites that employ 301s but have no 404s. In short, a site with no 404s is a red flag.

A request for a URL that doesn't exist should return a 404. Of course, if you know someone is linking to a page incorrectly, you can apply a 301 redirect to get them to the right page, which benefits both the user and the site’s authority.

External 404s don’t bother me a great deal. But it’s smart to periodically look to ensure that you’re capturing link equity by turning the appropriate 404s into 301s.
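The logic here is simple: known mistyped inbound URLs get a 301 to the right page, everything else honestly 404s. A toy sketch (the paths and redirect map are hypothetical):

```python
# Hypothetical redirect map: known mistyped inbound URLs -> their correct targets.
REDIRECTS = {
    "/old-post": "/new-post",
    "/artcle-typo": "/article",
}

# Pages that actually exist on this toy site.
PAGES = {"/", "/article", "/new-post"}

def respond(path):
    """Return (status, location) for a requested path."""
    if path in PAGES:
        return (200, path)
    if path in REDIRECTS:
        return (301, REDIRECTS[path])  # reclaim link equity for known mistakes
    return (404, None)                 # everything else should honestly 404
```

Note that the fallthrough is a 404, not a blanket 301 to the homepage. A site that never returns a 404 is exactly the red flag described above.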

Outgoing 404s

Outgoing 404s occur when a link from your site to another site breaks and returns a 404. Because we know how often links evaporate this isn’t uncommon.

Outgoing 404 Diagram

Google would be crazy to penalize sites that link to 404 pages. Mind you, it’s about scale to a certain degree. If 100% of the external links on a site were going to 404 pages then perhaps Google (and users) would think differently about that site.

They could also be looking at the age of the link and making a determination on that as well. Or perhaps it’s fine as long as Google saw that the link was at one time a 200 and is now a 404.

Overall these are the least concerning of 404 errors. It’s still a good idea, from a user experience perspective, to find those outgoing 404s in your content and remove or fix the link.

Internal 404s

The last type of 404 is an internal 404. This occurs when the site itself is linking to another ‘not found’ page on their own site. In my experience, internal 404s are very bad news.

Internal 404 Diagram

Over the past two years I’ve worked on squashing internal 404s for a number of large clients. In each instance I believe that removing these internal 404s had a positive impact on rankings.

Of course, that’s hard to prove given all the other things going on with the site, with competitors and with Google’s algorithm. But all things being equal eliminating internal 404s seems to be a powerful piece of the puzzle.

Why Internal 404s Matter

If I’m Google I might look at the number of internal 404s as a way to determine whether the site is well cared for and maintained with attention to detail.

Does a high-quality site have a lot of internal 404s? Unlikely.

Taken a step further, could Google determine that the odds of a user encountering a 404 on a site and then use that to demote sites from search? I think it’s plausible. Google doesn’t want their users having a poor experience so they might steer folks away from a site they know has a high probability of ending in a dead end.

That leads me to think about the user experience when encountering one of these internal 404s. When a user hits one of these they blame the site and are far more likely to leave the site and return to the search results to find a better result for their query. This type of pogosticking is clearly a negative signal.

Internal 404s piss off users.

The psychology is different with an outgoing 404. I believe most users don’t blame the site for these but the target of the link instead. There’s likely some shared blame, but the rate of pogosticking shouldn’t be as high.

In my experience internal 404s are generally caused by bugs and absolutely degrade the user experience.

Finding Internal 404s

You can find 404s using Screaming Frog or Google Search Console. I’ll focus on Google Search Console here because I often wind up finding patterns of internal 404s this way.

In Search Console you’ll navigate to Crawl and select Crawl Errors.

404s in Google Search Console

At that point you’ll select the ‘Not found’ tab to find the list of 404s Google has identified. Click on one of these URLs and you get a pop-up where you can select the ‘Linked from’ tab.

Linked from Details on 404 Error

I was actually trying to get Google to recognize another internal 404 but they haven’t found it yet. Thankfully I muffed a link in one of my posts and the result looks like an internal 404.

Malformed Link Causes Internal 404

What you’re looking for are instances where your own site appears in the ‘Linked from’ section. On larger sites it can be easy to spot a bug that produces these types of errors by just checking a handful of these URLs.

In this case I’ll just edit the malformed link and everything will work again. It’s usually not that easy. Most often I’m filing tickets in a client’s project tracking system and making engineers groan.
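At its core, what Screaming Frog or Search Console is doing here is a crawl that flags links pointing at pages that don’t exist, along with where each broken link lives. A minimal sketch of that logic, run against an in-memory site map rather than a live crawl (the paths are made up):

```python
from collections import deque

def find_internal_404s(site, start="/"):
    """Crawl an in-memory site map and report internal links to missing pages.

    `site` maps each existing path to the list of paths it links to.
    Returns {broken_path: [pages linking to it]} -- a 'Linked from' report.
    """
    broken = {}
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in site.get(page, []):
            if link not in site:                      # target doesn't exist: internal 404
                broken.setdefault(link, []).append(page)
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return broken

# Toy site: /about contains a malformed link to a page that was never created.
site = {"/": ["/about", "/blog"], "/about": ["/abuot"], "/blog": ["/"]}
report = find_internal_404s(site)
print(report)  # {'/abuot': ['/about']}
```

On a real site you’d fetch each page and parse its links instead of reading a dict, but the output is the same: every broken path and the pages that link to it, which is exactly what you need to spot patterns and file tickets.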

Correlation vs Causation

Not This Again

Some of you are probably shrieking that internal 404s aren’t the problem and that Google has been clear on this issue and that it’s something else that’s making the difference. #somebodyiswrongontheinternet

You’re right and … I don’t care.

You know why I don’t care? Every time I clean up internal 404s, it produces results. I’m not particularly concerned about exactly why it works. Mind you, from an academic perspective I’m intrigued but from a consulting perspective I’m not.

In addition, if you’re in the new ‘user experience optimization’ camp, then eliminating internal 404s fits very nicely, doesn’t it? So is it the actual internal 404s that matter or the behavior of users once they are eliminated that matters or something else entirely? I don’t know.

Not knowing why eliminating internal 404s works isn’t going to stop me from doing it.

This is particularly true since 404 maintenance is entirely in our control. That doesn’t happen much in this industry. It’s shocking how many people ignore 404s that are staring them right in the face, whether it’s not looking at Google Search Console or not tracking down the 404s that crop up in weblog reports or deep crawls.

Make it a habit to check and resolve your Not found errors via Search Console or Screaming Frog.


404 errors themselves may not directly hurt SEO, but they can indirectly. In particular, internal 404s can quietly tank your efforts, creating a poor user experience that leads to a low-quality perception and pogosticking behavior.

Acquisition SEO and Business Crowding

January 20 2016 // SEO // 25 Comments

There’s an old saying that if you can’t beat ’em, join ’em. But in search, that saying is often turning into something different.

If you can’t beat ’em, buy ’em.

Acquisition search engine optimization is happening more often as companies acquire or merge, effectively taking over shelf space on search results. Why settle for having the top result on an important term when I can have the first and second result?

They Live Movie Scene

Should this trend continue you could find search results where only a handful of companies are represented on the first page. That undermines search diversity, one of the fundamentals of Google’s algorithm.

This type of ‘business crowding’ creates false choice and is vastly more dangerous than the purported dread brought on by a filter bubble.

Acquisition SEO

SEO stands for search engine optimization. Generally, that’s meant to convey the idea that you’re working on getting a site to be visible and rank well in search engines.

However, you might see diminishing returns when you’re near the top of the results in important query classes. Maybe the battle with a competitor for those top slots is so close that the effort to move the needle is essentially ROI negative.

In these instances, more and more often, the way to increase search engine traffic, and continue on a growth trajectory, is through an acquisition.

Example of Acquisition SEO

That’s not to say that Zillow or Trulia is doing anything wrong. But it brings up a lot of thorny questions.

Search Shelf Space

Gary Larson False Choice Cartoon

About seven years ago I had an opportunity to see acquisition SEO up close and personal. Caring.com acquired Gilbert Guide and suddenly we had two results on the first page for an important query class in the senior housing space.

It’s hard not to get Montgomery Burns at that point and look at how you can dominate search results by having two sites. All roads lead to Rome as they say.

I could even rationalize that the inventory provided on each platform was different. A Venn diagram would show a substantial overlap but there were plenty of non-shaded areas.

But who wants to maintain two sets of inventory? That’s a lot of operational and technical overhead. Soon you figure out that it’s probably better to have one set of inventory and syndicate it across both sites. Cost reduction and efficiency are powerful business tenets.

At that point the sites are, essentially, the same. They offer the same content (the inventory of senior housing options) but with different wrappers. The idea was awesome but also made my stomach hurt.

(Please note that this is not how these two sites are configured today.)

Host Crowding

Manspreading Example

The funny thing is that if I’d tried to do this with a subdomain on Caring.com I’d have run afoul of something Google calls host crowding.

Matt Cutts wrote about this back in 2007 in a post about subdomains and subdirectories.

For several years Google has used something called “host crowding,” which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen.

In essence, you shouldn’t be able to crowd out competitors on a search result through the use of multiple subdomains. Now, host crowding or clustering as it’s sometimes called has seen an ebb and flow over time.

In 2010 Google loosened host crowding constraints when a domain was included in the query.

For queries that indicate a strong user interest in a particular domain, like [exhibitions at amnh], we’ll now show more results from the relevant site.

At the time Google showed 7 results from amnh.org. Today they show 9.

In 2012 they tweaked things again to improve diversity but that didn’t make much of a dent and Matt was again talking about changes to host clustering in 2013. I think a good deal of the feedback was around the domination of Yelp.

I know I was complaining. My test Yelp query is [haircut concord ca], which currently returns 6 results from Yelp. (It’s 8 if you add the ‘&filter=0’ parameter on the end of the URL.)

I still maintain that this is not useful and that it would be far better to show fewer results from Yelp and/or place many of those Yelp results as sitelinks under one canonical Yelp result.

But I digress.

Business Crowding

Freedom of Choice by Devo

The problem here is that acquisition SEO doesn’t violate host crowding in the strict sense. The sites are on completely different domains. So a traditional host crowding algorithm wouldn’t group or cluster those sites together.

But make no mistake, the result is essentially the same. Except this time it’s not the same site. It’s the same business.

Business crowding is the advanced form of host crowding.

It can actually be worse since you could be getting the same content delivered from the same company under different domains.

The diversity of that result goes down and users probably don’t realize it.

Doorway Pages

When you think about it, business crowding essentially meets the definition of a doorway page.

Doorways are sites or pages created to rank highly for specific search queries. They are bad for users because they can lead to multiple similar pages in user search results, where each result ends up taking the user to essentially the same destination.

When participating in business crowding you do have similar pages in search results where the user is taken to the same content. It’s not the same destination but the net result is essentially the same. One of the examples cited lends more credence to this idea.

Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page

In business crowding you certainly have multiple domain names but there’s no funnel necessary. The content is effectively the same on those multiple domains.

Business crowding doesn’t meet the letter of the doorway page guidelines but it seems to meet the spirit of them.

Where To Draw The Line?

Fry Not Sure If ...

This isn’t a cut-and-dried issue. There’s quite a bit of nuance involved if you were to address business crowding. Let’s take my example above from Caring.

If the inventory of Caring and Gilbert Guide were never syndicated, would that exempt them from business crowding? If the inventories became very similar over time, would it still be okay?

In essence, if the other company is run independently, then perhaps you can continue to take up search shelf space.

But what prevents a company from doing this multiple times and owning 3, 4 or even 5 sites ranking on the first page for a search result? Even if they’re independently run, over time it will make it more difficult for others to disrupt that space since the incumbents have no real motivation to improve.

With so many properties they’re very happy with the status quo and are likely not too concerned with any one site’s position in search as long as the group of sites continues to squeeze out the competition.

Perhaps you could determine if the functionality and features of the sites was materially different. But that would be pretty darn difficult to do algorithmically.

Or is it simply time based? You get to have multiple domains and participate in business crowding for up to, say, one year after the acquisition. That would be relatively straightforward but would have a tremendous impact on the mergers and acquisitions space.

If Zillow knew that they could only count on the traffic from Trulia for one year after the acquisition they probably wouldn’t have paid $3.5 billion (yes that’s a ‘b’) for Trulia. In fact, the deal might not have gotten done at all.

So when we start talking about addressing this problem it spills out of search and into finance pretty quickly.

What’s Good For The User?

At the end of the day Google wants to do what is best for the user. Some of this is altruistic. Trust me, if you talk to some of the folks on Google’s search quality team, they’re serious about this. But obviously if the user is happy then they return to Google and perform more searches that wind up padding Google’s profits.

Doing good by the user is doing good for the business.

My guess is that most users don’t realize that business crowding is taking place. They may pogostick from one site to the other and wind up satisfied, even if those sites are owned by the same company. In other words, search results with business crowding may wind up producing good long click and time to long click metrics.

It sounds like an environment ripe for a local maximum.

If business crowding were eliminated then users would see more options. While some of the metrics might deteriorate in the short-term would they improve long-term as new entrants in those verticals provided value and innovation?

There’s only one way to find out.

Vacation Rentals

One area where this is currently happening is within the vacation rentals space.

Business Crowding Example

In this instance two companies (TripAdvisor and HomeAway) own the first six results across five domains. This happens relatively consistently in this vertical. (Please note that I do have a dog in this fight. Airbnb is a client.)

Are these sites materially different? Not really. HomeAway makes the syndication of your listing a selling point.

HomeAway Business Crowding

Not only that but if you look at individual listings on these sites you find that there are rel=canonicals in place.

Rel Canonical to VRBO

Rel Canonical to VRBO

In this instance the property listings on VacationRentals and HomeAway point to the one on VRBO.
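You can verify this sort of thing yourself by pulling the canonical tag out of a listing’s HTML. Here’s a minimal sketch using Python’s standard library HTML parser (the markup and URL below are made-up examples, not actual VRBO listings):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Pull the rel=canonical href out of a page's HTML, if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")

# Hypothetical listing page that canonicalizes to a sister domain.
html = '<html><head><link rel="canonical" href="https://www.example-vrbo.com/12345"></head></html>'
finder = CanonicalFinder()
finder.feed(html)
print(finder.canonical)  # https://www.example-vrbo.com/12345
```

When the canonical points at a different domain owned by the same parent company, that’s the syndication pattern described above made explicit in the markup.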

The way the inventory is sorted on each of these sites is different but it doesn’t seem like the inventory itself is all that different at the end of the day.

TripAdvisor doesn’t do anything with canonicals but they do promote syndication as a feature.

Syndication Selling Point on Vacation Home Rentals

A Venn diagram of inventory between TripAdvisor properties would likely show a material overlap but with good portions unshaded. They seem to have a core set of inventory that is on all properties but aren’t as aggressive with full-on syndication.

Let me be clear here. I don’t blame these companies for doing what they’re doing. It’s smart SEO and it’s winning within the confines of Google’s current webmaster guidelines.

My question is whether business crowding is something that should be addressed? What happens if this practice flourishes?

Is the false choice being offered to users ultimately detrimental to users and, by proxy, to Google?

The Mid-Life Crisis of Search Results

Thoughtful Cat is Thoughtful

Search hasn’t been around for that long in the scheme of things. As the Internet evolved we saw empires rise and fall as new sites, companies and business models found success.

Maybe you remember Geocities or Gator or Lycos or AltaVista or Friendster. Now, none of these fall into the inventory based sites I’ve referenced above but I use them as proxies. When it comes to social, whether you’re on Facebook or Instagram or WhatsApp, one company is still in control there.

Successful companies today are able to simply buy competitors and upstarts to solidify their position. Look no further than online travel agencies where Expedia now owns both Travelocity and Orbitz.

The days in which successful sites could rise and fall – and I mean truly fall – seem to be behind us.

The question is whether search results should reflect and reinforce this fact or if it should instead continue to reflect diversity. It seems like search is at a crossroads of sorts as the businesses that populate results have matured.

Can It Be Addressed Algorithmically?

The next question that comes to mind is whether Google could actually do anything about business crowding. We know Google isn’t going to do anything manual in nature. They’d want to implement something that dealt with this from an algorithmic perspective.

I think there’s a fairly straightforward way Google could do this via the Knowledge Graph. Each business is an entity and it would be relatively easy to map the relationship between each site as a parent-child relationship.

Some of this can be seen in the remnants of Freebase and their scrape of CrocTail, though the data probably needs more massaging. But it’s certainly possible to create and maintain these relationships within the Knowledge Graph.

Once done, you can attach a parent company to each site and apply the same sort of host crowding algorithm to business crowding. This doesn’t seem that farfetched.
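Mechanically, that would look a lot like host crowding, just keyed on the parent entity instead of the hostname. A sketch of the idea (the entity map is hypothetical; a real one would live in the Knowledge Graph):

```python
from urllib.parse import urlparse

# Hypothetical entity map: hostname -> parent company.
PARENT = {
    "www.vrbo.com": "HomeAway",
    "www.homeaway.com": "HomeAway",
    "www.vacationrentals.com": "HomeAway",
    "www.tripadvisor.com": "TripAdvisor",
    "www.airbnb.com": "Airbnb",
}

def crowd_limit(results, max_per_parent=2):
    """Keep at most N results per parent company, mirroring host crowding."""
    counts = {}
    kept = []
    for url in results:
        host = urlparse(url).netloc
        parent = PARENT.get(host, host)   # unknown hosts count as their own parent
        counts[parent] = counts.get(parent, 0) + 1
        if counts[parent] <= max_per_parent:
            kept.append(url)
    return kept

results = [
    "https://www.vrbo.com/a",
    "https://www.homeaway.com/b",
    "https://www.vacationrentals.com/c",
    "https://www.tripadvisor.com/d",
]
print(crowd_limit(results))
# Three HomeAway domains collapse to two results; TripAdvisor stays.
```

The hard part isn’t this filter, it’s building and maintaining the entity map, but that’s exactly the kind of relationship data the Knowledge Graph already models.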

But the reality of implementing this could have serious implications and draw the ire of a number of major corporations. And if users really don’t know that it’s all essentially the same content I’m not sure Google has the impetus to do anything about it.

Too Big To Fail (at Search)

With these acquisitions made under the current guidelines, could Google effectively reduce business crowding without creating a financial meltdown for large corporate players?

Organic Traffic via Similar Web for Trulia

SimilarWeb shows that Trulia gets a little over half of its traffic from organic search. Any drastic change to that channel would be a material event for the parent company.

Others I’ve mentioned in this post are less dependent on organic search to certain degrees but a business crowding algorithm would certainly be a bitter pill to swallow for most.

Selfishly, I’d like to see business crowding addressed because it would help one of my clients, Airbnb, to some degree. They’d move up a spot or two and gain additional exposure and traffic.

But there’s a bigger picture here. False diversity is creeping into search. If you extrapolate this trend search results become little more than a corporate shell game.

On the other hand, addressing business crowding could dramatically change the way sites deal with competitors and how they approach mergers and acquisitions. I can’t predict how that would play out in the short or long-term.

What do you think? I’m genuinely interested in hearing your thoughts on this topic so please jump in with your comments.