
Are You Winning The Attention Auction?

January 20 2014 // Marketing + SEO + Social Media // 23 Comments

Every waking minute of every day we choose to do one thing or another.

For a long time we didn't have many choices. Hunt the mammoths or mind the fire. Read the Bible or tend the crops. I can remember when we only got six television stations on an old black and white TV.

But as technology advances we're afforded more choices more often.

Freedom of Choice by Devo

We can decide to talk about the weather with the person next to us in the doctor's waiting room or stare into our phone and chuckle at a stupid BuzzFeed article. We can focus on that Excel spreadsheet or we can scroll through our Facebook feed.

You can sit on the couch and watch The Blacklist or you can sit on that same couch and read Gridlinked by Neal Asher on a Kindle. You could go out and play tennis or you could go out and play Ingress and hack some portals.

I was going to overwhelm you with statistics showing how many choices we have in today's digital society, such as the fact that the typical email subscriber gets 416 commercial emails every month. That's nearly 14 a day!

I could go on and on because there's a litany of surveys and data that tell the same story. But ... we all know this from experience. We live and breathe it every day.

We all choose to look, hear and do only so many things. Because there are only so many hours in each day.

Our time and attention are becoming our most valued resources. (Frankly, we should guard them far more fiercely than we do.) As marketers we must understand and adapt to this evolving environment. But ... it's not new.

The Attention Auction

Content Doge Meme

There's always been an auction on attention. That critical point in time where people decide to give their attention to one thing over the other.

Recently, there's been quite a kerfuffle over the idea of content shock: that there's simply too much content. There are some interesting points in that debate, but I tend to believe the number of times content comes up in the auction has increased quite a bit. We consume far more content due to ubiquitous access.

Sure there's more content vying for attention. But there are more opportunities to engage and a large amount of content never comes up in the auction because of poor quality or mismatched interest.

There are hundreds of TV channels but really only a handful that are contextually relevant to you at any given time. Even if there are 68 sports channels, the odds that you're in the mood to watch sports and that each of those stations is airing something you want to watch at that moment are very small. If you're looking to watch NFL football, Women's College Badminton isn't really an option.

More importantly, I believe that we've adapted to the influx of content. It's knowing how we've adapted that can help marketers win the attention auction more often.

We Are Internet Old!

Sample Geocities Page

Adolescents often do very reckless things. They run red lights. They engage in binge drinking. They have unprotected sex. While some point to brain development as the cause (and there's some truth to that), I tend to believe Dr. Valerie Reyna has it right.

The researchers found that while adults scarcely think about engaging in many high-risk behaviors because they intuitively grasp the risks, adolescents take the time to mull over the risks and benefits.

It's not that adolescents don't weigh the pros and cons. They do and actually overestimate the potential cons. But despite that, they choose to play the odds and risk it more often than adults. In large part, this can be attributed to less life experience. They've had fewer opportunities to land on the proverbial whammy.

As we grow older we actually think less about many decisions because we have more experience and we can make what is referred to as 'gist' decisions. From my perspective it simply means we grok the general idea and can quickly say yea or nay.

So what does any of this have to do with the Internet, attention or content?

When it comes to consuming digital content, we're old. We've had plenty of opportunities to experience all sorts of content to the point where we don't have to think too hard about whether we're going to click or not. If it fits a certain pattern we have a certain response.

Nigerian Email Scam

Nay! A thousand times nay.

The vast majority of content being produced is, to put it bluntly, crap. Technology has a lot to do with this. It is both easy and free to create content in written or visual formats. From WordPress to Tumblr to Instagram, nearly anyone can add to the content tidal wave.

Of course, the popularity of 'content marketing' has increased the number of bland, "me too" articles, not to mention the eyesore round-up posts that are a simulacrum of true curation.

People have wasted too much time and attention on shitty content. The result? We're making decisions faster and faster by relying on those past experiences.

We create internal shortcuts in our mind for what is good or bad. It's a shortcut that protects us from wasting our time and attention, but may also prevent us from finding new legitimate content. So how do we address this cognitive shortcut? How do you win the attention auction?

You can ensure that you fit that shortcut and you can add yourself to that shortcut.

Fit The Shortcut

Getting Attention

Purple Goldfish

Fitting the shortcut is simple to say, but often difficult to execute. Make sure that, at a glance, you get the attention of your user. There are plenty of ways to do this from writing good titles to using appropriate images to leveraging social sharing.

When '1-800 service' pops up on caller ID you're probably making a snap decision that it's a telemarketer and you'll ignore the call. When it's the name of your doctor or someone from your family you pick up the phone. This same type of process happens on nearly all social platforms as people scan feeds on Twitter, Google+ and Facebook.

Recently Facebook even admitted to the issues revolving around feed consumption.

The fact that less and less of brands' content will surface is described as a result of increased competition for limited space, since "content that is eligible to be shown in news feed is increasing at a faster rate than people's ability to consume it."

Now this is a bit disingenuous since Facebook is crowding out legitimate content for ads (a whole lot of ads) but the essence of this statement is true. Not only that but your content is at a disadvantage on Facebook since much of the content is personal in nature. Cute pictures of your cousin's kids are going to trump and squeeze out content from brands.

So with whatever space you're left with on these platforms, you'd better make certain your content has the best chance of getting noticed and fitting that shortcut. The thing is, too many still don't do what's necessary to give their content the best chance of success.

If you're not optimizing your social snippet you're shooting your content in the foot.

Be sure your title is compelling, that you have an eye catching image, that the description is (at a minimum) readable and at best engages and entices. Of course, none of this matters unless that content finds its way to social platforms.

Make sure you're encouraging social sharing. Don't make me hunt down where you put the sharing options or jump through hoops once I get there.

Ensure your content is optimized for both social and search. And when you're doing the latter rely on user centric syntax and intent to guide your optimization efforts.
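Optimizing the social snippet comes down to a handful of meta tags in the page head. Here's a minimal sketch in Python that assembles them. The Open Graph and Twitter Card property names are real, but the `social_snippet` helper and all of the example values are hypothetical placeholders of my own, not from any particular CMS or plugin.

```python
def social_snippet(title, description, image_url, page_url):
    """Build the meta tags that control how a shared link renders in a feed.

    Hypothetical helper: the tag names are the standard Open Graph /
    Twitter Card properties; the values are whatever you supply.
    """
    og = {
        "og:title": title,              # the headline people see in the feed
        "og:description": description,  # at minimum readable; at best enticing
        "og:image": image_url,          # the eye-catching image
        "og:url": page_url,             # canonical URL of the content
    }
    lines = [f'<meta property="{k}" content="{v}" />' for k, v in og.items()]
    # Twitter Cards use the name attribute rather than property
    lines.append('<meta name="twitter:card" content="summary_large_image" />')
    return "\n".join(lines)

print(social_snippet(
    "Are You Winning The Attention Auction?",
    "How cognitive shortcuts decide which content gets attention.",
    "https://example.com/attention.png",
    "https://example.com/attention-auction",
))
```

In practice your CMS or SEO plugin writes these tags for you; the point is that the title, description and image in them are exactly what shows up when someone shares your page, so they deserve the same care as the content itself.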

Your job is to fit into that cognitive shortcut by making it easy for users to see and understand your content in the shortest amount of time possible.

Keeping Attention

Bored One Ear To Death LOLcat

Getting them to your content is the first step in winning their attention. At that point they're giving you the opportunity to take up more of their time and attention. They made a choice but they're going to be looking to confirm whether it was a good one with almost the same amount of speed.

When you land on a new website you instantly (perhaps unconsciously) make a decision about the quality and authority of that site and whether you'll stick around.

A websites’ first impression is known to be a crucial moment for capturing the users interest. Within a fraction of time, people build a first visceral “gut feeling” that helps them to decide whether they are going to stay at this place or continue surfing to other sites. Research in this area has been mainly stimulated by a study of Lindgaard et al. (2006), where the authors were able to show that people are able to form stable attractiveness judgments of website screenshots within 50 milliseconds.

That's from a joint research paper from the University of Basel and Google Switzerland about the role of visual complexity and prototypicality regarding first impression of websites (pdf).

Once they get to the content you need to ensure they instantly get positive reinforcement. Because at the same time there are other pieces of content, other things, battling for attention.

Grumpy Cat Nope

So if they don't instantly see what they're looking for you're giving them a reason to say nope. If what they see on that page looks difficult to read. Nope. If they see grammatical errors. Nope. If they feel the site is spammy looking. Nope.

There is a drum beat of research, examples and terms that underscore the importance of reducing friction.

Books On Reducing Friction

Call it cognitive fluency or cognitive ease; either way, we seek out things that are familiar and look the way we expect. Books such as Barry Schwartz's The Paradox of Choice and Steve Krug's Don't Make Me Think make it clear that too many choices and too much mental effort reduce action and satisfaction. And we should all internalize the fact that the majority of people don't read but instead skim articles.

That doesn't mean that the actual content has to suffer. I still write what are considered long-form posts but format them in ways that allow people to get meaning from them without having to read them word for word.

Do I hope they're poring over every sentence? Absolutely! I'm passionate about my writing and writing in general. But I'm a realist and would prefer that more people learn or take something from my writing than have a select few read every word and laud me for sentence construction.

I still point people to my post on readability as a way to get started down this road. Make no mistake, those who optimize for readability will succeed (even with lesser content) where those who refuse to do so out of ego or other rationalizations will not (I'm looking at you, Google blogs).

I will shout in the face of the next person who whines that they shouldn't have to use an image in their post or that they only want people who are 'serious about the subject' to read their article. Wake up before you're the Geocities of the Internet.

Tomato

The one thing I do know is that being authentic and having a personality can help you stand out. It can help you to at least get and retain attention and sometimes even become memorable. Here's a bit of writing advice from Charles Stross.

Third and final piece of advice: never commit to writing something at novel length that you aren't at least halfway in love with. Because if you're phoning it in, your readers will spot it and throw rotten tomatoes at you. And because there's no doom for a creative artist that's as dismal as being chained to a treadmill and forced to play a tune they secretly hate for the rest of their working lives.

The emphasis is mine. Don't. Phone. It. In.

Add To The Shortcut

Using Attention

Dude Where's My Car?

When you do get someone's attention, what are you doing with it? You want them to add your site, product or brand to that cognitive shortcut. So the next time a piece of that content comes up in the attention auction you've got the inside track. They recognize it and select it intuitively.

For instance, every time I see something new from Matthew Inman of The Oatmeal, I give it my attention. He's delivered quality and memorable content enough times that he doesn't have to fight so hard for my attention. I have a preconceived notion of quality that I bring to each successive interaction with his content.

Welcome to branding 101.

Consistently creating positive and memorable interactions (across multiple channels) will cause users to associate your site, product or brand as being worthy of attention.

Let me be more explicit about that term 'interactions'. Every time you're up in the attention auction counts as an interaction. So if I choose to pass on reading your content, that counts and not in a good way. We're creatures of habit so the more times I pass on something the more likely I am to continue passing on it.

Add to that the perception (or reality) that we have less time per piece of content and each opportunity you have to get in front of a user is critical.

Now, if I actually get someone to share a piece of content, will it be presented in a way that will win the attention auction? If it isn't, not only have I squandered that user action, but I may have created a disincentive for sharing in the future. If I share something and no one gives me a virtual high five of thanks for doing so, will I continue to share content from that source?

Poor social snippet optimization is like putting a kick-me sign on your user's back.

Memorable

Make A Short Cut

If you want to be added to that cognitive shortcut you need to make it easy for them to do so. You need them to remember and remember in the 'right' way.

I've read quite a bit lately about ensuring your content is useful. I find this bit of advice exceedingly dull. I mean, is anyone creating content to be useless? I'm sure content spammers might be, but by and large most aren't. Not only that, there's plenty of great content that isn't traditionally useful, unless you count tickling the funny bone as useful.

Of course you've also probably read about how tapping into emotion can propel your content to the top! Well, there's some truth to that, but emotion is often at odds with being useful, such as creating a handy bookmarklet or a tutorial on Excel. I suppose you could link it to frustration, but you're not going to have some Dove soap tear-jerker piece mashed up with Excel functions. Even Annie Cushing can't pull that off.

Storytelling is also a fantastic device, but it's not a silver bullet either. Mind you, I think it has a better chance than most, but even then you're really retaining attention rather than increasing memory.

Cocktail Party

You have to make your content cocktail party ready. Your content has to roll off the tongue in conversation.

I read this piece on Global Warming in The New York Times.

I heard this song by Katy Perry about believing in yourself.

I saw this funny ad where Will Ferrell tosses eggs at a Dodge.

Seriously, when you're done with a piece of your content, describe it to someone out loud in one sentence. That's what it'll be reduced to for the most part.

As humans we categorize or tag things so we can easily recall them. I think the scientific term here is 'encoding' of information. If we can't easily do so, it's tough for us to talk about things again, much less find them again. As an aside, re-finding content is something we do far more often than we realize and is something Google continues to try to solve.

Even when we can easily categorize and file away that bit of information, we're not divvying it up into a very fine structure. Only the highlights make it into memory. We only take a few things from the source information. A sort of whisper down the lane effect takes place. You suddenly don't remember who wrote it, or where you saw it.

We're trying to optimize the ability to recall that information by using the right encoding, one that we'll be able to remember.

Shh Armpit

It's the reason you need to be careful about whether and how you go about guest blogging. It's also why I generally despise (strong word, I know) Infographics. Because more often than not, when you hear someone refer to one they say 'That Infographic on Water Conservation' or 'That Infographic on The History of Beer'.

Guess what, they have no clue where they saw it or what brand it represents. Seriously. Because usually the only two things remembered are the format (Infographic) and the topic. When I ask people to name the brands behind Infographics I usually get two responses: Mint and OK Cupid. Kudos to them but a big raspberry for the rest of you.

"But the links," I hear some of you moan. Stop. Stop it right now! That lame ass link (no, don't tell me about the DA number) is nothing compared to the attention you just squandered.

I'm not saying that Infographics can't work, but they have to be done thoughtfully, for the right reasons and to support your brand. Okay, rant over.

Ensuring people walk away with a concise meaning increases satisfaction. And getting them to repeat it to someone else helps secure your content in memory. The act of sharing helps add your site or brand to that user's shortcut.

If there were a formula you could follow that would guarantee great content, why is there so much crap? If we all knew what makes a hit song or a hit movie why isn't every song and film a success? This isn't easy and anyone telling you different is lying.

Consistent

Janet Jackson

You can also add to the shortcut by creating an expectation. This can be around the quality of your content, but that's pretty tough to execute on. I mean, I completely failed at generating enough blog content last year. I'm not advocating a paint-by-numbers schedule, but I had more to say, and at some point, if your name isn't out there, people begin to forget you.

There's a fair amount of research that shows that memory is a new mapping of neurons and that the path becomes stronger with repeated exposure. You inherently know this by studying. The more you study the more you remember.

But what if the memory of your site or brand, that path you're creating in your user's mind, isn't clear? What if the first time you associate the brand with one thing, and the next time it's not quite the thing you thought it was? Or the time between exposures is so great that you can't find that path anymore and inadvertently create a new one? How many times have you saved something only to realize you already saved it at some point in the past?

Now, I'm out there in other ways. I keep my Twitter feed going with what I hope is a great source of curated content across a number of industries. My Google+ feed is full of the same plus a whole bunch of other content that serves as a sort of juxtaposition to the industry specific content.

One of the more successful endeavors on Google+ is my #ididnotwakeupin series where I share photos from places around the world. It's a way for me to vicariously travel. So every morning for more than two years I've posted a photo tagged with #ididnotwakeupin.

The series gets a decent amount of engagement, and if I tried harder (i.e., interacted with other travel and photography folks) I'm pretty sure I could turn it into something bigger. I even had an idea of turning it into a coffee table book. I haven't though. Why? Because there's only so much time in every day. See what I did there?

Another example of this is Moz's Whiteboard Friday series. You aren't even sure what the topic is going to be but over time people expect it to be good so they tune in.

Or there's It's Grace (formerly Daily Grace) on YouTube, where people expect and get a new video from Grace Helbig every Monday through Friday. Want to double-down on consistent? Tell me what phrase you remember after watching this video from Grace (might be NSFW depending on your sensitivity).

Very ... yeah, you know.

That's right. Repetition isn't a bad thing. The mere exposure effect demonstrates that the more times we're exposed to something the better chance we'll wind up liking it. This is what so many digital marketing gurus don't want you to hear.

Saturation marketing (still) works because more exposure equals familiarity which improves cognitive fluency which makes it easier to remember.

It's sort of like the chorus in a song, right? Maybe you don't know all the words to each verse but you know the chorus! Particularly if you can't get away from hearing it on the radio every 38 minutes.

In some ways, the number of exposures necessary is inversely proportional to the quality of the content. Great content or ads don't need much repetition but for me to know that it's JanuANY at Subway this month might take a while.

Climbing Mount Diablo

And the biggest mistake I see people make is stopping. "We blogged for a few months and saw some progress but not enough to keep investing in it." This is like stopping your new diet and exercise regimen because you only lost 6 pounds.

You always have to be out there securing and reinforcing your brand as a cognitive shortcut.

Does Pepsi decide that they just don't need to do any more advertising? Everyone knows about Pepsi so why spend a billion dollars each year marketing it? You just can't coast. Well, you can, but you're taking a huge risk. Because someone or something else might fill the void. (Note to self, I need to take this advice.)

Shared

Everywhere

The act of sharing content likely means it will be remembered. To me it's almost like having to describe that content in your head again as you share it. You have that small moment where you have to ask questions about what you're sharing, with whom, and why it's interesting.

So sharing isn't just about getting your content in front of other people it's helping to cement your content in the mind of that user.

Of course, having the same piece of content float in front of your face a number of times from different sources helps tremendously. Not only are you hitting on the mere exposure effect you're also introducing some social proof to the equation.

To me the goal isn't really to 'go viral' but to increase the number of times I'm winning the attention auction by getting there more often with an endorsement.

You might not click on that 'What City Should You Actually Live In?' quiz on Facebook the first time, but after four people have posted their answers you just might cave and click through. (Barcelona, by the way.)

Examples

Breaking Bad

Walt and Jesse Suited Up on The Couch Eating

How did Breaking Bad become such a huge hit? It wasn't one when it first started out. I didn't watch the first two seasons live.

But enough people did and AMC kept the faith and kept going. Because enough people were talking about it. It was easy to talk about too. "This show where a chemistry teacher becomes a meth dealer." Bonus points that the plot made it stand out from anything else on TV.

And then you figured out that you could watch it on Netflix! People gave it a try. Then they began to binge watch seasons and they were converts. They wanted more. MOAR!

Of course none of it would have happened if it weren't a great show. But Breaking Bad was also consistent, persistent, memorable and available.

BuzzFeed

BuzzFeed Logo

I know what you're thinking. BuzzFeed? Come on, their content sucks! And for the most part I'd have to agree. But it's sort of a guilty pleasure isn't it?

Here's why I think BuzzFeed works. You've found yourself on a BuzzFeed 'article' a number of times. It's not high quality in most senses of the word but it does often entertain. Not only that it does so very quickly.

If I'm 'reading' the 25 Times Anna Kendrick Was Painfully Accurate post I'm only scrolling through briefly and I do get a chuckle or two out of it. This has happened enough times that I know what to expect from BuzzFeed.

I've created a cognitive shortcut that tells me that I can safely click-through on a BuzzFeed post because I'll get a quick laugh out of it. They entertain and they respect my time. For my wife that same function is filled by Happy Place.

Blind Five Year Old

Blind Five Year Old Logo

How about my site and personal brand? I've done pretty well but it took me quite a while to get there, figuring out a bunch of stuff along the way.

Seriously, I blogged in relative obscurity from 2008 to 2010. But over time the quality of my posts won over a few people. But quality wasn't enough. I also got better and better at optimizing my content for readability and for sharing.

I use a lot of images in my content. And I spend a lot of time on selecting and placing them. I still think I botched the placement of an image in my Keywords Still Matter post. And it still irks me. No, I'm not joking.

The images make it easier to read. Not only do they give people a rest, they allow me to connect on a different level. Sometimes I might be able to communicate an idea better with the help of that image. It helps to make it all click.

I use a lot of music references as images. Part of it is because I like music but part of it is because if you're suddenly singing that song in your head, then you're associating my content with that song, if even just a little. When I do that I have a better chance of you remembering that content. I've helped create a tag in your mental filing system.

I try to build more ways for you to connect my content in your head.

TL;DR

We have more choices more often when it comes to content. In response to this we're protecting our time and attention by making decisions on content faster. Knowing this, marketers must work harder to fit cognitive shortcuts we've created, based on experience, for what is perceived as clickable or authoritative content.

Alternatively, the consistent delivery and visibility of memorable content can help marketers create a cognitive shortcut, giving themselves an unfair advantage when their content comes up in the attention auction.

Stop Carly Rae Content Marketing

December 17 2013 // Marketing + SEO // 9 Comments

Lately I've gotten a few too many Carly Rae content marketing emails, which makes me both sad and grouchy. This is not the way to promote content, gain fans or build a brand. Stop it.

What Is Carly Rae Content Marketing?

Carly Rae Content Marketing

The term comes from Carly Rae Jepsen's popular Call Me Maybe song, which contains the following lyrics.

Hey I just met you
And this is crazy
But here's my number
So call me maybe

I've changed the lyrics slightly to reflect the emails I'm increasingly receiving from folks.

Hey I just met you
And this is crazy
But here's my content
So promote me maybe

Carly Rae content marketing is the out-of-the-blue outreach email from people you have no relationship with, asking you to promote their content or engage in some other activity. In the end it's just shoddy email spam.

It's An Honor To Be Nominated?

The Oscars

I'm sure some of you are thinking that I'm ungrateful. The fact that I'm getting these emails shows that people want my endorsement. Perhaps it is better to be noticed than not but if I am some sort of influencer wouldn't you want to put your best foot forward?

First impressions matter and this one isn't going to win me over. In fact, I might remember you, your site or brand for the lousy outreach instead.

Win Over The Persnickety

I might demand a higher level of quality than others. So you could simply write me off as some anal-retentive prat with outrageous expectations and a self-inflated ego. But that would be a mistake.

Mr. Fussy

Because if you can put together a pitch that doesn't make me vomit in my mouth a little bit then you're likely going to have better luck with everyone else too. In short, win over your toughest critic and you'll have a powerful outreach message.

Content Marketing Basics

Johns

If you're doing outreach there are a few things you must get right. A recent post by Tadeusz Szewczyk about the perfect outreach message covered some of the basics. (It's not perfect in my view but it's certainly above average.)

You must be relevant, have a decent subject line, get my name right, respect my time and show that you've done some rudimentary homework about me. The sad part is that 50% of people fail to even get my name correct. Yup, somehow AJ Kohn is transformed into John. (Clicks trash icon.)

Respect My Time And Brain

Do or Do Not Yoda

One of the things that has bothered me lately is the number of people asking me to take time to provide feedback on their content. Feedback! Some of these people might actually want feedback but I'm going to call bullshit on the vast majority. You don't want feedback. You want me to share and promote your content.

Do you really want me to tell you that your infographic is an eyesore and then not share or promote it? Probably not. I mean, kudos if you really are open to that sort of thing but I'm guessing you're in promotion mode at this stage and you won't be asking for a redesign.

Getting me (or anyone) to do something is a high-friction event. Don't waste it asking them to do the wrong thing.

Honest Teasing

Teased Hair with Aqua Net

Being transparent about what you're trying to accomplish is almost always the best way to go. If you're looking for a link, tell them you're looking for a link. Stop beating around the bush.

I'd also argue that you should be applying marketing principles to outreach. Half the battle is getting me to actually click and read the content. So tease me! Get me interested in what you have to say. Give me a cliff-hanger! Don't put me to sleep or ask me to promote the content without reading it.

Get me interested so that I view or read your content. At that point you have to be confident that the content is good enough that I'll share and promote it. Stop trying to do everything all at once in your outreach email.

TL;DR

Stop Carly Rae content marketing! Fast and shoddy outreach might get you a handful of mentions but it won't lead to long-term success and may actually prevent it in many cases.

Google Now Topics

November 26 2013 // SEO + Technology // 16 Comments

Have you visited your Google Now Topics page? You should if you want to get a peek at how Google is translating queries into topics, which is at the core of the Hummingbird Update.

Google Now Topics

If you are in the United States and have Google Web History turned on you can go to your Google Now Topics page and see your query and click behavior turned into specific topics.

Google Now Topics Example

This is what my Google Now Topics page looked like a few weeks back. It shows specific topics that I've researched in the last day, week and month. If you're unfamiliar with this page this alone might be eye opening. But it gets even more interesting when you look at the options under each topic.

Topic Intent

The types of content offered under each topic are different.

Why is this exciting? To me it shows that Google understands the intent behind each topic. So the topic of New York City brings up 'attractions and photos' while the topic of Googlebot just brings up 'articles'. Google clearly understands that Back to the Future is a movie and that I'd want reviews for the Toyota Prius Plug-in Hybrid.

In essence, words map to a topic which in turn tells Google what type of content should most likely be returned. You can see how these topics were likely generated by looking back at Web History.
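To make that idea concrete, here's a toy sketch of the query-to-topic-to-content-type mapping. All of the data and the `content_types_for` helper are purely illustrative inventions of mine; Google's actual topic model is, of course, not public.

```python
# Toy illustration: a raw query resolves to a topic, and the topic (not
# the query words) determines which content types are worth surfacing.
# All mappings below are hypothetical examples, not Google's real data.

TOPIC_CONTENT_TYPES = {
    "New York City": ["attractions", "photos"],
    "Googlebot": ["articles"],
    "Back to the Future": ["reviews", "showtimes"],
    "Toyota Prius Plug-in Hybrid": ["reviews"],
}

QUERY_TO_TOPIC = {
    "things to do in nyc": "New York City",
    "googlebot crawl rate": "Googlebot",
    "moto x specs": "Moto X",
}

def content_types_for(query):
    """Resolve a query to a topic, then to that topic's content types."""
    topic = QUERY_TO_TOPIC.get(query)
    # Unknown topics fall back to plain articles
    return TOPIC_CONTENT_TYPES.get(topic, ["articles"])

print(content_types_for("things to do in nyc"))  # ['attractions', 'photos']
```

The interesting part is the middle layer: once a query resolves to a topic, the topic decides what kind of content is most likely wanted, which is why New York City surfaces attractions and photos while Googlebot only surfaces articles.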

Search of Google Web History for Moto X

This part of my web history likely triggered a Moto X topic. I used the specific term 'Moto X' a number of times in a query which made it very easy to identify. (I did wind up getting the Moto X and love it.)

Tripping Google Now Topics

When I first saw this page back in March, and then again in June, I wanted to start playing around with what combination of queries would produce a Google Now Topic. However, I've been so busy with client work that I never got a chance to do so until now.

Here's what I did. I logged into my Google account and, using Chrome, tried the following series of queries (without clicking through on any results) at 1:30pm on November 13th.

the stranger
allentown
downeaster alexa
big shot
pressure
uptown girl
piano man

But nothing ever showed up in Google Now Topics. So I took a similar set of terms but this time engaged with the results at 8:35am on November 16th.

piano man (clicked through on Wikipedia)
uptown girl (clicked through on YouTube)
pressure (no click)
big shot (clicked through on YouTube)
the stranger lyrics (clicked through on atozlyrics, then YouTube)
scenes from an italian restaurant (no click)

Then at 9:20am a new Google Now Topic shows up!

Google Now Topic for Billy Joel Songs

Interestingly it understands that this is about music but it hasn't made a direct connection to Billy Joel. I had purposefully not used his name in the queries to see if Google Now Topics would return him as the topic instead of just songs. Maybe Google knows, but I had sort of hoped to get a Billy Joel topic to render and I think that might be the better result.

YouTube Categories

Engagement certainly seems to count based on my limited tests. But I couldn't help but notice that every one of the songs in that Google Now Topic was also a YouTube click. Could I get a Google Now Topic to render without a YouTube click?

The next morning I tried again with a series of queries at 7:04am.

shake it up (no click)
my best friend's girl (lyricsfreak click)
let the good times roll (click on Wikipedia, click to disambiguated song)
hello again (no click)
just what i needed (lastfm click)
tonight she comes (songmeanings click)
shake it up lyrics (azlyrics click)

At 10:04 nothing showed up so I decided to try another search.

let the good times roll (clicked on YouTube)

At 10:59 nothing showed up and I was getting antsy, which was probably not smart. I should have waited! But instead I performed another query.

the cars (clicked on knowledge graph result for Ric Ocasek)

And at 12:04 I got a new Google Now Topic.

Let The Good Times Roll Google Now Topic

I'm guessing that if I'd waited a bit longer after my YouTube click that this would have appeared, regardless of the click on the knowledge graph result. It seems that YouTube is a pretty important part of the equation. It's not the only way to generate a Google Now Topic but it's one of the faster ways to do so right now.

Perhaps it's easier to identify the topic because of the more rigid categorization on YouTube?

The Cars on YouTube

I didn't have time to do more research here but am hoping others might begin to compile a larger corpus of tests so we can tease out some conclusions.

Topic Stickiness

I got busy again and by the time I was ready to write this piece I found that my topics had changed.

New Google Now Topics

It was fairly easy to deduce why each had been produced, though the Ice Bath result could have been simply from a series of queries. But what was even more interesting was what my Google Now Topics looked like this morning.

My Google Now Topics Today

Some of my previous topics are gone! Both Ice Bath and Let The Good Times Roll are nowhere to be found. This seems to indicate that there's a depth of interaction and distance from event (time) factor involved in identifying relevant topics.

It would make sense for Google to distinguish intent that is more consistent from intent that is more ephemeral. I was interested in ice baths because my daughter has some plantar fascia issues. But I've never researched it before and likely (fingers crossed) won't again. So it would make sense to drop it.

There are a number of ways that Google could determine which topics are more important to a user, including frequency of searching, query chains, depth of interaction as well as type and variety of content.
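As a thought experiment, here's a rough sketch of how that kind of scoring might work: each interaction carries a weight that decays with age, and topics that fall below a threshold get dropped. The interaction types, weights, half-life and threshold are all invented for illustration.

```python
import math
import time

# Thought-experiment only: the interaction types, weights, half-life
# and threshold are invented to illustrate time-decayed topic scoring.
WEIGHTS = {"query": 1.0, "click": 2.0, "youtube_click": 3.0}
HALF_LIFE_DAYS = 7.0
KEEP_THRESHOLD = 0.5

def topic_score(interactions, now):
    """Sum interaction weights, decayed by age. interactions: (kind, ts)."""
    decay = math.log(2) / (HALF_LIFE_DAYS * 86400)
    return sum(WEIGHTS[kind] * math.exp(-decay * (now - ts))
               for kind, ts in interactions)

now, day = time.time(), 86400
recurring = [("query", now - 1 * day), ("youtube_click", now - 2 * day),
             ("query", now - 5 * day)]
one_off = [("query", now - 30 * day)]

# A recurring topic survives; a month-old one-off falls below threshold.
print(topic_score(recurring, now) > KEEP_THRESHOLD)  # True
print(topic_score(one_off, now) < KEEP_THRESHOLD)    # True
```

That would explain why Ice Bath (one burst of queries, weeks ago) disappeared while more consistent interests stuck around.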

Google Now Topics and Hummingbird

OMG It's Full of Stars Cat

My analysis of the Hummingbird Update focused largely on the ability to improve topic modeling through a combination of traditional natural language text analysis and entity detection.

Google Now Topics looks like a Hummingbird learning lab.

Watching how queries and click behavior turn into topics (there's that word again) and what types of content are displayed for each topic is a window into Google's evolving abilities and application of entities into search results.

It may not be the full picture of what's going on but there's enough here to put a lot of paint on the canvas.

TL;DR

Google Now Topics provide a glimpse into the Hummingbird Update by showing how Google takes words, queries and behavior and turns them into topics with defined intent.

What Does The Hummingbird Say?

November 07 2013 // SEO + Technology // 27 Comments

What Does The Fox Say Video Screencap

Dog goes woof
Cat goes meow
Bird goes tweet
and mouse goes squeak

Cow goes moo
Frog goes croak
and the elephant goes toot

Ducks say quack
and fish go blub
and the seal goes ow ow ow ow ow

But theres one sound
That no one knows
What does the hummingbird say?

What Does The Hummingbird Say?

For the last month or so the search industry has been trying to figure out Google's new Hummingbird update. What is it? How does it work? How should you react?

There's been a handful of good posts on Hummingbird including those by Danny Sullivan, Bill Slawski, Gianluca Fiorelli, Eric Enge (featuring Danny Sullivan), Ammon Johns and Aaron Bradley. I suggest you read all of these given the chance.

I share many of the views expressed in the referenced posts but with some variations and additions, which is the genesis of this post.

Entities, Entities, Entities

Are you sick of hearing about entities yet? You probably are but you should get used to it because they're here to stay in a big way. Entities are at the heart of Hummingbird if you parse statements from Amit Singhal.

We now get that the words in the search box are real world people, places and things, and not just strings to be managed on a web page.

Long story short, Google is beginning to understand the meaning behind words and not just the words themselves. And in August 2013 Google published something specifically on this topic in relation to an open source toolkit called word2vec, which is short for word to vector.

Word2vec uses distributed representations of text to capture similarities among concepts. For example, it understands that Paris and France are related the same way Berlin and Germany are (capital and country), and not the same way Madrid and Italy are. This chart shows how well it can learn the concept of capital cities, just by reading lots of news articles -- with no human supervision:

Example of Getting Meaning Behind Words

So that's pretty cool isn't it? It gets even cooler when you think about how these words are actually places that have a tremendous amount of metadata surrounding them.
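You can get a feel for the vector arithmetic with a toy example. Real word2vec learns vectors with hundreds of dimensions from enormous corpora; here I'm hand-crafting tiny 2-d vectors purely to show the 'Paris is to France as Berlin is to Germany' mechanics.

```python
# Hand-made toy vectors (2 dimensions) purely to illustrate the analogy
# arithmetic; real word2vec learns hundreds of dimensions from text.
# Dimension 0 ~ "which country", dimension 1 ~ "is a capital city".
vectors = {
    "France":  (1.0, 0.0), "Paris":  (1.0, 1.0),
    "Germany": (2.0, 0.0), "Berlin": (2.0, 1.0),
    "Spain":   (3.0, 0.0), "Madrid": (3.0, 1.0),
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by computing b - a + c, then taking the nearest word."""
    target = tuple(vectors[b][i] - vectors[a][i] + vectors[c][i] for i in range(2))
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: sum(
        (candidates[w][i] - target[i]) ** 2 for i in range(2)))

print(analogy("France", "Paris", "Germany"))  # Berlin
```

The magic in the real system is that nobody hand-crafts those dimensions; they fall out of reading lots of news articles with no human supervision.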

Topic Modeling

It's my belief that the place where Hummingbird has had the most impact is in the topic modeling of sites and documents. We already know that Google is aggressively parsing documents and extracting entities.

When you type in a search query -- perhaps Plato -- are you interested in the string of letters you typed? Or the concept or entity represented by that string? But knowing that the string represents something real and meaningful only gets you so far in computational linguistics or information retrieval -- you have to know what the string actually refers to. The Knowledge Graph and Freebase are databases of things, not strings, and references to them let you operate in the realm of concepts and entities rather than strings and n-grams.

Reading this, I think it becomes clear that once those entities are extracted Google is then performing a lookup on an entity database(s) and learning about what that entity means. In particular, Google wants to know the topic/concept/subject to which that entity is connected.

Google seems to be pretty focused on that if you look at the Freebase home page today.

Freebase Topic Count

Tamar Yehoshua, VP of Search, also said as much during the Google Search Turns 15 event.

So the Knowledge Graph is great at letting you explore topics and sets of topics.

One of the examples she used was the search for impressionistic artists. Google returned a list of artists and allowed you to navigate to different genres like cubists. It's clear that Google is relating specific entities, artists in this case, to a concept or topic like impressionist artists, and further up to a parent topic of art.

Do you think that having those entities on a page might then help Google better understand what the topic of that page is about? You better believe it.

Based on client data I think that the May 2013 Phantom Update was the first application of a combined topic model (aka Hummingbird). Two weeks later it was rolled back and then later reapplied with some adjustments.

Hummingbird refined the topic modeling of sites and pages that are essential to delivering relevant results.

Strings AND Things

Hybrid Car

This doesn't mean that text based analysis has gone the way of the dodo bird. First off, Google still needs text to identify entities. Anyone who thinks that keywords (or perhaps it's easier to call them subjects) in text aren't meaningful is missing the boat.

In almost all cases you don't have as much labeled data as you'd really like.

That's a quote from a great interview with Jeff Dean and while I'm taking the meaning of labeled data out of context I think it makes sense here. Writing properly (using nouns and subjects) will help Google to assign labels to your documents. In other words, make it easy for Google to know what you're talking about.

Google can still infer a lot about what that page is about and return it for appropriate queries by using natural language processing and machine learning techniques. But now they've been able to extract entities, understand the topics to which they refer and then feed that back into the topic model. So in some ways I think Hummingbird allows for a type of recursive topic modeling effort to take place.

If we use the engine metaphor favored by Amit and Danny, Hummingbird is a hybrid engine instead of a combustion or electric only engine.

From Caffeine to Hummingbird

Electrical Outlet with USB and Normal Sockets

One of the head scratching parts of the announcement was the comparison of Hummingbird to Caffeine. The latter was a huge change in the way that Google crawled and indexed data. In large part Caffeine was about the implementation of Percolator (incremental processing), Dremel (ad-hoc query analysis) and Pregel (graph analysis). It was about infrastructure.

So we should be thinking about Hummingbird in the same way. If we believe that Google now wants to use both text and entity based signals to determine quality and relevance they'd need a way to plug both sources of data into the algorithm.

Imagine a hybrid car that didn't have a way to recharge the battery. You might get some initial value out of that hybrid engine but it would be limited. Because once out of juice you'd have to take the battery out and replace it with a new one. That would suck.

Instead, what you need is a way to continuously recharge the battery so the hybrid engine keeps humming along. So you can think of Hummingbird as the way to deliver new sources of data (fuel!) to the search engine.

Right now that new source of data is entities but, as Danny Sullivan points out, it could also be used to bring social data into the engine. I still don't think that's happening right now, but the infrastructure may now be in place to do so.

The algorithms aren't really changing but the amount of data Google can now process allows for greater precision and insight.

Deep Learning

Mr. Fusion Home Reactor

What we're really talking about is a field that is being referred to as deep learning, which you can think of as machine learning on steroids.

This is a really fascinating (and often dense) area that looks at the use of labeled and unlabeled data and the use of supervised and unsupervised learning models. These concepts are somewhat related and I'll try to quickly explain them, though I may mangle the precise definitions. (Scholarly types are encouraged to jump in and provide correction or guidance.)

The vast majority of data is unlabeled, which is a fancy way of saying that it hasn't been classified or doesn't have any context. Labeled data has some sort of classification or identification to it from the start.

Unlabeled data would be the tub of old photographs while labeled data might be the same tub of photographs but with 'Christmas 1982', 'Birthday 1983', 'Joe and Kelly' etc. scrawled in black felt tip on the back of each one. (Here's another good answer to the difference between labeled and unlabeled data.)

Why is this important? Let's return to Jeff Dean (who is a very important figure in my view) to tell us.

You're always going to have 100x, 1000x as much unlabeled data as labeled data, so being able to use that is going to be really important.

The difference between supervised learning and unsupervised learning is similar. Supervised learning means that the model is looking to fit things into a pre-conceived classification. Look at these photos and tell me which of them are cats. You already know what you want it to find. Unsupervised learning on the other hand lets the model find its own classifications.
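The contrast is easy to sketch with toy data. Everything below is invented for illustration (think of each number as a single image feature): the supervised model learns centroids from labeled examples, while a tiny k-means finds its own two groups in unlabeled points.

```python
# Toy 1-d data, invented for illustration. Supervised: learn class
# centroids from labeled examples. Unsupervised: a tiny k-means finds
# its own two groups with no labels at all.

def nearest_centroid_fit(labeled):
    """labeled: list of (value, label) -> {label: centroid}."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def classify(centroids, x):
    return min(centroids, key=lambda y: abs(centroids[y] - x))

def kmeans_1d(points, k=2, iters=20):
    centers = sorted(points)[:k]  # naive init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            clusters[min(range(k), key=lambda i: abs(centers[i] - x))].append(x)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Supervised: the labels ("cat", "dog") are known up front.
centroids = nearest_centroid_fit([(1.0, "cat"), (1.2, "cat"),
                                  (8.0, "dog"), (8.4, "dog")])
print(classify(centroids, 1.1))  # cat

# Unsupervised: same kind of data, no labels; the model finds 2 clusters.
print(kmeans_1d([1.0, 1.2, 0.9, 8.0, 8.4, 7.9]))  # two centers, near 1 and 8
```

Notice the unsupervised model never learns the word 'cat'; it just discovers that the data falls into two groups.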

If I have it right, supervised learning has a training set of labeled data whereas unsupervised learning has no initial training set. All of this is wrapped up in the fascinating idea of neural networks.

The different models for learning via neural nets, and their variations and refinements, are myriad. Moreover, researchers do not always clearly understand why certain techniques work better than others. Still, the models share at least one thing: the more data available for training, the better the methods work.

The emphasis here is mine because I think it's extremely relevant. Caffeine and Hummingbird allow Google to both use more data and to process that data quickly. Maybe Hummingbird is the ability to deploy additional layers of unsupervised learning across a massive corpus of documents?

And that cat reference isn't just because I like LOLcats. A team at Google (including Jeff Dean) was able to use unlabeled, unsupervised learning to identify cats (among other things) in YouTube thumbnails (PDF).

So what does this all have to do with Hummingbird? Quite a bit if I'm connecting the dots the right way. Once again I'll refer back to the Jeff Dean interview (which I seem to get something new out of each time I read it).

We're also collaborating with a bunch of different groups within Google to see how we can solve their problems, both in the short and medium term, and then also thinking about where we want to be four years, five years down the road. It's nice to have short-term to medium-term things that we can apply and see real change in our products, but also have longer-term, five to 10 year goals that we're working toward.

Remember at the end of Back to The Future when Doc shows up and implores Marty to come to the future with him? The flux capacitor used to need plutonium to reach critical mass but this time all it takes is some banana peels and the dregs from some Miller Beer in a Mr. Fusion home reactor.

So not only is Hummingbird a hybrid engine but it's hooked up to something that can turn relatively little into a whole lot.

Quantum Computing

So let's take this a little bit further and look at Google's interest in quantum computing. Back in 2009 Hartmut Neven was talking about the use of quantum algorithms in machine learning.

Over the past three years a team at Google has studied how problems such as recognizing an object in an image or learning to make an optimal decision based on example data can be made amenable to solution by quantum algorithms. The algorithms we employ are the quantum adiabatic algorithms discovered by Edward Farhi and collaborators at MIT. These algorithms promise to find higher quality solutions for optimization problems than obtainable with classical solvers.

This seems to have yielded positive results because in May 2013 Google upped the ante and entered into a quantum computer partnership with NASA. As part of that announcement we got some insight into Google's use of quantum algorithms.

We’ve already developed some quantum machine learning algorithms. One produces very compact, efficient recognizers -- very useful when you’re short on power, as on a mobile device. Another can handle highly polluted training data, where a high percentage of the examples are mislabeled, as they often are in the real world. And we’ve learned some useful principles: e.g., you get the best results not with pure quantum computing, but by mixing quantum and classical computing.

A highly polluted set of training data where many examples are mislabeled? Makes you wonder what that might be doesn't it? Link graph analysis perhaps?

Are quantum algorithms part of Hummingbird? I can't be certain. But I believe that Hummingbird lays the groundwork for these types of leaps in optimization.

What About Conversational Search?

Dog Answering The Phone

There's also a lot of talk about conversational search (pun intended). I think many are conflating Hummingbird with the gains in conversational search. Mind you, the basis of voice and conversational search is still machine learning. But Google's focus on conversational search is largely a nod to the future.

We believe that voice will be fundamental to building future interactions with the new devices that we are seeing.

And the first area where they've made advances is the ability to resolve pronouns in query chains.

Google understood my context. It understood what I was talking about. Just as if I was having a conversation with you and talking about the Eiffel Tower, I wouldn't have to keep repeating it over and over again.

Does this mean that Google can resolve pronouns within documents? They're getting better at that (there's a huge corpus of research, actually) but I doubt it's to the level we see in this distinct search microcosm.

Conversational search has a different syntax and demands a slightly different language model to better return results. So Google's betting that conversational search will be the dominant method of searching and is adapting as necessary.

What Does Hummingbird Do?

What's That Mean Far Field Productions

This seems to be the real conundrum when people look at Hummingbird. If it affects 90% of searches worldwide why didn't we notice the change?

Hummingbird makes results even more useful and relevant, especially when you ask Google long, complex questions.

That's what Amit says of Hummingbird and I think this makes sense and can map back to the idea of synonyms (which are still quite powerful). But now, instead of looking at a long query and looking at word synonyms Google could also be applying entity synonyms.

Understanding the meaning of the query might be more important than the specific words used in the query. It reminds me a bit of Aardvark which was purchased by Google in February 2010.

Aardvark analyzes questions to determine what they're about and then matches each question to people with relevant knowledge and interests to give you an answer quickly.

I remember using the service and seeing how it would interpret messy questions and then deliver a 'scrubbed' question to potential candidates for answering. There was a good deal of technology at work in the background and I feel like I'm seeing it magnified with Hummingbird.

And it resonates with what Jeff Dean has to say about analyzing sentences.

I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we'd like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

My take is that 90% of the searches were affected because documents that appear in those results were re-scored or refined through the addition of entity data and the application of machine learning across a larger data set.

It's not that those results have changed but that they have the potential to change based on the new infrastructure in place.

Hummingbird Response

Le homard et le chat

How should you respond to Hummingbird? Honestly, there's not a whole lot to do in many ways if you've been practicing a certain type of SEO.

Despite the advice to simply write like no one's watching, you should make sure your writing is tight and uses subjects that can be identified by people and search engines. "It is a beautiful thing" won't do as well as "Picasso's Lobster and Cat is a beautiful painting".

You'll want to make your content easy to read and remember, link out to relevant and respected sources, build your authority by demonstrating your subject expertise, engage in the type of social outreach that produces true fans and conduct more traditional marketing and brand building efforts.

TL;DR

Hummingbird is an infrastructure change that allows Google to take advantage of additional sources of data, such as entities, as well as leverage new deep learning models that increase the precision of current algorithms. The first application of Hummingbird was the refinement of Google's document topic modeling, which is vital to delivering relevant search results.

Authorship Is Dead, Long Live Authorship

October 24 2013 // SEO // 54 Comments

Google's Authorship program is still a hot topic. A constant string of blog posts, conference sessions and 'research' projects about Authorship and the idea that it can be used as a ranking signal fills our community.

I Do Not Think It Means What You Think It Does

Yet the focus on the actual markup, and the clock-watching for when AuthorRank might show up, may not be the best use of time.

Would it surprise you to learn that the Authorship Project at Google has been shuttered? Or that this signals not the death of Authorship but a different method of assigning it?

Here's my take on where Authorship stands today.

RIP Authorship Project

The Authorship Project at Google was headed up by Othar Hansson. He's an incredibly smart and amiable guy, who from time to time was kind enough to provide answers and insight into Authorship. I was going to reach out to him again the other day and discovered something.

Othar Hansson Google+ About

Othar no longer works on the Authorship Project. He's now a principal engineer on the Android search team, which is a pretty sweet gig. Congratulations!

Remember that it was Othar who announced the new markup back in June of 2011 and then appeared with Matt Cutts in the Authorship Markup video. His departure is meaningful. More so because I can't locate a replacement. (That doesn't mean there isn't one but ... usually I'm pretty good at connecting with folks.)

Not only that but there was no replacement for Sagar Kamdar, who left as product manager of Authorship (among other things) in July of 2012 to work at Google X and, ultimately, Project Loon.

At the time I thought the writing was on the wall. The Authorship Project wasn't getting internal resources and wasn't a priority for Google.

Authorship Adoption

Walter White with his Pontiac Aztek

The biggest problem with Authorship markup is adoption. Not everyone is participating. Study after study after study show that there are material gaps in who is and isn't using the markup. Even the most rosy study of Authorship adoption by technology writers isn't anything to write home about.

Google is unable to use Authorship as a ranking signal if important authors aren't participating.

That means people like Neil Gaiman and Kevin Kelly wouldn't rank as well since they don't employ Authorship markup. It doesn't take a lot of work to find important people who aren't participating and that makes any type of AuthorRank that relies on markup a non-starter.

Authorship SERP Benefits

Search Result Heatmap For Authorship Snippet

Don't get me wrong. Google still supports Authorship markup and there are clear click-through rate benefits to having an Authorship snippet on a search result. Even if you don't believe me or Cyrus Shepard, you should believe Google and the research they've done on social annotations in 2012 (PDF) and 2013 (PDF).

So if you haven't implemented Google Authorship yet it's still a good idea to do so. You'll receive a higher click-through rate and will build authority (different from AuthorRank), both of which may help you rank better over time.

Google knows users respond to Authorship.

Inferred Authorship

I Know What You Did Last Summer

It's clear that Google still wants to do something about identifying authority and expertise. Any monkey with a keyboard can add content to the Internet. So increasingly it's about who is creating that content and why you should trust and value their opinion.

One of the first ways Google was able to infer identity (aka authorship) was by crawling the public social graph. Rapleaf took the brunt of the backlash for this but Google was quietly mapping all of your social profiles as well.

So even if you don't have Authorship markup on a Quora or Slideshare profile Google probably knows about it and could assign Authorship. All this data used to be available via social circles but Google removed this feature a few years ago. But that doesn't mean Google isn't mining the social graph.

Heck, Google could even employ usernames as a way to identify accounts from the same person. What we're really talking about here is how Google can identify people and their areas of expertise.

Authors are People are Entities

But what if Google took another approach to identifying authors? Instead of looking for specific markup, what if they looked for entities that happen to be people?

Authors are people are entities.

This would solve the adoption issue. And that's what the Freebase Annotations of the ClueWeb Corpora (FACC) seems to indicate.

Identifying Authors in Text

The picture makes it pretty clear in my mind. Here we're seeing that Google has been able to identify an entity (a person in this instance) within the text of a document and match it to a Freebase identifier.

Based on review of a sample of documents, we believe the precision is about 80-85%, and recall, which is inherently difficult to measure in situations like this, is in the range of 70-85%. Not every ClueWeb document is included in this corpus; documents in which we found no entities were excluded from the set. A document might be excluded because there were no entities to be found, because the entities in question weren’t in Freebase, or because none of the entities were resolved at a confidence level above the threshold.

At a glance you might think this means that Google still has a 'coverage' problem if they were to use entities as their approach to Authorship. But think about who is and isn't in Freebase (or Wikipedia). In some ways, these repositories are biased towards those who have achieved some level of notoriety.

Would Google prefer to rely on self referring markup or a crowd based approach to identifying experts?

Google+ Is An Entity Platform

AJ Kohn Cheltenham High School ID

While Google might prefer to use a smaller set of crowd sourced entities to assign Authorship initially I think they'd ultimately like to have a larger corpus of Authors. That's where Google+ fits into the puzzle.

I think most people understand that Google+ is an identity platform. But if people are entities (and so are companies) then Google+ is a huge entity platform, a massive database of people.

Google+ is the knowledge graph of everyday people.

And if we then harken back to social circles, to mapping the social graph and to measuring engagement and activity, we can begin to see how a comprehensive Authorship program might take shape.

Extract, Match and Measure

Concentration Board Game

Authorship then becomes about Google's ability to extract entities from documents, match those entities to a corpus that contains descriptors of that entity (i.e. social profiles, official page(s), subjects) and then measure the activity around that entity.

Perhaps Google could even go so far as to understand triples on a very detailed (document) level, noting which documents I might have authored as well as the documents in which I've been mentioned.

The presence of Authorship markup might increase the confidence level of the match but it will likely play a supporting and refining role instead of the defining role in the process.
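Here's a deliberately naive sketch of the extract-and-match step. The entity table, the fake ids and the confidence formula are all invented; real entity resolution uses far more context to disambiguate (recall the FACC's stated ~80-85% precision).

```python
# Deliberately naive sketch: scan text for known entity names and keep
# matches above a confidence threshold. The table, ids and confidence
# formula are invented; real entity resolution is far more involved.
ENTITY_TABLE = {
    "neil gaiman": {"id": "/m/fake01", "type": "person"},
    "kevin kelly": {"id": "/m/fake02", "type": "person"},
}

def extract_entities(text, threshold=0.7):
    matches, lowered = [], text.lower()
    for name, record in ENTITY_TABLE.items():
        if name in lowered:
            # Toy confidence: longer, multi-word names are less ambiguous.
            confidence = min(1.0, 0.5 + 0.1 * len(name.split()) + 0.02 * len(name))
            if confidence >= threshold:
                matches.append({"name": name, "confidence": round(confidence, 2),
                                **record})
    return matches

print(extract_entities("An essay by Neil Gaiman on making art"))
```

The point of the sketch is the threshold: documents where no entity resolves confidently simply produce no match, which mirrors how documents were excluded from the FACC corpus.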

Trust and Authority

Trust Me Sign

I'm reminded that Google talks frequently about trust and authority. For years that was about how it assessed sites but that same terminology can (and should) be applied to people as well.

Authorship markup is but one part of the equation but that alone won't translate into some magical silver bullet of algorithmic success. Building authority is what will ultimately matter and be reflected in any related ranking signal.

Are the documents you author well regarded by your peers? Are they shared? By who? How often? With what velocity? And are you mentioned (or cited) by other documents? Do they sit on respected sites? Who are they authored by? What text surrounded your mention?

So part of this is doing the hard work of producing memorable content, marketing yourself and engaging with your community. The other part will be ensuring that your entity information is both comprehensive and up-to-date. That means filling out your entire Google+ profile and potentially finding ways to add yourself to traditional entity resources such as Wikipedia and Freebase.

Just as links are the result and not the goal of your efforts, any sort of AuthorRank will be the result of building your own trust and authority through content and engagement.

TL;DR

The Authorship Project at Google has been abandoned. But that doesn't mean Authorship is dead. Instead it signals a change in tactics from Authorship markup to entity extraction as a way to identify experts and a pathway to using Authorship as a ranking signal.

Crawl Optimization

July 29 2013 // SEO // 68 Comments

Crawl optimization should be a priority for any large site looking to improve their SEO efforts. By tracking, monitoring and focusing Googlebot you can gain an advantage over your competition.

Crawl Budget

Ceiling Cat

It's important to cover the basics before discussing crawl optimization. Crawl budget is the time or number of pages Google allocates to crawl a site. How does Google determine your crawl budget? The best description comes from an Eric Enge interview of Matt Cutts.

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we'll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we'll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank. The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often.

In other words, your crawl budget is determined by authority. This should not come as a shock. But that was pre-Caffeine. Have things changed since?

Caffeine

Percolator

What is Caffeine? In this case it's not the stimulant in your latte. But it is a stimulant of sorts. In June of 2010, Google rebuilt the way they indexed content. They called this change 'Caffeine' and it had a profound impact on the speed with which Google could crawl and index pages. The biggest change, as I see it, was incremental indexing.

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Essentially, Caffeine removed the bottleneck for getting pages indexed. The system they built to do this is aptly named Percolator.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

The speed with which Google can crawl is now matched by the speed of indexation. So did crawl budgets increase as a result? Some did, but not as much as you might suspect. And here's where it gets interesting.

Googlebot seems willing to crawl more pages post-Caffeine but it's often crawling the same pages (the important pages) with greater frequency. This makes a bit of sense if you think about Matt's statement along with the average age of documents benchmark. Pages deemed to have more authority are given crawl priority.

Google is looking to ensure the most important pages remain the 'freshest' in the index.

Time Since Last Crawl

Googlebot's Google Calendar

What I've observed over the last few years is that pages that haven't been crawled recently are given less authority in the index. To be more blunt, if a page hasn't been crawled recently, it won't rank well.

Last year I got a call from a client about a downward trend in their traffic. Using advanced segments it was easy to see that there was something wrong with their product page traffic.

Looking around the site I found that, unbeknownst to me, they'd implemented pagination on their category results pages. Instead of all the products being on one page, they were spread out across a number of paginated pages.

Products that were on the first page of results seemed to be doing fine but those on subsequent pages were not. I started to look at the cache date on product pages and found that those that weren't crawled (I'm using cache date as a proxy for crawl date) in the last 7 days were suffering.

Undo! Undo! Undo!

Depagination

That's right, I told them to go back to unpaginated results. What happened?

Depagination

You guessed it. Traffic returned.

Since then I've had success with depagination. The trick here is to think about it in terms of progressive enhancement and 'mobile' user experiences.

The rise of smartphones and tablets has made click based pagination a bit of an anachronism. Revealing more results by scrolling (or swiping) is an established convention and might well become the dominant one in the near future.

Can you load all the results in the background and reveal them only when users scroll to them without crushing your load time? It's not always easy and sometimes there are tradeoffs but it's a discussion worth having with your team.

Because there's no better way to get those deep pages crawled than by having links to all of them on that first page of results.

CrawlRank

Was I crazy to think that the time since last crawl could be a factor in ranking? It turns out I wasn't alone. Adam Audette (a smart guy) mentioned he'd seen something like this when I ran into him at SMX West. Then at SMX Advanced I wound up talking with Mitul Gandhi, who had been tracking this in more detail at seoClarity.

seoClarity graph

Mitul and his team were able to determine that content not crawled within ~14 days receives materially less traffic. Not only that, but getting those same pages crawled more frequently produced an increase in traffic. (Think about that for a minute.)

At first, Google clearly crawls using PageRank as a proxy. But over time it feels like they're assigning a self-referring CrawlRank to pages. Essentially, if a page hasn't been crawled within a certain time period then it receives less authority. Let's revisit Matt's description of crawl budget again.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank.

The pages that aren't crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages.

You win if you get your low PageRank pages crawled more frequently than the competition.

Now what CrawlRank is really saying is that document age is a material ranking factor for pages with little to no PageRank. I'm still not entirely convinced this is what is happening, but I'm seeing success using this philosophy.

Internal Links

One might argue that what we're really talking about is internal link structure and density. And I'd agree with you!

Not only should your internal link structure support the most important pages of your site, it should make it easy for Google to get to any page on your site in a minimum of clicks.

One of the easier ways to determine which pages are deemed most important (based on your internal link structure) is by looking at the Internal Links report in Google Webmaster Tools.

Google Webmaster Tools Internal Links

Do the pages at the top reflect the most important pages on your site? If not, you might have a problem.

I have a client whose blog was receiving 35% of Google's crawl each day. (More on how I know this later on.) This is a blog with 400 posts amid a total content corpus of 2 million+ URLs. Googlebot would crawl blog content 50,000+ times a day! This wasn't where we wanted Googlebot spending its time.

The problem? They had menu links to the blog and each blog category on nearly all pages of the site. When I went to the Internal Links report in Google Webmaster Tools you know which pages were at the top? Yup. The blog and the blog categories.

So, we got rid of those links. Not only did it change the internal link density but it changed the frequency with which Googlebot crawls the blog. That's crawl optimization in action.

Flat Architecture

Flat Architecture

Remember the advice to create a flat site architecture? Many ran out and got rid of subfolders, thinking that if the URL didn't have subfolders then the architecture was flat. Um ... not so much.

These folks destroyed the ability for easy analysis, potentially removed valuable data in assessing that site, and did nothing to address the underlying issue of getting Google to pages faster.

How many clicks from the home page is each piece of content? That's what was, and remains, important. It doesn't matter if the URL is domain.com/product-name if it takes Googlebot (and users) 8 clicks to get there.

Is that mega-menu on every single page really doing you any favors? Once you get someone to a leaf level page you want them to see similar leaf level pages. Related product or content links are the lifeblood of any good internal link structure and are, sadly, frequently overlooked.

Depagination is one way to flatten your architecture but a simple HTML sitemap, or specific A-Z sitemaps can often be very effective hacks.

Flat architecture shortens the distance between authoritative pages and all other pages, which increases the chances of low PageRank pages getting crawled on a frequent basis.
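If you have a crawl of your own internal links, measuring that distance is just a breadth-first search from the home page. Here's a minimal Python sketch; the link graph is hypothetical and would come from your own site crawl.

```python
from collections import deque

def click_depth(links, start="/"):
    """Breadth-first search: fewest clicks from the home page to each URL.

    links: dict mapping a URL to the list of URLs it links to.
    Returns a dict of URL -> click depth (home page = 0).
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical graph: a product reachable via paginated category
# results (3 clicks) or via an HTML sitemap (2 clicks).
links = {
    "/": ["/category", "/sitemap"],
    "/category": ["/category?page=2"],
    "/category?page=2": ["/product/9"],
    "/sitemap": ["/product/9"],
}
depth = click_depth(links)
```

Note how the sitemap link wins: BFS reports the shortest path, so adding the sitemap cut the product page's depth from 3 clicks to 2.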

Tracking Googlebot

"A million dollars isn’t cool. You know what’s cool? A billion dollars."

Okay, Sean Parker probably didn't say that in real life but it's an apt analogy for the difference between knowing how many pages Googlebot crawled and knowing where Googlebot is crawling, how often and with what result.

The Crawl Stats graph in Google Webmaster Tools only shows you how many pages are crawled per day.

Google Webmaster Tools Crawl Stats

For nearly five years I've worked with clients to build their own Googlebot crawl reports.

Googlebot Crawl Reporting That's Cool

That's cool.

And it doesn't always have to look pretty to be cool.

Googlebot Crawl Report by Page Type and Status

Here I can tell there's a problem with this specific page type. More than 50% of the crawl on that page type is producing a 410. That's probably not a good use of crawl budget.

All of this is done by parsing or 'grepping' log files (a line-by-line history of visits to the site) looking for Googlebot. Here's a secret: it's not that hard, particularly if you're even halfway decent with Regular Expressions.

I won't go into details (this post is long enough as it is) but you can check out posts by Ian Lurie and Craig Bradford for more on how to grep log files.

In the end I'm interested in looking at the crawl by page type and response code.

Googlebot Crawl Report Charts

You determine page type using RegEx. That sounds mysterious but all you're doing is bucketing page types based on pattern matching.
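As a rough illustration, here's a minimal Python sketch of that kind of report. The page type patterns and the Apache-style combined log format are assumptions; swap in your own URL conventions and log layout.

```python
import re
from collections import Counter

# Hypothetical page type patterns -- replace with your own URL conventions.
PAGE_TYPES = [
    ("product", re.compile(r"^/product/")),
    ("category", re.compile(r"^/category/")),
    ("blog", re.compile(r"^/blog/")),
]

# Pulls the request path, status code and user agent out of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)

def page_type(path):
    """Bucket a URL path into a page type via pattern matching."""
    for name, pattern in PAGE_TYPES:
        if pattern.search(path):
            return name
    return "other"

def crawl_report(lines):
    """Count Googlebot requests by (page type, status code)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("agent"):
            counts[(page_type(m.group("path")), m.group("status"))] += 1
    return counts
```

Feed it a day's worth of log lines and you get exactly the page type by response code breakdown shown above, ready to chart.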

I want to know where Googlebot is spending time on my site. As Mike King said, Googlebot is always your last persona. So tracking Googlebot is just another form of user experience monitoring. (Referencing it like this might help you get this project prioritized.)

You can also drop the crawl data into a database so you can query things like time since last crawl, total crawl versus unique crawl or crawls per page. Of course you could also give seoClarity a try since they've got a lot of this stuff right out of the box.
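For the database route, here's a minimal sketch using SQLite. The table layout and dates are hypothetical; the point is how simple a time-since-last-crawl query can be once the log data is loaded.

```python
import sqlite3

# One row per Googlebot request, parsed out of the log files.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crawls (
        url        TEXT,
        page_type  TEXT,
        status     INTEGER,
        crawled_at TEXT  -- ISO 8601 timestamp
    )
""")
conn.executemany(
    "INSERT INTO crawls VALUES (?, ?, ?, ?)",
    [
        ("/product/1", "product", 200, "2013-07-01T08:00:00"),
        ("/product/1", "product", 200, "2013-07-25T09:30:00"),
        ("/product/2", "product", 200, "2013-07-02T11:00:00"),
    ],
)

# Days since each URL was last crawled, relative to a reference date,
# with the most neglected URLs first.
rows = conn.execute("""
    SELECT url,
           MAX(crawled_at) AS last_crawl,
           CAST(julianday('2013-07-29') - julianday(MAX(crawled_at)) AS INTEGER)
               AS days_since
    FROM crawls
    GROUP BY url
    ORDER BY days_since DESC
""").fetchall()
```

Sort descending and the pages drifting past that ~14 day window float right to the top of the report.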

If you're not tracking Googlebot then you're missing out on the first part of the SEO process.

You Are What Googlebot Eats

Cookie Monster Fruit

What you begin to understand is that you're assessed based on what Googlebot crawls. So if they're crawling a whole bunch of parameter based, duplicative URLs or you've left the email-a-friend link open to be crawled on every single product, you're giving Googlebot a bunch of empty calories.

It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget.

The crawl spent on junk could have been spent crawling low PageRank pages instead. So managing your URL Parameters and using robots.txt wisely can make a big difference.
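As an illustration, here's a hypothetical robots.txt using the pattern matching Googlebot supports (the paths are made up; yours will differ):

```
User-agent: *
# Block email-a-friend links hanging off every product page
Disallow: /product/*/email-a-friend
# Block internal site search results
Disallow: /search
# Block session-parameter duplicates of any URL
Disallow: /*?*sessionid=
```

Each rule trims a class of empty-calorie URLs so the crawl budget can go to real pages instead.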

Many large sites will also have robust external link graphs. I can leverage those external links, rely less on internal link density to rank well, and can focus my internal link structure to ensure low PageRank pages get crawled more frequently.

There's no patently right or wrong answer. Every site will be different. But experimenting with your internal link strategies and measuring the results is what separates the great from the good.

Crawl Optimization Checklist

Here's a quick crawl optimization checklist to get you started.

Track and Monitor Googlebot

I don't care how you do it but you need this type of visibility to make any inroads into crawl optimization. Information is power. Learn to grep, perfect your RegEx. Be a collaborative partner with your technical team to turn this into an automated daily process.

Manage URL Parameters

Yes, it's confusing. You will probably make some mistakes. But that shouldn't stop you from using this feature and changing Googlebot's diet.

Use Robots.txt Wisely

Stop feeding Googlebot empty calories. Use robots.txt to keep Googlebot focused and remember to make use of pattern matching.

Don't Forget HTML Sitemap(s)

Seriously. I know human users might not be using these, but Googlebot is a different type of user with slightly different needs.

Optimize Your Internal Link Structure

Whether you try depagination to flatten your architecture, re-evaluate navigation menus, or play around with crosslink modules, find ways to optimize your internal link structure to get those low PageRank pages crawled more frequently.

Keywords Still Matter

June 05 2013 // SEO // 58 Comments

As content marketing becomes the new black I'm starting to hear people talk about how keywords don't matter anymore. This sentiment appears in more than a few posts and the general tenor seems to be that keyword focused strategies are a thing of the past - a relic from a dark time.

The problem? You need keywords to produce successful content.

Dwight Meme Keywords

Keyword Syntax

How do people search for something? That's what keywords are all about. It's vital to ensuring your content will be found and resonate with your users.

keyword syntax

Are people searching for 'all weather fluid displacement sculptures' or 'outdoor water fountains'? That's an extreme example but it makes an important point.

You need to understand the user and the words they use to find your content.

Keyword Intent

Keywords can also tell you a lot about the intent of a search. Look (well) beyond informational, navigational and transactional intent and start thinking about how you can map keywords to the various stages of your site's conversion funnel.

For instance, what does a query like 'majestic seo vs open site explorer' tell you? This user is probably further along in the purchase funnel. They're aware of their choices and may have even narrowed it down to these two options. The keyword (yes, keyword) 'vs' makes it clear that they're looking for comparison data.

Google SERP for Comparison Intent

Sure enough, most of the results returned are posts that compare these two tools. Those pieces of content squarely meet that intent, in part because they're paying attention to keywords.

Majestic SEO has a result but ... it's the home page. Is that going to satisfy the desire to compare? Probably not. And where's SEOMoz? Missing in action.

Each could rely on the blog posts presented to deliver this comparison. Or they could develop content that meets that keyword and intent, allowing them to tell their story and frame the debate.

I know some will shriek, "Are you crazy? You don't want to promote your competition by mentioning them so prominently!" But that's denying reality. Users are searching with this syntax and intent.

Now, I'm not saying you have to put content that meets this particular intent prominently on the site or in the normal conversion flow. But if you know someone is on the fence and comparing products, why wouldn't you want a chance to engage that user on your own terms?

Keywords let you create content that matches user intent.

Magic Questions

Oh-O It's Magic!

There's also a lot of meta information that comes along with a keyword. I'm fond of using a term like 'eureka 313a manual' as an example. It's a query for a vacuum cleaner manual.

On the one hand it's pretty simple. There's explicit intent. Someone is looking for the manual to their vacuum cleaner. The content to meet that informational search would be ... the manual. But what's really going on?

If you're searching for the manual, odds are that something is wrong with your vacuum. There's an implied intent at work. The vacuum is either not working right or is flat out broken. You have the opportunity to anticipate and answer magic questions.

How can I fix my vacuum? Where can I buy replacement parts? Are there repair shops near me? What vacuum should I get to replace this one if it can't be fixed?

By decoding the keyword you can create a relevant and valuable page that meets both explicit and implied intent.

Keyword Frequency

Keyword frequency is important. Yes, really. One of my favorite examples of this is LinkedIn. How did they secure their place in the competitive 'name' query space?

LinkedIn Keyword Frequency

LinkedIn wanted to make it clear what (or who) these pages were about. That's what keyword frequency is about, making it easy for search engines and users to understand what that page is about.

LinkedIn doesn't just do it with their headers either, but uses the name frequently elsewhere on the page. The result?

Marshall Simmonds Wordle

There's no question what this page is about.

Keywords are Steve Krug for Googlebot.

Readability

This Is Not A Pipe

The reaction I get from many when I press on this issue is that it produces a poor user experience. Really? I've never heard anyone complain about LinkedIn and most never realize that it's even going on.

Using the keywords people expect to see can only help make your content more readable, which is still a tremendously undervalued aspect of SEO. Because people scan text and rarely read word for word.

And what do you think they're scanning for? What do you think is rattling around in their brain when they're scanning your content? It's not something random like 'bellhop poodle duster', it's probably the keyword that brought them there.

You may think Google is smart enough to figure it out. You'll claim that Google's gotten far more sophisticated in the application of synonyms and topical modeling. And you'd be right to a degree. But why take the chance? Particularly since users crave the repetition and consistency.

They don't want you to use four different ways to say the same thing and the hard truth is they're probably only going to read one of those words anyway. You'll create better content for users if you write for search engines.

Make sure you're using the words users expect to see.

TL;DR

Keywords aren't going away, they're becoming more important. Query syntax and user intent are vital in producing relevant and valuable content that resonates with users and answers both explicit and implicit questions.

Google Removes Related Searches

April 19 2013 // Rant + SEO // 44 Comments

This morning I went to use one of my go to techniques for keyword research and found it was ... missing.

Related Searches Gone

Related Searches Option Gone

It was bad enough that the new Search tools interface was this awkward double-click menu but I understood that decision. Because most mainstream users don't ever refine their results.

But to remove related searches from that menu altogether? In less than a year related searches went from being a search tip to being shuffled off to Buffalo?

WTF!

Out of Insight

Clooney is Pissed

Google needs to understand that there are SEOs, or digital marketing professionals if that makes it easier, who are helping to make search results better. We're helping sites understand the syntax and intent of their users and creating relevant and valuable experiences to match and satisfy those queries.

I wasn't happy but wasn't that upset when Google introduced (not provided). But as the amount of (not provided) traffic increases I see no reason why Google shouldn't implement my (not provided) drill down suggestion. Seriously, get on that.

But then Google merged Google Trends with Google Insights for Search and in the process removed its most useful feature. That's right, knowing what percentage of traffic was attributed to each category let SEOs better understand the intent of that query.

Now Google's taking away the interface for related searches? Yeah, you've gone too far now. Hulk mad.

Stop Ignoring Influencers

You Wouldn't Like Me When I'm Angry

Just like the decision to terminate Google Reader, Google doesn't seem to understand that they need to address influencers. And believe it or not Google, SEOs are influencers. We're demystifying search so that sites don't fall for get-rank-quick schemes. And you need us to do that because you're dreadful at SEO. Sites aren't finding much of your educational content. They're not. Really.

In the last year Google's made it more and more difficult for SEOs to do good work. And you know who ultimately suffers? Google. Because the content coming out won't match the right syntax and intent. It'll get tougher for Google, over time, to find the 'right' content and users will feel the slow decline in search quality. You know, garbage in, garbage out.

Any good marketer understands that they have to serve more than one customer segment. Don't like to think of SEOs as influencers? Fine. Call us power users and put us back on your radar and stop removing value from the search ecosystem.

Time To Long Click

April 17 2013 // SEO // 57 Comments

The internal metric Google uses to determine search success is time to long click. Understanding this metric is important for search marketers in assessing changes to the search landscape and developing better optimization strategies.

Short Clicks vs Long Clicks

Longcat

Back in 2009 I wrote about the difference between short clicks and long clicks. A long click occurs when a user performs a search, clicks on a result and remains on that site for a long period of time. In the optimal scenario they do not return to the search results to click on another result or reformulate their query.

A long click is a proxy for user satisfaction and success.

On the other hand, a short click occurs when a user performs a search, clicks on a result and returns to the search results quickly to click on another result or reformulate their query. Short clicks are an indication of dissatisfaction.

Google measures success by how fast a search result produces a long click.

Bounce Rate vs Pogosticking

Before I continue I want to make sure we're not conflating short clicks with bounce rate. While many bounces could be construed as short clicks, that's not always the case. The bounce rate on Stack Overflow is probably very high. Users search for something specific, click through to a Stack Overflow result, get the answer they needed and move on with their life. This is not a bad thing. That's actually a long click.

You can gain greater clarity on this by configuring an adjusted bounce rate or something even more advanced that takes into account the amount of time the user spent on the page. In the example above you'd likely see that users spent a material amount of time on that one page which would be a positive indicator.

The behavior you want to avoid is pogosticking. This occurs when a user clicks through on a result, returns quickly to the search results and clicks on another result. This indicates, to some extent, that the user was not satisfied with the original result.

Two problems present themselves with pogosticking. The first is that it's impossible for sites to measure this metric. That sort of sucks. We can only look at short bounces as a proxy and even then can't be sure that the user pogosticked to another result.

The second is that some verticals will naturally produce pogosticking behavior. Health related queries will show pogosticking behavior since users want to get multiple points of view (or opinions if you will) on that ailment or issue.

This could be overcome by measuring the normal pogosticking behavior for a vertical or query class and then determining which results produce lower and higher than normal pogosticking rates. I'm not sure Google is doing this but it's not out of the question since they already have a robust understanding of query and vertical mapping.

But I digress.

Speed

Part of the way Google works on reducing the time to long click is by improving the speed of search results and the Internet in general. Their own research showed the impact of speed on search results.

All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!

Remember that while usage was the metric used, they were trying to measure satisfaction. Making it faster to get to information made people happier and more likely to use search for future information requests. Google's simply reducing the friction of searching.

But it's not just the speed of presenting results that matters; it's how quickly Google gets someone to that long click. Search results that don't produce long clicks are bad for business, as are those that increase the time spent selecting a result. And pogosticking blows up the query timeline as users loop back and tack on additional seconds' worth of selection and page load time.

Google Query Timeline

Make no mistake. Google wants to reduce every portion of this timeline they presented at Inside Search in 2011.

Answers

42

One of the ways in which we've seen Google reduce time to long click is through various 'answers' initiatives. Whether it's a OneBox or a Knowledge Graph result the idea is that answers can often reduce the time to long click. It's immediate gratification and in line with Amit Singhal's Star Trek computer ideal.

In some cases a long click is measured by the absence of a click and reformulated query. If I search for weather, don't click but don't take any further actions, that should register as a long click.

Ads

John Henry Man vs Machine

You'll also hear Google (and Bing) talk about the fact that ads are answers. Of course ads are what fill the coffers but they also provide another way to get people to a long click. Arguing the opposite (that ads aren't contributing to satisfaction) is a lot like arguing that marketers and advertisers aren't valuable.

Not only that, but Google has features in place to help ensure that good ads answers rise to the top. The auction model coupled with quality score and keyword level bidding all produce relevant ads that lead to long clicks.

The analysis of pixel space on search results is often used to show how Google is marginalizing organic search. Yet, the other way to look at it is that advertisers are getting better at delivering results (with the help of new Google ad extensions). Isn't it, in some ways, man versus machine? The advertiser being able to deliver a better result than the algorithm?

Without doubt Google benefits financially from having more space dedicated to paid results but they still must result in long clicks for Google to optimize long-term use, which leads to long-term revenues and profits.

I would be very surprised if changes to search results (both paid and organic) weren't measured by the impact they had in time to long click.

Hubs

Bow Tie

All of this is interesting but what does the time to long click metric mean for SEO? More than you might suspect.

When I started in the SEO field I read everything I could get my hands on (which is not altogether different from now). At the time there was advice about becoming a hub.

There was a good deal of hand waving about the definition of a hub but the general idea was that you wanted to be at the center of a topic by providing value and resources. People would link to you and the traffic you received would often go on to the resources you provided. About.com is a good example.

Funny thing is, this isn't some well kept secret. Marshall Simmonds spells it out pretty clearly in this 2010 Whiteboard Friday video where he discusses bow tie theory (hubs) and link journalism. (I just watched this again while writing this and, man, this is an awesome video.)

Most people focus on the fact that hubs receive a lot of backlinks. They do because of the value they provide, which is often in the aggregation of and links to other content. In the end, the real value of hubs is that they play an important part in getting people to content and that long click.

Search is a multi-site experience.

This is what search marketers must realize. You will get credit for a long click if you're part of the long click. If you ensure that the user doesn't return to search results, even by sending them to another site, then you're going to be rewarded.

Too often sites won't link out. I regularly run into this as my clients navigate business development deals with partners. It's frustrating. They think linking out is a sign of weakness and reduces their ability to consolidate PageRank.

While PageRank math might support not linking out, that strategy ultimately limits success.

Link Out!

Local Maxima Graph

Limiting your outlinks creates a local maxima problem. You'll optimize only up to a certain ceiling based on constrained PageRank math. Again, not a real secret. Cyrus Shepard talked about this in a 2011 Whiteboard Friday video (though I wouldn't stress too much about the anchor text myself).

Linking out can help you break through that local maxima by delivering more long clicks. Suddenly, your page is a sort of mini-hub. People search, get to your page and then go on to other relevant information.

Google wants to include results that contribute to reducing the time to long click for that query. 

I'm not advocating that you vomit up pages with a ton of links. What I'm recommending is that you link to other valuable sources of information when appropriate so that you fully satisfy that user's query. In doing so you'll generate more long clicks and earn more links over time, both of which can have profound and positive impact on your rankings.

Stop thinking about optimizing your page and think about optimizing the search experience instead. 

I ran into someone at SMX West who inherited a vast number of low quality sites. These sites used the old technique of being relevant enough to get someone to the page but not delivering enough value to answer their query. The desired result was a click on an ad. Simple arbitrage when you get down to it.

In a test, placing prominent links to relevant content on a sub-set of these pages had a material and positive impact on their ranking. It's certainly not conclusive, but it showed the potential impact of being part of a multi-site long click search result.

As an aside, it's not that those ad clicks were bad. Some of those probably resulted in long clicks. Just not enough of them. The majority either pogosticked to another result or wound up back at the search result after an ad click. And we already know this as search marketers by looking at the performance of search versus display campaigns.

Impact On Domain Diversity

If you believe time to long click is the way in which Google is measuring search success then you start to see some of the changes in a new light. I've been disappointed by the lack of domain diversity on many search results.

Yelp Dominating Search Results for Haircut in Concord CA

Sadly, this type of result hasn't been that rare within the last year. Pete Myers has been doing amazing work on this topic.

For a while I just thought this was Google being stupid. But then it dawned on me. The lack of domain diversity may be reducing the time to long click. It might actually be improving the overall satisfaction metrics Google uses to optimize search!

In some ways this makes a bit of sense, if even from a straight up Paradox of Choice perspective. Selecting from 5 different domains instead of 10 might reduce cognitive strain. Too many choices overwhelm people, reducing both action and satisfaction. So perhaps Google's just reflecting that in their results with both domain diversity (or lack thereof) and more instances of 7-result pages.

Downsides to Time To Long Click?

MC Escher Relativity Stairs

Are these long clicks truly a sign of satisfaction? The woman who had been cutting my hair for nearly 10 years retired, so I actually did need to find someone new. I hated that search result but did wind up clicking through and using Yelp to locate someone. So from Google's perspective I was satisfied but in reality ... not so much.

I wonder how long a time frame Google uses in assessing the value of long clicks. I abandoned my haircut search a number of times over the course of a month. In many of those instances I'm sure it looked like I was satisfied with the result. It looked like a long click. Yet, if you looked over a longer period of my search history it would become clear I wasn't. I think this is a really difficult problem to solve. Is it satisfaction or abandonment?

The other danger here is that Google is training people to use another service. Now, I don't particularly like Yelp but what this result tells me is that if I wanted to find something like this again I should just skip Google and go right to Yelp instead.

The same could be said of our own bias toward brands. While users may respond better to brands and the time to long click might be reduced, the long term implication could be that Google is training users to visit those brands directly. Why start my product search on Google when all they're doing is giving me links to Amazon 90% of the time?

Of course, Google could argue that it will remain the hub for information requests because it continues to deliver value. (See what I did there?)

TL;DR

Google is using time to long click to measure the effectiveness of search results. Understanding this puts many search changes and initiatives into perspective and gives sites renewed reason to link out and think of search as a multi-site experience.

Tracking Image Search In Google Analytics

March 27 2013 // Analytics + SEO // 49 Comments

The Internet is becoming increasingly visual, but by default Google Analytics lumps image search traffic in with organic traffic. The problem is that these two types of traffic behave in radically different ways.

Google Analytics Y U No Track Image Search

So here's a quick way for you to track image search in Google Analytics to gain insight into how images are performing for your business.

Image Search Referrers

After the last image search update I was asked by Annie Cushing if I'd figured out a way to track images in Google Analytics. I'd meant to but hadn't yet. Her reminder led me to find out what was possible. I fired up Firefox and used Live HTTP Headers to look at the referrers for image search traffic.

I found that there were two distinct referrers for Google, one from Google images and one from images that showed up via universal search results.

Here's what the referrer looks like from Google image search.

Google Image Search Referrer

The parts to note here are the /url? and the source=images parameter. Now let's look at what the referrer looks like from an image via universal search.

Google Image Referrer via Universal Search

The part to note here is that the URL doesn't use /url? but imgres? instead. This means you can track traffic from each source!

Finally, let's take a look at Bing.

Bing Image Search Referrer

This is pretty straightforward and doesn't change based on whether it's from image search proper or via a universal result.
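A quick way to sanity-check which bucket a given referrer falls into is to parse its path and query string. This is just a sketch of the logic described above; the sample referrer strings are hypothetical stand-ins modeled on the patterns (real referrers carry many more parameters):

```python
from urllib.parse import urlparse, parse_qs

def bucket_for(referrer):
    """Classify a referrer into an image search bucket based on its path and query."""
    parsed = urlparse(referrer)
    params = parse_qs(parsed.query)
    if "google" in parsed.netloc:
        # /url? plus source=images signals Google image search proper.
        if parsed.path == "/url" and "images" in params.get("source", []):
            return "google images"
        # imgres? signals an image clicked within a universal (blended) result.
        if parsed.path == "/imgres":
            return "universal images"
    elif "bing" in parsed.netloc and parsed.path.startswith("/images/search"):
        return "bing images"
    return None

# Hypothetical referrers modeled on the patterns above.
print(bucket_for("http://www.google.com/url?sa=i&source=images&cd=1"))            # google images
print(bucket_for("http://www.google.de/imgres?imgurl=http://example.com/a.jpg"))  # universal images
print(bucket_for("http://www.bing.com/images/search?q=wifi+logo"))                # bing images
```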

Google Analytics Image Search Filters

If you know the referrer patterns you can set up some Google Analytics filters to capture and reclassify this traffic into the appropriate buckets. Here's the step-by-step way to do that.

From Google Analytics click Admin.

Accessing Admin in Google Analytics

That takes you to a list of profiles.

Google Analytics Profiles

Here you can either create a new profile or select a current one. I'd suggest creating a new profile to test this out before you integrate it into your primary profile, since you might screw it up, may not like the added detail, or may not want the break in continuity. That said, I've created these filters so they'll have the least amount of impact on your reporting while still delivering added insight.

Next you'll reach the profile navigation pane where you'll want to click on Filters.

Google Analytics Profile

At that point you'll want to go ahead and click the New Filter button.

Google Analytics New Filter

That's when the real fun begins and you construct a new advanced filter.

Creating a Google Analytics Google Image Search Filter

The first step is to name this filter. This won't show up in your reports and is simply a way for you to know what that filter is doing. So make it descriptive and obvious.

Next you'll want to select the Custom filter button (2) which then reveals a list of options. From that list you'll want to select Advanced (3). This is where it gets a bit tricky.

In step 4 you'll select Referral from the menu of options and then apply some RegEx to match the pattern we've identified. In this instance the RegEx I'm using is:

.*google\.(.*)/url.*source=images.*

I love RegEx, which stands for Regular Expression, but I don't always get it right the first time and regularly rely on this RegEx cheat sheet to remind and guide me. In this instance I'm looking for all Google domains (trying to include international here) with /url and source=images within the referrer.

The RegEx for the other two filters you'll create for Google universal images and Bing images are:

.*google\.(.*)/imgres.*

.*bing\.(.*)/images/search.*
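Since RegEx is easy to get subtly wrong, it's worth testing all three patterns against sample referrers before trusting them in a live profile. Here's a quick check in Python; the sample referrer strings are hypothetical, modeled on the shapes seen in Live HTTP Headers:

```python
import re

# The three filter patterns exactly as entered into Google Analytics.
patterns = {
    "google images": r".*google\.(.*)/url.*source=images.*",
    "universal images": r".*google\.(.*)/imgres.*",
    "bing images": r".*bing\.(.*)/images/search.*",
}

def matching_source(referrer):
    """Return the campaign source whose pattern matches this referrer, or None."""
    for source, pattern in patterns.items():
        if re.match(pattern, referrer):
            return source
    return None

# Hypothetical referrers and the source each should map to.
samples = {
    "http://www.google.com/url?sa=i&source=images&cd=1": "google images",
    "http://www.google.co.uk/imgres?imgurl=http://example.com/a.jpg": "universal images",
    "http://www.bing.com/images/search?q=wifi+logo": "bing images",
    "http://www.google.com/search?q=wifi+logo": None,  # plain web search: no filter should fire
}

for referrer, expected in samples.items():
    assert matching_source(referrer) == expected, referrer
```

The last sample matters most: a plain web search referrer should slip past all three patterns, leaving your normal organic reporting untouched.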

In step 5 you select what happens when a referrer matches your RegEx. I've chosen Campaign Source from the menu and then created a new source called 'google images'. You can name these whatever you like but I keep them lowercase to match the other sources.

You'll note that the 'Override Output Field' is set to Yes which means that I'm going to change the Campaign Source for those that match this referrer pattern from what it is currently to 'google images'. The great part about this is that you retain the fact that the medium is 'organic'. So all those reports remain completely valid.

Finally, you click Save and then you wait for the filter to be applied to traffic coming into the site. Depending on the amount of traffic you get from these sources, it may take a few hours to a few days to see the filter working in your reports.

Image Search Reports

So what do you get to see in the reports?

Image Filters Create Better Google Analytics Reports

This is data from a client site where I've had all the filters in place for a few days. The medium for all of these is still organic but I've now got new sources for google images, universal images and bing images.

What you should see right away is the very large difference in how this traffic performs. Image search traffic in this instance has 1.5 Pages/Visit and a 3:00 Avg. Visit Duration while web based organic traffic has 6 Pages/Visit and a 6:00 Avg. Visit Duration.

Most importantly, the conversion rate on these two types of traffic is different as well. Segmenting your image search traffic can bring more clarity to your analysis and help you make the right decisions on what's working, how to allocate resources and what to optimize.

Image Search Filter Validation

So how do I know this is really working? I drill down into one of these new sources and then select keyword as the secondary dimension. Did I forget to mention that the keyword data remains intact?

Google Analytics Universal Images Keyword Report

Yup, sure does! So the next step here is to see if there really is a universal result for these keywords.

Google Search Result for Badass Over Here Real Pic

Sure enough, I'm the second result in this universal search result. Now let's see if the filter for normal image search is working.

Google Analytics Google Images Keyword Report

I'll use 'wifi logo' as my target term and first go to make sure that I'm not showing up in universal search results.

Google Search Result for Wifi Logo

Nope, not showing up there. But am I showing up in Google image search?

Google Images Search Results for Wifi Logo

Sure enough, I'm there, just inside the top 100 results from what I can tell. So I'm pretty confident that the filter is catching things and bucketing them appropriately. I've also validated this with very robust client data but can't share that level of detail publicly.

What Is images.google?

You might have noticed the images.google source above. What's that you ask? I don't know. But I don't think it's traditional image search traffic since the user behavior of that source doesn't conform to the other three image based sources. It's also a small source of traffic so while my OCD senses are tingling I'm currently ignoring the urge to figure out exactly what images.google represents.

Tell me if you figure it out.

Caveats

You Raise a Valid Point Ice Cream

The big question is why I wouldn't just use the Google Webmaster Tools queries report and filter by image, right? Well, first off, the integration into Google Analytics still isn't where I'd like it to be, making any type of robust reporting near impossible.

In addition, I don't like mixing image search traffic with web search traffic in my normal reports because they're so different. It makes any analysis you do using that mixed data less precise and prone to unintentional error.

More problematic is the fact that the data between Google Webmaster Tools and Google Analytics doesn't match up.

I started looking at specific keywords via my filters versus what was reported in Google Webmaster Tools. There were just too many times when Google Webmaster Tools reported material amounts of traffic that wasn't showing up in my Google Analytics reports.

Google Webmaster Tools Clicks

Here you can see that the top term received 170 clicks in this time frame. Yet during the same time frame here's what the Google Analytics filter based method reports.

Google Analytics Image Based Clicks

170 versus 24! Even if I factor in the (not provided) percentage (which runs about 35% for this client) and add that back in I only get close to 40 visits.

But that's when the lightbulb went off. Maybe Google Analytics is reporting Visits while Google Webmaster Tools is reporting Clicks?

While I can't confirm this, I'm guessing that Google Webmaster Tools is counting all clicks on a result. Many of those clicks go directly to the image and not the page the image resides on. That's important since direct clicks to the image (i.e. .jpg files and the like) aren't going to be tracked in Google Analytics as a visit. There is no Google Analytics code on these files. The delta between the two could be the number of users who clicked directly to the image.
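The back-of-the-envelope arithmetic works out like this, using the numbers from the reports above and the roughly 35% (not provided) rate quoted for this client:

```python
gwt_clicks = 170          # clicks reported by Google Webmaster Tools
ga_visits = 24            # visits reported via the Google Analytics filter
not_provided_rate = 0.35  # share of this client's keywords hidden as (not provided)

# Gross the GA visits up to account for keywords hidden behind (not provided).
adjusted_visits = ga_visits / (1 - not_provided_rate)
print(round(adjusted_visits, 1))  # 36.9, i.e. "close to 40"

# The remaining gap is plausibly clicks straight to the image file itself,
# which carries no Google Analytics tag and so never registers as a visit.
unexplained_clicks = gwt_clicks - adjusted_visits
print(round(unexplained_clicks))  # 133
```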

In addition, this method doesn't catch any of the mobile clicks and visits since no image search visits (and very few universal images) show up using this filter when looking at mobile traffic. I'm pretty sure that the referrers are just getting stripped and these wind up going into direct instead which is part of the iOS and Android 4+ search attribution issue. (If someone else has an explanation here or finds a different referrer for mobile image search please let me know.)

Finally, there's something funky with Chrome. When I look at the distribution of traffic to each bucket Chrome is an outlier for Google images.

Image Filters Browser Distribution

That 3.7% is just way out of proportion. And it's not related to the amount of (not provided) traffic, since Firefox actually has a higher percentage of (not provided) traffic (72%) than Chrome (64%) in this instance. So I can only conclude that there's some amount of data loss going on with Chrome. Maybe that also contributes to the discrepancy I see between Google Analytics and Google Webmaster Tools.

Despite all of these caveats I love having the additional detail on image traffic which has wildly different intent and user behavior.

TL;DR

Apply a few simple Google Analytics filters to gain insight into how much traffic you're getting through image search. This is increasingly important as the Internet becomes more visual and the user behavior of these visits differs in material ways from traditional search traffic.