Knowledge Graph Optimization

March 10 2014 // KGO + SEO // 40 Comments

A few months ago I offhandedly made a reference to KGO which stands for Knowledge Graph Optimization.

Now, I know what you’re thinking. We need another acronym like another hole in the head! But over the past year I feel like there are a set of tactics that can help you optimize your site’s connection to the Knowledge Graph. And that can yield material gains in search visibility.

The Knowledge Graph

Here’s a brief explanation from Google for those not familiar with the Knowledge Graph.

The Knowledge Graph enables you to search for things, people or places that Google knows about—landmarks, celebrities, cities, sports teams, buildings, geographical features, movies, celestial objects, works of art and more—and instantly get information that’s relevant to your query. This is a critical first step towards building the next generation of search, which taps into the collective intelligence of the web and understands the world a bit more like people do.

It’s about searching for things instead of strings. Or without the rhyming, it’s about entities instead of text.

Take the query ‘Golden State Warriors’. From a string standpoint you’d be looking at the individual keywords, which might be confusing. Now, Google got extremely good at understanding terms that are most frequently used together, using bigrams and other methods, so that this query would yield a result about the NBA basketball team.

But with the Knowledge Graph Google can instead identify ‘Golden State Warriors’ as an entity (a thing) that has a specific entry in the Knowledge Graph and return a much richer result.

Knowledge Graph Result for Golden State Warriors

Pretty amazing stuff really. (Go Warriors!)  Hummingbird was largely an infrastructure update that allowed Google to take advantage of burgeoning entity data. So we’re just getting started with the application of entities on search.

Entity Challenge

Challenge Accepted

You need only look to the Entity Recognition and Disambiguation Challenge co-sponsored by Microsoft and Google to see the writing on the wall.

The objective of an Entity Recognition and Disambiguation (ERD) system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection or knowledge base.

Can it be any more clear? Well, actually, it can.

The Challenge is composed of two parallel tracks. In the “long text” track, the challenge targets are pages crawled from the Web; these contain documents that are meant to be easily understood by humans. The “short text” track, on the other hand, consists of web search queries that are intended for a machine. As a result, the text is typically short and often lacks proper punctuation and capitalization.

Search engines are champing at the bit to get better at extracting entities from documents and queries so they can return more relevant and valuable search results.

So …

Wile E. Coyote Bat Suit

But what exactly are we supposed to do? There has been little in the way of real rubber-meets-the-road content that describes how you might go about optimizing for this new world full of entities. One of the exceptions would be Aaron Bradley’s Semantic SEO post, though it mixes both theory and tactics.

Now, I love theory. That’s pretty clear from my writing. But today I want to talk more about tactics, about the actual stuff we can do as marketers to effect change in the Knowledge Graph.

Nouns

Noun

The first thing we can do is make sure we’re using the entity names in our writing. That ERD challenge above? Well, the systems they’re designing are looking to extract entities from text.

So if you’re not using the entity names – the nouns – in your writing then you’re going to make it vastly more difficult for search engines to identify and match entities. This does not mean you should engage in entity stuffing and mention every associated entity you can think of in your content.

Write clearly so that both humans and search engines know what the hell you’re talking about.

Connect

Connect All The Things!

Stop hoarding authority and ‘link juice’ by not linking out to other sites. The connections between sites and pages are important and not just in a traditional PageRank formula.

I think of it this way. The entities that are contained on one page are transmitted to linked pages and vice versa.

Entities are meta information passed in links.

Structured Data

Structured Data

You can make the identification of entities easier for search engines by using schema.org markup along with some other forms of structured data. Not only does this increase the number of entities that are transmitted via links, it can often create connections to the Knowledge Graph with a very limited amount of data.
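To make that concrete, here’s a minimal sketch of schema.org microdata for a local business page. Everything here – the organization name, address and numbers – is a hypothetical example, not markup pulled from any site discussed in this post:

```html
<!-- Sketch: schema.org Organization markup in microdata.
     All names, addresses and numbers are invented examples. -->
<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Example Mortgage Co.</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">123 Example St.</span>,
    <span itemprop="addressLocality">San Diego</span>,
    <span itemprop="addressRegion">CA</span>
    <span itemprop="postalCode">92101</span>
  </div>
  <span itemprop="telephone">(555) 555-0123</span>
  <a itemprop="url" href="http://www.example.com">example.com</a>
</div>
```

The key is keeping the name, address and phone consistent wherever the entity appears, so engines can reconcile the page with other listings.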

Google Maps Entity Hack

So, here’s the actual bit of discovery that I’ve been holding onto for six months and is the real impetus for this entire post. If you go to Google Maps and search for a branded term coupled with a geographic location you often get some very interesting results. Take ‘zillow san diego, ca‘ for instance.

Google Maps Result for Zillow San Diego CA

Look at all those results and red dots! I didn’t ask for realtors, mortgage brokers or appraisers in my query. I simply used the term Zillow in combination with a geography and got these very related and relevant results. They’re not simply looking for a Zillow office located in San Diego.

So, let’s look at the details here to see what’s going on. I’ll take one of the red dots and investigate further.

Mesa Pacific Mortgage Google Maps Result

So why is this on the map results? First I go to the linked website.

Mesa Pacific Mortgage Website

So, there are no links to Zillow anywhere on the site, and the address and phone number here don’t match the ones on Google Maps. But they are the ones listed on his Zillow Profile.

Zillow Structured Data Example

Now, the link to the website closes the connection here, so it’s not completely linkless, but I still find it pretty amazing. And this is without Zillow fully optimizing the markup. They declare the page as an organization.

Zillow Organization Schema

But they don’t detail out the professional information with schema markup.

Zillow Definition List Markup

Instead they’re using some old(er) school definition list markup for list term and description. Combined with the organization scope it looks like Google can put 1 and 1 together.
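For illustration, that hybrid pattern might look something like this sketch: an Organization itemscope wrapping plain definition-list markup. This is my own reconstruction with invented values, not Zillow’s actual source:

```html
<!-- Sketch: older definition-list markup inside an Organization scope.
     The dt/dd pairs aren't schema properties themselves, but the
     surrounding itemscope gives engines an entity to attach them to.
     All values are invented examples. -->
<div itemscope itemtype="http://schema.org/Organization">
  <h1 itemprop="name">Example Mortgage Co.</h1>
  <dl>
    <dt>Address</dt>
    <dd>123 Example St., San Diego, CA 92101</dd>
    <dt>Phone</dt>
    <dd>(555) 555-0123</dd>
    <dt>Website</dt>
    <dd><a href="http://www.example.com">example.com</a></dd>
  </dl>
</div>
```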

Google+

In doing due diligence I found Mesa Pacific Mortgage also has a Google+ page which reinforces the right address and phone number. So the connection isn’t as startling as it might seem but it’s still intriguing.

And I have no idea in what order these things came into existence. It’s pretty clear the Zillow listing probably came first based on the 2006 Member Since date on his profile. Whether the Google+ Local page and associated map listing came directly as a result is unknown.

In fact, as you do more and more investigation into what shows up on the map and what doesn’t, it seems like a Google+ Local page is required. However, a fair amount of them have been created by Google. Obviously Google uses a multitude of sources to create these listings. If you can be one of those sources, all the better. But even if you’re not, connecting to these entities delivers value to all involved.

Let’s look at another Google Maps result.

Google Maps Result for Pacific Sotheby's

If you follow that reviews link you wind up on their Google+ page.

Pacific Sotheby's Google+ Page

Odd that Google isn’t sucking in the reviews from Zillow, which would show a greater connection. Google+ Local Pages provide a vast database of entities for Google. And they rely on the data in Google+ more than that from other sources.

Zillow Profile for Keke Jones

Here the phone number on Zillow doesn’t match the one on Google+ or Google Maps. A quick aside that you’re also seeing the potential to create a relationship between Keke Jones (person) and Pacific Sotheby’s Int’l Realty (place). But I digress.

Outside of the website connection and address match in that Professional Information section, the other reason this result shows up for this search is because they use Zillow products on their website.

Pacific Sotheby's Links to Zillow

The rest of you can run away from these types of implementations based on poor analysis of a Matt Cutts video if you like, but that would be a mistake in my view.

Okay, one last example. Let’s zoom in and find another result.

Google Maps Result for Roger Ma

The hours data indicates that Roger probably has a Google+ Page. Yup.

Google+ Page for Roger Ma

Now we can see that they’re pulling in reviews from Zillow and Roger does have a profile on Zillow. So why he shows up for a Zillow+Geography search is pretty straightforward.

Interestingly, searching for ‘homethinking san diego, ca’ on Google Maps does not return Roger Ma. Perhaps because they don’t include an address line 1 or because they only use hreview-aggregate and don’t declare a schema.org scope (thank you handy structured data testing tool bookmarklet).
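The missing scope is easy to supply. Purely as an illustrative sketch (the agent name and numbers are invented, and this is not Homethinking’s actual markup), review data declared with an explicit schema.org scope might look like:

```html
<!-- Sketch: aggregate review data with an explicit schema.org scope,
     rather than bare hreview-aggregate microformat classes.
     The agent name and numbers are invented examples. -->
<div itemscope itemtype="http://schema.org/RealEstateAgent">
  <span itemprop="name">Jane Example</span>
  <div itemprop="aggregateRating" itemscope
       itemtype="http://schema.org/AggregateRating">
    Rated <span itemprop="ratingValue">4.8</span>/5
    from <span itemprop="reviewCount">37</span> reviews
  </div>
</div>
```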

Tough to say, but you can see how important it might be to do what’s necessary to confirm these connections.

People Talk About

People Talk About Amber Bistro

Now let’s home in (pun intended) on the ‘People talk about’ feature. These terms are generated through some process/algorithm that analyzes the review text and pulls out the relevant (depending on who you ask) key phrases.

Now, I’m not going to go too far down this rabbit hole, though I think it’s possible Google might be using both review text and query syntax to create these phrases. Bill Slawski did a nice job teasing out how Google finds ‘known for’ terms for entities.

What’s important in my view is that these key phrases become more meta information that gets passed back and forth through entity connections.

Google is assigning this entity (Roger Ma) a certain cluster of key phrases including ‘sell a home’ and ‘great realtor’. Zillow is connected to this entity, as we’ve demonstrated, which means that those key phrases are, on some level, applied to Zillow’s page and site.

Now imagine the aggregated key phrases from connected entities that are flowing into Zillow. Do you think that might give Google a better idea of exactly when and for what queries they should return Zillow content?

And Google might very well know the terms people used to get to Roger Ma’s page on Zillow and use that to inform all of the other connected entities. That’s speculation but it’s made with over six months of experimentation and observation.

I can’t share many of the details because I’m under various NDAs, but once you make these connections using structured data there seems to be an increased ability to rank for relevant terms.

sameAs

Okay, we veered off a bit into theory so let’s get back to tactics. If you have a page that is about a known entity you may want to use the sameAs schema.org property.

sameAs Schema Property

If I had to describe it plainly, I’d say sameAs acts as an entity canonical. Sure, it’s a bit more complicated than that and has a lot to do with confirming identity but in my experience using sameAs properly can be a valuable (and more direct) way of telling search engines what entity that page contains or represents.

sameAs Schema Example

Here you see that a page about Leonardo DiCaprio has a sameAs property pointing to his Wikipedia entry. Now, obviously you could try to spam this property but there would be a number of ways to catch this type of behavior. Sadly, I know that won’t stop some of you.
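A minimal sketch of that pattern in microdata (my own example, not the markup from the screenshot) would be:

```html
<!-- Sketch: a sameAs link acting as an entity canonical, pointing a
     Person page at the known Wikipedia entry for that entity. -->
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Leonardo DiCaprio</span>
  <link itemprop="sameAs"
        href="http://en.wikipedia.org/wiki/Leonardo_DiCaprio">
</div>
```

The invisible link element carries the property without changing what users see on the page.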

Wikipedia

Cat Editing Wikipedia

Like it or not, Wikipedia is still a primary source of data for the Knowledge Graph. If you’ve got a lot of time and patience and can be objective rather than subjective, you can wade into Wikipedia to help create company profiles, provide reference links (more important than you may imagine) and generally ensure that your brand is represented in as many legitimate places as possible.

Your goal here isn’t to spam Wikipedia but to simply crack the Kafka-like nature of Wikipedia moderation and provide a real representation of your site or brand that adds value to the entire corpus and platform.

Freebase

Freebase on the other hand has a different type of challenge. Instead of obstinate editors and human drama, Freebase is just … a byzantine structure of updates. The good news? It’s a direct line to the Knowledge Graph.

For instance if you search for Twitter this is the Knowledge Card you get as a result.

Knowledge Graph Result for Twitter

There’s no Google+ part of the Knowledge Card because there is no reference to a Google+ Page under Social Media Presence.

Twitter Freebase Profile

Turns out they don’t have a Google+ Page. Seriously? Man, get with it Twitter. Compare this to StumbleUpon.

Knowledge Card for StumbleUpon

They’ve got the business specific information as well as the Google+ integration with the Recent posts unit. Why? They’ve got a Google+ entry in their Social Media Presence on Freebase.

StumbleUpon Freebase Profile

How about Foursquare?

Knowledge Card for Foursquare

Oy! Not so good. They’ve got their Google+ account in Freebase.

Foursquare Freebase Entry

However, the business section on their ‘Inc.’ entry in Freebase (different from the standard entry) is empty.

Foursquare Business on Freebase

Now, the interplay between a standard entry and a business entry on Freebase can be strange and some entities don’t even need this dual set-up, which makes understanding how to enter it all really complex. So, it’s not just you who thinks updating Freebase is hard. But … it’s totally worth it.

Because Freebase really is where the Knowledge Graph flows from, as I’ve just shown. For just one more example, look at the Knowledge Card for Garret Dillahunt and then look at the data in his Freebase entry. Match the elements that show up in the Knowledge Card. Convinced?

You might ask why Google links to Wikipedia in the Knowledge Cards and not Freebase? Have you looked at Freebase!? It’s not a destination site anyone on the Google search team would wish on a user. That and Wikipedia has a solid brand that likely resonates with a majority of users.

KGO

Knowledge Graph Optimization is just getting started but here are the real things (pun intended) you can do to start meeting this new world head on.

Use Entities (aka Nouns) In Your Writing

Make it easy for users and search engines to know what you’re talking about by using the actual names of the entities in your writing.

Get Connected and Link Out To Relevant Sites

Stop hoarding link juice and link out to relevant sites so that the entity information can begin to flow between sites.

Use Structured Data To Increase Entity Detection

Make it easier for search engines to detect, extract and connect entities to the Knowledge Graph by using various forms of structured data.

Go A Step Further and Use the sameAs Property 

When appropriate use the sameAs property to reference the exact Freebase or Wikipedia entry for that entity. Think of it as an entity canonical.

Claim and Optimize Your Google+ Presence

There’s no doubt that Google+ sits in the middle of a lot of the Knowledge Graph, particularly about places. So claim and optimize your presence, which also extends to getting reviews.

Get Exposure on Wikipedia

Put on some music and slug it out with Wikipedians who seem straight from Monty Python’s Argument sketch, and edit your profile and add some appropriate references.

Edit and Update Your Freebase Entry

Update your Freebase entry and make it as complete as possible. I hope to have a more instructive post on editing Freebase some time in the near future.

Knowledge Graph Optimization (KGO) is about making it easy to connect to as many relevant entities as possible so that search engines better understand your site on a ‘thing’ level and can pass important meta information between connected entities.

What I Learned in 2013

February 26 2014 // Career + Life // 24 Comments

(This post is a personal post about running Blind Five Year Old, building on similar posts for 2011 and 2012.)

It’s nearly the end of February and I haven’t completed my now annual ‘What I Learned’ post. That should tell you that one of the things I learned is how quickly time gets away from you.

If you’re looking for a post where every problem has an answer with a pretty bright red bow on top then you should click the back button immediately. Because while 2013 was a crazy successful year, it was also messy and confusing.

Success Devours Time

I won’t humble brag. It was a great year for the business. I moved many clients to retainers and wound up working with three top 50 web properties according to comScore. The work was interesting and challenging, revenue was up and I was more than comfortable financially.

Winning

Yet, success introduced new problems. If you’d like to play the smallest violin now, please go ahead. I get it. It feels strange to complain about success. Yet, here I am about to do that. Don’t get me wrong, I don’t want the opposite. But here is my reality.

More clients meant more client work. A lot more. The result? I had a choice. Dial down the time I spent learning or building the brand. When I got serious about the business back in 2010 my ratio of client work to learning and brand building was 50/50. For me, the choice was obvious.

I spent far less time building the brand. One only need look at the number of blog posts to see how my output diminished. Mind you, I made the most out of the blog posts I did manage to publish. But it was an anemic year in terms of output and that bothers me not just from a business perspective but because I enjoy writing.

Perfectionism Works (For Me)

Good Is The Enemy of Great

Part of the problem is that I’m a perfectionist. I’d probably tell you I simply had “very high standards for the quality of my work” and I could even talk myself into believing that. But it probably looks a hell of a lot like perfectionism.

So at the beginning of 2013 I was hell bent on embracing the ‘done is better than perfect’ mantra. Jonathon Colman would be proud. But you know what? Didn’t happen.

Not only that, but all the evidence seemed to indicate that spending that extra time to make my work that much better … paid off. Even if I was late delivering the work (which happened more often than I’d like), the quality of the work was such that it carried the day. The delay was suddenly explainable given the quality and success of the recommendations.

Yes, you still have to produce results. And I did.

Sales Avalanche

Having a sales funnel is important. You don’t want a client or two to go dark and suddenly be struggling. I had this mentality as I spun up the business. Yet, in 2013 I was actively turning away business. This sounds and feels ugly since I know others aren’t in the same situation.

Most of the clients I wound up taking on were through referrals. Why did I get these referrals? Because of the quality of my work. Work that I’d taken a lot of time to get just right. That’s what I’ve learned. Great work creates … more work.

There are other factors involved in this sales windfall. One is the fact that I’ve created a sort of A-Team perception.

The A-Team

If you have a problem, if no one else can help, and if you can find them….maybe you can hire The A-Team.

I don’t blog on traditional platforms to gain exposure, though you might find me referenced there (and I’m very grateful to those authors for doing so.) Hopefully I continue to create content that merits these links from talented professionals.

But the clients I want are looking for the person behind the scenes. The guy who isn’t on all those crazy industry blogs you can’t trust. Now, that’s not how I think of them but that’s how a lot of the people I want to work for think about them. So instead they ask their colleagues if they know someone they can trust.

Scarcity is a powerful marketing tactic.

In addition, there’s a supply and demand issue in the digital marketing industry, with way more demand than (good) supply. This was driven home to me in a conversation with Mike Ramsey at SMX Advanced last year.

He asked me whether I had ever done any traditional advertising for the business. Never, I responded. He then asked me if I could name another industry where you could build a successful business without advertising. I couldn’t.

Juggling Fail

Dropped and Cracked Egg

So what this all adds up to is that things fall off the plate. You can only juggle so many things. Your response time to email goes up. You deliver work late. The smaller requests for your time may go ignored.

It makes me feel fucking awful.

I still try very hard to get back to as many people as possible. To answer questions. To respond to every blog comment. Yet, there are only so many hours in the day and I’m not a workaholic. My wife might disagree with that statement since I work 7 days a week. But it’s not 10 hours a day. And it’s on my own schedule. If I want to binge watch House of Cards I can do that.

Right now I simply have to acknowledge that I’m going to drop the ball here and there. I’m not Superman.

Don’t Think About Doing It

Action Jackson Action Figures

One of the ways I was able to become more productive was to catch myself when I began to think about doing something. I’d think about returning that email. Or I would begin to compose a blog post in my head. Or I’d ruminate about the steps I needed to take for an upcoming audit.

Thinking about these things took up a lot of time. Time I could spend actually getting work done. And in the case of blogging, once I’d written it in my head I was far less passionate about putting it down on ‘paper’.

So I made a real effort to start doing what I was thinking about doing. I haven’t mastered this and sometimes realize I’ve been thinking about doing instead of actually doing for the last 15 minutes. But I’ve gotten a lot better.

I find that doing something in the physical world helps a lot. Taking something from my honey-do list, something as simple as folding and putting away my clothes, can help to put me back on the right track.

Are We There Complacent Yet?

Complacency Kills Graffiti

I’m probably not as paranoid as Bill Gates or Steve Jobs were when they were at the top of their game. But I try very hard not to get complacent. I shouldn’t feel like I can get away with delivering an audit late. But the thought creeps into my mind as I juggle commitments and that’s a bad place to be. Because at some point that’s going to bite me in the ass. Maybe not today. Maybe not tomorrow. But it will.

It already did to some degree.

At the beginning of 2012 I began writing Marketing Biz for Marketing Land. In 2013 I started to get paid for that work. By March I was spending more time than I’d like on it (getting paid for it made me want to do it better), my interest waned and there were some creative differences about the column. Nothing serious but it was mutually decided it would be best to shutter Marketing Biz.

I stayed on and helped out with the Periodic Table of SEO Success Factors. I was proud of and enjoyed that work. But I dropped the ball on the next project and was quickly asked if I had enough time to continue and I gratefully took the opportunity to say no.

I tried to do too much and wanted to keep that working relationship with Danny Sullivan and Matt McGee. Not because of the connections they have (screw that) but because they’re just smart, good people. So leaving on those terms sucked.

Exposure vs Scarcity

Exposure

The selling point for doing all of the above was, to some degree, exposure. In our industry you don’t get much bigger than Search Engine Land and Marketing Land. (Though I was proud as a peacock to get one of my posts on Techmeme last year.)

I thought of Marketing Biz as a natural outgrowth of my normal curation activities. Not only that but it wasn’t content I would have put on my own blog. So Marketing Biz was my own little place where I might build a reputation and exposure beyond traditional SEO.

That was different than writing a guest post or even being a contributing writer. It didn’t violate my thoughts on guest blogging. It helped that I wasn’t after more exposure at that point, but I’d like to think if I had to do it all over again that I’d do it the same way.

The real question was did I need more exposure? I was turning away business as it stood. I wasn’t eager to drive more people to my door just so I could say no or, even worse, take on additional business and juggle even more work.

Obviously I need to continue to build my reputation, but I’m not sure that’s accomplished by heaping on more and more exposure.  I don’t want to fade away completely and I grok the mere exposure effect. You need to have some degree of mindshare. But I don’t feel the need to be trending all the time.

I haven’t figured out the balance yet. But I do know this. I want to continue to earn my reputation, not coast on it.

Scaling Experiments

House of Cards

Three years ago I had the opportunity to chat with Wil Reynolds. He admitted that he never really thought he’d scale SEER to its current size. But people came to him asking for his help and he wanted to say yes. The only way he could was to bring more people on board. I understand where he’s coming from. Totally.

Yet, I also know I’m not cut out to run a big operation (alone). I don’t enjoy managing people. Well, some people I do. (Hello Keith, Kirby and Jeremy!) But I have a really short fuse when it comes to effort and the ability to pick up new material.

Keep up with and (better yet) challenge me and I’m a great boss. Fall behind and make me explain something twice and I’ll make your life a living hell.

But 2013 was the year that I was going to experiment. I didn’t hire anyone. That’s a huge step! But I did bring a few people on Voltron style on specific jobs. They’d do a fair chunk of the audit punch list and I’d review, edit and add to it as well as do most of the client interaction and presentation.

It worked okay but it didn’t save me as much time as I had hoped. Maybe that would get better as I worked with them more and I’m still open to it to a certain degree. Admittedly, it did feel good to write and send those checks at the end of the project.

I’m just not sure scaling satisfies me. I might be able to make more money but the incremental amount doesn’t seem like enough unless I loosened my grip on the work product and took on a lot more clients. I’m not really prepared to do that. I want to be involved in the client work. I want to unlock the riddles and chase down the red herrings.

This year I’ll be experimenting with other ways of scaling.

Friends

Friends Logo

Despite a lot of the negativity in the industry, and there’s a lot to be negative about, I found a number of colleagues who supported, encouraged and inspired me.

Whether it was someone like Dennis Goedegebuure giving me a good reference to a massive client (which I secured), watching Joel Klettke evolve and hit his stride, chatting with Dan Shure, IMing with Zeph Snapp or plusing with Mark Traphagen, I was reminded of how lucky I am. (I’m leaving a ton of great people out here but I only have so much space. But the entire community of those who link, Tweet, comment, plus and generally support me continues to overwhelm me.)

I want to be the same person I was when I met these people. Or as close to the same person as I can be, since you’re constantly evolving as a person. I recognize that getting out there, following the golden rule and staying grounded is essential.

I don’t ever want to feel like I’m too cool for school.

And for someone who works at home, having these relationships is huge. Don’t get me wrong, I love working at home. The days I have to drive to a client on the Peninsula or when we’re driving back from my daughter’s tennis class during rush hour remind me just how much I abhor commuting.

But normal interactions, both work related and off-topic, help to break things up and keep you connected. Isolation can be a real issue if you’re working at home so making time for real conversation is important.

Organize!

Color Organized Cars in Parking Lot

Enough of the trials and tribulations. I had to have done some things right to have gotten here, right? I sure did.

I’m super organized. I have a digital filing system so I never have to wonder where to find something. I have another filing system (very limited) for my payables and receivables. Nearly every day I clean up my desktop and make sure nothing builds up.

I live by my Google calendar and I often block off time for client work, making it easy for me to get focused and not schedule too many meetings that require context switching and reduce productivity.

I also refined a whole bunch of business templates so that I have off-the-shelf ready templates for proposals, agreements, kick-off notes, audits, guides and invoices. For some I even have a few different flavors based on the type of engagement. Doing all of this work up front makes a big difference.

Sometimes it feels like I’m tidying up as a form of procrastination but being organized makes me feel calm and that’s important.

Sweat

Sweating

I kept the weight off this year for the most part, got a Fitbit and stayed active. It’s great going into the garage, getting onto our elliptical machine and sweating for 45 minutes as you stream an episode of Arrow on Netflix. Seriously, how cool is technology?!

I also took up tennis. I’d played here and there and my wife played in high school and a wee bit in college. But it was watching my daughter take classes from Coach Joe that really got both my wife and me back into it. Let me tell you, you can learn and pick up a lot just by watching a very talented, passionate and personable tennis pro teach others. (There’s a lesson here about learning overall if you’re paying attention.)

Exercise helps clear my head and helps me solve problems. It’s a lubricant of sorts, allowing me to unclog a whole bunch of mental blocks.

Best Job Ever

Best Ever

Despite my bitching and moaning, this is the best job I’ve ever had and I sometimes take a step back and am amazed, a goofy smile rising to my face. I make good money working with great clients doing something I genuinely like doing from the comfort of my own home. Jackpot!

But the real treasure has been spending time with my wife and really being here for my daughter as she grows up. Yesterday when she got home she told me about a new game she and her friends made up at school called Monkey In The Middle Two Square. (The rules are quite complicated.) Late last year I attended her geography bee and even had to cancel a phone call because who knew a geography bee would take nearly two hours!

Do I have all the answers on how my business will evolve? Nope. And that’s okay. Anyone who tells you they have it all figured out is either stupid or lying (or both.)

2014

Looking Forward

This year I look forward to blogging more. I’m going to talk about attention hacking and argue against the filter bubble among other things. I want to attend and potentially speak at Pubcon Las Vegas.

I’ll look to pivot some of the business into being a start-up marketing advisor. Because it turns out I have a pretty good track record helping start-ups secure another round of funding or positive exit.

Of course I also want to continue to help my clients to crush their business goals. But most importantly, I plan to stay healthy, happy, optimistic and connected. Something I wish for all of you reading as well.

Are You Winning The Attention Auction?

January 20 2014 // Marketing + SEO + Social Media // 30 Comments

Every waking minute of every day we choose to do one thing or another.

For a long time we didn’t have many choices. Hunt the mammoths or mind the fire. Read the bible or tend the crops. I can remember when we only got six television stations on an old black and white TV.

But as technology advances we’re afforded more choices more often.

Freedom of Choice by Devo

We can decide to talk about the weather with the person next to us in the doctor’s waiting room or stare into our phone and chuckle at a stupid BuzzFeed article. We can focus on that Excel spreadsheet or we can scroll through our Facebook feed.

You can sit on the couch and watch The Blacklist or you can sit on that same couch and read Gridlinked by Neal Asher on a Kindle. You could go out and play tennis or you could go out and play Ingress and hack some portals.

I was going to overwhelm you with statistics that showed how many choices we have in today’s digital society, such as the fact that the typical email subscriber gets 416 commercial emails every month. That’s more than 10 a day!

I could go on and on because there’s a litany of surveys and data that tell the same story. But … we all know this from experience. We live and breathe it every day.

We all choose to look, hear and do only so many things. Because there are only so many hours in each day.

Our time and attention are becoming our most valued resources. (Frankly, we should guard them far more fiercely than we do.) As marketers we must understand and adapt to this evolving environment. But … it’s not new.

The Attention Auction

Content Doge Meme

There’s always been an auction on attention. That critical point in time where people decide to give their attention to one thing over the other.

Recently, there’s been quite a kerfuffle over the idea of content shock: that there’s too much content. There are some interesting points in that debate but I tend to believe the number of times content comes up in the auction has increased quite a bit. We consume far more content due to ubiquitous access.

Sure there’s more content vying for attention. But there are more opportunities to engage and a large amount of content never comes up in the auction because of poor quality or mismatched interest.

There are hundreds of TV channels but really only a handful that are contextually relevant to you at any given time. Even with 68 sports channels, the odds that you’re in the mood to watch sports and that one of those stations is showing something you want to watch at that moment are very small. If you’re looking to watch NFL Football then Women’s College Badminton isn’t really an option.

More importantly, I believe that we’ve adapted to the influx of content. It’s knowing how we’ve adapted that can help marketers win the attention auction more often.

We Are Internet Old!

Sample Geocities Page

Adolescents often do very reckless things. They run red lights. They engage in binge drinking. They have unprotected sex. While some point to brain development as the cause (and there’s some truth to that), I tend to believe Dr. Valerie Reyna has it right.

The researchers found that while adults scarcely think about engaging in many high-risk behaviors because they intuitively grasp the risks, adolescents take the time to mull over the risks and benefits.

It’s not that adolescents don’t weigh the pros and cons. They do and actually overestimate the potential cons. But despite that, they choose to play the odds and risk it more often than adults. In large part, this can be attributed to less life experience. They’ve had fewer opportunities to land on the proverbial whammy.

As we grow older we actually think less about many decisions because we have more experience and we can make what is referred to as ‘gist’ decisions. From my perspective it simply means we grok the general idea and can quickly say yea or nay.

So what does any of this have to do with the Internet, attention or content?

When it comes to consuming digital content, we’re old. We’ve had plenty of opportunities to experience all sorts of content to the point where we don’t have to think too hard about whether we’re going to click or not. If it fits a certain pattern we have a certain response.

Nigerian Email Scam

Nay! A thousand times nay.

The vast majority of content being produced is, to put it bluntly, crap. Technology has a lot to do with this. It is both easy and free to create content in written or visual formats. From WordPress to Tumblr to Instagram, nearly anyone can add to the content tidal wave.

Of course, the popularity of ‘content marketing’ has increased the number of bland, “me too” articles, not to mention the eyesore round-up posts that are a simulacrum of true curation.

People have wasted too much time and attention on shitty content. The result? We’re making decisions faster and faster by relying on those past experiences.

We create internal shortcuts in our mind for what is good or bad. It’s a shortcut that protects us from wasting our time and attention, but may also prevent us from finding new legitimate content. So how do we address this cognitive shortcut? How do you win the attention auction?

You can ensure that you fit that shortcut and you can add yourself to that shortcut.

Fit The Shortcut

Getting Attention

Purple Goldfish

Fitting the shortcut is simple to say, but often difficult to execute. Make sure that, at a glance, you get the attention of your user. There are plenty of ways to do this from writing good titles to using appropriate images to leveraging social sharing.

When ’1-800 service’ pops up on caller ID you’re probably making a snap decision that it’s a telemarketer and you’ll ignore the call. When it’s the name of your doctor or someone from your family you pick up the phone. This same type of process happens on nearly all social platforms as people scan feeds on Twitter, Google+ and Facebook.

Recently Facebook even admitted to the issues revolving around feed consumption.

The fact that less and less of brands’ content will surface is described as a result of increased competition for limited space, since “content that is eligible to be shown in news feed is increasing at a faster rate than people’s ability to consume it.”

Now this is a bit disingenuous since Facebook is crowding out legitimate content for ads (a whole lot of ads) but the essence of this statement is true. Not only that but your content is at a disadvantage on Facebook since much of the content is personal in nature. Cute pictures of your cousin’s kids are going to trump and squeeze out content from brands.

So with whatever space you’re left with on these platforms, you’d better make certain your content has the best chance of getting noticed and fitting that shortcut. The thing is, too many still don’t do what’s necessary to give their content the best chance of success.

If you’re not optimizing your social snippet you’re shooting your content in the foot.

Be sure your title is compelling, that you have an eye catching image, that the description is (at a minimum) readable and at best engages and entices. Of course, none of this matters unless that content finds its way to social platforms.

Make sure you’re encouraging social sharing. Don’t make me hunt down where you put the sharing options or jump through hoops once I get there.

Ensure your content is optimized for both social and search. And when you’re doing the latter rely on user centric syntax and intent to guide your optimization efforts.
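To make “optimize your social snippet” concrete, here’s a minimal sketch of the Open Graph meta tags that Facebook (and, in similar form, Twitter Cards) reads when building a share preview. The helper function and example URLs are hypothetical, just to show the moving parts; the property names are the standard Open Graph ones.

```python
import html

def social_snippet(title, description, image_url, page_url):
    """Render the Open Graph meta tags a social platform reads when
    building the share preview for a page (a simplified sketch)."""
    tags = {
        "og:title": title,              # the compelling title
        "og:description": description,  # readable, enticing summary
        "og:image": image_url,          # the eye-catching image
        "og:url": page_url,             # canonical URL of the page
    }
    return "\n".join(
        '<meta property="{}" content="{}" />'.format(prop, html.escape(val, quote=True))
        for prop, val in tags.items()
    )

print(social_snippet(
    "Are You Winning The Attention Auction?",
    "Why we decide what to read in milliseconds, and what to do about it.",
    "https://example.com/attention.png",
    "https://example.com/attention-auction/",
))
```

If these tags are missing, the platform guesses at a title, image and description, and the guess is rarely the snippet you’d choose yourself.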

Your job is to fit into that cognitive shortcut by making it easy for users to see and understand your content in the shortest amount of time possible.

Keeping Attention

Bored One Ear To Death LOLcat

Getting them to your content is the first step in winning their attention. At that point they’re giving you the opportunity to take up more of their time and attention. They made a choice, but they’re going to be looking to confirm whether it was a good one almost as quickly.

When you land on a new website you instantly (perhaps unconsciously) make a decision about the quality and authority of that site and whether you’ll stick around.

A website’s first impression is known to be a crucial moment for capturing the user’s interest. Within a fraction of time, people build a first visceral “gut feeling” that helps them to decide whether they are going to stay at this place or continue surfing to other sites. Research in this area has been mainly stimulated by a study of Lindgaard et al. (2006), where the authors were able to show that people are able to form stable attractiveness judgments of website screenshots within 50 milliseconds.

That’s from a joint research paper from the University of Basel and Google Switzerland about the role of visual complexity and prototypicality regarding first impression of websites (pdf).

Once they get to the content you need to ensure they instantly get positive reinforcement. Because at the same time there are other pieces of content, other things, battling for attention.

Grumpy Cat Nope

So if they don’t instantly see what they’re looking for you’re giving them a reason to say nope. If what they see on that page looks difficult to read. Nope. If they see grammatical errors. Nope. If they feel the site is spammy looking. Nope.

There is a drum beat of research, examples and terms that underscore the importance of reducing friction.

Books On Reducing Friction

Call it cognitive fluency or cognitive ease, either way we seek out things that are familiar and look like we expect. Books such as Barry Schwartz’s Paradox of Choice and Steve Krug’s Don’t Make Me Think make it clear that too many choices reduce action and satisfaction. And we should all internalize the fact that the majority of people don’t read but instead skim articles.

That doesn’t mean that the actual content has to suffer. I still write what are considered long-form posts but format them in ways that allow people to get meaning from them without having to read them word for word.

Do I hope they’re poring over every sentence? Absolutely! I’m passionate about my writing and writing in general. But I’m a realist and would prefer that more people learn or take something from my writing than have a select few read every word and laud me for sentence construction.

I still point people to my post on readability as a way to get started down this road. Make no mistake, those who optimize for readability will do better (even with lesser content) than those who refuse to do so out of ego or other rationalizations (I’m looking at you, Google blogs).
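Readability can even be approximated programmatically. Here’s a rough sketch of the classic Flesch reading-ease formula (a standard readability measure, not a tool this post specifically endorses); the syllable counter is a crude vowel-group heuristic, so treat scores as directional only.

```python
import re

def flesch_reading_ease(text):
    """Approximate Flesch reading-ease: higher scores mean easier skimming.
    Syllables are estimated by counting vowel groups (rough heuristic)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0

    def syllables(word):
        groups = len(re.findall(r"[aeiouy]+", word.lower()))
        if word.lower().endswith("e") and groups > 1:
            groups -= 1  # crude silent-e adjustment
        return max(1, groups)

    total_syllables = sum(syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (total_syllables / len(words)))

print(round(flesch_reading_ease("The cat sat on the mat."), 1))  # short words, short sentence: high score
```

Short words and short sentences score high; dense, polysyllabic prose scores low, which is exactly what skimmers experience.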

I will shout in the face of the next person who whines that they shouldn’t have to use an image in their post or that they only want people who are ‘serious about the subject’ to read their article. Wake up before you’re the Geocities of the Internet.

Tomato

The one thing I do know is that being authentic and having a personality can help you stand out. It can help you to at least get and retain attention and sometimes even become memorable. Here’s a bit of writing advice from Charles Stross.

Third and final piece of advice: never commit to writing something at novel length that you aren’t at least halfway in love with. Because if you’re phoning it in, your readers will spot it and throw rotten tomatoes at you. And because there’s no doom for a creative artist that’s as dismal as being chained to a treadmill and forced to play a tune they secretly hate for the rest of their working lives.

The emphasis is mine. Don’t. Phone. It. In.

Add To The Shortcut

Using Attention

Dude Where's My Car?

When you do get someone’s attention, what are you doing with it? You want them to add your site, product or brand to that cognitive shortcut. So the next time a piece of that content comes up in the attention auction you’ve got the inside track. They recognize it and select it intuitively.

For instance, every time I see something new from Matthew Inman of The Oatmeal, I give it my attention. He’s delivered quality and memorable content enough times that he doesn’t have to fight so hard for my attention. I have a preconceived notion of quality that I bring to each successive interaction with his content.

Welcome to branding 101.

Consistently creating positive and memorable interactions (across multiple channels) will cause users to associate your site, product or brand as being worthy of attention.

Let me be more explicit about that term ‘interactions’. Every time you’re up in the attention auction counts as an interaction. So if I choose to pass on reading your content, that counts and not in a good way. We’re creatures of habit so the more times I pass on something the more likely I am to continue passing on it.

Add to that the perception (or reality) that we have less time per piece of content and each opportunity you have to get in front of a user is critical.

Now, if I actually get someone to share a piece of content, will it be presented in a way that will win the attention auction? If it isn’t, not only have I squandered that user action, but I may have created a disincentive for sharing in the future. If I share something and no one gives me a virtual high five of thanks for doing so, will I continue to share content from that source?

Poor social snippet optimization is like putting a kick-me sign on your user’s back.

Memorable

Make A Short Cut

If you want to be added to that cognitive shortcut you need to make it easy for them to do so. You need them to remember and remember in the ‘right’ way.

I’ve read quite a bit lately about ensuring your content is useful. I find this bit of advice exceedingly dull. I mean, are you creating content to be useless? I’m sure content spammers might, but by and large most aren’t. Not only that, but there’s plenty of great content that isn’t traditionally useful, unless you count tickling the funny bone as useful.

Of course you’ve also probably read about how tapping into emotion can propel your content to the top! Well, there’s some truth to that but that’s often at odds with being useful such as creating a handy bookmarklet or a tutorial on Excel. I suppose you could link it to frustration but you’re not going to have some Dove soap tear-jerker piece mashed up with Excel functions. Even Annie Cushing can’t pull that off.

Storytelling is also a fantastic device but it’s not a silver bullet either. Mind you, I think it has a better chance than most, but even then you’re really retaining attention instead of increasing memory.

Cocktail Party

You have to make your content cocktail party ready. Your content has to roll off the tongue in conversation.

I read this piece on Global Warming in The New York Times.

I heard this song by Katy Perry about believing in yourself.

I saw this funny ad where Will Ferrell tosses eggs at a Dodge.

Seriously, when you’re done with a piece of your content, describe it to someone out loud in one sentence. That’s what it’ll be reduced to for the most part.

As humans we categorize or tag things so we can easily recall them. I think the scientific term here is ‘coding’ of information. If we can’t easily do so it’s tough for us to talk about them again, much less find them again. As an aside, re-finding content is something we do far more often than we realize and is something Google continues to try to solve.

Even when we can easily categorize and file away that bit of information, we’re not divvying it up into a very fine structure. Only the highlights make it into memory. We only take a few things from the source information. A sort of whisper down the lane effect takes place. You suddenly don’t remember who wrote it, or where you saw it.

We’re trying to optimize the ability to recall that information by using the right coding structure, one that we’ll be able to remember.

Shh Armpit

It’s the reason you need to be careful about whether or how you go about guest blogging. This is also why I generally despise (strong word, I know) Infographics. Because more often than not, if you hear someone refer to one they say ‘That Infographic on Water Conservation’ or ‘That Infographic on The History of Beer’.

Guess what, they have no clue where they saw it or what brand it represents. Seriously. Because usually the only two things remembered are the format (Infographic) and the topic. When I ask people to name the brands behind Infographics I usually get two responses: Mint and OK Cupid. Kudos to them but a big raspberry for the rest of you.

“But the links” I hear some of you moan. Stop. Stop it right now! That lame ass link (no don’t tell me about the DA number) is nothing compared to the attention you just squandered.

I’m not saying that Infographics can’t work, but they have to be done thoughtfully, for the right reasons and to support your brand. Okay, rant over.

Ensuring people walk away with a concise meaning increases satisfaction. And getting them to repeat it to someone else helps secure your content in memory. The act of sharing helps add your site or brand to that user’s shortcut.

If there were a formula you could follow that would guarantee great content, why is there so much crap? If we all knew what makes a hit song or a hit movie why isn’t every song and film a success? This isn’t easy and anyone telling you different is lying.

Consistent

Janet Jackson

You can also add to the shortcut by creating an expectation. This can be around the quality of your content but that’s pretty tough to execute on. I mean, I completely failed at generating enough blog content last year. I’m not advocating a paint-by-numbers schedule, but I had more to say, and at some point if your name isn’t out there people begin to forget you.

There’s a fair amount of research that shows that memory is a new mapping of neurons and that the path becomes stronger with repeated exposure. You inherently know this by studying. The more you study the more you remember.

But what if the memory of your site or brand, that path you’re creating in your user’s mind, isn’t clear? What if the first time you associate the brand with one thing and the next time it’s not quite the thing you thought it was? Or what if the time between exposures is so great that you can’t find that path anymore and inadvertently create a new one? How many times have you saved something only to realize you already saved it at some point in the past?

Now, I’m out there in other ways. I keep my Twitter feed going with what I hope is a great source of curated content across a number of industries. My Google+ feed is full of the same plus a whole bunch of other content that serves as a sort of juxtaposition to the industry specific content.

One of the more successful endeavors on Google+ is my #ididnotwakeupin series where I share photos from places around the world. It’s a way for me to vicariously travel. So every morning for more than two years I’ve posted a photo tagged with #ididnotwakeupin.

The series gets a decent amount of engagement and if I tried harder (i.e., interacted with other travel and photography folks) I’m pretty sure I could turn it into something bigger. I even had an idea of turning it into a coffee table book. I haven’t though. Why? Because there’s only so much time in every day. See what I did there?

Another example of this is Moz’s Whiteboard Friday series. You aren’t even sure what the topic is going to be but over time people expect it to be good so they tune in.

Or there’s It’s Grace (formerly Daily Grace) on YouTube, where people expect and get a new video from Grace Helbig every Monday through Friday. Want to double-down on consistent? Tell me what phrase you remember after watching this video from Grace (might be NSFW depending on your sensitivity).

Very … yeah, you know.

That’s right. Repetition isn’t a bad thing. The mere exposure effect demonstrates that the more times we’re exposed to something the better chance we’ll wind up liking it. This is what so many digital marketing gurus don’t want you to hear.

Saturation marketing (still) works because more exposure equals familiarity which improves cognitive fluency which makes it easier to remember.

It’s sort of like the chorus in a song, right? Maybe you don’t know all the words to each verse but you know the chorus! Particularly if you can’t get away from hearing it on the radio every 38 minutes.

In some ways, the number of exposures necessary is inversely proportional to the quality of the content. Great content or ads don’t need much repetition but for me to know that it’s JanuANY at Subway this month might take a while.

Climbing Mount Diablo

And the biggest mistake I see people make is stopping. “We blogged for a few months and saw some progress but not enough to keep investing in it.” This is like stopping your new diet and exercise regimen because you only lost 6 pounds.

You always have to be out there securing and reinforcing your brand as a cognitive shortcut.

Does Pepsi decide that they just don’t need to do any more advertising? Everyone knows about Pepsi so why spend a billion dollars each year marketing it? You just can’t coast. Well, you can, but you’re taking a huge risk. Because someone or something else might fill the void. (Note to self, I need to take this advice.)

Shared

Everywhere

The act of sharing content likely means it will be remembered. To me it’s almost like having to describe that content in your head again as you share it. You have that small moment where you have to ask questions about what you’re sharing, with whom and why it’s interesting.

So sharing isn’t just about getting your content in front of other people it’s helping to cement your content in the mind of that user.

Of course, having the same piece of content float in front of your face a number of times from different sources helps tremendously. Not only are you hitting on the mere exposure effect you’re also introducing some social proof to the equation.

To me the goal isn’t really to ‘go viral’ but to increase the number of times I’m winning the attention auction by getting there more often with an endorsement.

You might not click on that ‘What City Should You Actually Live In?‘ quiz on Facebook the first time but after four people have posted their answers you just might cave and click through. (Barcelona, by the way.)

Examples

Breaking Bad

Walt and Jesse Suited Up on The Couch Eating

How did Breaking Bad become such a huge hit? It wasn’t when it first started out. I didn’t watch the first two seasons live.

But enough people did and AMC kept the faith and kept going. Because enough people were talking about it. It was easy to talk about too. “This show where a chemistry teacher becomes a meth dealer.” Bonus points that the plot made it stand out from anything else on TV.

And then you figured out that you could watch it on Netflix! People gave it a try. Then they began to binge watch seasons and they were converts. They wanted more. MOAR!

Of course none of it would have happened if it weren’t a great show. But Breaking Bad was also consistent, persistent, memorable and available.

BuzzFeed

BuzzFeed Logo

I know what you’re thinking. BuzzFeed? Come on, their content sucks! And for the most part I’d have to agree. But it’s sort of a guilty pleasure isn’t it?

Here’s why I think BuzzFeed works. You’ve found yourself on a BuzzFeed ‘article’ a number of times. It’s not high quality in most senses of the word but it does often entertain. Not only that it does so very quickly.

If I’m ‘reading’ the 25 Times Anna Kendrick Was Painfully Accurate post I’m only scrolling through briefly and I do get a chuckle or two out of it. This has happened enough times that I know what to expect from BuzzFeed.

I’ve created a cognitive shortcut that tells me that I can safely click-through on a BuzzFeed post because I’ll get a quick laugh out of it. They entertain and they respect my time. For my wife that same function is filled by Happy Place.

Blind Five Year Old

Blind Five Year Old Logo

How about my site and personal brand? I’ve done pretty well but it took me quite a while to get there, figuring out a bunch of stuff along the way.

Seriously, I blogged in relative obscurity from 2008 to 2010. Over time the quality of my posts won over a few people, but quality alone wasn’t enough. I also got better and better at optimizing my content for readability and for sharing.

I use a lot of images in my content. And I spend a lot of time on selecting and placing them. I still think I botched the placement of an image in my Keywords Still Matter post. And it still irks me. No, I’m not joking.

The images make it easier to read. Not only do they give people a rest, they allow me to connect on a different level. Sometimes I might be able to communicate an idea better with the help of that image. It helps to make it all click.

I use a lot of music references as images. Part of it is because I like music but part of it is because if you’re suddenly singing that song in your head, then you’re associating my content with that song, if even just a little. When I do that I have a better chance of you remembering that content. I’ve helped create a tag in your mental filing system.

I try to build more ways for you to connect my content in your head.

TL;DR

We have more choices more often when it comes to content. In response, we’re protecting our time and attention by making decisions about content faster. Knowing this, marketers must work harder to fit the cognitive shortcuts users have created, based on experience, for what is perceived as clickable or authoritative content.

Alternatively, the consistent delivery and visibility of memorable content can help marketers create a cognitive shortcut, giving themselves an unfair advantage when their content comes up in the attention auction.

Stop Carly Rae Content Marketing

December 17 2013 // Marketing + SEO // 12 Comments

Lately I’ve gotten a few too many Carly Rae content marketing emails, which makes me both sad and grouchy. This is not the way to promote content, gain fans or build a brand. Stop it.

What Is Carly Rae Content Marketing?

Carly Rae Content Marketing

The term comes from Carly Rae Jepsen’s popular song “Call Me Maybe,” which contains the following lyrics.

Hey I just met you
And this is crazy
But here’s my number
So call me maybe

I’ve changed the lyrics slightly to reflect the emails I’m increasingly receiving from folks.

Hey I just met you
And this is crazy
But here’s my content
So promote me maybe

Carly Rae content marketing is the out-of-the-blue outreach email from someone you have no relationship with asking you to promote their content or engage in some other activity. In the end it’s just shoddy email spam.

It’s An Honor To Be Nominated?

The Oscars

I’m sure some of you are thinking that I’m ungrateful. The fact that I’m getting these emails shows that people want my endorsement. Perhaps it is better to be noticed than not, but if I am some sort of influencer, wouldn’t you want to put your best foot forward?

First impressions matter and this one isn’t going to win me over. In fact, I might remember you, your site or brand for the lousy outreach instead.

Win Over The Persnickety

I might demand a higher level of quality than others. So you could simply write me off as some anal-retentive prat with outrageous expectations and a self-inflated ego. But that would be a mistake.

Mr. Fussy

Because if you can put together a pitch that doesn’t make me vomit in my mouth a little bit then you’re likely going to have better luck with everyone else too. In short, win over your toughest critic and you’ll have a powerful outreach message.

Content Marketing Basics

Johns

If you’re doing outreach there are a few things you must get right. A recent post by Tadeusz Szewczyk about the perfect outreach message covered some of the basics. (It’s not perfect in my view but it’s certainly above average.)

You must be relevant, have a decent subject line, get my name right, respect my time and show that you’ve done some rudimentary homework about me. The sad part is that 50% of people fail to even get my name correct. Yup, somehow AJ Kohn is transformed into John. (Clicks trash icon.)

Respect My Time And Brain

Do or Do Not Dumbledore

One of the things that has bothered me lately is the number of people asking me to take time to provide feedback on their content. Feedback! Some of these people might actually want feedback but I’m going to call bullshit on the vast majority. You don’t want feedback. You want me to share and promote your content.

Do you really want me to tell you that your infographic is an eyesore and then not share or promote it? Probably not. I mean, kudos if you really are open to that sort of thing but I’m guessing you’re in promotion mode at this stage and you won’t be asking for a redesign.

Getting me (or anyone) to do something is a high-friction event. Don’t waste it asking them to do the wrong thing.

Honest Teasing

Teased Hair with Aqua Net

Being transparent about what you’re trying to accomplish is almost always the best way to go. If you’re looking for a link, tell them you’re looking for a link. Stop beating around the bush.

I’d also argue that you should be applying marketing principles to outreach. Half the battle is getting me to actually click and read the content. So tease me! Get me interested in what you have to say. Give me a cliff-hanger! Don’t put me to sleep or ask me to promote the content without reading it.

Get me interested so that I view or read your content. At that point you have to be confident that the content is good enough that I’ll share and promote it. Stop trying to do everything all at once in your outreach email.

TL;DR

Stop Carly Rae content marketing! Fast and shoddy outreach might get you a handful of mentions but it won’t lead to long-term success and may actually prevent it in many cases.

Google Now Topics

November 26 2013 // SEO + Technology // 16 Comments

Have you visited your Google Now Topics page? You should if you want to get a peek at how Google is translating queries into topics, which is at the core of the Hummingbird Update.

Google Now Topics

If you are in the United States and have Google Web History turned on you can go to your Google Now Topics page and see your query and click behavior turned into specific topics.

Google Now Topics Example

This is what my Google Now Topics page looked like a few weeks back. It shows specific topics that I’ve researched in the last day, week and month. If you’re unfamiliar with this page this alone might be eye opening. But it gets even more interesting when you look at the options under each topic.

Topic Intent

The types of content offered under each topic are different.

Why is this exciting? To me it shows that Google understands the intent behind each topic. So the topic of New York City brings up ‘attractions and photos’ while the topic of Googlebot just brings up ‘articles’. Google clearly understands that Back to the Future is a movie and that I’d want reviews for the Toyota Prius Plug-in Hybrid.

In essence, words map to a topic which in turn tells Google what type of content should most likely be returned. You can see how these topics were likely generated by looking back at Web History.
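As a toy illustration (this is emphatically not Google’s actual data model, just a sketch of the idea), you can think of it as a lookup from entity to topic type to preferred content:

```python
# Hypothetical mapping built from the examples above: each entity resolves
# to a topic type, which in turn implies what content should be returned.
TOPIC_INTENT = {
    "New York City": ("place", ["attractions", "photos"]),
    "Googlebot": ("software", ["articles"]),
    "Back to the Future": ("movie", ["reviews"]),
    "Toyota Prius Plug-in Hybrid": ("car", ["reviews"]),
}

def content_for(entity):
    """Return a description of the content types implied by an entity."""
    kind, content_types = TOPIC_INTENT.get(entity, ("unknown", ["articles"]))
    return "{} ({}): {}".format(entity, kind, ", ".join(content_types))

print(content_for("New York City"))
```

The interesting part isn’t the lookup itself but that Google builds the left-hand side automatically, resolving your query strings into entities before deciding what kind of result to show.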

Search of Google Web History for Moto X

This part of my web history likely triggered a Moto X topic. I used the specific term ‘Moto X’ a number of times in a query which made it very easy to identify. (I did wind up getting the Moto X and love it.)

Tripping Google Now Topics

When I first saw this page back in March and then again in June I wanted to start playing around with what combination of queries would produce a Google Now Topic. However, I’ve been so busy with client work that I never got a chance to do that until now.

Here’s what I did: logged into my Google account and, using Chrome, tried the following series of queries (without clicking through on any results) at 1:30pm on November 13th.

the stranger
allentown
downeaster alexa
big shot
pressure
uptown girl
piano man

But nothing ever showed up in Google Now Topics. So I took a similar set of terms but this time engaged with the results at 8:35am on November 16th.

piano man (clicked through on Wikipedia)
uptown girl (clicked through on YouTube)
pressure (no click)
big shot (clicked through on YouTube)
the stranger lyrics (clicked through on atozlyrics, then YouTube)
scenes from an italian restaurant (no click)

Then at 9:20am a new Google Now Topic shows up!

Google Now Topic for Billy Joel Songs

Interestingly, it understands that this is about music but it hasn't made a direct connection to Billy Joel. I purposefully didn't use his name in the queries to see whether Google Now Topics would return him as the topic instead of just the songs. Maybe Google knows, but I'd hoped a Billy Joel topic would render and I think that might be the better result.

YouTube Categories

Engagement certainly seems to count based on my limited tests. But I couldn't help noticing that every one of the songs in that Google Now Topic was also a YouTube click. Could I get a Google Now Topic to render without a YouTube click?

The next morning I tried again with a series of queries at 7:04am.

shake it up (no click)
my best friend’s girl (lyricsfreak click)
let the good times roll (click on Wikipedia, click to disambiguated song)
hello again (no click)
just what i needed (lastfm click)
tonight she comes (songmeanings click)
shake it up lyrics (azlyrics click)

At 10:04 nothing showed up so I decided to try another search.

let the good times roll (clicked on YouTube)

At 10:59 nothing showed up and I was getting antsy, which was probably not smart. I should have waited! But instead I performed another query.

the cars (clicked on knowledge graph result for Ric Ocasek)

And at 12:04 I get a new Google Now Topic.

Let The Good Times Roll Google Now Topic

I’m guessing that if I’d waited a bit longer after my YouTube click that this would have appeared, regardless of the click on the knowledge graph result. It seems that YouTube is a pretty important part of the equation. It’s not the only way to generate a Google Now Topic but it’s one of the faster ways to do so right now.

Perhaps it’s easier to identify the topic because of the more rigid categorization on YouTube?

The Cars on YouTube

I didn’t have time to do more research here but am hoping others might begin to compile a larger corpus of tests so we can tease out some conclusions.

Topic Stickiness

I got busy again and by the time I was ready to write this piece I found that my topics had changed.

New Google Now Topics

It was fairly easy to deduce why each had been produced, though the Ice Bath result could have been simply from a series of queries. But what was even more interesting was what my Google Now Topics looked like this morning.

My Google Now Topics Today

Some of my previous topics are gone! Both Ice Bath and Let The Good Times Roll are nowhere to be found. This seems to indicate that there’s a depth of interaction and distance from event (time) factor involved in identifying relevant topics.

It would make sense for Google to separate intent that is consistent from intent that is ephemeral. I was interested in ice baths because my daughter has some plantar fascia issues. But I've never researched it before and likely (fingers crossed) won't again. So it would make sense to drop it.

There are a number of ways that Google could determine which topics are more important to a user, including frequency of searching, query chains, depth of interaction as well as type and variety of content.

Google Now Topics and Hummingbird

OMG It's Full of Stars Cat

My analysis of the Hummingbird Update focused largely on the ability to improve topic modeling through a combination of traditional text analysis and entity detection.

Google Now Topics looks like a Hummingbird learning lab.

Watching how queries and click behavior turn into topics (there’s that word again) and what types of content are displayed for each topic is a window into Google’s evolving abilities and application of entities into search results.

It may not be the full picture of what's going on but there's enough here to put a lot of paint on the canvas.

TL;DR

Google Now Topics provide a glimpse into the Hummingbird Update by showing how Google takes words, queries and behavior and turns them into topics with defined intent.

What Does The Hummingbird Say?

November 07 2013 // SEO + Technology // 29 Comments

What Does The Fox Say Video Screencap

Dog goes woof
Cat goes meow
Bird goes tweet
and mouse goes squeak

Cow goes moo
Frog goes croak
and the elephant goes toot

Ducks say quack
and fish go blub
and the seal goes ow ow ow ow ow

But there's one sound
That no one knows
What does the hummingbird say?

What Does The Hummingbird Say?

For the last month or so the search industry has been trying to figure out Google's new Hummingbird update. What is it? How does it work? How should you react?

There's been a handful of good posts on Hummingbird including those by Danny Sullivan, Bill Slawski, Gianluca Fiorelli, Eric Enge (featuring Danny Sullivan), Ammon Johns and Aaron Bradley. I suggest you read all of these given the chance.

I share many of the views expressed in the referenced posts but with some variations and additions, which is the genesis of this post.

Entities, Entities, Entities

Are you sick of hearing about entities yet? You probably are but you should get used to it because they’re here to stay in a big way. Entities are at the heart of Hummingbird if you parse statements from Amit Singhal.

We now get that the words in the search box are real world people, places and things, and not just strings to be managed on a web page.

Long story short, Google is beginning to understand the meaning behind words and not just the words themselves. And in August 2013 Google published something specifically on this topic in relation to an open source toolkit called word2vec, which is short for word to vector.

Word2vec uses distributed representations of text to capture similarities among concepts. For example, it understands that Paris and France are related the same way Berlin and Germany are (capital and country), and not the same way Madrid and Italy are. This chart shows how well it can learn the concept of capital cities, just by reading lots of news articles — with no human supervision:

Example of Getting Meaning Behind Words

So that’s pretty cool isn’t it? It gets even cooler when you think about how these words are actually places that have a tremendous amount of metadata surrounding them.
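The vector arithmetic behind that capital-city chart can be sketched with toy numbers. The vectors below are invented purely to illustrate the mechanics; real word2vec embeddings have hundreds of dimensions learned from massive corpora:

```python
import math

# Toy 3-dimensional "word vectors" -- illustrative values only.
vectors = {
    "paris":   [0.9, 0.1, 0.2],
    "france":  [0.5, 0.8, 0.2],
    "berlin":  [0.9, 0.1, 0.7],
    "germany": [0.5, 0.8, 0.7],
    "madrid":  [0.9, 0.1, 0.5],
    "italy":   [0.5, 0.8, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# The famous analogy: paris - france + germany should land near berlin.
guess = [p - f + g for p, f, g in
         zip(vectors["paris"], vectors["france"], vectors["germany"])]

# Find the closest word, excluding the three inputs (standard practice).
best = max((w for w in vectors if w not in ("paris", "france", "germany")),
           key=lambda w: cosine(guess, vectors[w]))
print(best)  # berlin
```

The point is that "capital of" becomes a consistent direction in the vector space, which is exactly the kind of relationship the chart above shows word2vec learning without supervision.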

Topic Modeling

It’s my belief that the place where Hummingbird has had the most impact is in the topic modeling of sites and documents. We already know that Google is aggressively parsing documents and extracting entities.

When you type in a search query — perhaps Plato – are you interested in the string of letters you typed? Or the concept or entity represented by that string? But knowing that the string represents something real and meaningful only gets you so far in computational linguistics or information retrieval — you have to know what the string actually refers to. The Knowledge Graph and Freebase are databases of things, not strings, and references to them let you operate in the realm of concepts and entities rather than strings and n-grams.

Reading this, I think it becomes clear that once those entities are extracted Google performs a lookup against one or more entity databases and learns what that entity means. In particular, Google wants to know which topic, concept or subject that entity is connected to.

Google seems to be pretty focused on that if you look at the Freebase home page today.

Freebase Topic Count

Tamar Yehoshua, VP of Search, also said as much during the Google Search Turns 15 event.

So the Knowledge Graph is great at letting you explore topics and sets of topics.

One of the examples she used was the search for impressionistic artists. Google returned a list of artists and allowed you to navigate to different genres like cubists. It’s clear that Google is relating specific entities, artists in this case, to a concept or topic like impressionist artists, and further up to a parent topic of art.

Do you think that having those entities on a page might then help Google better understand what the topic of that page is about? You better believe it.

Based on client data I think that the May 2013 Phantom Update was the first application of a combined topic model (aka Hummingbird). Two weeks later it was rolled back and then later reapplied with some adjustments.

Hummingbird refined the topic modeling of sites and pages, which is essential to delivering relevant results.

Strings AND Things

Hybrid Car

This doesn't mean that text-based analysis has gone the way of the dodo. First off, Google still needs text to identify entities. Anyone who thinks that keywords (or perhaps it's easier to call them subjects) in text aren't meaningful is missing the boat.

In almost all cases you don’t have as much labeled data as you’d really like.

That’s a quote from a great interview with Jeff Dean and while I’m taking the meaning of labeled data out of context I think it makes sense here. Writing properly (using nouns and subjects) will help Google to assign labels to your documents. In other words, make it easy for Google to know what you’re talking about.

Google can still infer a lot about what that page is about and return it for appropriate queries by using natural language processing and machine learning techniques. But now they’ve been able to extract entities, understand the topics to which they refer and then feed that back into the topic model. So in some ways I think Hummingbird allows for a type of recursive topic modeling effort to take place.

If we use the engine metaphor favored by Amit and Danny, Hummingbird is a hybrid engine instead of a combustion or electric only engine.

From Caffeine to Hummingbird

Electrical Outlet with USB and Normal Sockets

One of the head scratching parts of the announcement was the comparison of Hummingbird to Caffeine. The latter was a huge change in the way that Google crawled and indexed data. In large part Caffeine was about the implementation of Percolator (incremental processing), Dremel (ad-hoc query analysis) and Pregel (graph analysis). It was about infrastructure.

So we should be thinking about Hummingbird in the same way. If we believe that Google now wants to use both text and entity based signals to determine quality and relevance they’d need a way to plug both sources of data into the algorithm.

Imagine a hybrid car that didn’t have a way to recharge the battery. You might get some initial value out of that hybrid engine but it would be limited. Because once out of juice you’d have to take the battery out and replace it with a new one. That would suck.

Instead, what you need is a way to continuously recharge the battery so the hybrid engine keeps humming along. So you can think of Hummingbird as the way to deliver new sources of data (fuel!) to the search engine.

Right now that new source of data is entities but, as Danny Sullivan points out, it could also be used to bring social data into the engine. I still don’t think that’s happening right now, but the infrastructure may now be in place to do so.

The algorithms aren't really changing but the amount of data Google can now process allows for greater precision and insight.

Deep Learning

Mr. Fusion Home Reactor

What we’re really talking about is a field that is being referred to as deep learning, which you can think of as machine learning on steroids.

This is a really fascinating (and often dense) area that looks at the use of labeled and unlabeled data and the use of supervised and unsupervised learning models. These concepts are somewhat related and I'll try to quickly explain them, though I may mangle the precise definitions. (Scholarly types are encouraged to jump in and provide corrections or guidance.)

The vast majority of data is unlabeled, which is a fancy way of saying that it hasn’t been classified or doesn’t have any context. Labeled data has some sort of classification or identification to it from the start.

Unlabeled data would be the tub of old photographs while labeled data might be the same tub of photographs but with 'Christmas 1982', 'Birthday 1983', 'Joe and Kelly' etc. scrawled in black felt tip on the back of each one. (Here's another good answer to the difference between labeled and unlabeled data.)

Why is this important? Let’s return to Jeff Dean (who is a very important figure in my view) to tell us.

You’re always going to have 100x, 1000x as much unlabeled data as labeled data, so being able to use that is going to be really important.

The difference between supervised learning and unsupervised learning is similar. Supervised learning means that the model is looking to fit things into a pre-conceived classification. Look at these photos and tell me which of them are cats. You already know what you want it to find. Unsupervised learning, on the other hand, lets the model find its own classifications.
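A toy sketch of that difference, using made-up one-dimensional data (real systems work in high-dimensional feature spaces):

```python
# Supervised: every training point comes with a label.
labeled = [(1.0, "cat"), (1.2, "cat"), (8.9, "dog"), (9.3, "dog")]
# Unsupervised: just raw points, no labels at all.
unlabeled = [0.9, 1.1, 9.0, 9.2]

def supervised_classify(x):
    # 1-nearest-neighbor: the closest labeled example wins.
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

print(supervised_classify(1.3))  # cat

def kmeans_1d(points, k=2, iters=10):
    # The model invents its own groups; it never learns the word "cat".
    centers = points[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(centers[i] - p))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

print(sorted(kmeans_1d(unlabeled)))  # two anonymous clusters
```

The supervised model can only find what you told it to look for; the unsupervised one discovers the two clusters on its own but has no idea what to call them.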

If I have it right, supervised learning has a training set of labeled data whereas unsupervised learning has no initial training set. All of this is wrapped up in the fascinating idea of neural networks.

The different models for learning via neural nets, and their variations and refinements, are myriad. Moreover, researchers do not always clearly understand why certain techniques work better than others. Still, the models share at least one thing: the more data available for training, the better the methods work.

The emphasis here is mine because I think it’s extremely relevant. Caffeine and Hummingbird allow Google to both use more data and to process that data quickly. Maybe Hummingbird is the ability to deploy additional layers of unsupervised learning across a massive corpus of documents?

And that cat reference isn’t just because I like LOLcats. A team at Google (including Jeff Dean) was able to use unlabeled, unsupervised learning to identify cats (among other things) in YouTube thumbnails (PDF).

So what does this all have to do with Hummingbird? Quite a bit if I'm connecting the dots the right way. Once again I'll refer back to the Jeff Dean interview (which I seem to get something new out of each time I read it).

We’re also collaborating with a bunch of different groups within Google to see how we can solve their problems, both in the short and medium term, and then also thinking about where we want to be four years, five years down the road. It’s nice to have short-term to medium-term things that we can apply and see real change in our products, but also have longer-term, five to 10 year goals that we’re working toward.

Remember at the end of Back to the Future when Doc shows up and implores Marty to come to the future with him? The flux capacitor used to need plutonium to reach critical mass but this time all it takes is some banana peels and the dregs of some Miller Beer in a Mr. Fusion home reactor.

So not only is Hummingbird a hybrid engine but it’s hooked up to something that can turn relatively little into a whole lot.

Quantum Computing

So let's take this a little bit further and look at Google's interest in quantum computing. Back in 2009 Hartmut Neven was talking about the use of quantum algorithms in machine learning.

Over the past three years a team at Google has studied how problems such as recognizing an object in an image or learning to make an optimal decision based on example data can be made amenable to solution by quantum algorithms. The algorithms we employ are the quantum adiabatic algorithms discovered by Edward Farhi and collaborators at MIT. These algorithms promise to find higher quality solutions for optimization problems than obtainable with classical solvers.

This seems to have yielded positive results because in May 2013 Google upped the ante and entered into a quantum computer partnership with NASA. As part of that announcement we got some insight into Google’s use of quantum algorithms.

We’ve already developed some quantum machine learning algorithms. One produces very compact, efficient recognizers — very useful when you’re short on power, as on a mobile device. Another can handle highly polluted training data, where a high percentage of the examples are mislabeled, as they often are in the real world. And we’ve learned some useful principles: e.g., you get the best results not with pure quantum computing, but by mixing quantum and classical computing.

A highly polluted set of training data where many examples are mislabeled? Makes you wonder what that might be doesn’t it? Link graph analysis perhaps?

Are quantum algorithms part of Hummingbird? I can’t be certain. But I believe that Hummingbird lays the groundwork for these types of leaps in optimization.

What About Conversational Search?

Dog Answering The Phone

There’s also a lot of talk about conversational search (pun intended). I think many are conflating Hummingbird with the gains in conversational search. Mind you, the basis of voice and conversational search is still machine learning. But Google’s focus on conversational search is largely a nod to the future.

We believe that voice will be fundamental to building future interactions with the new devices that we are seeing.

And the first area where they’ve made advances is the ability to resolve pronouns in query chains.

Google understood my context. It understood what I was talking about. Just as if I was having a conversation with you and talking about the Eiffel Tower, I wouldn’t have to keep repeating it over and over again.

Does this mean that Google can resolve pronouns within documents? They're getting better at that (there's a huge corpus of research, actually) but I doubt it's at the level we see in this distinct search microcosm.

Conversational search has a different syntax and demands a slightly different language model to better return results. So Google’s betting that conversational search will be the dominant method of searching and is adapting as necessary.

What Does Hummingbird Do?

What's That Mean Far Field Productions

This seems to be the real conundrum when people look at Hummingbird. If it affects 90% of searches worldwide why didn’t we notice the change?

Hummingbird makes results even more useful and relevant, especially when you ask Google long, complex questions.

That’s what Amit says of Hummingbird and I think this makes sense and can map back to the idea of synonyms (which are still quite powerful). But now, instead of looking at a long query and looking at word synonyms Google could also be applying entity synonyms.
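To illustrate the idea (the aliases and entity ID below are placeholders I've invented, not anything Google actually uses), entity synonyms amount to many different query strings resolving to a single entity:

```python
# Hypothetical alias table: several surface strings, one entity.
ENTITY_ALIASES = {
    "golden state warriors": "ent:warriors",
    "gsw": "ent:warriors",
    "the warriors nba": "ent:warriors",
}

def resolve(query):
    """Map a raw query string to an entity ID (None if unknown)."""
    return ENTITY_ALIASES.get(query.lower().strip())

# Three different strings, one meaning.
print(resolve("GSW"), resolve("Golden State Warriors"))
```

Once the query maps to an entity rather than a string, Google can rank documents about that entity even when they never use the exact words the searcher typed.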

Understanding the meaning of the query might be more important than the specific words used in the query. It reminds me a bit of Aardvark which was purchased by Google in February 2010.

Aardvark analyzes questions to determine what they’re about and then matches each question to people with relevant knowledge and interests to give you an answer quickly.

I remember using the service and seeing how it would interpret messy questions and then deliver a ‘scrubbed’ question to potential candidates for answering. There was a good deal of technology at work in the background and I feel like I’m seeing it magnified with Hummingbird.

And it resonates with what Jeff Dean has to say about analyzing sentences.

I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.

My take is that 90% of the searches were affected because documents that appear in those results were re-scored or refined through the addition of entity data and the application of machine learning across a larger data set.

It’s not that those results have changed but that they have the potential to change based on the new infrastructure in place.

Hummingbird Response

Le homard et le chat

How should you respond to Hummingbird? Honestly, there’s not a whole lot to do in many ways if you’ve been practicing a certain type of SEO.

Despite the advice to simply write like no one's watching, you should make sure your writing is tight and uses subjects that can be identified by people and search engines. "It is a beautiful thing" won't do as well as "Picasso's Lobster and Cat is a beautiful painting".

You’ll want to make your content easy to read and remember, link out to relevant and respected sources, build your authority by demonstrating your subject expertise, engage in the type of social outreach that produces true fans and conduct more traditional marketing and brand building efforts.

TL;DR

Hummingbird is an infrastructure change that allows Google to take advantage of additional sources of data, such as entities, as well as leverage new deep learning models that increase the precision of current algorithms. The first application of Hummingbird was the refinement of Google’s document topic modeling, which is vital to delivering relevant search results.

Finding A Look As Well As A Sound

October 28 2013 // Life + Marketing // 9 Comments

(This is a personal post. While it does have a lot of marketing insight it’s also a bit introspective so you’ve been warned if that’s not your thing.)

In the past year I’ve been interviewed by a number of folks. One of the questions that often comes up is who has influenced my work.

I get the sense that they want me to reference other people in SEO or the marketing industry overall. And don't get me wrong, there are a number of smart folks out there but most of my influences come from outside the industry.

Artists

At the end of the day I am influenced and inspired by artists. Musicians are often at the top of my list and I regularly listen to music as I do my work, whether it’s Daft Punk or The Chemical Brothers to get me through large chunks of analysis or Adam Ant, Kasabian, Cake or Siouxsie and the Banshees as I put together blog posts or conference decks.

I am continually impressed by artists who go out on that ledge with their own work. Of course nearly everything is derivative in some form, but I admire those that are able to express something in their own way, to put their twist on it with passion. I connect with those that aren’t afraid to be authentic.

Adam Ant Full Costume

I mean, Adam Ant ladies and gentleman! Sure, he’s been a bit off the map psychologically but it doesn’t change his music and his appearance.

“I grew up in the glam era and, for me, every album should have a look as well as a sound.”

See, I appreciate that sentiment. That’s what I think about when I’m working, when I think about what I stand for and what I want people to remember. A fair amount of what I’ve written lately connects to this central theme.

Expression

Ominous Van Gogh

Artists are investing something of themselves into their art, or at least the ones that matter do. You have to find your own voice, not someone else’s voice if you’re going to make an impression.

Will what you express always find an audience? Nope. Sometimes it just might take a long time for you to finally get that recognition, for people to understand what you’re trying to communicate. Or maybe it never happens. Face it, not everyone is expressing something of value. #truestory

But it is the attempt, on your own terms, that matters I think. Or at least that's what I've embraced. This is slightly different from the failing-your-way-to-success mantra. I believe that, but I think what you're failing at matters a lot.

For well over two years I blogged here in relative obscurity. Did I get better over those two years? Hell yes! I still think some of those early posts are solid but it took time for me to put together that my content had to ‘have a look as well as a sound.’

Authenticity

Ubik Book Cover Art

But I also try to put as much of myself into this blog, both in normal posts and the more personal ones.

I’m not talking about the ‘the mistake I made that turned out to change my business for the better’ posts that seem to be so en vogue lately. Yeah, we get that you can learn from your mistakes but it’s all too … tidy.

But reality is messy and I feel like it’s exposing that reality that resonates. A better representation of this is my Google+ feed where I share things that I find funny, interesting or poignant along with my normal industry content. It could be the IPA I’m drinking at Beer O’Clock or a picture of some Sleestaks.

And many of my blog posts are actually just me documenting stuff that I’m figuring out, because there’s always something more to learn.

Periods

Violator Depeche Mode Album Cover

The trite thing to say is that I’ve been lucky to have such success, but that type of humble brag isn’t authentic. I worked hard (and continue to) and am very happy for the recognition. While I can’t reveal many of my clients due to NDAs I’m damn proud to count 2 of the top 50 websites as clients.

I had a plan to develop my personal brand and I attacked it with 50% of my time. One of the things that worked out early on was exploring Google+ and Authorship. I didn’t do this because I thought I could make it into something but because I truly did see something interesting.

But should I just continue to blog about those things even if my interest has waned? I think many people, sites and brands get stuck doing what has brought them success in the past. And that makes sense in many ways. Marketing is often about finding what works and repeating that.

Not only that but the fans and followers you’ve garnered provide a huge boost to your confidence to say nothing of their ability to amplify your content. I can’t tell you how meaningful it is to have that support. I don’t take that for granted for a second.

But if you’re an artist, you evolve and grow.

What you want to express changes. In talking about writing this post with my wife she told me about how she and her friend listened to Depeche Mode’s Violator album when it first came out. They hated it. It was a departure from their prior work. It took her time to embrace the new album but today it’s still one of her favorites.

So I did write about Authorship again recently but I feel like that was an ending. I doubt I will again. Instead I’ll continue to write and explore what I’m passionate about. Maybe that won’t be as popular and that’s … okay.

Don’t get me wrong, I hope it is! No artist doesn’t want to achieve success. But just as importantly, success doesn’t define them.

Inspiration

Drive's Scorpion Jacket

So in the end I am influenced by those who inspire me to do better, who challenge me to get out of my rut.

It’s those that I read, look at or listen to and make me feel something. It’s that photo of Los Angeles that brings back a flood of memories. It’s the mood that Wang Chung’s To Live and Die in LA instantly creates. (Seriously folks the entire album is incredible.)

So maybe I’ll get up in this jacket at a conference and turn my presentation into a performance. Or maybe I’ll just work to encourage my clients to be authentic and to find a look and sound for their content.

No matter what it is, I’m energized by the idea of putting myself out there (again) and taking those risks and seeing how people react.

Authorship Is Dead, Long Live Authorship

October 24 2013 // SEO // 62 Comments

Google’s Authorship program is still a hot topic. A constant string of blog posts, conference sessions and ‘research’ projects about Authorship and the idea that it can be used as a ranking signal fill our community.

I Do Not Think It Means What You Think It Does

Yet the focus on the actual markup, and the clock-watching for when AuthorRank might show up, may not be the best use of time.

Would it surprise you to learn that the Authorship Project at Google has been shuttered? Or that this signals not the death of Authorship but a shift to a different method of assigning it?

Here’s my take on where Authorship stands today.

RIP Authorship Project

The Authorship Project at Google was headed up by Othar Hansson. He’s an incredibly smart and amiable guy, who from time to time was kind enough to provide answers and insight into Authorship. I was going to reach out to him again the other day and discovered something.

Othar Hansson Google+ About

Othar no longer works on the Authorship Project. He’s now a principal engineer on the Android search team, which is a pretty sweet gig. Congratulations!

Remember that it was Othar who announced the new markup back in June of 2011 and then appeared with Matt Cutts in the Authorship Markup video. His departure is meaningful. More so because I can’t locate a replacement. (That doesn’t mean there isn’t one but … usually I’m pretty good at connecting with folks.)

Not only that but there was no replacement for Sagar Kamdar, who left as product manager of Authorship (among other things) in July of 2012 to work at Google X and, ultimately, Project Loon.

At the time I thought the writing was on the wall. The Authorship Project wasn’t getting internal resources and wasn’t a priority for Google.

Authorship Adoption

Walter White with his Pontiac Aztec

The biggest problem with Authorship markup is adoption. Not everyone is participating. Study after study after study shows that there are material gaps in who is and isn't using the markup. Even the most rosy study of Authorship adoption by technology writers isn't anything to write home about.

Google is unable to use Authorship as a ranking signal if important authors aren’t participating.

That means people like Neil Gaiman and Kevin Kelly wouldn’t rank as well since they don’t employ Authorship markup. It doesn’t take a lot of work to find important people who aren’t participating and that makes any type of AuthorRank that relies on markup a non-starter.

Authorship SERP Benefits

Search Result Heatmap For Authorship Snippet

Don’t get me wrong. Google still supports Authorship markup and there are clear click-through rate benefits to having an Authorship snippet on a search result. Even if you don’t believe me or Cyrus Shepard, you should believe Google and the research they’ve done on social annotations in 2012 (PDF) and 2013 (PDF).

So if you haven’t implemented Google Authorship yet it’s still a good idea to do so. You’ll receive a higher click-through rate and will build authority (different from AuthorRank), both of which may help you rank better over time.

Google knows users respond to Authorship.

Inferred Authorship

I Know What You Did Last Summer

It’s clear that Google still wants to do something about identifying authority and expertise. Any monkey with a keyboard can add content to the Internet. So increasingly it’s about who is creating that content and why you should trust and value their opinion.

One of the first ways Google was able to infer identity (aka authorship) was by crawling the public social graph. Rapleaf took the brunt of the backlash for this but Google was quietly mapping all of your social profiles as well.

So even if you don’t have Authorship markup on a Quora or Slideshare profile Google probably knows about it and could assign Authorship. All this data used to be available via social circles but Google removed this feature a few years ago. But that doesn’t mean Google isn’t mining the social graph.

Heck, Google could even employ usernames as a way to identify accounts from the same person. What we’re really talking about here is how Google can identify people and their areas of expertise.

Authors are People are Entities

But what if Google took another approach to identifying authors? Instead of looking for specific markup, what if they looked for entities that happen to be people?

Authors are people are entities.

This would solve the adoption issue. And that’s what the Freebase Annotations of the ClueWeb Corpora (FACC) seems to indicate.

Identifying Authors in Text

The picture makes it pretty clear in my mind. Here we’re seeing that Google has been able to identify an entity (a person in this instance) within the text of a document and match it to a Freebase identifier.

Based on review of a sample of documents, we believe the precision is about 80-85%, and recall, which is inherently difficult to measure in situations like this, is in the range of 70-85%. Not every ClueWeb document is included in this corpus; documents in which we found no entities were excluded from the set. A document might be excluded because there were no entities to be found, because the entities in question weren’t in Freebase, or because none of the entities were resolved at a confidence level above the threshold.

At a glance you might think this means that Google still has a ‘coverage’ problem if they were to use entities as their approach to Authorship. But think about who is and isn’t in Freebase (or Wikipedia). In some ways, these repositories are biased towards those who have achieved some level of notoriety.

Would Google prefer to rely on self-referring markup or a crowd-based approach to identifying experts?

Google+ Is An Entity Platform

AJ Kohn Cheltenham High School ID

While Google might prefer to use a smaller set of crowd-sourced entities to assign Authorship initially, I think they’d ultimately like to have a larger corpus of Authors. That’s where Google+ fits into the puzzle.

I think most people understand that Google+ is an identity platform. But if people are entities (and so are companies) then Google+ is a huge entity platform, a massive database of people.

Google+ is the knowledge graph of everyday people.

And if we then harken back to social circles, to mapping the social graph and to measuring engagement and activity, we can begin to see how a comprehensive Authorship program might take shape.

Extract, Match and Measure

Concentration Board Game

Authorship then becomes about Google’s ability to extract entities from documents, match those entities to a corpus that contains descriptors of each entity (i.e., social profiles, official page(s), subjects) and then measure the activity around that entity.

Perhaps Google could even go so far as to understand triples on a very detailed (document) level, noting which documents I might have authored as well as the documents in which I’ve been mentioned.

The presence of Authorship markup might increase the confidence level of the match but it will likely play a supporting and refining role instead of the defining role in the process.

Trust and Authority

Trust Me Sign

I’m reminded that Google talks frequently about trust and authority. For years that was about how it assessed sites but that same terminology can (and should) be applied to people as well.

Authorship markup is but one part of the equation but that alone won’t translate into some magical silver bullet of algorithmic success. Building authority is what will ultimately matter and be reflected in any related ranking signal.

Are the documents you author well regarded by your peers? Are they shared? By whom? How often? With what velocity? And are you mentioned (or cited) by other documents? Do they sit on respected sites? Who are they authored by? What text surrounded your mention?

So part of this is doing the hard work of producing memorable content, marketing yourself and engaging with your community. The other part will be ensuring that your entity information is both comprehensive and up-to-date. That means filling out your entire Google+ profile and potentially finding ways to add yourself to traditional entity resources such as Wikipedia and Freebase.

Just as links are the result and not the goal of your efforts, any sort of AuthorRank will be the result of building your own trust and authority through content and engagement.

TL;DR

The Authorship Project at Google has been abandoned. But that doesn’t mean Authorship is dead. Instead it signals a change in tactics from Authorship markup to entity extraction as a way to identify experts and a pathway to using Authorship as a ranking signal.

Crawl Optimization

July 29 2013 // SEO // 85 Comments

Crawl optimization should be a priority for any large site looking to improve their SEO efforts. By tracking, monitoring and focusing Googlebot you can gain an advantage over your competition.

Crawl Budget

Ceiling Cat

It’s important to cover the basics before discussing crawl optimization. Crawl budget is the time or number of pages Google allocates to crawl a site. How does Google determine your crawl budget? The best description comes from an Eric Enge interview of Matt Cutts.

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank. So if you have a lot of incoming links on your root page, we’ll definitely crawl that. Then your root page may link to other pages, and those will get PageRank and we’ll crawl those as well. As you get deeper and deeper in your site, however, PageRank tends to decline.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank. The pages that get linked to a lot tend to get discovered and crawled quite quickly. The lower PageRank pages are likely to be crawled not quite as often.

In other words, your crawl budget is determined by authority. This should not come as a shock. But that was pre-Caffeine. Have things changed since?

Caffeine

Percolator

What is Caffeine? In this case it’s not the stimulant in your latte. But it is a stimulant of sorts. In June of 2010, Google rebuilt the way they indexed content. They called this change ‘Caffeine’ and it had a profound impact on the speed at which Google could crawl and index pages. The biggest change, as I see it, was incremental indexing.

Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Essentially, Caffeine removed the bottleneck for getting pages indexed. The system they built to do this is aptly named Percolator.

We have built Percolator, a system for incrementally processing updates to a large data set, and deployed it to create the Google web search index. By replacing a batch-based indexing system with an indexing system based on incremental processing using Percolator, we process the same number of documents per day, while reducing the average age of documents in Google search results by 50%.

The speed at which Google can crawl is now matched by the speed of indexation. So did crawl budgets increase as a result? Some did, but not as much as you might suspect. And here’s where it gets interesting.

Googlebot seems willing to crawl more pages post-Caffeine but it’s often crawling the same pages (the important pages) with greater frequency. This makes a bit of sense if you think about Matt’s statement along with the average age of documents benchmark. Pages deemed to have more authority are given crawl priority.

Google is looking to ensure the most important pages remain the ‘freshest’ in the index.

Time Since Last Crawl

Googlebot's Google Calendar

What I’ve observed over the last few years is that pages that haven’t been crawled recently are given less authority in the index. To be more blunt, if a page hasn’t been crawled recently, it won’t rank well.

Last year I got a call from a client about a downward trend in their traffic. Using advanced segments it was easy to see that there was something wrong with their product page traffic.

Looking around the site I found that, unbeknownst to me, they’d implemented pagination on their category results pages. Instead of all the products being on one page, they were spread out across a number of paginated pages.

Products that were on the first page of results seemed to be doing fine but those on subsequent pages were not. I started to look at the cache date on product pages and found that those that weren’t crawled (I’m using cache date as a proxy for crawl date) in the last 7 days were suffering.

Undo! Undo! Undo!

Depagination

That’s right, I told them to go back to unpaginated results. What happened?

Depagination

You guessed it. Traffic returned.

Since then I’ve had success with depagination. The trick here is to think about it in terms of progressive enhancement and ‘mobile’ user experiences.

The rise of smartphones and tablets has made click based pagination a bit of an anachronism. Revealing more results by scrolling (or swiping) is an established convention and might well become the dominant one in the near future.

Can you load all the results in the background and reveal them only when users scroll to them without crushing your load time? It’s not always easy and sometimes there are tradeoffs but it’s a discussion worth having with your team.

Because there’s no better way to get those deep pages crawled than by having links to all of them on that first page of results.

CrawlRank

Was I crazy to think that the time since last crawl could be a factor in ranking? It turns out I wasn’t alone. Adam Audette (a smart guy) mentioned he’d seen something like this when I ran into him at SMX West. Then at SMX Advanced I wound up talking with Mitul Gandhi, who had been tracking this in more detail at seoClarity.

seoClarity graph

Mitul and his team were able to determine that content not crawled within ~14 days receives materially less traffic. Not only that, but getting those same pages crawled more frequently produced an increase in traffic. (Think about that for a minute.)

At first, Google clearly crawls using PageRank as a proxy. But over time it feels like they’re assigning a self-referring CrawlRank to pages. Essentially, if a page hasn’t been crawled within a certain time period then it receives less authority. Let’s revisit Matt’s description of crawl budget again.

Another way to think about it is that the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank. There are a large number of pages on the web that have very little or close to zero PageRank.

The pages that aren’t crawled as often are pages with little to no PageRank. CrawlRank is the difference in this very large pool of pages.

You win if you get your low PageRank pages crawled more frequently than the competition.

Now what CrawlRank is really saying is that document age is a material ranking factor for pages with little to no PageRank. I’m still not entirely convinced this is what is happening, but I’m seeing success using this philosophy.

Internal Links

One might argue that what we’re really talking about is internal link structure and density. And I’d agree with you!

Not only should your internal link structure support the most important pages of your site, it should make it easy for Google to get to any page on your site in a minimum of clicks.

One of the easier ways to determine which pages are deemed most important (based on your internal link structure) is by looking at the Internal Links report in Google Webmaster Tools.

Google Webmaster Tools Internal Links

Do the pages at the top reflect the most important pages on your site? If not, you might have a problem.

I have a client whose blog was receiving 35% of Google’s crawl each day. (More on how I know this later on.) This is a blog with 400 posts amid a total content corpus of 2 million+ URLs. Googlebot would crawl blog content 50,000+ times a day! This wasn’t where we wanted Googlebot spending its time.

The problem? They had menu links to the blog and each blog category on nearly all pages of the site. When I went to the Internal Links report in Google Webmaster Tools, you know which pages were at the top? Yup. The blog and the blog categories.

So, we got rid of those links. Not only did it change the internal link density but it changed the frequency with which Googlebot crawls the blog. That’s crawl optimization in action.

Flat Architecture

Flat Architecture

Remember the advice to create a flat site architecture? Many ran out and got rid of subfolders thinking that if the URL didn’t have subfolders then the architecture was flat. Um … not so much.

These folks destroyed the ability for easy analysis, potentially removed valuable data in assessing that site, and did nothing to address the underlying issue of getting Google to pages faster.

How many clicks from the home page is each piece of content? That’s what was, and remains, important. It doesn’t matter if the URL is domain.com/product-name if it takes Googlebot (and users) 8 clicks to get there.
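That click-depth question is worth answering with data rather than a guess. Here’s a minimal sketch in Python that runs a breadth-first search over an internal link graph to find how many clicks each URL sits from the home page. The site graph here is made up for illustration; in practice you’d build it from your own crawl data.

```python
from collections import deque

def click_depths(links, home="/"):
    """Breadth-first search from the home page over an internal
    link graph: returns the minimum number of clicks needed to
    reach each URL. Unreachable URLs simply won't appear."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Hypothetical internal link graph: each URL maps to the URLs it links to.
site = {
    "/": ["/category", "/blog"],
    "/category": ["/category?page=2", "/product-a"],
    "/category?page=2": ["/product-b"],
}

for url, depth in sorted(click_depths(site).items(), key=lambda kv: kv[1]):
    print(depth, url)
```

Any page sitting at a depth your crawl reports say Googlebot rarely reaches is a candidate for depagination, crosslink modules or an HTML sitemap.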

Is that mega-menu on every single page really doing you any favors? Once you get someone to a leaf level page you want them to see similar leaf level pages. Related product or content links are the lifeblood of any good internal link structure and are, sadly, frequently overlooked.

Depagination is one way to flatten your architecture but a simple HTML sitemap, or specific A-Z sitemaps can often be very effective hacks.

Flat architecture shortens the distance between authoritative pages and all other pages, which increases the chances of low PageRank pages getting crawled on a frequent basis.

Tracking Googlebot

“A million dollars isn’t cool. You know what’s cool? A billion dollars.”

Okay, Sean Parker probably didn’t say that in real life but it’s an apt analogy for the difference between knowing how many pages Googlebot crawled and knowing where Googlebot is crawling, how often and with what result.

The Crawl Stats graph in Google Webmaster Tools only shows you how many pages are crawled per day.

Google Webmaster Tools Crawl Stats

For nearly five years I’ve worked with clients to build their own Googlebot crawl reports.

Googlebot Crawl Reporting That's Cool

That’s cool.

And it doesn’t always have to look pretty to be cool.

Googlebot Crawl Report by Page Type and Status

Here I can tell there’s a problem with this specific page type. More than 50% of the crawl on that page type is producing a 410. That’s probably not a good use of crawl budget.

All of this is done by parsing or ‘grepping’ log files (a line-by-line history of visits to the site) looking for Googlebot. Here’s a secret. It’s not that hard, particularly if you’re even half-way decent with Regular Expressions.

I won’t go into details (this post is long enough as it is) but you can check out posts by Ian Lurie and Craig Bradford for more on how to grep log files.

In the end I’m interested in looking at the crawl by page type and response code.

Googlebot Crawl Report Charts

You determine page type using RegEx. That sounds mysterious but all you’re doing is bucketing URLs into page types based on pattern matching.
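To make that less mysterious, here’s a rough sketch in Python of the whole pipeline: filter log lines to Googlebot, pull out the request path and status code, and bucket by page type. The page type patterns and sample log lines are hypothetical, so adapt them to your own URL scheme and log format. (A serious setup would also verify Googlebot via reverse DNS, since user agents can be spoofed.)

```python
import re
from collections import Counter

# Hypothetical page type patterns -- adjust to your own URL scheme.
PAGE_TYPES = [
    ("product", re.compile(r"^/product/")),
    ("category", re.compile(r"^/category/")),
    ("blog", re.compile(r"^/blog/")),
]

# Pulls the request path and status code from a common log format line.
LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def page_type(path):
    for name, pattern in PAGE_TYPES:
        if pattern.search(path):
            return name
    return "other"

def crawl_report(lines):
    """Count Googlebot hits by (page type, status code)."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = LINE.search(line)
        if match:
            path, status = match.groups()
            counts[(page_type(path), status)] += 1
    return counts

# Made-up log lines for illustration.
log = [
    '66.249.66.1 - - [29/Jul/2013] "GET /product/blue-widget HTTP/1.1" 200 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [29/Jul/2013] "GET /product/old-widget HTTP/1.1" 410 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [29/Jul/2013] "GET /product/blue-widget HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(crawl_report(log))
```

Run daily over your full logs, this gives you exactly the page type by response code view shown above.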

I want to know where Googlebot is spending time on my site. As Mike King said, Googlebot is always your last persona. So tracking Googlebot is just another form of user experience monitoring. (Referencing it like this might help you get this project prioritized.)

You can also drop the crawl data into a database so you can query things like time since last crawl, total crawl versus unique crawl or crawls per page. Of course you could also give seoClarity a try since they’ve got a lot of this stuff right out of the box.
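As a minimal sketch of that kind of query, here’s SQLite with made-up crawl events; the schema and dates are purely illustrative, and the 14 day threshold echoes the observation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE crawls (
        url TEXT,
        crawled_at TEXT  -- date parsed from the log line
    )
""")
# Hypothetical crawl events parsed from your logs.
conn.executemany("INSERT INTO crawls VALUES (?, ?)", [
    ("/product/blue-widget", "2013-07-01"),
    ("/product/blue-widget", "2013-07-25"),
    ("/product/old-widget", "2013-06-02"),
])

# Pages whose last crawl is more than 14 days before a reference date.
stale = conn.execute("""
    SELECT url, MAX(crawled_at) AS last_crawl
    FROM crawls
    GROUP BY url
    HAVING julianday('2013-07-29') - julianday(last_crawl) > 14
""").fetchall()
print(stale)
```

The same table supports total versus unique crawl (COUNT(*) versus COUNT(DISTINCT url)) and crawls per page with a simple GROUP BY.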

If you’re not tracking Googlebot then you’re missing out on the first part of the SEO process.

You Are What Googlebot Eats

Cookie Monster Fruit

What you begin to understand is that you’re assessed based on what Googlebot crawls. So if they’re crawling a whole bunch of parameter-based, duplicative URLs or you’ve left the email-a-friend link open to be crawled on every single product, you’re giving Googlebot a bunch of empty calories.

It’s not that Google will penalize you, it’s the opportunity cost for dirty architecture based on a finite crawl budget.

The crawl spent on junk could have been spent crawling low PageRank pages instead. So managing your URL Parameters and using robots.txt wisely can make a big difference.

Many large sites will also have robust external link graphs. I can leverage those external links, rely less on internal link density to rank well, and can focus my internal link structure to ensure low PageRank pages get crawled more frequently.

There’s no patently right or wrong answer. Every site will be different. But experimenting with your internal link strategies and measuring the results is what separates the great from the good.

Crawl Optimization Checklist

Here’s a quick crawl optimization checklist to get you started.

Track and Monitor Googlebot

I don’t care how you do it but you need this type of visibility to make any inroads into crawl optimization. Information is power. Learn to grep, perfect your RegEx. Be a collaborative partner with your technical team to turn this into an automated daily process.

Manage URL Parameters

Yes, it’s confusing. You will probably make some mistakes. But that shouldn’t stop you from using this feature and changing Googlebot’s diet.

Use Robots.txt Wisely

Stop feeding Googlebot empty calories. Use robots.txt to keep Googlebot focused and remember to make use of pattern matching.
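As a hypothetical example (the paths here are invented), Googlebot supports wildcard pattern matching in robots.txt, so a few lines can keep entire classes of empty-calorie URLs out of the crawl:

```
User-agent: Googlebot
# Keep email-a-friend links and session parameters out of the crawl
Disallow: /*/email-a-friend
Disallow: /*?sessionid=
# Block internal search results pages
Disallow: /search
```

Remember that robots.txt controls crawling, not indexing, so test changes carefully before deploying them.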

Don’t Forget HTML Sitemap(s)

Seriously. I know human users might not be using these, but Googlebot is a different type of user with slightly different needs.

Optimize Your Internal Link Structure

Whether you try depagination to flatten your architecture, re-evaluate navigation menus, or play around with crosslink modules, find ways to optimize your internal link structure to get those low PageRank pages crawled more frequently.

Keywords Still Matter

June 05 2013 // SEO // 59 Comments

As content marketing becomes the new black I’m starting to hear people talk about how keywords don’t matter anymore. This sentiment appears in more than a few posts and the general tenor seems to be that keyword-focused strategies are a thing of the past – a relic from a dark time.

The problem? You need keywords to produce successful content.

Dwight Meme Keywords

Keyword Syntax

How do people search for something? That’s what keywords are all about, and it’s vital to ensuring your content will be found and will resonate with your users.

keyword syntax

Are people searching for ‘all weather fluid displacement sculptures’ or ‘outdoor water fountains’? That’s an extreme example but it makes an important point.

You need to understand the user and the words they use to find your content.

Keyword Intent

Keywords can also tell you a lot about the intent of a search. Look (well) beyond informational, navigational and transactional intent and start thinking about how you can map keywords to the various stages of your site’s conversion funnel.

For instance, what does a query like ‘majestic seo vs open site explorer’ tell you? This user is probably further along in the purchase funnel. They’re aware of their choices and may have even narrowed it down to these two options. The keyword (yes, keyword) ‘vs’ makes it clear that they’re looking for comparison data.

Google SERP for Comparison Intent

Sure enough, most of the results returned are posts that compare these two tools. Those pieces of content squarely meet that intent, in part because they’re paying attention to keywords.

Majestic SEO has a result but … it’s the home page. Is that going to satisfy the desire to compare? Probably not. And where’s SEOMoz? Missing in action.

Each could rely on the blog posts presented to deliver this comparison. Or they could develop content that meets that keyword and intent, allowing them to tell their story and frame the debate.

I know some will shriek, “Are you crazy? You don’t want to promote your competition by mentioning them so prominently!” But that’s denying reality. Users are searching with this syntax and intent.

Now, I’m not saying you have to put content that meets this particular intent prominently on the site or in the normal conversion flow. But if you know someone is on the fence and comparing products, why wouldn’t you want a chance to engage that user on your own terms?

Keywords let you create content that matches user intent.

Magic Questions

Oh-O It's Magic!

There’s also a lot of meta information that comes along with a keyword. I’m fond of using a term like ‘eureka 313a manual’ as an example. It’s a query for a vacuum cleaner manual.

On the one hand it’s pretty simple. There’s explicit intent. Someone is looking for the manual to their vacuum cleaner. The content to meet that informational search would be … the manual. But, what’s really going on?

If you’re searching for the manual, odds are that something is wrong with your vacuum. There’s an implied intent at work. The vacuum is either not working right or is flat out broken. You have the opportunity to anticipate and answer magic questions.

How can I fix my vacuum? Where can I buy replacement parts? Are there repair shops near me? What vacuum should I get to replace this one if it can’t be fixed?

By decoding the keyword you can create a relevant and valuable page that meets explicit and implied intent.

Keyword Frequency

Keyword frequency is important. Yes, really. One of my favorite examples of this is LinkedIn. How did they secure their place in the competitive ‘name’ query space?

LinkedIn Keyword Frequency

LinkedIn wanted to make it clear what (or who) these pages were about. That’s what keyword frequency does: it makes it easy for search engines and users to understand what a page is about.

LinkedIn doesn’t just do it with their headers either, but uses the name frequently elsewhere on the page. The result?

Marshall Simmonds Wordle

There’s no question what this page is about.

Keywords are Steve Krug for Googlebot.
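If you want a quick sanity check of your own pages along these lines, a crude term-frequency count gets you most of the way to that Wordle. This Python sketch uses an invented stopword list and sample text; paste in the visible copy from one of your own pages instead.

```python
import re
from collections import Counter

# Minimal illustrative stopword list -- a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at"}

def top_terms(text, n=5):
    """Crude term-frequency check: which words dominate a page's copy?"""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)

page = """Marshall Simmonds is a search strategist. Marshall Simmonds
founded Define Media Group. Marshall Simmonds previously led search at
The New York Times."""
print(top_terms(page))
```

If the dominant terms aren’t the keyword you’re targeting, neither users nor Googlebot will be sure what the page is about.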

Readability

This Is Not A Pipe

The reaction I get from many when I press on this issue is that it produces a poor user experience. Really? I’ve never heard anyone complain about LinkedIn and most never realize that it’s even going on.

Using the keywords people expect to see can only help make your content more readable, which is still a tremendously undervalued aspect of SEO. Because people scan text and rarely read word for word.

And what do you think they’re scanning for? What do you think is rattling around in their brain when they’re scanning your content? It’s not something random like ‘bellhop poodle duster’, it’s probably the keyword that brought them there.

You may think Google is smart enough to figure it out. You’ll claim that Google’s gotten far more sophisticated in the application of synonyms and topical modeling. And you’d be right to a degree. But why take the chance? Particularly since users crave the repetition and consistency.

They don’t want you to use four different ways to say the same thing and the hard truth is they’re probably only going to read one of those words anyway. You’ll create better content for users if you write for search engines.

Make sure you’re using the words users expect to see.

TL;DR

Keywords aren’t going away, they’re becoming more important. Query syntax and user intent are vital in producing relevant and valuable content that resonates with users and answers both explicit and implicit questions.