Google's Authorship program is still a hot topic. A constant string of blog posts, conference sessions and 'research' projects about Authorship, and the idea that it can be used as a ranking signal, fills our community.
Yet the focus on the actual markup, and the clock-watching for when AuthorRank might show up, may not be the best use of time.
Would it surprise you to learn that the Authorship Project at Google has been shuttered? Or that this signals not the death of Authorship but a different method of assigning Authorship?
Here's my take on where Authorship stands today.
RIP Authorship Project
The Authorship Project at Google was headed up by Othar Hansson. He's an incredibly smart and amiable guy, who from time to time was kind enough to provide answers and insight into Authorship. I was going to reach out to him again the other day and discovered something.
Othar no longer works on the Authorship Project. He's now a principal engineer on the Android search team, which is a pretty sweet gig. Congratulations!
Remember that it was Othar who announced the new markup back in June of 2011 and then appeared with Matt Cutts in the Authorship Markup video. His departure is meaningful. More so because I can't locate a replacement. (That doesn't mean there isn't one but ... usually I'm pretty good at connecting with folks.)
At the time I thought the writing was on the wall. The Authorship Project wasn't getting internal resources and wasn't a priority for Google.
The biggest problem with Authorship markup is adoption. Not everyone is participating. Study after study after study shows that there are material gaps in who is and isn't using the markup. Even the rosiest study of Authorship adoption by technology writers isn't anything to write home about.
Google is unable to use Authorship as a ranking signal if important authors aren't participating.
That means people like Neil Gaiman and Kevin Kelly wouldn't rank as well since they don't employ Authorship markup. It doesn't take a lot of work to find important people who aren't participating and that makes any type of AuthorRank that relies on markup a non-starter.
Authorship SERP Benefits
Don't get me wrong. Google still supports Authorship markup and there are clear click-through rate benefits to having an Authorship snippet on a search result. Even if you don't believe me or Cyrus Shepard, you should believe Google and the research they've done on social annotations in 2012 (PDF) and 2013 (PDF).
So if you haven't implemented Google Authorship yet, it's still a good idea to do so. You'll receive a higher click-through rate and will build authority (different from AuthorRank), both of which may help you rank better over time.
Google knows users respond to Authorship.
It's clear that Google still wants to do something about identifying authority and expertise. Any monkey with a keyboard can add content to the Internet. So increasingly it's about who is creating that content and why you should trust and value their opinion.
One of the first ways Google was able to infer identity (aka authorship) was by crawling the public social graph. Rapleaf took the brunt of the backlash for this but Google was quietly mapping all of your social profiles as well.
So even if you don't have Authorship markup on a Quora or Slideshare profile, Google probably knows about it and could assign Authorship. All this data used to be available via social circles, but Google removed this feature a few years ago. That doesn't mean Google isn't mining the social graph.
Heck, Google could even employ usernames as a way to identify accounts from the same person. What we're really talking about here is how Google can identify people and their areas of expertise.
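To make the username idea concrete, here is a minimal sketch of grouping profile URLs by a shared handle. It is purely illustrative: the URLs, the function names and the "last path segment is the handle" rule are all assumptions of mine, not anything Google has described, and a matching handle is only a weak hint that two accounts belong to the same person.

```python
from collections import defaultdict
from urllib.parse import urlparse


def normalize_handle(profile_url: str) -> str:
    """Return the last path segment of a profile URL, lowercased."""
    path = urlparse(profile_url).path
    handle = path.rstrip("/").split("/")[-1]
    return handle.lstrip("@").lower()


def group_by_handle(profile_urls):
    """Group profile URLs that share the same normalized handle."""
    groups = defaultdict(list)
    for url in profile_urls:
        groups[normalize_handle(url)].append(url)
    # Keep only handles that appear on more than one profile.
    return {h: urls for h, urls in groups.items() if len(urls) > 1}


# Hypothetical profile URLs, made up for illustration.
profiles = [
    "https://twitter.com/ExampleAuthor",
    "https://www.quora.com/profile/exampleauthor",
    "https://www.slideshare.net/exampleauthor/",
    "https://twitter.com/someone_else",
]
candidates = group_by_handle(profiles)
```

In practice a signal like this would need corroboration (shared links, names, bios) before anyone treated the accounts as one identity.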
Authors are People are Entities
But what if Google took another approach to identifying authors? Instead of looking for specific markup, what if they looked for entities that happen to be people?
Authors are people are entities.
This would solve the adoption issue. And that's what the Freebase Annotations of the ClueWeb Corpora (FACC) seems to indicate.
The picture makes it pretty clear in my mind. Here we're seeing that Google has been able to identify an entity (a person in this instance) within the text of a document and match it to a Freebase identifier.
Based on review of a sample of documents, we believe the precision is about 80-85%, and recall, which is inherently difficult to measure in situations like this, is in the range of 70-85%. Not every ClueWeb document is included in this corpus; documents in which we found no entities were excluded from the set. A document might be excluded because there were no entities to be found, because the entities in question weren’t in Freebase, or because none of the entities were resolved at a confidence level above the threshold.
At a glance you might think this means that Google still has a 'coverage' problem if they were to use entities as their approach to Authorship. But think about who is and isn't in Freebase (or Wikipedia). In some ways, these repositories are biased towards those who have achieved some level of notoriety.
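For readers who want the precision and recall figures in that quote spelled out, here is a small sketch using the standard definitions. The counts and the confidence values are hypothetical numbers I picked for illustration; they are not taken from the FACC data.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of the annotations made that were correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the entities actually present that were annotated."""
    return true_positives / (true_positives + false_negatives)


def keep_confident(annotations, threshold):
    """Drop annotations resolved below the confidence threshold."""
    return [a for a in annotations if a["confidence"] >= threshold]


# Hypothetical counts from a hand-reviewed sample.
tp, fp, fn = 80, 20, 20
p = precision(tp, fp)  # 80 correct out of 100 annotations made
r = recall(tp, fn)     # 80 found out of 100 entities present

# Hypothetical annotations; the identifiers are placeholders.
annotations = [
    {"entity": "person-a", "confidence": 0.9},
    {"entity": "person-b", "confidence": 0.4},
]
kept = keep_confident(annotations, threshold=0.5)
```

Raising the threshold usually trades recall for precision: fewer people get matched, but the matches that remain are more trustworthy.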
Would Google prefer to rely on self-referring markup or a crowd-based approach to identifying experts?
Google+ Is An Entity Platform
While Google might prefer to use a smaller set of crowd-sourced entities to assign Authorship initially, I think they'd ultimately like to have a larger corpus of Authors. That's where Google+ fits into the puzzle.
I think most people understand that Google+ is an identity platform. But if people are entities (and so are companies) then Google+ is a huge entity platform, a massive database of people.
Google+ is the knowledge graph of everyday people.
And if we then harken back to social circles, to mapping the social graph and to measuring engagement and activity, we can begin to see how a comprehensive Authorship program might take shape.
Extract, Match and Measure
Authorship then becomes about Google's ability to extract entities from documents, match those entities to a corpus that contains descriptors of each entity (e.g., social profiles, official pages, subjects) and then measure the activity around that entity.
Perhaps Google could even go so far as to understand triples on a very detailed (document) level, noting which documents I might have authored as well as the documents in which I've been mentioned.
The presence of Authorship markup might increase the confidence level of the match but it will likely play a supporting and refining role instead of the defining role in the process.
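Here is a toy sketch of that extract, match and measure flow, with Authorship markup acting only as a confidence boost. Everything in it is an assumption for illustration: the corpus, the identifiers, the scores and the naive name lookup are mine, and this is not a description of how Google actually does it.

```python
from dataclasses import dataclass

# Hypothetical entity corpus: identifier -> known names and profiles.
CORPUS = {
    "person/jane-doe": {"names": ["Jane Doe"], "profiles": ["plus.example/janedoe"]},
    "person/john-roe": {"names": ["John Roe"], "profiles": []},
}

BASE_CONFIDENCE = 0.5  # made-up score for a plain name match
MARKUP_BOOST = 0.25    # made-up bump when markup names the same entity


@dataclass
class Match:
    entity_id: str
    confidence: float


def extract_and_match(text, markup_author_id=None):
    """Extract corpus names found in the text and score each match."""
    lowered = text.lower()
    matches = []
    for entity_id, record in CORPUS.items():
        if any(name.lower() in lowered for name in record["names"]):
            confidence = BASE_CONFIDENCE
            if entity_id == markup_author_id:
                # Markup refines the match; it does not define it.
                confidence = min(1.0, confidence + MARKUP_BOOST)
            matches.append(Match(entity_id, confidence))
    return matches


def measure(documents, threshold=0.5):
    """Count, per entity, the documents that match at or above the threshold."""
    activity = {}
    for doc in documents:
        for m in extract_and_match(doc["text"], doc.get("markup_author_id")):
            if m.confidence >= threshold:
                activity[m.entity_id] = activity.get(m.entity_id, 0) + 1
    return activity


# Hypothetical documents: one with Authorship markup, one without.
docs = [
    {"text": "An essay by Jane Doe on search.", "markup_author_id": "person/jane-doe"},
    {"text": "John Roe cites Jane Doe here."},
]
activity = measure(docs)
```

Note that the second document still counts toward Jane Doe even though it carries no markup, which is the whole point: adoption stops being the bottleneck.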
Trust and Authority
I'm reminded that Google talks frequently about trust and authority. For years that was about how it assessed sites but that same terminology can (and should) be applied to people as well.
Authorship markup is but one part of the equation but that alone won't translate into some magical silver bullet of algorithmic success. Building authority is what will ultimately matter and be reflected in any related ranking signal.
Are the documents you author well regarded by your peers? Are they shared? By whom? How often? With what velocity? And are you mentioned (or cited) by other documents? Do they sit on respected sites? Who are they authored by? What text surrounds your mention?
So part of this is doing the hard work of producing memorable content, marketing yourself and engaging with your community. The other part will be ensuring that your entity information is both comprehensive and up-to-date. That means filling out your entire Google+ profile and potentially finding ways to add yourself to traditional entity resources such as Wikipedia and Freebase.
Just as links are the result and not the goal of your efforts, any sort of AuthorRank will be the result of building your own trust and authority through content and engagement.
The Authorship Project at Google has been abandoned. But that doesn't mean Authorship is dead. Instead it signals a change in tactics from Authorship markup to entity extraction as a way to identify experts and a pathway to using Authorship as a ranking signal.