
Mechanical Turk Tips

June 03 2011 // SEO + Technology // 7 Comments

Amazon Mechanical Turk is a great way to get a wide variety of tasks done, from content creation to image tagging to usability testing. Here are 15 tips to get the most out of Mechanical Turk.

Mechanical Turk Logo

Learn The Lingo

What's a HIT? Mechanical Turk can be a bit confusing at first glance. In particular, you'll need to understand this one important acronym.

A HIT is a Human Intelligence Task: the work you're asking workers to perform. The term can refer to the specific task itself, but it also doubles as the name for the actual job you post to the community.

Select A+ Workers

95 percent or more approval rate for HITs

The long and the short of it is that reputation matters and past performance is a good indicator of future performance. Limit your HITs to those with at least a 95% approval rate.

It may shrink your pool of workers and could increase the time to completion but you make up for it in QA savings.

Segment Your Workers

Match the right workers to the right task. In my experience, you get better results from US based workers when you’re doing anything that requires writing or transcription. Conversely, international workers often excel in tasks such as data validation and duplicate detection.

Give Workers More Time Than They Need

The time you give is the time workers have before the HIT disappears. Imagine starting a job and when you come back to turn in your work and collect payment the shop has closed and left town. This can really frustrate workers.

Mechanical Turk Reward Tip

I think Amazon creates this problem with the messaging around the hourly rate calculation. My advice: don't get too hung up on the hourly rate, and err on the side of providing more time for your HITs.
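If you create HITs programmatically, the last few tips all translate into parameters at HIT creation time. Here's a rough sketch using Amazon's current boto3 SDK (which postdates this post, so treat it as illustrative); the title, reward, time values and question file are placeholders, and the two qualification IDs are the documented system qualifications for approval rate and worker locale.

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

hit = mturk.create_hit(
    Title='Transcribe a one-minute audio clip',
    Description='Listen to a short clip and type what you hear.',
    Keywords='transcription, audio, writing',
    Reward='0.50',
    MaxAssignments=1,
    AssignmentDurationInSeconds=60 * 60,           # an hour, even if the work takes ten minutes
    LifetimeInSeconds=3 * 24 * 60 * 60,            # how long the HIT stays listed
    AutoApprovalDelayInSeconds=2 * 24 * 60 * 60,   # see 'Pay Fast' below
    QualificationRequirements=[
        {   # HIT approval rate of 95% or better
            'QualificationTypeId': '000000000000000000L0',
            'Comparator': 'GreaterThanOrEqualTo',
            'IntegerValues': [95],
        },
        {   # US-based workers for writing and transcription work
            'QualificationTypeId': '00000000000000000071',
            'Comparator': 'EqualTo',
            'LocaleValues': [{'Country': 'US'}],
        },
    ],
    Question=open('transcription_question.xml').read(),   # HTMLQuestion or QuestionForm XML
)
print(hit['HIT']['HITId'])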

Provide Specific Directions

Remember that you are communicating work at a distance to an unknown person. There’s no back-and-forth dialog to clarify.

In addition, workers are looking to complete work quickly and to ensure they fulfill the HIT so their approval rate remains high. The latter, in particular, makes specificity very important.

Tell workers exactly what to do and what type of work output is expected.

Make It Look Easy

While the directions should be specific you don’t want a 500 word paragraph of text to scare folks off. Make sure your HIT looks easy from a visual perspective. This means it’s easily scanned and understood.

Take advantage of the HTML editor and build in a proper font hierarchy, appropriate input fields and use a pop of color when you really want to draw attention to something important.

Give Your HIT a Good Title

Make sure your HIT title is the appropriate length (not too short or long) and that it’s descriptive and appealing.

Mechanical Turk HIT Title examples

A good title is a mixture of SEO and marketing principles. It should be relevant and descriptive but also interesting and alluring.

Bundle The Work

If you can do it, bundle a bunch of small tasks into one HIT. For instance, have them tag 10 photos at a time.

This helps because you can set a higher price for your HIT. You’ll attract a larger pool of workers since many don’t seek out ‘penny’ HITs.

Mind Your Email

Workers will email you – frequently. Do not ignore them.

You are joining a community. Just take a peek at Turker Nation. As with any community, you get and build a reputation. Don’t make it a bad one. Respond to your email, even if the response isn’t what workers want to hear.

In addition, you learn how to tweak your HIT by listening to and interacting with the workers.

Pay Fast

A lot of the email you receive revolves around a familiar refrain: "When will you pay?" This gets tedious, so I generally recommend paying quickly, which reduces the amount of unproductive email and earns you a good reputation within the community.

Pay Mechanical Turk HITs Fast

That means setting your automatic approval for something like 2 or 3 days.
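The auto-approval window is just a backstop; you can also approve submitted work yourself as soon as your QA is done. A minimal sketch with the modern boto3 SDK, assuming a made-up HIT ID and that you've already checked the answers:

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

submitted = mturk.list_assignments_for_hit(
    HITId='3HITIDEXAMPLE',
    AssignmentStatuses=['Submitted'],
)
for assignment in submitted['Assignments']:
    # run your QA check against assignment['Answer'] here before approving
    mturk.approve_assignment(
        AssignmentId=assignment['AssignmentId'],
        RequesterFeedback='Thanks for the quick turnaround!',
    )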

Develop a QA System

To pay fast you need a good QA system. You can either do this yourself or, alternatively, put the work out as a separate HIT. That’s right, you can use Mechanical Turk to QA your Mechanical Turk work. Insert your Inception or Yo Dawg joke here.

Bonus Good Work

10 Dollar Bill

Give a bonus when you find workers who have done an excellent job on a number of HITs. It doesn't have to be a huge amount, but take the top performers and reward them.

Not only is this the right thing to do, it’ll go a long way to establishing yourself in the community and developing a loyal pool of quality workers.
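Bonuses can be sent programmatically too. A small boto3 sketch with made-up worker and assignment IDs:

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

mturk.send_bonus(
    WorkerId='A1WORKEREXAMPLE',
    AssignmentId='3ASSIGNMENTEXAMPLE',   # a bonus is always tied to a specific assignment
    BonusAmount='1.00',
    Reason='Consistently excellent work on our transcription HITs. Thank you!',
)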

Build a Workforce

Once you find and bonus good workers, continue to give them HITs. You can do this by creating a list of those workers and limiting HITs to just that list.

If you do this you probably want to keep the ‘Required for preview’ box checked so workers not on that list aren’t frustrated by previewing a HIT they don’t have any chance of working on.

Download the worker history (under Manage > Workers) and use Excel to find high volume and high quality workers. Then create your list (under Manage > Qualification Types) so you can use it in your HIT.
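As a rough sketch of what that looks like through today's boto3 API: create a private qualification type, grant it to the workers you picked out of the export, and then require it on future HITs. The ActionsGuarded field appears to be the modern equivalent of that 'Required for preview' checkbox; all IDs below are made up.

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

qual = mturk.create_qualification_type(
    Name='Trusted workers',
    Description='Workers who have done excellent work for us in the past.',
    QualificationTypeStatus='Active',
)
qual_id = qual['QualificationType']['QualificationTypeId']

# Grant the qualification to the top performers from the exported worker history.
for worker_id in ['A1WORKEREXAMPLE', 'A2WORKEREXAMPLE']:
    mturk.associate_qualification_with_worker(
        QualificationTypeId=qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# Require it on future HITs; DiscoverPreviewAndAccept hides the HIT entirely
# from workers who don't hold the qualification, so they're never teased by it.
trusted_only = {
    'QualificationTypeId': qual_id,
    'Comparator': 'Exists',
    'ActionsGuarded': 'DiscoverPreviewAndAccept',
}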

Block Bad Apples

Just as you build a list of good workers, you also need to block a few of the bad ones. They might have dynamite approval ratings, but for different types of tasks. Some people are good at some things and … not so good at others.

Coaching workers is time consuming and costly, so it’s probably better for you and the worker to simply part ways. You ensure the approval rate on your HITs remains high and the worker won’t put their approval rate in jeopardy.
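Blocking is a one-liner through the API. A boto3 sketch with a made-up worker ID:

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')

mturk.create_worker_block(
    WorkerId='A3WORKEREXAMPLE',
    Reason='Work quality did not match what our HITs require.',
)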

Understand Assignments

Finally, understand and use assignments wisely. Each HIT can be assigned to a certain number of workers.

Warning on Assignments per HIT

So if your HIT is about getting feedback on your new homepage design, you might assign 500 workers to that HIT. That means you'll get 500 reactions to your new homepage. It's one general task that requires multiple responses.

But if your HIT is about validating phone numbers for 500 businesses, you will assign 1 worker to each HIT. That means you'll get one validation per phone number. Do not assign 500 workers or you'll get 500 validations per phone number. That's wasteful and likely to irk those businesses too.
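In API terms, the assignment count is the MaxAssignments parameter. A sketch of the two patterns side by side, again with boto3; the question XML and phone numbers are placeholders.

import boto3

mturk = boto3.client('mturk', region_name='us-east-1')
question_xml = open('question.xml').read()   # your HTMLQuestion or QuestionForm XML

# One general task, many opinions: a single HIT answered by 500 different workers.
mturk.create_hit(
    Title='React to our new homepage design',
    Description='Tell us what you think of the new design.',
    Reward='0.25',
    MaxAssignments=500,
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=7 * 24 * 60 * 60,
    Question=question_xml,
)

# 500 independent records, one answer each: 500 HITs with a single assignment apiece.
for phone in ['415-555-0100', '415-555-0101']:   # ...and the rest of your 500 numbers
    mturk.create_hit(
        Title='Validate a business phone number',
        Description='Confirm whether %s is the correct number for the listed business.' % phone,
        Reward='0.05',
        MaxAssignments=1,
        AssignmentDurationInSeconds=10 * 60,
        LifetimeInSeconds=7 * 24 * 60 * 60,
        Question=question_xml,
    )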

Mechanical Turk Tips

These tips are the product of experience (both mine and that of the talented Drew Ashlock), of trial and error, of stubbing toes along the way.

I hope this helps you avoid some of those pitfalls and allows you to get the most out of a truly innovative and valuable service.

Yahoo Email Hacked

May 23 2011 // Rant + Technology // 445 Comments

(IMPORTANT: Before I get to my story, if your Yahoo! email has been hacked I recommend that you immediately change your password, update your security questions and ensure your Yahoo! Mobile and Y! Messenger are both up-to-date. You should also visit Yahoo! Email Abuse Help and use this process if you are unable to login to your Yahoo! account. Also, make sure to read the comments on this post since there is a tremendous amount of good information there as well.)

(UPDATE 12/13/11: Yahoo has introduced second sign-in verification as an added security measure. It will require that you add a mobile phone number and verify it via a text message. Here’s the direct link to start using second sign-in verification.)

It happened just before we arrived at the San Francisco Zoo. We were at a red light on Sloat Boulevard when my phone started to vibrate.

Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz. Buzz.

Had the rapture come a day late? No. I was getting undeliverable messages. Lots of them. My Yahoo email had been hacked!

admiral akbar star wars its a trap spoof

Here are the two important lessons I learned as a result.

I Have Good Friends

I didn’t want our day at the Zoo ruined, me staring into my phone resetting passwords and figuring out what happened. So I put the problem on the back burner and proceeded to have a fun family day.

But I did take time to quickly tap out a response to people who replied to the spam coming from my hijacked account. Why? Because they took the time and effort to give me a heads up that I had a problem. These were good people. Good friends.

The thing is, I’d gotten a number of these same emails lately from other hacked Yahoo accounts. I figured these people knew they’d been compromised and I didn’t need to respond. With the shoe on the other foot, I realized those emails were comforting even though I was well aware of the problem.

I’ll shoot off an email the next time I get a hacked email from someone.

Yahoo Email Security Failed

The odds are that I will get another one of those emails because I learned just how easy Yahoo makes it for hackers.

Upon getting home I went about securing my account. On a lark, I checked Yahoo’s ‘View your recent login activity’ link.

yahoo recent login activity

Sure enough at 10:03 AM my account was accessed from Romania. This obvious login anomaly didn’t set off any alarms? Shouldn’t my security questions have been presented in this scenario? I have never logged in from Romania before.

I’ve never logged in from outside the US. Yahoo knows this. In fact, Yahoo knows quite a bit about my location.

yahoo location history

My locations put me in three states: California, New York and Pennsylvania. I also have location history turned on, so it's not just my own manually saved locations (some of which are ancient), but Yahoo's automated location technology keeping track of me.

Do you see Romania in this list? I don’t.

Why is Yahoo making it this easy for spammers to hijack accounts? Make them work a little bit! At a minimum, make them spoof their location.

Yahoo should have noted this anomaly and used my security questions to validate identity. I still would have had to change my password (which wasn’t that bad) but I would have avoided those embarrassing emails.

A simple rule set could have been applied here where users are asked to validate identity if the login (even a successful one) is outside of a 500 mile radius of any prior location.
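To make that concrete, here's a minimal sketch of such a rule in Python: compute the great-circle distance from the new login to every location previously seen on the account, and challenge the user if none is within 500 miles. The coordinates are rough stand-ins for my saved locations and the Romanian login.

from math import radians, sin, cos, asin, sqrt

def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3959 * asin(sqrt(h))   # 3959 is roughly Earth's radius in miles

known_locations = [(37.77, -122.42), (40.71, -74.01), (40.27, -76.88)]   # CA, NY, PA
new_login = (44.43, 26.10)                                               # Bucharest, Romania

if min(miles_between(new_login, loc) for loc in known_locations) > 500:
    print('Login anomaly: ask the security questions before granting access.')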

I’ve had a Yahoo account for over 10 years without a problem, even as I moved my business accounts over to Gmail.

Yesterday I thanked those friends who had my back. Unfortunately, Yahoo wasn’t one of them.

WordPress Duplicate Content

April 27 2011 // Rant + SEO + Technology // 23 Comments

In February Aaron Bradley sent me an email to let me know that I had a duplicate content problem on this blog. He had just uncovered and rectified this issue on his own blog and was kind enough to give me a heads up.

Comment Pagination

The problem comes in the way that WordPress handles comment pagination. The default setting essentially creates a duplicate comment page.

Here’s what it looks like in the wild. Two pages with the same exact content.

http://blog.wolframalpha.com/2011/04/18/new-age-pyramids-enhance-population-data/comment-page-1/

http://blog.wolframalpha.com/2011/04/18/new-age-pyramids-enhance-population-data

That’s not good. Not good at all.

Comment-Page-1 Problem

The comment-page-1 issue offends my own SEO sensibilities, but how big of a problem is it really?

WordPress Spams Google

There are 28 million inurl results for comment-page-1. 28 million!

Do the same inurl search for comment-page-2 and you get about 5 million results. This means that only 5 million of these posts attracted enough comments to create a second paginated comment page. Subtract one from the other and you wind up with 23 million duplicate pages.

The Internet is a huge place so this is probably not a large percentage of total pages but … it’s material in my opinion.

Change Your Discussion Settings

If you’re running a WordPress blog I implore you to do the following.

Go to your WordPress Dashboard and select Settings –> Discussions.

How To Fix Comment-Page-1 Problem

If you regularly get a lot of comments (more than 50 in this default scenario) you might want to investigate SEO friendly commenting systems like Disqus, IntenseDebate or LiveFyre.

Unchecking the ‘break comments into pages’ setting will ensure you’re not creating duplicate comment pages moving forward. Prior comment-page-1 URLs did redirect, but seemed to be doing so using a 302 (yuck). Not satisfied, I sought out a more permanent solution.

Implement an .htaccess RewriteRule

It turns out that this has been a known issue for some time and there’s a nice solution to the comment-page-1 problem in the WordPress Forum courtesy of Douglas Karr. Simply add the following rewrite rule to your .htaccess file.

RewriteRule ^(.*)/comment-page-1/ $1/ [R=301,L]

This puts 301s in place for any comment-page-1 URL. You could probably use this and keep the ‘break comments into pages’ setting on, which would remove duplicate comment-page-1 URLs but preserve comment-page-2 and above.

Personally, I’d rather have the comments all on one page or move to a commenting platform. So I turned the ‘break comments into pages’ setting off and went a step further in my rewrite rule.

RewriteRule ^(.*)/comment-page-.* $1/ [R=301,L]

This puts 301s in place for any comment-page-#. Better safe than sorry.
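After adding the rule it's worth confirming the old URLs now answer with a 301 rather than that 302. A quick check in Python using the third-party requests library; the URL is just an example.

import requests

url = 'http://example.com/some-post/comment-page-1/'
resp = requests.get(url, allow_redirects=False)
print(resp.status_code)              # expect 301, not 302
print(resp.headers.get('Location'))  # expect http://example.com/some-post/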

Don’t Rely on rel=canonical

Many of the comment-page-1 URLs have a rel=canonical in place. However, sometimes it is set up improperly.

Improper Rel=Canonical

Here the rel=canonical actually reinforces the duplicate comment-page-1 URL. I’m not sure if this is a problem with the Meta SEO Pack or simple user error in using that plugin.

Many times the rel=canonical is set up just fine.

Canonical URL from All-In-One SEO Pack

The All in One SEO Pack does have a Canonical URL option. I don’t use that option but I’m guessing it probably addresses this issue. The problem is that rel=canonical doesn’t stick nearly as well as a 301.

Comment-Page-1 in SERP

So even though this post from over three months ago has a rel=canonical, the comment-page-1 URL is still being returned. In fact, there are approximately 110 instances of this on this domain alone.

Comment Page 1 Site Results

Stop Comment-Page-1 Spam

23 million pages and counting. Sure, it would be nice if WordPress would fix this issue, but short of that it’s up to us to stop this. Fix your own blog and tell a friend.

Friends don’t let friends publish duplicate content.

Open Graph Business Intelligence

April 06 2011 // Social Media + Technology // 1 Comment

Facebook’s Open Graph can be used as a valuable business intelligence tool.

Here’s how easy it can be to find out more about the people powering social media on your favorite sites.

How It Works

The Open Graph is populated with meta tags. One of these tags is fb:admins which is a list of Facebook user IDs.

fb:admins open graph tag

Here we are on a Time article that is clearly using the Open Graph.

Sample Time.com Article

The fb:admins tag is generally found on the home page (or root) of a site because that’s one of the ways you grant people access to Insights for Websites.

Lint Bookmarklet

You could open up a new tab and go to the Facebook Linter Tool to enter the domain or you can use my handy Bookmarklet that gives you one-click access to Lint that site.

Get Lint Info

Drag the link above to your bookmark bar and then click on it anytime you want to get information about the Open Graph mark-up from that site’s home page.

Linter Results

The results will often include a list of Facebook IDs. In this instance there are 8 administrators on the Time domain.

Facebook Lint for Time

Click on each ID to learn as much as that person’s privacy settings will allow. You can find out quite a bit when you do this.
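If you want to automate the first pass, here's a small Python sketch that fetches a homepage, pulls the fb:admins and fb:app_id values out of the Open Graph meta tags, and prints profile URLs to inspect by hand. The regexes assume fairly conventional meta-tag markup, so treat it as a starting point rather than a robust parser.

import re
import urllib.request

url = 'http://www.time.com/'
html = urllib.request.urlopen(url).read().decode('utf-8', errors='ignore')

admins = re.search(r'fb:admins["\']\s+content=["\']([^"\']+)', html)
app_id = re.search(r'fb:app_id["\']\s+content=["\']([^"\']+)', html)

if admins:
    for user_id in admins.group(1).split(','):
        print('https://www.facebook.com/profile.php?id=' + user_id.strip())
if app_id:
    print('App ID: ' + app_id.group(1))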

In this instance I’ve identified Time’s Technical Lead, a Senior Program Manager (with a balloon decorating company on the side), a bogus test account (against Facebook rules) and the Program Manager, Developer Relations for … Facebook.

I guess it makes sense that Time would get some special attention from Facebook. Still, it raised my eyebrows to see a Facebook staffer as a Time administrator.

Cat Lee

Cat actually snagged ‘cat’ as her Facebook name (nicely done!) and says her favorite football team is the Eagles. I might be able to strike up a conversation with her about that. Go Eagles!

I’d probably also ask her why a fake test account is being used by Time.

Tester Time on Facebook

That is unless Time really does have a satanic handball enthusiast on staff.

Dig Deeper

Sometimes a site won’t use fb:admins but will authenticate using fb:app_id instead. But that doesn’t mean your sleuthing has come to an end. Click on the App ID number and you’ll usually go to that application.

Time Facebook Application Developer Information

By clicking on Info I’m able to view a list of Developers. Some of these I’ve already seen via fb:admins but two of them are actually new, providing a more robust picture of Time’s social media efforts and resources.

You’ll only be stymied if the site is using fb:page_id to authenticate. That’s generally a dead end for business intelligence.

Open Graph Business Intelligence

I imagine this type of information might be of interest to a wide variety of people, from recruiters to journalists to sales and business development professionals. You could use this technique on its own, or collect the names and use LinkedIn and Google to create a more accurate picture of those individuals.

How would you use this information?

Google Personalized Search

March 21 2011 // SEO + Social Media + Technology // Comments Off on Google Personalized Search

Google recently launched a new feature that allows users to personalize their search results by blocking certain domains. What impact will this have and what does it mean for the future of search?

The Smiths

Artificial Intelligence

A recent New York Post article by Peter Norvig discussed advances in artificial intelligence. Instead of creating HAL, the current philosophy is to allow both human and computer to concentrate on what they do best.

A good example is the web search engine, which uses A.I. (and other technology) to sort through billions of web pages to give you the most relevant pages for your query. It does this far better and faster than any human could manage. But the search engine still relies on the human to make the final judgment: which link to click on, and how to interpret the resulting page.

The partnership between human and machine is stronger than either one alone. As Wernher von Braun said when he was asked what sort of computer should be put onboard in future space missions, “Man is the best computer we can put aboard a spacecraft, and the only one that can be mass produced with unskilled labor.” There is no need to replace humans; rather, we should think of what tools will make them more productive.

I like where this might be leading and absolutely love the idea of personalized results. Let me shape my own search results!

Human Computer Information Retrieval

I’ve been reading a lot about HCIR lately. It’s a fascinating area of research that could truly change how we search. Implemented the right way, search would become very personal and very powerful.

The challenge seems to be creating effective human computer refinement interfaces. Or, more specifically, interfaces that produce active refinement, not passive refinement.

At present, Google uses a lot of passive refinement to personalize results. They look at an individual’s search and web history, track click-through rate and pogosticking on SERPs and add a layer of geolocation.

Getting users to actively participate has been a problem for Google.

Jerry Maguire

A Brief History of Google Personalization

Google launched personalized search in June of 2005 and expanded their efforts in February of 2007. But the first major foray into soliciting active refinement was in November of 2008 with the launch of SearchWiki.

This new feature is an example of how search is becoming increasingly dynamic, giving people tools that make search even more useful to them in their daily lives.

The problem was that no one really used SearchWiki. In the end it was simply too complicated and couldn’t compete with other elements on the page, including the rising prominence of universal search results and additional Onebox presentations.

In December of 2009 Google expanded the reach of personalized search.

What we’re doing today is expanding Personalized Search so that we can provide it to signed-out users as well. This addition enables us to customize search results for you based upon 180 days of search activity linked to an anonymous cookie in your browser.

This didn’t go down so well with a number of privacy folks. However, I believe it showed that Google felt personalized search did benefit users. They also probably wanted to expand their data set.

In March of 2010 SearchWiki was retired with the launch of Stars.

With stars, we’ve created a lightweight and flexible way for people to mark and rediscover web content.

Stars wasn’t really about personalizing results. It presented relevant bookmarks at the top of your search results. Google clearly learned that the interaction design for SearchWiki wasn’t working. The Stars interaction design was far easier, but the feature benefits weren’t compelling enough.

A year later, Stars was replaced with blocked sites.

We’re adding this feature because we believe giving you control over the results you find will provide an even more personalized and enjoyable experience on Google.

Actually, I’m not sure what this feature is called. Are we blocking sites or hiding sites? The lack of product marketing surrounding this feature makes me think it was rushed into production.

In addition, the interaction design of the feature is essentially the same as FriendFeed’s hide functionality. Perhaps that’s why the messaging is so confused.

Cribbing the FriendFeed hide feature isn’t a bad thing – it’s simple, elegant and powerful. In fact, I hope Google adopts the extended feature set and allows results from a blocked site to be surfaced if it is recommended by someone in my social graph.

Can Google Engage Users?

I wish Google had launched the block feature more aggressively and before any large scale algorithmic changes. The staging of these developments points to a lack of confidence in engaging users to refine search results.

Google hasn’t solved the active engagement problem. Other Google products that rely on active engagement have also failed to dazzle, including Google Wave and Google Buzz.

I worry that this shortcoming may cause Google to focus on leveraging existing engagement rather than working on ways to increase the breadth and depth of engagement.

In addition, while we’re not currently using the domains people block as a signal in ranking, we’ll look at the data and see whether it would be useful as we continue to evaluate and improve our search results in the future.

This may simply be a way to reserve the right to use the data in the future. And, in general, I don’t have a problem with using the data as long as it’s used in moderation.

Curated data can help augment the algorithm. Yet, it is a slippery slope. The influence of others shouldn’t have a dramatic effect on my search results and certainly should not lead to sites being removed from results altogether.

That’s not personalization, that’s censorship.

SERPs are not Snowflakes

All of Google’s search personalization has been relatively subtle and innocuous. Rank is still meaningful despite claims by chicken little SEOs. I’m not sure what reports they’re looking at, but the variation in rank on terms due to personalization is still low.

SERPs are not Snowflakes

Even when personalization is applied, it is rarely a game changer. You’ll see small movement within the rankings, but not wild changes. I can still track and trend average rank, even with personalization becoming more commonplace. Given the amount of bucket testing Google is doing I can’t even say that the observed differences can be attributed solely to personalization.

I don’t use rankings as a way to steer my SEO efforts, but to think rank is no longer useful as a measurement device is wrong. Yet, personalization still has the potential to be disruptive.

The Future of Search Personalization

Google needs to increase the level of active human interaction with search results. They need our help to take search to the next level. Yet, most of what I hear lately is about Google trying to predict search behavior. Have they given up on us? I hope not.

Gary Marchionini, a leader in the HCIR field, puts forth a number of goals for HCIR systems. Among them are a few that I think bear repeating.

Systems should increase user responsibility as well as control; that is, information systems require human intellectual effort, and good effort is rewarded.

Systems should be engaging and fun to use.

The idea that the process should be engaging, fun to use and that good effort is rewarded sounds a lot like game mechanics. Imagine if Google could get people to engage search results on the same level as they engage with World of Warcraft!

World of Google

Might a percentage complete device, popularized by LinkedIn, increase engagement? Maybe, like StackOverflow, certain search features are only available (or unlocked) once a user has invested time and effort? Game mechanics not only increase engagement but help introduce, educate and train users on a product or system.

Gamification of search is just one way you could try to tackle the active engagement problem. There are plenty of other avenues available.

Personalization and SEO

I used the cover artwork from the Smiths’ last studio album at the beginning of this post. I thought ‘Strangeways, Here We Come’ was an apt description for the potential future of personalized search. However, a popular track from this album may be more meaningful.

Stop me if you think you’ve heard this one before.

SEO is not dead, nor will it die as a result of personalization. The industry will continue to evolve and grow. Personalization will only hasten the integration of numerous other related fields (UX and CRO among others) into SEO.

The block site feature is a step in the right direction because it allows control and refinement of the search experience transparently without impacting others. It could be the start of a revolution in search. Yet … I have heard this one before.

Let’s hope Google has another album left in them.

Facebook Comments and SEO

March 16 2011 // SEO + Social Media + Technology // 27 Comments

Facebook Comments could be the most disruptive feature released by Facebook. Why? Comments are one of the largest sources of meta content on the web. Our conversations provide a valuable feedback mechanism, giving greater context to both users and to search engines.

The Walled Garden

Using Firebug you can quickly locate Facebook Comments and determine how they’re being rendered. Facebook Comments are served in an iframe.

Facebook Comments Delivered in iFrame

This means that the comments are not going to be attributed to that page or site nor seen by search engines. In short, Facebook Comments reside in the walled garden. All your comments are belong to Facebook.

This differs from implementations like Disqus or IntenseDebate where the comments are ‘on the page’ or ‘in-line’. One of the easier ways to understand this is to grab comment text from each platform and search for it on Google. Remember to put the entire text in quotes so you’re searching for that exact comment phrase.

Disqus Comments

Here’s a comment I made at Search Engine Roundtable via Disqus.

Comment on Disqus

Here’s a search for that comment on Google.

Disqus Comment SERP

Sure enough you can find my comment directly at Search Engine Roundtable or at FriendFeed, where I import my Disqus comments.

Facebook Comments

Here’s a comment made via Facebook Comments on TechCrunch.

Comment made via Facebook Comments

Here’s a search for this comment on Google.

Facebook Comments SERP

In this instance you can’t find this comment via search (even on Bing). The comment doesn’t exist outside of Facebook’s walled garden. It doesn’t resolve back to TechCrunch.

I thought of an edge case where Facebook Comments might show up on FriendFeed (via Facebook), but my test indicates they do not.

Comments and SEO

Search engines won’t see Facebook Comments. That is a big deal. Comments reflect the user syntax. They capture how people are really talking about a topic or product. Comments help search engines to create keyword clusters and deliver long-tail searches. Comments may signal that the content is still fresh, important and popular. All that goes by the wayside.

It’s no secret that search engines crave text. Depriving Google of this valuable source of text is an aggressive move by Facebook.

Is this on purpose? I have to believe it is. I can’t know for sure but it’s curious that my Quora question has gone unanswered by Facebook, even when I’ve asked a specific Facebook Engineer to answer.

[Update] Ray C. did wind up answering my question and provided some examples of how Facebook Comments could be made visible to search engines. (Thank you.) Essentially, you grab the comments via the API and display them inline behind the comment box, similar to using a noscript tag. It’s nice that they have this capability, but most will simply use the default version without question, or won’t apply this hack due to lack of technical expertise or time.
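For the curious, here's a rough sketch of that workaround in Python: pull the comments for a URL from the Graph API and emit them as plain HTML behind the comment box so crawlers can see them. The endpoint and field names reflect my reading of the docs at the time and have since changed, so treat this as illustrative rather than definitive.

import html
import json
import urllib.parse
import urllib.request

page_url = 'http://example.com/some-article/'
api = 'https://graph.facebook.com/comments/?ids=' + urllib.parse.quote(page_url, safe='')

data = json.load(urllib.request.urlopen(api))
comments = data.get(page_url, {}).get('data', [])

out = ['<div class="fb-comments-fallback">']
for c in comments:
    out.append('<p><strong>%s</strong>: %s</p>' % (
        html.escape(c['from']['name']), html.escape(c['message'])))
out.append('</div>')
print('\n'.join(out))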

In addition, many have since noted that Google has started indexing Facebook comments. Problem solved right? Wrong! Google has always reserved the right to associate iframe content with a URL when it felt it was important. It just rarely did so. The truth of the matter is Google is still only indexing a small fraction of Facebook comments overall. So don’t count on Google indexing your Facebook comments.

Comment Spam

Comment Spam

Comment spam is a huge problem. You know this if you’ve managed a blog for any amount of time. Google’s implementation of nofollow didn’t do much to stop this practice. So Facebook Comments is appealing to many since the forced identity will curtail most, if not all, of the comment spam.

This also means that the meta content for sites using Facebook Comments may be more pristine. This should be an advantage when Facebook does any type of Natural Language Processing on this data. A cleaner data set can’t hurt.

Article Sentiment

Extending this idea, you begin to realize that Facebook could have a real leg up on determining the sentiment of an article or blog post. Others might be able to parse Tweets or other indicators, but Facebook would have access to a large amount of proprietary content to mine page level and domain level sentiment.

Comment Reputation

Facebook can improve on sentiment by looking at comment reputation. Here’s where it gets exciting and scary all at the same time. Facebook can map people and their comments to Open Graph objects. It sounds a bit mundane but I think it’s a huge playground.

Suddenly, Facebook could know who carries a high reputation on certain types of content. Where did you comment? How many replies did you receive? What was the sentiment of those replies? What was the reputation for those who replied to you? How many Likes did you receive? How many times have you commented on the same Open Graph object as someone else?

You might be highly influential when commenting on technology but not at all when commenting on sports.

The amount of analysis that could be performed at the intersection of people, comments and objects is … amazing. Facebook knows who is saying what as well as when and where they’re saying it.

PeopleRank

PeopleRank

Facebook Comments could go a long way in helping Facebook create a PeopleRank algorithm that would help them better rank pages for their users. If I haven’t said it recently, Facebook’s Open Graph is just another version of Google’s Search Index.

In this instance, Facebook seems to be doing everything it can to develop an alternate way of ranking the web’s content while preventing Google from doing so. (Or am I projecting my own paranoia on the situation?)

PeopleRank could replace PageRank as the dominant way to organize content.

Traffic Channel Disruption

The traffic implications of Facebook Comments are substantial. By removing this content from the web, Facebook could reduce the ability of Google and Bing to send traffic to these sites. The long tail would get a lot shorter if Facebook Comments were widely adopted as is.

We’ve seen some anecdotal evidence that referring traffic from Facebook has increased after implementing Facebook Comments. That makes sense, particularly in the short-term.

The question is whether this is additive or a zero-sum game. In the long-run, would implementing Facebook Comments provide more traffic despite the potential loss in search engine traffic via fewer long-tail visits?

For publishers, the answer might be yes. For retailers, the answer might be no. That has a lot to do with the difference between informational and transactional search.

Even posing the question shows how disruptive Facebook Comments could be if it is widely adopted. It could be the true start of a major shift in website traffic channel mix.

Google Search Quality Decline or Elitism?

January 27 2011 // Marketing + SEO + Technology // 8 Comments

Are content farms really the problem or are you just a snob?

The recent complaints about Google’s search quality (here, here, here and here) range from real spam to indictments of content farms. I think we can all agree that spam (cloaking, scrapers, splogs, status code manipulation etc.) should be weeded out. But that leaves us with the larger issue: the quality of results.

Quality

The definition of quality usually refers to a ‘degree of excellence’ or ‘superiority of kind’. It’s often associated with grade. Think back to your time in school. Did you ever get back a paper you thought deserved a higher grade? You were certain it was an A paper and you got a B+ instead!

B+ Grade

Quality is a matter of taste.

Taste

Ruination IPA or Coors Light

What about beer? I adore Stone’s Ruination IPA. But I’m certain a lot more Coors Light is sold in a day than Ruination IPA in a month, maybe even a year. Even if I were to try to determine the best IPA, there would be many conflicting and passionate opinions on the topic.

Value

Perhaps it’s about value instead? Ruination IPA costs a pretty penny while Coors Light is cheap. Maybe Coors Light is the best value because of the ratio of price to quality. But people value things in very different ways. This is clear when looking at restaurant reviews.

Applebees vs The French Laundry

When I read restaurant reviews I can tell whether the reviewer has the same food bias as I do. I treat reviews which laud huge portions, or rock bottom prices, or extol the virtues of never-ending refills differently. Their view of what a good meal is differs from mine. They’re looking for quantity, no matter how mediocre the food. I’m looking for quality and generally don’t want a pound and a half of garlic mashed potatoes.

There’s nothing wrong with either perspective. But they are different.

Popularity

Google Serves Lots of People

Look around folks. What do you see more of? Fast food or fine dining? It’s fast food hands down.

And you can see this in nearly every area of life. Justin Bieber and Miley Cyrus are wildly popular musicians but I’m listening to Kasabian and Kaiser Chiefs. I haven’t touched Internet Explorer in years but it’s (sadly) still the most popular browser.

Mahalo, Squidoo and eHow get millions of visitors a month. These sites are popular, and while you might find them distasteful, lacking quality or providing little value, many others (clearly) disagree.

Do I like these sites? No. Perhaps I’m a snob. Maybe you are too.

Numbers

The number of searches has skyrocketed in the last five years. Using comScore’s monthly numbers, core searches have gone from 6.9 billion at the beginning of 2007 to 16.4 billion at the beginning of 2011.

US Search Volume 2007 to 2011

At the same time Pew reports a growing percentage of adults are now online and using search engines on a daily basis.

Audience

The search audience has changed. One way to measure this is to plot daily search engine usage by adults against the innovation curve.

Diffusion of Innovation

The U.S. Census Bureau puts the population of the US at around 300 million. Using the CIA World Factbook we can estimate that 80% of those are over the age of 14. I’m going to use the resulting number (240 million) as my adult population number.

In 2007 Pew reported that 70% of adults were online and that 40% of them used search on a daily basis.

  • 240,000,000 X 70% X 40% = 67,200,000

In 2010 Pew reported that 79% of adults were online and that 49% of them used search on a daily basis.

  • 240,000,000 X 79% X 49% = 92,904,000

innovation adoption of search

In both 2007 and 2010 daily search usage penetrated the Early Majority. The difference is that the Early Majority now outnumber the Innovator and Early Adopter groups combined.

Early Majority Rule Search Volume

That’s just in three years, imagine the difference between 2005 and 2010. The picture of a daily search user is very different today.

Mental Models

The nature of our searches (as a whole) is likely changing because of who is now searching. The mental model of an Innovator or Early Adopter is going to be different than that of someone in the Early Majority.

Each group is going to approach search with different ideas and baggage. The Innovator and Early Adopter are more likely to be open to new experiences and to explore. They are more risk tolerant.

The Early Majority and Late Majority are more likely to apply their information seeking behaviors from other mediums to search. They’re looking for the familiar.

Brands

Many seemed surprised when Google Instant revealed a ‘bias’ toward brands. It has since been confirmed that Google is not engaging in any internal bias. That bias is a user bias. It’s a prediction based, in large part, on the volume of searches.

Should we really be surprised? Many of these companies are spending a fortune to advertise and market their brand. Their goal is to capture mindshare and they are succeeding. So much so that people, particularly the Early and Late Majority, go online to search for those brands.

Brand Search Acceleration

In 2005, a DoubleClick report (Search Before The Purchase) showed relatively low levels of brand search. While it accelerated closer to the actual purchase, in some instances only 27% of searches were on brand. Do you honestly think that’s still true today?

eCommerce has certainly grown in that time. The number of navigational searches has climbed, which is closely related to brand. People continue to search (a lot) for Facebook or Craigslist as a way to get to those destinations. But last year Bing also reported that Walmart was the 8th most searched term.

Users

Matt Cutts tells us not to chase the algorithm but to chase the user. But who is the user really? The audience has changed! And if the algorithm is trying to use human feedback as a signal, wouldn’t the results reflect that new composition?

Might that be why in October of 2010 many people noticed an algorithm change that seemed to skew toward bigger brands? It’s what Jonathan Mendez called ‘gentrification of the SERPs‘. (I wish I’d come up with that term!)

I may not think the results got better, but perhaps someone from the Early Majority or Late Majority did. They look at those results and see a lot of familiar brands and that instills confidence.

Content Farms

So when you see eHow at the top of a result and cringe, others might be thinking Google has led them to the easiest and best result. When you find a Mahalo page you might grind your teeth, but others could walk away thinking they got exactly what they needed.

I may enjoy reading the works of Shakespeare but plenty of others will be super happy to have the CliffsNotes version instead.

Which User is Google Optimizing For?

McGoogle

I believe Google when they say they want to provide the most relevant results. But there is a fair bit of subjectivity involved because the user is not some monolithic, homogeneous blob. Quality, taste, value and popularity are all going to inform what people think is relevant.

If Google is optimizing for the majority, that may mean a very different interpretation of relevancy. There’s nothing really wrong with that, but if you’re an Innovator or Early Adopter, you might think things are getting worse and not better.

There’s usually a better place to eat right down the street from a McDonald’s, but it’s McDonald’s that still gets most of the business. There are some places (North Beach in San Francisco for instance) that have a ‘no-chains’ policy.

Google could certainly do that. They could stand up and say that fast food content from Demand Media wouldn’t gain prime SERP real estate. Google could optimize for better instead of good enough. They could pick fine dining over fast food.

But is that what the ‘user’ wants?

2011 Predictions

December 31 2010 // Analytics + Marketing + SEO + Social Media + Technology + Web Design // 3 Comments

Okay, I actually don’t have any precognitive ability but I might as well have some fun while predicting events in 2011. Let’s look into the crystal ball.

2011 Search Internet Technology Predictions

Facebook becomes a search engine

The Open Graph is just another type of index. Instead of crawling the web like Google, Facebook lets users do it for them. Facebook is creating a massive graph of data and at some point they’ll go all Klingon on Google and uncloak with several Birds of Prey surrounding search. Game on.

Google buys Foursquare

Unless you’ve been under a rock for the last 6 months it’s clear that Google wants to own local. They’re dedicating a ton of resources to Places and decided that getting citations from others was nice but generating your own reviews would be better. With location based services just catching on with the mainstream, Google will overpay for Foursquare and bring check-ins to the masses.

UX becomes more experiential

Technology (CSS3, Compass, HTML5, jQuery, Flash, AJAX and various noSQL databases to name a few) transforms how users experience the web. Sites that allow users to seamlessly understand applications through interactions will be enormously successful.

Google introduces more SEO tools

Google Webmaster Tools continues to launch tools that will help people understand their search engine optimization efforts. Just like they did with Analytics, Google will work hard in 2011 to commoditize SEO tools.

Identity becomes important

As the traditional link graph becomes increasingly obsolete, Google seeks to leverage social mentions and links. But to do so (in any major way) without opening a whole new front of spam, they’ll work on defining reputation. This will inevitably lead them to identity and the possible acquisition of Rapleaf.

Internet congestion increases

Internet congestion will increase as more and more data is pushed through the pipe. Apps and browser add-ons that attempt to determine the current congestion will become popular and the Internati will embrace this as their version of Greening the web. (Look for a Robert Scoble PSA soon.)

Micropayments battle paywalls

As the appetite for news and digital content continues to swell, a start-up will pitch publications on a micropayment solution (pay per pageview perhaps) as an alternative to subscription paywalls. The start-up may be new or may be one with a large installed user base that hasn’t solved revenue. Or maybe someone like Tynt? I’m crossing my fingers that it’s whoever winds up with Delicious.

Gaming jumps the shark

This is probably more of a hope than a real prediction. I’d love to see people dedicate more time to something (anything!) other than the ‘push-button-receive-pellet’ games. I’m hopeful that people do finally burn out, that the part of the cortex that responds to this type of gratification finally becomes inured to this activity.

Curation is king

The old saw is content is king. But in 2011 curation will be king. Whether it’s something like Fever, my6sense or Blekko, the idea of transforming noise into signal (via algorithm and/or human editing) will be in high demand, as will different ways to present that signal such as Flipboard and Paper.li.

Retargeting wins

What people do will outweigh what people say as retargeting is both more effective for advertisers and more relevant for consumers. Privacy advocates will howl and ally themselves with the government. This action will backfire as the idea of government oversight is more distasteful than that of corporations.

Github becomes self aware

Seriously, have you looked at what is going on at Github? There’s a lot of amazing work being done. So much so that Github will assemble itself Voltron style and become a benevolently self-aware organism that will be our digital sentry protecting us from Skynet.

Facebook Like Numbers Are Inflated

November 05 2010 // Social Media + Technology // 15 Comments

As you surf the web today you’ll inevitably run into Facebook’s Like button.

number of likes

There are a number of implementations but they all tell you how many Likes that item (or object in Open Graph speak) has received.

When a Like is not a Like

But did 938 people really Like this rather interesting Slate article about Netflix? No.

actual like count

Only 130 actually liked this article. The rest of that 938 is composed of shares and comments.

What you’re looking at above is XML output from a links.getStats call from Facebook’s old REST API. The data definitions for the link_stat table detail what share, like, comment and total represent.

Link_Stat Data Definitions

The Like number shown to users is actually the total_count – “the total number of times the URL has been shared, liked, or commented on.”
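You can pull that breakdown yourself. A Python sketch against the same old REST endpoint described above (long since deprecated, shown only to illustrate the share/like/comment split; the URL is a placeholder):

import json
import urllib.parse
import urllib.request

url = 'http://www.slate.com/example-netflix-article/'   # placeholder
api = ('https://api.facebook.com/method/links.getStats?format=json&urls=' +
       urllib.parse.quote(url, safe=''))

stats = json.load(urllib.request.urlopen(api))[0]
print('shares:  ', stats['share_count'])
print('likes:   ', stats['like_count'])
print('comments:', stats['comment_count'])
print('total:   ', stats['total_count'])   # the number the Like button actually displays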

I’m not particularly perturbed by lumping share and like together – those two actions are similar. In both cases I’m explicitly choosing to interact with and promote that item. And I suspect that they’re doing this for some amount of backwards compatibility.

But comments seem like a stretch to me. I’m choosing to interact with a combination of item and person. My comment might have little to do with the item and more to do with the person sharing it. In this instance I could have commented on the movie viewing habits of the person sharing the item. Does that mean I ‘Like’ that item?

Like Number Inflation

At a minimum, I think this is a manipulation of perception. The numbers are part of a Like marketing campaign. Large Like numbers throughout the Internet make it seem like the functionality is being used frequently. Yet, here we see that the specific Like feature isn’t as popular as we might have suspected.

I’m still a fan (pun intended) of the Like button and the Open Graph, but showing this inflated number (even if it can be rationalized) seems disingenuous. What do you think?

The Best SEO Tools Not About SEO

June 22 2010 // SEO + Technology // 5 Comments

There are plenty of great blog posts about SEO tools, though you should be careful to look at a curated and updated list. Actually an SEO tool wiki would be an interesting idea. But I digress.

Instead of discussing the SEO tools I use I thought I’d share the other tools I use each and every day. Tools that have become indispensable, saving me time, energy and headaches.

Dropbox

Dropbox Logo

Sharing files can be a hassle unless you have Dropbox.

Dropbox is essentially a cloud based storage system. I started using it to sync files between my laptop and desktop computers. But what Dropbox is really good for is sharing files with clients.

Email is unreliable and you often wind up spending time waiting for folks to find and download files. Dropbox lets you create a shared folder for each client where you can keep all related materials. Not only can your client find the materials, they can point internal resources to it with ease. This is particularly useful if a client uses contracted or offshore developers.

You may have to convince clients to install Dropbox. Don’t worry, the 2GB plan is free, installation is easy and the instant value it delivers will earn you quick kudos.

Adium

Adium Logo

I love Instant Messaging. Short of being on-site, this is often the best way to communicate, clarify and remove roadblocks. Email is slow and asynchronous. The phone doesn’t provide the added context of links or screen shots. IM is fast and effective. It can also be a hassle if you have clients on multiple IM platforms. Yahoo! Messenger, Google Talk, Jabber, AIM and more.

Setting up accounts with each is easy, but having every IM client up and running at once creates problems. Context switching between each platform’s UI is not trivial. The messages arrive in different ways (with different sounds) and you wind up having multiple windows begging for your attention.

That’s where Adium comes in.

Adium is a free instant messaging application for Mac OS X that can connect to AIM, MSN, Jabber, Yahoo, and more.

Adium unifies all your IM programs into one slick interface. The tabbed chat feature is particularly nice so that you don’t have a new window for each IM conversation cluttering up your monitor.

You can even combine contacts (the same person on multiple IM platforms) “so that each one represents a person, not an account.” This is nice when you don’t care how you reach them, just that you reach them. Like the phone, you don’t care who the carrier is, you just want to connect.

The good news is you can use Adium even if your friends or clients don’t. The bad news, it’s Mac only. Windows users might want to check out Trillian instead.

TinyGrab

TinyGrab Logo

If a picture is worth a thousand words, perhaps a screen shot is worth a few hundred.

Screen grabs are a vital part of the SEO process. You want to show clients what you’re seeing and how to fix it. If you’re building a presentation deck this isn’t a huge problem.

If you’re having an IM conversation about an issue (with Adium I hope), the traditional screen grab can be slow and clunky. Enter TinyGrab.

Download this tool and each time you take a screen grab it saves it to the cloud and copies a tiny URL of it to your clipboard. Then simply paste it into your conversation and you’ll be looking at the same thing in no time.

The free version of TinyGrab gives you 10 grabs a day. For a one-time fee of £10 you can upgrade to the premium version for unlimited grabs.

These tools make me more productive every day. Do you have other tools that make a difference in your daily life? Share them here.
