Friday, February 4, 2011

Is Microsoft Crowdsourcing Search?

By now you’ve surely heard the hubbub about Google’s allegations that Microsoft is scraping their search results. If this is news to you, go read up before you continue.

Microsoft has been trying to defuse the situation by saying that it’s only “one of 1,000 signals” Bing uses to build search results. They go further by calling the Google signal “click stream data” that users opt into by using IE8.

This raises a few questions in my mind:

  1. Is it fair to scoff at the allegation by saying it’s just one of a thousand signals? No. Matt Cutts at Google addresses this argument quite well. The gist is: if it’s really so insignificant, why do it?
  2. Is it less unethical to copy Google if it really is via clickstream data that doesn’t specifically target Google? Maybe. Let’s go into that.

Let’s think for a moment about how a search engine works. At a ridiculously high level, it wanders around the web clicking on every link it can find. This is called crawling. It records everything as it goes and builds an index. Then when you search for something, it looks at the index to figure out what pages might be relevant to you. If the crawler never reaches a page, the search engine can’t find it.
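To make that concrete, here’s a toy sketch of crawl-then-index in Python. Everything in it is made up for illustration: the `WEB` dictionary stands in for the real web, and the `crawl` and `search` helpers are my own names, not anything a real search engine exposes.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("welcome to the home page", ["b.example", "c.example"]),
    "b.example": ("search engines crawl the web", ["a.example"]),
    "c.example": ("an index maps words to pages", []),
    "orphan.example": ("nobody links to this page", []),
}

def crawl(seeds):
    """Follow links breadth-first from the seeds and build a word -> URLs index."""
    index = {}
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in WEB:
            continue  # dead link: nothing to record
        text, links = WEB[url]
        # Record every word on the page in the index.
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # Queue up links we have not visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

def search(index, word):
    """Look the word up in the index; no crawling happens at query time."""
    return sorted(index.get(word, set()))

index = crawl(["a.example"])
print(search(index, "crawl"))   # ['b.example']
print(search(index, "nobody"))  # [] because nothing links to orphan.example
```

The second search comes back empty even though the page exists: no link leads to it, so the crawler never saw it.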

So what’s the alternative to crawling? While exchanging tweets with Phil Haack (Microsoft product manager of the über-awesome ASP.NET MVC project) something occurred to me. Is Microsoft crowdsourcing search? That is, in addition to crawling pages from link to link like everyone else, are they also crawling pages that users click on via “click stream data”?

If Bing is doing this and it’s not targeted to Google (a huge if), this is so fantastically awesome. Of course it’s still unethical to literally plagiarize the text on Google’s results pages—the snippets and what have you—but if Bing’s only watching clicks to find pages to crawl itself, this is pretty cool.
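Here’s a rough sketch of what I mean by watching clicks only to find pages to crawl. To be clear, this is my guess at the idea, not a description of how Bing actually works. The `LINKS` graph, the `clicked_urls` log, and the `discover` function are all hypothetical.

```python
# Hypothetical link graph: URL -> outgoing links.
LINKS = {
    "home.example": ["about.example"],
    "about.example": [],
    "hidden.example": ["deep.example"],  # nothing links to this page
    "deep.example": [],
}

def discover(seeds):
    """Return every URL reachable from the seeds by following links."""
    found, stack = set(), list(seeds)
    while stack:
        url = stack.pop()
        if url in found or url not in LINKS:
            continue
        found.add(url)
        stack.extend(LINKS[url])
    return found

# Hypothetical click log: destination URLs only, nothing copied from the
# page the user clicked on.
clicked_urls = ["hidden.example", "home.example"]

links_only = discover(["home.example"])
with_clicks = discover(["home.example"] + clicked_urls)

print(sorted(with_clicks - links_only))  # ['deep.example', 'hidden.example']
```

The clicks act as extra starting points for the crawler. The engine still fetches and indexes each page itself, which is the part that would make this different from copying someone else’s results.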

It’d be nice if they were more upfront about it, though. Recording every click users make and then waving around a privacy policy that surely fewer than 1% of users have read isn’t exactly being forthcoming.

Of course it’s also possible that Bing is doing this only to Google and totally ripping them off—that remains to be seen.