Tuesday, July 31, 2012

Why I enjoyed reading I'm Feeling Lucky: The Confessions of Google Employee Number 59. Part II



This is Part II of my blog on this book. For Part I, click here

Alright, most of us are clueless about esoteric computer language and the workings of a search engine behemoth such as Google. Even so, it is highly probable you will enjoy this book.

Let me give you an example.

If you were to open the book at random to Chapter 11 (entitled Liftoff), you would be in for a delightful surprise. Here we learn that in the spring of 2000, Google had just completed an unpublicized deal to be the search engine for Yahoo. The employees were kept in the dark.

This situation was akin to David negotiating a deal with Goliath despite huge differences in size.

True, Google was servicing eight million searches a day, which would leap to nine million two weeks later.

However, Yahoo, the search portal (featuring links to dozens of topics ranging from sports, politics and international news to fashion), had about 50 million unique visitors per month.

Google, on the other hand, had only about 3 million unique visitors.

Google faced many, many problems, and all can be boiled down to speed, capacity and results. All three had to be solved simultaneously to meet a deadline of July 4th, some three months away.

Speed problems would be caused by a dramatic increase in online traffic due to queries from Yahoo. Here, Google would have to decrease the latency, or the average delay in returning results over any given hour.

Storage capacity was going to be strained. The founders set an arbitrary goal of indexing and storing 1 billion URLs, a herculean task. This would require increasing storage capacity many times over.

But with so much new information being indexed daily, how could you assure the searcher that he or she is getting current information as opposed to data that is weeks old?

So results had to be not only fast, but also contain the latest information. The software that visits web pages and gathers their content for the index is called a crawler. Google's engineers had to design crawlers that could process huge amounts of data quickly. In addition, the crawlers would have to do their work not just monthly, but several times a week, so that if you were looking for the latest references to 'global warming,' you could rely on the results.

Many other problems arose, such as linking data from three different server farms (discussed in Part I) located all over the country.

How successful was Google in dealing with these problems?

I leave it to the reader to find out whether Google was able to keep the contract promises it made to Yahoo when July 4th finally rolled around.

You won't be disappointed!
