Friday, January 23, 2009

Recommended Read: Planet Google

It takes me a while to get to things on pulp rather than in bytes, but I finally had a chance to read Randall Stross' Planet Google. It provides some fascinating insights. Did you know that Google's machine translator was taught by feeding it pairs of documents prepared by the UN for training its human translators? (p. 82) Because the machine analyzes larger patterns rather than just individual words, its translations are an improvement over earlier machine-based translations. The more matched documents fed to the machine translator, the better it is able to translate.

Google's search algorithm also gets smarter as more documents are indexed. Stross describes the Google maxim as "More data equals better data." (p. 87) That maxim explains why Google has moved into indexing, and sometimes digitizing, so many different types of information: books, news, forums, email, etc.

Stross also recounts Eric Schmidt's 2005 estimate that only 2-3% of the world's information is currently in digital, indexable form. Schmidt said that the company estimated that it would take 300 years to digitize and index the rest! (p. 200)

This book gave me a greater appreciation for how much Google has actually accomplished in 10 short years. It also made it clear why our vendor search tools are unlikely to catch up without access to an equivalent datastore...
