Mar 30 2010

How To Make Google Out Of Date

I saw the TED demonstration of Microsoft Live Labs’ Pivot the other day. Gary Flake describes it as the point between searching and browsing information. In another video, he says to think of it as business intelligence meets a search engine.

This demonstration came at a great time for me. A side project I’ve been working on (my non-analytics/web/business reading) led me to realize the limitations of Google, or, more precisely, the idea of a search engine that’s based on keywords.

I don’t think we’ll ever outlive searching for keywords. With so much based on language, there are not many ways around that. However, it is limiting when you are looking for more than just some keywords or a quote or some lyrics to that song you just heard.

For instance, for my project I would like to have digital/searchable copies of all the works of a philosopher that I am currently reading; then, I want to compare all the text in those books to a database that contains the names of other philosophers. I would like to do this so I can trace this philosopher’s thought, and the people that he wrote about.
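The comparison step might look something like the following minimal Python sketch. It assumes the books are already available as plain text and that the "database" is nothing more than a list of names; the function name `count_mentions` and the sample sentence and names are made up purely for illustration.

```python
import re
from collections import Counter

def count_mentions(text, names):
    """Count how often each name appears in the text (whole word, case-insensitive)."""
    counts = Counter()
    for name in names:
        # re.escape keeps any punctuation inside a name from being read as regex syntax
        pattern = r"\b" + re.escape(name) + r"\b"
        counts[name] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample data; a real run would load full book texts and a much longer name list.
sample_text = "Husserl shaped this reading. Heidegger answers Husserl, and Husserl again."
sample_names = ["Husserl", "Heidegger", "Plato"]

mentions = count_mentions(sample_text, sample_names)
for name, count in mentions.most_common():
    print(name, count)
```

A plain string match like this would miss pronouns, alternate spellings, and translated names, so it is only a rough first pass at tracing who gets written about.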

There are numerous problems in trying to accomplish this. Here are just a few:

  1. We are extremely behind in digitizing content and making that content available online…this is a whole other post in and of itself.
  2. HTML typically does not provide semantic markup; that is more XML’s domain. For example, on Wikipedia I find that Jacques Derrida’s name is surrounded by this: <h1 id="firstHeading" class="firstHeading">Jacques Derrida</h1>. The H1 tag is not helpful, since it is only “structural” in nature; I need something like <philosopher>Jacques Derrida</philosopher>. Sure, I could write a program that looks for the name in <h1 id="firstHeading" class="firstHeading"></h1>, but, like most Americans, I don’t know Python.
  3. Finally, it would be a lot of information to search through. The ability to instantly visualize it (e.g., the philosopher cites person X only once, whereas they cite person Y 115 times) is important.
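For what it’s worth, the program mentioned in the second item could be fairly short. Here is a rough sketch using only Python’s standard-library `html.parser`; the class name `HeadingExtractor`, the helper `extract_heading`, and the sample page string are hypothetical, and it assumes the page really does wrap the name in an H1 with the id `firstHeading` as in the snippet above.

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collects the text inside the <h1> whose id matches target_id."""

    def __init__(self, target_id="firstHeading"):
        super().__init__()
        self.target_id = target_id
        self.in_heading = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "h1" and dict(attrs).get("id") == self.target_id:
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.parts.append(data)

def extract_heading(html):
    parser = HeadingExtractor()
    parser.feed(html)
    return "".join(parser.parts).strip()

# Hypothetical stand-in for a downloaded page.
page = ('<html><body><h1 id="firstHeading" class="firstHeading">'
        'Jacques Derrida</h1><p>...</p></body></html>')
name = extract_heading(page)
print(name)
```

This only works for as long as the site keeps that exact id on its headings, which is the fragility that real semantic markup would avoid.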

As far as I know, Google et al. do not allow this. I have not had time to do my research on what other people have written about this topic, but after thinking about it I realized that the search box we all know and love on the homepage of Google may be our generation’s 8-track, especially when it comes to accessing the so-called deep web.

The future of search is so much more exciting than a text box and ten blue links.

