Everyone knows that the Internet and the Web contain an enormous amount of information; one estimate put the number of documents stored online in 2005 at over 12 billion. To sort through all of those documents and find what we need, we rely on Internet search engines and, more often than not, Google. Web crawlers (or spiders) sift through HTML source code, extracting keywords that are indexed for future searches.
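
To make the crawl-and-index idea concrete, here is a minimal sketch (my own illustration, not Google's actual implementation): download a page, pull the visible text out of its HTML, and record which keywords appear at which URL in a simple inverted index.

```python
# Toy crawl-and-index sketch: fetch a page, extract its text,
# and note which words appear on which URL (an inverted index).
from html.parser import HTMLParser
from urllib.request import urlopen
from collections import defaultdict
import re

class TextExtractor(HTMLParser):
    """Collects the visible text chunks from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def index_page(url, index):
    """Download one page and add its keywords to the inverted index."""
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    extractor = TextExtractor()
    extractor.feed(html)
    words = re.findall(r"[a-z0-9]+", " ".join(extractor.chunks).lower())
    for word in set(words):
        index[word].add(url)

index = defaultdict(set)              # keyword -> set of URLs containing it
index_page("https://example.com", index)
print(index["example"])               # pages where the keyword appears
```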

Although search engines have become very advanced, there is a critical, fundamental flaw in the technology, mainly due to the tremendous growth rate of information on the Internet. According to a New York Times article, by 2010 the amount of information available in the world will exceed the available storage capacity by a factor of nearly two to one. Thinking logically about the situation, search engine results cannot possibly filter those 12 billion documents well enough, or fast enough, to surface the best, most relevant information.

According to a 1999 study by Lawrence and Giles, who were among the first to explore search engine technologies, no search engine indexes more than 16 percent of the Web. Although a team of four Google “spiders” can reportedly crawl an estimated 100 pages per second – or around 650 kilobytes of data per second – that is still not fast enough. What this ultimately means is that there is too much information on the Internet for our relatively primitive search engine technologies to sort through.
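
A quick back-of-the-envelope calculation shows why that rate falls short. Using only the figures quoted above, a single crawling team would need years just to visit every document once:

```python
# Rough arithmetic with the figures quoted in this column:
# ~12 billion documents online and a crawl rate of 100 pages per second.
documents = 12_000_000_000
pages_per_second = 100

seconds_needed = documents / pages_per_second
years_needed = seconds_needed / (60 * 60 * 24 * 365)
print(f"About {years_needed:.1f} years for one pass over the Web")  # ~3.8 years
```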

In addition to this primary flaw, search engines have other issues. Because they work in a straightforward, methodical way, their algorithms can be exploited to make sites rank higher in Google search results, thus bringing in more traffic and therefore more revenue. This practice is known as search engine optimization (SEO) or search engine marketing (SEM).

As we look into the current landscape of the Web, we can see a possible solution to the problem of too much information for such primitive search engines. The solution involves the human mind and, more importantly, the human network.

First, let’s examine the capability of the human mind. The new Cell central processing unit used in the PlayStation 3 is estimated to deliver a performance of two teraFLOPS (two trillion floating-point operations per second); however, the human brain is believed to have a theoretical performance of one hundred teraFLOPS. What this means is that humans have a vastly superior ability to sift through information and decide whether it is relevant; with the help of the human social networking of the emerging Internet, we can predict that future search engines will not rely on computer software to sort out relevant information and documents but will instead have human beings deciding on the relevancy of Internet content.

I don’t want to reiterate my previous column, but we can already see this happening on the Web. Take Younanimous.com, for example: the user searches through one of the popular search engines (e.g. Google, MSN, Yahoo!) and submits what he or she has found to Younanimous.com, which then ranks those links based on popularity and archives them for future search queries. Through this democratic system, superior, quality content will float to the top of the rankings, where it has the best chance of exposure. I’ve tried out their service, and the search results were mediocre. However, because the site is relatively new, we’ll have to wait a bit to see its full potential.
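
As a rough sketch of how such human-powered ranking could work (my own toy illustration based on the description above, not Younanimous.com’s actual system), imagine an archive where every submitted link counts as a vote for a query, and later searches return the most-voted links first:

```python
# Toy human-ranked search archive: users submit links they found useful
# for a query, each submission is a vote, and searches return links
# ordered by popularity.
from collections import defaultdict

class HumanRankedIndex:
    def __init__(self):
        # query -> {url: number of user submissions (votes)}
        self.votes = defaultdict(lambda: defaultdict(int))

    def submit(self, query, url):
        """A user found `url` relevant for `query`; record it as one vote."""
        self.votes[query.lower()][url] += 1

    def search(self, query):
        """Return archived links for the query, most popular first."""
        ranked = self.votes[query.lower()]
        return sorted(ranked, key=ranked.get, reverse=True)

archive = HumanRankedIndex()
archive.submit("search engines", "https://example.org/overview")
archive.submit("search engines", "https://example.org/overview")
archive.submit("search engines", "https://example.net/history")
print(archive.search("search engines"))  # most-submitted link ranks first
```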

Currently, content gains exposure mainly through good SEO or SEM practice, which does not guarantee the best content. With this new search engine architecture, we can see a more balanced and fair method of ranking quality content. This will ultimately mean a more advanced Internet and more relevant content for the end-user, which, especially for university-level students, is a good thing.
