With most queries turning up millions of results, how does a site like Google put the most relevant and accurate ones at the top? Most people give little thought to the intricate workings of the search engines they use daily.
Computer science professor Tao Yang, a former chief scientist of Ask.com, has done extensive work on this subject. According to Yang, search engines help users avoid having to sift through irrelevant or repetitive information.
“What a search engine does is perform data mining on different websites to analyze their relevance,” Yang said. “What often turns up is that there are many duplicates posted in a variety of sites. By finding these duplicates, we can eliminate them, making searches more accurate and efficient.”
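For readers curious what "finding duplicates" can look like in practice, here is a minimal Python sketch of one common textbook technique: break each page into overlapping three-word sequences ("shingles") and treat two pages as near-duplicates when the share of shingles they have in common (their Jaccard similarity) is high. This is an illustration under stated assumptions, not the method used by Ask.com or by Yang's group; the function names, the shingle size of 3 and the 0.8 cutoff are hypothetical choices.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences ("shingles") in a lowercased text.
    The shingle size k=3 is a hypothetical choice for illustration."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts become a single shingle so they can still be compared.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Fraction of shingles the two sets share: 1.0 means identical, 0.0 means none."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def remove_near_duplicates(pages: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a page only if it is less similar than `threshold` (a hypothetical
    cutoff) to every page already kept; otherwise drop it as a near-duplicate."""
    kept: list[str] = []
    kept_shingles: list[set] = []
    for page in pages:
        s = shingles(page)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(page)
            kept_shingles.append(s)
    return kept


pages = [
    "the quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog",  # same text, different capitalization
    "search engines rank pages by relevance to a query",
]
print(remove_near_duplicates(pages))  # keeps the first and third pages
```

Comparing every page with every other page only works for small collections; at web scale, engines generally rely on faster approximations, but the underlying idea of scoring overlap is the same.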
This process, combined with document clustering, which groups results according to their relevance to the query, spares users the hassle of manually searching through individual links.
While search engines are the main users of sorting and ‘mining,’ the practice of similarity computing has a wide variety of uses, some of which even involve saving lives. According to Maha Alabduljalil, a graduate student working with professor Yang, similarity computing may be useful for medicine, business and police investigations.
For example, if the police have a sketch of a criminal, investigators can skip the painstaking task of trying to identify the culprit on their own and instead use "content-based image retrieval." With this type of technology, the sketch can be analyzed and checked against all police records at once, making criminal identification much faster. Likewise, a business can use such a system to sift through customer reviews, clumping similar reviews together so that it can more easily identify product defects and gauge customer satisfaction. In medical practice, patient records can be analyzed so that similar cases are grouped together, potentially giving doctors faster, more consistent ways of diagnosing and treating illnesses.
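The customer-review example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system Alabduljalil describes: each review is represented as a count of its words, two reviews are compared with cosine similarity, and each review joins the first group whose opening review is similar enough, using a hypothetical cutoff of 0.5. All function names here are made up for the example.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: how many times each lowercase word appears."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def group_reviews(reviews: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy grouping: a review joins the first group whose first review is at
    least `threshold` similar (a hypothetical cutoff); otherwise it starts a new group."""
    groups: list[list[str]] = []
    leaders: list[Counter] = []
    for review in reviews:
        vec = vectorize(review)
        for group, leader in zip(groups, leaders):
            if cosine(vec, leader) >= threshold:
                group.append(review)
                break
        else:
            # No existing group was similar enough.
            groups.append([review])
            leaders.append(vec)
    return groups


reviews = [
    "battery dies after one hour",
    "the battery dies after an hour",
    "great screen and fast shipping",
    "fast shipping and great screen",
]
for group in group_reviews(reviews):
    print(group)  # battery complaints in one group, screen/shipping praise in the other
```

Word counts ignore meaning (synonyms look unrelated), so a production system would likely use richer text representations and a proper clustering algorithm; the sketch only shows the basic idea of grouping by a similarity score.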
According to Yang, this technology could help physicians diagnose large numbers of patients more accurately and give them more effective treatment. However, Yang says it will be some time before such a system is implemented, if it ever is, because of the strict confidentiality surrounding patient records. He says one effective way around this obstacle would be to ask for patients' consent and remove their names from the records to make them anonymous.
Search engines themselves can also aid in combating disease. For example, Google Trends, which documents the most popular search queries over time, may be able to predict flu outbreaks, because the search engine records how often people search for flu symptoms. When more users search for information about the flu, the increase usually correlates with a rise in flu cases in Centers for Disease Control and Prevention records. According to Google, its trends can be sorted by country and are updated daily, while health agencies are typically slow to update their records. Google claims that by using search-query data, workers in the health industry can better prepare for flu outbreaks.
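The kind of correlation Google describes is usually measured with Pearson's correlation coefficient, which runs from -1 to 1, with values near 1 meaning the two series rise and fall together. The Python sketch below computes it for two weekly series. The numbers are invented for illustration and are not Google or CDC data, and a single correlation is not a description of how Google's own flu model works.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two spread terms, computed from deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures, made up for this example only.
flu_searches = [120, 150, 210, 300, 420]   # searches for flu symptoms
reported_cases = [10, 13, 19, 26, 37]      # reported flu cases

print(round(pearson(flu_searches, reported_cases), 3))  # close to 1: strong positive correlation
```

A high correlation in past data is what makes search volume useful as an early signal, but it does not guarantee the relationship will hold in future seasons.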
In sum, the streamlining of once time-consuming tasks enables authorities to respond to emergency situations much faster and more effectively, while allowing companies to offer better products and services to consumers. So, Gauchos, next time you hit the "Search" button, realize you are part of something bigger than the hangover cures, couch-surfing opportunities and recipes with fewer than three ingredients that you're currently perusing. In reality, you're only a statistic in a larger network, but for once, that may not be a bad thing.