“I’m sorry, Dave. I’m afraid I can’t do that.”
In a cramped, well air-conditioned, fluorescent-lit room sits UCSB’s version of HAL. Well, sort of – the Engineering I building is home to the Gargleblaster Cluster Parallel Computing Facility, but it’s not planning to take over the world or anything.
The Gargleblaster Cluster is a high-performance computer system composed of 42 individual computers that work in unison. Each computer has at least two processors, and some have four. This type of configuration is known as cluster computing, and this specific type of cluster is known as a Beowulf cluster. Beowulf clustering was originally developed by NASA researcher Donald Becker and allows off-the-shelf consumer PCs to be connected, creating a system that is capable of high performance without breaking the bank.
What makes cluster computing so powerful is that a single calculation can be broken down into multiple parts. Each part is distributed to a different machine, or node, in the cluster, and the nodes compute their parts at the same time, or in parallel. The end product is a system that produces results in much less time.
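The idea of splitting one calculation into parts can be sketched in a few lines of Python. This is only an illustration, not software used on the Gargleblaster Cluster: the function names are hypothetical, and worker threads on a single machine stand in for the separate nodes that would each receive a part on a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data, num_nodes):
    """Divide data into num_nodes contiguous, nearly equal parts."""
    size, extra = divmod(len(data), num_nodes)
    parts, start = [], 0
    for i in range(num_nodes):
        # The first `extra` parts each take one leftover element.
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def partial_sum_of_squares(part):
    """The work one (hypothetical) node does on its own part."""
    return sum(x * x for x in part)


def cluster_sum_of_squares(data, num_nodes=4):
    """Split the calculation, run the parts concurrently, combine the results."""
    parts = split_into_parts(data, num_nodes)
    # Threads are a stand-in for nodes; a real cluster would ship
    # each part to a different machine over the network.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, parts))
    return sum(partials)
```

Calling `cluster_sum_of_squares(list(range(10)), num_nodes=3)` splits the ten numbers into three parts, squares and sums each part separately, then adds the three partial results together.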
UCSB associate professor of computer science Tao Yang, together with a few graduate students, conducts cluster-computing research at UCSB. The current project is named Project Neptune and is aimed at creating software to manage a cluster and provide programmers with an easier way to create software to take advantage of clusters.
“The hardware is commodity-based, available on the market,” Yang said. “The purpose is to develop the software infrastructure so that the machines can connect together easily. If a service running on one machine goes down, another machine needs to provide the service immediately for fault tolerance. [Programmers] should not have to worry about writing complicated software to use [the cluster], they can use Neptune to automate that process.”
Traditional supercomputers from companies such as Sun Microsystems, IBM and Cray Research often cost millions of dollars, so only a select group of institutions can own one. Sun Microsystems’ Sun Fire 15K Server can cost as much as $3.2 million. Most universities do not have the financial resources to invest in a supercomputer for research or teaching.
Cluster computing offers a much better price-to-performance ratio, which makes it feasible for organizations and universities with smaller budgets to reap the benefits of high-performance computing. UCSB’s Gargleblaster Cluster cost about $200,000, which is inexpensive compared to most commercial offerings.
“You want to build it using cheap PCs and make it work very fast. That requires a lot of research to make the algorithm very fast and to make the computation very fast, to make effective use of a lot of machines,” Yang said.
The Gargleblaster Cluster system has 96 processors, 72 of which are Intel Pentium II processors. The remaining 24 are Intel Pentium III processors. The cluster is also equipped with 30 gigabytes of memory, and each individual machine, or node, has two Ethernet cards for network communication.
Neptune is used to manage the cluster, which entails adding, removing or repairing nodes. One of the Neptune Project’s goals is to create what the researchers call dynamic service migration.
“If one machine fails or goes down, you want to migrate the services to another machine,” Yang said.
Currently, service migration must be done manually, Yang said. One of Yang’s graduate students, Lingkun Chu, is currently working on automating this process.
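A rough picture of what automated service migration involves is sketched below in Python. It is a simplified illustration under stated assumptions, not Neptune’s actual design: the function name, the service and node names, and the policy of moving each service to the least-loaded healthy node are all hypothetical.

```python
def migrate_services(assignments, failed_node, healthy_nodes):
    """Return a new service-to-node map with services moved off failed_node.

    assignments maps a service name to the node currently running it.
    Assumed policy: each displaced service goes to whichever healthy
    node is running the fewest services at that moment.
    """
    healthy = [node for node in healthy_nodes if node != failed_node]
    if not healthy:
        raise ValueError("no healthy nodes available for migration")

    # Count how many services each healthy node already runs.
    load = {node: 0 for node in healthy}
    for node in assignments.values():
        if node in load:
            load[node] += 1

    updated = dict(assignments)
    # Visit services in sorted order so the result is deterministic.
    for service in sorted(assignments):
        if assignments[service] == failed_node:
            target = min(sorted(load), key=lambda node: load[node])
            updated[service] = target
            load[target] += 1
    return updated
```

For example, if `node1` fails while running two services and `node2` and `node3` each run one, the sketch moves one displaced service to each surviving node so that neither is overloaded.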
Another main issue when dealing with a cluster is scalability, which is a measure of how well the cluster can grow (by adding nodes) as the computations require more and more nodes.
“The fundamental issue is how to assemble a bunch of components into a large cluster installation. [One part of this] is the scalability issue and how to make the system scale to a large number of nodes,” said Kai Shen, one of Yang’s graduate students, who will be an assistant professor of computer science at the University of Rochester this fall.
While this may seem like overkill for an average day-to-day user, such high-power computer systems are often used for scientific computing such as weather-pattern modeling, particle simulations, artificial intelligence and DNA sequencing. They are also used to power flight simulators for the military and private industry, as well as in Hollywood to create computer-generated images (CGI) for movies.
“If you’re involved in large-scale scientific calculations, [such as] graph and matrix manipulations, you want to do something very fast, and that’s why you need high performance computing,” Yang said.
Yang travels from coast to coast, checking in with his students at UCSB and working to create a search engine that will rival Google.
Teoma is a new search engine that is looking to match and perhaps surpass Google’s accuracy and popularity. The advanced algorithms Teoma uses to rank pages for searches require cluster computing. Neptune is used to manage Teoma’s clusters.
“Google had something like 6,000 [machines] initially, and now probably over 10,000 machines deployed for their search engine. There are so many machines, how do you manage them?” Yang said.
The Internet company and search engine AskJeeves acquired Teoma while Yang was working as chief scientist and vice president of research and development at Teoma. Yang is now the chief scientist for AskJeeves and Teoma.
“Teoma is an example of how cluster computing can be used in a real world application,” Yang said.
A search engine such as Teoma must respond to millions of requests, one for each search a user runs. To keep running under that load, Google and Teoma must use cluster computing.
“Normally, you have some kind of web service or some application running on one machine. Now you say, I want to run a lot of queries and [process] a lot of traffic, and you also have high speed [connections]. Then one machine is not enough, so you put a lot of machines together,” Yang said.
On a commercial system, it is critical that service to a client is not interrupted. The Neptune Project’s ability to balance large workloads is critical to Teoma’s operations. Having a single piece of software, such as Neptune, makes cluster management much simpler.
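One simple way to picture workload balancing is a dispatcher that always hands the next request to the node with the least work queued. The Python sketch below is an assumption-laden illustration, not Neptune’s or Teoma’s actual algorithm: the function name, the node names and the notion of a numeric cost per request are all hypothetical.

```python
import heapq


def balance_requests(request_costs, nodes):
    """Assign each request to the currently least-loaded node.

    request_costs is a list of estimated costs, one per request.
    Returns a dict mapping each node to the list of request indexes it handles.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    # Min-heap of (current load, node name); the lightest node is on top.
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    placement = {node: [] for node in nodes}

    for request_id, cost in enumerate(request_costs):
        load, node = heapq.heappop(heap)
        placement[node].append(request_id)
        heapq.heappush(heap, (load + cost, node))
    return placement
```

With two nodes and one expensive request followed by several cheap ones, the expensive request occupies one node while the cheap ones pile onto the other until the loads even out.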
“At Teoma, we have to process terabytes of data. We have a different variety of machines, but we want to be able to manage them together and have a complete storage system,” Yang said.
Hong Tang is another graduate student studying under Yang and is currently researching how to use a cluster to act as a dynamic storage system.
Working with the Private Sector
Beowulf clusters give universities a way to achieve supercomputing power on a relatively small budget, but collaboration between the private sector and higher education still benefits both sides.
“At UCSB we have a small cluster, but we don’t have enough machines for this kind of research. That’s why it’s good to work with a company; they have the money and infrastructure available. You can do more large-scale research which you could not do in a university environment because of a limited budget,” Yang said.
Shen, Tang and Chu have all worked for Teoma in the past, building cluster-based services.
“We collaborate with a commercial company [Teoma] to provide services to more than 7 million people every day. That kind of experience with real-world applications, from my point of view, is much better than experience with some toy-level application,” Shen said.
“Our goal is to provide leading technology to the industry and learn from the feedback, and that’s the way we make an impact,” Yang said.