UCSB has become an important influence in the field of cloud computing, a system that shares processing power, data and resources among multiple computers via the internet.
The “cloud” is a metaphor for the internet, which is depicted in flowcharts as a cloud. Currently, UCSB is leading projects involved in making sensitive data more secure, utilizing a new type of data and developing open-source software — programs that, along with their source code, can be downloaded from the internet free of charge — that will work in a similar way to Google App Engine or Amazon Web Services.
Using such a system, computing power can become similar to a utility like electricity, where the users are charged depending on how much is used. Companies can then avoid needless expenditure on hardware and software and individual users can reap the benefits of high-level programs without having expensive systems.
Cloud computing enables the user to run multiple programs built for different operating systems at the same time through a virtual interface, and it is becoming part of the worldwide computing experience.
Cloud computing is useful for companies because of its scalability — the ability to provide enough space and processing power for whatever is required. However, the use of cloud computing is not widespread because it tends to present a security issue.
“Companies are reluctant to take advantage because of the fact that they have to trust someone else with their data. Sometimes this is business critical data, very sensitive financial stuff, but you have to put it on someone else’s servers in order for you to take advantage of all these great things that cloud computing offers,” Ben Zhao, assistant professor of computer science at UCSB, said.
To make data more secure, he explained, sensitive information must be separated from nonsensitive data in an automated way.
“We’ve got some cool techniques to do this in an automated way. There’s a little program that traces through and watches how your system interacts,” Zhao said. “It figures out what information is sensitive and traces the flow of that information.”
The program tracks how the data interacts with other data and marks data that interacts with sensitive data.
“It’s sort of an infectious model,” Zhao said. “Whatever is touched by sensitive stuff becomes sensitive, and also we can quarantine it in a very secure way to guarantee that no sensitive information gets out.”
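The "infectious" idea Zhao describes can be illustrated with a minimal sketch. This is not the UCSB program; the function names, item names and data flows below are hypothetical and chosen only for illustration. It assumes the tracker has already recorded which item passed data to which other item.

```python
# Hypothetical sketch of "infectious" sensitivity tracking; not the UCSB tool.

def propagate_sensitivity(sensitive, interactions):
    """Return every item that is sensitive or was touched by sensitive data.

    sensitive: iterable of item names known to be sensitive at the start.
    interactions: list of (source, destination) pairs, meaning data
        flowed from source into destination.
    """
    tainted = set(sensitive)
    changed = True
    # Repeat until no new item becomes tainted, so the order in which
    # the flows were recorded does not matter.
    while changed:
        changed = False
        for source, destination in interactions:
            if source in tainted and destination not in tainted:
                tainted.add(destination)
                changed = True
    return tainted


def quarantine(items, tainted):
    """Split items into (safe_to_share, kept_private)."""
    safe = [item for item in items if item not in tainted]
    private = [item for item in items if item in tainted]
    return safe, private


# Assumed example flows: payroll data feeds a report, which feeds an email.
flows = [
    ("payroll_db", "monthly_report"),
    ("product_catalog", "public_page"),
    ("monthly_report", "email_draft"),
]
tainted = propagate_sensitivity({"payroll_db"}, flows)
safe, private = quarantine(
    ["payroll_db", "monthly_report", "product_catalog", "public_page", "email_draft"],
    tainted,
)
print(sorted(tainted))  # items that must stay quarantined
print(safe)             # items that never touched sensitive data
```

In this toy example, the email draft becomes sensitive even though it never read the payroll database directly, because it was touched by the report that did.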
Encryption obscures data with an algorithm so that only someone holding the corresponding key can read it.
Zhao explained that increasing security would make businesses more willing to begin using cloud computing in the workplace.
“I think that it’s going to have a great impact and I think that once this gets out there a lot of companies are going to be willing to make use of this because now they can trust that their information won’t be leaked out,” Zhao said.
Eucalyptus and AppScale, two open-source cloud computing services, were created at UCSB.
Rich Wolski, leader of the research project that eventually became Eucalyptus, said that the goal of the project was to learn how to run scientific programs on National Science Foundation supercomputers.
“Our research agenda was to understand how to run this very large, elaborate weather forecasting program on NSF computers,” Wolski said.
Wolski decided to combine the NSF computers with Amazon Web Services.
“Our goal was to run the code in AWS and on the supercomputers … in order to do that we had to build a software layer that could allow university data centers to pretend to be AWS,” Wolski said. “The software that we used to emulate AWS became Eucalyptus.”
The second cloud computing project from UCSB, AppScale, is a research effort to create a cloud like Google App Engine. AppScale is for running multiple Google App Engine applications written in programming languages such as Python or Java.
Although a user may run multiple operating systems on a computer without a cluster, that ability, called virtualization, is separate from cloud computing. To take advantage of cloud computing systems like AppScale, a user would need their own cluster.
“It’s not meant for a person who has just a desktop or laptop,” Chandra Krintz, associate professor of computer science at UCSB, said.
She said that the research they are currently doing is to enable people to program in other languages, which would make it easier to program applications for AppScale.
New Data Type
“We’re looking at how to extend cloud computing to what we call graph data. Graph data is a new, interesting type of data that has become very popular,” Zhao said.
Tools from Microsoft and Google, Zhao explained, run on big cloud clusters and process typical data in parallel by splitting it into pieces that are handled on different machines.
“If you have typical database data, it is sometimes easy to chop it up in pieces, split it up across different machines and have them all process,” Zhao said.
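The "chop it up" approach Zhao describes can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's tool: worker threads on one computer stand in for separate machines, and the function names and the sample `sales` figures are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_chunks):
    """Split records into num_chunks nearly equal contiguous pieces."""
    size, remainder = divmod(len(records), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks each take one extra record.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work done independently on one 'machine': here, a simple sum."""
    return sum(chunk)


def parallel_total(records, num_workers=4):
    """Process each chunk on its own worker, then combine the results."""
    chunks = split_into_chunks(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(process_chunk, chunks))
    return sum(partial_sums)


# Assumed sample data: 100 independent sales records.
sales = list(range(1, 101))
total = parallel_total(sales)
print(total)
```

This works because each record can be handled without looking at any other record, so no worker has to wait on another.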
However, this method of processing does not work with highly connected graph data, best described as a web of data so interconnected that a single piece by itself does not make sense. This presents a challenge for cloud computing systems.
“We’re working on the software infrastructure to essentially do extremely scalable, large scale processing of this type of graph data and make it useful for the cloud,” Zhao said.
Some typical information on social networking falls under the scope of graph data. A feature of Facebook profiles is the “Suggestions” section, which recommends friends and fan pages for a person based on a large web of data, like networks and mutual friends.
“I have to look inside this big connected clump of data rather than get a single file. It’s a very different model for working with things. Companies have a lot of trouble dealing with [graph data] because they don’t know how to split it up, how to manage it in an efficient way,” Zhao said.
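A small sketch can show why splitting graph data is harder than splitting ordinary records. It is not the UCSB software; the friendship graph, the names and both machine assignments are made up for illustration. It counts how many friendships end up with the two friends stored on different machines, since each such link forces the machines to talk to each other.

```python
def count_cut_edges(edges, assignment):
    """Count edges whose two endpoints are stored on different machines."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])


def naive_assignment(nodes, num_machines):
    """Assign nodes to machines in turn, ignoring who is connected to whom."""
    return {node: i % num_machines for i, node in enumerate(nodes)}


# Hypothetical friendship graph: two tight groups joined by one link.
people = ["ann", "bob", "cat", "dan", "eve", "fay"]
friendships = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),  # first group
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),  # second group
    ("cat", "dan"),                                  # link between groups
]

# Split without looking at connections, as one might split table rows.
naive = naive_assignment(people, 2)

# Split along the natural grouping instead (chosen by hand here).
grouped = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}

naive_cut = count_cut_edges(friendships, naive)
grouped_cut = count_cut_edges(friendships, grouped)
print(naive_cut, grouped_cut)
```

In this toy graph the connection-blind split breaks five of the seven friendships across machines, while the grouped split breaks only one. Real social graphs rarely fall into such clean groups, which is part of what makes the problem hard.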
The goal is to allow the machines to run in parallel, as a cluster, without problems.
“Essentially, we’re writing another layer on top of popular cloud computing tools that allows you to take this kind of data that’s really tightly interconnected and break it up in a very interesting way that allows every machine to process in parallel without impeding each other,” Zhao said.
Limitations and What Lies Ahead
There are a few disadvantages to cloud computing, such as security holes and hardware and software limitations. These problems, while being researched, are keeping cloud computing from reaching the mainstream. Once these hurdles are cleared, however, working in the cloud may not require one’s head to be in one.