Building DNA in the Cloud

Penn State researcher Howard Salis created a simple tool for a complex process, designing DNA sequences, and turned it into a highly scalable, on-demand system that serves scientists all over the world.

2014 Campus Technology Innovators Awards

Category: IT Infrastructure and Systems
Institution: Penn State University
Project: DNA Compiler
Project lead: Howard Salis, assistant professor of biological and chemical engineering
Tech vendor/partner: Amazon Web Services

Howard Salis' Twitter bio sums up his work well: "Creating synthetic microbes from the bottom-up."

As assistant professor of biological and chemical engineering and synthetic biology at Penn State University, Salis develops physical models that predict how DNA is interpreted inside an organism. Specifically, the models predict the rate at which an organism will produce protein from a given DNA sequence. "We can use these models to rationally engineer organisms to carry out new activities including the production of biofuels, plastics and drugs," he said.


Several years ago, when Salis was a postdoctoral fellow at the University of California, San Francisco, he and other researchers would combine many different genetic parts, trying to engineer a genetic system to have a particular desired behavior. "There were so many possible combinations that we could have put together and yet we only had the time and resources to think of a few and see if they worked," he explained. "So a lot of the research was trial and error in that regard. But today using physical models we can actually calculate the thermodynamic properties of these different genetic parts and we can make predictions about how they will work together when put together. It is like AutoCAD for biology."

In 2009 he observed that his field of synthetic biology needed improved computer-aided design software for researchers to do their work more efficiently. In response, Salis, who said he has been programming since he was 12 years old, created and launched the DNA Compiler Web portal in early 2010.

In developing the DNA Compiler, he recognized that a streamlined user interface was important. "The calculations are complicated, but nobody will use it unless the interface is friendly, so we basically took a very complicated model and put a simple input/output relationship on top of it on a clearly designed Web site," Salis said. If you are a biologist, you don't need to know how the model works, he noted. You can copy and paste in your DNA sequences and get predictions back. You can also tell the algorithm what you would like to accomplish in terms of how much protein it should express, and then the algorithm will design for you a completely new DNA sequence that will achieve that outcome.

Behind the DNA Compiler Web portal's simple user interface, a powerful tool provides complicated calculations for genetic research.

Within six months of Salis making the DNA Compiler available, researchers from Japan, China, the U.K., France, Finland and Sweden began using it. Salis created a Google Analytics map of usage. "Basically all the people who do synthetic biology research or metabolic engineering research at the institutions well known for those areas of research are using it," he said.

But the portal's popularity meant long queues on the server located in Salis' office. The server could run 16 jobs simultaneously, and even then 80 to 90 jobs were waiting in the queue. Given the amount of data involved, individual jobs could take a long time, which slowed research. For example, more than 100 compute hours are required to predict the E. coli genome's protein production rates.

Although Penn State has its own high-performance computing resources, those systems are not connected to the Internet. "They are very finicky about people connecting to their computers from outside the network, which is understandable," Salis said.

The solution? Salis moved the DNA Compiler to the cloud using Amazon Web Services (AWS).

The portal now combines Amazon Elastic Compute Cloud (EC2) Auto Scaling groups for compute resources, Simple Queue Service (SQS) to decouple application components so they can run independently, and Simple Storage Service (S3) for storage. This design has eliminated the need for researchers to wait in line for their jobs to be run, and it has made calculation times faster as well.
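
The queue-driven design described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Salis' implementation and not AWS code: the function names, the per-node capacity of 16 jobs (borrowed from the old office server), the cap of 10 nodes, and the scale-to-zero rule are all hypothetical choices made for illustration.

```python
import math
from collections import deque

JOBS_PER_NODE = 16  # assumption: reuse the old office server's capacity per node
MAX_NODES = 10      # hypothetical upper limit on the size of the fleet


def desired_nodes(queued_jobs, jobs_per_node=JOBS_PER_NODE, max_nodes=MAX_NODES):
    """Return how many compute nodes should run for a given queue depth."""
    if queued_jobs <= 0:
        return 0  # empty queue: every node shuts down
    return min(max_nodes, math.ceil(queued_jobs / jobs_per_node))


def simulate(arrivals):
    """Step through time; arrivals[t] is the number of jobs submitted at step t.

    Simplification: each running node finishes up to JOBS_PER_NODE jobs per step.
    Returns a list of (step, nodes_running, jobs_left_in_queue) tuples.
    """
    queue = deque()
    history = []
    next_id = 0
    for step, new_jobs in enumerate(arrivals):
        # The web front end only enqueues jobs; it never talks to a node directly.
        for _ in range(new_jobs):
            queue.append(next_id)
            next_id += 1
        # The fleet is sized from the queue depth alone.
        nodes = desired_nodes(len(queue))
        # Running nodes pull jobs off the queue until capacity is used up.
        for _ in range(min(len(queue), nodes * JOBS_PER_NODE)):
            queue.popleft()
        history.append((step, nodes, len(queue)))
    return history


if __name__ == "__main__":
    for step, nodes, left in simulate([90, 0, 5, 0]):
        print(f"step={step} nodes={nodes} queued={left}")
```

Because the submitting side and the compute side share nothing but the queue, either one can be changed or scaled without touching the other, which is the point of the decoupling the article describes.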

"We now have a nice on-demand computing system, where users from around the world can submit their jobs," Salis explained. "Compute nodes dynamically turn on in response to those jobs; they run them and then they turn themselves off."

Salis said there are certainly other cloud service providers to consider, but noted that Amazon has the largest compute cloud available. "Something like 40 percent of Internet traffic is Netflix, and it runs on Amazon AWS. I have my research lab, and people use my Web site; if Amazon were to have a server malfunction, they are not going to care about me. But they are going to care about Netflix. So it will get fixed really quickly. That is what you are signing up for: always-on access, almost scalable to infinity and low cost because of the economies of scale. Also, Amazon had already fully developed their platform by the time I started to use it, so I didn't have to learn how to use it while they were still developing it — which was not the case for other providers."

Over the past two years, more than 2,000 biotechnology researchers have used the DNA Compiler to design over 30,000 synthetic DNA sequences. The vision for this project is the global optimization of every nucleotide within a genome to perform a specific and useful task.

According to Salis, a crucial point is that cloud solutions such as AWS allow you to develop highly scalable on-demand resources that are connected to the Web. "So if there are some applications or interfaces that someone would like to develop that have to be connected very broadly to the Internet, it is much better to use a computing cloud environment than a dedicated hardware environment."

If you run a research lab and have a problem that requires intensive computing but needs to be solved only once, you may not want the hassle of buying hardware or using institutional computing resources. "But if you have access to the compute cloud," Salis explained, "you can solve that problem in a short period of time using the exact same software you would normally use."

For more information on the Campus Technology Innovators program, visit the awards site.
