Big Data | Feature
Training Next-Gen Data Superheroes
Faced with a shortage of professionals capable of taming Big Data, schools are launching data science programs to train a new generation of data specialists.
By John K. Waters
Illustration by Miracle Studios
This is the third and final installment of CT's series on Big Data. The first article, "Big Data," explained the phenomenon and why it matters, while "Digging for Gold" explored how schools are using Big Data to improve student performance.
Run for your lives, the data is coming. The information equivalent of a giant meteor is hurtling toward Earth, threatening to bury us in data sets that could--ironically--contain the very seeds of our salvation. And the size of the threat seems to grow with each passing day. The latest data alert: By 2020, the total amount of digital data in the world will reach 40 zettabytes, or roughly 5,247 gigabytes for every person on earth.
That's one of the findings in a new report published in December by IT industry analysts at IDC. Cosponsored by EMC, the study casts doubt on our ability to capture the value of all this data, especially since we've barely tapped our existing stores. The study's researchers estimate that only 0.5 percent of the world's data is being analyzed effectively today.
"As the volume and complexity of data barraging businesses from all angles increases, IT organizations have a choice," said Jeremy Burton, executive vice president of product operations and marketing at EMC, in a statement. "They can either succumb to information-overload paralysis, or they can take steps to harness the tremendous potential teeming within all of those data streams."
That's sound advice as far as it goes, but many organizations are failing to tap this rich resource because there simply aren't enough IT pros capable of doing it. Where are the superheroes of the information age? Where are Data Woman and Byte Boy?
In a report published in May 2011, analysts at McKinsey & Co. predicted that the US would soon face a shortage of "140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings."
Higher Ed Ramps Up
Yet these gloomy forecasts and scary statistics don't take into account recent initiatives in higher education intended to fill the skills gap. A growing number of colleges and universities have launched--or plan to launch--graduate programs designed to turn out specialists in a field many are calling "data science."
"In some ways it's a catchall term that includes [a wide variety of] statistical analysis, predictive modeling, text analytics, and so forth," says James Kobielus, a former Forrester analyst who now serves as Big Data Evangelist at IBM. "But clearly the data scientist is going to play a key role in this Big Data phenomenon. And one of the big questions companies are asking right now is, 'Where are the data scientists of the world coming from?'"
If Terence Parr has his way, some of them will graduate from the University of San Francisco (CA). Last fall, USF launched a Master of Science in Analytics program in response to "massive demand for this set of skills among both students and companies," says Parr, an associate professor of computer science who conceived and runs the multidisciplinary program. The first dozen students are on track to graduate in June.
"We've built all new classes for the program, hired new faculty, and contracted with some strong adjuncts from Google and others," explains Parr. "We're a small school, and we have excellent support from the upper echelon, so we were able to move quickly on this."
Capturing the value of Big Data--loosely defined as data sets too large and/or diverse for conventional tools to manage and mine effectively--requires a cluster of capabilities not typically required or taught within a single academic program. Essentially, USF has combined business, computer science, and graphic design to teach students how to collect, process, analyze, and report on massive data sets.
"Students need to know something about economics, business, and finance, so they can ask the right questions and understand the data," Parr says. In addition, students must have some computational ability, so they can clean and prepare the data for processing by analytical tools. They also need to understand data modeling, so they can interpret the results and explain them. Finally, they must learn visualization techniques so they can share the value of their findings with others. "We want our graduates to be able to do the whole thing," notes Parr. "Companies don't want somebody who's a siloed specialist and needs five other people to do his job."
According to Parr, USF's new graduate program has quickly become a competitive differentiator for the school, attracting savvy students who expect to pursue lucrative careers in data science. "I'm already warning my programming assistant that there's an onslaught of applications coming into the system," adds Parr. "And I get e-mails from companies all the time asking to be part of our career-development and recruitment events."
Anjul Bhambhri, IBM's vice president of Big Data, also sees a lot of student interest in this emerging field. "I speak at a lot of conferences and events, and there are always a fair number of undergrad and grad students who are taking a keen interest, not just in the technology, but in how the different verticals are trying to leverage Big Data," she explains. "They want to understand the technology, but also the use cases and the challenges the enterprises have and how to overcome them. A lot of questions [involve] what courses they should be taking, especially around analytics, statistics, and mathematics-related stuff."
Even so, few undergraduates arrive perfectly prepared for this evolving, multidisciplinary field of graduate study. In fact, USF found it necessary to provide some serious remedial preparation for its first cohort, in the form of a summer analytics boot camp. The five-week program focused on what Parr calls the "three pillars" of the graduate program: the ability to write some software (SQL, Python, and R); familiarity with probability, statistics, and linear algebra; and some background in business, finance, and basic economics.
"It was an issue finding students who had these three things nailed," Parr says. "It's actually a rare combination. So, we had a boot camp where we pounded on them for five weeks before the [graduate] program really got going in the fall. Now we're able to tell undergraduates what they need to study if they want to pursue this graduate degree."
In September, Northwestern University (IL) began offering a Master of Science in Analytics through its McCormick School of Engineering. The 15-month professional degree program is the result of collaboration between the school and IBM. Big Blue has long been active in higher education through its Academic Initiative, which the company describes as "a program that offers colleges and universities access to the latest advances in technology and business industry expertise." In the case of Northwestern, IBM reportedly provided curriculum materials, project-focused case studies, and access to "a wide spectrum of software solutions."
"We have been talking with many schools and universities in the US, but also in countries such as China and India," Bhambhri says. "We're really helping to put Big Data in context for these students. We're talking to them about why it's important, and why enterprises care about it. That's a perspective that a company like IBM can bring: what's happening in the enterprise world."
Part of that context is recognizing the difference between the traditional college-level study of data systems and the emerging discipline of data science.
"Traditionally, college courses in data management have been about managing relational databases--in other words, structured data," explains Bhambhri. "But Big Data is mostly unstructured. For students coming out of undergrad and grad programs today with an interest in this stuff, it's important that they understand that taking courses that teach them how to manage, analyze, and report on structured data is not going to be sufficient. It won't give them the right skill set. They have to find a course of study that can help them get their arms around the unstructured world."
The Northwestern-IBM relationship is not unique. In May, the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT unveiled a research initiative called bigdata@CSAIL. At the same time, Intel announced the establishment of an Intel Science and Technology Center for Big Data at CSAIL. The computer chip maker is expected to contribute $2.5 million per year for up to five years to support the research center.
In August, Louisiana State University announced a collaboration with SAS, maker of business analytics software. The two organizations partnered to launch a Master of Science in Analytics in the school's Department of Information Systems and Decision Sciences (ISDS), which is part of the E.J. Ourso College of Business. The university describes the program as a combined effort between ISDS and the Department of Experimental Statistics. Students in a pilot program during the 2011-2012 school year were able to apply "cutting-edge SAS software to real-world data and challenges."
In addition to financial support, SAS provides LSU with experts to conduct on-site training, and hosts LSU faculty at the company's North Carolina headquarters. A SAS education expert serves on LSU's Industry Advisory Board, and the school gets free access to SAS software and teaching materials through SAS OnDemand for Academics. The school reports that all 38 of the first class of students to complete the new LSU program found employment within weeks of graduation, in companies ranging from Amazon to Bank of America and, of course, SAS.
Given that the lack of Big Data skills is being felt first in the corporate world, it's not surprising that companies such as IBM, Intel, and SAS are sponsoring programs to produce graduates who can help them with their growing data challenges. It also doesn't hurt that these companies make products that they'd like these graduates to use.
The USF program also benefits from the school's close relationship with local businesses. In fact, students work at various local companies throughout the yearlong program. Some of these internships are paid positions. "Companies have been very interested in working with our students," Parr says. "They get a low-risk way to check them out and find someone who'll really fit."
A Bright Future for Graduates
USF found its own low-risk way to provide its MS Analytics students with experience accessing and managing remote servers. The school has worked closely with Amazon Web Services (AWS), a division of Amazon that delivers IT infrastructure services in the cloud, to give all students their own virtual servers and on-demand access to compute resources.
"A lot of students have been using a laptop and know how to hit the 'run' button, but being able to access a giant set of cloud-computing resources and manage them remotely is an important Big Data skill that we drill into our students," explains Parr. "Using AWS, we can provide them with their own dedicated boxes, and they're free to run amok and do whatever they want, because it's virtual and we can throw it all away when they're done."
Despite the technical aspects that must be mastered, USF's Master of Science in Analytics needs to be considered a professional degree, Parr insists, not one focused on computer science or statistics. It's an assessment that IBM's Kobielus believes is fair.
"Fundamentally, this is about empowering the business analysts [in a company] to find patterns and trends, and to be able to do predictive forecasting," he says. "What these universities are providing are critical business tools for optimizing your company, finding opportunities, and nipping threats in the bud. This isn't a math project, but an ongoing effort to build a particular set of skills and to encourage promising young people to learn to work with data of disparate sorts to solve business problems."
And for these new professionals, the sky is the limit. "There are so few people in this field right now that some careers are going to launch like rockets," concludes Parr. "When there's a dearth of talent, people can get sucked right up to the top. Nature abhors a vacuum."