Data Management | Feature

Mining Data to Help Students

In the second installment of a two-part series, CT examines how pioneering schools--either alone or in consortia--are mining Big Data in hopes of uncovering the ultimate riches: improved student learning and performance.

Illustration by James Steinberg

There's gold in them thar' hills. In the case of higher education, though, the hills are more like mountains--mountains of data that "we're accumulating at a ferocious rate," according to Gerry McCartney, CIO of Purdue University (IN). Surprisingly, perhaps, the riches within have been largely undisturbed.

"Every higher education institution has this data, but it just sits there like gold in the ground," complains McCartney. "Big Data and the new tools we're seeing now are about mining that gold. It's about extracting real value from the data."

This story appeared in the November 2012 digital edition of Campus Technology with an exclusive video interview.

While the quest to extract this value may not resemble the original 49er gold rush yet, many institutions have at last decided to stake a claim. Some are evaluating third-party Big Data systems; others are testing new environments for cross-institutional predictive analytics; and a few are developing their own in-house tools.

Signals of Intent
Purdue was one of the earliest prospectors. Not only has the school developed its own set of Big Data analytics tools, but it has successfully commercialized them. Launched in fall 2009, Purdue's Course Signals is a Hadoop-based system designed to track academic progress and warn students in real time if they need to work on certain areas.

"A student can be halfway through a class before he realizes that he's not going to do well in it, and that's too late for most students to remediate themselves in a useful way," explains McCartney. "Signals is designed to get you in front of that problem with Big Data analytics. The system allows us to predict as early as the second week of a class whether a student is likely to be unsuccessful in it, so you can have interventions much earlier on."

Purdue claims that students in courses using the Course Signals system receive more B's and C's, and fewer D's and F's. And the number of students earning A's and B's has increased by as much as 28 percent in some courses. In 2009, Purdue licensed its Signals system for commercial distribution to SunGard Higher Education (now Ellucian).

Purdue has also developed an application called Hotseat that captures in-class comments posted by students via their Facebook and Twitter accounts. The program allows everyone in the class--students and teachers--to view the messages. Another Purdue tool, Mixable, is a classroom-centered social-learning environment designed to allow students to create online study groups within Facebook and share documents through the Dropbox cloud-storage application. According to McCartney, data gathered in these two applications could eventually be streamed to the Signals application to create an even richer source of information.

"These tools are evolving, and we're still discovering things," explains McCartney. "Through Signals, for example, we've learned that the results of the analytics vary somewhat with the different disciplines. Behaviors that would be irrelevant in one discipline might be damaging in another--or maybe beneficial in a third. There isn't a uniform algorithm that applies to zoology, calculus, and Edwardian literature. It's intuitively obvious that this would be true, but the science bears it out, as well."

In "Big Data, Part I," author John K. Waters explains what Big Data is and why it matters.

Charting a Degree Path
Austin Peay State University (TN) offers another example of a homegrown Big Data analytics solution. In 2011, the school launched Degree Compass, a course-recommendation tool inspired by similar systems at Netflix, Amazon, and Pandora that offer personalized suggestions for movies, books, and music. Developed with funding from the Bill and Melinda Gates Foundation, the APSU system pulls information from various student information systems (SIS) and learning management systems (LMS) to recommend courses based on degree requirements and predicted grades. In making its recommendations, the system draws on a student's academic record (college and high school grades, plus SAT scores), a database of 500,000 grades of other APSU students, correlations among grades in different courses, and the requirements of a major.

Like Course Signals, Degree Compass alerts instructors and advisers to potential problems, but it also provides more of a preemptive strike. Essentially, the system predicts the grades a student is likely to earn in any class in the course catalog; the student can then use those predictions to choose classes in which he is more likely to succeed.
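To make the idea of grade-based recommendation concrete, here is a minimal Python sketch. It is not APSU's algorithm, which the article does not describe in detail; it is a simple baseline that assumes a student who scores above the class average in past courses will sit about the same distance above average in future ones. The course codes, the sample grades, and the function names (`predict_grade`, `recommend`) are all hypothetical.

```python
from statistics import mean

# Hypothetical historical grades on a 4.0 scale: {student_id: {course: grade}}
HISTORY = {
    "s1": {"MATH101": 4.0, "STAT200": 3.0, "HIST110": 3.5},
    "s2": {"MATH101": 2.0, "STAT200": 2.0, "HIST110": 3.0},
    "s3": {"MATH101": 3.0, "STAT200": 2.5, "HIST110": 2.5},
}


def course_mean(history, course):
    """Average grade earned in a course, or None if nobody has taken it."""
    grades = [record[course] for record in history.values() if course in record]
    return mean(grades) if grades else None


def predict_grade(history, transcript, course):
    """Baseline prediction: course average plus the student's typical offset."""
    base = course_mean(history, course)
    if base is None:
        return None
    # How far above or below the class average the student has been so far.
    offsets = []
    for taken, grade in transcript.items():
        taken_mean = course_mean(history, taken)
        if taken_mean is not None:
            offsets.append(grade - taken_mean)
    shift = mean(offsets) if offsets else 0.0
    # Clamp to the 0.0-4.0 grade scale.
    return max(0.0, min(4.0, base + shift))


def recommend(history, transcript, required):
    """Rank required courses not yet taken by predicted grade, best first."""
    scored = []
    for course in required:
        if course in transcript:
            continue
        predicted = predict_grade(history, transcript, course)
        if predicted is not None:
            scored.append((course, predicted))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# A student who earned a 3.5 in MATH101 and still needs two other courses.
ranking = recommend(HISTORY, {"MATH101": 3.5}, ["MATH101", "STAT200", "HIST110"])
print(ranking)
```

A production system would presumably weigh the correlations among specific courses that the article mentions, rather than a single offset per student; this sketch only shows the overall shape of "predict, then rank."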

System designer Tristan Denley, who is both a math professor and provost at APSU, claimed in a New York Times article that Degree Compass predictions have proved accurate to within about a half letter grade. Three other Tennessee schools are currently using Degree Compass in pilot programs: Nashville State Community College, Volunteer State Community College, and the University of Memphis. APSU is reportedly considering licensing the system to a third-party vendor.

Knowledge Through Numbers
One of the underlying precepts of Big Data is that the bigger the data set, the more insightful the analysis can be. So far, the tools at Purdue and APSU are self-contained. Charles Thornburgh, CEO and founder of Civitas Learning, believes that such systems would be more powerful if they evolved in a collaborative ecosystem. To that end, his Austin, TX-based educational-analytics company has begun building what he describes as "a community of institutions using data to inform smarter decisions throughout the student lifecycle."

"We've seen some remarkable solutions like Tristan's [Degree Compass] that have been built at individual campuses," Thornburgh says. "But there have really been only two paths for them going forward: Be satisfied with the fact that your particular tool won't ever see the light of day outside your campus; or sell the products you've created--the vision, the idea, the name--to a third-party vendor, as Purdue did with Signals. We're trying to create a third option."

Announced in May, the Civitas Learning Community is a nascent network of four-year institutions, community colleges, and online universities that will link their homegrown Big Data solutions to a normalized data model built across the institutions. The fact that the data is aggregated, says Thornburgh, will make it a richer and more useful resource for everyone in the community. Data will be drawn from SISs, LMSs, customer relationship management systems (CRMs), and "any other systems…that contain student learning data that would be relevant to predicting students' likelihood of success."

"Essentially, we're federating data and insights in a common analytics infrastructure across different institutional types," explains Thornburgh. "But we're also making it possible to scale what you might call a single-campus solution, which is a totally different set of challenges."

The goal of the Civitas Learning Community is to create an ecosystem of front-end apps to translate the insights from predictive analysis into specific recommendations that can directly impact student outcomes. According to Thornburgh, these are student-, adviser-, faculty-, and administrator-facing apps. Some will be built by Civitas, others by partner institutions, and still others by publishers and ed-tech providers.

"Education is inherently more collaborative," adds Thornburgh. "Penn State does not need Ohio State to fail for it to feel successful academically. Education is not a zero-sum game from that perspective. We all know that many of these schools share a similar set of challenges."

The idea of federating data from different types of institutions lies at the heart of another Big Data project, the Predictive Analytics Reporting (PAR) Framework, launched last year by the WICHE Cooperative for Educational Technologies (WCET). According to the group's website, WCET's mission is to "accelerate the adoption of effective practices and policies in technology-enhanced teaching and learning in higher education."

PAR started as a proof of concept with a data set from six participating institutions comprising more than 640,000 anonymized student records and more than 3 million course-level records, focusing on 33 common variables. Among the participating schools are public, private, two-year, four-year, and proprietary institutions.

While stressing the collaborative and complementary nature of the work performed by PAR and the Civitas community, Thornburgh did identify some differences in approach. "The data set from our first two schools was about 10 times the size of the data set of the six schools they aggregated, because we're pulling in dramatically more granular LMS, CRM, and historical data. In some cases we're pulling in 30 years of historical data on different student profiles and demographics and case histories."

So far, the Civitas Learning Community is working with six institutions, including Austin Community College (TX) and University of Maryland University College (UMUC), as part of a closed beta, and expects to expand the program in 2013.

What attracted UMUC to the Civitas community was not the size of the data sets but the potential value of combining disparate sets. "You have two problems to solve here," explains Darren Catalano, assistant vice president of business intelligence. "Harvesting the Big Data, and then combining the data sources to form a cohesive picture--which is data modeling. The value from a Big Data perspective is in combining the disparate data sources, modeling the resulting data set, and then using an analytical technique on top of that data to develop your insights. Then you have to operationalize them. It takes those steps to be successful in analytics in higher education or any other industry."
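The "combining" step Catalano describes can be illustrated with a short Python sketch. This is not UMUC's or Civitas's pipeline; it simply assumes two hypothetical feeds, a list of SIS enrollment rows and a list of LMS session events, that share a `student_id` key, and folds them into one record per student. All field names and sample values are invented for illustration.

```python
def combine_sources(sis_records, lms_events):
    """Join SIS rows and LMS activity into one record per student."""
    combined = {}
    for row in sis_records:
        combined[row["student_id"]] = {
            "program": row["program"],
            "credits": row["credits"],
            "sessions": 0,
            "minutes_online": 0,
        }
    for event in lms_events:
        record = combined.get(event["student_id"])
        if record is None:
            continue  # LMS activity with no matching SIS row is skipped
        record["sessions"] += 1
        record["minutes_online"] += event["minutes"]
    return combined


# Hypothetical sample data from the two systems.
sis = [
    {"student_id": "a1", "program": "Nursing", "credits": 45},
    {"student_id": "b2", "program": "History", "credits": 12},
]
lms = [
    {"student_id": "a1", "minutes": 30},
    {"student_id": "a1", "minutes": 50},
    {"student_id": "zz", "minutes": 10},  # no SIS match
]

profiles = combine_sources(sis, lms)
print(profiles["a1"])
```

The resulting per-student records are the kind of combined data set on which the modeling and analysis steps in Catalano's description would then operate.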

UMUC is a non-traditional university focused on adult education, primarily online. From a Big Data perspective, this gives it an advantage. "Because we're online, we can capture a lot more data," notes Catalano. "It's like the difference between an online retailer and a brick-and-mortar store. The online retailer knows a lot more about its customers: They know what they looked at, how long they looked at it, and they can follow their clickstream."

UMUC's ultimate goal, of course, is to improve student outcomes, and it is actively exploring solutions to reach that goal. The school is currently working with a third-party vendor to develop predictive models, using the data it collects to create a 360-degree view of the student.

"It's the advanced analytics on top of the data that lead us to the variables that have the most predictive power, and create a system that makes our insights actionable," says Catalano. "The result is a more personalized experience and a better chance for student success, which means higher retention rates, which means higher graduation rates. And that's our mission."

Outsourcing Non-Core Tasks
While most schools want the predictive benefits of Big Data, some recognize that developing these tools is not central to their mission. Instead, they're looking to others to provide the solutions. Arizona State University, which considers itself to be at the forefront of student data analytics, is one such school.

"When we're looking at outsourcing at ASU, we consider core versus context," explains John Rome, deputy CIO and BI strategist in the University Technology Office. The school has already outsourced its e-mail, as well as the hosting of its ERP and LMS systems. "As we consider Big Data, the question is why would we want to be in the data center business?" asks Rome. "Why should we have a great set of Hadoop clusters when we can look to a third-party vendor to do that while we focus on what is core to our mission?"

ASU is not coming to the issue cold. The school was among the first universities in the country to build a data warehouse, and maintains Big Data and Hadoop working groups. The school combs through data from its LMS, as well as from web logs, swipe cards, and social media, to find students with unmet requirements. The sophisticated data dashboards provided to ASU faculty are well known in higher education (see "Dashboards Deliver Data Visually at ASU").

"We know there are big opportunities in Big Data, and we fully intend to exploit them," concludes Rome. "But we know there are some challenges looming on the horizon. Vendors are knocking on our door daily, and we're going to see which ones really want to partner with us to come up with solutions."
