Predictive Analytics | Viewpoint

The Predictive Analytics Reporting Framework Moves Forward

A Q & A with WCET Executive Director Ellen Wagner on the PAR Framework

In the spring of 2011, WCET announced that it had received funding from the Bill & Melinda Gates Foundation for the Predictive Analytics Reporting Framework, to apply predictive analytics in higher education and examine the feasibility of creating a federated database to look for patterns of student loss (e.g., dropping out) and momentum (e.g., achieving academic success). In a few short months, six participating institutions (the American Public University System, the Community College System of Colorado, Rio Salado College, the University of Hawaii System, the University of Illinois-Springfield, and the University of Phoenix) have contributed “anonymized” student- and course-level records to a very large dataset. Project researchers have already begun to examine variables affecting student success, using the common dataset; rigorous and expanded analysis is planned. Here, WCET Executive Director Ellen Wagner comments on this milestone for the PAR Framework and on plans to expand the database to increase opportunities for sharing predictive analytics more widely.

Mary Grush: How did WCET get started on its work with the Predictive Analytics Reporting Framework?

Ellen Wagner: A number of WCET member institutions had, for several years, been using learning analytics and pattern recognition approaches to explore issues like student loss or momentum. To us at WCET, an obvious and compelling question was: What would happen if we federated data from many institutions into a single--and very large--dataset, so we could ultimately perform analytics that would be more reminiscent of business intelligence than of academic research or program evaluation? We obtained funding last year to demonstrate whether or not it was even possible to do this type of data federation. I’m pleased to say that it is, and we did.

The techniques we wanted to employ would probably be more familiar to someone working in eCommerce or marketing than in teaching and learning or instructional design. What really got us going was how we could see that just about every other sector in the world is able to use the digital information collected about our lives online to achieve their desired outcomes. We thought the time was right to look at this more deeply and to bring that sensibility to higher education.

Grush: How did you build your design for the kinds of data to collect?

Wagner: Prior to launching the PAR Framework project, WCET had supported the Transparency by Design project for the Presidents’ Forum, and discussions with many of our WCET members had helped us identify and define desirable variables for a project like this. Before we started the PAR research project, we had already been meeting with the six partner institutions to learn what variables they tracked, and within the project framework we identified about 30 common variables across our six participating institutions. We went into the project with a general idea of the kinds of variables that would be common. Then, a big part of the work of the PAR team was to make sure the six participating institutions’ variables were commonly defined, comparable in meaning, and consistently used across the project. One of the key things we have found about this project is how differently each institution defines and tracks its student data--there is variation between the institutions on everything from how transfer credit is awarded to whether or not a course is completed, making the work around data definition as challenging as it is project-critical.
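
Editor's illustration: the data-definition problem Wagner describes can be pictured with a minimal Python sketch. It assumes each institution records course outcomes under its own local codes and that the federated dataset needs one shared "completed" variable. The institution labels, status codes, and mapping rules below are hypothetical and are not taken from the PAR project.

```python
from dataclasses import dataclass

# Hypothetical local codes; the real PAR variable definitions are not
# described in the interview, so everything here is an assumption.
COMPLETION_RULES = {
    # Institution A (assumed): any graded attempt counts as completed.
    "inst_a": {"PASS": True, "FAIL": True, "W": False},
    # Institution B (assumed): withdrawals and incompletes do not count.
    "inst_b": {"A": True, "B": True, "C": True, "D": True, "F": True,
               "WD": False, "I": False},
}


@dataclass(frozen=True)
class CourseRecord:
    institution: str  # de-identified institution label (hypothetical)
    raw_status: str   # outcome code exactly as the institution stores it


def harmonize_completion(record: CourseRecord) -> bool:
    """Map an institution-specific outcome code to the shared variable."""
    rules = COMPLETION_RULES.get(record.institution)
    if rules is None:
        raise ValueError(f"no mapping defined for {record.institution!r}")
    code = record.raw_status.strip().upper()
    if code not in rules:
        # Fail loudly rather than guess: an unmapped code means the
        # common definition has not been agreed for this case yet.
        raise ValueError(f"unmapped status {code!r} for {record.institution!r}")
    return rules[code]


records = [
    CourseRecord("inst_a", "pass"),
    CourseRecord("inst_a", "W"),
    CourseRecord("inst_b", "WD"),
    CourseRecord("inst_b", "F"),
]
completed_flags = [harmonize_completion(r) for r in records]
```

Raising an error on an unmapped code, instead of defaulting to "not completed," reflects the point in the interview that agreeing on definitions is itself project-critical work.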

Grush: Where do you stand now, in your work with PAR? Will you now begin running analyses of the data?

Wagner: At this point in time we have pretty much concluded our work on the ‘proof of concept’ part of the Predictive Analytics Reporting Framework--building the federated dataset--and we’re expanding beyond the early analysis associated with the proof of concept. We plan to continue the investigation into what kind of patterns we can find in the aggregated dataset, which will include performing even more analyses.

In six months, we created a federated database of de-identified student records from six unique institutions. We have 640,000 student records and more than three million course records that have been placed into a single dataset, and we are now applying descriptive, inferential, and exploratory research techniques looking for patterns and relationships between and among variables.

Grush: As your team begins reporting on analyses during this next phase, what kind of predictive analytics might you see with this data? How might the dataset be used?

Wagner: In the next few months, as we more deeply explore what the data is telling us, we’ll uncover student and institutional patterns and potential ways the dataset can be used. We’ve already identified arenas for deeper exploration. Ultimately we want to explore ways to make a resource like this available for institutions--and if I really want to pipe dream, even for individuals--to take a look at how they stack up against other institutions--or other students--like themselves, in a number of areas. And we’ll explore how it might be possible for a query to generate recommendations from across the entire ecosystem of education opportunities--recommendations that would be targeted specifically “for me.”

That just gives you an idea of some of the things we will be researching. In a sense, we are stretching the boundaries of expectations for how these predictive analytics can be used, at scale, to support decision making in higher education. Our researchers are just beginning what will be a deep exploration of this dataset, so watch what happens over the next few months.

Grush: You mentioned individuals--does this have the potential down the road of being “Amazon-like”?

Wagner: The simple snapshot of comparing what we’re doing with systems like Amazon or Netflix is fairly compelling, but it’s really important for us to keep in mind that, while there are a lot of transactions being tracked here, it would be naive to suggest that a college education is nothing more than a series of transactions. Still, our project’s principal investigator, Dr. Phil Ice, has pointed out that mathematically speaking, a point-of-sale calculation for a marketing platform really is not all that different from a learning outcome calculation--so the usefulness really depends on how you ask your questions and how you set up your algorithms.

Grush: Are you hearing any arguments against adopting these kinds of analytics?

Wagner: For some in education what we’re doing might seem a bit heretical at first--we’ve all been warned in research methods classes that data snooping is bad! But the newer technologies and the sophisticated analyses that those technologies have enabled have helped us to move away from looking askance at pattern recognition. That may take a while in some research circles, but in decision-making circles it’s clear that pattern recognition techniques are making a real difference in terms of the ways numerous enterprises in many industries are approaching their work. And with all the pressures on education, it seemed to us it was actually well past time to start finding alternative ways of asking questions and making decisions.

Grush: A lot of the information available about the PAR Framework points to student success as one of the major issues that analyses from this dataset could inform. You’ve talked about that here. But I’m sure there are numerous other areas that researchers will be interested in, given the scope of the data. Do you expect a lot of other research areas to be proposed, as work on this data opens up?

Wagner: Absolutely. Our hope is to do a great job creating the type of data resource that we can use to demonstrate, using our own questions, ways to help maximize probabilities of student success--with a keen eye on the national college completion agenda. We expect that other researchers, hearing about this, will come up with a virtually unlimited number of further research questions.

Grush: A big part of the PAR Framework is the idea of sharing data widely. What happens next in terms of the number of institutions participating in PAR?

Wagner: Again, as large as our dataset is, right now this is an exploratory project involving just six institutions and focused exclusively on the online coursework of their students. The dataset has only been in existence since October. In American postsecondary education, there are more than four thousand institutions. Extending predictions based on results from just six institutions to the universe of higher education could be misleading. So what happens next? We’ll be continuing our efforts to build upon what we learned in this proof of concept project, expanding the database, increasing the number of participating organizations, and continuing to evolve and refine the definitions for variables that matter.

But it’s more than just about the numbers or even about representation. It’s the coming together as a community to share data and to learn from one another--validating what we’re doing across institutions. Every institution wants to be as successful as possible individually, but raising the bar for education across the board really is the thing that gets all educators up in the morning. It’s a very exciting time, and I think analytics, once we get our heads wrapped around them, are going to bring some profound changes--a whole different way of using technology than we anticipated. It’s about time.
