Predictive Analytics | Viewpoint
The Predictive Analytics Reporting Framework Moves Forward
A Q & A with WCET Executive Director Ellen Wagner on the PAR Framework
In the spring of 2011, WCET announced that it had received
funding from the Bill & Melinda Gates Foundation for the Predictive Analytics Reporting Framework, to
apply predictive analytics in higher education and examine the feasibility of
creating a federated database to look for patterns of student loss (e.g.,
dropping out) and momentum (e.g., achieving academic success). In a few short
months, six participating institutions (the American Public University System,
the Community College System of Colorado, Rio Salado College, the University of
Hawaii System, the University of Illinois-Springfield, and the University of
Phoenix) have contributed “anonymized” student- and course-level records to a
very large dataset. Project researchers have already begun to examine variables
affecting student success, using the common dataset; rigorous and expanded
analysis is planned. Here, WCET Executive Director Ellen Wagner comments on
this milestone for the PAR Framework and on plans to expand the database to
increase opportunities for sharing predictive analytics more widely.
Mary Grush: How did
WCET get started on its work with the Predictive Analytics Reporting Framework?
Ellen Wagner: A number of WCET member institutions had, for several years, been using learning analytics and pattern recognition approaches to explore issues like student
loss or momentum. To us at WCET, an obvious and compelling question was: What
would happen if we federated data from many institutions into a single--and
very large--dataset, so we could ultimately perform analytics that would be
more reminiscent of business intelligence than of academic research or program
evaluation? We obtained funding last year to demonstrate whether or not it was
even possible to do this type of data federation. I’m pleased to say that it
is, and we did.
The techniques we wanted to employ would probably be more
familiar to someone working in eCommerce or marketing than in teaching and
learning or instructional design. What really got us going was how we could see
that just about every other sector in the world is able to use the digital
information collected about our lives online to achieve their desired outcomes.
We thought the time was right to look at this more deeply and to bring that
sensibility to higher education.
Grush: How did you build your design for the kinds of data to collect?
Wagner: Prior to launching the PAR Framework project, WCET had supported the Transparency by Design project for the Presidents’ Forum, and discussions with many of our WCET
members had helped us identify and define desirable variables for a project
like this. Before we started the PAR research project we had already been
meeting with the six partner institutions to learn what variables they tracked,
and within the project framework we identified about 30 common variables across
our six participating institutions. We went into the project with a general
idea of the kinds of variables that would be common. Then, a big part of the
work of the PAR team was to make sure the six participating institutions’
variables were commonly defined, comparable in meaning, and consistently used
across the project. One of the key things we have found about this project is
how differently each institution defines and tracks its student data--there
is variation between the institutions on everything from how transfer credit is
awarded to whether or not a course is completed, making the work around data
definition as challenging as it is project-critical.
Grush: Where do you stand now, in your work with PAR? Will you now begin running analyses of the data?
Wagner: At this point in time we have pretty much concluded our work on the ‘proof of concept’ part of the Predictive Analytics Reporting Framework--building the federated
dataset--and we’re expanding beyond the early analysis associated with the
proof of concept. We plan to continue the investigation into what kind of
patterns we can find in the aggregated dataset, which will include performing
even more analyses.
In six months, we created a federated database of de-identified student records from six unique institutions. We have 640,000 student records and more than three million course records that have been placed into a single dataset, and we are now applying descriptive, inferential,
and exploratory research techniques looking for patterns and relationships
between and among variables.
Grush: As your team begins reporting on analyses during this next phase, what kind of predictive analytics might you see with this data? How might the dataset be used?
Wagner: In the next few months, as we more deeply explore what the data is telling us, we’ll uncover student and institutional patterns and potential ways the dataset can
be used. We’ve already identified arenas for deeper exploration. Ultimately we
want to explore ways to make a resource like this available for
institutions--and if I really want to pipe dream, even for individuals--to take
a look at how they stack up against other institutions--or other students--like
themselves, in a number of areas. And we’ll explore how it might be possible
for a query to generate recommendations from across the entire ecosystem of
education opportunities--recommendations that would be targeted specifically
That just gives you an idea of some of the things we will be
researching. In a sense, we are stretching the boundaries of expectations for
how these predictive analytics can be used, at scale, to support decision
making in higher education. Our researchers are just beginning what will be a
deep exploration of this dataset, so watch what happens over the next few
Grush: You mentioned individuals—does this have the potential down the road of being “Amazon-like”?
Wagner: The simple snapshot of comparing what we’re doing with systems like Amazon or Netflix is fairly compelling, but it’s really important for us to keep in mind that, while
there are a lot of transactions being tracked here, it would be naive to
suggest that a college education is nothing more than a series of transactions.
Still, our project’s principal investigator, Dr. Phil Ice, has pointed out that
mathematically speaking, a point of sale calculation for a marketing platform
really is not all that different from a learning outcome calculation--so the
usefulness really depends on how you ask your questions and how you set up your algorithms.
Grush: Are you hearing any arguments against adopting these kinds of analytics?
Wagner: For some in education what we’re doing might seem a bit heretical at first--we’ve all been warned in research methods classes that data snooping is bad! But the newer
technologies and the sophisticated analyses that those technologies have
enabled have helped us to move away from looking askance at pattern
recognition. That may take a while in some research circles, but in
decision-making circles it’s clear that pattern recognition techniques are
making a real difference in terms of the ways numerous enterprises in many
industries are approaching their work. And with all the pressures on education,
it seemed to us it was actually well past time to start finding alternative
ways of asking questions and making decisions.
Grush: A lot of the information available about the PAR Framework points to student success as one of the major issues that analyses from this dataset could inform. You’ve talked
about that here. But I’m sure there are numerous other areas that researchers
will be interested in, given the scope of the data. Do you expect a lot of
other research areas to be proposed, as work on this data opens up?
Wagner: Absolutely. Our hope is to do a great job creating the type of data resource that we can use to demonstrate, using our own questions, ways to help maximize
probabilities of student success--with a keen eye on the national college
completion agenda. We expect that other researchers, hearing about this, will
come up with a virtually unlimited number of further research questions.
Grush: A big part of the PAR Framework is the idea of sharing data widely. What happens next in terms of the number of institutions participating in PAR?
Wagner: Again, as large as our dataset is, right now this is an exploratory project involving just six institutions and focused exclusively on the online coursework of their
students. The dataset has only been in existence since October. In American
postsecondary education, there are more than four thousand institutions. Extending
predictions based on results from just six institutions to the universe of higher
education could be misleading. So what happens next? We’ll be continuing our
efforts to build upon what we learned in this proof of concept project,
expanding the database, increasing the number of participating organizations,
and continuing to evolve and refine the definitions for variables that matter.
But it’s more than just about the numbers or even about representation. It’s the coming together as a community to share data and to learn from one another--validating what we’re doing across institutions. Every institution wants to be as successful as possible individually, but raising the bar for education across the board really is the thing that gets all educators
up in the morning. It’s a very exciting time, and I think analytics, once we
get our heads wrapped around them, are going to bring some profound changes--a
whole different way of using technology than we anticipated. It’s about time.