Data Mining for Academic Success
Purdue’s academic analytics correlate data from the course management and student information
systems to create predictive models that can support student retention strategies.
IN A PROJECT begun in 2005, researchers at
Purdue University (IN) are developing models to predict
academic success: academic analytics that will
eventually be used to create interventions for at-risk
students. Their first step was to identify data that could
be mined from the course management system (CMS)
and from the student information system (SIS), and to
demonstrate which factors are most significant.
Researchers studied an initial sample of about 1,500
students during the Fall ’05 semester, and quickly
expanded their work to reflect the entire range of
WebCT-supported classes at Purdue in Spring ’06.
Analyses now include data on some
130,000 seats in the CMS (individual students may be
counted more than once if they take more than one
course), representing more than 30,000 students.
Exploring the Factors
Project lead John Campbell, Purdue’s associate VP for
Teaching and Learning Technologies, explains how the
study looks at the factors influencing academic success:
“Academic success is really based on two different
components: aptitude and effort. You can be the
smartest person in the world, but if you don’t put in any
effort, you’re not going to be successful. And people
with less aptitude, who put a lot of effort into it, can be
very successful.” So the researchers are rigorously
examining indicators of aptitude and effort, by mining
historical data such as SAT scores and GPA from the SIS
(reflecting aptitude), and data on student use of the CMS
from the Oracle back-end database connected to their
WebCT system (reflecting effort).
The example in the graph above is a representative sample
of 600 students across a range of classes and departments at
Purdue. The chart shows the number of WebCT logins (grouped
into quartiles relative to the given class, with the fourth
quartile the highest), the SAT scores (grouped into quartiles
relative to student SAT records for the given class, with the
fourth quartile the highest), and the earned grade for the
course (where A=4.0). This analysis demonstrates that the number
of WebCT logins tends to impact the final grade, and more
dramatically in the case of students with a history of lower
SAT scores and fewer WebCT logins.
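The quartile grouping described above can be sketched in a few lines of Python. This is an illustration only, not the Purdue team's code: the record fields (class_id, logins, sat, grade), both function names, and the sample values are assumptions made for the example. It ranks each student's logins and SAT score within his or her own class, assigns a quartile from 1 to 4, and then averages the earned grade for each pairing of login quartile and SAT quartile.

```python
from collections import defaultdict
from statistics import mean


def assign_quartiles(values):
    """Return a quartile (1-4, 4 = highest) for each value, ranked within the list.

    Ties are broken by position in the list, a simplification for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    quartiles = [0] * len(values)
    for rank, i in enumerate(order):
        quartiles[i] = rank * 4 // len(values) + 1
    return quartiles


def mean_grade_by_quartiles(records):
    """Average grade for each (login quartile, SAT quartile) pair.

    Quartiles are computed separately inside each class, so a student is
    compared only with classmates.
    """
    by_class = defaultdict(list)
    for record in records:
        by_class[record["class_id"]].append(record)

    grades = defaultdict(list)
    for rows in by_class.values():
        login_q = assign_quartiles([r["logins"] for r in rows])
        sat_q = assign_quartiles([r["sat"] for r in rows])
        for row, lq, sq in zip(rows, login_q, sat_q):
            grades[(lq, sq)].append(row["grade"])

    return {pair: mean(vals) for pair, vals in grades.items()}


# Invented sample data for one hypothetical class (not from the study).
sample = [
    {"class_id": "C1", "logins": 5, "sat": 900, "grade": 1.0},
    {"class_id": "C1", "logins": 20, "sat": 1000, "grade": 2.0},
    {"class_id": "C1", "logins": 40, "sat": 1100, "grade": 3.0},
    {"class_id": "C1", "logins": 80, "sat": 1200, "grade": 4.0},
]
summary = mean_grade_by_quartiles(sample)
```

Ranking within each class, rather than across the whole university, keeps a login-heavy course from pushing all of its students into the top quartile.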
Predicting Is in the Future
Ultimately, the end goals are to develop intelligent agents
that will automatically take actions (such as alerting the
instructor that a student is likely in trouble, or notifying the
student about help sessions that are available), and to provide
trend data to administrators with an interest in retention.
Campbell explains: “We have a lot of retention initiatives;
the biggest challenge is getting the right people to
the right initiative.” He points out that early intervention can
be critical to success—and interventions may be more
timely when triggered by academic analytics.
Editor’s Note: John Campbell and a team from Purdue
will present their work on academic analytics at Campus
Technology 2006 in Boston. For more information, go to
www.campus-technology.com/conf.