Data Mining for Academic Success

Purdue’s academic analytics correlate data from the course management and student information systems to create predictive models that can support student retention strategies.

IN A PROJECT begun in 2005, researchers at Purdue University (IN) are developing models to predict academic success: academic analytics that will eventually be used to create interventions for at-risk students. Their first step was to identify data that could be mined from the course management system (CMS) and from the student information system (SIS), and demonstrate which factors are most significant.

Researchers studied an initial sample of about 1,500 students during the Fall ’05 semester, and quickly expanded their work to reflect the entire range of WebCT-supported classes at Purdue in Spring ’06. Analyses now include data on some 130,000 seats in the CMS (individual students may be counted more than once if they take more than one course), representing more than 30,000 students.

Exploring the Factors

Project lead John Campbell, Purdue’s associate VP for Teaching and Learning Technologies, explains how the study looks at the factors influencing academic success: “Academic success is really based on two different components: aptitude and effort. You can be the smartest person in the world, but if you don’t put in any effort, you’re not going to be successful. And people with less aptitude, who put a lot of effort into it, can be very successful.” So the researchers are rigorously examining indicators of aptitude and effort, by mining historical data such as SAT scores and GPA from the SIS (reflecting aptitude), and data on student use of the CMS from the Oracle back-end database connected to their WebCT system (reflecting effort).

The example in the graph above draws on a representative sample of 600 students across a range of classes and departments at Purdue. The chart shows the number of WebCT logins (where the fourth quartile is the highest, measured relative to the given class), the SAT scores (where the fourth quartile is the highest, measured relative to student SAT records for the given class), and the earned grade for the course (where A=4.0). This analysis demonstrates that the number of WebCT logins tends to impact the final grade, more dramatically in the case of students with a history of lower SAT scores and fewer WebCT logins.
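The quartile comparison described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Purdue team's actual method: the record fields (class_id, logins, sat, grade), the function names, and the sample rows are all invented for the example, and ties are broken arbitrarily by input order.

```python
from collections import defaultdict
from statistics import mean


def quartile_ranks(values):
    """Rank each value into quartile 1..4 within the list (4 = highest).

    Assumption: ties are broken by input order, which a real analysis
    would likely handle more carefully.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, idx in enumerate(order):
        ranks[idx] = position * 4 // len(values) + 1
    return ranks


def mean_grade_by_quartile(records):
    """Average grade for each (SAT quartile, login quartile) cell.

    Quartiles are computed separately inside each class, so a student is
    compared only with classmates, as the chart description suggests.
    """
    by_class = defaultdict(list)
    for record in records:
        by_class[record["class_id"]].append(record)

    cells = defaultdict(list)
    for rows in by_class.values():
        login_q = quartile_ranks([r["logins"] for r in rows])
        sat_q = quartile_ranks([r["sat"] for r in rows])
        for row, lq, sq in zip(rows, login_q, sat_q):
            cells[(sq, lq)].append(row["grade"])
    return {cell: mean(grades) for cell, grades in cells.items()}


# Made-up rows for illustration only; these are not Purdue data.
sample = [
    {"class_id": "A", "logins": 5, "sat": 900, "grade": 1.0},
    {"class_id": "A", "logins": 20, "sat": 1000, "grade": 2.0},
    {"class_id": "A", "logins": 40, "sat": 1100, "grade": 3.0},
    {"class_id": "A", "logins": 60, "sat": 1200, "grade": 4.0},
]
summary = mean_grade_by_quartile(sample)
```

Each key in summary is a (SAT quartile, login quartile) pair, so comparing cells that share a SAT quartile shows how grades vary with login activity among students of similar measured aptitude.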

Predicting Is in the Future

Ultimately, the end goals are to develop intelligent agents that will automatically take actions (such as alerting the instructor that a student is likely in trouble, or notifying the student about help sessions that are available), and to provide trend data to administrators with an interest in retention. Campbell explains: “We have a lot of retention initiatives; the biggest challenge is getting the right people to the right initiative.” He points out that early intervention can be critical to success—and interventions may be more timely when triggered by academic analytics.
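As a rough sketch of what such an agent's decision step might look like, the Python below maps two class-relative indicators to the kinds of actions Campbell mentions. Everything in it is hypothetical: the StudentSignal type, the choose_actions function, the action names, and especially the thresholds are assumptions made for illustration, since the article describes the agents as a future goal rather than an existing system.

```python
from dataclasses import dataclass


@dataclass
class StudentSignal:
    student_id: str
    login_quartile: int  # 1 = fewest CMS logins in the class, 4 = most
    sat_quartile: int    # 1 = lowest SAT scores in the class, 4 = highest


def choose_actions(signal):
    """Return a list of intervention names for one student.

    Hypothetical rule: low effort triggers a note to the student, and low
    effort combined with lower measured aptitude also alerts the instructor.
    """
    actions = []
    if signal.login_quartile == 1:
        actions.append("notify_student_of_help_sessions")
        if signal.sat_quartile <= 2:
            actions.append("alert_instructor")
    return actions
```

A real predictive model would presumably replace these hand-picked thresholds with ones fitted to historical outcomes.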

Editor’s Note: John Campbell and a team from Purdue will present their work on academic analytics at Campus Technology 2006 in Boston. For more information, go to
