MIT Researchers Develop Model To Predict MOOC Dropouts
Researchers from the Massachusetts Institute of Technology have developed a model that
aims to predict when students will drop out of a massive open online course (MOOC).
The model, presented at last week's Conference on Artificial Intelligence in Education, was trained on data from one course and is designed
to apply to a wide range of other courses. "The prediction remains fairly accurate even if the organization of the course changes, so that the
data collected during one offering doesn't exactly match the data collected during the next," according to a news release.
The study was conducted by Kalyan Veeramachaneni, a research scientist at MIT's Computer
Science and Artificial Intelligence Laboratory, and Sebastien Boyer, a graduate student in MIT's Technology and Policy Program.
"There's a known area in machine learning called transfer learning, where you train a machine-learning model in one environment and see what
you have to do to adapt it to a new environment," said Veeramachaneni, in a prepared statement. "Because if you're not able to do that, then
the model isn't worth anything, other than the insight it may give you. It cannot be used for real-time prediction."
Veeramachaneni and Boyer began by compiling a list of variables, such as the amount of time spent per correct homework problem and the amount of time spent on learning resources like video lectures.
"Next, for each of three different offerings of the same course, they normalized the raw values of those variables against the class
averages," according to information released by MIT. "So, for instance, a student who spent two hours a week watching videos where the class
average was three would have a video-watching score of 0.67, while a student who spent four hours a week watching videos would have a score of
1.33."
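A minimal sketch of that normalization step, assuming a simple students-by-variables matrix (the function and variable names are illustrative, not the researchers' own):

    import numpy as np

    def normalize_against_class_average(raw_features: np.ndarray) -> np.ndarray:
        """raw_features: (n_students, n_variables) matrix of raw values."""
        class_averages = raw_features.mean(axis=0)  # per-variable class average
        return raw_features / class_averages        # ratio scores, e.g. 2/3 = 0.67

    hours_watching_video = np.array([[2.0], [4.0], [3.0]])  # three students, one variable
    print(normalize_against_class_average(hours_watching_video))  # approx. 0.67, 1.33, 1.0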
That normalized data was then fed to a machine-learning algorithm that looked for correlations between the variables and dropouts, referred to as "stopouts" in the MOOC world. The correlations uncovered in the first offering were used to predict stopouts in the next two course offerings, and the process was then repeated with the second offering's data.
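As a rough sketch of that train-on-one-offering, score-the-next loop, here is how it might look in Python, with logistic regression standing in for the unspecified learning algorithm and synthetic data standing in for real course logs:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_first = rng.normal(1.0, 0.3, size=(500, 4))   # normalized scores, first offering
    y_first = rng.integers(0, 2, size=500)          # 1 = stopped out (synthetic labels)
    X_second = rng.normal(1.0, 0.3, size=(400, 4))  # next offering's students

    model = LogisticRegression(max_iter=1000).fit(X_first, y_first)
    stopout_risk = model.predict_proba(X_second)[:, 1]  # per-student stopout probability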
To improve the already fairly accurate model, Veeramachaneni and Boyer then employed importance sampling: each student in a subsequent offering of the course was matched with the students in a previous offering whose variables most closely resembled theirs, and the most closely matched students were given greater weight in training.
That improved the accuracy, but not dramatically.
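The news release describes only the matching idea, not the exact weighting scheme, but a hedged sketch of importance sampling along those lines, using a Gaussian kernel over nearest-match distances, might look like this:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_old = rng.normal(1.0, 0.3, size=(500, 4))   # earlier offering, normalized scores
    y_old = rng.integers(0, 2, size=500)          # synthetic stopout labels
    X_new = rng.normal(1.1, 0.3, size=(400, 4))   # new offering, slightly shifted

    def similarity_weights(X_old, X_new, bandwidth=0.5):
        """Weight each earlier student by closeness to their nearest new-cohort match."""
        dists = np.linalg.norm(X_old[:, None, :] - X_new[None, :, :], axis=2).min(axis=1)
        return np.exp(-(dists / bandwidth) ** 2)  # closer match -> larger weight

    model = LogisticRegression(max_iter=1000)
    model.fit(X_old, y_old, sample_weight=similarity_weights(X_old, X_new))

The bandwidth here is an assumed knob controlling how quickly weight falls off with distance.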
Moving forward, the team is working on tuning the weights given to individual variables and looking to add more variables for the algorithm to work with.
"One of the variables that I think is very important is the proportion of time that students spend on the course that falls on the weekend,"
Veeramachaneni said in a news release. "That variable has to be a proxy for how busy they are. And that put together with the other variables
should tell you that the student has a strong motivation to do the work but is getting busy. That's the one that I would prioritize next."
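As a hypothetical illustration of such a variable, the fraction of a student's total course time falling on a Saturday or Sunday could be computed like this (the session format is an assumption):

    from datetime import datetime

    def weekend_fraction(sessions):
        """sessions: list of (datetime, minutes_spent) pairs for one student."""
        total = sum(minutes for _, minutes in sessions)
        weekend = sum(minutes for start, minutes in sessions if start.weekday() >= 5)
        return weekend / total if total else 0.0

    sessions = [(datetime(2014, 6, 7, 10), 60),   # Saturday session
                (datetime(2014, 6, 9, 20), 90)]   # Monday session
    print(weekend_fraction(sessions))             # 0.4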
About the Author
Joshua Bolkan is contributing editor for Campus Technology, THE Journal and STEAM Universe. He can be reached at [email protected].