MIT Researchers Develop Model To Predict MOOC Dropouts

Researchers from the Massachusetts Institute of Technology have developed a model that aims to predict when students will drop out of a massive open online course (MOOC).

The model, presented at last week's Conference on Artificial Intelligence in Education, was trained on data from one course and is designed to apply to a wide range of other courses. "The prediction remains fairly accurate even if the organization of the course changes, so that the data collected during one offering doesn't exactly match the data collected during the next," according to a news release.

The study was conducted by Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory, and Sebastien Boyer, a graduate student in MIT's Technology and Policy Program.

"There's a known area in machine learning called transfer learning, where you train a machine-learning model in one environment and see what you have to do to adapt it to a new environment," said Veeramachaneni, in a prepared statement. "Because if you're not able to do that, then the model isn't worth anything, other than the insight it may give you. It cannot be used for real-time prediction."

Veeramachaneni and Boyer began by compiling a list of variables such as amount of time spent per correct homework item and amount of time spent on learning resources such as video lectures.

"Next, for each of three different offerings of the same course, they normalized the raw values of those variables against the class averages," according to information released by MIT. "So, for instance, a student who spent two hours a week watching videos where the class average was three would have a video-watching score of 0.67, while a student who spent four hours a week watching videos would have a score of 1.33."
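The normalization described above can be sketched in a few lines. This is a minimal illustration of dividing a raw value by the class average; the function and variable names are hypothetical, not from the study:

```python
def normalize(value, class_average):
    """Normalize a raw per-student value against the class average."""
    return value / class_average

# Example from the article: class average of 3 hours/week watching videos
print(round(normalize(2, 3), 2))  # student watching 2 hrs/week -> 0.67
print(round(normalize(4, 3), 2))  # student watching 4 hrs/week -> 1.33
```

Normalizing this way is what lets a model trained on one offering transfer to another: scores are relative to each cohort's own behavior rather than tied to absolute hours.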

That normalized data was then fed to a machine-learning algorithm that looked for correlations between the variables and dropouts, referred to as "stopouts" in the MOOC world. The correlations it uncovered were used to predict stopouts in the next two offerings of the course; the process was then repeated using the second offering's data.
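A simple way to look for such correlations is the Pearson coefficient between a normalized feature and binary stopout labels. The sketch below uses hypothetical data and is only illustrative of the idea, not the researchers' actual algorithm:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between a feature and binary stopout labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical normalized video-watching scores and stopout flags (1 = stopped out)
scores = [0.67, 1.33, 0.5, 1.1, 0.2, 0.9]
stopout = [1, 0, 1, 0, 1, 0]
r = pearson(scores, stopout)
# r is negative here: lower engagement correlates with stopping out
```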

To improve the already fairly accurate model, Veeramachaneni and Boyer then employed importance sampling: each student in a later offering of the course was matched against the students in a previous offering whose variable values most closely resembled theirs, and the more closely matched a previous student was, the greater the weight given to that student's data.
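The article does not specify the weighting scheme, but a common choice for this kind of matching is inverse-distance weighting over the students' feature vectors. This sketch, with hypothetical normalized features, shows the idea:

```python
import math

def similarity_weight(new_student, past_student):
    """Weight a past student by inverse distance to a new student's feature vector."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_student, past_student)))
    return 1.0 / (1.0 + dist)  # closer match -> weight nearer 1

# Hypothetical normalized feature vectors: [video score, homework-time score]
new = [0.9, 1.1]
past_cohort = [[0.95, 1.0], [0.3, 2.0], [1.8, 0.4]]
weights = [similarity_weight(new, p) for p in past_cohort]
# the most similar past student receives the largest weight
```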

That improved the accuracy, but not dramatically.

Moving forward, the team is working on tweaking the weight given to variables and looking to add more for the algorithm to work with.

"One of the variables that I think is very important is the proportion of time that students spend on the course that falls on the weekend," Veeramachaneni said in a news release. "That variable has to be a proxy for how busy they are. And that put together with the other variables should tell you that the student has a strong motivation to do the work but is getting busy. That's the one that I would prioritize next."

About the Author

Joshua Bolkan is contributing editor for Campus Technology, THE Journal and STEAM Universe. He can be reached at [email protected].
