Inside the First-Year Data from MITx and HarvardX

Harvard researcher Andrew Ho on what we can learn from the first set of edX MOOC data.

Last week, MIT and Harvard University released a series of working papers based on data from 17 massive open online courses offered on the edX platform from 2012 to 2013. The goal of the study was "to research how students learn and how technologies can facilitate effective teaching both on-campus and online."

The data sets were considerable: On average, 20 gigabytes of data were analyzed per course. Among the key findings:

  • Course completion rates are not necessarily an indicator of a MOOC's impact on students. Even when large numbers of registrants failed to complete a course, they still accessed substantial amounts of course content.
  • While 50 percent of MOOC registrants dropped off within a week or two of enrolling, attrition rates decreased substantially after that window.
  • The most typical MOOC registrant was male, 26 or older, with a bachelor's degree — yet that demographic represented fewer than one in three students.
  • Average demographics vary considerably from course to course, in terms of gender, college degree attainment, median age and the percentage of registrants from the U.S.

CT asked Andrew Ho, associate professor at the Harvard Graduate School of Education and a lead researcher on the project, for his take on the study findings and MOOCs in general.

CT: Did any of your findings surprise you?

Ho: There is a reason that we at HarvardX and MITx released 16 reports on a single day and not just one: Each of our courses is different, with different content, different registrants, different learning goals and different philosophies about what a "MOOC" is and should be. It's easy to make the mistake of believing that MOOCs are a monolithic entity, all with common structure, common students and common goals. The data, both quantitative and qualitative, surprised me with their variability across students and courses.

Our results show considerable differences on all demographic variables, from gender to age to prior educational attainment. Science, technology, engineering and mathematics courses had larger proportions of male students and had younger age distributions on average. Our courses in the School of Public Health had older students who were particularly well educated and international. And we show that every course has substantial representation from young students, old students, educated students, uneducated students, international students and students from countries with low average socioeconomic status.

CT: What are some of the challenges in interpreting the data?

Ho: The first challenge is remembering that these courses are open and online. To some registrants, they are not courses as much as Web content to surf. We purposefully called students "registrants" in our reports, and we were careful not to assume that a registrant is anything like a conventional student in a college or graduate school. We tried to let the data tell their story rather than impose a conventional analytic lens from higher education.

The second challenge is asynchronicity. Some registrants enroll months before a course launches and others months after it technically closes. The certification rates for some of the courses in our reports are dropping, simply because registrants are still enrolling now, well after the certification window has closed. They become "dropouts" the second they enroll, and then they stick around and learn. There is no conventional analog to these students in higher education, and calling them "dropouts" does not adequately tell their story or the story of MOOCs.

CT: How do you define student success in a MOOC?

Ho: In our reports, we are careful to describe what registrants did, not to evaluate whether they were "successful." Did certified students learn anything, or were they already experts who simply desired certification? If a student is interested in one module of a course, takes it, and never completes the rest, is that student successful? If registrants with advanced degrees in physics breeze through a physics MOOC looking for material to teach their own courses, are they successful? If a bored web surfer registers for a class and watches a single video about the meaning of justice, never to return, is that a failure? Without the cost structure, accountability structure and expectations of conventional college classrooms, answering these questions is difficult.

This will be unsatisfying to people who want to answer the question, "Do MOOCs work?" Our research demonstrates that we have to get specific: "Work for what?" The presumed downstream use of MOOCs is to enhance, replace and disrupt existing models of higher education. The data that we analyze are not from this downstream use case, but from one where the near-term goal was open access and experimentation. To be clear, I think we need to define student success in a MOOC, and we present some metrics that show promise in our reports. Stay tuned for Year 2.

CT: Did you find any indicators that might predict student performance?

Ho: Yes, but we downplayed them in the report, and for good reason. In a related paper with MIT coauthors, we show behaviors that correlate with performance and also discuss why those correlations are misleading. In MOOCs, student performance is confounded with student interest far more than in conventional classrooms, where students' levels of interest are relatively homogeneous.

What if I told you that usage of discussion forums predicts student grades? That might make a good headline: Online discussion increases MOOC performance! But the headline is misleading. Everything predicts MOOC performance, because doing anything in this space separates you from the thousands of people who are doing relatively little; doing anything therefore predicts doing anything else. Nonetheless, our Year 1 findings have helped us motivate and design our Year 2 research, currently underway, in which we have a number of randomized experiments in place that will allow us to estimate causal impacts on performance and persistence. We can't wait to present these results.
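
To make that selection effect concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration and is not drawn from the MITx/HarvardX data: a hypothetical latent "engagement" level drives both forum posting and graded work, and forum posts are given no causal effect on scores by construction. The raw correlation still looks like a headline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical registrants

# Latent engagement: most registrants do very little, a few do a lot.
engagement = rng.exponential(scale=1.0, size=n)

# Forum posts and graded work both scale with engagement alone;
# by construction, forum activity has no direct effect on scores.
forum_posts = rng.poisson(2.0 * engagement)
score = rng.poisson(5.0 * engagement) + rng.normal(scale=2.0, size=n)

# Raw correlation looks like "online discussion improves performance."
print(f"raw corr: {np.corrcoef(forum_posts, score)[0, 1]:.2f}")

def residualize(y, x):
    """Remove the part of y linearly explained by x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Holding engagement fixed, the apparent effect all but disappears.
partial = np.corrcoef(residualize(forum_posts, engagement),
                      residualize(score, engagement))[0, 1]
print(f"corr after adjusting for engagement: {partial:.2f}")
```

Run as written, the raw correlation is strongly positive while the engagement-adjusted correlation sits near zero, which is Ho's point: in a pool dominated by low-activity registrants, doing anything predicts doing anything else.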

CT: What kind of further research is needed?

Ho: We need more reports like this from other institutions and from other MOOC providers. And we need better data. As we mention in our report, we do not have sufficient data about registrant demographics and intentions to tell whether students, instructors and institutions are achieving their goals. HarvardX now has a common survey instrument across its courses, and we will rely on that heavily when we report our Year 2 results. Finally, although there has been excellent informal experimentation in the MOOC space, we need more rigorous, systematic, controlled experiments to understand how we can increase the likelihood of achievement and persistence through open online courses.

CT: Where would you place MOOCs on the hype cycle?

Ho: I think one third has inflated expectations and two thirds are in the trough of disillusionment. If anything can get us responsibly to the plateau of productivity, it's good data and good research. This is just the beginning. We look forward to Year 2.
