The Challenge of Understanding MOOC Data
Four years after the launch of edX, the data generated by massive open online courses still mystifies many institutions. Could inter-university collaboration unlock the secrets to better course delivery?
- By Dian Schaffhauser
Plenty of scholarly research has come out about massive open online courses since edX's official introduction in 2012. What's less covered is how the institutions running the MOOCs have used the data to improve learning in their regular courses. Part of the reason is that the colleges and universities involved in edX don't necessarily have the resources (expertise, tools or understanding) to exploit the torrents of data their courses generate.
That isn't true for every edX partner. For example, the founders of edX, Harvard University and MIT, signed a legal agreement early on to share their findings, data and tools. The two schools hold research meetings bi-weekly, said Dustin Tingley, political science professor and faculty director for Vice Provost for Advances in Learning (VPAL) research at Harvard; they also regularly trade code. "This isn't just a sharing environment of data but also a sharing environment of infrastructure and the tools that help us utilize that data," he added. "That's been a very special relationship."
For similar reasons, four smallish eastern liberal arts colleges working with edX — Colgate, Davidson, Hamilton and Wellesley — formed a collaborative in 2013 to share the cost and expertise of developing their online offerings, encourage cross-teaching among faculty, bulk up on the amount of data available for research and build systems for managing the MOOC data.
Those types of agreements have been the exception rather than the rule among the 106 or so edX partners. While edX delivers data on a weekly and nightly basis to partner institutions, partners are responsible for managing their own data, explained Daniel Seaton, a research scientist in VPAL. They're left on their own to figure out how "to build the infrastructure and reporting around the data." At many institutions, the data ends up going to the center for teaching and learning. "But there's not always somebody on staff who can handle that kind of data," Seaton noted. Therein lies the gap between data analytics and instruction: "The pie in the sky is that you complete the feedback loop. We have all this data. How do we feed it back into instruction?"
A Gathering of Learners
To address the data confusion among edX partners, Seaton and others recently organized a gathering of institutional analysts and data engineers from nine schools in the edX consortium to help them learn more about how to work with data their courses generate. The theme of the event was to share what Harvard and MIT — the two co-hosts — do with their edX data, how they handle data governance and what tools they use. But Seaton was also more pragmatic. "It was not a research conference," he said. "I wanted [attendees] to get from raw data to working dashboards."
Seaton provided a testbed to facilitate the work for the day, and by the end of the event, he said, "all attendees essentially stood up their own version of the pipeline." By the following week, he started hearing from people who had successfully begun filling that pipeline up with their own institutional data and pushing it to the cloud.
Seaton is well placed to understand the needs of edX participants. In a previous educational technology role at Davidson College, he worked on the creation of assessment tools, such as a vector drawing program that would allow a MOOC student to draw on the screen and have the edX platform automatically grade his or her work. Before that, Seaton was at MIT, working with Isaac "Ike" Chuang, a professor and senior associate dean of digital learning, on development of edX, and later in MIT's Office of the Provost as a data analyst in institutional research.
In other words, Seaton was one of the first people to work with edX data. And he still sees people "struggling with things" he struggled with in those early days. Now, however, schools are realizing that they can "collaboratively solve the same problems and build up that institutional knowledge across institutions," he said.
The MOOC Data Tools Harvard and MIT Use
At the edX partner gathering, attendees learned about three tools: Google BigQuery, edx2bigquery and edx-analytics-dashboard.
The first is Google's analytics data warehouse cloud service. Why recommend this over any other cloud option? Price and performance. "It's not expensive at all," insisted Seaton. "It's really dollars per month." As he explained, "If you compare it to other cloud providers, usually you pay for resources by the machine. But with BigQuery, they actually charge you by the terabyte of data that you store in their system, and they charge you per query. There's no machine. It's just your data living somewhere." As an example, he said, the bill for the liberal arts collaborative was often "well under a dollar. When things were really active, it was maybe $3 a month." And then there's the query processing speed: "If you were on a single machine somewhere on campus, a query might take you a day to run," he said. On BigQuery, however, queries "run in like maximum of 30 seconds. It's really unbelievable."
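The pricing model Seaton describes, paying for data stored plus data queried rather than for machines, can be sketched as simple arithmetic. This is only an illustration: the function name and both rates below are hypothetical placeholders, not Google's published prices, and the sketch assumes that query charges scale with the volume of data a query scans.

```python
def estimate_monthly_cost(stored_tb, scanned_tb,
                          storage_rate_per_tb=20.0,
                          query_rate_per_tb=5.0):
    """Estimate a monthly bill under a storage-plus-query pricing model.

    Both rates are hypothetical placeholders in dollars per terabyte;
    substitute the provider's current published prices before relying
    on the result.
    """
    storage_cost = stored_tb * storage_rate_per_tb
    query_cost = scanned_tb * query_rate_per_tb
    return storage_cost + query_cost


# Example: 50 GB stored and 100 GB scanned by queries in a month.
monthly_bill = estimate_monthly_cost(stored_tb=0.05, scanned_tb=0.1)
print(f"${monthly_bill:.2f}")
```

With no machine to keep running, the bill shrinks toward the storage charge alone in months when few queries are run.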
Edx2bigquery is an open source tool for converting and loading data from the edX platform into BigQuery. It contains "all the various scripts that clean and parse the edX data in its raw format," said Seaton. From there, it "massages" the data into a form that's "more aggregable" for uploading into the cloud. Once that's done, the transformed data is pushed to Google's infrastructure.
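To give a sense of what "clean and parse" can involve, here is a minimal sketch. It is not code from edx2bigquery. It assumes the raw data arrives as one JSON object per line, with a `time` string beginning with an ISO date and a `context` object holding a `course_id`; those field names, and the `daily_event_counts` helper itself, are assumptions made for illustration.

```python
import json
from collections import Counter


def daily_event_counts(lines):
    """Count events per (course_id, date) from JSON-lines log records.

    Malformed lines and records missing the assumed fields are skipped,
    which stands in for the "cleaning" step.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        context = event.get("context")
        timestamp = event.get("time")
        if not isinstance(context, dict) or not isinstance(timestamp, str):
            continue
        course_id = context.get("course_id")
        if not course_id:
            continue
        # Assumed timestamp format: ISO 8601, so the first 10 chars are the date.
        counts[(course_id, timestamp[:10])] += 1
    return counts


sample = [
    '{"time": "2016-05-01T10:00:00", "context": {"course_id": "course-A"}}',
    '{"time": "2016-05-01T11:30:00", "context": {"course_id": "course-A"}}',
    'this line is not JSON',
    '{"time": "2016-05-02T09:15:00", "context": {"course_id": "course-B"}}',
]
for (course, day), n in sorted(daily_event_counts(sample).items()):
    print(course, day, n)
```

Reducing millions of raw events to a small table of daily counts is the kind of "more aggregable" form that is cheap to store and quick to query.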
Edx-analytics-dashboard, also open source, delivers views into the edX data once it's loaded into BigQuery. For example, the dashboard shares daily activity, number of enrollees, geolocation and interaction with course content. This serves as an alternative to the edX standby edX Insights, said Seaton, with the advantage that it can handle custom reporting that grabs and compares data from up to 10 MOOC courses simultaneously.
Working Toward the MOOC Payoff
Seaton hopes to repeat the workshop in the spring, but with a twist. Now that some of the edX partners have gotten their hands on the lassos they need to wrangle their data, it's time to show them how to help their researchers get access to the right data sets — and help their faculty members use the dashboards to fine-tune their courses, such as by looking at problems that are too difficult or too easy, monitoring and tweaking how many people are posting to the forums and so on.
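One way a faculty member might look for problems that are too difficult or too easy is to compare each problem's share of correct attempts against cutoffs. The sketch below assumes attempt records are available as (problem ID, correct or not) pairs; the `flag_problems` helper, the cutoff values and the minimum attempt count are all hypothetical and would need tuning for a real course.

```python
from collections import defaultdict


def flag_problems(attempts, too_hard=0.3, too_easy=0.95, min_attempts=20):
    """Flag problems whose share of correct attempts looks extreme.

    attempts: iterable of (problem_id, correct) pairs, correct being a bool.
    The thresholds are illustrative assumptions, not recommended values.
    """
    totals = defaultdict(lambda: [0, 0])  # problem_id -> [attempts, correct]
    for problem_id, correct in attempts:
        totals[problem_id][0] += 1
        totals[problem_id][1] += int(bool(correct))

    flags = {}
    for problem_id, (n_attempts, n_correct) in totals.items():
        if n_attempts < min_attempts:
            continue  # too little data to judge this problem
        rate = n_correct / n_attempts
        if rate <= too_hard:
            flags[problem_id] = "too difficult"
        elif rate >= too_easy:
            flags[problem_id] = "too easy"
    return flags


# Example: three problems with 20 attempts each.
records = (
    [("p1", True)] * 20                          # everyone correct
    + [("p2", True)] * 4 + [("p2", False)] * 16  # 20% correct
    + [("p3", True)] * 12 + [("p3", False)] * 8  # 60% correct
)
print(flag_problems(records))
```

A flag here is a prompt for an instructor to review the problem, not a verdict on it.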
After all, closing that loop between data and practice is the payoff for many edX partners, including its founders. "Part of the investment that Harvard really wants to see from releasing this content isn't just that it helps learners across the world, but that this is material that can help our residential learners," said Tingley. "It's extremely high-production-value content. Making that available in a variety of ways as follow-on material is just one of the things we're thinking a lot about now. Learning doesn't stop when the class ends. And that is something that these types of systems are well equipped to be able to offer."
Working together will become even more essential for edX partners. "We're going to be quickly, in the next couple of years, moving to a world where we all speak the same language — such that we are able to do some more of that collaborative type of work, even with different types of data sets, questions and procedures for extracting and making usable that data," predicted Tingley. "It is already set up and the foundations are in place for it."