The Quest for Data that Really Impacts Student Success
As colleges and universities accumulate more and more information from campus systems, researchers are looking for ways to turn big data into learning insights.
- By Dian Schaffhauser
IT systems on today's campuses are generating great piles of data: grades, student records, financial aid, admissions, alumni information and more. Yet higher ed has been behind the eight ball compared to other sectors in applying big data and analytics to strategic objectives — Educause said as much in its "Top Ten IT Issues for 2013."
There are pockets of pioneering work going on in this area, but, as Marist College's (NY) Josh Baron pointed out, "Most institutions haven't implemented anything, or at least not to scale, and are just sinking their teeth into evaluation."
As co-director of Marist's Center for Teaching Excellence and lead on the Open Academic Analytics Initiative (OAAI), an open source learning analytics project, Baron sees the potential that learning analytics holds for addressing some of the major challenges that exist in higher ed — improving student success, increasing retention and guiding students through their degree programs with greater efficiency. But even as "big data" becomes part of the lexicon for education leaders, identifying the individual data points that really matter for a particular institution can be like looking for needles in the proverbial haystack.
Here, four higher ed leaders who are knee-deep in the data haystack share their insights.
Lecture Capture Usage
At the University of Saskatchewan, Christopher Brooks, a human-computer interaction researcher, was involved in the development of an open source lecture capture system that eventually became Opencast Matterhorn. The Matterhorn team recognized early on that even though faculty often were reluctant to jump into lecture capture, their tepid response "flew in the face" of student enthusiasm. So they set out to examine whether there was learning gain and pedagogical value in the use of the technology. What Brooks (who has since moved on to the University of Michigan as a research fellow at the School of Information) and his fellow researchers uncovered was a nuanced view of how lecture capture can improve student outcomes.
The researchers categorized students into groups based on lecture capture "access patterns" and evaluated how those patterns related to academic performance. As a start, they zeroed in on second-year STEM classes, kicking off with data analysis pulled from an introduction to organic chemistry course. From that data, Brooks and his collaborators identified five patterns of usage, summarized in a recent publication in Computers & Education:
- People who didn't use the system;
- Those who used it every week;
- Those who used it just during the week leading up to the midterm;
- Those who used it in the first half of the course; and
- Those who used it in the second half.
Only one of those groups had higher grades: the people who used lecture capture every week. The grade difference on examinations was "quite higher," said Brooks — roughly 10 percent — even though incoming grade records of the various groups of learners weren't "statistically significantly different."
Brooks and his colleagues validated that finding by checking it against the same course given the following year, as well as a second-year biomolecules course. His conclusion: "The more we can get students to regularly integrate lecture capture into their study habits, the better off they appear to be — at least in second-year STEM classes!"
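The grouping described above can be sketched in code. This is a hypothetical illustration, not the researchers' actual method: the function below assigns a student to one of the five access patterns from a list of weekly view counts. The rules and tie-breaking (e.g. checking the midterm-cram pattern before the half-course patterns) are invented assumptions for the sake of the example.

```python
# Hypothetical sketch: assigning a student to one of the five
# lecture-capture access patterns from weekly view counts.
# The classification rules here are illustrative assumptions.

def classify_access_pattern(weekly_views, midterm_week):
    """weekly_views: one view count per week; midterm_week: 1-indexed."""
    n = len(weekly_views)
    # Weeks (1-indexed) in which the student watched at least one capture
    active = [w for w, v in enumerate(weekly_views, start=1) if v > 0]
    if not active:
        return "non-user"
    if len(active) == n:
        return "every-week"
    if active == [midterm_week - 1]:   # only the week before the midterm
        return "midterm-cram"
    if all(w <= n // 2 for w in active):
        return "first-half"
    if all(w > n // 2 for w in active):
        return "second-half"
    return "mixed"

# Example: a 10-week course with a midterm in week 6
print(classify_access_pattern([1] * 10, 6))                       # every-week
print(classify_access_pattern([0] * 10, 6))                       # non-user
print(classify_access_pattern([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 6)) # midterm-cram
```

Once students are bucketed this way, comparing mean exam grades across the buckets (while controlling for incoming grade records) is what surfaced the every-week group's advantage.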
Still, success isn't guaranteed. The model was applied to a first-year social sciences class, and it "didn't hold up very well at all," admitted Brooks. The research team hasn't been able to dig into the data to figure out whether it's the course that was different, the students, the domain or the instructor. "There are so many different variables," he noted.
3 Learning Analytics Tips Worth Knowing
Marist College (NY) Senior Academic Technology Officer Josh Baron offered this advice to institutions regarding their work with learning analytics:
Collaborate with other institutions on your learning analytics efforts, whether that's the open source Open Academic Analytics Initiative (OAAI); the nonprofit Gates-funded Predictive Analytics Reporting Framework; or the commercial Education Advisory Board's Student Success Collaborative. Baron suggested looking at the various models to understand what the data points are; how well the model might work at your institution; and how easily you can tune or customize the model based on your school's agenda and student population.
Don't jump into an analytics product willy-nilly. A variety of analytical tools are showing up among the offerings from LMS and library system companies. These may be "short term things to experiment with," he said, but ultimately the data they're examining will need to be pulled into one big repository if you really want to tap the power of big data.
Take care with ethics and data privacy considerations. For example, as the initial OAAI pilots were being designed, the researchers made sure that the "human subjects" (the students) were told about the study and that they could discontinue their participation at any time. Participating institutions also generated a unique identifier for each subject. For a model set of principles on the "collection, storage, distribution and analysis of data derived from human engagement with learning resources," check out the "Asilomar Convention for Learning Research in Higher Education" released this summer.
LMS Log Data
A common place researchers look for signs of student engagement is among the log data generated by learning management systems. As traditional thinking goes, the greater the amount of time students spend interacting with the LMS and the higher the number of page views they generate, the greater their engagement and the better their performance. But in a review of prevailing research, a team at Brigham Young University (UT) led by Charles Graham found that for most projects examining that kind of user activity, the tendency was to take the data and look at it in a summary or aggregate way. The BYU researchers wanted to drill down to find out whether log data could be used more granularly "to learn more about how students are learning at a real-time level," said team member Curtis Henrie, a research assistant and graduate instructor.
The test was done in an upper-level education department course with 20 students in a blended-learning, competency-based environment. The bulk of the data being analyzed consisted of time stamps and URLs being visited by the students — bits of information that piled up quickly, even with such a limited number of subjects to monitor. Every time a student logged into the LMS, the researchers examined what URL was being visited along with the amount of time (in seconds) he or she spent on the page, in order to reconstruct as closely as possible "the story of the session," explained Henrie. Along with that, they sorted page visits into three categories: procedural, such as looking at the calendar or syllabus; social, such as participating in a discussion board or e-mailing the instructor; and content, where the student is looking at an assignment page or turning something in — "where the actual learning is going on," Henrie said.
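The session-level analysis described above can be sketched roughly as follows. This is an illustrative assumption, not the BYU team's actual pipeline: the URL patterns are invented, and a real LMS would need its own mapping from URLs to the procedural/social/content categories.

```python
# Illustrative sketch of the BYU-style log analysis: classify each LMS
# page visit as procedural, social, or content, then total the seconds
# a student spent in each category during one login session.
# The URL patterns below are invented assumptions.

URL_CATEGORIES = {
    "procedural": ("/calendar", "/syllabus", "/grades"),
    "social":     ("/discussion", "/mail"),
    "content":    ("/assignment", "/submit", "/pages"),
}

def categorize(url):
    """Map a visited URL to one of the three visit categories."""
    for category, patterns in URL_CATEGORIES.items():
        if any(p in url for p in patterns):
            return category
    return "other"

def time_per_category(session):
    """session: list of (url, seconds_on_page) pairs for one login."""
    totals = {"procedural": 0, "social": 0, "content": 0, "other": 0}
    for url, seconds in session:
        totals[categorize(url)] += seconds
    return totals

session = [("/courses/101/syllabus", 40),
           ("/courses/101/discussion/12", 300),
           ("/courses/101/assignment/3", 900)]
print(time_per_category(session))
# {'procedural': 40, 'social': 300, 'content': 900, 'other': 0}
```

Aggregating these per-session totals over a semester is what lets a pattern like "Suzy's" (unusually little procedural and content time) stand out from her peers'.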
The team discovered that while time spent and page views in the LMS did not necessarily indicate student performance, the type of page visit — procedural, social or content — was meaningful. For example, one student — "Suzy" — spent less time on procedural and content pages compared to her peers; essentially, she spent less time previewing assignments before they were due. While her early grades were high, she appeared to fall behind and eventually withdrew from the course. "Was Suzy falling behind because she didn't know what assignments were due, because she couldn't use the LMS effectively?" asked Henrie. In the future, he added, he'll be "much more careful to show my students how to use the LMS in the way I've set it up."
Henrie emphasized that many of the insights uncovered in the study are intended to become a topic for further exploration when the research team picks the project up again this fall. In particular, the team is investigating how faculty and student advisers can act on LMS usage data: For example, an instructor could receive alerts on how often students are viewing assignments in the LMS before they're due.
The Right Content at the Right Moment
Zachary Pardos is a self-professed "educational data miner." He's also an assistant professor in the Graduate School of Education and School of Information at the University of California, Berkeley. By studying how students learn, looking at the data generated by the technology systems they use, he has uncovered two points that could impact how tutoring systems are optimized for learning.
First, two "affective states" are surprisingly related to positive learning outcomes: frustration and confusion. "You have to care to be frustrated, and the students who don't care aren't doing so well," Pardos explained. Also, learners who are confused tend to spend "more time on task because [they're] trying to get unconfused"; it's possible that the confused state is "somehow neurologically setting the stage for resolution, priming you to attain new knowledge and to make new neural connections." The confusion, he said, will resolve itself to "engaged concentration and understanding" (and learning) if hints and other tutoring are immediately available.
Second, not all learning content is equal, nor is it equal for every student. The "learning rate" or "learning dynamics," as he calls it, varies depending on student background and other personal factors. If a set of learning resources fails to help a particular student, "From a psychometric perspective, it may not mean that the resources are necessarily bad," Pardos noted. "It could just mean they're not teaching the same construct that you are measuring with the assessments."
As part of his research, Pardos is analyzing the relationship between content and assessments in learning systems. The data generated from those systems can be used to remove whatever content isn't really working, he suggested; their logs can also be squeezed to predict student performance on future problems. Building that kind of accuracy into the systems would optimize students' time.
For example, a tutoring system with a built-in assessment function might offer five templates of questions that are given to students in random order. If you assume every set of questions has an equal probability of improving student knowledge, you'll have some level of accuracy in predicting whether the students get the next question right or wrong. But if you change that assumption to say that "each item can have a different influence on improving knowledge," you'll improve your predictions. Instructional teams could then extract the positive benefits of particular items, perhaps generalize them to other items, and eliminate items that provide no benefit at all.
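The difference between the two assumptions can be shown with a toy model. This is not Pardos's published model: it is a simplified knowledge-tracing-style update in which each question template has its own "learn rate" (the probability that attempting it improves the student's knowledge), and all parameter values are invented.

```python
# Toy illustration (not Pardos's actual model): a simplified
# knowledge-tracing update where each item has its own learn rate.
# P_GUESS / P_SLIP and all rates below are invented parameter values.

P_GUESS, P_SLIP = 0.2, 0.1   # lucky guess / careless slip probabilities

def p_correct(p_know):
    """Probability of a correct answer given the mastery probability."""
    return p_know * (1 - P_SLIP) + (1 - p_know) * P_GUESS

def update_knowledge(p_know, learn_rate):
    """After attempting an item, the student may transition to mastery."""
    return p_know + (1 - p_know) * learn_rate

# Equal-influence assumption: every item teaches at the same rate.
# Per-item assumption: item 3 teaches far more than the others.
equal_rates    = [0.15] * 5
per_item_rates = [0.05, 0.05, 0.40, 0.05, 0.05]

for label, rates in [("equal", equal_rates), ("per-item", per_item_rates)]:
    p = 0.1  # prior probability the student already knows the skill
    predictions = []
    for rate in rates:
        predictions.append(round(p_correct(p), 3))  # predict, then attempt
        p = update_knowledge(p, rate)
    print(label, predictions)
```

Under the per-item assumption, the predicted probability of a correct answer jumps sharply after the high-influence item; fitting such rates from log data is what would let a system flag items that "don't provide benefit at all."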
Although the initial research leading to these conclusions came out of data generated from end-of-year assessments in K-12, the findings apply to higher ed too, Pardos said. In fact, he's pursuing the opportunity to study MOOC data at UC Berkeley and match up "student frustration" or confusion with "appropriate remediation" — the tutorial resources inside and outside of MOOCs that can help students get over the hump and improve learning. And this fall, Pardos and his fellow researchers will release code that can be integrated into a learning system to pinpoint which learning interventions proved to be the most effective for making improvements in student outcomes.
Who Moves Learning Analytics Forward?
A new role is emerging within higher education that takes on the work of examining big data on campus. According to Christopher Brooks, a research fellow at the University of Michigan School of Information, the "educational data scientist" is focused on learning analytics and acts as a linchpin to bring together the interests of three groups on campus: central IT; teaching and learning; and institutional educational research.
Without those three groups at the table, data innovation will not happen, insisted Brooks. In places where the IT group tries to lead learning analytics work, there is often "a mismatch between the solution that's purchased and how it's deployed," he pointed out. When teaching and learning groups take over and leave IT out, "there's a lot of [attention paid to] traditional methods and not about leveraging institutional data in a bottom-up manner." If researchers are put in charge, "There tends to be this almost myopia on specific problems, such as understanding lecture capture, and not looking broadly at the technology across the institution," he said. "Then there's a problem with impact."
When those three groups come together to decide what needs to be done, he observed, "it's pretty hard for the provost to say no." And with the provost on board, the money will be found to push data-driven innovation forward.
Interventions Need Personalization
The creation of OAAI, the Marist College-led analytics initiative, was sparked by the number 38. That was the percentage of students who started their degrees at four-year institutions in 2004 and completed them in four years. Looking to improve on that number, OAAI set out to create an open source early alert system that used big data and learning analytics to help predict at-risk students and then deploy interventions.
OAAI's predictive models have been tested out at Marist as well as at two community colleges and two historically black colleges and universities. The researchers chose specific environments where a given instructor was teaching three sections of the same course: one to be used as a control, one to be assigned one type of intervention and the other to be assigned another type of intervention.
The result was good: It worked! Students in the intervention groups received course grades that ran about 4 percent higher on average compared to students in the control sections. (Neither type of intervention turned out to be more beneficial than the other.)
The researchers went on to improve the portability of their predictive models so that the technology could be picked up and used elsewhere (work that earned the initiative a Campus Technology Innovators award last year).
Save the Date
Marist College will host the 2015 Learning Analytics & Knowledge Conference March 16–20, in collaboration with the Society for Learning Analytics Research. The event will include a practitioner track for campus leaders outside of research.
What also came out of the work was a new research area: a study of "intervention immunity," as Baron labels it. At-risk students fall into two fundamental groups, he explained. One group of students would receive an intervention — either a standardized message from the instructor expressing concern, or a similar message that invited them to join a support community — and "they would pretty quickly change their behavior," said Baron. Those students would seek help and they'd disappear from the at-risk group in the next round of analysis.
But the second group of students wouldn't respond to that first intervention, nor were they likely to heed a second or third intervention. "They were almost immune to the intervention we were trying to deploy," Baron observed. The theory he and his colleagues came up with: Students reject interventions that are not adequately personalized. The intervention "has to be matched to the student," said Baron.
For example, if a working single mom receives an intervention message such as, "You need to show up next Tuesday at 2 to meet with a tutor," Baron said, she's "probably going to be put off by that, frustrated, and say, 'These people have no idea what my needs are.'"
Baron noted that learning analytics researchers at the University of Michigan are experimenting with personalization via an open source system designed to help people quit smoking. They are using the system to provide personalized messaging for academic intervention in a physics course, based on extensive pre-surveys. "The idea is, if you really try to customize the message heavily to the individual, they're more likely to respond," he said.