Laying the Groundwork for Big Data
Three higher ed chief data officers discuss the state of analytics at their institutions.
Data analytics may be a top priority for most higher education CIOs, but it's also a realm fraught with challenges and unanswered questions. Summing up his impressions of the 2015 Educause Annual Conference in Indianapolis, Jeff Alderson, principal analyst for Eduventures, noted that CIOs there were "all trying to make sense of the crowded market of analytics vendors and other service providers and their different approaches to aggregating institutional data to glean insights for various stakeholders."
The lag in higher education's adoption of analytics, he added in his Nov. 6 blog post, is due to the "overwhelming demand for data integration between systems and the need for high-quality data and governance that sometimes stall the ROI of any analytics offering."
To help gauge the state of analytics in higher ed, Campus Technology spoke with three university chief data officers about their current priorities, with a heavy emphasis on information governance.
Engaging the Campus on Business Intelligence Issues
Brent Drake, chief data officer at Purdue University (IN), remembers what happened when he held a meeting to discuss how the university should measure instructional load and how that information is represented in the human resources system.
Drake had invited stakeholders to discuss the difficult definitional decisions that must be made in order to do effective reporting and business intelligence. Issues include what constitutes an instructional team and how to deal with split appointments among departments. The meeting had representatives from the academy, the provost's office, the business offices, the HR system, finance and fiscal reporting.
"In that first meeting we had 92 people in the room," Drake said. "It was exhausting! But it was good. They were really engaged."
In the past, the academic department heads were not always in these discussions, and that is why they would sometimes look at business intelligence and say, "that has no relevance to me and how I am trying to operate," Drake said. "We made sure they were in that room. They had an opportunity to raise what they thought were relevant issues — and really shaped the discussion in ways that those of us that are in the central offices never would have even considered if those viewpoints were not in the room."
When Drake was named the first chief data officer at Purdue in 2013, there was already a data governance structure in place, particularly around student data, but it needed to be refreshed with a focus on analytics. "It had stagnated a bit and we weren't pushing forward as much as we would like to," he said. "The old system was focused on policy and security. There was little in terms of analytics and little thought in terms of business intelligence for the betterment of the organization. We have created a new structure I will freely admit that we stole from Notre Dame [IN] in terms of practice for data governance."
Since it first built a data warehouse in 1999, Purdue has had a decentralized business intelligence model to allow users in multiple departments to analyze data for their individual needs. There are directors of data analytics in each of the academic colleges and they report to their deans. They help look at questions in a more nuanced and detailed layer that is specific to that college. Having that functionality embedded in academic and business units allows them to do the analytic work themselves.
"Because we have embraced this disaggregated model, it involves a whole lot more discussion to move forward and try to get them moving in the same direction," Drake said. "But ultimately we believe it is worth it and makes for a stronger overall effort at the university."
In 2015, Drake's office focused on data governance and instructional activity. But looking ahead to 2016, Purdue is launching an analytics platform using an in-memory processing system. "It allows us to combine structured and unstructured data from the enterprise," Drake said. "The student data that comes out of Banner is combined with our disparate systems that never get into Banner, that tell us how students are interacting with campus — such as the card transaction system, LMS and stand-alone student success databases. Because of the value of in-memory processing, we can combine and run them at a fast rate of speed, and do machine learning algorithms to gain insights into what are some of the triggers of student behaviors on campus that will tell us whether students are being successful or not," he explained. "We've launched that platform and are discussing now how to best disseminate the results around campus."
Starting at the Policy Level
At the University of Wisconsin-Madison, Chief Data Officer Jason Fishbain is focusing on governance policies rather than on technology solutions for aggregating data or on data definitions.
Administrators in the university's Academic Planning and Institutional Research office provide a majority of the analysis for campus administration and for academic departments — as they can fit it into their queue. "The issue is that if you get outside that small pocket, others on campus don't have the resources or tools," Fishbain explained. "Yet other people were creating their own analyses and going to campus leadership with it and the data conflicted with something that institutional research created — and then the question was why don't they match."
A few years ago, faculty members who were trying to access data to improve student success in their classrooms were running into a number of barriers: not understanding the data or how to access data, or being told they couldn't have access. A task force created to address this issue realized that the scope was not just in student data but across all data domains. And that led to the creation of Fishbain's position and an information governance program.
"We started with who should have access to data and why," he said. If an individual does have access, what is his or her responsibility as a user? "We have to have the technical and human infrastructure in place to say, If you get access to restricted data, what does that mean? Where can you store it? Who can you share it with or not share it with? Those policies do not exist today here on campus. We have to get that baseline first, so people can get access; then we'll tackle whether they understand the data they are working with."
Fishbain said having common data definitions across different domains and applications currently is not as high a priority. "How we report student enrollment to the feds is different than how the Department of Engineering might want to look at enrollment, and there are legitimate reasons for those to be different," he said. "The important thing is how you talk about it. You must be able to say, 'Here's why I am not using the same numbers.' You can't do that unless you understand the data. And to get to that data, we need to start at the policy level."
Traditionally the Madison campus has taken the approach that you get data for your classroom and your classroom only, and once your students leave your class, you don't get their data anymore, because you don't have a need to know, Fishbain said. "But that depends on what your goal is. Is your goal just teaching that class or the overall education of the student? That is the conversation we are having," he said. "If we are going to get past that, it is a culture shift. It is not a snap of the fingers."
As far as the challenges related to pulling data from multiple systems and normalizing it, Fishbain sees that as a question of time and resources. "The biggest challenge here is getting people to understand the value of it and funding it, not the technical complexity."
Standardizing Data Definitions
Mike Kelly, chief data officer at the University of South Carolina, said all universities have data reporting structures for mandatory reporting and accreditation. "To me, analytics is more about the one-off questions," he said. "Someone has a bright idea or a theory that if only we knew XYZ we could provide better services or help students graduate closer to on time, or predict which courses they were going to struggle with and provide more academic support."
For analytics, you have to think about what data you have, which data stewards are responsible for it, and whether you can merge data from those different systems in a way that is meaningful, Kelly said. "Can we link a student's record in the student success center to his or her record from our ERP and housing?" he asked. "Do we have the ability to match the records consistently?"
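To make the record-matching question concrete, here is a minimal Python sketch of linking one student's records across three systems. It is an illustration only, not UofSC's method: the system names, the `student_id` field, the sample records and the normalization rule (trim whitespace, ignore case) are all assumptions made for the example.

```python
def normalize_id(raw_id):
    """Hypothetical rule: IDs match if they are equal after trimming and upper-casing."""
    return raw_id.strip().upper()


def link_records(systems):
    """Group records from every system under a shared, normalized student ID."""
    linked = {}
    for system_name, records in systems.items():
        for record in records:
            key = normalize_id(record["student_id"])
            linked.setdefault(key, {})[system_name] = record
    return linked


def find_gaps(linked, system_names):
    """List, per student, the systems where no matching record was found."""
    gaps = {}
    for student_id, found in linked.items():
        missing = sorted(set(system_names) - set(found))
        if missing:
            gaps[student_id] = missing
    return gaps


# Made-up sample data standing in for three campus systems.
systems = {
    "erp": [{"student_id": "S001"}, {"student_id": "S002"}],
    "success_center": [{"student_id": " s001 ", "visits": 3}],
    "housing": [{"student_id": "S001", "hall": "East"},
                {"student_id": "S003", "hall": "West"}],
}

linked = link_records(systems)
gaps = find_gaps(linked, list(systems))
```

In this sketch, `gaps` is a rough measure of how consistently the records match: any student listed there has a record in at least one system that could not be paired with the others.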
Another component is the timeliness of data, Kelly said. "We are not looking for a decision 10 months from now. It is too late at that point. We need to know now, so we can apply these services when they can make a difference, not when the student is done for the academic year and it is too late to make a meaningful intervention."
Kelly said UofSC is working to standardize data definitions in support of its analytics efforts. "I think there are six or seven grade point averages that a student can have in our ERP system including semester, major, cumulative and transfer. When you say you want a student's GPA, which one do you want to know?"
When the registrar reports data to the vice president for student affairs, it would be most helpful if the data were sent in a way that is consistent with the way the institutional research office is going to report it to the South Carolina Commission on Higher Education, he said. "Then there are no surprises, so that when they open up a report that seems to describe the same type of thing, and same academic year, the numbers more or less match."
One thing Kelly has learned about the chief data officer job is that, although it was created as a strategic position focused on governance, he can't stay at the strategic level only. "People have only so much appetite for talking about what we ought to be doing or big ideas," he said. "At the end of the day, every unit manager has concrete things they need to accomplish. So the CDO role had to have practical applications in tactical efforts."
Kelly has championed an overhaul of how UofSC does enrollment reporting. The critical, basic questions are how many enrolled students the university has and who they are. "You would think those would be easy questions to answer, but they are not. A student can be a student one day and drop out the next. At what point in time do you count them? Students might take classes at multiple campuses. Which one do they count for?"
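The counting questions Kelly raises can be sketched in a few lines of Python. This is a hedged illustration, not UofSC's reporting logic: it assumes a single agreed-upon snapshot date (called `census_date` here), assumes each student is assigned one `home_campus` for counting purposes, and uses invented records.

```python
from datetime import date


def enrolled_on(record, census_date):
    """A student counts if enrolled on or before the snapshot and not yet withdrawn."""
    withdrew = record["withdrew"]
    return record["enrolled"] <= census_date and (withdrew is None or withdrew > census_date)


def headcount_by_campus(records, census_date):
    """Count each student once, under the (assumed) single home campus."""
    counts = {}
    for record in records:
        if enrolled_on(record, census_date):
            campus = record["home_campus"]
            counts[campus] = counts.get(campus, 0) + 1
    return counts


# Invented records; the dates and campus names are placeholders.
records = [
    {"student_id": "S001", "home_campus": "Main",
     "enrolled": date(2015, 8, 20), "withdrew": None},
    {"student_id": "S002", "home_campus": "Main",
     "enrolled": date(2015, 8, 20), "withdrew": date(2015, 9, 1)},
    {"student_id": "S003", "home_campus": "Regional",
     "enrolled": date(2015, 8, 25), "withdrew": None},
]

early_count = headcount_by_campus(records, date(2015, 8, 30))
late_count = headcount_by_campus(records, date(2015, 9, 10))
```

The same records give different headcounts depending on the snapshot date chosen, which is why the definition has to be agreed on before the numbers are reported.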
UofSC recognizes the need for a data warehouse, he said, and the university is working on that capability. "Our data warehouse at this point is mostly focused on the data from our ERP system and the demands for data transformation required for mandatory reporting," he said. "We are working to transform the data elements in the ERP to conform to definitions we have to report to the Department of Education, our accrediting agency and the state government. We are beginning with the end in mind, rethinking how we administer and structure data and configuring systems where we have the opportunity."
"My role is not technical," Kelly added. "It is to make sure data from different areas can be exchanged in a meaningful way. It is a position of collaboration and communication. We are making group decisions about where we go, driven by the business units of the university because that is where the data is."