Thinking Big: Tools, Resources, and Strategies to Bring Big Data to the Classroom

A Q&A with Mark Frydenberg


The tools we use every day are quietly producing enormous amounts of data. And very common software is allowing easier access to and analysis of that data. Among the tools we now need to consider, in terms of leveraging big data, is social software. Social media can be found all around campus, often incorporated in instruction--in fact in many cases it's an essential classroom tool. And now it's also a tool in the big data arsenal. And we know that collaboration tools and better communications in general are opening up wider access to shared academic resources--and now relevant data--and impacting the scope and nature of conversations in the disciplines.

Here, Bentley University's Senior Lecturer in Computer Information Systems explores how big data is increasingly flourishing all around us, along with new tools to manage and analyze it. He considers what that means for instruction, for the academic disciplines, and for IT in higher education.

Mary Grush: What's the emerging role for data--for strategies with the potential to leverage 'big data' from this new data-rich environment and turn it toward building a better academic experience?

Mark Frydenberg: When Tim O'Reilly and Dale Dougherty first described the Web 2.0 phenomenon back in 2005, one of the characteristics they cited for business models for the next generation of software was "Data as the next Intel Inside." As companies and their applications moved to the Web, data gave them power. Today, data is at the center of a company's value: Tweets, searches, and Likes give Twitter, Google, and Facebook information that determines the advertising you see when you use their services and provides revenue from advertisers that keeps these sites available at no cost to users.

The Library of Congress is archiving every public tweet sent out on Twitter since it started in 2006. That database will be huge--170 billion tweets and counting. Such information is of value to students of history, sociology, culture, and technology, investigating different slices of the human experience. Computer scientists see an interesting challenge in how, in nearly real time, they can store, search, and retrieve information from such a vast collection of unstructured data that keeps on growing.

O’Reilly predicted, "The rise of proprietary databases [would] result in a Free Data movement" within a decade, and today data from governments, libraries, researchers, and online databases are available free online. This changes how students seek primary source materials across the disciplines. Their challenge now is to have the skills to analyze these data. In an IT class I was teaching last year, I showed data sets from data.worldbank.org. The site offers information on topics from science and technology to poverty and climate change. The class was learning to make graphs, charts, and visualizations using Excel. One student chose to analyze a dataset summarizing the prevalence of AIDS over the past decade, because he was studying this topic in his sociology class. Completing this exercise in IT gave him the skills to analyze real data that was of value in other courses. In 2009, the U.S. Government launched data.gov, a Web site for sharing, visualizing, and presenting data with the intent of making government transparent. Today there are hundreds of open government Web sites from cities, states, and countries across the world.

Big Data has opened new business models worthy of study, new problems for solving, and access to information that was not even possible before.

Grush: Can you give me a couple of concrete, real-world examples of what you mean by tools for analyzing and teaching with big data? And might this address the concept of a more hands-on, authentic experience for the learner?

Frydenberg: Google has a tool called BigQuery, which allows users to "interactively analyze massive datasets--up to billions of rows." Google gives a few example datasets to explore free: You can now query all of Wikipedia, Shakespeare, and weather stations from within your browser. Companies with their own large data sets can upload those using pay-as-you-go pricing based on the amount of storage and number of queries their application receives in a day.

For advanced students and developers, Google provides a series of APIs (Application Programming Interfaces) that enable them to create their own applications that make use of this data.

Students are familiar with Wikipedia, but asking them how they might determine the number of articles it contains with 'technology' in the title, for example, poses a problem that, until now, would have been impossible to solve.
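As a minimal sketch of the kind of question described above, the following Python example counts titles that contain a keyword by running a SQL query. It is only a small stand-in: it uses Python's built-in sqlite3 module rather than BigQuery, and the `articles` table, the `SAMPLE_TITLES` list, and the `count_titles_containing` function are hypothetical names invented for illustration. The schema of any real Wikipedia dataset will differ.

```python
import sqlite3

# Hypothetical article titles standing in for a real table of Wikipedia titles.
SAMPLE_TITLES = [
    "Information technology",
    "Technology transfer",
    "History of science",
    "Educational Technology",
    "Big data",
    "Nanotechnology",
]


def count_titles_containing(titles, keyword):
    """Count the titles that contain `keyword`, ignoring case, via a SQL query."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE articles (title TEXT)")
        conn.executemany(
            "INSERT INTO articles (title) VALUES (?)",
            [(title,) for title in titles],
        )
        # instr() returns the 1-based position of the substring, or 0 if absent.
        row = conn.execute(
            "SELECT COUNT(*) FROM articles "
            "WHERE instr(lower(title), lower(?)) > 0",
            (keyword,),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_titles_containing(SAMPLE_TITLES, "technology"))
```

The same question asked of millions of rows has the same shape. What changes is the service that stores the data and executes the query.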

Data visualizations are becoming increasingly popular as big data and its sources continue to grow. Think about all the Web sites that plot information on maps these days: real estate listings, store finders, earthquakes, or photos uploaded to Instagram.

Tools such as Tableau, visual.ly, and explore.data.gov are finding their way into the curricula of many universities, as students learn to create data visualizations and tell the stories they discover along the way. This process engages the learner and provides new ways to relate to big data.

Grush: You sometimes mention the fire hose analogy. What are the problems or pitfalls associated with tapping the huge amounts of academic data that are gushing into some disciplines? Is there anything good about 'drinking from the fire hose'?

Frydenberg: 'Drinking from the fire hose' is a common expression that refers to the rushing stream of data that comes from a variety of sources, from sensors that capture readings and measurements to status updates on social networks, almost in real time. The information arrives so fast that we can barely keep up with it all.

Universities are using analytical tools to target prospective students. Information related to graduation rates, dates of application, and financial aid awarded in the past helps universities determine which students are more likely to elect to attend if accepted. This past information provides new insights into possible future outcomes.

As universities grapple with their futures in an age characterized by the rise of MOOCs (Massive Open Online Courses), the data that online courses provide allows for analysis that might not have been available otherwise. In MOOCs, students interact entirely online, leaving behind a record of every page they visited. The same is true with browsers that record a history of each transaction we perform on the Web. Analyzing this trail of data breadcrumbs in educational software tools or learning management systems often can show teachers what students actually read, exercises on which students got stuck, or pages where they spent unusually large or small amounts of time.
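The idea of spotting pages where students spent unusually large or small amounts of time can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any particular learning management system works: the `PAGE_VIEWS` log, the student and page names, the `unusual_views` function, and the "more than twice or less than half the median" rule are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical page-view log: (student, page, seconds spent on the page).
PAGE_VIEWS = [
    ("ana", "intro", 60),
    ("ben", "intro", 70),
    ("cy", "intro", 65),
    ("ana", "quiz-1", 300),
    ("ben", "quiz-1", 900),
    ("cy", "quiz-1", 320),
    ("ana", "summary", 5),
    ("ben", "summary", 40),
    ("cy", "summary", 45),
]


def unusual_views(views, factor=2.0):
    """Return views that are far longer or shorter than typical for that page.

    A view is flagged when its duration is more than `factor` times the
    page's median duration, or less than the median divided by `factor`.
    """
    # Group the durations by page so each page gets its own baseline.
    durations_by_page = defaultdict(list)
    for _student, page, seconds in views:
        durations_by_page[page].append(seconds)
    typical = {page: median(times) for page, times in durations_by_page.items()}

    flagged = []
    for student, page, seconds in views:
        baseline = typical[page]
        if seconds > baseline * factor or seconds < baseline / factor:
            flagged.append((student, page, seconds))
    return flagged


if __name__ == "__main__":
    for student, page, seconds in unusual_views(PAGE_VIEWS):
        print(f"{student} spent {seconds}s on {page}")
```

With the sample log, the sketch flags the student who lingered on the quiz and the one who skimmed the summary, which is the kind of signal a teacher might follow up on.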

Grush: You teach students in IT. Is there a particular value in introducing them to these problems, especially since big data is growing and all around them?

Frydenberg: One of the lessons learned from big data is that it can lead to big problems. All of a sudden, we need to think about storage space, processing power, Internet connectivity, security, and ways to access or update it. These topics are beginning to make their way into the curricula for information technology students at all levels, for they need to be prepared to consider these questions in order to recommend solutions to their future employers.

Some of the tools for querying and visualizing big data that I described earlier can be introduced to undergraduates in order to raise awareness about the topic. Students who learn about relational databases need to understand where that technology falls short. So often, textbooks provide sample datasets, which are nicely structured, well formed, and conducive to a particular problem. In the real world, that is not always the case.

Grush: Here's the Big IT Question about Big Data: What does all this (incorporating big data into instruction) mean for IT and for the CIO? What are some of the essential planning aspects of preparing the campus for (I remember the words Cliff Lynch and others have used in terms of research data) 'a data deluge' in the academic disciplines?

Frydenberg: From presence on social media sites to in-house application data files, the amount of data that companies and campuses generate is staggering. CIOs will need to have in-house expertise to find sources of and make sense of this data and find ways to leverage the information that it may provide.

Understanding big data will help institutions see what customers, potential students, and others are saying. CIO Magazine last year reported that within the next three to five years there would be a gap between companies that "get" big data, and those that don't.

The data deluge, or flood of information, reminds us why nearly real-time processing is so important. The metric "Time to Answer" has become crucial for companies to improve, in order to be able to keep up with incoming data, or with the requests for analyzing its content.

Grush: It seems these days that "cloud" strategies come up to calm the unsettled questions in really challenging technology areas. What do you mean when you talk about "Data as a Service" in this sense of providing big data tools and services for instruction? Is there a there, there? Can we really look to the cloud to address our concerns about managing "Big" instructional data?

Frydenberg: Data as a Service refers to the delivery of data to applications through a Web browser. Any service with which users interact through a Web browser, from running applications (Software as a Service) to online backups (Infrastructure as a Service), is an aspect of cloud computing. Data is no different.

Learning Management Systems will evolve to have access to all of a student's grades from his or her entire educational history, and display current performance as a "dashboard" of information. Assessment and assurance of learning are important metrics; access to big data and visualization tools will assist in making these concepts more intuitive.

This is already beginning to happen. I recently read an article about a school system that is using data from standardized test scores to point out specific issues from students' learning histories. Looking for patterns in performance over several years, they were able to determine those areas in which a student succeeds or is in need of academic assistance. This provides a more personalized course of study based on a student's own history.

The cloud makes access to all of this data seamless, though because data is stored in the cloud, security becomes a major concern. In a keynote presentation at the recent Social Media Week conference in New York, social media expert and Senior Researcher at Microsoft Research danah boyd described four challenges of living in a world inspired by big data (box, below):

Living with Big Data

Data is persistent. What you say online stays online. What you wrote online as a teenager is still there. While persistence allows for the asynchronous interactions on the Web upon which we have come to rely, at the same time, it makes our online persona much more difficult to manage.

Data is replicable. While we live in a world that relies on copy and paste to spread information on the Internet, that makes it difficult to tell the original from a copy. This creates new kinds of conflicts as people modify what is really happening, so you really don’t know how things evolved.

Data is searchable. People are now searchable by others who hold power over them. This creates the discomfort of not always knowing who is looking at you, when, or why.

Data is scalable. Once you put it out there, there is a potential that millions of people can see it. What information that we share online spreads the fastest?

Source: danah boyd, Senior Researcher, Microsoft Research (paraphrased) http://new.livestream.com/smwnyc/events/1868227

These issues are a reminder of the importance of introducing big data to both the boardroom and the classroom.

[Editor's note: Mark Frydenberg is a Senior Lecturer in Computer Information Systems at Bentley University in Waltham, MA and the Director of its Learning and Technology Sandbox. He is also a contributing author of Discovering Computers 2014 and author of Web 2.0 Concepts and Applications, both published by Cengage Learning. A classroom educator and innovator, Frydenberg's research focuses on tools for collaborative learning. He has spoken at academic and professional conferences, and faculty development events throughout the U.S. and Europe. Frydenberg is also a member of the Campus Technology Conference Advisory Board.]
