Thinking Big: Tools, Resources, and Strategies to Bring Big Data to the Classroom
A Q&A with Mark Frydenberg
The tools we use every day are quietly producing enormous amounts
of data. And very common software is allowing easier access to, and analysis of,
that data. Among the tools we now need to consider, in terms of leveraging big
data, is social software. Social media can be found all around campus, often
incorporated in instruction--in fact in many cases it's an essential classroom
tool. And now it's also a tool in the big data arsenal. And we know that
collaboration tools and better communications in general are opening up wider
access to shared academic resources--and now relevant data--and impacting the
scope and nature of conversations in the disciplines.
Here, Bentley University's Senior Lecturer in Computer Information
Systems explores how big data is increasingly flourishing all around us, along with new tools to manage and analyze it. He considers
what that means for instruction, for the academic disciplines, and for IT in
higher education.
Mary Grush: What's the emerging role for data--for
strategies with the potential to leverage 'big data' from this new data-rich
environment and turn it toward building a better academic experience?
Mark Frydenberg: When Tim O'Reilly and Dale
Dougherty first described the Web 2.0 phenomenon back in 2005, one of the
characteristics they cited for business models for the next generation of
software was "Data as the next Intel Inside." As companies and their
applications moved to the Web, data gave them power. Today, data is at the
center of a company's value: Tweets, searches, and Likes give
Twitter, Google, and Facebook information that determines the advertising you
see when you use their services and provides revenue from advertisers that
keeps these sites available at no cost to users.
The Library of Congress is archiving every public tweet sent out
on Twitter since it started in 2006. That database will be huge--170 billion
tweets and counting. Such information is of value to students of history,
sociology, culture, and technology, investigating different slices of the human
experience. Computer scientists see an interesting challenge in how, in nearly
real time, they can store, search, and retrieve information from such a vast collection
of unstructured data that keeps on growing.
O’Reilly predicted, "The rise of proprietary databases [would]
result in a Free Data movement" within a decade, and today data from
governments, libraries, researchers, and online databases are available free
online. This changes how students seek primary source materials across the
disciplines. Their challenge now is to have the skills to be able to analyze
it. In an IT class I was teaching last year, I showed data sets from data.worldbank.org. The site offers
information on topics from science and technology to poverty and climate
change. The class was learning to make graphs, charts, and visualizations using
Excel. One student selected to analyze a dataset summarizing the prevalence of
AIDS over the past decade, because he was studying this topic in his sociology
class. Completing this exercise in IT gave him the skills to analyze real data
that was of value in other courses. By 2009, the U.S. Government had launched data.gov, a Web site for sharing, visualizing, and
presenting data with the intent of making government transparent. Today there
are hundreds of open government Web sites from cities, states, and countries
across the world.
Big Data has opened new business models worthy of study, new
problems to solve, and access to information that was not even
possible before.
Grush: Can you give me a couple of concrete, real-world examples of what
you mean by tools for analyzing and teaching with big data? And might this
address the concept of a more hands-on, authentic experience for the learner?
Frydenberg: Google has a tool called BigQuery which allows users to "interactively analyze massive datasets--up to billions of rows." Google gives
a few example datasets to explore for free: You can now query all of Wikipedia,
Shakespeare, and weather stations from within your browser. Companies with
their own large data sets can upload those using pay-as-you-go pricing based on
the amount of storage and number of queries their application receives in a
day.
For advanced students and developers, Google provides a series of
APIs (Application Programming Interfaces) that enable them
to create their own applications that make use of this data.
Students are familiar with Wikipedia, but asking them how they
might determine the number of articles it contains with 'technology' in the
title, for example, is a problem that, until now, would have been impossible to
solve.
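[Editor's note: The Wikipedia question above can be sketched in a few lines of Python. This is a minimal local illustration, not BigQuery itself: it uses Python's built-in sqlite3 module, and the table name (articles), column name (title), and sample titles are hypothetical stand-ins for a real Wikipedia dataset. The same kind of counting query could be adapted to a hosted service.]

```python
import sqlite3

# Hypothetical stand-in for a table of Wikipedia article titles.
# The titles below are illustrative sample data, not real query results.
SAMPLE_TITLES = [
    "Technology",
    "Information technology",
    "History of technology",
    "William Shakespeare",
    "Weather station",
]


def count_titles_containing(titles, keyword):
    """Count titles that contain keyword, ignoring case, using a SQL query."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE articles (title TEXT)")
        conn.executemany(
            "INSERT INTO articles (title) VALUES (?)",
            [(t,) for t in titles],
        )
        # Lowercase both sides so the match does not depend on capitalization.
        row = conn.execute(
            "SELECT COUNT(*) FROM articles WHERE LOWER(title) LIKE ?",
            ("%" + keyword.lower() + "%",),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_titles_containing(SAMPLE_TITLES, "technology"))
```

With the five sample titles, the script prints 3. Against a dataset with millions of rows, the query text would stay much the same; what changes is the engine that has to scan the data quickly.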
Data visualizations are becoming increasingly popular as big data
and its sources continue to grow. Think about all the Web sites that plot
information on maps these days: real estate listings, store finders,
earthquakes, or photos uploaded to Instagram.
Tools such as Tableau,
visual.ly, and explore.data.gov are finding their way into
the curricula of many universities, as students learn to create data
visualizations and tell the stories they discover along the way.
This process engages the learner and provides new ways to relate to big data.
Grush: You sometimes mention the fire hose analogy. What are the
problems or pitfalls associated with tapping the huge amounts of academic data
that are gushing into some disciplines? Is there anything good about 'drinking
from the fire hose'?
Frydenberg: 'Drinking from the fire hose' is a common
expression that refers to the rushing stream of data that comes from a variety
of sources, from sensors that capture readings and measurements to status
updates on social networks, almost in real time. The information arrives so
fast that we can barely keep up with it all.
Universities are using analytical tools to target prospective
students. Information related to graduation rates, dates of application, and
financial aid awarded in the past helps universities determine which students
are more likely to elect to attend if accepted. This past information provides
new insights into possible future outcomes.
As universities grapple with their futures in an age characterized
by the rise of MOOCs (Massive Open Online Courses), the data that online
courses provide allows for analysis that might not have been available
otherwise. In MOOCs, students interact entirely online, leaving behind a record
of every page they visited. The same is true with browsers that record a
history of each transaction we perform on the Web. Analyzing this trail of data
breadcrumbs in educational software tools or learning management systems often
can show teachers what students actually read, exercises on which students got
stuck, or pages where they spent unusually large or small amounts of time.
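[Editor's note: One way to picture this kind of analysis is the short Python sketch below. It assumes a hypothetical page-view log of (student, page, seconds) records and flags pages whose average viewing time is far above or below the typical page. The sample records, the function name, and the threshold factor of 3 are all illustrative assumptions, not features of any particular learning management system.]

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical page-view log: (student, page, seconds spent on the page).
VIEWS = [
    ("ana", "intro", 60), ("ben", "intro", 70),
    ("ana", "pivot-tables", 400), ("ben", "pivot-tables", 380),
    ("ana", "summary", 5), ("ben", "summary", 7),
    ("ana", "charts", 80), ("ben", "charts", 90),
]


def flag_unusual_pages(views, factor=3.0):
    """Label pages whose average time is far from the median page average.

    A page is "long" if its average exceeds factor times the median,
    and "short" if it falls below the median divided by factor.
    """
    per_page = defaultdict(list)
    for _student, page, seconds in views:
        per_page[page].append(seconds)

    averages = {page: mean(times) for page, times in per_page.items()}
    typical = median(averages.values())

    flags = {}
    for page, avg in averages.items():
        if avg > typical * factor:
            flags[page] = "long"   # students may be stuck here
        elif avg < typical / factor:
            flags[page] = "short"  # students may be skipping this page
    return flags


if __name__ == "__main__":
    print(flag_unusual_pages(VIEWS))
```

With the sample log, the pivot-tables page is flagged as "long" and the summary page as "short". A real analysis would need more care, for instance with idle browser tabs that inflate viewing times.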
Grush: You teach students in IT. Is there a particular value in
introducing them to these problems, especially since big data is growing and
all around them?
Frydenberg: One of the lessons learned from big data is
that it can lead to big problems. All of a sudden, we need to think about
storage space, processing power, Internet connectivity, security, and ways to
access or update it. These topics are beginning to make their way into the
curricula for information technology students at all levels, for they need to
be prepared to consider these questions in order to recommend solutions to
their future employers.
Some of the tools for querying and visualizing big data that I
described earlier can be introduced to undergraduates in order to raise
awareness about the topic. Students who learn about relational databases need
to understand where that technology falls short. So often, textbooks provide
sample datasets, which are nicely structured, well formed, and conducive to a
particular problem. In the real world, that is not always the case.
Grush: Here's the Big IT Question about Big Data: What does all this
(incorporating big data into instruction) mean for IT and for the CIO? What are
some of the essential planning aspects of preparing the campus for (I remember
the words Cliff Lynch and others have used in terms of research data) 'a data
deluge' in the academic disciplines?
Frydenberg: From presence on social media sites to
in-house application data files, the amount of data that companies and campuses
generate is staggering. CIOs will need to have in-house expertise to find
sources of and make sense of this data and find ways to leverage the
information that it may provide.
Understanding big data will help them see what customers, potential
students, and others are saying. CIO Magazine last year reported that
within the next three to five years there would be a gap between
companies that "get" big data, and those that don't.
The data deluge, or flood of information, reminds us why nearly
real-time processing is so important. The metric "Time to Answer" has become
crucial for companies to improve, in order to be able to keep up with incoming
data, or with the requests for analyzing its content.
Grush: It seems these days that "cloud" strategies come up to calm the
unsettled questions in really challenging technology areas. What do you mean
when you talk about "Data as a Service" in this sense of providing big data
tools and services for instruction? Is there a there, there? Can we really look
to the cloud to address our concerns about managing "Big" instructional data?
Frydenberg: Data as a Service refers to the delivery of
data to applications through a Web browser. Any service with which users
interact through a Web browser, from running applications (Software as a
Service) to online backups (Infrastructure as a Service) is an aspect of cloud
computing. Data is no different.
Learning Management Systems will evolve to have access to all of a
student's grades from his or her entire educational history, and display
current performance as a "dashboard" of information. Assessment and assurance
of learning are important metrics; access to big data and visualization tools
will assist in making these concepts more intuitive.
This is already beginning to happen. I recently read an article
where a school system is using data from standardized test scores to point out
specific issues from students' learning histories. Looking for patterns in
performance over several years, they were able to determine those areas in
which a student succeeds or is in need of academic assistance. This provides a
more personalized course of study based on a student's own history.
The cloud makes access to all of this data seamless, though
because data is stored in the cloud, security becomes a major concern. In a
keynote presentation at the recent Social Media Week conference in New York, social
media expert and Senior Researcher at Microsoft Research danah boyd described
four challenges of living in a world inspired by big data (box, below):
Living with Big Data
Data is persistent. What you say online stays online.
What you wrote online as a teenager is still there. While persistence allows
for the asynchronous interactions on the Web upon which we have come to rely,
at the same time, it makes our online persona much more difficult to manage.
Data is replicable. While we live in a world that
relies on copy and paste to spread information on the Internet, that makes it
difficult to tell the original from a copy. This creates new kinds of conflicts
as people modify what is really happening, so you really don’t know how things
evolved.
Data is searchable. People are now searchable by
others who hold power over them. This creates the discomfort of not always knowing
who is looking at you, when, or why.
Data is scalable. Once you put it out there, there
is a potential that millions of people can see it. What information that we
share online spreads the fastest?
Source: danah boyd, Senior Researcher, Microsoft Research
(paraphrased) http://new.livestream.com/smwnyc/events/1868227
These issues are a reminder of the importance of introducing big data
to both the boardroom and the classroom.
[Editor's note: Mark Frydenberg
is a Senior Lecturer in Computer Information Systems at Bentley University in
Waltham, MA and the Director of its Learning and Technology Sandbox. He is also
a contributing author of Discovering Computers 2014 and author of Web 2.0
Concepts and Applications, both
published by Cengage Learning. A classroom educator and innovator, Frydenberg's
research focuses on tools for collaborative learning. He has spoken at academic
and professional conferences, and faculty development events throughout the
U.S. and Europe. Frydenberg is also a member of the Campus Technology
Conference Advisory Board.]