Iowa Mines Data on Campus IT Usage
The University of Iowa is using data mining techniques to
solve a problem as old as the information technology industry itself: whether,
where, and when to add IT resources to the campus technology
infrastructure.
Using data mining tools to extract information from both
automated and non-automated databases, IT administrators at Iowa have been able
to record campus hardware and software usage. This information can expedite and
validate the IT decision-making process and guide decisions regarding the
location of computing facilities, hours of operation, staffing, and operational
costs, as well as the purchasing and distribution of software and hardware.
Data mining is a broad term covering techniques for
analyzing and modeling large datasets. Among these techniques are clustering
algorithms, which group similar records, and decision-tree algorithms, which
split the data through a series of questions to identify relationships among
different variables. It takes some practice and
experience to fully leverage the power of data mining, but when budgets are
tight and the cost of campus technology is under scrutiny, making wise decisions
based on data mining techniques pays off.
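Clustering of the kind mentioned above can be sketched with a toy example. The data and the minimal one-dimensional k-means routine below are illustrative only, not the university's actual tooling: it groups hypothetical hourly login counts into "quiet" and "busy" periods, the sort of pattern that informs hours-of-operation decisions.

```python
def kmeans_1d(values, k=2, iters=20):
    """Cluster scalar values into k groups; return the sorted centroids."""
    step = max(1, len(values) // k)
    centroids = sorted(values)[::step][:k]  # spread initial guesses out
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest centroid
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # recompute each centroid as the mean of its group
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return sorted(centroids)

# Hypothetical logins per hour at one lab over a day
logins = [3, 2, 4, 5, 30, 42, 38, 35, 6, 4, 3, 2]
quiet, busy = kmeans_1d(logins)
# quiet == 3.625, busy == 36.25 logins per hour
```

Here the two centroids separate the overnight lull from the midday rush; a real analysis would run on months of authenticated session logs rather than a dozen invented numbers.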
At
Iowa, IT administrators used an eight-step approach to data mining that resulted
in the accumulation of useful data, the creation of a set of established data
mining procedures, and the development of a model for determining costs and
benefits. In the process, IT administrators and staff became more knowledgeable
about using machine and related human measures for decision-making purposes.
The IT group examined automated and non-automated
data from 27 of the university's Instructional Technology Centers (ITCs). These
facilities provide students access to a broad spectrum of computer services and
software resources—Web mail and other desktop Web services, including class
registration and student records, library search engines, course Web pages, and
wireless delivery—across different system platforms. The ITCs are located in
academic buildings, residence halls, and service buildings on the main campus.
The first step was to identify what informational
queries about IT operations needed addressing. Stakeholders and decision-makers
met with a data analyst to discuss what kinds of information were needed. At
Iowa, each student, faculty, and staff user has a special log-in authentication
code that is recorded for every transaction. Given the endless possibilities for
gathering information on specific users, it was important to discuss privacy and
First Amendment issues up front.
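One common way to address the privacy concern raised above is to replace each log-in code with a keyed one-way hash before analysis, so usage patterns can be studied without exposing identities. The sketch below is a generic illustration, not Iowa's procedure; the key and IDs are placeholders.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # placeholder only, not a real key

def pseudonymize(login_id: str) -> str:
    """Stable pseudonym: same input -> same token, but not reversible
    without the key, so per-user analysis stays possible anonymously."""
    digest = hmac.new(SECRET_KEY, login_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:12]

token_a = pseudonymize("hawkid123")  # hypothetical user ID
token_b = pseudonymize("hawkid123")
token_c = pseudonymize("hawkid456")
# token_a == token_b (same user), token_a != token_c (different user)
```

A keyed hash (HMAC) rather than a bare hash prevents anyone without the key from testing guessed IDs against the tokens.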
Second, the team
met to determine what knowledge the database could yield. The database
administrator and data analyst together explored the structure of the data,
specifically the connections among fields and records from different data
sources. This step, commonly referred to as the "pre-digestion" stage,
established what knowledge the database could or could not yield. Authenticity
is an important issue at this stage—that is, the information gathered must be
both valid (measuring what it purports to measure) and reliable.
As a third step, the team developed
standardized data collection procedures. Issues that arose at this stage were
establishing system flow, tracking data across different system platforms, and
adding to the database any new fields needed to pull disparate data into one
place.
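Pulling disparate data together usually comes down to joining sources on a shared key field. The records and field names below are invented for illustration, a sketch of combining session logs from the authentication system with a station inventory:

```python
sessions = [  # hypothetical records from the authentication system
    {"user": "u001", "station": "ITC-A-03", "minutes": 45},
    {"user": "u002", "station": "ITC-B-11", "minutes": 90},
    {"user": "u003", "station": "ITC-A-03", "minutes": 30},
]
stations = {  # hypothetical facilities-inventory table, keyed by station
    "ITC-A-03": {"building": "Main Library", "platform": "Windows"},
    "ITC-B-11": {"building": "Residence Hall", "platform": "Mac"},
}

# Join the two sources into one flat record per session
combined = [{**s, **stations[s["station"]]} for s in sessions]

# Aggregate the merged records: total minutes of use per platform
usage = {}
for row in combined:
    usage[row["platform"]] = usage.get(row["platform"], 0) + row["minutes"]
# usage == {"Windows": 75, "Mac": 90}
```

The new "station" field plays the role of the cross-system key described above; without such a shared field, the two sources cannot be reconciled.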
Next, the team developed data mining
and knowledge discovery techniques, selecting appropriate data mining
software/tools for extracting useful information from the database. They then
began exploring and interpreting data. With the aid of statistical and
analytical tools, they discovered relationships among original or derived fields
in data records and produced tables, graphs, and charts to show these
relationships. A variety of data mining tools—including SAS Enterprise Miner,
SAS/Insight, and SPSS Clementine—are available to create these models or
pseudo-models.
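The exploration step often starts with something as simple as a cross-tabulation of two derived fields. As a hedged sketch, the invented (hour, building) pairs below are tabulated by time block and location, the kind of table that would then be turned into a chart:

```python
from collections import Counter

records = [  # invented (hour-of-day, building) pairs from session logs
    (9, "Library"), (10, "Library"), (21, "Residence"),
    (22, "Residence"), (11, "Library"), (23, "Residence"),
]

def time_block(hour):
    """Derive a coarser field from the raw timestamp hour."""
    return "day" if 8 <= hour < 18 else "evening"

# Count sessions in each (time block, building) cell
crosstab = Counter((time_block(h), b) for h, b in records)
# crosstab[("day", "Library")] == 3
# crosstab[("evening", "Residence")] == 3
```

In this toy table, library use concentrates in the day and residence-hall use in the evening, exactly the sort of relationship among derived fields the article describes.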
The final steps grew out of the
accumulated data. At this stage, the IT administrators worked to validate their
findings, applying information gathered separately to validate the results
obtained through data mining. For instance, although there might seem to be a
causal relationship between two variables, one might know of a third variable,
not captured in the data warehouse, that casts doubt on the assumed
relationship. This step corrects for any erroneous assumptions drawn from the
data at hand.
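The danger of such a confounder can be made concrete with invented numbers. In the sketch below, pooled records suggest that longer staffed hours go with fewer logins, yet within each site the relationship is positive; the "third variable" is simply which site a record came from:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (staffed hours per day, daily logins): invented records for two sites
site_a = [(10, 100), (12, 120)]   # small residence-hall lab
site_b = [(16, 60), (18, 80)]     # large library lab
pooled = site_a + site_b

r_pooled = pearson(*zip(*pooled))  # negative: more hours, fewer logins?
r_a = pearson(*zip(*site_a))       # positive within the first site
r_b = pearson(*zip(*site_b))       # positive within the second site
```

Validating against outside knowledge, here, knowing the records span two very different sites, reverses the naive conclusion the pooled data would suggest.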
Following the validation stage, the data analyst performed
additional analyses to obtain the maximum amount of information from the data
warehouse. In these reiterations, the analyst might make additional
transformations, explorations, and interpretations. Finally, the group
recommended actions based on their findings, summing up the results of the data
mining project, as well as making the process an important part of the
institution's strategic plan.
When applied to data
on ITC operations at the University of Iowa, this approach yielded useful
information about software and hardware usage, which may have implications for
software licensing, help desk support, and resource planning. The project also
established a set of procedures for collecting, analyzing, and reporting data
about ITC operations, along with a model for determining the cost of operation
and rate of return for individual sites and units. Finally, the project left IT
staff members and administrators
more knowledgeable about using machine and related human measures for
decision-making purposes.
For more information, contact William Knabe, University of Iowa, at
[email protected].