Iowa Mines Data on Campus IT Usage

The University of Iowa is using data mining techniques to solve a problem as old as the information technology industry itself: whether, where, and when to add IT resources to the campus technology infrastructure.
Using data mining tools to extract information from both automated and non-automated databases, IT administrators at Iowa have been able to record campus hardware and software usage. This information can expedite and validate the IT decision-making process and guide decisions regarding the location of computing facilities, hours of operation, staffing, and operational costs, as well as the purchasing and distribution of software and hardware.
Data mining is a broad term covering techniques for analyzing and modeling large datasets. Specifically, data mining uses clustering and decision-tree algorithms defined through a series of questions to identify relationships among different variables in the data. It takes some practice and experience to fully leverage the power of data mining, but when budgets are tight and the cost of campus technology is under scrutiny, making wise decisions based on data mining techniques pays off.
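To make the idea of a decision tree "defined through a series of questions" concrete, the sketch below searches for the single yes/no question that best separates a small sample. It is an illustration only, not the method or software used at Iowa: the hour-of-day values, the "crowded" labels, and the function names are all hypothetical, and Gini impurity is assumed as the splitting criterion.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_question(values, labels):
    """Find the threshold t whose question 'value <= t?' gives the lowest
    weighted Gini impurity across the two resulting groups."""
    best_t, best_score = None, float("inf")
    # The largest value is skipped: it would leave the 'no' group empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical sample: hour of day, and whether a lab was crowded (1) or not (0).
HOURS = [8, 9, 10, 13, 14, 20, 21, 22]
CROWDED = [0, 0, 0, 1, 1, 1, 1, 1]

threshold, impurity = best_question(HOURS, CROWDED)
print(f"Best first question: is the hour <= {threshold}? (impurity {impurity})")
```

A full decision tree repeats this search within each resulting group, which is how a chain of questions comes to describe relationships among several variables.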
At Iowa, IT administrators used an eight-step approach to data mining that resulted in the accumulation of useful data, the creation of a set of established data mining procedures, and the development of a model for determining costs and benefits. In the process, IT administrators and staff became more knowledgeable about using machine and related human measures for decision-making purposes.
The IT group examined automated and non-automated data from 27 of the university's Instructional Technology Centers (ITCs). These facilities provide students access to a broad spectrum of computer services and software resources—Web mail and other desktop Web services, including class registration and student records, library search engines, course Web pages, and wireless delivery—across different system platforms. The ITCs are located in academic buildings, residence halls, and service buildings on the main campus.
The first step was to identify which questions about IT operations needed answering. Stakeholders and decision-makers met with a data analyst to discuss what kinds of information were needed. At Iowa, each student, faculty, and staff user has a special log-in authentication code that is recorded for every transaction. Given the endless possibilities for gathering information on specific users, it was important to discuss privacy and First Amendment issues up front.
Second, the team met to determine what knowledge the database could yield. The database administrator and data analyst together explored the structure of the data, specifically the connections among fields and records from different data sources. This step, commonly referred to as the "pre-digestion" stage, established what knowledge the database could or could not yield. Authenticity is an important issue at this stage: the information gathered must be both valid (measuring what it purports to measure) and reliable (measuring it consistently).
As a third step, the team developed standardized data collection procedures. Issues that arose at this stage included establishing system flow, tracking data across different system platforms, and adding to the database any new fields needed to pull disparate data into one basket.
Next, the team developed data mining and knowledge discovery techniques, selecting appropriate data mining software/tools for extracting useful information from the database. They then began exploring and interpreting data. With the aid of statistical and analytical tools, they discovered relationships among original or derived fields in data records and produced tables, graphs, and charts to show these relationships. A variety of data mining tools—including SAS Enterprise Miner, SAS/Insight, and SPSS Clementine—are available to create these models or pseudo-models.
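As a rough illustration of the kind of table this exploration produces, the sketch below counts log-ins per site and hour of day and reports each site's busiest hour. It assumes a simplified record layout of (site, timestamp, application); the site names, dates, and function names are hypothetical and are not drawn from the Iowa data or from the commercial tools named above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log-in records: (site, timestamp, application). Not real Iowa data.
LOGINS = [
    ("library_itc", "2003-02-03 09:15", "web_mail"),
    ("library_itc", "2003-02-03 09:40", "registration"),
    ("library_itc", "2003-02-03 14:05", "web_mail"),
    ("dorm_itc", "2003-02-03 22:10", "web_mail"),
    ("dorm_itc", "2003-02-03 22:45", "course_pages"),
    ("dorm_itc", "2003-02-03 23:30", "web_mail"),
]


def sessions_by_site_and_hour(records):
    """Count log-ins for each (site, hour of day) pair."""
    counts = Counter()
    for site, stamp, _app in records:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        counts[(site, hour)] += 1
    return counts


def busiest_hour(records, site):
    """Return the hour of day with the most log-ins at the given site."""
    counts = sessions_by_site_and_hour(records)
    site_counts = {h: n for (s, h), n in counts.items() if s == site}
    return max(site_counts, key=site_counts.get)


for name in ("library_itc", "dorm_itc"):
    print(name, "busiest hour:", busiest_hour(LOGINS, name))
```

Counts like these could feed decisions about hours of operation and staffing, though any such use would still need the validation described in the later steps.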
The final steps grew out of the accumulated data. At this stage, the IT administrators worked to validate their findings, applying information gathered separately to validate the results obtained through data mining. For instance, although there might seem to be a causal relationship between two variables, one might know of a third variable that is not part of the data warehouse that casts doubt on the assumed relationship. This step corrects for any erroneous assumptions drawn from the data at hand.
Following the validation stage, the data analyst performed additional analyses to obtain the maximum amount of information from the data warehouse. In these iterations, the analyst might make additional transformations, explorations, and interpretations. Finally, the group summed up the results of the data mining project, recommended actions based on the findings, and made the process an important part of the institution's strategic plan.
When applied to data on ITC operations at the University of Iowa, this approach yielded useful information about software and hardware usage, which may have implications for software licensing, help desk support, and resource planning. The project also established a set of procedures for collecting, analyzing, and reporting data about ITC operations, and developed a model for determining the cost of operation and rate of return for individual sites and units. Finally, the project produced IT staff members and administrators who are more knowledgeable about using machine and related human measures for decision-making purposes.

For more information, contact William Knabe, University of Iowa, at [email protected].
