Data Security

Who Knows What Evil Lurks in the Cyber Heart?

The Hackers Know. (Apologies to The Shadow.)

Universities don't become headlines because a $2,000 computer is hacked or lost.  They become headlines when sensitive information such as Social Security numbers, credit card numbers, and medical information is stolen.  In terms of liability, the asset is not equipment; it is information.

There they were, big as life: a list of Social Security numbers buried in a long-forgotten folder on my computer.  Back in the early 1990s I'd take a break from bits and bytes to teach an occasional physics course and would do what every university professor did back then: store grades by name and Social Security number.  As a computer jock, I stored them on my computer and made regular archival backups.  (It was also common practice to post a list of grades indexed by Social Security number on your office door.  Names were carefully omitted, however, to prevent student embarrassment.)

Identity theft wasn't an unknown concept back then.  (The underground book "How to Create a New Identity" by "Anonymous" had been around for more than a decade.)  It just wasn't a big deal.  But times have changed.  Now it is a very big deal.  And one of the keys to identity theft is the Social Security number, or SSN.  Protecting personal information is now a legal requirement; losing it has been a source of embarrassment for many colleges, universities, and businesses.

But how can you protect sensitive data if you don't know where it is?  Recently a large research university addressed this question by hiring one of the Big-4 accounting firms to do a risk assessment on departmentally managed computers.  The computers were all scanned and color-coded red/yellow/green based on the amount of risk they presented to the organization.  To see whether the university could do internally the same job the Big-4 firm had been hired to do, a small group of computing center staff developed a program based on a computational biology algorithm used to search huge genome databases for specific patterns.  To their horror they found that 50 percent of the computers that had been rated "green" (lowest risk) by the Big-4 firm actually had large amounts of sensitive information (including massive spreadsheets and/or fully relational databases with SSNs, contract info, etc.) in extremely insecure configurations.  The homegrown scanning program also found "green" machines that had been compromised and contained Trojan horse "backdoors" or were infected with viruses.  The initial reaction was to pursue litigation against the Big-4 firm, but, after assessing the situation, they found that 77 percent of the compromised computers were running up-to-date anti-virus software and that the firm had followed industry best practices!

Now the story gets even more interesting.  Existing tools that scan for sensitive data generally require system administrator access and are awkward to use for scanning faculty, departmental, and staff computers outside central IT's control.  The university, however, refined its home-grown tool so it could be run on distributed PCs without user intervention, and found copious lists of SSNs, credit card numbers, and other sensitive data that users were unaware were on their computers.  Since those early days, similar experiences at other organizations suggest an estimated 70 percent of sensitive data resides outside the data center!
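The university's tool used a computational biology algorithm; as a much simpler illustration of the underlying idea of pattern-scanning files for sensitive data, here is a minimal sketch in Python. The regular expressions and the Luhn checksum filter are my own stand-ins for illustration, not the tool's actual method:

```python
import re

# Candidate patterns (illustrative, not exhaustive): a dashed SSN and a
# rough 16-digit card number allowing spaces or dashes between digits.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CC_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def luhn_ok(number: str) -> bool:
    """Luhn checksum to cut false positives on candidate card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_text(text: str) -> dict:
    """Count candidate SSNs and Luhn-valid card numbers in a blob of text."""
    ssns = SSN_RE.findall(text)
    cards = [c for c in CC_RE.findall(text) if luhn_ok(c)]
    return {"ssn": len(ssns), "card": len(cards)}
```

A real scanner would walk the filesystem and parse spreadsheets, databases, and archives; even this toy version shows why false-positive filtering (here, the Luhn check) matters when scanning thousands of machines.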

So what should an IT security administrator do?  One of the problems is that the commercially available tools are designed for structured and centralized corporate environments and are awkward to use in the highly decentralized environment that characterizes higher education and a growing number of today's businesses.  While a number of organizations made use of the university's home-grown tool, it still required substantial support from the author of the software to install, run, and interpret the results.  

That was the genesis of a small spin-off company, Proventsure.  They have taken the concept of using computational biology techniques and created a commercial product, Asarium, focused on managing risk in a distributed environment.  (Their methodology is described in a white paper available here.)  Asarium does two things.  First, it locates confidential data both inside and outside the data center, determining what, where, and how much sensitive data is on an individual computer.  Unfortunately, most operational personnel have found that by itself, that information isn't particularly useful because of the time it takes to analyze and then scrub sensitive data from thousands of individual machines.  (It took me a couple of hours to scrub the three computers and associated backup hard drives that I regularly use.  But I must confess I didn't correct all of the backup CDs and DVDs that I have generated over the years.  Imagine doing that for thousands of distributed computers on a college or university campus.)

The second thing Asarium does is look at the hardware and software characteristics of individual computers and calculate the risk of compromise and the probability of sensitive data being lost.  This information is combined with the type and quantity of sensitive information on the computer to compute a numerical "risk" score for each computer.  Remember the 80/20 rule and focus on the 20 percent of the effort that generates 80 percent of the results.

For example, a computer containing a lot of sensitive information that does not have anti-virus software, is unpatched, and has programs running on it that are structurally similar to backdoors might be ranked "0."  Think of it as "tomorrow's headline."  Deal with it today.  Machines that appear to contain no sensitive data and are well protected might rank as high as 100.  Worry about them later.
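To make the idea concrete, here is a hypothetical scoring sketch in Python. The weights and penalties are my own illustrative assumptions, not Asarium's actual formula; it simply combines how exposed a machine is with how much sensitive data it holds, on the article's 0 (urgent) to 100 (low risk) scale:

```python
def risk_score(ssn_count: int, has_antivirus: bool, is_patched: bool,
               backdoor_suspected: bool) -> int:
    """Toy risk score: 0 = "tomorrow's headline", 100 = worry later.

    Weights are illustrative assumptions, not a vendor formula.
    """
    protection = 100
    if not has_antivirus:
        protection -= 30
    if not is_patched:
        protection -= 30
    if backdoor_suspected:
        protection -= 40
    # Data penalty grows with the number of records, capped for this sketch.
    data_penalty = min(ssn_count // 100, 50)
    return max(0, min(100, protection - data_penalty))
```

An unpatched machine with no anti-virus, a suspected backdoor, and thousands of SSNs bottoms out at 0; a clean, well-protected machine with no sensitive data scores 100, matching the examples above.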

Rankings for individual machines can be aggregated for a "departmental" risk rank.  Departments can be aggregated for a "business unit" risk rank.  Again, this allows IT security staff to concentrate on those areas that present the most risk to the organization.  
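The roll-up might be sketched as follows; the "worst machine plus average" weighting is an assumption I made up so that one high-risk machine drags down its whole unit's rank, and the department names are hypothetical:

```python
def aggregate(scores: list[int]) -> int:
    """Blend the worst machine with the average (illustrative weighting)
    so a single high-risk machine lowers the whole unit's rank."""
    if not scores:
        return 100  # no machines, nothing to worry about
    worst = min(scores)
    avg = sum(scores) / len(scores)
    return round(0.6 * worst + 0.4 * avg)

# Per-machine scores rolled up to departments, then to a business unit.
departments = {
    "physics": [0, 85, 90],    # one "tomorrow's headline" machine
    "library": [95, 100, 90],
}
dept_ranks = {name: aggregate(s) for name, s in departments.items()}
unit_rank = aggregate(list(dept_ranks.values()))
```

With this weighting, the single score-0 machine pulls the physics department far below the library, which is exactly the triage signal the security staff needs.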

What I like about this approach is that it treats the information and not the equipment as the asset.  You don't make headlines by losing a $2,000 laptop; you make headlines by losing a few thousand SSNs.  It also works well in higher education's distributed environment.