Where Data Mining, Privacy Policies, and Identity Theft Intersect

Leo Irakliotis has been the victim of an institutional security breach. Yet, after two years of monitoring his credit reports, the possibility of the thief using his Social Security number for illegal activity concerns him far less than the general incompetence of the workers who routinely access his personal information as part of their jobs.

Irakliotis is a researcher in data mining and database applications, as well as an expert in technology and public policy. Until recently he was a member of the Department of Computer Science at the University of Chicago, where he served as the director of the professional programs, a founding fellow of the Computation Institute, and associate chairman.

Campus Technology caught up with Irakliotis as he was preparing for a move to Florida to join Nova Southeastern University's Graduate School of Computer and Information Sciences as dean. In this interview, Irakliotis shares his views on why worries about privacy are overblown and why schools need to put far more emphasis on educating those who handle the data in ethics and responsibility.

Dian Schaffhauser: A lot of institutions are spending more resources on developing their data mining skills--beyond institutional research. What's driving that?

Leo Irakliotis: What happens in general with universities--much like any other corporate entity, whether for-profit or non-profit--is that you begin to realize your data are an asset to your organization. And, in fact, you can improve your business, your processes, and your responses by analyzing the data. You can see an increased motivation toward better data management, better data warehousing, better reporting in universities. It gives you more transparency when it comes to accountability--whether it's fiduciary or marketing accountability. You need the transparency to manage the organization better. This is something that corporate enterprises have known for years. So it's time for universities to become a little more entrepreneurial in that respect and embrace data analysis and the transparency that comes with it.

If I, for example, were running the state's schools here in Illinois in Cook County, I would very much like to know what the demographics will look like five, 10, 15, 20 years down the road. Students born this year may be hitting my door for admissions in 2028, if my arithmetic holds. So I want to know what the demographics in the area will look like when today's toddlers are applying for admissions. How do I get that? I can look at census data, county records, birth certificates, wedding certificates. I can come up with a little computation module that tells me that for every 10 weddings that take place this year, next year we can expect a certain number of births.

We can analyze past records and come up with simple modules that help the university plan with a 20- or 25-year horizon. We can look at the Bureau of Labor Statistics and try to understand the employment trends across different demographic sectors. For example, Chicago has substantial African-American and Latino populations and a notable eastern European population. What are these people doing for employment at different stages of their educational endeavors? Where do they go after high school? After their first professional degree? How does this landscape change over time, and how does it affect the regional economy? All of that data is available out there for the asking. It's public data.
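The "little computation module" Irakliotis describes can be sketched as a few lines of Python. Everything below is illustrative: the births-per-wedding ratio, the admissions lag, the college-bound rate, and the wedding counts are all made-up assumptions standing in for values one would actually estimate from census and county records.

```python
# Hypothetical sketch of projecting a future applicant pool from public
# records (weddings -> births -> admissions-age population). All constants
# and inputs here are invented for illustration.

BIRTHS_PER_WEDDING = 1.8   # assumed ratio, estimated from past county records
ADMISSION_LAG = 18         # assumed years from birth to an admissions cycle

def project_applicant_pool(weddings_by_year, college_bound_rate=0.4):
    """Estimate college-bound applicants per future admissions year.

    weddings_by_year: {year: number of weddings recorded that year}
    college_bound_rate: assumed fraction of births that later apply to college
    """
    pool = {}
    for year, weddings in weddings_by_year.items():
        births = weddings * BIRTHS_PER_WEDDING        # expected births next year
        admission_year = year + 1 + ADMISSION_LAG     # when those children apply
        pool[admission_year] = pool.get(admission_year, 0) + births * college_bound_rate
    return pool

# Example with hypothetical county counts for three years
weddings = {2010: 5000, 2011: 5200, 2012: 4900}
print(project_applicant_pool(weddings))
```

A real planning model would of course use regression against historical birth and enrollment data rather than fixed ratios, but the shape of the computation is the same.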

Schaffhauser: Institutions are obtaining all of this information about the students. Should the students be concerned?

Irakliotis: There is a school of thought that there is no such thing as privacy. Forget that. I get a kick out of e-mail campaigns protesting the lack of privacy online when the campaign e-mail originated from a Google e-mail account. Was the irony of using a Gmail account lost on whoever initiated it? Should the students be concerned? I think they should always be concerned about how much privacy we are giving up without really noticing it. Look at family records--marriage licenses, birth declarations, and things like that. That information has been published for years. What makes it different is that years ago you had to go to the county office and inspect hundreds and hundreds of records to obtain statistical samples that would allow you to make analytic predictions about what kind of seniors you would be recruiting down the road. Now you can do the same thing in the comfort of your Starbucks lounge chair using your laptop. And that changes the game.

Should people be concerned about it? I cannot answer that for everyone. Personally, I would be worried if I had an irresponsible data miner digging and correlating things. But the data is out there anyway. My focus is to make sure that the people who manipulate, retrieve, and process that data are responsible and observe some basic ethics guidelines--knowing that there will always be bad apples in just about any profession who are going to do something stupid that will embarrass the rest of their community. But if I manage to educate about 90 percent of the next generation of data miners to be ethical and responsible, then I can count on them to self-police their profession and weed out the bad guys.

Schaffhauser: Are there aspects of privacy policies that a school needs to evaluate as part of its data analysis efforts?

Irakliotis: I think so, and most of the schools in the [United States] are doing that already because they are motivated by this little piece of legislation called FERPA--the Family Educational Rights and Privacy Act. You have federal guidelines on how to protect student records, but those apply to records for individuals who have matriculated--who have applied to your school and have initiated a transaction with the school. There are no guidelines that tell universities how to deal with public records that are available just for the asking. I think there may be an opportunity here for leadership by example--if the universities decided, in their collective meta-organizations, to do some self-policing and come up with some guidelines that would make sense. But I don't think there are specific guidelines right now beyond the goodwill and good moral character of the individuals in the universities who are handling the data.

Schaffhauser: What do you see as the major challenges in formulating a privacy policy in the context of this institutional focus on data-driven decision-making?

Irakliotis: Someone will have to take the initiative--some school or some consortium of schools. I think this is a policy matter that could be initiated within the confines of Educause. I think Educause has the best person in place to talk about this issue--Greg Jackson, who just joined as vice president for policy and analysis. I'm not sure he is going to touch this aspect, but he was the chief information officer for the University of Chicago for a very long time, and he's a very capable and very experienced information manager. If there is one person at Educause who has the credibility and authority to sustain a productive discussion on how the universities can get together and come up with some guidelines about privacy and analyzing public records, I think that person is Greg. Maybe Educause will make the first step.

Here in the Midwest we have a committee for information and collaboration, an informal association of 10 major schools, including the University of Chicago, Columbia, Northwestern. This committee could start discussing things, but someone has to take the first step. They will have to come up with a very good rationale for why this is important, because I can see objections from the other side: "Well, these are public records anyway. What's your problem? Why do we have to have privacy guidelines when it comes to public records?"

The fine line is what I was saying earlier. We now have the ability to process public records very quickly, with aggregate analytic capacity--even in a simple spreadsheet--that was not available 30 or 40 years ago, when those records were first made public, because of the physical effort then required to access them. Now you just download the data, and you can play with it. But I think it will be a tough sell because the records are public, and it will be a moral and ethical discussion more than a technical and legal one.

Schaffhauser: A lot of the data that is being used in institutions now is information compiled from students by virtue of enrolling in an institution and providing financial data, their areas of interest, just registering for classes....

Irakliotis: Where federal guidelines fail to provide a framework is when a university starts to create a mashup of records such as the ones you describe and public records. What if I take the preferences of this year's students and correlate them against some public data I have access to? Am I toying at the fringes of privacy? That is unknown space right now--something for which we don't have any substantial cases to test our practices against. As much as I dislike our litigious society, a major lawsuit would give us an opportunity to reflect on guidelines and set our operational parameters for things like this. In the absence of a legal challenge, I don't think we have much to talk about, because we don't know what is going on out there.

Schaffhauser: A month doesn't go by that another college or university has to deal with personal information on students, faculty, and staff being exposed in some way. It's almost becoming standardized in how a school needs to respond--a letter sent to those affected, referrals to credit checking agencies, a "mea culpa" issued by the head of the school or department that had the breach. Are we becoming inured to the problem of identity theft?

Irakliotis: I've been a victim of such an intrusion at my university; I got one of those mea culpa letters.

Schaffhauser: What was the situation?

Irakliotis: I think some kid from Russia or somewhere else managed to acquire about 25,000 Social Security numbers. Things like that happen for two reasons. One is easily fixed because it has to do with [incompetence] in record keeping: you go out there and hire more competent people. You have to throw a little more money into salaries, but it can be done.

The other thing is when you have the best security team in the world--which was the case at the university I am talking about, without naming it. You have world-class security, a world-class information technology group for the university, and an information technology use policy that is very liberal. For example, if you want to use Skype on your computer for personal communications, you are welcome to. And if there is a vulnerability in Skype that is not known, that is not public, and someone finds out about it and uses it to break into your system and get those 25,000 Social Security numbers, what do you do in a situation like that?

I think there will always be the risk of having this kind of break-in. But if you eliminate the [incompetence] factors, then you can focus on the exceptional situations.

The other thing is that universities--and just about any organization that handles large quantities of privileged data--have to learn how to secure that data better. There is technology out there that allows for encryption. There are techniques that will make unauthorized access to that data very, very difficult. Not impossible, but extremely difficult. I think universities will become increasingly sensitive to potential risks and start to encrypt their data. At the end of the day, it doesn't matter how well you lock up your place. There's always the possibility of someone breaking in through the window.

Are we becoming immune or indifferent to the situation? I don't think we are indifferent. But the problem is often not within our line of sight: if I don't know that it's happening to me, or to my neighbors, or to my friends, I don't know it's happening in the first place. We are not quite there yet. It's not as if 25,000 Social Security numbers out in the open are going to make major headlines in The New York Times tomorrow--unless those are the 25,000 Social Security numbers of all Times employees from the past 20 years. I'm not blaming the press, but it will be a sad day when this kind of news makes the front page of the Times.

It doesn't absolve us of our responsibility to do a better job, technically, of securing the data. That's what you're going to see down the road from universities and health care organizations: they will do a better job of encrypting the data. So if someone takes my laptop and walks away with it, it will be extremely difficult to get access to my data--not impossible, but extremely difficult.
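The principle Irakliotis is describing--that encrypted data on a stolen laptop is useless without the key--can be illustrated with a standard-library-only toy. This is a one-time-pad-style XOR, deliberately simplistic; real at-rest encryption would use a vetted library and an authenticated cipher, and the "record" below is fabricated for the example.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # One-time-pad-style XOR: the same function encrypts and decrypts.
    # Toy illustration only; not a substitute for a real cryptographic library.
    return bytes(b ^ k for b, k in zip(data, key))

record = b"SSN: 123-45-6789"            # fabricated sensitive record
key = secrets.token_bytes(len(record))  # random key, stored apart from the data

ciphertext = xor_bytes(record, key)
assert ciphertext != record                   # what's on the stolen disk is unreadable
assert xor_bytes(ciphertext, key) == record   # only the key holder can recover it
```

The design point is the one he makes: the thief who "breaks in through the window" gets the ciphertext, but without the separately stored key, access remains extremely difficult.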

Schaffhauser: How concerned were you as a victim of a data breach? If someone gets your Social Security number, does it matter?

Irakliotis: It doesn't matter. I am more concerned with the people who work for Experian, Equifax, and the other credit bureaus that have access to my Social Security number than with anyone else. Those are the people who seem the most incompetent every time I check my credit reports. I have a 12-year dispute with a cell phone company that doesn't exist anymore, and that dispute still shows up on my credit reports. I get it cleaned up on one, and it appears on the others. We have entrusted those corporations with responsibilities, and they do not live up to them. They are failing on a daily basis, in my opinion. The casual lapses of security at universities or big merchants and the release of Social Security numbers and credit card numbers--that will happen now and then.

But to go back to your question, how did I feel? I got a little bit worried. I signed up for one of those monthly monitoring reports to make sure that nothing financial happens. It's been two years since then, and the only thing I see on my credit report is the 12-year dispute doing the round robin among the major credit bureaus. So I don't worry much at this point. I have documents on file so that if something finally happens, I can argue that it is a result of identity theft stemming from the breach two years ago.

Schaffhauser: At your new university, Nova Southeastern, is security going to be something that you focus on?

Irakliotis: From the perspective of delivering that as part of our education programs, absolutely, yes. I'm not going to be involved with the operational aspects of managing the university's data; I will be running a major school of information technology. One of the things I would like us to focus on, as we plot the growth of the school for the next five or six years, is what kind of education we need to offer students so they are competent in workplaces where data management and data retrieval are core, mission-critical functions. Whether through classes, seminars, or graduate degree programs, we'll talk about increasing information technology security through encryption and all the new technology available to us right now. We'll also talk about the responsibility of hiring competent technologists--of knowing how to hire the best people.

We have tons of first-rate information technology people graduating from programs across the country. They can take a computer and program it inside out to do whatever you want. But when those people are sitting across the table at an interview, the hiring managers often have no idea how to conduct a meaningful interview. You can ask technical questions to gauge the depth of their knowledge, but do you have the interviewing skills to understand how a person will handle a crisis or an unknown situation? Nobody teaches that in an IT program. They teach it in an organizational psychology program, and maybe in an MBA program--if you happen to take an elective in organizational psychology.

This is how I am going to tackle that problem: not just saying, "Here's the new technology and this is how to apply it," but also, "Here are the skills you need to be a competent manager and to ensure that maybe eight out of 10 members of your team are competent people. And by the way, here are the skills you need to turn the other two into competent people, or to help them move on to something more engaging for them."

Schaffhauser: Do you believe we are going to be dealing with the same issues around data security and privacy in five or six years?

Irakliotis: I hope not. I hope that we won't be talking about privacy and security with the same excitement we're talking about it right now. We're going through a transition right now because we're entering the new realm of massive data sets. I think that as we become more comfortable living in that realm, privacy will remain an issue but not to the extent that it concerns us all on a daily basis. Some new threat will grab our attention.

Remember the anxieties when the horseless carriage was introduced? Some of those concerns--"those beasts will kill people"--were valid. And we still have fatal car accidents. We view this as a part of life. We are not banning cars; we are trying to educate people to be more responsible drivers and to enforce traffic laws a little more consistently. The same thing will happen with information technology and this widespread availability of data. We will try to educate more responsible users in college and in the workforce.

This is where the government might play a role down the road. Right now, the state and federal governments are not quite clear on what they can do in this domain. They're going through their own learning curve, and we'll see what comes out of it. But we are not going to be as anxious as we are now.
