AOL Spills: The Rest of Us Feel Chills
By Terry Calhoun
What if we could know what you are planning to do? What if we thought we knew what you are planning to do? What if law enforcement thought it knew what you were planning to do? These became questions for the real world earlier this week.
Q: Is it possible to identify someone with precision if you know nothing about them except the text of the queries they have typed into a search engine?
A: Yes.
How do we know? A research group at AOL thought it would be useful for academics to analyze search query data, and posted AOL’s own data online. The group apparently didn’t have the authority to do so, but didn’t know that. AOL took the data down pretty fast, but the genie was out of the bottle: by then the data had been copied and passed all over the Internet. The data even tells you whether a searcher clicked on a corresponding link. Some sites are now devoted to making it available for anyone to play with.
Now what do the searches really tell us about the people who typed them in, and their intent? One thing is for sure: in many cases, we can find out who they are. The first to be discovered was Thelma Arnold.
The queries typed into the AOL search engine by Thelma were among 20M+ such queries accidentally made available on the Internet by an AOL error. No personal identifying information was revealed by AOL. In the accidentally shared data, Ms. Arnold was identified only as user No. 4417749.
“Buried in a list of 20 million Web search queries collected by AOL and recently released on the Internet is user No. 4417749. The number was assigned by the company to protect the searcher’s anonymity, but it was not much of a shield.”
All the quotations are from a New York Times article, “A Face Is Exposed for AOL Searcher No. 4417749.”
In fact, those 20M+ queries were made by a total of 657,000 users, each identified only by an identification number. Yet investigators were able to identify Ms. Arnold from an analysis of her queries:
“No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from ‘numb fingers’ to ‘60 single men’ to ‘dog that urinates on everything.’”
Other investigators claim that they have managed to analyze the data to identify a whole host of other AOL users. The Electronic Privacy Information Center calls the data that AOL and other search engines compile and cross-link “a ticking time bomb.”
“And search-by-search, click-by-click, the identity of AOL user No. 4417749 became easier to discern. There are queries for ‘landscapers in Lilburn, Ga.,’ several people with the last name Arnold, and ‘homes sold in shadow lake subdivision gwinnett county georgia.’”
Interestingly, it turns out that people frequently search for other people (and themselves) by name. Sometimes a Social Security number is used as a search term.
“It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs. ‘Those are my searches,’ she said, after a reporter read part of the list to her.”
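For the curious, the reason an ID number is such a thin shield can be sketched in a few lines of Python. This is only an illustration, not the method the reporters used. The row layout (ID, query), the second user, and the list of place words are all assumptions made up for the example; the queries listed under No. 4417749 are the ones quoted above.

```python
from collections import defaultdict

# Hypothetical rows in an assumed (user_id, query) layout.
# The queries for 4417749 are the ones quoted in this column;
# user 1000001 and its queries are invented filler.
ROWS = [
    ("4417749", "numb fingers"),
    ("1000001", "cheap flights"),
    ("4417749", "landscapers in lilburn, ga."),
    ("4417749", "dog that urinates on everything"),
    ("1000001", "weather forecast"),
    ("4417749", "homes sold in shadow lake subdivision gwinnett county georgia"),
]

# Hypothetical list of place words that would narrow down where someone lives.
PLACE_CLUES = ("lilburn", "gwinnett", "shadow lake")


def group_by_user(rows):
    """Collect every query under the ID number that issued it."""
    history = defaultdict(list)
    for user_id, query in rows:
        history[user_id].append(query)
    return dict(history)


def location_clues(queries, clues=PLACE_CLUES):
    """Return the queries that mention any of the given place words."""
    return [q for q in queries if any(c in q.lower() for c in clues)]


histories = group_by_user(ROWS)
for user_id, queries in histories.items():
    # Print each ID, how many queries it issued, and which ones point to a place.
    print(user_id, len(queries), location_clues(queries))
```

The point of the sketch is that the number never changes from one search to the next. Once every query sits under the same ID, a handful of place names and surnames is enough to start narrowing things down.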
So, Thelma wasn’t investigating her own medical complaints, but those of others she wanted to help. I wonder if medical insurance companies or Internet drug sales spammers are searching through the data. Actually, I don’t wonder. This is a bonanza for many people, including academic researchers. It’s just (probably) an eventual disaster for the (likely) many people whose searches are being analyzed.
So, we know who Thelma is now, but who is user No. 17556639? I can guarantee you that there are law enforcement staff working on that right now. User No. 17556639 had an interesting string of queries that began with “how to kill your wife” and ended with “car crash photo.” Hmm.
But, hey, Thelma was researching on behalf of friends. Maybe user No. 17556639 is researching a book he is writing about a murderous husband.
What about the hundreds of users who searched for “child porn,” “lolita,” or equivalent terms? Just research? Several Web sites now offer up the data in user-friendly formats that allow anyone to conduct searches. I expect that some of the Web’s nastiest vigilante groups are hard at work, as well as police officers in those communities that already have them looking through cyberspace. Heck, if I were a cop, I would. Let’s search for “Ann Arbor,” and see what user number is typing in suggestive queries. Then I’ll see if I can use those queries to narrow down an identity. Then on to real-world surveillance. Hmm. Wow.
AOL has had a lot of troubles recently, what with people being unable to cancel the accounts of deceased users and a recent, widely distributed tape of a terrible customer service call. It may be on its way out, but this latest gaffe ensures that it’s still making waves.