Spam, Spam, ...Lovely Spam

The signal-to-noise ratio of e-mail is rapidly declining. Spam has become the No. 1 scourge of our inboxes. Sadly, it's getting harder and harder to distinguish spam from just plain, unsolicited junk. Nicolas Graham, the spokesperson for America Online, called spam the cold virus of the Internet. "It's going to exist in some form or another, like taxes or bad weather." As in any arms race, the sophistication of the aggressors (those generating spam) is in a tightly choreographed dance with the efforts of those striving to save us from the onslaught: spam filtering tools and legislative proposals to limit its distribution. There are multiple threads to this problem: technical, cultural, political, and legal.

Besides the implacable meat-based foodstuff, what is spam? The state of California has a definition that is nicely summarized by FindLaw. Spam can be defined as "unsolicited e-mail documents," that is, "any e-mailed document or documents consisting of advertising material for the lease, sale, rental, gift offer, or other disposition of any realty, goods, services, or extension of credit when the documents (a) are addressed to recipients who do not have existing business or personal relationships with the initiator and (b) were not sent at the request of or with the consent of the recipient." (Section 17538.4, subd. (e))

To combat the rising tide of spam, software companies have developed a variety of tools to keep it from landing in your inbox. The approaches taken tend to fall into the following general categories:

  • Content scanners
  • Distributed notification systems
  • Customized or manual blocking systems
  • Real-time blackhole lists (RBLs)

Content scanners: These are mechanisms that examine each individual message to determine whether it meets specific criteria that are typical of spam. Software of this sort can be installed on a user's computer or on the receiving e-mail gateway of an ISP. There is much to like about this approach because at least the software installed on the user's PC leaves the final decision of what is or is not spam within the user's control. But this comes at a price. Examining the content of every e-mail that arrives is resource intensive. Content scanners are using increasingly complex analytic techniques, from heuristic analysis to Bayesian inference.
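To make the Bayesian idea concrete, here is a minimal sketch of a word-based naive Bayes scanner. It is an illustration only, not the method of any particular product: the class name, the tokenizer, and the four training messages are all invented for this example, and a real scanner would train on thousands of messages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class BayesScanner:
    """Tiny naive Bayes classifier with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        # Assumes at least one training message for each label.
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: fraction of training messages carrying this label.
            score = math.log(self.messages[label] / total_msgs)
            total_words = sum(self.words[label].values())
            for token in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.words[label][token] + 1
                score += math.log(count / (total_words + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


# Hypothetical training data, far too small for real use.
scanner = BayesScanner()
scanner.train("cheap loans act now", "spam")
scanner.train("win money now", "spam")
scanner.train("meeting agenda for monday", "ham")
scanner.train("lecture notes attached", "ham")

print(round(scanner.spam_probability("cheap money now"), 2))
print(round(scanner.spam_probability("monday lecture agenda"), 2))
```

Because the scanner learns from the user's own mail, the line between spam and legitimate mail stays under that user's control, which is the property the paragraph above values.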

Distributed notification systems: Like content scanners, these systems look at every message. However, once a message is determined to be spam, the leverage of a community is invoked. The user reports the spam message to a centralized database. Other users of the system can then check whether an incoming message matches anything in that repository and treat a matching message as spam.

Distributed notification systems begin to move the locus of control from the individual user to some external agency. In doing so, there is a collective leveraging of the filtering power, but a corresponding loss of control. Should messages you want to receive end up classified as spam by others using the system, they become spam to your system and are correspondingly blocked. Getting your preference for what you want to receive, as mail versus spam, is now a negotiation with a third party.
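The report-and-check cycle can be sketched in a few lines. This is a simplified illustration under stated assumptions: the shared database is an in-memory dictionary rather than a network service, the fingerprint is a plain SHA-256 hash of the whitespace- and case-normalized body, and the class and user names are hypothetical. Real systems use fuzzier fingerprints that survive more aggressive message morphing.

```python
import hashlib
import re


def fingerprint(body):
    """Hash a normalized copy of the body so trivial edits still match."""
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamDatabase:
    """In-memory stand-in for the centralized repository (hypothetical)."""

    def __init__(self):
        self.reports = {}  # fingerprint -> set of users who reported it

    def report(self, user, body):
        self.reports.setdefault(fingerprint(body), set()).add(user)

    def is_known_spam(self, body, min_reports=1):
        reporters = self.reports.get(fingerprint(body), set())
        return len(reporters) >= min_reports


db = SharedSpamDatabase()
db.report("alice", "Buy   CHEAP watches now!")

print(db.is_known_spam("buy cheap watches now!"))  # True: matches alice's report
print(db.is_known_spam("Lunch on Friday?"))        # False: nobody reported it
```

Note that with `min_reports=1`, a single report from one user is enough to mark the message as spam for everyone else, which is exactly the loss of control described above.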

Customized or manual blocking systems: These are often implemented by the mail administrator of your e-mail system. Typically they do textual analysis of incoming mail, and if words or phrases like "Nigerian royalty" appear in a message, it is discarded before it reaches your mail client. Alternatively, the e-mail administrator may keep a list of frequent sources of spam and simply implement a rule to block anything originating from such sources.

Unfortunately, this approach requires constant attention, generating new rules and modifying the old ones to keep up with the continuing message morphing engaged in by the spammers.
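A hand-maintained rule set of this kind might look like the following sketch. The phrase list, the blocked source, and all of the addresses are hypothetical examples, not rules from any real mail system.

```python
BLOCKED_PHRASES = ["nigerian royalty", "act now"]  # hypothetical phrase rules
BLOCKED_SOURCES = {"bulk.example.com"}             # hypothetical source list


def sender_domain(address):
    """Return the lowercased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def should_discard(sender, subject, body):
    """Apply the administrator's source rule first, then the phrase rules."""
    if sender_domain(sender) in BLOCKED_SOURCES:
        return True
    text = f"{subject}\n{body}".lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


print(should_discard("prince@mail.example.org", "Greetings",
                     "I am Nigerian royalty and need your help"))  # True
print(should_discard("prince@mail.example.org", "Greetings",
                     "I am N1gerian r0yalty and need your help"))  # False
```

The second call shows why the rules need constant attention: swapping two characters is enough to slip past a fixed phrase list.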

Real-time blackhole lists (RBLs): Sometimes good ideas have a way of turning around and biting back. RBLs tend to exhibit this characteristic. In the early days, e-mail administrators noticed that a disproportionate amount of spam seemed to originate from common network addresses. As they began using manual blocking systems to protect their users from the onslaught of junk mail, they shared what they were learning with their fellow e-mail administrators. The network addresses on these shared lists had something else in common: They often belonged to open relays. An open relay is a mail server that is configured to relay any message sent to it on to its destination address, no matter who sent it. If a spammer encounters a mail server configured this way, he or she can send volumes of e-mail messages on to their destinations through the open relay, making it appear that the source of the messages was the open relay itself.

RBLs check the source of e-mail sent to see if the source allows them to send a message through unchecked in this fashion. If it does, they conclude the source is an open relay and block e-mail coming to them from this address. As a consequence, e-mail administrators began to close down their mail relays by instituting rules that are applied to messages that ask to be forwarded on. For example, they may require that the originating address be one that is from a list that the mail administrator trusts, such as that of their own college or business. Any other message asking to be relayed will be dumped into the proverbial bit bucket.
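The difference between an open relay and a closed one comes down to a rule like the one sketched below. The trusted domain and all addresses are hypothetical. The check is on the sender's address only because that is the rule described above; sender addresses are easily forged, so real mail servers typically rely on the connecting network address or on authentication instead.

```python
TRUSTED_DOMAINS = {"college.example.edu"}  # hypothetical list the administrator trusts


def domain_of(address):
    """Return the lowercased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def handle_message(sender, recipient):
    """Return 'deliver', 'relay', or 'reject' for one incoming message."""
    if domain_of(recipient) in TRUSTED_DOMAINS:
        return "deliver"  # mail addressed to our own users is accepted
    if domain_of(sender) in TRUSTED_DOMAINS:
        return "relay"    # our own users may send mail to the outside
    return "reject"       # an outsider asking us to forward to another outsider


print(handle_message("prof@college.example.edu", "friend@elsewhere.example.net"))    # relay
print(handle_message("spammer@bulk.example.com", "victim@elsewhere.example.net"))    # reject
```

An open relay is, in effect, this function with the final `reject` replaced by `relay`.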

This sounds terrific, unless your message really isn't spam. How might that happen? Once administrators began collecting lists of open relays and using them in local e-mail filtering rules to reject incoming messages from the listed network addresses, the influence of these lists grew. But the reason a network address finds its way onto such a blacklist isn't always that it is a source of spam.

A bigger problem is getting an address that isn't a source of spam off the list. If a spammer managed to grab some accounts from an ISP that was otherwise a good citizen and send out thousands of spam messages through it, the ISP would become a candidate for the RBL. All the users of that ISP could quickly find their e-mail rejected by others.
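The collateral damage is easy to see in a sketch. DNS-based blackhole lists are conventionally queried by reversing the octets of the sending server's address and looking that name up under the list's zone; here the zone name, the listed address, and the users are hypothetical, and a local set stands in for the actual DNS lookup.

```python
import ipaddress

RBL_ZONE = "rbl.example.org"                   # hypothetical list zone
LISTED = {ipaddress.ip_address("192.0.2.25")}  # the ISP's shared mail server (example address)


def rbl_query_name(ip, zone=RBL_ZONE):
    """Build the DNS name a receiving server would look up for this address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def accept_connection(ip):
    """Local stand-in for the DNS lookup: reject any address on the list."""
    return ipaddress.ip_address(ip) not in LISTED


print(rbl_query_name("192.0.2.25"))  # 25.2.0.192.rbl.example.org

# Every customer of the ISP sends through the same listed server.
outgoing = [("spammer@isp.example.net", "192.0.2.25"),
            ("grandma@isp.example.net", "192.0.2.25")]
for sender, server_ip in outgoing:
    verdict = "accepted" if accept_connection(server_ip) else "rejected"
    print(sender, verdict)
```

The lookup sees only the server's address, never the individual sender, so the well-behaved customer is rejected along with the spammer.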

The real problem is that control is now exercised "elsewhere." The bottom line is that the person controlling access to your inbox should be you. Attempts to externalize this, though sophisticated in design and altruistic in spirit, reduce your control over the information coming to you. As long as you can look at the filtered list of spam and decide, even if it's in one keystroke, that yes, this is indeed garbage, you're making the decision for yourself. This isn't a responsibility to cede to others.

In the body politic, the ultimate reaction to a threat is to pass a law against it, one that the majority believes will enhance their well-being, safety, or protection. State and federal legislators have jumped in with a growing list of anti-spam bills. Consider the "CAN-SPAM Act of 2003," introduced by Senators Burns, Wyden, Stevens, Breaux, Thomas, Landrieu, and Schumer, "to regulate interstate commerce by imposing limitations and penalties on the transmission of unsolicited commercial electronic mail via the Internet."

Not to be outdone, Congresswoman Zoe Lofgren (D-CA) announced on April 28 the "Restrict and Eliminate Delivery of Unsolicited Commercial E-mail (REDUCE) Spam Act." This Act requires that bulk unsolicited commercial e-mail incorporate in the subject line the tag [ADV:] to clearly distinguish it as advertising. But Congresswoman Lofgren raised the bar with an intriguing twist. The bill establishes a bounty for the first person to track down a spammer who violates the labeling or opt-out requirements.

Larry Lessig, the Stanford law professor who recently argued the Eldred v. Ashcroft case before the U.S. Supreme Court, has publicly committed to resigning from his position at the Stanford Law School if a spam bounty law is enacted and it doesn't practically reduce the amount of spam clogging the veins of the Internet.

I'm skeptical that legislative remedies can be of much help, and I certainly don't want to see Larry lose his job, even if I'm wrong and Lofgren's legislation works. Rather, I'm putting my money on the increasing sophistication of locally implemented content filtering tools. The collective creativity of technologists, both in the open source and commercial communities, is formidable. There are some very good tools already on the market. As an example, consider Spamnet by Cloudmark. Spamnet leverages the power of the user community while leaving the ultimate decision of what is or is not spam to you by routing all messages flagged as spam to your individual spam folder. The algorithms for identifying spam involve sophisticated message fingerprinting that is augmented by the Spamnet user community's own contributions, ranked by confidence in the member submitting the message for consideration as spam (see "Content Scanners").
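The general idea of weighting community reports by confidence in the reporter can be sketched as follows. This is emphatically not Cloudmark's algorithm, which is not described here; the trust scores, the threshold, and the function names are all assumptions made for illustration.

```python
def spam_confidence(reports, trust):
    """Trust-weighted share of reporters who flagged the message as spam.

    reports: {user: True if flagged as spam, False if flagged as legitimate}
    trust:   {user: weight between 0 and 1}; unknown users count for nothing
    """
    spam_weight = sum(trust.get(user, 0.0)
                      for user, is_spam in reports.items() if is_spam)
    total_weight = sum(trust.get(user, 0.0) for user in reports)
    return spam_weight / total_weight if total_weight else 0.0


def route(reports, trust, threshold=0.7):
    """Flagged mail goes to a spam folder, not the trash, so the user decides."""
    if spam_confidence(reports, trust) >= threshold:
        return "spam folder"
    return "inbox"


trust = {"alice": 0.9, "bob": 0.8, "mallory": 0.1}  # hypothetical trust scores

print(route({"alice": True, "bob": True, "mallory": False}, trust))  # spam folder
print(route({"mallory": True, "alice": False}, trust))               # inbox
```

Routing to a folder rather than deleting is the design choice that matters: the community supplies a recommendation, and the final keystroke stays with the recipient.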

Protect yourself from spam, but stay in control of the process—don't get seduced by tools that use RBLs. It's your e-mail, including the decision as to what is and is not spam!

Content Scanners


[Editor's note: Phil Long will moderate a panel on innovative learning spaces on Mon., July 28, at Syllabus2003.]

