Privacy on the Internet for IT Majors

Recently the Internet has seen tremendous growth, with the ranks of new users swelling at ever-increasing rates. This expansion has catapulted it from the realm of academic research towards newfound mainstream acceptance and increased social relevance for the everyday individual. Yet this suddenly increased reliance on the Internet has the potential to erode personal privacies we once took for granted.

New users of the Internet generally do not realize that every post they make to a newsgroup, every piece of email they send, every World Wide Web page they access, and every item they purchase online may be monitored or logged by some unseen third party. The impact on personal privacy is enormous; already we are seeing databases of many different kinds selling or giving away collections of personal data.

While the Internet brings the danger of diminished privacy, it also ushers in the potential for expanding privacy protection to areas where privacy was previously unheard of. This is our vision: restoration and revitalization of personal privacy for online activities, and betterment of society via privacy protection for fields where that was previously impossible. We want to bring privacy to the Internet, and bring the Internet to everyday privacy practices.
The purpose of this paper is not to present new results, but rather to encourage further research in the area of Internet privacy protection, and to give an overview (necessarily brief in a short paper such as this) of privacy-enhancing technologies. Section 2 explores some motivation for studying privacy issues on the Internet, and Section 3 provides some relevant background. We then discuss Internet privacy technology chronologically, in three parts: Section 4 describes the technology of yesterday, Section 5 explains today’s technology, and Section 6 explores the technology of tomorrow. Finally, we conclude in Section 7.

The threats to one’s privacy on the Internet are two-fold: your online actions could be (1) monitored by unauthorized parties and (2) logged and preserved for future access many years later. You might not realize that your personal information has been monitored, logged, and subsequently disclosed; those who would compromise your privacy have no incentive to warn you.

The threat of long-term storage and eventual disclosure of personal information is especially acute on the Internet. It is technically quite easy to collect information (such as a compendium of all posts you have made to electronic newsgroups) and store it for years or decades, indexed by your name for easy retrieval. If you are looking for a job twenty years from now, do you want your employer to browse through every Usenet posting you’ve ever made? If you are like most people, you have probably said something (however minor) in your past you would prefer to forget–perhaps an incautious word from your indiscreet youth, for instance. Long-term databases threaten your ability to choose what you would like to disclose about your past.
Furthermore, in recent years great advances have been made in technology to mine the Internet for interesting information. This makes it easy to find and extract personal information about you that you might not realize is available. (For instance, one of your family members might have listed information about you on their web page without your knowledge; Internet search engine technology would find this easily.) Did you know your phone number, email address, and street address are probably listed on the Web? Or that your social security number is available on any of several for-pay electronically-searchable databases? Most people probably do not want to make it easy for salesmen, telemarketers, an abusive ex, or a would-be stalker to find them.
In these ways, the Internet contributes to the “dossier effect”, whereby a single query can compile a huge dossier containing extensive information about you from many diverse sources. This increasingly becomes a threat as databases containing personal information become electronically cross-linked more widely. A recent trend is to make more databases accessible from the Internet; with today’s powerful search engine and information-mining technology, this is one of the ultimate forms of cross-linking. (For instance, phone directories, address information, credit reports, newspaper articles, and public-access government archives are all becoming available on the Internet.) The “dossier effect” is dangerous: when it is so easy to build a comprehensive profile of individuals, many will be tempted to take advantage of it, whether for financial gain, vicarious entertainment, illegitimate purposes, or other unauthorized use.

Government is one of the biggest consumers and producers of dossiers of personal information, and as such should be viewed as a potential threat to privacy. The problem is that today’s governments have many laws, surveillance agencies, and other tools for extracting private information from the populace [6]. Furthermore, a great many government employees have access to this valuable information, so there are bound to be some workers who will abuse it. There are many examples of small-scale abuses by officials: a 1992 investigation revealed that IRS employees at just one regional office made hundreds of unauthorized queries into taxpayer databases [2]; employees of the Social Security Administration have been known to sell confidential government records for bribes as small as $10 [22]; highly confidential state records of AIDS patients have leaked [3]. Finally, there is very little control or oversight, so a corrupt leader could easily misuse this information to seize and maintain power. A number of cautionary examples are available: FBI Director J. Edgar Hoover had his agency spy on political dissidents, activists, and opponents; the NSA, a secret military surveillance agency, has a long history of spying on domestic targets [5]; President Clinton’s Democratic administration found itself with unauthorized secret dossiers on hundreds of Republican opponents in the “Filegate” scandal.
Anonymity is one important form of privacy protection that is often useful.

We observe that anonymity is often used not for its own sake, but primarily as a means to an end, or as a tool to achieve personal privacy goals. For example, if your unlisted telephone number is available on the web, but can’t be linked to your identity because you have used anonymity tools, then this might be enough to fulfill your need for privacy just as effectively as if you had kept the phone number completely secret. Many applications of online anonymity follow the common theme of “physical security through anonymity”. For instance, political dissidents living in totalitarian regimes might publish an exposé anonymously on the Internet to avoid harassment (or worse!) by the secret police.
In contexts other than the Internet, anonymous social interaction is both commonplace and culturally accepted. For example, the Federalist papers were penned under the pseudonym Publius; many other well-known literary works, such as Tom Sawyer, Primary Colors, etc. were also written anonymously or under a pseudonym. Today, home HIV tests rely on anonymous lab testing; police tip lines provide anonymity to attract informants; journalists take great care to protect the anonymity of their confidential sources; and there is special legal protection and recognition for lawyers to represent anonymous clients. The US Postal Service accepts anonymous mail without prejudice; it is well-known that anonymous voice calls can be easily made by stepping into a payphone; and ordinary cash allows everyday people to purchase merchandise (say, a copy of Playboy) anonymously. In short, most non-Internet technology today grants the ordinary person access to anonymity. Outside of the Internet, anonymity is widely accepted and recognized as valuable in today’s society. Long ago we as a society reached a policy decision, which we have continually reaffirmed, that there are good reasons to protect and value anonymity off the Internet; that same reasoning applies to the Internet, and therefore we should endeavor to protect online anonymity as well.
There are many legitimate uses for anonymity on the Internet. In the long term, as people take activities they’d normally do offline to the Internet, they will expect a similar level of anonymity. In fact, in many cases, they won’t even be able to imagine the extensive use this data could be put to by those with the resources and incentive to mine the information in a less-than-casual way. We should protect the ordinary user rather than requiring them to anticipate the various ways their privacy could be compromised. Moreover, the nature of the Internet may even make it possible to exceed those expectations and bring anonymity to practices where it was previously nonexistent. In the short term, there are a number of situations where we can already see (or confidently predict) legitimate use of Internet anonymity: support groups (e.g. for rape survivors or recovering alcoholics), online tip lines, whistleblowing, political dissent, refereeing for academic conferences, and merely the pursuit of everyday privacy of a less noble and grand nature. As the New Yorker magazine explained in a famous cartoon, “On the Internet, nobody knows you’re a dog”[23]–and this is perhaps one of the greatest strengths of the Internet.

On the other hand, illicit use of anonymity is all too common on the Internet. Like most technologies, Internet anonymity techniques can be used for better or worse, so it should not be surprising to find some unfavorable uses of anonymity. For instance, sometimes anonymity tools are used to distribute copyrighted software without permission (“warez”). Email and Usenet spammers are learning to take advantage of anonymity techniques to distribute their marketing ploys widely without retribution. Denial of service and other malicious attacks are likely to become a greater problem when the Internet infrastructure allows wider support for anonymity. The threat of being tracked down and dealt with by social techniques currently acts as a partial deterrent to would-be intruders, but this would be eroded if they could use Internet tools to hide their identity. We have already seen one major denial of service attack [10] where the attacker obscured his IP source address to prevent tracing. Widespread availability of anonymity will mean that site administrators will have to rely more on first-line defenses and direct security measures rather than on the deterrent of tracing. Providers of anonymity services will also need to learn to prevent and manage abuse more effectively. These topics are discussed at greater length in later sections.

A few definitions are in order. Privacy refers to the ability of the individual to protect information about himself. Anonymity is privacy of identity. We can divide anonymity into two cases: persistent anonymity (or pseudonymity), where the user maintains a persistent online persona (“nym”) which is not connected with the user’s physical identity (“true name”), and one-time anonymity, where an online persona lasts for just one use. The key concept here is that of linkability: with a nym, one may send a number of messages that are all linked together but cannot be linked to the sender’s true name; by using one-time anonymity for each message, none of the messages can be linked to each other or to the user’s physical identity. Forward secrecy refers to the inability of an adversary to recover security-critical information (such as the true name of the sender of a controversial message) “after the fact” (e.g. after the message is sent); providers of anonymity services should take care to provide forward secrecy, which entails (for instance) keeping no logs.

Some of the more obvious uses of persistent anonymity are in “message-oriented” services, such as email and newsgroup postings. Here, the two major problems to be solved are those of sender-anonymity, where the originator of a message wishes to keep his identity private, and of recipient-anonymity, where we wish to enable replies to a persistent persona.
In contrast to “message-oriented” services, we have “online” services. In these services, which include the World-Wide Web, online chat rooms, phones, videoconferences, and most instances of electronic commerce, we wish to enable two parties to communicate in real time, while allowing one or both of them to maintain their anonymity. The added challenges for online services stem from the increased difficulty involved in sending low-latency information without revealing identity via timing coincidences; to support these online services, we want to erect a general-purpose low-level infrastructure for anonymous Internet communications. In addition, certain specific applications, such as private electronic commerce, require sophisticated application-level solutions.

In past years email was the most important distributed application, so it should not be surprising that early efforts at bringing privacy to the Internet primarily concentrated on email protection. Today the lessons learned from email privacy provide a foundation of practical experience that is critically relevant to the design of new privacy-enhancing technologies.
The most primitive way to send email anonymously involves sending the message to a trusted friend, who deletes the identifying headers and resends the message body under his identity. Another old technique for anonymous email takes advantage of the lack of authentication for email headers: one connects to a mail server and forges fake headers (with falsified identity information) attached to the message body. (Both approaches could also be used for anonymous posting to newsgroups.) Of course, these techniques don’t scale well, and they offer only very minimal assurance of protection.
The technology for email anonymity took a step forward with the introduction of anonymous remailers. An anonymous remailer can be thought of as a mail server which combines the previous two techniques, but using a computer to automate the header-stripping and resending process [4, 16, 17, 24]. There are basically three styles of remailers; we classify remailer technology into “types” which indicate the level of sophistication and security.
The type 0 remailer was perhaps the most famous. It supported anonymous email senders by stripping identifying headers from outbound remailed messages. It also supported recipient anonymity: the user was assigned a random pseudonym at the remailer, the remailer maintained a secret identity table matching up the user’s real email address with his nym, and incoming email to the nym was retransmitted to the user’s real email address. Thanks to its simple design and user interface, it was the most widely used remailer; sadly, it was shut down recently under legal pressure [18].
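The two functions of a type 0 remailer described above, header stripping and a pseudonym table, can be sketched in a few lines of Python. This is only an illustrative model, not the code of any real remailer: the class name, the `nym.example` domain, and the pseudonym format are all assumptions made for the example.

```python
import secrets
from email.message import EmailMessage

# Hypothetical domain for illustration; not the address of any real service.
NYM_DOMAIN = "nym.example"


class Type0Remailer:
    """Toy model of a pseudonymous remailer with a secret identity table."""

    def __init__(self):
        self.nym_to_real = {}  # the secret identity table
        self.real_to_nym = {}

    def nym_for(self, real_addr):
        # Assign a random, unused pseudonym the first time a user is seen.
        if real_addr not in self.real_to_nym:
            nym = None
            while nym is None or nym in self.nym_to_real:
                nym = f"an{secrets.randbelow(10**6):06d}@{NYM_DOMAIN}"
            self.real_to_nym[real_addr] = nym
            self.nym_to_real[nym] = real_addr
        return self.real_to_nym[real_addr]

    def _copy(self, msg, sender, recipient):
        # Build a fresh message: only the subject and body are carried over,
        # so every other header (Received, Message-ID, ...) is dropped.
        out = EmailMessage()
        out["From"] = sender
        out["To"] = recipient
        out["Subject"] = str(msg["Subject"] or "")
        out.set_content(msg.get_content())
        return out

    def send_out(self, msg, recipient):
        # Outbound mail: replace the sender with his pseudonym.
        return self._copy(msg, self.nym_for(str(msg["From"])), recipient)

    def deliver_in(self, msg):
        # Inbound mail to a nym is retransmitted to the real address.
        real_addr = self.nym_to_real[str(msg["To"])]
        return self._copy(msg, str(msg["From"]), real_addr)
```

Note that everything needed to unmask a user sits in the `nym_to_real` dictionary on a single machine.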

The disadvantage of a type 0 remailer is that it provides rather weak security. Users must trust it not to reveal their identity when they send email through it. Worse still, pseudonymous users must rely on the confidentiality of the secret identity table (their anonymity would be compromised if it were disclosed, subpoenaed, or bought), and they must rely on the security of the site to resist intruders who would steal the identity table. Furthermore, more powerful attackers who could eavesdrop on Internet traffic traversing the site could match up incoming and outgoing messages to learn the identity of the nyms.

Cypherpunk-style (type I) remailers were designed to address these types of threats. First of all, support for pseudonyms is dropped; no secret identity table is maintained, and remailer operators take great care to avoid keeping mail logs that might identify their users. This diminishes the risk of “after-the-fact” tracing. Second, type I remailers will accept encrypted email, decrypt it, and remail the resulting message. (This prevents the simple eavesdropping attack where the adversary matches up incoming and outgoing messages.) Third, they take advantage of chaining to achieve more robust security. Chaining is simply the technique of sending a message through several anonymous remailers, so that the second remailer sees only the address of the first remailer and not the address of the originator, etc. Typically one combines chaining with encryption: the originator encrypts repeatedly, nesting once for each remailer in the chain; the advantage is that every remailer in a chain must be compromised before a chained message can be traced back to its sender. This allows us to take advantage of a distributed collection of remailers; diversity gives one a better assurance that at least some of the remailers are trustworthy, and chaining ensures that one honest remailer (even if we don’t know which it is) is all we need. Type I remailers can also randomly reorder outgoing messages to prevent correlations of ciphertexts by an eavesdropper. In short, type I remailers offer greatly improved security over type 0, though they do have some limitations which we will discuss next.
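Chaining with nested encryption, as described above, can be illustrated with a small Python sketch. Two loud assumptions: real remailers encrypt each layer to the remailer's public key, whereas this sketch substitutes a toy symmetric cipher (an XOR keystream built from SHA-256) that is not secure and exists only to keep the example self-contained; and the function names and the JSON layer format are invented for illustration.

```python
import hashlib
import json
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    NOT secure. A stand-in for the public-key encryption a real remailer
    would use. Applying it twice with the same key restores the input.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message: str, recipient: str, chain, keys) -> bytes:
    """Encrypt once per remailer, innermost layer for the last hop.

    chain: remailer names in forwarding order; keys: name -> key.
    The returned packet is handed to chain[0].
    """
    next_hop, payload = recipient, message.encode()
    for name in reversed(chain):
        layer = json.dumps({"next": next_hop, "body": payload.hex()}).encode()
        payload = toy_cipher(keys[name], layer)
        next_hop = name
    return payload


def remail(name: str, packet: bytes, keys):
    """One remailer's step: strip its own layer, learn only the next hop."""
    layer = json.loads(toy_cipher(keys[name], packet))
    return layer["next"], bytes.fromhex(layer["body"])


# Usage: three hypothetical remailers A, B, C.
keys = {name: secrets.token_bytes(16) for name in ("A", "B", "C")}
packet = wrap("hello", "bob@example.org", ["A", "B", "C"], keys)
hop = "A"
while hop in keys:
    hop, packet = remail(hop, packet, keys)
# hop is now the final recipient and packet the plaintext bytes.
```

Each remailer sees only the hop before it and the hop after it, which is why tracing a message requires every remailer in the chain to be compromised.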

The newest and most sophisticated remailer technology is the Mixmaster, or type II, remailer [7, 11]. Type II remailers extend the techniques used in a type I remailer to provide enhanced protection against eavesdropping attacks. First, one always uses chaining and encryption at each link of the chain. Second, type II remailers use constant-length messages, to prevent passive correlation attacks where the eavesdropper matches up incoming and outgoing messages by size. Third, type II remailers include defenses against sophisticated replay attacks. Finally, these remailers offer improved message reordering code to stop passive correlation attacks based on timing coincidences. Because their security against eavesdropping relies on “safety in numbers” (where the target message cannot be distinguished from any of the other messages in the remailer net), the architecture also calls for continuously-generated random cover traffic to hide the real messages among the random noise.
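Three of these defenses (constant-length messages, replay rejection, and reordering) can be sketched together in Python. This is a simplified model rather than the actual Mixmaster packet format or pool algorithm: the 1024-byte packet size, the two-byte length prefix, the hash-based replay cache, and the fixed-size batch are all assumptions chosen for brevity. Encryption and cover traffic are omitted.

```python
import hashlib
import os
import random

PACKET_SIZE = 1024  # assumed constant packet length, for illustration only


def pad(payload: bytes) -> bytes:
    """Pad to PACKET_SIZE: 2-byte length prefix, payload, random filler."""
    if len(payload) > PACKET_SIZE - 2:
        raise ValueError("payload too large for one packet")
    filler = os.urandom(PACKET_SIZE - 2 - len(payload))
    return len(payload).to_bytes(2, "big") + payload + filler


def unpad(packet: bytes) -> bytes:
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]


class MixPool:
    """Collects packets, drops replays, and releases shuffled batches."""

    def __init__(self, pool_size=3):
        self.pool_size = pool_size
        self.pool = []
        self.seen = set()  # digests of every packet already accepted
        self.rng = random.SystemRandom()

    def accept(self, packet: bytes):
        """Return a shuffled batch once the pool is full, else an empty list."""
        if len(packet) != PACKET_SIZE:
            return []  # wrong size: would stand out to an eavesdropper
        digest = hashlib.sha256(packet).digest()
        if digest in self.seen:
            return []  # replayed packet: silently dropped
        self.seen.add(digest)
        self.pool.append(packet)
        if len(self.pool) < self.pool_size:
            return []
        self.rng.shuffle(self.pool)  # break the arrival order
        batch, self.pool = self.pool, []
        return batch
```

Because every packet leaving the pool has the same length and the batch order is random, an observer cannot pair inputs with outputs by size or by arrival order within a batch.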
Another new technology is that of the “newnym”-style nymservers. These nymservers are essentially a melding of the recipient anonymity features of a type 0 remailer with the chaining, encryption, and other security features of a cypherpunk-style remailer: a user obtains a pseudonym from a nymserver; mail to that pseudonym will be delivered to him. However, unlike a type 0 remailer, where the operator maintained a list matching pseudonyms to real email addresses, newnym-style nymservers only match pseudonyms to “reply blocks”: the nymserver operator does not have the real email address of the user, but rather the address of some type I remailer, and an encrypted block of data which it sends to that remailer. When decrypted, that block contains the address of a second remailer, and more encrypted data, etc. Eventually, when some remailer decrypts the block it receives, it gets the real email address of the user. The effect is that all of the remailers mentioned in the reply block would have to collude or be compromised in order to determine the email address associated with a newnym-style pseudonym.
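The reply-block construction can be sketched in Python as follows. As a loudly stated assumption, the sketch replaces the public-key encryption a real remailer would use with a toy, insecure XOR keystream cipher so that it runs with the standard library alone; the function names and the JSON layout are hypothetical, and the handling of the message body is left out.

```python
import hashlib
import json
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure XOR keystream; stands in for real public-key encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_reply_block(real_addr: str, chain, keys):
    """Built by the user. Returns (first_hop, block).

    The nymserver stores only this pair next to the pseudonym; the real
    address is buried under one layer of encryption per remailer.
    """
    next_hop, block = real_addr, b""
    for name in reversed(chain):
        layer = json.dumps({"next": next_hop, "block": block.hex()}).encode()
        block = toy_cipher(keys[name], layer)
        next_hop = name
    return next_hop, block


def peel(name: str, block: bytes, keys):
    """One remailer decrypts its layer: next address plus remaining block."""
    layer = json.loads(toy_cipher(keys[name], block))
    return layer["next"], bytes.fromhex(layer["block"])


# Usage: the user picks two hypothetical remailers, R1 then R2.
keys = {name: secrets.token_bytes(16) for name in ("R1", "R2")}
first_hop, block = make_reply_block("alice@home.example", ["R1", "R2"], keys)
hop = first_hop
while hop in keys:
    hop, block = peel(hop, block, keys)
# Only after the last remailer peels its layer does the real address appear.
```

In this sketch the nymserver's table holds `("R1", block)` rather than an email address, so seizing the table alone reveals nothing beyond the first remailer in the chain.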
Another simple technique for recipient anonymity uses message pools. Senders encrypt their message with the recipient’s public key and send the encrypted message to a mailing list or newsgroup (such as alt.anonymous.messages, set up specifically for this purpose) that receives a great deal of other traffic. The recipient is identified only as