The Data Collection Process



   There are several reliable ways to collect data on you or any individual computer user. One is by directly monitoring your Internet account; this is what Carnivore does, and some British Internet Service Providers are reported to do this too. Another is by cookie. We'll talk about cookies first.

   Cookies should be familiar to many computer users. In case you haven't heard, they are small text files containing information on what sites you visit and a unique ID number, and they may contain any other information the website designer chooses to include, such as how long you were at the site, whether you made any purchases, where else you went, your name, credit card number, etc. When you visit most popular websites, or click on any ad, a cookie is generated and set on your computer. It can either be stored on your computer's hard drive (a persistent cookie) or be temporary and erased once you shut down your browser (a per-session cookie). In either case, the website can also store the same information on its own servers, and keep or reveal it.
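   The difference between the two kinds of cookie comes down to whether an expiry is attached. Here is a minimal sketch using Python's standard http.cookies module; the cookie names "session_id" and "visitor_id" and the one-year lifetime are our own assumptions, chosen only for illustration.

```python
from http.cookies import SimpleCookie


def build_cookie_headers(visitor_id):
    """Return two Set-Cookie header values carrying the same hypothetical ID."""
    jar = SimpleCookie()

    # Per-session cookie: no expiry attribute, so the browser drops it on exit.
    jar["session_id"] = visitor_id
    jar["session_id"]["path"] = "/"

    # Persistent cookie: Max-Age asks the browser to keep it (one year here).
    jar["visitor_id"] = visitor_id
    jar["visitor_id"]["path"] = "/"
    jar["visitor_id"]["max-age"] = 365 * 24 * 60 * 60

    return [morsel.OutputString() for morsel in jar.values()]


headers = build_cookie_headers("12345")
for header in headers:
    print("Set-Cookie:", header)
```

   The only thing that separates the two headers is the Max-Age attribute; everything else about how the ID travels is the same.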
   Data collection works by association: that is, there must be a unique identifying number assigned to you. This is necessary in order for any data collection to generate reliable and useful data. If the number were different every time, the website company and its advertisers would not know you've been there before because last time you may have had a different identifying number.
   Each website uses a slightly different mechanism for generating this number, but it is typically generated by the site and handed to your browser in response to the requests it makes. The browser, in turn, stores the number under the account in use at the time (some browsers, like Netscape 4 and above, allow multiple "accounts," and each will have its own unique identifier).
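   As a rough sketch of why the number has to stay the same between visits, here is one way a site could hand out an ID on the first visit and recognize it afterward. The cookie name "visitor_id" and the use of a random UUID are assumptions for the example, not a description of how any particular site does it.

```python
import uuid
from http.cookies import SimpleCookie


def get_or_assign_id(cookie_header):
    """Return (visitor_id, is_new) for the Cookie header a browser sent."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning visitor: reuse the number already stored in the cookie.
        return jar["visitor_id"].value, False
    # First visit (or cookies were deleted): make up a fresh random number.
    return uuid.uuid4().hex, True


first_id, is_new = get_or_assign_id("")
again_id, is_new_again = get_or_assign_id("visitor_id=" + first_id)
print(is_new, is_new_again, first_id == again_id)
```

   If the browser sends no cookie back, the site has no way to tell a returning visitor from a new one, which is exactly the problem described above.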
   We'll illustrate this by example. To avoid any liability from naming names, let us assume that our website is a big shopping site that offers a search engine, email, chat rooms, and all sorts of goodies. Now, you make your first visit to this website, and we set a cookie. (By the way, we DO NOT do any of this, nor do we set any cookies except as a test later on, and then only with your permission!) Anyway, this is what the cookie might look like:

security.com/      12345

   That means that you visited www.security.com/ (the '/' means you have only been to the main page so far) and you have been assigned a unique identifying number of 12345. It is this number that can be used to identify you as you make your way through our fictitious site. Likewise, the fictitious site's advertising partners, and anyone else who uses the same number-generation mechanism this site does, will know your 'identity' and where you've been.
   Now, let's say you click on a shopping link. A cookie is set reflecting that you are interested in doing some shopping. Your cookie file might now look like this:

security.com/      12345
security.com/shopping      12345

   Now, we have two pieces of information. These two places can be associated with each other because they have the same unique identifier.
   You click on a few links and spend a while browsing ski equipment and cookbooks, although you don't make any purchases. However, our fictitious website now knows some of your interests. Our cookie may keep that information on your computer, and our fictitious website can store that information too and sell it to marketers. Your cookies so far might look like this:

security.com/      12345
security.com/shopping      12345
security.com/shopping/SportingGoods/Ski      12345
security.com/shopping/Cooking      12345

   Now, we have four pieces of information on you, linked together because we have that unique number, which allows them to be associated.
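   The association step itself is simple. Here is a small sketch that groups page visits by their unique number; the list of visits mirrors the cookie lines above, and the second visitor (99999) is made up purely for contrast.

```python
from collections import defaultdict

# Hypothetical visit log: (page, visitor_id) pairs like the cookie lines above.
visits = [
    ("security.com/", "12345"),
    ("security.com/shopping", "12345"),
    ("security.com/shopping/SportingGoods/Ski", "12345"),
    ("security.com/shopping/Cooking", "12345"),
    ("security.com/", "99999"),  # a different visitor, for contrast
]


def build_profiles(log):
    """Group every visited page under the ID that visited it."""
    profiles = defaultdict(list)
    for page, visitor_id in log:
        profiles[visitor_id].append(page)
    return dict(profiles)


profiles = build_profiles(visits)
for visitor_id, pages in profiles.items():
    print(visitor_id, "->", pages)
```

   Without the shared number, the four lines would just be four unrelated page views.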
   Next, let's say you click on an ad for a Ford Explorer. The ad is handled by an outside company, though they are a partner. Now, they know what you are interested in. Your cookies might now look like this:

security.com/      12345
security.com/shopping      12345
security.com/shopping/SportingGoods/Ski      12345
security.com/shopping/Cooking      12345
ouradvertiser.com      67890      security.com/      ford.com/explorer
ford.com/explorer      abcde

   Our advertiser, which just so happens to run a website called "ouradvertiser.com," was visited by you when you clicked on the ad. Even if the ad seemed to send you directly to Ford's website, you passed through the advertiser's website, and they know that whoever 67890 is (you) might be interested in a Ford Explorer. In the process, they assigned you a unique number, 67890, in case you ever click on one of their ads again. They were also given the referring site (our fictitious site) so they would know where you came in from (and also so our fictitious site could get paid for the ad). Now you've seen an example of referring sites; we'll cover that in more detail later. They then sent you on to Ford's website, which assigned you another cookie with its own unique identifier, abcde.
   If all this seems mind-boggling, don't worry. It can be for advertisers, too. So far, though, three different websites know you only as a number; none of them has your name, address, or any other information yet, so the data collected so far is pretty useless aside from allowing the website operators to figure out what is popular.
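   To make the pass-through concrete, here is a sketch of what the advertiser's click handler might record before sending you on. The URL layout (a "dest" query parameter), the function name, and the record fields are all hypothetical; real ad servers differ.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlparse


@dataclass
class AdClick:
    visitor_id: str   # the advertiser's own number for you, e.g. 67890
    referrer: str     # the site the click came from
    destination: str  # where the ad finally sends you


def handle_ad_click(click_url, referer_header, visitor_id):
    """Record a click and return (log entry, URL to redirect the browser to)."""
    query = parse_qs(urlparse(click_url).query)
    destination = query["dest"][0]
    entry = AdClick(visitor_id, referer_header, destination)
    return entry, destination


entry, redirect_to = handle_ad_click(
    "http://ouradvertiser.com/click?dest=http://ford.com/explorer",
    "http://security.com/",
    "67890",
)
print(entry)
print("redirect to:", redirect_to)
```

   The one log entry holds the same three facts as the "ouradvertiser.com" cookie line: the number, the referring site, and the destination.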

   That's about to change though.

How Can They Figure Out Who I Am?

   So far, our fictitious site knows a few of your interests, but knows you only as the number 12345. But let's say you return to our site sometime later and want to sign up for an email account. You fill out a form that asks what online name you want, your real name, your income, perhaps your address, and some other information. Now this new information can be associated with your unique identifier, so our website knows who you are, and maybe where you live and what you make. This may or may not be stored in your cookie, although it is stored in our ad servers. If it were stored in your cookie, it would look something like this:

security.com/      12345
security.com/shopping      12345
security.com/shopping/SportingGoods/Ski      12345
security.com/shopping/Cooking      12345
ouradvertiser.com      67890      security.com/      ford.com/explorer
ford.com/explorer      abcde
security.com/email      12345      Jane Doe 123 Winding Way Anywhere AZ 22222 $100000+ AZperson44 password=mypassword

   Now, we know a lot about you. We know your name is Jane Doe, you live at 123 Winding Way, Anywhere, Arizona, 22222, and you make over $100,000 a year. We also know your screen name, AZperson44@security.com. If you elected to store your account name and password on your computer (many web-based email services ask you if you want to), we also have your email password in our cookie. Now, anywhere you go on our fictitious site, any links you click, any advertisers you visit, and, possibly, anybody you email under your screen name can be associated with you.
   Specifically, we can also figure out from the rest of the cookie, as well as the data already stored on our site, that Jane Doe is interested in cooking and skiing, and might be interested in buying a Ford Explorer. Since Jane Doe and her household make a lot of money, she is a prime target for advertisers.
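   Here is a sketch of how the sign-up form gets tied to everything already collected. The two dictionaries stand in for whatever the fictitious site keeps on its servers; their names and layout are assumptions made for this example.

```python
# Hypothetical server-side records, keyed by the same number the cookie carries.
browsing_history = {
    "12345": [
        "security.com/",
        "security.com/shopping",
        "security.com/shopping/SportingGoods/Ski",
        "security.com/shopping/Cooking",
    ],
}
registrations = {}


def register(visitor_id, form):
    """Store the sign-up form under the visitor's existing number."""
    registrations[visitor_id] = dict(form)


def full_profile(visitor_id):
    """Combine who the visitor says they are with where that number has been."""
    return {
        "identity": registrations.get(visitor_id),
        "pages": browsing_history.get(visitor_id, []),
    }


register("12345", {"name": "Jane Doe", "state": "AZ", "income": "$100000+"})
profile = full_profile("12345")
print(profile["identity"]["name"], "visited", len(profile["pages"]), "pages")
```

   Nothing new had to be learned about the earlier visits; filling in the form simply put a name on a history that was already there.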

Data Reliability

   Now, it's possible that you could have given false information: many, if not most, people do that when signing up for services like web-based email. This is where the concept of data reliability comes into play.
   In order for data to be useful, it has to be reliable. Otherwise, advertisers and marketers would be wasting their money trying to target people who don't exist or who aren't honestly interested in the products they are trying to sell. There are ways, however, to determine how reliable the data in your cookies and the profile our fictitious website is building on you are. This is critical to security and privacy, and this is why we say that it's never too late to protect either. Here's how:
   First, we know off the bat that at least half of people give false information when signing up for an email account. So, we know that there is a 50-50 chance that your name and other personal info are accurate. We'll assign that a percentage, 50%. Now, that can be modified based on a few factors. First, since you apparently had no stored cookies when you first visited our site (at least not from us or our partners), we can guess that you might be a new user. We know people who are new to the Internet are extremely unlikely to be aware that they can give a fake name or that it might be good to protect their privacy. We also know when you first visited our website. (That was stored by our fictitious website. It also could have been stored in a cookie, although that is not illustrated in the above examples.) Since you appear to be a novice, we can assume that you didn't know better and gave accurate information. Now, we can adjust our reliability upward a certain amount, let's say to 90%; we can now assume that the info we have on you is 90% reliable.
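   The reasoning above amounts to a very small rule. Here is a sketch of it, using only the two figures from the example (50% as the starting point, 90% for an apparent novice); the function name and the single yes/no input are our own simplification, and a real scoring scheme would weigh many more factors.

```python
def estimate_reliability(had_prior_cookies):
    """Guess how likely the sign-up details are to be accurate (0.0 to 1.0)."""
    baseline = 0.50         # about half of sign-ups are assumed to be false
    novice_estimate = 0.90  # apparent newcomers are assumed to be mostly honest

    if had_prior_cookies:
        # Nothing suggests a novice, so stay at the starting figure.
        return baseline
    # No cookies from us or our partners: treat the visitor as a likely novice.
    return novice_estimate


print(estimate_reliability(had_prior_cookies=False))
print(estimate_reliability(had_prior_cookies=True))
```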
   Could this be wrong? Certainly. First of all, you could have deleted all your cookies off your hard drive. Some websites keep track of their users solely by those cookies; they don't store the actual places they visit on their own website, just the identity and unique ID number. You could have switched computers or reinstalled your operating system and/or browser, which means that you will get a different unique number assigned to you by our website. You could be using a different Internet Service Provider account, screen name, or another browser account, which can also change the unique identifier.
(IMPORTANT NOTE: Many experienced Internet users who want to protect their security and/or privacy use fake names or handles all the time. When you first buy a computer or reinstall the operating system, you will usually have to "register" the operating system. Many use fake names or handles, and fake addresses; there aren't many checks anyway. They then use this phony name to reinstall their software, sign up for new web-based email accounts, etc. This may sound nefarious, but it is actually an excellent and absolutely crucial way to protect yourself online, because it destroys the unique identifying numbers that advertisers and websites use to track you.)
   Is it possible that you are not just a clueless novice? Sure. There are a few ways we can check. First, we can see if there is another Jane Doe of 123 Winding Way, Anywhere, Arizona, 22222 in our database. Perhaps you signed up for another email address sometime in the past. If so, we now know not only that you are more experienced with the Internet, but we may also know what sites and services you have visited before, because we can associate that past information with present info.
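   Here is a sketch of that database check: look for earlier sign-ups with the same name and address but a different unique number. The record layout, the earlier number 54321, and the simple matching rule (ignore case, commas, and extra spaces) are assumptions for the example.

```python
def normalize(name, address):
    """Reduce a name and address to a form that ignores case and punctuation."""
    clean_name = " ".join(name.lower().split())
    clean_address = " ".join(address.lower().replace(",", " ").split())
    return clean_name, clean_address


def find_earlier_ids(database, name, address, current_id):
    """Return other unique numbers already registered to the same person."""
    key = normalize(name, address)
    return [
        record["visitor_id"]
        for record in database
        if normalize(record["name"], record["address"]) == key
        and record["visitor_id"] != current_id
    ]


# Hypothetical sign-up records already on file.
database = [
    {"visitor_id": "54321", "name": "Jane Doe",
     "address": "123 winding way anywhere arizona 22222"},
    {"visitor_id": "12345", "name": "Jane Doe",
     "address": "123 Winding Way, Anywhere, Arizona, 22222"},
    {"visitor_id": "77777", "name": "John Roe",
     "address": "9 Elm St, Elsewhere, Arizona, 22223"},
]

earlier = find_earlier_ids(
    database, "Jane Doe", "123 Winding Way, Anywhere, Arizona, 22222", "12345"
)
print(earlier)
```

   Any number that comes back can be used to pull up the older browsing history and attach it to the present one.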
More to come...