Spam traffic has increased tremendously in recent years [2]. Spam messages clutter users' inboxes, consume network resources, contribute to DDoS attacks, and spread worms and viruses. Our goal is to present a clear picture of the characteristics of spam and spammers. Because spammers change their mode of operation to counter anti-spam technology, continuous evaluation of the characteristics of spam and of spammers' technology has become mandatory. These evaluations help us enhance existing technology to combat spam effectively. We collected 400 thousand spam mails from a spam trap set up in a corporate mail server over a period of 14 months, from January 2006 to February 2007. Spammers use common techniques to spam end users regardless of whether the target is a corporate server or a public mail server, so we believe that our spam collection is a sample of worldwide spam traffic. Studying the characteristics of this sample helps us better understand the features of spam and of spammers' technology. We believe that this analysis could be useful for developing more efficient anti-spam techniques.
1. Introduction
Email has emerged as an important means of communication for millions of people worldwide because of its convenience and cost effectiveness [6]. Email lets users send a low-cost message to a large number of people by simply clicking the send button. Email message sizes range from 1 KB to multiple megabytes, which is much larger than what fax and other communication devices carry. Byproducts of email such as instant messaging and chat make life easier and add more sophisticated facilities for Internet users. According to a Radicati Group [18] study from the first quarter of 2006, there were about 1.1 billion email users worldwide. Traditionally, Internet penetration has been very high in the USA and Europe, but with the recent rise of Asian powerhouses such as China and India, the number of email users has increased tremendously [15]. Spam has now become a serious problem for the Internet community [8]. Spam is defined as unsolicited, unwanted mail that endangers the very existence of the email system with massive and uncontrollable amounts of messages [4]. Spam brings worms, viruses, and unwanted data to the user's mailbox. Spammers are different from hackers: they are well-organized business people or organizations that want to make money. DDoS attacks, spyware installations, and worms make up a non-negligible portion of spam traffic. According to [5], most spam originates from the USA, South Korea, and China, in that order, and nearly 80% of all spam is received from mail relays [5]. Our aim is to present clear characteristics of spam and spam senders. We set up a spam trap in our mail server and collected spam for 14 months, from January 2006 to February 2007.
We used this data to characterize spam and its senders. We conducted several standard spam tests to separate spam from incoming mail traffic. These standard tests include various source filters and content filters. The filter tests include a Bayesian filter, DNSBL, SURBL, SPF, greylisting, and rDNS checks, among others. Learning is enabled in the content filters, and the dictionary size is 50,000 words. At our organization we strictly enforce mail policies to avoid spam mails, and users are well instructed on how to use the mail service for effective communication.
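As an illustration of one of the source filters listed above, a DNSBL check is conventionally performed by reversing the octets of the sender's IPv4 address and looking up the resulting name under the blacklist's DNS zone. The following is a minimal sketch of the name construction step only. The function name dnsbl_query_name and the zone dnsbl.example.org are hypothetical and are not taken from the server configuration used in this study.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 address.

    Hypothetical helper for illustration; `zone` is whichever blacklist
    zone the mail server is configured to consult.
    """
    # Validate the input; raises ValueError if it is not a valid IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # DNSBL convention: reverse the four octets, then append the zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Example with a documentation-range address and a placeholder zone.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A mail server would then resolve this name through DNS; an answer conventionally indicates that the address is listed, while a non-existent-domain response indicates that it is not.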
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes our collection of legitimate and spam mail data. In Section 4, we describe our classification and the characteristics of spam traffic. Section 5 provides details of spammers and their technology. We conclude in Section 6.
2. Related work
The authors of [1] propose a novel approach to defend against DDoS attacks caused by spam mails. Their study shows the effectiveness of SURBL, DNSBLs, and content filters, and presents comprehensive characteristics of the viruses, worms, and trojans that accompany spam as attachments. Their approach combines fine-tuning of source filters and content filters, strict enforcement of mail policies, user education, network monitoring, and logical solutions to the ongoing attack. The study in [3] examines the use of DNS blacklists. Its authors examined seven popular DNSBLs and found that 80% of spam sources are listed in some DNSBL. The authors of [4] present a comprehensive study of the clustering behavior of spammers and of group-based anti-spam strategies. Their study shows that spammers exhibit clustering structures, and they propose a group-based anti-spam framework to block organized spammers. The authors of [5] present the network-level behavior of spammers. They analyzed spammers' IP address ranges and the modes and characteristics of botnets. Their study reveals that blacklists were remarkably ineffective at detecting spamming relays, and states that the Internet routing structure should be secured in order to trace senders. The author of [6] presents a comprehensive study of spam and spammers' technology. His study reveals that fewer work email accounts suffer from spam than private email accounts. In [8], Gomez, Crsitino present an extensive study of the characteristics of spam traffic in terms of the email arrival process, the size distribution, and the distributions of popularity and temporal locality of email recipients, among others, compared with legitimate mail traffic. Their study reveals major differences between spam and non-spam mails.
3. Data Collection
Our characterization of spam is based on a 14-month collection of over 400,000 spam mails from a corporate mail server. The server provides service to 200 users, with 20 group email IDs and 200 individual mail accounts. The Internet connection speed is 100 Mbps for the LAN, with 20 Mbps upload and download speed. (Due to security and privacy concerns, we are not able to disclose the real domain name.) To segregate spam from legitimate mail, we ran standard spam detection tests on our server. The spam mails detected by these techniques were directed to the spam trap in the mail server.
Spammers do not change their tactics on a day-to-day basis. Our study shows that spammers keep using the same technology until anti-spam efforts find an efficient way to keep them at bay. This period ranges from 8 months to 1 year. We found that the major spammers followed the same technology from May 2006 to February 2007. Figure 1 shows the incoming mail traffic of our mail server for 2 weeks. The figure shows that spam traffic is not related to legitimate mail traffic. Legitimate mail traffic is two-way traffic induced by social networks [8], whereas spam traffic is one-way. The figure also shows that the server handles more spam than legitimate mail.
Figure 1 shows the number of legitimate mails, spam mails, and spam mails with a virus attachment for the 2-week period from February 1 to February 15. The x-axis is the day and the y-axis is the number of spam mails received by the spam trap on the server. Roughly, the number of legitimate mails ranges from 720 to 7253, with an average of 906 per day. The number of spam mails ranges from 1701 to 8615, with an average of 4736 per day. The number of spam mails with a virus attachment ranges from 209 to 541, with an average of 403 per day.
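The per-category range and daily average reported above are the kind of summary that can be derived from a list of per-day counts. The following is a minimal sketch of that computation. The function name summarize and the counts in example_counts are hypothetical placeholders for illustration only and are not the measured data from the spam trap.

```python
from statistics import mean


def summarize(daily_counts):
    """Return (minimum, maximum, mean) for a list of per-day mail counts."""
    return min(daily_counts), max(daily_counts), mean(daily_counts)


# Hypothetical per-day counts, for illustration only (not the study's data).
example_counts = [1200, 3400, 5600, 4800]
low, high, avg = summarize(example_counts)
print(f"range {low} to {high}, average {avg} per day")
```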
Note: Full paper is available in pdf format