Intrusion Detection System in Linux Kernel
Background
Network security has become a major concern. As Internet technologies expand rapidly, people are increasingly concerned about the security of their networks. The Internet is a stunning technological advance that provides access to information, and the ability to publish information, in revolutionary ways; it is also a major danger, providing the ability to pollute and destroy information. People therefore try to protect networks and their services from "unauthorized modification, destruction or disclosure," and to provide "assurance that the network performs its critical functions correctly." (Quotation from web definitions; reference to be added later.) The objective of network security is to provide data confidentiality and integrity.
Implementing a firewall is currently a popular way to protect a system or network against attacks. A firewall filters network traffic by specific IP addresses and port numbers. However, firewalls lack some degree of intelligence when it comes to observing, recognizing, and identifying attack signatures that may be present in the traffic they monitor and the log files they collect. This is why firewalls are often combined with an intrusion detection system.
An intrusion detection system (IDS) observes network traffic to identify threats by "detecting scans, probes and attacks." [2] There are two types of IDS: signature based and anomaly based. A signature-based IDS is pre-installed with information about well-known vulnerabilities in the system; any attempt to exploit them triggers an alarm. [1] In contrast, an anomaly-based IDS notices attempts that differ from normal behavior, the details of which are derived from reference information. [1]
(The advantages of implementing an IDS in the kernel may be added here as well.)
Objective
Both types of IDS help a system handle threats appropriately, so the system can be protected against various attacks. [2] My research project will investigate anomaly-based intrusion detection in depth. There are two common anomaly detectors whose combination provides security across all layers of the network: PHAD (Packet Header Anomaly Detector) detects anomalies in Ethernet, IP, TCP, UDP, and ICMP packet headers, while ALAD (Application Layer Anomaly Detector) detects anomalies at the application layer. [6]
In this research, I will analyze PHAD and ALAD. I will then design a new IDS kernel module for an open-source firewall (iptables) and implement the relevant PHAD and ALAD functionality in iptables. Once the kernel module has been compiled and runs successfully, I will test the new module against MIT's 1999 DARPA attack database.
Annotated Bibliography
[1] H. Debar, "An
Introduction to Intrusion-Detection Systems."
In the real world, it is difficult to provide a fully secure information system and keep it in a secure state in the long term. Intrusion detection systems (IDSs) have therefore been designed to detect attacks against computer systems, networks and information systems. An IDS monitors the traffic of a system and detects attempts and active misuse that exploit security vulnerabilities. Besides accuracy, performance and completeness, this paper suggests two further parameters for measuring the efficiency of an IDS: fault tolerance and timeliness. Additionally, it explains the characteristics of a general IDS and classifies IDSs into two categories, knowledge-based and behavior-based. Within each category there are two further types, host-based and network-based: a host-based IDS monitors the status of the system, while a network-based IDS observes the traffic to and from the system.
This paper is relevant to my research topic because it introduces important IDS concepts that enhance my understanding of the whole project.
[2] H. Debar, M. Dacier, and A. Wespi,
"Towards a Taxonomy of Intrusion-Detection Systems,"
1998.
The purpose of an intrusion detection system (IDS) is to perceive attacks on computer systems and networks. In the real world, it is hard to maintain systems or networks in a fully secure state for their entire lifetime. The concept of an IDS is therefore introduced to examine and monitor the status and usage of systems and networks, detecting abnormal activities that exploit security vulnerabilities. The paper categorizes different types of IDS based on their specific properties, and clarifies several important characteristics of an IDS, including detection method, behavior on detection, audit source location and usage frequency. It also points out two complementary trends in intrusion detection: knowledge-based (signature-based) and behavior-based (anomaly-based). A knowledge-based IDS includes information about well-known or specific vulnerabilities, and an alarm is triggered if there are attempts to exploit them. Unlike knowledge-based intrusion detection, a behavior-based system detects attacks by examining deviations from the normal behavior of the system; the model of valid behavior is extracted from reference information collected by various means. Although knowledge-based intrusion detection has a lower false alarm rate, it is unlikely to detect new intrusions unless its information is updated frequently. Behavior-based intrusion detection has a higher false alarm rate; however, it is likely to detect attempts to exploit new and unforeseen vulnerabilities.
This paper gives solid background information for my research topic. It illustrates the pros and cons of knowledge-based versus behavior-based intrusion detection, which provides a good reason why the behavior-based technique is worth researching and improving.
[3] L. T. Heberlein, G. V. Dias, K. N. Levitt,
B. Mukherjee, J. Wood, and D. Wolber,
"A network security monitor," presented at Research in Security and Privacy, 1990.
Network security has been a major concern since computer networking became popular. Some network security techniques, such as encryption, have been suggested in previous papers; however, those techniques are inefficient and expensive. This paper therefore discusses the development of a monitoring technique that maintains information about normal network activity. The monitor, called the Network Security Monitor (NSM), examines network traffic in real time by comparing the current traffic matrix against a static matrix. Five main components help monitor the traffic: the Packet Catcher, Parser, Matrix Generator and Matrix Archiver generate the profile data, while the Matrix Analyzer investigates the traffic matrix to look for unusual traffic patterns.
The NSM proposed in this paper operates in a basic network environment containing hosts, a LAN and the Internet. A later stage of my research project may involve such an operating environment, and the NSM could be one of the testing tools used to check my kernel IDS module.
[4] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba,
and K. Das, "The 1999 Darpa
Off-Line Intrusion Detection Evaluation."
This paper discusses the 1998 and 1999 DARPA off-line intrusion detection evaluations. The 1998 evaluation was intended to be a comprehensive technical evaluation of intrusion detection technology; it did not evaluate commercial systems, only DARPA-funded intrusion detection technology, and it measured false alarm rates using background traffic. In the 1999 evaluation, a Windows NT workstation was added as a victim and an inside tcpdump sniffer machine was put into the evaluation. The 1999 evaluation also performed two new types of analysis: an analysis of misses and high-scoring false alarms, and the permission of participants to submit attack forensic information. Eight sites were involved in the 1999 evaluation. A test bed produced live background traffic and launched 200 instances of 58 attack types against victim UNIX and Windows NT hosts over five weeks. The paper concludes that the best detection was provided by network-based systems for old probe and old denial-of-service attacks, and by host-based systems for Solaris user-to-root attacks. Additionally, a combined system using both host- and network-based intrusion detection offered the best overall performance.
This paper offers an overview of the 1999 DARPA off-line intrusion detection evaluation data set. This data set will be used in my research to train the intrusion detection system (IDS) that I create, so it will be important to compare the results generated by my IDS with those reported in the paper.
[5] M. V. Mahoney
and P. K. Chan, "PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic," 2001.
The Packet Header Anomaly Detector (PHAD) learns the normal range of values in the header fields of well-known protocols such as Ethernet, IP and TCP. After training on the 1999 DARPA off-line intrusion detection evaluation dataset, PHAD detects one third of attack instances (nearly half of attack types), with a total of 10 false alarms per day after seven days of training on attack-free internal network traffic. In comparison, other network intrusion detectors and firewalls identify only 8 of 201 instances (6 of 59 types) of attacks. What makes PHAD successful is that it examines packets and fields in isolation and implements a nonstationary model that estimates probability based on the time since the last event rather than the average rate of events. Although PHAD performs well, it does not handle some particular attacks, such as exploits of application-level bugs and U2R attacks; the paper suggests that performing anomaly detection at the system-call level may solve this problem. The paper also points out some ideas for improving PHAD's performance on the DARPA evaluation data set, including increasing the number of clusters, setting the number of stored hashed values to 1000, parsing the header into 1-byte fields, using stationary models, examining combinations of fields and running PHAD without TTL.
This paper gives background information on PHAD, demonstrating its characteristics as well as its limitations. This helps my investigation, as one of my focuses is the analysis of PHAD at the network layer.
[6] M. V. Mahoney
and P. K. Chan, "Learning Nonstationary Models
of Normal Network Traffic for Detecting Novel Attacks," presented at
Proceedings of the eighth ACM SIGKDD international conference on Knowledge
discovery and data mining, 2002.
The main disadvantage of the traditional intrusion detection system (IDS) is its inability to detect new attacks that do not have known signatures. A new learning algorithm has therefore been introduced that creates a model of normal behavior from attack-free network traffic; behavior that differs from this model is treated as a possible attack. This new IDS is nonstationary: its modeling probabilities are based on the time since the last event. Furthermore, the IDS learns protocol vocabularies so that it can detect unknown attacks. It contains two nonstationary components. The first is the Packet Header Anomaly Detector (PHAD), which examines the Ethernet, network and transport layers. The second is the Application Layer Anomaly Detector (ALAD), which monitors application-layer traffic such as HTTP, FTP and SMTP. PHAD and ALAD were evaluated by running them at the same time on the DARPA IDS evaluation dataset. The experimental results show that the detected attacks can be categorized into five types: learned signatures, induced anomalies, evasion anomalies, attacker errors and user behaviors. SNORT, which was not an original DARPA participant, was examined too; SPADE is a SNORT plug-in similar to ALAD. The results show that SNORT detects more attacks than PHAD and ALAD, but this may be caused by SNORT's lack of an explicit training period compared with PHAD and ALAD.
This paper explains and analyzes the performance of PHAD and ALAD on the DARPA evaluation dataset. It gives some evidence that PHAD and ALAD perform as well as systems that combine both signature and anomaly detection.
[7] M. Mahoney and
P. Chan, "An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data
for Network Anomaly Detection," presented at Recent Advances in Intrusion
Detection: 6th International Symposium.
The 1999 DARPA dataset is a popular benchmark in intrusion detection. This paper investigates the impact of overoptimistic evaluation of network anomaly detection systems, discussing the test set called IDEVAL. Unlike other scholars who wrote papers critiquing IDEVAL, the authors concentrate on an analysis of the data rather than the procedure used to generate it. To analyze the IDEVAL background network traffic, the paper compares IDEVAL with real traffic. It points out that the precondition for a simulation artifact is an attribute that is well behaved in the simulated data but poorly behaved in real traffic. The paper then examines mixed IDEVAL and real traffic, and concludes that after mixing, the artifacts are removed since the data set then has poorly behaved attributes.
This paper provides another critical point of view on the 1999 DARPA data set. Compared with the paper written by McHugh, this paper does not discuss attack simulation or host-based data. It also mentions test results for PHAD, ALAD and SNORT, which can serve as a reference for my research topic.
[8] R. Maxion and K. Tan, "Benchmarking anomaly-based
detection systems," presented at Dependable Systems and Networks, 2000.
This paper develops a metric for characterizing structure in data environments and tests the hypothesis that intrinsic structure (regularity) influences probabilistic detection. The hypotheses are that differences in data regularity affect detector performance, and that such differences appear in natural environments. The paper also discusses the method by which the datasets were created and combined with anomalies to form the benchmark datasets. Once a dataset has been generated, an anomaly detector is used to detect anomalous subsequences embedded in longer sequences of data. The paper concludes that both hypotheses were confirmed.
This paper explains the generation of training and test data in a mathematical way. Its results will affect the conclusions of my research topic: I have to consider the regularity of the data sequences, and it may be necessary to change the parameters of my kernel IDS module if the regularity changes.
[9] J. McHugh, "Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Off-line Intrusion Detection System Evaluation as Performed by Lincoln Laboratory."
MIT Lincoln Laboratory carried out a well-known evaluation of intrusion detection
systems (IDSs) in 1998 and 1999. Even though this evaluation was a significant breakthrough, some problems are associated with its design and execution. The author points out that the evaluation methods are not entirely appropriate and that the presentation of results is questionable; examples include the problem of determining suitable units of analysis, unrealistic detection approaches, and the presentation of false alarm data. After identifying the problems, the paper proposes some possible solutions, such as developing better measures of performance, validating the artificial test data in some way, evaluating commercial IDSs and providing new attack capabilities.
This paper critiques the work done by the MIT Lincoln Laboratory group. Although it focuses more on problems than on solutions, it points out possible problems that may occur while using the DARPA dataset. This makes me aware of those aspects, and I will adjust my testing procedures or methods if necessary. A limitation of this paper is that it concentrates mainly on the 1998 DARPA dataset, not the 1999 dataset which I am going to use as the training dataset.
[10] S. Northcutt, L. Zeltser, S. Winters, K. K. Frederick, and R. W. Ritchey, "Inside Network Perimeter Security."
This chapter introduces the basic concepts of intrusion detection systems (IDSs). In particular, it explains the need for IDSs and their roles in perimeter defense, which include identifying weaknesses, providing host protection, and handling incidents and forensics. The authors also suggest several IDS architectures: network sensors can be deployed near the filtering device or in the internal network.
This reading provides theoretical background on IDSs. It discusses the importance of an IDS in the network environment and gives some basic ideas and concepts. However, it only describes IDSs in general terms; it does not discuss technical details in the professional way that conference papers and journal articles do.
[11] N. Puketza,
K. Zhang, M. Chung, B. Mukherjee, and R. Olsson,
"A methodology for testing intrusion detection systems," vol. 22, pp.
719-729.
Since the popularity of intrusion detection systems (IDSs) has increased, it is necessary to create a well-formed methodology for testing them. This paper explains the general objectives of IDS performance, which include a broad detection range, economy in resource usage and resilience to stress. The proposed methodology is tested using the UNIX package "expect", which simulates users and intruders in the experiment; a test case is also used to simulate a user session. Several procedures are carried out to test the methodology, for instance generating a set of test scripts, establishing the desired conditions in the computing environment and investigating the IDS's output. The tests can be divided into three categories: intrusion identification tests, resource usage tests and stress tests. The results show that the methodology can successfully reveal useful information about an IDS and its capabilities.
This paper contributes to my research topic because it suggests a method for evaluating an IDS. Although I may not ultimately use this method, it can be a useful source of information for comparison with other IDS evaluations.
[12] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang, and
S. Zhou, "Specification-based anomaly detection: a new approach for
detecting network intrusions," presented at The 9th ACM conference on
Computer and communications security, Washington, DC, USA, 2002.
(Not yet sure how this paper is relevant to my research; I need to read it again.)
[13] D. Wagner and D. Dean,
"Intrusion Detection via Static Analysis," presented at 2001 IEEE
Symposium on Security and Privacy.
Modeling typical application behavior to identify attacks without generating a huge number of false alarms has been a major challenge in intrusion detection. This paper suggests that static analysis can help develop a model of application behavior automatically. The proposed framework is designed to detect the case where an application is broken into and then used to harm other parts of the system. A specification of expected application behavior is produced, and the actual behavior of the system is then monitored to check whether it deviates from that specification. Four models are presented for specifying expected application behavior; notably, false alarms should never occur in any of them. The authors discuss an implementation of an IDS using static analysis as well.
(Critique to be added later.)