欢迎来到留学生英语论文网

客服信息

我们支持 澳洲论文代写 Assignment代写、加拿大论文代写 Assignment代写、新西兰论文代写 Assignment代写、美国论文代写 Assignment代写、英国论文代写 Assignment代写、及其他国家的英语文书润色修改代写方案.论文写作指导服务

唯一联系方式Q微:7878393

当前位置:首页 > 论文范文 > Internet

Spynodes in peer-to-peer (p2p) networks

发布时间:2018-02-23
该论文是我们的学员投稿,并非我们专家级的写作水平!如果你有论文作业写作指导需求请联系我们的客服人员

ABSTRACT

Peer-to-Peer (P2P) networks have become a major traffic source on the internet. Securing the p2p networks is becoming a very significant factor, as they provide sharing resources on the internet and even acts as a Traffic source. The strategic importance of most of the peer-to-peer service providers is securing them. In p2p networks, p2p worms pose heavy threats. Worms exploit common vulnerabilities in member host of a p2p network and spread topologically in p2p networks. To secure peer-to-peer networks, there is a necessity of some of the propagation strategies that worms can use to propagate using peer-to-peer overlay networks. The major problem is with the detection of peer-to-peer worm attacks because they are able to merge malicious traffic within the p2p traffic. Even though the traffic generated by infected and uninfected nodes appear similar at the network level, absence of anomalous behavior at network level makes it further more difficult to detect p2p worm attacks using the existing techniques. Most of these problems would not have a wide spread on the networks, if there is an effective mechanism in place to monitor p2p traffic and track malicious node behavior.

To solve this problem, there is need of mechanism that monitors malicious node behavior and peer to peer traffic. For this, i consider the concept of spy nodes which are special nodes hidden within the network and continuously monitor p2p traffic for malicious peer behavior. Spy nodes detect the anomalous behavior of infected nodes by generating special traffic and hence they can distinguish infected and uninfected nodes. With such advantageous features, spy nodes can defeat passive worm attacks. In this report, my work is concerned with detailed outlook of how spy nodes will be connected to the network and how they will be connected with each other. This is provided with a detailed description of spy node architecture along with worm signature generations. There are several factors that affect the working of spy nodes, all these factors are briefly described in my report. As per my knowledge, this is one of the first works on monitoring peer-to-peer traffic for detecting and containing passive peer-to-peer worms using special nodes called spy nodes.

Chapter 1

INTRODUCTION

Peer-to-peer applications enjoy enormous popularity in internet space for resource sharing. They have become major drivers for the large growth of the broadband users. The analysis of internet traffic has shown that a large majority of internet traffic is created by p2p applications[10]. Even though today p2p applications are, to a very large extent, responsible for the sharing of illegal data, they do carry the potential for legal sharing of data. Due to the widespread use of p2p network such as Gnutella, Kazaa, Bittorrent it has become very important to secure p2p network against malicious users and malicious data.

The p2p network architecture emerged from a motivation to realize a computing architecture which cannot be taken down by attacking any single point[7]. The p2p networks are not based on client/server model which is probably the most wide spread model for creating internet applications. The nodes of p2p network act both as client and servers. This makes p2p systems scalable and resilient to single node failure.

Over the years we have seen the spread of various kinds of worms such as internet scanning worms, web worms, email worms etc. Internet scanning worms have taken up most of the press because of their fast propagation behavior. The worms like Code Red and Slammer were able to infect thousands of nodes within very short span of time. Now a new class of worms is emerging which rely on P2P networks to propagate called P2P worms. They can be divided into two classes Passive Worms and Active Worms. Passive worms are the worms those wait for the susceptible node to contact the infected node. Active worms are the worms those actively attack susceptible nodes. These worms do not waste time probing unused IP addresses and hence, they do not generate high rate of failed connections. Also, these have the ability to merge malicious traffic into the p2p network traffic, thus making them very difficult to detect. These worms have been discussed in detail in the next chapter. It has been shown [8] that detection systems based on the analysis of worm scans cannot differentiate between the normal p2p activities of a client from a worm. The working of these worms is discussed in detail in the next chapter.

The earlier approaches to contain such attacks are based on building trust systems within the network which will allow peers to download files based on their rating in the system. In my report, i had provided a discussion on the vulnerabilities present in such approaches and how a worm can defeat them. In this report I will present the feasibility of creating self-defense architecture for Gnutella like distributed p2p networks to monitor P2P traffic and to detect and defeat passive worm attacks. I will provide a discussion on concept of spy nodes and how they can be used to monitor traffic on internet. I propose an architecture which includes spy nodes in a network and describe their usage for traffic monitoring on internet and thus defeating passive worm attacks.

Overview:

Peer-to-Peer (P2P) networks have become a major source of traffic on internet. They enjoy enormous popularity for sharing resources on internet. Therefore, securing p2p networks has become a subject of strategic importance for most of p2p service providers. In this report some of the propagation strategies that worms can use to propagate using p2p overlay networks will be presented and concept of spy nodes which are hidden within the network and help us to monitor the traffic in p2p networks and track peer behavior. This architecture will allow us to detect worm attack and take measure against it to prevent further propagation of the worm.

Issues and Requirements:

Malicious data transfer using peer to peer networks is becoming a significant issue now-a-days. To prevent the malicious data transfer, we require particular mechanism in peer to peer networks. Spy nodes are special nodes which are hidden in peer to peer network, identical to regular network node. It is unable to distinguish these spy nodes from the network nodes. Spy nodes with some advantageous properties can able to increase the communication range and preventing the vulnerable attacks to the networks due to various worms. The major issue discussed in this project is preventing the malicious data transfer in peer to peer networks. For this we require spy nodes which are hidden in the peer to peer networks.

Objectives:

  1. Review of working of a distributed P2P protocol and the problems such a protocol would face.
  2. Considering a distributed P2P protocol Gnutella, analysis on its vulnerabilities to the worm attacks.
  3. Providing an overview of related work by other researchers.
  4. Proposing the concept of spy nodes in distributed peer to peer networks.
  5. Proposing the architecture which include spy nodes in a network and description on their usage for traffic monitoring on internet and thus, defeat passive worm attacks.

Structure of thesis:

The report is organized as follows in chapter 2, I will discuss a distributed P2P protocol Gnutella in detail and its vulnerabilities to worm attacks. And also I will give an overview of the related work. In chapter 3, I will discuss the concept of spy nodes and how they can be used to monitor traffic on internet and thus defeat passive worm attacks. And also design the architecture of spy nodes to show the effectiveness of my project and also I will give an overview of related work.

1.1 Literature review on Distributed P2P protocols- Gnutella, Kazaa and Bittorent

As a background research, two papers from literature are considered in this section.

a) First paper is by Subhabtra sen at AT&T labs research of Florham park new jersey titled as “Accurate, scalable in-network identification of p2p traffic application signatures”. In this paper they provide a method for identifying the p2p application traffic by using application level signatures. In this they use the identified signatures to build online filters that can accurately track the p2p traffic even on the high speed network links ,for this purpose they use the most popular p2p protocols like Kazaa , bit torrent, Gnutella etc. this approach also shows that it requires only first few packets to identify the p2p connections ,so this approach is highly scalable ,the main goal of this is to obtain application level signatures for p2p protocols to achieve high accuracy and robustness while at least being used at high speed Ethernet in real time. Historical in client/server models content is stored in server and all clients download content from them which causes the overload on the server, the p2p file sharing address this problem by allowing the peers to transfer the data directly. Signaling and download is the two phases involved when peer want to download the content. For this we use the most important peer to peer protocols like Kazaa, Gnutella, bit Torrent.

Gnutella protocol is completely distributed protocol, in this network every client acts as server and vice versa , so the client and server are implemented in one system called servent. When the servent is connected to the other servents using the Gnutella protocol descriptors for searching the network and communicates with them this is called the signaling and download is achieved by using other protocols like http between the requesting and processing servents. The following is the signature format for identifying the Gnutella protocol traffic downloads

1) The _rst string following the TCP/IP header is `GNUTELLA', GET', or `HTTP'.

2) If the _rst string is `GET' or `HTTP', there must be a _eld with one of following strings:

User-Agent: <Name>

UserAgent: <Name>

Server: <Name>

For the robustness we include the signatures for the request and response header, this is way we identify the Guntella traffic even though is one directional.

Bit Torrent protocol network consists of clients and a central server, the clients connect to each other and send and receive portions of the file directly and the central server manages the connection and coordination between the clients. In the bit torrent networks the clients download the file they want by locating the torrent file in the webs and download it by pressing on the hyperlink. Communication in bit torrent network starts with handshakes followed by never ending fixed stream of prefix messages.

Kazaa protocol is a distributed self organized network; in the kazaa network some client's acts as supernodes who have powerful connections and fast computers. Clients other than supernodes connect to their nearby supernodes for performing the file sharing and search operations. This procedure for identify kazaa downloads traffic.

1. The TCP/IP head is followed by the string :`GET', and `HTTP'.

2. There should be a field with string: X-Kazaa.

Finally it has be demonstrated the feasibility ,robustness and accuracy of application based p2p detection in high speed networks by using the p2p protocols/applications and this work benefits for those who need to identify the traffic of peer to peer networks today. [21]

b) The second paper is by J.A.pouwelse , P.Garbacki at Delft University of technology, the Netherlands.

This paper gives a detailed behavior of peer to peer protocols, mainly the Bit Torrent in comparison with the other protocols like Kazaa, Gnutella. Characteristics of p2p systems are as follows.

The number of user participating over a period of time indicates the popularity of the p2p systems. The relation between the size of the file and the time needed for download is described as download performance. In this paper we are going to describe the various types of peer to peer protocols based on the five characteristics of p2p protocols like content life time, popularity, injection time download speed and availability

Bit Torrent protocol is one of the p2p protocols that is gaining its importance during these days. high level of one click download are the important factors for the success of Bit Torrent. Downloading

A file in Bit Torrent network includes mainly three steps .Firstly it does not perform any search operation but relays on the websites like supernova.org which maintains the list of all current available files for download. Secondly it performs the file sharing policy instead of directory level sharing policy. Finally it performs a bartering mechanism for the clients who are downloading the files. The strong points about Bit Torrent is its popularity ,download performance content injection time, pollution time, And the drawback or week points availability and content lifetime of the file.

Gnutella p2p protocol the download performance is the key factor for the success, and the drawbacks of the Gnutella p2p network is its pollution level and decrease in its popularity due to the flat flooding design which does not scale well. In one of the studies related to download performance over a 35000 Gnutella peers over a period of day almost 70 % peers does not participate in bandwidth usage

Kazaa p2p protocol the key points are popularity availability and content life time where as the drawback is its pollution level. In kazaa type of p2p networks file sharing is done at directory level know as directory level sharing policy which helps to access the files in that directory as long as the client stays connected. In kazaa use a hash code verification along with hash databases which helps the clients of it to identify the correct file and get rid of fake files. A popularity measurement study performed indicates that the number of unique IP number involved in kazaa traffic increased from 3,403,900 in sep2001 to 5,924,072 in bes 2001 tells us the increase in the kazaa popularity.

Finally the we have given the description of the peer to peer protocols like kazaa , Bit Torrent ,Gnutella based on the five characteristics :the popularity, the availability, the download speed, the content life time and the injection time.

Chapter 2

GNUTELLA PROTOCOL

Introduction

In this section I will explain the working of Gnutella Protocol. Gnutella can be described as a protocol for searching and downloading files in a distributed p2p environment. It explains how a peer connects with other peers, how a search query propagates in the network, how the responses to this search query are collected by the sender of the query and finally, how a file is downloaded. The words peer and node will be used interchangeably.

2.1 Literature review of Gnutella Protocol

As a background research, two papers are reviewed from literature in this section.

a)First paper is by John Risson, Tin Moors from University of new South Wales, Sydney titled as “ Survey of research towards robust peer-to-peer networks: Search methods”. According to them, the three specific characteristics exhibited by peer-to-peer (P2P) networks are self-organization, distributed control and symmetric communication. A P2P network can automatically adapt to arrival, departure of nodes and even failure of nodes. Such a characteristic of a P2P network is called as Self-organization. Peers itself acts as clients and servers for the symmetric communication. In that case, control point or centralized directory is not available.

For distributed nature of P2P networks, several distributed protocols are available. Leading examples are Gnutella, Edutella, Freenet, Kazaa and Bittorent. The distributed hash tables (DHT's) can be used for best comparative research on P2P dependability. There are two classifications in P2P routing algorithms namely “Structured and Unstructured Algorithms”. Considering the Gnutella protocol, early Gnutella was unstructured and hence keyword queries are widely flooded. Algorithms like Pastry, Tapestry, chord, Plaxton and Content Addressable network (CAN) are early structured algorithms. The early unstructured Gnutella peers use either ultra peers or leaf nodes for their scalability. In early Gnutella, “Ping” messages and “Pong” responses are used for broadcasting the P2P network and building the node index respectively. In Gnutella, if a node lives active for a long time, then it remains active for more time. For simple keyword match, the most widely referenced P2P system is “Gnutella”. Gnutella queries have got string of keywords. Because of these advantages of Gnutella, in December 2000 the Gnutella nodes amounted 1.7% of the total internet Backbone traffic of United States. In this journal, authors provided a very brief description of Gnutella protocol, its applications and especially advantages and deployment from unstructured early Gnutella and latest Structured Gnutella protocols. Even other Distributed Peer-to-peer protocols were mentioned in the journal with a precise explanation.[23]

b) The second paper is by Matei Ripeneanu titled as Peer-to-Peer Architecture Case Study: Gnutella Network, Department of computer science, university of Chicago.

Now a day's peer to peer systems applications are getting popularity in both social and technical I the recent year. The two main reasons are as follows firstly the low cost and high availability of large number of computing and storage resources and secondly the increased network a vailability .In peer to peer systems computers communicate directly with each other and share the information and recourses without using a dedicated server. Mostly the Peer to peer applications has increased the usage of internet bandwidth

A measurement and analysis on Gnutella networks driven two major questions, Firstly its connectivity structure and secondly the its virtual network topology maps with physical Internet infrastructure .

Later the research of Gnutella network is Performed by dividing into three sections and are as follows Gnutella protocol & applications description, a clawer is developed for Gnutella virtual network & Applications ,Finally the analysis of network and answer the questions.

Gnutella Protocol is an open, decentralized and search protocol used for file sharing. Ability to operate in dynamic environment, performance and Scalability, Reliability, Anonymity are the Goals designed to achieve in Gnutella Protocol. In Gnutella protocol the developers call the clients as servants, which communicate with other Clients and servers. In this Protocol the search operation as messages and are as follows Group membership messages ,search messages and file transfer messages, nodes decide where to connect on the network by the local information available and to become a member a servant has to connect one or more nodes already in the network. Next to collect the topology information developed a clawer that joins to the network as servants. It stores the IP address, port number, the number of files and total space shared of newly connected node. And a different of clawers are developed and extended to overcome the problems in previous ones. Finally the Gnutella network analysis is performed by collecting the information from these factors1)Estimate of Gnutella Generated Traffic 2) Connectivity and Reliability in Gnutella Network. Power-law distributions. 3) Internet Infrastructure and Gnutella Network.

It is concluded that architecture, achieved scale, self organized structure made Gnutella protocol as interesting p2p protocol than other peer to peer protocols .The analysis states that the Gnutella node connectivity follows a multi mode distribution which makes this network as most reliable and most harder to attack by a malicious adversary, for this type of protocol we need not take many steps from potential attacks. The analysis of Gnutella network lead to two directions for improvements firstly the volume of generated traffic, search success rate, reliability of application is determined by application level topology .Secondly orthogonal, direction is to replace flooding with a smarter routing and group communication mechanism.[24]

2.2 Bootstrapping

A peer needs to find and store addresses of other peers to connect to the network. There are three ways of doing this:

* The peer can call a GWebCache server [1] to get the IP address of online peers. The GWebCache protocol allows peers to get host address by sending requests over HTTP to a GWebCache server. No need to query a GWebCache in most casesmore than once per session. This means that if the user gets disconnected and opts to reconnect, the peer should not query a GWebCache. Instead, a peer should implement a host cache to store the hosts found by any of the three ways which are being described here, and use it in conjunction with the GWebCache System.

Below is a bootstrapping algorithm [1] which is used by Gnutella clients to ensure minimum number of calls to the GWebCache Server.

- Client launching and local Hostcache loading.

- Try to connect to the Gnutella Network, using the host cache.

- If after a while, there's no connection to the Gnutella Network, query the GWeb-Cache.

- Wait for a moment before sending a request to another GWebCache (it may happen that the first chosen GWebCache is down, or very slow to answer).

- After a certain delay, if no results have been received from the first GWebCache call, call a new GWebCache periodically, until a response is received from one of them.

- Once results have been received from at least one GWebCache, the peer should not send a new request during the next session, unless the host cache is empty (which should not happen if there is an initial host cache provided with the software and no option to clear the host cache is available).

* Storing hosts' addresses in pong messages (after at least one connection is established with the Gnutella Network).

* Storing hosts' addresses read from QueryHit messages (after a minimum of two connections are established with the Gnutella Network). This assumes that the remote peer uses the same port for upload slots than for Gnutella Network slots (which are the case with most peers).

2.3 Descriptors

Once a peer has connected successfully to the network, it communicates with other peers by sending and receiving Gnutella protocol descriptors. The descriptors consists of Descriptor ID, Payload Descriptor, Time to Live (TTL), Hop count, Payload Length and Payload. The Descriptor ID is used to identify the message sent by the peers. A Payload Descriptor describes the kind of payload message contains i.e. Ping, Pong, Query, QueryHit or Push. The TTL mechanism is used by the peers to expire the descriptors. When a peers receives a descriptor it should lower down its TTL appropriately before forwarding it further. Abuse of TTL may lead to unnecessary traffic in the network. Let's have a look at various kinds of descriptors used by Gnutella for communicating over the network.

Ping and Pong: A peers uses Ping messages to actively probe the network for other online peers. A peer receiving a Ping message may elect to reply with one or more Pong messages of active Gnutella peers. A pong message contains the information about the Gnutella peer and the amount of resources shared by it.

Query and QueryHit: When a peer needs to locate a resource, it generates a Query Message. The query message contains the search condition which the resource should satisfy. The peer forwards this query to its neighbours. If any of the neighbours has a resource matching the search query they will generate a QueryHit message and send it back to the sender of the query. If the TTL of the query message is not equal to 0 and then the neighbour forwards the query to their neighbours. This process goes on till the TTL of the message becomes 0.

Any peer which gets the query and has a resource matching the query, will generate a Query-

Hit message and this message will travel the path of the query message to reach the sender of the query. The queryhit will contain information about the file such as filename, file size, hash of the file etc.

Download: As explained in the last section, once a peer sends a Query, it may get more than one QueryHits which claim to satisfy the search Query. Now, the user needs to select one of these QueryHit messages to download the file. To download a file, the peer selects a QueryHit and sends a download request to the source of that QueryHit a download request message. These peers will then establish a HTTP or FTP connection for file transfer. An important thing to notice here is that the file sent by the user may or may not match its description. Figure 2.1 shows the working of Gnutella Protocol.

2.4 TTL and Degree of Connectivity

Now I will explain the importance of TTL and Degree of Connectivity. The limit on TTL introduces a limit on the network node's reachability in the network, called horizon. Each peer will normally have the possibility to interact with only a portion of Gnutella Network. As, the TTL is increased the horizon of a nodes reachability grows exponentially. The limit on TTL is important to avoid unnecessary traffic in the system. The TTL helps peers destroy the messages when enough number of peers has seen it. Typically, the TTL value is set to 7. If a node gets a message with high TTL value like 20, it should reduce its TTL to 0 and should not pass it further. The peers also maintain connections with their neighbours. The number of neighbours a peer is connected to is called the degree of connectivity of a peer. The typical value of degree of connectivity of nodes is 2 to 14. The horizon of peer's reachability grows linearly with an increase in the value of degree of connectivity.

2.5 vulnerabilities of Gnutella to the worm attacks

In this section we will have a look at some of the vulnerabilities present in the Gnutella protocol. We will see how these vulnerabilities can be exploited by the attackers to attack the system. This section forms the premise of my work.

2.5.1 Topological Scan based P2PWorms

Traditional worms work by infecting a node and probing the IP space to locate vulnerable node. Such kind of worms would generate a lot of failed connections and exhibit port and IP scan behavior. Most of the approaches used to locate such worms are based on detecting such behavior.

In case of a p2p topological scan based [20] worm also called Active Worms, the worm gains control of the application. Now, peer to peer architecture works by contacting other peer's which are running the same protocol and with high probability, the same application. In such a case, once the worm has gained control of an application it will exactly know the addresses of the vulnerable nodes. Thus it can easily attack such vulnerable nodes without generating high failure rates. These worms are not utilizing a protocol level vulnerability but an application level vulnerability. Such an attack is possible only if an attacker is able to find vulnerability in an application such as a buffer overflow attack.

These worms do not use IP and port scanning methods for further propagation. Hence, the number of failures generated by these kinds of worms for propagation will be very low. These worms will be successfully able to merge their malicious traffic within the p2p traffic and will easily dodge any kind of detection scheme which is in place for detecting worm attacks. With the advent of IPv6, the size of IP space is going to increase a lot. Hence, attackers will be on constant lookout for methods for better propagation of worms. P2P clients will become very good candidates for such attacks.

This kind of worm attack was studied by Zhou et al [20]. They proposed the concept of guardian nodes which are nodes present within the P2P network and act like honey-pots for worm detection. If a topological scanning worm happens to attack a guardian node, the guardian node will be able to detect the attack by identifying the infection process inside running application. My approach of spynodes which I will propose in this report can be seen as an extension to the guardian nodes and can be used to defeat the passive worm attacks, which are explained in the next section.

2.5.2 Passive P2PWorms

Passive worms unlike active worms do not actively search out vulnerable nodes. Passive worms wait for the vulnerable nodes to contact infected nodes. On getting contacted, these nodes take actions which may or may not lead to the transfer of worm from the infected node to the vulnerable node. Distributed P2P networks like Gnutella offer ideal platform for the passive worms to spread in the network.

The Gnutella protocol is based on the assumption that all the peers would share only good resources. But that is not the case with an attacker. An attacker would breach this trust to attack Gnutella protocol. An attacker can develop a worm which instead of providing good resources on the network, may start providing infected resources. These infected resources can cause the worm to spread in the network. Following are some of the propagation strategies which a worm may use to breach the trust and spread in network:

* The first kind of passive worms are based on the share folder [10] of p2p clients. Every p2p client has a share folder, where the files shared by the client are stored. A worm, in this case would attack an application and put infected copies of itself in the share folder under attractive names. In such a case, the infected copies of the worm will become available on the network. Since, these copies have been put under attractive names, there is a possibility that a peer searching for such files would end up downloading an infected file from the infected peer. As soon as the downloader executes this file, it will also get infected and start sharing infected copies of the worm in the shared folder. The propagation speed of such worms would be quite slow, since there will be a lot of other copies of the same resource available on the network. There have been quite a few cases of these kinds of worms such as VBS. Gnutella which attacked Gnutella network, Benjamin worm, Duload worm which attacked Kazaa network. A large list of such worms can be found at [2].

* The second kind of worm attack is based on generating automatic positive QueryHit [10]. Let's call such worms as AutoPositiveReply (APR) Worms. The decision of a downloader to download a file is more or less based on the name of the file. Also, while searching for a file, users tend to input the name of the file in the search query. An attacker can take advantage of this behaviour of users. The attacker can develop a worm which on receiving a Query will reply positively to that query with some probability. For example, if a user searches for fire fox the infected node may reply with fire fox.exe. In such a case, the sender of the query may get tricked into downloading this response. Once the download is complete and the file is executed, this node will end up getting infected and starts behaving in the same way. In this case, the worm may attack by taking control of the downloading client or it may contain a client with basic implementation of P2P protocol and become part of the network. Figure 2.2 shows how these kinds of worms work. Gnuman which attacked Gnutella network and was a just a proof-of-concept worm is an example of such kind of worms.

* The third kind of worm attack is similar to the second kind of worm attack. Let's call such worms Query and Reply (QR) Worms. The automatic positive replies generated by the infected node may not be very good. To make those replies better, the infected node will forward the query which it receives from other peers to its neighbours, so as to collect legitimate responses for the query. Now, when the infected node is finished with collecting legitimate responses, it will reply with some of those responses as its own responses or with the files already available with itself to the peer which sent the query. In this case the infected peer will be able to cover infected files behind much more attractive names and also will be able to provide extra information about the file (such as bit rate in case of an mp3 file) and thereby, increasing the probability of download from itself. Figure 2.3 shows the working of such kind of worms. No instance of such a worm has yet been detected.

2.5.3 Problem Statement

These vulnerabilities can be attributed to the fact that there is no mechanism available in distributed P2P networks to monitor the traffic and track the peer behavior. The presence of a system which can track the user behavior can help reduce the effect of these vulnerabilities considerably. There is clearly a need for providing a proactive self defense mechanism in P2P networks which can protect it against worms. Thus, the problem we need to solve is how to monitor P2P traffic while still maintaining the distributed nature of the protocol.

In this report I will describe the concept of spy nodes and how to use them to monitor traffic in a distributed P2P environment while still maintaining the distributed nature of the protocol. I will explain their effectiveness against second and third kind of passive p2p worms. Spynodes are special nodes hidden within the network to monitor its activities. These spynodes will collaborate with each other to defeat such passive worm attacks. These spynodes will help us monitor traffic in a distributed P2P environment. Most of the work which has been done in this area deals with defeating pollution in the P2P network using trust systems [19] [5] [9]. Pollution results from all kind of files which do not match their description. Trust systems won't be very effective against worms, if the worms attack the trust system itself. To my knowledge, there doesn't exist any earlier work to monitor traffic in distributed P2P environments while still maintaining their distributed nature.

2.6 Related work

In this section I will present the work which has been done by the researchers which is related to my work. During my research I was not able to come across any approach which dealt with the problems I have described in this report directly. Hence, to my knowledge this is first of its kind work in the field of protecting distributed P2P networks while still maintaining their distributed nature of work.

Guardian Nodes: Zhou et al. [20] introduced the concept of guardian nodes in Gnutella network to protect p2p networks against worm attacks. This approach is based on the assumption that guardian nodes are invulnerable to the worm attacks. It detects worms by tracking how information from untrusted sources propagates its influence in memory during code execution. A worm can be detected when the control flow of the program is arbitrarily controlled by information from untrusted sources. This approach deals with topological scan worms only. It didn't explain how to counter attack passive worms. Moreover, the guardian nodes will require hardware changes for worm detection and hence, only very few guardian nodes can be deployed in the network. To some extent my work can be seen as an extension to Guardian Nodes but in the context of passive worms. The approach can be very well adapted to merge with guardian nodes and form a much better defense mechanism.

Wormshield: A lot of my work was inspired by Wormshield [3]. Wormshield is an approach proposed by Cai et al. for fast and accurate worm signature generation which is essential to combat zero day worms at the internet scale. This work presented collaborative worm signature generation method using multiple edge networks. This approach would be able to detect worms which exhibit internet scanning properties and generate lot of failures. After the detection of the worm is complete it collects data from the infected sources and using Rabin-Karp fingerprinting and window sampling processes this data to form a worm signature. This approach though effecting against internet scanning worms would be effective against P2P Worms since, such characteristics are not exhibited by P2P worms. Other similar worm signature generation approaches are autograph [11] and Early bird [16].

X2Rep [5] and Credence [19]: deals with p2p pollution in general. Both are decentralized systems which enables peers to confidently gauge object authenticity, the degree to which an object's data matches its advertised description. They employ a simple network wide voting scheme where users contribute positive and negative evaluations of objects. They enable clients to weigh votes according to the statistical correlation between the client and its peers. They also allow clients to extend the scope of their correlations through selective information sharing. As I discussed in the last chapter, both these systems would be able to slow down the propagation of polluted files which do not attack the trust system but when it comes to protecting them against worms which attack the trust system itself they will be rendered helpless. Eigentrust [9], P2Prep [6] are some other reputation systems which are highly appreciated in research community.

I also studied the SI epidemic model [12] that describes the growth of infection in a network through contacts between Susceptible and Infected peers. Susceptible peers are the peers which are susceptible to the attack and have not been attacked yet. Infected peers are the peers which have been attacked and are infected. In this model, N is the size of total vulnerable population. S(t) are the number of susceptible peers at time t. I(t) are the number of infected peers at time t. β is the contact rate between the peers. S(t) is the ratio of S(t) and N. i(t) is the ratio of I(t) and

N. Now, the SI model is defined by

dIdt=βISN .................................................. (3.1)

dSdt=-βISN ................................................... (3.2)

This gives

didt=βi(1-i) ................................................. (3.3)

Solving this equation provides the proportion of infected individuals at time t for some constant of integration T:

it= eβ(t-T)1+eβ(t-T) .................................................. (3.4)

The β can be expressed as the function of worms probe rate r. Assuming that an infected node chooses targets randomly, β=r N232 where 232 is the size IPv4 space. This function has the property that the worm grows exponentially for small values of t until the majority of the population is infected. At this point it slows down exponentially, since all the population has been infected.

Analytical modeling of random scanning worms is a well researched area. But the passive worms such as QR and APR require some human intervention for their propagation. The correct analytical modeling of such worms would be a difficult task to do. Modeling of worm containment system would be an even more difficult task to do. Hence, these worms will be studied using simulations and not by an analytical model.

My work can be seen as complementary to the work of Zhou et al [20]. The guardian nodes as explained by Zhou et al deals only with topological scan worms. I in this work have extended guardian nodes to be able to work against passive scan worms also. For my work these nodes will be called spy nodes. The two approaches can be easily coupled with each other to get a complete system to fight against both topological scan worms and passive worms. I have provided much more detailed outlook of how spy nodes will be connected to the network and how they will be connected with each other. I have also gone into the details of how they will work in collaboration with each other to generate worm signatures. This is one of the first works on monitoring P2P traffic for detecting and containing passive p2p worms.

Chapter 3

SPYNODES

Overview

The major problem with detection of p2p worm attacks is that they are able to merge malicious traffic within the p2p traffic. At the network level, the traffic generated by an infected node is similar to the traffic generated by an infected node. Absence of any anomalous behavior at the network level makes it difficult to detect them using the existing techniques.

To detect anomalous behavior of infected nodes I propose the concept of Spy nodes. Spy nodes are special nodes which are hidden within the network and monitor the traffic being generated by the nodes of the network. These nodes generate special traffic which is meant to detect anomalous behavior of infected nodes. By monitoring this traffic, these nodes are able to deduce worm attacks and take appropriate action against such attacks.

Another important thing to notice here is that the spy nodes are hidden by which I mean that other nodes may be aware of their presence but they are unable to pin point these nodes. Thus like normal nodes, spy nodes receive queries and other messages from other nodes. The Spy nodes work in collaboration with central servers like GWebCache to become a part of the network. In the later sections I will discuss architectural requirements in detail.

This chapter is divided into following sections: Section 3.2 gives an overview of the system, section 3.3 explains how spy nodes are used to detect worms, section 3.4 explains how worm signatures are generated after the detection, and section 3.5 explains importance of certain parameters in signature generation, section 3.6 explains how the signatures are distributed in the system, section 3.7 explains the chord protocol which is used to connect the spy nodes, section 3.8 explains the limitations of my approach.

3.1 Literature review of Spy nodes

As a background research, three papers from literature are reviewed in this section.

a) First paper is by Katrin Hoper, Guang Gong titled as “Preventing or utilising Key Escrow in identity based schemes employed in mobile Ad Hoc networks”. According to authors, key generation centre uses spy nodes to record network communications and report them. Spy nodes increase the communication range of any network. Spy nodes have several advantageous properties. The spy nodes can act as regular network nodes and even appear as them. It is difficult to distinguish spy nodes from other regular nodes. Spy nodes monitor the traffic normally as regular nodes and detect the nodes that transfer malicious data to stop their functioning. Like regular nodes, spy nodes have same power constraints. They use multi hope roots for communication. Spy node shave limitations like unable to intercept a message which needs jamming, but jamming cannot be performed by a spy node. This jamming property of spy nodes is considered unavailable for one hope scenario. But spy nodes doesn't require message interception because its a part of routing path spy nodes haven't got knowledge about network nodes impersonating and hence cannot launch an attack itself. For effective performance of a network, communication between key generation centre and its spy nodes should be very fast so that the attack can work. If not, the message delay leads the communicating nodes to drop session, choosing another routing path.

In specific scenarios, to analyze probability of successful attack, it is necessary to assume that in a network, the network nodes and spy nodes should be distributed uniformly. We have to note that probability of a successful attack in network increases with an increase in number of spy nodes. It is significant to consider that at least one spy node should be on the path for attack discussion. Spy nodes will each perform the attack execution as they don't know each other when more than one spy node is on the path. This leads to a condition that the spy nodes cannot recognize the protocol which is already under attack. If there are multiple spy nodes which execute multiple attacks in a network, this leads to communication delay increase but does not show any effect on the attack success. Considering ay network performance it is significant to maintain a very less communication delay. So while building the architecture for a effective network, network engineers should make sure that there should be at least one spy node on the path for attack discussion and not more than one spy node on each path which leads to communication delay increase.

In this paper, authors provided a rough estimation on success probability on an attack. According to them, the success probability of an attack highly depends on the average path length which is based on routing protocol efficiency and the nodes mobility. They concluded that spy nodes with their various advantages properties, increases the communication range of a network and avoids the transfer of malicious data. But several spy nodes on the path can result in large communication delay. So for effective network performance, they should be more spy nodes in a network but at least one on each path and not more than one. In this paper from experimental analysis it is observed that a regular network without spy nodes have very less success probability of monitoring nodes. Whereas a network with increased number of deployed spy nodes increases the success probability. Hence spy nodes functioning and their advantages in a network are clearly illustrated in this paper. [25]

b) Second paper is by Katrin Hoper, Guang Gong titled as “limitations of key Escrow in identity-based schemes in Ad Hoc networks”. In this paper the authors introduced a series of adversary models for dishonest Trust Authority (TA) and which uses spy nodes for communications recording in the network and reporting them to the TA. In any network, spy nodes can be used to increase the communication range and provide the security against malicious data to the network. Spy nodes record communications and send the recorded data back to trust authority. Spy nodes stores this communication data in their communications range. When considering the network with a dishonest TA, spy nodes can be used to increase their communication range. Spy nodes uses multi hope roots for TA communication. In such a case, both spy and regular network nodes are held in the routing paths. The advantages properties of spy nodes like similar power constraints, unable to distinguish from network nodes helps in appearing in them as regular network nodes. In this paper, authors analysed two models based on TA and at least one of its spy nodes which are in communication range. So that it is easy to launch an attack in such cases. For the attack to work, the communication between TA and its spy nodes should be very fast. If the communications is not fast between them it may result a delay which leads the communicating nodes to choose some other routing path. To analyze the success probability of an attack, it is important to assume that in a network the spy nodes and network nodes should be distributed uniformly. Hence the authors concluded that in a network with a TA, more number of spy nodes increases the success probability of an attack and decreasing the path length decreases the chance. It is observed form the experiments that shorter the routing path, less likely a successful attack. In this case author suggest that a new shared key should be established when the two nodes are close to each other. Hence using spy nodes results in increased communication range of any network and provides security against malicious data transfer. [26]

c) Third paper is by Katrin Hoeper at the university of Waterloo and titled as Authentication and key Exchange in Mobile Ad Hoc Networks.

Increase in the use of cellular phones ,PDAs, Laptops are increasing the use of wireless services and let wireless mobile communications as an important part of the life. Accessing or Sharing of Highly sensitive information across the unprotected wireless links with in unidentified and untrusted endpoints lead to the development of security in mobile wireless devices .MANET is one designed for the purpose in that use of spy nodes for security purpose and latter in this we are going to study the detailed use of spy nodes in a network.

Introduction of the concept of Novel so called spy nodes which distribute in networks that may be deployed by KGC to increase the key escrow abilities in the MANET. Analyze key escrow in the special context of MANETs and propose the novel concept of spy nodes. Spy nodes a are introduced in networks to increase its communication range. Spy nodes use the multi- hop routing to increase their limited communication range.KGC or the network provider might use the sophisticated strategy to place spy nodes in the networks. These record all the communications in their range and send back the data the KGC. Spy nodes has the properties like act and appear as normal nodes in the network, has same power constraint as normal nodes , can send the recorded data back to the KGC and play back the message received from KGC in to the network. However spy nodes are used in the networks for the passive attacks by KGC for impersonation attack on two network nodes and can be so called as SPY in the middle of attacks. This would increase the probability of the messages that are routed through spy nodes placed in the network. Note that the spy node does not need to intercept messages because it is a part of the routing path. Spy nodes cannot launch the attacks themselves because they do not possess the necessary key material. Communication between the spy nodes and KGC must be fast. However in advanced implementation of spy nodes might have more powerful devices for instance like a large communication range with direct antennas.

Finally the conclusions is the introduction of adversary models of which Novel model which is so called as spy nodes are deployed by KGC to increase its ability, analyze key escrow in the special context of MANETs and propose the novel concept of spy node. In this discussion it is concluded that how KGC could utilize spy nodes to monitor nodes and enhances its key escrow capabilities and analyze the probabilities of the successful escrow attacks.[27]

3.2 Design and Architecture

As discussed in the last chapter, a lot of vulnerabilities exist in the distributed P2P protocols such as Gnutella, due to its distributed nature and absence of any mechanism to monitor the traffic. Presence of a system which can monitor P2P traffic can help reduce the effect of a worm attack which may happen on the network. At the same time, it is important to maintain the distributed nature of P2P network.

Spy nodes are the nodes which are hidden within the network and form a chord [18] overlay network over the distributed P2P network. The application of chord network has been discussed in details section 3.7. These nodes with the help of heuristics, which have been programmed into them, monitor the traffic and detect any malicious behavior. These heuristics may require spy nodes to generate extra traffic in the network. With the help of worm detection heuristic, these spy nodes can get suspicious about malicious nodes in the network and collect possibly infected data from them.

After the suspicion of infected nodes and collection of possibly infected data is complete, spy nodes collaborate with each other to process the data collected and generate worm signatures. The worm signature is generated based on the approach proposed in Wormsheid [3]. After the worm signature has been generated, the worm signature is disseminated into the network to protect it against the further spread of infection. Figure 3.2 shows the System Architecture to detect and contain the worm attack.

To study the effectiveness of this approach against worm attacks, I will target APR and QR worms. I will look into the details of heuristics to detect such worm behavior, the process of worm signature generation and the process of worm signature dissemination.

3.3 Worm detection

3.3.1 APR worms

In the last chapter I introduced, the working of APR worms. These worms reply positively to all or a proportion of queries which reach them. These worms are based on the assumption that user while searching for a file will input the name of the file as the search query. The infected nodes on getting the search query will auto generate a reply with a file name based on the search query. For example, if a user searches for Firefox, it will generate a reply firefox.exe. The user which sent the search query might get tricked into downloading the file and get infected on executing the file. The weakness of such a worm is, it can reply positively to any query even queries which are not supposed to have any answer. The heuristic that spy nodes will use to detect such a worm is to generate random query which are not supposed to have any answer. An infected node on getting such a random query might get assume the replying to the query. Spy node on getting a reply to such a query will get suspicious and assume the replying node to be infected. Thus, spy node has detected an infected node. The spy node will go ahead and download the file being advertised by the infected peer. After the download is finished the spy node will use the infected file for generating the signature of the worm. Figure 3.3 shows how spy nodes would work against APR worms. The process of worm signature generation will be explained in greater details in the next section.

This process of worm detection will be executed by all the spy nodes present in the network let, the count of such spy nodes be n. Algorithm 1 sketches the working of spy nodes for detecting APR worms.

Output: infected file, infected peer id

Forall spy node I in 1, 2, 3,….,n do

i.generate_random_query(q);

i.send_random_query_to_neighbors (q);

if i. receive_reply (q, QueryHit qr) then

i.mark_in f ected_node(qr:source);

i.download_file(File f, qr:source);

i.mark_in f ected_file) f);

end

end

Algorithm 1: Algorithm to detect a node infected with APR worm

3.3.2 QR worms

The approach describe in the last section though effective against APR worms, will fail against QR worms. A node infected with QR worm will forward the query which it receives from other peers to its neighbors, so as to collect legitimate responses for the query. Now, when the infected node is finished with collecting legitimate responses, it will reply with some of that response as its own responses to the peer which sent the query. In this case the infected peer will be able to cover infected files behind much more attractive names and information and thereby increase the probability of download from itself.

The approach of spy nodes as explained earlier would not be effective against such worms because for the random queries the node infected with QR worm will not receive any reply from any other node. Hence, a node infected with QR worm will not reply to the random queries.

Input: Query Database

Output: infected file, infected peer id

Forall spy node I in 1, 2, 3…, n do

i.derive_query_from_database (q);

i.send_query_to_neighbors (q);

if i. receive_reply(q, queryHit qr) AND not_spynode(qr.source) BthenB

i.mark_in fected_node(qr:source);

i.download_file(file f ;qr:source);

i.mark_in fected_file(f);

end

end

forall spynode I in 1, 2, 3.., n do

i.receive_query(q);

if (exists_in__query_database(q)) then

i.reply(q);

end

end

Algorithm 2: Algorithm to detect a node infected with APR worm or QR worm

To improve the functionality of spy nodes against such worms, the following heuristic can be used by the spy nodes. The spy nodes now will derive queries from a database of query. These queries are still such that no other node except for spy nodes should be able to reply to them. Thus a spy node will derive a query from the query database and send it into the system. If another spy node gets this query, it will generate a reply to the query and send it back. Thus for every spy nodes receive they search for it in the query database. If the query is present in the query database, the spy node will reply else not.

The infected node in this case, when it forward the query sent by a spy node might end up getting replies for the query from other spy nodes and thus would end up replying to the query. The spy node which sent the query expected a reply only from another spy node but in this case it will receive a reply from a node which is not a spy node. The spy node will assume this node to be infected and go ahead and download the file being advertised by the infected peer. The infected file will then be used to generate the signature of the worm. Important things to notice here is that this approach will work even against APR worm. Algorithm 2 and Figure 3.4 describe the functionality of spy nodes in this case.

3.4 worm signature generation

Automatic signature generation is essential for signature based filtering to contain zero-day worm [11]. Manually it can take hours or even days to extract worm signature. However, reaction time available for worm containment systems could be less than a day or ever hours. Also, moore et al. [13] Have shown that signature based filtering is much more efficient than address blacklisting for containing worms over the entire network.

Most of the recent work on automatic signature generation for internet worms is based on the assumption that all infected sessions of a worm share at least an invariant substring, that is, the worm signature. This all infected sessions of a worm share at least an invariant substring, that is, the worm signature. This assumption is valid for most of the existing monomorphic worms. Autograph [11], Early bird [16] and worm shield [3] have shown that the process of signature generation can be automated by analyzing and early bird for worm signature generation. Hence, I will use wormshield for signature in our passive p2p worm containment approach. The previous section explained how to collect data for worm signature generation. In this section I will explain how to generate signature of the worm automatically.

All the spy nodes work in collaboration with each other to generate the worm signature spy node is organized s a chord [18] overlay network. Each spy node collects infected file from the infected peers and uses them to generate worm signature.

For each spy node I in a network of n spy node, it first uses Rabin-Karp [14] algorithm to computer the fingerprint of each slide window in the collected file. Once the fingerprints are generated, they are sampled using a window sampling algorithm such as winnowing [15]. Within a given window of fingerprint the window-sampling algorithm selects the fingerprint with lowest value. For each sampled fingerprint j, the spy node I updates its local fingerprint

Repetition counts ri(j) as well as the source address set Si (j). Once ri(j) and |Si(j)| crosses their local threshold value Lr and Ls respectively, the fingerprint j becomes a local candidate of worm signature and is subjected to global aggregation.

To aggregate the information globally a root node for each fingerprint j is selected. This is called root (j) and is computer by using chord consistent hashing. This root node of j is responsible for the computation of global repetition and source address dispersion of j. since, root ( j) aggregates the information about j from all the spy nodes, it has a global overview of fingerprint repetition count (ri(j)) and source address set (Si(j)). Once, these value exceed their global threshold Gr andGs, j because a potential candidate for the worm signature. The procedure of information gathering at the root monitor has been described in section 3.7. Algorithm 3 describes this process in the form of an algorithm. This algorithm is based on algorithm suggested in [3].

Once a potential worm signature is identified, its root node disseminates the signature to all the other spy nodes. These spy nodes in turn disseminate the signature to all their neighbors and so on. Every node will update its signature database with the new signature and in future, would reject any file containing that signature. Hence, making it immune to that worm attack. The distribution of signature is discussed in detail in section 3.6.

Input: local thresholds (Lr and Ls ) and global threshold (Gr and Gs)

Output: worm signature

Forall spy node I in 1, 2, 3…, n do

Forall local fingerprints j in infected files do

ri(j)(ri(j)+1;

Si(j)υ(Si(j)[src_ip(j);

end

if ri(j)_ LrAND |Si(j)|_ Ls then

mark j as a global fingerprint;

rg(j)(rg(j)+ ri(j);

Sg(j)(Sg(j)(Si(j);

end

end

forall global fingerprint j at root node do

if rg(j) > Gr AND |Sg(j)|> Gs then

output the j as worm signature

end

end

Algorithm 3: algorithm for distributed worm signature generation based on [3]

3.5 Distributed Fingerprint Filtering

As describe by cai et al. [3], this method of worm signature generation uses fingerprint filtering at distributed spy nodes to filter out most content substrings. Only those substrings which are suspicious ones and cross the local thresholds Ls and Lr are subjected to global aggregation.

This process has two phases: in the first phase, a filter is used to select frequent fingerprint that have repeated at least Lr times. The counts are calculated as described in algorithm 3. In the second phase, the source address dispersion of each frequent fingerprint is tracked. The fingerprint algorithm selects only those fingerprint which are both frequent and dispersed.

To aggregate as much information as possible, the local thresholds need to be low enough. Lower threshold will allow nodes to become suspicious faster and subject fingerprint to global aggregation. On the other hand, the higher local thresholds are, the less aggregate traffic

上一篇:use of the internet 下一篇:Internet application