The DHT Chord

Introduction:

To understand the use of the DHT Chord as a location manager, it is important to study how Chord works. In this chapter we cover the basics of peer-to-peer networks and the operation of Chord in detail. A later section presents a simple location management application built on the DHT Chord.

Peer to Peer systems:

Peer-to-peer systems and applications are distributed systems with no hierarchical organization or centralized control. Nodes in a network are normally categorized as servers and clients according to specific rules; in a p2p system, however, this categorization is difficult because a typical node can have both server and client relationships with other nodes. In a file-sharing application such as BitTorrent, for example, a peer acts as a client by downloading files from other peers while simultaneously acting as a server by sharing parts of files with neighboring peers [2]. It is therefore safe to say that in a pure peer-to-peer system all nodes provide the same services; in fact, the main characteristic of these systems is the similarity of all participating nodes.

With the growing popularity of file sharing over the Internet, p2p networking has attracted considerable interest from networking professionals and Internet users. Several properties are desirable in p2p systems [4]. The first and most important is decentralization: relying on a central server, as DNS does, makes a system vulnerable to single points of failure and greatly degrades its performance. The second is scalability, since p2p systems can host hundreds of thousands of peers at a time. The third is robustness: because growth in network size tends to degrade performance, a p2p system must remain effective at content distribution and discovery even as the network becomes large.

First Generation P2P systems:

These systems employ a single index server or flooding-based mechanisms to locate desired data [5]. They can be divided into two categories.

Centralized Systems:

As the name suggests, these systems depend on a centralized server to store all data address information. When a node wants to locate an item, it forwards its query to the centralized server. Such systems are fast and reliable; however, they suffer from drawbacks such as a single point of failure and heavy communication traffic at the storage servers [5].

Napster is an example of such a system; it operates a central server that directs traffic between registered users. When a user submits a query for a song, the central server creates a list of users who are currently connected to Napster and have the queried song in their collection.

Decentralized Systems:

Early decentralized peer-to-peer systems abandoned the central server; however, most of them had no deterministic information about data location [5]. The main technique used to locate a data item was broadcasting: when a node receives a request for a key, it first attempts to retrieve the file associated with that key locally. If this attempt fails, it forwards the request to neighboring nodes, and this process continues until the desired data item is found and reported.

Gnutella and Freenet are examples of such peer-to-peer systems. The disadvantages of these systems include no guarantee of reliable content location, a flooding mechanism for resource discovery, and poor scalability.

Second Generation P2P systems (DHT):

Distributed Hash Tables (DHTs) can be termed second-generation peer-to-peer systems [5], as they introduced scalability, fault tolerance, and load balancing to p2p systems along with decentralized content discovery [6]. As the size of a p2p network increases, the amount of shared resources across the network also increases; finding a particular resource across tens of thousands of nodes in a scalable manner is therefore a real design challenge. Early p2p systems such as Napster and Gnutella provided these functionalities, but at the cost of higher network traffic, weak guarantees, and single points of failure. It was therefore necessary to adopt a new approach leading towards a completely decentralized content distribution system.

A DHT is a distributed data structure whose main function is to hold key/value pairs in a completely distributed manner, so that any participating peer in the overlay network can retrieve the value associated with a given key [2]. A DHT uses consistent hashing to efficiently map each key/value pair to the responsible node. Along with this efficient mapping, a DHT can also isolate network changes to a small part of the network, limiting the overhead and bandwidth consumed during network and resource updates [2].

DHT algorithms:

A DHT is an abstract idea that helps us achieve complete independence from a central lookup entity and tolerance to changes in the network [5]. Different algorithms have been designed to implement and refine the DHT idea; CAN, Pastry, Tapestry, and Chord are the most popular implementations. We will use Chord as the basic lookup mechanism for our mobility framework due to its simplicity, availability, and acceptance in the research community.

CHORD:

"Chord is an efficient distributed lookup service based on consistent hashing which provides support for only one operation: given a key, it efficiently maps the key onto a node" [1]. At its heart, Chord provides fast and efficient computation of a hash function that maps keys onto nodes [3]. Chord forms a one-dimensional identifier circle ranging from 0 to 2^m - 1, where m is the number of bits in the identifier. Each node and key is assigned an m-bit identifier: a node's identifier is usually chosen by hashing its IP address and port, whereas a key identifier is produced by simply hashing the key itself. Chord uses a base hash function such as SHA-1 to assign m-bit identifiers in such a way that the probability of two nodes having the same hashed identifier is negligible.
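The identifier assignment can be sketched as follows (a minimal illustration; the helper name `chord_id` and the small m = 5 circle are assumptions for readability, whereas Chord with SHA-1 normally uses m = 160):

```python
import hashlib

M = 5  # identifier bits for illustration; a real SHA-1 deployment would use m = 160

def chord_id(name: str) -> int:
    """Map an arbitrary name (a node's ip:port or a key) onto the 2**M identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

# Node IDs come from hashing ip:port, key IDs from hashing the key itself.
node_id = chord_id("192.168.2.2:3997")
key_id = chord_id("song.mp3")
assert 0 <= node_id < 2 ** M and 0 <= key_id < 2 ** M
```

Truncating the digest modulo 2^M keeps both node and key identifiers on the same circle, which is what allows keys to be assigned directly to nodes.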

The Chord protocol simplifies the design of peer-to-peer systems and applications [1] by addressing the following problems.

Load Balance:

Chord spreads keys evenly over the participating nodes, which provides a degree of natural load balance.

Decentralization:

No node in Chord is more important than any other participating node. This property makes the system more robust and failure tolerant.

Scalability:

The cost of a Chord lookup grows with the logarithm of the number of nodes: a lookup requires O(log N) messages, so no parameter tuning is required to achieve scalability even in a large system.

Availability:

Chord automatically adjusts its finger tables whenever a node joins or leaves the network. This allows it to resolve a key/value query every time one is made, despite failures in the underlying network.

Flexible naming:

Chord puts no constraints on the structure of the keys it looks up, i.e. the key space provided by Chord is flat.

Consistent Hashing:

Chord uses consistent hashing to map keys onto responsible nodes [1]. Consistent hashing provides the same functionality as an ordinary hash table, but in such a way that the removal or addition of one or more slots does not change the entire mapping of keys to slots [2]. It is designed to let nodes enter and leave the network with minimal disruption: when a node joins or leaves an N-node system, only an O(1/N) fraction of the keys are moved to new locations. Consistent hashing also distributes keys evenly across the participating nodes. It splits the key space into partitions and uses the concept of distance to map each key onto a specific node; this distance is a logical concept, distinct from the physical or network distance between the nodes.
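The key property can be demonstrated with a small sketch (the 5-bit circle and the node IDs 0, 7, 12, 26, 30 are chosen to match the examples used later in this chapter): a key belongs to the first node at or after it on the circle, and when a node joins, only the keys in the gap it fills move.

```python
M = 5                                 # identifier bits; the circle is 0 .. 2**M - 1

def successor(key, node_ids):
    """First node identifier equal to or following key on the identifier circle."""
    ring = sorted(node_ids)
    for n in ring:
        if n >= key:
            return n
    return ring[0]                    # wrap around past 2**M - 1

nodes = [0, 7, 12, 26, 30]
assert successor(27, nodes) == 30     # key 27 maps to node N30
assert successor(31, nodes) == 0      # wrap-around to the lowest node ID

# When a new node 16 joins, only the keys in (12, 16] change owner (from N26 to N16);
# every other key keeps its mapping -- the "minimum disruption" property.
moved = [k for k in range(2 ** M) if successor(k, nodes) != successor(k, nodes + [16])]
assert moved == [13, 14, 15, 16]
```

This is the behavior that distinguishes consistent hashing from an ordinary modulo hash, where adding one slot would remap nearly every key.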

Operation:

To explain the operation of Chord clearly, the following terms must be defined first.

Finger table:

The finger table is the list of peers that a peer uses to send messages to other peers [7]. A finger table entry contains both the hashed Chord identifier and the IP address of the relevant node. The first entry in the finger table is also referred to as the successor of node n on the identifier circle.

The figure shows the finger table for node N0. The start column presents the hashed identifiers on the Chord circle, whereas the successor column holds the hashed node IDs of the respective nodes in the overlay network.
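The start column of a finger table follows directly from the node's identifier. A brief sketch (the helper name `finger_starts` and the 5-bit circle are assumptions matching the figure's example):

```python
M = 5   # identifier bits, matching the 5-bit circle of the figure

def finger_starts(n):
    """Start of the i-th finger interval of node n: (n + 2**(i-1)) mod 2**M, i = 1..M."""
    return [(n + 2 ** (i - 1)) % (2 ** M) for i in range(1, M + 1)]

# Node N0's finger intervals start at identifiers 1, 2, 4, 8 and 16.
assert finger_starts(0) == [1, 2, 4, 8, 16]
```

Each start is then resolved to its successor node on the circle to fill in the second column of the table.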

Successor node:

It refers to the peer directly after a particular peer or key on the identifier circle. Let m be the number of bits in the identifier; then the i-th entry in the finger table of node n contains the identity of the first node that succeeds n by at least 2^(i-1) on the identifier circle [1], where 1 <= i <= m. As shown in the figure, the successor of identifiers 1, 2 and 4 is node N7. The successor is simply the node that immediately follows a node n or a key k on the identifier circle. A key k is always assigned to the node whose identifier is equal to or follows k in the identifier space; such a node is known as successor(k), and the value of k in the key/value pair is stored at that successor node.

Predecessor node:

A predecessor node is the peer directly before a participating peer on the identifier circle. This does not mean that the predecessor's Peer-ID is one less than that of the peer in question; it simply means that there are no peers in the namespace between the peer and its predecessor [7].

Operation:

The operation of Chord can be split into the following parts:

Key-Node mapping:

When a node in the Chord overlay network wants to locate an object, it simply hashes the object's identifier, i.e. the key, and searches for the node responsible for that key. A total of O(log N) messages are required to locate an object, where N is the number of nodes [1]. The figure shows the key-to-node mapping process, in which node N7 receives a request for key K27 from node X residing outside the network. On receiving the query, node N7 checks whether it holds the key. Finding nothing locally, it uses the mapping function to determine which of its neighbors is closest to the location storing the key/value pair [2], searching its finger table for the interval containing key K27.

The finger table of node N7 shows that key K27 lies in the interval [23, 7], which is associated with successor node N26, so N7 forwards the request to node N26. Node N26 uses the map function to check how close it is to the key's place and searches its finger table for the interval containing key 27. It finds that key 27 lies in the interval [27, 30], with successor N30, so it forwards the request to node N30 along with the IP address of the request originator. Node N30 finally finds the requested key/value pair for key 27 and initiates an association with node X.
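The routing steps above can be sketched as a small simulation (a simplified model, not the full protocol: node IDs 0, 7, 12, 26, 30 and the 5-bit circle match the text's example, and the helper names are assumptions):

```python
M = 5                                # identifier bits; the circle is 0 .. 31
NODES = sorted([0, 7, 12, 26, 30])   # hashed node IDs from the text's example

def successor(k):
    """First node ID equal to or following k on the circle."""
    k %= 2 ** M
    return next((n for n in NODES if n >= k), NODES[0])

def fingers(n):
    """Finger table of node n: the successor of each start (n + 2**(i-1)) mod 2**M."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies in the circular half-open interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start, key):
    """Route toward the node responsible for key, recording each hop."""
    n, path = start, [start]
    while not between(key, n, fingers(n)[0]):
        # forward to the closest finger of n that precedes key on the circle
        nxt = next((f for f in reversed(fingers(n)) if between(f, n, key)), fingers(n)[0])
        if nxt == n:
            break
        n = nxt
        path.append(n)
    return fingers(n)[0], path

owner, hops = lookup(7, 27)
assert owner == 30 and hops == [7, 26]   # N7 forwards to N26, which hands off to N30
```

Each hop at least halves the remaining distance on the circle, which is why the number of messages stays O(log N).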

Key-value insertion:

Chord uses the DHT to store a key/value pair at the node to which the key maps [4]. Inserting a key/value pair into the DHT is analogous to finding that key/value pair in the identifier space. When a node receives a request with the hashed key identifier, it uses the map function to check how close it is to the key's place on the identifier circle, as shown in the figure where N0 wants to put a value at K27. It follows the same steps as in the previous section to reach N30. When the query reaches N30, N30 checks how close it is to the hashed key's place in the identifier space and recognizes that it is the best place to store the key [2]. N30 then sends its IP address to node N0, which uses it to insert the value at node N30. Additional RPCs are required to route the values from node N0 to node N30's DHT [3].

Node Insertion:

In dynamic networks, nodes can leave and join the network at any time. Chord must therefore be able to locate keys efficiently even under highly unstable conditions. To achieve this goal, it has to maintain the following two invariants [1]:

  1. Successors list in each node is correctly maintained.
  2. For every key k, node successor (k) is responsible for k.

When a node wishes to enter the overlay network, it hashes its IP address and port number to create a Node-ID. Using the SHA-1 function, the hashed Node-ID looks like 3752ac6235a888e2b938f9c2f77ef7236d261f39 = h(192.168.2.2:3997). After hashing its IP address and port number, the node sends a REGISTER message to any bootstrap node in the overlay network.
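This hashing step can be sketched as follows (the helper name `node_id` is an assumption; the concatenation format "ip:port" mirrors the example above, though implementations may differ in how they serialize the address):

```python
import hashlib

def node_id(ip: str, port: int) -> str:
    """SHA-1 over "ip:port" yields the 160-bit Node-ID as 40 hex characters."""
    return hashlib.sha1(f"{ip}:{port}".encode()).hexdigest()

nid = node_id("192.168.2.2", 3997)
assert len(nid) == 40            # 160 bits = 40 hex digits
```

Because the digest is effectively random, two distinct nodes colliding on the same 160-bit identifier is negligibly unlikely.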

The bootstrap node will search the node it knows nearest to the Node-ID of the joining node. The joining node will repeat these steps until it finds the node on the overlay network currently responsible for the space it will occupy. The joining node then exchanges additional REGISTER messages with the admitting node. This will allow the joining node to learn about other neighboring nodes in the network and to obtain information from the neighboring nodes about resources the joining node will be responsible for maintaining [7].

Existing nodes must also update their finger tables to reflect the insertion of a new node. Each node in the Chord ring maintains a predecessor pointer to simplify the join/leave process [1]. The predecessor pointer simply contains the Chord identifier and IP address of the node's immediate predecessor. When a node n joins the network, it has to know about an existing node, say n'. The figure shows an example of node insertion in the Chord ring, where N16 wants to join the network.

To join the network, N16 first has to locate a node inside the Chord ring; suppose that through some out-of-band method it locates node N0. N16 then sends a join request to N0, in other words asking node N0 to locate its fingers and predecessors. Node N0 uses the find_successor algorithm to obtain the successor of each logical start node [1], and the initialized finger table is sent back to node N16 as a response message. Node N16 then sends an update request to inform all nodes whose finger tables should include 16; the concerned nodes update their finger tables accordingly. In this case the finger tables of nodes N7 and N12 are updated, as shown in the figure.
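The effect of N16's join on existing finger tables can be checked with a small sketch (a simplified model on the 5-bit circle with the node IDs used in this chapter; helper names are assumptions):

```python
M = 5
RING = 2 ** M

def successor(k, nodes):
    """First node ID equal to or following k on the circle."""
    s = sorted(nodes)
    return next((n for n in s if n >= k % RING), s[0])

def finger_table(n, nodes):
    """Successor of each finger start (n + 2**(i-1)) mod 2**M, i = 1..M."""
    return [successor(n + 2 ** (i - 1), nodes) for i in range(1, M + 1)]

before = [0, 7, 12, 26, 30]
after = sorted(before + [16])          # N16 joins the ring

# N7's fourth finger starts at 7 + 8 = 15; its successor changes from N26 to N16.
assert finger_table(7, before)[3] == 26
assert finger_table(7, after)[3] == 16
# N12's first finger starts at 13; it now points to N16 instead of N26.
assert finger_table(12, before)[0] == 26
assert finger_table(12, after)[0] == 16
```

Only the fingers whose start falls in the gap N16 fills are affected, which is why a join touches just a few existing nodes rather than the whole ring.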

The last operation that has to be performed when a node enters or leaves the network is to move responsibility for the affected keys to the newly entered node. This is achieved by notifying the higher-level software so that it can transfer the keys and values to the node n that is now responsible for them [1].

Chord Lookup application:

Chord is a basic lookup mechanism that can be used as a building block in larger protocols that implement, for example, data authentication, file distribution, redundant storage, or efficient data location [4]. Such services are not implemented directly by Chord; it simply provides a high-performance, flexible lookup primitive upon which they can be efficiently layered. Chord can also be used to provide host-name-to-IP-address mapping like that of DNS, but Chord is not a storage system: it only associates keys with the nodes responsible for them rather than returning the values linked with those keys [2].

With the help of an additional layer such as a DHT, which can translate high-level names into Chord identifiers, Chord can be used as a powerful lookup service for name-to-IP mapping [1], since the DHT stores each key/value pair at the node to which the key maps [4]. By layering additional features on top of Chord, the system gains robustness and scalability; a useful extension to the Chord lookup system is therefore a DHT, as shown in the figure, which can provide the value for any desired key and hence increase the efficiency and performance of Chord as a location manager.

The figure shows a simple location management application that can provide values, i.e. IP addresses, for any key queried by a CN using the following commands.

Put (key, value): stores a key/value pair in the DHT, using Chord to find the successor node of the hashed key on the identifier circle.

Get (key): retrieves from the DHT the value associated with the given key. Chord is used to locate the node on the identifier circle holding the requested key/value pair.

Remove (key): as the name suggests, removes the key and its value from the responsible node.
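The three commands can be sketched over a simulated ring (a minimal model: the node IDs, the 5-bit circle, and the example key and value are all illustrative assumptions, and routing is collapsed into a direct successor computation):

```python
import hashlib

M = 5
NODES = sorted([0, 7, 12, 16, 26, 30])
STORE = {n: {} for n in NODES}           # each node holds its own key/value pairs

def key_id(key: str) -> int:
    """Hash a key onto the 2**M identifier circle."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** M)

def successor(k: int) -> int:
    """First node ID equal to or following k on the circle."""
    return next((n for n in NODES if n >= k), NODES[0])

def put(key: str, value: str) -> None:
    STORE[successor(key_id(key))][key] = value

def get(key: str):
    return STORE[successor(key_id(key))].get(key)

def remove(key: str) -> None:
    STORE[successor(key_id(key))].pop(key, None)

put("alice@example.com", "10.0.0.5")     # hypothetical name-to-IP binding
assert get("alice@example.com") == "10.0.0.5"
remove("alice@example.com")
assert get("alice@example.com") is None
```

All three commands resolve the responsible node the same way; only the action performed there differs, which is exactly the layering the text describes: Chord locates, the DHT stores.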

The LM application interacts with the DHT Chord in two main ways. First, it provides a lookup(id) algorithm via the Get and Put commands, which yields the IP address of the node responsible for the key and hence the value related to that key. Second, it notifies the application of any changes in the set of keys the node is responsible for [2]. This allows the application to move values to their new homes, i.e. to update the responsible DHT nodes, whenever a node enters or leaves the network. An additional RPC interface is required to move the values and data to and from the responsible nodes.

By layering a DHT over Chord we can extend the functionality of the overall system [2]. The DHT can use the list of successor nodes in the Chord finger table to store values not only at the immediate successor of the key on the identifier circle but also at the next few successors. This in turn increases the reliability of the system in the case of unexpected node failures.

References:

1. A scalable p2p lookup service

2. A survey on p2p and DHT

3. Building p2p systems with Chord

4. Chord OPNET

5. P2P systems based on Distributed Hash Tables

6. Clean slate mobility framework

7. dSIP RFC