[Reader-list] [Fourth posting] Implementation of a peer-to-peer news distribution network

Soumava Das soumava at vsnl.com
Fri Jul 2 00:12:23 IST 2004


Hi, 
	This is the third part of our survey on peer-to-peer networks. 
This part focuses on the different routing techniques used by some of 
the important p2p networks.

Soumava Das

-----------------------------------------------------------------------------

                Study of different routing techniques
               .............................................................

1. An introduction to Peer-to-Peer networks
-------------------------------------------

1.1 General characteristics -

The term peer stems from the Latin for equal. It thus characterizes 
individuals, who may be regarded as with respect to some function or 
some situation. This describes one major feature of Peer-to-Peer 
networking, namely that all participants have the same rights and 
duties, at least in the pure Peer-to-Peer architecture. In such a 
network peers, i.e entities with the same networking functionalities, 
interact directly with each other to perform a service or exchange 
data directly, without the need for a central management.

If central entity is used in a Peer-to-Peer network, then we call it a 
hybrid Peer-to-Peer network. The central entity is in most cases only 
used as a central index which is managed and maintained only by the 
connected peers of such a network.

Peer-to-Peer networking gives us the concept of an entity acting as a 
Servent. Servent is an artificial word which is derived from the first 
syllable of the term server ("Serv-") and second syllable of the term 
client ("-ent"). Thus this term Servent shall represent the capability 
of the nodes of a peer-to-peer network of acting at the same time as a 
server as well as a client. As a result a participant of a Peer-to-Peer 
network is on the one hand able to direct requests to other nodes, and 
on the other hand is able to work on and answer incoming requests 
from other nodes of this Peer-to-Peer network.

A Peer-to-Peer network is mostly located on top of the IP-protocol 
stack (UDP, TCP, FTP) and establishes virtual paths between all peers. 
These paths are used for signalling, as well as for data transfer. 
Further on, this virtual network is managed and controlled by the peers, 
which therefore provide the networking functionalities. The content and 
services offered by the network are offered by the peers. Therefore the 
peers contribute parts of their resources (storage capacity, network 
link capacity, computing power) to the network. Hence in these kind of 
networks, peers share their storage capacity to bring in the content, 
their processing power for routing issues and their network link 
capacity to forward incoming search requests, as well as to upload the 
requested content.

Hence a general definition of Peer-to-Peer networks can be given as 
follows: A distributed network architecture may be called a Peer-to-Peer 
network, if the participants share a part of their own hardware resources 
(processing power, storage capacity, network link capacity, printers, ...).
These shared resources are necessary to provide the service and 
content offered by the network. They are accessible by other peers 
directly, without passing intermediary entities. The participants of such 
a network are thus resource providers as well as resource requestors.

1.2 "Pure" Peer-to-Peer networks -

A "Pure" Peer-to Peer network is characterized by the fact that in a 
"Pure" Peer-to-Peer network all participating nodes are peers. That 
means all participants have the same rights and same duties, although 
they may rely on a different hardware basis. Thus no central institution 
to manage, control, co-ordinate or to provide content/services for the 
Peer-to-Peer network exists in this network. Thus any single point of 
failure is avoided by this network, which makes Denial-of-Service (DoS) 
attacks difficult, as all tasks are distributed throughout the 
network.

As the peers of a "Pure" Peer-to-Peer network equally interact with 
each other via the virtual paths described before, a very symmetric 
networking structure arises. The requests of the clients are not directed 
to one central entity, but rather evenly distributed over the whole 
network. Thus this network is a self organizing network without any 
central management instance.

As there is no central index or database exists, routing is done via a 
distributed search, which is based on direct messaging between the 
peers, until the requested peer is found. To prevent the network from 
being flooded by the search requests, Time-to-Live counter is attached 
to each message in this protocol This Time-to-Live counter counts the 
number of hops as the message is forwarded from peer to peer, during 
the lifetime of the search message. As soon as the counter reaches a 
predefined value, the message is killed and not forwarded any further.

1.3 "Hybrid" Peer-to-Peer networks -

The key distinction of "Hybrid" Peer-to-Peer compared to "Pure" 
Peer-to-Peer is the fact, that the former always includes a central 
entity. The central entity is not a commonly known general purpose 
server from which content or services can be requested and 
downloaded. It resembles more to a central peer group database or 
catalogue of addresses, from which the location of a service or content 
can be retrieved. The actual service is then executed between the 
peers, without any further support by the central entity. Just the search, 
where the content/service can be retrieved from, is performed with the 
help of the central entity.

The most prominent representative of "Hybrid" Peer-to-Peer networking 
is Napster, which is widely known because of the lawsuit filed against it 
by the Recording Industry Association of America (RIAA). The Napster 
protocol is used for the exchange of files, mainly audio files. To retrieve 
a file, a peer first has to request the location of the demanded file at 
the central entity, in this case the Napster server. The server provides 
a central index, in which the IP address of peers and the characteristics 
of their shared certain files are stored. If the demanded file matches 
the characteristics of a file listed in the index, the Napster server 
returns the IP address, where the file can be retrieved from, to the 
requestor. Any further handling and management regarding the file 
transfer, is now done between the two peers. The server is not involved 
any further.


2. Peer-to-Peer networks and their routing techniques
-----------------------------------------------------

Peer-to-Peer (P2P) networks provide the capability to establish virtual 
overlay networks. "Pure" P2P networks are completely self organizing 
and therefore do not need central instances to manage the network. 
>From the discussions in section 1.2 it can be said that, an important 
characteristic of these networks is that the terminals of these networks 
communicate in a bidirectional and symmetric way with each other, 
therefore a virtual overlay network is established above the IP-Layer. 
Such a network consists in most cases only of the servents and the 
TCP/IP connections between the different servents.

-> Three system models of Peer-to-Peer networks :-

i) Centralized model: In this model there is a global index held by some 
centralized authority, but there is direct contact between requestors and 
providers. Naturally, the threat of a central point of failure is 
prevalent in this model. This type of networks falls under "Hybrid" 
Peer-to-Peer networks, as discussed in section 1.3. Example - Napster.

ii) Decentralized model: This model speaks of no global index and no 
central co-ordination. The contact between requestors and providers 
can be direct or mediated by a chain of intermediaries. This type of 
networks can be called "Pure" Peer-to-Peer networks as discussed in 
section 1.2. Example - Gnutella and Freenet.

iii) Hierarchical model: This model is basically a mix of centralized and 
de-centralized models. In this model, a few peers with better processing 
power, storage capacity and network link capacity, are termed as 
"Super-peers" or "Ultra-peers". This model also falls roughly under 
"Hybrid" Peer-to-Peer networks. Example - FastTrack.

-> Main Peer-to-Peer design requirements :-

i)	Resource discovery and routing
ii)	Managing updates
iii)	Scalability
iv)	Robustness and fault tolerance
v)	Trust assessment and management

2.1 Gnutella -

The Gnutella protocol is an open, decentralized group membership and 
search protocol, mainly used for file sharing. The term Gnutella also 
designates the virtual network Internet accessible hosts running 
Gnutella-speaking applications and a number of smaller and often 
private, disconnected networks.

As most P2P file sharing applications, Gnutella protocol was designed 
to meet the following goals:
Ability to operate in a dynamic environment - P2P applications operate 
in dynamic environments, where hosts may join or leave the network 
frequently. They must achieve flexibility in order to keep operating 
transparently despite a constantly changing set of resources.

Performance and scalability - P2P paradigm shows its full potential only 
on large-scale deployments where the limits of the traditional 
client/server paradigm become obvious. Moreover scalability is 
important as P2P applications exhibit what economists call the "network 
effect" : the value of a network to an individual user increases with the 
total number of users participating in the network. Ideally when 
increasing the number of nodes, aggregate storage space and file 
availability should grow linearly, response time should remain constant, 
while search throughput should remain high.

Reliability - External attacks should not cause significant data or 
performance loss.

Anonymity - Anonymity is valued as a means to protect privacy of 
people seeking or providing information that may not be popular.

2.1.1 How Gnutella works?

Gnutella servents provide client side interfaces through which user's 
can issue queries and view search results, accept queries from other 
servents, check for matches against their local data set and respond 
with corresponding results. These nodes are also responsible for 
managing the background traffic that spreads the information used to 
maintain network integrity.

Every servent is connected dynamically to an average of 7 servents, 
depending on the bandwidth of the servent's network connection. The 
messages, routed via these connections can be divided into two 
categories. 
One type are the query messages, and the second type are the 
respond messages. The query messages are used to explore the 
structure of a terminals neighbourhood, by sending out PING 
messages. Secondly query messages are used to search for certain 
content, e.g. mp3 compressed audio files in the network, by sending out 
QUERY messages.

The Gnutella network employs a routing concept, known as 'Viral 
propagation" for the query messages. This means that a servent 
searching for content or exploring the network, sends out a query 
message, i.e. a QUERY or a PING message, to all the neighbouring 
servents it is currently directly connected to via TCP/IP connections in 
the virtual overlay network. Thus every servent is able to explore in a 
completely decentralized manner without the need for a central entity, 
by more or less simply flooding the network.

The second type of messages, which are used in the Gnutella network, 
are the respond messages, which are used to answer received query 
messages. The answer is QUERY_HIT message, if a QUERY was 
received, and the peer hosts the demanded content. A PONG-message 
is used to answer a PING message, and thus to make the querying 
client aware of its presence. These respond message are of no interest 
to the rest of the network, and they therefore have to be routed only to 
the querying servent. To avoid flooding, respond messages are routed 
back to the querying terminal on the same way that the original query 
message traveled to the receiving servent.

Beside the application routed signalling messages, the respond and 
query messages, the content a servent is querying for must also be 
distributed through the virtual overlay network. However, to minimize 
the load on the existing overlay network and especially of its 
servents/routers, the demanded data is transmitted "out-band". 
"Out-band" in this context means that with the address provided in the 
QUERY_HIT message, a direct - only IP routed - connection between 
the querying and the responding servent is established. This 
connection is used to transmit the content directly between the peers. If 
the responding servent is behind a firewall a PUSH message is sent by 
the querying servent requesting file download.

The following table summarizes these messages:

Type 	     Description     	      Contained information
--------------------------------------------------------------------
PING        Announce availability          None
                and probe for other         
                servents.	                 

PONG        Response to a PING	     IP address and port# of 
						     responding servent; 
number 					     and total kb of files 
shared.

QUERY       Search request            	    Minimum network bandwidth 
of 
		                                        responding servent; search 
						    criteria.


QUERY HIT Returned by servents	   IP address,port number and 
 	        that have the requested      network bandwidth of 	
	        file.			   responding servent; number 
of 					   results and result set.


PUSH    File  download  requests  	   Servent ID; index of 
requested 
        	  for servents behind a 	   file; IP address and port to 
	  firewall.	                	   send file to.
                                        
The major problem of the Gnutella protocol v0.4 is that parts of the 
virtual overlay network are flooded with ping and query messages, 
which causes a high signalling load. To reduce this load, a time-to-live 
value (TTL) is attached to each query message in the Gnutella protocol 
v0.4. This means that a query message is only forwarded by a servent, 
if the TTL-value, which is decreased with every hop, is not equal to 
zero. Thus a certain transmission range for these messages 
is defined, which prevents the network from being flooded by query 
messages, which could eventually lead to scalability problems, 
especially for servents connected to the network with only a low 
bandwidth connection.                                        

2.1.2 Query Routing

We can reduce the number of flooded query messages, by routing 
these queries based on the search keywords. The basic idea of query 
routing in the virtual overlay network is, that servents exchange their 
query routing tables with their neighbours periodically. The query 
routing tables contain metadata of hosted content, i.e. keywords, and 
the corresponding IP-address, of the servent from which the metadata 
was received. Any incoming query is then analyzed for its search 
keywords, and then compared to the local query-routing table. If one of 
the search keywords matches to one or more entries in the routing 
table, the query is forwarded in the direction, given by the routing table, 
instead of being flooded to all neighbours of the servent. If no match 
with the routing table can be found, the query is forwarded to all 
neighbours of the servent, as 
long as the TTL-value of the query message has not expired.

To minimize the amount of bandwidth necessary to propagate the 
routing tables, a variant of Bloom Filters are used. This means, that 
each keyword is hashed, and then all keywords of the content of one 
servent are compressed in a bitmap. Thus a whole set of keywords and 
IP-addresses need not have to be exchanged periodically, but only a 
comparatively small bitmap. Further, incremental updates could also be 
used, if only small changes have taken place since the last routing table 
has been propagated to its neighbours.

However, the major problem with the implementation of query-routing 
tables is again, how to keep them up to date, if the network is very 
dynamic. The problem is, that routing information for a certain file A 
which is hosted by the servent X, may still propagate through the 
network, although the servent X is not  a member of the network. Thus 
queries may be directed in a wrong direction, which leads to useless 
traffic, and to unsatisfied users, as the content they search for cannot 
be found any more.

A solution to this problem could be to set a timer for every routing 
table entry. After the expiration, this routing table entry is deleted 
to prevent any misleading routings. Further more the propagation 
reach of each routing table must be limited, to prevent the routing 
table from propagating through the whole network. This could be done 
with a hop-counter, which avoids routing tables from being spread any 
further, as soon as a certain value for the hop-count has been reached.

2.1.3 Summary

We can now summarize the properties of Gnutella network:

o	Completely decentralized
o	Hit rates are high
o	High fault tolerance
o	Adopts well and dynamically to changing peer populations
o	Simple, robust and scalable


o	Protocol causes high network traffic
o	No estimates on the durations of queries can be given
o	No probability for successful queries can be given
o	Topology is unknown
o	Reputation of peers is not addressed

2.2 Freenet -

Freenet is a distributed information storage and retrieval system which 
addresses the concerns such as - privacy and availability. The system 
operates as a location independent distributed file system across many 
servents, that allow files to be inserted, stored and requested 
anonymously. There are five main design goals:

#	Anonymity for both producers and consumers of information
#	Deniability for storers of information
#	Resistance to attempts by third parties to deny access to 
        information
#	Efficient dynamic storage and routing of information
#	Decentralization of all network functions

The system is designed to respond adaptively to usage patterns, 
transparently moving replicating and deleting files as necessary to 
provide efficient service without resorting to broadcast searches and 
centralized indexes. It is not intended to guarantee permanent file 
storage, although it is hoped that a sufficient number of nodes will 
join with enough storage capacity that most files will be able to 
remain indefinitely. In addition, the system operates in the application 
layer and assumes the existence of a secure transport layer, although 
it is transport layer independent. It does not seek to provide anonymity 
for general network usage, only for Freenet file transactions.

Maintaining privacy for creating and retrieving files means little 
without also protecting the files themselves in particular, keeping 
their holders hidden from attack. Freenet thus makes it hard to discover 
exactly which computers store which files. Together with redundant 
replication of data, holder privacy makes it extremely difficult for 
censors to block or destroy files on the network.

2.2.1 Architecture

Freenet is implemented as an adaptive Peer-to-Peer network of nodes 
that query one another to store and retrieve data files, which are named 
by location independent keys. Each node maintains its own local data-
store which it makes available to the network for reading and writing, as 
well as a dynamic routing table containing addresses of other nodes 
and the keys that they are thought to hold. It is intended that most users 
of the system will run nodes, both to provide security guarantees 
against inadvertently using a hostile foreign node and to increase the 
storage capacity available to the network as a whole.

2.2.1.1 Keys

Freenet participants each run a node that provides the network some 
storage space. To add a new file, a user sends the network an insert 
message containing the file and its assigned location-independent 
globally unique identifier (GUID), which causes the file to be stored on 
some set of nodes. During a file's lifetime, it might migrate to or be 
replicated on other nodes. To retrieve a file, a user sends out a 
request message containing the GUID key. When the request reaches 
one of the nodes where the file is stored, that node passes the data 
back to the request's originator.

Freenet GUID keys are calculated using SHA-1 secure hashes. The 
network employs two main types of keys: content-hash keys, used for 
primary data storage, and signed-subspace keys, intended for higher-
level human use. The two are analogous to "inodes" and filenames in a 
conventional file system.

2.2.1.2 Messaging and Privacy

Freenet was designed from the beginning under the assumption of 
hostile attack from both inside and out. Therefore, it intentionally makes 
it difficult for nodes to direct data toward themselves and keeps its 
routing topology dynamic and concealed. Unfortunately, these 
considerations have had the side effect of hampering changes that 
might improve Freenet's routing characteristics.

Privacy in Freenet is maintained using a variation of Chaum's mix-net 
scheme for anonymous communication. Rather than move directly from 
sender to recipient, messages travel through node-to-node chains, in 
which each link is individually encrypted, until the message finally 
reaches its recipient.

Because each node in the chain knows only about its immediate 
neighbours, the end points could be anywhere among the network's 
hundreds of thousands of nodes, which are continually exchanging 
indecipherable messages. Not even the node immediately after the 
sender can tell whether its predecessor was the message's originator or 
was merely forwarding a message from another node. Similarly, the 
node immediately before the receiver can't tell whether its successor is 
the true recipient or will continue to forward it. This arrangement is 
intended to protect not only information producers and consumers (at 
the beginning of chains), but also information holders (at the end of 
chains). By protecting the latter, it can prevent an adversary from 
destroying a file by attacking all of its holders. Of course, ensuring 
privacy is not enough; queries must be able to locate data as well.

2.2.1.3 Routing

Routing queries is the most important element of the Freenet 
system. The simplest routing method, used by services like Napster, is 
to maintain a central index of files, so that users can send requests 
directly to information holders. Unfortunately, centralization creates 
a single point of failure that is easy to attack. For example, if someone 
is trying to phone Sachin Tendulkar, the simplest way to get his number 
would ordinarily be to call directory assistance. However, because 
directory assistance is centralized, his access can be easily blocked if 
Sachin or someone else decides to remove his directory entry, or if the 
service goes down.

Systems like Gnutella broadcast queries to every connected node 
within some radius. Using this method, someone would ask all of his 
friends if any of them knew Sachin's number, get them to ask their 
friends, and so on. Within a few steps, thousands of people could be 
looking for his number. Although this process would eventually find the 
answer, it is clearly wasteful and unscalable.

Freenet avoids both problems by using a steepest-ascent hill-climbing 
search: Each node forwards queries to the node that it thinks is closest 
to the target. A person might start searching for Sachin by asking a 
friend who once played college cricket, for example, who might pass his 
request on to a former coach, who could pass it to someone else, who 
might pass it to Sachin's agent, who could put him in touch with the 
man himself.

Requesting files: Every node maintains a routing table that lists the 
addresses of other nodes and the GUID keys it thinks they hold. When 
a node receives a query, it first checks its own store, and if it finds 
the file, returns it with a tag identifying itself as the data holder. 
Otherwise, the node forwards the request to the node in its table with 
the closest key to the one requested. That node then checks its store, 
and so on. If the request is successful, each node in the chain passes 
the file back upstream and creates a new entry in its routing table 
associating the data holder with the requested key. Depending on its 
distance from the holder, each node might also cache a copy locally.

To conceal the identity of the data holder, nodes will occasionally 
alter reply messages, setting the holder tags to point to themselves 
before passing them back up the chain. Later requests will still locate 
the data because the node retains the true data holder's identity in its 
own routing table and forwards queries to the correct holder. Routing 
tables are never revealed to other nodes.

To limit resource usage, the requester gives each query a time-to-live 
limit that is decremented at each node. If the TTL expires, the query 
fails, although the user can try again with a higher TTL (up to some 
maximum). Because the TTL can give clues about where in the chain 
the requester is, Freenet offers the option of enhancing security by 
adding an initial mix-net route before normal routing. This effectively 
re-positions the start of the chain away from the requester.

If a node sends a query to a recipient that is already in the chain, the 
message is bounced back and the node tries to use the next-closest 
key instead. If a node runs out of candidates to try, it reports failure 
back to its predecessor in the chain, which then tries its second choice, 
and so on.

With this approach, the request homes in closer with each hop until the 
key is found. A subsequent query for this key will tend to approach the 
first request's path, and a locally cached copy can satisfy the query 
after the two paths converge. Subsequent queries for similar keys will 
also jump over intermediate nodes to one that has previously supplied 
similar data. Nodes that reliably answer queries will be added to more 
routing tables, and hence, will be contacted more often than nodes that 
do not.

Inserting files: An insert message follows the same path that a request 
for the same key would take, sets the routing table entries in the same 
way, and stores the file on the same nodes. Thus, new files are placed 
where queries would look for them.

To insert a file, a user assigns it a GUID key and sends an insert 
message to the user's own node containing the new key.

Upon receiving an insert, a node checks its data store to see if the 
key already exists. If so, the insert fails - either because the file 
is already in the network (for CHKs) or the user has already inserted 
another file with the same description (for SSKs). In the latter case, 
the user should choose a different description or perform an update 
rather than an insert.

If the key does not already exist in the node's data store, the node 
looks up the closest key and forwards the message to the 
corresponding node as it would for a query. If the TTL expires without 
collision, the final node returns an "all clear" message. The user then 
sends the data down the path established by the initial insert message. 
Each node along the path verifies the data against its GUID, stores it, 
and creates a routing table entry that lists the data holder as the final 
node in this chain. As with requests, if the insert encounters a loop 
or a dead end, it backtracks to the second-nearest key, then the 
third-nearest, and so on, until it succeeds.

2.2.2 Network Evolution

The network evolves over time as new nodes join and existing nodes 
create new connections after handling queries. As more requests are 
handled, local knowledge about other nodes in the network improves, 
and routes adapt to become more accurate without requiring global 
directories.

2.2.2.1 Adding Nodes

To join the network, a new node first generates a public-private key 
pair for itself. This pair serves to logically identify the node and is 
used to sign a physical address reference. Note that public keys are not 
certified. We don't need to link them to real-world identities because 
the node's public key is its identity, even if it changes physical 
addresses. Certification might be useful in the future for deciding 
whether to trust a new node, but for now Freenet uses no trust 
mechanism.

Next, the node sends an announcement message including the public 
key and physical address to an existing node, located through some 
out-of-band means such as personal communication or lists of nodes 
posted on the Web, with a user-specified TTL. The receiving node 
notes the new node's identifying information and forwards the 
announcement to another node chosen randomly from its routing table. 
The announcement continues to propagate until its TTL runs out. At 
that point, the nodes in the chain collectively assign the new node a 
random GUID in the key-space using a cryptographic protocol for 
shared random number generation that prevents any participant from 
biasing the result. This procedure assigns the new node responsibility 
for a region of key-space that all participants agree on while 
guaranteeing that a malicious node cannot influence the assignment for 
a specific key that it might want to attack.

2.2.2.2 Training Routes

As more requests are processed, the network's routing should become 
better trained. Nodes' routing tables should specialize in handling 
clusters of similar keys because each node will mostly receive requests 
for keys that are similar to the keys it is associated with in other 
nodes' routing tables. When those requests succeed, the node learns 
about previously unknown nodes that can supply such keys and creates 
new routing entries for them. As the node gains more experience in 
handling queries for those keys, it will successfully answer them more 
often and, in a positive feedback loop, get asked about them more 
often.

Nodes' data stores should also specialize in storing clusters of files 
with similar keys. Because inserts follow the same paths as requests, 
similar keys tend to cluster in the nodes along those paths. Nodes 
should similarly cluster files cached after requests because most 
requests will be for similar keys.

Taken together, the twin effects of clustering in routing tables and 
data stores should improve the effectiveness of future queries in a 
self-reinforcing cycle.

2.2.3 Summary

This discussion on Freenet can be summarized with the following 
points:

o	Completely decentralized
o	High fault tolerance
o	Robust and scalable
o	Automatic replication of content
o	Adopts well and dynamically to changing peer populations
o	Spam content less of a problem
o	Adaptive routing preserves network bandwidth
o	Supports anonymity of publishers and readers

o	No estimates on the duration of queries can be given
o	No probability for successful queries can be given
o	Topology is unknown
o	Reputation of peers is not addressed


3. Some structured routing techniques
-------------------------------------

Structured peer-to-peer (p2p) overlays like CAN, Chord, Pastry and 
Tapestry provide a self-organizing substrate for large-scale peer-to-
peer applications. These systems provide a powerful platform for the 
construction of a variety of decentralized services, including network 
storage, content distribution, and application-level multicast. Structured 
overlays allow applications to locate any object in a probabilistically 
bounded, small number of network hops, while requiring per-node 
routing tables with only a small number of entries. Moreover, the 
systems are scalable, fault tolerant and provide effective load 
balancing.

However, to fully realize the potential of the p2p paradigm, such 
overlay networks must be able to support an open environment where 
mutually distrusting parties with conflicting interests are allowed to 
join. Even in a closed system of sufficiently large scale, it may be 
unrealistic to assume that none of the participating nodes have been 
compromised by attackers. Thus, structured overlays must be robust to 
a variety of security attacks, including the case where a fraction of the 
participating nodes act maliciously. Such nodes may mis-route, corrupt, 
or drop messages and routing information. Additionally, they may 
attempt to assume the identity of other nodes and corrupt or delete 
objects they are supposed to store on behalf of the system.

We can think of an abstract model of a structured p2p routing overlay, 
designed to capture the key concepts common to overlays like CAN, 
Chord, Tapestry and Pastry. In this model, participating nodes ate 
assigned uniform random identifiers, nodelDs, from a large ID space 
(e.g., the set of 128-bit unsigned integers). Application specific objects 
are assigned unique identifiers, called keys, selected from the same ID 
space. Each key is mapped by the overlay to a unique live node, called 
the key's root. The protocol routes messages with a given key to its 
associated root.

To route messages efficiently, each node maintains a routing table with 
nodelDs of other nodes and their associated IP addresses. Moreover, 
each node maintains a neighbour set, consisting of some number of 
nodes with nodelDs close to the that of the current node. Since nodelD 
assignment is random, any neighbour set represents a random sample 
of all participating nodes.

For fault tolerance, application objects are stored at more than one 
node in the overlay. A replica function maps an object's key to a set 
of replica keys, such that the set of replica roots associated with the 
replica keys represents a random sample of participating nodes in the 
overlay. Each replica root stores a copy of the object.

The following is a discussion on different existing structured p2p 
overlay protocols and how they relate to our abstract model.

3.1 Pastry -

Pastry nodelDs are assigned randomly with uniform distribution from a 
circular 128-bit id space. Given a 128-bit key, Pastry routes an 
associated message toward the live node whose nodelD is numerically 
closest to the key. Each Pastry node keeps track of its neighbour set.

Node state: For the purpose of routing, nodelDs and keys are thought 
of as a sequence of digits in base 2b (b is a configuration parameter 
with typical value 4). A node's routing table is organized into 128/2b 
rows and 20 columns. The 2b entries in row r of the routing table 
contain the IP addresses of nodes whose nodelDs share the first r digits 
with the present node's nodelD; the r+1 th nodelD digit of the node in 
column c of row r equals c. The column in row r that corresponds to the 
value of the r +1 th digit of the local node's nodelD remains empty. A 
routing table entry is left empty if no node with the appropriate nodelD 
prefix is known. 

Message routing: At each routing step, a node seeks to forward the 
message to a node in the routing table whose nodelD shares with the 
key a prefix that is at least one digit (or b bits) longer than the 
prefix that the key shares with the present node's id. If no such node 
can be found, a `die' message is forwarded to a node whose nodelD 
shares a prefix with the key as long as the current node, but is 
numerically closer to the key than the present node's ID. If no 
appropriate node exists either in the routing table or neighbour set, then 
the current node or its immediate neighbour is the message's final 
destination.

To achieve self-organization, Pastry nodes must dynamically maintain 
their node state, i.e., the routing table and neighbour set, in the 
presence of node arrivals and node failures. A newly arriving node with 
the new nodelD X can initialize its state by asking any existing Pastry 
node A to route a special message using X as the key. The message is 
routed to the existing node Z with nodeld numerically closest to X. X 
then obtains the neighbour set from Z and constructs its routing table by 
copying rows from the routing tables of the nodes it encountered on the 
original route from A to Z. Finally, X announces its presence to the 
initial members of its neighbour set, which in turn update their own 
neighbour sets and routing tables. Similarly, the overlay can adapt to 
abrupt node failure by 
exchanging a small number of messages (O(log2b N) among a small 
number of nodes.

3.2 CAN, Chord, Tapestry -
Tapestry is very similar to Pastry but differs in its approach to 
mapping keys to nodes and in how it manages replication. In Tapestry, 
neighbouring nodes in the namespace are not aware of each other. 
When a node's routing table does not have an entry for a node that 
matches a key's nth digit, the message is forwarded to the node with 
the next higher value in the nth digit, modulo 2^b, found in the routing 
table. This procedure, called surrogate routing, maps keys to a unique 
live node if the node routing tables are consistent. Tapestry does not 
have a direct analog to a neighbour set, although one can think of the 
lowest populated level of the Tapestry routing table as a neighbour set. 
For fault tolerance, Tapestry's replica function produces a set of 
random keys, yielding a set of replica roots at random points in the ID 
space. The expected number of routing hops in Tapestry is log(2^b)N.

Chord uses a 160-bit circular ID space. Unlike Pastry, Chord forwards 
messages only in clockwise direction in the circular ID space. Instead 
of the prefix-based routing table in Pastry, Chord nodes maintain a 
routing table consisting of up to 160 pointers to other live nodes 
(called a "finger table"). The i entry in the finger table of node n refers to 
the live node with the smallest nodelD clockwise from n+2^(i-1). The 
first entry points to n's successor, and subsequent entries refer to 
nodes at repeatedly doubling distances from n. Each node in Chord 
also maintains pointers to its predecessor and to its n successors in the 
nodelD space. The expected number of routing hops in Chord is 
5*log(e)N.

CAN routes messages in a d-dimensional space, where each node 
maintains a routing table with O(d) entries and any node can be 
reached in (d/4)*(N^(1/d)) routing hops on average. The entries in a 
node's routing table refer to its neighbours in the d-dimensional space. 
Unlike Pastry, Tapestry and Chord, CAN's routing table does not grow 
with the network size, but the number of routing hops grows faster than 
logN in in this case.



More information about the reader-list mailing list