[Reader-list] [Fourth posting] Implementation of a peer-to-peer news distribution network
Soumava Das
soumava at vsnl.com
Fri Jul 2 00:12:23 IST 2004
Hi,
This is the third part of our survey on peer-to-peer networks.
This part focuses on the different routing techniques used by some of
the important p2p networks.
Soumava Das
-----------------------------------------------------------------------------
Study of different routing techniques
.............................................................
1. An introduction to Peer-to-Peer networks
-------------------------------------------
1.1 General characteristics -
The term peer stems from the Latin for equal. It thus characterizes
individuals, who may be regarded as with respect to some function or
some situation. This describes one major feature of Peer-to-Peer
networking, namely that all participants have the same rights and
duties, at least in the pure Peer-to-Peer architecture. In such a
network peers, i.e entities with the same networking functionalities,
interact directly with each other to perform a service or exchange
data directly, without the need for a central management.
If central entity is used in a Peer-to-Peer network, then we call it a
hybrid Peer-to-Peer network. The central entity is in most cases only
used as a central index which is managed and maintained only by the
connected peers of such a network.
Peer-to-Peer networking gives us the concept of an entity acting as a
Servent. Servent is an artificial word which is derived from the first
syllable of the term server ("Serv-") and second syllable of the term
client ("-ent"). Thus this term Servent shall represent the capability
of the nodes of a peer-to-peer network of acting at the same time as a
server as well as a client. As a result a participant of a Peer-to-Peer
network is on the one hand able to direct requests to other nodes, and
on the other hand is able to work on and answer incoming requests
from other nodes of this Peer-to-Peer network.
A Peer-to-Peer network is mostly located on top of the IP-protocol
stack (UDP, TCP, FTP) and establishes virtual paths between all peers.
These paths are used for signalling, as well as for data transfer.
Further on, this virtual network is managed and controlled by the peers,
which therefore provide the networking functionalities. The content and
services offered by the network are offered by the peers. Therefore the
peers contribute parts of their resources (storage capacity, network
link capacity, computing power) to the network. Hence in these kind of
networks, peers share their storage capacity to bring in the content,
their processing power for routing issues and their network link
capacity to forward incoming search requests, as well as to upload the
requested content.
Hence a general definition of Peer-to-Peer networks can be given as
follows: A distributed network architecture may be called a Peer-to-Peer
network, if the participants share a part of their own hardware resources
(processing power, storage capacity, network link capacity, printers, ...).
These shared resources are necessary to provide the service and
content offered by the network. They are accessible by other peers
directly, without passing intermediary entities. The participants of such
a network are thus resource providers as well as resource requestors.
1.2 "Pure" Peer-to-Peer networks -
A "Pure" Peer-to Peer network is characterized by the fact that in a
"Pure" Peer-to-Peer network all participating nodes are peers. That
means all participants have the same rights and same duties, although
they may rely on a different hardware basis. Thus no central institution
to manage, control, co-ordinate or to provide content/services for the
Peer-to-Peer network exists in this network. Thus any single point of
failure is avoided by this network, which makes Denial-of-Service (DoS)
attacks difficult, as all tasks are distributed throughout the
network.
As the peers of a "Pure" Peer-to-Peer network equally interact with
each other via the virtual paths described before, a very symmetric
networking structure arises. The requests of the clients are not directed
to one central entity, but rather evenly distributed over the whole
network. Thus this network is a self organizing network without any
central management instance.
As there is no central index or database exists, routing is done via a
distributed search, which is based on direct messaging between the
peers, until the requested peer is found. To prevent the network from
being flooded by the search requests, Time-to-Live counter is attached
to each message in this protocol This Time-to-Live counter counts the
number of hops as the message is forwarded from peer to peer, during
the lifetime of the search message. As soon as the counter reaches a
predefined value, the message is killed and not forwarded any further.
1.3 "Hybrid" Peer-to-Peer networks -
The key distinction of "Hybrid" Peer-to-Peer compared to "Pure"
Peer-to-Peer is the fact, that the former always includes a central
entity. The central entity is not a commonly known general purpose
server from which content or services can be requested and
downloaded. It resembles more to a central peer group database or
catalogue of addresses, from which the location of a service or content
can be retrieved. The actual service is then executed between the
peers, without any further support by the central entity. Just the search,
where the content/service can be retrieved from, is performed with the
help of the central entity.
The most prominent representative of "Hybrid" Peer-to-Peer networking
is Napster, which is widely known because of the lawsuit filed against it
by the Recording Industry Association of America (RIAA). The Napster
protocol is used for the exchange of files, mainly audio files. To retrieve
a file, a peer first has to request the location of the demanded file at
the central entity, in this case the Napster server. The server provides
a central index, in which the IP address of peers and the characteristics
of their shared certain files are stored. If the demanded file matches
the characteristics of a file listed in the index, the Napster server
returns the IP address, where the file can be retrieved from, to the
requestor. Any further handling and management regarding the file
transfer, is now done between the two peers. The server is not involved
any further.
2. Peer-to-Peer networks and their routing techniques
-----------------------------------------------------
Peer-to-Peer (P2P) networks provide the capability to establish virtual
overlay networks. "Pure" P2P networks are completely self organizing
and therefore do not need central instances to manage the network.
>From the discussions in section 1.2 it can be said that, an important
characteristic of these networks is that the terminals of these networks
communicate in a bidirectional and symmetric way with each other,
therefore a virtual overlay network is established above the IP-Layer.
Such a network consists in most cases only of the servents and the
TCP/IP connections between the different servents.
-> Three system models of Peer-to-Peer networks :-
i) Centralized model: In this model there is a global index held by some
centralized authority, but there is direct contact between requestors and
providers. Naturally, the threat of a central point of failure is
prevalent in this model. This type of networks falls under "Hybrid"
Peer-to-Peer networks, as discussed in section 1.3. Example - Napster.
ii) Decentralized model: This model speaks of no global index and no
central co-ordination. The contact between requestors and providers
can be direct or mediated by a chain of intermediaries. This type of
networks can be called "Pure" Peer-to-Peer networks as discussed in
section 1.2. Example - Gnutella and Freenet.
iii) Hierarchical model: This model is basically a mix of centralized and
de-centralized models. In this model, a few peers with better processing
power, storage capacity and network link capacity, are termed as
"Super-peers" or "Ultra-peers". This model also falls roughly under
"Hybrid" Peer-to-Peer networks. Example - FastTrack.
-> Main Peer-to-Peer design requirements :-
i) Resource discovery and routing
ii) Managing updates
iii) Scalability
iv) Robustness and fault tolerance
v) Trust assessment and management
2.1 Gnutella -
The Gnutella protocol is an open, decentralized group membership and
search protocol, mainly used for file sharing. The term Gnutella also
designates the virtual network Internet accessible hosts running
Gnutella-speaking applications and a number of smaller and often
private, disconnected networks.
As most P2P file sharing applications, Gnutella protocol was designed
to meet the following goals:
Ability to operate in a dynamic environment - P2P applications operate
in dynamic environments, where hosts may join or leave the network
frequently. They must achieve flexibility in order to keep operating
transparently despite a constantly changing set of resources.
Performance and scalability - P2P paradigm shows its full potential only
on large-scale deployments where the limits of the traditional
client/server paradigm become obvious. Moreover scalability is
important as P2P applications exhibit what economists call the "network
effect" : the value of a network to an individual user increases with the
total number of users participating in the network. Ideally when
increasing the number of nodes, aggregate storage space and file
availability should grow linearly, response time should remain constant,
while search throughput should remain high.
Reliability - External attacks should not cause significant data or
performance loss.
Anonymity - Anonymity is valued as a means to protect privacy of
people seeking or providing information that may not be popular.
2.1.1 How Gnutella works?
Gnutella servents provide client side interfaces through which user's
can issue queries and view search results, accept queries from other
servents, check for matches against their local data set and respond
with corresponding results. These nodes are also responsible for
managing the background traffic that spreads the information used to
maintain network integrity.
Every servent is connected dynamically to an average of 7 servents,
depending on the bandwidth of the servent's network connection. The
messages, routed via these connections can be divided into two
categories.
One type are the query messages, and the second type are the
respond messages. The query messages are used to explore the
structure of a terminals neighbourhood, by sending out PING
messages. Secondly query messages are used to search for certain
content, e.g. mp3 compressed audio files in the network, by sending out
QUERY messages.
The Gnutella network employs a routing concept, known as 'Viral
propagation" for the query messages. This means that a servent
searching for content or exploring the network, sends out a query
message, i.e. a QUERY or a PING message, to all the neighbouring
servents it is currently directly connected to via TCP/IP connections in
the virtual overlay network. Thus every servent is able to explore in a
completely decentralized manner without the need for a central entity,
by more or less simply flooding the network.
The second type of messages, which are used in the Gnutella network,
are the respond messages, which are used to answer received query
messages. The answer is QUERY_HIT message, if a QUERY was
received, and the peer hosts the demanded content. A PONG-message
is used to answer a PING message, and thus to make the querying
client aware of its presence. These respond message are of no interest
to the rest of the network, and they therefore have to be routed only to
the querying servent. To avoid flooding, respond messages are routed
back to the querying terminal on the same way that the original query
message traveled to the receiving servent.
Beside the application routed signalling messages, the respond and
query messages, the content a servent is querying for must also be
distributed through the virtual overlay network. However, to minimize
the load on the existing overlay network and especially of its
servents/routers, the demanded data is transmitted "out-band".
"Out-band" in this context means that with the address provided in the
QUERY_HIT message, a direct - only IP routed - connection between
the querying and the responding servent is established. This
connection is used to transmit the content directly between the peers. If
the responding servent is behind a firewall a PUSH message is sent by
the querying servent requesting file download.
The following table summarizes these messages:
Type Description Contained information
--------------------------------------------------------------------
PING Announce availability None
and probe for other
servents.
PONG Response to a PING IP address and port# of
responding servent;
number and total kb of files
shared.
QUERY Search request Minimum network bandwidth
of
responding servent; search
criteria.
QUERY HIT Returned by servents IP address,port number and
that have the requested network bandwidth of
file. responding servent; number
of results and result set.
PUSH File download requests Servent ID; index of
requested
for servents behind a file; IP address and port to
firewall. send file to.
The major problem of the Gnutella protocol v0.4 is that parts of the
virtual overlay network are flooded with ping and query messages,
which causes a high signalling load. To reduce this load, a time-to-live
value (TTL) is attached to each query message in the Gnutella protocol
v0.4. This means that a query message is only forwarded by a servent,
if the TTL-value, which is decreased with every hop, is not equal to
zero. Thus a certain transmission range for these messages
is defined, which prevents the network from being flooded by query
messages, which could eventually lead to scalability problems,
especially for servents connected to the network with only a low
bandwidth connection.
2.1.2 Query Routing
We can reduce the number of flooded query messages, by routing
these queries based on the search keywords. The basic idea of query
routing in the virtual overlay network is, that servents exchange their
query routing tables with their neighbours periodically. The query
routing tables contain metadata of hosted content, i.e. keywords, and
the corresponding IP-address, of the servent from which the metadata
was received. Any incoming query is then analyzed for its search
keywords, and then compared to the local query-routing table. If one of
the search keywords matches to one or more entries in the routing
table, the query is forwarded in the direction, given by the routing table,
instead of being flooded to all neighbours of the servent. If no match
with the routing table can be found, the query is forwarded to all
neighbours of the servent, as
long as the TTL-value of the query message has not expired.
To minimize the amount of bandwidth necessary to propagate the
routing tables, a variant of Bloom Filters are used. This means, that
each keyword is hashed, and then all keywords of the content of one
servent are compressed in a bitmap. Thus a whole set of keywords and
IP-addresses need not have to be exchanged periodically, but only a
comparatively small bitmap. Further, incremental updates could also be
used, if only small changes have taken place since the last routing table
has been propagated to its neighbours.
However, the major problem with the implementation of query-routing
tables is again, how to keep them up to date, if the network is very
dynamic. The problem is, that routing information for a certain file A
which is hosted by the servent X, may still propagate through the
network, although the servent X is not a member of the network. Thus
queries may be directed in a wrong direction, which leads to useless
traffic, and to unsatisfied users, as the content they search for cannot
be found any more.
A solution to this problem could be to set a timer for every routing
table entry. After the expiration, this routing table entry is deleted
to prevent any misleading routings. Further more the propagation
reach of each routing table must be limited, to prevent the routing
table from propagating through the whole network. This could be done
with a hop-counter, which avoids routing tables from being spread any
further, as soon as a certain value for the hop-count has been reached.
2.1.3 Summary
We can now summarize the properties of Gnutella network:
o Completely decentralized
o Hit rates are high
o High fault tolerance
o Adopts well and dynamically to changing peer populations
o Simple, robust and scalable
o Protocol causes high network traffic
o No estimates on the durations of queries can be given
o No probability for successful queries can be given
o Topology is unknown
o Reputation of peers is not addressed
2.2 Freenet -
Freenet is a distributed information storage and retrieval system which
addresses the concerns such as - privacy and availability. The system
operates as a location independent distributed file system across many
servents, that allow files to be inserted, stored and requested
anonymously. There are five main design goals:
# Anonymity for both producers and consumers of information
# Deniability for storers of information
# Resistance to attempts by third parties to deny access to
information
# Efficient dynamic storage and routing of information
# Decentralization of all network functions
The system is designed to respond adaptively to usage patterns,
transparently moving replicating and deleting files as necessary to
provide efficient service without resorting to broadcast searches and
centralized indexes. It is not intended to guarantee permanent file
storage, although it is hoped that a sufficient number of nodes will
join with enough storage capacity that most files will be able to
remain indefinitely. In addition, the system operates in the application
layer and assumes the existence of a secure transport layer, although
it is transport layer independent. It does not seek to provide anonymity
for general network usage, only for Freenet file transactions.
Maintaining privacy for creating and retrieving files means little
without also protecting the files themselves in particular, keeping
their holders hidden from attack. Freenet thus makes it hard to discover
exactly which computers store which files. Together with redundant
replication of data, holder privacy makes it extremely difficult for
censors to block or destroy files on the network.
2.2.1 Architecture
Freenet is implemented as an adaptive Peer-to-Peer network of nodes
that query one another to store and retrieve data files, which are named
by location independent keys. Each node maintains its own local data-
store which it makes available to the network for reading and writing, as
well as a dynamic routing table containing addresses of other nodes
and the keys that they are thought to hold. It is intended that most users
of the system will run nodes, both to provide security guarantees
against inadvertently using a hostile foreign node and to increase the
storage capacity available to the network as a whole.
2.2.1.1 Keys
Freenet participants each run a node that provides the network some
storage space. To add a new file, a user sends the network an insert
message containing the file and its assigned location-independent
globally unique identifier (GUID), which causes the file to be stored on
some set of nodes. During a file's lifetime, it might migrate to or be
replicated on other nodes. To retrieve a file, a user sends out a
request message containing the GUID key. When the request reaches
one of the nodes where the file is stored, that node passes the data
back to the request's originator.
Freenet GUID keys are calculated using SHA-1 secure hashes. The
network employs two main types of keys: content-hash keys, used for
primary data storage, and signed-subspace keys, intended for higher-
level human use. The two are analogous to "inodes" and filenames in a
conventional file system.
2.2.1.2 Messaging and Privacy
Freenet was designed from the beginning under the assumption of
hostile attack from both inside and out. Therefore, it intentionally makes
it difficult for nodes to direct data toward themselves and keeps its
routing topology dynamic and concealed. Unfortunately, these
considerations have had the side effect of hampering changes that
might improve Freenet's routing characteristics.
Privacy in Freenet is maintained using a variation of Chaum's mix-net
scheme for anonymous communication. Rather than move directly from
sender to recipient, messages travel through node-to-node chains, in
which each link is individually encrypted, until the message finally
reaches its recipient.
Because each node in the chain knows only about its immediate
neighbours, the end points could be anywhere among the network's
hundreds of thousands of nodes, which are continually exchanging
indecipherable messages. Not even the node immediately after the
sender can tell whether its predecessor was the message's originator or
was merely forwarding a message from another node. Similarly, the
node immediately before the receiver can't tell whether its successor is
the true recipient or will continue to forward it. This arrangement is
intended to protect not only information producers and consumers (at
the beginning of chains), but also information holders (at the end of
chains). By protecting the latter, it can prevent an adversary from
destroying a file by attacking all of its holders. Of course, ensuring
privacy is not enough; queries must be able to locate data as well.
2.2.1.3 Routing
Routing queries is the most important element of the Freenet
system. The simplest routing method, used by services like Napster, is
to maintain a central index of files, so that users can send requests
directly to information holders. Unfortunately, centralization creates
a single point of failure that is easy to attack. For example, if someone
is trying to phone Sachin Tendulkar, the simplest way to get his number
would ordinarily be to call directory assistance. However, because
directory assistance is centralized, his access can be easily blocked if
Sachin or someone else decides to remove his directory entry, or if the
service goes down.
Systems like Gnutella broadcast queries to every connected node
within some radius. Using this method, someone would ask all of his
friends if any of them knew Sachin's number, get them to ask their
friends, and so on. Within a few steps, thousands of people could be
looking for his number. Although this process would eventually find the
answer, it is clearly wasteful and unscalable.
Freenet avoids both problems by using a steepest-ascent hill-climbing
search: Each node forwards queries to the node that it thinks is closest
to the target. A person might start searching for Sachin by asking a
friend who once played college cricket, for example, who might pass his
request on to a former coach, who could pass it to someone else, who
might pass it to Sachin's agent, who could put him in touch with the
man himself.
Requesting files: Every node maintains a routing table that lists the
addresses of other nodes and the GUID keys it thinks they hold. When
a node receives a query, it first checks its own store, and if it finds
the file, returns it with a tag identifying itself as the data holder.
Otherwise, the node forwards the request to the node in its table with
the closest key to the one requested. That node then checks its store,
and so on. If the request is successful, each node in the chain passes
the file back upstream and creates a new entry in its routing table
associating the data holder with the requested key. Depending on its
distance from the holder, each node might also cache a copy locally.
To conceal the identity of the data holder, nodes will occasionally
alter reply messages, setting the holder tags to point to themselves
before passing them back up the chain. Later requests will still locate
the data because the node retains the true data holder's identity in its
own routing table and forwards queries to the correct holder. Routing
tables are never revealed to other nodes.
To limit resource usage, the requester gives each query a time-to-live
limit that is decremented at each node. If the TTL expires, the query
fails, although the user can try again with a higher TTL (up to some
maximum). Because the TTL can give clues about where in the chain
the requester is, Freenet offers the option of enhancing security by
adding an initial mix-net route before normal routing. This effectively
re-positions the start of the chain away from the requester.
If a node sends a query to a recipient that is already in the chain, the
message is bounced back and the node tries to use the next-closest
key instead. If a node runs out of candidates to try, it reports failure
back to its predecessor in the chain, which then tries its second choice,
and so on.
With this approach, the request homes in closer with each hop until the
key is found. A subsequent query for this key will tend to approach the
first request's path, and a locally cached copy can satisfy the query
after the two paths converge. Subsequent queries for similar keys will
also jump over intermediate nodes to one that has previously supplied
similar data. Nodes that reliably answer queries will be added to more
routing tables, and hence, will be contacted more often than nodes that
do not.
Inserting files: An insert message follows the same path that a request
for the same key would take, sets the routing table entries in the same
way, and stores the file on the same nodes. Thus, new files are placed
where queries would look for them.
To insert a file, a user assigns it a GUID key and sends an insert
message to the user's own node containing the new key.
Upon receiving an insert, a node checks its data store to see if the
key already exists. If so, the insert fails - either because the file
is already in the network (for CHKs) or the user has already inserted
another file with the same description (for SSKs). In the latter case,
the user should choose a different description or perform an update
rather than an insert.
If the key does not already exist in the node's data store, the node
looks up the closest key and forwards the message to the
corresponding node as it would for a query. If the TTL expires without
collision, the final node returns an "all clear" message. The user then
sends the data down the path established by the initial insert message.
Each node along the path verifies the data against its GUID, stores it,
and creates a routing table entry that lists the data holder as the final
node in this chain. As with requests, if the insert encounters a loop
or a dead end, it backtracks to the second-nearest key, then the
third-nearest, and so on, until it succeeds.
2.2.2 Network Evolution
The network evolves over time as new nodes join and existing nodes
create new connections after handling queries. As more requests are
handled, local knowledge about other nodes in the network improves,
and routes adapt to become more accurate without requiring global
directories.
2.2.2.1 Adding Nodes
To join the network, a new node first generates a public-private key
pair for itself. This pair serves to logically identify the node and is
used to sign a physical address reference. Note that public keys are not
certified. We don't need to link them to real-world identities because
the node's public key is its identity, even if it changes physical
addresses. Certification might be useful in the future for deciding
whether to trust a new node, but for now Freenet uses no trust
mechanism.
Next, the node sends an announcement message including the public
key and physical address to an existing node, located through some
out-of-band means such as personal communication or lists of nodes
posted on the Web, with a user-specified TTL. The receiving node
notes the new node's identifying information and forwards the
announcement to another node chosen randomly from its routing table.
The announcement continues to propagate until its TTL runs out. At
that point, the nodes in the chain collectively assign the new node a
random GUID in the key-space using a cryptographic protocol for
shared random number generation that prevents any participant from
biasing the result. This procedure assigns the new node responsibility
for a region of key-space that all participants agree on while
guaranteeing that a malicious node cannot influence the assignment for
a specific key that it might want to attack.
2.2.2.2 Training Routes
As more requests are processed, the network's routing should become
better trained. Nodes' routing tables should specialize in handling
clusters of similar keys because each node will mostly receive requests
for keys that are similar to the keys it is associated with in other
nodes' routing tables. When those requests succeed, the node learns
about previously unknown nodes that can supply such keys and creates
new routing entries for them. As the node gains more experience in
handling queries for those keys, it will successfully answer them more
often and, in a positive feedback loop, get asked about them more
often.
Nodes' data stores should also specialize in storing clusters of files
with similar keys. Because inserts follow the same paths as requests,
similar keys tend to cluster in the nodes along those paths. Nodes
should similarly cluster files cached after requests because most
requests will be for similar keys.
Taken together, the twin effects of clustering in routing tables and
data stores should improve the effectiveness of future queries in a
self-reinforcing cycle.
2.2.3 Summary
This discussion on Freenet can be summarized with the following
points:
o Completely decentralized
o High fault tolerance
o Robust and scalable
o Automatic replication of content
o Adopts well and dynamically to changing peer populations
o Spam content less of a problem
o Adaptive routing preserves network bandwidth
o Supports anonymity of publishers and readers
o No estimates on the duration of queries can be given
o No probability for successful queries can be given
o Topology is unknown
o Reputation of peers is not addressed
3. Some structured routing techniques
-------------------------------------
Structured peer-to-peer (p2p) overlays like CAN, Chord, Pastry and
Tapestry provide a self-organizing substrate for large-scale peer-to-
peer applications. These systems provide a powerful platform for the
construction of a variety of decentralized services, including network
storage, content distribution, and application-level multicast. Structured
overlays allow applications to locate any object in a probabilistically
bounded, small number of network hops, while requiring per-node
routing tables with only a small number of entries. Moreover, the
systems are scalable, fault tolerant and provide effective load
balancing.
However, to fully realize the potential of the p2p paradigm, such
overlay networks must be able to support an open environment where
mutually distrusting parties with conflicting interests are allowed to
join. Even in a closed system of sufficiently large scale, it may be
unrealistic to assume that none of the participating nodes have been
compromised by attackers. Thus, structured overlays must be robust to
a variety of security attacks, including the case where a fraction of the
participating nodes act maliciously. Such nodes may mis-route, corrupt,
or drop messages and routing information. Additionally, they may
attempt to assume the identity of other nodes and corrupt or delete
objects they are supposed to store on behalf of the system.
We can think of an abstract model of a structured p2p routing overlay,
designed to capture the key concepts common to overlays like CAN,
Chord, Tapestry and Pastry. In this model, participating nodes ate
assigned uniform random identifiers, nodelDs, from a large ID space
(e.g., the set of 128-bit unsigned integers). Application specific objects
are assigned unique identifiers, called keys, selected from the same ID
space. Each key is mapped by the overlay to a unique live node, called
the key's root. The protocol routes messages with a given key to its
associated root.
To route messages efficiently, each node maintains a routing table with
nodelDs of other nodes and their associated IP addresses. Moreover,
each node maintains a neighbour set, consisting of some number of
nodes with nodelDs close to the that of the current node. Since nodelD
assignment is random, any neighbour set represents a random sample
of all participating nodes.
For fault tolerance, application objects are stored at more than one
node in the overlay. A replica function maps an object's key to a set
of replica keys, such that the set of replica roots associated with the
replica keys represents a random sample of participating nodes in the
overlay. Each replica root stores a copy of the object.
The following is a discussion on different existing structured p2p
overlay protocols and how they relate to our abstract model.
3.1 Pastry -
Pastry nodelDs are assigned randomly with uniform distribution from a
circular 128-bit id space. Given a 128-bit key, Pastry routes an
associated message toward the live node whose nodelD is numerically
closest to the key. Each Pastry node keeps track of its neighbour set.
Node state: For the purpose of routing, nodelDs and keys are thought
of as a sequence of digits in base 2b (b is a configuration parameter
with typical value 4). A node's routing table is organized into 128/2b
rows and 20 columns. The 2b entries in row r of the routing table
contain the IP addresses of nodes whose nodelDs share the first r digits
with the present node's nodelD; the r+1 th nodelD digit of the node in
column c of row r equals c. The column in row r that corresponds to the
value of the r +1 th digit of the local node's nodelD remains empty. A
routing table entry is left empty if no node with the appropriate nodelD
prefix is known.
Message routing: At each routing step, a node seeks to forward the
message to a node in the routing table whose nodelD shares with the
key a prefix that is at least one digit (or b bits) longer than the
prefix that the key shares with the present node's id. If no such node
can be found, a `die' message is forwarded to a node whose nodelD
shares a prefix with the key as long as the current node, but is
numerically closer to the key than the present node's ID. If no
appropriate node exists either in the routing table or neighbour set, then
the current node or its immediate neighbour is the message's final
destination.
To achieve self-organization, Pastry nodes must dynamically maintain
their node state, i.e., the routing table and neighbour set, in the
presence of node arrivals and node failures. A newly arriving node with
the new nodelD X can initialize its state by asking any existing Pastry
node A to route a special message using X as the key. The message is
routed to the existing node Z with nodeld numerically closest to X. X
then obtains the neighbour set from Z and constructs its routing table by
copying rows from the routing tables of the nodes it encountered on the
original route from A to Z. Finally, X announces its presence to the
initial members of its neighbour set, which in turn update their own
neighbour sets and routing tables. Similarly, the overlay can adapt to
abrupt node failure by
exchanging a small number of messages (O(log2b N) among a small
number of nodes.
3.2 CAN, Chord, Tapestry -
Tapestry is very similar to Pastry but differs in its approach to
mapping keys to nodes and in how it manages replication. In Tapestry,
neighbouring nodes in the namespace are not aware of each other.
When a node's routing table does not have an entry for a node that
matches a key's nth digit, the message is forwarded to the node with
the next higher value in the nth digit, modulo 2^b, found in the routing
table. This procedure, called surrogate routing, maps keys to a unique
live node if the node routing tables are consistent. Tapestry does not
have a direct analog to a neighbour set, although one can think of the
lowest populated level of the Tapestry routing table as a neighbour set.
For fault tolerance, Tapestry's replica function produces a set of
random keys, yielding a set of replica roots at random points in the ID
space. The expected number of routing hops in Tapestry is log(2^b)N.
Chord uses a 160-bit circular ID space. Unlike Pastry, Chord forwards
messages only in clockwise direction in the circular ID space. Instead
of the prefix-based routing table in Pastry, Chord nodes maintain a
routing table consisting of up to 160 pointers to other live nodes
(called a "finger table"). The i entry in the finger table of node n refers to
the live node with the smallest nodelD clockwise from n+2^(i-1). The
first entry points to n's successor, and subsequent entries refer to
nodes at repeatedly doubling distances from n. Each node in Chord
also maintains pointers to its predecessor and to its n successors in the
nodelD space. The expected number of routing hops in Chord is
5*log(e)N.
CAN routes messages in a d-dimensional space, where each node
maintains a routing table with O(d) entries and any node can be
reached in (d/4)*(N^(1/d)) routing hops on average. The entries in a
node's routing table refer to its neighbours in the d-dimensional space.
Unlike Pastry, Tapestry and Chord, CAN's routing table does not grow
with the network size, but the number of routing hops grows faster than
logN in in this case.
More information about the reader-list
mailing list