[Reader-list] [Third posting] Implementation of a peer-to-peer news distribution network

Fri Apr 23 23:40:08 IST 2004

Hi,
	This is my third posting and it is a survey of various approaches
for choosing reliable peers in a P2P network.

Soumava Das

Major areas that need to be studied to design the P2P network :
----------------------------------------------------------------------------------------

Content searching : A key challenge to the usability of a data-sharing
peer-to-peer system is implementing efficient techniques for search and
retrieval of data. The best search techniques for a system depend on
the needs of the application. For example, search techniques based on
distributed hash tables (DHTs) are well-suited for web caches or
archival systems focused on availability, because they guarantee
location of content if it exists, within a bounded number of hops. In
many scenarios, the increased search efficiency makes structured
networks preferable to the widely deployed unstructured networks
which rely on flooding. To achieve these properties, these techniques
tightly control the data placement and topology within the network, and
currently only support search by identifier. In contrast, other work
focuses on more flexible applications with rich queries such as regular
expressions, meant for a wide range of users from autonomous
organizations. A survey on the various search techniques has been
posted to the list last month.

Resource management : Aggregating and allocating peer-to-peer
resources is much more difficult than in a centralized system. One
reason is the autonomous nature of peers: rational, essentially selfish,
peers must be given an incentive to contribute resources. In addition,
the scale of the system, with perhaps very many nodes, makes it hard
to get a complete picture of what resources are available. This is
especially true in a dynamic system, with nodes constantly joining and
leaving, where resources and resource demands are constantly
changing. Our approach to dealing with these issues is to use concepts
from economics to construct a resource market-place, where peers can
buy and sell or trade resources as necessary. Economic incentives are
used to encourage resource sharing, while the problem of system-wide
resource allocation is broken down into numerous exchanges between
pairs of nodes to enhance scalability. We hope to post a write-up on
this topic next month.

Security : P2P data sharing systems are highly susceptible to many
forms of malicious attacks. Nodes in a P2P system operate in an
autonomous fashion, and any node that speaks the system protocol
may participate in the system. However, just because a node can speak
the protocol does not mean that it will do so with good intentions. As a
result, nodes cannot necessarily assume that other nodes will respond
to their queries, limit the number of queries they generate, produce
authentic results, or keep the contents of their queries private. So
protocols must provide mechanisms to mitigate attacks by nodes that
abuse the P2P network by exploiting the implicit trust peers place on
them.

Here we focus on the security aspects of the P2P network. A survey of
various approaches for choosing reliable peers in a P2P network is
done here. Specifically we discuss research meant to address the
security issues related to availability, authenticity and trust. The survey
will help choosing or proposing a trust management system for the P2P
news distribution system that has been proposed.

Why trust management ?
------------------------------------

Peer-to-Peer overlay networks are increasingly gaining acceptance on
the Internet as they provide an infrastructure in which the desired
information and products can be located and traded. While P2P
systems based on central indices have run into legal problems, the
decentralized systems have continued to flourish. Gnutella, Kazaa,
Freenet are extremely popular amongst the WWW community with
millions of users world-wide. One of the most attractive features of a
typical P2P resource-sharing application is the anonymity that it
provides to the requester and the provider of a resource. However, the
open nature of the P2P networks also makes the system vulnerable to
malicious users trying to abuse the network. Genuine looking files may
actually contain viruses, which can potentially destroy data and infect
programs on a peer's hard drive. Though the use of anti-virus software
can detect such incidents, there is hardly any mechanism to prevent
this threat or to punish such malicious users.

The first issue is related to trust models used for building trust among
peers. The second issue is related to secure storage and secure
access of trust values against possible misuses and abuses by
malicious peers. A fair amount of work has been done in the area of
computing reputation-based trust ratings . However, the area of
developing secure underlying protocols to distribute and access the
trust ratings in the overlay network has been relatively unexplored.

Attacks against a system's availability are often called denial-of-service
(DoS) attacks, and are targeted at degrading system performance, or
shutting down a system completely by having malicious clients use up
resources (CPU cycles, disk space, network bandwidth, etc.) such that
these resources cannot be used by legitimate clients in the system. In
addition, a common characteristic of such attacks is that it is often hard
to distinguish nodes that are malicious from those that are simply under
a high load. As a result, there is a need to balance the generated load
so that malicious nodes can use a portion of the system's but not a
disproportionate amount of the resources.

It has been suggested that the future development of P2P systems will
depend largely on the availability of novel methods for ensuring that
peers obtain reliable information on the quality of resources they are
receiving. In this context, attempting to identify malicious peers that
provide non-authentic files or bogus content is more effective than
attempting to identify non-authentic resources themselves, since
malicious peers can easily generate a virtually unlimited number of non-
authentic resources if they are not banned from participating in the
network. The process of tracking the apparent behaviour of peers and
selecting resource providers based on such information is the work of a
reputation system. One weakness of reputation systems is their reliance
on persistent identity in order to maintain a behavioral history of nodes
in the network. Due to the open and anonymous nature of P2P
networks, it may be infeasible to enforce the usage of persistent non-
repudiable identities by all nodes.

Types of security systems:
---------------------------------------

Several researchers have recently addressed the problem of enforcing
security in the peer-to-peer scenario. One main line of work in the
security community has been devoted to the enhancement of access
control approaches with new authentication and authorization
capabilities to address the fact that access requests may represent
interactions between parties that know little about each other. All these
works focused on allowing a peer acting as a server to restrict others'
ability to access its resources. Peer-to-peer systems, however, also
introduce other problems that reverse the security assumptions of
traditional access control and require to focus the attention on providing
protection from those who offer resources (servers), rather than from
those who want to access them (clients). This paradigm shift is due to
the inherent vulnerability of peer-to-peer systems from providers
abusing the network to spread tampered-with resources. Proposals to
prevent or discouraging peers from distributing invalid or malicious
content into the network are based on two main techniques:
micropayment and reputation-based trust systems.
Micropayment techniques (e.g., Mojo Nation, www.mojonation.net) are
less closely related to our approach as they require peers to offer
something of value in exchange of their participation in the system.
Therefore, they impose a cost on malicious peers, as to insert invalid
content into the network they would first need to provide a certain
amount of resources. Reputation models allow the expression and
reasoning about trust in a peer based on its past behaviour and
interactions other peers have experienced with it.

Reputation Management Systems Currently in Use:
-------------------------------------------------------------------------

There are several reputation management systems that are currently
being used in situations such as online auctions and file sharing. While
these systems demonstrate why trust management is needed in "peer-
to-peer" situations, they also illustrate some of the issues that need to
be addressed in the design of a reputation management system. In the
following sections, brief overviews of reputation management systems
that have been implemented is presented, along with their drawbacks.

Reputation management on The eBay system:
------------------------------------------------------------------

eBay is a Web site where buyers and sellers can meet and conduct
auctions entirely online. In eBay's reputation management system,
called the "eBay Feedback System," buyers and sellers can rate each
other based on their past transactions with each other. This is done via
users' leaving comments on one another's eBay user pages, which can
be positive, neutral, or negative. An eBay member's reputation is
calculated by assigning each type of comment a point value (+1 point
for positive comments, 0 points for neutral comments, and -1 point for
negative comments) and summing the point values of all of a member's
transactions to obtain that member's "reputation," which is the number
of positive comments that the user has received. An eBay user's
reputation score can then be used by other users as a factor in their
decisions on whether to conduct a transaction with that user or not.
Unfortunately, eBay's Feedback System can be compromised in several
ways. For example, a seller may gain a high reputation score by leaving
positive comments using forged identities. eBay also uses a centralized
structure to store and present the trust knowledge, this is undesirable in
a P2P system.

 Reputation Management on The Kazaa
------------------------------------------------------------

Kazaa [5] is a peer-to-peer file sharing application that allows its users
to share files with other Kazaa users. It also allows users to search for
and download files from other users who are sharing them. Kazaa has
implemented a reputation management system consisting of two
components, "Integrity Rating" and "Participation Levels," each of which
is described below.

 Integrity Rating on Kazaa

Kazaa has a feature called Integrity Rating that allows peers who share
files to rate their own files in terms of their technical merits, such as
whether the files have accurate metadata and are of high quality. In
order to guide other peers toward the highest quality files available for
download, Kazaa users are encouraged to Integrity Rate their files and
delete files that should not be shared (e.g., virus-infected files or files
that are corrupted), but it is not required that a peer Integrity Rate its
files in order to participate in the Kazaa network. There are four levels
of Integrity Rating for files on Kazaa:

Excellent: File has complete metadata and is of a high technical quality.

Average: File has some metadata (which may not be complete) and the
file is of mediocre technical quality.

Poor: File is of poor technical quality and has no metadata.

Delete file: File should not be shared (e.g., it is virus-infected, bogus, or
unusable).

When a peer Integrity Rates its files, it will earn double points toward its
Participation Level (which is explained below) each time an Integrity
Rated file is downloaded from the peer.

Participation Levels on Kazaa

In Kazaa's reputation management system, each peer has a
"Participation Level" that is based on the quality and amount of files that
it shares. A peer's Participation Level is a number that reflects the ways
in which that peer has used Kazaa to upload and download files. A
peer's Participation Level can be within one of six ranges and is meant
to reward peers who share many Integrity Rated files by providing those
peers with increased bandwidth that they can use to download files
from other peers on the network. A peer's participation level is
calculated using:

Plevel_i      =   ( uploaded_i / downloaded_i ) × 100

where plevel_i is the participation level of peer i, uploaded_i is the
amount of data (in megabytes) that peer i has uploaded to other peers,
and downloaded_i is the amount of data (in megabytes) that peer i has
downloaded from other peers. In the above formula, if peer i uploads a
file that has not been Integrity Rated to another peer, that file will be
counted as half its size when calculating the peer's Participation Level.
Kazaa Participation Levels have a minimum of 0 points and a maximum
of 1000 points. Upon installing the Kazaa file sharing software, each
peer's Participation Level starts in the "Medium" range with 100 points
(which is the second-lowest of the six ranges) and the level can go up
or down from there depending on how the peer uses the network as
described above. A major issue with Kazaa's reputation management
system is that it is designed to reward peers who demonstrate good
peer-to-peer behaviour but it does not punish those who do not.

XRep :
-------------

A representative work is presented by Cornelli et al [3], in which any
peer, say Peer A, who wants to query for the trust value of another
peer, say Peer B, broadcasts a query to the network. Then the peers
who have interacted with Peer B and would like to express their
opinions reply back with their  (IP, Port) tuple, encrypted with public key
of Peer A. After receiving the replies, Peer A will individually contact the
voters and ask them to confirm their votes to filter out incorrect fake
messages. There are a number of drawbacks of this approach:

No persistence: The trust metrics are not persistent. All the peers who
have interacted with Peer B, but are not present in network cannot have
their reviews counted. This can be potentially exploited by a collective
of malicious peers who are always present in the network sending the
same high/low value for Peer B, thus masking the opinions of various
other peers who would have genuine ratings. A small number of
malicious peers can totally dominate the ratings of a targeted peer.
targeted
No anonymity: The peers expressing opinions lose their anonymity. A
peer can potentially query for its own trust value (if protected against
this, ask a friend to query), and identify the peers who are giving poor
trust values for it. Now these peers can be selectively targeted with
other attacks like DoS. This is equivalent to voters not having a right to
secret ballot.

Tedious decision-making: The decision making process becomes
extremely lengthy and tedious. Peer A has to contact all the voters and
confirm their votes, thus increasing the time taken to make a decision
on whether it wants to trust Peer B or not. Also Peer A has to combine
all the valid votes before arriving at a decision.

Eigenrep :
------------------------------

Eigenrep presented an alternative approach. They propose a simplistic
underlying protocol based on a Directed Hash Table (DHT) based
mechanism like CAN or Chord[13]. Each peer has a set of mother
peers, which hold the trust values of that peer. The mother peers are
decided based on hashing the ID of the peer (Different hashes are used
to obtain a number of mother peers). If Peer A wants to query for the
trust value of Peer B, it just hashes its ID to obtain the various mother
peers and then queries them for the trust values. Then, it decides by
taking the majority of those values. This kind of approach also suffers
from a number of short-comings:

 Insecure communication: After hashing a peer's ID to obtain various
mother peers, the communication between the mother peer and the
querying peer is not secured. This is vulnerable to a host of threats like
Man-in-the-middle/Bucket-brigade attacks.

DHT Threats: By relying on a DHT based design, the system
automatically becomes prone to numerous other possible attacks.
Malicious routing-information tampering, malicious lookup replies and a
host of other possible scenarios come into the picture. Even a small
percentage of malicious nodes can have devastating consequences.
The security solutions to some of these attacks are extremely complex
and are not compatible with the current available systems.

No anonymity: Also in such a scheme the identity of mother peers is
exposed. And as already mentioned, a malicious node can attack those
mother peers to prevent them from sending the true trust values.

Group threats: Another problem is that since user chosen IDs are used
to hash, after significant monitoring, a malicious group of nodes (or a
single node simulating a group of nodes) can select good combinations
of IDs to have a favourable scenario, for example, in which trust values
of a particular node in the group are most likely to be hosted by other
group members.

TrustMe: A Protocol for Anonymous Trust Management
------------------------------------------------------------------------------

TrustMe, a protocol developed by Ameek Singh and Ling Liu [7] is of
great importance.
TrustMe is an anonymous and secure protocol for maintaining and
accessing trust rating information. TrustMe uses Public Key
cryptography schemes to provide security and is resistant to various
attacks. TrustMe uses a random assignment of Trust-Holding Agent
peers (henceforth called THA peers) and uses smart Public Key
mechanisms to prevent any loss of anonymity. Also it ensures all
communication to be secure. They argue that the distribution and
access of trust ratings should be distinguished from the routine
functions such as file inquiry and downloading performed in a P2P
system. A unique characteristic of the TrustMe protocol is its support for
mutual anonymity in managing peers' trust relationships. Not only the
peers who access trust ratings of other peers remain anonymous but
also peers who store trust ratings of other peers are protected from
targeted attacks by keeping their identity hidden. In addition, the
TrustMe design ensures the following properties:
1. Security: Due to decentralized management of trust relationships, the
trust rating of a peer is stored at other peers in the network and it is
extremely important to protect these trust hosting peers from targeted
attacks.
2. Reliability: It is important to ensure that anybody querying for a trust
value gets the true trust value inspite of presence of various malicious
users.
3. Accountability: In a peer-review based trust systems, it is important
that peers are accountable for the feedback they provide about other
peers. Any malicious peer trying to manipulate trust ratings should be
identifiable.

TrustMe broadly functions in the following manner. Each peer is
equipped with a couple of public-private key pairs. The trust values of a
peer (say Peer B) are randomly assigned to another peer (THA peer) in
the network. This assignment is done by the bootstrap server in a way
that the trust holding responsibilities are equally distributed amongst
the participating peers. This assignment is unknown to all peers
including Peer B. All the communication with the THA peer is carried
out using a special key which indicates its knowledge of the trust value
of Peer B. Any peer (say Peer A) interested in querying for the trust
value of Peer B can broadcast a trust query for Peer B. The THA peer
replies with the trust value along with some other information.
Depending upon the trust value, Peer A can decide to interact with Peer
B or not. Also after an interaction, Peer A can securely file a report
(after giving adequate proof of the interaction) for Peer B, indicating
Peer A's new trust value for Peer B. Then, the THA peer can modify the
trust rating of Peer B. To provide security, reliability and accountability,
TrustMe uses smart public key cryptography mechanisms.

The protocol achieves complete anonymity, security, reliability and
accountability. In the following, we summarize a few other important
aspects of the TrustMe protocol:

Persistence: Any peer just has to file a report to make sure its
experiences are accounted for in the trust value of the interacting peer.
After that, even if that peer logs out of the system, its review is counted.
This way, we provide persistence to votes and provide a stronger trust
mechanism. Non-persistent systems can provide highly misleading trust
values in the presence of even a small number of  malicious peers.

No Central Trusted Authority: It is important to notice that the bootstrap
server does not act as a CTA. It is rather a form of a certification
authority. All the trust mechanisms are within the network and the
bootstrap server does not participate in it.

Small decision time: Only a reply messages is enough for a peer to
make a decision on whether to interact with a particular peer or not.
This is extremely convenient and fast.

Ease of contribution: It is extremely easy for a peer to contribute its trust
value for another peer - by sending a single report message. This is a
highly convenient as opposed to notifying each THA peer as in
Eigenrep.

TrustMe over other protocols:
------------------------------------------

After studying a variety of trust management protocols, we see that
amongst the protocols studied so far, the TrustMe protocol is best
suitable to be adopted as the trust management protocol for the P2P
news distribution network that we proposed.

It ensures the total anonymity of the querying as well as the answering
peers. This will help in providing privacy to all the users and escape
censorship, which is a desire property of our proposed P2P news
distribution network. Also we see that the decision making process
(about which peer to take the service from) of the TrustMe protocol is
pretty lenient in terms of  bandwidth consumption. Only the THA peers'
reply is needed and a decision is made from the THA replies only. This
is very important since, in India, majority of the users use a low
bandwidth Internet connection.  This is much better than the polling
based mechanisms since they have to collect the replies from all the
peers and combine them.Every peer while forwarding the trust value,
can cache the trust value. This can be stored for a certain amount of
time in its cache, this will significantly reduce the response time of the
querying peers. Unlike eBay and some other systems, TrustMe does
not use any centralized server or global control mechanism to store and
distribute the trust values. The protocol is fully decentralized. This
prevents the system from Denial of Service attacks. Being fully
decentralized will also help in maintaining a pure ideal P2P network
without incorporating any hierarchy  in the system.
The stored trust values are persistent in TrustMe, which helps in better
trust as the vote of a particular user stays in even after a user has
logged out, as the vote is stored in a secret ballot box.

Future work :
---------------------
We need a survey about the trust metrics that we can select to correctly
signify the trust values of each peer. The trust metrics could be server
based or content based or could be a collection of both.

References :
------------------

[1] S. Kamvar, M. Schlosser, and H. Garcia-Molina. Eigenrep:
Reputation management in p2p networks. In Twelvth International
World Wide Web Conference, 2003.

[2] K. Aberer and Z. Despotovic. Managing trust in a peer-2-peer
information system. In CIKM, 2001.

[3] F. Cornelli, E. Damiani, S. D. C. di Vimercati, S. Paraboschi, and P.
Samarati. Choosing reputable servents in a p2p network. In Eleventh
International World Wide Web Conference, Honolulu, Hawaii, May
2002.

[4] K. Aberer and Z. Despotovic. Managing trust in a peer-2-peer
information system. In Proc. of the Tenth International Conference on
Information and Knowledge Management (CIKM 2001), Atlanta,
Georgia, November 2001.

[5] KaZaA. http://www.kazaa.com

[6] The Gnutella Protocol Specification v0.4 (Document Revision 1.2),
June 2001. http://www9.limewire.com/developer/gnutella protocol
0.4.pdf.

[7] Ameek Singh, Ling Liu. TrustMe, Anonymous management of trust
relationship in decentralized P2P systems. College of computing,
Georgia Tech.

[8] Beverly Yang, Patrick Vinograd, Hector Garcia-Molina. Evaluating
GUESS and Non-Forwarding Peer-to-Peer Search.
.
[9] Groove networks, see http://www.groove.net.

[10] Napster, see www.napster.com.

[11] Seti at home, see http://setiathome.ssl.berkeley.edu.

[13] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H.
Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet
applications. In Proceedings of the 2001 Conference on Applications,
Technologies, Architectures, and Protocols for Computer
Communications, 2001.