Wednesday, November 4, 2009

Looking Up Data in P2P Systems

P2P networks raise the "lookup problem": how can you locate data on the P2P network without a centralized server or hierarchy? Simplest form of lookup: searcher broadcasts a message to all neighbors with a request for X. Each neighbor checks its local database; if it doesn't have the item, it forwards to its neighbors, and repeat. However, all these broadcast messages will take up a lot of bandwidth.

Distributed hash tables (DHTs) offer a solution. In a DHT, all keys & nodes haves IDs. Each key is stored at a node with an ID that's "close" to the ID's key. Each node sends a lookup request to another node "closer" in the ID space until the lookup request gets to the right place. However, nodes need a routing algorithm to be able to *find* a closer node.

-- Skiplist-like routing: Each node has a table of node IP & ID pairs of nodes whose IDs are 1/2 around the ID space, 1/4 around, 1/8 around, etc. A searcher node will send a lookup request to the node with the closest ID that does not exceed the key ID.
-- Tree-like routing: Each node has a randomly assigned ID, and each one knows a node with a certain "prefix". It uses knowledge of the tree structure to find nodes.
-- Multi-dimensional routing: Not sure I quite follow this. D-dimensional Cartesian coordinate space partitioned into hyper-rectangles (zones), and each node has a zone & identified by the boundaries of its zone. A key gets a point in the d-dimensional space and is stored in the node that owns the key's point's zone. The lookup follows the straight line path from the searcher to the key.

I found this paper interesting because I've been wondering for a while how DHTs work, and this is a review of what seem like important topics for understanding anything about DHTs in more detail. I really like that this paper was on the reading list.

No comments:

Post a Comment

About Me

Berkeley EECS PhD student