Tuesday, November 17, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Implementing reliable multicast is not as easy as making slight changes to unicast protocols. With TCP, the sender is responsible for loss detection & recovery. This does not work for multicast because (1) if all recipients constantly send the sender ACKs, the sender will be overwhelmed with traffic (ACK implosion), and (2) the set of recipients is unknown and may be constantly shifting. Essentially, the issue is that unicast implies "fate-sharing" while multicast does not. The authors conclude that receiver-based reliability is a better approach to reliable multicast. They also note that using communication state (sequence numbers) to name data does not work well for multicast because state synchronization is weaker; instead, they advocate naming data in "application data units (ADUs)".

This paper (from 1995) presents Scalable Reliable Multicast, a framework that provides reliable (but not necessarily ordered) multicast delivery over IP. They suggest that ordered delivery and other application-specific details could be built on top of SRM. If I understand it correctly, they are splitting the transport layer in half: SRM would be on the bottom of the transport layer, and then some number of ordering protocols would be on top of it. In my mind, this is equivalent to making the unicast assumptions of UDP and TCP explicit and then building UDP and TCP on top of the unicast framework.
-- Note that I did not understand their description of "Application-level framing". Do they simply mean that there need to be many different transport layer protocols for different types of applications (much like there is UDP vs TCP), or do they mean that you shouldn't really have anything on the transport layer, and applications should be handling this directly?

They use SRM with a whiteboard conferencing tool named "wb" from LBNL. Many users (maybe thousands) simultaneously use a wb instance. Wb *does not* require ordered delivery because its operations are either idempotent or incorrect ordering can be easily fixed. Every change to the whiteboard is multicast to all of the users of a whiteboard, and whiteboard users (the system receivers) are responsible for their own loss detection and retransmission requests. If a receiver detects a loss, it schedules a request timer drawn from a uniform distribution scaled by the receiver's estimate of the one-way delay to the source of the data. The request is then multicast, as is the response to the request. If a receiver sees *another* request for the same data before its own request timer expires, then it will do a random exponential backoff and reset its timer. If it gets the missing data, it will cancel its request timer altogether. This provides reliability without the ACK implosion. The point of the randomization & delay-based timer scheduling is to de-synchronize host actions and prevent unnecessary duplicates.
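The timer mechanism can be sketched roughly like this (a minimal sketch, not the paper's implementation; `C1`, `C2`, and the backoff multiplier are the kind of protocol constants the paper discusses, with illustrative values):

```python
import random

# Illustrative protocol constants (the paper tunes these; values here
# are assumptions for the sketch).
C1, C2 = 2.0, 2.0

def schedule_request_timer(d, backoff_count=0):
    """Draw a request timer uniformly from [C1*d, (C1+C2)*d], where d is
    this receiver's estimated one-way delay to the data's source.
    Each exponential backoff doubles the interval."""
    lo, hi = C1 * d, (C1 + C2) * d
    scale = 2 ** backoff_count  # backoff after seeing a duplicate request
    return random.uniform(lo * scale, hi * scale)

# A receiver closer to the source (smaller d) tends to fire first; the
# others see its multicast request, back off, and reset their timers.
t0 = schedule_request_timer(d=0.05)                   # initial timer
t1 = schedule_request_timer(d=0.05, backoff_count=1)  # after one backoff
```

Because the interval is scaled by distance to the source, nearby receivers usually request first and suppress everyone else, which is how the scheme avoids duplicate requests without any coordination.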

They ran some simulations of their request & repair algorithms on small contrived topologies. They say the point of the simulations is to look at the performance for individual packet drops; "We do not claim to be presenting realistic topologies or typical patterns of packet loss." In their scenarios, they show that the request & repair algorithms work well in random and bounded-degree trees when every node is a member of the multicast session; they don't work as well in a sparse session. This is a bit concerning since the sparse session is the expected real-world scenario, so they make some changes to their request & repair algorithms. Notably, they add adaptive adjustment of timer parameters based on the past behavior of the algorithms in the multicast session. If the average number of duplicate requests is high, the request timer interval increases; if the number of duplicates is OK but the delay in sending a request is too high, they decrease the request timer interval. They then do some new simulations and show that the adaptive algorithm works better ("well") under many conditions. However, note that in their simulations none of the requests or repairs are themselves dropped, which would happen in the real world.
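The adaptive adjustment amounts to a simple feedback loop on the timer constants. A hedged sketch (the thresholds and step sizes here are made up for illustration; the paper uses its own tuned values):

```python
class AdaptiveRequestTimer:
    """Adjust the request-timer interval [c1*d, (c1+c2)*d] based on
    observed duplicate requests and request delay. Thresholds and step
    sizes are illustrative assumptions, not the paper's values."""

    def __init__(self, c1=2.0, c2=2.0):
        self.c1, self.c2 = c1, c2

    def adjust(self, avg_dup_requests, avg_request_delay,
               dup_threshold=1.0, delay_threshold=1.5):
        if avg_dup_requests > dup_threshold:
            # Too many duplicate requests: widen the interval so fewer
            # hosts fire before suppression kicks in.
            self.c1 += 0.1
            self.c2 += 0.5
        elif avg_request_delay > delay_threshold:
            # Duplicates are under control but requests go out too
            # slowly: shrink the interval.
            self.c1 = max(0.5, self.c1 - 0.1)
            self.c2 = max(0.5, self.c2 - 0.5)
```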
-- Honestly, I am not really sure what the value of their simulations is.

They also suggest (but do not implement or simulate) "local recovery". The idea is that a receiver belongs to a "loss neighborhood" where a set of nodes are all experiencing the same losses. Receivers learn about their loss neighborhoods from session messages (which are sent out periodically). Requests & repairs could be broadcast only to a loss neighborhood, thereby reducing the overall bandwidth used.
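Since the paper only suggests local recovery, here is one way the neighborhood grouping could be sketched, assuming receivers compare the loss reports carried in session messages (the similarity measure and threshold are my assumptions, not the paper's):

```python
def loss_neighborhood(my_losses, peer_reports, min_overlap=0.5):
    """Return the peers whose reported losses overlap ours by at least
    min_overlap (Jaccard similarity). peer_reports maps a peer id to
    the sequence of losses it advertised in session messages.
    Illustrative sketch only; the paper does not implement this."""
    mine = set(my_losses)
    hood = []
    for peer, losses in peer_reports.items():
        theirs = set(losses)
        union = mine | theirs
        if union and len(mine & theirs) / len(union) >= min_overlap:
            hood.append(peer)
    return hood
```

Requests and repairs would then be scoped (e.g., via TTL or a separate multicast group) to reach only the neighborhood, rather than the whole session.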

Different apps have different tradeoffs between delay and bandwidth. wb wants to minimize delay but isn't worried about bandwidth, whereas something like Usenet would want to minimize bandwidth without concern for delay. This is a little confusing to me -- unlike ordering, these properties aren't something you can build "on top" of SRM. So I am back to my original confusion of whether this is meant to be transport-layer or part of an application itself.

-----------

SRM - transport
IP - routing


About Me

Berkeley EECS PhD student