Monday, September 21, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

[paper]

PortLand is a follow-up to SEATTLE. It addresses networking within a data center: the authors want both the plug-and-play properties of Ethernet and the ability to support a huge number of end hosts. (Layer 2 networks are easier to administer than Layer 3 networks since they can use flat MAC addresses, but Ethernet bridging doesn't scale because of its reliance on broadcasting and because spanning-tree forwarding can't take advantage of topologies with multiple equal-cost paths.) They improve on SEATTLE by making switches plug-and-play (not just hosts) and by eliminating forwarding loops and all-to-all broadcasts. The improvement stems from the observation that data centers are physically organized into multi-rooted trees, whereas SEATTLE was designed for a more general case. The topology is the tree structure we discussed in the last class.
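
To make the multi-rooted tree concrete: the paper's evaluation platform is a k-ary fat tree. Here's a minimal sketch (my own code, not the paper's) of the element counts in that topology; the names are mine, but the counts follow the standard fat-tree construction.

    # Sketch of the k-ary fat tree PortLand's testbed uses (k even).
    def fat_tree_counts(k):
        assert k % 2 == 0
        edge_per_pod = k // 2                  # each edge switch serves k/2 hosts
        agg_per_pod = k // 2                   # and uplinks to k/2 aggregation switches
        return {
            "pods": k,
            "edge_switches": k * edge_per_pod,
            "agg_switches": k * agg_per_pod,
            "core_switches": (k // 2) ** 2,    # core layer shared by all pods
            "hosts": k ** 3 // 4,              # k pods * (k/2 edge switches) * (k/2 hosts each)
        }

    # e.g. fat_tree_counts(48) -> 27,648 hosts, the order of scale these designs target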

With PortLand, each end host gets a hierarchically assigned PMAC (pseudo MAC) address that encodes the location of the host in the topology. ARP requests return PMAC addresses rather than actual MAC addresses, and egress edge switches rewrite PMACs back to actual MACs (AMACs) before delivery. Edge switches are assigned "pod" and position numbers, and forwarding uses this topological information to avoid loops. PortLand also has a centralized "fabric manager" that maintains soft state about the network topology. Since it's soft state, (A) the fabric manager does not need to be configured, and (B) replicas of the fabric manager don't need to be kept perfectly consistent with the original. When a switch sees an ARP request, it forwards it to the fabric manager, which looks up the address in its PMAC table and sends the answer back to the switch; this eliminates the need for broadcast ARPs. If the fabric manager doesn't have the PMAC information, say due to a recent failure, it falls back to a broadcast.
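
To make the PMAC idea concrete, here's a rough sketch (my own, not the paper's code) of packing a PMAC and of the fabric manager's ARP proxying. The paper describes a 48-bit PMAC of the form pod.position.port.vmid with 16/8/8/16-bit fields; the function and class names below are made up for illustration.

    def make_pmac(pod, position, port, vmid):
        # Pack pod.position.port.vmid into a 48-bit PMAC (16/8/8/16-bit fields).
        return (pod << 32) | (position << 24) | (port << 16) | vmid

    class FabricManager:
        # Toy model of the fabric manager's soft-state ARP proxying.
        def __init__(self):
            self.ip_to_pmac = {}           # learned from edge switch reports, never configured

        def register(self, ip, pmac):
            self.ip_to_pmac[ip] = pmac     # edge switch reports a newly seen host

        def arp_lookup(self, ip):
            # Edge switches forward intercepted ARP requests here instead of broadcasting.
            pmac = self.ip_to_pmac.get(ip)
            if pmac is None:
                return "fall back to broadcast"   # e.g. state lost after a recent failure
            return pmac

Because the pod and position occupy the high bits of the address, switches can forward on a PMAC prefix, much like hierarchical IP forwarding, which is how the topological information keeps forwarding state small and avoids loops.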

-- Unlike the SEATTLE paper, the authors evaluated their work using microbenchmarks (or what I'd call microbenchmarks -- not sure if that's actually the appropriate term) rather than a full simulation. I'm not sure how valid this approach is, since there is no other traffic on the network and nothing is being stressed. However, I suppose the purpose of a full simulation isn't to provide stress either, since the loads used for stress testing would be different from regular traffic.

-- I like this paper; their use of topology information is simple and clever.

POST-CLASS EDIT:
I didn't realize that the fat tree was crucial to their design, since they try to play it off as more generalizable ("the fat tree is simply an instance of the traditional data center multi-rooted tree topology....We present the fat tree because our available hardware/software evaluation platform is built as a fat tree."). I understand that they may have been trying to forestall criticism about tying their work too closely to the fat tree, but I think it's a bit misleading and it hurt my understanding of the paper. I took their dismissal of the fat tree topology fairly seriously and didn't pay much attention to it, whereas after our discussion in class it seems that routing for a fat tree in particular is their main contribution.
