[paper]
NetMedic is a network diagnostic tool that aims to find the source of network faults at the granularity of a responsible process or firewall configuration. (So not only does it identify the host, it also identifies the culpable application running on that host.) I found this paper confusing because I couldn't figure out where they were planning on placing NetMedic -- is it somewhere in the network, or is it running on host machines? If it's the former, how does it get enough application information? If it's the latter, how can you expect it to work correctly given that the host machine is the one experiencing the problem (e.g., it may lack network connectivity)? Furthermore, if it's on the host, what if NetMedic itself interferes with other host applications in some unexpected way? The implementation section finally states this concretely (it runs on the host using OS tools), but I was confused until then.
Basically, they try to correlate the fault with component state and with the interactions between components, and a sufficiently high correlation is treated as causation. A "component" here is any part of the system, and their tool automatically infers relationships/dependencies between components. This seems like it shouldn't work well (correlation != causation), yet they did find the correct component in 80% of their cases. Interestingly, they didn't have any direct competitors to compare their tool against, so they implemented a different technique based loosely on previous work and compared themselves to that. Although unusual, I kind of like this approach to comparison... of course, the downside is that since they implement both, they are biased and won't try as hard on the competitor.
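To make the correlation idea concrete, here is a minimal sketch (my own illustration, not NetMedic's actual algorithm) of ranking candidate culprit components by how strongly their recorded state correlates over time with the victim's degraded behavior. The component names and state series are made up for the example.

```python
# Toy illustration of correlation-based culprit ranking (not NetMedic's real algorithm).
# Each component exposes a numeric state time series sampled at the same instants
# as the victim's observed performance metric.
from statistics import mean, stdev

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    sx, sy = stdev(xs), stdev(ys)
    return cov / (sx * sy) if sx and sy else 0.0

def rank_culprits(victim_metric, component_states):
    """Rank components by |correlation| with the victim's degradation."""
    scores = {name: abs(pearson(victim_metric, series))
              for name, series in component_states.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: response time of the failing client, plus state of two candidates.
victim = [110, 120, 480, 510, 495, 130]             # ms, spikes in the middle
candidates = {
    "sql_server_cpu":    [20, 22, 95, 97, 93, 25],  # tracks the spike closely
    "print_spooler_cpu": [5, 6, 5, 7, 6, 5],        # unrelated
}
print(rank_culprits(victim, candidates))
```

As the post notes, high correlation is being read as causation here, which is exactly the leap the paper has to defend with its dependency inference and state-history comparisons.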
Thursday, September 17, 2009
Floodless in SEATTLE
[Paper]
SEATTLE claims to "...[achieve] the best of both worlds: The scalability of IP combined with the simplicity of Ethernet." On one hand, Ethernet is simple to manage thanks to flat addressing. On the other hand, Ethernet doesn't scale well: Ethernet bridges rely on broadcast messages (e.g., ARP requests and DHCP), and forwarding paths are restricted to a spanning tree. To get around this, network admins break up their large networks into small Ethernet networks connected by IP, or use VLANs. The authors assert that neither solution is good enough (both are inefficient and harder to manage, since you now need to worry about addressing). SEATTLE is intended to provide the same simple plug-and-play semantics as Ethernet while also scaling to large networks.
[ to be done: further summarization ]
My criticisms of their simulations:
-- For the packet-level simulation, they start with a real LBNL trace and then inject extra traffic into it: synthetic hosts sending traffic to random destinations. I find this curious given their earlier claim: "In enterprise networks, hosts typically communicate with a small number of other hosts [5], making caching highly effective." Traffic to random destinations seems to contradict exactly the locality their caching depends on.
-- They also have to make some kludgy assumptions about the number of hosts connected to each switch because the traces are anonymized. I suppose they can't really help that; it's a problem for the field in general.
-- They have really huge error bars in many of their graphs. Is this normal for network simulations, given the variance across runs and traces? Notably, the SEATTLE results have tight error bars even where ROFL and ETH have huge ones (e.g., Figures 5a and 5c)... is their simulation somehow skewed in favor of their own results?
Sorry about the lack of summaries... I had prelims on Tuesday and have two papers due Friday; multitasking is not going so well.
Monday, September 7, 2009
Congestion Avoidance and Control
[Paper Link]
Networks at the time were suffering from widespread congestion collapse, despite the fact that the TCP specification had a theoretically correct description of how to avoid it. The principle the authors want to adhere to is 'conservation of packets': a new packet should not be put into the network until an old one has left. Since the protocol itself obeys packet conservation but networks were still failing under load, they observe that the failures must come from implementation bugs. Because of these bugs, one of three things happens: a connection never gets to equilibrium, a sender injects a new packet before an old one has exited, or equilibrium can't be reached because of resource limits along the path.
Descriptions of the changes they made to the 4BSD TCP implementation:
Slow-start. A connection may never reach equilibrium if it behaves badly on startup. A host doesn't send its next packet until it has received an ACK; but if a restarting host fires off a burst of packets and then throttles down, or starts at zero with no defined way of ramping up, the connection can fail to reach equilibrium. Slow-start instead increases the amount of outstanding data gradually: the window starts at one packet and grows as ACKs come back.
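As a rough sketch of that ramp-up (my paraphrase of the mechanism, not the 4BSD code): the congestion window starts at one segment and grows by one segment per ACK, which roughly doubles it every round trip.

```python
# Toy slow-start: cwnd (in segments) starts at 1 and grows by 1 per ACK received,
# which roughly doubles the window each round trip until the slow-start threshold.
def slow_start_rtts(ssthresh=32):
    cwnd = 1
    rtt = 0
    while cwnd < ssthresh:
        acks_this_rtt = cwnd       # one ACK per segment sent in this round trip
        cwnd += acks_this_rtt      # +1 segment per ACK
        rtt += 1
        print(f"after RTT {rtt}: cwnd = {cwnd}")
    return rtt

slow_start_rtts()   # reaches a 32-segment window in about 5 round trips
```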
Round-trip-time variance estimation and exponential retransmit timer backoff. Sometimes a packet is lost and no ACK ever comes back. At what point should the sender decide to retransmit? Their round-trip-time estimator addresses this by tracking the variance of the RTT as well as its mean, so the retransmit timeout adapts as load changes. And if a packet has to be retransmitted repeatedly, the exponential retransmit timer backoff keeps doubling the timeout.
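A minimal sketch of the flavor of that estimator (using the commonly cited gains, not necessarily the paper's exact fixed-point constants): keep a smoothed RTT and a smoothed mean deviation, set the timeout from both, and double the timeout on each successive retransmission of the same packet.

```python
# Sketch of an adaptive retransmit timeout in the spirit of the paper's estimator.
# Constants are the commonly cited ones (gain 1/8 for the mean, 1/4 for the deviation,
# RTO = srtt + 4*dev); the paper's exact fixed-point version differs in detail.
class RtoEstimator:
    def __init__(self, first_rtt):
        self.srtt = first_rtt
        self.dev = first_rtt / 2
        self.backoff = 1

    def on_rtt_sample(self, rtt):
        err = rtt - self.srtt
        self.srtt += err / 8
        self.dev += (abs(err) - self.dev) / 4
        self.backoff = 1                           # fresh sample resets the backoff

    def on_retransmit(self):
        self.backoff = min(self.backoff * 2, 64)   # exponential backoff, capped

    def rto(self):
        return (self.srtt + 4 * self.dev) * self.backoff

est = RtoEstimator(first_rtt=0.10)   # seconds
est.on_rtt_sample(0.12)
print(round(est.rto(), 3))           # timeout adapts to both mean and variance
```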
More aggressive receiver ACK policy and dynamic window sizing on congestion. If recipients ALWAYS send an ACK upon receipt, then the lack of an ACK effectively becomes the network's way of telling the sender there is congestion on the path. That near-guarantee of an ACK when all is well is what they mean by an aggressive receiver ACK policy. Additionally, they describe 'dynamic window sizing': how to grow and shrink the congestion window (additive increase, multiplicative decrease) to fill the connection without overloading it.
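A sketch of the resulting window dynamics, under my reading of the scheme (a simplification, not the actual 4BSD code): grow while ACKs keep arriving, and on a loss signal cut the slow-start threshold to half the current window and restart from a small window.

```python
# Sketch of the congestion-window adjustment described above (my simplification):
# additive increase of roughly 1 segment per RTT while things go well, multiplicative
# decrease (halve the target) when a loss signals congestion.
class CongestionWindow:
    def __init__(self):
        self.cwnd = 1.0         # segments
        self.ssthresh = 64.0

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance: ~+1 per RTT

    def on_loss(self):
        self.ssthresh = max(self.cwnd / 2, 2.0)   # multiplicative decrease
        self.cwnd = 1.0                           # restart with slow start

win = CongestionWindow()
for _ in range(100):
    win.on_ack()
win.on_loss()
print(win.cwnd, win.ssthresh)
```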
-- Unlike the previous paper, none of the main schemes they discuss needs cooperation from anything in the core. They infer congestion from dropped packets, so no explicit indicator (or view of the network state) is necessary. The end-to-end argument is satisfied here.
-- In their future work, they address the issue of misbehaving hosts. They argue that ONLY with the participation of gateways can you ensure fairness, since you can't trust hosts to be fair without another agent policing the situation. The idea is that if a host starts using more than its fair share, the gateway starts dropping its packets.
-- The paper takes a very practical tone. I especially like how they explain their logic, the major takeaway points, the relevant math, AND the actual algorithm. It's a well-written, easy-to-read paper, in my opinion.
Thursday, September 3, 2009
Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks
[Paper link]
Congestion control is designed to keep network devices from being overwhelmed by traffic; a congested network sees high rates of dropped packets as queues overflow. Congestion control is a dynamic resource management problem in which hosts should modulate the rate of their packet transmissions to reflect the current state of the network. This paper is about congestion avoidance, which tries to keep traffic at a manageable level, near the 'knee' of the throughput curve, before it becomes excessive. (This is in contrast to congestion recovery, which only pulls the load back once the network is already at the 'cliff' of congestion.) More specifically, the authors analyze the increase/decrease algorithms of the binary feedback scheme, a decentralized decision-making system: network resources monitor their own load and, when it is above the optimal operating range, set the congestion bit to 1 in packets passing through them, and that feedback makes its way back to the users. Users then adjust their sending rates according to an increase/decrease algorithm.
They evaluate the controls against 4 criteria: efficiency, fairness, distributedness (binary feedback vs. universal knowledge), and convergence. Their questions, based on these criteria, are: (1) "What are all the possible solutions that converge to efficient and fair states?" and (2) "How do we compare those controls that converge?" Their main result is that the increase should be additive and the decrease should be multiplicative.
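A small simulation (my own toy reconstruction of their setting, not the paper's formal model) shows why additive increase with multiplicative decrease pulls two users toward an equal split: the multiplicative cut shrinks the gap between them proportionally, while the additive step treats them identically, so the gap only ever shrinks.

```python
# Two users sharing a link of capacity 100, following additive increase /
# multiplicative decrease against a binary "congested" signal.
def aimd(x1, x2, capacity=100.0, add=1.0, mult=0.5, steps=200):
    for _ in range(steps):
        if x1 + x2 > capacity:   # feedback bit = 1: decrease multiplicatively
            x1 *= mult
            x2 *= mult
        else:                    # feedback bit = 0: increase additively
            x1 += add
            x2 += add
    return x1, x2

print(aimd(80.0, 5.0))   # starts very unfair, ends with the two rates nearly equal
```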
- Notably, this is an example of something that goes against the (strict?) end-to-end argument. Although the hosts are doing the actual throttling according to the increase/decrease algorithm, this still relies on the addition of a binary feedback bit in the header whose value is set by routers (in the "dumb" core). However, it seems unreasonable to think that any given host could see enough of the network to be able to make such a judgement without advice from routers (as they discuss when they say they desire "distributedness"). It seems like sometimes routers know best about routing (e.g., traffic conditions).
- This model assumes that users will respond honestly and in everyone's best interest when told that the network is being overwhelmed. What if a user free-rides on the fact that other users will rate-limit, and refuses to limit itself in order to grab more bandwidth? Oddly enough, they don't mention this in their "Practical Considerations" section.
- Not a huge fan of this paper. It's mathy and details-oriented, but it's not clear to me that understanding those details (and not just the principles/guidelines derived from them) is useful unless you are directly doing research in this area, especially since they fail to consider practical things such as misbehaving hosts.
Understanding BGP Misconfiguration
[Paper link]
This paper presents a study of BGP misconfiguration. The authors watched route advertisements from 23 vantage points, checked them for incorrect information, and then contacted the ISPs responsible for the incorrect announcements to find out what had caused the problem. They look at two types of problems: 1) "origin misconfiguration", the accidental injection of routes into the global BGP tables (e.g., the Pakistan/YouTube problem), and 2) "export misconfiguration", exporting routes in violation of an ISP's policy. (From the previous paper, recall that ISPs have a financial motivation for withholding some of the routes in their tables.)
They decide whether an event in the BGP update stream was a misconfiguration based on how long it lasted: if a policy suddenly changes and is then quickly changed back, that is indicative of a misconfiguration rather than a deliberate policy shift. Concretely, they look at changes that lasted less than a day. They can miss misconfigurations that go undetected by the ISP for a while, but I suppose those couldn't be too important if nobody noticed them. They can also miscategorize a legitimate short-lived policy shift as a misconfiguration, but their follow-up with the ISP about the cause should set the record straight. They also limit the scope to the two types listed above, because they aren't able to detect other kinds (e.g., the inverse of #2, filtering a route that should have been exported, looks identical to a failed node or link). They observed misconfigurations for 21 days.
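A sketch of that screening step as I understand it (illustrative only; the paper's actual pipeline also clusters updates into incidents and confirms causes by emailing operators): flag a newly announced route as a candidate misconfiguration if it disappears again within a day. Field names here are made up.

```python
# Toy version of the time-based screening described above: a new announcement that
# is withdrawn within a day is flagged as a candidate misconfiguration, to be
# confirmed (or not) by asking the operator.
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)

def candidate_misconfigs(events):
    """events: list of dicts with 'prefix', 'announced_at', 'withdrawn_at' (or None)."""
    flagged = []
    for e in events:
        if e["withdrawn_at"] is not None and e["withdrawn_at"] - e["announced_at"] < ONE_DAY:
            flagged.append(e["prefix"])
    return flagged

events = [
    {"prefix": "10.1.0.0/16", "announced_at": datetime(2002, 1, 5, 9, 0),
     "withdrawn_at": datetime(2002, 1, 5, 9, 40)},   # short-lived: flagged
    {"prefix": "10.2.0.0/16", "announced_at": datetime(2002, 1, 5, 9, 0),
     "withdrawn_at": None},                          # persisted: ignored
]
print(candidate_misconfigs(events))   # ['10.1.0.0/16']
```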
The results for origin misconfiguration are astounding. Their results indicate that 72% of new route announcements per day are actually due to misconfiguration, and 13% of the misconfigurations result in a connectivity problem. Does that mean that roughly 0.72 * 0.13 ≈ 9.4% of new announcements overall lead to connectivity problems?? That's insanely high... so high that it actually makes me suspicious of their results. They present this as low compared to the rate of actual failures, but it's really high (in my mind) compared to the total number of announcements sent. If about 10% of all new announcements are causing connectivity problems... well, that seems terrible to me!
Criticisms:
-- They present data showing that most of the misconfigurations are really short-lived. I'm not sure why they present this, because they have already selected for short misconfigurations: their categorization method treats only short-lived updates as misconfigurations and never considers longer-lived ones. So if there were misconfigurations of longer duration (e.g., a set of subtler misconfigurations), they would have been thrown out anyway.
-- A note on their presentation: I find their result tables really hard to read. It took me a few minutes to figure out that the table alternated between paths and incidents.
-- A general philosophical point: this paper is about small, relatively minor errors. Are we really concerned about those? Should we be? Or have people learned to live with this background noise, in which case we should be looking at what causes the disastrous failures? The disastrous failures might be caused by entirely different factors in entirely different contexts; since the scale of the failures is different, it seems like the contributing factors would be different too.
Interdomain Internet Routing
[Paper link]
This paper is informational (part of a textbook?), rather than making an argument or evaluating a system. It explains BGP and how inter-ISP route selection is influenced by $$. Routing is not done in a way to maximize global efficiency and performance; instead it is done in a cooperative-competitive fashion where money is a large factor in routing decisions. Notably, transit routing is profitable whereas peering usually is not, and ISPs will value their customers' routing announcements over other routing announcements. The design of BGP has space for attributes that can be used to decide between routes based on factors relevant to profitability.
My specific notes on the paper follow...
- Topology of the Internet. Topology is made of Autonomous Systems (ASes). BGP routes between ASes. Different IGPs (Interior Gateway Protocols) like RIP run inside ASes. A small AS is a university/corporation, a large AS is a nation-wide ISP.
- Two types of AS-AS interconnection:
1. Provider-customer transit. "One ISP provides access to all destinations in its routing tables." The customer pays the provider for this universal access. If there were only transit relationships, there would be a single huge Tier-1 AS at the root, and it would control all Internet traffic. "A route advertisement from B to A for a destination prefix is an agreement by B that it will forward packets sent via A destined for any destination in the prefix." Another way to think about it: ISPs charge customers for entries in routing tables.
2. Peering. "Two ASes provide mutual access to a subset of each other's routing tables." Sometimes paid, often reciprocal. Currently there are 9 big Tier-1 ISPs who do peering so that they can have explicit routes to all destinations. Smaller ASes do peering simply to save money, if there's a lot of traffic going between the two ASes.
- How BGP works. Router A establishes a TCP connection to router B and sends an OPEN message; the two then exchange tables of all the active routes they want to share, and each router adds the routes it wants into its own routing table. Routers may later send UPDATE messages, which can be either announcements or withdrawals. eBGP is used between ASes; iBGP is used within an AS, because an AS may have several eBGP-speaking routers that need to be kept consistent (iBGP is *not* the same thing as the IGP). Notable BGP attributes are LOCAL PREF (a local policy knob for choosing between routes), ASPATH (the route length, plus a way to discard circular route announcements), and MED (used for comparing two routes learned from the same neighboring AS). A sketch of how these attributes feed into route selection appears after these notes.
- Problems with BGP. BGP does not support origin authentication, meaning there's no way to tell which AS owns a prefix, so any AS can claim that it owns any prefix. Traffic from other networks would then be sent to that AS instead of where it should actually go.
- In the Pakistan/YouTube incident, a Pakistani ISP redirected its domestic YouTube traffic to its own error page in order to block the site. It accidentally leaked this route to its upstream provider, which propagated it, and YouTube traffic from elsewhere started flowing to the Pakistani ISP. Since the Pakistani ISP was a customer of the larger ISP, the provider preferred that announcement over the existing one. Additionally, since the announcement was for a MORE specific prefix than YouTube's own, longest-prefix matching made routers everywhere prefer it, so the hijack took effect very quickly.
- This can also be used for spam. A bad actor claims to own a prefix and announces it upstream, a questionable or careless ISP propagates it, and the spammers then send spam "from" IPs in that prefix and are able to receive responses for a short period of time. A few hours or days later, they disappear!
- Perhaps this is part of the larger problem of accountability/authentication not being part of the original goals of the Internet.
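As promised above, here is a simplified sketch of how those attributes feed into route selection. The ordering (highest LOCAL PREF, then shortest AS path, then lowest MED) follows the standard BGP decision process, but real routers apply several more tie-breakers (origin type, eBGP vs. iBGP, IGP cost, router ID) that I omit, and MED is normally only compared between routes from the same neighboring AS.

```python
# Simplified BGP route selection among candidate routes for one prefix.
# This keeps only the three attributes discussed in the notes above.
from dataclasses import dataclass

@dataclass
class Route:
    next_hop_as: int   # neighboring AS the route was learned from
    local_pref: int    # set by local policy (customer routes usually get the highest value)
    as_path: list      # sequence of ASes to the origin
    med: int = 0       # only meaningful between routes from the same neighbor AS

def best_route(routes):
    # Higher LOCAL PREF wins; then shorter AS path; then lower MED.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))

candidates = [
    Route(next_hop_as=701,   local_pref=80,  as_path=[701, 1239, 64500]),          # via a peer
    Route(next_hop_as=64512, local_pref=100, as_path=[64512, 3356, 1239, 64500]),  # via a customer
]
print(best_route(candidates).next_hop_as)   # 64512: the customer route wins despite a longer path
```

This is exactly where the money comes in: LOCAL PREF is evaluated before path length, so an ISP can prefer a paying customer's longer route over a peer's shorter one.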
Tuesday, September 1, 2009
End-to-End Arguments in System Design
[Paper link]
This paper discusses the question of what responsibilities should be given to a communication system vs. the applications using it. They believe that most functions are more suited to the applications rather than the communication system, and they call this the "end-to-end argument." A motivating example they give is file transfer reliability: should the communication system worry about checksumming and internally confirming each packet that is sent across the network, or should the host application checksum the files once and then request a retransmit of the whole file if the recipient application's checksum fails to match? The high cost of executing the former option outweighs the low odds of a file transmission error occurring. Additionally, one must consider problems external to the communication system, such as the reliability of the hard drives; in order to account for this, the host applications would need to perform their own integrity check even if the communication system provided its own guarantee of reliability. The "end-to-end argument" is that many functions (e.g., reliability) are better performed by the end applications rather than the communication system itself. The communication system should only offer these functions as performance enhancements, and one needs to be careful that the performance "enhancement" actually works out to be beneficial and not overly complex.
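A minimal sketch of the end-to-end version of that check (my own illustration, not code from the paper): the sender hashes the whole file, the receiver hashes what it actually stored, and a mismatch triggers a whole-file retransfer, regardless of whether any lower layer also checksummed individual packets.

```python
# End-to-end integrity check for a file transfer: compare a hash of the bytes the
# receiver actually stored against the sender's hash, and retry on mismatch.
# Any per-packet checks the network performs are, at best, a performance optimization.
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def transfer_with_e2e_check(data: bytes, send, max_attempts=3) -> bool:
    """send(data) -> bytes actually stored at the receiver (possibly corrupted)."""
    expected = sha256_of(data)
    for _ in range(max_attempts):
        stored = send(data)
        if sha256_of(stored) == expected:   # end-to-end check at the application
            return True
    return False

# Hypothetical flaky channel that corrupts a byte on the first attempt only.
attempts = {"n": 0}
def flaky_send(data: bytes) -> bytes:
    attempts["n"] += 1
    return (b"X" + data[1:]) if attempts["n"] == 1 else data

print(transfer_with_e2e_check(b"important file contents", flaky_send))   # True, after one retry
```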
In the previous paper on the DARPA Internet design, the author also described this issue as a motivating factor for splitting up TCP/IP. Different applications (e.g., voice vs. rlogin) need different reliability guarantees: rlogin will wait to receive its packets in order, whereas a voice application would rather drop a packet and play silence than delay playback for an out-of-order packet. For this reason, TCP and IP were separated, making IP a lower-level building block with TCP and alternatives like UDP layered on top of it, each handling reliability as it sees fit. I like this paper because it formalizes the rationale behind the TCP/IP split and turns it into a more general principle.
The authors also apply the end-to-end argument to encryption, duplicate message suppression, guaranteed FIFO message delivery, and transaction management. The general theme is the same as in the file transfer example: the end hosts need to perform the function anyway (and can probably do it better, since they have more semantic information), so doing the work in the communication system would just be wasteful duplication. Another important point is that not all applications need the same underlying "help" from the communication system, so what is a useful performance enhancement for one type of application might slow down another.
This same type of discussion could be applied to NIDSs: should the work of intrusion detection be done in the network or by the host? Doing the work on a dedicated NIDS saves hosts computation time, and it's easier to keep one NIDS programmed and updated than to keep every host's security software updated. On the other hand, understanding network traffic sometimes means the NIDS has to infer application behavior, which can become very expensive and open the NIDS itself to DoS attacks. Instead of having a NIDS do bifurcation analysis, it might make more sense to have the host be responsible for filtering its own incoming traffic.
The paper does acknowledge that not EVERYTHING should be moved to the end applications; there is always a tradeoff. For example, on a sufficiently lossy path, retransmitting a single packet at a lower layer makes more sense than retransmitting an entire file. I would be interested in seeing a formal cost/benefit analysis that could be used to decide when each approach makes more sense.
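As a very rough sketch of the kind of analysis I'd like to see (my own back-of-envelope with made-up parameters, not anything from the paper): if each packet is corrupted independently with probability p, a file of n packets arrives clean with probability (1-p)^n, so whole-file retransmission costs about 1/(1-p)^n full transfers in expectation; comparing that against a fixed per-packet reliability overhead gives a break-even point.

```python
# Back-of-envelope comparison (illustrative assumptions only): expected bytes sent
# with whole-file retransmit vs. paying a fixed per-packet reliability overhead.
def expected_cost_e2e(n_packets, pkt_bytes, p_corrupt):
    p_file_ok = (1 - p_corrupt) ** n_packets
    expected_transfers = 1 / p_file_ok        # geometric: retry the whole file until clean
    return expected_transfers * n_packets * pkt_bytes

def cost_hop_by_hop(n_packets, pkt_bytes, overhead_fraction):
    return n_packets * pkt_bytes * (1 + overhead_fraction)   # pay the overhead once, no full retries

n, size = 1000, 1500
for p in (1e-6, 1e-4, 1e-2):
    print(p, round(expected_cost_e2e(n, size, p)), round(cost_hop_by_hop(n, size, 0.05)))
# With rare corruption the end-to-end retry is cheaper; as p grows, per-packet
# reliability starts to pay for itself -- which matches the paper's tradeoff caveat.
```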