[paper]
NetMedic is a network diagnostic tool that is supposed to find the source of network faults at the granularity of a responsible process or firewall configuration. (So it identifies not only the host, but also the culpable application running on that host.) I found this paper confusing because I couldn't figure out where they were planning to place NetMedic: is it somewhere in the network, or is it running on the host machines? If it's the former, how does it get enough application information? If it's the latter, how can you expect it to work correctly when the host machine is the one experiencing the problem (e.g., it may have lost network connectivity)? Furthermore, if it's on the host, what if NetMedic itself interferes with other host applications in some unexpected way? The implementation section finally answers this concretely (it was implemented on the host using OS tools), but I was confused until then.
Basically, they are trying to correlate the fault with component state and with the interactions between components, and a high correlation is then treated as causation. A "component" here is any part of the system, and their tool automatically infers relationships/dependencies between components. This seems like it would not work well (given that correlation != causation). However, they did find the correct component in 80% of their cases. Interestingly enough, they didn't have any direct competitors to compare their tool against, so they implemented a different technique based loosely on previous work and compared themselves to that. Although unusual, I kind of like this approach for comparison...of course the downside is that since they are implementing both, they are biased and won't try as hard on the competitor.
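To make the "high correlation is treated as causation" idea concrete, here is a toy sketch of ranking components by how strongly their state tracks a fault signal. This is only my illustration of the general idea, not NetMedic's actual algorithm; the component names, the metric values, and the `rank_suspects` helper are all made up for the example.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def rank_suspects(component_history, fault_signal):
    """Rank components by |correlation| between their state and the fault signal.

    Hypothetical helper: it simply blames whatever correlates most strongly.
    """
    scores = {
        name: abs(pearson(series, fault_signal))
        for name, series in component_history.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up per-interval state for three components (assumed data, not from the paper).
history = {
    "sql_server.cpu": [10, 12, 11, 80, 85, 90],
    "firewall.drops": [0, 0, 0, 0, 0, 0],
    "mail_client.mem": [50, 52, 49, 51, 50, 53],
}
# 1 = the user-visible fault was observed in that interval.
fault = [0, 0, 0, 1, 1, 1]

ranking = rank_suspects(history, fault)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The weakness is easy to see here: any component whose state happens to move with the fault, including a victim of the real culprit, would rank highly too.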
About Me
- Adrienne
- Berkeley EECS PhD student
I am fine with short summaries like this. I agree with your comment that correlation does not necessarily imply causality. However, there is a lot of structure in distributed system interactions, where components are linked in pipelines, from which causality could be extracted. I would like to pursue the approach of active experiments to validate causality, as discussed in class.