APF's Network Summaries: Detailed Diagnosis in Enterprise Networks

[paper]

NetMedic is a network diagnostic tool that aims to trace a network fault back to the responsible process or firewall configuration. (So it identifies not only the host but also the culpable application running on that host.) I found this paper confusing because for most of it I couldn't tell where they planned to place NetMedic: somewhere in the network, or on the host machines? If it's the former, how does it get enough application-level information? If it's the latter, how can you expect it to work correctly when the host is the machine experiencing the problem (e.g., it may have lost network connectivity)? Furthermore, if it runs on the host, what if NetMedic itself interferes with other host applications in some unexpected way? The implementation section finally states the answer concretely (it was implemented on the host using OS tools), but I was confused until then.

Basically, they try to correlate the fault with the state of individual components and with the interactions between components, and then treat a high correlation as causation. A "component" here is any part of the system, and their tool automatically infers the relationships/dependencies between components. This seems like it would not work well (given that correlation != causation). However, they did find the correct component in 80% of their cases. Interestingly enough, they had no direct competitors to compare their tool against, so they implemented a different technique based loosely on previous work and compared themselves to that. Although unusual, I kind of like this approach to comparison. Of course, the downside is that since they implemented both, they are biased and likely won't try as hard on the competitor.
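To make the "high correlation is treated as causation" idea concrete, here is a toy sketch in Python. It is not NetMedic's actual algorithm; it only illustrates ranking suspects by how strongly their state tracks a fault symptom. The component names, the metric values, and the `rank_suspects` helper are all made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_suspects(symptom, component_states):
    """Order components by |correlation| between their state and the symptom."""
    scores = {
        name: abs(pearson(series, symptom))
        for name, series in component_states.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: 1 means the application was misbehaving in that time window.
symptom = [0, 0, 1, 1, 0, 1]

# Hypothetical per-component state samples over the same six windows.
component_states = {
    "sql_server_cpu": [10, 12, 90, 95, 11, 88],
    "backup_job_io": [5, 50, 7, 45, 6, 52],
    "firewall_drops": [0, 0, 0, 0, 0, 0],
}

ranking = rank_suspects(symptom, component_states)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

With this made-up data the CPU series ranks first because it rises in exactly the windows where the symptom appears. The sketch also shows the weakness: a component that merely shares an underlying cause with the fault would score just as high.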

1 comment:

Randy H. Katz, September 18, 2009 at 3:02 PM
I am fine with short summaries like this. I agree with your comments on correlation not necessarily suggesting causality. However, there is much in the structure of distributed system interactions, where components are linked in pipelines, from which causality could be extracted. I would like to pursue the approach of active experiments to validate causality, as discussed in class.

Thursday, September 17, 2009