Network failures continue to plague datacenter
operators as their symptoms may not have
direct correlation with where or why they occur. We
introduce 007, a lightweight, always-on diagnosis application
that can find problematic links and also
pinpoint problems for each TCP connection. 007 is
completely contained within the end host. During
its two month deployment in a tier-1 datacenter, it
detected every problem found by previously deployed
monitoring tools while also finding the sources of
other problems previously undetected.