Slide 47
References
17. Jaeger: open source, end-to-end distributed tracing, 2022. https://www.jaegertracing.io/.
18. Pinpoint: problem determination in large, dynamic internet services. DSN, 2002.
19. X-Trace: A Pervasive Network Tracing Framework. NSDI, 2007.
20. Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code. SIGCOMM, 2023.
21. perf: Linux profiling with performance counters, 2022. https://perf.wiki.kernel.org/index.php/Main_Page.
22. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers. IEEE Micro, 2010.
23. BCC: Tools for BPF-based Linux IO analysis, networking, monitoring, and more. https://github.com/iovisor/bcc.
24. Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider. https://opentelemetry.io/blog/2022/tail-sampling/.
25. TraceState: Probability Sampling. https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling-experimental/.
26. Canopy: An End-to-End Performance Tracing And Analysis System. SOSP, 2017.
27. How to diagnose nanosecond network latencies in rich endhost stacks. NSDI, 2022.
28. Computer performance microscopy with shim. ISCA, 2015.
29. Lprof: A Non-Intrusive Request Flow Profiler for Distributed Systems. OSDI, 2014.
30. Domino: Understanding Wide-Area, Asynchronous Event Causality in Web Applications. SoCC, 2015.
31. Non-Intrusive Performance Profiling for Entire Software Stacks Based on the Flow Reconstruction Principle. OSDI, 2016.
32. Minder: Faulty Machine Detection for Large-scale Distributed Model Training. NSDI, 2025.