18. Pinpoint: problem determination in large, dynamic internet services. DSN, 2002.
19. X-Trace: A Pervasive Network Tracing Framework. NSDI, 2007.
20. Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code. SIGCOMM, 2023.
21. perf: Linux profiling with performance counters, 2022. https://perf.wiki.kernel.org/index.php/Main_Page
22. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers. IEEE Micro, 2010.
23. BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more. https://github.com/iovisor/bcc
24. Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider. https://opentelemetry.io/blog/2022/tail-sampling/
25. TraceState: Probability Sampling. https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling-experimental/
26. Canopy: An End-to-End Performance Tracing And Analysis System. SOSP, 2017.
27. How to diagnose nanosecond network latencies in rich endhost stacks. NSDI, 2022.
28. Computer performance microscopy with shim. ISCA, 2015.
29. Lprof: A Non-Intrusive Request Flow Profiler for Distributed Systems. OSDI, 2014.
30. Domino: Understanding Wide-Area, Asynchronous Event Causality in Web Applications. SoCC, 2015.
31. Non-Intrusive Performance Profiling for Entire Software Stacks Based on the Flow Reconstruction Principle. OSDI, 2016.
32. Minder: Faulty Machine Detection for Large-scale Distributed Model Training. NSDI, 2025.