HLT CPU Consumption

Sasha Mazurov
February 06, 2012


Transcript

  1. HLT CPU Consumption. Sasha Mazurov, 6 February 2012

  2. Tool: Gaudi Auditor & Intel® VTune™ Amplifier XE 2011. Can be run on any lxplus node.
  3. Benefits
    ➔ Can focus on specific sequences or algorithms.
    ➔ Skips the initialization and finalization phases.
    ➔ Reports CPU consumption per algorithm / function / class / module.
    ➔ Perfect GUI & reports.
  4. http://amazurov.ru/cern/intelprofiler/ (installation, documentation, screencasts)
    $> intelprofiler -o /where/to/store/profiler/output myJob.py
  5. None
  6. Profiler vs. HLT1 Lines (Offline)

  7. Job Options (Moore v12r10), https://github.com/mazurov/HltProfiling
    profiler = IntelProfilerAuditor()
    profiler.StartFromEventN = 5000
    profiler.StopAtEventN = 15000
    profiler.IncludeAlgorithms = ["Hlt1TrackAllL0", "Hlt1DiMuonHighMass", "Hlt1DiMuonLowMass"]
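The three job options on this slide restrict profiling to an event window and a list of algorithms. The sketch below only illustrates that selection logic in plain Python. It does not use Gaudi or the real `IntelProfilerAuditor`; the class `ProfilingWindow` and its method `is_profiled` are hypothetical names, and the boundary handling (start inclusive, stop exclusive) is an assumption that the slides do not confirm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfilingWindow:
    """Hypothetical stand-in mirroring the three options on the slide."""
    start_from_event_n: int = 5000
    stop_at_event_n: int = 15000
    include_algorithms: List[str] = field(default_factory=lambda: [
        "Hlt1TrackAllL0", "Hlt1DiMuonHighMass", "Hlt1DiMuonLowMass",
    ])

    def is_profiled(self, event_n: int, algorithm: str) -> bool:
        # Assumption: the start event is included and the stop event is not.
        in_window = self.start_from_event_n <= event_n < self.stop_at_event_n
        # Only algorithms named in the include list are sampled.
        return in_window and algorithm in self.include_algorithms


window = ProfilingWindow()
```

With these values, `window.is_profiled(6000, "Hlt1TrackAllL0")` is true, while any event below 5000 or any algorithm outside the list is skipped, which is how the initialization phase and unrelated algorithms stay out of the report.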
  8. Hotspots

  9. Top Hotspots

  10. CPU / Per Function

  11. CPU / Per Module

  12. CPU / Per Algorithm

  13. http://amazurov.ru/cern/hltprofilingresults/

  14. CPU / Per Function In Algorithm

  15. CPU / Per Source Code (debug mode)

  16. TCMalloc vs. “new” Operator

  17. Before: CPU 238 s. After: CPU 222 s.

  18. Results
    ➔ tc_new is twice as fast as the “new” operator.
    ➔ 5% total improvement for the Hlt1 job.
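As a quick arithmetic check, the relative saving implied by the two CPU times quoted on slide 17 (238 s before, 222 s after) can be computed directly. The helper name `relative_saving` is introduced here for illustration only. The raw ratio comes out somewhat above the 5% quoted on the slide; the transcript does not say how that figure was derived, so it may rest on a different sample or measure.

```python
def relative_saving(before_s: float, after_s: float) -> float:
    """Fraction of CPU time saved when going from before_s to after_s."""
    return (before_s - after_s) / before_s


# CPU times taken from slide 17 of the transcript.
saving = relative_saving(238.0, 222.0)
print(f"{saving:.1%}")  # prints 6.7%
```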
  19. GCC 4.3 vs. GCC 4.6

  20. GCC 4.3 vs. GCC 4.6, -O2 flag: ~3.6% worse

  21. Two profiles comparison

  22. Result (preliminary)
    ➔ It is not evident that GCC 4.6 optimizes better than GCC 4.3 (for HLT1 jobs).
  23. Future plans
    ➔ Profile code compiled with GCC 4.6 and the -O3 flag.
    ➔ Profile code compiled with GCC 4.6's profile-driven optimization.
    ➔ Create a web interface to display collected profiler results.
  24. http://amazurov.ru/cern/hltprofilingpresentation