HLT CPU Consumption
Sasha Mazurov
February 06, 2012
Transcript
HLT CPU Consumption Sasha Mazurov 6 February 2012
Tool
Gaudi Auditor & Intel® VTune™ Amplifier XE 2011
Can be run on any lxplus node
Benefits
➔ Can focus on a specific sequence/algorithm(s).
➔ Skip initialization & finalization phase.
➔ Report CPU consumption per algorithm / function / class / module.
➔ Perfect GUI & reports.
http://amazurov.ru/cern/intelprofiler/
- installation
- documentation
- screencasts
$> intelprofiler -o /where/to/store/profiler/output myJob.py
Profiler vs. HLT1 Lines (Offline)
https://github.com/mazurov/HltProfiling
profiler = IntelProfilerAuditor()
profiler.StartFromEventN = 5000
profiler.StopAtEventN = 15000
profiler.IncludeAlgorithms = ["Hlt1TrackAllL0", "Hlt1DiMuonHighMass", "Hlt1DiMuonLowMass"]
Job Options
Moore v12r10
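To make the effect of the three options above concrete, here is a minimal pure-Python sketch of the selection they suggest. `ProfilingWindow` and `is_profiled` are hypothetical stand-ins written for illustration, not the Gaudi `IntelProfilerAuditor`. The boundary handling (stop event excluded) and the meaning of an empty include list are assumptions, not behaviour confirmed by the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ProfilingWindow:
    """Hypothetical stand-in for the three auditor options (not the Gaudi class)."""
    start_from_event_n: int
    stop_at_event_n: int
    include_algorithms: list = field(default_factory=list)

    def is_profiled(self, event_number: int, algorithm: str) -> bool:
        # Assumption: the window is [start, stop), i.e. the stop event is excluded.
        in_window = self.start_from_event_n <= event_number < self.stop_at_event_n
        # Assumption: an empty include list means "profile every algorithm".
        selected = not self.include_algorithms or algorithm in self.include_algorithms
        return in_window and selected


window = ProfilingWindow(
    start_from_event_n=5000,
    stop_at_event_n=15000,
    include_algorithms=["Hlt1TrackAllL0", "Hlt1DiMuonHighMass", "Hlt1DiMuonLowMass"],
)

print(window.is_profiled(100, "Hlt1TrackAllL0"))       # before the window
print(window.is_profiled(7500, "Hlt1TrackAllL0"))      # inside, selected algorithm
print(window.is_profiled(7500, "SomeOtherAlgorithm"))  # inside, not selected
```

Starting at a late event number is one way to keep initialization out of the measurement, which matches the "skip initialization & finalization" benefit listed earlier.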
Hotspots
Top Hotspots
CPU / Per Function
CPU / Per Module
CPU / Per Algorithm
http://amazurov.ru/cern/hltprofilingresults/
CPU / Per Function In Algorithm
CPU / Per Source Code (debug mode)
TCMalloc vs. “new” Operator
Before: CPU: 238 s
After: CPU: 222 s
Results
➔ tc_new is twice as fast as the “new” operator.
➔ 5% total improvement for the Hlt1 job.
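As a small worked example, the relative saving between the two CPU figures on the Before/After slide can be computed as below. The helper `relative_reduction` is a hypothetical name introduced here. The result covers only those two figures, so it need not coincide with the total quoted for the job, which may have been measured on a different basis.

```python
def relative_reduction(before_s: float, after_s: float) -> float:
    """Fraction of CPU time saved going from before_s to after_s."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return (before_s - after_s) / before_s


# CPU figures taken from the Before/After slide: 238 s and 222 s.
saving = relative_reduction(238.0, 222.0)
print(f"{saving:.1%}")  # prints 6.7%
```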
GCC 4.3 vs. GCC 4.6
GCC 4.3 GCC 4.6 -O2 flag ~ 3.6% worse
Two profiles comparison
Result (preliminary)
➔ It is not evident that GCC 4.6 optimizes better than GCC 4.3 (for HLT1 jobs).
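A comparison of two profiles like the one above can be sketched as a per-function diff of CPU times. This is only an illustration: `compare_profiles` is a hypothetical helper, not part of the profiler, and the function names and timings below are made-up placeholders rather than measured HLT1 results.

```python
def compare_profiles(baseline, candidate):
    """Return (name, baseline_s, candidate_s, delta_s) rows, largest slowdown first."""
    rows = []
    for name in set(baseline) | set(candidate):
        b = baseline.get(name, 0.0)   # function absent from a profile counts as 0 s
        c = candidate.get(name, 0.0)
        rows.append((name, b, c, c - b))
    # Sort by descending delta, then by name for a stable order.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


# Placeholder data (hypothetical): CPU seconds per function in each build.
profile_a = {"func_a": 10.0, "func_b": 4.0, "func_c": 1.5}
profile_b = {"func_a": 9.0, "func_b": 5.5, "func_d": 0.5}

rows = compare_profiles(profile_a, profile_b)
for name, b, c, delta in rows:
    print(f"{name:8s} {b:6.1f} {c:6.1f} {delta:+6.1f}")
```

Sorting by the per-function delta puts regressions at the top, which is usually what one looks for first when the overall totals are close.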
Future plans
➔ Profile code compiled with GCC 4.6 and -O3 flag.
➔ Profile code compiled with GCC 4.6's profile driven optimization.
➔ Create a web interface to display collected profiler results.
http://amazurov.ru/cern/hltprofilingpresentation