HLT CPU Consumption
Sasha Mazurov
February 06, 2012
Transcript
HLT CPU Consumption Sasha Mazurov 6 February 2012
Tool
Gaudi Auditor & Intel® VTune™ Amplifier XE 2011
Can be run on any lxplus node
Benefits
➔ Can focus on a specific sequence/algorithm(s).
➔ Skip initialization & finalization phases.
➔ Report CPU consumption per algorithm / function / class / module.
➔ Perfect GUI & reports.
http://amazurov.ru/cern/intelprofiler/
- installation
- documentation
- screencasts
$> intelprofiler -o /where/to/store/profiler/output myJob.py
Profiler vs. HLT1 Lines (Offline)
https://github.com/mazurov/HltProfiling
Job Options
Moore v12r10
profiler = IntelProfilerAuditor()
profiler.StartFromEventN = 5000
profiler.StopAtEventN = 15000
profiler.IncludeAlgorithms = ["Hlt1TrackAllL0", "Hlt1DiMuonHighMass", "Hlt1DiMuonLowMass"]
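The three options above restrict profiling to an event window and a list of algorithms. The sketch below illustrates that gating in plain Python so it can be run without Gaudi. `ProfilerWindow` and `should_profile` are hypothetical names, not part of the auditor, and the half-open treatment of the window boundaries is an assumption; the real auditor may count the boundary events differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProfilerWindow:
    """Hypothetical stand-in mirroring the three auditor options on the slide."""
    start_from_event_n: int
    stop_at_event_n: int
    include_algorithms: Tuple[str, ...]

    def should_profile(self, event_n: int, algorithm: str) -> bool:
        # Assumption: start is inclusive, stop is exclusive.
        in_window = self.start_from_event_n <= event_n < self.stop_at_event_n
        # Only algorithms named in the include list are collected.
        return in_window and algorithm in self.include_algorithms


window = ProfilerWindow(
    start_from_event_n=5000,
    stop_at_event_n=15000,
    include_algorithms=(
        "Hlt1TrackAllL0",
        "Hlt1DiMuonHighMass",
        "Hlt1DiMuonLowMass",
    ),
)

print(window.should_profile(4999, "Hlt1TrackAllL0"))  # before the window
print(window.should_profile(5000, "Hlt1TrackAllL0"))  # inside the window
print(window.should_profile(5000, "SomeOtherAlg"))    # not in the include list
```

Skipping the first 5000 events keeps initialization and warm-up effects out of the collected data, which matches the benefit listed earlier in the talk.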
Hotspots
Top Hotspots
CPU / Per Function
CPU / Per Module
CPU / Per Algorithm
http://amazurov.ru/cern/hltprofilingresults/
CPU / Per Function In Algorithm
CPU / Per Source Code (debug mode)
TCMalloc vs. “new” Operator
Before: CPU: 238 s
After: CPU: 222 s
Results
➔ tc_new is twice as fast as the “new” operator.
➔ 5% total improvement for the Hlt1 job.
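Taking the two CPU totals from the Before/After slide at face value, the relative saving can be checked with a few lines of Python. This is only a sketch of the arithmetic, and the function name is illustrative. The result is of the same order as the roughly 5% quoted above; the slides do not say how that figure was derived.

```python
def relative_improvement(before_s: float, after_s: float) -> float:
    """Fraction of CPU time saved when going from before_s to after_s."""
    return (before_s - after_s) / before_s


cpu_before_s = 238.0  # "Before" CPU time from the slide
cpu_after_s = 222.0   # "After" CPU time from the slide (with TCMalloc)

saving = relative_improvement(cpu_before_s, cpu_after_s)
print(f"CPU time saved: {cpu_before_s - cpu_after_s:.0f} s ({saving:.1%})")
# prints "CPU time saved: 16 s (6.7%)"
```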
GCC 4.3 vs. GCC 4.6
GCC 4.3
GCC 4.6, -O2 flag: ~3.6% worse
Two profiles comparison
Result (preliminary)
➔ It is not evident that GCC 4.6 optimizes better than GCC 4.3 (for HLT1 jobs).
Future plans
➔ Profile code compiled with GCC 4.6 and the -O3 flag.
➔ Profile code compiled with GCC 4.6's profile-driven optimization.
➔ Create a web interface to display collected profiler results.
http://amazurov.ru/cern/hltprofilingpresentation