Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Advanced Modular Software Performance Monitoring
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Sasha Mazurov
May 30, 2012
Technology
270
1
Share
Advanced Modular Software Performance Monitoring
CPU profiling with Intel® VTune™ Amplifier XE
Sasha Mazurov
May 30, 2012
More Decks by Sasha Mazurov
See All by Sasha Mazurov
L1Calo Offline Software Status
mazurov
0
79
Performance and Regression tests for Simulation
mazurov
0
98
About v2
mazurov
0
72
L1Calo Offline Software Status
mazurov
0
110
L1Calo Offline Software Status
mazurov
0
110
LHCbPR V2
mazurov
0
150
Paper approval
mazurov
0
83
Conventions' Publications
mazurov
0
72
Ph.D final exam
mazurov
0
140
Other Decks in Technology
See All in Technology
M5Stack CoreS3とZephyr(RTOS)で Edge AIっぽいことしてみた
iotengineer22
0
330
Route 53 Global Resolver で高額課金発生!
otanikohei2023
0
120
【技術書典20】OpenFOAM(自宅で深める流体解析)流れと熱移動(2)
kamakiri1225
0
270
Oracle AI Database@AWS:サービス概要のご紹介
oracle4engineer
PRO
4
2.4k
巨大プラットフォームを進化させる「第3のROI」
recruitengineers
PRO
2
1.3k
AWS Transform CustomでIaCコードを自由自在に変換しよう
duelist2020jp
0
160
AIが書いたコードを信じられない問題 〜レビュー負荷を下げるために変えたこと〜 / The AI Code Trust Gap: Reducing the Review Burden
bitkey
PRO
8
1.4k
Oracle Cloud Infrastructure:2026年4月度サービス・アップデート
oracle4engineer
PRO
0
110
AIが盛んな時代に 技術記事を書き始めて起きた私の中での小さな変化
peintangos
0
220
サイボウズ 開発本部採用ピッチ / Cybozu Engineer Recruit
cybozuinsideout
PRO
10
79k
[OpsJAWS 40]リリースしたら終わり、じゃなかった。セキュリティ空白期間をAWS Security Agentで埋める
sh_fk2
3
260
目的ファーストのハーネス設計 ~ハーネスの変更容易性を高めるための優先順位~
gotalab555
8
2.5k
Featured
See All Featured
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.4k
AI Search: Where Are We & What Can We Do About It?
aleyda
0
7.4k
Into the Great Unknown - MozCon
thekraken
41
2.4k
Building Experiences: Design Systems, User Experience, and Full Site Editing
marktimemedia
0
490
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
330
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.2k
Documentation Writing (for coders)
carmenintech
77
5.3k
SEO Brein meetup: CTRL+C is not how to scale international SEO
lindahogenes
1
2.6k
Technical Leadership for Architectural Decision Making
baasie
3
340
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.8k
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
1
280
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
170
Transcript
Advanced Modular Software Performance Monitoring CPU profiling with Intel® VTune™
Amplifier XE Alexander Mazurov Ferrara University, CERN
2 I. Event Processing Software II. Profilers III. Intel® VTune™
Amplifier XE IV. Gaudi Framework V. Gaudi Intel Profiler Auditor VI. Profiling examples
3 Physics events The Higgs Boson Simulation * Trigger *
Analysis I. Event Processing Software
4 Detector events Events to storage 106 events/sec 4500 events/sec
LHCb High Level Trigger (HLT) Software Moore
5 II. Profilers Collect information related to how an
application or system perform.
6 Measure frequency and duration of functions calls and/or code
instructions. CPU Profiler
7 Profiling Techniques - Hardware counters - Instrumenting the code
8 Hardware counters Exploit hardware performance counters from Performance Monitoring
Unit (PMU) Counters: - Translation lookaside buffer (TLB) misses - Cache misses - Stall cycles - Memory access latency - ... Perfmon2 * Intel VTune Amplifier
9 Instrumenting the code - Statically: * Change code manually
/ automatically * Compiler assisted (gcc -pg) - Dynamically (at runtime): * Change code in runtime - Valgrind - Google Performance Tools - Intel VTune Amplifier
10 III. VTune™ Amplifier XE Performance Profiling Tool - x86
(32 and 64-bit) - GUI and CLI
11 VTune™ Features Runtime instrumenting profiler - User-mode sampling -
Hardware-based sampling - Concurrency and locks and waits analysis - Threading timeline - Attach to a running process - Source view
12 1) Interupts a process 2) Collect samples of all
active instruction addresses 3) Restore a call sequence upon each sample. How user-mode sampling works?
13 User-mode analysis types - Hotspots - Concurrency - Locks
and Waits
14 User-mode sampling Hotspots analysis:
15 Group results
16 Call Stack
17 Filter by timeline
18 CPU time by code line Debug mode (-g)
19 User-mode sampling is a statistical method and does not
provide a 100% accurate results. Accuracy depends on: - Duration of the collection - Speed of processor - Amount of software activity - Sampling interval * recommended value is 10 ms * profiling is only 5% slower Sampling Accuracy
20 Integrating VTune™ Amplifier to Event Processing Framework
21 IV. Gaudi Event processing framework Moore Trigger Gauss Simulation
Brunel Reconstruction Online Monitoring and commissioning DaVinci Physics analysis
22 Gaudi Architecture Algorithms * Services * Tools
23 Moore Event Loop Hlt1DiMuonHighMassFilterSequence Hlt1DiMuonHighMassStreamer FastVeloHlt MuonRec Velo2CandidatesDiMuonHighMass GECLooseUnit
createITLiteClusters createVeloLiteClusters Algorithms Sequence How to profile algorithms?
24 V. Gaudi Intel Profiling Auditor VTune™ User API +
Gaudi Auditors API
25 VTune™ User API - Start/Pause profiling - Mark profiling
regions
26 Gaudi Auditors API Algorithm Start event End
event Callback functions
27 Algorithms profiling (I) CPU time per sequence branch
28 Algorithms profiling (II)
29 Gaudi configuration from Configurables import IntelProfilerAuditor profiler = IntelProfilerAuditor()
profiler.StartFromEventN = 5000 profiler.StopAtEventN = 15000 AuditorSvc().Auditors += [profiler]
30 Run: $> intelprofiler -o /collected/data job.py Analyze (GUI): $>
amplxe-gui /collecter/data/r001hs Analyze (CLI): $> amplxe-cl -reports hotspots -r /collecter/data/r001hs
31 VI. Profiling examples 1. Memory allocation functions 2. Measuring
profiling accuracy 3. Custom reports
32 1. Memory allocation functions operatornew from libstdc++ library: tc_new
from tcmalloc library: tc_new uses twice less time then operatornew
33 2. Measuring profiling accuracy Intel Profiling Auditor vs .
Timing Auditor Measures the absolute time of algorithm's run 1000 events
34 3. Custom reports Build reports using CSV files exported
from VTune Amplifier
35 Conclusions Intel® VTune™ Amplifier XE: + Various analysis types
and reports + Rich User API + Reasonable overhead time