GT ICR - Ophélie RENAUD

SimSDP: Proof of Concept for Radio Astronomy Imaging on High-Performance Architectures
O. Renaud*, M. Quinson, N. Gac
*ENS Rennes, IRISA, CNRS, SATIE Paris-Saclay, France
GT SKA

Introduction: Square Kilometre Array (SKA)
The CSP (Central Signal Processor) produces visibilities, which the SDP (Science Data Processor) turns into the output image.

Introduction: Square Kilometre Array (SKA)
The huge collecting surface produces very high data rates (2 Pb/s, 20 Tb/s, 8.9 Tb/s, 7.8 Tb/s) under storage constraints, so the data must be processed as fast as possible.

Introduction: how can SDP imaging pipelines be simulated on HPC systems?
• The SKA objectives must be met.
• The users are not parallel-programming experts.
• The algorithms are still in development.
• The target HPC system is not yet built.

Introduction: objectives
• Optimize
• Allocate
• Analyse

Resource Allocation process on HPC systems

Previous work: graph-based Algorithm-Architecture Adequation (AAA)
A Model of Computation (MoC) and a Model of Architecture (MoA) feed the adequation step, which is followed by code generation.
[r] C. Erbas et al., "Multiobjective optimization and evolutionary algorithms for the application mapping problem in multiprocessor system-on-chip design".

Previous work: dataflow Model of Computation (MoC)
✓ Expresses several types of parallelism: task parallelism, data parallelism, and pipeline parallelism between iterations i and i+1 (illustrated on actors A to E mapped onto Core1, Core2, and Core3).
✓ Ensures consistency and prevents manual mistakes.
✓ High predictability.
✓ Allows automatic resource allocation.
SDF: Synchronous Dataflow. PiSDF: Parameterized and Interfaced SDF, which adds parameters (P) that can set the production and consumption rates.
[r] E. Lee and D. Messerschmitt, "Static scheduling of synchronous data flow programs for digital signal processing".
[r] K. Desnos et al., "PiMM: Parameterized and interfaced dataflow meta-model for MPSoCs runtime reconfiguration".
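The consistency guarantee of SDF comes from its balance equations: on every edge, the producer's repetition count times its production rate must equal the consumer's repetition count times its consumption rate. As an illustration only, the Python sketch below solves these equations by propagation with exact fractions, a standard way to obtain the repetition vector; it is not PREESM's implementation, and the function name and the `(src, dst, prod, cons)` edge format are assumptions.

```python
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations r[src] * prod == r[dst] * cons.

    actors: list of actor names; edges: list of (src, dst, prod, cons) tuples.
    Returns the smallest positive integer repetition vector as a dict.
    """
    rate = {actors[0]: Fraction(1)}  # fix the first actor, propagate to the others
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must satisfy its balance equation, otherwise the graph is inconsistent.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*counts.values())
    return {a: c // g for a, c in counts.items()}
```

For example, `repetition_vector(["A", "B", "C"], [("A", "B", 2, 1), ("B", "C", 1, 2)])` fires B twice per iteration of A and C; an inconsistent graph raises `ValueError` instead of silently producing a schedule.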

Previous work: resource allocation on a standard system
A standard system here is a multicore processor (Core0, Core1) with shared memory. From the inputs, the dataflow graph goes through flattening into a DAG (each actor is expanded into one instance per firing), then mapping, scheduling, and timing extraction.
SCAPE clustering first groups repeated firings of an actor (e.g. 2(B) executes two firings of B) and translates the result back into a dataflow graph, so the DAG to schedule contains fewer instances (2(B)0, 2(B)1, ...) and scheduling complexity stays under control.
[1] O. Renaud, D. Gageot, K. Desnos, J.-F. Nezan, "SCAPE: HW-Aware Clustering of Dataflow Actors for Tunable Scheduling Complexity", DASIP, 2023.
[2] O. Renaud, N. Haggui, K. Desnos, J.-F. Nezan, "Automated Clustering and Pipelining of Dataflow Actors for Controlled Scheduling Complexity", EUSIPCO, 2023.
[3] O. Renaud, H. Miomandre, K. Desnos, J.-F. Nezan, "Automated Level-Based Clustering of Dataflow Actors for Controlled Scheduling Complexity", JSA, 2024.
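The flattening and scheduling steps can be sketched as follows, assuming an acyclic SDF graph with no initial tokens, homogeneous cores, and a shared memory with no transfer cost. The greedy earliest-start list scheduler is a generic heuristic chosen for illustration; it is not claimed to be the scheduler used by PREESM or SCAPE, and both function names are hypothetical.

```python
def flatten(rep, edges):
    """Single-rate (DAG) expansion of an acyclic SDF graph without initial tokens.

    rep: repetition vector {actor: firings}; edges: (src, dst, prod, cons).
    Instance j of dst consumes tokens j*cons .. (j+1)*cons - 1 on each input edge,
    and token t is produced by instance t // prod of src.
    """
    deps = {(a, k): set() for a, n in rep.items() for k in range(n)}
    for src, dst, prod, cons in edges:
        for j in range(rep[dst]):
            first = (j * cons) // prod
            last = ((j + 1) * cons - 1) // prod
            deps[(dst, j)].update((src, i) for i in range(first, last + 1))
    return deps


def list_schedule(deps, cost, n_cores):
    """Greedy list scheduling on homogeneous shared-memory cores.

    cost: execution time per actor. Returns ({instance: (core, start)}, makespan).
    """
    finish, core_free, placement = {}, [0.0] * n_cores, {}
    pending = set(deps)
    while pending:
        ready = sorted(t for t in pending if all(p in finish for p in deps[t]))
        if not ready:
            raise ValueError("dependency cycle")
        for task in ready:
            earliest = max((finish[p] for p in deps[task]), default=0.0)
            # Pick the core on which the task can start first.
            core = min(range(n_cores), key=lambda c: max(core_free[c], earliest))
            start = max(core_free[core], earliest)
            finish[task] = start + cost[task[0]]
            core_free[core] = finish[task]
            placement[task] = (core, start)
            pending.remove(task)
    return placement, max(finish.values())
```

With `rep = {"A": 1, "B": 2, "C": 1}` on two cores, the two B instances land on different cores and run in parallel; timing extraction then reduces to reading the resulting makespan.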

Previous work: SimSDP, resource allocation on HPC systems
SimSDP chains node-level partitioning, thread-level partitioning (the process described on the previous slide, applied within each node), and simulation.
[4] O. Renaud, A. Gougeon, K. Desnos, C. Phillips, J. Tuthill, M. Quinson, J.-F. Nezan, "SimSDP: Automatic Workload-Balancing on Multi-Node & Multi-Core HPC Architectures based on dataflow models", TPDS.
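Node-level partitioning splits the workload across nodes. As a hedged illustration only, the sketch below applies the classic linear-partition technique to a chain of actors listed in topological order with integer loads: a binary search on the maximum per-node load, with a greedy feasibility check. SimSDP's actual partitioner may work differently; `min_max_partition` and the chain assumption are ours.

```python
def min_max_partition(loads, k):
    """Split a chain of actor loads into at most k contiguous groups (one per node)
    so that the heaviest group is as light as possible.

    Returns (max_group_load, groups of actor indices).
    """
    def groups_needed(cap):
        # Greedily pack actors into groups whose load never exceeds cap.
        count, acc = 1, 0
        for w in loads:
            if acc + w > cap:
                count, acc = count + 1, w
            else:
                acc += w
        return count

    lo, hi = max(loads), sum(loads)
    while lo < hi:  # binary search on the integer capacity
        mid = (lo + hi) // 2
        if groups_needed(mid) <= k:
            hi = mid
        else:
            lo = mid + 1

    groups, current, acc = [], [], 0
    for i, w in enumerate(loads):
        if acc + w > lo:
            groups.append(current)
            current, acc = [i], w
        else:
            current.append(i)
            acc += w
    groups.append(current)
    return lo, groups
```

For loads `[4, 2, 7, 3, 5, 1]` on 3 nodes, the best achievable maximum node load is 9, reached with groups `[0, 1]`, `[2]`, `[3, 4, 5]`.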

Previous work: co-designing HPC hardware and software with SimSDP
For each candidate architecture α ∊ 〈nNode, nCore, Topology〉, SimSDP builds the architecture model and stores its latency, memory, energy, and cost. The exploration stops when ∃ i ≥ δα : Lfinal(α, i) ≤ Lfinal(Smax).
This demonstrates the reliability of SimSDP and its usefulness for HPC design space exploration (DSE).
[5] O. Renaud, K. Desnos, E. Raffin, J.-F. Nezan, "Multicore and Network Topology Codesign for Pareto-Optimal Multinode Architecture", EUSIPCO, 2024.
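The exploration keeps the architectures that are not dominated on the stored metrics. The Python sketch below enumerates every 〈nNode, nCore, Topology〉 combination and returns the Pareto front, assuming lower is better for every metric; it omits the early-stop rule above, and `evaluate` is a hypothetical stand-in for one SimSDP simulation run.

```python
from itertools import product


def dominates(a, b):
    """True if metrics a are no worse than b everywhere and strictly better at least once."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def explore(evaluate, node_counts, core_counts, topologies):
    """Evaluate every (n_node, n_core, topology) candidate and keep the Pareto front.

    evaluate(n_node, n_core, topology) returns a tuple of metrics such as
    (latency, memory, energy, cost), all to be minimised.
    """
    results = {arch: evaluate(*arch) for arch in product(node_counts, core_counts, topologies)}
    return {
        arch: metrics
        for arch, metrics in results.items()
        if not any(dominates(other, metrics) for other in results.values())
    }
```

Keeping the whole front, rather than a single "best" architecture, leaves the latency/cost trade-off to the designer.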

Radio astronomy imaging algorithms

Radio astronomy imaging principle (existing work)
CSP visibilities → set-up → Δ (major loop): degridding-gridding → Ψ (minor loop): deconvolution → G → output image.
Each visibility is the correlation point of a pair of antennas.
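To make the "pair of antennas" point concrete: each visibility is sampled at the baseline vector between two antennas, so N antennas give N(N-1)/2 visibilities per time step and frequency channel. The minimal sketch below assumes antenna positions already projected onto the uv-plane and expressed in wavelengths; `baselines` is a hypothetical helper.

```python
from itertools import combinations


def baselines(positions):
    """Return the (u, v) baseline of every antenna pair.

    positions: (x, y) antenna coordinates, assumed already projected onto the
    uv-plane in wavelengths (Earth rotation and the w-term are ignored).
    """
    return [(xb - xa, yb - ya) for (xa, ya), (xb, yb) in combinations(positions, 2)]
```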

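Radio astronomy imaging principle: the major loop Δ (existing work)
Gridding resamples the visibilities onto a regular (u, v) grid, which allows an inverse FFT (FFT⁻¹) to produce the dirty image. Degridding goes the other way, from the current model to predicted visibilities, which allows comparison with the measured visibilities and adjustment of the model.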
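The gridding and inverse-transform path can be illustrated with a deliberately naive sketch: nearest-neighbour gridding (real pipelines use a convolutional gridding kernel and handle w-terms) followed by a direct inverse 2-D DFT on the grid, written in pure Python for readability rather than speed. The function names, the `cell` parameter, and the Hermitian-symmetry handling are assumptions of this example.

```python
import cmath


def grid_visibilities(vis, n, cell):
    """Nearest-neighbour gridding of (u, v, value) samples onto an n x n uv-grid.

    u and v are in wavelengths and cell is the grid spacing in wavelengths.
    The conjugate sample is also placed at (-u, -v) so that the dirty image is real.
    """
    grid = [[0j] * n for _ in range(n)]
    for u, v, value in vis:
        iu, iv = round(u / cell) % n, round(v / cell) % n
        grid[iv][iu] += value
        grid[-iv % n][-iu % n] += value.conjugate()
    return grid


def inverse_dft2(grid):
    """Naive inverse 2-D DFT, O(n^4); a real pipeline would call an FFT here."""
    n = len(grid)
    image = [[0j] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            image[y][x] = sum(
                grid[v][u] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
                for v in range(n)
                for u in range(n)
            ) / (n * n)
    return image
```

For a point source at the phase centre, all visibilities equal 1 and the resulting dirty image peaks at pixel (0, 0).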

Radio astronomy imaging principle: the minor loop Ψ (existing work)
The dirty image is the sky image convolved with the PSF (point spread function): dirty image = sky image ∗ PSF. The minor loop Ψ deconvolves the dirty image to estimate the sky image.
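The relation dirty image = sky image ∗ PSF can be checked numerically. The sketch below uses a circular 2-D convolution with the PSF peak stored at index (0, 0), an indexing convention assumed for this example only: a single point source produces a scaled, shifted copy of the PSF.

```python
def convolve2d_circular(sky, psf):
    """Circular 2-D convolution: dirty[y][x] = sum over (i, j) of sky[i][j] * psf[(y-i) % n][(x-j) % n]."""
    n = len(sky)
    dirty = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if sky[i][j]:  # only non-zero sky pixels contribute
                for y in range(n):
                    for x in range(n):
                        dirty[y][x] += sky[i][j] * psf[(y - i) % n][(x - j) % n]
    return dirty
```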

Deconvolution cycle (existing work): starting from the dirty image, the minor loop is iterated until it yields the deconvolved image.
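Högbom CLEAN, the minor-loop algorithm of the generic pipeline, implements this cycle: find the residual peak, record a fraction of it as a point component, subtract the correspondingly scaled PSF centred on it, and repeat. The minimal pure-Python sketch below assumes a circular PSF stored with its peak at index (0, 0); the defaults for `gain`, `threshold`, and `max_iter` are illustrative only.

```python
def hogbom_clean(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """Högbom CLEAN on square images stored as lists of lists.

    psf has the same size as dirty, its peak at index (0, 0), and wraps around.
    Returns (model components, final residual).
    """
    n = len(dirty)
    residual = [row[:] for row in dirty]
    model = [[0.0] * n for _ in range(n)]
    psf_peak = psf[0][0]
    pixels = [(i, j) for i in range(n) for j in range(n)]
    for _ in range(max_iter):
        # 1. Locate the brightest residual pixel.
        y, x = max(pixels, key=lambda p: abs(residual[p[0]][p[1]]))
        peak = residual[y][x]
        if abs(peak) < threshold:
            break
        # 2. Record a fraction (the loop gain) of it as a point-source component.
        flux = gain * peak / psf_peak
        model[y][x] += flux
        # 3. Subtract the scaled PSF centred on that pixel.
        for i in range(n):
            for j in range(n):
                residual[i][j] -= flux * psf[(i - y) % n][(j - x) % n]
    return model, residual
```

The restoring step (convolving the model with a clean beam and adding the residual) is left out of this sketch.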

Generic imaging pipeline (existing work)
The pipeline is specified by selecting the algorithms of the major and minor loops:
• Δ (major loop), degridding-gridding: Direct Fourier Transform (DFT), simple but slow; Fast Fourier Transform (FFT), faster on large grids; Grid to Grid (G2G, N. Monnier), faster, same O.
• Ψ (minor loop), deconvolution: Högbom CLEAN, simple but slow.
This pipeline is a very good entry point for comparing the performance of algorithms on a single-node CPU architecture.
[r] S. Wang, N. Gac, H. Miomandre, J.-F. Nezan, K. Desnos, F. Orieux, "An Initial Framework for Prototyping Radio-Interferometric Imaging Pipelines".
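The "faster on large grids" remark follows from operation counts. The rough cost model below is an assumption for illustration, not a measured result: DFT imaging costs one complex-exponential term per visibility and per pixel, while gridding + FFT costs `support²` operations per visibility plus the conventional 5·N·log2(N) flop estimate for an N-point complex FFT. The default support of 7 cells is hypothetical.

```python
from math import log2


def dft_imaging_ops(n_vis, grid_size):
    """Direct Fourier Transform: every visibility contributes to every pixel."""
    return n_vis * grid_size ** 2


def fft_imaging_ops(n_vis, grid_size, support=7):
    """Gridding (support x support cells per visibility) followed by one 2-D FFT."""
    n_pix = grid_size ** 2
    return n_vis * support ** 2 + 5 * n_pix * log2(n_pix)


if __name__ == "__main__":
    n_vis = 1_000_000
    for grid in (256, 1024, 4096):
        ratio = dft_imaging_ops(n_vis, grid) / fft_imaging_ops(n_vis, grid)
        print(f"grid {grid}: DFT/FFT operation ratio ~ {ratio:.0f}")
```

Under this model the ratio keeps growing with the grid size, which is the trend the slide summarises.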

Polynomial fit function for timing simulation (existing work)
actor_timing(target, param1, param2) = polynomial(param1, param2), where param ∊ {NUM_VIS, GRID_SIZE, NUM_MINOR_CYCLE}

    # Building the scripted benchmark
    FOR each param1 ∊ PARAM1 DO:
        FOR each param2 ∊ PARAM2 DO:
            EXECUTE Instrumented_code(param1, param2)
            SAVE {timing, config} → actor_timing.csv
    # Calculating the polynomial and RMSE
    FOR each actor_timing.csv ∊ CSV DO:
        FOR each dof ∊ DOF DO:
            COMPUTE polynomial(dof)
            COMPUTE RMSE(measure, polynomial)

This method is currently manual, which limits it:
• in sampling (up to 8 samples per parameter);
• in the degree of the polynomials evaluated (limited to 2).
The goal is to facilitate pipeline comparison when varying parameters.
[r] S. Wang, N. Gac, H. Miomandre, J.-F. Nezan, K. Desnos, F. Orieux, "An Initial Framework for Prototyping Radio-Interferometric Imaging Pipelines".
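The fitting step can be sketched in pure Python: a bivariate least-squares polynomial of total degree `dof`, solved through the normal equations, scored by RMSE, with the lowest-RMSE degree kept. This is an illustration, not the framework's code, and the function names are hypothetical. Two caveats: with realistic magnitudes (e.g. NUM_VIS in the millions) the parameters should be normalised before fitting to keep the normal equations well conditioned, and an RMSE computed on the fitting samples never increases with the degree, so a held-out subset gives a fairer comparison.

```python
from math import sqrt


def monomials(degree):
    """Exponent pairs (i, j) with i + j <= degree, i.e. terms p1**i * p2**j."""
    return [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(samples, degree):
    """Least-squares fit of timing ~ polynomial(param1, param2) via the normal equations.

    samples: list of (param1, param2, timing). Returns the fitted model as a callable.
    """
    terms = monomials(degree)
    rows = [[p1 ** i * p2 ** j for i, j in terms] for p1, p2, _ in samples]
    timings = [t for _, _, t in samples]
    k = len(terms)
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    atb = [sum(r[a] * t for r, t in zip(rows, timings)) for a in range(k)]
    coeffs = solve(ata, atb)
    return lambda p1, p2: sum(c * p1 ** i * p2 ** j for c, (i, j) in zip(coeffs, terms))


def rmse(model, samples):
    """Root-mean-square error between measured timings and the model."""
    return sqrt(sum((model(p1, p2) - t) ** 2 for p1, p2, t in samples) / len(samples))


def best_degree(samples, degrees):
    """Return (degree, rmse) of the lowest-RMSE fit among the candidate degrees."""
    return min(((d, rmse(fit(samples, d), samples)) for d in degrees), key=lambda s: s[1])
```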

Radio astronomy imaging algorithms on HPC system

Target architectures (ongoing work)
Software: one imaging pipeline per frequency channel (f0 - pipeline, f1 - pipeline, …, f20 - pipeline).
Hardware: nodes node0 … node20 connected through a router, each with its own shared memory, in three configurations:
• Multinode - multicore: cores C0 … Cn per node;
• Multinode - monoGPU: one core C0 and one GPU per node;
• Multinode - multiGPU: one core C0 and several GPUs (G0, …) per node.
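The three hardware configurations can be captured in a small architecture model. The dataclasses below are a hypothetical sketch, not SimSDP's architecture format: every node hangs off one router, and the default core and GPU counts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cores: int
    gpus: int = 0


@dataclass
class Architecture:
    kind: str
    nodes: list[Node]  # all nodes are assumed to share a single router

    def total(self, unit):
        """Sum a per-node resource ("cores" or "gpus") over the whole machine."""
        return sum(getattr(node, unit) for node in self.nodes)


def build_architecture(kind, n_nodes=21, n_cores=4, n_gpus=2):
    """Build one of the three node layouts; the counts are placeholder values."""
    layouts = {"multicore": (n_cores, 0), "monoGPU": (1, 1), "multiGPU": (1, n_gpus)}
    if kind not in layouts:
        raise ValueError(f"unknown architecture kind: {kind}")
    cores, gpus = layouts[kind]
    return Architecture(kind, [Node(f"node{i}", cores, gpus) for i in range(n_nodes)])
```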

Simulating generic imaging pipelines: 21 frequency channels on CPU, with frequency-based node partitioning (ongoing work)
This corresponds to the Sunrise benchmark [DASIP] applied to HPC; the comparison with measurements is still missing.
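Frequency-based node partitioning can be sketched as cutting the list of channel pipelines into contiguous, near-equal blocks, one per node; with 21 channels on 21 nodes, each node gets exactly one pipeline. The helper below is hypothetical and assumes every channel has the same cost.

```python
def partition_channels(n_channels, n_nodes):
    """Split channel indices 0..n_channels-1 into n_nodes contiguous blocks.

    Block sizes differ by at most one; the first nodes take the extra channels.
    """
    base, extra = divmod(n_channels, n_nodes)
    blocks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


if __name__ == "__main__":
    for node, channels in enumerate(partition_channels(21, 4)):
        print(f"node{node}: f{channels[0]}..f{channels[-1]}")
```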

Polynomial fit function for timing simulation (existing work)
actor_timing(target, param1, param2) = polynomial(param1, param2), where param ∊ {NUM_VIS, GRID_SIZE, NUM_MINOR_CYCLE}

    # Building the scripted benchmark
    FOR each target ∊ {CPU, GPU} DO:
        FOR each param1 ∊ PARAM1 DO:
            FOR each param2 ∊ PARAM2 DO:
                EXECUTE Instrumented_code(param1, param2)
                SAVE {timing, config} → actor_timing.csv
    # Calculating the polynomial with the degree offering the best RMSE
    FOR each actor_timing.csv ∊ CSV DO:
        FOR each dof ∊ DOF DO:
            COMPUTE polynomial(dof)
            COMPUTE RMSE(measure, polynomial)
        SAVE best_RMSE_config → parameterized_actor_timing.csv

This will reduce the gap between estimated and measured timings. The goal is to facilitate pipeline comparison when varying parameters.
[r] S. Wang, N. Gac, H. Miomandre, J.-F. Nezan, K. Desnos, F. Orieux, "An Initial Framework for Prototyping Radio-Interferometric Imaging Pipelines".
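The scripted-benchmark half of this procedure can be sketched with the Python standard library: sweep every target and parameter pair, time the instrumented code with `time.perf_counter`, and write one CSV row per configuration. The `kernels` mapping from target name to callable, the column names, and keeping the minimum of a few repeats (to damp timing noise) are all assumptions of this sketch.

```python
import csv
import io
import time
from itertools import product


def run_benchmark(kernels, param1_values, param2_values, out, repeats=3):
    """Time kernels[target](param1, param2) for every configuration.

    kernels: hypothetical mapping {"CPU": callable, "GPU": callable, ...}.
    out: writable text file object that receives the actor_timing CSV.
    """
    writer = csv.writer(out)
    writer.writerow(["target", "param1", "param2", "timing_s"])
    for target, p1, p2 in product(kernels, param1_values, param2_values):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernels[target](p1, p2)
            best = min(best, time.perf_counter() - start)
        writer.writerow([target, p1, p2, f"{best:.9f}"])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Dummy CPU kernel standing in for Instrumented_code(param1, param2).
    run_benchmark({"CPU": lambda p1, p2: sum(range(p1 * p2))}, [100, 200], [1, 2], buffer)
    print(buffer.getvalue())
```

Writing to any file object (a real file, or `io.StringIO` as in the demo) keeps the harness independent of where actor_timing.csv is stored.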

Summary and conclusion
• Ongoing work:
🔜 Automated radio astronomy imaging benchmarking on HPC systems.
🔜 Dataflow methodology available on GitHub.
🔜 Comparison with a manual implementation [N. Monnier, G2G].
🔜 Validation on Ruche and Grid5000.
🌐 SimSDP tutorial available on the PREESM website: SimSDP: Multinode Design Space Exploration - Preesm
🌐 Radio astronomy imaging benchmark available on the CentraleSupélec GitLab: SIMSDP - Generic Imaging Pipeline
• Future work:
🚀 Enhancing SimSDP reliability by automating the fine-grained description.

