PyConZA 2013: "Building the SKA" by Simon Ratcliffe

SKA Simon Ratcliffe SSPA www.ska.ac.za *Spartan School of Presentation Aesthetics
(in association with Just Say No to Bullets)

SR chapeaux MeerKAT SKA DOME

In the beginning...

There were some hazards...

Free time at last... Kuyunjik Collection, British Museum, London.

Just 20,000 stars in the sky = www.stellarium.org

...and just 12 monthly installments
I just need a few florins to buy some pants

Can I have fifty pounds to mend the shed? =
Ewan McTeagle Johannes Hevelius 60ft - 1673

Optical Astronomy = ESO/MPG 2.2-m telescope, La Silla

http://xkcd.com/273/

Radio Astronomy + = = + Jansky VLA, New Mexico

1 Jy = 10^-26 W m^-2 Hz^-1
Eskom ?
I canna do it, captain, ye canna alter the laws of physics
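
To get a feel for how small a jansky is, here is a minimal sketch that converts a flux density into received power. The 76 m dish diameter is borrowed from the Lovell entry later in the deck; the 1 GHz bandwidth, the perfect aperture efficiency and the function name are assumptions made purely for illustration.

```python
import math

JANSKY = 1e-26  # W m^-2 Hz^-1, as defined on the slide


def received_power_watts(flux_jy, collecting_area_m2, bandwidth_hz):
    """Power collected from a source of the given flux density.

    Assumes a perfectly efficient aperture and a flat spectrum
    across the band (both simplifications).
    """
    return flux_jy * JANSKY * collecting_area_m2 * bandwidth_hz


# Hypothetical example: 1 Jy source, 76 m dish, 1 GHz of bandwidth.
dish_area = math.pi * (76.0 / 2) ** 2
power = received_power_watts(1.0, dish_area, 1e9)
print(f"{power:.2e} W")
```

Under these assumptions the dish collects a few times 10^-14 W, which is the point of the Eskom joke.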

Baddies
Sun @ 5 GHz: 10^5 Jy
GSM Phone @ 1km: 10^8 Jy
iOS 7: 0 Jy

Scale Up 1936: Grote Reber 9m 1955: Bernard Lovell 76m
1971: Effelsberg 100m

Karoo Array Telescope Scale Out

MeerKAT - 2016 Medium Telescope

On the ground... http://www.gigapan.com/gigapans/135410

Big Telescope Square Kilometre Array – Artist's Impression

Big Data Andrew Cooper
1000000000000000000 B @ 1 bit per grain of standardised* sand
* assumptions apply (10gpmm^3, 3kmx2kmx700m, only valid when calculated on paper napkin, just say no to assumptions)

Big, really big... 1 exbibyte - 1 exabyte = 38,230 x
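
The gap between the binary and decimal prefixes can be checked with a few lines of integer arithmetic. The transcript does not say what the 38,230 counted; the sketch below assumes 4 TB units (for example, disk drives), which is a guess that happens to reproduce the figure.

```python
EXBIBYTE = 2 ** 60   # binary prefix
EXABYTE = 10 ** 18   # decimal prefix

difference_bytes = EXBIBYTE - EXABYTE

# Assumption: the slide counted 4 TB (decimal) units. This is not stated
# in the transcript; it is simply the unit size that yields 38,230.
ASSUMED_UNIT_BYTES = 4 * 10 ** 12
units = difference_bytes // ASSUMED_UNIT_BYTES

print(difference_bytes)   # the "rounding error" in bytes
print(units)              # whole 4 TB units that fit in the difference
```

The difference alone is roughly 153 PB, more than fifteen times the 10 PB MeerKAT archive mentioned later in the deck.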

Big Energy Peter M. Kogge. Energy at exaflops. (2009) 16+
MW for SKA

How do we build this ?

Big Iron

Big Iron (block diagram, Slide Credit: Andy Faulkner)
Data path: Incoming Data from collectors, Switch, Buffer store, Switch, Buffer store, HPC, Bulk Store
Blocks: Correlator / Beamformer, UV Processor, HPC science processing
Imaging: Corner Turning, Coarse Delays, Fine F-step / Correlation, Visibility Steering, Observation Buffer, Gridding Visibilities, Imaging, Image Storage
Non-Imaging: Corner Turning, Coarse Delays, Beamforming / De-dispersion, Beam Steering, Observation Buffer, Time-series Searching, Search analysis, Object/timing Storage
SKA Phase 1: up to 50 Tb/s, 15 PFlop / PMAC, 1 - 5 PFlop, Few PFlop, 400 - 1000 Gb/s, 30 PFlop
SKA Phase 2: up to 5 Pb/s, 1 EFlop / EMAC, 10 - 100 PFlop, Many PFlop, up to 500 Tb/s, 0.2 - 2.5 EFlop
Software Complexity & Data*
* All numbers made up for increased wow factor

Big, serious, grown up compute i-uniVac 9000 Extreme Edition

SoC μServers
Tegra (ARM), QorIQ (Power), Exynos (ARM), Atom (x86)
TDP, IO, FLOPS, $, ρ

cooling Ground Loop Hive FERRO 50 TFLOPS 2 TB RAM
8 TB Flash 20 Gbps 2.5 kW Shielded Rugged Cheap

Big Storage MeerKAT has a primary archive of 10 PB
SKA will need at least 1 EB in around 2023

ebay - $20.29

Big Software 200 billion lines of COBOL and counting

IO / Cache / FLOPS / kW / $
V_ij = M_ij B_ij G_ij D_ij E_ij P_ij T_ij V_ij^IDEAL
MAGIC
SKA mid

Software “Life is too short to write C++” David Beazley

Philosophy (to combat the malaise that is big software)
Deliver only what is actually needed
Scale requires better ideas, not more code
Simple is better than complex
Optimise last
Commodity is king
Perfection is an illusion

You hip fanboi – don't you know Python is too slow...
regex_dna (http://shootout.alioth.debian.org/)
agggtaaa|tttaccct 0
[cgt]gggtaaa|tttaccc[acg] 3
a[act]ggtaaa|tttacc[agt]t 9
ag[act]gtaaa|tttac[agt]ct 8
agg[act]taaa|ttta[agt]cct 10
aggg[acg]aaa|ttt[cgt]ccct 3
agggt[cgt]aa|tt[acg]accct 4
agggta[cgt]a|t[acg]taccct 3
agggtaa[cgt]|[acg]ttaccct 5
C: 7.21 seconds
Python: 31.7 seconds
4.4 times slower !
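
For readers unfamiliar with the benchmark, the counting part can be sketched as below. This is not the benchmark program itself, only an illustration of what the pattern/count pairs mean; the function name and the toy sequence are made up, and the real input is a large FASTA file.

```python
import re

# The nine variant patterns listed on the slide.
VARIANTS = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct",
]


def count_variants(sequence):
    """Return {pattern: number of non-overlapping matches} for each variant."""
    return {p: len(re.findall(p, sequence)) for p in VARIANTS}


# Toy sequence (hypothetical): one forward and one reverse-complement hit.
toy = "ccagggtaaaccgtttaccctgg"
for pattern, count in count_variants(toy).items():
    print(pattern, count)
```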

No really, it is so sloooooow...
Laplace solver (http://www.scipy.org/PerformancePython)
C: 0.97 seconds
Python: 6345.6 seconds
6,500 times slower
Suck it Python !

You're holding it wrong...

Cereal ?
Totally serial: 6345.6 seconds
Vectorised*: 2.57 seconds
2469 times faster ! (only 2.6 times slower than C)
*Like Totally
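
The difference between the serial and vectorised versions looks roughly like this. It is a minimal sketch rather than the code that produced the timings: it assumes equal grid spacing and a Jacobi-style update (each sweep reads only the previous grid) so that both versions give identical results, and the function names are made up.

```python
import numpy as np


def laplace_step_loops(u):
    """One relaxation sweep using explicit Python loops (the slow way)."""
    new = u.copy()
    rows, cols = u.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Average of the four neighbours in the previous grid.
            new[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j]
                                + u[i, j - 1] + u[i, j + 1])
    return new


def laplace_step_vectorised(u):
    """The same sweep expressed as whole-array slices."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                              + u[1:-1, :-2] + u[1:-1, 2:])
    return new


# Small demo grid: top edge held at 1.0, everything else starts at 0.
grid = np.zeros((50, 50))
grid[0, :] = 1.0

slow = laplace_step_loops(grid)
fast = laplace_step_vectorised(grid)
print(np.allclose(slow, fast))
```

The vectorised version moves the inner loops into numpy's compiled code, which is where the speed-up quoted on the slide comes from.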

Simple optimisation gives us close to C performance
That's because numpy is just optimised C underneath
True, but I didn't actually have to write any C
Don't you get paid by the kloc ?
Someone else can write code, I am off to the beach.

Admittedly, we are still slower than C. But the language itself does not dictate performance*. The compiler / interpreter does. It's not Python, but CPython to blame.
*yes, yes, I know. But outrageous claims are my forté, after all I invented the question mark.

Ok, so how fast do we need to be ?
This is your CPU

Optimised gridding ~ 1ns per point per core (Xeon E5-2690)
We need 64 Gbps to keep an 8-core CPU busy
In reality we have an order of magnitude less
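
The 64 Gbps figure follows from simple arithmetic. The slide does not state the size of a point; the sketch below assumes 8 bits per point, which is the value that makes the quoted numbers agree, and the function name is made up.

```python
def required_gbps(ns_per_point, cores, bits_per_point):
    """Input rate needed to keep every core gridding continuously."""
    points_per_second = cores * 1e9 / ns_per_point
    return points_per_second * bits_per_point / 1e9


# From the slide: ~1 ns per point per core, 8 cores.
# Assumption (not on the slide): 8 bits per point.
needed = required_gbps(ns_per_point=1.0, cores=8, bits_per_point=8)

# "An order of magnitude less" IO means the CPU is busy about 10% of the time.
available = needed / 10
utilisation = available / needed

print(needed, available, utilisation)
```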

If our IO is efficient we can sacrifice an order of magnitude in CPU performance without overall impact*
*For our specific data intensive use cases. Terms and conditions apply. If you can read this you are too close.

Is our IO efficient ? Yes* * Blah blah blah
HDF5 blah blah Blosc blah blah C is the only real language blah blah SPEAD blah blah numpy

It's not about the bike language llvm IR bitcode

Our biggest cost is development time, not hardware. This is
where using high level languages, without excessive optimisation, really wins.

Python 40+ in house modules +

My hovercraft is full of eels Images created With KAT-7

Control

Data Transport - SPEAD Self describing, high speed, low overhead
numpy transport connecting a wide range of potentially disparate devices.
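
The idea of a self-describing numpy transport can be illustrated without the real protocol. The sketch below is not SPEAD and does not use its wire format or library; it only shows the principle of sending a small description (name, dtype, shape) ahead of the raw array bytes so the receiver needs no prior knowledge. The message layout and function names are invented for this example.

```python
import json
import struct

import numpy as np


def pack(name, array):
    """Serialise an array as: 4-byte header length, JSON header, raw bytes."""
    header = json.dumps({
        "name": name,
        "dtype": array.dtype.str,   # e.g. '<f4', includes byte order
        "shape": list(array.shape),
    }).encode("utf-8")
    payload = np.ascontiguousarray(array).tobytes()
    return struct.pack("!I", len(header)) + header + payload


def unpack(message):
    """Rebuild (name, array) using only what the message says about itself."""
    (header_len,) = struct.unpack("!I", message[:4])
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    data = np.frombuffer(message[4 + header_len:],
                         dtype=np.dtype(header["dtype"]))
    return header["name"], data.reshape(header["shape"])


# Round trip a hypothetical block of visibilities.
vis = np.arange(12, dtype=np.float32).reshape(3, 4)
name, received = unpack(pack("visibilities", vis))
print(name, received.shape, received.dtype)
```

A real transport would also need to handle packetisation, loss and very high rates, none of which this toy attempts.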

<fail type='demo' class='epic' /> Data Analysis

WWW.SKA.AC.ZA
