Slide 1

‘LOAD’ Considered Harmful
Tom Lyon
HPTS 2017 Gong Show

Slide 2

How does a processor move data?

Memory:  LOAD, STORE
I/O:     READ, WRITE
Network: SEND, RECEIVE

©2017 DriveScale Inc. All Rights Reserved.

Slide 3

Memory

§ When memory is private, local, and fast, LOAD works fine

Slide 4

Memory – Private vs Shared

§ Reasoning about memory is easy when the memory is private to a thread
§ Memory shared between threads, even in a single piece of code, becomes notoriously difficult to deal with – races, etc.
§ The bane of Java and C++ developers
§ Erlang and Go show the way – message-based concurrency
§ At the HW level, shared memory requires coherency protocols, which can introduce extraordinary performance delays – even between cores on the same chip
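The Erlang/Go point above can be sketched in a few lines of Go (a minimal illustration, not from the talk): worker goroutines never touch a shared counter; each one SENDs its partial result over a channel, and the single RECEIVE loop is the only synchronization point, so there are no races and no locks.

```go
package main

import "fmt"

// sum splits the work across goroutines that communicate results over
// a channel instead of sharing a mutable total -- no mutex, no race.
func sum(nums []int, workers int) int {
	results := make(chan int, workers)
	chunk := (len(nums) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(nums) {
			lo = len(nums)
		}
		if hi > len(nums) {
			hi = len(nums)
		}
		go func(part []int) {
			s := 0
			for _, n := range part {
				s += n
			}
			results <- s // SEND the partial sum; the data travels with the message
		}(nums[lo:hi])
	}
	total := 0
	for w := 0; w < workers; w++ {
		total += <-results // RECEIVE; the only synchronization point
	}
	return total
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(sum(nums, 4)) // 5050
}
```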

Slide 5

Memory – Local vs Remote

§ Remote memory – out of the box
§ No failure model – what if the target is temporarily or permanently unavailable?
  – PCIe over cables?
  – Software DSM?
§ No performance transparency
  – Need NUMA-aware memory allocation, even in a single box
§ Giant SMP
  – Reliability drops as you add nodes

Slide 6

Memory – Fast vs Slow

§ DRAM hasn’t been fast for 30 years!
§ THE WALL!
§ Huuuge caches!
§ Processors are designed around the performance of local DRAM
§ Anything slower (NVDIMM? Remote?) wastes a huge amount of silicon and power
§ Need massively multi-threaded HW for slow memory
§ But multi-threading SW sucks

Slide 7

FAIL #1 – Persistent Memory

§ NV tech is slower than DRAM – nobody wants slower memory
§ Everybody wants faster storage!
§ The memory model is just wrong for storage
  – We already have mmap
  – Not really that useful
§ Persistence not adequate for storage – you need replication/redundancy

Slide 8

FAIL #2 – Gen-Z, external PCIe, “The Machine”

§ Shared memory with coherency is a PIA
§ Shared memory without coherency is a huge PIA
§ Failure semantics are MIA for LOAD/STORE networks

Slide 9

Receive – Network Semantics

§ Synchronous LOAD vs asynchronous RECEIVE
§ Defined error model: expect the worst
§ Event loops/actor model instead of heavyweight threads
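The event-loop/actor style above can be sketched as a minimal actor in Go (names like `counter` and the message types are hypothetical): one goroutine owns the state, and everything else interacts with it only by asynchronous messages through its mailbox – RECEIVE, not LOAD.

```go
package main

import "fmt"

// Message types for the actor's mailbox.
type inc struct{ by int }
type get struct{ reply chan int }

// counter is a tiny actor: its event loop is the only code that can
// touch n, so there is nothing to make coherent and nothing to lock.
func counter(mailbox <-chan interface{}) {
	n := 0 // private state, owned by this goroutine alone
	for msg := range mailbox {
		switch m := msg.(type) {
		case inc:
			n += m.by
		case get:
			m.reply <- n // answer with a message, not a shared word
		}
	}
}

func main() {
	mailbox := make(chan interface{}, 16)
	go counter(mailbox)
	for i := 0; i < 5; i++ {
		mailbox <- inc{by: 2} // fire-and-forget sends
	}
	reply := make(chan int)
	mailbox <- get{reply: reply}
	fmt.Println(<-reply) // 10
}
```

One event loop per piece of state replaces the heavyweight-threads-plus-locks model: concurrency lives in the mailbox, not in shared memory.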

Slide 10

Summary

§ Communicating through memory is a bad idea
§ Don’t drag the memory model out of the box – drag the network model into the box
§ Every processor should be a network processor

Slide 11

DriveScale Inc.
1230 Midas Way, Suite 210
Sunnyvale CA 94085
www.drivescale.com

Thanks!