2017: "LOAD" Considered Harmful - HPTS 2017 Gong Show

‘LOAD’ Considered Harmful
Tom Lyon
HPTS 2017 Gong Show

How does a processor move data?
§ Memory: LOAD, STORE
§ I/O: READ, WRITE
§ Network: SEND, RECEIVE
©2017 DriveScale Inc. All Rights Reserved.

Memory
§ When memory is private, local, and fast, LOAD works fine

Memory – Private vs Shared
§ Reasoning about memory is easy when the memory is private to a thread
§ Memory shared between threads, even in a single piece of code, becomes notoriously difficult to deal with – races, etc.
§ The bane of Java and C++ developers
§ Erlang and Go show the way – message based concurrency
§ At the HW level, shared memory requires coherency protocols which can introduce extraordinary performance delays – even between cores on the same chip
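
The message-based style that the slide credits to Erlang and Go can be sketched in a few lines. This is a minimal illustration, not code from the talk: the names `counter_owner` and `run` are hypothetical, and Python threads with `queue.Queue` stand in for Erlang processes or Go channels. One thread owns the counter privately; every other thread sends it messages instead of storing to shared state.

```python
import queue
import threading


def counter_owner(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Own the counter privately and change it only in response to messages."""
    total = 0  # private state: no other thread reads or writes it
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: all senders are done
            break
        total += msg
    outbox.put(total)  # the result also travels as a message


def run(workers: int = 4, per_worker: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    owner = threading.Thread(target=counter_owner, args=(inbox, outbox))
    owner.start()

    def worker() -> None:
        for _ in range(per_worker):
            inbox.put(1)  # SEND a message rather than STORE to shared memory

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    inbox.put(None)  # tell the owner to finish
    owner.join()
    return outbox.get()
```

Because only the owner touches `total`, no lock is needed and there is no read-modify-write race to reason about; the queue is the only point of synchronization.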

Memory – Local vs Remote
§ Remote memory – out of the box
§ No failure model – what if target is temporarily or permanently unavailable?
  – PCIe over cables?
  – Software DSM?
§ No performance transparency
  – Need NUMA aware memory allocation, even in single box
§ Giant SMP
  – Reliability drops as you add nodes

Memory – Fast vs Slow
§ DRAM hasn’t been fast for 30 years!
§ THE WALL!
§ Huuuge caches!
§ Processors are designed around the performance of local DRAM
§ Anything slower (NVDIMM? Remote?) wastes a huge amount of silicon and power
§ Need massively multi-threaded HW for slow memory
§ But multi-threading SW sucks
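
One way to see why slow memory calls for massively multi-threaded hardware is Little's law: the number of requests that must be in flight equals throughput times latency. The sketch below is a back-of-the-envelope helper, not something from the talk; the function name, the 64-byte line size, and the example numbers are all illustrative assumptions.

```python
def requests_in_flight(latency_ns: float,
                       bandwidth_gb_per_s: float,
                       line_bytes: int = 64) -> float:
    """Little's law: items in flight = arrival rate * time in system.

    1 GB/s is 1 byte/ns, so bandwidth / line_bytes gives cache lines per ns.
    """
    lines_per_ns = bandwidth_gb_per_s / line_bytes
    return lines_per_ns * latency_ns
```

With these assumed figures, sustaining 64 GB/s at 100 ns latency needs `requests_in_flight(100, 64)` = 100 outstanding 64-byte requests; at 1000 ns the same bandwidth needs 1000. Something has to issue all of those requests concurrently, which is where the extra hardware threads come in.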

FAIL #1 – Persistent Memory
§ NV tech is slower than DRAM - Nobody wants slower memory
§ Everybody wants faster storage!
§ The memory model is just wrong for storage
  – We already have mmap
  – Not really that useful
§ Persistence not adequate for storage – you need replication/redundancy
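
For readers who have not used mmap, the following sketch shows what "the memory model for storage" looks like in practice. It is an illustration under stated assumptions, not code from the talk: `store_via_mmap` and `demo` are hypothetical names, and a temporary file stands in for a persistent device.

```python
import mmap
import os
import tempfile


def store_via_mmap(path: str, payload: bytes) -> None:
    """Write to a file through a memory mapping instead of write()."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            mm[:len(payload)] = payload  # a plain store: nothing is returned
            mm.flush()  # durability still needs an explicit flush step


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 4096)  # an empty file cannot be mapped
        os.close(fd)
        store_via_mmap(path, b"hello")
        with open(path, "rb") as f:
            return f.read(5)
    finally:
        os.remove(path)
```

The store itself has no result to check, and the data is only on one device once `flush()` returns; replication or redundancy has to be layered on separately.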

FAIL #2 – Gen-Z, external PCIe, “The Machine”
§ Shared memory with coherency is a PIA
§ Shared memory without coherency is a huge PIA
§ Failure semantics are MIA for LOAD/STORE networks

Receive – Network Semantics
§ Synchronous LOAD vs asynchronous RECEIVE
§ Defined error model: expect the worst
§ Event loops/actor model instead of heavyweight threads
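
The three bullets above can be sketched together with an event loop. This is a minimal illustration, not the talk's own code: `receive`, `main`, and `run` are hypothetical names, and an `asyncio.Queue` stands in for a network endpoint. The point is that RECEIVE is awaited rather than blocking, and that "nothing arrived" is an ordinary, expected result.

```python
import asyncio


async def receive(inbox: asyncio.Queue, timeout_s: float):
    """Asynchronous RECEIVE: returns (message, None) or (None, error)."""
    try:
        msg = await asyncio.wait_for(inbox.get(), timeout=timeout_s)
        return msg, None
    except asyncio.TimeoutError:
        # Expect the worst: the sender may be slow, unreachable, or gone.
        return None, "timeout"


async def main():
    inbox: asyncio.Queue = asyncio.Queue()
    await inbox.put("pong")
    first = await receive(inbox, 0.05)   # a message is already queued
    second = await receive(inbox, 0.05)  # nothing arrives: explicit error
    return first, second


def run():
    return asyncio.run(main())
```

A LOAD from an unavailable target has no equivalent of the second result; here the caller gets a value it can act on, and the event loop stays free to run other tasks while waiting.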

Summary
§ Communicating through memory is a bad idea
§ Don’t drag the memory model out of the box, drag the network model into the box
§ Every processor should be a network processor


DriveScale Inc. 1230 Midas Way, Suite 210 Sunnyvale CA 94085