2017: "LOAD" Considered Harmful - HPTS 2017 Gong Show

Tom Lyon
October 09, 2017

My entry for the HPTS 2017 Gong Show. I didn't make it very far into this before my 5 minutes was up.

Transcript

  1. How does a processor move data?

    Memory     LOAD, STORE
    I/O        READ, WRITE
    Network    SEND, RECEIVE

    ©2017 DriveScale Inc. All Rights Reserved.

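As a rough illustration (a Python sketch, not something from the deck), the three verb pairs map onto operations like the ones below. The in-process `socket.socketpair()` is an assumed stand-in for a real network link. Only the memory path gives the caller nothing to check, and only the network path lets the caller bound how long it waits.

```python
import io
import socket

# Memory: LOAD and STORE are plain indexing into a buffer; there is no status to check.
memory = bytearray(8)
memory[0] = 42                     # STORE
loaded = memory[0]                 # LOAD

# I/O: READ and WRITE are explicit calls that return a result the caller can check.
stream = io.BytesIO()
written = stream.write(b"block")   # WRITE returns the number of bytes written
stream.seek(0)
read_back = stream.read()          # READ

# Network: SEND and RECEIVE over a connected socket pair (local stand-in for a real link).
left, right = socket.socketpair()
try:
    left.sendall(b"ping")          # SEND
    right.settimeout(1.0)          # bound the wait: recv raises socket.timeout if nothing arrives
    received = right.recv(16)      # RECEIVE
finally:
    left.close()
    right.close()

print(loaded, written, read_back, received)
```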
  2. Memory

    - When memory is private, local, and fast, LOAD works fine

  3. Memory – Private vs Shared

    - Reasoning about memory is easy when the memory is private to a thread
    - Memory shared between threads, even in a single piece of code, becomes notoriously difficult to deal with: races, etc.
    - The bane of Java and C++ developers
    - Erlang and Go show the way: message-based concurrency
    - At the HW level, shared memory requires coherency protocols, which can introduce extraordinary performance delays, even between cores on the same chip

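A minimal sketch of the message-based style the slide credits to Erlang and Go, written here in Python with threads and `queue.Queue` mailboxes (an assumed substitute; neither Erlang nor Go is used). The counter is private to one thread, and every other thread reaches it only by sending a message, so there is no shared read-modify-write to race on; the queue itself does the synchronization.

```python
import queue
import threading


def counter_actor(inbox: queue.Queue, replies: queue.Queue) -> None:
    """Owns `count` privately; the only way to touch it is to send a message."""
    count = 0
    while True:
        msg = inbox.get()
        if msg == "incr":
            count += 1
        elif msg == "get":
            replies.put(count)
        elif msg == "stop":
            return


def run(workers: int = 4, increments: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(inbox, replies))
    actor.start()

    def worker() -> None:
        for _ in range(increments):
            inbox.put("incr")          # SEND instead of a shared read-modify-write

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # All "incr" messages are already queued, so FIFO order puts "get" after them.
    inbox.put("get")
    total = replies.get()              # RECEIVE the answer
    inbox.put("stop")
    actor.join()
    return total


print(run())  # prints 4000
```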
  4. Memory – Local vs Remote

    - Remote memory: memory out of the box (outside the server)
    - No failure model: what if the target is temporarily or permanently unavailable?
      - PCIe over cables?
      - Software DSM?
    - No performance transparency
      - Need NUMA-aware memory allocation, even in a single box
    - Giant SMP
      - Reliability drops as you add nodes

  5. Memory – Fast vs Slow

    - DRAM hasn’t been fast for 30 years!
    - THE WALL!
    - Huuuge caches!
    - Processors are designed around the performance of local DRAM
    - Anything slower (NVDIMM? Remote?) wastes a huge amount of silicon and power
    - Need massively multi-threaded HW for slow memory
    - But multi-threading SW sucks

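One way to see why slow memory demands massively multi-threaded hardware is Little's law (not named on the slide): the number of requests in flight must equal the issue rate times the latency. The figures below are hypothetical round numbers chosen for easy arithmetic, not measurements.

```latex
\begin{aligned}
N &= \lambda \, L && \text{(requests in flight = issue rate} \times \text{latency)} \\
N_{\text{DRAM}} &= 1\ \tfrac{\text{load}}{\text{ns}} \times 100\ \text{ns} = 100 \\
N_{\text{remote}} &= 1\ \tfrac{\text{load}}{\text{ns}} \times 1000\ \text{ns} = 1000
\end{aligned}
```

If each hardware thread can keep only a handful of loads outstanding, a tenfold increase in latency needs roughly ten times as many threads to sustain the same load rate.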
  6. FAIL #1 – Persistent Memory

    - NV (non-volatile) tech is slower than DRAM: nobody wants slower memory
    - Everybody wants faster storage!
    - The memory model is just wrong for storage
      - We already have mmap
      - Not really that useful
    - Persistence is not adequate for storage: you need replication/redundancy

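What storage-as-memory looks like in practice can be sketched in Python with the standard `mmap` and `tempfile` modules (the temporary file is an assumed stand-in for persistent media). The write into the mapping is a bare STORE with nothing to check, durability hinges on a separate flush call, and nothing here provides the replication the slide says storage needs.

```python
import mmap
import tempfile


def store_via_mmap(payload: bytes, size: int = 4096) -> bytes:
    """Write `payload` into a file through a memory mapping, then read it back with ordinary I/O."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size)                    # a zero-length file cannot be mapped, so extend it first
        with mmap.mmap(f.fileno(), size) as mm:
            mm[:len(payload)] = payload     # a STORE: slice assignment returns nothing to check
            mm.flush()                      # separate sync call that can raise OSError; the STORE has no such error path
        f.seek(0)
        return f.read(len(payload))         # READ back through the file interface


print(store_via_mmap(b"hello"))
```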
  7. FAIL #2 – Gen-Z, external PCIe, “The Machine”

    - Shared memory with coherency is a PIA
    - Shared memory without coherency is a huge PIA
    - Failure semantics are MIA for LOAD/STORE networks

  8. Receive – Network Semantics

    - Synchronous LOAD vs asynchronous RECEIVE
    - Defined error model: expect the worst
    - Event loops/actor model instead of heavyweight threads

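A minimal sketch of these semantics using Python's `asyncio` (an assumed choice; the slide names no framework). A RECEIVE is awaited on an event loop rather than blocking a thread, and it either yields a message or fails with a defined timeout error, something a LOAD cannot do. Both tasks run on one event loop in a single thread.

```python
import asyncio


async def receive(inbox: asyncio.Queue, timeout: float):
    """Asynchronous RECEIVE: returns a message, or raises a timeout error if none arrives in time."""
    return await asyncio.wait_for(inbox.get(), timeout)


async def demo() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    results = []

    async def sender() -> None:
        await asyncio.sleep(0.01)
        await inbox.put("reply")                        # SEND from another task on the same loop

    task = asyncio.create_task(sender())
    results.append(await receive(inbox, timeout=1.0))   # the message arrives in time

    try:
        await receive(inbox, timeout=0.05)              # nobody sends this time
    except asyncio.TimeoutError:
        results.append("timed out")                     # the defined error path: expect the worst

    await task
    return results


print(asyncio.run(demo()))  # prints ['reply', 'timed out']
```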
  9. Summary

    - Communicating through memory is a bad idea
    - Don’t drag the memory model out of the box; drag the network model into the box
    - Every processor should be a network processor

  10. DriveScale Inc.

    1230 Midas Way, Suite 210
    Sunnyvale CA 94085
    www.drivescale.com

    Thanks!