Automatically Scalable Computation (RICON East 2013)

Opening Keynote Presentation delivered by Dr. Margo Seltzer at RICON East 2013

As our computational infrastructure races gracefully forward into increasingly parallel multi-core and blade-based systems, our ability to easily produce software that can successfully exploit such systems continues to stumble. For years, we've fantasized about the world in which we'd write simple, sequential programs, add magic sauce, and suddenly have scalable, parallel executions. We're not there. We're not even close. I'll present trajectory-based execution, a radical, potentially crazy, approach for achieving automatic scalability. To date, we've achieved surprisingly good speedup in limited domains, but the potential is tantalizingly enormous.

About Dr. Seltzer

Margo I. Seltzer is a Herchel Smith Professor of Computer Science in the Harvard School of Engineering and Applied Sciences. Her research interests include provenance, file systems, databases, transaction processing systems, and applying technology to problems in healthcare. She is the author of several widely used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an Architect at Oracle Corporation. She is currently the President of the USENIX Association and a member of the Computing Research Association's Computing Community Consortium. She is a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship. She is recognized as an outstanding teacher and mentor, having received the Phi Beta Kappa teaching award in 1996, the Abramson Teaching Award in 1999, and the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010.

Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph.D. in Computer Science from the University of California, Berkeley, in 1992.

Basho Technologies

May 13, 2013

Transcript

  1. Automatically Scalable Computation

    Jonathan Appavoo, Boston University
    Amos Waterland, Elaine Angelino, Margo Seltzer, Harvard School of Engineering and Computer Science
    May 13, 2013
  2. Imagine (RICON May 2013)

    Hmmm, here’s my nice sequential program. Sure wish I could run it on a million cores. Whoaaaaaaa….
  3. Join me in a thought experiment…

    Inst Ptr | Registers | Memory
  4. Execution in a Really Big State Space

    Inst Ptr | Registers | Memory
    Really big state space; initialization
    State: Inst Ptr; Program: ldi 0, r0; addi 1, r0; Data: 0
    State: Inst Ptr; Program: ldi 0, r0; addi 1, r0; Data: 1
    State: Inst Ptr; Program: ldi 0, r0; addi 1, r0; Data:
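The thought experiment on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `struct state`, `step`, and `run` names are hypothetical, the machine has a single register, and memory is left out entirely, whereas the state in the talk covers the instruction pointer, all registers, and all memory. The program is the slide's two instructions followed by an added halt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine: the whole state is an instruction pointer
 * plus one register. The talk's state also includes all of memory. */
enum opcode { OP_LDI, OP_ADDI, OP_HALT };

struct insn {
    enum opcode op;
    int32_t imm;            /* immediate operand */
};

struct state {
    size_t ip;              /* index of the next instruction */
    int32_t r0;             /* the only register in this sketch */
};

/* The slide's program: ldi 0, r0 ; addi 1, r0 ; then an added halt. */
static const struct insn program[] = {
    { OP_LDI, 0 },
    { OP_ADDI, 1 },
    { OP_HALT, 0 },
};

/* One deterministic transition: the next state depends only on the
 * current state, which is what makes a trajectory well defined. */
struct state step(struct state s)
{
    assert(s.ip < sizeof program / sizeof program[0]);
    const struct insn in = program[s.ip];

    switch (in.op) {
    case OP_LDI:
        s.r0 = in.imm;
        s.ip++;
        break;
    case OP_ADDI:
        s.r0 += in.imm;
        s.ip++;
        break;
    case OP_HALT:
        break;              /* halted states map to themselves */
    }
    return s;
}

/* Follow the trajectory from s for at most n steps. */
struct state run(struct state s, size_t n)
{
    while (n-- > 0 && program[s.ip].op != OP_HALT)
        s = step(s);
    return s;
}
```

Calling `run` on any starting state walks one trajectory through the state space; because `step` is deterministic, two executions that ever reach the same state follow the same path from there on.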
  5. Trajectory-Based Execution

  6. Parallel Trajectory-Based Execution

    Run in 1/N of the time!
  7. But Wait …

    Infinite speedup!
  8. Something More Realistic

    You are here
  9. Something More Realistic

  10. Something More Realistic

    Anyone been here? Anyone been here? Anyone been here? Yup! Anyone been here? Anyone been here? Yup!
  11. You Must be Kidding!?

    You can’t possibly guess places in the trajectory!?
    How does the master know when to check?
    How do you compare states?
    Where do you store trajectories?
    Are you on drugs????
  12. I’m not Kidding

    [Figure: LASC speedup for Ising on Blue Gene/P. Axes: number of cores (log2 scale, 2 to 4096) vs. speedup (log2 scale). Series: ideal speedup, LASC cycle count speedup, LASC speedup.]
  13. How Can This Possibly Be?

    [Figure panels: Pseudo-random Program | Trivial Program | Real-world Program]
  14. Programs are like People

    for (i = 0; i < gazillion; i++) {
        <do a few things>
        <do a few more things>
        <do something else>
    }
    <do something>
    foo(a, b, c);
    <do something else>
    foo(a, b, c);
    <do another thing>
    foo(a, b, c);
  15. An Automatically Scalable Architecture

    [Architecture diagram. Labels: trajectory-based execution engine, state vectors, recognizer, recognized IPs, predictors, predicted states, allocator, selected states, states from which to speculate, trajectory cache, hit, ???.]
  16. Potential Speedup of ASC

     
  17. Implementing ASC using Learning (LASC)

    [Architecture diagram from slide 15 with its LASC counterparts overlaid. Added labels: VMs, state vectors, predictors, IP = ξ, allocator, predicted states, speculators, trajectory cache.]
  18. The LASC VM

    [Diagram: six VMs. One VM's state (Inst Ptr; program: ldi 0, r0; addi 1, r0) is shown as bit vectors.]
    0 1 1 1 0 0 1 1 1 0 1 . . . 1
    1 0 1 1 1 0 0 0 0 0 0 1 . . . 1
  19. The LASC Recognizer

    Here’s an IP? I wonder if I can predict it.
    It’s been awhile; I’ll try another IP.
    Hey – I just saw the same IP! I can build a model. Yippee!!!
  20. The LASC Recognizer

    Here is a state with IP 0x1008
    Here is a state with IP 0x1012
    I think you’ll see this state
    Hey – that prediction was right – I like that IP!
  21. The LASC Predictors

    Weatherman:
    • Bit-level predictions
    • Each bit is the same as the last observation
    Mean:
    • Bit-level predictions
    • Predict the mean value observed
    Logistic Regression:
    • Bit-level predictions
    • 1-layer neural net
    Linear Regression:
    • 32-bit feature predictions
    • Fits a curve to the sequence of observed values.
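The two simplest predictors on this slide can be sketched for a single bit as follows. This is a minimal sketch, not the LASC code: the function names and the `history` array (the values one bit took in earlier observations) are hypothetical, and rounding the mean to 0 or 1 with ties going to 1 is an assumption, since the slide only says "predict the mean value observed".

```c
#include <assert.h>
#include <stddef.h>

/* Weatherman: predict that the bit repeats its most recent value. */
int weatherman_predict(const int *history, size_t n)
{
    assert(n > 0);
    return history[n - 1] != 0;
}

/* Mean: predict the observed mean of the bit, rounded to 0 or 1.
 * Ties go to 1 (an assumption of this sketch). */
int mean_predict(const int *history, size_t n)
{
    size_t ones = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        ones += (history[i] != 0);
    return 2 * ones >= n;
}
```

For the history 1, 1, 1, 0 the two disagree: the weatherman says 0 because that was the last value seen, while the mean says 1 because most observations were 1.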
  22. The LASC Allocator

    State bits: 0 1 1 1 0 0 1 1 1 0 1 . . . 1
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Here were their predictions for this bit: 1 1 0 1
    Mr. logistic there gets his weight cut in half (values shown: 1 1 1 1 0.5)
  23. The LASC Allocator

    State bits: 0 1 1 1 0 0 1 1 1 0 1 . . . 1
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Next predictions please! (predictions shown: 1 1 0 0; weights shown: 1 1 1 0.5)
    OK – I predict 1!
    Observed bits: 0 1 1 0 0 0 1 1 1 0 1 . . . 1
    Bummer! Updating weights… (new weights shown: 0.5 0.5)
  24. The LASC Allocator

    State bits: 0 1 1 1 0 0 1 1 1 0 1 . . . 1
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Next predictions please! (predictions shown: 0 1 0 0; weights shown: 0.5 0.5 1 0.5)
    Bit vectors shown: 0 1 1 0 0 0 1 1 1 0 1 . . . 1 and 0 1 1 0 0 0 1 1 1 0 1 . . . 1
    I predict 0!
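The three allocator slides read like a weighted vote over the predictors, with the weight of every wrong predictor cut in half once the real bit is known; that resembles the classic weighted majority scheme. A minimal sketch for one bit, assuming exactly that behavior: the function names, the tie-breaking rule, and the position of each predictor in the arrays are assumptions, not taken from the LASC implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NPRED 4   /* weatherman, mean, logistic, linear (order is an assumption) */

/* Weighted vote: return the bit value backed by more total weight. */
int allocator_predict(const double w[NPRED], const int pred[NPRED])
{
    double for_one = 0.0, for_zero = 0.0;

    for (size_t i = 0; i < NPRED; i++) {
        if (pred[i])
            for_one += w[i];
        else
            for_zero += w[i];
    }
    return for_one >= for_zero;   /* ties go to 1 in this sketch */
}

/* Once the real bit is known, halve the weight of every predictor
 * that got it wrong. */
void allocator_update(double w[NPRED], const int pred[NPRED], int actual)
{
    for (size_t i = 0; i < NPRED; i++) {
        if ((pred[i] != 0) != (actual != 0))
            w[i] *= 0.5;
    }
}
```

Replaying the slides with this sketch: starting from weights 1, 1, 1, 1 and predictions 1, 1, 0, 1 with actual bit 1, only the logistic regressor is halved. With the next predictions 1, 1, 0, 0 the vote is 2 against 1.5, so the allocator predicts 1; the bit turns out to be 0, so the weatherman and the mean are halved. With predictions 0, 1, 0, 0 the vote is then 2 against 0.5 in favor of 0.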
  25. The LASC Speculators

    S0: 101011001000100001000100010
    IP: Add 8 into A
    Address A: 8
    S1: 101100001000100001000101010
    Cache Entry (each field is also shown as all zeros, 000000000000000000000000000):
      R-mask:  111111110000111100011111111
      W-mask:  111111110000000000011111111
      read:    101011000000100000000100010
      written: 101011000000000000000101010
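One way to read this slide is that a speculator runs ahead from a predicted state while recording which parts of the state it read and which it wrote, and those records become the R-mask and W-mask of a cache entry. The sketch below illustrates that reading only. Everything in it is hypothetical: the four-cell state, the choice of cell 0 as the IP and cell 1 as A, byte-sized cells instead of bits, and all function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCELLS 4   /* hypothetical tiny state: cell 0 is the IP, cell 1 is A */

struct speculator {
    uint8_t cell[NCELLS];     /* current state */
    uint8_t start[NCELLS];    /* state the speculation began from */
    uint8_t r_mask[NCELLS];   /* 1 if the cell was read before being written */
    uint8_t w_mask[NCELLS];   /* 1 if the cell was written */
};

void spec_init(struct speculator *sp, const uint8_t s0[NCELLS])
{
    memcpy(sp->cell, s0, NCELLS);
    memcpy(sp->start, s0, NCELLS);
    memset(sp->r_mask, 0, NCELLS);
    memset(sp->w_mask, 0, NCELLS);
}

uint8_t spec_load(struct speculator *sp, size_t i)
{
    assert(i < NCELLS);
    if (!sp->w_mask[i])
        sp->r_mask[i] = 1;    /* the result depends on the starting value */
    return sp->cell[i];
}

void spec_store(struct speculator *sp, size_t i, uint8_t v)
{
    assert(i < NCELLS);
    sp->w_mask[i] = 1;
    sp->cell[i] = v;
}

/* The slide's instruction: add 8 into A, then advance the IP. */
void spec_add8(struct speculator *sp)
{
    spec_store(sp, 1, (uint8_t)(spec_load(sp, 1) + 8));
    spec_store(sp, 0, (uint8_t)(spec_load(sp, 0) + 1));
}
```

After `spec_add8`, only the IP and A cells are marked in both masks. A cache entry built from this run would keep the starting values of the read cells and the final values of the written cells, and would apply to any state that agrees on the read cells, whatever the other cells hold.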
  26. The LASC Trajectory Cache

    Sn: 111000000111100010100100010
    Trajectory cache:
      Entry1
        R:       111111110000111100011111111
        W:       111111110000000000011111111
        read:    101011000000100000000100010
        written: 101011000000100000000100010
      Entry2
        R:       111111110011000000000000000
        W:       111111110000000011000000000
        read:    111000000011000000000000000
        written: 111001000000000010000000000
  27. The LASC Trajectory Cache

    Sn: 111000000111100001100100010
    Sn+1 (fragments as shown on the slide): 11100100, 10, 01111000, 100100010
    Trajectory cache:
      Entry1
        R:       111111110000111100011111111
        W:       111111110000000000011111111
        read:    101011000000100000000100010
        written: 101011000000000000000100010
      Entry2
        R:       111111110011000000000000000
        W:       111111110000000011000000000
        read:    111000000011000000000000000
        written: 111001000000000010000000000
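The bit patterns on this slide are consistent with a simple lookup rule: an entry matches when the current state agrees with `read` on every bit set in `R`, and a hit fast-forwards the state by overwriting the bits set in `W` with `written`. The sketch below assumes that rule; the struct layout and function names are hypothetical, and the 27-bit vectors are held in a `uint32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cached trajectory segment, using the slide's field names. */
struct cache_entry {
    uint32_t r_mask;    /* bits the segment read */
    uint32_t read;      /* values those bits had at the start */
    uint32_t w_mask;    /* bits the segment wrote */
    uint32_t written;   /* values those bits have at the end */
};

/* Parse a string of '0'/'1' characters, most significant bit first. */
uint32_t bits(const char *s)
{
    uint32_t v = 0;

    for (; *s != '\0'; s++) {
        assert(*s == '0' || *s == '1');
        v = (v << 1) | (uint32_t)(*s - '0');
    }
    return v;
}

/* An entry applies when the state agrees with it on every bit read. */
int entry_matches(const struct cache_entry *e, uint32_t state)
{
    return (state & e->r_mask) == e->read;
}

/* Fast-forward: keep untouched bits, overwrite the written ones. */
uint32_t entry_apply(const struct cache_entry *e, uint32_t state)
{
    return (state & ~e->w_mask) | (e->written & e->w_mask);
}

/* Return the first matching entry, or NULL on a cache miss. */
const struct cache_entry *
cache_lookup(const struct cache_entry *cache, size_t n, uint32_t state)
{
    for (size_t i = 0; i < n; i++) {
        if (entry_matches(&cache[i], state))
            return &cache[i];
    }
    return NULL;
}
```

With the slide's values, Sn = 111000000111100001100100010 does not match Entry1 (its first eight bits differ from Entry1's `read`) but does match Entry2. Applying Entry2 yields 111001000111100010100100010, which is the slide's four fragments in position order: 11100100 (written), 01111000 (kept), 10 (written), 100100010 (kept).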
  28. The LASC Implementation

    [Architecture diagram. Labels: state vectors, predictors, IP = ξ, allocator, predicted states, VMs, speculators, trajectory cache, ???, tunnel.]
  29. How Well Does it Work?

    [Figure: LASC speedup for Ising on Blue Gene/P. Axes: number of cores (log2 scale, 2 to 4096) vs. speedup (log2 scale). Series: ideal speedup, LASC cycle count speedup, LASC speedup.]

    struct node *node = head;
    while (node) {
        energy = potential(node);

        if (energy < GROUND + e)
            break;

        node = node->next;
    }

    int
    potential(struct node *node) {
        int i, j, spin, energy;
        energy = 0;

        for (i = 0; i < I; i++) {
            for (j = 0; j < J; j++) {
                spin = node->spins[i][j];
                /* Calculate energy. */
            }
        }

        return (energy);
    }
  30. Even Cooler Results

    main(argc, argv)
    {
        int i, s;

        for (i = 2; i < 100000000; i++) {
            s = i;
            while (s > 1) {
                if (s % 2 == 0) {
                    s = s / 2;
                } else {
                    s = 3 * s + 1;
                }
            }
        }
    }

    [Figure: LASC speedup for collatz on Blue Gene/P. Axis: number of cores (log2 scale, 4 to 16384). Series: ideal speedup, LASC cycle count speedup, LASC speedup.]
    [Figure: LASC speedup for collatz on 1-core laptop. Axes: instructions (2e+07 to 1e+08) vs. speedup (0.8 to 1.4). Series: baseline, LASC speedup.]
  31. We Get the Other Speedup Too!

    [Figure, cropped on the slide: "…latz on 32-core server".]
    [Figure: LASC speedup for collatz on Blue Gene/P. Axes: number of cores (log2 scale, 1 to 16384) vs. speedup (log2 scale, 1 to 16384). Series: ideal speedup, LASC cycle count speedup, LASC speedup.]
    [Figure, cropped on the slide: speedup axis from 0.8 to 1.4. Series: LASC speedup.]
  32. You Must be Kidding!?

    You can’t possibly guess places in the trajectory!?
    How does the master know when to check?
    How do you compare states?
    Where do you store trajectories?
    Are you on drugs????
  33. Parting Thoughts

    These are just toy problems!
    Your VM is still way slow.
    You need really good prediction and you’ll never get that for meaningful programs
    Couldn’t you just do this all in the compiler and save yourself a lot of work?
    It seems that the trajectory cache is always going to be an enormous bottleneck.
  34. Thank You!

    margo@eecs.harvard.edu
    [Architecture diagram. Labels: state vectors, predictors, IP = ξ, allocator, predicted states, VMs, speculators, trajectory cache, ???, tunnel.]