Slide 1

Slide 1 text

Automatically Scalable Computation
Jonathan Appavoo, Boston University
Amos Waterland, Elaine Angelino, Margo Seltzer
Harvard School of Engineering and Computer Science
May 13, 2013

Slide 2

Slide 2 text

Imagine
RICON May 2013
"Hmmm, here’s my nice sequential program. Sure wish I could run it on a million cores."
"Whoaaaaaaa…."

Slide 3

Slide 3 text

Join me in a thought experiment…
[Diagram labels: Inst Ptr, Registers, Memory]

Slide 4

Slide 4 text

Execution in a Really Big State Space
[Diagram: a state vector made of Inst Ptr, Registers, and Memory, shown as a point in a really big state space. Starting from initialization, three snapshots each show the Inst Ptr, the Program (ldi 0, r0; addi 1, r0), and the Data; the Data values shown are 0 and 1.]
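The picture above treats one machine state as a single point and execution as a walk from point to point. A minimal C sketch of that idea follows. It assumes that `ldi 0, r0` loads the immediate 0 into r0 and that `addi 1, r0` adds 1 to r0; the slide does not define the instruction set, so the opcodes, the `struct state` layout, and the `step` function are all hypothetical. For brevity the program lives in a constant table rather than inside the state's memory, which differs from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-opcode machine modelled on the slide's example program. */
enum opcode { OP_LDI, OP_ADDI };

struct insn {
    enum opcode op;
    int32_t imm;        /* immediate operand; the target is always r0 here */
};

/* One point in the state space: everything the next step depends on. */
struct state {
    uint32_t ip;        /* index of the next instruction */
    int32_t r0;         /* the only register in this sketch */
};

#define PROGRAM_LEN 2
static const struct insn program[PROGRAM_LEN] = {
    { OP_LDI, 0 },      /* ldi 0, r0  (assumed meaning: r0 = 0)  */
    { OP_ADDI, 1 },     /* addi 1, r0 (assumed meaning: r0 += 1) */
};

/* Deterministic transition: the same input state always yields the same
 * successor, which is what makes a trajectory well defined. */
struct state step(struct state s)
{
    assert(s.ip < PROGRAM_LEN);
    const struct insn in = program[s.ip];
    switch (in.op) {
    case OP_LDI:  s.r0 = in.imm;  break;
    case OP_ADDI: s.r0 += in.imm; break;
    }
    s.ip++;
    return s;
}
```

Calling `step` twice from `ip = 0` visits three states, matching the three snapshots in the diagram.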

Slide 5

Slide 5 text

Trajectory-Based Execution

Slide 6

Slide 6 text

Parallel Trajectory-Based Execution
Run in 1/N of the time!

Slide 7

Slide 7 text

But Wait …
Infinite speedup!

Slide 8

Slide 8 text

Something More Realistic
You are here

Slide 9

Slide 9 text

Something More Realistic

Slide 10

Slide 10 text

Something More Realistic
"Anyone been here?"
"Anyone been here?"
"Anyone been here?" "Yup!"
"Anyone been here?"
"Anyone been here?" "Yup!"

Slide 11

Slide 11 text

You Must be Kidding!?
• You can’t possibly guess places in the trajectory!?
• How does the master know when to check?
• How do you compare states?
• Where do you store trajectories?
• Are you on drugs????

Slide 12

Slide 12 text

I’m not Kidding
[Plot: LASC speedup for Ising on Blue Gene/P. x-axis: number of cores, log2 scale, 2 to 4096. y-axis: speedup, log2 scale, 2 to 4096. Series: ideal speedup, LASC cycle count speedup, LASC speedup.]

Slide 13

Slide 13 text

How Can This Possibly Be?
[Figure panels: Pseudo-random Program, Trivial Program, Real-world Program]

Slide 14

Slide 14 text

Programs are like People

    for (i = 0; i < gazillion; i++) {

    }

    foo(a, b, c);

    foo(a, b, c);

    foo(a, b, c);

Slide 15

Slide 15 text

An Automatically Scalable Architecture
[Architecture diagram. Components: trajectory-based execution engine, recognizer, predictors, allocator, trajectory cache. Labeled flows: state vectors, recognized IPs, predicted states, selected states (states from which to speculate), hit, ???.]

Slide 16

Slide 16 text

Potential Speedup of ASC

Slide 17

Slide 17 text

Implementing ASC using Learning (LASC)
[Diagram, ASC architecture: trajectory-based execution engine, recognizer, predictors, allocator, trajectory cache; flows labeled state vectors, recognized IPs, predicted states, selected states (states from which to speculate), hit, ???.]
[Diagram, LASC labels: VM, state vectors, IP = ξ, predictors, predicted states, allocator, speculators (VMs), trajectory cache.]

Slide 18

Slide 18 text

The LASC VM
[Diagram: VMs; an Inst Ptr into the program (ldi 0, r0; addi 1, r0); two state bit vectors.]
0 1 1 1 0 0 1 1 1 0 1 ... 1
0 1 1 1 0 0 0 0 0 0 1 ... 1

Slide 19

Slide 19 text

The LASC Recognizer
[Diagram: VMs]
"Here’s an IP? I wonder if I can predict it."
"It’s been awhile; I’ll try another IP."
"Hey – I just saw the same IP! I can build a model. Yippee!!!"
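The three speech bubbles above read like a small state machine: pick an IP, give up if it does not come back soon, and celebrate when it recurs. The C sketch below is one way to write that down. Everything in it is an assumption made for illustration, including the `struct recognizer` fields, the `patience` threshold, and the rule of adopting the most recent IP as the new candidate; the slides do not say how the real recognizer chooses or abandons IPs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical recognizer state; field names are not from the slides. */
struct recognizer {
    uint32_t candidate;      /* IP currently being watched */
    unsigned waited;         /* states seen since the candidate last appeared */
    unsigned patience;       /* give up after this many states without a repeat */
    bool have_candidate;
};

/* Feed one observed IP. Returns true when the watched IP shows up again,
 * i.e. the moment a model for that IP could start being trained. */
bool recognizer_observe(struct recognizer *r, uint32_t ip)
{
    assert(r != NULL && r->patience > 0);
    if (!r->have_candidate) {            /* "Here's an IP?" */
        r->candidate = ip;
        r->waited = 0;
        r->have_candidate = true;
        return false;
    }
    if (ip == r->candidate) {            /* "I just saw the same IP!" */
        r->waited = 0;
        return true;
    }
    if (++r->waited >= r->patience) {    /* "I'll try another IP." */
        r->candidate = ip;
        r->waited = 0;
    }
    return false;
}
```

A loop head makes a good candidate under this rule because its IP recurs once per iteration.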

Slide 20

Slide 20 text

The LASC Recognizer
[Diagram: VMs and predictors]
"Here is a state with IP 0x1008"
"Here is a state with IP 0x1012"
"I think you’ll see this state"
"Hey – that prediction was right – I like that IP!"

Slide 21

Slide 21 text

The LASC Predictors
Weatherman:
• Bit-level predictions
• Each bit is the same as the last observation
Mean:
• Bit-level predictions
• Predict the mean value observed
Logistic Regression:
• Bit-level predictions
• 1-layer neural net
Linear Regression:
• 32-bit feature predictions
• Fits a curve to the sequence of observed values.
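Three of the four predictors listed above are simple enough to sketch in a few lines of C. This is an illustration under stated assumptions, not the LASC code: the function names are hypothetical, the mean predictor rounds at 0.5 (the slide only says "predict the mean value observed"), and the linear predictor is an ordinary least-squares line over the observation index, which is one possible reading of "fits a curve".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Weatherman: the next value of a bit equals its last observed value. */
bool weatherman_predict(const bool *history, size_t n)
{
    assert(n > 0);
    return history[n - 1];
}

/* Mean: fraction of observations in which the bit was 1. */
double mean_estimate(const bool *history, size_t n)
{
    assert(n > 0);
    size_t ones = 0;
    for (size_t i = 0; i < n; i++)
        ones += history[i] ? 1 : 0;
    return (double)ones / (double)n;
}

/* Assumed rounding rule: predict 1 when at least half the observations were 1. */
bool mean_predict(const bool *history, size_t n)
{
    return mean_estimate(history, n) >= 0.5;
}

/* Linear regression on a 32-bit feature: least-squares line through
 * (0, v[0]) .. (n-1, v[n-1]), evaluated at x = n. */
double linear_predict(const uint32_t *v, size_t n)
{
    assert(n >= 2);
    double xm = (double)(n - 1) / 2.0, ym = 0.0;
    for (size_t i = 0; i < n; i++)
        ym += (double)v[i];
    ym /= (double)n;

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = (double)i - xm;
        num += dx * ((double)v[i] - ym);
        den += dx * dx;
    }
    double slope = num / den;
    return ym + slope * ((double)n - xm);
}
```

A loop counter observed as 2, 3, 4 is the kind of 32-bit feature where a straight-line fit extrapolates well, while a bit that rarely changes is where the weatherman does well.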

Slide 22

Slide 22 text

The LASC Allocator
[Diagram: VMs and a state bit vector 0 1 1 1 0 0 1 1 1 0 1 ... 1]
Predictors: weatherman, mean, logistic regressor, linear regressor
"Here were their predictions for this bit": 1, 1, 0, 1
"Weights for these guys are: 1, 1, 1, 1"
"Mr. logistic there gets his weight cut in half"
Numbers shown on the slide: 1, 1, 1, 1, 0.5

Slide 23

Slide 23 text

The LASC Allocator
[Diagram: VMs and state bit vectors 0 1 1 1 0 0 1 1 1 0 1 ... 1 and 0 1 1 0 0 0 1 1 1 0 1 ... 1]
Predictors: weatherman, mean, logistic regressor, linear regressor
"Next predictions please!": 1, 1, 0, 0
"Weights for these guys are: 1, 1, 1, 1" (numbers shown: 1, 1, 1, 0.5)
"OK – I predict 1!"
"Bummer! Updating weights…" (numbers shown: 0.5, 0.5)

Slide 24

Slide 24 text

The LASC Allocator
[Diagram: VMs and state bit vectors 0 1 1 1 0 0 1 1 1 0 1 ... 1, 0 1 1 0 0 0 1 1 1 0 1 ... 1, and 0 1 1 0 0 0 1 1 1 0 1 ... 1]
Predictors: weatherman, mean, logistic regressor, linear regressor
"Next predictions please!": 0, 1, 0, 0
"Weights for these guys are: 1, 1, 1, 1" (numbers shown: 0.5, 0.5, 1, 0.5)
"I predict 0!"
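The three allocator slides walk through a weighted vote in which every predictor that guesses a bit wrong has its weight cut in half. The slides do not name the algorithm; the C sketch below is written in the style of a weighted-majority vote and should be read as an interpretation. The function names, the tie-breaking rule, and the assumption that the four predictors appear in the order weatherman, mean, logistic, linear are all mine.

```c
#include <assert.h>
#include <stddef.h>

#define N_PREDICTORS 4   /* weatherman, mean, logistic, linear (assumed order) */

/* Combine per-predictor guesses for one bit into a single guess by
 * summing the weights behind each answer. */
int allocator_vote(const double w[N_PREDICTORS], const int pred[N_PREDICTORS])
{
    double for_one = 0.0, for_zero = 0.0;
    for (size_t i = 0; i < N_PREDICTORS; i++) {
        assert(pred[i] == 0 || pred[i] == 1);
        if (pred[i])
            for_one += w[i];
        else
            for_zero += w[i];
    }
    return for_one > for_zero;   /* ties go to 0: an arbitrary choice */
}

/* Once the true bit is known, halve the weight of every predictor
 * that guessed wrong; correct predictors keep their weight. */
void allocator_update(double w[N_PREDICTORS], const int pred[N_PREDICTORS],
                      int actual)
{
    for (size_t i = 0; i < N_PREDICTORS; i++)
        if (pred[i] != actual)
            w[i] *= 0.5;
}
```

Replaying the slides under this reading: with weights 1, 1, 1, 1 and guesses 1, 1, 0, 1 against a true bit of 1, only the logistic weight drops to 0.5. Guesses 1, 1, 0, 0 then produce a vote of 2 against 1.5, so the allocator predicts 1; the true bit is 0, so the first two weights drop to 0.5. Guesses 0, 1, 0, 0 then produce 2 against 0.5 in favor of 0.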

Slide 25

Slide 25 text

The LASC Speculators
[Diagram: VMs executing one instruction, "Add 8 into A" (fields: IP, Address A, 8), that takes state S0 to state S1 and produces a Cache Entry. All-zero vectors (000000000000000000000000000) also appear next to the masks and the read/written values.]
S0:      101011001000100001000100010
S1:      101100001000100001000101010
R-mask:  111111110000111100011111111
W-mask:  111111110000000000011111111
read:    101011000000100000000100010
written: 101011000000000000000101010
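One way to read the diagram above is that a speculator records which bits of the state an execution read and which it wrote, then stores the values of just those bits. The C sketch below follows that reading with the rule read = S0 AND R-mask, written = S1 AND W-mask. The rule, the `struct cache_entry` layout, and the helper names are assumptions. With the slide's vectors the rule reproduces the `read` value exactly, but the slide's `written` value differs from S1 AND W-mask in its leading bits, so treat this as an interpretation of the picture, not a confirmed definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Parse a string of '0'/'1' characters, most significant bit first,
 * so the slide's bit vectors can be written as they appear. */
uint32_t bits(const char *s)
{
    assert(strlen(s) <= 32);
    uint32_t v = 0;
    for (; *s; s++) {
        assert(*s == '0' || *s == '1');
        v = (v << 1) | (uint32_t)(*s - '0');
    }
    return v;
}

/* Hypothetical cache entry: two masks plus the masked values. */
struct cache_entry {
    uint32_t r_mask;    /* bits the execution depended on */
    uint32_t w_mask;    /* bits the execution changed */
    uint32_t read;      /* start-state values of the depended-on bits */
    uint32_t written;   /* end-state values of the changed bits */
};

/* Build an entry from a start state, an end state, and the two masks. */
struct cache_entry make_entry(uint32_t s0, uint32_t s1,
                              uint32_t r_mask, uint32_t w_mask)
{
    struct cache_entry e = { r_mask, w_mask, s0 & r_mask, s1 & w_mask };
    return e;
}
```

Storing only masked values is what would let one entry apply to many different states: any state that agrees on the read bits can reuse the result.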

Slide 26

Slide 26 text

The LASC Trajectory Cache
[Diagram: VMs and the trajectory cache]
Sn:        111000000111100010100100010
Entry1
  R:       111111110000111100011111111
  W:       111111110000000000011111111
  read:    101011000000100000000100010
  written: 101011000000100000000100010
Entry2
  R:       111111110011000000000000000
  W:       111111110000000011000000000
  read:    111000000011000000000000000
  written: 111001000000000010000000000

Slide 27

Slide 27 text

The LASC Trajectory Cache
[Diagram: VMs and the trajectory cache]
Sn:        111000000111100001100100010
Sn+1 (shown in pieces): 11100100   10   01111000   100100010
Entry1
  R:       111111110000111100011111111
  W:       111111110000000000011111111
  read:    101011000000100000000100010
  written: 101011000000000000000100010
Entry2
  R:       111111110011000000000000000
  W:       111111110000000011000000000
  read:    111000000011000000000000000
  written: 111001000000000010000000000
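The two trajectory cache slides suggest a lookup rule: an entry hits when the current state agrees with the entry's `read` value on every bit selected by R, and on a hit the successor state keeps the untouched bits and takes the `written` bits wherever W is set. The C sketch below implements that rule. It is a reading of the diagram, with hypothetical names (`bits`, `entry_hits`, `entry_apply`, `cache_lookup`) and a linear scan chosen only for brevity. Applied to the second slide's vectors, Entry1 misses and Entry2 hits, and the result is composed of the same four pieces the slide shows for Sn+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parse a string of '0'/'1' characters, most significant bit first. */
uint32_t bits(const char *s)
{
    assert(strlen(s) <= 32);
    uint32_t v = 0;
    for (; *s; s++) {
        assert(*s == '0' || *s == '1');
        v = (v << 1) | (uint32_t)(*s - '0');
    }
    return v;
}

/* Hypothetical entry layout, named after the slide's labels. */
struct cache_entry {
    uint32_t r_mask, w_mask, read, written;
};

/* Hit test: the state matches on every bit the cached run depended on. */
int entry_hits(const struct cache_entry *e, uint32_t state)
{
    return (state & e->r_mask) == e->read;
}

/* Fast-forward: keep bits outside W, overwrite bits inside W. */
uint32_t entry_apply(const struct cache_entry *e, uint32_t state)
{
    return (state & ~e->w_mask) | e->written;
}

/* Linear scan over the cache. Returns 1 and stores the successor in
 * *next on a hit; returns 0 and leaves *next untouched on a miss. */
int cache_lookup(const struct cache_entry *cache, size_t n,
                 uint32_t state, uint32_t *next)
{
    for (size_t i = 0; i < n; i++) {
        if (entry_hits(&cache[i], state)) {
            *next = entry_apply(&cache[i], state);
            return 1;
        }
    }
    return 0;
}
```

A real cache shared by thousands of cores would need something faster than a scan, which is the bottleneck concern raised later in the deck.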

Slide 28

Slide 28 text

The LASC Implementation
[Diagram labels: VM, state vectors, IP = ξ, predictors, predicted states, allocator, speculators (VMs), trajectory cache, ???, tunnel]

Slide 29

Slide 29 text

How Well Does it Work?
[Plot: LASC speedup for Ising on Blue Gene/P. x-axis: number of cores, log2 scale, 2 to 4096. y-axis: speedup, log2 scale, 2 to 4096. Series: ideal speedup, LASC cycle count speedup, LASC speedup.]

    struct node *node = head;
    while (node) {
        energy = potential(node);

        if (energy < GROUND + e)
            break;

        node = node->next;
    }

    int
    potential(struct node *node) {
        int i, j, spin, energy;
        energy = 0;

        for (i = 0; i < I; i++) {
            for (j = 0; j < J; j++) {
                spin = node->spins[i][j];
                /* Calculate energy. */
            }
        }

        return (energy);
    }

Slide 30

Slide 30 text

Even Cooler Results

    int main(int argc, char **argv)
    {
        int i, s;

        for (i = 2; i < 100000000; i++) {
            s = i;
            while (s > 1) {
                if (s % 2 == 0) {
                    s = s / 2;
                } else {
                    s = 3 * s + 1;
                }
            }
        }
    }

[Plot: LASC speedup for collatz on Blue Gene/P. x-axis: number of cores, log2 scale, up to 16384. Series: ideal speedup, LASC cycle count speedup, LASC speedup.]
[Plot: LASC speedup for collatz on 1-core laptop. x-axis: instructions, 2e+07 to 1e+08. y-axis: speedup, 0.8 to 1.4. Series: baseline, LASC speedup.]

Slide 31

Slide 31 text

We Get the Other Speedup Too!
[Plot, partially clipped in the source: title fragment "…latz on 32-core server"; axis fragments "…of cores" and "…nt speedup"; ticks 20, 25, 30.]
[Plot: LASC speedup for collatz on Blue Gene/P. x-axis: number of cores, log2 scale, 1 to 16384. y-axis: speedup, log2 scale, 1 to 16384. Series: ideal speedup, LASC cycle count speedup, LASC speedup.]
[Plot, partially clipped in the source: y-axis: speedup, 0.8 to 1.4. Series: LASC speedup.]

Slide 32

Slide 32 text

You Must be Kidding!?
• You can’t possibly guess places in the trajectory!?
• How does the master know when to check?
• How do you compare states?
• Where do you store trajectories?
• Are you on drugs????

Slide 33

Slide 33 text

Parting Thoughts
• These are just toy problems!
• Your VM is still way slow.
• You need really good prediction and you’ll never get that for meaningful programs
• Couldn’t you just do this all in the compiler and save yourself a lot of work?
• It seems that the trajectory cache is always going to be an enormous bottleneck.

Slide 34

Slide 34 text

Thank You!
[email protected]
[Diagram labels: VM, state vectors, IP = ξ, predictors, predicted states, allocator, speculators (VMs), trajectory cache, ???, tunnel]