Automatically Scalable Computation (RICON East 2013)

Opening Keynote Presentation delivered by Dr. Margo Seltzer at RICON East 2013

As our computational infrastructure races gracefully forward into increasingly parallel multi-core and blade-based systems, our ability to easily produce software that can successfully exploit such systems continues to stumble. For years, we've fantasized about the world in which we'd write simple, sequential programs, add magic sauce, and suddenly have scalable, parallel executions. We're not there. We're not even close. I'll present trajectory-based execution, a radical, potentially crazy, approach for achieving automatic scalability. To date, we've achieved surprisingly good speedup in limited domains, but the potential is tantalizingly enormous.

About Dr. Seltzer

Margo I. Seltzer is a Herchel Smith Professor of Computer Science in the Harvard School of Engineering and Applied Sciences. Her research interests include provenance, file systems, databases, transaction processing systems, and applying technology to problems in healthcare. She is the author of several widely-used software packages including database and transaction libraries and the 4.4BSD log-structured file system. Dr. Seltzer was a founder and CTO of Sleepycat Software, the makers of Berkeley DB, and is now an Architect at Oracle Corporation. She is currently the President of the USENIX Association and a member of the Computing Research Association's Computing Community Consortium. She is a Sloan Foundation Fellow in Computer Science, an ACM Fellow, a Bunting Fellow, and was the recipient of the 1996 Radcliffe Junior Faculty Fellowship. She is recognized as an outstanding teacher and mentor, having received the Phi Beta Kappa teaching award in 1996, the Abrahmson Teaching Award in 1999, and the Capers and Marion McDonald Award for Excellence in Mentoring and Advising in 2010.

Dr. Seltzer received an A.B. degree in Applied Mathematics from Harvard/Radcliffe College in 1983 and a Ph.D. in Computer Science from the University of California, Berkeley, in 1992.

Basho Technologies

May 13, 2013

Transcript

  1. Automatically Scalable Computation
    Jonathan Appavoo, Boston University
    Amos Waterland, Elaine Angelino, Margo Seltzer
    Harvard School of Engineering and Computer Science
    May 13, 2013

  2. Imagine
    RICON May 2013
    Hmmm, here’s my nice sequential program.
    Sure wish I could run it on a million cores.
    Whoaaaaaaa…

  3. Join me in a thought experiment…
    Inst Ptr
    Registers
    Memory

  4. Execution in a Really Big State Space
    State: Inst Ptr, Registers, Memory
    Really big state space
    Program: ldi 0, r0 / addi 1, r0
    [Diagram: three successive snapshots of the state (Inst Ptr, Program, Data), starting at initialization; the Data values shown are 0 and 1.]
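
One way to read this slide: the whole machine (instruction pointer, registers, memory) is a single point in a very large state space, and executing one instruction is a deterministic step from one point to the next. The sketch below is a toy illustration of that idea, not the LASC VM. The `struct state` layout, the `step` function, and the halt-at-end-of-program behavior are assumptions made for the example; only the two instructions `ldi 0, r0` and `addi 1, r0` come from the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-instruction machine, modeled on the slide's program. */
enum opcode { OP_LDI, OP_ADDI };

struct insn {
    enum opcode op;
    int32_t imm;
};

/* The entire machine state: one point in the state space. */
struct state {
    uint32_t ip;   /* index of the next instruction */
    int32_t r0;    /* the only register in this toy machine */
};

static const struct insn program[] = {
    { OP_LDI, 0 },   /* ldi 0, r0  : r0 = 0  */
    { OP_ADDI, 1 },  /* addi 1, r0 : r0 += 1 */
};

#define PROGRAM_LEN (sizeof program / sizeof program[0])

/* One step of the trajectory: a pure function from state to state. */
struct state step(struct state s)
{
    if (s.ip >= PROGRAM_LEN)
        return s;                 /* past the end: treat as halted */

    const struct insn in = program[s.ip];
    switch (in.op) {
    case OP_LDI:
        s.r0 = in.imm;
        break;
    case OP_ADDI:
        s.r0 += in.imm;
        break;
    }
    s.ip++;
    return s;
}
```

Because `step` depends only on its argument, the sequence of states it visits from a given start is fixed. That sequence is the trajectory the later slides try to guess ahead in.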

  5. Trajectory-Based Execution

  6. Parallel Trajectory-Based Execution
    Run in 1/N of the time!

  7. But Wait …
    Infinite speedup!

  8. Something More Realistic
    You are here

  9. Something More Realistic

  10. Something More Realistic
    Anyone been here? (asked at five points along the trajectory)
    Yup! (answered at two of them)

  11. You Must be Kidding!?
    You can’t possibly guess places in the trajectory!?
    How does the master know when to check?
    How do you compare states?
    Where do you store trajectories?
    Are you on drugs????

  12. I’m not Kidding
    [Plot: LASC speedup for Ising on Blue Gene/P. x-axis: number of cores (log2 scale, 2 to 4096); y-axis: speedup (log2 scale, 2 to 4096). Series: Ideal speedup, LASC cycle count speedup, LASC speedup.]

  13. How Can This Possibly Be?
    Pseudo-random Program
    Trivial Program
    Real-world Program

  14. Programs are like People
    for (i = 0; i < gazillion; i++) {

    }

    foo(a, b, c);

    foo(a, b, c);

    foo(a, b, c);

  15. An Automatically Scalable Architecture
    ???
    hit
    Trajectory-based execution engine
    state vectors
    recognizer
    States from which to speculate
    Trajectory Cache
    selected states
    Predictors
    predicted states
    Allocator
    Recognized IPs

  16. Potential Speedup of ASC

  17. Implementing ASC using Learning (LASC)
    ???
    hit
    Trajectory-based execution engine
    state vectors
    recognizer
    States from which to speculate
    Trajectory Cache
    selected states
    Predictors
    predicted states
    Allocator
    Recognized IPs
    VM
    state vectors
    predictors
    IP = ξ
    allocator
    predicted states
    VM (×5)
    speculators
    trajectory cache

  18. The LASC VM
    [Diagram: six VMs; an Inst Ptr points into the program "ldi 0, r0 / addi 1, r0"; two state vectors are drawn as columns of bits.]

  19. The LASC Recognizer
    VM (×5)
    Here’s an IP. I wonder if I can predict it.
    It’s been a while; I’ll try another IP.
    Hey – I just saw the same IP! I can build a model. Yippee!!!

  20. The LASC Recognizer
    VM (×5)
    predictors (×3)
    Here is a state with IP 0x1008
    Here is a state with IP 0x1012
    I think you’ll see this state
    Hey – that prediction was right – I like that IP!

  21. The LASC Predictors
    VM (×5)
    Weatherman:
    • Bit-level predictions
    • Each bit is the same as the last observation
    Mean:
    • Bit-level predictions
    • Predict the mean value observed
    Logistic Regression:
    • Bit-level predictions
    • 1-layer neural net
    Linear Regression:
    • 32-bit feature predictions
    • Fits a curve to the sequence of observed values.
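
The two simplest predictors on this slide can be sketched in a few lines. This is an illustration under assumptions, not the LASC code: the function names, the `history` array of past observations of a single bit, the guess of 0 when there is no history, and the choice to round the mean to a 0/1 prediction (ties go to 0) are all made up for the example.

```c
#include <stddef.h>

/* Weatherman: the bit will be whatever it was at the last observation. */
int weatherman_predict(const int history[], size_t n)
{
    /* With no observations yet, guess 0 (an arbitrary choice). */
    return n > 0 ? (history[n - 1] != 0) : 0;
}

/*
 * Mean: predict the mean value observed so far. The mean of a bit is a
 * fraction between 0 and 1, so this sketch rounds it: predict 1 only
 * when more than half of the observations were 1.
 */
int mean_predict(const int history[], size_t n)
{
    size_t ones = 0;

    for (size_t i = 0; i < n; i++)
        ones += (history[i] != 0);

    return 2 * ones > n;
}
```

For the history 1, 1, 1, 0 the two disagree: the weatherman predicts 0 (the last value seen) while the mean predictor predicts 1. Disagreements like this are what the allocator on the following slides has to arbitrate.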

  22. The LASC Allocator
    VM (×5)
    [Diagram: a state vector drawn as a column of bits]
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Here were their predictions for this bit: 1, 1, 0, 1
    Mr. logistic there gets his weight cut in half
    Weights after the update: 1, 1, 1 and 0.5 (the logistic regressor)

  23. The LASC Allocator
    VM (×5)
    [Diagram: state vectors drawn as columns of bits]
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Weights shown next to the predictors: 1, 1, 1 and 0.5 (the logistic regressor)
    Next predictions please!
    Predictions: 1, 1, 0, 0
    OK – I predict 1!
    Bummer! Updating weights…
    New weights shown: 0.5, 0.5

  24. The LASC Allocator
    VM (×5)
    [Diagram: state vectors drawn as columns of bits]
    Predictors: weatherman, mean, logistic regressor, linear regressor
    Weights for these guys are: 1, 1, 1, 1
    Weights shown next to the predictors: 0.5, 0.5, 0.5 and 1
    Next predictions please!
    Predictions: 0, 1, 0, 0
    I predict 0!
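
The three allocator slides walk through what looks like a weighted-majority vote: each predictor votes for a bit value with its current weight, the heavier side wins, and every predictor that turns out to be wrong has its weight cut in half. The sketch below follows that reading. The function names, the array layout (weatherman, mean, logistic regressor, linear regressor, in that order), and the tie-break toward 1 are assumptions for the example, not details taken from LASC.

```c
#include <stddef.h>

/*
 * Weighted vote over one bit. pred[i] is predictor i's guess (0 or 1)
 * and w[i] is its current weight. Ties go to 1 (an arbitrary choice).
 */
int wm_predict(const double w[], const int pred[], size_t n)
{
    double vote0 = 0.0, vote1 = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (pred[i])
            vote1 += w[i];
        else
            vote0 += w[i];
    }
    return vote1 >= vote0;
}

/* Once the real bit is known, halve the weight of every wrong predictor. */
void wm_update(double w[], const int pred[], size_t n, int actual)
{
    for (size_t i = 0; i < n; i++) {
        if (pred[i] != actual)
            w[i] *= 0.5;
    }
}
```

Replaying the slides with weights starting at 1, 1, 1, 1: predictions 1, 1, 0, 1 against an actual 1 halve only the logistic regressor. Predictions 1, 1, 0, 0 then give 2 votes for 1 against 1.5 for 0, so the allocator predicts 1; when that is wrong, weatherman and mean are halved. Predictions 0, 1, 0, 0 then give 2 votes for 0 against 0.5 for 1, so the allocator predicts 0.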

  25. The LASC Speculators
    VM (×6)
    Instruction at IP: Add 8 into A
    State regions labeled: IP, Address A, 8
    S0        101011001000100001000100010
    S1        101100001000100001000101010
    Cache Entry
    R-mask    111111110000111100011111111
    W-mask    111111110000000000011111111
    read      101011000000100000000100010
    written   101011000000000000000101010
    (Each of the four cache-entry fields is also shown as 000000000000000000000000000, apparently its value before the speculator runs.)

  26. The LASC Trajectory Cache
    VM (×5)
    Sn        111000000111100010100100010
    trajectory cache
    Entry1
      R         111111110000111100011111111
      W         111111110000000000011111111
      read      101011000000100000000100010
      written   101011000000100000000100010
    Entry2
      R         111111110011000000000000000
      W         111111110000000011000000000
      read      111000000011000000000000000
      written   111001000000000010000000000

  27. The LASC Trajectory Cache
    VM (×5)
    Sn        111000000111100001100100010
    Sn+1      111001000111100010100100010 (drawn in segments: 11100100 | 01111000 | 10 | 100100010)
    trajectory cache
    Entry1
      R         111111110000111100011111111
      W         111111110000000000011111111
      read      101011000000100000000100010
      written   101011000000000000000100010
    Entry2
      R         111111110011000000000000000
      W         111111110000000011000000000
      read      111000000011000000000000000
      written   111001000000000010000000000
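
The bit patterns on this slide are consistent with a simple rule: an entry applies to the current state if the bits selected by its R mask equal its `read` value, and applying it overwrites the bits selected by its W mask with its `written` value. The sketch below encodes that reading. It is a guess at the mechanism, not the LASC implementation: the struct and function names are hypothetical, the linear scan stands in for whatever lookup structure the real cache uses, and a 32-bit word is used only because the slide's example states are 27 bits wide (real state vectors are far larger).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One cached speculative run, named after the fields on the slide. */
struct cache_entry {
    uint32_t r_mask;   /* R: which state bits the run read         */
    uint32_t w_mask;   /* W: which state bits the run wrote        */
    uint32_t read;     /* values of the read bits at the start     */
    uint32_t written;  /* values of the written bits at the end    */
};

/* Helper for writing states the way the slide does, as binary strings. */
uint32_t bits(const char *s)
{
    return (uint32_t)strtoul(s, NULL, 2);
}

/* An entry applies if the state agrees with it on every bit it read. */
int entry_matches(const struct cache_entry *e, uint32_t state)
{
    return (state & e->r_mask) == e->read;
}

/* Fast-forward: keep the untouched bits, overwrite the written ones. */
uint32_t entry_apply(const struct cache_entry *e, uint32_t state)
{
    return (state & ~e->w_mask) | e->written;
}

/* Linear scan for the first applicable entry; NULL means a cache miss. */
const struct cache_entry *cache_lookup(const struct cache_entry *cache,
                                       size_t n, uint32_t state)
{
    for (size_t i = 0; i < n; i++) {
        if (entry_matches(&cache[i], state))
            return &cache[i];
    }
    return NULL;
}
```

With the slide's values, Sn = 111000000111100001100100010 does not match Entry1 (its leading bits differ from Entry1's `read` value) but does match Entry2, and applying Entry2 yields 111001000111100010100100010, the Sn+1 shown on the slide.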

  28. The LASC Implementation
    state vectors
    predictors
    IP = ξ
    allocator
    predicted states
    VM (×6)
    speculators
    trajectory cache
    ???
    tunnel

  29. How Well Does it Work?
    [Plot: LASC speedup for Ising on Blue Gene/P. x-axis: number of cores (log2 scale, 2 to 4096); y-axis: speedup (log2 scale, 2 to 4096). Series: Ideal speedup, LASC cycle count speedup, LASC speedup.]
    /* Walk the list until a node's energy falls below GROUND + e. */
    struct node *node = head;
    while (node) {
        energy = potential(node);

        if (energy < GROUND + e)
            break;

        node = node->next;
    }

    int
    potential(struct node *node) {
        int i, j, spin, energy;
        energy = 0;

        /* Visit every spin in the I x J grid. */
        for (i = 0; i < I; i++) {
            for (j = 0; j < J; j++) {
                spin = node->spins[i][j];
                /* Calculate energy. */
            }
        }

        return (energy);
    }

  30. Even Cooler Results
    int
    main(void)
    {
        int i;
        long long s;    /* 3 * s + 1 exceeds a 32-bit int for some i (e.g. 113383) */

        /* Run the Collatz iteration to 1 for every starting value. */
        for (i = 2; i < 100000000; i++) {
            s = i;
            while (s > 1) {
                if (s % 2 == 0) {
                    s = s / 2;
                } else {
                    s = 3 * s + 1;
                }
            }
        }
        return 0;
    }
    [Plot: LASC speedup for collatz on Blue Gene/P. x-axis: number of cores (log2 scale, 4 to 16384). Series: Ideal speedup, LASC cycle count speedup, LASC speedup.]
    [Plot: LASC speedup for collatz on 1-core laptop. x-axis: Instructions (2e+07 to 1e+08); y-axis: Speedup (0.8 to 1.4). Series: Baseline, LASC speedup.]

  31. We Get the Other Speedup Too!
    [Plot, partly cropped: title ends "…latz on 32-core server"; x-axis label ends "…of cores" (ticks 20, 25, 30 visible); legend entry ends "…nt speedup".]
    [Plot: LASC speedup for collatz on Blue Gene/P. x-axis: number of cores (log2 scale, 1 to 16384); y-axis: speedup (log2 scale, 1 to 16384). Series: Ideal speedup, LASC cycle count speedup, LASC speedup.]
    [Plot, partly cropped: y-axis: Speedup (0.8 to 1.4); x-axis ticks begin at 2e+07. Series: LASC speedup.]

  32. You Must be Kidding!?
    You can’t possibly guess places in the trajectory!?
    How does the master know when to check?
    How do you compare states?
    Where do you store trajectories?
    Are you on drugs????

  33. Parting Thoughts
    These are just toy problems!
    Your VM is still way slow.
    You need really good prediction and you’ll never get that for meaningful programs
    Couldn’t you just do this all in the compiler and save yourself a lot of work?
    It seems that the trajectory cache is always going to be an enormous bottleneck.

  34. Thank You!
    [email protected]
    state vectors
    predictors
    IP = ξ
    allocator
    predicted states
    VM (×6)
    speculators
    trajectory cache
    ???
    tunnel