
Survey of Transactional Memory

ytakano
May 31, 2016

Transcript

  1. Deadlock

    Thread 1 and Thread 2 each hold one of Lock A and Lock B. One thread tries to acquire A and fails; the other tries to acquire B and fails.
  2. Starvation

    One high-priority thread acquires Lock A and another high-priority thread acquires Lock B. A low-priority thread needs both A and B: it tries to acquire A and fails; later it locks A, tries to acquire B and fails, and releases A.
  3. Lock Convoy

    Diagram: a scheduler and Thread1, Thread2, Thread3, ..., ThreadN. Steps: 1. contention; 2. event; 3. contention (spin lock); 4. reschedule, after which Thread2 acquires the lock. The overhead is high when there are many threads.
  4. Complexity of Multithread Programming

    Ideal world: algorithm and data structure, giving simple source code (what we actually want).
    Real world: algorithm, data structure, and parallelism (parallel algorithm, parallel data structure), giving complicated source code that is buggy and difficult to maintain.
  5. Lock and Transactional Memory

    • Lock
      – executes the critical section exclusively
      – only one piece of code enters the critical section at a time
    • Transactional Memory
      – executes the critical section speculatively
      – multiple pieces of code enter the same critical section simultaneously
      – conflicts are detected both while executing the critical section and at the end of the critical section
  6. Spinlock by Atomic Operation

    • CAS (compare-and-swap)
    • the compare and the swap are performed atomically
    • test-and-set, compare-and-add, etc.
    • a spinlock is achieved by using CAS

        volatile int locked;

        void lock_spin(void) {
            // if locked is not set, set it atomically; otherwise wait
            while (__sync_lock_test_and_set(&locked, 1)) {
                while (locked)
                    ; // busy-wait
            }
        }

        void unlock_spin(void) {
            __sync_lock_release(&locked);
        }
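The slide names CAS, while its snippet uses the GCC test-and-set builtin. As a complement, here is a minimal sketch of the same spinlock written with an explicit compare-and-swap from C11 `<stdatomic.h>`. The names `cas_lock_t`, `cas_try_lock`, `cas_lock`, and `cas_unlock` are illustrative and do not come from the deck.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. The type name is illustrative, not from the deck. */
typedef struct { atomic_int held; } cas_lock_t;

/* One CAS attempt: succeeds only if the lock word is still 0. */
static bool cas_try_lock(cas_lock_t *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void cas_lock(cas_lock_t *l) {
    while (!cas_try_lock(l)) {
        /* Spin on plain loads so waiters do not keep issuing CAS. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ; /* busy-wait */
    }
}

static void cas_unlock(cas_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The inner read-only loop mirrors the `while (locked)` loop on the slide: a waiter retries the atomic operation only after it has seen the lock become free.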
  7. Syntax of Transactional Memory (atomic, retry, orElse)

        atomic {
            // transaction
            if (q.size() == 0) {
                // rollback and retry
                // the transaction is restarted when
                // the read set is updated
                retry;
            }
            … // do something
        } orElse {
            // detect rollback and retry
        }
  8. Software Transactional Memory

    • TL2
      – Dave Dice, Ori Shalev, and Nir Shavit, “Transactional Locking II”, International Conference on Distributed Computing (DISC)
    • LSA
      – Torvald Riegel, Pascal Felber, and Christof Fetzer, “A Lazy Snapshot Algorithm with Eager Validation”, International Conference on Distributed Computing (DISC)
    • LogTM
      – Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, David A. Wood, “LogTM: log-based transactional memory”, HPCA
    • DEUCE
      – Guy Korland, Nir Shavit, and Pascal Felber, “Noninvasive Java Concurrency with Deuce STM”, MultiProg
    • etc.
  9. TL2 Variables

    Global Memory: the global version clock and, for each variable, a version number and a write lock.
    Thread Local Memory: for each thread, a read version number, a write version number, a read set, and a write set.
  10. TL2 Algorithm

    transaction { load var1; load var2; … store var3; }
    Load the global version clock and store it in a thread-local read version number.
  11. TL2 Algorithm

    Run through a speculative execution of the transaction.
  12. TL2 Algorithm

    Log read addresses to the read set.
  13. TL2 Algorithm

    Log write addresses and values to the write set.
  14. TL2 Algorithm

    The read set holds pointers to variables; the write set holds pointers to variables together with the values to be stored, and values are stored to and loaded from it.
    Note that if a variable in the read set already appears in the write set, refer to the variable in the write set to avoid a read-after-write hazard.
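The read-after-write rule can be sketched as a lookup in the write set before touching shared memory. This is a minimal single-threaded illustration; `wset_t`, `wset_put`, `tx_load`, and the fixed capacity `WSET_MAX` are hypothetical names chosen for the example, and a plain linear scan is assumed for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define WSET_MAX 16  /* illustrative capacity */

typedef struct { long *addr; long value; } wentry_t;   /* pointer + pending value */
typedef struct { wentry_t e[WSET_MAX]; size_t n; } wset_t;

/* Log a store in the write set; a second store to the same address
 * overwrites the pending value. Returns 0 on success, -1 if the set is full. */
static int wset_put(wset_t *ws, long *addr, long value) {
    for (size_t i = 0; i < ws->n; i++)
        if (ws->e[i].addr == addr) { ws->e[i].value = value; return 0; }
    if (ws->n == WSET_MAX) return -1;
    ws->e[ws->n++] = (wentry_t){ addr, value };
    return 0;
}

/* Transactional load: prefer the pending value in the write set,
 * so the transaction sees its own uncommitted store. */
static long tx_load(const wset_t *ws, long *addr) {
    for (size_t i = 0; i < ws->n; i++)
        if (ws->e[i].addr == addr) return ws->e[i].value;
    return *addr;   /* not written by this transaction: read shared memory */
}
```

Until commit, the shared variable itself is left untouched; only the transaction that logged the store observes the new value.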
  15. TL2 Algorithm

    When loading, check that the variables have not been modified: make sure that their version numbers are no greater than the read version number. If a variable has been modified, abort the transaction.
  16. TL2 Algorithm

    Check that the write locks are free. If a variable is locked, abort the transaction.
  17. TL2 Algorithm

    Acquire the write locks using a bounded spinlock. If a write lock cannot be acquired because it is locked, abort the transaction.
  18. TL2 Algorithm

    Increment the global version clock (CAS operation) and store the result in the write version number.
  19. TL2 Algorithm

    Validate the read set: check that the variables have not been modified since they were loaded, making sure that their version numbers are no greater than the read version number. If a variable has been modified, abort the transaction.
  20. TL2 Algorithm

    Check that the write locks are free. If a variable is locked, abort the transaction.
  21. TL2 Algorithm

    In the special case where read version number (rv) + 1 = write version number (wv), it is not necessary to validate the read set.

  22. TL2 Algorithm

    Commit the values of the write set.

  23. TL2 Algorithm

    Update the version numbers of the written variables to the write version number.
  24. TL2 Algorithm

    Release the write locks.
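The TL2 steps on the preceding slides can be sketched compactly in C11. This is an illustration under stated assumptions, not the code of the TL2 authors or of the deck: all names (`tvar_t`, `tx_t`, `tx_begin`, `tx_read`, `tx_write`, `tx_commit`) are hypothetical, the version number and the write lock share one word, the clock is advanced with fetch-and-add rather than a CAS loop, and each write lock is tried once instead of with a bounded spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_MAX 8                        /* illustrative capacity of each set */

static atomic_ulong global_clock;       /* global version clock, starts at 0 */

typedef struct {
    atomic_ulong vlock;                 /* (version number << 1) | write-lock bit */
    atomic_long value;
} tvar_t;

typedef struct {
    unsigned long rv;                   /* read version number */
    tvar_t *rset[TX_MAX]; size_t nr;    /* read set: addresses */
    tvar_t *wvar[TX_MAX]; long wval[TX_MAX]; size_t nw; /* write set: addresses, values */
} tx_t;

static void tx_begin(tx_t *tx) {
    tx->rv = atomic_load(&global_clock);           /* sample the clock */
    tx->nr = tx->nw = 0;
}

static bool in_wset(const tx_t *tx, const tvar_t *v) {
    for (size_t i = 0; i < tx->nw; i++)
        if (tx->wvar[i] == v) return true;
    return false;
}

/* Speculative load; a false return means the caller must abort. */
static bool tx_read(tx_t *tx, tvar_t *v, long *out) {
    for (size_t i = 0; i < tx->nw; i++)            /* read-after-write */
        if (tx->wvar[i] == v) { *out = tx->wval[i]; return true; }
    if (tx->nr == TX_MAX) return false;
    unsigned long before = atomic_load(&v->vlock);
    *out = atomic_load(&v->value);
    unsigned long after = atomic_load(&v->vlock);
    if (before != after || (after & 1UL) || (after >> 1) > tx->rv)
        return false;                              /* locked or modified */
    tx->rset[tx->nr++] = v;                        /* log the read address */
    return true;
}

/* Speculative store: only logged, nothing is published yet. */
static bool tx_write(tx_t *tx, tvar_t *v, long val) {
    for (size_t i = 0; i < tx->nw; i++)
        if (tx->wvar[i] == v) { tx->wval[i] = val; return true; }
    if (tx->nw == TX_MAX) return false;
    tx->wvar[tx->nw] = v;
    tx->wval[tx->nw++] = val;
    return true;
}

static bool tx_abort_commit(tx_t *tx, size_t locked) {
    for (size_t i = 0; i < locked; i++)            /* drop the locks we hold */
        atomic_fetch_and(&tx->wvar[i]->vlock, ~1UL);
    return false;
}

static bool tx_commit(tx_t *tx) {
    for (size_t i = 0; i < tx->nw; i++) {          /* acquire the write locks */
        unsigned long l = atomic_load(&tx->wvar[i]->vlock);
        if ((l & 1UL) ||
            !atomic_compare_exchange_strong(&tx->wvar[i]->vlock, &l, l | 1UL))
            return tx_abort_commit(tx, i);         /* one attempt, then abort */
    }
    unsigned long wv = atomic_fetch_add(&global_clock, 1UL) + 1UL;
    if (tx->rv + 1UL != wv) {                      /* else nobody committed in between */
        for (size_t i = 0; i < tx->nr; i++) {      /* validate the read set */
            unsigned long l = atomic_load(&tx->rset[i]->vlock);
            if ((l >> 1) > tx->rv || ((l & 1UL) && !in_wset(tx, tx->rset[i])))
                return tx_abort_commit(tx, tx->nw);
        }
    }
    for (size_t i = 0; i < tx->nw; i++) {
        atomic_store(&tx->wvar[i]->value, tx->wval[i]);  /* commit the value */
        atomic_store(&tx->wvar[i]->vlock, wv << 1);      /* new version, lock released */
    }
    return true;
}
```

Packing the version number and the lock bit into one word lets a single store both publish the new version and release the lock, which is why the last two steps collapse into one line here. A caller that gets `false` from any of these functions is expected to restart the transaction from `tx_begin`.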
  25. MESI: Modified State

    Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1. The cache line is dirty and must be written back; it is not shared with the other CPU.
  26. MESI: Exclusive State

    Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1. The cache line is not modified and not shared with the other CPU.
  27. MESI: Shared State

    Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1. The cache line is not modified and is shared with the other CPU.
  28. MESI: Exclusive Load

    1. request an exclusive load
    2. write back if modified
    3. change the state to invalid
    4. load the line in the exclusive state
  29. MESI: Shared Load

    1. request a shared load
    2. write back if modified
    3. change the state to shared
    4. load the line in the shared state
  30. MESI: Eviction

    1. write back if modified
    2. discard
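The transitions listed on the MESI slides, seen from the cache that already holds the line, can be sketched as a small state machine. The enum and function names are illustrative assumptions, and the write-back is only counted rather than performed.

```c
#include <assert.h>

typedef enum { MESI_MODIFIED, MESI_EXCLUSIVE, MESI_SHARED, MESI_INVALID } mesi_t;

/* Events seen by the cache that currently holds the line. */
typedef enum { REMOTE_EXCLUSIVE_LOAD, REMOTE_SHARED_LOAD, LOCAL_EVICT } mesi_event_t;

typedef struct {
    mesi_t state;
    int writebacks;   /* number of write-backs to main memory (for illustration) */
} cache_line_t;

static void mesi_handle(cache_line_t *line, mesi_event_t ev) {
    if (line->state == MESI_INVALID)
        return;                               /* nothing cached, nothing to do */
    if (line->state == MESI_MODIFIED)
        line->writebacks++;                   /* write back if modified */
    /* A remote shared load keeps a shared copy here; a remote exclusive
     * load or a local eviction leaves no valid copy in this cache. */
    line->state = (ev == REMOTE_SHARED_LOAD) ? MESI_SHARED : MESI_INVALID;
}
```

The requesting cache then holds the line in the exclusive or shared state, matching step 4 on the load slides.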
  31. Transactional Cache Coherence

    Prepare a transactional bit in each cache line. 0: not in a transaction; 1: in a transaction.
  32. Transactional Cache Coherence

    Abort the transaction if the MESI protocol invalidates a transactional entry that is in the shared or exclusive state.
  33. Transactional Cache Coherence

    Discard the modified value and abort the transaction if the MESI protocol invalidates or evicts a transactional entry that is in the modified state.
  34. Transactional Cache Coherence

    Abort the transaction if the MESI protocol evicts a transactional entry, because the cache coherence protocol cannot detect conflicts on an evicted line.
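The abort rules on the four slides above reduce to one check per cache line. The following is a minimal sketch with hypothetical names (`tx_line_t`, `tx_line_lost`); it models only the bookkeeping, not real hardware.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ST_MODIFIED, ST_EXCLUSIVE, ST_SHARED, ST_INVALID } line_state_t;

typedef struct {
    line_state_t state;
    bool tx;           /* transactional bit: true = line belongs to a transaction */
    bool wrote_back;   /* set when a dirty value reaches main memory */
} tx_line_t;

/* Called when the MESI protocol invalidates or evicts the line.
 * Returns true if the running transaction must abort. */
static bool tx_line_lost(tx_line_t *l) {
    bool abort_tx = l->tx && l->state != ST_INVALID;
    if (l->state == ST_MODIFIED && !l->tx)
        l->wrote_back = true;   /* ordinary dirty line: write back as usual */
    /* A dirty transactional line is speculative: its value is discarded. */
    l->state = ST_INVALID;
    l->tx = false;
    return abort_tx;
}
```

Invalidation and eviction are treated alike because, once the line has left the cache, the coherence protocol no longer reports conflicting accesses to it.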
  35. Problem

    • infinite loop in a transaction
    • checking variable versions inside loops would reduce performance significantly
    • requirement of closed memory management
    • code outside a transaction can read and update variables used in a transaction in languages like C and C++
    • the compiler or the runtime environment should take care of this
  36. Problem

    atomic { … launchMissile(); … }
    Missiles may be launched many times. I/O in a transaction must cause an abort.
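Why the missile may be launched many times can be seen in a small simulation. `run_atomic` below is a hypothetical stand-in for an STM runtime that re-executes the transaction body after each abort; it is not a real API, and the number of conflicts is passed in by hand.

```c
#include <assert.h>

static int launches;                      /* counts the irreversible side effect */

static void launch_missile(void) { launches++; }

/* Hypothetical stand-in for an STM runtime: the first `conflicts`
 * attempts abort and the body is re-run until one attempt commits.
 * Returns the number of attempts. */
static int run_atomic(void (*body)(void), int conflicts) {
    int attempts = 0;
    for (;;) {
        attempts++;
        body();                           /* memory writes could be rolled back ... */
        if (attempts > conflicts)         /* ... but the launch has already happened */
            return attempts;
    }
}
```

With two conflicts the body runs three times, so the side effect happens three times even though only one attempt commits.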
  37. Intel TSX RTM

        xbegin ABORT
        . . .
        xend
    ABORT:
        // fallback if aborted

    Sometimes execution must go to fallback code (such as a spin lock).
  38. Lock by using tsx-tools (https://github.com/andikleen/tsx-tools)

        #include <immintrin.h>   // _xbegin, _xabort, _mm_pause (build with -mrtm)

        #define RTM_MAX_RETRY 8  // assumed retry budget; the slide gives no value

        volatile int lock = 0;

        void rtm_lock(void) {
            for (int i = 0; i < RTM_MAX_RETRY; i++) {
                unsigned status = _xbegin();
                if (status == _XBEGIN_STARTED) {
                    if (!lock)
                        return;          // successfully started
                    _xabort(0xff);       // lock is held: abort explicitly
                }
                if ((status & _XABORT_EXPLICIT) &&
                    _XABORT_CODE(status) == 0xff &&
                    !(status & _XABORT_NESTED)) {
                    while (lock)
                        _mm_pause();     // busy-wait
                } else if (!(status & _XABORT_RETRY)) {
                    break;
                }
            }
            while (__sync_lock_test_and_set(&lock, 1)) {   // fallback to spin-lock
                while (lock)
                    _mm_pause();         // busy-wait
            }
        }

    Lock by using Intel TSX RTM.
  39. Applying Intel® TSX

    An example of secondary benefits of Intel® TSX: Fine Grain Behavior at Coarse Grain Effort.
    Figure: two plots of scaling against threads, one for an application with a Coarse Grain Lock and one for the application re-written with Finer Grain Locks. Curves: Coarse Grain Lock, Coarse Grain Lock + Intel® TSX, Fine Grain Locks, Fine Grain Locks + Intel® TSX.
    (from Intel Developer Forum)
  40. Intel® TSX Can Enable Simpler Scalable Algorithms

    Enabling Simpler Algorithms (Real World Example)

    Lock-Free Algorithm
    • Don’t use critical section locks
    • Developer manages concurrency
    • Very difficult to get correct & optimize
      – Constrain data structure selection
      – Highly contended atomic operations

    Lock-Based + Intel® TSX
    • Use critical section locks for ease
    • Let hardware extract concurrency
    • Enables algorithm simplification
      – Flexible data structure selection
      – Equivalent data structure lock-free algorithm very hard to verify

    Figure: two plots of Ops/sec against Threads, comparing a state of the art lock-free algorithm with a TSX lock based algorithm.
    (from Intel Developer Forum)