Survey of Transactional Memory

ytakano
May 31, 2016

Transcript

  1. Deadlock

    Timeline diagram (axis t) with Thread 1 and Thread 2. One thread holds Lock B, tries to acquire A, and fails; the other holds Lock A, tries to acquire B, and fails.
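The deadlock above needs the two threads to take the same pair of locks in opposite orders. As a minimal sketch (not from the slides), the usual lock-based workaround is to take both locks in one fixed global order. The `account_t` type and the `transfer` function are hypothetical names used only for illustration, and the sketch assumes the two accounts are distinct.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical account type used only for this illustration. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

/* Lock two distinct accounts in a fixed global order (here: by address),
 * so two threads can never each hold one lock while waiting for the other. */
static void lock_pair(account_t *x, account_t *y) {
    if ((uintptr_t)x > (uintptr_t)y) { account_t *t = x; x = y; y = t; }
    pthread_mutex_lock(&x->lock);
    pthread_mutex_lock(&y->lock);
}

static void unlock_pair(account_t *x, account_t *y) {
    pthread_mutex_unlock(&x->lock);
    pthread_mutex_unlock(&y->lock);
}

/* Move `amount` from `from` to `to` while holding both locks.
 * Assumes from != to. */
void transfer(account_t *from, account_t *to, long amount) {
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}
```

Every caller has to follow the same ordering rule, which is part of the maintenance burden that transactional memory aims to remove.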
  2. Starvation

    Timeline diagram (axis t) with three threads: a high-priority thread that acquires A, a high-priority thread that acquires B, and a low-priority thread that needs both A and B. Event labels: Lock B; Lock A; try to acquire A and fail; Lock A; try to acquire B and fail; Lock A; Release A.
  3. Lock Convoy

    Diagram: a scheduler and Thread1, Thread2, Thread3, ThreadN. Labels: 1. contention; 2. event; 3. contention (spin lock); 4. reschedule; 4. acquire (Thread2). High overhead when there are many threads.
  4. Complexity of Multithread Programming

    Ideal world: algorithm + data structure, giving simple source code (what we actually want).
    Real world: algorithm + data structure + parallelism (parallel algorithm, parallel data structure), giving complicated source code that is buggy and difficult to maintain.
  5. Lock and Transactional Memory

    • Lock
      – executes a critical section exclusively
      – only one thread enters the critical section at a time
    • Transactional Memory
      – executes a critical section speculatively
      – multiple threads enter the same critical section simultaneously
      – conflicts are detected both while executing the critical section and at the end of the critical section
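  6. Spinlock by Atomic Operation

    • CAS (compare and swap)
      – the compare and the swap are performed atomically
      – test and set, compare and add, etc…
    • a spinlock is achieved by using CAS

        volatile int locked = 0;

        void lock_spin(void) {
            // atomically set locked to 1; repeat while the previous value was 1
            while (__sync_lock_test_and_set(&locked, 1)) {
                while (locked)
                    ;  // busy-wait
            }
        }

        void unlock_spin(void) {
            __sync_lock_release(&locked);  // store 0 with release semantics
        }

    If locked is 0, it is set to 1.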
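The slide names CAS, while its listing uses the test-and-set builtin. As a hedged sketch, the same lock can be written with GCC's `__sync_bool_compare_and_swap`. The names `cas_locked`, `cas_try_lock`, `cas_lock`, and `cas_unlock` are hypothetical and do not appear on the slide.

```c
/* 0 = free, 1 = held. */
static volatile int cas_locked = 0;

/* Try once: atomically change cas_locked from 0 to 1.
 * Returns 1 if this caller took the lock, 0 if it was already held. */
int cas_try_lock(void) {
    return __sync_bool_compare_and_swap(&cas_locked, 0, 1);
}

void cas_lock(void) {
    while (!cas_try_lock()) {
        while (cas_locked)
            ;  /* busy-wait on a plain read before retrying the CAS */
    }
}

void cas_unlock(void) {
    __sync_lock_release(&cas_locked);  /* store 0 with release semantics */
}
```

Spinning on a plain read between CAS attempts keeps waiting threads from repeatedly issuing atomic writes to the same cache line.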
  7. Syntax of Transactional Memory (atomic, retry, orElse)

        atomic {
            // transaction
            if (q.size() == 0) {
                // roll back and retry;
                // the transaction is restarted when
                // the read-set is updated
                retry;
            }
            … // do something
        } orElse {
            // detect rollback and retry
        }
  8. Software Transactional Memory

    • TL2
      – Dave Dice, Ori Shalev, and Nir Shavit, “Transactional Locking II”, 20th International Conference on Distributed Computing (DISC 2006)
    • LSA
      – Torvald Riegel, Pascal Felber, and Christof Fetzer, “A Lazy Snapshot Algorithm with Eager Validation”, 20th International Conference on Distributed Computing (DISC 2006)
    • LogTM
      – Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, David A. Wood, “LogTM: log-based transactional memory”, HPCA 2006
    • DEUCE
      – Guy Korland, Nir Shavit and Pascal Felber, “Noninvasive Java Concurrency with Deuce STM”, MultiProg 2010
    • etc.
  9. TL2 Variables

    Global Memory: the global version clock and, for each variable, a version number and a write lock (three variables are shown).
    Thread Local Memory: for each thread, a read version number, a write version number, a read-set, and a write-set.
  10. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Load the global version clock and store it in a thread-local read version number.

        transaction { load var1; load var2; … store var3; }
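The variables in the diagram and this first step can be sketched as plain C structures. This is a minimal illustration under stated assumptions: all type and field names (`tl2_var_t`, `tl2_tx_t`, `tl2_begin`, and so on) and the fixed capacity `TX_MAX_ENTRIES` are hypothetical, and the version number and write lock are kept as separate fields to mirror the slide.

```c
#include <stdint.h>
#include <stddef.h>

#define TX_MAX_ENTRIES 16   /* arbitrary bound chosen for this sketch */

/* Global memory: one shared clock plus per-variable metadata. */
typedef struct {
    volatile uint64_t version;     /* version number of the last commit */
    volatile int      write_lock;  /* 0 = free, 1 = held by a committer */
    volatile intptr_t value;       /* the variable itself               */
} tl2_var_t;

static volatile uint64_t global_version_clock = 0;

/* Thread-local memory: one descriptor per running transaction. */
typedef struct { tl2_var_t *addr; } read_entry_t;
typedef struct { tl2_var_t *addr; intptr_t value; } write_entry_t;

typedef struct {
    uint64_t      read_version;    /* rv: clock sampled at start */
    uint64_t      write_version;   /* wv: clock value at commit  */
    read_entry_t  read_set[TX_MAX_ENTRIES];
    size_t        n_reads;
    write_entry_t write_set[TX_MAX_ENTRIES];
    size_t        n_writes;
} tl2_tx_t;

/* Start a transaction: sample the global clock into rv, empty both sets. */
void tl2_begin(tl2_tx_t *tx) {
    tx->read_version = global_version_clock;
    tx->write_version = 0;
    tx->n_reads = 0;
    tx->n_writes = 0;
}
```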
  11. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Run through a speculative execution.

        transaction { load var1; load var2; … store var3; }

  12. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Log read addresses to the read-set.

        transaction { load var1; load var2; … store var3; }

  13. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Log write addresses and values to the write-set.

        transaction { load var1; load var2; … store var3; }

  14. TL2 Algorithm

    [Diagram: TL2 variables; the read-set and write-set entries of the thread are pointers to variables in Global Memory]

    The value of a variable is stored and loaded. Note that if a variable in the read-set already appears in the write-set, the variable is read from the write-set, to avoid a read-after-write hazard.
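The write-set buffering and the read-after-write rule can be illustrated with a small sketch. It assumes a fixed-size write-set searched linearly; the names `write_set_t`, `ws_store`, and `ws_load` and the capacity `WS_MAX` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define WS_MAX 16  /* arbitrary capacity for this sketch */

typedef struct { intptr_t *addr; intptr_t value; } ws_entry_t;
typedef struct { ws_entry_t e[WS_MAX]; size_t n; } write_set_t;

/* Speculative store: record (address, value) instead of touching memory.
 * Returns 0 on success, -1 if the write-set is full. */
int ws_store(write_set_t *ws, intptr_t *addr, intptr_t value) {
    for (size_t i = 0; i < ws->n; i++) {
        if (ws->e[i].addr == addr) { ws->e[i].value = value; return 0; }
    }
    if (ws->n == WS_MAX) return -1;
    ws->e[ws->n].addr = addr;
    ws->e[ws->n].value = value;
    ws->n++;
    return 0;
}

/* Speculative load: a value this transaction already wrote wins over
 * memory, which avoids the read-after-write hazard. */
intptr_t ws_load(const write_set_t *ws, const intptr_t *addr) {
    for (size_t i = 0; i < ws->n; i++) {
        if (ws->e[i].addr == addr) return ws->e[i].value;
    }
    return *addr;
}
```

Memory itself stays untouched until commit, so an aborted transaction only has to drop its write-set.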
  15. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Check that variables are not modified when loading: make sure that their version numbers are less than or equal to the read version number. If modified, abort the transaction.

        transaction { load var1; load var2; … store var3; }

  16. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Check that the write locks are free. If locked, abort the transaction.

        transaction { load var1; load var2; … store var3; }
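The two checks performed on each load can be sketched as a single function. This is a simplified illustration: `tvar_t` and `tl2_load` are hypothetical names, a version equal to the read version number is accepted (as in the published TL2 algorithm), and `volatile` stands in for the memory barriers a real implementation would need.

```c
#include <stdint.h>

/* Per-variable metadata, mirroring the diagram (hypothetical names). */
typedef struct {
    volatile uint64_t version;
    volatile int      write_lock;   /* 0 = free, 1 = locked */
    volatile intptr_t value;
} tvar_t;

/* Transactional load. Returns 0 and stores the value in *out when the read
 * is consistent with read version rv; returns -1 when the caller must abort. */
int tl2_load(const tvar_t *v, uint64_t rv, intptr_t *out) {
    uint64_t before = v->version;
    if (v->write_lock) return -1;        /* a committer holds the lock    */
    intptr_t val = v->value;
    if (v->write_lock) return -1;        /* locked while we were reading  */
    if (v->version != before) return -1; /* changed while we were reading */
    if (before > rv) return -1;          /* written after we started      */
    *out = val;
    return 0;
}
```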
  17. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Acquire the write locks using a bounded spinlock. If a write lock cannot be acquired (it stays locked), abort the transaction.

        transaction { load var1; load var2; … store var3; }
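A bounded spinlock gives up after a fixed number of attempts instead of spinning forever, which lets the transaction abort rather than deadlock. A minimal sketch follows, assuming each write lock is a plain `int` (0 = free, 1 = locked); `SPIN_LIMIT`, `bounded_lock`, and `acquire_write_locks` are hypothetical names.

```c
#include <stddef.h>

#define SPIN_LIMIT 100  /* arbitrary bound for this sketch */

/* Try to take one lock, giving up after SPIN_LIMIT attempts. */
static int bounded_lock(volatile int *lock) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (__sync_bool_compare_and_swap(lock, 0, 1)) return 1;
    }
    return 0;
}

/* Acquire every lock in locks[0..n). On failure, release the locks already
 * taken and return 0 so the caller can abort the transaction. */
int acquire_write_locks(volatile int **locks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!bounded_lock(locks[i])) {
            while (i > 0) __sync_lock_release(locks[--i]);
            return 0;
        }
    }
    return 1;
}
```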
  18. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Increment the global version clock (CAS operation) and store it to the write version number.

        transaction { load var1; load var2; … store var3; }

  19. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Check that variables are not modified when loading: make sure that their version numbers are less than or equal to the read version number. If modified, abort the transaction.

        transaction { load var1; load var2; … store var3; }

  20. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Check that the write locks are free. If locked, abort the transaction.

        transaction { load var1; load var2; … store var3; }

  21. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory; rv = read version number, wv = write version number]

    In the special case where read version number + 1 = write version number (rv + 1 = wv), it is not necessary to validate the read-set.

        transaction { load var1; load var2; … store var3; }

  22. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Commit the values of the write-set.

        transaction { load var1; load var2; … store var3; }

  23. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Update the version numbers of the written variables to the write version number.

        transaction { load var1; load var2; … store var3; }

  24. TL2 Algorithm

    [Diagram: TL2 variables in Global Memory and Thread Local Memory]

    Release the write locks.

        transaction { load var1; load var2; … store var3; }
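The commit steps above (lock the write-set, increment the clock, validate the read-set, write back, stamp versions, release) can be put together in one function. This is a simplified sketch, not the authors' implementation. All names (`tvar_t`, `tx_t`, `tl2_commit`, `SET_MAX`) are hypothetical. It makes a single attempt per lock instead of a bounded spin, it uses the atomic `__sync_add_and_fetch` builtin where the slide mentions a CAS operation, and it assumes the write-set contains no duplicate addresses.

```c
#include <stdint.h>
#include <stddef.h>

#define SET_MAX 8  /* arbitrary capacity for this sketch */

typedef struct {
    volatile uint64_t version;
    volatile int      write_lock;   /* 0 = free, 1 = locked */
    volatile intptr_t value;
} tvar_t;

typedef struct {
    uint64_t  rv, wv;
    tvar_t   *reads[SET_MAX];   size_t n_reads;
    tvar_t   *waddr[SET_MAX];
    intptr_t  wval[SET_MAX];    size_t n_writes;
} tx_t;

static volatile uint64_t global_clock = 0;

/* Release the first n write locks held by tx. */
static void release_locks(tx_t *tx, size_t n) {
    while (n > 0) __sync_lock_release(&tx->waddr[--n]->write_lock);
}

static int in_write_set(const tx_t *tx, const tvar_t *v) {
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->waddr[i] == v) return 1;
    return 0;
}

/* Returns 1 on commit, 0 on abort. */
int tl2_commit(tx_t *tx) {
    /* 1. Lock the write-set (a single attempt per lock, for brevity). */
    for (size_t i = 0; i < tx->n_writes; i++) {
        if (!__sync_bool_compare_and_swap(&tx->waddr[i]->write_lock, 0, 1)) {
            release_locks(tx, i);
            return 0;
        }
    }
    /* 2. Increment the global clock; the new value is wv. */
    tx->wv = __sync_add_and_fetch(&global_clock, 1);
    /* 3. Validate the read-set, unless nobody committed since we began. */
    if (tx->rv + 1 != tx->wv) {
        for (size_t i = 0; i < tx->n_reads; i++) {
            tvar_t *v = tx->reads[i];
            int locked_by_other = v->write_lock && !in_write_set(tx, v);
            if (locked_by_other || v->version > tx->rv) {
                release_locks(tx, tx->n_writes);
                return 0;
            }
        }
    }
    /* 4. Write back, stamp each variable with wv, then release its lock. */
    for (size_t i = 0; i < tx->n_writes; i++) {
        tx->waddr[i]->value = tx->wval[i];
        tx->waddr[i]->version = tx->wv;
        __sync_lock_release(&tx->waddr[i]->write_lock);
    }
    return 1;
}
```

In this sketch an aborted commit still leaves the global clock incremented. That is harmless for correctness, because the clock only has to increase monotonically.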
  25. MESI (Modified State)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    The cache line is dirty and must be written back. It is not shared with another CPU.

  26. MESI (Exclusive State)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    The cache line is not modified. It is not shared with another CPU.

  27. MESI (Shared State)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    The cache line is not modified. It is shared with another CPU.
  28. MESI (Exclusive Load)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    1. request exclusive load
    2. write back if modified
    3. change state to invalid
    4. load the line in the exclusive state

  29. MESI (Shared Load)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    1. request shared load
    2. write back if modified
    3. change state to shared
    4. load the line in the shared state

  30. MESI (eviction)

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    1. write back if modified
    2. discard
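The load and eviction steps can be read as a small state machine seen from the cache that already holds the line. The sketch below is a simplified model under that assumption; `mesi_t`, `mesi_event_t`, and `mesi_next` are hypothetical names.

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

/* Events seen by one cache for one line. */
typedef enum {
    REMOTE_EXCLUSIVE_LOAD,  /* another CPU requests the line exclusively */
    REMOTE_SHARED_LOAD,     /* another CPU requests the line shared      */
    EVICT                   /* this cache drops the line                 */
} mesi_event_t;

/* Apply an event to a line held in state `s` and return the new state.
 * Sets *write_back to 1 when the dirty data must first be written to
 * main memory, and to 0 otherwise. */
mesi_t mesi_next(mesi_t s, mesi_event_t ev, int *write_back) {
    *write_back = (s == MODIFIED);
    switch (ev) {
    case REMOTE_EXCLUSIVE_LOAD: return INVALID;
    case REMOTE_SHARED_LOAD:    return (s == INVALID) ? INVALID : SHARED;
    case EVICT:                 return INVALID;
    }
    return s;
}
```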
  31. Transactional Cache Coherence

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1]

    Prepare a transactional bit in each cache line. 0: not in transaction. 1: in transaction.

  32. Transactional Cache Coherence

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1; transactional bit = 1; line in shared or exclusive state]

    Abort the transaction if the MESI protocol invalidates a transaction entry.

  33. Transactional Cache Coherence

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1; transactional bit = 1; line in modified state]

    Discard the modified value and abort the transaction if the MESI protocol invalidates or evicts a transaction entry.

  34. Transactional Cache Coherence

    [Diagram: main memory, CPU0 with cache 0, CPU1 with cache 1; transactional bit = 1; line evicted]

    Abort the transaction if the MESI protocol evicts a transaction entry, because the cache coherence protocol can no longer detect conflicts on an evicted line.
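The abort rules on these slides reduce to one decision made whenever a line is invalidated or evicted. The following is a simplified model, not a description of any particular processor; `cache_line_t` and `tx_line_lost` are hypothetical names.

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

typedef struct {
    mesi_t state;
    int    tx_bit;   /* 0: not in transaction, 1: in transaction */
} cache_line_t;

/* Called when the coherence protocol invalidates or evicts `line`.
 * Returns 1 if the running transaction must abort, 0 otherwise.
 * *write_back is set to 1 only for dirty data that is not speculative;
 * a modified transactional value is discarded instead of written back. */
int tx_line_lost(cache_line_t *line, int *write_back) {
    int abort_tx = line->tx_bit && line->state != INVALID;
    *write_back = (line->state == MODIFIED) && !line->tx_bit;
    line->state = INVALID;
    line->tx_bit = 0;
    return abort_tx;
}
```

Because an eviction also forces an abort, the working set of a hardware transaction is limited by cache capacity in this model.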
  35. Problem

    • infinite loop in a transaction
      – detecting variable versions inside loops would reduce performance significantly
    • requirement of closed memory management
      – code outside a transaction can refer to and update variables used in a transaction, in languages like C and C++
      – the compiler or the runtime environment should take care of this
  36. Problem

        atomic { … launchMissile(); … }

    Missiles may be launched many times. I/O in a transaction must cause an abort.
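The problem can be reproduced without any transactional memory system: a body that is re-executed after each abort repeats its side effects. The sketch below counts both a direct side effect and one that is queued and run only after a successful commit. Deferring is shown as one possible alternative to aborting on I/O, not as the approach taken on the slide. All names, and the commit stub that succeeds on the third attempt, are hypothetical.

```c
#include <stddef.h>

#define MAX_DEFERRED 8  /* arbitrary capacity for this sketch */

typedef void (*action_fn)(void);
typedef struct { action_fn actions[MAX_DEFERRED]; size_t n; } deferred_t;

/* Queue a side effect instead of performing it inside the transaction. */
int defer_action(deferred_t *d, action_fn fn) {
    if (d->n == MAX_DEFERRED) return -1;
    d->actions[d->n++] = fn;
    return 0;
}

/* Re-run `body` until `try_commit` succeeds. Actions queued by aborted
 * attempts are dropped; only the committed attempt's actions run. */
void run_transaction(void (*body)(deferred_t *), int (*try_commit)(void)) {
    deferred_t d;
    do {
        d.n = 0;          /* forget actions queued by an aborted attempt */
        body(&d);
    } while (!try_commit());
    for (size_t i = 0; i < d.n; i++) d.actions[i]();
}

/* --- usage example (hypothetical names) --- */
static int attempts = 0;
static int direct_launches = 0;    /* side effect done inside the body */
static int deferred_launches = 0;  /* side effect done after commit    */

static void launch_missile(void) { deferred_launches++; }

static void body(deferred_t *d) {
    attempts++;
    direct_launches++;                 /* repeated on every retry */
    defer_action(d, launch_missile);   /* runs once, after commit */
}

/* Stub: pretend the first two attempts conflict and abort. */
static int commit_on_third_try(void) { return attempts >= 3; }
```

Calling `run_transaction(body, commit_on_third_try)` executes the body three times, so the direct side effect happens three times while the deferred one happens once.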
  37. Intel TSX (RTM)

        xbegin ABORT
        . . .
        xend
        ABORT:
            // fallback if aborted

    Sometimes execution must go to fallback code (such as a spin lock).
  38. Lock by using tsx-tools (https://github.com/andikleen/tsx-tools)

        #include <immintrin.h>  // _xbegin, _xabort, _mm_pause; build with -mrtm

        // RTM_MAX_RETRY is assumed to be defined elsewhere
        volatile int lock = 0;

        void rtm_lock(void) {
            for (int i = 0; i < RTM_MAX_RETRY; i++) {
                unsigned status = _xbegin();
                if (status == _XBEGIN_STARTED) {
                    if (!lock)
                        return;  // successfully started
                    _xabort(0xff);
                }
                if ((status & _XABORT_EXPLICIT) &&
                    _XABORT_CODE(status) == 0xff &&
                    !(status & _XABORT_NESTED)) {
                    while (lock)
                        _mm_pause();  // busy-wait
                } else if (!(status & _XABORT_RETRY)) {
                    break;
                }
            }
            while (__sync_lock_test_and_set(&lock, 1)) {  // fallback to spin-lock
                while (lock)
                    _mm_pause();  // busy-wait
            }
        }

    Lock by using Intel TSX (RTM).
  39. Applying Intel® TSX (from Intel Developer Forum)

    Two charts of scaling versus threads: "Application with Coarse Grain Lock" and "Application re-written with Finer Grain Locks". Curves: Coarse Grain Lock; Coarse Grain Lock + Intel® TSX; Fine Grain Locks; Fine Grain Locks + Intel® TSX. Captions: "An example of secondary benefits of Intel® TSX" and "Fine Grain Behavior at Coarse Grain Effort".
  40. Intel® TSX Can Enable Simpler Scalable Algorithms (from Intel Developer Forum)

    Enabling Simpler Algorithms

    Lock-Free Algorithm
    • Don’t use critical section locks
    • Developer manages concurrency
    • Very difficult to get correct & optimize
      – Constrain data structure selection
      – Highly contended atomic operations

    Lock-Based + Intel® TSX
    • Use critical section locks for ease
    • Let hardware extract concurrency
    • Enables algorithm simplification
      – Flexible data structure selection
      – Equivalent data structure lock-free algorithm very hard to verify

    Real World Example: two charts of ops/sec versus threads, one for a state-of-the-art lock-free algorithm and one for a TSX lock-based algorithm.