structure
parallelism: parallel algorithm, parallel data structure
real world: complicated source code (buggy, difficult to maintain)
actually we want: simple source code
• execute critical sections speculatively
• multiple codes enter same critical section simultaneously
• conflicts are detected both while executing critical section and at the end of critical section
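The three points above can be illustrated with a small software sketch. It is only an analogy, not how any particular hardware or STM implements speculation: `VersionedCell`, `Speculation`, `Conflict` and `run_speculatively` are hypothetical names introduced here, the interleaving is simulated in a single thread, and a real implementation would also have to make `commit` atomic.

```python
class Conflict(Exception):
    """Raised when a speculative run observes a conflicting update."""


class VersionedCell:
    """A shared value plus a version counter bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Speculation:
    """One speculative execution of a critical section (hypothetical helper)."""

    def __init__(self):
        self.read_versions = {}   # cell -> version seen at first read
        self.pending_writes = {}  # cell -> buffered new value

    def _validate(self):
        # A conflict exists if any cell we read has been committed to since.
        for cell, seen in self.read_versions.items():
            if cell.version != seen:
                raise Conflict()

    def read(self, cell):
        if cell in self.pending_writes:
            return self.pending_writes[cell]
        self.read_versions.setdefault(cell, cell.version)
        self._validate()          # detection while executing
        return cell.value

    def write(self, cell, value):
        self.pending_writes[cell] = value  # buffered until commit

    def commit(self):
        self._validate()          # detection at the end
        for cell, value in self.pending_writes.items():
            cell.value = value
            cell.version += 1


def run_speculatively(section, max_attempts=10):
    """Re-run the section from scratch until it commits without conflict."""
    for _ in range(max_attempts):
        spec = Speculation()
        try:
            result = section(spec)
            spec.commit()
            return result
        except Conflict:
            continue
    raise RuntimeError("gave up after repeated conflicts")


balance = VersionedCell(100)

# Two speculations enter the same critical section at once.
a, b = Speculation(), Speculation()
a.write(balance, a.read(balance) + 10)
b.write(balance, b.read(balance) + 5)
a.commit()                # succeeds: nothing changed since a's read
try:
    b.commit()            # fails: a's commit bumped the version b had read
    b_committed = True
except Conflict:
    b_committed = False

# The losing section is simply executed again.
run_speculatively(lambda s: s.write(balance, s.read(balance) + 5))
```

Both speculations run the section without waiting for each other. Only the one whose reads are still valid commits, and the other is retried.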
• LSA
  – Torvald Riegel, Pascal Felber, and Christof Fetzer, “A Lazy Snapshot Algorithm with Eager Validation”, International Conference on Distributed Computing (DISC)
• LogTM
  – Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, David A. Wood, “LogTM: log-based transactional memory”, HPCA
• DEUCE
  – Guy Korland, Nir Shavit, and Pascal Felber, “Noninvasive Java Concurrency with Deuce STM”, MultiProg
• etc.
Fine Grain Behavior at Coarse Grain Effort
An example of secondary benefits of Intel® TSX
[Chart. Series: Coarse Grain Lock; Coarse Grain Lock + Intel® TSX; Fine Grain Locks; Fine Grain Locks + Intel® TSX. Annotations: "Coarse Grain Lock", "Application re-written with Finer Grain Locks"]
from Intel Developer Forum
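To make the two locking styles in the chart concrete, here is a minimal sketch of the same counter table written once with a single coarse lock and once with finer-grained per-bucket locks. The class names and the `hammer` helper are hypothetical, and the sketch does not use Intel® TSX at all. It only shows the difference in programming effort that the slide compares. In CPython the global interpreter lock also means the finer-grained version should not be expected to run faster.

```python
import threading


class CoarseCounterTable:
    """One lock guards the whole table: simple, but every update serializes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


class StripedCounterTable:
    """One lock per bucket: updates to different buckets do not share a lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _index(self, key):
        # Every access to a key must pick the same bucket and the same lock.
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._index(key)
        with self._locks[i]:
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + 1

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)


def hammer(table, keys, per_thread=1000, threads=4):
    """Increment the keys round-robin from several threads."""
    def work():
        for n in range(per_thread):
            table.increment(keys[n % len(keys)])

    pool = [threading.Thread(target=work) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()


keys = ["a", "b", "c", "d"]
coarse, striped = CoarseCounterTable(), StripedCounterTable()
hammer(coarse, keys)
hammer(striped, keys)
```

The coarse version has one invariant to reason about. The striped version needs a consistent key-to-lock mapping everywhere, which is the kind of extra effort the slide suggests lock elision can avoid.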
Algorithms
Real World Example

Lock-Free Algorithm
• Don’t use critical section locks
• Developer manages concurrency
• Very difficult to get correct & optimize
  – Constrain data structure selection
  – Highly contended atomic operations

Lock-Based + Intel® TSX
• Use critical section locks for ease
• Let hardware extract concurrency
• Enables algorithm simplification
  – Flexible data structure selection
  – Equivalent data structure lock-free algorithm very hard to verify

[Charts: Ops/sec vs. Threads, for "State of the art lock-free algorithm" and "TSX lock based algorithm"]
from Intel Developer Forum