Learning How a VM-Based Regular Expression Engine Works, with PHP

Slides from a talk given at PHP Conference Fukuoka.

久保田光則

June 27, 2015

Transcript

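 1. PHP Conference Fukuoka
  Learning How a Virtual-Machine-Based
  Regular Expression Engine Works, with PHP

 2. About Me
  ‣ 久保田光則 (@anatoo)
  ‣ UI/UX designer and software engineer
  ‣ Representative of Aspective LLC
  ‣ http://aspective.io

 3. What I Will Talk About Today
  ‣ How a virtual-machine-based regular expression engine works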

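 4. Topics, in Order
  ‣ Ways to implement a regular expression engine
  ‣ What a virtual-machine-based regular expression engine is
  ‣ The flow of the matching process
  ‣ The registers, threads, and instructions of the VM
  ‣ How a regular expression is compiled

 5. What You Will Get from This Talk
  ‣ You will be surprised at how simple a regular expression engine is (I certainly was)
  ‣ You will be able to write a regular expression engine (or at least its VM)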

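 6. Ways to Implement a Regular Expression Engine

 7. ‣ To begin with, what ways are there to implement a regular expression engine?

 8. Two Broad Categories
  ‣ DFA-based implementations
  ‣ Virtual machine (VM) based implementations

 9. What Is a Virtual-Machine-Based Regular Expression Engine?

 10. The Virtual-Machine-Based Implementation
  ‣ Build a VM (virtual machine) whose instructions are designed for regular expressions
  ‣ Compile the regular expression into instructions for that VM and execute them
  ‣ PCRE is implemented this way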

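 11. The Flow of the Matching Process
  Parse the regular expression
  Convert it into an instruction sequence for the virtual machine
  Execute it on the virtual machine
  Does /(hoge|fuga)/ match "hoge"?

 12. Parsing the Regular Expression
  ‣ Take the regular expression as a string and parse its metacharacters
  /hoge?|fuga(piyo)*/

 13. Converting to an Instruction Sequence for the VM
  /hoge?|fuga(piyo)*/
  char 'h'
  char 'o'
  char 'g'
  char 'e'
  split 1, 6
  jmp 11

 14. Executing on the Virtual Machine
  The VM interprets the instructions from the top:
  char 'h'
  char 'o'
  char 'g'
  char 'e'
  split 1, 6
  jmp 11
  ‣ Result: determines whether the given string matches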

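 15. What Is a Virtual Machine (VM), Anyway?

 16. The VMs You Usually Come Across
  ‣ Virtualizations of an actual computer
  ‣ Not really relevant to this talk

 17. The VM in This Talk
  ‣ A virtual machine designed for one specific purpose
  ‣ Examples: JVM, ZendEngine, YARV

 18. The Basic Structure of the VM Is Simple
  Two kinds of registers
  Threads
  Four kinds of instructions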

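 19. Registers
  PC (Program Counter): the position of the current instruction
  SP (String Pointer): the current position in the string
  ‣ A set of variables built into the VM
  ‣ Both hold 0 at the start

 20. If PC Holds 3
  ‣ The VM is currently looking at instruction number 3
  ‣ PC is incremented each time an instruction is read
  PC=3
  char 'h'
  char 'o'
  char 'g'
  char 'e'
  split 1, 6
  jmp 11

 21. If SP Holds 2
  ‣ The VM is currently looking at character number 2 of the string being tried
  SP=2 "hogehoge"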

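 22. Threads
  ‣ A thread holds the registers (PC and SP)
  ‣ It is something like an execution context
  ‣ It has nothing to do with parallel processing
  Thread: [PC, SP]

 23. The Initial State of the VM
  ‣ The VM starts out with exactly one thread
  ‣ The number of threads grows and shrinks as instructions are interpreted
  VM: [Thread] ← current thread

 24. The VM and Its Threads
  ‣ The VM keeps its threads on a stack
  ‣ The VM operates on the registers of the topmost thread
  VM: [Thread] ← current thread
      [Thread]
      [Thread]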
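The register and thread model described in the slides above can be sketched as a small data structure. This is an illustrative sketch in Python rather than the talk's PHP; the names `Thread`, `VM`, `threads`, and `current` are assumptions made for the example and do not come from the original code.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """An execution context: nothing more than the two registers."""
    pc: int = 0  # Program Counter: index of the instruction being looked at
    sp: int = 0  # String Pointer: index of the character being looked at


@dataclass
class VM:
    # The VM starts with exactly one thread whose registers both hold 0.
    threads: list = field(default_factory=lambda: [Thread()])

    @property
    def current(self):
        """The thread on top of the stack, or None when the stack is empty."""
        return self.threads[-1] if self.threads else None


vm = VM()
print(vm.current)  # Thread(pc=0, sp=0)
```

Treating the end of a Python list as the top of the stack keeps push and pop cheap; any other stack representation would serve equally well.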

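 25. The Four Instructions of the VM
  ‣ jmp: jump to the specified position
  ‣ char: try to match a character
  ‣ match: stop the VM and complete the match
  ‣ split: split the thread

 26. The jmp Instruction
  ‣ jmp x sets PC to position x
  ‣ In other words, goto
  jmp x

 27. Before: PC=0, SP=0
  jmp 5
  After: PC=5, SP=0
  ‣ The PC register has been rewritten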

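 28. The char Instruction
  ‣ Consumes the character x at the current position (SP)
  ‣ On a match, SP increases by 1
  ‣ On a mismatch, the current thread disappears
  char x

 29. ‣ When the string being tried is "aa"
  ‣ Executing the char instruction compares against the character at position SP;
    it matches, so SP and PC both increase
  Before: PC=0, SP=0
  char 'a'
  After: PC=1, SP=1

 30. ‣ When the string being tried is "bb"
  ‣ Executing the char instruction compares against the character at position SP;
    it does not match, so the current thread disappears
  Before: PC=0, SP=0
  char 'a'

 31. What Happens When a Thread Disappears
  ‣ The thread on top of the stack becomes the current thread, and instruction processing resumes from it
  ‣ If the stack becomes empty, the VM stops: the match has failed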
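As a rough illustration of the jmp and char behaviour walked through above, here is a minimal Python sketch (Python is used here instead of the talk's PHP). The function names `exec_jmp` and `exec_char`, the dictionary layout of a thread, and the end-of-string bounds check are all assumptions made for this example.

```python
def exec_jmp(threads, x):
    """jmp x: set the current thread's PC to x (essentially a goto)."""
    threads[-1]["pc"] = x


def exec_char(threads, text, c):
    """char c: try to consume the character c at the current thread's SP."""
    t = threads[-1]  # the top of the stack is the current thread
    # Assumption: running past the end of the string counts as a mismatch.
    if t["sp"] < len(text) and text[t["sp"]] == c:
        t["pc"] += 1
        t["sp"] += 1
    else:
        threads.pop()  # the current thread disappears


threads = [{"pc": 0, "sp": 0}]
exec_char(threads, "aa", "a")
print(threads)  # [{'pc': 1, 'sp': 1}]

threads = [{"pc": 0, "sp": 0}]
exec_char(threads, "bb", "a")
print(threads)  # [] (empty stack: the VM would stop and the match fails)
```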

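 32. The match Instruction
  ‣ Treats the regular expression match as complete and stops the VM
  Match succeeded
  match

 33. The split Instruction
  ‣ Splits the current thread and assigns x and y to the PC of each resulting thread
  ‣ A little hard to grasp, but the most important instruction
  split x, y

 34. Before: PC=0, SP=2
  split 1,5
  After: PC=1, SP=2 (upper thread)
         PC=5, SP=2 (lower thread)
  ‣ Once the copy is made, the upper thread becomes the current thread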
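The split step can be sketched in a few lines of Python (used here in place of the talk's PHP). The function name `exec_split` and the dictionary layout are assumptions for this example. The choice to keep the thread with PC=x on top follows the later walkthrough in the slides, where T1 continues at the first target while T2 waits at the second.

```python
def exec_split(threads, x, y):
    """split x, y: fork the current thread; the two copies continue at x and y."""
    current = threads.pop()
    sp = current["sp"]  # both copies keep the same string position
    threads.append({"pc": y, "sp": sp})  # waits lower in the stack
    threads.append({"pc": x, "sp": sp})  # on top, so it becomes the current thread


threads = [{"pc": 0, "sp": 2}]
exec_split(threads, 1, 5)
print(threads)  # [{'pc': 5, 'sp': 2}, {'pc': 1, 'sp': 2}]
```

Because the waiting thread remembers its own SP, falling back to it later automatically rewinds the string position, which is what makes backtracking work without any extra bookkeeping.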

 35. ‣ That is all there is.
  ‣ Most regular expressions can be expressed with just this
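Putting the four instructions, the registers, and the thread stack together, a whole VM fits in one short function. The following is a minimal sketch in Python, not the author's PHP implementation; the function name `run`, the tuple encoding of instructions, and the sample program are assumptions made for illustration. It matches from the start of the string and does not guard against programs that loop without consuming input.

```python
def run(program, text):
    """Return True if `program` reaches `match` starting at the beginning of `text`."""
    threads = [(0, 0)]  # stack of (pc, sp) pairs; the VM starts with one thread
    while threads:
        pc, sp = threads.pop()  # the top of the stack becomes the current thread
        while True:
            op, *args = program[pc]
            if op == "char":
                if sp < len(text) and text[sp] == args[0]:
                    pc, sp = pc + 1, sp + 1
                else:
                    break  # current thread disappears; fall back to the stack
            elif op == "jmp":
                pc = args[0]
            elif op == "split":
                x, y = args
                threads.append((y, sp))  # second copy waits on the stack
                pc = x                   # current thread continues at x
            elif op == "match":
                return True
            else:
                raise ValueError(f"unknown instruction: {op}")
    return False  # stack is empty: the match has failed


program = [("char", "a"), ("split", 0, 2), ("match",)]
print(run(program, "aab"))  # True
print(run(program, "b"))    # False
```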

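 36. How Is a Regular Expression Compiled?

 37. ‣ A look at how regular expressions get compiled
  ‣ From here on, let's become human regular expression VMs and interpret the instructions one by one

 38. /a/
  0 char 'a'
  1 match
  ‣ Easy

 39. /abc/
  0 char 'a'
  1 char 'b'
  2 char 'c'
  3 match
  ‣ This one is easy too

 40. /AB/ Concatenation
  ‣ The instruction sequences can simply be joined together
  ‣ A match instruction goes at the end
  [instructions for A]
  [instructions for B]
  match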

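 41. /a?/ Optional
  0 split 1,2
  1 char 'a'
  2 match
  ‣ The instruction sequence of any other regular expression can go in place of char 'a'
  ‣ The split instruction is the key. Let's follow it step by step

 42. Now, Let's Be a Human VM
  Thread  PC  Instruction  SP  Execution
  T1      0   split 1,2    0   Create T2 (PC=2)
  T1      1   char 'a'     0   Matches, so SP is incremented
  T1      2   match        1   Match complete
  ‣ When the string is "aaa", the match completes in T1

 43. What If the String Is "bbb"?
  Thread  PC  Instruction  SP  Execution
  T1      0   split 1,2    0   Create T2 (PC=2)
  T1      1   char 'a'     0   Character match fails: T1 disappears
  T2      2   match        0   Match complete
  ‣ The character match fails in T1, but match is executed in T2
  ‣ Result: the match succeeds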
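The two hand traces above can be reproduced mechanically. The sketch below, in Python rather than the talk's PHP, runs the /a?/ program and records every instruction each thread executes. The function name `trace`, the thread naming scheme (T1, T2, ...), and the log format are assumptions made for this example.

```python
def trace(program, text):
    """Run `program` on `text`; return (matched, log of executed steps)."""
    log = []
    next_id = 2
    threads = [("T1", 0, 0)]  # stack of (name, pc, sp)
    while threads:
        name, pc, sp = threads.pop()
        while True:
            op, *args = program[pc]
            log.append((name, pc, op, sp))  # registers before the instruction runs
            if op == "char":
                if sp < len(text) and text[sp] == args[0]:
                    pc, sp = pc + 1, sp + 1
                else:
                    break  # this thread disappears
            elif op == "jmp":
                pc = args[0]
            elif op == "split":
                threads.append((f"T{next_id}", args[1], sp))  # new waiting thread
                next_id += 1
                pc = args[0]
            elif op == "match":
                return True, log
    return False, log


A_OPTIONAL = [("split", 1, 2), ("char", "a"), ("match",)]  # /a?/

for text in ("aaa", "bbb"):
    matched, log = trace(A_OPTIONAL, text)
    print(text, matched)
    for name, pc, op, sp in log:
        print(f"  {name} PC={pc} {op:<5} SP={sp}")
```

For "bbb" the last logged step belongs to T2, showing that the match is completed by the thread that split left waiting on the stack.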

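 44. /a|b/ Alternation
  0 split 1,3
  1 char 'a'
  2 jmp 4
  3 char 'b'
  4 match
  ‣ The instruction sequence of any regular expression can go in place of char 'a' and char 'b'

 45. /a+/ One or More Repetitions
  0 char 'a'
  1 split 0, 2
  2 match
  ‣ The split instruction does the work for repetition as well
  ‣ Any instruction sequence can go in place of char 'a'

 46. /a*/ Zero or More Repetitions
  0 split 1,3
  1 char 'a'
  2 jmp 0
  3 match
  ‣ In place of char 'a' ... (same as before)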
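Since any instruction sequence can stand in for char 'a' in the listings above, the compilation rules compose. One way to sketch this is with small builder functions that relocate jump targets when fragments are nested. The code is Python rather than the talk's PHP, and every name in it (`shift`, `lit`, `cat`, `opt`, `alt`, `plus`, `star`, `finish`) is hypothetical; the talk does not show how its compiler is structured.

```python
def shift(code, base):
    """Relocate a fragment by adding `base` to every jmp/split target."""
    out = []
    for op, *args in code:
        if op in ("jmp", "split"):
            args = [a + base for a in args]
        out.append((op, *args))
    return out


def lit(c):  # a single character
    return [("char", c)]


def cat(a, b):  # AB: join the two sequences
    return a + shift(b, len(a))


def opt(a):  # A?: either run A or skip past it
    return [("split", 1, len(a) + 1)] + shift(a, 1)


def alt(a, b):  # A|B
    start_b = len(a) + 2  # address where B's instructions begin
    return ([("split", 1, start_b)] + shift(a, 1)
            + [("jmp", start_b + len(b))] + shift(b, start_b))


def plus(a):  # A+: run A, then either loop back or move on
    return a + [("split", 0, len(a) + 1)]


def star(a):  # A*: either run A and loop back, or move on
    return [("split", 1, len(a) + 2)] + shift(a, 1) + [("jmp", 0)]


def finish(a):  # a match instruction goes at the end
    return a + [("match",)]


print(finish(alt(lit("a"), lit("b"))))
# [('split', 1, 3), ('char', 'a'), ('jmp', 4), ('char', 'b'), ('match',)]
```

Each fragment numbers its targets from its own address 0, and a target one past the fragment's end means "whatever comes next", so `shift` is the only adjustment needed when a fragment is embedded in a larger program.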

 47. /a*?/ Non-Greedy Zero or More Repetitions
  0 split 3,1
  1 char 'a'
  2 jmp 0
  3 match
  ‣ The arguments of split are reversed
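To see why swapping the split arguments changes greediness, it helps to have the VM report how far it got. The sketch below is in Python rather than the talk's PHP; the function name `run` and its return value (the SP at which match was reached) are assumptions for this example, chosen so the two programs can be compared.

```python
def run(program, text):
    """Return the SP at which `match` is reached, or None if the match fails."""
    threads = [(0, 0)]  # stack of (pc, sp) pairs
    while threads:
        pc, sp = threads.pop()
        while True:
            op, *args = program[pc]
            if op == "char":
                if sp < len(text) and text[sp] == args[0]:
                    pc, sp = pc + 1, sp + 1
                else:
                    break  # current thread disappears
            elif op == "jmp":
                pc = args[0]
            elif op == "split":
                threads.append((args[1], sp))  # second target waits
                pc = args[0]                   # first target is tried first
            elif op == "match":
                return sp
    return None


GREEDY = [("split", 1, 3), ("char", "a"), ("jmp", 0), ("match",)]  # /a*/
LAZY = [("split", 3, 1), ("char", "a"), ("jmp", 0), ("match",)]    # /a*?/

print(run(GREEDY, "aaa"))  # 3
print(run(LAZY, "aaa"))    # 0
```

The greedy program tries char 'a' first and only falls back to match when it fails, so it consumes all three characters. The non-greedy program tries match first and stops immediately with nothing consumed.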

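 48. Implementing It in PHP
  ‣ The VM alone took only about [...] lines to implement
  ‣ The code is posted at http://blog.asial.co.jp
  ‣ It is simple, so anyone who understands the mechanism can write one!

 49. Summary

 50. Summary
  ‣ Regular expression engine implementations fall broadly into two kinds: those that use a DFA and those that use a VM
  ‣ The basics of a VM-based regular expression engine are extremely simple
  ‣ Just four instructions, threads, and two registers
  ‣ Simple, yet able to express almost all regular expressions
  ‣ The implementation is easy, so try writing one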

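 51. The Source Material for This Talk
  ‣ An article titled "Regular Expression Matching: the Virtual Machine Approach"
  ‣ http://swtch.com/~rsc/regexp/regexp2.html
  ‣ It explains how a VM-based regular expression engine works and how to implement one
  ‣ Plain and easy to follow!

 52. Thank You for Listening
  @anatoo blog.anatoo.jp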