Learning How VM-Based Regular Expression Engines Work, with PHP

Slides from a talk given at PHP Conference Fukuoka.

久保田光則

June 27, 2015

Transcript

  1. PHP Conference Fukuoka
    Learning How Virtual-Machine-Based
    Regular Expression Engines Work, with PHP

  2. About me
    ‣ 久保田光則 (@anatoo)
    ‣ UI/UX designer and software engineer
    ‣ Representative of Aspective LLC
    ‣ http://aspective.io

  3. What I will talk about today
    ‣ How a virtual-machine-based regular expression engine works

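  4. Topics, in order
    ‣ Ways to implement a regular expression engine
    ‣ What a virtual-machine-based regular expression engine is
    ‣ The flow of the matching process
    ‣ The registers, threads, and instructions of the VM
    ‣ How regular expressions are compiled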

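  5. What you get out of this talk
    ‣ You will be surprised at how simple the mechanism of a regular expression engine is (I certainly was)
    ‣ You will be able to write (the VM of) a regular expression engine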

  6. Ways to implement a regular expression engine

  7. ‣ What ways are there to implement a regular expression engine in the first place?

  8. Two broad categories
    ‣ DFA-based implementations
    ‣ Virtual machine (VM) based implementations

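  9. What is a virtual-machine-based regular expression engine?

  10. The virtual-machine-based implementation
    ‣ Build a VM (virtual machine) that has instructions dedicated to regular expressions
    ‣ Compile the regular expression into instructions for that VM and execute them
    ‣ PCRE is implemented this way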

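  11. The flow of the matching process
    Parse the regular expression
    → Convert it into an instruction sequence for the virtual machine
    → Execute it on the virtual machine
    /(hoge|fuga)/ match "hoge"?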

  12. Parse the regular expression
    ‣ Take the regular expression as a string and parse its metacharacters
    /hoge?|fuga(piyo)*/

  13. Convert to an instruction sequence for the VM
    char 'h'
    char 'o'
    char 'g'
    char 'e'
    split 1, 6
    jmp 11

  14. Execute on the virtual machine
    char 'h'
    char 'o'
    char 'g'
    char 'e'
    split 1, 6
    jmp 11
    ‣ The VM interprets the instructions from the top down
    ‣ Result: it determines whether the given string matches

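  15. What is a virtual machine (VM) in the first place?

  16. The VMs you usually come across
    ‣ Virtualized versions of real computers
    ‣ These have little to do with this talk

  17. The kind of VM this talk is about
    ‣ A virtual machine designed for one specific purpose
    ‣ Examples: JVM, Zend Engine, YARV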

  18. The basic structure of the VM is simple
    ‣ Two kinds of registers
    ‣ Threads
    ‣ Four kinds of instructions

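  19. Registers
    PC (Program Counter): the position of the current instruction
    SP (String Pointer): the current position in the string

    ‣ They are the set of variables built into the VM
    ‣ Both hold 0 at the start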

  20. When PC holds 3
    ‣ The VM is currently looking at instruction number 3
    ‣ PC is incremented each time an instruction is read
    PC=3
    char 'h'
    char 'o'
    char 'g'
    char 'e'
    split 1, 6
    jmp 11

  21. When SP holds 2
    ‣ The VM is currently looking at character number 2 of the string being tried
    SP=2 "hogehoge"

  22. Threads
    ‣ A thread holds the registers (PC and SP)
    ‣ It is something like an execution context
    ‣ It has nothing to do with parallel processing
    Thread: PC, SP

  23. The initial state of the VM
    ‣ The VM starts out with exactly one thread
    ‣ The number of threads grows and shrinks as instructions are interpreted
    VM: Thread (the current thread)

  24. The VM and its threads
    ‣ The VM keeps its threads on a stack
    ‣ The VM operates on the registers of the topmost thread
    VM stack: Thread (top, the current thread), Thread, Thread
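
The registers, threads, and thread stack described above can be sketched as plain PHP data. This is only an illustration under assumptions: the class name `VmThread` and the variable names are hypothetical and do not come from the talk's own code.

```php
<?php
// Hypothetical class name; "VmThread" avoids clashing with Thread classes from extensions.
final class VmThread
{
    /** @var int Program Counter: position of the instruction being looked at */
    public $pc;
    /** @var int String Pointer: current position in the string being tried */
    public $sp;

    public function __construct(int $pc = 0, int $sp = 0)
    {
        $this->pc = $pc;
        $this->sp = $sp;
    }
}

// The VM starts with a single thread whose registers are both 0.
$stack = [new VmThread()];

// Instructions may add threads later; here one is pushed by hand for illustration.
$stack[] = new VmThread(5, 0);

// The VM only ever operates on the registers of the topmost thread.
$current = end($stack);
echo "threads: " . count($stack) . ", current PC={$current->pc} SP={$current->sp}\n";
```

A thread here is nothing more than a saved pair of registers, which is why it is unrelated to parallel processing.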

  25. The four instructions of the VM
    ‣ jmp: jump to the specified position
    ‣ char: try to match one character
    ‣ match: stop the VM and finish with a successful match
    ‣ split: split the thread in two

  26. The jmp instruction
    ‣ jmp x sets PC to position x
    ‣ In short, it is a goto
    jmp x


  27. PC=0, SP=0  →  jmp 5  →  PC=5, SP=0
    ‣ The PC register has been rewritten

  28. The char instruction
    ‣ Consumes the character x at the current position (SP)
    ‣ If it matches, SP increases by 1
    ‣ If it does not match, the current thread disappears
    char x


  29. ‣ When the string being tried is "aa"
    ‣ Executing the char instruction compares against the character at position SP (0);
      it matches, so both SP and PC increase
    PC=0, SP=0  →  char 'a'  →  PC=1, SP=1


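  30. ‣ When the string being tried is "bb"
    ‣ Executing the char instruction compares against the character at position SP (0);
      it does not match, so the current thread disappears
    PC=0, SP=0  →  char 'a'  →  thread discarded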

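  31. What happens when a thread disappears?
    ‣ The thread on top of the stack becomes the current thread, and instruction processing starts from it
    ‣ When the stack becomes empty, the VM stops: the match has failed
    VM stack: Thread, Thread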

  32. The match instruction
    ‣ Treats the regular expression match as complete and stops the VM (match succeeded)
    match

  33. The split instruction
    ‣ Splits the current thread in two and assigns x and y to the PC of each resulting thread
    ‣ A little hard to grasp, but the most important instruction
    split x, y


  34. PC=0, SP=2  →  split 1,5  →  PC=1, SP=2 (upper thread) and PC=5, SP=2 (lower thread)
    ‣ Once the copy has been made, the upper thread becomes the current thread
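
The effect of split on the thread stack can be sketched as below. The helper name `applySplit` and the array-based thread representation are assumptions made for this illustration, not part of the talk.

```php
<?php
// Hypothetical helper: apply "split x, y" to the thread on top of the stack.
// Threads are plain arrays ['pc' => int, 'sp' => int]; the end of the array is the stack top.
function applySplit(array $stack, int $x, int $y): array
{
    $current = array_pop($stack);
    // The copy that resumes at y is pushed first, so it waits underneath.
    $stack[] = ['pc' => $y, 'sp' => $current['sp']];
    // The copy that resumes at x ends up on top and becomes the current thread.
    $stack[] = ['pc' => $x, 'sp' => $current['sp']];
    return $stack;
}

// Same situation as the slide: one thread at PC=0, SP=2 executes "split 1,5".
$stack = applySplit([['pc' => 0, 'sp' => 2]], 1, 5);
foreach (array_reverse($stack) as $thread) {
    echo "PC={$thread['pc']} SP={$thread['sp']}\n"; // printed from the top of the stack down
}
```

Both copies keep the same SP, so the lower thread can later retry from the same position in the string if the upper one fails.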

  35. ‣ That is all there is to it.
    ‣ Most regular expressions can be expressed with just this
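
Putting the two registers, the thread stack, and the four instructions together gives a complete matcher. The following is a minimal sketch, not the code the speaker published: the function name `runVm` and the encoding of instructions as arrays such as `['char', 'a']` or `['split', 1, 3]` are assumptions made here, and strings are compared byte by byte.

```php
<?php
/**
 * Run a program on a backtracking VM, starting at position 0 of $input.
 * Returns true as soon as any thread reaches "match".
 * Instruction encoding (assumed): ['char', 'a'], ['jmp', 3], ['split', 1, 3], ['match'].
 */
function runVm(array $program, string $input): bool
{
    // The VM starts with one thread; both registers are 0.
    $stack = [['pc' => 0, 'sp' => 0]];

    while ($stack) {
        // The top of the stack becomes the current thread.
        $thread = array_pop($stack);
        $pc = $thread['pc'];
        $sp = $thread['sp'];

        while (true) {
            $inst = $program[$pc];
            $pc++; // PC is incremented every time an instruction is read
            switch ($inst[0]) {
                case 'char':
                    if ($sp < strlen($input) && $input[$sp] === $inst[1]) {
                        $sp++;      // consume the character
                        continue 2; // keep going with the same thread
                    }
                    break 2;        // mismatch: the current thread disappears
                case 'jmp':
                    $pc = $inst[1];
                    continue 2;
                case 'split':
                    // Keep the y branch on the stack and carry on with the x branch.
                    $stack[] = ['pc' => $inst[2], 'sp' => $sp];
                    $pc = $inst[1];
                    continue 2;
                case 'match':
                    return true;
                default:
                    throw new InvalidArgumentException("unknown instruction: {$inst[0]}");
            }
        }
    }
    return false; // the stack is empty: match failed
}

// Program for a? (split 1,2 / char 'a' / match)
$optionalA = [['split', 1, 2], ['char', 'a'], ['match']];
var_dump(runVm($optionalA, 'bbb')); // a? also matches the empty string at position 0
```

Continuing with the x branch directly, instead of pushing and immediately popping it, is equivalent to the stack behaviour on the slides and saves one stack operation per split.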

  36. How are regular expressions compiled?

  37. ‣ A tour of what the compiled output looks like
    ‣ From here on, let's act as a human regular expression VM and interpret the instructions one by one

  38. /a/
    0 char 'a'
    1 match
    ‣ Easy

  39. /abc/
    0 char 'a'
    1 char 'b'
    2 char 'c'
    3 match
    ‣ This one is easy too

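  40. /AB/ (concatenation)
    ‣ The instruction sequences can simply be joined together
    ‣ A match instruction is placed at the end

    instruction sequence for A
    instruction sequence for B
    match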

  41. /a?/ (optional)
    0 split 1,2
    1 char 'a'
    2 match
    ‣ The instruction sequence of any other regular expression can go where char 'a' is
    ‣ The split instruction is the key. Let's trace it step by step

  42. Now, let's be a human VM
    Thread  PC  Instruction  String  Execution
    T1      0   split 1,2    aaa     Creates T2 (PC=2)
    T1      1   char 'a'     aaa     Matches, so SP is incremented
    T1      2   match        aaa     Match complete
    ‣ When the string is "aaa", the match completes in T1

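  43. What if the string is "bbb"?
    Thread  PC  Instruction  String  Execution
    T1      0   split 1,2    bbb     Creates T2 (PC=2)
    T1      1   char 'a'     bbb     Character match fails: T1 disappears
    T2      2   match        bbb     Match complete
    ‣ The character match fails in T1, but match is executed in T2
    ‣ Result: the match succeeds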
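
The hand traces above can be reproduced mechanically. The sketch below is an assumption-laden illustration: `traceVm`, the thread labels T1, T2, ..., and the row format are all made up here, and instructions are encoded as arrays such as `['split', 1, 2]`.

```php
<?php
// Hypothetical tracer: runs a program and returns one text row per executed instruction.
function traceVm(array $program, string $input): array
{
    $rows = [];
    $nextId = 2; // threads created by split are labelled T2, T3, ...
    $stack = [['id' => 'T1', 'pc' => 0, 'sp' => 0]];

    while ($stack) {
        $t = array_pop($stack);
        while (true) {
            $inst = $program[$t['pc']];
            $text = rtrim($inst[0] . ' ' . implode(',', array_slice($inst, 1)));
            $row = sprintf('%s PC=%d SP=%d %s', $t['id'], $t['pc'], $t['sp'], $text);
            $t['pc']++;

            if ($inst[0] === 'match') {
                $rows[] = $row . ' -> match complete';
                return $rows;
            }
            if ($inst[0] === 'jmp') {
                $t['pc'] = $inst[1];
                $rows[] = $row . ' -> jump';
                continue;
            }
            if ($inst[0] === 'split') {
                $id = 'T' . $nextId++;
                $stack[] = ['id' => $id, 'pc' => $inst[2], 'sp' => $t['sp']];
                $t['pc'] = $inst[1];
                $rows[] = $row . " -> created {$id} (PC={$inst[2]})";
                continue;
            }
            // Anything else is treated as char: compare with the character at SP.
            if ($t['sp'] < strlen($input) && $input[$t['sp']] === $inst[1]) {
                $t['sp']++;
                $rows[] = $row . ' -> matched, SP incremented';
                continue;
            }
            $rows[] = $row . " -> mismatch, {$t['id']} dies";
            break; // fall back to the next thread on the stack
        }
    }
    $rows[] = 'no threads left -> match failed';
    return $rows;
}

// Trace a? against "bbb", the same case as the table above.
echo implode("\n", traceVm([['split', 1, 2], ['char', 'a'], ['match']], 'bbb')), "\n";
```

Running it on "aaa" instead shows all three steps staying in T1, mirroring the first table.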

  44. /a|b/ (alternation)
    0 split 1,3
    1 char 'a'
    2 jmp 4
    3 char 'b'
    4 match
    ‣ The instruction sequence of any regular expression can go where char 'a' and char 'b' are

  45. /a+/ (one or more repetitions)
    0 char 'a'
    1 split 0, 2
    2 match
    ‣ The split instruction also does the work for repetition
    ‣ Any instruction sequence can go where char 'a' is

  46. /a*/ (zero or more repetitions)
    0 split 1,3
    1 char 'a'
    2 jmp 0
    3 match
    ‣ Where char 'a' is... (same as before)
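
The compilation patterns shown so far (single character, concatenation, optional, alternation, and the two repetitions) can be generated from a syntax tree. The sketch below is an assumption, not the speaker's compiler: the node shapes (`['char', 'a']`, `['cat', A, B]`, `['alt', A, B]`, `['opt', A]`, `['plus', A]`, `['star', A]`) and the function names are invented for illustration, and the tree is built by hand instead of by a parser.

```php
<?php
// Compile one node into instructions, assuming its first instruction sits at address $base.
function compileNode(array $node, int $base): array
{
    switch ($node[0]) {
        case 'char':
            return [['char', $node[1]]];
        case 'cat': // A followed directly by B
            $a = compileNode($node[1], $base);
            $b = compileNode($node[2], $base + count($a));
            return array_merge($a, $b);
        case 'opt': // split L1,L2 / L1: A / L2:
            $a = compileNode($node[1], $base + 1);
            return array_merge([['split', $base + 1, $base + 1 + count($a)]], $a);
        case 'alt': // split L1,L2 / L1: A / jmp L3 / L2: B / L3:
            $a = compileNode($node[1], $base + 1);
            $l2 = $base + 1 + count($a) + 1;
            $b = compileNode($node[2], $l2);
            return array_merge([['split', $base + 1, $l2]], $a, [['jmp', $l2 + count($b)]], $b);
        case 'plus': // L1: A / split L1,L2 / L2:
            $a = compileNode($node[1], $base);
            return array_merge($a, [['split', $base, $base + count($a) + 1]]);
        case 'star': // L1: split L2,L3 / L2: A / jmp L1 / L3:
            $a = compileNode($node[1], $base + 1);
            $l3 = $base + 1 + count($a) + 1;
            return array_merge([['split', $base + 1, $l3]], $a, [['jmp', $base]]);
    }
    throw new InvalidArgumentException("unknown node type: {$node[0]}");
}

// Compile a whole expression: the node's instructions followed by a final match.
function compileRegex(array $ast): array
{
    $program = compileNode($ast, 0);
    $program[] = ['match'];
    return $program;
}

// a|b, written as a hand-built tree
foreach (compileRegex(['alt', ['char', 'a'], ['char', 'b']]) as $i => $inst) {
    echo $i, ' ', $inst[0], ' ', implode(',', array_slice($inst, 1)), "\n";
}
```

Passing the start address down lets every jump target be written as an absolute position, as on the slides. A repetition whose body can match the empty string, such as (a*)*, would loop forever on a simple backtracking VM, so this sketch is only meant for bodies that consume at least one character.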

  47. /a*?/ (non-greedy zero or more repetitions)
    0 split 3,1
    1 char 'a'
    2 jmp 0
    3 match
    ‣ The arguments of split are swapped
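
Swapping the split arguments changes which branch is tried first, which shows up in how much of the string is consumed. The following sketch assumes a variant of the VM, here called `matchEnd` (a hypothetical name), that returns the value of SP when match is reached instead of a plain yes or no.

```php
<?php
// Hypothetical variant of the VM: returns SP at the moment "match" executes, or null on failure.
function matchEnd(array $program, string $input): ?int
{
    $stack = [['pc' => 0, 'sp' => 0]];
    while ($stack) {
        $t = array_pop($stack);
        while (true) {
            $inst = $program[$t['pc']++];
            if ($inst[0] === 'match') {
                return $t['sp'];
            }
            if ($inst[0] === 'jmp') {
                $t['pc'] = $inst[1];
                continue;
            }
            if ($inst[0] === 'split') {
                // The first argument is tried first; the second waits on the stack.
                $stack[] = ['pc' => $inst[2], 'sp' => $t['sp']];
                $t['pc'] = $inst[1];
                continue;
            }
            // char
            if ($t['sp'] < strlen($input) && $input[$t['sp']] === $inst[1]) {
                $t['sp']++;
                continue;
            }
            break; // mismatch: this thread disappears
        }
    }
    return null;
}

$greedy    = [['split', 1, 3], ['char', 'a'], ['jmp', 0], ['match']]; // a*
$nonGreedy = [['split', 3, 1], ['char', 'a'], ['jmp', 0], ['match']]; // a*?

var_dump(matchEnd($greedy, 'aaa'));    // int(3): consumes every 'a' before matching
var_dump(matchEnd($nonGreedy, 'aaa')); // int(0): reaches match before consuming anything
```

The programs differ only in the order of the split arguments, so greediness in this design is purely a matter of thread priority.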

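  48. Implementing it in PHP
    ‣ The VM alone could be implemented in roughly … lines
    ‣ The code is posted at http://blog.asial.co.jp
    ‣ It is simple, so anyone who understands the mechanism can write one!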

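  49. Summary

  50. Summary
    ‣ Regular expression engine implementations fall into two broad categories: those that use a DFA and those that use a VM
    ‣ The basics of a VM-based regular expression engine are extremely simple
    ‣ Just four instructions, threads, and two registers
    ‣ Simple as it is, it can express almost all regular expressions
    ‣ It is easy to implement, so try writing one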

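  51. The source material for this talk
    ‣ A document titled "Regular Expression Matching: the Virtual Machine Approach"
    ‣ http://swtch.com/~rsc/regexp/regexp2.html
    ‣ It explains how VM-based regular expression engines work and how to implement them
    ‣ Plainly written and easy to follow!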

  52. Thank you for listening
    @anatoo / blog.anatoo.jp