Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Speaker Deck
PRO
Sign in
Sign up
for free
PHPで学ぶVM型正規表現エンジンの仕組み
久保田光則
June 27, 2015
Technology
8
6k
PHPで学ぶVM型正規表現エンジンの仕組み
PHPカンファレンス福岡での発表資料です。
久保田光則
June 27, 2015
Tweet
Share
More Decks by 久保田光則
See All by 久保田光則
anatoo
1
290
anatoo
0
2.6k
anatoo
11
19k
anatoo
8
8.4k
anatoo
6
8.6k
anatoo
4
1.3k
anatoo
16
13k
anatoo
27
17k
anatoo
20
19k
Other Decks in Technology
See All in Technology
sadayoshitada0919
0
290
harshbothra
0
120
iwashi86
54
24k
mmarukaw
0
1.9k
picardparis
4
2.3k
syoshie
0
440
sansandsoc
2
690
whitefox_73
0
190
andysumi
0
160
1027kg
0
200
binarymist
0
1.3k
line_developers
PRO
0
170
Featured
See All Featured
addyosmani
1346
190k
tenderlove
53
3.5k
eileencodes
113
25k
hatefulcrawdad
257
17k
sferik
610
55k
keithpitt
401
20k
marcelosomers
221
15k
danielanewman
200
20k
schacon
145
6.6k
jmmastey
10
610
samlambert
237
10k
holman
288
130k
Transcript
1)1ΧϯϑΝϨϯεԬ 1)1ͰֶͿԾϚγϯܕ ਖ਼نදݱΤϯδϯͷΈ
ࣗݾհ ‣ ٱอాޫଇ !BOBUPP ‣ 6*69σβΠφʔɺ ιϑτΣΞΤϯδχΞ ‣ "TQFDUJWF--$ද
‣ IUUQBTQFDUJWFJP
ࠓ͢͜ͱ ‣ ԾϚγϯܕਖ਼نදݱΤϯδϯͷΈ
ॱʹ͍ͯ͘͜͠ͱ ‣ ਖ਼نදݱΤϯδϯͷ࣮ํ๏ ‣ ԾϚγϯܕਖ਼نදݱΤϯδϯͱ ‣ ϚονϯάॲཧͷྲྀΕ ‣ 7.ͷ࣋ͭϨδελͱεϨουͱ໋ྩ ‣
ਖ਼نදݱͲͷΑ͏ʹίϯύΠϧ͞ΕΔ͔
ࠓͷΛฉ͘ͱͲ͏ͳΔ͔ ‣ ਖ਼نදݱΤϯδϯͷΈͬͯ͜Μͳʹ୯७ͳͷ͔ ͬͯͼͬ͘Γ͠·͢ ࢲͼͬ͘Γ͠·ͨ͠ ‣ ਖ਼نදݱΤϯδϯ ͷ7. ͕ॻ͚ΔΑ͏ʹͳΓ·͢
ਖ਼نදݱΤϯδϯͷ ࣮ํ๏
‣ ͦͦਖ਼نදݱΤϯδϯͷ ࣮ํ๏ʹͳʹ͕͋Δ
ͭʹେผ ‣ %'"ϕʔεͷ࣮ํ๏ ‣ ԾϚγϯ 7. ϕʔεͷ࣮ํ๏
ԾϚγϯϕʔεͷ ਖ਼نදݱΤϯδϯͱ
ԾϚγϯϕʔεͷ࣮ VM ‣ ਖ਼نදݱ༻ͷ໋ྩΛ࣋ͭ7. ԾϚγϯ Λߏங ‣ ਖ਼نදݱΛ7.͚ͷ໋ྩʹίϯύΠϧ࣮ͯ͠ߦ ‣ 1$3&ͷ࣮͜Ε
ϚονϯάॲཧͷྲྀΕ ਖ਼نදݱΛύʔε ԾϚγϯ༻ͷ໋ྩྻʹม ԾϚγϯͰ࣮ߦ /(hoge|fuga)/ match “hoge”?
ਖ਼نදݱΛύʔε ‣ ਖ਼نදݱͷจࣈྻΛड͚औͬͯϝλจࣈΛύʔε /hoge?|fuga(piyo)*/ /hoge?|fuga(piyo)*/
7.༻ͷ໋ྩྻʹม char ‘h’ char ‘o’ char ‘g’ char ‘e’ split
1, 6 jmp 11 a(piyo)*/
ԾϚγϯͰ࣮ߦ VM char ‘h’ char ‘o’ char ‘g’ char ‘e’
split 1, 6 jmp 11 ্͔Β ໋ྩղऍ ͍ͯ͘͠ ‣ ݁Ռ༩͑ΒΕͨจࣈྻ͕Ϛον͢Δ͔Λఆ
ͦͦԾϚγϯ 7. ͬͯͳʹ
ීஈΑ͘ݟ͔͚Δ7. ‣ ࣮ࡍͷίϯϐϡʔλΛԾԽͨ͠ͷ ‣ ࠓճͷͱ͋·Γؔ͋Γ·ͤΜ
ࠓճͷͷ7. ‣ ಛఆͷతͷͨΊʹઃܭ͞ΕͨԾతͳϚγϯ ‣ ྫ+7. ;FOE&OHJOF :"37 VM
7.ͷجຊߏγϯϓϧ Ϩδελ͕छྨ εϨου ໋ྩ͕छྨ
Ϩδελ PC SP จࣈྻͷݱࡏҐஔ 4USJOH1PJOUFS ໋ྩͷҐஔ 1SPHSBN$PVOUFS ‣ 7.ʹͦͳΘΔมηοτ ‣
࠷ॳͲͪΒʹ͕ೖ͍ͬͯΔ
1$ʹ͕ೖ͍ͬͯͨΒ ‣ ൪ͷ໋ྩΛࠓݟ͍ͯΔͱ͍͏͜ͱ ‣ ໋ྩΛಡΈ͜Ή͝ͱʹΠϯΫϦϝϯτ͢Δ PC=3 char ‘h’ char ‘o’
char ‘g’ char ‘e’ split 1, 6 jmp 11
41ʹ͕ೖ͍ͬͯͨΒ ‣ ࢼߦ͢Δจࣈྻͷ൪ͷจࣈΛࠓݟ͍ͯΔͱ͍͏ ͜ͱ SP=2 “hogehoge”
εϨου ‣ εϨουϨδελ 1$ͱ41 Λ࣋ͭ ‣ ࣮ߦ࣌ͷίϯςΩετΈ͍ͨͳͷ ‣ ฒྻॲཧͱؔͳ͍ Thread
PC SP
7.ͷ࠷ॳͷঢ়ଶ ‣ 7.࠷ॳҰ͚ͭͩεϨουΛ࣋ͬͯ࢝·Δ ‣ ໋ྩΛղऍ͢Δ͏ͪʹ૿ݮ͢Δ Thread VM ݱࡏͷεϨου
7.ͱεϨου ‣ 7.εϨουΛελοΫ͢Δ ‣ 7.Ұ൪্ͷεϨουͷϨδελΛૢ࡞͢Δ Thread Thread Thread VM ݱࡏͷεϨου
7.ʹඋΘΔͭͷ໋ྩ ‣ KNQ໋ྩࢦఆ͢ΔҐஔδϟϯϓ ‣ DIBS໋ྩจࣈͷϚονΛࢼߦ͢Δ ‣ NBUDI໋ྩ7.ΛࢭΊͯϚονྃ͢Δ ‣ TQMJU໋ྩεϨουΛׂ͢Δ
KNQ໋ྩ ‣ KNQYYͷҐஔʹ1$Λઃఆ͢Δ ‣ ཁ͢ΔʹHPUP jmp x
ਤ PC=0 SP=0 PC=5 SP=0 jmp 5 ‣ 1$Ϩδελ͕ॻ͖Θ͍ͬͯΔ
DIBS໋ྩ ‣ ݱࡏҐஔ 41 ͔ΒYͱ͍͏จࣈΛফඅ͢Δ ‣ Ϛονͨ͠Β41͕̍ͭ૿͑Δ ‣ Ϛον͠ͳ͔ͬͨΒݱࡏͷεϨουফ͑Δ char
x
ਤ ‣ ࢼߦ͍ͯ͠Δจࣈྻ͕zBBzͷͱ͖ ‣ DIBS໋ྩΛ࣮ߦ͢Δͱ41 ൪ͷจࣈͱൺֱͯ͠ Ϛον͢ΔͷͰ41ͱ1$͕૿͑Δ PC=0 SP=0
PC=1 SP=1 char ‘a’
ਤ ‣ ࢼߦ͍ͯ͠Δจࣈྻ͕zCCzͷͱ͖ ‣ DIBS໋ྩΛ࣮ߦ͢Δͱ41 ൪ͷจࣈͱൺֱͯ͠ Ϛον͠ͳ͍ͷͰݱࡏͷεϨου͕ফ͑Δ PC=0 SP=0
char ‘a’
εϨου͕ফ͑ΔͱͲ͏ͳΔ ‣ ελοΫͷҰ൪্ͷεϨου͕ݱࡏͷ εϨουʹͳ໋ͬͯྩͷॲཧ͕࢝·Δ ‣ ελοΫ͕ۭʹͳͬͨΒ7.ఀࢭϚονࣦഊ Thread Thread VM
NBUDI໋ྩ ‣ ਖ਼نදݱͷϚον͕ྃͨ͠ͱͯ͠ 7.ͷ࣮ߦΛࢭΊΔ Ϛονޭ match
TQMJU໋ྩ ‣ ݱࡏͷεϨουΛׂͯ͠ɺ ͦΕͧΕͷεϨουͷ1$ʹYͱZΛೖ͢Δ ‣ ͪΐͬͱΘ͔ΓͮΒ͍͚ͲҰେࣄͳ໋ྩ split x, y
ਤ PC=0 SP=2 PC=1 SP=2 split 1,5 PC=5 SP=2 ‣
ෳ͕ऴΘͬͨΒ্ͷεϨου͕ݱࡏͷεϨουʹͳΔ
‣ Ҏ্͜Ε͚ͩɻ ‣ ਖ਼نදݱͷେ͜ΕͰදݱՄೳ
ਖ਼نදݱͲ͏ ίϯύΠϧ͞ΕΔ͔
‣ ͲΜͳ෩ʹίϯύΠϧ͞ΕΔ͔հ ‣ ࠓ͔Βਖ਼نදݱͷਓؒ7.ʹͳͬͯ ҰݸҰݸ໋ྩΛղऍ͍͖ͯ͠·͠ΐ͏
B 0 char ‘a’ 1 match ‣ ؆୯
BCD 0 char ‘a’ 1 char ‘b’ 2 char ‘c’
3 match ‣ ͜Ε؆୯
" # ࿈݁ ‣ ໋ྩྻΛ୯७ʹܨ͛ΒΕΔ ‣ ࠷ޙʹNBUDI໋ྩΛஔ͘ match "
ͷ໋ྩྻ # ͷ໋ྩྻ
B Φϓγϣϯ 0 split 1,2 1 char ‘a’ 2 match
‣ DIBSbB`ͷͱ͜Ζʹଞͷਖ਼نදݱͷ໋ྩྻ͕ೖΕΒΕΔ ‣ TQMJU໋ྩ͕؊ɻҰݸҰݸ͍ͬͯ͜͏
͋͞ਓؒ7.ʹͳΖ͏ Thread PC SP Execution T1 0 split 1,2 aaa
T2(PC=2)࡞ T1 1 char ‘a’ aaa Ϛον͢ΔͷͰSPΛ૿͢ T1 2 match aaa Ϛονྃ ‣ จࣈྻ͕zBBBzͩͬͨ߹5ͰϚονྃ
จࣈྻ͕zCCCzͩͬͨΒ Thread PC SP Execution T1 0 split 1,2 bbb
T2(PC=2)࡞ T1 1 char ‘a’ bbb จࣈϚονࣦഊ: T1ফ͑Δ T2 2 match bbb Ϛονྃ ‣ 5ͰจࣈϚονࣦഊ͢Δ͕5ͰNBUDI͕࣮ߦ͞ΕΔ ‣ ݁ՌϚονޭ
BcCબ 0 split 1,3 1 char ‘a’ 2 jmp 4
3 char ‘b’ 4 match ‣ DIBSbB` DIBSbC`ͷͱ͜Ζʹҙͷਖ਼نදݱͷ໋ ྩྻ͕ೖΕΒΕΔ
B ݸҎ্܁Γฦ͠ 0 char ‘a’ 1 split 0, 2 2
match ‣ ܁Γฦ͠ͷϚονʹTQMJU໋ྩ͕׆༂ ‣ DIBSbB`ͷͱ͜Ζʹҙͷ໋ྩྻΛೖΕΒΕΔ
B ݸҎ্܁Γฦ͠ 0 split 1,3 1 char ‘a’ 2 jmp
0 3 match ‣ DIBSbB`ͷͱ͜Ζʹʜ ҎԼུ
B ඇᩦཉͳݸҎ্܁Γฦ͠ 0 split 3,1 1 char ‘a’ 2 jmp
0 3 match ‣ TQMJUͷҾ͕ٯʹͳͬͯΔ
1)1Ͱ࣮ͯ͠ΈͨΒ ‣ 7.͚ͩͩͱߦ͙Β͍Ͱ࣮Ͱ͖ͨ ‣ IUUQCMPHBTJBMDPKQʹίʔυΛܝࡌ ‣ ؆୯ͳͷͰΈ͕Θ͔Ε୭Ͱॻ͚Δʂ
·ͱΊ
·ͱΊ ‣ ਖ਼نදݱΤϯδϯͷ࣮๏%'"͏Γํͱ 7.͏Γํͷೋछྨʹେผ ‣ 7.ܕਖ਼نදݱΤϯδϯͷجຊࢸͬͯγϯϓϧ ‣ ໋ྩͭɺεϨουɺϨδελ͚ͭͩ ‣ γϯϓϧ͚ͩͲਖ਼نදݱΛ΄ͱΜͲදݱͰ͖Δ
‣ ࣮͕؆୯ͳͷͰॻ͍ͯΈΑ͏
ࠓճͷͷݩωλ ‣ 3FHVMBS&YQSFTTJPO.BUDIJOHUIF7JSUVBM.BDIJOF "QQSPBDIͱ͍͏จॻ ‣ IUUQTXUDIDPNdSTDSFHFYQSFHFYQIUNM ‣ 7.ܕਖ਼نදݱΤϯδϯͷΈͱ ࣮ʹ͍ͭͯղઆ͞Ε͍ͯΔ ‣
ฏқͰΘ͔Γ͍͢ʂ
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠ !BOBUPPCMPHBOBUPPKQ