Still wearing yourself out on regular expressions? (まだ正規表現で消耗してるの?)

Kenichiro Kishida

April 16, 2016

Transcript

  1. Still wearing yourself out on regular expressions?
    しずひこ (sizuhiko)
    PHPカンファレンス札幌 (PHP Conference Sapporo)
    Thinking up complex regular expressions was a waste of time.

  2. Kenichiro Kishida
    Tokyo, JAPAN
    sizuhiko@gmail.com @sizuhiko
    https://github.com/sizuhiko
    http://blog.open.tokyo.jp

  3. I — regular expressions

  4. /^(?P[a-zA-Z](?:[a-zA-Z0-9\+\-\.])*)\:(?P(?:\/\/(?
    P(?:(?P(?:(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z]){2}|
    [\!\$&'\(\)\*\+,;\=]|\:))*)@)?(?P(?:(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|
    25[0-5])\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|
    2[0-4][0-9]|25[0-5])\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])|(?:(?:[a-zA-
    Z0-9\-\._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\+,;\=]))+))(?:\:(?P(?:\d)
    +))?)(?P(?:\/(?:(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z]){2}|[\!
    \$&'\(\)\*\+,;\=]|\:|@))*)*)|(?P\/(?:(?:(?:[a-zA-Z0-9\-
    \._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\+,;\=]|\:|@))+(?:\/(?:(?:[a-zA-Z0-9\-
    \._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\+,;\=]|\:|@))*)+)?)|(?
    P(?:(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\
    +,;\=]|\:|@))+(?:\/(?:(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\
    +,;\=]|\:|@))*)*)|(?P^(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z])
    {2}|[\!\$&'\(\)\*\+,;\=]|\:|@))))(?:\?(?P(?:(?:(?:[a-zA-Z0-9\-\._~]|
    %(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\+,;\=]|\:|@)|\/|\?))*))?(?:#(?P(?:
    (?:(?:[a-zA-Z0-9\-\._~]|%(?:[0-9A-Z]){2}|[\!\$&'\(\)\*\+,;\=]|\:|@)|\/|
    \?))*))?$/
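The IPv4 part of this expression is the RFC 3986 dec-octet rule written out four times. As a sketch of the alternative the talk argues for, here is that piece defined once, with a name, and reused. It uses plain PHP strings rather than any builder library; DEC_OCTET and ipv4Pattern() are hypothetical names.

```php
<?php
// dec-octet per RFC 3986: 0-9 / 10-99 / 100-199 / 200-249 / 250-255
const DEC_OCTET = '(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])';

// Hypothetical helper:
// IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
function ipv4Pattern(): string
{
    $octets = array_fill(0, 4, DEC_OCTET);
    return '/^' . implode('\.', $octets) . '$/';
}

var_dump(preg_match(ipv4Pattern(), '192.168.0.1'));  // int(1)
var_dump(preg_match(ipv4Pattern(), '256.1.1.1'));    // int(0)
```

Naming the piece once makes it reviewable in isolation, which is exactly what the one-line version above does not allow.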

  5. (Repeats the regular expression from slide 4.)

  6. RFC3986
    Uniform Resource Identifier (URI): Generic Syntax

  7. URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    hier-part = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty
    https://www.ietf.org/rfc/rfc3986.txt
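RFC 3986 itself (Appendix B) offers a short regular expression that splits a URI reference into the five components of the URI rule above, without validating any of them. A PHP sketch of it with named subpatterns added; the group names and the splitUri() helper are my own choices, not part of the RFC.

```php
<?php
// RFC 3986 Appendix B regex, rewritten with named subpatterns.
// It only splits a URI into components; it does not validate them.
const URI_SPLIT = '~^(?:(?P<scheme>[^:/?#]+):)?(?://(?P<authority>[^/?#]*))?'
    . '(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?$~';

function splitUri(string $uri): array
{
    preg_match(URI_SPLIT, $uri, $m);
    // Keep only the named keys; trailing groups that did not match are absent.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(splitUri('http://example.com:80/a/b?x=1#top'));
```

Because this only splits, a string such as "a,b:c" still yields a scheme; checking each component against its own rule is the hard part the long expression on slide 4 tries to do in one go.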

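  8. • Do you copy and paste regular expressions you found through a search? Are they actually correct?
    • Can you review code that contains regular expressions written by someone else?
    • Can you maintain those regular expressions later?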

  9. Have you ever wished you could write, or read, them more easily?

  10. Regular
    Expressions
    made easy

  11. $regex = new VerbalExpressions;
    $regex->startOfLine()
    ->then("http")
    ->maybe("s")
    ->then("://")
    ->maybe("www.")
    ->anythingBut(" ")
    ->endOfLine();
    /^(?:http)(?:s)?(?:\:\/\/)(?:www\.)?(?:[^ ]*)$/m
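The pattern printed on the slide can be tried on its own with preg_match, without the VerbalExpressions library. A quick check with made-up sample strings:

```php
<?php
// The pattern shown on the slide, as generated by the VerbalExpressions chain.
$pattern = '/^(?:http)(?:s)?(?:\:\/\/)(?:www\.)?(?:[^ ]*)$/m';

$samples = ['https://www.example.com', 'http://example.com/path', 'ftp://example.com', 'http://a b'];
foreach ($samples as $url) {
    printf("%-26s %s\n", $url, preg_match($pattern, $url) ? 'match' : 'no match');
}
```

Note the m modifier: ^ and $ then match at line boundaries, so a multi-line string that contains one such line also matches.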

  12. $rfc3986 = new VerbalExpressions;
    // scheme
    $scheme = new VerbalExpressions;
    $scheme->add("http")->maybe("s")
    ->_or("ftp");
    $rfc3986->startOfLine()
    ->add($scheme)
    ->add("://");
    /^(?:\(\?\:http\)\(\?\:s\)\?\)\|\(\?\:ftp)(?:\:\/\/)/m
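That output is what you get when an already-built pattern is treated as literal text and escaped a second time. The sketch below reproduces it with PHP's preg_quote. That VerbalExpressions escapes nested input in exactly this way is an assumption, but the resulting string is identical to the one on the slide.

```php
<?php
// Inner expression as the nested $scheme object appears to render it
// (assumed from the slide output, not taken from the library source).
$inner = '(?:http)(?:s)?)|(?:ftp';

// Escaping it again turns every metacharacter into a literal character.
$escaped = '(?:' . preg_quote($inner, '/') . ')';
echo $escaped, "\n";

$regex = '/^' . $escaped . '(?:\:\/\/)/m';
var_dump(preg_match($regex, 'http://example.com'));  // int(0)
```

The resulting regex only matches the literal text of the inner pattern, so nesting silently breaks the match instead of raising an error.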

  13. • Pro: supports many languages
    • Pro: easy to write
    • Con: complex cases can't be expressed
    • Con: implementations vary from language to language

  14. http://www.kurtisrainboltgreene.name/hexpress/

  15. The hexpress gem is
    another take at the
    concept of
    "Verbal Hexpressions"
    in Ruby.

  16. pattern = Hexpress.new.
    start("http").
    maybe("s").
    with("://").
    maybe { words.with(".") }.
    find { matching { [word, "-"] }.multiple }.
    has(".").
    either("com", "org").
    maybe("/").
    ending

  17. To PHP
    https://github.com/sizuhiko/hexpress

  18. PHP (sizuhiko/hexpress):
    trait Find {
        public function find($value = null, $named = false) {
            $param = compact('value', 'named');
            return is_callable($value) ?
                $this->addNested(FindValue::class, $param) :
                $this->addValue(FindValue::class, $param);
        }
        public function capture($value = null) {
            return $this->find($value);
        }
    }
    class FindValue {
        use Nested;
        private $hexpression;
        private $open;
        private $close;
        public function __construct($param) {
            extract($param);
            $this->hexpression = is_callable($value) ?
                new Hexpress($value) : $value;
            $this->open = $named ?
                "(?P<{$named}>" : '(';
            $this->close = ')';
        }
    }
    Ruby (hexpress gem):
    class Hexpress
      def find(value = nil, &block)
        value ?
          add_value(Nested::Find, value) :
          add_nested(Nested::Find, &block)
      end
      alias_method :capture, :find
      module Nested
        class Find
          include Nested
          def initialize(value=nil,&block)
            @hexpression = value ||
              Hexpress.new.instance_eval(&block)
            @open, @close = "(", ")"
          end
        end
      end
    end
    With recent versions of PHP, a fairly faithful port is possible.
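The FindValue class above comes down to wrapping a sub-expression between an opening and a closing delimiter, where the opener becomes (?P<name> when a name is given. A self-contained sketch of only that idea, requiring PHP 8.0 for the union types; MiniHex and its with(), raw(), find() and toRegex() methods are hypothetical and are not the sizuhiko/hexpress API.

```php
<?php
// Hypothetical, stripped-down builder (NOT the hexpress API): it shows only
// the open/close wrapping that FindValue performs.
final class MiniHex
{
    /** @var string[] regex fragments, concatenated in order */
    private array $parts = [];

    // Literal text: escaped so metacharacters lose their meaning.
    public function with(string $literal): self
    {
        $this->parts[] = preg_quote($literal, '/');
        return $this;
    }

    // Already a regex fragment: appended unchanged.
    public function raw(string $regex): self
    {
        $this->parts[] = $regex;
        return $this;
    }

    // Like FindValue: '(' or a named opener starts the group, ')' closes it.
    public function find(Closure|string $value, string|false $named = false): self
    {
        if ($value instanceof Closure) {
            $nested = new self();
            $value($nested);              // the closure fills the nested builder
            $value = $nested->toRegex();
        }
        $open = $named === false ? '(' : "(?P<{$named}>";
        $this->parts[] = $open . $value . ')';
        return $this;
    }

    public function toRegex(): string
    {
        return implode('', $this->parts);
    }
}

// scheme ":" with the scheme captured under the name "scheme"
$hex = (new MiniHex())
    ->find(function (MiniHex $h) {
        $h->raw('[a-zA-Z]')->raw('[a-zA-Z0-9+\-.]*');
    }, 'scheme')
    ->with(':');

preg_match('/^' . $hex->toRegex() . '/', 'http://example.com/', $m);
echo $m['scheme'], "\n";  // http
```

Keeping the delimiters in the value object, rather than in the builder, is what lets nested expressions be wrapped without escaping them again.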

  19. (Repeats the RFC 3986 grammar from slide 7.)

  20. $this->hexpress
    ->start($this->scheme())
    ->with(':')
    ->has($this->hierPart())
    ->maybe($this->query())
    ->maybe($this->fragment())
    ->end();
    URI = scheme ":" hier-part
    [ "?" query ] [ "#" fragment ]
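The chain mirrors the URI rule term by term. The same shape can be kept with ordinary functions that each return the regex fragment for one ABNF rule. In this sketch hierPart(), query() and fragment() are hypothetical and deliberately much looser than RFC 3986.

```php
<?php
// One function per ABNF rule; each returns a regex fragment.
// Simplified on purpose: these are NOT full RFC 3986 rules.
function scheme(): string   { return '(?P<scheme>[a-zA-Z][a-zA-Z0-9+\-.]*)'; }
function hierPart(): string { return '(?P<hierpart>(?://[^/?#]*)?[^?#]*)'; }
function query(): string    { return '(?:\?(?P<query>[^#]*))'; }
function fragment(): string { return '(?:#(?P<fragment>.*))'; }

// URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
function uri(): string
{
    return '~^' . scheme() . ':' . hierPart() . query() . '?' . fragment() . '?$~';
}

preg_match(uri(), 'http://example.com/a?b=1#c', $m);
echo $m['scheme'], ' | ', $m['hierpart'], ' | ', $m['query'], ' | ', $m['fragment'], "\n";
// http | //example.com/a | b=1 | c
```

Each rule can then be tightened and tested separately without touching the top-level uri() function.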

  21. private function scheme()
    {
        return (new Hexpress())
            ->find(function ($hex) {
                $hex->matching(function ($hex) { $hex->letter(); });
                $hex->many(function ($hex) {
                    $hex->matching(function ($hex) {
                        $hex->letter()->number()->with('+-.');
                    });
                }, 0);
            }, 'scheme');
    }
    scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
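The character class in this rule hides a classic pitfall: written naively as [a-zA-Z0-9+-.], the "+-." part is read as a range from + to . and also admits a comma. A quick PHP check, using preg_quote to escape the literal characters; whether hexpress's with('+-.') escapes its argument like this is an assumption the slide does not show.

```php
<?php
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
$naive = '/^[a-zA-Z][a-zA-Z0-9+-.]*$/';   // "+-." is parsed as the range 0x2B-0x2E
$safe  = '/^[a-zA-Z][a-zA-Z0-9' . preg_quote('+-.', '/') . ']*$/';

foreach (['svn+ssh', 'coap+tcp', 'x-y.z', 'a,b'] as $s) {
    printf("%-9s naive=%d safe=%d\n", $s, preg_match($naive, $s), preg_match($safe, $s));
}
```

Only "a,b" separates the two: the naive class accepts it because the comma (0x2C) lies inside the accidental range.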

  22. A feature unique to the PHP version:
    named subpatterns

  23. private function scheme()
    {
        return (new Hexpress())
            ->find(function ($hex) {
                $hex->matching(function ($hex) { $hex->letter(); });
                $hex->many(function ($hex) {
                    $hex->matching(function ($hex) {
                        $hex->letter()->number()->with('+-.');
                    });
                }, 0);
            }, 'scheme');
    }
    scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    The second argument of the find method lets you specify the subpattern's name.

  24. preg_match(
        $pattern->toRegExp(),
        'http://example.com:80/',
        $matches);
    echo $matches['scheme'];
    #=> 'http'
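With a named subpattern, preg_match stores each capture under both its name and its group number. A standalone sketch, with a hand-written pattern standing in for $pattern->toRegExp() (the pattern is an assumption, not the library's output):

```php
<?php
// Hand-written stand-in for the generated pattern: a named "scheme" group.
$regex = '/^(?P<scheme>[a-zA-Z][a-zA-Z0-9+\-.]*):/';

// Argument order: pattern first, then subject, then the matches array.
if (preg_match($regex, 'http://example.com:80/', $matches) === 1) {
    echo $matches['scheme'], "\n";  // http
    echo $matches[1], "\n";         // http (same group, numeric key)
}
```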

  25. • No more wearing yourself out on regular expressions!
    • Porting from the Ruby ecosystem has become easy, too!!
    • Don't give up!!!

  26. Visit my blog / GitHub
    https://github.com/sizuhiko
    http://blog.open.tokyo.jp
    @sizuhiko #phpstudy 2016/3/30