Regular Expression in Android and Java

Android and Java expose the same regular expression API and the same classes, but they do not behave the same way. These slides explain what differs and why.

Keishin Yokomaku

October 13, 2015

Transcript

  1. Regexp in Android and Java
    Keishin Yokomaku @ Drivemode, Inc.
    potatotips #22

  2. @KeithYokoma
    • Keishin Yokomaku at Drivemode, Inc.
    • Work
      • Android apps
      • Android Training and its publication
    • Like
      • Bicycle, Photography, Tumblr and Motorsport

  3. Pattern.compile()

  4. Android is not Java
    • Android and Java each have their own implementation of the regexp API.
    • Same regexp, different result.
    • Write once, not run anywhere (ˑ ‿ ˑ)

  5. What’s the matter?

  6. Test env vs. runtime
    • If your tests run on the JVM (e.g. with Robolectric)…
    • Some patterns pass the test, but won’t work at runtime.
    • Some patterns work at runtime, but won’t pass the test.

  7. ʉ\ʊ(π)ʊ/ʉ

  8. Differences in detail

  9. Supported flags
    • Java
      • All flags defined in Pattern are supported.
    • Android
      • Only CASE_INSENSITIVE, COMMENTS, DOTALL, LITERAL, MULTILINE, UNICODE_CASE and UNIX_LINES are supported.
      • If any other flag is set, a RuntimeException is thrown: UnsupportedOperationException for CANON_EQ, IllegalArgumentException for the rest.

  10. Android Pattern
    public class Pattern {
        private Pattern(String pattern, int flags) throws PatternSyntaxException {
            if ((flags & CANON_EQ) != 0) {
                throw new UnsupportedOperationException("CANON_EQ flag not supported");
            }
            int supportedFlags = CASE_INSENSITIVE | COMMENTS | DOTALL |
                    LITERAL | MULTILINE | UNICODE_CASE | UNIX_LINES;
            if ((flags & ~supportedFlags) != 0) {
                throw new IllegalArgumentException("Unsupported flags: " + (flags & ~supportedFlags));
            }
            this.pattern = pattern;
            this.flags = flags;
            compile();
        }
    }
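
The flag check in the Android constructor can be mirrored in application code, so that a non-portable flag fails in JVM unit tests as well as on the device. The following is a minimal sketch under that assumption; the class name `PortableFlags` and both method names are hypothetical and are not part of either platform's API. The supported-flag set is copied from the constructor shown on this slide and may not match every Android release.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Android or Java API.
final class PortableFlags {
    // Flags accepted by the Android constructor shown on the slide (an assumption for other releases).
    static final int ANDROID_SUPPORTED_FLAGS =
            Pattern.CASE_INSENSITIVE | Pattern.COMMENTS | Pattern.DOTALL
            | Pattern.LITERAL | Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.UNIX_LINES;

    // Returns the bits of flags that fall outside the supported set (0 if all are supported).
    static int unsupportedOnAndroid(int flags) {
        return flags & ~ANDROID_SUPPORTED_FLAGS;
    }

    // Compiles the pattern only if every flag is in the supported set,
    // so the JVM rejects the same flags the slide's Android code rejects.
    static Pattern compilePortable(String regex, int flags) {
        int rejected = unsupportedOnAndroid(flags);
        if (rejected != 0) {
            throw new IllegalArgumentException("Flags not portable to Android: " + rejected);
        }
        return Pattern.compile(regex, flags);
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE is in the supported set, so this compiles and matches.
        System.out.println(compilePortable("abc", Pattern.CASE_INSENSITIVE).matcher("ABC").matches());
        // CANON_EQ is outside the supported set, so its bit is reported back.
        System.out.println(unsupportedOnAndroid(Pattern.CANON_EQ) == Pattern.CANON_EQ);
    }
}
```

Routing every `Pattern.compile()` call with flags through a guard like this is one way to make a JVM test fail early instead of discovering the exception at runtime on a device.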

  13. Java Pattern
    public class Pattern {
        private void compile() {
            if (has(CANON_EQ) && !has(LITERAL)) {
                normalize();
            } else {
                normalizedPattern = pattern;
            }
            patternLength = normalizedPattern.length();
            // Copy pattern to int array for convenience
            // Use double zero to terminate pattern
            temp = new int[patternLength + 2];
            hasSupplementary = false;
            int c, count = 0;
            // Convert all chars into code points
            for (int x = 0; x < patternLength; x += Character.charCount(c)) {
                c = normalizedPattern.codePointAt(x);
                if (isSupplementary(c)) {
                    hasSupplementary = true;
                }
                temp[count++] = c;
            }
            patternLength = count; // patternLength now in code points
            // ……
        }
    }

  14. Character class
    • Java
      • Predefined character classes such as \d and \w match only single-byte (ASCII) characters by default.
    • Android
      • The same classes match both single-byte and multi-byte characters.
    • Details here: http://bit.ly/1R73wkM
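
One way to sidestep the character class difference is to spell out the intended set instead of relying on a shorthand such as `\d`. The sketch below uses digits as the example; the class name `DigitClasses` and its method names are hypothetical. It assumes that an explicit range like `[0-9]` and a Unicode property like `\p{Nd}` are interpreted the same way by both engines, which is worth verifying on a device.

```java
import java.util.regex.Pattern;

// Hypothetical example class; only java.util.regex.Pattern is a real API here.
final class DigitClasses {
    // Explicit ASCII range: does not depend on how the engine defines \d.
    static final Pattern ASCII_DIGITS = Pattern.compile("[0-9]+");
    // Explicit Unicode property: decimal digits from any script.
    static final Pattern UNICODE_DIGITS = Pattern.compile("\\p{Nd}+");
    // Shorthand whose meaning depends on the engine.
    static final Pattern SHORTHAND_DIGITS = Pattern.compile("\\d+");

    static boolean isAsciiNumber(String s) {
        return ASCII_DIGITS.matcher(s).matches();
    }

    static boolean isUnicodeNumber(String s) {
        return UNICODE_DIGITS.matcher(s).matches();
    }

    static boolean matchesShorthand(String s) {
        return SHORTHAND_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String fullWidth = "\uFF11\uFF12\uFF13"; // full-width digits 1, 2, 3
        System.out.println(isAsciiNumber("123"));       // true
        System.out.println(isAsciiNumber(fullWidth));   // false
        System.out.println(isUnicodeNumber(fullWidth)); // true
        // false on a standard JVM; per the slide, Android is expected to differ here.
        System.out.println(matchesShorthand(fullWidth));
    }
}
```

Choosing between the ASCII and the Unicode variant is then an explicit decision in the pattern rather than a side effect of the platform.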

  15. Regular expression engines
    • Java
      • java.util.regex engine
      • Conforms to Unicode Technical Standard #18 Level 1, plus RL2.1 “Canonical Equivalents”.
    • Android
      • ICU (International Components for Unicode) engine
      • Conforms to Unicode Technical Standard #18 Level 1, plus Default Word Boundaries and Name Properties from Level 2.

  16. Canonical Equivalents
    • Canonically equivalent code point sequences are assumed to have the same appearance and meaning when printed or displayed.
    • e.g. “ü” (U+00FC) and “u” followed by a combining diaeresis (U+0308) are canonically equivalent.
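
Since CANON_EQ is rejected on Android, one possible workaround is to normalize both the pattern and the input to the same Unicode form before matching, without passing the flag. The sketch below uses `java.text.Normalizer` with NFC; the class name `CanonicalMatch` and its method names are hypothetical. It is only a rough substitute: it assumes the pattern is close to literal text, since normalizing an arbitrary regex string could change its meaning.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper; a rough substitute for CANON_EQ, not an equivalent.
final class CanonicalMatch {
    // Normalizes to NFC so canonically equivalent sequences become identical strings.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Normalizes pattern and input to the same form, then matches with no flags.
    static boolean matchesNormalized(String regex, String input) {
        return Pattern.compile(nfc(regex)).matcher(nfc(input)).matches();
    }

    public static void main(String[] args) {
        String precomposed = "\u00FC"; // u with diaeresis as a single code point
        String decomposed = "u\u0308"; // u followed by a combining diaeresis
        System.out.println(precomposed.equals(decomposed));              // false
        System.out.println(Pattern.matches(precomposed, decomposed));    // false
        System.out.println(matchesNormalized(precomposed, decomposed));  // true
    }
}
```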

  17. Android is not Java
    ʉ\ʊ(π)ʊ/ʉ

  18. Regexp in Android and Java
    Keishin Yokomaku @ Drivemode, Inc.
    potatotips #22
