Strings in Pharo — ESUG 2015

We examined the API of the String class in Pharo, to identify good idioms and bad smells among its methods.

Paper presentation at the International Workshop on Smalltalk Technology, during the ESUG conference in Brescia, July 2015
http://www.esug.org/wiki/pier/Conferences/2015

Photos from http://www.pexels.com

Damien Pollet

July 15, 2015

Transcript

  1. a first look at
    Strings in Pharo
    Damien Pollet — Inria Lille
    International Workshop on Smalltalk Technology — ESUG 2015, Brescia

  2. A First Analysis of String APIs:
    the Case of Pharo
    Damien Pollet Stéphane Ducasse
    RMoD — Inria & Université Lille 1
    Abstract
    Most programming languages natively provide an abstraction of character strings. However, it is difficult to assess the design or the API of a string library. There is no comprehensive analysis of the needed operations and their different variations. There are no real guidelines about the different forces in presence and how they structure the design space of string manipulation. In this article, we harvest and structure a set of criteria to describe a string API. We propose an analysis of the Pharo 4 String library as a first experience on the topic.
    Keywords
    Strings, API, Library, Design, Style
    … case of strings, however, these characteristics are particularly hard to reach, due to the following design constraints.
    For a single data type, strings tend to have a large API: in Ruby, the String class provides more than 100 methods, in Java more than 60, and Python’s str around 40. In Pharo, the String class alone understands 319 distinct messages, not counting inherited methods. While a large API is not always a problem per se, it shows that strings have many use cases, from concatenation and printing to search-and-replace, parsing, natural or domain-specific languages. Unfortunately, strings are often abused to eschew proper modeling of structured data, resulting in inadequate serialized representations …

  4. Using strings feels
    T E D I O U S…
    Why?

  11. Not enough methods, maybe?
    Objective C 85
    Java 60
    Ruby 100
    Python 40
    Haskell 4
    Pharo 319
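
Counts like these can be reproduced in a live runtime. A minimal sketch in Ruby, assuming the count is taken over the public instance methods that String defines itself; the helper names are hypothetical, and the exact figures vary between Ruby versions, so none are quoted here.

```ruby
# Count the public instance methods a class defines itself.
# Passing false leaves out everything inherited from superclasses and mixins.
def own_method_count(klass)
  klass.instance_methods(false).size
end

# Same count, this time including inherited methods.
def total_method_count(klass)
  klass.instance_methods(true).size
end

puts "String defines #{own_method_count(String)} methods itself"
puts "String responds to #{total_method_count(String)} methods in total"
```

The second figure is the one a user actually faces when browsing the class, which is why inherited methods matter for the "tedious" feeling even when they are left out of the headline count.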

  16. Concatenation
    Objective C [@"Hello" stringByAppendingString: @"_world"]
    Java "Hello" + "_world"
    Ruby "Hello" + "_world"
    Pharo 'Hello' , '_world'
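
Even this simplest operation hides a design choice: does it return a new string or change the receiver? A small Ruby sketch (the helper names `joined` and `appended` are hypothetical) contrasting the two built-in operators:

```ruby
# `+` builds a new string and leaves both operands untouched.
def joined(a, b)
  a + b
end

# `<<` appends in place: the receiver is modified and returned.
def appended(a, b)
  a << b
end

hello = +"Hello"            # unary plus yields an unfrozen, mutable string
joined(hello, "_world")     # => "Hello_world", hello is still "Hello"
appended(hello, "_world")   # => "Hello_world", and hello has changed too
```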

  21. Extraction
    Objective C [@"abcdef" substringWithRange: NSMakeRange(2, 4)]
    Java "abcdef".substring(2, 4)
    Ruby "abcdef"[2, 4]
    Pharo 'abcdef' copyFrom: 3 to: 5
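
The four calls look alike but follow different index conventions, so they do not return the same substring. A sketch that emulates each convention with Ruby's own indexing; the helper names are hypothetical, and only the index arithmetic is modeled, not each language's error behavior on out-of-range arguments.

```ruby
# Objective C: NSMakeRange(location, length), zero-based.
def objc_substring(s, location, length)
  s[location, length]
end

# Java: substring(beginIndex, endIndex), zero-based, end exclusive.
def java_substring(s, begin_index, end_index)
  s[begin_index...end_index]
end

# Ruby: s[start, length], zero-based.
def ruby_substring(s, start, length)
  s[start, length]
end

# Pharo: copyFrom: first to: last, one-based, both ends inclusive.
def pharo_copy_from_to(s, first, last)
  s[(first - 1)..(last - 1)]
end

objc_substring("abcdef", 2, 4)      # => "cdef"
java_substring("abcdef", 2, 4)      # => "cd"
ruby_substring("abcdef", 2, 4)      # => "cdef"
pharo_copy_from_to("abcdef", 3, 5)  # => "cde"
```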

  23. http://www.confidentruby.com

  24. well, aren't strings just…
    objects?

  25. strings ⇄ domain objects
    parsing (strings → domain objects)
    serialization (domain objects → strings)

  26. well, aren't strings just…
    collections?

  27. Feature overlap
    Locating & Extracting
      what: characters, substrings?
      how: index, range, pattern?
    Splitting & Merging
      separator?
    Substituting
      one occurrence, or all?
      eagerly or lazily?
    Testing & Matching
    Converting
      to other strings
      to other types
    Iterating
      byte ≠ codepoint ≠ character

  29. More than indices
    Ruby's indexing operator (square brackets):
    my_string[index]
    my_string[from, length]
    my_string[from..to]
    my_string[/reg(exp)+/]
    my_string[-index]
    my_string['substring']
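
A concrete run of each form on one receiver. The helper `index_forms`, the sample string and the pattern are chosen for illustration only.

```ruby
# Apply every form of String#[] to the same receiver.
def index_forms(s)
  {
    index:     s[1],        # single character at a zero-based index
    from_len:  s[1, 3],     # start index and length
    range:     s[1..3],     # inclusive range of indices
    regexp:    s[/c(d)+/],  # first match of a regular expression
    negative:  s[-1],       # index counted from the end
    substring: s["cd"],     # the substring itself, if present
    missing:   s["xy"]      # nil when the substring is absent
  }
end

p index_forms("abcdef")
```

One operator thus covers locating, extracting and testing, with `nil` serving as the sentinel for every "not found" case.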

  30. Idioms
    that I expected to find in…

  31. Layers of convenience
    trimLeft:right: (canonical: both sides explicit)
    trimBoth:, trimLeft:, trimRight: (one explicit predicate block, one implicit: same or no trim)
    trimBoth, trimLeft, trimRight (both sides implicit: trim whitespace)
    trim, trimmed (concise, fluent name)
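
The layering can be sketched in Ruby as follows. The `Trim` module and its method names are hypothetical, written to mirror the selectors above; this is neither Pharo's implementation nor an existing Ruby API. Ruby's default arguments fold the "one explicit predicate" and "both sides implicit" layers into a single method each.

```ruby
module Trim
  WHITESPACE = ->(ch) { ch.match?(/\s/) }  # default predicate
  NOTHING    = ->(_ch) { false }           # "do not trim this side"

  module_function

  # Canonical layer: both predicates explicit (cf. trimLeft:right:).
  def trim_left_right(s, left, right)
    chars = s.chars
    first = chars.index { |ch| !left.call(ch) }
    last  = chars.rindex { |ch| !right.call(ch) }
    return "" if first.nil? || last.nil? || last < first
    s[first..last]
  end

  # Convenience layers: one predicate, defaulting to whitespace.
  def trim_both(s, pred = WHITESPACE)
    trim_left_right(s, pred, pred)
  end

  def trim_left(s, pred = WHITESPACE)
    trim_left_right(s, pred, NOTHING)
  end

  def trim_right(s, pred = WHITESPACE)
    trim_left_right(s, NOTHING, pred)
  end

  # Concise name for the most common case.
  def trim(s)
    trim_both(s)
  end
end

Trim.trim("  hi  ")                                # => "hi"
Trim.trim_left("  hi  ")                           # => "hi  "
Trim.trim_both("xxhixx", ->(ch) { ch == "x" })     # => "hi"
```

Every convenience method delegates to the canonical one, so the behavior is defined in exactly one place.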

  37. Layers of convenience
    QUIZZ: what's the difference?

  38. Sentinel values
    Sentinel index
      zero?
      length + 1
    Depends on use-case…
      raise exception, return null object, maybe?
    Pluggable sentinel case
      indexOf: aCharacter startingAt: index ifAbsent: aBlock
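
A pluggable sentinel lets the caller decide what "not found" means. A sketch in Ruby, where `index_of` is a hypothetical helper modeled on the `ifAbsent:` selector above; it is zero-based as usual in Ruby, and raising `IndexError` when no block is given is an assumption borrowed from the way `Array#fetch` behaves.

```ruby
# Return the index of char in string, searching from start.
# When char is absent, the caller's block supplies the result;
# without a block, absence is reported as an error.
def index_of(string, char, start = 0)
  found = string.index(char, start)
  return found unless found.nil?
  return yield if block_given?
  raise IndexError, "#{char.inspect} not found in #{string.inspect}"
end

index_of("abcdef", "d")              # => 3
index_of("abcdef", "z") { -1 }       # => -1, a sentinel index
index_of("abcdef", "z") { 6 }        # => 6, i.e. the length as sentinel
index_of("abcdef", "a", 1) { nil }   # => nil, a null result
```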

  40. Smells
    Imperative style
      indices everywhere, e.g. copyReplaceFrom:to:with:
    Ad-hoc behavior
      stemAndNumericSuffix, endsWithDigit
    Redundancies
      findAnySubStr:startingAt: vs. findDelimiters:startingAt:
    Conversion
      asSymbol, asInteger, asDate; asLowercase, asHTMLString
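
To make the "imperative style" smell concrete, here is a Ruby sketch with two hypothetical helpers: one takes positions that the caller must compute first, the other only names what to replace.

```ruby
# Index-based: zero-based, both ends inclusive. The caller has to know
# where the target sits before asking for the replacement.
def replace_by_indices(s, from, to, replacement)
  s[0...from] + replacement + (s[(to + 1)..-1] || "")
end

# Content-based: say what to replace, not where it is.
def replace_first(s, target, replacement)
  s.sub(target, replacement)
end

replace_by_indices("abcdef", 2, 3, "XY")  # => "abXYef"
replace_first("abcdef", "cd", "XY")       # => "abXYef"
```

The first form forces every call site to do index arithmetic, which is where off-by-one errors tend to creep in.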

  41. Mutability
    Let's talk about literals:
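
As a generic illustration in Ruby (not the example shown on the slide), a shared string stays safe only if it is immutable or copied before modification. The constant and helper names are hypothetical.

```ruby
GREETING = "hello".freeze   # one shared, immutable string

# Copy before modifying, so the shared string is left untouched.
def shout(s)
  s.dup << "!"
end

# Report whether appending in place is refused for this string.
def frozen_append_fails?(s)
  s << "!"
  false
rescue FrozenError
  true
end

shout(GREETING)                 # => "hello!", GREETING is still "hello"
frozen_append_fails?(GREETING)  # => true
```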

  43. Where to go from here?
    Idioms more general than strings
      how to document & ensure completeness?
      lint rules? pragmas? method protocols? (if only they worked like tags…)
    Improving composability
      indices everywhere! imperative style!
      iterators, transducers? rethink collections as well?
    Mutability vs sharing
      slices / views, ropes
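
One possible reading of "iterators" as a route to composability, sketched with Ruby's lazy enumerators; the helper name and the sample input are hypothetical.

```ruby
# First n vowels of s, upcased, computed lazily with no index arithmetic.
def first_vowels_upcased(s, n)
  s.each_char.lazy
   .select { |ch| "aeiou".include?(ch) }
   .map(&:upcase)
   .first(n)
end

first_vowels_upcased("international workshop", 3)  # => ["I", "E", "A"]
```

Each stage is an independent step over characters, and the lazy chain stops reading the string as soon as `n` results are available.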

  44. A First Analysis of String APIs:
    the Case of Pharo
    Damien Pollet Stéphane Ducasse
    RMoD — Inria & Université Lille 1
    README