Strings in Pharo — ESUG 2015

We examined the API of the String class in Pharo to identify good idioms and bad smells among its methods.

Paper presentation at the International Workshop on Smalltalk Technology, during the ESUG conference in Brescia, July 2015
http://www.esug.org/wiki/pier/Conferences/2015

Photos from http://www.pexels.com

Damien Pollet

July 15, 2015

Transcript

  1. A first look at strings in Pharo. Damien Pollet, Inria Lille

    International Workshop on Smalltalk Technology, ESUG 2015, Brescia
  2. A First Analysis of String APIs: the Case of Pharo

    Damien Pollet, Stéphane Ducasse. RMoD, Inria & Université Lille 1. damien.pollet@inria.fr

    Abstract. Most programming languages natively provide an abstraction of character strings. However, it is difficult to assess the design or the API of a string library. There is no comprehensive analysis of the needed operations and their different variations. There are no real guidelines about the different forces at play and how they structure the design space of string manipulation. In this article, we harvest and structure a set of criteria to describe a string API. We propose an analysis of the Pharo 4 String library as a first experiment on the topic.

    Keywords: Strings, API, Library, Design, Style

    […] case of strings, however, these characteristics are particularly hard to reach, due to the following design constraints. For a single data type, strings tend to have a large API: in Ruby, the String class provides more than 100 methods, in Java more than 60, and Python's str around 40. In Pharo¹, the String class alone understands 319 distinct messages, not counting inherited methods. While a large API is not always a problem per se, it shows that strings have many use cases, from concatenation and printing to search-and-replace, parsing, natural or domain-specific languages. Unfortunately, strings are often abused to eschew proper modeling of structured data, resulting in inadequate serialized representations […]
  4. Using strings feels… TEDIOUS. Why?
  5–11. Not enough methods, maybe?

    Number of methods in each language's string API: Objective C 85, Java 60, Ruby 100, Python 40, Haskell 4, Pharo 319.
  12–16. Concatenation

    Objective C: [@"Hello" stringByAppendingString: @"_world"]
    Java: "Hello" + "_world"
    Ruby: "Hello" + "_world"
    Pharo: 'Hello' , '_world'
  17–21. Extraction

    Objective C: [@"abcdef" substringWithRange: NSMakeRange(2, 4)]
    Java: "abcdef".substring(2, 4)
    Ruby: "abcdef"[2, 4]
    Pharo: 'abcdef' copyFrom: 3 to: 5
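The four extraction calls above do not select the same characters: each language picks its own convention for where indices start and whether the second argument is a length or an end position. A minimal Python sketch that re-expresses each call as a Python slice, assuming the documented semantics of each API (NSRange location and length, Java's exclusive end index, Ruby's start and length form, Pharo's 1-based inclusive copyFrom:to:). The helper names are hypothetical, and negative or out-of-range indices are not handled.

```python
def objc_substring_with_range(s: str, location: int, length: int) -> str:
    # NSMakeRange(location, length): 0-based start, second argument is a length
    return s[location:location + length]


def java_substring(s: str, begin: int, end: int) -> str:
    # String.substring(begin, end): 0-based start, end index is exclusive
    return s[begin:end]


def ruby_start_length(s: str, start: int, length: int) -> str:
    # str[start, length]: 0-based start, second argument is a length
    return s[start:start + length]


def pharo_copy_from_to(s: str, start: int, stop: int) -> str:
    # copyFrom: start to: stop: 1-based indices, stop is inclusive
    return s[start - 1:stop]


for name, result in [
    ("Objective C", objc_substring_with_range("abcdef", 2, 4)),
    ("Java", java_substring("abcdef", 2, 4)),
    ("Ruby", ruby_start_length("abcdef", 2, 4)),
    ("Pharo", pharo_copy_from_to("abcdef", 3, 5)),
]:
    print(f"{name:12} {result}")
```

Run as is, this prints cdef for Objective C and Ruby, cd for Java, and cde for Pharo: near-identical argument pairs mean different things from one language to the next.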
  22. (no transcript text)
  23. http://www.confidentruby.com

  24. Well, aren't strings just… objects?

  25. Diagram: strings and domain objects, linked by parsing and serialization
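Slide 25 sets strings against domain objects, with parsing and serialization as the bridges between them. A minimal Python sketch of that idea, using a hypothetical Version type: the string form is parsed once at the boundary, the program works with structured fields, and serialization produces the string again. The type and its dotted format are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    # Hypothetical domain object for a dotted version such as "4.0.12"
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Parsing: string -> domain object; malformed input is rejected early
        parts = text.strip().split(".")
        if len(parts) != 3 or not all(p.isdecimal() for p in parts):
            raise ValueError(f"not a version: {text!r}")
        major, minor, patch = (int(p) for p in parts)
        return cls(major, minor, patch)

    def __str__(self) -> str:
        # Serialization: domain object -> string
        return f"{self.major}.{self.minor}.{self.patch}"


v = Version.parse("4.0.12")
print(v.minor, v > Version.parse("4.0.9"), str(v))
```

This prints 0 True 4.0.12. Comparing the raw strings instead ("4.0.12" > "4.0.9") gives False, because string comparison is lexicographic; the structured form compares field by field.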

  26. Well, aren't strings just… collections?

  27. Feature overlap

    - Locating & extracting. What: characters, substrings? How: index, range, pattern?
    - Splitting & merging. Separator?
    - Substituting. One occurrence, or all? Eagerly or lazily?
    - Testing & matching
    - Converting. To other strings, to other types
    - Iterating. Byte ≠ codepoint ≠ character
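Two of the questions on slide 27 can be made concrete in Python: the same text has different lengths depending on whether one counts UTF-8 bytes, code points, or user-perceived characters; and substitution can target one occurrence or all of them. In this sketch NFC normalization stands in for counting characters only because this particular pair has a precomposed form; counting characters in general requires grapheme segmentation, which the standard library does not provide.

```python
import unicodedata

# "é" spelled as "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"
print(len(decomposed.encode("utf-8")))  # UTF-8 bytes: 3
print(len(decomposed))  # code points: 2
# NFC composes the pair into the single code point U+00E9
print(len(unicodedata.normalize("NFC", decomposed)))  # 1

# Substituting: one occurrence, or all?
text = "a-b-c"
print(text.replace("-", "+", 1))  # a+b-c
print(text.replace("-", "+"))  # a+b+c
```

A reader sees one character in every case; which of the three numbers an API reports is a design decision the string library has to make explicit.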
  29. More than indices. Ruby's indexing operator (square brackets):

    my_string[index], [from, length], [from..to], [/reg(exp)+/], [-index], ['substring']
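One operator, six kinds of argument: the indexing forms listed above dispatch on the type and number of arguments. A rough Python imitation, assuming None stands in for Ruby's nil and a Python range(from, to + 1) stands in for Ruby's inclusive from..to; ruby_index is a hypothetical helper, and edge cases such as negative ranges are not reproduced.

```python
import re
from typing import Optional, Union


def ruby_index(s: str, key: Union[int, range, re.Pattern, str],
               length: Optional[int] = None) -> Optional[str]:
    """Rough imitation of Ruby's String#[]; returns None where Ruby gives nil."""
    if length is not None:
        # my_string[from, length]; a negative from counts from the end
        start = key + len(s) if key < 0 else key
        if not 0 <= start <= len(s) or length < 0:
            return None
        return s[start:start + length]
    if isinstance(key, int):
        # my_string[index] and my_string[-index]
        return s[key] if -len(s) <= key < len(s) else None
    if isinstance(key, range):
        # my_string[from..to]: Python ranges exclude their stop, so
        # range(from, to + 1) plays the role of Ruby's inclusive range
        return s[key.start:key.stop] if key.start <= len(s) else None
    if isinstance(key, re.Pattern):
        # my_string[/reg(exp)+/]: the first match, if any
        match = key.search(s)
        return match.group(0) if match else None
    if isinstance(key, str):
        # my_string['substring']: the substring itself, if present
        return key if key in s else None
    raise TypeError(f"unsupported key: {key!r}")


s = "hello world"
print(ruby_index(s, 1), ruby_index(s, -1), ruby_index(s, 0, 5))
print(ruby_index(s, range(6, 11)), ruby_index(s, re.compile(r"w\w+")))
print(ruby_index(s, "lo w"), ruby_index(s, "xyz"))
```

The convenience comes from dispatch: the caller states what to extract (a position, a span, a pattern, a literal) and the operator works out the indices.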
  30. Idioms that I expected to find in…

  31–37. Layers of convenience

    - trimLeft:right: (canonical: both sides explicit)
    - trimBoth:, trimLeft:, trimRight: (one explicit predicate block, the other side implicit: same block, or no trim)
    - trimBoth, trimLeft, trimRight (both sides implicit: trim whitespace)
    - trim, trimmed (concise, fluent name)

    WHAT THE DIFFERENCE?
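The layering on this slide can be expressed as one canonical function plus thin wrappers that fix some of its arguments. A Python sketch, assuming predicate functions stand in for Pharo's blocks; the snake_case names are hypothetical translations of the selectors, not a port of Pharo's actual implementation.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def never(_: str) -> bool:
    # Predicate meaning "do not trim this side"
    return False


def trim_left_right(s: str, left: Predicate, right: Predicate) -> str:
    # Canonical layer: both sides given explicitly, like trimLeft:right:
    start, stop = 0, len(s)
    while start < stop and left(s[start]):
        start += 1
    while stop > start and right(s[stop - 1]):
        stop -= 1
    return s[start:stop]


# One explicit predicate, the other side implicit (same predicate, or no trim)
def trim_both_with(s: str, pred: Predicate) -> str:
    return trim_left_right(s, pred, pred)


def trim_left_with(s: str, pred: Predicate) -> str:
    return trim_left_right(s, pred, never)


def trim_right_with(s: str, pred: Predicate) -> str:
    return trim_left_right(s, never, pred)


# Both sides implicit: trim whitespace
def trim_both(s: str) -> str:
    return trim_both_with(s, str.isspace)


def trim_left(s: str) -> str:
    return trim_left_with(s, str.isspace)


def trim_right(s: str) -> str:
    return trim_right_with(s, str.isspace)


# Concise, fluent name: an alias for the most common case
trimmed = trim_both
```

Only trim_left_right contains a loop; every other layer is a one-line delegation, which keeps the convenience layers cheap to add and guarantees they agree with the canonical one.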
  38. Sentinel values. What should a failed search return: index zero? length + 1? It depends on the use case… Raise an exception, return a null object, maybe?

    Pluggable sentinel case: indexOf: aCharacter startingAt: index ifAbsent: aBlock
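A pluggable sentinel keeps a single search method while letting each caller choose the not-found behavior. A Python sketch of the indexOf:startingAt:ifAbsent: shape, assuming a zero-argument callable plays the role of the Pharo block; the function name, the 1-based indexing, and the default sentinel of 0 are assumptions chosen to mirror the slide.

```python
def index_of(s, char, start=1, if_absent=lambda: 0):
    """Hypothetical 1-based search with a pluggable "not found" value."""
    for i in range(max(start, 1), len(s) + 1):
        if s[i - 1] == char:
            return i
    # Absent: the caller's callable decides what "not found" means
    return if_absent()


def not_found():
    # Exception-raising policy, for callers that treat absence as an error
    raise LookupError("character not found")


word = "hello"
print(index_of(word, "l"))  # 3
print(index_of(word, "l", start=4))  # 4
print(index_of(word, "z"))  # 0, the default sentinel
print(index_of(word, "z", if_absent=lambda: len(word) + 1))  # 6
try:
    index_of(word, "z", if_absent=not_found)
except LookupError as error:
    print("raised:", error)
```

Zero, length + 1, and an exception all come from the same method; the choice moves from the API designer to the call site.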
  39. (no transcript text)
  40. Smells

    - Imperative style, indices everywhere: copyReplaceFrom:to:with:
    - Ad-hoc behavior: stemAndNumericSuffix, endsWithDigit
    - Redundancies: findAnySubStr:startingAt:, findDelimiters:startingAt:
    - Conversion: asSymbol, asInteger, asDate (to other types) vs. asLowercase, asHTMLString (to other strings)
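The first smell is easiest to see side by side. In this Python sketch, copy_replace_from_to_with is a hypothetical transliteration of an index-based copyReplaceFrom:to:with:, assuming 1-based indices and an inclusive stop as in copyFrom:to:. The caller has to locate the text, then convert between index conventions, whereas str.replace states the intent directly.

```python
def copy_replace_from_to_with(s: str, start: int, stop: int, replacement: str) -> str:
    # Index-based, Pharo-style: 1-based start, inclusive stop
    return s[:start - 1] + replacement + s[stop:]


greeting = "hello world"

# Imperative style: locate first, then do index arithmetic.
# find() is 0-based and returns -1 when absent, a case this code ignores.
start = greeting.find("world") + 1
stop = start + len("world") - 1
print(copy_replace_from_to_with(greeting, start, stop, "there"))

# Declarative style: say what to replace, not where
print(greeting.replace("world", "there", 1))
```

Both lines print hello there, but only the first one exposes off-by-one and not-found pitfalls to the caller.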
  41–42. Mutability. Let's talk about literals:

  43. Where to go from here?

    - Idioms more general than strings: how to document & ensure completeness? Lint rules? Pragmas? Method protocols? If only they worked like tags…
    - Improving composability: indices everywhere! Imperative style! Iterators, transducers? Rethink collections as well?
    - Mutability vs. sharing: slices / views, ropes
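Slices and views are listed as a way to share string data instead of copying it. A minimal Python sketch of a view, assuming a read-only base string; StringView is a hypothetical class, Python's own str slicing makes copies (which is why the view stores offsets instead), and ropes (trees of string fragments) are not shown.

```python
from typing import Optional


class StringView:
    """Hypothetical read-only view onto a contiguous part of a string.

    Slicing a view returns another view over the same base string;
    characters are copied only when str() materializes the view.
    """

    def __init__(self, base: str, start: int = 0, stop: Optional[int] = None) -> None:
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self) -> int:
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            # Re-express the slice in base-string coordinates: no copy
            return StringView(self._base, self._start + start,
                              self._start + max(start, stop))
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("view index out of range")
        return self._base[self._start + index]

    def __str__(self) -> str:
        # Materialize: the only place a new string is built
        return self._base[self._start:self._stop]


line = StringView("key = value")
value = line[6:]
print(str(value), len(value), value[0])
```

This prints value 5 v. Sharing is safe here only because Python strings are immutable; with mutable strings, as in Pharo, a view would also have to decide what happens when the base string changes.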