Slide 1

Slide 1 text

A first look at Strings in Pharo
Damien Pollet, Inria Lille
International Workshop on Smalltalk Technology, ESUG 2015, Brescia

Slide 2

Slide 2 text

A First Analysis of String APIs: the Case of Pharo
Damien Pollet, Stéphane Ducasse
RMoD, Inria & Université Lille 1

Abstract: Most programming languages natively provide an abstraction of character strings. However, it is difficult to assess the design or the API of a string library. There is no comprehensive analysis of the needed operations and their different variations. There are no real guidelines about the different forces in presence and how they structure the design space of string manipulation. In this article, we harvest and structure a set of criteria to describe a string API. We propose an analysis of the Pharo 4 String library as a first experience on the topic.

Keywords: Strings, API, Library, Design, Style

[…] case of strings, however, these characteristics are particularly hard to reach, due to the following design constraints. For a single data type, strings tend to have a large API: in Ruby, the String class provides more than 100 methods, in Java more than 60, and Python’s str around 40. In Pharo, the String class alone understands 319 distinct messages, not counting inherited methods. While a large API is not always a problem per se, it shows that strings have many use cases, from concatenation and printing to search-and-replace, parsing, natural or domain-specific languages. Unfortunately, strings are often abused to eschew proper modeling of structured data, resulting in inadequate serialized representations […]

Slide 4

Slide 4 text

Using strings feels TEDIOUS… Why?

Slide 11

Slide 11 text

Not enough methods, maybe?
Objective C: 85
Java: 60
Ruby: 100
Python: 40
Haskell: 4
Pharo: 319

Slide 12

Slide 12 text

Concatenation

Slide 16

Slide 16 text

Concatenation
Objective C: [@"Hello" stringByAppendingString: @"_world"]
Java: "Hello" + "_world"
Ruby: "Hello" + "_world"
Pharo: 'Hello' , '_world'

Slide 17

Slide 17 text

Extraction

Slide 21

Slide 21 text

Extraction
Objective C: [@"abcdef" substringWithRange: NSMakeRange(2, 4)]
Java: "abcdef".substring(2, 4)
Ruby: "abcdef"[2, 4]
Pharo: 'abcdef' copyFrom: 3 to: 5

Slide 23

Slide 23 text

http://www.confidentruby.com

Slide 24

Slide 24 text

Well, aren't strings just… objects?

Slide 25

Slide 25 text

Diagram: strings and domain objects, linked by parsing (strings to domain objects) and serialization (domain objects to strings)

Slide 26

Slide 26 text

Well, aren't strings just… collections?

Slide 27

Slide 27 text

Feature overlap
Locating & Extracting: what (characters, substrings)? how (index, range, pattern)?
Splitting & Merging: separator?
Substituting: one occurrence, or all? eagerly or lazily?
Testing & Matching
Converting: to other strings, to other types
Iterating: byte ≠ codepoint ≠ character
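To make the "byte ≠ codepoint ≠ character" point concrete, here is a minimal Ruby sketch. The helper name `string_sizes` is hypothetical and only serves the illustration; the calls it uses (`bytes`, `codepoints`, `grapheme_clusters`) are standard `String` methods, and the sample string is assumed to be UTF-8.

```ruby
# Count the same string three ways. The helper name is made up for this sketch.
def string_sizes(str)
  {
    bytes:      str.bytes.length,             # UTF-8 code units
    codepoints: str.codepoints.length,        # Unicode code points
    characters: str.grapheme_clusters.length  # user-perceived characters
  }
end

# "é" written as the letter e followed by a combining acute accent (U+0301).
p string_sizes("e\u0301")
```

For this input the three counts differ, so an iteration API has to say which unit it walks over.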

Slide 29

Slide 29 text

More than indices
Ruby's indexing operator (square brackets):
my_string[index]
my_string[from, length]
my_string[from..to]
my_string[/reg(exp)+/]
my_string[-index]
my_string['substring']
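As a quick illustration of the forms listed on this slide, the sketch below applies each one to a sample string. The helper name `indexing_forms` and the sample text are assumptions made for the example; the bracket forms themselves are standard `String#[]` behavior.

```ruby
# Apply each form of String#[] to one string and collect the results.
def indexing_forms(my_string)
  {
    index:       my_string[0],        # single position
    from_length: my_string[6, 5],     # start index and length
    range:       my_string[0..4],     # inclusive range of positions
    regexp:      my_string[/l+/],     # first match of a pattern
    negative:    my_string[-1],       # counted from the end
    substring:   my_string["world"],  # the substring itself, if present
    missing:     my_string["xyz"]     # nil when the substring is absent
  }
end

p indexing_forms("hello world")
```

One operator thus covers locating, extracting, and testing, which is convenient but also hides several distinct operations behind one name.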

Slide 30

Slide 30 text

Idioms that I expected to find in…

Slide 31

Slide 31 text

Layers of convenience
trimLeft:right: (canonical: both sides explicit)
trimBoth:, trimLeft:, trimRight: (one explicit predicate block, one implicit: same or no trim)
trimBoth, trimLeft, trimRight (both sides implicit: trim whitespace)
trim, trimmed (concise, fluent name)
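The layering can be sketched in Ruby as follows. This is not Pharo's implementation: the `Trimming` module and all of its method names are hypothetical, chosen only to mirror the four layers, with predicates passed as lambdas in place of Smalltalk blocks.

```ruby
# Hypothetical module mirroring the slide's layers; not Pharo's actual code.
module Trimming
  WHITESPACE = ->(ch) { ch.match?(/\s/) }  # default predicate
  NOTHING    = ->(_ch) { false }           # "no trim" on that side

  module_function

  # Layer 1, canonical: both sides explicit, each a predicate on one character.
  def trim_left_right(str, left, right)
    chars = str.chars
    first = chars.index { |ch| !left.call(ch) }
    return "" if first.nil?
    last = chars.rindex { |ch| !right.call(ch) }
    return "" if last.nil? || last < first
    chars[first..last].join
  end

  # Layers 2 and 3: one predicate, explicit or defaulting to whitespace.
  def trim_both(str, pred = WHITESPACE)
    trim_left_right(str, pred, pred)
  end

  def trim_left(str, pred = WHITESPACE)
    trim_left_right(str, pred, NOTHING)
  end

  def trim_right(str, pred = WHITESPACE)
    trim_left_right(str, NOTHING, pred)
  end

  # Layer 4: concise name for the most common case.
  def trimmed(str)
    trim_both(str)
  end
end

p Trimming.trimmed("  hi  ")
p Trimming.trim_both("xxhixx", ->(ch) { ch == "x" })
```

Every convenience method delegates to the canonical one, so the behavior is defined in a single place and the shorter names only fix arguments.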

Slide 37

Slide 37 text

Layers of convenience (same layers as on slide 31)
QUIZZ
what's the difference?

Slide 38

Slide 38 text

Sentinel values Sentinel index zero?
 length + 1 Depends on use-case… raise exception, return null object, maybe? Pluggable sentinel case indexOf:aCharacter startingAt:index ifAbsent: aBlock

Slide 40

Slide 40 text

Smells
Imperative style: indices everywhere (copyReplaceFrom:to:with:)
Ad-hoc behavior: stemAndNumericSuffix, endsWithDigit
Redundancies: findAnySubStr:startingAt:, findDelimiters:startingAt:
Conversion: asSymbol, asInteger, asDate / asLowercase, asHTMLString
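The "indices everywhere" smell can be illustrated by contrasting two ways of replacing a substring. Both helper names below are hypothetical; the first imitates the index arithmetic that a message like copyReplaceFrom:to:with: pushes onto callers, and the second uses Ruby's standard `String#sub` instead.

```ruby
# Index-based style: the caller locates the span, then splices around it.
def replace_by_indices(text, target, replacement)
  start = text.index(target)
  return text.dup if start.nil?
  stop = start + target.length
  text[0...start] + replacement + text[stop..-1]
end

# Pattern-based style: say what to replace, not where it is.
def replace_by_pattern(text, target, replacement)
  text.sub(target, replacement)
end

p replace_by_indices("hello world", "world", "there")
p replace_by_pattern("hello world", "world", "there")
```

Both return the same string here, but only the first version exposes off-by-one risks and a missing-target case to the caller.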

Slide 41

Slide 41 text

Mutability Let's talk about literals:
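The slide's own example is not captured in this transcript. As an illustration only, and in Ruby rather than Pharo, the sketch below shows why mutability of literals matters: a frozen literal can be shared safely, and mutation has to go through an explicit copy. The method name `literal_demo` is hypothetical.

```ruby
# Hypothetical demo: a frozen literal rejects mutation, a copy accepts it.
def literal_demo
  shared = "abc".freeze        # literal made explicitly immutable
  mutated = true
  begin
    shared << "d"              # in-place append on a frozen string
  rescue FrozenError
    mutated = false            # the shared literal was left untouched
  end
  copy = shared.dup << "d"     # dup returns an unfrozen copy
  { frozen: shared.frozen?, mutated: mutated, shared: shared, copy: copy }
end

p literal_demo
```

The same trade-off between sharing and mutation is what slices, views, and ropes try to address at the data-structure level.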

Slide 43

Slide 43 text

Where to go from here?
Idioms more general than strings: how to document & ensure completeness? lint rules? pragmas? method protocols? (if only they worked like tags…)
Improving composability: indices everywhere! imperative style! iterators, transducers? (rethink collections as well?)
Mutability vs sharing: slices / views, ropes

Slide 44

Slide 44 text

README: A First Analysis of String APIs: the Case of Pharo, by Damien Pollet and Stéphane Ducasse (the paper excerpted on slide 2)