Golang string tips

Some tips about string type in Go.

Summary:
- string is a read-only byte slice
- a string is not necessarily valid UTF-8
- %q is useful when formatting a string
- consider using bytes.Buffer to concatenate a bunch of strings
- take advantage of unicode/utf8 package
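
A minimal sketch of the first summary point, using only the standard library: indexing a string yields a byte, and because a string is read-only, changing its contents means working on a []byte copy. The helper name upperFirst is made up for this illustration.

```go
package main

import "fmt"

// upperFirst is a hypothetical helper: it returns s with its first byte
// upper-cased when that byte is an ASCII lowercase letter.
// Strings are read-only, so the change is made on a []byte copy.
func upperFirst(s string) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the bytes; s itself is never modified
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b)
}

func main() {
	s := "hello"
	// s[0] = 'H' would not compile: the bytes of a string cannot be assigned to.
	fmt.Println(s[0])          // indexing yields a byte, not a character
	fmt.Println(upperFirst(s)) // a new string built from the modified copy
	fmt.Println(s)             // the original string is unchanged
}
```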

Reference:
- https://golang.org/ref/spec The Go Programming Language Specification - The Go Programming Language
- https://blog.golang.org/strings Strings, bytes, runes and characters in Go - The Go Blog
- http://qiita.com/ruiu/items/2bb83b29baeae2433a79 String concatenation is an expensive operation in Go (Goでは文字列連結はコストの高い操作) - Qiita
- http://qiita.com/ono_matope/items/d5e70d8a9ff2b54d5c37 Performance of string concatenation in Go (Goの文字列結合のパフォーマンス) - Qiita

The slides are published at the URL below:
http://go-talks.appspot.com/github.com/hkurokawa/go-slides/string-tips/stringtips.slide

Hiroshi Kurokawa

June 02, 2015

Transcript

  1. Literals

    - Raw string: surrounded by back quotes (`)
    - Interpreted string: surrounded by double quotes (")

        `raw literal
        with a newline`

        "Hello, \u4e16\u754c" // "Hello, 世界"
        "\x48\x65\x6c\x6c\x6f\x2c\x20\xe4\xb8\x96\xe7\x95\x8c" // "Hello, 世界"
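
As a rough check of the literals on this slide, the sketch below compares the three spellings of the same text and shows that a raw literal keeps a backslash as-is. The function names sameBytes and escapeLens are invented for the example.

```go
package main

import "fmt"

// sameBytes reports whether three spellings of "Hello, 世界" denote the
// same string: plain text, \u escapes, and \x byte escapes.
func sameBytes() bool {
	plain := "Hello, 世界"
	escaped := "Hello, \u4e16\u754c"
	hexed := "\x48\x65\x6c\x6c\x6f\x2c\x20\xe4\xb8\x96\xe7\x95\x8c"
	return plain == escaped && escaped == hexed
}

// escapeLens returns the byte lengths of `\n` as a raw literal and of "\n"
// as an interpreted literal.
func escapeLens() (raw, interpreted int) {
	return len(`\n`), len("\n")
}

func main() {
	fmt.Println(sameBytes()) // true: all three are the same 13 bytes
	raw, interpreted := escapeLens()
	fmt.Println(raw, interpreted) // 2 1: the raw literal keeps the backslash
}
```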
  2. Again, string is just a bunch of bytes

    A string may not be a valid UTF-8 encoded string.

        func main() {
            const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"

            fmt.Println("Println:")
            fmt.Println(sample)

            fmt.Println("Byte loop:")
            for i := 0; i < len(sample); i++ {
                fmt.Printf("%x ", sample[i])
            }
            fmt.Printf("\n")

            fmt.Println("Printf with %x:")
            fmt.Printf("%x\n", sample)
        }
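
One way to detect such strings, sketched here with unicode/utf8: utf8.ValidString reports whether a string is entirely valid UTF-8, and decoding rune by rune locates the offending bytes. The helper countInvalid is a hypothetical name, not part of the slides.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalid is a hypothetical helper that counts the bytes in s that are
// not part of a valid UTF-8 sequence. DecodeRuneInString returns
// (utf8.RuneError, 1) for each such byte.
func countInvalid(s string) int {
	n := 0
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		if r == utf8.RuneError && size == 1 {
			n++
		}
		i += size
	}
	return n
}

func main() {
	const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"
	fmt.Println(utf8.ValidString(sample)) // false
	fmt.Println(countInvalid(sample))     // 3: the bytes bd, b2 and bc
}
```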
  3. %q verb

    - The %q verb escapes any non-printable byte sequences in a string.
    - The %+q verb also exposes the Unicode values of properly formatted UTF-8 that represents non-ASCII data.
    - %q tries to interpret a single byte as a rune (see strconv.quoteWith(), http://golang.org/src/strconv/quote.go).

        func main() {
            const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"

            fmt.Println("Println:")
            fmt.Println(sample)

            fmt.Println("Printf with %q:")
            fmt.Printf("%q\n", sample)

            fmt.Println("Byte loop printing with %q:")
            for i := 0; i < len(sample); i++ {
                fmt.Printf("%q ", sample[i])
            }
            fmt.Println()
        }
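
For comparison, here is a small sketch of what the two verbs produce for the same sample. The comparison with strconv.Quote and strconv.QuoteToASCII reflects my understanding of how fmt implements these verbs, so treat it as an assumption to verify; the function name quoted is invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// quoted returns the %q and %+q renderings of s.
func quoted(s string) (q, plusQ string) {
	return fmt.Sprintf("%q", s), fmt.Sprintf("%+q", s)
}

func main() {
	const sample = "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98"
	q, plusQ := quoted(sample)
	fmt.Println(q)     // "\xbd\xb2=\xbc ⌘"
	fmt.Println(plusQ) // "\xbd\xb2=\xbc \u2318"

	// Assumption: %q matches strconv.Quote and %+q matches QuoteToASCII.
	fmt.Println(q == strconv.Quote(sample))
	fmt.Println(plusQ == strconv.QuoteToASCII(sample))
}
```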
  4. Count by runes, not by bytes

        func main() {
            const sample = `Hello, 世界`

            fmt.Println("Println:")
            fmt.Println(sample)

            fmt.Println("len():")
            fmt.Println(len(sample))

            fmt.Println("utf8.RuneCountInString():")
            fmt.Println(utf8.RuneCountInString(sample))

            fmt.Println("range loop:")
            for _, c := range sample {
                fmt.Printf("%+q ", c)
            }
            fmt.Println()
        }
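
The same distinction matters when shortening a string: slicing at a byte index can cut a multi-byte character in half. A minimal sketch of truncating by runes instead, where truncateRunes is a hypothetical helper name:

```go
package main

import "fmt"

// truncateRunes is a hypothetical helper that returns at most the first n
// runes (not bytes) of s.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset at which each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	const sample = "Hello, 世界"
	fmt.Printf("%q\n", sample[:8])               // "Hello, \xe4": 世 is cut mid-character
	fmt.Printf("%q\n", truncateRunes(sample, 8)) // "Hello, 世"
}
```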
  5. Append strings one by one

    As in Java, concatenating many strings one by one, as in `s += list[i]`, is inefficient because each concatenation allocates a new string.

    Solution

    - Use append()

        b := make([]byte, 0, 100)
        for _, v := range list {
            b = append(b, v...)
        }
        return string(b)

    - Use bytes.Buffer

        b := bytes.NewBuffer(make([]byte, 0, 100))
        for _, v := range list {
            b.WriteString(v)
        }
        return b.String()
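
In newer Go versions (1.10 and later, so after these slides were written), strings.Builder is another option for the same job. The sketch below is not from the deck; it assumes a list of strings and pre-sizes the builder so that the loop should need only one allocation. The function name concat is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// concat joins the elements of list using strings.Builder.
// The total length is computed first so the buffer can be sized once.
func concat(list []string) string {
	n := 0
	for _, v := range list {
		n += len(v)
	}
	var sb strings.Builder
	sb.Grow(n) // reserve room for every byte up front
	for _, v := range list {
		sb.WriteString(v)
	}
	return sb.String()
}

func main() {
	list := []string{"Hello", ", ", "世界"}
	fmt.Println(concat(list))
	// When all the pieces are already in a slice, strings.Join does the same.
	fmt.Println(strings.Join(list, ""))
}
```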
  6. Useful packages

    - unicode (https://golang.org/pkg/unicode/): Test functions on some Unicode properties
    - unicode/utf8 (https://golang.org/pkg/unicode/utf8/): Utility functions to translate between rune <=> bytes / string
    - strings (https://golang.org/pkg/strings/): Utility functions to manipulate strings

        const cheers = `Cheers! `
        for i, w := 0, 0; i < len(cheers); i += w {
            runeValue, width := utf8.DecodeRuneInString(cheers[i:])
            fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
            w = width
        }

        const cheers = "\t\tCheers\t\U0001F37B!\t\n"
        fmt.Println("Before TrimFunc:")
        fmt.Println(cheers)
        s := strings.TrimFunc(cheers, unicode.IsControl)
        fmt.Println("After TrimFunc:")
        fmt.Println(s)
  7. Reference

    - The Go Programming Language Specification - The Go Programming Language (https://golang.org/ref/spec)
    - Strings, bytes, runes and characters in Go - The Go Blog (https://blog.golang.org/strings)
    - String concatenation is an expensive operation in Go (Goでは文字列連結はコストの高い操作) - Qiita (http://qiita.com/ruiu/items/2bb83b29baeae2433a79)
    - Performance of string concatenation in Go (Goの文字列結合のパフォーマンス) - Qiita (http://qiita.com/ono_matope/items/d5e70d8a9ff2b54d5c37)