Do you really think you know strings in Python?

Understanding the tricky concepts behind strings in Python in an interesting way.


Satwik Kansal

June 22, 2018


  2. What is a string?

  3. A Data Structure?

  4. In Python, strings are sequences, just like lists and tuples.

  5. What is a sequence in Python?

  6. Well, a sequence is an “ordered iterable” with random access.

  7. Sequences

  8. Iterable, iterator, generator and containers

    Container: A container is a data structure that holds elements and supports membership tests. Examples: list, set, dict, tuple, str.
    Iterable: An iterable is any object, not necessarily a data structure, that can return an iterator.
    Iterator: A stateful helper object that produces the next value when you call next() on it. Any object that has a __next__() method is therefore an iterator (strictly, the iterator protocol also requires an __iter__() method that returns the iterator itself).
  9. Iterable, iterator, generator and containers

    • Most containers are also iterable. But many more things are iterable as well, for example open files and open sockets.
    • Where containers are typically finite, an iterable may just as well represent an infinite source of data.
    • An iterator is a value factory. Each time you ask it for "the next" value, it knows how to compute it because it holds internal state.
    • A generator is a special kind of iterator (with an elegant syntax for writing it).
    • Any generator, therefore, is a factory that lazily produces values.
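The four terms above can be seen side by side in a few lines. This is only a sketch: the `naturals` generator and the variable names are hypothetical, chosen for illustration.

```python
import itertools

# Container: supports membership tests with `in`.
vowels = "aeiou"
print("e" in vowels)

# Iterable: anything iter() accepts; it hands back an iterator.
it = iter(vowels)

# Iterator: stateful, produces the next value on each next() call.
print(next(it), next(it))


# Generator: an iterator written with `yield`; values are computed lazily.
def naturals():
    """Hypothetical helper: an infinite source of 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# islice takes a finite slice of the infinite generator.
first_five = list(itertools.islice(naturals(), 5))
print(first_five)
```

Because the iterator keeps its own state, a third call to `next(it)` continues from where the second one stopped.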
  11. Sequence operations: concatenation, multiplication, iteration, and indexing
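As a quick illustration of these operations, the sketch below applies each of them to a string and to a list. The variable names are arbitrary.

```python
word = "wtf"
items = ["w", "t", "f"]

# Concatenation
concat_str = word + "python"
concat_list = items + ["!"]

# Multiplication (repetition)
repeated_str = word * 2
repeated_list = items * 2

# Iteration
chars = [ch for ch in word]

# Indexing (negative indices count from the end)
first, last = word[0], word[-1]

print(concat_str, concat_list, repeated_str, repeated_list, chars, first, last)
```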

  12. But, are strings exactly like lists?

  13. Let’s play a game

  14. 1. Python snippet
      2. Guess the output
      3. Actual output
      4. Repeat!!
  15. Are strings exactly like lists?

  17. So unlike lists, strings are “immutable”
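A minimal sketch of the difference: item assignment works on a list but raises `TypeError` on a string, so a "modified" string always has to be built as a new object.

```python
letters = list("wtf")
letters[0] = "W"          # lists are mutable: this changes the list in place

word = "wtf"
error = None
try:
    word[0] = "W"         # strings are immutable: this raises TypeError
except TypeError as exc:
    error = str(exc)

# Build a new string instead of changing the old one.
capitalized = "W" + word[1:]

print(letters, word, capitalized, error)
```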

  18. Is there any other difference except immutability?

  21. So strings are a special type of sequence containing only characters.
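One way to see this in practice (a sketch, not necessarily the snippet shown in the talk): Python has no separate character type, so indexing a string yields another string of length 1, and `in` on a string tests for substrings rather than single elements.

```python
word = "wtf"

first = word[0]
is_str = type(first) is str        # a "character" is just a str of length 1
nested = word[0][0][0]             # so it can be indexed again and again

# Membership on a string looks for a substring...
substring_found = "tf" in word
# ...while membership on a list looks for a single element.
sublist_found = ["t", "f"] in ["w", "t", "f"]

print(is_str, nested, substring_found, sublist_found)
```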

  22. Objects and identities in Python

  23. Objects and identities in Python (contd…)

    id returns an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime.
    The is operator checks if both operands refer to the same object (i.e., it checks whether the identities of the operands match).
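The distinction between identity and equality can be sketched with lists, where it does not depend on any interpreter optimization. The names are illustrative.

```python
a = [1, 2, 3]
b = a            # a second name for the same object
c = list(a)      # a new object with equal contents

same_ab = a is b             # identical: one object, two names
same_ac = a is c             # not identical: two distinct objects
equal_ac = a == c            # but equal in value
ids_match = id(a) == id(b)   # `is` is equivalent to comparing id() values

print(same_ab, same_ac, equal_ac, ids_match)
```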
  24. What happens if you multiply strings by booleans?

  26. What’s up with the booleans?

  27. Is the behavior the same for all numbers?
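A short sketch that answers the three questions above: `bool` is a subclass of `int`, so `True` and `False` act as 1 and 0 when repeating a string; other integers work too, while non-integer numbers such as floats are rejected.

```python
bool_is_int = issubclass(bool, int)

kept = "wtf" * True        # same as "wtf" * 1
emptied = "wtf" * False    # same as "wtf" * 0
tripled = "wtf" * 3
negative = "wtf" * -1      # negative counts give an empty string

float_error = None
try:
    "wtf" * 2.0            # floats are not accepted, even whole ones
except TypeError as exc:
    float_error = str(exc)

print(bool_is_int, repr(kept), repr(emptied), tripled, repr(negative), float_error)
```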

  29. Are all the string objects different?

  31. Are all concatenated strings the same?

  33. Unlocking mysteries, one by one...

  34. Implicit string interning

    • Such behavior is due to a CPython optimization (called string interning) that tries to use existing immutable objects in some cases rather than creating a new object every time.
    • After being interned, many variables may point to the same string object in memory (thereby saving memory).
  35. When are strings interned implicitly?

    • The decision of when to implicitly intern a string is implementation dependent. There are some facts that can be used to guess if a string will be interned or not:
      ◦ All length 0 and length 1 strings are interned.
      ◦ Strings are interned at compile time ('wtf' will be interned but ''.join(['w', 't', 'f']) will not be interned).
      ◦ Strings that are not composed of ASCII letters, digits or underscores are not interned.
    • This explains why 'wtf!' was not interned: it contains the character !.
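Implicit interning is an implementation detail, so the sketch below leans on explicit interning with `sys.intern`, which is a swapped-in technique and not what the slides demonstrate. It shows the same effect in a way that is guaranteed: after interning, equal strings share one object.

```python
import sys

# Two equal strings built at runtime, so the compiler never sees 'wtf' here.
built_a = "".join(["w", "t", "f"])
built_b = "".join(["w", "t", "f"])

equal = built_a == built_b
# Whether these are one object is up to the interpreter; do not rely on it.
same_object = built_a is built_b

# sys.intern returns the single shared copy for a given string value.
interned_a = sys.intern(built_a)
interned_b = sys.intern(built_b)
shared = interned_a is interned_b

print(equal, same_object, shared)
```

The practical takeaway is to compare strings with `==` and leave `is` for identity checks such as `x is None`.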
  37. Constant folding

    • Constant folding is a technique for peephole optimization in Python.
    • This means the expression 'a'*20 is replaced by 'aaaaaaaaaaaaaaaaaaaa' during compilation to save a few clock cycles at runtime.
    • Constant folding only occurs for strings of length at most 20 (in the CPython versions this talk describes; the limit is an implementation detail). Why? Imagine the size of the .pyc file generated as a result of the expression 'a'*10**10.
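One way to observe folding, assuming CPython, is to compile an expression and check whether its final value already sits in the code object's constants. The `is_folded` helper is hypothetical, and what it reports for longer strings depends on the interpreter version.

```python
def is_folded(expression):
    """Hypothetical helper: True if the compiler stored the expression's
    value as a ready-made constant (i.e. it was folded at compile time)."""
    code = compile(expression, "<folding-demo>", "eval")
    value = eval(code)
    return value in code.co_consts


small = is_folded("'a' * 20")
large = is_folded("'a' * 100000")   # result varies with the folding limit

print(small, large)
```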
  38. The interactive environment optimization

    When a and b are set to "wtf!" in the same line, the Python interpreter creates a new object, then makes the second variable refer to it as well. If you do it on separate lines, it doesn't "know" that there's already a "wtf!" object (because "wtf!" is not implicitly interned as per the facts mentioned above). It's a compiler optimization and specifically applies to the interactive environment.
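The effect can be approximated outside the interactive prompt by compiling the assignments either as one unit or as two, which is roughly what the prompt does with one line versus two. This is a sketch assuming CPython; the file names passed to `compile` are placeholders, and the result for separately compiled lines is printed but not guaranteed.

```python
# One compilation unit: equal constants are stored once and shared.
together = {}
exec(compile("a = 'wtf!'; b = 'wtf!'", "<one-line>", "exec"), together)
one_unit = together["a"] is together["b"]

# Two compilation units, like two separate lines typed at the prompt.
apart = {}
exec(compile("a = 'wtf!'", "<line-1>", "exec"), apart)
exec(compile("b = 'wtf!'", "<line-2>", "exec"), apart)
separate_units = apart["a"] is apart["b"]   # interpreter-dependent

print(one_unit, separate_units)
```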
  39. Half Triple quoted strings

  41. Implicit string literal concatenation
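Adjacent string literals are joined by the parser, which also explains the "half triple quoted" case: a closing `'''` after a single-quoted string is read as the closing quote followed by an empty literal. The sketch below uses made-up names, including a list with a deliberately missing comma.

```python
# Adjacent literals are concatenated at compile time.
greeting = "wtf" "python"

# Parsed as 'wtfpython' followed by the empty literal '', then joined.
half_triple = 'wtfpython'''

# A classic pitfall: a missing comma silently merges two items.
names = [
    "alice",
    "bob"      # no comma here
    "carol",
]

print(greeting, half_triple, names)
```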

  42. What’s the best way to make a “giant” string?

  44. Let the race begin!

  46. Many different ways to concatenate strings
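A rough way to run such a race yourself is sketched below with `timeit`. The function names, the word list, and the repetition count are arbitrary choices, not the benchmark from the talk, and the timings will differ between machines and interpreter versions.

```python
import timeit

words = ["wtf"] * 1000


def with_plus_equals():
    result = ""
    for w in words:
        result += w
    return result


def with_format():
    result = ""
    for w in words:
        result = "{}{}".format(result, w)
    return result


def with_fstring():
    result = ""
    for w in words:
        result = f"{result}{w}"
    return result


def with_join():
    return "".join(words)


strategies = {
    "+=": with_plus_equals,
    "str.format": with_format,
    "f-string": with_fstring,
    "str.join": with_join,
}

# Every strategy builds the same string; only the cost differs.
for name, fn in strategies.items():
    seconds = timeit.timeit(fn, number=200)
    print(f"{name:<12} {seconds:.4f}s")
```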

  48. Bonus

  49. bit.ly/wtfpython

    About me (an active Python blogger, and author of “What the f*ck Python?”)
    https://satwikkansal.xyz
    Handle: @satwikkansal