Structures and Patterns


A very quick, dense run-through of the tensions between "data structures" as taught and "structured data" as we deal with it every day. The talk focuses on the glom library: declarative transformations, its standard library, and extensibility.

More details (and recording) at https://sedimental.org/talks.html


Mahmoud Hashemi

August 15, 2020

Transcript

  1. Structure and Patterns: Wrangling Nested Data in Python (PyBay 2020)

    Mahmoud Hashemi
  2. Data Structures: CS 201, meet HTTP 201

  3. Look familiar?

  4. CS 201, meet HTTP 201

    “Data Structures”: ▶ Homogeneous ▶ Invariants ▶ Algorithms. Structured Data: ▶ Heterogeneous ▶ Hierarchy ▶ “Can-do” attitude?
  5. An ordinary API response. We need this value.

  6. Take 1: Direct Access (Wrangling the response)

  7. Take 1: Direct Access (Wrangling the response)

  8. Take 1: Direct Access (Wrangling the response)

  9. Take 1.5: Direct Access with Defaults (Wrangling the response)

    None doesn’t scale.
  10. Take 2: Access + Exceptions (Wrangling the response)

    Error messages? 100% test coverage?
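The Take 1 through Take 2 code appears on the slides only as images. As a rough stand-in, here is a stdlib-only sketch of the two raw-Python styles those slides contrast. The response shape and the field names are hypothetical, chosen only for illustration.

```python
# Hypothetical API response; the field names are illustrative, not from the talk.
response = {
    "data": {
        "user": {
            "profile": {"email": "ada@example.com"},
        },
    },
}


def email_with_defaults(resp):
    # Take 1.5 style: chained .get() calls with {} defaults.
    # Every level needs its own default, and a None in the middle
    # (e.g. "profile": None) would still raise AttributeError.
    return resp.get("data", {}).get("user", {}).get("profile", {}).get("email")


def email_with_exceptions(resp):
    # Take 2 style: direct access wrapped in try/except.
    # Concise, but the fallback hides *which* key was missing.
    try:
        return resp["data"]["user"]["profile"]["email"]
    except (KeyError, TypeError):
        return None


print(email_with_defaults(response))        # ada@example.com
print(email_with_exceptions({"data": {}}))  # None
```

Both versions grow with every level of nesting, and neither reports where the lookup went wrong, which is the gap the next slides address.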
  11. A Third Way (Drumroll please)

  12. glom: Python’s nested data operator http://github.com/mahmoud/glom

  13. Take 3: glom’s “deep-get” (Wrangling the response)

    ▪ Concise access ▪ Easy defaults
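The deep-get idea is to walk a dot-separated path one segment at a time and fall back to a default only when asked. The function below is not glom's implementation; it is a stdlib-only approximation, with the hypothetical names `deep_get` and `_MISSING`, that shows the concept for dicts, lists, and plain attributes.

```python
from collections.abc import Mapping, Sequence

_MISSING = object()  # sentinel so that None can still be used as a default


def deep_get(target, path, default=_MISSING):
    """Walk a dot-separated path such as 'a.b.0.c' through dicts, lists, and attributes."""
    current = target
    for segment in path.split("."):
        try:
            if isinstance(current, Mapping):
                current = current[segment]
            elif isinstance(current, Sequence) and not isinstance(current, str):
                current = current[int(segment)]  # list/tuple: segment is an index
            else:
                current = getattr(current, segment)
        except (KeyError, IndexError, ValueError, AttributeError):
            if default is _MISSING:
                raise  # no default given: surface the original error
            return default
    return current


target = {"a": {"b": [{"c": 42}]}}
print(deep_get(target, "a.b.0.c"))               # 42
print(deep_get(target, "a.x.c", default=None))   # None
```

The sentinel matters: using `None` as the "no default" marker would make it impossible to ask for `None` as a real default.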
  14. Take 3: glom’s better errors (Wrangling the response)

    Debuggable, maintainable error messages (more on this in a sec).
  15. The Transformation: Why deep-get is only the beginning

  16. The Transformation: Building a response

  17. Building a response: Raw Python vs. glom

  18. Declarative Data Transformation (WYSIWYG coding)

  19. Declarative data transforms

    Target (the input): {'ID': 2, 'data': {'isoDate': '1999-01-01'}} ▪ Spec (the template): {'id': 'data.ID', 'date': 'data.isoDate'} ▪ Output (the result): {'id': 2, 'date': '1999-01-01'} ▪ output = glom(target, spec)
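The Target/Spec/Output pattern above can be approximated in a few lines: a string spec is a dotted path, a dict spec is a template whose values are sub-specs, a one-element list applies its sub-spec to every item, and a callable is applied to the target. This is a simplified sketch under those assumptions, not glom's implementation; `mini_glom` and the sample data are hypothetical, with the target nested under `'data'` so both paths in the spec resolve.

```python
def mini_glom(target, spec):
    """Evaluate a tiny subset of glom-style specs against target."""
    if isinstance(spec, str):
        # A string is a dotted path: walk it one key at a time.
        for key in spec.split("."):
            target = target[key]
        return target
    if isinstance(spec, dict):
        # A dict is a template: same keys out, each value evaluated as a sub-spec.
        return {key: mini_glom(target, sub) for key, sub in spec.items()}
    if isinstance(spec, list):
        # A one-element list means "apply this sub-spec to every item".
        (sub,) = spec
        return [mini_glom(item, sub) for item in target]
    if callable(spec):
        return spec(target)
    raise TypeError(f"unsupported spec: {spec!r}")


target = {"data": {"ID": 2, "isoDate": "1999-01-01"}}
spec = {"id": "data.ID", "date": "data.isoDate"}
print(mini_glom(target, spec))  # {'id': 2, 'date': '1999-01-01'}
```

Because the spec is plain data, it reads like the output it produces, which is the "WYSIWYG" point the following slides make.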
  20. WYSIWYG code

    “What you see is what you get” predates the rich text editor. ▪ List comprehensions ▫ [x * 2 for x in range(10)] ▪ Homoiconicity ▫ “Same” + “Representation” ▫ Looks? Or function? ▫ LISP, etc. ▪ Code As Data
  21. Code as Data, in its most basic form

  22. “Flat is better than nested.” - The Zen of Python
  23. The Zen of Glom

    ▪ Flat Python is better than nested Python ▫ Flatten Python by handling nested data declaratively ▪ Complex glom specs are better than complicated Python ▪ Actionable errors are everything
  24. The Data Trace

    Better error messages and better stack traces: a short stack that peels away the target and spec to get to the unexpected data.
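The core of an actionable error is saying which path segment failed and what value was sitting there instead of the expected container. A minimal sketch of that idea follows; `DeepGetError` and `traced_get` are hypothetical names, not glom's exception classes, and glom's real data trace covers the whole spec, not just a dotted path.

```python
class DeepGetError(LookupError):
    """Hypothetical error that reports where along a dotted path a lookup failed."""

    def __init__(self, path, index, value):
        self.path, self.index, self.value = path, index, value
        segments = path.split(".")
        done = ".".join(segments[:index]) or "<root>"
        super().__init__(
            f"could not get {segments[index]!r} from {done} "
            f"(a {type(value).__name__}: {value!r}) while resolving {path!r}"
        )


def traced_get(target, path):
    current = target
    for i, segment in enumerate(path.split(".")):
        try:
            current = current[segment]
        except (KeyError, TypeError):
            # 'from None' drops the chained KeyError/TypeError,
            # so the report shows one error instead of two.
            raise DeepGetError(path, i, current) from None
    return current


try:
    traced_get({"user": {"profile": None}}, "user.profile.email")
except DeepGetError as err:
    print(err)
```

Instead of a bare `TypeError: 'NoneType' object is not subscriptable`, the message names the failing segment (`'email'`), the partial path (`user.profile`), and the unexpected value (`None`).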
  25. The glom stack

    (1) Declarative data transformation: ▶ Less code ▶ Fewer bugs ▶ Better errors ▶ Daily use
  26. Standard data transformations: A selection of glom builtins

  27. Deep Assignment: Not just for deep-gets.
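Deep assignment is the mirror image of deep-get: walk every segment except the last, then set the final key. glom ships its own `assign()`; the function below is only a stdlib sketch of the concept, and its name `deep_set` and its `missing` parameter are hypothetical.

```python
def deep_set(target, path, value, missing=None):
    """Set target[a][b][c] = value for path 'a.b.c'.

    If `missing` is a callable (e.g. dict), absent intermediate keys are
    created with it; otherwise an absent key raises KeyError.
    """
    *parents, last = path.split(".")
    current = target
    for key in parents:
        if key not in current:
            if missing is None:
                raise KeyError(key)
            current[key] = missing()  # build the missing level on the fly
        current = current[key]
    current[last] = value
    return target


config = {"db": {"host": "localhost"}}
deep_set(config, "db.port", 5432)
deep_set(config, "cache.ttl", 60, missing=dict)
print(config)  # {'db': {'host': 'localhost', 'port': 5432}, 'cache': {'ttl': 60}}
```

Making level creation opt-in keeps a typo in a path from silently building a new branch of the structure.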

  28. Streaming with Iter(): Chainable, composable, declarative iterator transformation.

    Other Iter() methods: ▪ .filter() ▪ .split() ▪ .flatten() ▪ .limit() ▪ .first() ▪ (and more)
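The chainable style of Iter() can be sketched with `itertools`: each method wraps the previous iterator lazily and returns a new object, so nothing is computed until a terminal call such as `first()` or `all()`. `Stream` is a hypothetical name. Unlike glom's Iter(), which is a spec built without data and applied later via `glom(target, spec)`, this wrapper binds to its data immediately.

```python
import itertools


class Stream:
    """Hypothetical chainable, lazy iterator wrapper (single-pass, like any iterator)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def map(self, func):
        return Stream(map(func, self._it))  # builtin map; nothing runs yet

    def filter(self, pred=bool):
        return Stream(filter(pred, self._it))

    def flatten(self):
        return Stream(itertools.chain.from_iterable(self._it))

    def limit(self, n):
        return Stream(itertools.islice(self._it, n))

    def first(self, default=None):
        return next(self._it, default)  # terminal: pulls at most one item

    def all(self):
        return list(self._it)  # terminal: drains the pipeline


odds_times_ten = Stream(range(10)).filter(lambda x: x % 2).map(lambda x: x * 10).limit(3).all()
print(odds_times_ten)  # [10, 30, 50]
```

Because each step is lazy, `limit(3)` stops pulling from `range(10)` after the third odd number; the remaining items are never filtered or mapped.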
  29. Python Native

    Alternatives exist, but none of them came close to matching the expressiveness of Python’s data model.
  30. The T object: Explicit, Pythonic path specification.

  31. The T object: Your Data’s Stunt Double

    T does anything, and has a better contract.
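The "stunt double" idea, an object that stands in for the target, records every attribute access, index, and call, and replays them later on real data, can be sketched in plain Python. `_Recorder` and `replay` are hypothetical names; glom's real T has more features and its own evaluation rules.

```python
class _Recorder:
    """Hypothetical stand-in for glom's T: records accesses, replays them later."""

    def __init__(self, ops=()):
        # Plain assignment: __getattr__ only runs for names not found normally,
        # so reading self._ops never recurses.
        self._ops = ops

    def __getattr__(self, name):
        return _Recorder(self._ops + (("attr", name),))

    def __getitem__(self, key):
        return _Recorder(self._ops + (("item", key),))

    def __call__(self, *args, **kwargs):
        return _Recorder(self._ops + (("call", args, kwargs),))


def replay(recorder, target):
    """Apply the recorded operations, in order, to a real target."""
    for op in recorder._ops:
        if op[0] == "attr":
            target = getattr(target, op[1])
        elif op[0] == "item":
            target = target[op[1]]
        else:
            _, args, kwargs = op
            target = target(*args, **kwargs)
    return target


T = _Recorder()
spec = T["user"]["name"].upper()
print(replay(spec, {"user": {"name": "ada"}}))  # ADA
```

Compared with an equivalent `lambda`, the recorded operations stay inspectable (`spec._ops` here), which is what lets a tool report exactly which step failed.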
  32. The glomenagerie: More built-in transforms than we have time for

    ▪ Invoke ▪ Merge ▪ Flatten ▪ Delete ▪ And/Or ▪ And more… ▫ glom.readthedocs.io
  33. The glom stack

    (1) Declarative data transformation (2) Standard Specifiers: ▶ Deep Assign ▶ Streaming Iter ▶ Python-native T ▶ And more!
  34. Extending glom: Extensions, modes, and a case study

  35. What makes a Specifier Type? Let’s make one! https://glom.readthedocs.io/en/latest/custom_spec_types.html

    Just an object, with a method, plus a scope for runtime state, including a glom function for recursion… and modes!
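The shape described on this slide, an object with a method plus a scope that exposes the interpreter for recursion, follows glom's custom spec type documentation, where the method is `glomit(target, scope)` and recursion goes through `scope[glom]`. The interpreter below is a hypothetical toy (`run`, `Upper`, and the plain-dict scope are illustrative assumptions), meant only to show how that dispatch-and-recurse protocol fits together.

```python
def run(target, spec, scope=None):
    """Toy interpreter: strings are keys, dicts are templates,
    and any object with a glomit() method is a custom specifier type."""
    if scope is None:
        scope = {run: run}  # expose the interpreter so custom specs can recurse
    if hasattr(spec, "glomit"):
        return spec.glomit(target, scope)
    if isinstance(spec, dict):
        return {k: run(target, v, scope) for k, v in spec.items()}
    if isinstance(spec, str):
        return target[spec]
    raise TypeError(f"unsupported spec: {spec!r}")


class Upper:
    """Hypothetical custom specifier: evaluate a sub-spec, then upper-case the result."""

    def __init__(self, subspec):
        self.subspec = subspec

    def glomit(self, target, scope):
        value = scope[run](target, self.subspec, scope)  # recurse via the scope
        return value.upper()


print(run({"name": "ada", "lang": "py"}, {"who": Upper("name"), "lang": "lang"}))
# {'who': 'ADA', 'lang': 'py'}
```

Recursing through the scope rather than calling `run` directly is what would let runtime state, such as a current mode, travel down into nested specs.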
  36. glom Modes

    Modes are dialects for keeping specs concise and maintainable. ▪ Just like vi and emacs modes ▫ Closer to emacs multi-mode, though ▪ Anyone can define and switch modes ▫ Just override scope[MODE] ▪ Four modes ▫ Auto mode (the default) ▫ Fill - Finer-grained data templating ▫ Group - Reduction and bucketization ▫ Match - Pattern matching and validation
  37. glom Match Mode

    Pattern matching is very in again. Next time: Structural Matching, Control Flow, and Variable Capture.
  38. The glom stack

    (1) Declarative data transformation (2) Standard Specifiers (3) Extensible API & Runtime: ▶ Specifier types ▶ Modes as dialects ▶ Just Python
  39. The glom stack

    (1) Declarative data transformation (2) Standard Specifiers (3) Extensible API & Runtime
  40. Thanks! Any questions?

    Find more at: ▪ glom.readthedocs.io ▪ github.com/mahmoud ▪ twitter.com/mhashemi ▪ sedimental.org
  41. Real Specs Have Indents

    Specs can be as varied and scalable as your data. https://glom.readthedocs.io/en/latest/tutorial.html ▪ Coalesce /ˌkōəˈles/ (verb): Accept the first non-failing value, or the default.
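The definition on this slide, accept the first non-failing value or fall back to a default, maps onto a short loop. glom's own `Coalesce` takes sub-specs as positional arguments and an optional `default`; the function below is only a stdlib sketch of the same first-success behavior over dotted paths, and its name `coalesce` is hypothetical.

```python
_MISSING = object()  # sentinel so that None can be a legitimate default


def coalesce(target, *paths, default=_MISSING):
    """Return the value at the first dotted path that resolves, else default."""
    for path in paths:
        current = target
        try:
            for key in path.split("."):
                current = current[key]
        except (KeyError, IndexError, TypeError):
            continue  # this path failed; try the next one
        return current
    if default is _MISSING:
        raise LookupError(f"none of the paths resolved: {paths!r}")
    return default


user = {"profile": {"nickname": "ada"}}
print(coalesce(user, "profile.display_name", "profile.nickname"))  # ada
print(coalesce(user, "email", "contact.email", default=None))      # None
```

Ordering the paths from most to least preferred turns schema drift (renamed or optional fields) into a single declarative line instead of nested `try` blocks.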
  42. The Journey Ahead

    ▪ Data structures ▪ Python: Power and Promise