Slide 1

Slide 1 text

Structure and Patterns: Wrangling Nested Data in Python
PyBay 2020
Mahmoud Hashemi

Slide 2

Slide 2 text

Data Structures: CS 201, meet HTTP 201

Slide 3

Slide 3 text

Look familiar?

Slide 4

Slide 4 text

CS 201, meet HTTP 201
“Data Structures”
▶ Homogeneous
▶ Invariants
▶ Algorithms
Structured Data
▶ Heterogeneous
▶ Hierarchy
▶ “Can-do” attitude?

Slide 5

Slide 5 text

An ordinary API response
We need this value

Slide 6

Slide 6 text

Take 1: Direct Access
Wrangling the response
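
The slide's code is not part of this transcript. As a minimal sketch of what direct access looks like, assuming a hypothetical response shape (the `data`, `user`, `profile`, and `email` keys are invented for illustration):

```python
# Hypothetical API response; the talk's actual payload is not in the transcript.
response = {
    "data": {
        "user": {
            "profile": {"email": "ada@example.com"},
        },
    },
}

# Direct access is short, but it assumes every level exists.
email = response["data"]["user"]["profile"]["email"]

# When one level is missing, the failure is a bare KeyError that names
# only the missing key, not the path that led to it.
partial = {"data": {"user": {}}}
try:
    partial["data"]["user"]["profile"]["email"]
    error = None
except KeyError as exc:
    error = repr(exc)
```

The resulting `error` mentions `profile` but says nothing about where in the structure the lookup failed.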

Slide 7

Slide 7 text

Take 1: Direct Access
Wrangling the response

Slide 8

Slide 8 text

Take 1: Direct Access
Wrangling the response

Slide 9

Slide 9 text

Take 1.5: Direct Access with Defaults
Wrangling the response
None doesn’t scale
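
One way to read "None doesn’t scale" is sketched below with chained `.get()` calls. The helper name `get_email` and the response shape are assumptions made for illustration, not the slide's code:

```python
def get_email(response):
    # Each level needs its own {} fallback so the next .get() has
    # something to be called on.
    return (
        response.get("data", {})
        .get("user", {})
        .get("profile", {})
        .get("email")
    )


# A missing level quietly produces None.
missing_level = {"data": {"user": {}}}
result = get_email(missing_level)

# A key that exists but holds None bypasses the {} fallback entirely.
null_level = {"data": {"user": None}}
try:
    get_email(null_level)
    failure = None
except AttributeError as exc:
    failure = str(exc)
```

The caller receives `None` without learning which level was absent, and an explicit null in the payload still raises.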

Slide 10

Slide 10 text

Take 2: Access + Exceptions
Wrangling the response
Error messages? 100% test coverage?
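
A sketch of the exception-based variant, under the same hypothetical response shape (the function name and keys are assumptions):

```python
def get_email(response):
    try:
        return response["data"]["user"]["profile"]["email"]
    except (KeyError, TypeError):
        # Which level failed? By this point the handler cannot tell,
        # and each failure branch needs its own test to be covered.
        return None


ok = get_email({"data": {"user": {"profile": {"email": "ada@example.com"}}}})
missing = get_email({"data": {"user": {}}})    # KeyError path
null = get_email({"data": {"user": None}})     # TypeError path
```

Both failure cases collapse into the same `None`, which is where the slide's questions about error messages and test coverage come from.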

Slide 11

Slide 11 text

A Third Way (Drumroll please)

Slide 12

Slide 12 text

glom: Python’s nested data operator
http://github.com/mahmoud/glom

Slide 13

Slide 13 text

Take 3: glom’s “deep-get”
Wrangling the response
Concise access:
Easy defaults:

Slide 14

Slide 14 text

Take 3: glom’s better errors
Wrangling the response
Debuggable, maintainable error messages:
(more on this in a sec)

Slide 15

Slide 15 text

The Transformation: Why deep-get is only the beginning

Slide 16

Slide 16 text

Building a response
The Transformation

Slide 17

Slide 17 text

Building a response: Raw Python vs glom

Slide 18

Slide 18 text

Declarative Data Transformation (WYSIWYG coding)

Slide 19

Slide 19 text

Declarative data transforms
Target (the input): {'data': {'ID': 2, 'isoDate': '1999-01-01'}}
Spec (the template): {'id': 'data.ID', 'date': 'data.isoDate'}
Output (the result): {'id': 2, 'date': '1999-01-01'}
output = glom(target, spec)

Slide 20

Slide 20 text

WYSIWYG code
“What you see is what you get” predates the rich text editor.
▪ List comprehensions
  ▫ [x * 2 for x in range(10)]
▪ Homoiconicity
  ▫ “Same” + “Representation”
  ▫ Looks? Or function?
  ▫ LISP, etc.
▪ Code As Data

Slide 21

Slide 21 text

Code as Data
In its most basic form:

Slide 22

Slide 22 text

“Flat is better than nested.” - The Zen of Python

Slide 23

Slide 23 text

The Zen of Glom
▪ Flat Python is better than nested Python
  ▫ Flatten Python by handling nested data declaratively
▪ Complex glom specs are better than complicated Python
▪ Actionable errors are everything

Slide 24

Slide 24 text

The Data Trace
Better error messages, and better stack traces.
A short stack that peels away target and spec to get to the unexpected data.

Slide 25

Slide 25 text

The glom stack
1. Declarative data transformation
▶ Less code
▶ Fewer bugs
▶ Better errors
▶ Daily use

Slide 26

Slide 26 text

Standard data transformations: A selection of glom builtins

Slide 27

Slide 27 text

Deep Assignment
Not just for deep-gets.

Slide 28

Slide 28 text

Streaming with Iter()
Chainable, composable, declarative iterator transformation.
Other Iter() methods:
▪ .filter()
▪ .split()
▪ .flatten()
▪ .limit()
▪ .first()
▪ (and more)

Slide 29

Slide 29 text

Python Native
Alternatives exist, but none of them came close to matching the expressiveness of Python’s data model.

Slide 30

Slide 30 text

The T object
Explicit, Pythonic path specification.

Slide 31

Slide 31 text

The T object: Your Data’s Stunt Double
T does anything, and has a better contract.

Slide 32

Slide 32 text

The glomenagerie
More built-in transforms than we have time for:
▪ Invoke
▪ Merge
▪ Flatten
▪ Delete
▪ And/Or
▪ And more…
  ▫ glom.readthedocs.io

Slide 33

Slide 33 text

The glom stack
1. Declarative data transformation
2. Standard Specifiers
▶ Deep Assign
▶ Streaming Iter
▶ Python-native T
▶ And more!

Slide 34

Slide 34 text

Extending glom: Extensions, modes, and a case study

Slide 35

Slide 35 text

What makes a Specifier Type? Let’s make one!
https://glom.readthedocs.io/en/latest/custom_spec_types.html
Just an object
With a method
A scope for runtime state
Including a glom function for recursion… and modes!

Slide 36

Slide 36 text

glom Modes
Modes are dialects, for keeping specs concise and maintainable.
▪ Just like vi and emacs modes
  ▫ Closer to emacs multi-mode though
▪ Anyone can define and switch modes
  ▫ Just override scope[MODE]
▪ Four modes
  ▫ Auto mode (the default)
  ▫ Fill - Finer-grained data templating
  ▫ Group - Reduction and bucketization
  ▫ Match - Pattern matching and validation

Slide 37

Slide 37 text

glom Match Mode
Pattern matching is very in again.
Next time: Structural Matching, Control Flow, and Variable Capture.

Slide 38

Slide 38 text

The glom stack
1. Declarative data transformation
2. Standard Specifiers
3. Extensible API & Runtime
▶ Specifier types
▶ Modes as dialects
▶ Just Python

Slide 39

Slide 39 text

The glom stack
1. Declarative data transformation
2. Standard Specifiers
3. Extensible API & Runtime

Slide 40

Slide 40 text

Thanks! Any questions?
Find more at:
▪ glom.readthedocs.io
▪ github.com/mahmoud
▪ twitter.com/mhashemi
▪ sedimental.org

Slide 41

Slide 41 text

Real Specs Have Indents
Specs can be as varied and scalable as your data.
https://glom.readthedocs.io/en/latest/tutorial.html
Coalesce /ˌkōəˈles/ - verb
Accept the first non-failing value, or default.

Slide 42

Slide 42 text

The Journey Ahead
▪ Data structures
▪ Python: Power and Promise
