Structures and Patterns

A quick, dense run-through of the tension between "data structures" as taught and "structured data" as we deal with it every day. The talk focuses on the glom library: declarative transformations, standard library, and extensibility.

More details (and recording) at https://sedimental.org/talks.html

Mahmoud Hashemi

August 15, 2020

Transcript

  1. Structure and Patterns
    Wrangling Nested Data in Python
    PyBay 2020
    Mahmoud Hashemi

  2. Data Structures
    CS 201, meet HTTP 201

  3. Look familiar?

  4. CS 201, meet HTTP 201
    “Data Structures”
    ▶ Homogeneous
    ▶ Invariants
    ▶ Algorithms
    Structured Data
    ▶ Heterogeneous
    ▶ Hierarchy
    ▶ “Can-do” attitude?

  5. An ordinary API response
    We need this value

  6. Take 1: Direct Access
    Wrangling the response

  7. Take 1: Direct Access
    Wrangling the response

  8. Take 1: Direct Access
    Wrangling the response

  9. Take 1.5: Direct Access with Defaults
    Wrangling the response
    None
    doesn’t scale
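
The code shown on this slide is not part of the transcript. As a rough sketch of the pattern the title names, the chained `dict.get()` calls below supply an empty-dict default at each level. The `get_city` helper and the payload shape are hypothetical. One way such chains break down, assumed here as an illustration of the "doesn't scale" remark, is that a default only covers a missing key, not a key whose value is `None`.

```python
def get_city(response):
    """Drill into a nested dict, defaulting to {} at each missing level."""
    return response.get("data", {}).get("user", {}).get("city")


# Works when every level is present, and when a level is missing entirely.
found = get_city({"data": {"user": {"city": "Oakland"}}})
missing = get_city({})

# A key that is present but null bypasses the default and raises instead.
try:
    get_city({"data": {"user": None}})
    null_error = None
except AttributeError as exc:
    null_error = exc
```

Guarding against the null case as well means adding a check at every level, which grows with the depth of the data.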

  10. Take 2: Access + Exceptions
    Wrangling the response
    Error messages?
    100% test coverage?
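
The slide's code is likewise missing from the transcript. A minimal sketch of the access-plus-exceptions style, assuming a hypothetical `get_city` helper and payload, might look like this. It also shows why the slide asks about error messages: the bare `KeyError` names a key but not where in the path the lookup failed.

```python
def get_city(response):
    """Index directly and treat any lookup failure as 'no value'."""
    try:
        return response["data"]["user"]["city"]
    except (KeyError, IndexError, TypeError):
        return None


found = get_city({"data": {"user": {"city": "Oakland"}}})
null_user = get_city({"data": {"user": None}})  # TypeError is swallowed

# Without the try/except, the exception text is just the missing key.
try:
    {"data": {"user": {}}}["data"]["user"]["city"]
except KeyError as exc:
    message = str(exc)
```

Each exception type in the `except` clause corresponds to a distinct failure path, which is where the test-coverage question comes from.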

  11. A Third Way
    (Drumroll please)

  12. glom
    Python’s nested data operator
    http://github.com/mahmoud/glom

  13. Take 3: glom’s “deep-get”
    Wrangling the response
    Concise access:
    Easy defaults:

  14. Take 3: glom’s better errors
    Wrangling the response
    Debuggable, maintainable error messages:
    (more on this in a sec)

  15. The Transformation
    Why deep-get is only the beginning

  16. Building a response
    The Transformation

  17. Building a response: Raw Python vs glom

  18. Declarative Data Transformation
    (WYSIWYG coding)

  19. Declarative data transforms
    Target (the input):
    {'ID': 2,
     'data': {'isoDate': '1999-01-01'}}
    Spec (the template):
    {'id': 'ID',
     'date': 'data.isoDate'}
    Output (the result):
    {'id': 2,
     'date': '1999-01-01'}
    output = glom(target, spec)

  20. WYSIWYG code
    “What you see is what you get” predates the rich text editor.
    ▪ List comprehensions
    ▫ [x * 2 for x in range(10)]
    ▪ Homoiconicity
    ▫ “Same” + “Representation”
    ▫ Looks? Or function?
    ▫ LISP, etc.
    ▪ Code As Data

  21. Code as Data
    In its most basic form:

  22. “Flat is better than nested.”
    - The Zen of Python

  23. The Zen of Glom
    ▪ Flat Python is better than nested Python
    ▫ Flatten Python by handling nested data declaratively
    ▪ Complex glom specs are better than complicated Python
    ▪ Actionable errors are everything

  24. The Data Trace
    Better error messages, and better stack traces.
    Short stack, peels away target and spec to get to the unexpected data.

  25. The glom stack
    Layer 1: Declarative data transformation
    ▶ Less code
    ▶ Fewer bugs
    ▶ Better errors
    ▶ Daily use

  26. Standard data transformations
    A selection of glom builtins

  27. Deep Assignment
    Not just for deep-gets.

  28. Streaming with Iter()
    Chainable, composable, declarative iterator transformation.
    Other Iter() methods:
    ▪ .filter()
    ▪ .split()
    ▪ .flatten()
    ▪ .limit()
    ▪ .first()
    ▪ (and more)

  29. Python Native
    Alternatives exist, but none of them came close to matching the expressiveness of Python’s data model.

  30. The T object
    Explicit, Pythonic path specification.

  31. The T object: Your Data’s Stunt Double
    T does anything, and has a better contract.

  32. The glomenagerie
    More built-in transforms than we have time for:
    ▪ Invoke
    ▪ Merge
    ▪ Flatten
    ▪ Delete
    ▪ And/Or
    ▪ And more…
    ▫ glom.readthedocs.io

  33. The glom stack
    Layer 1: Declarative data transformation
    Layer 2: Standard Specifiers
    ▶ Deep Assign
    ▶ Streaming Iter
    ▶ Python-native T
    ▶ And more!

  34. Extending glom
    Extensions, modes, and a case study

  35. What makes a Specifier Type?
    Let’s make one!
    https://glom.readthedocs.io/en/latest/custom_spec_types.html
    Just an object
    With a method
    A scope for runtime state
    Including a glom function for
    recursion… and modes!

  36. glom Modes
    Modes are dialects, for keeping specs concise and maintainable.
    ▪ Just like vi and emacs modes
    ▫ Closer to emacs multi-mode though
    ▪ Anyone can define and switch modes
    ▫ Just override scope[MODE]
    ▪ Four modes
    ▫ Auto mode (the default)
    ▫ Fill - Finer-grained data templating
    ▫ Group - Reduction and bucketization
    ▫ Match - Pattern matching and validation

  37. glom Match Mode
    Pattern matching is very in again.
    Next time: Structural Matching, Control Flow, and Variable Capture.

  38. The glom stack
    Layer 1: Declarative data transformation
    Layer 2: Standard Specifiers
    Layer 3: Extensible API & Runtime
    ▶ Specifier types
    ▶ Modes as dialects
    ▶ Just Python

  39. The glom stack
    Layer 1: Declarative data transformation
    Layer 2: Standard Specifiers
    Layer 3: Extensible API & Runtime

  40. Thanks!
    Any questions?
    Find more at:
    ▪ glom.readthedocs.io
    ▪ github.com/mahmoud
    ▪ twitter.com/mhashemi
    ▪ sedimental.org

  41. Real Specs Have Indents
    Specs can be as varied and scalable as your data.
    https://glom.readthedocs.io/en/latest/tutorial.html
    Coalesce
    /ˌkōəˈles/ - verb
    Accept the first non-failing
    value, or default.

  42. The Journey Ahead
    ▪ Data structures
    ▪ Python: Power and Promise
