Slide 1

Slide 1 text

wei-lee.me Unleash the Chaos Developing a Linter for Un-Pythonic Code!

Slide 2

Slide 2 text

wei-lee.me QR Code to this slide deck

Slide 3

Slide 3 text

wei-lee.me CHAOS!!!!!!

Slide 4

Slide 4 text

wei-lee.me

Slide 5

Slide 5 text

wei-lee.me Well... It sounds more attractive than simply building a linter.

Slide 6

Slide 6 text

wei-lee.me The fact is... We're still going to start by building a standard linter.

Slide 7

Slide 7 text

wei-lee.me

Slide 8

Slide 8 text

wei-lee.me Sometimes, we require specific rules tailored to our needs.

Slide 9

Slide 9 text

wei-lee.me A real-world use case

Slide 10

Slide 10 text

wei-lee.me Back to the time...

Slide 11

Slide 11 text

wei-lee.me It required the default value of the deferrable argument in the __init__ method of any (operator) class to be conf.getboolean("operators", "default_deferrable", fallback=False)

Slide 12

Slide 12 text

wei-lee.me ⛔

Slide 13

Slide 13 text

wei-lee.me ✅

Slide 14

Slide 14 text

wei-lee.me We're too lazy to do manual checks. So...

Slide 15

Slide 15 text

wei-lee.me Build our own linter 🔨

Slide 16

Slide 16 text

wei-lee.me We used the ast module

Slide 17

Slide 17 text

wei-lee.me What is ast anyway?

Slide 18

Slide 18 text

wei-lee.me A way to represent your code in a tree structure What is ast anyway?

Slide 19

Slide 19 text

wei-lee.me Minimum AST example

Slide 20

Slide 20 text

wei-lee.me Minimum AST example

Slide 21

Slide 21 text

wei-lee.me Minimum AST example

Slide 22

Slide 22 text

wei-lee.me Minimum AST example

Slide 23

Slide 23 text

wei-lee.me Minimum AST example

Slide 24

Slide 24 text

wei-lee.me Minimum AST example

Slide 25

Slide 25 text

wei-lee.me Minimum AST example

Slide 26

Slide 26 text

wei-lee.me Minimum AST example

Slide 27

Slide 27 text

wei-lee.me Minimum AST example

Slide 28

Slide 28 text

wei-lee.me Minimum AST example

Slide 29

Slide 29 text

wei-lee.me Minimum AST example

Slide 30

Slide 30 text

wei-lee.me Minimum AST example
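
A minimal sketch of what such an example might look like, using only the standard library. The statement `a = 1 + 2` is borrowed from a later slide; the variable names are illustrative assumptions.

```python
import ast

source = "a = 1 + 2"

# Parse the source text into a tree; the root is always an ast.Module.
tree = ast.parse(source)

# ast.dump renders the tree as text; indent (Python 3.9+) makes it readable.
print(ast.dump(tree, indent=4))

# The single statement is an Assign node whose value is a BinOp.
assign = tree.body[0]
print(type(assign).__name__)           # Assign
print(type(assign.value).__name__)     # BinOp
print(type(assign.value.op).__name__)  # Add
```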

Slide 31

Slide 31 text

wei-lee.me Go back to our example

Slide 32

Slide 32 text

wei-lee.me

Slide 33

Slide 33 text

wei-lee.me Well... It's actually this huge even for such a simple case.

Slide 34

Slide 34 text

wei-lee.me What we want to check (the requirement)
The default value of the deferrable argument in the __init__ method of any (operator) class must be conf.getboolean("operators", "default_deferrable", fallback=False)

Slide 35

Slide 35 text

wei-lee.me What we want to check (formalized into steps)
1. Search for class definitions
2. Search for __init__ methods
3. Search for arguments named deferrable
4. Check that the default value is conf.getboolean("operators", "default_deferrable", fallback=False)

Slide 36

Slide 36 text

wei-lee.me What we want to check (formalized in ast terms)
1. Search for ClassDef nodes
2. Search for FunctionDef nodes with name __init__
3. Search for arguments nodes with name deferrable
4. Check that the default values are conf.getboolean("operators", "default_deferrable", fallback=False)

Slide 37

Slide 37 text

wei-lee.me 1. Search for ClassDef node

Slide 38

Slide 38 text

wei-lee.me 1. Search for ClassDef node

Slide 39

Slide 39 text

wei-lee.me 1. Search for ClassDef node
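
One way step 1 might be written, as a sketch rather than the talk's actual code. `MyOperator` and `helper` are made-up names for the sample module.

```python
import ast

SOURCE = '''
class MyOperator:
    def __init__(self, deferrable=False):
        self.deferrable = deferrable

def helper():
    pass
'''

tree = ast.parse(SOURCE)

# ast.walk yields every node in the tree, so nested classes are found too.
class_nodes = [node for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]

for node in class_nodes:
    print(node.name, "defined at line", node.lineno)
```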

Slide 40

Slide 40 text

wei-lee.me 2. Search for FunctionDef node with name __init__

Slide 41

Slide 41 text

wei-lee.me 2. Search for FunctionDef node with name __init__
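
Step 2 could be sketched like this. The helper name `find_init_methods` and the sample class are assumptions made for illustration; only methods defined directly in a class body are considered.

```python
import ast

SOURCE = '''
class MyOperator:
    def __init__(self, deferrable=False):
        self.deferrable = deferrable

    def execute(self, context):
        pass
'''


def find_init_methods(tree: ast.AST) -> list:
    """Return the __init__ FunctionDef nodes defined directly in a class body."""
    found = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                found.append(item)
    return found


init_methods = find_init_methods(ast.parse(SOURCE))
print([method.name for method in init_methods])
```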

Slide 42

Slide 42 text

wei-lee.me 3. Search for arguments named as deferrable

Slide 43

Slide 43 text

wei-lee.me 3. Search for arguments named as deferrable
Note: ast does not bind arguments and their defaults.
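
Because `ast.arguments` stores parameters and defaults in separate lists, they have to be paired up by hand. A minimal sketch, assuming a hypothetical helper named `bind_defaults`; the sample signature is invented for the demonstration.

```python
import ast


def bind_defaults(func: ast.FunctionDef) -> dict:
    """Map each parameter name to its default node (None when there is none)."""
    args = func.args
    bound = {}

    # `defaults` belongs to the LAST len(defaults) positional parameters.
    positional = args.posonlyargs + args.args
    offset = len(positional) - len(args.defaults)
    for index, arg in enumerate(positional):
        bound[arg.arg] = args.defaults[index - offset] if index >= offset else None

    # `kw_defaults` lines up one-to-one with `kwonlyargs`; None means no default.
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        bound[arg.arg] = default
    return bound


func = ast.parse(
    "def __init__(self, task_id, deferrable=False, *, poll_interval=5, timeout): ..."
).body[0]
bound = bind_defaults(func)

# ast.unparse (Python 3.9+) turns a default node back into source text.
readable = {
    name: ast.unparse(node) if node is not None else None
    for name, node in bound.items()
}
print(readable)
```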

Slide 44

Slide 44 text

wei-lee.me 3. Search for arguments named as deferrable

Slide 45

Slide 45 text

wei-lee.me 3. Search for arguments named as deferrable ✅

Slide 46

Slide 46 text

wei-lee.me 3. Search for arguments named as deferrable ⛔

Slide 47

Slide 47 text

wei-lee.me 3. Search for arguments named as deferrable ❓

Slide 48

Slide 48 text

wei-lee.me 4. Check the default value

Slide 49

Slide 49 text

wei-lee.me 4. Check the default value

Slide 50

Slide 50 text

wei-lee.me 4. Check the default value
All those comparisons in this step can be simplified as
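
One possible simplification, offered as an assumption and not necessarily the one shown in the talk: parse the expected expression once and compare `ast.dump` output, which ignores line numbers, whitespace, and quote style. `is_expected_default` is a hypothetical helper name.

```python
import ast

EXPECTED = 'conf.getboolean("operators", "default_deferrable", fallback=False)'
# mode="eval" parses a single expression; .body is the expression node itself.
EXPECTED_DUMP = ast.dump(ast.parse(EXPECTED, mode="eval").body)


def is_expected_default(node) -> bool:
    """True when the default node is structurally identical to EXPECTED."""
    return node is not None and ast.dump(node) == EXPECTED_DUMP


good = ast.parse(
    "def f(deferrable=conf.getboolean('operators',  'default_deferrable', fallback=False)): ..."
).body[0].args.defaults[0]
bad = ast.parse("def f(deferrable=False): ...").body[0].args.defaults[0]

print(is_expected_default(good))
print(is_expected_default(bad))
```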

Slide 51

Slide 51 text

wei-lee.me

Slide 52

Slide 52 text

wei-lee.me One step forward

Slide 53

Slide 53 text

wei-lee.me Auto-formatter!

Slide 54

Slide 54 text

wei-lee.me ast.NodeTransformer

Slide 55

Slide 55 text

wei-lee.me ast.NodeVisitor We can use it to simplify the linting logic.
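
A sketch of how the four steps might collapse into a single `ast.NodeVisitor` subclass. The class name `DefaultDeferrableVisitor`, the error tuple format, and the choice to flag only explicit but wrong defaults are all assumptions of this sketch.

```python
import ast

EXPECTED = 'conf.getboolean("operators", "default_deferrable", fallback=False)'
EXPECTED_DUMP = ast.dump(ast.parse(EXPECTED, mode="eval").body)


class DefaultDeferrableVisitor(ast.NodeVisitor):
    """Collect (line, message) pairs for unexpected `deferrable` defaults."""

    def __init__(self) -> None:
        self.errors = []

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                self._check_init(node.name, item)
        # Keep walking so nested classes are checked as well.
        self.generic_visit(node)

    def _check_init(self, class_name: str, func: ast.FunctionDef) -> None:
        args = func.args
        positional = args.posonlyargs + args.args
        offset = len(positional) - len(args.defaults)
        pairs = [
            (arg, args.defaults[i - offset] if i >= offset else None)
            for i, arg in enumerate(positional)
        ]
        pairs += list(zip(args.kwonlyargs, args.kw_defaults))
        for arg, default in pairs:
            if arg.arg != "deferrable" or default is None:
                continue  # this sketch ignores parameters without a default
            if ast.dump(default) != EXPECTED_DUMP:
                message = f"{class_name}.__init__: unexpected default for 'deferrable'"
                self.errors.append((arg.lineno, message))


SOURCE = '''\
class GoodOperator:
    def __init__(self, deferrable=conf.getboolean("operators", "default_deferrable", fallback=False)):
        pass

class BadOperator:
    def __init__(self, task_id, deferrable=False):
        pass
'''

visitor = DefaultDeferrableVisitor()
visitor.visit(ast.parse(SOURCE))
for line, message in visitor.errors:
    print(f"line {line}: {message}")
```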

Slide 56

Slide 56 text

wei-lee.me Build an auto-formatter Inherit from ast.NodeTransformer

Slide 57

Slide 57 text

wei-lee.me Build an auto-formatter visit_{Node type}

Slide 58

Slide 58 text

wei-lee.me Build an auto-formatter Logic after finding errors

Slide 59

Slide 59 text

wei-lee.me Build an auto-formatter Update "args.defaults" in the error node

Slide 60

Slide 60 text

wei-lee.me Build an auto-formatter Use DefaultDeferrableTransformer

Slide 61

Slide 61 text

wei-lee.me Build an auto-formatter Visit the ast tree

Slide 62

Slide 62 text

wei-lee.me Build an auto-formatter Fix line number due to node changes

Slide 63

Slide 63 text

wei-lee.me Build an auto-formatter Write the modified content back
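
The steps above can be sketched as follows. Only the class name `DefaultDeferrableTransformer` comes from the slides; the method bodies, the helper `expected_node`, and the sample module are assumptions. The result is printed instead of being written back to a file.

```python
import ast

EXPECTED = 'conf.getboolean("operators", "default_deferrable", fallback=False)'
EXPECTED_DUMP = ast.dump(ast.parse(EXPECTED, mode="eval").body)


def expected_node() -> ast.expr:
    # Build a fresh node per fix so no node is shared between two places.
    return ast.parse(EXPECTED, mode="eval").body


class DefaultDeferrableTransformer(ast.NodeTransformer):
    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                self._fix_defaults(item.args)
        self.generic_visit(node)
        return node

    def _fix_defaults(self, args: ast.arguments) -> None:
        # Positional parameters: defaults align with the tail of the list.
        positional = args.posonlyargs + args.args
        offset = len(positional) - len(args.defaults)
        for index, arg in enumerate(positional):
            if arg.arg == "deferrable" and index >= offset:
                if ast.dump(args.defaults[index - offset]) != EXPECTED_DUMP:
                    args.defaults[index - offset] = expected_node()
        # Keyword-only parameters: kw_defaults aligns one-to-one.
        for index, arg in enumerate(args.kwonlyargs):
            default = args.kw_defaults[index]
            if arg.arg == "deferrable" and default is not None:
                if ast.dump(default) != EXPECTED_DUMP:
                    args.kw_defaults[index] = expected_node()


SOURCE = '''\
class MyOperator:
    # comments like this one do not survive the round trip
    def __init__(self, task_id, deferrable=False):
        self.deferrable = deferrable
'''

tree = DefaultDeferrableTransformer().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)  # fill in line numbers for the new nodes
fixed = ast.unparse(tree)        # Python 3.9+
print(fixed)
```

Note that `ast.unparse` regenerates the source from the tree alone, so comments and the original layout are not preserved.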

Slide 64

Slide 64 text

wei-lee.me Let's see how this will be fixed!

Slide 65

Slide 65 text

wei-lee.me It fixed 🎉 But....

Slide 66

Slide 66 text

wei-lee.me All the formats are gone 😱

Slide 67

Slide 67 text

wei-lee.me Cheuk is right (from PyCon APAC 2023)

Slide 68

Slide 68 text

wei-lee.me

Slide 69

Slide 69 text

wei-lee.me

Slide 70

Slide 70 text

wei-lee.me Preserve More details

Slide 71

Slide 71 text

wei-lee.me Parsing the same module

Slide 72

Slide 72 text

wei-lee.me Syntax Tree from ast

Slide 73

Slide 73 text

wei-lee.me Syntax Tree from LibCST

Slide 74

Slide 74 text

wei-lee.me Syntax Tree from LibCST Visualized Version

Slide 75

Slide 75 text

wei-lee.me Syntax Tree from LibCST Let's zoom in a bit

Slide 76

Slide 76 text

wei-lee.me More features

Slide 77

Slide 77 text

wei-lee.me Metadata Providers

Slide 78

Slide 78 text

wei-lee.me Codemods

Slide 79

Slide 79 text

wei-lee.me Let's try this again CSTTransformer

Slide 80

Slide 80 text

wei-lee.me Let's try this again visit_{Node} / leave_{Node}

Slide 81

Slide 81 text

wei-lee.me Let's try this again a cleaner way to process parameters

Slide 82

Slide 82 text

wei-lee.me Let's try this again Compare syntax tree

Slide 83

Slide 83 text

wei-lee.me Let's try this again Update node

Slide 84

Slide 84 text

wei-lee.me Let's try this again It's now the syntax tree that visits the transformer (tree.visit(transformer)), not the other way around

Slide 85

Slide 85 text

wei-lee.me Everything looks good

Slide 86

Slide 86 text

wei-lee.me Except that it took 42.35 seconds to process 385 modules. By the way, ast took 0.79 seconds (file loading took 0.49 seconds).

Slide 87

Slide 87 text

wei-lee.me What can we do?
1. Use ast to filter the modules that might contain an error
2. Use LibCST to fix those modules
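
The two-phase idea might be sketched like this. `might_contain_error`, `fix_modules`, and the stand-in `slow_fix` callable are hypothetical names; in practice the slow fixer would be the LibCST-based transformer.

```python
import ast
from typing import Callable, Dict


def might_contain_error(source: str) -> bool:
    """Cheap pre-check: does any class __init__ take a `deferrable` parameter?"""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                args = item.args
                params = args.posonlyargs + args.args + args.kwonlyargs
                if any(param.arg == "deferrable" for param in params):
                    return True
    return False


def fix_modules(
    sources: Dict[str, str], slow_fix: Callable[[str], str]
) -> Dict[str, str]:
    """Run the expensive fixer only on modules flagged by the cheap check."""
    return {
        path: slow_fix(source) if might_contain_error(source) else source
        for path, source in sources.items()
    }


calls = []


def slow_fix(source: str) -> str:
    # Stand-in for an expensive fixer; it only records that it was called.
    calls.append(source)
    return "# fixed\n" + source


sources = {
    "with_deferrable.py": "class A:\n    def __init__(self, deferrable=False): ...\n",
    "plain.py": "x = 1\n",
}
result = fix_modules(sources, slow_fix)
print(len(calls), "module(s) went through the slow path")
```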

Slide 88

Slide 88 text

wei-lee.me That's what I did in this PR

Slide 89

Slide 89 text

wei-lee.me How do other tools work?

Slide 90

Slide 90 text

wei-lee.me How does black work?

Slide 91

Slide 91 text

wei-lee.me How does black work? It still builds syntax trees, just not with ast or LibCST.

Slide 92

Slide 92 text

wei-lee.me How does black work? (at a very high level, black == 24.8.0)
1. Sanitize lines
2. Parse the source code into a syntax tree (lib2to3_parse)
   1. Get grammars in target versions
   2. Tokenize
   3. Return the syntax tree as Node object
3. Generate Line objects through LineGenerator
4. Transform line into formatted LineBlock

Slide 97

Slide 97 text

wei-lee.me How does black work? Syntax tree of "a = 1 + 2"

Slide 98

Slide 98 text

wei-lee.me How does black work? Line object of "a = 1 + 2"

Slide 99

Slide 99 text

wei-lee.me How is black optimized?

Slide 100

Slide 100 text

wei-lee.me How about ruff?

Slide 101

Slide 101 text

wei-lee.me

Slide 102

Slide 102 text

wei-lee.me

Slide 103

Slide 103 text

wei-lee.me

Slide 104

Slide 104 text

wei-lee.me

Slide 105

Slide 105 text

wei-lee.me But if you take a deeper look

Slide 106

Slide 106 text

wei-lee.me

Slide 107

Slide 107 text

wei-lee.me It's still ast under the hood

Slide 108

Slide 108 text

wei-lee.me A minimum example

Slide 109

Slide 109 text

wei-lee.me Check if a deprecated name is used

Slide 110

Slide 110 text

wei-lee.me Get the qualified name (e.g., Dataset → airflow.datasets.Dataset)

Slide 111

Slide 111 text

wei-lee.me Check whether it's deprecated

Slide 112

Slide 112 text

wei-lee.me Check whether it's deprecated

Slide 113

Slide 113 text

wei-lee.me Push the error message

Slide 114

Slide 114 text

wei-lee.me That's it! (At a high level)
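
ruff implements this flow in Rust. The following is only a Python analogue of the same steps (resolve the qualified name, check it against a table, push a message), built on the standard `ast` module. The `DEPRECATED` table is a placeholder for illustration, and `DeprecatedNameVisitor` is a hypothetical name.

```python
import ast

# Placeholder table for this sketch, not an authoritative deprecation list.
DEPRECATED = {"airflow.datasets.Dataset"}


class DeprecatedNameVisitor(ast.NodeVisitor):
    def __init__(self, deprecated: set) -> None:
        self.deprecated = deprecated
        self.imported = {}  # local name -> qualified name
        self.messages = []

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        if node.module is None:  # relative import such as `from . import x`
            return
        for alias in node.names:
            local_name = alias.asname or alias.name
            self.imported[local_name] = f"{node.module}.{alias.name}"

    def visit_Name(self, node: ast.Name) -> None:
        # Step 1: get the qualified name (e.g. Dataset -> airflow.datasets.Dataset)
        qualified = self.imported.get(node.id)
        # Step 2: check whether it is deprecated
        if qualified in self.deprecated:
            # Step 3: push the error message
            self.messages.append(f"line {node.lineno}: {qualified} is deprecated")


SOURCE = '''\
from airflow.datasets import Dataset

ds = Dataset("s3://bucket/key")
'''

visitor = DeprecatedNameVisitor(DEPRECATED)
visitor.visit(ast.parse(SOURCE))
print(visitor.messages)
```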

Slide 115

Slide 115 text

wei-lee.me Now you can also contribute to ruff with limited Rust knowledge 🙌

Slide 116

Slide 116 text

wei-lee.me

Slide 117

Slide 117 text

wei-lee.me H35?

Slide 118

Slide 118 text

wei-lee.me H1w a3t g5g s2e n8s?

Slide 119

Slide 119 text

wei-lee.me k8s i18n a11y l10n

Slide 120

Slide 120 text

wei-lee.me

Slide 121

Slide 121 text

wei-lee.me Setting up the rules
1. Skip builtins, dunder methods, and anything imported
2. Change all the top-level variable, class, and function names into numeronyms

Slide 122

Slide 122 text

wei-lee.me Nodes to exclude builtins, dunder methods

Slide 123

Slide 123 text

wei-lee.me Names to exclude import ... as ... / from ... import ... as ...

Slide 124

Slide 124 text

wei-lee.me Collect potential names ClassDef, FunctionDef, Name

Slide 125

Slide 125 text

wei-lee.me Collect potential names ClassDef, FunctionDef, Name

Slide 126

Slide 126 text

wei-lee.me Generate name mapping

Slide 127

Slide 127 text

wei-lee.me numeronymize
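
A minimal sketch of what a `numeronymize` function might do: keep the first and last characters and replace everything in between with its length. Leaving names of one or two characters unchanged is an assumption of this sketch.

```python
def numeronymize(name: str) -> str:
    """Turn e.g. 'kubernetes' into 'k8s'."""
    if len(name) <= 2:
        return name  # nothing to abbreviate
    return f"{name[0]}{len(name) - 2}{name[-1]}"


for word in ["kubernetes", "internationalization", "accessibility", "localization"]:
    print(word, "->", numeronymize(word))
```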

Slide 128

Slide 128 text

wei-lee.me NumeronymsTransformer
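
Putting the rules together, a transformer along these lines is conceivable. Only the name `NumeronymsTransformer` comes from the slides; `collect_mapping` and the sample module are assumptions. Like the limitations the talk lists, this sketch ignores name collisions, scope, and inheritance.

```python
import ast
import builtins


def numeronymize(name: str) -> str:
    return name if len(name) <= 2 else f"{name[0]}{len(name) - 2}{name[-1]}"


def collect_mapping(tree: ast.Module) -> dict:
    """Map top-level variable, class, and function names to numeronyms."""
    excluded = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                excluded.add((alias.asname or alias.name).split(".")[0])

    candidates = set()
    for node in tree.body:  # top level only
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            candidates.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    candidates.add(target.id)

    return {
        name: numeronymize(name)
        for name in candidates
        if name not in excluded
        and not (name.startswith("__") and name.endswith("__"))
    }


class NumeronymsTransformer(ast.NodeTransformer):
    def __init__(self, mapping: dict) -> None:
        self.mapping = mapping

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


SOURCE = '''\
class Greeter:
    def __init__(self, name):
        self.name = name

def make_greeter(name):
    return Greeter(name)

default_greeter = make_greeter("world")
print(default_greeter.name)
'''

tree = ast.parse(SOURCE)
mapping = collect_mapping(tree)
result = ast.unparse(NumeronymsTransformer(mapping).visit(tree))  # Python 3.9+
print(result)
```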

Slide 129

Slide 129 text

wei-lee.me 🙉

Slide 130

Slide 130 text

wei-lee.me Limitations (things I'm too lazy to implement)
• Name collision
• Scope
• Inheritance

Slide 131

Slide 131 text

wei-lee.me It turns out that bringing chaos correctly is also tough.

Slide 132

Slide 132 text

wei-lee.me

Slide 133

Slide 133 text

wei-lee.me QR code links to my posts related to this talk

Slide 134

Slide 134 text

wei-lee.me Related PyCon Talks
• Charlie Marsh - Ruff: An Extremely Fast Python Linter and Code Formatter, Written in Rust
• Łukasz Langa - Life Is Better Painted Black, or: How to Stop Worrying and Embrace Auto-Formatting
• Cheuk Ting Ho - Reformating your code without AI - let's see how a formatter works

Slide 135

Slide 135 text

wei-lee.me

Slide 136

Slide 136 text

wei-lee.me $ cat weilee.py
__name__ = 李唯 / Wei Lee
__what_i_am_doing__ = [
    Volunteer @ PyCon Taiwan,
    First Time Speaker @ PyCon APAC,
    Member @ PyCon APAC,
    Maintainer of commitizen-tools,
    Software Engineer @ Astronomer,
    Committer @ Apache Airflow,
]
__github__ = Lee-W
__linkedin__ = clleew
__site__ = https://wei-lee.me

Slide 137

Slide 137 text

wei-lee.me $ python weilee.py
  File "weilee.py", line 1
    __name__ = 李唯 / Wei Lee
    ^^^
SyntaxError: invalid syntax

Slide 138

Slide 138 text

wei-lee.me PyCon Taiwan 2025 6th-8th 2025 @ Taipei

Slide 139

Slide 139 text

wei-lee.me PyCon APAC 2025 Sprints 3rd March (Mon.)

Slide 140

Slide 140 text

wei-lee.me