Slide 1

Slide 1 text

Fuzzing Without Specifications:
Learning Structure from Behaviour
Part I
Rahul Gopinath

Slide 3

Slide 3 text

Prerequisites
https://github.com/vrthra/summer2025#readme
• Install Python 3.12
• Install Graphviz
• Install Z3
• Install Jupyter
• Start Jupyter

Slide 4

Slide 4 text

Prerequisites
http://localhost:8888/tree
• Install Python 3.10
• Install Graphviz
• Install Jupyter
• Start Jupyter
See the README

Slide 5

Slide 5 text

http://localhost:8888/notebooks/RoadMap.ipynb

Slide 6

Slide 6 text

http://localhost:8888/notebooks/x0_0_Prerequisites.ipynb

Slide 7

Slide 7 text

Finding hidden bugs

Slide 8

Slide 8 text

Fuzzing
Program: Crash?
Trash deck technique: 1950s - Gerald Weinberg

Slide 9

Slide 9 text

$ ./fuzz
[;x1-GPZ+wcckc];,N9J+?#6^6\e?]9lu
 2_%'4GX"0VUB[E/r ~fApu6b8<{%siq8Z
 h.6{V,hr?;{Ti.r3PIxMMMv6{xS^+'Hq!
 AxB"YXRS@!Kd6;wtAMefFWM(`|J_<1~o}
 z3K(CCzRH JIIvHz>_*.\>JrlU32~eGP?
 lR=bF3+;y$3lodQ&]BS6R&j?#tP7iaV}-}`\?[_[Z^LBM
 PG-FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2
 D|vBy!^zkhdf3C5PAkR?V((-%>

Slide 10

Slide 10 text


Slide 11

Slide 11 text

http://localhost:8888/notebooks/x1_0_GeneratingSamples.ipynb

Slide 13

Slide 13 text


Slide 14

Slide 14 text

Traditional Fuzzing

    @app.route('/admin')
    def admin():
        username = request.cookies.get("username")
        if not username:
            return {"Error": "Specify username in Cookie"}
        username = urllib.quote(os.path.basename(username))
        url = "http://permissions:5000/permissions/{}".format(username)
        resp = requests.request(method="GET", url=url)
        # "superadmin\ud888" will be simplified to "superadmin"
        ret = ujson.loads(resp.text)
        if resp.status_code == 200:
            if "superadmin" in ret["roles"]:
                return {"OK": "Superadmin Access granted"}
            else:
                e = u"Access denied. User has following roles: {}".format(ret["roles"])
                return {"Error": e}, 401
        else:
            return {"Error": ret["Error"]}, 500

[Figure: a stream of random characters is fed to the Program]

https://www.fuzzingbook.org/html/Fuzzer.html
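A minimal sketch of traditional (random) fuzzing, loosely following the idea behind the fuzzingbook chapter linked on this slide. The names `fuzzer` and `crashes`, and all parameter defaults, are assumptions made for illustration, not the course's code.

```python
import random

def fuzzer(max_length=100, char_start=32, char_range=32, rng=random):
    """Return a string of up to max_length random characters drawn from
    the code points char_start .. char_start + char_range - 1."""
    length = rng.randrange(0, max_length + 1)
    return ''.join(chr(rng.randrange(char_start, char_start + char_range))
                   for _ in range(length))

def crashes(program, data):
    """Run program(data) and report whether it raised an exception."""
    try:
        program(data)
        return False
    except Exception:
        return True

if __name__ == '__main__':
    rng = random.Random(0)
    # Feed random strings to int(), which stands in for the program under test.
    inputs = [fuzzer(max_length=10, rng=rng) for _ in range(20)]
    failures = [s for s in inputs if crashes(int, s)]
    print(len(failures), 'of', len(inputs), 'inputs raised an exception')
```

Nearly every input is rejected right away. That is the weakness of purely random input for programs that expect structured data.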

Slide 15

Slide 15 text

(CACM '90) No longer very effective

Slide 16

Slide 16 text

Feedback Driven Fuzzing

• Insert Instrumentation
• Generate inputs
• Collect execution feedback
  • Branches covered during execution
• Slightly Mutate Input and try again

Collect inputs obtaining new coverage

https://www.fuzzingbook.org/html/MutationFuzzer.html
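The loop described by these bullets can be sketched as follows. This is a simplified illustration, not AFL or the fuzzingbook implementation: line coverage collected with `sys.settrace` stands in for branch coverage, and `coverage_of`, `mutate`, and `fuzz` are hypothetical helper names.

```python
import random
import sys

def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        return 'Isosceles'
    if b == c:
        return 'Isosceles'
    if a == c:
        return 'Isosceles'
    return 'Scalene'

def coverage_of(func, args):
    """Run func(*args) and return the set of line numbers executed in func."""
    covered = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None              # do not trace other functions
        if event == 'line':
            covered.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return covered

def mutate(args, rng):
    """Slightly mutate the input: move one side by one, keeping it >= 1."""
    args = list(args)
    i = rng.randrange(len(args))
    args[i] = max(1, args[i] + rng.choice([-1, 1]))
    return tuple(args)

def fuzz(func, seed, iterations, rng):
    """Keep only those mutated inputs that reach lines not seen before."""
    population = [seed]
    seen = coverage_of(func, seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(population), rng)
        cov = coverage_of(func, candidate)
        if not cov <= seen:          # new coverage obtained
            population.append(candidate)
            seen |= cov
    return population, seen

if __name__ == '__main__':
    population, _ = fuzz(triangle, (1, 1, 1), 200, random.Random(0))
    print(population)
```

Inputs that only repeat known coverage are discarded, so the population stays small while coverage grows.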

Slide 17

Slide 17 text

Feedback Driven Fuzzing

Original program:

    def triangle(a, b, c):
        if a == b:
            if b == c:
                return Equilateral
            else:
                return Isosceles
        else:
            if b == c:
                return Isosceles
            else:
                if a == c:
                    return Isosceles
                else:
                    return Scalene

Instrumented program:

    def triangle(a, b, c):
        __probe_enter()
        if a == b:
            __probe_1()
            if b == c:
                __probe_2()
                return Equilateral
            else:
                __probe_3()
                return Isosceles
        else:
            __probe_4()
            if b == c:
                __probe_5()
                return Isosceles
            else:
                __probe_6()
                if a == c:
                    __probe_7()
                    return Isosceles
                else:
                    __probe_8()
                    return Scalene

https://www.fuzzingbook.org/html/MutationFuzzer.html
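A runnable sketch of the probe idea, assuming a single hypothetical `probe(n)` function that records which probe fired (the slide uses separate `__probe_N()` calls). It shows why a mutation from (1,1,1) to (1,1,2) is kept while a further mutation to (1,1,3) adds nothing new.

```python
hits = set()   # probes reached during the current run

def probe(n):
    hits.add(n)

def triangle(a, b, c):
    probe('enter')
    if a == b:
        probe(1)
        if b == c:
            probe(2)
            return 'Equilateral'
        else:
            probe(3)
            return 'Isosceles'
    else:
        probe(4)
        if b == c:
            probe(5)
            return 'Isosceles'
        else:
            probe(6)
            if a == c:
                probe(7)
                return 'Isosceles'
            else:
                probe(8)
                return 'Scalene'

def run(a, b, c):
    """Execute triangle once and return the probes that fired."""
    hits.clear()
    triangle(a, b, c)
    return frozenset(hits)

if __name__ == '__main__':
    for args in [(1, 1, 1), (1, 1, 2), (1, 1, 3)]:
        print(args, sorted(map(str, run(*args))))
```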

Slide 18

Slide 18 text

Feedback Driven Fuzzing

Input: triangle(1,1,1)

https://www.fuzzingbook.org/html/MutationFuzzer.html

Slide 19

Slide 19 text

Feedback Driven Fuzzing

Input: triangle(1,1,1)

https://www.fuzzingbook.org/html/MutationFuzzer.html

Slide 20

Slide 20 text

Feedback Driven Fuzzing

Input: triangle(1,1,1)
Mutated: triangle(1,1,2)

https://www.fuzzingbook.org/html/MutationFuzzer.html

Slide 21

Slide 21 text

Feedback Driven Fuzzing

Mutated: triangle(1,1,3)

https://www.fuzzingbook.org/html/MutationFuzzer.html

Slide 22

Slide 22 text

Feedback Driven Fuzzing

Inputs: triangle(1,1,1), triangle(1,1,2), triangle(1,1,3)

https://www.fuzzingbook.org/html/MutationFuzzer.html

Slide 23

Slide 23 text

Feedback Driven Fuzzing

Inputs: triangle(1,1,1), triangle(1,1,2)

AFL

Slide 24

Slide 24 text

Feedback Driven Fuzzing

Weakness: Tokens

    static int is_reserved_word_token(const char *s, int len) {
      const char *reserved[] = {
          "break", "case", "catch", "continue", "debugger", "default",
          "delete", "do", "else", "false", "finally", "for", "function",
          "if", "in", "instanceof", "new", "null", "return", "switch",
          "this", "throw", "true", "try", "typeof", "var", "void",
          "while", "with", "let", "undefined", ((void *)0)};
      int i;
      if (!mjs_is_alpha(s[0])) return 0;
      for (i = 0; reserved[i] != ((void *)0); i++) {
        if (len == (int)strlen(reserved[i]) &&
            strncmp(s, reserved[i], len) == 0)
          return i + 1;
      }
      return 0;
    }

Input                    Coverage
if (x > 100) { }         20%
if (x > 100) { } e       5%
if (x > 100) { } el      5%
if (x > 100) { } els     5%
if (x > 100) { } else    25%

No smooth coverage gradient in parsers

Note: Constant unrolling (e.g. AFL) does not help in such lexical tokens
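The missing gradient can be reproduced with a small Python transliteration of the token check above (a sketch, not the mjs source). The keyword comparison is a single opaque operation, so every proper prefix of `else` produces the same outcome and the same coverage; only the complete keyword behaves differently.

```python
RESERVED = [
    "break", "case", "catch", "continue", "debugger", "default", "delete",
    "do", "else", "false", "finally", "for", "function", "if", "in",
    "instanceof", "new", "null", "return", "switch", "this", "throw",
    "true", "try", "typeof", "var", "void", "while", "with", "let",
    "undefined",
]

def is_reserved_word_token(s):
    """Return index + 1 of the matching reserved word, or 0 if none matches."""
    if not s[:1].isalpha():
        return 0
    for i, word in enumerate(RESERVED):
        if s == word:            # all-or-nothing comparison, like strncmp
            return i + 1
    return 0

if __name__ == '__main__':
    for token in ["e", "el", "els", "else"]:
        print(repr(token), is_reserved_word_token(token))
```

A fuzzer that has found `els` gets no signal that it is one character away from a keyword.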

Slide 25

Slide 25 text

Feedback Driven Fuzzing

    def json_raw(stm):
        while True:
            stm.skipspaces()
            c = stm.peek()
            if c == 't': return json_fixed(stm, 'true')
            elif c == 'f': return json_fixed(stm, 'false')
            elif c == 'n': return json_fixed(stm, 'null')
            elif c == '"': return json_string(stm)
            elif c == '{': return json_dict(stm)
            elif c == '[': return json_list(stm)
            elif c in NUMSTART: return json_number(stm)
            raise JSONError(E_MALF, stm, stm.pos)

Weak points:
• Need for smooth coverage gradient
• Coverage only provides first level guidance

1. {"abc":[]}
2. [{"a":[]}, {"b":[]}, {"c":["ab","c"]}]

Slide 28

Slide 28 text


Slide 29

Slide 29 text

Fuzzing Parsers

Slide 30

Slide 30 text

Input           Valid?
1y78            NO
(               NO
(1+2)           YES
21214/91*293    YES
&12133          NO

Slide 31

Slide 31 text

Fuzzing Parsers

$ ./fuzz
[;x1-GPZ+wcckc];,N9J+?#6^6\e?]9lu
 2_%'4GX"0VUB[E/r ~fApu6b8<{%siq8Z
 h.6{V,hr?;{Ti.r3PIxMMMv6{xS^+'Hq!
 AxB"YXRS@!Kd6;wtAMefFWM(`|J_<1~o}
 z3K(CCzRH JIIvHz>_*.\>JrlU32~eGP?
 lR=bF3+;y$3lodQ&]BS6R&j?#tP7iaV}-}`\?[_[Z^LBM
 PG-FKj'\xwuZ1=Q`^`5,$N$Q@[!CuRzJ2
 D|vBy!^zkhdf3C5PAkR?V((-%>

Slide 32

Slide 32 text

Leveraging Instrumentation Based Feedback

    def parse_num(s,i):
        n = ''
        while s[i:] and is_digit(s[i]):
            n += s[i]
            i = i +1
        return i,n

    def parse_paren(s, i):
        assert s[i] == '('
        i, v = parse_expr(s, i+1)
        if s[i:] == '':
            raise Exception(s, i)
        assert s[i] == ')'
        return i+1, v

    def parse_expr(s, i = 0):
        expr = []
        is_op = True
        while s[i:]:
            c = s[i]
            if c in string.digits:
                if not is_op: raise Exception(s,i)
                i,num = parse_num(s,i)
                expr.append(num)
                is_op = False
            elif c in ['+', '-', '*', '/']:
                if is_op: raise Exception(s,i)
                expr.append(c)
                is_op = True
                i = i + 1
            elif c == '(':
                if not is_op: raise Exception(s,i)
                i, cexpr = parse_paren(s, i)
                expr.append(cexpr)
                is_op = False
            elif c == ')':
                break
            else:
                raise Exception(s,i)
        if is_op: raise Exception(s,i)
        return i, expr

[Diagram labels: Parser, Syntax Error, Interpreter]

Slide 33

Slide 33 text

http://localhost:8888/notebooks/x1_1_TrackingAccess.ipynb

Slide 34

Slide 34 text

Tracker for Access

Slide 35

Slide 35 text

AST Rewriting

Slide 36

Slide 36 text

Find The Last Compared Index

Slide 37

Slide 37 text

Specification Free Generator

Slide 38

Slide 38 text

Specification Free Generators

Characters tried: A ( 2 - B 9 ) 4 )

A ∉ { (, +, -, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }
B ∉ { +, -, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, ) }
) ∉ { +, -, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }

Resulting input: (2-94)
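A simplified sketch of a specification-free generator, under stated assumptions. Instead of tracking character comparisons as the slides describe, it uses the index at which the parser gave up: a failure at the end of the input means "incomplete, keep extending", and a failure before the end means "the last character was wrong, try another". `ParseError`, `check`, `generate`, and `ALPHABET` are hypothetical names, and the parser is a compact variant of the arithmetic parser shown earlier in the deck.

```python
import random
import string

class ParseError(Exception):
    """Carries the input index at which the parser gave up."""
    def __init__(self, index):
        super().__init__(index)
        self.index = index

def parse_num(s, i):
    n = ''
    while s[i:] and s[i] in string.digits:
        n += s[i]
        i += 1
    return i, n

def parse_paren(s, i):
    i, v = parse_expr(s, i + 1)      # skip the '('
    if s[i:] == '':
        raise ParseError(i)          # input ended before ')'
    return i + 1, v                  # parse_expr stopped at ')'

def parse_expr(s, i=0):
    expr, is_op = [], True
    while s[i:]:
        c = s[i]
        if c in string.digits:
            if not is_op: raise ParseError(i)
            i, num = parse_num(s, i)
            expr.append(num)
            is_op = False
        elif c in '+-*/':
            if is_op: raise ParseError(i)
            expr.append(c)
            is_op, i = True, i + 1
        elif c == '(':
            if not is_op: raise ParseError(i)
            i, cexpr = parse_paren(s, i)
            expr.append(cexpr)
            is_op = False
        elif c == ')':
            break
        else:
            raise ParseError(i)
    if is_op: raise ParseError(i)
    return i, expr

def check(s):
    """Classify s as 'complete', 'incomplete', or 'incorrect'."""
    try:
        i, _ = parse_expr(s, 0)
    except ParseError as e:
        return 'incomplete' if e.index >= len(s) else 'incorrect'
    return 'complete' if i == len(s) else 'incorrect'

ALPHABET = string.digits + '+-*/()' + 'AB &'   # includes characters the parser rejects

def generate(rng, min_len=5, max_steps=5000):
    """Grow a valid input one character at a time, guided only by check()."""
    prefix, best = '', None
    for _ in range(max_steps):
        candidate = prefix + rng.choice(ALPHABET)
        status = check(candidate)
        if status == 'incorrect':
            continue                 # reject this character, try another
        prefix = candidate           # the parser accepted the new character
        if status == 'complete':
            best = candidate
            if len(candidate) >= min_len:
                return candidate
    return best

if __name__ == '__main__':
    print(generate(random.Random(1)))
```

The generator never sees a grammar; everything it learns about the input language comes from how far the parser got.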

Slide 39

Slide 39 text


Slide 40

Slide 40 text

Limitation: Lack of control

Slide 41

Slide 41 text

Constraining the Input Domain with Grammars

Slide 42

Slide 42 text

Grammar

Slide 43

Slide 43 text

Formal Languages

Formal Language Descriptions (Chomsky, 1956)
• 3. Regular
• Context Free
• Recursively Enumerable

Easy to produce and parse

Argument Stack
Return Stack

Slide 44

Slide 44 text

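Grammar

Arithmetic expression grammar

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Definition for key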

Slide 45

Slide 45 text

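Grammar

Arithmetic expression grammar

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Labels: Expansion Rule, Terminal Symbol, Nonterminal Symbol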
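One common way to hold such a grammar in Python is a dictionary from each nonterminal (the key) to its list of expansion rules. The sketch below is an assumption about representation, not necessarily the layout used in the course notebooks. The nonterminal names `<start>`, `<expr>`, `<number>`, `<integer>`, and `<digit>` are chosen for illustration and follow the structure of the compiled generator shown later in the deck.

```python
GRAMMAR = {
    '<start>': [['<expr>']],
    '<expr>': [
        ['<expr>', '+', '<expr>'],
        ['<expr>', '-', '<expr>'],
        ['<expr>', '/', '<expr>'],
        ['<expr>', '*', '<expr>'],
        ['(', '<expr>', ')'],
        ['<number>'],
    ],
    '<number>': [['<integer>'], ['<integer>', '.', '<integer>']],
    '<integer>': [['<digit>', '<integer>'], ['<digit>']],
    '<digit>': [[str(d)] for d in range(10)],   # expands the [0-9] class
}

def symbols(grammar):
    """Split the symbols used in a grammar into nonterminals and terminals."""
    nonterminals = set(grammar)
    terminals = {token
                 for rules in grammar.values()
                 for rule in rules
                 for token in rule
                 if token not in grammar}
    return nonterminals, terminals

if __name__ == '__main__':
    nonterminals, terminals = symbols(GRAMMAR)
    print(sorted(nonterminals))
    print(sorted(terminals))
```

Each key has one definition; each definition is a list of alternative rules; each rule is a sequence of terminal and nonterminal symbols.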

Slide 46

Slide 46 text

http://localhost:8888/notebooks/x0_1_Grammars.ipynb

Slide 47

Slide 47 text

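Grammars For Parsing

Input: (8 / 3) * 49

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]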

Slide 48

Slide 48 text

http://localhost:8888/notebooks/x0_3_Parser.ipynb

Slide 49

Slide 49 text

Parsing is Surprisingly Integral to Fuzzing

1. Parsing is necessary for decomposing and recomposing inputs without affecting the validity of inputs constructed

(8 / 3) * 49

Slide 51

Slide 51 text

Parsing is Surprisingly Integral to Fuzzing

1. Parsing is necessary for decomposing and recomposing inputs without affecting the validity of inputs constructed

(8 / 3) * 49
(2 + 1) * 49
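A small sketch of recomposition on derivation trees, assuming a tree is a pair `(symbol, children)` and that the trees are built by hand rather than by a parser. All helper names are hypothetical. Replacing one `<expr>` subtree with another `<expr>` subtree keeps the result valid.

```python
def leaf(text):
    return (text, [])

def number(text):
    return ('<expr>', [('<number>', [leaf(text)])])

def binop(left, op, right):
    return ('<expr>', [left, leaf(op), right])

def paren(inner):
    return ('<expr>', [leaf('('), inner, leaf(')')])

def tree_to_string(tree):
    """Concatenate the leaves of a derivation tree from left to right."""
    symbol, children = tree
    if not children:
        return symbol
    return ''.join(tree_to_string(child) for child in children)

def replace(tree, target, replacement):
    """Return a copy of tree in which the node `target` (compared by
    identity) is swapped for `replacement`."""
    if tree is target:
        return replacement
    symbol, children = tree
    return (symbol, [replace(child, target, replacement) for child in children])

# (8/3)*49, with the subtree for 8/3 kept in a variable so it can be replaced
inner = binop(number('8'), '/', number('3'))
original = binop(paren(inner), '*', number('49'))

# a donor subtree with the same nonterminal, <expr>
donor = binop(number('2'), '+', number('1'))
recombined = replace(original, inner, donor)

if __name__ == '__main__':
    print(tree_to_string(original))
    print(tree_to_string(recombined))
```

Doing the same swap on raw strings would require knowing where the subexpression starts and ends, which is the information a parse provides.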

Slide 52

Slide 52 text

Parsing is Surprisingly Integral to Fuzzing

2. Use parsers to mine the input distribution, and generate uncommon inputs

Slide 53

Slide 53 text

Parsing is Surprisingly Integral to Fuzzing

2. Use parsers to extract the complexity of an input or a set of inputs

18439249
(8/3)*49

https://rahul.gopinath.org/post/2024/03/23/k-paths-for-context-free-grammars/

Slide 54

Slide 54 text


Slide 55

Slide 55 text

Grammars For Fuzzing
(Hanford 1970) (Purdom 1972)

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Sample generated input:

8.2 - 27 - -9 / +((+9 * --2 + --+-+-((-1 * +(8 - 5 - 6)) * (-(a-+(((+(4))))) - ++4) / +(-+---((5.6 - --(3 * -1.8 * +(6 * +-(((-(-6) * ---+6)) / +--(+-+-7 * (-0 * (+(((((2)) + 8 - 3 - ++9.0 + ---(--+7 / (1 / +++6.37) + (1) / 482) / +++-+0)))) * -+5 + 7.513)))) - (+1 / ++((-84)))))))) * ++5 / +-(--2 - -++-9.0)))) / 5 * --++090

Slide 56

Slide 56 text

Grammars
As effective producers

[Figure: grammar-generated arithmetic expressions (the sample from slide 55). Diagram labels: Interpreter, Parser, ✘, ✔]

Slide 57

Slide 57 text

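Grammars
As efficient producers

    <start>   := <expr>
    <expr>    := <expr> '+' <expr>
               | <expr> '-' <expr>
               | <expr> '/' <expr>
               | <expr> '*' <expr>
               | '(' <expr> ')'
               | <number>
    <number>  := <integer>
               | <integer> '.' <integer>
    <integer> := <digit> <integer>
               | <digit>
    <digit>   := [0-9]

Compiled Grammar (F1)

    import random

    MAX_DEPTH = 8  # not on the slide: bounds nesting so that generation terminates

    def start():
        expr(0)
        print()

    def expr(depth):
        # Beyond MAX_DEPTH, always take the non-recursive alternative.
        choice = random.randrange(6) if depth < MAX_DEPTH else 5
        match choice:
            case 0: expr(depth + 1); print('+', end=''); expr(depth + 1)
            case 1: expr(depth + 1); print('-', end=''); expr(depth + 1)
            case 2: expr(depth + 1); print('/', end=''); expr(depth + 1)
            case 3: expr(depth + 1); print('*', end=''); expr(depth + 1)
            case 4: print('(', end=''); expr(depth + 1); print(')', end='')
            case 5: number()

    def number():
        match random.randrange(2):
            case 0: integer()
            case 1: integer(); print('.', end=''); integer()

    def integer():
        match random.randrange(2):
            case 0: digit(); integer()
            case 1: digit()

    def digit():
        match random.randrange(10):
            case 0: print('0', end='')
            case 1: print('1', end='')
            case 2: print('2', end='')
            case 3: print('3', end='')
            case 4: print('4', end='')
            case 5: print('5', end='')
            case 6: print('6', end='')
            case 7: print('7', end='')
            case 8: print('8', end='')
            case 9: print('9', end='')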

Slide 58

Slide 58 text

http://localhost:8888/notebooks/x0_2_GrammarFuzzer.ipynb

Slide 59

Slide 59 text

Where to Get the Input Grammar From?

Slide 60

Slide 60 text

Where to Get the Grammar From?

Hand-written parsers already encode the grammar

AUTOGRAM
https://www.fuzzingbook.org/html/GrammarMiner.html

Slide 61

Slide 61 text

http://localhost:8888/notebooks/x2_0_MiningGrammar.ipynb

Slide 62

Slide 62 text

Where to Get the Grammar From?

Hand-written parsers already encode the grammar

MIMID

Slide 63

Slide 63 text

Where to Get the Grammar From?

Hand-written parsers already encode the grammar

1. Extract the input string accesses
2. Attach control flow information

Each control-flow structure gets wrapped in a context manager:
- If conditionals
- Loops
- Subroutines
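A hand-instrumented sketch of these two steps, assuming the wrapping is written manually (the slides mention AST rewriting for doing this automatically). `scope`, `access`, `trace`, and `scope_stack` are hypothetical names, and `parse_csv` is a simplified relative of the parser used on the following slides.

```python
from contextlib import contextmanager

scope_stack = []   # currently active control-flow scopes, outermost first
trace = []         # one (input index, scope snapshot) entry per character access

@contextmanager
def scope(kind, name):
    """Record entry to and exit from an if, loop, or subroutine."""
    scope_stack.append(f'{kind}:{name}')
    try:
        yield
    finally:
        scope_stack.pop()

def access(s, i):
    """Read s[i] and remember which scopes were active at that moment."""
    trace.append((i, tuple(scope_stack)))
    return s[i]

def num(s, i):
    with scope('method', 'num'):
        while i < len(s) and access(s, i).isdigit():
            i += 1
        return i

def parse_csv(s):
    with scope('method', 'parse_csv'):
        i = 0
        while i < len(s):
            with scope('while', 'parse_csv'):
                if access(s, i).isdigit():
                    with scope('if', 'digit'):
                        i = num(s, i)
                else:
                    with scope('else', 'comma'):
                        access(s, i)     # the separator is consumed here
                        i += 1

if __name__ == '__main__':
    parse_csv('12,')
    for index, scopes in trace:
        print(index, ' > '.join(scopes))
```

Every access now carries the chain of control-flow structures it happened in, which is the raw material for building a tree over the input.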

Slide 64

Slide 64 text

How to Extract This Grammar?

• Inputs + control flow -> Dynamic Control Dependence Trees
• DCD Trees -> Parse Tree

Slide 65

Slide 65 text

Control Dependence Graph

Statement B is control dependent on A if A determines whether B executes.

    def parse_csv(s,i):
        while s[i:]:
            if is_digit(s[i]):
                n,j = num(s[i:])
                i = i+j
            else:
                comma(s[i])
                i += 1

CDG for parse_csv

while: determines whether if: executes

Slide 66

Slide 66 text

Dynamic Control Dependence Tree

Each statement execution is represented as a separate node

    def parse_csv(s,i):
        while s[i:]:
            if is_digit(s[i]):
                n,j = num(s[i:])
                i = i+j
            else:
                comma(s[i])
                i += 1

CDG for parse_csv

DCD Tree for call parse_csv()

Slide 67

Slide 67 text

DCD Tree ~ Parse Tree

    def parse_csv(s,i):
        while s[i:]:
            if is_digit(s[i]):
                n,j = num(s[i:])
                i = i+j
            else:
                comma(s[i])
                i += 1

Input: "12,"
Leaves: '1' '2' ','

• No tracking beyond input buffer
• Characters are attached to nodes where they are accessed last
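The "accessed last" rule can be sketched as follows, assuming the accesses are available as a list of `(character index, node name)` pairs in execution order. The node names and the access list in the example are hypothetical, chosen to resemble a run on `"12,"`.

```python
def attach_characters(s, accesses):
    """Map each node to the substring it owns: every character belongs to
    the node that accessed it last."""
    last = {}
    for index, node in accesses:
        last[index] = node               # later accesses overwrite earlier ones
    owned = {}
    for index in sorted(last):
        owned.setdefault(last[index], []).append(s[index])
    return {node: ''.join(chars) for node, chars in owned.items()}

# Hypothetical access log for the input "12,":
# the if peeks at '1', num reads '1', '2' and looks ahead at ',',
# then the if and finally comma look at ','.
accesses = [(0, 'if'), (0, 'num'), (1, 'num'), (2, 'num'), (2, 'if'), (2, 'comma')]

if __name__ == '__main__':
    print(attach_characters('12,', accesses))
```

The lookahead on `,` by `num` does not make the comma part of the number, because `comma` touched it afterwards.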

Slide 68

Slide 68 text

• Characters are attached to nodes where they are accessed last

Slide 69

Slide 69 text


Slide 70

Slide 70 text

    class Ex(Exception): pass   # the slide abbreviates the parse error as Ex

    def is_digit(i): return i in '0123456789'

    def parse_num(s,i):
        n = ''
        while s[i:] and is_digit(s[i]):
            n += s[i]
            i = i +1
        return i,n

    def parse_paren(s, i):
        assert s[i] == '('
        i, v = parse_expr(s, i+1)
        if s[i:] == '': raise Ex(s, i)
        assert s[i] == ')'
        return i+1, v

    def parse_expr(s, i = 0):
        expr, is_op = [], True
        while s[i:]:
            c = s[i]
            if is_digit(c):
                if not is_op: raise Ex(s,i)
                i,num = parse_num(s,i)
                expr.append(num)
                is_op = False
            elif c in ['+', '-', '*', '/']:
                if is_op: raise Ex(s,i)
                expr.append(c)
                is_op, i = True, i + 1
            elif c == '(':
                if not is_op: raise Ex(s,i)
                i, cexpr = parse_paren(s, i)
                expr.append(cexpr)
                is_op = False
            elif c == ')':
                break
            else:
                raise Ex(s,i)
        if is_op: raise Ex(s,i)
        return i, expr

Input: 9+3/4

Parse tree for parse_expr('9+3/4')

Slide 71

Slide 71 text


Slide 72

Slide 72 text

Identifying Compatible Nodes

Which nodes correspond to the same nonterminal?

Input: 9+3/4

(parser code repeated from slide 70)

Slide 73

Slide 73 text

(9 + 1) * 3
3 * (9 + 1)

Slide 74

Slide 74 text

9 + 1
3 * (9 + 1)

Slide 75

Slide 75 text

3 (9 + 1) * 3 * (9 + 1)

Slide 76

Slide 76 text

3*(1)
1

Slide 77

Slide 77 text

3*(1) 1 := :=

Slide 78

Slide 78 text

:= | | | := := | := := '3' | '1' := '(' ')' := := '*'

Slide 79

Slide 79 text

Generalizing Loops with Regular Inference

:= := | := | | | := := | := := '3' | '1' := '(' ')' := := '*'

Slide 80

Slide 80 text


Slide 81

Slide 81 text

calc.py

(parser code repeated from slide 70)

Recovered Arithmetic Grammar

:= := | := | := '(' ')' | := '*' | '+' | '-' | '/' := | : [0-9]

Slide 82

Slide 82 text

[Figure: arithmetic expressions generated from the recovered grammar (the sample from slide 55)]

:= := | := | := '(' ')' | := '*' | '+' | '-' | '/' := | : [0-9]

Slide 83

Slide 83 text

microjson.py (excerpt; the beginning of the first function is cut off on the slide)

                stm.next()
                if expect_key:
                    raise JSONError(E_DKEY, stm, stm.pos)
                if c == '}':
                    return result
                expect_key = 1
                continue
            # parse out a key/value pair
            elif c == '"':
                key = _from_json_string(stm)
                stm.skipspaces()
                c = stm.next()
                if c != ':':
                    raise JSONError(E_COLON, stm, stm.pos)
                stm.skipspaces()
                val = _from_json_raw(stm)
                result[key] = val
                expect_key = 0
                continue
            raise JSONError(E_MALF, stm, stm.pos)

    def _from_json_raw(stm):
        while True:
            stm.skipspaces()
            c = stm.peek()
            if c == '"':
                return _from_json_string(stm)
            elif c == '{':
                return _from_json_dict(stm)
            elif c == '[':
                return _from_json_list(stm)
            elif c == 't':
                return _from_json_fixed(stm, 'true', True, E_BOOL)
            elif c == 'f':
                return _from_json_fixed(stm, 'false', False, E_BOOL)
            elif c == 'n':
                return _from_json_fixed(stm, 'null', None, E_NULL)
            elif c in NUMSTART:
                return _from_json_number(stm)
            raise JSONError(E_MALF, stm, stm.pos)

    def from_json(data):
        stm = JSONStream(data)
        return _from_json_raw(stm)

Recovered JSON grammar

    ::=
    ::= '"' | '[' | '{' | | 'true' | 'false' | 'null'
    ::= + | + 'e' +
    ::= '+' | '-' | '.' | [0-9] | 'E' | 'e'
    ::= * '"'
    ::= ']' | (',')* ']' | ( ',' )+ (',' )* ']'
    ::= '}' | ( '"' ':' ',' )* '"' ':' '}'
    ::= ' ' | '!' | '#' | '$' | '%' | '&' | ''' | '*' | '+' | '-' | ',' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '@' | '[' | ']' | '^' | '_', ''',| '{' | '|' | '}' | '~' | '[A-Za-z0-9]' | '\'
    ::= '"' | '/' | 'b' | 'f' | 'n' | 'r' | 't'

Slide 84

Slide 84 text
