How to Compare Fuzzers

SAPLING'24

Rahul Gopinath

November 22, 2024

Transcript

  1. Bugs ... How Do We Remove Them?

    Diagram labels: Report all violations; All reported violations are true; Guarantees of resulting artefacts; False Alarms; Missing cases.
    Techniques shown: Dynamic Analysis, Software Testing, Static Analysis, Type Checking.
  2. Software Testing -- Key to Bug Removal

    Input ✓ ✘

    @app.route('/admin')
    def admin():
        username = request.cookies.get("username")
        if not username:
            return {"Error": "Specify username in Cookie"}
        username = urllib.quote(os.path.basename(username))
        url = "http://permissions:5000/permissions/{}".format(username)
        resp = requests.request(method="GET", url=url)
        # "superadmin\ud888" will be simplified to "superadmin"
        ret = ujson.loads(resp.text)
        if resp.status_code == 200:
            if "superadmin" in ret["roles"]:
                return {"OK": "Superadmin Access granted"}
            else:
                e = u"Access denied. User has following roles: {}".format(ret["roles"])
                return {"Error": e}, 401
        else:
            return {"Error": ret["Error"]}, 500
  3. Testing

    Input ✓ ✘

    struct { type: request; payload_length: 64k; payload[HELO]; } HeartbeatMessage;

    struct { type: response; payload_length: 64k; payload[HELO..dafdf..maybe...a..secret key...in...the...memory...or...other.. sensitive...information...that...the.. attacker..is..interested...in...afdfdf adfdfdf..and..other..random..junk]; } HeartbeatMessage;

    struct { type: error; payload_length: 4; payload[HELO]; } HeartbeatMessage;

    /* Allocate memory for the response, size is 1 byte
     * message type, plus 2 bytes payload length, plus
     * payload, plus padding */
    buffer = OPENSSL_malloc(1 + 2 + payload + padding);
    bp = buffer;
    /* Enter response type, length and copy payload */
    *bp++ = TLS1_HB_RESPONSE;
    s2n(payload, bp);
    memcpy(bp, pl, payload);
    /* Random padding */
    RAND_pseudo_bytes(p, padding);
    r = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
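The over-read illustrated on this slide can be mimicked with a small simulation. This is only a sketch under stated assumptions: `MEMORY`, `heartbeat_unchecked`, and `heartbeat_checked` are hypothetical names, the memory contents are made up, and a Python byte string stands in for process memory. It is not OpenSSL's code.

```python
# Simulated process memory: the 4-byte payload sits next to unrelated data.
# The contents are invented for illustration.
MEMORY = b"HELO" + b"...secret key...other sensitive data..."
PAYLOAD_OFFSET = 0
ACTUAL_PAYLOAD_LEN = 4


def heartbeat_unchecked(claimed_len):
    """Echo claimed_len bytes, trusting the length field in the request."""
    return MEMORY[PAYLOAD_OFFSET:PAYLOAD_OFFSET + claimed_len]


def heartbeat_checked(claimed_len):
    """Echo the payload only if the claimed length does not exceed what was received."""
    if claimed_len > ACTUAL_PAYLOAD_LEN:
        return None  # discard the malformed request
    return MEMORY[PAYLOAD_OFFSET:PAYLOAD_OFFSET + claimed_len]


if __name__ == "__main__":
    print(heartbeat_unchecked(64))  # leaks bytes beyond the 4-byte payload
    print(heartbeat_checked(64))    # request is rejected
```

The unchecked handler returns whatever lies after the payload, which is the behaviour a test input with an oversized length field would expose.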
  4. Program Fuzzing

    The admin() handler from slide 2 is shown again as the program under test, fed a long stream of random characters as input, for example:

    {Ti.r3PIxMMMv6{xS^+'Hq!AxB"YXRS@! Kd6;wtAMefFWM(`|J_<1~o}z3K(CCzRH

    https://www.fuzzingbook.org/html/Fuzzer.html
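A minimal sketch of the kind of random input generator this slide illustrates. The function name `random_fuzz_input`, its parameters, and its defaults are illustrative assumptions; they are not taken from the talk or from the linked chapter.

```python
import random


def random_fuzz_input(max_length=100, char_start=32, char_range=95, rng=random):
    """Return a random string of up to max_length printable ASCII characters.

    All names and defaults here are illustrative assumptions.
    """
    length = rng.randrange(0, max_length + 1)
    return "".join(chr(rng.randrange(char_start, char_start + char_range))
                   for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(repr(random_fuzz_input(rng=rng)))
```

Each generated string would then be passed to the program under test, and a crash or hang would be recorded as a finding.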
  5. Fuzzing

    The admin() handler from slide 2 is shown again as the program under test.

    Trash deck technique: 1950s - Gerald Weinberg

    Crash?
  6. Which Fuzzer Should We Use?

    Exponential growth in fuzzing literature
    Plots: Cumulative publications in fuzzing; Cumulative published vulnerabilities
  7. Which Evaluation Metric Should I Use?

    • Comparable
    • Ground truth
    • Unbiased
    • Budget friendly
  8. New CVEs Found?

    CVE, short for Common Vulnerabilities and Exposures, is a list of publicly disclosed computer security flaws.
  9. Which Fuzzer Should I Use?

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?)
  10. Coverage?

    • Gets saturated quickly (Signal is lost)
    • Is reachability sufficient?
    • Limited evaluation of complex input conditions involved in expressing bugs
    • Are more complex coverage techniques needed?
  11. Which Fuzzer Should I Use?

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?), Coverage
  12. Synthetic Bugs?

    LAVA: Large-scale Automated Vulnerability Addition
    • What kind of bugs should be seeded?
    • What about unknown kinds of bugs?
    • Where should these bugs be seeded?
  13. Which Fuzzer Should I Use?

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?), Coverage, Synthetic (LAVA) (? ?)
  14. Known Bug Benchmarks?

    • Gets outdated quickly (overfitting by fuzzers)
    • Significant effort in creating and maintaining
    • Biased as to where and what kind of bugs are present

    "When a measure becomes a target it ceases to be a good measure" (Goodhart's law)
  15. Which Fuzzer Should I Use?

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?), Coverage, Synthetic (LAVA) (? ?), Benchmarks (MAGMA) (?)
  16. New Bugs in Existing Benchmarks

    • No ground truth
    • Bug distribution is dependent on external factors
    • Can lead researchers to postpone publication of vulnerabilities
    • Feedback can't be used to decide budgeting.
  17. Which Fuzzer Should I Use?

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?), Coverage, Synthetic (LAVA) (? ?), Benchmarks (MAGMA) (?), New Bugs in old Benchmarks (?)
    Can
  18. 28

  19. What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    IDEA: Induce a program variation with each valid token replacement
  20. What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 1) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    Fixes for independent bugs are almost always simple.
    Finite syntactic size for faults (aka. competent programmer hypothesis): Gopinath, Jensen, and Groce, "Mutations: How Close are they to Real Faults?", ISSRE 2014
  21. What Is Mutation Testing? (1)

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    IDEA: Induce a program variation with each valid token replacement
  22. What Is Mutation Testing? (2)

    unsigned int len = message_length(msg);
    if (len >= MAX_BUF_LEN) {    // mutant: '<' replaced by '>='
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    IDEA: Induce a program variation with each valid token replacement
  23. What Is Mutation Testing? (3)

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 1) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    IDEA: Induce a program variation with each valid token replacement
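The token-replacement idea on these slides can be sketched for one class of tokens. The code below is a minimal illustration that only swaps relational operators. `RELATIONAL_OPS`, `TOKEN_RE`, and `generate_mutants` are hypothetical names, and the regular expression is a stand-in for a real tokenizer (it would mis-handle C tokens such as `->` or `<<`).

```python
import re

RELATIONAL_OPS = ["<", "<=", ">", ">=", "==", "!="]
# Two-character operators come first so that "<=" is not split into "<" and "=".
TOKEN_RE = re.compile(r"<=|>=|==|!=|<|>")


def generate_mutants(source):
    """Yield one mutant per (operator occurrence, replacement operator) pair."""
    for match in TOKEN_RE.finditer(source):
        for op in RELATIONAL_OPS:
            if op != match.group():
                yield source[:match.start()] + op + source[match.end():]


original = "if (len < MAX_BUF_LEN) { copy_message(msg); }"
mutants = list(generate_mutants(original))

if __name__ == "__main__":
    for mutant in mutants:
        print(mutant)
```

Each mutant differs from the original in exactly one token, which matches the first-order mutants shown on the slides. Constant changes such as `MAX_BUF_LEN + 1` would need an additional replacement rule.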
  24. What Is Mutation Testing?

    unsigned int len = message_length(msg);
    if (len < MAX_BUF_LEN + 16) {
        copy_message(msg);
    } else {
        // Invalid length, handle error
    }

    Complex bugs are almost always coupled to simpler bugs.
    Finite semantic depth for failures (aka. coupling effect hypothesis): Gopinath, Jensen, and Groce, "Mutations: How Close are they to Real Faults?", ICST 2017
  25. Mutation Testing Process

    Generate Mutants (m1, m2, m3, m4 from Original) and Generate Fuzz Inputs (I1, I2, I3, I4, I5) > Detect Differences from Original > Mutation Score = Detected Mutants / Generated Mutants
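The process on this slide can be sketched as follows. This is a toy illustration under stated assumptions: programs are modelled as Python callables, and `run`, `mutation_score`, `original`, `mutant_ge`, and `mutant_plus_one` are hypothetical names chosen to mirror the length-check example from the earlier slides.

```python
def run(program, data):
    """Run program on data; an exception counts as an observable outcome."""
    try:
        return ("ok", program(data))
    except Exception as exc:
        return ("crash", type(exc).__name__)


def mutation_score(original, mutants, inputs):
    """Detected mutants / generated mutants, as defined on the slide."""
    if not mutants:
        return 0.0
    detected = 0
    for mutant in mutants:
        # A mutant is detected if some input makes it behave differently.
        if any(run(mutant, x) != run(original, x) for x in inputs):
            detected += 1
    return detected / len(mutants)


MAX_BUF_LEN = 8


def original(msg):
    return "copied" if len(msg) < MAX_BUF_LEN else "rejected"


def mutant_ge(msg):  # '<' replaced by '>='
    return "copied" if len(msg) >= MAX_BUF_LEN else "rejected"


def mutant_plus_one(msg):  # MAX_BUF_LEN replaced by MAX_BUF_LEN + 1
    return "copied" if len(msg) < MAX_BUF_LEN + 1 else "rejected"


inputs = ["a", "abc"]
score = mutation_score(original, [mutant_ge, mutant_plus_one], inputs)
```

With only short inputs, the off-by-one mutant goes undetected. Adding an input of exactly `MAX_BUF_LEN` characters would detect it, which is the kind of difference between input generators that the score is meant to surface.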
  26. Mutation Testing Challenge

    M mutants (1 input): Number of Mutants Executions
    N inputs (1 mutant): Number of Inputs Executions
    Total Campaign Effort for mutation testing = M x N program executions
  27. What Is The Problem?

    Computation:
    Fuzzing -- More executions the better
    Mutation testing -- Execute each input per mutation

    Solution:
    1) Perform coverage analysis first; remove trivial mutants
    2) Evaluate independent (non-interacting) mutations simultaneously
    3) Remove redundancy in executions
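Step 1 of the solution can be sketched as a filter over (mutant, input) pairs. This is a simplified illustration, assuming line-level coverage of the original program is available for every input. `plan_executions`, `mutant_lines`, and `coverage` are hypothetical names, and the data is made up.

```python
def plan_executions(mutant_lines, coverage):
    """Return the (mutant, input) pairs worth executing.

    mutant_lines: dict mapping mutant id -> line number it modifies.
    coverage: dict mapping input id -> set of line numbers the input
              covers in the original program.
    An input that never reaches the mutated line cannot detect the mutant.
    """
    return [(m, i)
            for m, line in mutant_lines.items()
            for i, covered in coverage.items()
            if line in covered]


# Made-up example: 4 mutants, 3 inputs.
mutant_lines = {"m1": 2, "m2": 2, "m3": 5, "m4": 9}
coverage = {"I1": {1, 2, 3}, "I2": {1, 2, 5}, "I3": {1, 2, 3}}

plan = plan_executions(mutant_lines, coverage)
naive = len(mutant_lines) * len(coverage)  # M x N executions without pruning

if __name__ == "__main__":
    print(f"{len(plan)} executions planned instead of {naive}")
```

In this example the mutant on line 9 is never reached by any input, so it needs no executions at all and can be reported as undetected straight away.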
  28. Traditional execution

    https://rahul.gopinath.org

    Diagram labels: m1, m2, m3; Setup for T1; Setup for T2; Actual tests.
    Mutants are executed in parallel, but a majority of time is spent in initialization (an average 7 times the test execution time) (Bell 2014)
  29. Split-Stream Execution

    Diagram labels: Setup for T1; Setup for T2; Actual tests; T1; T2.
    Execute tests in parallel. Fork off mutants as they are encountered.
    Gopinath, Jensen, Groce “Topsy Turvy: A faster and smarter algorithm” ICSE 2016
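A rough cost comparison shows why sharing the setup matters. This is a deliberately simplified model, not the algorithm from the cited paper: it assumes setup runs once in split-stream execution and every forked mutant pays only for the test itself, and it ignores parallelism and fork overhead. The function names are hypothetical, and the 7:1 setup-to-test ratio is the average quoted on the previous slide.

```python
def traditional_cost(n_mutants, setup, test):
    """Every mutant repeats the setup before running the test."""
    return n_mutants * (setup + test)


def split_stream_cost(n_mutants, setup, test):
    """Setup runs once; each forked mutant pays only for the test itself."""
    return setup + n_mutants * test


if __name__ == "__main__":
    # Assumed units: the test takes 1 time unit and the setup takes 7.
    n_mutants, setup, test = 100, 7, 1
    print("traditional :", traditional_cost(n_mutants, setup, test))
    print("split-stream:", split_stream_cost(n_mutants, setup, test))
```

Under these assumptions the saving grows with the number of mutants, because the setup cost is paid once rather than once per mutant.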
  30. Mutation Testing

    Criteria: Comparable, Ground truth, Unbiased, Budget friendly
    Techniques rated: New CVEs (? ?), Coverage, Synthetic (LAVA) (? ?), Benchmarks (MAGMA) (?), New Bugs in old Benchmarks (?), Mutation Testing (?)
  31. 44

  32. Seeded Fault Benchmarks

    • Easy to fine-tune a fuzzer to overfit
    • Faults are rarely similar to real faults
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  33. Mutation Analysis

    • Easy to fine-tune a fuzzer to overfit
    • Faults are rarely similar to real faults
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  34. Mutation Analysis

    • Easy to fine-tune a fuzzer to overfit -> Very large number of faults
    • Faults are rarely similar to real faults
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  35. Mutation Analysis

    • Not easy to fine-tune
    • Faults are rarely similar to real faults
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  36. Mutation Analysis

    • Not easy to fine-tune
    • Faults are rarely similar to real faults -> Evidence that mutants are similar to real faults.
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  37. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • Based on bugs we know about!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  38. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • Based on bugs we know about! -> All possible faults, including unknown ones!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  39. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • Human bias in bug curation
    • Limited supply
    • Bug interactions requiring deduplication
  40. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • Human bias in bug curation -> No human bias in introduced faults!
    • Limited supply
    • Bug interactions requiring deduplication
  41. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • No human bias in introduced faults!
    • Limited supply
    • Bug interactions requiring deduplication
  42. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • No human bias in introduced faults!
    • Limited supply -> As many as required! Including higher order ones!
    • Bug interactions requiring deduplication
  43. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • No human bias in introduced faults!
    • As many as required! Including higher order ones!
    • Bug interactions requiring deduplication
  44. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • No human bias in introduced faults!
    • As many as required! Including higher order ones!
    • Bug interactions requiring deduplication -> All mutants are evaluated independently
  45. Mutation Analysis

    • Not easy to fine-tune
    • Evidence that mutants are similar to real faults.
    • All possible faults, including unknown ones!
    • No human bias in introduced faults!
    • As many as required! Including higher order ones!
    • All mutants are evaluated independently
  46. 59

  47. 60

  48. 61