Is Program Analysis The Silver Bullet Against Software Bugs? by Karim Ali

Karim Ali University of Alberta @karimhamdanali Is Program Analysis The
Silver Bullet Against Software Bugs? Papers We Love Conference — 2019

@karimhamdanali Software Bugs !2

@karimhamdanali Software Bugs Invalid SSL/TLS connections earned Apple Most Epic
Fail [Pwnie ’14] !3 goto fail; goto fail; Source: CVE-2014-1266

@karimhamdanali Software Bugs Errors in ABS software led to fatal accidents and cost Toyota $3 Billion !4 Source: Philip Koopman, CMU. © Copyright 2014, Philip Koopman. CC Attribution 4.0 International license. http://www.cbsnews.com/news/toyota-unintended-acceleration-has-killed-89/ May 25, 2010

@karimhamdanali Software Bugs Unencrypted, unauthenticated connections to some medical implants
!5 Source: Department of Homeland Security

@karimhamdanali Program Analysis !6

@karimhamdanali What is Program Analysis? !7

@karimhamdanali Program Analysis !8 A way of reasoning about the
runtime behaviour of a program without necessarily executing it

@karimhamdanali Rice’s Theorem “For any interesting property Pr of the
behaviour of a program, it is impossible to write an analysis that can decide for every program p whether Pr holds for p.” !9 Image: CooperToons

@karimhamdanali By definition, program analysis is undecidable !10

@karimhamdanali Not quite… !11 Image: J. K. Simmons / Whiplash

@karimhamdanali Program Analysis •Settle for an approximation of Pr •Make
it as “good” as possible p analysis yes p analysis no !12 few Image: Jenna Mullins / ENews

@karimhamdanali Program Analysis !13 Code Navigation Code Recommenders Code Refactoring
Constant Propagation Dead Code Elimination Static Inlining Parallelization

@karimhamdanali Program Analysis in Practice !14 Image: Minion Special /
YouTube

@karimhamdanali Program Analysis in Practice !15 Scalability Usability Precision

@karimhamdanali Collaborators •Erick Ochoa (UAlberta) •Spencer Killen (UAlberta) •Kristen Newbury
(UAlberta) •Revan MacQueen (UAlberta) •Daniil Tiganov (UAlberta) •Jeff Cho (UAlberta) •Johannes Späth (Paderborn) •Lisa Nguyen (Paderborn) •Stefan Krüger (Paderborn) •Ondřej Lhoták (Waterloo) •Frank Tip (Northeastern) •Eric Bodden (Paderborn & Fraunhofer IEM) •Mira Mezini (TU Darmstadt) •Julian Dolby (IBM Research) •Andrew Craik (IBM) •Mark Stoodley (IBM) •Vijay Sundaresan (IBM) •Ben Livshits (Imperial College London & Brave) •Emerson Murphy-Hill (Google) •Justin Smith (Lafayette College) •José Nelson Amaral (UAlberta) •James Wright (UAlberta) •Kirsten Thommes (Paderborn) •René Fahr (Paderborn) !16

@karimhamdanali 2010 !18

@karimhamdanali 2010 !19

@karimhamdanali 2010 !20 Where do I begin?

@karimhamdanali 2010 !21 Where do I begin? Start with this
paper!

@karimhamdanali … so what is a Call Graph? !24

@karimhamdanali Call Graph !25

@karimhamdanali Call Graph !26 class Circle extends Shape { void
draw() { ... } } class Square extends Shape { void draw() { ... } } Shape s; if(*) s = new Circle(); else s = new Square(); s.draw();

@karimhamdanali Call Graph !27 class Circle extends Shape { void
draw() { ... } } class Square extends Shape { void draw() { ... } } Shape s; if(*) s = new Circle(); else s = new Square(); s.draw(); required by every inter-procedural analysis

@karimhamdanali Let’s build a Call Graph !28 public class Main
{ public static void main(String[] args) { Shape s; if (args.length > 2) s = new Circle(); else s = new Square(); s.draw(); } } abstract class Shape { abstract void draw(); } class Circle extends Shape { void draw() { ... } } class Square extends Shape { void draw() { ... } }

Call graph nodes so far: Main.main()

Call graph nodes so far: Main.main() Circle.<init>()

Call graph nodes so far: Main.main() Shape.<init>() Circle.<init>() Object.<init>()

Call graph nodes so far: Main.main() Shape.<init>() Square.<init>() Circle.<init>() Object.<init>()

Call graph nodes so far: Main.main() Shape.<init>() Square.<init>() Circle.<init>() Square.draw() Circle.draw() Object.<init>()
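The way the virtual call s.draw() ends up with two targets can be sketched with class hierarchy analysis (CHA), one common and deliberately simple call-graph algorithm. The slides do not say which algorithm was used, so treat this as an illustration under that assumption. The class name ChaSketch and the hand-written hierarchy tables are hypothetical; a real tool would read the hierarchy from bytecode.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the slide's classes, written by hand for illustration.
final class ChaSketch {
    // subclass -> direct superclass
    static final Map<String, String> SUPER = new LinkedHashMap<>();
    // class -> methods it declares with a body
    static final Map<String, Set<String>> DECLARES = new LinkedHashMap<>();

    static {
        SUPER.put("Shape", "Object");
        SUPER.put("Circle", "Shape");
        SUPER.put("Square", "Shape");
        DECLARES.put("Shape", Set.of());        // draw() is abstract in Shape
        DECLARES.put("Circle", Set.of("draw"));
        DECLARES.put("Square", Set.of("draw"));
    }

    // True if cls is ancestor itself or inherits from it.
    static boolean isSubtype(String cls, String ancestor) {
        for (String c = cls; c != null; c = SUPER.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // Walk up from cls to the first class that implements the method.
    static String dispatch(String cls, String method) {
        for (String c = cls; c != null; c = SUPER.get(c)) {
            if (DECLARES.getOrDefault(c, Set.of()).contains(method)) {
                return c + "." + method + "()";
            }
        }
        return null; // no implementation found (e.g. abstract method)
    }

    // CHA: every subtype of the receiver's static type is a possible runtime type.
    static Set<String> resolve(String staticType, String method) {
        Set<String> targets = new LinkedHashSet<>();
        for (String cls : SUPER.keySet()) {
            if (isSubtype(cls, staticType)) {
                String target = dispatch(cls, method);
                if (target != null) targets.add(target);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Shape s = ...; s.draw();
        System.out.println(resolve("Shape", "draw")); // [Circle.draw(), Square.draw()]
    }
}
```

CHA ignores which objects are actually created, so it is cheap but can report targets that never occur at run time; more precise algorithms trade analysis time for fewer spurious edges.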

@karimhamdanali Let’s build a Call Graph for javac !35

@karimhamdanali Let’s build a Call Graph for javac !36 • Java 1.4 • 0.5 MB of class files • 8 GB of RAM • HOURS! IRIS Reasoner

@karimhamdanali Let’s build a Call Graph for javac !37 • Java 1.4 • 0.5 MB of class files • 8 GB of RAM • HOURS! IRIS Reasoner Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

@karimhamdanali Let’s build a Call Graph for "Hello, World!" !38

@karimhamdanali !39 public class HelloWorld { public static void main(String[]
args) { System.out.println("Hello, World!"); } }

@karimhamdanali !40 public class HelloWorld { public static void main(String[]
args) { System.out.println("Hello, World!"); } } • > 30 seconds • > 5,000 reachable methods • > 23,000 call edges

@karimhamdanali Hello, World! !41

@karimhamdanali !42

@karimhamdanali Alone? !44

@karimhamdanali Not Alone! !45 • "I'd like to ignore library code" • "what about callbacks?" • "this would be unsound but better than nothing" • "ignore non-application program elements (e.g., system libraries)?" • "whole-program analysis always pulls in the world for completeness. The problem is that the world is fairly large" • "I am NOT interested in those"

@karimhamdanali Partial-Program Analysis !46

@karimhamdanali Sound and Precise Partial-Program Analysis !47

@karimhamdanali !48

@karimhamdanali !49 Ideal Call Graph Image: CooperToons

@karimhamdanali !50 Ideal Call Graph Whole-Program Call Graph

@karimhamdanali !51 Ideal Call Graph Whole-Program Call Graph Incomplete Call
Graph (unsound)

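Ideal Call Graph Whole-Program Call Graph Incomplete Call Graph (unsound) Conservative Call Graph (highly imprecise)

Ideal Call Graph Whole-Program Call Graph Incomplete Call Graph (unsound) Conservative Call Graph (highly imprecise) Partial-Program Call Graph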

@karimhamdanali The Separate Compilation Assumption !54 Source: Ali and Lhoták.
Application-Only Call Graph Construction. [ECOOP '12]

@karimhamdanali The Separate Compilation Assumption All of the library classes
can be compiled in the absence of the application classes. !55

@karimhamdanali Constraints !56 1. Class Hierarchy 2. Class Instantiation 3. Local Variables 4. Method Calls 5. Field Access 6. Array Access 7. Static Initialization 8. Exception Handling

@karimhamdanali Library Points-to Set (LPT) !58 Application Library pt(v1) = {o1, o3} pt(v2) = {o2, o3} pt(v3) = {o1, o4} LPT = {o1, o2, o3, o5}

@karimhamdanali Library Callbacks !59 Application Library class C { m();
} class B extends L { m(); } class A extends L { m(); } calls class L { m(); } 1 LPT = A C 2
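One reading of the callback slide, sketched in Java: when library code calls m() on a receiver of static type L, the possible application targets are the m() methods of classes that are both in the library points-to set (LPT) and subtypes of L. With LPT = {A, C}, that leaves A.m(): B is not in LPT, and C does not extend L. This is an interpretation of the diagram, not the authors' implementation. The class name, the hand-written hierarchy, and the assumption that every listed class declares its own m() are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the slide: A and B extend library class L; C does not.
final class LibraryCallbackSketch {
    // subclass -> direct superclass (classes absent from the map have none modelled)
    static final Map<String, String> SUPER = Map.of("A", "L", "B", "L");

    static boolean isSubtype(String cls, String ancestor) {
        for (String c = cls; c != null; c = SUPER.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // Application methods a library call receiver.method() may call back into,
    // assuming every class in lpt declares its own version of the method.
    static Set<String> callbackTargets(Set<String> lpt, String receiverType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String cls : lpt) {
            if (isSubtype(cls, receiverType)) {
                targets.add(cls + "." + method + "()");
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // Library code: L l = ...; l.m();  with LPT = {A, C}
        System.out.println(callbackTargets(Set.of("A", "C"), "L", "m")); // [A.m()]
    }
}
```

The point of the single shared set is that the analysis never has to look inside the library: one set stands in for every library variable, and the declared types filter it at each call.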

@karimhamdanali !60 Source: Ali and Lhoták. Averroes: Whole-Program Analysis Without
The Whole Program. [ECOOP '13]

@karimhamdanali JAR Placeholder Library SCA JAR !61

@karimhamdanali Evaluation !62 600× smaller library 7× faster analysis 6×
less memory Precise & Sound

@karimhamdanali !63

@karimhamdanali !64 Application Library Scalability

@karimhamdanali Program Analysis in Practice !65 Precision

@karimhamdanali Program Analysis in Practice !66 Scalability Precision

@karimhamdanali Security-Related Static Analyses !67

@karimhamdanali Security-Related Static Analyses !68 public void main(String[] args) { Object x = null; Object y = x; y.toString(); } Null-Pointer Analysis

public void main(String[] args) { String x = args[0]; String y = x; SQL.execute("SELECT * FROM User where userId=" + y); } Taint Analysis

public void main(String[] args) { File x = new File(); File y = x; y.close(); } Typestate Analysis
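All three examples hinge on the alias y = x: the analysis must know that both names refer to the same object. A minimal sketch of a typestate check over straight-line code follows, assuming a toy statement encoding ("new", "copy", "call") and a two-state protocol (open, closed) that are invented here for illustration and are not taken from any tool mentioned in the talk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy typestate checker for a File-like protocol: no calls after close().
final class TypestateSketch {
    // Each statement is {op, a, b}: {"new","x",""}, {"copy","y","x"} for y = x,
    // {"call","y","close"} for y.close().
    static List<String> check(String[][] program) {
        Map<String, Integer> pointsTo = new HashMap<>(); // variable -> abstract object
        Map<Integer, Boolean> closed = new HashMap<>();  // abstract object -> closed?
        List<String> findings = new ArrayList<>();
        int nextId = 0;
        for (String[] s : program) {
            switch (s[0]) {
                case "new":
                    pointsTo.put(s[1], nextId);
                    closed.put(nextId, false);
                    nextId++;
                    break;
                case "copy":
                    // Aliasing: both variables now name the same abstract object.
                    pointsTo.put(s[1], pointsTo.get(s[2]));
                    break;
                case "call":
                    Integer obj = pointsTo.get(s[1]);
                    if (obj == null) break; // unknown receiver: this sketch stays silent
                    if (closed.get(obj)) {
                        findings.add(s[1] + "." + s[2] + "() on closed file");
                    } else if (s[2].equals("close")) {
                        closed.put(obj, true);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + s[0]);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String[][] program = {
            {"new", "x", ""},       // File x = new File();
            {"copy", "y", "x"},     // File y = x;
            {"call", "y", "close"}, // y.close();
            {"call", "x", "write"}, // x.write(...);  x aliases the closed file
        };
        System.out.println(check(program)); // [x.write() on closed file]
    }
}
```

Tracking state per object rather than per variable is what lets the check catch the write through x after the close through y. Branches, loops, and calls are out of scope for this sketch.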

@karimhamdanali Static Data-Flow Analysis !71

@karimhamdanali Precise Static Data-Flow Analysis !72

@karimhamdanali Precise Static Data-Flow Analysis !73 public void main(String[] args)
{ File x = new File(); this.z = x; foo(x); x.close(); foo(x); } public void foo(File y){ y.write(...); } public void foo(){ this.a.write(...); }

(same code) Context-Sensitive

(same code) Field-Sensitive

@karimhamdanali Precise Static Data-Flow Analysis !76 Context-Sensitive ∧ Field-Sensitive Pushdown Automaton • Variables: x z y • Stack of Calls: main() foo(x) bar(y) foo(z) • Stack of Fields: f h g f

Context-Sensitive ∧ Field-Sensitive: Undecidable. Source: Thomas W. Reps. Undecidability of Context-Sensitive Data-Dependence Analysis. [TOPLAS '00]

@karimhamdanali Precise Static Data-Flow Analysis !79 Context-Sensitive ∧ Field-Sensitive Pushdown Automaton • Variables: x z • Stack of Calls: main() foo(x) bar(y) foo(z) • k-limiting Access Paths/Graphs: y.f y.g y.f.h y.f.g

@karimhamdanali Precise Static Data-Flow Analysis !80 Context-Sensitive ∧ Field-Sensitive Pushdown Automaton • Variables: x z • Stack of Calls: main() foo(x) bar(y) foo(z) • k-limiting Access Paths/Graphs: y.f y.g y.f.h y.f.g What’s a good value for k? k-limiting yields too many false positives
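Why k-limiting produces false positives can be seen in a small sketch: an access path is cut after k field dereferences and the remainder is replaced by a wildcard, so distinct paths collapse into one. The ".*" wildcard notation and the class name below are assumptions made for illustration; real analyses represent the truncated tail in their own ways.

```java
import java.util.List;

// Toy k-limiting: keep at most k fields of an access path, then widen.
final class KLimitSketch {
    static String limit(String base, List<String> fields, int k) {
        StringBuilder path = new StringBuilder(base);
        for (int i = 0; i < Math.min(k, fields.size()); i++) {
            path.append('.').append(fields.get(i));
        }
        if (fields.size() > k) {
            path.append(".*"); // "any further fields": this is where precision is lost
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // With k = 1, y.f.h and y.f.g become indistinguishable.
        System.out.println(limit("y", List.of("f", "h"), 1)); // y.f.*
        System.out.println(limit("y", List.of("f", "g"), 1)); // y.f.*
        // With k = 2 they stay apart, at the cost of tracking more paths.
        System.out.println(limit("y", List.of("f", "h"), 2)); // y.f.h
        System.out.println(limit("y", List.of("f", "g"), 2)); // y.f.g
    }
}
```

Once y.f.h and y.f.g share the abstraction y.f.*, a fact that holds for one is reported for the other as well. Raising k delays the merge but multiplies the number of paths the analysis has to track.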

@karimhamdanali Synchronized Pushdown Systems (SPDS) !81 Source: Späth et al.
Context-, Flow-, and Field-Sensitive Data-Flow Analysis using Synchronized Pushdown Systems. [POPL '19]

@karimhamdanali Synchronized Pushdown Systems !82 Context-Sensitive ∧ Field-Sensitive

@karimhamdanali Synchronized Pushdown Systems !83 Context-Sensitive Field-Sensitive Context-Sensitive ∧ Field-Sensitive
∧ ⊑ over-approximation Never encountered in practice

@karimhamdanali Synchronized Pushdown Systems !84 Context-Sensitive ∧ Field-Sensitive • Pushdown System of Calls: Variables x z y, Stack of Calls main() foo(x) bar(y) foo(z) • Pushdown System of Fields: Variables x z y, Stack of Fields f h g f

Decidable

Decidable No k-limiting

@karimhamdanali SPDS Evaluation !87

@karimhamdanali SPDS Evaluation !88 [Chart: Analysis Time (seconds) vs. Number of Field Accesses, for Access Path (k=1), Access Path (k=2), Access Path (k=3), Access Path (k=4), and SPDS; label: Eclipse]

@karimhamdanali … but is it useful in practice? !89

@karimhamdanali CogniCrypt.org Eclipse Foundation !90

@karimhamdanali 68% are insecure (Maven has > 2.7 million artifacts)
!91

@karimhamdanali 95% are insecure (10,000 most recent Android apps on
AndroZoo) !92

@karimhamdanali Symantec CVE-2018-12240 !93

@karimhamdanali !94 Precision SPDS

@karimhamdanali Program Analysis in Practice !95 Usability

@karimhamdanali !98

@karimhamdanali !99 precise responsive seamless tailored Sources: Johnson et al. Why Don’t Software Developers Use Static Analysis Tools to Find Bugs? [ICSE '13]; Xiao et al. Social Influences on Secure Development Tool Adoption: Why Security Tools Spread. [CSCW '14]; Smith et al. Questions Developers Ask While Diagnosing Potential Security Vulnerabilities with Static Analysis. [FSE '15]

@karimhamdanali Just-In-Time Static Analysis !100

@karimhamdanali Just-In-Time Static Analysis (Cheetah) !101 https://github.com/secure-software-engineering/cheetah Developers fix errors
2x faster

@karimhamdanali !102 Usability

@karimhamdanali !103 Scalability Usability Precision

@karimhamdanali Where do we go from here? !104 Image: Boomerang
Toons / GIPHY

@karimhamdanali Swift Analysis Framework !105 themaplelab/swan

@karimhamdanali Analysis-Driven Inliner !106 Discriminants Budget Algorithm Search Space Call Frequency Method Size Method Size Nested Knapsack All IDT Methods Post-Inlining Transformations Estimate Post-Inlining Transformations themaplelab/openj9

@karimhamdanali Fixing Neural Networks using Solver-Aided Languages !107 coming soon... OVERVIEW • Understanding the internals of neural networks is limited due to their complexity • Fixing errors in neural networks without retraining is hard and currently not supported • We use Rosette to solve for changes in weights to a neural network • Rosette is able to represent neural networks and their results as symbolic values, which can then be solved for, under the assertion that a given data point is correct ROSETTE #lang rosette (define-symbolic x integer?) (assert (> x 3)) (define solution (solve x)) > (evaluate x solution) 4 OBJECTIVE Adjust n weights. We use Rosette to solve for changes in network weights subject to the following objective [formula not captured in the transcript] METHOD Training, Weight Selection, Solving, Evaluation

@karimhamdanali Google Android Mobile Security !108 Image: Android Developers Blog

@karimhamdanali Facebook Infer, Zoncolan, SapFix !109 Source: Distefano et al.
Scaling Static Analyses at Facebook. [CACM '19]

@karimhamdanali Semmle Continuous Security Analysis !110 Image: LGTM.com

@karimhamdanali Future of Program Analysis !111 Fixing Neural Networks with Solver-Aided Languages. Revan MacQueen (University of Alberta), Julian Dolby (IBM Research), Karim Ali (University of Alberta) https://github.com/themaplelab/ML-SE EVALUATION • We evaluate network performance before and after solving • Network with 784 input nodes, 300 hidden, and 10 output nodes • On average, after making changes, 99.85% of testing points remain correctly classified [Chart: Effect of Number of Symbolic Weights on Runtime] Extra Images: SIGPLAN Blog

Karim Ali University of Alberta @karimhamdanali Is Program Analysis The Silver Bullet Against Software Bugs?
