Slide 1

Slide 1 text

Controlling your Angr Techniques for improving Symbolic Execution with the Angr framework Keith Makan

Slide 2

Slide 2 text

Who am i ● Keith Makan, BSc Computer Science ● Two time non-award winning author: ○ Android Security Cookbook - https://www.amazon.com/Android-Security-Cookbook-Keith-Makan/dp/1782167161 ○ Penetration Testing with the Bash Shell - https://www.amazon.com/Penetration-Testing-shell-Keith-Makan/dp/1849695105/ref=sr_1_1 ● Security Consultant ○ Secure Code Review ○ Penetration Testing (WebApps, Android Apps, Cloud, Infrastructure) ● Currently pursuing my Masters focused on Symbolic Execution

Slide 3

Slide 3 text

Agenda ● Talk about what Symbolic Execution is, why it’s cool ● Talk about Angr, what its made, of how it works ● Walk through simple example with Angr ● Talk about state exploration algorithms ● Developing your own state exploration

Slide 4

Slide 4 text

What is symbolic execution? Constraint Solving Static Analysis you An program

Slide 5

Slide 5 text

What is symbolic execution? ● Not emulation/virtualization, not direct execution! ● Constraint Solving + ASTs + Static Analysis: ○ Machines started becoming powerful enough to handle constraint solving. ○ Folks in various disciplines including Electrical Engineering, Computer Science etc jumped at the problem. ○ Hackers realized that constraint solving can be used to automate reasoning about security properties. ● Automate test case generation ● Provide formal, demonstrative proof of program behaviour ● In simple terms: ○ We rewrite the program to show which states it will assume under given input. ○ Pass statements to a constraint solver, which guesses solutions for us. ○ Pass the solutions back to the program to prove a given behaviour.

Slide 6

Slide 6 text

What is symbolic execution? : Example 1. int main(int argc, char **argv){ 2. unsigned int x = atoi(argv[1]); 3. unsigned int y = atoi(argv[2]); 4. if (x + y > 10) { 5. x = y - 1; 6. }else { x = y - 10; } 7. } S = {..., a = b - 1}, P = (a + b > 10) 5. x = y - 1; S = {..., a = b - 10}, P = (a + b <= 10) 6. x = y - 10; ii iii successor states S = {x = a, y = b}, P = (true) 2. x = atoi(...); y = atoi(...); i state number path condition P Variable store S Program statement initial state

Slide 7

Slide 7 text

Angr on the inside : Overview z3 CLE VexIR capstone cle loads everything Binary Claripy pyVex loader IR language Symbolic Engine / Execution Manager Constraint Solving Engine Disassembly Library

Slide 8

Slide 8 text

Angr on the inside : Disassembly & Loading ● Translate Machine code and Executable formats (ELF,PE,etc) to Assembler code / Meta-data ● Resolve relocations / imports ● Find entry points and destructors ● Sort out dynamic / static loading ● Derive Control Flow Graph (CFG) (optional) ● ... CLE capstone cle loads everything Binary decom pilation

Slide 9

Slide 9 text

Angr on the inside : Binary Lifting ● Translate ASM to VexIR ● Literalize the implicit register interactions ● Draw up ASTs ● Simplify statements to Single Static Assignment form i.e. each variable is defined once ● ... CLE VexIR cle loads everything lifting pyVex

Slide 10

Slide 10 text

Angr on the inside : Symbolic Execution ● Translate VexIR into Z3 friendly statements ● Translate Z3 results back into interpetable/concretized values. ● Solve constraints, Simplify constraints ● Manage execution states, classify states into different buckets, breakpoint handling. Symbolic execution z3 Claripy

Slide 11

Slide 11 text

Important Classes .block.capstone .solver .memory .inspect angr.SimState .[your plugin] .registers .loader .factory .entry angr.Project angr.SimulationManager .use_techniques .step .stashes .project angr.exploration_techniques

Slide 12

Slide 12 text

Typical workflow 1. Load a binary a. (optional) i. Derive CFG ii. Use PLT iii. Check for symbols iv. Check for sections binary Project

Slide 13

Slide 13 text

1. Load a binary 2. Build initial state a. (optional) i. Breakpoints ii. Hooks iii. State plugins b. Full State c. Blank State d. Custom Typical workflow SimState binary breakpoints / hooks Project

Slide 14

Slide 14 text

1. Load a binary 2. Build initial state 3. Model Variables a. Arguments / Environment b. Memory c. Files d. Registers e. .... Typical workflow SimState binary breakpoints / hooks Project Symbolic Variables

Slide 15

Slide 15 text

Typical workflow SimState Symbolic Variables binary Simulation Manager breakpoints / hooks Exploration Techniques Project 1. Load a binary 2. Model Variables 3. Set up an initial state 4. Set up a simulation Manager a. (optional) i. Exploration tech 5. Step through / run simulation

Slide 16

Slide 16 text

Simple “keygen” example : Binary CFG

Slide 17

Slide 17 text

Simple “keygen” example : Script

Slide 18

Slide 18 text

Simple “keygen” example : Results

Slide 19

Slide 19 text

Other problems people have solved ● Vulnerability Detection and Exploitation Analysis ○ “FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution” - https://arxiv.org/abs/1708.09114 ○ “File Parsing Vulnerability Detection with Symbolic Execution” - https://ieeexplore.ieee.org/document/6269637 ● Automatic Exploit Generation ○ “Unleashing Mayhem on Binary Code” - https://researchgate.net/publication/241633678_Unleashing_Mayhem_on_Binary_Code ● De-obfuscation and detection of Malware ○ Malware deobfuscation: symbolic analysis to the rescue! - https://www.virusbulletin.com/conference/vb2017/abstracts/malware-deobfuscation-symbolic-a nalysis-rescue

Slide 20

Slide 20 text

State Exploration Algorithms ● No knowledge of forward complexity (how complex code will get) ● Vulnerable to the impact of state explosion (states can grow and fork uncontrollably) ● How do you decide which state to step forward? ○ Depth First Search (DFS); the boring default option ○ Breadth First Search (BFS); good for discovering new code and high coverage ○ Random/Uniform - good general applicability ○ Path length i.e. the length of the chain of simStates from here to the beginning ○ Machine Learning / Markov Chains? ○ More optimizations ■ Based on constraint solver costs ■ % of new code discovered ■ ...

Slide 21

Slide 21 text

State Exploration Algorithms : Explorer ● angr.exploration_techniques.Explorer ● angr.SimulationManager.explore(find=,avoid=) ● Step states forward until desired code block is reached ● Avoid a list of blocks ● Pros: ○ Accurate ○ Easy to implement (just compare state.ip with target) ○ Can avoid a list of states if supplied ○ Gets the job done ● Cons: ○ Still suffers path explosion ○ Requires static prior analysis to find target block/state Target state start ? Control Flow Graph

Slide 22

Slide 22 text

State Exploration Algorithms : Explorer

Slide 23

Slide 23 text

State Exploration Algorithms : Directed Target state start Exit state ? ● angr.exploration_techniques.Director ● Uses Control Flow Graph to determine the “distance” to a target ● Requires identification of target block (static analysis) a. Function calls : Director.ExecuteFunctionGoal b. Block address : Director.ExecuteAddressGoal ● All-pairs shortest paths pre-computation a. How to get from one node to any other the fastest* ● Relies on accuracy / completeness of CFG ● Massive savings on state maintenance / constraint solving

Slide 24

Slide 24 text

State Exploration Algorithms : Directed CFG Target state start Exit state State growth 3 8 ? ? Current state State Map 1 1 2 2 Branch distance -1 ? ... ... Loop states -1 -1 -1 -1 -1

Slide 25

Slide 25 text

State Exploration Algorithms : Directed CFG State growth 3 8 ? ? State Map 1 -1 2 2 -1 -1 ? ... ... ... -1 -1 -1 1

Slide 26

Slide 26 text

State Exploration Algorithms : Directed

Slide 27

Slide 27 text

Designing your own state exploration ● Some paths are not realizable / interesting ○ Experiment with when to check for reachability ○ Any way we can be sure we don’t need to check? ○ States you know you’re not interested in? ● Some constraints are not satisfiable ○ Anyway to re-use previous unsat cores? ○ Concretization of memory addresses / variables? ● Loops and Function Calls ○ Produce a ton of states, detect and skip as many as possible ● Libraries ○ Masses of code that may not be important

Slide 28

Slide 28 text

Designing your own state exploration ● Inherit class MyExplore( angr.exploration_techniques.ExplorationTechnique) ● ExplorationTechnique. step(...,simulationManager,...) -> “stash” ○ Hooks the state processing action ○ Essentially step the simulation manager and then return it ○ Count active states, move states between stashes, etc ○ Most important / valuable method on average ● .filter(...,SimState,...) -> True/False ○ Decide which stash a SimState should be passed to ○ Can modify state and pass it to a stash ○ Not completely necessary mostly to keep implementation neat ● .selector(...,SimState,...) -> True/False ○ Determine if state should participate in current round of stepping ● .complete() -> True/False ○ Determine when simulation manager is done

Slide 29

Slide 29 text

Designing your own state exploration : BFS

Slide 30

Slide 30 text

Conclusion ● Automated Binary analysis is cool ● Symbolic execution is slow and clunky at the moment but it’s capable of awesome stuff ● Angr is super easy to use, helps people solve problems in new ways ● State Exploration algorithms are fun to design

Slide 31

Slide 31 text

Thank you, questions?

Slide 32

Slide 32 text

FAQs ● Practice problems: ○ https://crackmes.one ○ https://github.com/guyinatuxedo/nightmare ○ https://docs.angr.io/examples ● Other presentations on angr: ○ Angr Tutorial https://www.youtube.com/watch?v=XgHZ6QnZkgc ○ Angr - a powerful binary analysis platform for CTFs. The Cyber Grand Challenge - https://www.youtube.com/watch?v=Wx2RhKI7TIU

Slide 33

Slide 33 text

References 1. Wang, F. and Shoshitaishvili, Y., 2017, September. Angr-the next generation of binary analysis. In 2017 IEEE Cybersecurity Development (SecDev) (pp. 8-9). IEEE. https://www.researchgate.net/publication/320651303_Angr_-_The_Next_Generation_of_Binary_Analysis 2. Angr Github angr/angr: A powerful and user-friendly binary analysis platform! 3. angr · PyPI 4. Shoshitaishvili, Yan & Wang, Ruoyu & Hauser, Christophe & Kruegel, Christopher & Vigna, Giovanni. (2015). Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware. 10.14722/ndss.2015.23294. https://www.researchgate.net/publication/300924994_Firmalice_-_Automatic_Detection_of_Authentication_Bypass_Vulnerabilities_in_Binar y_Firmware 5. FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution https://www.ndss-symposium.org/wp-content/uploads/2017/09/11_1_2.pdf 6. Angr internals (blog) https://angr.io/blog/throwing_a_tantrum_part_1/ 7. PyVex https://angr.io/api-doc/pyvex.html 8. ValGrind https://www.valgrind.org/docs/manual/hg-manual.html 9. James C. King. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (July 1976), 385–394. DOI:https://doi.org/10.1145/360248.360252 10. Roberto Baldoni, Emilio Coppa, Daniele Cono D’elia, Camil Demetrescu, and Irene Finocchi. 2018. A Survey of Symbolic Execution Techniques. ACM Comput. Surv. 51, 3, Article 50 (July 2018), 39 pages. DOI:https://doi.org/10.1145/3182657