Upgrade to Pro — share decks privately, control downloads, hide ads and more …

"Controlling your Angr : Techniques for improving Symbolic Execution with the Angr framework" by Keith Makan

7b0645f018c0bddc8ce3900ccc3ba70c?s=47 Pycon ZA
October 08, 2020

"Controlling your Angr : Techniques for improving Symbolic Execution with the Angr framework" by Keith Makan

Angr is a python framework for analyzing binary programs that comes jam packed with the power of symbolic execution. Angr can automate sophisticated reasoning about programs and facilitate creation of cutting edge symbolic execution strategies. Although some tricks and tips are required to ensure practical application doesn't get bogged down by state explosion and other spins offs of the halting problem. This talk is for anyone who would like an introduction to symbolic execution, learn a powerful tool for automating tasks related to reverse engineering, exploit development and vulnerability discovery and get to know Angr a little better. The presenter will walk through some basic symbolic execution concepts, discuss the internals of Angr and guide listeners through how to design state exploration strategies to make practical applications and analysis written in Angr work.

Talk breakdown:

* What is symbolic execution? (5 mins): Walk through a basic introduction to the concepts, why symbolic execution is cool and what kinds of problems we can solve with it.

* What is Angr? How does it work? (5-10 mins): Crash course in the components that make up Angr, how they work together and a demonstration of a simple CTF example with the framework.

* How do I build a state exploration strategy? (10-15 mins): Walk through of the built in state exploration algorithms that come with Angr, what makes them awesome and how to build one of your own to adapt to the binaries you analyze and reverse engineer.

* Closing and other ideas (3-5 mins): Summary of the discussion and mention of other things to explore with the framework.

Total running time: ~30-35 mins

More about Angr at https://angr.io/.

7b0645f018c0bddc8ce3900ccc3ba70c?s=128

Pycon ZA

October 08, 2020
Tweet

Transcript

  1. Controlling your Angr Techniques for improving Symbolic Execution with the

    Angr framework Keith Makan
  2. Who am i • Keith Makan, BSc Computer Science •

    Two time non-award winning author: ◦ Android Security Cookbook - https://www.amazon.com/Android-Security-Cookbook-Keith-Makan/dp/1782167161 ◦ Penetration Testing with the Bash Shell - https://www.amazon.com/Penetration-Testing-shell-Keith-Makan/dp/1849695105/ref=sr_1_1 • Security Consultant ◦ Secure Code Review ◦ Penetration Testing (WebApps, Android Apps, Cloud, Infrastructure) • Currently pursuing my Masters focused on Symbolic Execution
  3. Agenda • Talk about what Symbolic Execution is, why it’s

    cool • Talk about Angr, what its made, of how it works • Walk through simple example with Angr • Talk about state exploration algorithms • Developing your own state exploration
  4. What is symbolic execution? Constraint Solving Static Analysis you An

    program
  5. What is symbolic execution? • Not emulation/virtualization, not direct execution!

    • Constraint Solving + ASTs + Static Analysis: ◦ Machines started becoming powerful enough to handle constraint solving. ◦ Folks in various disciplines including Electrical Engineering, Computer Science etc jumped at the problem. ◦ Hackers realized that constraint solving can be used to automate reasoning about security properties. • Automate test case generation • Provide formal, demonstrative proof of program behaviour • In simple terms: ◦ We rewrite the program to show which states it will assume under given input. ◦ Pass statements to a constraint solver, which guesses solutions for us. ◦ Pass the solutions back to the program to prove a given behaviour.
  6. What is symbolic execution? : Example 1. int main(int argc,

    char **argv){ 2. unsigned int x = atoi(argv[1]); 3. unsigned int y = atoi(argv[2]); 4. if (x + y > 10) { 5. x = y - 1; 6. }else { x = y - 10; } 7. } S = {..., a = b - 1}, P = (a + b > 10) 5. x = y - 1; S = {..., a = b - 10}, P = (a + b <= 10) 6. x = y - 10; ii iii successor states S = {x = a, y = b}, P = (true) 2. x = atoi(...); y = atoi(...); i state number path condition P Variable store S Program statement initial state
  7. Angr on the inside : Overview z3 CLE VexIR capstone

    cle loads everything Binary Claripy pyVex loader IR language Symbolic Engine / Execution Manager Constraint Solving Engine Disassembly Library
  8. Angr on the inside : Disassembly & Loading • Translate

    Machine code and Executable formats (ELF,PE,etc) to Assembler code / Meta-data • Resolve relocations / imports • Find entry points and destructors • Sort out dynamic / static loading • Derive Control Flow Graph (CFG) (optional) • ... CLE capstone cle loads everything Binary decom pilation
  9. Angr on the inside : Binary Lifting • Translate ASM

    to VexIR • Literalize the implicit register interactions • Draw up ASTs • Simplify statements to Single Static Assignment form i.e. each variable is defined once • ... CLE VexIR cle loads everything lifting pyVex
  10. Angr on the inside : Symbolic Execution • Translate VexIR

    into Z3 friendly statements • Translate Z3 results back into interpetable/concretized values. • Solve constraints, Simplify constraints • Manage execution states, classify states into different buckets, breakpoint handling. Symbolic execution z3 Claripy
  11. Important Classes .block.capstone .solver .memory .inspect angr.SimState .[your plugin] .registers

    .loader .factory .entry angr.Project angr.SimulationManager .use_techniques .step .stashes .project angr.exploration_techniques
  12. Typical workflow 1. Load a binary a. (optional) i. Derive

    CFG ii. Use PLT iii. Check for symbols iv. Check for sections binary Project
  13. 1. Load a binary 2. Build initial state a. (optional)

    i. Breakpoints ii. Hooks iii. State plugins b. Full State c. Blank State d. Custom Typical workflow SimState binary breakpoints / hooks Project
  14. 1. Load a binary 2. Build initial state 3. Model

    Variables a. Arguments / Environment b. Memory c. Files d. Registers e. .... Typical workflow SimState binary breakpoints / hooks Project Symbolic Variables
  15. Typical workflow SimState Symbolic Variables binary Simulation Manager breakpoints /

    hooks Exploration Techniques Project 1. Load a binary 2. Model Variables 3. Set up an initial state 4. Set up a simulation Manager a. (optional) i. Exploration tech 5. Step through / run simulation
  16. Simple “keygen” example : Binary CFG

  17. Simple “keygen” example : Script

  18. Simple “keygen” example : Results

  19. Other problems people have solved • Vulnerability Detection and Exploitation

    Analysis ◦ “FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution” - https://arxiv.org/abs/1708.09114 ◦ “File Parsing Vulnerability Detection with Symbolic Execution” - https://ieeexplore.ieee.org/document/6269637 • Automatic Exploit Generation ◦ “Unleashing Mayhem on Binary Code” - https://researchgate.net/publication/241633678_Unleashing_Mayhem_on_Binary_Code • De-obfuscation and detection of Malware ◦ Malware deobfuscation: symbolic analysis to the rescue! - https://www.virusbulletin.com/conference/vb2017/abstracts/malware-deobfuscation-symbolic-a nalysis-rescue
  20. State Exploration Algorithms • No knowledge of forward complexity (how

    complex code will get) • Vulnerable to the impact of state explosion (states can grow and fork uncontrollably) • How do you decide which state to step forward? ◦ Depth First Search (DFS); the boring default option ◦ Breadth First Search (BFS); good for discovering new code and high coverage ◦ Random/Uniform - good general applicability ◦ Path length i.e. the length of the chain of simStates from here to the beginning ◦ Machine Learning / Markov Chains? ◦ More optimizations ▪ Based on constraint solver costs ▪ % of new code discovered ▪ ...
  21. State Exploration Algorithms : Explorer • angr.exploration_techniques.Explorer • angr.SimulationManager.explore(find=,avoid=) •

    Step states forward until desired code block is reached • Avoid a list of blocks • Pros: ◦ Accurate ◦ Easy to implement (just compare state.ip with target) ◦ Can avoid a list of states if supplied ◦ Gets the job done • Cons: ◦ Still suffers path explosion ◦ Requires static prior analysis to find target block/state Target state start ? Control Flow Graph
  22. State Exploration Algorithms : Explorer

  23. State Exploration Algorithms : Directed Target state start Exit state

    ? • angr.exploration_techniques.Director • Uses Control Flow Graph to determine the “distance” to a target • Requires identification of target block (static analysis) a. Function calls : Director.ExecuteFunctionGoal b. Block address : Director.ExecuteAddressGoal • All-pairs shortest paths pre-computation a. How to get from one node to any other the fastest* • Relies on accuracy / completeness of CFG • Massive savings on state maintenance / constraint solving
  24. State Exploration Algorithms : Directed CFG Target state start Exit

    state State growth 3 8 ? ? Current state State Map 1 1 2 2 Branch distance -1 ? ... ... Loop states -1 -1 -1 -1 -1
  25. State Exploration Algorithms : Directed CFG State growth 3 8

    ? ? State Map 1 -1 2 2 -1 -1 ? ... ... ... -1 -1 -1 1
  26. State Exploration Algorithms : Directed

  27. Designing your own state exploration • Some paths are not

    realizable / interesting ◦ Experiment with when to check for reachability ◦ Any way we can be sure we don’t need to check? ◦ States you know you’re not interested in? • Some constraints are not satisfiable ◦ Anyway to re-use previous unsat cores? ◦ Concretization of memory addresses / variables? • Loops and Function Calls ◦ Produce a ton of states, detect and skip as many as possible • Libraries ◦ Masses of code that may not be important
  28. Designing your own state exploration • Inherit class MyExplore( angr.exploration_techniques.ExplorationTechnique)

    • ExplorationTechnique. step(...,simulationManager,...) -> “stash” ◦ Hooks the state processing action ◦ Essentially step the simulation manager and then return it ◦ Count active states, move states between stashes, etc ◦ Most important / valuable method on average • .filter(...,SimState,...) -> True/False ◦ Decide which stash a SimState should be passed to ◦ Can modify state and pass it to a stash ◦ Not completely necessary mostly to keep implementation neat • .selector(...,SimState,...) -> True/False ◦ Determine if state should participate in current round of stepping • .complete() -> True/False ◦ Determine when simulation manager is done
  29. Designing your own state exploration : BFS

  30. Conclusion • Automated Binary analysis is cool • Symbolic execution

    is slow and clunky at the moment but it’s capable of awesome stuff • Angr is super easy to use, helps people solve problems in new ways • State Exploration algorithms are fun to design
  31. Thank you, questions?

  32. FAQs • Practice problems: ◦ https://crackmes.one ◦ https://github.com/guyinatuxedo/nightmare ◦ https://docs.angr.io/examples

    • Other presentations on angr: ◦ Angr Tutorial https://www.youtube.com/watch?v=XgHZ6QnZkgc ◦ Angr - a powerful binary analysis platform for CTFs. The Cyber Grand Challenge - https://www.youtube.com/watch?v=Wx2RhKI7TIU
  33. References 1. Wang, F. and Shoshitaishvili, Y., 2017, September. Angr-the

    next generation of binary analysis. In 2017 IEEE Cybersecurity Development (SecDev) (pp. 8-9). IEEE. https://www.researchgate.net/publication/320651303_Angr_-_The_Next_Generation_of_Binary_Analysis 2. Angr Github angr/angr: A powerful and user-friendly binary analysis platform! 3. angr · PyPI 4. Shoshitaishvili, Yan & Wang, Ruoyu & Hauser, Christophe & Kruegel, Christopher & Vigna, Giovanni. (2015). Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware. 10.14722/ndss.2015.23294. https://www.researchgate.net/publication/300924994_Firmalice_-_Automatic_Detection_of_Authentication_Bypass_Vulnerabilities_in_Binar y_Firmware 5. FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution https://www.ndss-symposium.org/wp-content/uploads/2017/09/11_1_2.pdf 6. Angr internals (blog) https://angr.io/blog/throwing_a_tantrum_part_1/ 7. PyVex https://angr.io/api-doc/pyvex.html 8. ValGrind https://www.valgrind.org/docs/manual/hg-manual.html 9. James C. King. 1976. Symbolic execution and program testing. <i>Commun. ACM</i> 19, 7 (July 1976), 385–394. DOI:https://doi.org/10.1145/360248.360252 10. Roberto Baldoni, Emilio Coppa, Daniele Cono D’elia, Camil Demetrescu, and Irene Finocchi. 2018. A Survey of Symbolic Execution Techniques. <i>ACM Comput. Surv.</i> 51, 3, Article 50 (July 2018), 39 pages. DOI:https://doi.org/10.1145/3182657