Upgrade to Pro — share decks privately, control downloads, hide ads and more …

"Controlling your Angr : Techniques for improving Symbolic Execution with the Angr framework" by Keith Makan

Pycon ZA
October 08, 2020

"Controlling your Angr : Techniques for improving Symbolic Execution with the Angr framework" by Keith Makan

Angr is a python framework for analyzing binary programs that comes jam packed with the power of symbolic execution. Angr can automate sophisticated reasoning about programs and facilitate creation of cutting edge symbolic execution strategies. Although some tricks and tips are required to ensure practical application doesn't get bogged down by state explosion and other spins offs of the halting problem. This talk is for anyone who would like an introduction to symbolic execution, learn a powerful tool for automating tasks related to reverse engineering, exploit development and vulnerability discovery and get to know Angr a little better. The presenter will walk through some basic symbolic execution concepts, discuss the internals of Angr and guide listeners through how to design state exploration strategies to make practical applications and analysis written in Angr work.

Talk breakdown:

* What is symbolic execution? (5 mins): Walk through a basic introduction to the concepts, why symbolic execution is cool and what kinds of problems we can solve with it.

* What is Angr? How does it work? (5-10 mins): Crash course in the components that make up Angr, how they work together and a demonstration of a simple CTF example with the framework.

* How do I build a state exploration strategy? (10-15 mins): Walk through of the built in state exploration algorithms that come with Angr, what makes them awesome and how to build one of your own to adapt to the binaries you analyze and reverse engineer.

* Closing and other ideas (3-5 mins): Summary of the discussion and mention of other things to explore with the framework.

Total running time: ~30-35 mins

More about Angr at https://angr.io/.

Pycon ZA

October 08, 2020
Tweet

More Decks by Pycon ZA

Other Decks in Programming

Transcript

  1. Controlling your Angr
    Techniques for improving Symbolic Execution with the Angr framework
    Keith Makan

    View Slide

  2. Who am i
    ● Keith Makan, BSc Computer Science
    ● Two time non-award winning author:
    ○ Android Security Cookbook -
    https://www.amazon.com/Android-Security-Cookbook-Keith-Makan/dp/1782167161
    ○ Penetration Testing with the Bash Shell -
    https://www.amazon.com/Penetration-Testing-shell-Keith-Makan/dp/1849695105/ref=sr_1_1
    ● Security Consultant
    ○ Secure Code Review
    ○ Penetration Testing (WebApps, Android Apps, Cloud, Infrastructure)
    ● Currently pursuing my Masters focused on Symbolic Execution

    View Slide

  3. Agenda
    ● Talk about what Symbolic Execution is, why it’s cool
    ● Talk about Angr, what its made, of how it works
    ● Walk through simple example with Angr
    ● Talk about state exploration algorithms
    ● Developing your own state exploration

    View Slide

  4. What is symbolic execution?
    Constraint Solving Static Analysis
    you
    An program

    View Slide

  5. What is symbolic execution?
    ● Not emulation/virtualization, not direct execution!
    ● Constraint Solving + ASTs + Static Analysis:
    ○ Machines started becoming powerful enough to handle constraint solving.
    ○ Folks in various disciplines including Electrical Engineering, Computer Science etc jumped at
    the problem.
    ○ Hackers realized that constraint solving can be used to automate reasoning about security
    properties.
    ● Automate test case generation
    ● Provide formal, demonstrative proof of program behaviour
    ● In simple terms:
    ○ We rewrite the program to show which states it will assume under given input.
    ○ Pass statements to a constraint solver, which guesses solutions for us.
    ○ Pass the solutions back to the program to prove a given behaviour.

    View Slide

  6. What is symbolic execution? : Example
    1. int main(int argc, char **argv){
    2. unsigned int x = atoi(argv[1]);
    3. unsigned int y = atoi(argv[2]);
    4. if (x + y > 10) {
    5. x = y - 1;
    6. }else { x = y - 10; }
    7. }
    S = {..., a = b - 1}, P = (a + b > 10)
    5. x = y - 1;
    S = {..., a = b - 10}, P = (a + b <= 10)
    6. x = y - 10;
    ii iii
    successor states
    S = {x = a, y = b}, P = (true)
    2. x = atoi(...); y = atoi(...);
    i
    state number
    path condition P
    Variable store S
    Program statement
    initial state

    View Slide

  7. Angr on the inside : Overview
    z3
    CLE
    VexIR
    capstone
    cle loads
    everything
    Binary
    Claripy
    pyVex
    loader
    IR language
    Symbolic
    Engine /
    Execution
    Manager
    Constraint
    Solving Engine
    Disassembly
    Library

    View Slide

  8. Angr on the inside : Disassembly & Loading
    ● Translate Machine code and
    Executable formats (ELF,PE,etc)
    to Assembler code / Meta-data
    ● Resolve relocations / imports
    ● Find entry points and destructors
    ● Sort out dynamic / static loading
    ● Derive Control Flow Graph (CFG)
    (optional)
    ● ...
    CLE
    capstone
    cle loads
    everything
    Binary
    decom
    pilation

    View Slide

  9. Angr on the inside : Binary Lifting
    ● Translate ASM to VexIR
    ● Literalize the implicit register
    interactions
    ● Draw up ASTs
    ● Simplify statements to Single
    Static Assignment form i.e.
    each variable is defined
    once
    ● ...
    CLE
    VexIR
    cle loads
    everything
    lifting
    pyVex

    View Slide

  10. Angr on the inside : Symbolic Execution
    ● Translate VexIR into Z3 friendly
    statements
    ● Translate Z3 results back into
    interpetable/concretized values.
    ● Solve constraints, Simplify
    constraints
    ● Manage execution states, classify
    states into different buckets,
    breakpoint handling.
    Symbolic
    execution
    z3
    Claripy

    View Slide

  11. Important Classes
    .block.capstone
    .solver
    .memory
    .inspect
    angr.SimState .[your
    plugin]
    .registers
    .loader
    .factory
    .entry
    angr.Project
    angr.SimulationManager
    .use_techniques
    .step
    .stashes .project
    angr.exploration_techniques

    View Slide

  12. Typical workflow
    1. Load a binary
    a. (optional)
    i. Derive CFG
    ii. Use PLT
    iii. Check for symbols
    iv. Check for sections
    binary
    Project

    View Slide

  13. 1. Load a binary
    2. Build initial state
    a. (optional)
    i. Breakpoints
    ii. Hooks
    iii. State plugins
    b. Full State
    c. Blank State
    d. Custom
    Typical workflow
    SimState
    binary
    breakpoints /
    hooks
    Project

    View Slide

  14. 1. Load a binary
    2. Build initial state
    3. Model Variables
    a. Arguments / Environment
    b. Memory
    c. Files
    d. Registers
    e. ....
    Typical workflow
    SimState
    binary
    breakpoints /
    hooks
    Project
    Symbolic
    Variables

    View Slide

  15. Typical workflow
    SimState
    Symbolic
    Variables
    binary
    Simulation
    Manager
    breakpoints /
    hooks
    Exploration
    Techniques
    Project
    1. Load a binary
    2. Model Variables
    3. Set up an initial state
    4. Set up a simulation
    Manager
    a. (optional)
    i. Exploration tech
    5. Step through / run
    simulation

    View Slide

  16. Simple “keygen” example : Binary CFG

    View Slide

  17. Simple “keygen” example : Script

    View Slide

  18. Simple “keygen” example : Results

    View Slide

  19. Other problems people have solved
    ● Vulnerability Detection and Exploitation Analysis
    ○ “FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution” -
    https://arxiv.org/abs/1708.09114
    ○ “File Parsing Vulnerability Detection with Symbolic Execution” -
    https://ieeexplore.ieee.org/document/6269637
    ● Automatic Exploit Generation
    ○ “Unleashing Mayhem on Binary Code” -
    https://researchgate.net/publication/241633678_Unleashing_Mayhem_on_Binary_Code
    ● De-obfuscation and detection of Malware
    ○ Malware deobfuscation: symbolic analysis to the rescue! -
    https://www.virusbulletin.com/conference/vb2017/abstracts/malware-deobfuscation-symbolic-a
    nalysis-rescue

    View Slide

  20. State Exploration Algorithms
    ● No knowledge of forward complexity (how complex code will get)
    ● Vulnerable to the impact of state explosion (states can grow and fork
    uncontrollably)
    ● How do you decide which state to step forward?
    ○ Depth First Search (DFS); the boring default option
    ○ Breadth First Search (BFS); good for discovering new code and high coverage
    ○ Random/Uniform - good general applicability
    ○ Path length i.e. the length of the chain of simStates from here to the beginning
    ○ Machine Learning / Markov Chains?
    ○ More optimizations
    ■ Based on constraint solver costs
    ■ % of new code discovered
    ■ ...

    View Slide

  21. State Exploration Algorithms : Explorer
    ● angr.exploration_techniques.Explorer
    ● angr.SimulationManager.explore(find=,avoid=)
    ● Step states forward until desired code block is reached
    ● Avoid a list of blocks
    ● Pros:
    ○ Accurate
    ○ Easy to implement (just compare state.ip with target)
    ○ Can avoid a list of states if supplied
    ○ Gets the job done
    ● Cons:
    ○ Still suffers path explosion
    ○ Requires static prior analysis to find target block/state
    Target
    state
    start
    ?
    Control Flow Graph

    View Slide

  22. State Exploration Algorithms : Explorer

    View Slide

  23. State Exploration Algorithms : Directed
    Target
    state
    start
    Exit state
    ?
    ● angr.exploration_techniques.Director
    ● Uses Control Flow Graph to determine the “distance” to a target
    ● Requires identification of target block (static analysis)
    a. Function calls : Director.ExecuteFunctionGoal
    b. Block address : Director.ExecuteAddressGoal
    ● All-pairs shortest paths pre-computation
    a. How to get from one node to any other the fastest*
    ● Relies on accuracy / completeness of CFG
    ● Massive savings on state maintenance / constraint solving

    View Slide

  24. State Exploration Algorithms : Directed
    CFG
    Target
    state
    start
    Exit state
    State
    growth
    3
    8
    ?
    ?
    Current
    state
    State
    Map
    1
    1 2
    2
    Branch
    distance
    -1
    ?
    ...
    ...
    Loop states
    -1
    -1
    -1
    -1
    -1

    View Slide

  25. State Exploration Algorithms : Directed
    CFG
    State
    growth
    3
    8
    ?
    ?
    State
    Map
    1
    -1 2
    2
    -1
    -1
    ?
    ...
    ...
    ...
    -1
    -1
    -1 1

    View Slide

  26. State Exploration Algorithms : Directed

    View Slide

  27. Designing your own state exploration
    ● Some paths are not realizable / interesting
    ○ Experiment with when to check for reachability
    ○ Any way we can be sure we don’t need to check?
    ○ States you know you’re not interested in?
    ● Some constraints are not satisfiable
    ○ Anyway to re-use previous unsat cores?
    ○ Concretization of memory addresses / variables?
    ● Loops and Function Calls
    ○ Produce a ton of states, detect and skip as many as possible
    ● Libraries
    ○ Masses of code that may not be important

    View Slide

  28. Designing your own state exploration
    ● Inherit class MyExplore( angr.exploration_techniques.ExplorationTechnique)
    ● ExplorationTechnique. step(...,simulationManager,...) -> “stash”
    ○ Hooks the state processing action
    ○ Essentially step the simulation manager and then return it
    ○ Count active states, move states between stashes, etc
    ○ Most important / valuable method on average
    ● .filter(...,SimState,...) -> True/False
    ○ Decide which stash a SimState should be passed to
    ○ Can modify state and pass it to a stash
    ○ Not completely necessary mostly to keep implementation neat
    ● .selector(...,SimState,...) -> True/False
    ○ Determine if state should participate in current round of stepping
    ● .complete() -> True/False
    ○ Determine when simulation manager is done

    View Slide

  29. Designing your own state exploration : BFS

    View Slide

  30. Conclusion
    ● Automated Binary analysis is cool
    ● Symbolic execution is slow and clunky at the moment but it’s capable of
    awesome stuff
    ● Angr is super easy to use, helps people solve problems in new ways
    ● State Exploration algorithms are fun to design

    View Slide

  31. Thank you, questions?

    View Slide

  32. FAQs
    ● Practice problems:
    ○ https://crackmes.one
    ○ https://github.com/guyinatuxedo/nightmare
    ○ https://docs.angr.io/examples
    ● Other presentations on angr:
    ○ Angr Tutorial https://www.youtube.com/watch?v=XgHZ6QnZkgc
    ○ Angr - a powerful binary analysis platform for CTFs. The Cyber Grand Challenge -
    https://www.youtube.com/watch?v=Wx2RhKI7TIU

    View Slide

  33. References
    1. Wang, F. and Shoshitaishvili, Y., 2017, September. Angr-the next generation of binary analysis. In 2017 IEEE Cybersecurity Development
    (SecDev) (pp. 8-9). IEEE. https://www.researchgate.net/publication/320651303_Angr_-_The_Next_Generation_of_Binary_Analysis
    2. Angr Github angr/angr: A powerful and user-friendly binary analysis platform!
    3. angr · PyPI
    4. Shoshitaishvili, Yan & Wang, Ruoyu & Hauser, Christophe & Kruegel, Christopher & Vigna, Giovanni. (2015). Firmalice - Automatic Detection
    of Authentication Bypass Vulnerabilities in Binary Firmware. 10.14722/ndss.2015.23294.
    https://www.researchgate.net/publication/300924994_Firmalice_-_Automatic_Detection_of_Authentication_Bypass_Vulnerabilities_in_Binar
    y_Firmware
    5. FirmUSB: Vetting USB Device Firmware using Domain Informed Symbolic Execution
    https://www.ndss-symposium.org/wp-content/uploads/2017/09/11_1_2.pdf
    6. Angr internals (blog) https://angr.io/blog/throwing_a_tantrum_part_1/
    7. PyVex https://angr.io/api-doc/pyvex.html
    8. ValGrind https://www.valgrind.org/docs/manual/hg-manual.html
    9. James C. King. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (July 1976), 385–394.
    DOI:https://doi.org/10.1145/360248.360252
    10. Roberto Baldoni, Emilio Coppa, Daniele Cono D’elia, Camil Demetrescu, and Irene Finocchi. 2018. A Survey of Symbolic Execution
    Techniques. ACM Comput. Surv. 51, 3, Article 50 (July 2018), 39 pages. DOI:https://doi.org/10.1145/3182657

    View Slide