Analyzing Evil PDF Files with peepdf

Hands-on usage of the peepdf tool by Jose Miguel Esparza on malicious PDF files

Includes scripts for use in an enterprise response

Benjamin Scott

July 25, 2012

Transcript

  1. Overall PDF Structure: Header, Body, Cross-references, Trailer

    Header: %PDF-1.4
    Body: 5 0 obj << /Type /Page /Contents 6 0 R /Resources 4 0 R /Parent 25 0 R >> endobj
    Cross-references: xref 0 588 0000000000 65535 f 0000000793 00000 n
    Trailer: trailer << /Size 588 /Root 586 0 R >> startxref 3477152 %%EOF
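The four sections above can be located with a few byte searches. The following is a minimal sketch, not how peepdf itself parses a file; `SAMPLE` is a tiny hand-made fragment (an assumption for illustration, not a complete PDF) and `locate_sections` is a hypothetical helper name.

```python
import re

# Hand-made fragment for illustration only (hypothetical, not a full PDF).
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n45\n%%EOF\n"
)


def locate_sections(data: bytes) -> dict:
    """Pull the header version, last xref offset and last /Root reference."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # A file with incremental updates has several startxref/%%EOF pairs;
    # the last one describes the newest version.
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    roots = re.findall(rb"/Root\s+(\d+)\s+(\d+)\s+R", data)
    return {
        "version": header.group(1).decode() if header else None,
        "xref_offset": int(offsets[-1]) if offsets else None,
        "root": (int(roots[-1][0]), int(roots[-1][1])) if roots else None,
        "eof_markers": data.count(b"%%EOF"),
    }


if __name__ == "__main__":
    print(locate_sections(SAMPLE))
```

The `startxref` value is a byte offset from the start of the file, so slicing the data at that offset should land on the `xref` keyword. Malformed or deliberately broken files may not follow this layout, which is one reason to prefer a tolerant parser such as peepdf for real samples.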
  2. PDF Body

    • Tree data structure
      ▪ Starts with the Root object
      ▪ A Catalog defines the Pages of the document
      ▪ Pages contain the actual content
  3. Basic peepdf usage (on Backtrack under /pentest/forensics/peepdf)

    peepdf.py -i <pdf_name>
    PPDF> (insert commands here)
      ▪ info $object_id [$revision]     // show summary of an object
      ▪ object $object_id [$revision]   // show contents of an object
      ▪ metadata                        // show creation info of PDF
      ▪ info > $outfile                 // redirect to file
      ▪ help $command                   // show usage for commands
  4. PDF Elements: Objects

    • Objects are building blocks
      ▪ Written as object_ID revision
      ▪ Original is revision zero
      ▪ 12073 == Integer object
      ▪ (delicious sandwiches) == String object
    119 0 obj 12073 endobj
    PPDF> info 119
    […]
    Object: integer
    PPDF> object 119
    12073
  5. PDF Elements: Dictionaries

    • A Dictionary is a list of key/value Object pairs
      ▪ Written using << and >>
      ▪ /Type /Catalog == Named object of type Catalog
      ▪ /Lang (en-US) == String for Catalog’s language
    3 0 obj << /Type /Catalog /Lang (en-US) >> endobj
    PPDF> info 3
    […]
    Object: dictionary
    Type: /Catalog
    PPDF> object 3
    << /Type /Catalog /Lang (en-US) >>
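The object and dictionary syntax above is regular enough to pick apart with a regular expression. This is a rough sketch under simplifying assumptions (no binary streams containing the word `endobj`, no nested edge cases); `list_objects` and `classify` are hypothetical helper names, and object 120 in the sample is invented for the string example.

```python
import re

# "<id> <revision> obj ... endobj", non-greedy so each object stops at its own endobj.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

SAMPLE = (
    b"119 0 obj 12073 endobj\n"
    b"3 0 obj << /Type /Catalog /Lang (en-US) >> endobj\n"
    b"120 0 obj (delicious sandwiches) endobj\n"  # object ID 120 is made up
)


def list_objects(data: bytes) -> dict:
    """Map (object_id, revision) to the raw body of each indirect object."""
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


def classify(body: bytes) -> str:
    """Very coarse type guess based on the first characters of the body."""
    if body.startswith(b"<<"):
        return "stream" if b"endstream" in body else "dictionary"
    if body.startswith(b"("):
        return "string"
    if re.fullmatch(rb"[+-]?\d+", body):
        return "integer"
    return "other"


if __name__ == "__main__":
    for key, body in list_objects(SAMPLE).items():
        print(key, classify(body), body)
```

This mirrors what `info` and `object` report in the peepdf console, although peepdf uses a proper parser and copes with cases this sketch ignores.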
  6. PDF Elements: References

    • References are pointers to objects
      ▪ Written as object_ID revision R
      ▪ Allow for object chaining
    99 0 obj << /Type /Catalog /Pages 5 0 R >> endobj
    PPDF> object 99
    << /Type /Catalog /Pages 5 0 R >>
    PPDF> object 5
    << /Type /Pages /Kids [3 0 R] /Count 1 >>
    PPDF> object 3
    << /Type /Page /Contents 7 0 R /Font 52 0 R >>
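Following references by hand, as in the console session above, is a graph walk. The sketch below assumes the objects have already been extracted into a dictionary keyed by object ID; the `OBJECTS` table is a hypothetical stand-in for a parsed document, with a `/Parent` back-reference added to show why visited objects must be tracked.

```python
import re

# "<id> <revision> R" marks an indirect reference.
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")

# Hypothetical, already-extracted object bodies keyed by object ID.
OBJECTS = {
    99: b"<< /Type /Catalog /Pages 5 0 R >>",
    5: b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    3: b"<< /Type /Page /Parent 5 0 R /Contents 7 0 R >>",
    7: b"<< /Length 0 >>",
}


def references(body: bytes) -> list:
    """Return the object IDs referenced from a raw object body, in order."""
    return [int(obj_id) for obj_id, _rev in REF_RE.findall(body)]


def walk(objects: dict, start: int) -> list:
    """Breadth-first walk from `start`, skipping objects already visited."""
    order, queue, seen = [], [start], set()
    while queue:
        obj_id = queue.pop(0)
        if obj_id in seen or obj_id not in objects:
            continue
        seen.add(obj_id)
        order.append(obj_id)
        queue.extend(references(objects[obj_id]))
    return order


if __name__ == "__main__":
    print(walk(OBJECTS, 99))
```

Without the `seen` set, the `/Parent` pointer from the page back to the page tree would loop forever. The `tree` command in peepdf gives the same kind of overview for a real file.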
  7. PDF Elements: Streams

    • Streams are large blobs of data
      ▪ Written as stream "data" endstream
    60 0 obj << /Filter /FlateDecode /Length 1337 >> stream "data" endstream endobj
    PPDF> info 60
    […]
    Object: stream
    Length: 1337
    Encoded: Yes
    Filters: /FlateDecode
    PPDF> object 60
    << /Filter /FlateDecode /Length 1337 >>
    stream "data" endstream
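A /FlateDecode stream is zlib-compressed data sitting between the `stream` and `endstream` keywords. The sketch below builds a small stream object and inflates it again; it assumes `/Length` is a direct integer (real files may use an indirect reference there), and both function names are hypothetical.

```python
import re
import zlib


def build_stream_object(obj_id: int, payload: bytes) -> bytes:
    """Create a minimal FlateDecode stream object for demonstration."""
    compressed = zlib.compress(payload)
    head = b"%d 0 obj\n<< /Filter /FlateDecode /Length %d >>\nstream\n" % (
        obj_id,
        len(compressed),
    )
    return head + compressed + b"\nendstream\nendobj\n"


def inflate_stream(raw_object: bytes) -> bytes:
    """Cut /Length bytes after the stream keyword and decompress them."""
    length = int(re.search(rb"/Length\s+(\d+)", raw_object).group(1))
    start = re.search(rb"stream\r?\n", raw_object).end()
    return zlib.decompress(raw_object[start:start + length])


if __name__ == "__main__":
    raw = build_stream_object(60, b"BT /F1 12 Tf (hello) Tj ET")
    print(inflate_stream(raw))
```

In peepdf terms, `rawstream` corresponds to the bytes before decompression and `stream` to the decoded result.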
  8. Lab 0: Normal PDF

    • benign.pdf is a perfectly normal document, similar to thousands of others in your organization
    • Goal:
      ▪ Get comfortable using peepdf
      ▪ Gain familiarity with non-malicious PDFs
  9. Lab 0: Questions

    1. How many objects does the document contain?
    2. How many versions of the document are there?
    3. Who is the author of this document?
    4. In which time zone was it created?
    BONUS - Which object would you examine first if this document were flagged as suspicious?
  10. Lab 0: Answers

    1. How many objects does the document contain? 49
    2. How many versions of the document are there? 2
    3. Who is the author of this document? Ray Bair
    4. In which time zone was it created? -8'00
    BONUS - Which object would you examine first if this document were flagged as suspicious? Object 22
  11. Exploitation in PDF

    • Built-in Flash interpreter
      ▪ Malformed input to functions
    • Bugs in file parsers
      ▪ libtiff
      ▪ 3D engine
    • Issues with Multimedia functions
      ▪ media.newPlayer
      ▪ Collab.getIcon, Collab.collectEmailInfo
    • Launching embedded files
      ▪ /OpenAction -> /Launch
  12. The Process of Exploitation

    [Diagram labels: JavaScript, Shellcode, Benign PDF, Binary]
    1. User opens the PDF
    2. JavaScript builds crafted input
    3. JavaScript triggers exploit using crafted input
    4. PDF reader crashes and executes Shellcode
    5. Shellcode writes Binary to disk
    6. Shellcode executes Binary
    7. Binary opens Benign PDF
    8. Binary opens C2 channel
  13. More peepdf usage

    peepdf.py -i <pdf_name>
    PPDF> (insert commands here)
      ▪ rawobject $object_id [$revision]              // show raw bytes of an object
      ▪ stream $object_id [$revision]                 // show filtered content of stream
      ▪ rawstream $object_id [$revision] > $out.bin   // dump raw bytes of stream to file
      ▪ changelog                                     // show changes to PDF
  14. Lab 1: PDF with JavaScript

    • alertme.pdf contains JavaScript that automatically executes
    • Goal:
      ▪ Find and decode JavaScript using peepdf
  15. Lab 1: Questions

    1. What object contains the /OpenAction directive?
    2. What object contains JavaScript?
    3. What does the JavaScript do?
    BONUS: Which filters are applied to the JavaScript object, and in which order?
  16. Lab 1: Answers

    1. What object contains the /OpenAction directive? Object 1
    2. What object contains JavaScript? Object 5
    3. What does the JavaScript do? Opens an alert window with a message once the file is opened, if JavaScript is enabled
    BONUS: Which filters are applied to the JavaScript object, and in which order? /ASCIIHexDecode /FlateDecode, ascii (flate
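The bonus answer depends on filter order: the filters are undone in the order they are listed, so the hex layer comes off first and the zlib layer second. The sketch below shows that chain for the two filters named in the lab; the `DECODERS` table and function names are hypothetical, and the JavaScript string is an invented stand-in, not the content of alertme.pdf.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode /ASCIIHexDecode: ignore whitespace, stop at the '>' marker."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is treated as followed by 0
    return binascii.unhexlify(digits)


# Only the two filters from the lab are covered in this sketch.
DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": zlib.decompress,
}


def decode_stream(data: bytes, filters: list) -> bytes:
    """Apply each decoder in the order the filters are listed."""
    for name in filters:
        data = DECODERS[name](data)
    return data


if __name__ == "__main__":
    script = b'app.alert("example");'  # invented payload for the demo
    encoded = binascii.hexlify(zlib.compress(script)) + b">"
    print(decode_stream(encoded, ["/ASCIIHexDecode", "/FlateDecode"]))
```

Swapping the filter order makes `zlib.decompress` fail on the hex text, which is a quick way to confirm which layer is outermost.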
  17. Sample Malicious JavaScript

    for (i = 0; i < buffersize; i ++ ){
      buffer[i] = unescape("%0a%0a%0a%0a");
    }
    var strtmp3 = "Collab.get" + "Icon(buffer+'_N.bundle');";
    eval(strtmp3);
    CVE-2009-0927 - collab.getIcon() [buffer overflow on stack]

    for (i = 0; i < 200; i ++ )memory[i] = block + shellcode;
    try { this .media.newPlayer(null); } catch (e){}
    util.printd(String.fromCharCode(2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 257 , 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570, 2570), new Date()); }
    CVE-2009-4324 - media.newplayer [with heap spray]

    http://contagiodump.blogspot.com/2010/08/aug-3-cve-2009-0927-cve-2009-4324-cve.html
  18. (even) More peepdf usage

    peepdf.py -i <pdf_name>
    PPDF> (insert commands here)
      • tree [$file_version]                   // show overall hierarchy of PDF document
      • hash object $object_id [$revision]     // show md5/sha1/sha256 hash of an object
      • hash stream $object_id [$revision]     // show md5/sha1/sha256 hash of a stream
      • search $string                         // find text in PDF
      • xor_search $string                     // find XOR’d text
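Searching for XOR'd text can be pictured as a brute-force loop over keys. The sketch below tries every single-byte key; it illustrates the idea only and makes no claim about how peepdf's `xor_search` is implemented. The function name and the sample blob are hypothetical.

```python
def xor_search(data: bytes, needle: bytes) -> list:
    """Return (key, offset) pairs where needle appears XOR'd with one byte."""
    hits = []
    for key in range(1, 256):  # key 0 would be the plain text itself
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits


if __name__ == "__main__":
    needle = b"This program"
    # Invented blob: padding, then the needle XOR'd with key 0x41.
    blob = b"\x00" * 4 + bytes(b ^ 0x41 for b in needle) + b"\xff" * 4
    for key, offset in xor_search(blob, needle):
        print("key 0x%02x at offset %d" % (key, offset))
```

A string such as "This program" is a common choice because it appears in the DOS stub of Windows executables, so a hit suggests an obfuscated embedded binary.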
  19. Lab 2: PDF doc as a container

    • embedme.pdf contains a binary file as part of its structure, which is valid under the PDF specification
      ▪ Can be used to include malicious executables as part of a dropper
    • Goal:
      ▪ Locate the embedded file using peepdf
  20. Lab 2: Questions

    1. What object contains the embedded file?
    2. What object contains the /AA directive?
    3. Which type of action is taken on the embedded file?
    4. What is the MD5 hash of the executable?
    BONUS: What was the original name of the exe?
  21. Lab 2: Answers

    1. What object contains the embedded file? Object 82
    2. What object contains the /AA directive? Object 10
    3. Which type of action is taken on the embedded file? /GoToE
    4. What is the MD5 hash of the executable? 10e4a1d2132ccb5c6759f038cdb6f3c9
    BONUS: What was the original name of the exe? calc.exe
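Once the embedded stream has been decoded and dumped, hashing it outside peepdf is a useful cross-check. This sketch uses only the standard library; `hash_bytes` and `looks_like_pe` are hypothetical helper names, and the input here is a placeholder rather than the lab's executable.

```python
import hashlib


def hash_bytes(data: bytes) -> dict:
    """Compute the three digests that peepdf's hash command reports."""
    return {
        name: hashlib.new(name, data).hexdigest()
        for name in ("md5", "sha1", "sha256")
    }


def looks_like_pe(data: bytes) -> bool:
    """Windows executables start with the two-byte 'MZ' signature."""
    return data[:2] == b"MZ"


if __name__ == "__main__":
    dumped = b"MZ" + b"\x00" * 62  # placeholder bytes, not a real binary
    print(looks_like_pe(dumped))
    for name, digest in hash_bytes(dumped).items():
        print(name, digest)
```

In practice the bytes would come from a file written with the decoded stream content, and the MD5 can then be compared against threat-intelligence sources.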
  22. Triaging a suspicious PDF

    [Flowchart labels]
    Sources: Email, Antivirus, Packet Capture
    Alert types: Corrupt | Unscannable | Generic (Exploit.PDF-JS.Gen)
    Pull SMTP flow from network capture
    Save attachment from SMTP flow
    peepdf.py $pdf_file --load-script peepnoevil
  23. Triage Results

    Analysis Results:
    1. Saved PDF file from email
    2. AV scan results
    3. Output of peepdf.py --load-script peepnoevil
    4. Manual analysis of the PDF
    [Flowchart labels: Benign, Malicious, Unclear; Escalate, Close as False Positive]
  24. Hot Triage Tips

    • PDFs flagged as 'Corrupted' or 'Unscannable'
      ▪ Tentative indicator of badness
      ▪ Could be large or broken benign file
    • Output of signature-based scans can be misleading
      ▪ Yara cannot see inside encoded object streams
      ▪ Antivirus is bad at detecting malicious PDFs
    • False positives on benign documents
      ▪ /JavaScript or /OpenAction can be used legitimately
      ▪ Flash, 3D objects, obfuscated JS are likely
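A first-pass keyword count shows both why such indicators help and why they mislead. The sketch below counts a few action-related names in the raw bytes after undoing `#xx` hex escapes in names; the `SUSPICIOUS` list is an illustrative assumption, not a vetted signature set, and like the signature scanners mentioned above it cannot see names hidden inside encoded object streams.

```python
import re

# Illustrative keyword list (an assumption for this sketch).
SUSPICIOUS = ("/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch", "/EmbeddedFile")


def normalise_names(data: bytes) -> bytes:
    """Undo #xx hex escapes, e.g. /J#61vaScript becomes /JavaScript."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


def count_keywords(data: bytes) -> dict:
    """Count whole-name occurrences of each keyword in the raw bytes."""
    text = normalise_names(data)
    counts = {}
    for keyword in SUSPICIOUS:
        # The lookahead stops /JS from matching inside a longer name.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, text))
    return counts


if __name__ == "__main__":
    sample = (
        b"1 0 obj << /Type /Catalog /OpenAction 5 0 R >> endobj\n"
        b"5 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj\n"
    )
    print(count_keywords(sample))
```

A non-zero count is only a reason to look closer: as noted above, /JavaScript and /OpenAction also appear in legitimate documents, so the result should feed manual analysis rather than replace it.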
  25. Other PDF Tools

    • Didier’s PDF Tools
      ▪ pdfid <mal_pdf>
      ▪ pdf-parser --stats <mal_pdf>
    • Brandon Dixon’s PDF X-RAY
      ▪ Online and local parsing engine with database
      ▪ pdfxray_lite.py -r html -f <mal_pdf>
    • Sid Steward’s PDF Toolkit
      ▪ Swiss Army Knife for PDFs
      ▪ pdftk <encrypted_pdf> input_pw <password> output unencrypted.pdf
  26. Peepdf command reference

    peepdf.py -i <pdf_name>
    PPDF> (insert commands here)
      ▪ info $object_id [$revision]                   // show summary of an object
      ▪ object $object_id [$revision]                 // show contents of an object
      ▪ metadata                                      // show creation info of PDF
      ▪ info > $outfile                               // redirect to file
      ▪ help $command                                 // show usage for commands
      ▪ rawobject $object_id [$revision]              // show raw bytes of an object
      ▪ stream $object_id [$revision]                 // show filtered content of stream
      ▪ rawstream $object_id [$revision] > $out.bin   // dump raw bytes of stream to file
      ▪ changelog                                     // show changes to PDF
      ▪ tree [$file_version]                          // show overall hierarchy of PDF document
      ▪ hash object $object_id [$revision]            // show md5/sha1/sha256 hash of an object
      ▪ hash stream $object_id [$revision]            // show md5/sha1/sha256 hash of a stream
      ▪ search $string                                // find text in PDF
  27. Popular Vulnerabilities

    • Embedded MP4 video (Flash) (CVE-2012-0754)
    • U3D engine (CVE-2011-2462)
    • Collab.getIcon (CVE-2009-0927)
    • Doc.media.newPlayer (CVE-2009-4324)
    • Collab.collectEmailInfo (CVE-2007-5659)
    • Embedded file Launch feature