Slide 1

Slide 1 text

Caring for file formats Ange Albertini Troopers 2016

Slide 2

Slide 2 text

TL;DR
● The attack surface of file formats is too big.
● Specs are useless (just a nice ‘guide’): they don’t represent reality.
● We can’t deprecate formats because we can’t preserve them and we can’t define how they really work.
● We need good open libraries to simplify the landscape, and a corpus that expresses the reality of file formats, which gives us real “documentation”.
● Then we can preserve and deprecate older formats, which reduces the attack surface.
● From then on, we can focus on making the present more secure.
● We don’t need “new” formats: we need ‘alive’ specs and file corpora. Otherwise specs will always diverge from reality.

Slide 3

Slide 3 text

Ange Albertini reverse engineering & visual documentation @angealbertini http://www.corkami.com Welcome to my talk!

Slide 4

Slide 4 text

I make polyglots (multi-type files), schizophrenics (multi-behavior)...

Slide 5

Slide 5 text

I tried to explain file formats with cows… But that didn’t really tell why people should care.

Slide 6

Slide 6 text

I really like to play with file formats... [Diagram labels: 3DES, AES with keys K1 and K2; JPG, JAR (ZIP + CLASS), PDF, FLV, PNG]

Slide 7

Slide 7 text

I’m a part of PoC||GTFO, for which I’m a file format user and abuser.

Slide 8

Slide 8 text

PoC||GTFO: many file formats
● Articles: PDFLaTeX, PDFBook, Inkscape, GhostScript, Scribus, Blender, Gimp, Fontforge, PDFFont, Mutool
● Proof of Concept: Qpdf, Xpdf, Ruby, Python, Bash, Truecrypt, Wavpack, Audacity, Baudline, Sox, Tar, Zip, MkIsoFS, LSnes, PngOpt, JpegSnoop, AdvPNG, Nasm, Qemu, BPGEnc
And many custom scripts handling file formats in unconventional ways…

Slide 9

Slide 9 text

I'm interested in hardware preservation and digital preservation.

Slide 10

Slide 10 text

My interests
● Using file formats
  ○ graphics, 3d, music…
● Abusing file formats
  ○ polyglot, schizophrenia, hash collisions…
● Preserving file formats
  ○ Retro-gaming, digital archeology...

Slide 11

Slide 11 text

What is a file format? A miserable little pile of secrets. Not just a sequence of binary data.

Slide 12

Slide 12 text

If you [/your program] generate a picture of any kind, you might want to export the result to something that you can re-use later. (same for any form of information)

Slide 13

Slide 13 text

What is a file format? A computer dialect to communicate between communities.

Slide 14

Slide 14 text

File formats are community connectors. Don’t think so? Try exporting everything as XML ;)

Slide 15

Slide 15 text

Most people don’t care about [image]. They only care about [image]. We mostly care about the input/output.

Slide 16

Slide 16 text

Example: We don’t care about GIF. We mostly care about its characteristics and how easy it is to use. No need to be emotional and stay in our comfort zone.

Slide 17

Slide 17 text

We don’t really care about file formats. We care about their characteristics. Not groundbreaking, but supported “everywhere”.

Slide 18

Slide 18 text

Why should infosec care?
Fuzz formats. Blame “bad” devs. Collect CVEs. Boost your ego.
10 PRINT “SOLVED ANYTHING YET?”
20 GOTO 10

Slide 19

Slide 19 text

Attack surface
● 1 OS = N supported formats
● For each format:
  ○ How many parsers?
  ○ For each parser:
    ■ Which version, compiler...
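A minimal sketch of that counting exercise, assuming a made-up inventory: the format names are real, but the parser names and builds below are hypothetical placeholders, not a survey of any actual system. It only shows how quickly formats × parsers × builds multiply.

```python
# Hypothetical inventory for one machine: format -> parser -> builds present.
# Every parser name and build label here is an illustrative placeholder.
inventory = {
    "PNG": {"system_lib": ["build 1", "build 2"], "browser_decoder": ["bundled"]},
    "PDF": {"reader_a": ["v1"], "reader_b": ["v1", "v2"], "thumbnailer": ["v1"]},
}


def attack_surface(inv):
    """List every (format, parser, build) combination that can touch a file."""
    return [
        (fmt, parser, build)
        for fmt, parsers in inv.items()
        for parser, builds in parsers.items()
        for build in builds
    ]


surface = attack_surface(inventory)
for entry in surface:
    print(entry)
print(len(surface), "distinct parser builds for", len(inventory), "formats")
```

Each tuple is a separate piece of code that an untrusted file may reach.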

Slide 20

Slide 20 text

The PGM or PPM formats are the easiest way to convert any data into valid grayscale or RGB pictures. But most people don’t know they are supported out of the box by a lot of software.
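To show how little it takes, here is a small sketch that wraps arbitrary bytes in a binary PGM (the "P5" variant: a short text header followed by one byte per pixel). The function name `bytes_to_pgm` and the default width are my own choices for illustration.

```python
def bytes_to_pgm(data: bytes, width: int = 16) -> bytes:
    """Wrap arbitrary bytes in a binary PGM (P5) grayscale image."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Ceiling division: enough rows to hold every byte (at least one row).
    height = max(1, -(-len(data) // width))
    # Pad the last row with black pixels so there are width * height bytes.
    pixels = data.ljust(width * height, b"\x00")
    # P5 header: magic number, width, height, maximum gray value.
    header = b"P5\n%d %d\n255\n" % (width, height)
    return header + pixels


image = bytes_to_pgm(b"any data at all", width=4)
print(image[:11])   # the header
print(len(image))   # header plus 4 x 4 pixels
```

Writing `image` to a file with a `.pgm` extension gives a picture that image viewers with PGM support should open. PPM works the same way with a `P6` magic number and three bytes per pixel.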

Slide 21

Slide 21 text

We should reduce the attack surface. How many unsuspected supported [sub-]formats and parsers? https://lcamtuf.blogspot.com/2014/10/psa-dont-run-strings-on-untrusted-files.html

Slide 22

Slide 22 text

How many file formats are supported by your browser? By your OS? How many do you really need? Think “embedded”.

Slide 23

Slide 23 text

Capacity is still too cheap: we keep stacking formats/features, which doesn’t solve anything. It’s a problem everywhere. We keep losing ground.

Slide 24

Slide 24 text

Slide 25

Slide 25 text

“Pokemon plays Twitch”
1. Exploit a GameBoy game via input
2. Take over the Super GameBoy
3. Take over the Super Nintendo

Slide 26

Slide 26 text

The file itself can perform the exploit (on the hardware or an emulator). The payload displays the article.

Slide 27

Slide 27 text

--> PoC||GTFO 10 is a PoC-ception:
- a PDF article describing the exploit
- a file performing the exploit (to display the article)

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

“young celebs” What they were supposed to be doesn’t really matter.

Slide 30

Slide 30 text

What file formats were supposed to be doesn't matter anymore; what they are now is all we care about. Security cares about current reality, not obsolete theory.

Slide 31

Slide 31 text

We can blame bad parsers. What about the file formats? If the map is unclear enough, you’ll get lost anyway. A blurry file format will never lead to a clean parser.

Slide 32

Slide 32 text

2 ways to communicate: use a ready-made translator (an import/export library), or write your own (read the specs).

Slide 33

Slide 33 text

Landscapes

Slide 34

Slide 34 text

To exploit hash collisions, I abused JPEG. To abuse JPEG “everywhere”, just abuse LibJPEG.

Slide 35

Slide 35 text

JPEG format’s landscape
In practice, JPEG is LibJPEG turbo v6
● de facto standard
  ○ later versions not used (different API)
Even if you create your own JPEG library, you want to have full LibJPEG compatibility.
JPEG format is defined by LibJPEG.

Slide 36

Slide 36 text

I made extremely custom PDFs for each reader.

Slide 37

Slide 37 text

These "extreme" PDFs fail on any other reader.

Slide 38

Slide 38 text

PDF’s current landscape
PDF: 6 interpretations of the specs
● specs are even more useless
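A toy illustration of how diverging interpretations produce a "schizophrenic" file. This is not PDF and not any real reader's logic: the `title=` format and both parsers are invented for the sketch. The assumption is a spec that never says what to do when a field appears twice, so each implementer picks a rule.

```python
def parse_first(blob: bytes) -> bytes:
    """Hypothetical reader A: trusts the first 'title=' field it finds."""
    start = blob.index(b"title=") + len(b"title=")
    return blob[start:blob.index(b"\n", start)]


def parse_last(blob: bytes) -> bytes:
    """Hypothetical reader B: trusts the last 'title=' field it finds."""
    start = blob.rindex(b"title=") + len(b"title=")
    return blob[start:blob.index(b"\n", start)]


# One file, two fields with the same name: the made-up spec allows it.
sample = b"title=invoice\nbody=...\ntitle=payload\n"

print(parse_first(sample))
print(parse_last(sample))
```

Both readers accept the file and neither reports an error, yet they show different content. A scanner built like reader A would approve a file that a viewer built like reader B renders differently.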

Slide 39

Slide 39 text

One good open library: a unified attack surface. Fuzz it, pwn everyone? True, but also fixed for everyone! Is diversity really good? We’re all supposed to use the same file format.

Slide 40

Slide 40 text

Diversity is good? Attack surface is worse. Unofficial substandards.

Slide 41

Slide 41 text

In any case... Specs are merely an introduction guide. A free set of examples w/ corner cases. A grammar?

Slide 42

Slide 42 text

PDF’s future
PDF/E (engineer): 3d crap
PDF/A (archiving): already 8 flavours
Specs:
● specs are now commercial
● the main implementation is not open
● no set of free files.
And all countries preserve their culture with that format?!?! We’re waiting for a new disaster...

Slide 43

Slide 43 text

Many file formats are abandoned. One spec, then nothing. It’s like knowing about someone only from a baby’s picture.

Slide 44

Slide 44 text

Slide 45

Slide 45 text

PoC||GTFO 11 is a webserver serving itself, with its own HTML page extracting its own attachments from its ZIP.

$ruby pocorgtfo11.pdf
Listening for connections on port 8080.
To listen on a different port, re-run with the desired port as a command-line argument.
A neighbor at 127.0.0.1 is requesting /
A neighbor at 127.0.0.1 is requesting /ajax/feelies.json
A neighbor at 127.0.0.1 is requesting /favicon.png

$unzip -l pocorgtfo11.pdf
Archive: pocorgtfo11.pdf
  Length   Date     Time  Name
 --------  ----     ----  ----
        0  03-16-16 13:37 4am/
    25955  03-11-16 15:06 4am/Stickybear Math 2 (4am crack).txt
[...]
     3241  03-16-16 13:37 wafflehouse.txt
 --------                 -------
  8177332                 23 files

Slide 46

Slide 46 text

--> PoC||GTFO 11 is self-aware: a PDF that serves itself (HTTP quine), parses its own ZIP to serve its archived feelies.
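Why can a ZIP tool list files from something that also opens as a PDF? ZIP readers locate the archive from the end of the file, so leading bytes are usually tolerated. Here is a minimal sketch of that single property using Python's standard `zipfile` module. The prefix is only a PDF-looking header, not a valid PDF, and this is not how the actual PoC||GTFO issue was built.

```python
import io
import zipfile

# Build a small ZIP archive in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("feelies/readme.txt", "hello from inside the archive")

# Prepend bytes that look like the start of another format.
# Placeholder only: this is NOT a complete or valid PDF document.
prefix = b"%PDF-1.5\n% placeholder, not a real PDF body\n"
polyglot = prefix + archive.getvalue()

# The ZIP reader finds the central directory from the end of the data
# and compensates for the unexpected leading bytes.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("feelies/readme.txt")

print(polyglot[:8])
print(names)
print(content)
```

A tool that checks only the first bytes would call this a PDF, while a ZIP tool sees an archive. A real polyglot also has to keep the first format's own parser happy, which is where the hand-crafting comes in.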

Slide 47

Slide 47 text

Important question

Slide 48

Slide 48 text

Do you still sleep with a teddy bear?

Slide 49

Slide 49 text

Kids really deprecate stuff. Our computers still handle more and more file formats. ⇒ The attack surface just keeps growing.

Slide 50

Slide 50 text

Obsolete formats are still omnipresent Formats, sub-formats, features...

Slide 51

Slide 51 text

Because it’s unclear if we can go back. We’d be too afraid to deprecate them.

Slide 52

Slide 52 text

Yet we deprecate for security. Example for PDF: JPEG-compressed text is not supported anymore (it could bypass security).

Slide 53

Slide 53 text

Windows PE format becomes stricter (deprecates packers)

Slide 54

Slide 54 text

Sometimes, it’s not even for security reasons. For example, EPUB 3.1 suddenly killed backward compatibility. http://blog.kbresearch.nl/2016/03/10/the-future-of-epub-a-first-look-at-the-epub-3-1-editors-draft/

Slide 55

Slide 55 text

We don’t need new file formats. It’s the same problem again if eventually their specs stop reflecting reality.

Slide 56

Slide 56 text

Even dictionaries have regular updates, to reflect reality.

Slide 57

Slide 57 text

Story time
Digipres = PDF worshippers. 150 years of availability?
● Non free specs + closed source software?
Here comes the grim reaper:
● Fix your stuff or it will be killed (like Flash)
We store our knowledge. What about files born digital? Not infosec, but worrying.

Slide 58

Slide 58 text

veraPDF and its test files: a great initiative.

Slide 59

Slide 59 text

PE.corkami.com: my own collection of hand-made executables and "documentation" (completely free).

Slide 60

Slide 60 text

Some of these broke a lot of software...

Slide 61

Slide 61 text

Consequence of my PE page+corpus
● 'corkami-proof' software
● raises the bar for everyone
● become a hub of knowledge
  ○ "I can't share the sample", but from the knowledge, my own file will be shared ⇒ even useful for the original contact
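A rough sketch of what testing software against a free corpus can look like. Everything here is a stand-in: `toy_png_check` only checks the 8-byte PNG signature and is not a real parser, and the three "files" are inline byte strings rather than an actual corpus such as the PE collection.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def toy_png_check(data: bytes) -> None:
    """Stand-in parser: only verifies the 8-byte PNG signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("bad PNG signature")


def run_corpus(parser, corpus):
    """Run parser over {name: bytes}; return {name: error} for every failure."""
    failures = {}
    for name, data in corpus.items():
        try:
            parser(data)
        except Exception as exc:  # any crash on a corpus file is a finding
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical corpus entries, inlined so the sketch is self-contained.
corpus = {
    "minimal.png": PNG_SIGNATURE + b"rest of file",
    "truncated.png": PNG_SIGNATURE[:4],
    "empty.png": b"",
}

failures = run_corpus(toy_png_check, corpus)
for name, error in sorted(failures.items()):
    print(name, "->", error)
```

The corpus does the work here: every hand-made corner case that gets shared becomes a regression test for any tool, without the original sample ever leaving its owner.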

Slide 62

Slide 62 text

Conclusion

Slide 63

Slide 63 text

Attack surface
Too many (sub)formats
Too many parsers (= no good open lib)

Slide 64

Slide 64 text

Specs
Specs shouldn’t be a religious text
● Worshipped, but outdated and worthless
Specs should reflect reality (a law)
● updated, enforced, realistic, freely available
A good open lib

Slide 65

Slide 65 text

Deprecation
Deprecation is a natural cycle, and yet...
We are afraid to deprecate because no file format is fully preserved:
● open, up to date specs
● free test coverage

Slide 66

Slide 66 text

But it won’t happen... ...until a great disaster ? It ends up on CNN, with a logo & a website :)

Slide 67

Slide 67 text

Ack Phil Fabrice Travis Sergey Micah Kurt QKumba Hanno...

Slide 68

Slide 68 text

Thank you!

Slide 69

Slide 69 text

Caring for file formats corkami.com @angealbertini Hail to the king, baby!