Slide 1

Slide 1 text

Let’s write a PDF file. A simple walk-through to learn the basics of the PDF format (at your own pace). PDF = Portable Document Format. r2

Slide 2

Slide 2 text

Ange Albertini: reverse engineering & visual documentation. @angealbertini http://www.corkami.com

Slide 3

Slide 3 text

Goal: write a “Hello World” in PDF

Slide 4

Slide 4 text

PDF is text-based, with some binary in specific cases. But not in this example, so just open a text editor.

Slide 5

Slide 5 text

Statements are separated by white space (any extra white space is ignored). White space is any of these: 0x00 Null, 0x09 Tab, 0x0A Line Feed, 0x0C Form Feed, 0x0D Carriage Return, 0x20 Space. (Yes, you can mix EOL styles :( )
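To make the white-space rule concrete, here is a minimal Python sketch (not from the original slides). It splits raw bytes on runs of the six white-space characters listed above. The helper name split_tokens is made up for this example, and it deliberately ignores delimiters, strings and comments.

```python
import re

# The six PDF white-space bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = re.compile(rb"[\x00\t\n\x0c\r ]+")

def split_tokens(data):
    """Split bytes on runs of PDF white space (hypothetical helper, not a full lexer)."""
    return [token for token in PDF_WHITESPACE.split(data) if token]

# Mixed EOL styles and extra white space do not change the tokens.
print(split_tokens(b"1 0\r\nobj\t3 \x00\n endobj"))
```

Any run of white space, whatever its mix, counts as a single separator here.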

Slide 6

Slide 6 text

Delimiters don’t require white space before them: ( ) < > [ ] { } / %

Slide 7

Slide 7 text

_ Let’s start!

Slide 8

Slide 8 text

%PDF-_ A PDF starts with a %PDF-? signature followed by a version number. 1.0 <= version number <= 1.7 (it doesn’t really matter here)

Slide 9

Slide 9 text

%PDF-1.3 _ Ok, we have a valid signature ☺
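As a quick check of the signature rule, a small Python sketch could look like this. The function name pdf_version is an assumption for illustration, and it only accepts a single digit on each side of the dot, which covers the 1.0 to 1.7 range mentioned earlier.

```python
import re

def pdf_version(data):
    """Return the version from a leading %PDF-x.y signature, or None if absent."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None

print(pdf_version(b"%PDF-1.3\n"))
```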

Slide 10

Slide 10 text

%PDF-1.3 %_ A comment starts with % until the end of the line.

Slide 11

Slide 11 text

%PDF-1.3 %file body _ After the signature, comes the file body. (we’ll see about it later)

Slide 12

Slide 12 text

%PDF-1.3 %file body xref _ After the file body, comes the cross reference table. It starts with the xref keyword, on a separate line.

Slide 13

Slide 13 text

%PDF-1.3 %file body xref %xref table here _ After the xref keyword, comes the actual table. (we’ll see about it later)

Slide 14

Slide 14 text

%PDF-1.3 %file body xref %xref table here trailer_ After the table, comes the trailer... It starts with a trailer keyword.

Slide 15

Slide 15 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents _ ...and its contents. (we’ll see that later too…)

Slide 16

Slide 16 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref _ Then, a pointer to the xref table... (with startxref)

Slide 17

Slide 17 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer _ (later, too...)

Slide 18

Slide 18 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF_ Lastly, to mark the end of the file... ...an %%EOF marker.

Slide 19

Slide 19 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF That’s the overall layout of a PDF document! Easy ;)

Slide 20

Slide 20 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Now, we just need to fill in the rest :)
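The overall layout can be checked mechanically. The sketch below is only an illustration under simple assumptions: it looks for the five markers in order with a plain byte search, and the names SKELETON and layout_in_order are made up here.

```python
# The skeleton from the slides, one statement per line (1-character newlines).
SKELETON = b"""%PDF-1.3
%file body
xref
%xref table here
trailer
%trailer contents
startxref
%xref pointer
%%EOF
"""

def layout_in_order(data):
    """Return True if signature, xref, trailer, startxref and %%EOF appear in that order."""
    position = 0
    for marker in (b"%PDF-", b"xref", b"trailer", b"startxref", b"%%EOF"):
        position = data.find(marker, position)
        if position < 0:
            return False
        position += len(marker)
    return True

print(layout_in_order(SKELETON))
```

A real parser would work backwards from %%EOF instead; this only mirrors the top-to-bottom order used in the walk-through.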

Slide 21

Slide 21 text

Study time

Slide 22

Slide 22 text

Def: name objects A.k.a. “strings starting with a slash”

Slide 23

Slide 23 text

/Name A slash, then an alphanumeric string (no whitespace)

Slide 24

Slide 24 text

Case sensitive: /Name != /name. Names with incorrect case are just ignored (no error is triggered).

Slide 25

Slide 25 text

Def: dictionary object. A sequence of keys and values (no delimiter in between), enclosed in << and >>, that sets each key to its value.

Slide 26

Slide 26 text

Syntax << key value key value [key value]*… >>

Slide 27

Slide 27 text

Keys are always name objects << /Index 1>> sets /Index to 1 << Index 1 >> is invalid (the key is not a name)

Slide 28

Slide 28 text

Dictionaries can have any length << /Index 1 /Count /Whatever >> sets /Index to 1 and /Count to /Whatever

Slide 29

Slide 29 text

Extra white space is ignored (as usual). << /Index 1 /Count /Whatever >> written with extra spaces or line breaks between its tokens is equivalent to << /Index 1 /Count /Whatever >>

Slide 30

Slide 30 text

Dictionaries can be nested. << /MyDict << >> >> sets /MyDict to << >> (empty dictionary)

Slide 31

Slide 31 text

White space before delimiters is not required. << /Index 1 /MyDict << >> >> is equivalent to <</Index 1/MyDict<<>>>>
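To tie the dictionary rules together, here is a minimal Python sketch that writes a Python dict in PDF dictionary syntax. It rests on a few assumptions: keys are given without their slash, name values are passed as strings that already start with a slash, and the function name to_pdf is made up for this example.

```python
def to_pdf(value):
    """Serialize a nested Python dict as a PDF dictionary (illustrative only)."""
    if isinstance(value, dict):
        # Keys are always name objects, so each one gets a leading slash.
        inner = " ".join(f"/{key} {to_pdf(item)}" for key, item in value.items())
        return f"<< {inner} >>" if inner else "<< >>"
    return str(value)

print(to_pdf({"Index": 1, "Count": "/Whatever"}))
print(to_pdf({"Index": 1, "MyDict": {}}))
```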

Slide 32

Slide 32 text

Def: indirect object. An object number (>0), a generation number (0*), the obj keyword, the object content, the endobj keyword. (* 99% of the time)

Slide 33

Slide 33 text

Example 1 0 obj 3 endobj is object #1, generation 0, containing “3”

Slide 34

Slide 34 text

Def: object reference. Object number, object generation, then R (number number R). Ex: 1 0 R

Slide 35

Slide 35 text

Object reference Refers to an indirect object as a value ex: << /Root 1 0 R >> refers to object number 1 generation 0 as the /Root

Slide 36

Slide 36 text

In a dictionary, a reference can only be a value, never a key. << /Root 1 0 R >> is OK. << 1 0 R /Catalog>> isn’t.

Slide 37

Slide 37 text

Be careful with the syntax! “1 0 3” is a sequence of 3 numbers (1, 0 and 3). “1 0 R” is a single reference to object number 1, generation 0.
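The difference between three numbers and one reference is easy to show with a regular expression. This is a rough sketch, assuming that Python's \s is close enough to PDF white space for the examples used here; the name find_references is hypothetical.

```python
import re

# Two integers followed by the R keyword form a single object reference.
REFERENCE = re.compile(r"(\d+)\s+(\d+)\s+R\b")

def find_references(text):
    """Return (object number, generation) pairs for every reference found."""
    return [(int(number), int(generation)) for number, generation in REFERENCE.findall(text)]

print(find_references("<< /Root 1 0 R >>"))
print(find_references("1 0 3"))
```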

Slide 38

Slide 38 text

Def: file body. A sequence of indirect objects; object order doesn’t matter.

Slide 39

Slide 39 text

Example 1 0 obj 3 endobj 2 0 obj << /Index 1 >> endobj defines 2 objects with different contents
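The two-object example above can be read back with a few lines of Python. This is only a sketch under simplifying assumptions: it treats the body as text, would be confused by streams or strings containing the word endobj, and the name parse_body is made up.

```python
import re

# number, generation, the obj keyword, any content, then the endobj keyword.
INDIRECT_OBJECT = re.compile(r"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

def parse_body(text):
    """Map (object number, generation) to the raw content of each indirect object."""
    return {
        (int(number), int(generation)): content.strip()
        for number, generation, content in INDIRECT_OBJECT.findall(text)
    }

body = "1 0 obj 3 endobj 2 0 obj << /Index 1 >> endobj"
print(parse_body(body))
```

Because the result is keyed by object number, the order of objects in the body has no effect on it.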

Slide 40

Slide 40 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Remember this?

Slide 41

Slide 41 text

A PDF document is defined by a tree of objects.

Slide 42

Slide 42 text

%PDF-1.3 %file body xref %xref table here trailer %trailer contents startxref %xref pointer %%EOF Now, let’s start!

Slide 43

Slide 43 text

%PDF-1.3 %file body xref %xref table here trailer << _ >> startxref %xref pointer %%EOF The trailer is a dictionary.

Slide 44

Slide 44 text

%PDF-1.3 %file body xref %xref table here trailer << /Root_ >> startxref %xref pointer %%EOF It defines a /Root name...

Slide 45

Slide 45 text

%PDF-1.3 %file body xref %xref table here trailer << /Root 1 0 R_>> startxref %xref pointer %%EOF ...that refers to an object...

Slide 46

Slide 46 text

%PDF-1.3 %file body xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that will be in the file body. (like all the other objects)

Slide 47

Slide 47 text

Recap: the trailer is a dictionary that refers to a root object.

Slide 48

Slide 48 text

%PDF-1.3 _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s create our first object...

Slide 49

Slide 49 text

%PDF-1.3 1 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …(with the standard object declaration)...

Slide 50

Slide 50 text

%PDF-1.3 1 0 obj << _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that contains a dictionary. (like most objects)

Slide 51

Slide 51 text

%PDF-1.3 1 0 obj << /Type_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and its /Type is...

Slide 52

Slide 52 text

%PDF-1.3 1 0 obj << /Type /Catalog_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...defined as /Catalog...

Slide 53

Slide 53 text

%PDF-1.3 1 0 obj << /Type /Catalog _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF the /Root object also refers to the page tree...

Slide 54

Slide 54 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...via a /Pages name...

Slide 55

Slide 55 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R_>> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that refers to another object...

Slide 56

Slide 56 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...which we’ll create.

Slide 57

Slide 57 text

Recap: object 1 is a catalog, and refers to a Pages object.

Slide 58

Slide 58 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s create object 2.

Slide 59

Slide 59 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The usual declaration.

Slide 60

Slide 60 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF It’s a dictionary too.

Slide 61

Slide 61 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The pages’ object /Type has to be defined as … /Pages ☺

Slide 62

Slide 62 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF This object defines its children via /Kids...

Slide 63

Slide 63 text

Def: array. Enclosed in [ ], with values separated by white space. Ex: [1 2 3 4] is an array of 4 integers: 1, 2, 3 and 4.

Slide 64

Slide 64 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ _ ] >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...which is an array...

Slide 65

Slide 65 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R_] >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … of references to each page object.

Slide 66

Slide 66 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF One last step...

Slide 67

Slide 67 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1_>> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...the number of kids has to be set in /Count...

Slide 68

Slide 68 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and now object 2 is complete!

Slide 69

Slide 69 text

Recap: object 2 is /Pages; it defines Kids + Count (pages of the document).

Slide 70

Slide 70 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We can add our only Kid...

Slide 71

Slide 71 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …(a single page)...

Slide 72

Slide 72 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … a dictionary...

Slide 73

Slide 73 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … defining a /Type...

Slide 74

Slide 74 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … as /Page.

Slide 75

Slide 75 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF This grateful kid properly recognizes its own parent...

Slide 76

Slide 76 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R_>> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … as you would expect ☺

Slide 77

Slide 77 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our page requires resources.

Slide 78

Slide 78 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s add them...

Slide 79

Slide 79 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << _ >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...as a dictionary:

Slide 80

Slide 80 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font_ >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF In this case, fonts...

Slide 81

Slide 81 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << _ >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...as a dictionary.

Slide 82

Slide 82 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << _ >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We define one font...

Slide 83

Slide 83 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1_ >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...by giving it a name...

Slide 84

Slide 84 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << _ >> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and setting its parameters:

Slide 85

Slide 85 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type_ >> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF its type is ...

Slide 86

Slide 86 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font_ >> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … font ☺

Slide 87

Slide 87 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype_ >> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Its font type is...

Slide 88

Slide 88 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1_ >> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …(Adobe) Type1...

Slide 89

Slide 89 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont_>> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and its name is...

Slide 90

Slide 90 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial_>> >> >> >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF .../Arial.

Slide 91

Slide 91 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> _ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF One thing is missing in our page...

Slide 92

Slide 92 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The actual page contents...

Slide 93

Slide 93 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R_ >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … as a reference to another object.

Slide 94

Slide 94 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF That’s all for our page object.

Slide 95

Slide 95 text

Recap: object 3 defines a /Page, its /Parent, /Resources (fonts) and its /Contents is in another object. (thank you Mario!)

Slide 96

Slide 96 text

Study time

Slide 97

Slide 97 text

Def: stream objects So far, everything is text. How do you store binary data (images,...) ?

Slide 98

Slide 98 text

Stream objects are objects. They start and end like any other object. Ex: 1 0 obj … endobj

Slide 99

Slide 99 text

Stream objects contain a stream, between the stream and endstream keywords: 1 0 obj stream endstream endobj

Slide 100

Slide 100 text

Streams can contain anything. Yes, really! Even binary data, other file formats... (except the endstream keyword)

Slide 101

Slide 101 text

Stream parameters are stored before the stream, in a dictionary after obj and before stream. Required: the stream length. Optional: compression algorithm, etc…

Slide 102

Slide 102 text

Example: 1 0 obj << /Length 10 >> stream 0123456789 endstream endobj
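Since /Length is required and must match the data, it is safer to compute it than to type it. Here is a minimal Python sketch, assuming generation 0, no compression and 1-character newlines; the function name make_stream_object is made up for this example.

```python
def make_stream_object(number, data):
    """Wrap raw bytes in an uncompressed stream object with a matching /Length."""
    header = f"{number} 0 obj\n<< /Length {len(data)} >>\nstream\n".encode("ascii")
    # The newline before endstream is not part of the data, so it is not counted.
    return header + data + b"\nendstream\nendobj\n"

print(make_stream_object(1, b"0123456789").decode("ascii"))
```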

Slide 103

Slide 103 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj _ xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF

Slide 104

Slide 104 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj _ endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We create a /Contents object...

Slide 105

Slide 105 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream _ endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that is a stream object...

Slide 106

Slide 106 text

Study time

Slide 107

Slide 107 text

Page contents syntax: a sequence of parameters, then the operator. Ex: param1 param2 operator

Slide 108

Slide 108 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream _ endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Text objects are delimited by BT and ET...

Slide 109

Slide 109 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT _ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...(BeginText & EndText).

Slide 110

Slide 110 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT Tf_ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We need to set a font, with Tf.

Slide 111

Slide 111 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT _ Tf ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF It takes 2 parameters: a font name...

Slide 112

Slide 112 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1_ Tf ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...(from the page’s resources)...

Slide 113

Slide 113 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100_Tf ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and a font size.

Slide 114

Slide 114 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf _ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF We move the cursor...

Slide 115

Slide 115 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf Td_ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...with the Td operator...

Slide 116

Slide 116 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf _ Td ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that takes 2 parameters...

Slide 117

Slide 117 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400_Td ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...x and y coordinates. (default page size: 612x792)

Slide 118

Slide 118 text

Study time

Slide 119

Slide 119 text

Def: literal strings enclosed in parentheses Ex: (Hi Mum)

Slide 120

Slide 120 text

Can contain parentheses, as long as they are balanced: (Hello() World(()))

Slide 121

Slide 121 text

Can contain white space ( Hello World ! )

Slide 122

Slide 122 text

Standard escaping is supported (Hello \ World \r\n)

Slide 123

Slide 123 text

Escaping is in octal (Hell\157 World)
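To see what the octal escape above stands for, here is a tiny Python sketch. It only handles backslash followed by one to three octal digits, and the name decode_octal is made up for this example.

```python
import re

def decode_octal(text):
    """Replace \\ddd octal escapes with the characters they encode."""
    return re.sub(r"\\([0-7]{1,3})", lambda m: chr(int(m.group(1), 8)), text)

# Octal 157 is decimal 111, the code for the letter "o".
print(decode_octal(r"Hell\157 World"))
```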

Slide 124

Slide 124 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td _ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Showing a text string...

Slide 125

Slide 125 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td Tj_ ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...is done with the Tj operator...

Slide 126

Slide 126 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td _ Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...that takes a single parameter...

Slide 127

Slide 127 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td (_) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...a literal string.

Slide 128

Slide 128 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World_) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF

Slide 129

Slide 129 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our contents stream is complete...
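The finished content stream follows the "parameters, then operator" rule, which a short Python sketch can make visible by grouping each operator with the parameters that precede it. This is a simplification: it only knows the five operators used in this walk-through, only handles literal strings without nested parentheses or escapes, and the name parse_content is made up.

```python
import re

# A token is either a simple literal string or a run of non-space characters.
TOKEN = re.compile(r"\([^()]*\)|\S+")
OPERATORS = {"BT", "ET", "Tf", "Td", "Tj"}

def parse_content(stream):
    """Group each operator with the operands that were written before it."""
    operations, operands = [], []
    for token in TOKEN.findall(stream):
        if token in OPERATORS:
            operations.append((token, operands))
            operands = []
        else:
            operands.append(token)
    return operations

for operator, operands in parse_content("BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET"):
    print(operator, operands)
```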

Slide 130

Slide 130 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF

Slide 131

Slide 131 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj _ stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF One last thing...

Slide 132

Slide 132 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << _ >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...we need to set its parameters...

Slide 133

Slide 133 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length_ >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF … the stream length...

Slide 134

Slide 134 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44_>> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …including white space (newline characters…).

Slide 135

Slide 135 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our stream parameters are finished...

Slide 136

Slide 136 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so our page contents object is finished.

Slide 137

Slide 137 text

Recap: obj 4 is a stream object with a set length, defining the page’s contents: declare text, set a font and size, move cursor, display text.
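Where does the 44 come from? The Python sketch below counts it, assuming each of the five stream lines is separated by a single 1-character newline and that the newline just before endstream is not counted. The variable names are made up for this example.

```python
# The five lines of the content stream, as written in object 4.
stream_lines = [
    "BT",
    "/F1 100 Tf",
    "10 400 Td",
    "(Hello World!) Tj",
    "ET",
]

# 40 visible characters plus 4 newlines between the lines.
content = "\n".join(stream_lines).encode("ascii")
print(len(content))
```

Changing the text, the font size or the newline style changes this number, so /Length has to be recounted after any edit to the stream.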

Slide 138

Slide 138 text

The whole document is defined. We need to polish the structure.

Slide 139

Slide 139 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Our PDF defines 4 objects, starting at index 1...

Slide 140

Slide 140 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...but PDFs always have an object 0, which is null...

Slide 141

Slide 141 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref %xref table here trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so 5 objects, starting at 0.

Slide 142

Slide 142 text

Warning: offsets & EOLs. We have to define byte offsets, which depend on the EOL convention: 1 character (LF) under Linux/Mac, 2 characters (CR LF) under Windows. (I use 1-character newlines here.)
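To put numbers on this warning, here is a small Python sketch. It measures where object 2 would start for an assumed layout of the signature and object 1 (one entry per line, a blank line after the signature and after the object). The layout, the list `lines`, and the helper `offset_after` are assumptions for illustration, not the deck's exact file.

```python
# Assumed layout of the start of the file, one list item per line.
lines = [
    "%PDF-1.3",
    "",
    "1 0 obj",
    "<<",
    "/Type /Catalog",
    "/Pages 2 0 R",
    ">>",
    "endobj",
    "",
]

def offset_after(lines, eol):
    """Return the byte offset just past these lines for a given EOL style."""
    return len((eol.join(lines) + eol).encode("ascii"))

lf_offset = offset_after(lines, "\n")      # Linux/Mac convention
crlf_offset = offset_after(lines, "\r\n")  # Windows convention
print(lf_offset, crlf_offset)  # 60 69
```

Same text, different offsets: every xref entry and the startxref pointer shift as soon as the EOL style changes.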

Slide 143

Slide 143 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Let’s edit the XREF table!

Slide 144

Slide 144 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The next line defines the starting index...

Slide 145

Slide 145 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...and the number of objects.

Slide 146

Slide 146 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, one line per object...

Slide 147

Slide 147 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...following the xxxxxxxxxx yyyyy a format (10 digits, 5 digits, 1 letter).

Slide 148

Slide 148 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The first parameter is the offset (in decimal) of the object...

Slide 149

Slide 149 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...(for the null object, it’s 0).

Slide 150

Slide 150 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, the generation number (that is almost always 0)...

Slide 151

Slide 151 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...but for object 0, it’s 65535.

Slide 152

Slide 152 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Then, a letter, to tell if this entry is free (f) or in use (n).

Slide 153

Slide 153 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Lastly, each line should take 20 bytes, including EOL...

Slide 154

Slide 154 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...so add a trailing space.
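The entry format described so far (10-digit offset, 5-digit generation number, one letter, trailing space, EOL, 20 bytes in total) can be sketched in Python. The function name `xref_entry` is hypothetical, and the sketch assumes a 1-character "\n" EOL, which is why the trailing space is needed to reach 20 bytes.

```python
def xref_entry(offset: int, generation: int, in_use: bool) -> bytes:
    """Format one cross-reference entry: exactly 20 bytes including the EOL."""
    flag = "n" if in_use else "f"  # n = in use, f = free
    # 10 digits, space, 5 digits, space, letter, trailing space, newline.
    entry = f"{offset:010d} {generation:05d} {flag} \n".encode("ascii")
    if len(entry) != 20:
        raise ValueError("offset or generation number has too many digits")
    return entry

print(xref_entry(0, 65535, False))  # b'0000000000 65535 f \n'
print(xref_entry(10, 0, True))      # b'0000000010 00000 n \n'
```

Fixed-width entries let a reader jump straight to the entry for object N without parsing the entries before it.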

Slide 155

Slide 155 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Next line (the first real object)...

Slide 156

Slide 156 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …object offset, in decimal...

Slide 157

Slide 157 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …generation number...

Slide 158

Slide 158 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n_ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …and declare the object index in use (n)...

Slide 159

Slide 159 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …and the trailing space

Slide 160

Slide 160 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF Do the same with the other objects...

Slide 161

Slide 161 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 00000 n 00000 n 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF …knowing that all lines will end with “ 00000 n ”,...

Slide 162

Slide 162 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n _ trailer << /Root 1 0 R >> startxref %xref pointer %%EOF ...set all offsets.

Slide 163

Slide 163 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref %xref pointer %%EOF The cross-reference table is finished.

Slide 164

Slide 164 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref %xref pointer %%EOF

Slide 166

Slide 166 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref _ %%EOF We set the startxref pointer...

Slide 167

Slide 167 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref 364_ %%EOF ...as xref’s offset, in decimal (no leading zeros).

Slide 168

Slide 168 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R >> startxref 364 %%EOF

Slide 169

Slide 169 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R _ >> startxref 364 %%EOF We also need to update the trailer dictionary...

Slide 170

Slide 170 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size_ >> startxref 364 %%EOF ...with the number of objects...

Slide 171

Slide 171 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5_>> startxref 364 %%EOF … in the PDF (including object 0).

Slide 172

Slide 172 text

4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj %PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5 >> startxref 364 %%EOF Our PDF is now complete.

Slide 173

Slide 173 text

%PDF-1.3 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 2 0 obj << /Type /Pages /Kids [ 3 0 R ] /Count 1 >> endobj 3 0 obj << /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> /Contents 4 0 R >> endobj 4 0 obj << /Length 44 >> stream BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET endstream endobj xref 0 5 0000000000 65535 f 0000000010 00000 n 0000000060 00000 n 0000000120 00000 n 0000000269 00000 n trailer << /Root 1 0 R /Size 5 >> startxref 364 %%EOF
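The bookkeeping in this walk-through (object offsets, the xref table, /Size, startxref) can also be computed by a script instead of by hand. The Python sketch below builds the same four objects and derives every number from the bytes written so far. Its line layout is an assumption (compact dictionaries, "\n" newlines), so its offsets will differ from the 10 / 60 / 120 / 269 / 364 shown on the slides. `build_pdf` and the output file name `hello.pdf` are made up for illustration.

```python
def build_pdf() -> bytes:
    # Page content: one operator per line, no newline after ET (assumed layout).
    content = b"BT\n/F1 100 Tf\n10 400 Td\n(Hello World!) Tj\nET"
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 "
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Arial >> >> >> "
        b"/Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]

    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    xref_offset = len(out)        # startxref points at the "xref" keyword
    size = len(bodies) + 1        # the count includes object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"         # object 0: free entry
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # 20 bytes per entry
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as handle:  # hypothetical output name
        handle.write(build_pdf())
```

Writing in binary mode matters: text mode on Windows would turn each "\n" into two bytes and invalidate every offset.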

Slide 174

Slide 174 text

Disclaimer: this is a minimal PDF. Most PDF documents are much bigger, and contain many more elements.

Our PDF: 528 bytes, 4 objects, text only.

A standard generated “Hello World”: 15 kilobytes, 20 objects, text and binary (embedded fonts…).

Slide 175

Slide 175 text

No need to compute offsets and lengths yourself! Hint: use “mutool clean” to fix them. http://www.mupdf.com/

Slide 176

Slide 176 text

⇒ mutool version Slightly different content, but same rendering. %PDF-1.3 %%μῦ 1 0 obj <> endobj 2 0 obj <> endobj 3 0 obj <> endobj 4 0 obj <> stream q BT /F1 100 Tf 10 400 Td (Hello World!) Tj ET Q endstream endobj 5 0 obj <>>>>> endobj xref 0 6 0000000000 65536 f 0000000018 00000 n 0000000064 00000 n 0000000116 00000 n 0000000191 00000 n 0000000288 00000 n trailer <> startxref 364 %%EOF

Slide 177

Slide 177 text

Hint: you can extract the PDF sources directly: run “pdftotext -layout” on the slide deck. http://www.foolabs.com/xpdf/home.html

Slide 178

Slide 178 text

One more thing... This one is important for self study.

Slide 179

Slide 179 text

Def: stream filters. Streams can be encoded and/or compressed. Algorithms can be cascaded, e.g. compression, then ASCII encoding.

Slide 180

Slide 180 text

New stream parameter: /Filter. Ex: encode the stream in ASCII (hexadecimal).
1 0 obj << /Length 12 >> stream Hello World! endstream endobj
⇔
1 0 obj << /Length 24 /Filter /ASCIIHexDecode >> stream 48656C6C6F20576F726C6421 endstream endobj
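The hexadecimal form on this slide can be reproduced with a few lines of Python. This is only a sketch of the encoding step; the variable names are made up for illustration.

```python
raw = b"Hello World!"

# ASCII hex encoding: two hexadecimal digits per input byte.
encoded = raw.hex().upper().encode("ascii")
print(encoded)       # b'48656C6C6F20576F726C6421'
print(len(encoded))  # 24, the /Length of the encoded stream

# Decoding is what a reader does when it sees /Filter /ASCIIHexDecode.
decoded = bytes.fromhex(encoded.decode("ascii"))
print(decoded)       # b'Hello World!'
```

The encoded stream is exactly twice as long as the original, which is why /Length goes from 12 to 24.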

Slide 181

Slide 181 text

Ex: compression (deflate = ZIP compression).
1 0 obj << /Length 12 >> stream Hello World! endstream endobj
⇔
1 0 obj << /Length 20 /Filter /FlateDecode >> stream x£¾H═╔╔¤/╩IQ♦ ∟I♦> endstream endobj
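A hedged Python sketch of the compression step, using the standard `zlib` module (which produces the zlib-wrapped deflate data that /FlateDecode expects). The exact compressed bytes, and so the length, can vary with the zlib build and compression level, so the sketch computes /Length from the actual output instead of hard-coding it.

```python
import zlib

raw = b"Hello World!"

# Compress with zlib (deflate plus a 2-byte header and a 4-byte checksum).
compressed = zlib.compress(raw)

# /Length must be the size of the compressed data, not of the original text.
length = len(compressed)
print(length)

# A reader applying /FlateDecode recovers the original bytes.
restored = zlib.decompress(compressed)
print(restored)  # b'Hello World!'
```

For such a short string the compressed form is longer than the original; compression only pays off on larger streams.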

Slide 182

Slide 182 text

Filters can be cascaded. Ex: compressed, then encoded in ASCII.
1 0 obj << /Length 12 >> stream Hello World! endstream endobj
⇔
1 0 obj << /Length 40 /Filter [/ASCIIHexDecode /FlateDecode] >> stream 789CF348CDC9C95708CF2FCA495104001C49043E endstream endobj
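When decoding, the filters are applied in the order the /Filter array lists them: ASCIIHexDecode first, then FlateDecode. A small Python sketch of that chain, using the hex string from this slide (the variable names are made up for illustration):

```python
import zlib

# Stream data as stored in the file (40 ASCII characters).
stored = "789CF348CDC9C95708CF2FCA495104001C49043E"

# Step 1, /ASCIIHexDecode: hex text back to binary (20 bytes).
after_hex = bytes.fromhex(stored)

# Step 2, /FlateDecode: zlib-decompress the binary data.
decoded = zlib.decompress(after_hex)

print(len(stored), len(after_hex), decoded)  # 40 20 b'Hello World!'
```

Encoding runs the other way around: compress first, then hex-encode the result.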

Slide 183

Slide 183 text

Hint: use “mutool clean -d” to remove any stream filter (if you want to explore PDFs by yourself). http://www.mupdf.com/

Slide 184

Slide 184 text

Want more? pdf101.corkami.com

Slide 185

Slide 185 text

Questions? (you can download this poster at http://pics.corkami.com)

Slide 186

Slide 186 text

ACK @Doegox @ChrisJohnRiley @PDFKungFoo

Slide 187

Slide 187 text

To be continued...? https://leanpub.com/binaryisbeautiful

Slide 188

Slide 188 text

Let’s write a PDF file corkami.com @angealbertini Hail to the king, baby! r2