Slide 1

Slide 1 text

Building an online PDF editor from scratch
PyWaw #20, 21.01.2013
Zbigniew Siciarz
@zsiciarz
http://siciarz.net

Slide 2

Slide 2 text

Why?

Slide 3

Slide 3 text

Disclaimer
• still not a full-blown editor
• proof of concept
• simple way to add rich media content to digital magazines

Slide 4

Slide 4 text

Current status

Slide 5

Slide 5 text

Links

Slide 6

Slide 6 text

Multimedia

Slide 7

Slide 7 text

Go to page

Slide 8

Slide 8 text

Everything is a link
• website URLs (d’oh!)
• multimedia content (audio/video/galleries)
• internal links ("go to page")
• custom HTML5 widgets

Slide 9

Slide 9 text

Workflow
1. upload a PDF file
2. preprocessing on the server
3. add widgets, links etc. in web editor
4. save and create package
5. publish to mobile devices
6. download package and display content
(diagram label: Publisher)

Slide 10

Slide 10 text

Preprocessing
• run asynchronously as a queued task
• extract metadata from uploaded file
• create page thumbnails (with ImageMagick)
• find any existing links
• mark as unpublished
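
The thumbnail step could be sketched roughly as follows. The slides only say that ImageMagick is used; the `convert` invocation, the 200-pixel width, the output file naming and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def thumbnail_command(pdf_path, page_index, out_dir, width=200):
    """Build an ImageMagick command that renders one PDF page as a PNG thumbnail.

    Assumption: the classic `convert` binary is on PATH.
    `file.pdf[N]` selects the zero-based page N of the input.
    """
    out_file = Path(out_dir) / f"page-{page_index + 1:04d}.png"
    return [
        "convert",
        "-density", "72",            # render at 72 dpi, i.e. 1 pixel per PDF point
        f"{pdf_path}[{page_index}]",
        "-thumbnail", f"{width}x",   # scale to this width, keep the aspect ratio
        str(out_file),
    ]


def make_thumbnails(pdf_path, page_count, out_dir):
    """Render every page; intended to be called from inside the queued task."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for index in range(page_count):
        subprocess.run(thumbnail_command(pdf_path, index, out_dir), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect without ImageMagick installed.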

Slide 11

Slide 11 text

Keep existing links!
• extract links with PyPDF2
• store in database as PdfLink objects
• display in web editor
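
A minimal sketch of what a stored link might look like. `PdfLink` is the name from the slide, but its fields and the `link_from_annotation` helper are hypothetical. The input is assumed to be a link annotation dictionary that the PDF library has already resolved (no indirect references left); the keys `/Subtype`, `/Rect`, `/A`, `/S` and `/URI` are the ones the PDF format defines for URI links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfLink:
    page: int   # zero-based page index
    box: list   # [x1, y1, x2, y2] in PDF points
    url: str


def link_from_annotation(page, annot) -> Optional[PdfLink]:
    """Return a PdfLink for a URI link annotation, or None for anything else."""
    if annot.get("/Subtype") != "/Link":
        return None
    action = annot.get("/A") or {}
    if action.get("/S") != "/URI" or "/URI" not in action:
        return None  # e.g. internal "go to page" links, not handled in this sketch
    box = [float(value) for value in annot["/Rect"]]
    return PdfLink(page=page, box=box, url=str(action["/URI"]))
```

In the real application this would presumably be a database model rather than a dataclass; the dataclass only keeps the example self-contained.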

Slide 12

Slide 12 text

Dimensions and boxes
• Cartesian coordinate system
• box is a list of 4 floats: [x1, y1, x2, y2]
• 1 PDF unit = 1/72 inch = 1 pt
[Figure: x and y axes with origin (0, 0) and a box with corners (x1, y1) and (x2, y2)]
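
To make the units concrete, here is a small sketch that measures a box and converts points to millimetres. The helper names and the A4 example values are illustrative and not taken from the talk.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def box_size(box):
    """Return (width, height) in points for a box [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return abs(x2 - x1), abs(y2 - y1)


def points_to_mm(points):
    """Convert PDF points (1/72 inch) to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


# An A4 page is roughly 595 x 842 pt.
a4 = [0.0, 0.0, 595.0, 842.0]
width, height = box_size(a4)
print(round(points_to_mm(width)), round(points_to_mm(height)))  # prints: 210 297
```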

Slide 13

Slide 13 text

Dimensions and boxes
• artBox

Slide 14

Slide 14 text

Dimensions and boxes
• artBox
• bleedBox

Slide 15

Slide 15 text

Dimensions and boxes
• artBox
• bleedBox
• cropBox

Slide 16

Slide 16 text

Dimensions and boxes
• artBox
• bleedBox
• cropBox
• mediaBox

Slide 17

Slide 17 text

Dimensions and boxes
• artBox
• bleedBox
• cropBox
• mediaBox
• trimBox

Slide 18

Slide 18 text

Dimensions and boxes
• artBox
• bleedBox
• cropBox
• mediaBox
• trimBox

Slide 19

Slide 19 text

PDF Encryption

Slide 20

Slide 20 text

Links
• PDF annotations are messy
• 4 (or more?) different representations
• indirect objects all the way down
• reversed coordinates
• peculiar edge cases still not covered
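
The "reversed coordinates" point is read here as two separate issues: PDF puts the origin at the bottom-left with y growing upward, while a browser uses a top-left origin, and a rectangle's corners are not guaranteed to arrive in lower-left, upper-right order. That reading, and the function names below, are assumptions rather than something the slides spell out.

```python
def normalize_box(box):
    """Order the corners so (x1, y1) is lower-left and (x2, y2) is upper-right."""
    x1, y1, x2, y2 = box
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


def pdf_box_to_screen(box, page_height, scale=1.0):
    """Convert a PDF box to (left, top, width, height) with a top-left origin.

    page_height is the height of the visible page box in points and scale is
    pixels per point. Assumes the page box itself starts at (0, 0).
    """
    x1, y1, x2, y2 = normalize_box(box)
    left = x1 * scale
    top = (page_height - y2) * scale  # flip the y axis
    return left, top, (x2 - x1) * scale, (y2 - y1) * scale
```

For a link stored as [110, 40, 10, 20] on an 842 pt tall page, this yields a 100 x 20 rectangle placed 10 px from the left and 802 px from the top at scale 1.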

Slide 21

Slide 21 text

Watermarking
• create blank PDF (watch out for page dimensions!)
• draw links with ReportLab
• cross your fingers
• merge with original file

Slide 22

Slide 22 text

Watermarking

Slide 23

Slide 23 text

Merging
[Figure: original PDF + watermark = merged PDF]

Slide 24

Slide 24 text

Merging
• PyPDF2 can’t properly merge PDFs with links :(
• ReportLab can’t extract links from PDFs*
• several hours wasted on hacking PyPDF2
• pdftk…?
• pdftk!
*Open Source version

Slide 25

Slide 25 text

Merging
• apply watermark page by page to original PDF
• does not work :(
• works!
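
One way to apply a watermark page by page with pdftk is its `multistamp` operation, which overlays page N of the stamp PDF on page N of the input. The slides do not say which pdftk operation was actually used, so that choice, the function names and the file names are assumptions. The slides report that pdftk preserves links when merging; this sketch does not verify that.

```python
import subprocess


def multistamp_command(original_pdf, watermark_pdf, output_pdf):
    """Build the pdftk call that stamps each watermark page onto the matching page."""
    return ["pdftk", original_pdf, "multistamp", watermark_pdf, "output", output_pdf]


def apply_watermark(original_pdf, watermark_pdf, output_pdf):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    subprocess.run(
        multistamp_command(original_pdf, watermark_pdf, output_pdf),
        check=True,
    )
```

A call such as `apply_watermark("magazine.pdf", "links.pdf", "merged.pdf")` requires the `pdftk` binary to be installed and on PATH.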

Slide 26

Slide 26 text

Final package
• encrypted PDF + media assets
• digitally signed archive
• publication = push notification to devices
• mobile application downloads the package and displays content

Slide 27

Slide 27 text

Conclusion
• sadly, 3 different toolkits are necessary to get the job done

                           PyPDF2   ReportLab   pdftk
Extract links              Yes      No*         No
Draw links                 No       Yes         No
Merge and preserve links   No       No          Yes

*Open Source version

Slide 28

Slide 28 text

ReportLab PLUS?
• "Reuse your existing pdfs in new and exciting ways"
• might just work
• pricey :(

Slide 29

Slide 29 text

Appendix

Slide 30

Slide 30 text

Appendix

Slide 31

Slide 31 text

Credits
• Businessperson designed by Devochkina Oxana from The Noun Project
• Servers designed by Daniel Campos from The Noun Project
• Maru - http://sisinmaru.blog17.fc2.com/

Slide 32

Slide 32 text

Questions?

Slide 33

Slide 33 text

Thank you!