Building an online PDF editor from scratch

PyWaw #20, 21.01.2013
Zbigniew Siciarz @zsiciarz http://siciarz.net

Disclaimer
• still not a full-blown editor
• proof of concept
• simple way to add rich media content to digital magazines

Current status

Multimedia

Go to page

Everything is a link
• website URLs (d’oh!)
• multimedia content (audio/video/galleries)
• internal links ("go to page")
• custom HTML5 widgets
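
The slides don't show how the editor models these link kinds. One possible shape, offered only as a sketch: every widget is a rectangle on a page plus a target, and the four bullet points become four target types. All class and field names below (`Link`, `WebLink`, `MediaLink`, `PageLink`, `WidgetLink`, `describe`) are hypothetical and are not the project's actual models.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical model: the talk only says "everything is a link".
Rect = Tuple[float, float, float, float]  # [x1, y1, x2, y2] in PDF points


@dataclass
class WebLink:
    url: str


@dataclass
class MediaLink:
    kind: str   # "audio", "video" or "gallery"
    asset: str  # file name of a media asset shipped in the package


@dataclass
class PageLink:
    page: int  # 1-based target page ("go to page")


@dataclass
class WidgetLink:
    html: str  # path to a custom HTML5 widget


Target = Union[WebLink, MediaLink, PageLink, WidgetLink]


@dataclass
class Link:
    page: int
    rect: Rect
    target: Target


def describe(link: Link) -> str:
    """Short human-readable label, e.g. for a list of links in the editor."""
    t = link.target
    if isinstance(t, WebLink):
        return f"URL {t.url}"
    if isinstance(t, MediaLink):
        return f"{t.kind} {t.asset}"
    if isinstance(t, PageLink):
        return f"go to page {t.page}"
    return f"widget {t.html}"
```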

Workflow
1. upload a PDF file
2. preprocessing on the server
3. add widgets, links etc. in web editor
4. save and create package
5. publish to mobile devices
6. download package and display content
[Diagram: Publisher]

Preprocessing
• run asynchronously as a queued task
• extract metadata from uploaded file
• create page thumbnails (with ImageMagick)
• find any existing links
• mark as unpublished
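
Thumbnail generation is the preprocessing step that is easiest to show in isolation. The sketch below makes several assumptions:

- ImageMagick's `convert` binary is on the PATH.
- Ghostscript is installed, because ImageMagick uses it to read PDFs.
- On ImageMagick 7 the preferred entry point would be `magick` rather than `convert`.

The helper names `thumbnail_command` and `make_thumbnails`, the default width and the output pattern are illustrative. The task queue itself is not shown.

```python
import subprocess
from typing import List


def thumbnail_command(pdf_path: str, page_index: int, out_path: str,
                      width: int = 200, density: int = 72) -> List[str]:
    """Build an ImageMagick call rendering one PDF page as a thumbnail.

    `page_index` is zero-based (ImageMagick's `file.pdf[N]` syntax).
    `-density` sets the rasterization resolution in DPI and has to come
    before the input file to take effect.
    """
    return [
        "convert",
        "-density", str(density),
        f"{pdf_path}[{page_index}]",
        "-thumbnail", f"{width}x",  # fixed width, height keeps aspect ratio
        out_path,
    ]


def make_thumbnails(pdf_path: str, page_count: int, out_pattern: str) -> None:
    """Render every page, e.g. with out_pattern="thumb-{page}.png".

    Intended to run inside a background (queued) task.
    """
    for i in range(page_count):
        cmd = thumbnail_command(pdf_path, i, out_pattern.format(page=i + 1))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```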

Keep existing links!
• extract links with PyPDF2
• store in database as PdfLink objects
• display in web editor
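
To display stored links in the web editor, their PDF coordinates have to be mapped onto the page preview in the browser. This sketch assumes the preview is scaled uniformly by `scale` CSS pixels per point and that links are absolutely positioned over it. `pdf_rect_to_css` is a hypothetical helper, not part of PyPDF2.

```python
from typing import Sequence, Tuple


def pdf_rect_to_css(rect: Sequence[float], page_box: Sequence[float],
                    scale: float = 1.0) -> Tuple[float, float, float, float]:
    """Convert a PDF rectangle into a CSS box (left, top, width, height).

    PDF measures y upwards from the bottom of the page; CSS measures it
    downwards from the top. `page_box` is the page's [x1, y1, x2, y2] box
    (e.g. its mediaBox). Assumes a normalized rect (x1 < x2, y1 < y2).
    """
    x1, y1, x2, y2 = rect
    px1, _py1, _px2, py2 = page_box
    left = (x1 - px1) * scale
    top = (py2 - y2) * scale  # distance from the page's top edge
    width = (x2 - x1) * scale
    height = (y2 - y1) * scale
    return left, top, width, height
```

For example, a link at [72, 700, 172, 720] on a US Letter page ([0, 0, 612, 792]) ends up 72 px from the left and 72 px from the top at a scale of 1.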

Dimensions and boxes
• Cartesian coordinate system
• box is a list of 4 floats: [x1, y1, x2, y2]
• PDF units: 1 unit = 1/72 inch = 1 pt
[Diagram: x and y axes with origin (0, 0); a box spanning corners (x1, y1) and (x2, y2)]
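
Because 1 pt is 1/72 inch, converting box sizes to physical units is plain arithmetic. US Letter, for instance, is [0, 0, 612, 792], i.e. 8.5 by 11 inches. The helper names below are hypothetical.

```python
from typing import Sequence, Tuple

POINTS_PER_INCH = 72.0  # 1 PDF unit = 1/72 inch = 1 pt
MM_PER_INCH = 25.4


def box_size(box: Sequence[float]) -> Tuple[float, float]:
    """Width and height of an [x1, y1, x2, y2] box, in points."""
    x1, y1, x2, y2 = box
    return abs(x2 - x1), abs(y2 - y1)  # abs() tolerates swapped corners


def points_to_mm(value: float) -> float:
    """Convert a length in PDF points to millimetres."""
    return value / POINTS_PER_INCH * MM_PER_INCH
```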

Dimensions and boxes
• artBox
• bleedBox
• cropBox
• mediaBox
• trimBox
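
The PDF specification ties these boxes together with defaults:

- the mediaBox is required;
- the cropBox defaults to the mediaBox;
- the bleedBox, trimBox and artBox default to the cropBox.

The sketch below resolves them under a few assumptions. The page dictionary is available as a plain dict keyed by PDF names. `resolve_boxes` is hypothetical; PyPDF2 1.x exposes the same boxes as `mediaBox`, `cropBox` and so on. Inheritance is ignored, although a real /MediaBox or /CropBox may live on a parent /Pages node.

```python
from typing import Dict, List

Box = List[float]


def resolve_boxes(page: Dict[str, Box]) -> Dict[str, Box]:
    """Fill in missing page boxes using the PDF specification's defaults.

    `page` maps keys such as "/MediaBox" to [x1, y1, x2, y2] lists.
    /MediaBox is required; /CropBox falls back to it; /BleedBox, /TrimBox
    and /ArtBox fall back to the (resolved) crop box.
    """
    media = page["/MediaBox"]
    crop = page.get("/CropBox", media)
    return {
        "mediaBox": media,
        "cropBox": crop,
        "bleedBox": page.get("/BleedBox", crop),
        "trimBox": page.get("/TrimBox", crop),
        "artBox": page.get("/ArtBox", crop),
    }
```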

PDF Encryption

Links
• PDF annotations are messy
• 4 (or more?) different representations
• indirect objects all the way down
• reversed coordinates
• peculiar edge cases still not covered
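
The sketch below tames two of these problems. It assumes indirect objects have already been resolved into plain dicts; with PyPDF2 that means calling `get_object()` (or `getObject()` in older releases) on each entry.

It handles four target shapes from the PDF specification:

- a direct /Dest;
- a /URI action;
- a /GoTo action;
- a /GoToR action.

These are not necessarily the four the talk refers to. The specification also lets a rectangle name any two opposite corners, so `normalize_rect` sorts them. Both function names are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def normalize_rect(rect: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.

    /Rect only names two opposite corners, so "reversed" rectangles
    (e.g. top-left then bottom-right) do occur in the wild.
    """
    x1, y1, x2, y2 = (float(v) for v in rect)
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def link_target(annot: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Classify a /Link annotation's target; None if it is not handled."""
    if annot.get("/Subtype") != "/Link":
        return None
    if "/Dest" in annot:  # destination stored directly on the annotation
        return ("dest", annot["/Dest"])
    action = annot.get("/A") or {}
    kind = action.get("/S")
    if kind == "/URI":
        return ("uri", action.get("/URI"))
    if kind == "/GoTo":  # internal jump expressed as an action
        return ("dest", action.get("/D"))
    if kind == "/GoToR":  # jump into another PDF file
        return ("remote", (action.get("/F"), action.get("/D")))
    return None  # e.g. /Launch or /JavaScript actions are not covered
```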

Watermarking
• create blank PDF (watch out for page dimensions!)
• draw links with ReportLab
• cross your fingers
• merge with original file
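
This sketch covers the drawing step. It uses four methods of ReportLab's `Canvas`: `setPageSize`, `linkURL`, `showPage` and `save`. Each blank page takes its size from the matching original page, which is what the "watch out for page dimensions" warning is about: a size mismatch shifts links after merging.

The sketch also assumes every original mediaBox starts at (0, 0). `WatermarkPage` and `draw_watermark` are hypothetical names. Internal "go to page" links would need a different ReportLab call and are not covered.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]


@dataclass
class WatermarkPage:
    # Hypothetical container: width/height copied from the original page's
    # mediaBox so the blank page lines up with the page it is merged onto.
    width: float
    height: float
    links: List[Tuple[str, Rect]] = field(default_factory=list)


def draw_watermark(c, pages: List[WatermarkPage]) -> None:
    """Write one blank page per original page, carrying only URL links.

    `c` is expected to be a reportlab.pdfgen.canvas.Canvas, or any object
    with the same four methods.
    """
    for page in pages:
        c.setPageSize((page.width, page.height))  # match the original page
        for url, rect in page.links:
            # Absolute page coordinates, no visible border around the link.
            c.linkURL(url, rect, relative=0, thickness=0)
        c.showPage()  # finish the page even if it carries no links
    c.save()
```

With ReportLab installed, the call would be `draw_watermark(canvas.Canvas("watermark.pdf"), pages)` after `from reportlab.pdfgen import canvas`. Because the canvas is passed in, the drawing logic can be exercised without writing a file.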

Watermarking

Merging
[Diagram: original PDF + watermark PDF = merged PDF]

Merging
• PyPDF2 can’t properly merge PDFs with links :(
• ReportLab can’t extract links from PDFs*
• several hours wasted on hacking PyPDF2
• pdftk…?
• pdftk!
*Open Source version

Merging
• apply watermark page by page to original PDF
• does not work :(
• works!
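
The commands shown on this slide are not in the transcript. For applying a watermark page by page, pdftk provides the `multistamp` operation, which puts page N of the stamp PDF onto page N of the input. By contrast, plain `stamp` applies only the stamp's first page to every page.

The sketch below shows the merge step along those lines. Which invocation the slide marked as working is an assumption, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def multistamp_command(original: str, watermark: str, output: str) -> List[str]:
    """pdftk call overlaying page N of `watermark` onto page N of `original`."""
    return ["pdftk", original, "multistamp", watermark, "output", output]


def merge_with_links(original: str, watermark: str, output: str) -> None:
    """Run pdftk; raises CalledProcessError if it exits with an error."""
    subprocess.run(multistamp_command(original, watermark, output), check=True)
```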

Final package
• encrypted PDF + media assets
• digitally signed archive
• publication = push notification to devices
• mobile application downloads the package and displays content
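
The slides name neither the archive format nor the signature scheme, so the following sketch rests entirely on assumptions. It builds a ZIP archive holding three things: the already encrypted PDF, the media assets, and a `manifest.json` of SHA-256 digests. The manifest is the file a real digital signature would cover; the signing step itself (e.g. with a private key) is not shown.

`build_package` and `sha256_of` are hypothetical, and file names must be unique across the PDF and its assets.

```python
import hashlib
import json
import zipfile
from pathlib import Path
from typing import Dict, Iterable


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_package(pdf: Path, assets: Iterable[Path], out: Path) -> Dict[str, str]:
    """Zip the encrypted PDF and media assets together with a manifest.

    Returns the manifest (archive name -> SHA-256 digest). Signing
    manifest.json is deliberately left out of this sketch.
    """
    files = [pdf, *assets]
    manifest = {p.name: sha256_of(p) for p in files}
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)
        zf.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```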

Conclusion
• sadly, 3 different toolkits are necessary to get the job done

                          PyPDF2   ReportLab   pdftk
Extract links             Yes      No*         No
Draw links                No       Yes         No
Merge and preserve links  No       No          Yes

*Open Source version

ReportLab PLUS?
• "Reuse your existing pdfs in new and exciting ways"
• might just work
• pricey :(

Appendix

Credits
• Businessperson designed by Devochkina Oxana from The Noun Project
• Servers designed by Daniel Campos from The Noun Project
• Maru - http://sisinmaru.blog17.fc2.com/

Questions?

Thank you!
