Building an online PDF editor from scratch

My talk at PyWaw #20.

Zbigniew Siciarz

January 21, 2013


  1. Building an online PDF editor from scratch
    PyWaw #20, 21.01.2013
    Zbigniew Siciarz, @zsiciarz, http://siciarz.net
  2. Disclaimer
    • still not a full-blown editor
    • proof of concept
    • simple way to add rich media content to digital magazines
  3. Everything is a link
    • website URLs (d’oh!)
    • multimedia content (audio/video/galleries)
    • internal links (“go to page”)
    • custom HTML5 widgets
  4. Workflow
    1. upload a PDF file
    2. preprocessing on the server
    3. add widgets, links etc. in web editor
    4. save and create package
    5. publish to mobile devices
    6. download package and display content
    [Diagram label: Publisher]
  5. Preprocessing
    • run asynchronously as a queued task
    • extract metadata from uploaded file
    • create page thumbnails (with ImageMagick)
    • find any existing links
    • mark as unpublished
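A minimal sketch of the thumbnail step, assuming ImageMagick's `convert` binary is on the PATH and can read PDFs (it normally delegates to Ghostscript for that). The function names, the output naming pattern and the 200-pixel width are illustrative and not taken from the talk; in the workflow above this would run inside the queued task.

```python
import subprocess


def thumbnail_command(pdf_path, page_index, out_path, width=200):
    """Build an ImageMagick command that renders one PDF page as a thumbnail.

    "file.pdf[N]" selects page N (0-based); "-thumbnail 200x" scales the
    page to 200 px wide and keeps the aspect ratio.
    """
    return [
        "convert",
        "{0}[{1}]".format(pdf_path, page_index),
        "-thumbnail", "{0}x".format(width),
        out_path,
    ]


def make_thumbnails(pdf_path, page_count, out_pattern="thumb-{0:03d}.png"):
    """Render every page; raises CalledProcessError if convert fails."""
    paths = []
    for index in range(page_count):
        out_path = out_pattern.format(index)
        subprocess.check_call(thumbnail_command(pdf_path, index, out_path))
        paths.append(out_path)
    return paths
```

Building the command as a list (rather than a shell string) avoids quoting problems with uploaded file names.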
  6. Keep existing links!
    • extract links with PyPDF2
    • store in database as PdfLink objects
    • display in web editor
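A rough sketch of how link extraction could look. It walks page dictionaries the way PyPDF2 exposes them (dict-like objects keyed by PDF names such as `/Annots`, `/Subtype`, `/Rect`, `/A`), but is written against plain mappings, so it makes no claim about the author's actual code. Only URI links are handled here, and the helper names are hypothetical; the yielded tuples stand in for PdfLink rows, whose fields the talk does not show.

```python
def _resolve(obj):
    """Follow an indirect reference if obj is one.

    PyPDF2 exposes this as getObject() in 1.x and get_object() in later
    releases; plain Python values are returned unchanged.
    """
    for name in ("get_object", "getObject"):
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    return obj


def extract_uri_links(pages):
    """Yield (page_number, uri, [x1, y1, x2, y2]) for every URI link."""
    for number, page in enumerate(pages, start=1):
        page = _resolve(page)
        annots = _resolve(page.get("/Annots", []))
        for annot in annots:
            annot = _resolve(annot)
            if annot.get("/Subtype") != "/Link":
                continue  # not a link annotation
            action = _resolve(annot.get("/A", {}))
            uri = action.get("/URI")
            if uri is None:
                continue  # e.g. internal "go to page" links, not handled here
            rect = [float(_resolve(v)) for v in _resolve(annot["/Rect"])]
            yield number, str(uri), rect
```

With PyPDF2 1.x the `pages` argument could be built as `[reader.getPage(i) for i in range(reader.getNumPages())]` from a `PdfFileReader`.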
  7. Dimensions and boxes
    • Cartesian coordinate system
    • a box is a list of 4 floats: [x1, y1, x2, y2]
    • PDF unit = 1/72″ = 1 pt
    [Diagram: x and y axes with origin (0, 0); a box with corners (x1, y1) and (x2, y2)]
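To make the units concrete, here is a small sketch that converts points to pixels and maps a PDF box onto a rectangle for an on-screen editor. It assumes the PDF origin is the bottom-left corner of the page with y growing upwards (the PDF default) and that the editor uses a top-left origin with y growing downwards; the 96 dpi default and both function names are assumptions, not details from the talk.

```python
POINTS_PER_INCH = 72.0


def pt_to_px(value_pt, dpi=96.0):
    """Convert PDF points (1/72 inch) to pixels at the given resolution."""
    return value_pt * dpi / POINTS_PER_INCH


def box_to_editor_rect(box, page_height_pt, dpi=96.0):
    """Map a PDF box [x1, y1, x2, y2] (origin bottom-left, y up) to a
    (left, top, width, height) rectangle in pixels (origin top-left, y down).
    """
    x1, y1, x2, y2 = box
    left = pt_to_px(min(x1, x2), dpi)
    top = pt_to_px(page_height_pt - max(y1, y2), dpi)
    width = pt_to_px(abs(x2 - x1), dpi)
    height = pt_to_px(abs(y2 - y1), dpi)
    return left, top, width, height
```

For example, on a US Letter page (612 × 792 pt) a box whose top edge sits at y = 720 pt ends up 72 pt, i.e. one inch, below the top of the page.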
  8. Links
    • PDF annotations are messy
    • 4 (or more?) different representations
    • indirect objects all the way down
    • reversed coordinates
    • peculiar edge cases still not covered
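Two of these problems can be illustrated with a short sketch. The slide does not list which representations were encountered, so the three cases below (a URI action, a GoTo action, and a direct `/Dest` entry) are taken from the PDF specification rather than from the talk. The function names are hypothetical, and the annotation is assumed to be a plain dictionary with indirect objects already resolved.

```python
def normalize_box(box):
    """Return [x1, y1, x2, y2] with x1 <= x2 and y1 <= y2.

    Some files store /Rect with the corners swapped; sorting each axis
    makes the box usable regardless of the order it was written in.
    """
    xa, ya, xb, yb = (float(v) for v in box)
    return [min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb)]


def link_target(annotation):
    """Classify a link annotation as ("uri", value), ("goto", destination)
    or ("unknown", None)."""
    action = annotation.get("/A")
    if action is not None:
        kind = action.get("/S")
        if kind == "/URI":
            return "uri", action.get("/URI")
        if kind == "/GoTo":
            return "goto", action.get("/D")
    if "/Dest" in annotation:
        return "goto", annotation["/Dest"]
    return "unknown", None
```

Returning "unknown" instead of raising keeps one odd annotation from aborting preprocessing of the whole file.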
  9. Watermarking
    • create blank PDF (watch out for page dimensions!)
    • draw links with ReportLab
    • cross your fingers
    • merge with original file
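A sketch of the overlay step using ReportLab's `canvas.Canvas`, `setPageSize`, `linkURL`, `showPage` and `save`. The function names and the shape of the `links` input are assumptions made for illustration; the talk does not show its code. ReportLab is imported inside the function only so that the grouping helper can be used without it installed.

```python
from collections import defaultdict


def group_links_by_page(links):
    """Group (page_number, url, box) tuples into {page_number: [(url, box), ...]}."""
    grouped = defaultdict(list)
    for page_number, url, box in links:
        grouped[page_number].append((url, box))
    return dict(grouped)


def write_link_overlay(out_path, page_sizes, links):
    """Write a PDF with one otherwise blank page per original page,
    carrying only the link annotations.

    page_sizes: list of (width_pt, height_pt), one entry per original page.
    links: iterable of (page_number, url, [x1, y1, x2, y2]), pages 1-based.
    """
    from reportlab.pdfgen import canvas  # third-party: pip install reportlab

    by_page = group_links_by_page(links)
    pdf = canvas.Canvas(out_path)
    for page_number, size in enumerate(page_sizes, start=1):
        pdf.setPageSize(size)  # must match the original page exactly
        for url, box in by_page.get(page_number, []):
            # relative=0: the box is given in absolute page coordinates
            pdf.linkURL(url, tuple(box), relative=0)
        pdf.showPage()  # emit the page even if it has no links
    pdf.save()
```

Setting the size per page matters because a magazine PDF may mix page dimensions, and a mismatched overlay would shift every link.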
  10. Merging
    • PyPDF2 can’t properly merge PDFs with links :(
    • ReportLab can’t extract links from PDFs*
    • several hours wasted on hacking PyPDF2
    • pdftk…?
    • pdftk!
    * Open Source version
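The talk does not say which pdftk operation was used. The sketch below assumes `multibackground`, with the link overlay as the input file and the original PDF as the per-page background, on the further assumption that pdftk keeps the annotations of its input file rather than those of the background. Function names are hypothetical, and `pdftk` is assumed to be on the PATH.

```python
import subprocess


def pdftk_merge_command(overlay_path, original_path, out_path):
    """Build: pdftk OVERLAY multibackground ORIGINAL output OUT.

    Page N of the original is placed behind page N of the overlay.
    """
    return [
        "pdftk", overlay_path,
        "multibackground", original_path,
        "output", out_path,
    ]


def merge_with_pdftk(overlay_path, original_path, out_path):
    """Run pdftk; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(
        pdftk_merge_command(overlay_path, original_path, out_path)
    )
```

It is worth opening the merged file in a viewer and clicking a few links before trusting the result, since annotation handling is exactly where the other tools failed.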
  11. Final package
    • encrypted PDF + media assets
    • digitally signed archive
    • publication = push notification to devices
    • mobile application downloads the package and displays content
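The package format is not described in the talk, so the following is only one possible shape: a ZIP archive with a `manifest.json` of SHA-256 digests, which is the piece a signature would typically cover. The ZIP container, the manifest file and the function name are all assumptions. PDF encryption and the signing itself are left out because the talk names neither scheme.

```python
import hashlib
import io
import json
import zipfile


def build_package(files):
    """Pack files into a ZIP archive and return its bytes.

    files: dict mapping archive names to bytes, e.g. the (already
    encrypted) PDF and its media assets. A manifest.json with the
    SHA-256 digest of every file is added so that a signature over the
    manifest covers all of the content.
    """
    manifest = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in files.items()
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(files):
            archive.writestr(name, files[name])
        archive.writestr(
            "manifest.json", json.dumps(manifest, sort_keys=True, indent=2)
        )
    return buffer.getvalue()
```

On the device, the application would verify the signature first and then compare each extracted file against its digest in the manifest.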
  12. Conclusion
    • sadly, 3 different toolkits are necessary to get the job done

                                 PyPDF2   ReportLab   pdftk
      Extract links              Yes      No*         No
      Draw links                 No       Yes         No
      Merge and preserve links   No       No          Yes

    * Open Source version
  13. ReportLab PLUS?
    • “Reuse your existing pdfs in new and exciting ways”
    • might just work
    • pricey :(
  14. Credits
    • Businessperson designed by Devochkina Oxana from The Noun Project
    • Servers designed by Daniel Campos from The Noun Project
    • Maru - http://sisinmaru.blog17.fc2.com/