Building an online PDF editor from scratch

My talk at PyWaw #20.
http://www.pywaw.org/21-01-2013

Zbigniew Siciarz

January 21, 2013

Transcript

  1. Building an online PDF editor from scratch • PyWaw #20, 21.01.2013 • Zbigniew Siciarz • @zsiciarz • http://siciarz.net
  2. Why?

  3. Disclaimer • still not a full-blown editor • proof of concept • simple way to add rich media content to digital magazines
  4. Current status

  5. Links

  6. Multimedia

  7. Go to page

  8. Everything is a link • website URLs (d’oh!) • multimedia content (audio/video/galleries) • internal links (“go to page”) • custom HTML5 widgets
  9. Workflow: 1. upload a PDF file; 2. preprocessing on the server; 3. add widgets, links etc. in web editor; 4. save and create package; 5. publish to mobile devices; 6. download package and display content (diagram label: Publisher)
  10. Preprocessing • run asynchronously as a queued task • extract metadata from uploaded file • create page thumbnails (with ImageMagick) • find any existing links • mark as unpublished
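As a rough illustration of the thumbnail step, the sketch below builds an ImageMagick `convert` command line for one page of a PDF. The function name, default width and density are assumptions made for this example; the slides only say that ImageMagick was used, not how it was invoked.

```python
from typing import List


def thumbnail_command(pdf_path: str, page_index: int, out_path: str,
                      width: int = 200, density: int = 72) -> List[str]:
    """Build an ImageMagick command that renders one PDF page as a thumbnail.

    Hypothetical helper: the defaults are illustrative, not taken from the talk.
    ImageMagick selects a page with the zero-based "file.pdf[N]" syntax.
    """
    if page_index < 0:
        raise ValueError("page_index must be zero or greater")
    return [
        "convert",
        "-density", str(density),        # rasterisation resolution in DPI
        f"{pdf_path}[{page_index}]",     # zero-based page selector
        "-thumbnail", f"{width}x",       # fixed width, height keeps aspect ratio
        out_path,
    ]


if __name__ == "__main__":
    print(" ".join(thumbnail_command("magazine.pdf", 0, "page-0.png")))
```

In a queued task the resulting list could be passed to `subprocess.run(..., check=True)`, once per page.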
  11. Keep existing links! • extract links with PyPDF2 • store in database as PdfLink objects • display in web editor
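A minimal sketch of the extraction step, assuming the page annotations have already been read (with PyPDF2, a page's `/Annots` entry). It works on plain dict-like objects so it can be tried without a PDF at hand. The `PdfLink` fields shown here are guesses for illustration; the talk does not show the real model. Only URI links are handled, which is one of several link representations.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class PdfLink:
    # Hypothetical fields; the real PdfLink model is not shown in the slides.
    page: int
    url: str
    rect: List[float]  # [x1, y1, x2, y2] in PDF points


def resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object offers a getter for it."""
    for name in ("get_object", "getObject"):  # newer / older PyPDF2 spelling
        getter = getattr(obj, name, None)
        if callable(getter):
            return getter()
    return obj


def extract_uri_links(annotations: Optional[Iterable[Any]], page: int) -> List[PdfLink]:
    """Collect link annotations that carry a URI action."""
    links = []
    for ref in annotations or []:
        annot = resolve(ref)
        if annot.get("/Subtype") != "/Link":
            continue  # not a link annotation
        action = resolve(annot.get("/A"))
        if not action or action.get("/S") != "/URI":
            continue  # internal destinations etc. are skipped in this sketch
        rect = [float(resolve(v)) for v in resolve(annot["/Rect"])]
        links.append(PdfLink(page=page, url=str(action["/URI"]), rect=rect))
    return links


if __name__ == "__main__":
    sample = [{"/Subtype": "/Link", "/Rect": [10, 20, 110, 40],
               "/A": {"/S": "/URI", "/URI": "http://siciarz.net"}}]
    print(extract_uri_links(sample, page=0))
```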
  12. Dimensions and boxes • cartesian coordinate system • box is a list of 4 floats: [x1, y1, x2, y2] • PDF units: 1 pt = 1/72″ • (diagram: x and y axes with origin (0, 0) and a box spanning (x1, y1) to (x2, y2))
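To make the units concrete, here is a small sketch that turns a box given as `[x1, y1, x2, y2]` in points into a pixel size at a chosen resolution. The function names are made up for this example.

```python
from typing import Sequence, Tuple

POINTS_PER_INCH = 72.0  # 1 pt = 1/72 inch


def box_size_pt(box: Sequence[float]) -> Tuple[float, float]:
    """Width and height of a box in points, regardless of corner order."""
    x1, y1, x2, y2 = box
    return abs(x2 - x1), abs(y2 - y1)


def pt_to_px(value_pt: float, dpi: float) -> float:
    """Convert a length in points to pixels at the given resolution."""
    return value_pt * dpi / POINTS_PER_INCH


def box_size_px(box: Sequence[float], dpi: float) -> Tuple[int, int]:
    """Pixel size of a box when rendered at `dpi`, rounded to whole pixels."""
    width_pt, height_pt = box_size_pt(box)
    return round(pt_to_px(width_pt, dpi)), round(pt_to_px(height_pt, dpi))


if __name__ == "__main__":
    us_letter = [0, 0, 612, 792]  # 8.5 x 11 inches expressed in points
    print(box_size_px(us_letter, dpi=150))
```

A US Letter page of 612 x 792 pt is 8.5 x 11 inches, so at 150 DPI it renders as 1275 x 1650 pixels.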
  13. Dimensions and boxes • artBox

  14. Dimensions and boxes • artBox • bleedBox

  15. Dimensions and boxes • artBox • bleedBox • cropBox

  16. Dimensions and boxes • artBox • bleedBox • cropBox • mediaBox
  17. Dimensions and boxes • artBox • bleedBox • cropBox • mediaBox • trimBox
  18. Dimensions and boxes • artBox • bleedBox • cropBox • mediaBox • trimBox
  19. PDF Encryption

  20. Links • PDF annotations are messy • 4 (or more?) different representations • indirect objects all the way down • reversed coordinates • peculiar edge cases still not covered
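The slide does not say what "reversed coordinates" refers to. One plausible reading is that a rectangle may list its corners in either order, and that PDF's origin sits at the bottom-left while a web editor measures from the top-left. Under that assumption, a conversion could look like the sketch below; the function name and the returned keys are illustrative only.

```python
from typing import Dict, Sequence


def pdf_rect_to_screen(rect: Sequence[float], page_box: Sequence[float],
                       scale: float = 1.0) -> Dict[str, float]:
    """Map a PDF rectangle to a top-left based box (e.g. for CSS positioning).

    Both `rect` and `page_box` are [x1, y1, x2, y2] in points; either may
    have its corners swapped, so they are normalised first.
    """
    x1, y1, x2, y2 = rect
    px1, py1, px2, py2 = page_box
    page_left = min(px1, px2)
    page_top = max(py1, py2)  # PDF y grows upwards, so the top is the max
    return {
        "left": (min(x1, x2) - page_left) * scale,
        "top": (page_top - max(y1, y2)) * scale,  # flip the y axis
        "width": abs(x2 - x1) * scale,
        "height": abs(y2 - y1) * scale,
    }


if __name__ == "__main__":
    page = [0, 0, 612, 792]
    print(pdf_rect_to_screen([72, 700, 172, 720], page))
```

With this normalisation, `[72, 700, 172, 720]` and `[172, 720, 72, 700]` yield the same on-screen box.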
  21. Watermarking • create blank PDF (watch out for page dimensions!) • draw links with ReportLab • cross your fingers • merge with original file
  22. Watermarking

  23. Merging + =

  24. Merging • PyPDF2 can’t properly merge PDFs with links :( • ReportLab can’t extract links from PDFs* • several hours wasted on hacking PyPDF2 • pdftk…? • pdftk! (* Open Source version)
  25. Merging • apply watermark page by page to original PDF • does not work :( • works!
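The slides do not show the actual pdftk invocation. pdftk does offer `multistamp` and `multibackground` operations that combine two PDFs page by page, so a sketch under that assumption might look like this. Which file has to be the input and which the overlay for the link annotations to survive is not stated in the talk and should be checked against pdftk's behaviour. The function names are hypothetical.

```python
import subprocess
from typing import List

# Page-by-page overlay operations provided by pdftk.
PAGEWISE_OPERATIONS = ("multistamp", "multibackground")


def overlay_command(input_pdf: str, other_pdf: str, output_pdf: str,
                    operation: str = "multistamp",
                    pdftk: str = "pdftk") -> List[str]:
    """Build a pdftk command that combines two PDFs one page at a time."""
    if operation not in PAGEWISE_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return [pdftk, input_pdf, operation, other_pdf, "output", output_pdf]


def run_overlay(input_pdf: str, other_pdf: str, output_pdf: str,
                operation: str = "multistamp") -> None:
    """Run pdftk; raises CalledProcessError if it exits with an error."""
    subprocess.run(overlay_command(input_pdf, other_pdf, output_pdf, operation),
                   check=True)


if __name__ == "__main__":
    print(" ".join(overlay_command("original.pdf", "watermark.pdf", "merged.pdf")))
```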
  26. Final package • encrypted PDF + media assets • digitally signed archive • publication = push notification to devices • mobile application downloads the package and displays content
  27. Conclusion • sadly, 3 different toolkits are necessary to get the job done

                                PyPDF2   ReportLab   pdftk
      Extract links             Yes      No*         No
      Draw links                No       Yes         No
      Merge and preserve links  No       No          Yes

      * Open Source version
  28. ReportLab PLUS? • “Reuse your existing pdfs in new and exciting ways” • might just work • pricey :(
  29. Appendix

  30. Appendix

  31. Credits • Businessperson designed by Devochkina Oxana from The Noun Project • Servers designed by Daniel Campos from The Noun Project • Maru - http://sisinmaru.blog17.fc2.com/
  32. Questions?

  33. Thank you!