Building an online PDF editor from scratch

My talk at PyWaw #20.
http://www.pywaw.org/21-01-2013

Zbigniew Siciarz

January 21, 2013

Transcript

  1. Building an online
    PDF editor from scratch
    PyWaw #20, 21.01.2013
    Zbigniew Siciarz @zsiciarz http://siciarz.net

  2. Why?

  3. Disclaimer
    • still not a full-blown editor
    • proof of concept
    • simple way to add rich media content to digital magazines

  4. Current status

  5. Links

  6. Multimedia

  7. Go to page

  8. Everything is a link
    • website URLs (d’oh!)
    • multimedia content (audio/video/galleries)
    • internal links (“go to page”)
    • custom HTML5 widgets

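One possible reading of "everything is a link" is that each link is a rectangle on a page plus a target URI, and the URI scheme tells the mobile app how to handle it. This is a minimal sketch under that assumption. The slides do not say how targets were encoded, so the `media:`, `page:` and `widget:` schemes, the `Link` class and the `LINK_KINDS` table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical scheme-to-kind table: http/https are real web schemes,
# the other three are invented for this sketch.
LINK_KINDS = {
    "http": "website",
    "https": "website",
    "media": "multimedia",  # hypothetical: audio/video/gallery stored in the package
    "page": "internal",     # hypothetical: "go to page" target, e.g. page:12
    "widget": "html5",      # hypothetical: custom HTML5 widget
}


@dataclass
class Link:
    page: int                                 # 1-based page the link sits on
    box: Tuple[float, float, float, float]    # (x1, y1, x2, y2) in PDF points
    target: str                               # URI describing where the link leads

    @property
    def kind(self) -> str:
        # Everything before the first colon is treated as the scheme.
        scheme = self.target.partition(":")[0].lower()
        return LINK_KINDS.get(scheme, "unknown")


links = [
    Link(1, (72, 700, 300, 720), "https://example.com/"),
    Link(2, (72, 500, 200, 600), "media:video/intro.mp4"),
    Link(3, (500, 20, 580, 40), "page:12"),
]
print([link.kind for link in links])  # ['website', 'multimedia', 'internal']
```

With a single link type, the editor, the database and the packaging step each deal with only one kind of object. Only the mobile app has to interpret the target.
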
  9. Workflow
    1. upload a PDF file
    2. preprocessing on the server
    3. add widgets, links etc. in web editor
    4. save and create package
    5. publish to mobile devices
    6. download package and display content
    [diagram label: Publisher]

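On the server side, the steps above can be read as a small state machine for each uploaded document. This is a sketch under assumptions. The state names and the transition table are invented here. Step 3 (editing) is treated as not changing the state, and step 6 happens on the device, so it has no server state.

```python
from enum import Enum


class State(Enum):
    UPLOADED = "uploaded"          # step 1: PDF received
    PREPROCESSED = "preprocessed"  # step 2: metadata, thumbnails, links extracted
    PACKAGED = "packaged"          # step 4: saved and packaged
    PUBLISHED = "published"        # step 5: pushed to mobile devices


# Hypothetical transitions; editing (step 3) happens while PREPROCESSED or
# PACKAGED and does not change the state by itself.
TRANSITIONS = {
    State.UPLOADED: {State.PREPROCESSED},
    State.PREPROCESSED: {State.PACKAGED},
    State.PACKAGED: {State.PACKAGED, State.PUBLISHED},  # re-save after more edits
    State.PUBLISHED: {State.PACKAGED},                   # edit and republish
}


class Document:
    def __init__(self, name):
        self.name = name
        self.state = State.UPLOADED

    def advance(self, new_state):
        # Reject any transition that is not listed in the table.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state


doc = Document("issue.pdf")
for step in (State.PREPROCESSED, State.PACKAGED, State.PUBLISHED):
    doc.advance(step)
print(doc.state.value)  # published
```

An explicit table makes invalid jumps fail loudly. Publishing a file that was never preprocessed is one example.
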
  10. Preprocessing
    • run asynchronously as a queued task
    • extract metadata from uploaded file
    • create page thumbnails (with ImageMagick)
    • find any existing links
    • mark as unpublished

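Here is a sketch of the queued preprocessing step using only the standard library. A worker thread takes documents from a `queue.Queue`. For each one, it builds one ImageMagick command per page and then marks the document unpublished. The slides do not name the task queue, so `queue`/`threading` stand in for it. The `doc` dict, the thumbnail file names and the 200 px width are assumptions. Metadata and link extraction are left out. The command runner is injected so the commands can be inspected without ImageMagick installed; a real run might use `functools.partial(subprocess.run, check=True)`.

```python
import queue
import threading


def thumbnail_commands(pdf_path, page_count, width=200):
    """One ImageMagick command per page; 'file.pdf[i]' selects page i (0-based)."""
    return [
        ["convert", "-density", "72", f"{pdf_path}[{i}]",
         "-thumbnail", f"{width}x", f"{pdf_path}.page-{i + 1}.png"]
        for i in range(page_count)
    ]


def preprocess(doc, run):
    # 'doc' is a hypothetical dict standing in for the database record.
    for cmd in thumbnail_commands(doc["path"], doc["page_count"]):
        run(cmd)
    doc["published"] = False  # freshly processed uploads start unpublished


def worker(tasks, run):
    while True:
        doc = tasks.get()
        try:
            if doc is None:  # sentinel value: stop the worker
                return
            preprocess(doc, run)
        finally:
            tasks.task_done()


# Usage: record the commands instead of executing them.
executed = []
tasks = queue.Queue()
thread = threading.Thread(target=worker, args=(tasks, executed.append))
thread.start()
doc = {"path": "issue.pdf", "page_count": 2}
tasks.put(doc)
tasks.put(None)
thread.join()
```

Because the upload request only enqueues work, it returns quickly even for long magazines.
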
  11. Keep existing links!
    • extract links with PyPDF2
    • store in database as PdfLink objects
    • display in web editor

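A record like the one below could store the extracted links and hand them to the web editor. The talk only names the `PdfLink` objects. The fields and the `links_for_editor` helper are hypothetical, and a plain dataclass stands in for the real database model. Links are grouped by page so the editor can fetch everything for the page being shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PdfLink:
    # Hypothetical fields; the talk does not show the real model's schema.
    document_id: int
    page: int      # 1-based page number
    x1: float      # box corners in PDF points, lower-left corner first
    y1: float
    x2: float
    y2: float
    uri: str


def links_for_editor(links):
    """Serialize links grouped by page number (JSON object keys become strings)."""
    by_page = {}
    for link in links:
        by_page.setdefault(link.page, []).append(asdict(link))
    return json.dumps(by_page, sort_keys=True)


print(links_for_editor([PdfLink(1, 2, 72.0, 700.0, 300.0, 720.0, "http://siciarz.net")]))
```
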
  12. Dimensions and boxes
    • Cartesian coordinate system
    • box is a list of 4 floats: [x1, y1, x2, y2]
    • 1 PDF unit = 1/72 inch = 1 pt
    [diagram: x and y axes with the origin (0, 0); a box with corners (x1, y1) and (x2, y2)]

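This worked example shows the box arithmetic. Width and height in points are differences of the corner coordinates, and dividing by 72 converts points to inches. The PDF specification allows a rectangle to be given by any two diagonally opposite corners. It is therefore safer to normalize a box to lower-left/upper-right corners first. The function names are illustrative.

```python
PT_PER_INCH = 72.0
MM_PER_INCH = 25.4


def normalize(box):
    """Return (x1, y1, x2, y2) as lower-left and upper-right corners."""
    x1, y1, x2, y2 = (float(v) for v in box)
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def size_pt(box):
    """Width and height of a box in points."""
    x1, y1, x2, y2 = normalize(box)
    return (x2 - x1, y2 - y1)


def pt_to_mm(value_pt):
    return value_pt / PT_PER_INCH * MM_PER_INCH


# US Letter is 8.5 x 11 in, i.e. 612 x 792 pt; corners deliberately given in reverse.
width, height = size_pt([612, 792, 0, 0])
print(width / PT_PER_INCH, height / PT_PER_INCH)  # 8.5 11.0
```
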
  13-18. Dimensions and boxes
    • artBox
    • bleedBox
    • cropBox
    • mediaBox
    • trimBox

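The PDF specification defines these boxes roughly as follows:
- mediaBox is the physical medium.
- cropBox is the region the page is clipped to when displayed or printed.
- bleedBox is the clipping region for production output.
- trimBox is the intended size of the finished page.
- artBox is the extent of the meaningful content.

Only mediaBox is required. cropBox defaults to mediaBox, and the other three default to cropBox. Boxes that extend beyond mediaBox are effectively reduced to their intersection with it. The sketch below resolves those defaults for a page given as a plain dict. It simplifies in two ways: inheritance from parent page-tree nodes is not handled, and neither are boxes that do not overlap the media box.

```python
def intersect(a, b):
    """Intersection of two normalized boxes given as (x1, y1, x2, y2)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def effective_boxes(page):
    """Resolve the five page boxes, applying the defaults from the PDF specification.

    'page' maps box names to normalized [x1, y1, x2, y2] lists; a missing key
    means the box is absent from the page dictionary.
    """
    media = tuple(page["MediaBox"])  # the only required box
    crop = intersect(tuple(page.get("CropBox", media)), media)
    boxes = {"MediaBox": media, "CropBox": crop}
    for name in ("BleedBox", "TrimBox", "ArtBox"):
        # Default to the crop box; anything outside the media box is cut off.
        boxes[name] = intersect(tuple(page.get(name, crop)), media)
    return boxes


page = {"MediaBox": [0, 0, 612, 792], "TrimBox": [9, 9, 603, 783], "BleedBox": [-9, -9, 621, 801]}
boxes = effective_boxes(page)
print(boxes["ArtBox"], boxes["BleedBox"])  # (0, 0, 612, 792) (0, 0, 612, 792)
```
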
  19. PDF Encryption

  20. Links
    • PDF annotations are messy
    • 4 (or more?) different representations
    • indirect objects all the way down
    • reversed coordinates
    • peculiar edge cases still not covered

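To give a feel for the mess, here is a sketch that classifies a link annotation modelled as plain Python dicts and lists. Name objects are written as strings like "/URI", and a small `Ref` class stands in for indirect references (`12 0 R`). The sketch covers four common cases from the PDF specification: a `/URI` action, a `/GoTo` action, an explicit `/Dest` array and a named destination. It also normalizes `/Rect`, whose corners may come in either order. This is not PyPDF2's API. The `objects` table and the `page_refs` list are assumptions about how the parsed file is made available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stand-in for an indirect reference such as '12 0 R'."""
    num: int
    gen: int = 0


def resolve(obj, objects):
    """Follow indirect references until a direct object is reached."""
    seen = set()
    while isinstance(obj, Ref):
        if obj in seen:
            raise ValueError(f"reference cycle at {obj}")
        seen.add(obj)
        obj = objects[(obj.num, obj.gen)]
    return obj


def classify_link(annot, objects, page_refs):
    """Return (kind, target, rect) for a /Link annotation, or None for other subtypes."""
    annot = resolve(annot, objects)
    if resolve(annot.get("/Subtype"), objects) != "/Link":
        return None
    x1, y1, x2, y2 = [float(resolve(v, objects)) for v in resolve(annot["/Rect"], objects)]
    rect = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))  # corners may be reversed
    action = resolve(annot.get("/A"), objects)
    dest = resolve(annot.get("/Dest"), objects)
    if action is not None:
        kind = resolve(action.get("/S"), objects)
        if kind == "/URI":
            return ("uri", resolve(action["/URI"], objects), rect)
        if kind != "/GoTo":
            return ("unsupported", kind, rect)  # e.g. /GoToR, /Launch, /JavaScript
        dest = resolve(action["/D"], objects)
    if isinstance(dest, list):
        # Explicit destination, e.g. [page_ref /XYZ left top zoom]; dest[0] is the page.
        return ("page", page_refs.index(dest[0]) + 1, rect)
    if isinstance(dest, str):
        return ("named", dest, rect)  # would need a lookup in the document's name tree
    return ("unsupported", None, rect)
```

Even this reduced version has to call `resolve` at almost every step, because any value in the file may be an indirect reference.
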
  21. Watermarking
    • create blank PDF (watch out for page dimensions!)
    • draw links with ReportLab
    • cross your fingers
    • merge with original file

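Drawing the links onto the blank watermark page means converting rectangles from the editor into PDF points. The talk does not say how the editor measures rectangles. This sketch assumes it works on page thumbnails in pixels, with the origin at the top-left corner. The conversion then scales by page size over thumbnail size and flips the y axis. That dependence on the page size is also why the blank PDF must match the original page dimensions exactly. `editor_to_pdf` is a hypothetical helper. Its (x1, y1, x2, y2) result has the shape of rectangle that ReportLab's `Canvas.linkURL(url, rect)` takes.

```python
def editor_to_pdf(rect_px, thumb_size_px, media_box):
    """Map a rectangle drawn on a page thumbnail to PDF user-space points.

    rect_px: (left, top, width, height) in thumbnail pixels, origin top-left, y down.
    thumb_size_px: (width, height) of the thumbnail image in pixels.
    media_box: (x1, y1, x2, y2) of the page in points, origin bottom-left, y up.
    Returns (x1, y1, x2, y2) in points, lower-left corner first.
    """
    left, top, width, height = rect_px
    thumb_w, thumb_h = thumb_size_px
    mx1, my1, mx2, my2 = media_box
    sx = (mx2 - mx1) / thumb_w  # points per pixel, horizontally
    sy = (my2 - my1) / thumb_h  # points per pixel, vertically
    x1 = mx1 + left * sx
    x2 = mx1 + (left + width) * sx
    # Flip the y axis: pixel row 0 is the top edge of the page (my2 in PDF space).
    y2 = my2 - top * sy
    y1 = my2 - (top + height) * sy
    return (x1, y1, x2, y2)


# A US Letter page (612 x 792 pt) shown as a 306 x 396 px thumbnail.
print(editor_to_pdf((10, 20, 100, 50), (306, 396), (0, 0, 612, 792)))  # (20.0, 652.0, 220.0, 752.0)
```
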
  22. Watermarking

  23. Merging

  24. Merging
    • PyPDF2 can’t properly merge PDFs with links :(
    • ReportLab can’t extract links from PDFs*
    • several hours wasted on hacking PyPDF2
    • pdftk…?
    • pdftk!
    *Open Source version

  25. Merging
    • apply watermark page by page to original PDF
    • does not work :(
    • works!

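The slides do not show the exact pdftk call. One pdftk operation that applies a watermark PDF page by page is `multistamp`. It overlays page N of the stamp file onto page N of the input, unlike `stamp`, which repeats the stamp's first page on every page. Below is a sketch that shells out to it, assuming pdftk is on the PATH. Whether the project actually used this invocation is an assumption.

```python
import shlex
import subprocess


def pdftk_multistamp_cmd(original, watermark, output):
    """pdftk command overlaying page N of `watermark` onto page N of `original`."""
    return ["pdftk", original, "multistamp", watermark, "output", output]


def merge_with_pdftk(original, watermark, output):
    # Needs the pdftk binary; check=True raises CalledProcessError if it fails.
    subprocess.run(pdftk_multistamp_cmd(original, watermark, output), check=True)


print(shlex.join(pdftk_multistamp_cmd("issue.pdf", "links.pdf", "issue-linked.pdf")))
# pdftk issue.pdf multistamp links.pdf output issue-linked.pdf
```

Passing an argument list to `subprocess.run` avoids shell quoting problems with uploaded file names.
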
  26. Final package
    • encrypted PDF + media assets
    • digitally signed archive
    • publication = push notification to devices
    • mobile application downloads the package and displays content

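This is a sketch of the packaging step using the standard library. The already encrypted PDF and the media assets go into a ZIP archive together with a manifest of SHA-256 digests, and the manifest is then authenticated. The talk says the archive is digitally signed but not how, so an HMAC with a shared key is swapped in here for brevity. That is a different technique from a public-key signature: anyone who can verify an HMAC can also forge one. A real deployment would more likely sign the manifest with a private key and ship only the public key to devices. The file names, `build_package` and `verify_package` are hypothetical.

```python
import hashlib
import hmac
import io
import json
import zipfile


def build_package(files, key):
    """Zip the files with a SHA-256 manifest and an HMAC of that manifest.

    'files' maps archive names (e.g. 'issue.pdf', 'media/intro.mp4') to bytes.
    """
    manifest = json.dumps(
        {name: hashlib.sha256(data).hexdigest() for name, data in files.items()},
        sort_keys=True,
    ).encode("utf-8")
    tag = hmac.new(key, manifest, hashlib.sha256).hexdigest()
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
        zf.writestr("manifest.json", manifest)
        zf.writestr("manifest.hmac", tag)
    return buf.getvalue()


def verify_package(blob, key):
    """Check the manifest's HMAC, then every file's digest, before showing content."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = zf.read("manifest.json")
        expected = hmac.new(key, manifest, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, zf.read("manifest.hmac").decode("ascii")):
            return False
        digests = json.loads(manifest)
        return all(
            hashlib.sha256(zf.read(name)).hexdigest() == digest
            for name, digest in digests.items()
        )


key = b"shared-secret"  # hypothetical key; a real one would come from configuration
package = build_package({"issue.pdf": b"%PDF-1.4 ...", "media/intro.mp3": b"\x00\x01"}, key)
print(verify_package(package, key), verify_package(package, b"wrong key"))  # True False
```
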
  27. Conclusion
    • sadly, 3 different toolkits are necessary to get the job done
                                PyPDF2   ReportLab   pdftk
      Extract links             Yes      No*         No
      Draw links                No       Yes         No
      Merge and preserve links  No       No          Yes
      *Open Source version

  28. ReportLab PLUS?
    • “Reuse your existing pdfs in new and exciting ways”
    • might just work
    • pricey :(

  29. Appendix

  30. Appendix

  31. Credits
    • Businessperson designed by Devochkina Oxana from The Noun Project
    • Servers designed by Daniel Campos from The Noun Project
    • Maru - http://sisinmaru.blog17.fc2.com/

  32. Questions?

  33. Thank you!
