
Carl Cerecke: Publishing Theology with Python

@ Kiwi PyCon 2013 - Sunday, 08 Sep 2013 - Track 2
http://nz.pycon.org/

**Audience level**

Intermediate

**Description**

How to use Python tools to represent and publish a large corpus of theological literature.

**Abstract**

Python tools are used for publishing a large corpus of theological literature. Python is used to help convert the existing literature from its original, poorly styled, MS Word format to docutils source format. Solutions are presented to various domain-specific problems, including making the documents bible-version independent. The paver tool is used to manage the conversion process from docutils to the various output formats. Some rejected alternative publishing technologies are covered, as well as a future road map.

**YouTube**

http://www.youtube.com/watch?v=I0vOG5WTicU


New Zealand Python User Group

September 08, 2013

Transcript

  1. TheoreST: Publishing Theology with Python

    - Kiwi PyCon 2013
    - Carl Cerecke, Evangelical Bible College of Western Australia
    - September 8, 2013
  2. Overview of the talk

    - Publishing problem (and ideal solution)
    - A tank is not the solution
    - Proprietary format → reStructuredText
    - Domain-specific problems (bible verses)
    - reStructuredText → web/pdf/epub (sphinx)
    - Automating with paver and cog
    - Todo...
  3. The problem

    - 300 books in MS Word format (written over 20+ years)
        - Systematic Theology
        - Pastoral Theology
        - Commentaries
        - Miscellaneous topics
    - Two primary authors. Non-technical
    - Domain-specific behaviour (verse references)
    - Distribute as widely and conveniently as possible
    - “Freely you have received, freely give”
  4. The (ideal) solution

    - One authoritative source, multiple outputs
    - Share common document fragments
    - Bible version independent
    - Bible references hyperlinked: John 3:16
    - Build individual books (pdf/epub), or whole collection (html)
    - Simple method for authors to make changes
  5. DITA: The Wrong Way

    - DITA: Darwin Information Typing Architecture, http://dita.xml.org/
    - ‘Enterprise-level’ technical documentation
    - XML
    - Java
    - Complex
    - Deal-breaker: Too static...
    - Dynamically determine bible-reference URLs during document creation? Too hard.
  7. reStructuredText and sphinx: A Better Way

    - reStructuredText: markup language, http://docutils.sourceforge.net/rst.html
    - Part of docutils text processing system
    - Relatively easy to extend
    - Multiple output formats: html, pdf (via LaTeX), epub
    - Used by sphinx, http://sphinx-doc.org/
    - Used to generate http://docs.python.org
    - Extensions for hierarchical documents
    - Python
    - Not XML!
  8. reStructuredText markup language (PEP 287)

  10. TheoreST technology overview

    - https://github.com/cdjc/theoreST
    - python 3.3
    - sphinx
    - paver (build tool)
    - cog (code generation tool)
    - LaTeX and rubber (generating pdf)
    - Linux Mint 15 virtualbox VM
    - Wing IDE Professional (thanks nz pycon!)
    - Google drive/docs
  11. MS Word problems

    - No styles! What you see is all you’ve got...

        3. PAUL’S TRIALS
        (a) Persecuted in Damascus; escapes in basket (Acts 9:20-25)
        (b) Driven out of Jerusalem, sent to Tarsus (Acts 9:28-30)
        (c) Stoned at Lystra and thought to have died (Acts 14:19)
        (d) Whipped and imprisoned at Philippi (Acts 16:16-24)
        ...

    - Solution: Save As... docbook XML format. Guess indent level
  12. Recognising bible references

    - Want to replace: Lorem ipsum John 3:16 dolor
    - with: Lorem ipsum `John 3:16` dolor
    - Other examples:
        - multiple verses: John 3:16,20
        - verse range: John 3:16-20
        - whole chapter: John 3
        - Multi chapter: John 3:16, 4:4
        - Multi book, chapter, verse, range: John 3:16-18,20,4:1, 3 John 1:2, Luke 1:1
  13. Regular expressions

    High-level view of regular expressions:

    - A mini-language for specifying a set of strings
    - An efficient recogniser for that set of strings
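  14. Quoting bible references

        import re

        names = ['Genesis', 'Exodus', ..., 'Revelation']
        re_names = '(' + '|'.join(names) + ')'
        # A book name, a chapter number and a colon, then any run of
        # separators, further book names and numbers.
        re_raw = ('(' + re_names + r'\s+\d+:(\s*(,|-|;|:|' +
                  re_names + r'|\d+))+' + ')')
        re_bibref = re.compile(re_raw)

        def bibquote(text):
            # Wrap each recognised reference in backquotes.
            return re_bibref.sub(r'`\1`', text)

    What about lsaiah? or I Corinthians? Argh!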
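The quoting approach on this slide can be tried out with a runnable sketch. The shortened `NAMES` list, the non-capturing groups and the `re.escape` call are assumptions made for illustration; the full 66-book list from the talk is not reproduced here.

```python
import re

# Abbreviated, illustrative book list (assumption: the real list has every book).
NAMES = ["Genesis", "Exodus", "Luke", "John", "3 John", "Acts", "Revelation"]

# One alternation matching any known book name.
RE_NAMES = "(?:" + "|".join(re.escape(name) for name in NAMES) + ")"

# A book, a chapter and a colon, followed by one or more separators,
# further book names, or numbers (each optionally preceded by whitespace).
RE_BIBREF = re.compile(
    "(" + RE_NAMES + r"\s+\d+:(?:\s*(?:,|-|;|:|" + RE_NAMES + r"|\d+))+)"
)


def bibquote(text):
    """Wrap every recognised bible reference in backquotes."""
    return RE_BIBREF.sub(r"`\1`", text)


print(bibquote("Lorem ipsum John 3:16 dolor"))
print(bibquote("See John 3:16-18,20 and Luke 1:1."))
```

One limitation of this pattern is that a separator directly after a reference is treated as part of it, so in "John 3:16, then" the comma ends up inside the backquotes. Misspelt book names such as those mentioned on the slide are not matched at all.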
  16. Parsing the verse roles

    Need to translate `John 3:16-18,20` into:

        <a href="http://bible.xyz/ESV/john3+3:16-18">John 3:16-18</a>,
        <a href="http://bible.xyz/ESV/john3+3:20">20</a>

    Actually, internal document-tree nodes representing links

    1. Tokenise
    2. Parse
    3. Output
  17. Tokenisation

    Use python’s own tokeniser

        > echo "John 3:16-18,20" | python3 -m tokenize
        1,0-1,4:    NAME      'John'
        1,5-1,6:    NUMBER    '3'
        1,6-1,7:    OP        ':'
        1,7-1,9:    NUMBER    '16'
        1,9-1,10:   OP        '-'
        1,10-1,12:  NUMBER    '18'
        1,12-1,13:  OP        ','
        1,13-1,15:  NUMBER    '20'
        1,15-1,16:  NEWLINE   '\n'
        2,0-2,0:    ENDMARKER ''
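The same token stream is available from inside a program through the standard library `tokenize` module. The helper name `verse_tokens` below is hypothetical, and dropping the NEWLINE and ENDMARKER tokens is a choice made for this sketch rather than something the slides state.

```python
import io
import tokenize


def verse_tokens(ref):
    """Return (token type name, token text) pairs for a verse reference."""
    readline = io.StringIO(ref + "\n").readline
    pairs = []
    for tok in tokenize.generate_tokens(readline):
        # The trailing NEWLINE and ENDMARKER tokens carry no verse information.
        if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
            continue
        pairs.append((tokenize.tok_name[tok.type], tok.string))
    return pairs


for kind, text in verse_tokens("John 3:16-18,20"):
    print(kind, repr(text))
```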
  21. Parsing

    - Plenty of parser-generators available: http://wiki.python.org/moin/LanguageParsing
    - But, too heavyweight
    - Verse references don’t nest
    - But do need lookahead: 'John' '3' ':' '16' '-' '18' ',' '20' ...
    - Solution: Use a state machine!
    - Problem: No goto :-(
    - Solution: My goto function decorator: http://code.activestate.com/recipes/576944-the-goto-decorator/
    - Problem: Only python 2
  22. Avoiding the raptors

  23. Parsing state machine (no gotos)

    - One method per state
    - Each method-state returns next method-state

        def parse(self):
            ...
            state = self.p_book        # start state
            while state:
                state = state()        # 'do' state. get next

        def p_book(self):              # expect to parse book
            self.book = self.swallow(Book)    # eat Book token
            self.text += self.book.value      # collect link text
            return self.p_chapter             # 'goto' this state

        def p_chapter(self):           # expecting a chapter
            ...
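A complete, much reduced version of this pattern might look like the sketch below. It handles only one book and one chapter, uses plain strings as tokens instead of the token classes hinted at on the slide, and builds URLs with a hypothetical `https://bible.example/ESV/` scheme; none of these details come from the TheoreST source.

```python
import re

BASE_URL = "https://bible.example/ESV/"  # hypothetical URL scheme


class VerseParser:
    """State machine: each p_* method is a state and returns the next state."""

    def __init__(self, text):
        # Words, numbers and single punctuation characters become tokens.
        self.tokens = re.findall(r"[A-Za-z]+|\d+|[:,-]", text)
        self.pos = 0
        self.links = []  # collected (link text, url) pairs

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def swallow(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        state = self.p_book          # start state
        while state:                 # a state returns None to stop
            state = state()
        return self.links

    def p_book(self):
        self.book = self.swallow()
        self.text = self.book + " "  # the first link text includes the book
        return self.p_chapter

    def p_chapter(self):
        self.chapter = self.swallow()
        self.swallow()               # the ':' after the chapter number
        self.text += self.chapter + ":"
        return self.p_verses

    def p_verses(self):
        verses = self.swallow()
        if self.peek() == "-":       # one token of lookahead for a range
            self.swallow()
            verses += "-" + self.swallow()
        url = f"{BASE_URL}{self.book}+{self.chapter}:{verses}"
        self.links.append((self.text + verses, url))
        self.text = ""               # later links show only verse numbers
        return self.p_separator

    def p_separator(self):
        if self.peek() == ",":
            self.swallow()
            return self.p_verses
        return None                  # end of input


for link_text, url in VerseParser("John 3:16-18,20").parse():
    print(link_text, "->", url)
```

Returning bound methods keeps the control flow of a goto-based machine while staying in ordinary Python: the loop in `parse` is the only dispatcher, and adding a state means adding one method.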
  24. Extending reStructuredText/sphinx

    - Roles (inline markup, e.g. :gk:`agape`)
        - Single function; returns doctree nodes
        - verse role (parse verse text, insert references)
        - greek role (nothing much, yet)
    - Directives (block markup)
        - Single class with run method. Returns doctree nodes
        - biblepassage directive (inserts bible text into document)
        - draftcomment directive
    - Domain (collection of extensions)
        - Roles
        - Directives
        - Config values
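As a rough illustration of the "single function; returns doctree nodes" shape, a docutils role can be written as below. This is a sketch that assumes docutils is installed; the URL scheme is hypothetical, and the function links the whole role text as one reference rather than parsing it the way the TheoreST verse role does.

```python
from docutils import nodes
from docutils.parsers.rst import roles

BASE_URL = "https://bible.example/ESV/"  # hypothetical URL scheme


def verse_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Turn :verse:`John 3:16` into a single hyperlink node."""
    url = BASE_URL + text.replace(" ", "+")
    node = nodes.reference(rawtext, text, refuri=url)
    # A role returns the nodes to insert and a list of system messages.
    return [node], []


# Make the role available to the reStructuredText parser under the name "verse".
roles.register_local_role("verse", verse_role)
```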
  25. Automating the build process

    - Problem: minimise effort to build documentation
    - What I want to build:
        - All documents in integrated sphinx website
        - Individual documents as pdf/epub/html-for-google-docs
    - For each build I want to specify:
        - bible version (ESV, KJV, NET, etc.)
        - include draft comments
        - possibly other options (e.g. a4/letter)
    - Also, minimal duplication in sphinx conf.py
    - Solution: paver
  26. Automating the build process with paver

    - http://paver.github.io/paver/
    - Paver is a Python-based build/distribution/deployment scripting tool along the lines of Make.
    - Command line I want:

        paver html                          (integrated website with all books)
        paver pdf john                      (pdf output of John, default bible version)
        paver epub john
        paver pdf john NET letter           (use NET bible, letter paper)
        paver single john                   (html for upload to google-docs)
        paver pdf matthew mark luke john    (multiple books)

    - Sets up configuration, then calls sphinx-build
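The free-form arguments in those command lines have to be sorted into books, a bible version and a paper size before sphinx-build can be called. The plain-Python sketch below shows one way that sorting could work; the function name `interpret_args` and the defaults (ESV, a4) are assumptions, and it does not use paver itself.

```python
BIBLE_VERSIONS = {"ESV", "KJV", "NET"}  # versions named in the talk
PAPER_SIZES = {"a4", "letter"}


def interpret_args(args, default_version="ESV", default_paper="a4"):
    """Sort task arguments into books, bible version and paper size."""
    options = {
        "books": [],
        "bible_version": default_version,  # assumed default
        "paper": default_paper,            # assumed default
    }
    for arg in args:
        if arg.upper() in BIBLE_VERSIONS:
            options["bible_version"] = arg.upper()
        elif arg.lower() in PAPER_SIZES:
            options["paper"] = arg.lower()
        else:
            # Anything that is not a known option is taken to be a book name.
            options["books"].append(arg.lower())
    return options


print(interpret_args(["john", "NET", "letter"]))
print(interpret_args(["matthew", "mark", "luke", "john"]))
```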
  27. Sharing a single sphinx config file

    - index.rst is the root of a document
    - conf.py is the sphinx config file
    - Project file hierarchy:

        /                               root directory
        /index.rst                      Integrated website with all books
        /conf.py                        Symbolic link to src/conf.py
        /conf_override.py               Append to conf.py
        /src/                           Common configuration and code
        /src/conf.py                    Common sphinx configuration file
        /group/book/index.rst           For stand-alone documents
        /group/book/conf.py             Symbolic link to /src/conf.py
        /group/book/conf_override.py    Append to conf.py
  28. Handling sphinx options?

    - Sphinx configuration file is python code.
    - Can override simple name=value options on command line: sphinx-build -Dbible_version=ESV
    - But complex overrides (i.e. python code) fail: -Dlatex_elements['papersize']='a4paper'
    - Some overrides are static per-book: Generated LaTeX filename = john.tex
    - Others are dynamic, set at generation time: papersize = "letter"
    - Solution: code generation with cog
  29. Taming the sphinx config file with cog

    - http://nedbatchelder.com/code/cog/
    - Cog: Use python code in a file to generate part of that file
    - At the end of sphinx config file:

        #[[[cog
        #include('conf_override.py')
        #cog.out('\n')
        #if 'override_text' in globals():
        # cog.out(override_text)
        #]]]
        #[[[end]]]

    - Cog runs python code between [[[cog and ]]]
    - Output (via cog.out) appears between ]]] and [[[end]]]
    - In paver stdlib: paver.doctools.cog(options)
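To make the mechanism concrete, here is a toy re-implementation of the idea in plain Python. It is not cog and does not use cog's API: the `expand` function, its `namespace` parameter and the stand-in `cog.out` object are all hypothetical, and it handles only `#`-commented generator blocks like the one on the slide.

```python
import re
from types import SimpleNamespace

# Marker line, commented generator code, end-of-code marker,
# previously generated output, end marker.
BLOCK = re.compile(
    r"(#\[\[\[cog\n)(.*?)(#\]\]\]\n)(.*?)(#\[\[\[end\]\]\]\n)", re.DOTALL
)


def expand(source, namespace=None):
    """Run each generator block and place its output before the end marker."""

    def run(match):
        # Strip the leading '#' from every line to recover the Python code.
        code = "".join(line[1:] for line in match.group(2).splitlines(keepends=True))
        output = []
        env = dict(namespace or {})
        env["cog"] = SimpleNamespace(out=output.append)  # stand-in for cog.out
        exec(code, env)
        # Old output (group 4) is discarded and replaced with the new output.
        return match.group(1) + match.group(2) + match.group(3) + "".join(output) + match.group(5)

    return BLOCK.sub(run, source)


conf = (
    "papersize = 'a4'\n"
    "#[[[cog\n"
    "#cog.out(override_text)\n"
    "#]]]\n"
    "#[[[end]]]\n"
)
print(expand(conf, {"override_text": "papersize = 'letter'\n"}))
```

Because the generated text sits between the markers and is replaced on every run, expanding an already expanded file with the same inputs leaves it unchanged.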
  30. The draft-comment-edit cycle

    From one of the authors:

        I come from a pre computer background which means I can write
        material and send out emails plus do some limited scanning but
        nothing else.

    Current Solution:

    1. Publish to html (paver)
    2. Upload to google-drive, converting to google doc
    3. Author edits google doc
    4. I incorporate changes to reStructuredText source
    5. If not finished, goto step 1
  31. Future work

    - Continue converting documents to reST
    - Robust versioning scheme
    - Translation (with the aid of google-translate)
    - Dynamic bible-version selection on website
    - Greek/Hebrew/Aramaic hyperlinks
    - Website hosting
    - Formatting tweaks
    - Fix bugs. Add more tests
    - Use more github features