Expensive solutions are not necessary to automatically publish XML content to PDF or HTML-based formats. There are many ways to automate the process and a range of open source tools and scripting solutions can be used.
This presentation covers several approaches to automated XML publishing and provides simple examples for lightweight continuous integration from scheduled builds to watched folders and commit hooks to more complex hosted systems.
Presented November 18, 2013 in Munich at DITA Europe http://www.infomanagementcenter.com/DITAeurope/2013/.
Automating DITA Builds
Lightweight Continuous Integration for Documentation Projects
Information Architecture & Content Strategy
Roger W. Fienhold Sheen
Part I — Background & Concepts
2. Motivation & Benefits
3. Continuous Integration Principles
5. Automation Approaches
Part II — Automation Examples
Q & A
Examples illustrate the alternatives using a sample DITA publication, version
control systems such as Subversion and Git, and solutions such as Jenkins.
Documentation teams can borrow software development techniques to regularly
publish even minor changes.
By building the entire publication with each change, authors can easily verify the
impact of their changes on the final document and find errors more quickly.
This approach reduces the need for repetitive manual tasks, allowing authors to
focus on content and improve document quality.
Roger is an independent Information Architect based in Potsdam, Germany.
He provides consulting services to start-ups and global corporations, advising
clients on content re-use strategies, XML-based publication processes and
authoring environments, version control solutions and organizing information with
a sound structure that helps users find what they need.
Motivation & Benefits
Why Automate? — Why Use XML if You Don’t?
Automated publishing is one of the strongest arguments for an XML-based
toolchain. Let’s face it: many find the XML authoring user experience tedious, so
we’d better get something out of it – if you’re pressing buttons when you need a
PDF, there are easier ways…
Let computers do what they do best so humans have more time for their part
Latest drafts are available to all stakeholders for immediate verification
Quicker time-to-market (even final docs can be published instantly)
Why Use Continuous Integration?
Avoid embarrassment: Authors find their own mistakes before others do
Synchronize documentation release cycle w/ software lifecycle
(build the documentation whenever the software is built)
Improve quality (can’t commit code that breaks build)
Continuous Integration Principles
“…daily builds are a minimum. A fully automated process that allows you to
build several times a day is both achievable and well worth the effort.”
– Martin Fowler
Single repository with all documentation dependencies (check out & build)
Test before committing (rule out unintended side effects)
Commit related changes (granular changes are easier to roll back)
Commit often (small chunks integrate better than monolithic pieces)
Automated build process (producing deliverables w/o manual intervention)
Visible results of latest build provide accountability (who’s to blame)
Permanent access to the latest deliverables enables ongoing testing
Automated deployment ensures customer access to the latest version
Prerequisites – What You’ll Need
Version-control system (Subversion, Git, Mercurial, etc.)
DITA Open Toolkit
Command-line scripting (UNIX shell scripts, Windows batch files)
Build file that defines all target output formats
DITA-OT packages include sample Ant build scripts in /samples/ant_sample/.
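As an alternative to an Ant build file, a small shell script can loop over the target formats. This sketch assumes a DITA-OT release that ships the `dita` command with `--input`, `--format` and `--output` options, which is newer than the toolkit current when this talk was given. The function name, the `DITA_CMD` override and the file names are hypothetical.

```shell
#!/bin/sh
# build_all_formats INPUT_MAP OUTPUT_DIR FORMAT [FORMAT...]
# Runs one DITA-OT build per format and stops at the first failure.
# DITA_CMD is a hypothetical override for the path to the `dita` command.
build_all_formats() {
  input=$1
  out_dir=$2
  shift 2
  for format in "$@"; do
    "${DITA_CMD:-dita}" --input="$input" --format="$format" \
      --output="$out_dir/$format" || return 1
  done
}

# Example call (placeholder file names):
# build_all_formats guide.ditamap out html5 pdf
```

Keeping each format in its own output subfolder makes it easy to publish or discard a single deliverable without touching the others.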
Automation Approaches — Overview
1. Scheduled builds – build at certain times
2. Watched folders – build when something happens
3. Version control scripts/hooks – build on or before checkin
Git example: commit hooks
4. Dedicated Continuous Integration solutions – CI servers & services
Part II – Automation Examples
Automatically build output in regular intervals (daily/nightly/hourly, etc.) with a
system service or launch dæmon.
Easy to set up using on-board utilities available with your operating system
Minimum solution, “gateway drug” to more granular automation
Makes sense when changes are infrequent, but regular
Generated output may no longer reflect the actual state of source files
May need to wait for the next build in order to see results
Less useful when changes are sporadic, but occasionally high in volume
Linux cron, Mac OS X launchd, Windows Task Scheduler
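On Linux, a scheduled build can be as small as one crontab line that calls a wrapper script. The wrapper below is a minimal sketch, not a definitive implementation: the function name, lock directory, log file and build script path are all hypothetical. It skips a run if the previous build is still in progress, which matters once builds take longer than the interval.

```shell
#!/bin/sh
# Example crontab entry to run this wrapper at the top of every hour
# (the script path is a placeholder):
#   0 * * * * /path/to/scheduled-build.sh

# run_scheduled_build LOCK_DIR LOG_FILE COMMAND [ARGS...]
# Runs COMMAND unless an earlier run still holds the lock.
run_scheduled_build() {
  lock_dir=$1
  log_file=$2
  shift 2
  # mkdir is atomic, so the directory doubles as a simple lock
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Previous build still running, skipping" >> "$log_file"
    return 0
  fi
  echo "Build started: $(date)" >> "$log_file"
  if "$@" >> "$log_file" 2>&1; then
    build_status=0
  else
    build_status=$?
  fi
  echo "Build finished with status $build_status" >> "$log_file"
  rmdir "$lock_dir"
  return "$build_status"
}

# Example call (placeholder paths):
# run_scheduled_build /tmp/dita-build.lock "$HOME/dita-build.log" "$HOME/docs/build_html.sh"
```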
Mac OS X launchd Sample — Hourly Builds
To run your build every hour, save a file like this in ~/Library/LaunchAgents:
See the launchd man page or the tutorial at launchd.info for an explanation of the
available options. If you prefer a more guided approach, utilities like Lingon or
LaunchControl provide a user interface and debugging tools for launch scripts.
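An hourly job of the kind described above might look like the following sketch, written here as a shell function that generates the property list. `Label`, `ProgramArguments` and `StartInterval` are standard launchd keys; the function name, the label `com.example.dita-build` and all paths are placeholders, and the build script path is assumed to contain no XML special characters.

```shell
#!/bin/sh
# write_hourly_agent PLIST_PATH BUILD_SCRIPT
# Writes a launchd job definition that runs BUILD_SCRIPT every 3600 seconds.
write_hourly_agent() {
  plist_path=$1
  build_script=$2
  cat > "$plist_path" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.dita-build</string>
  <key>ProgramArguments</key>
  <array>
    <string>$build_script</string>
  </array>
  <key>StartInterval</key>
  <integer>3600</integer>
</dict>
</plist>
EOF
}

# Example (placeholder paths), followed by loading the job:
# write_hourly_agent "$HOME/Library/LaunchAgents/com.example.dita-build.plist" "$HOME/docs/build_html.sh"
# launchctl load "$HOME/Library/LaunchAgents/com.example.dita-build.plist"
```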
Mac OS X launchd Sample — Daily Builds
For a daily build that runs at midnight, replace the StartInterval key and
following integer with a StartCalendarInterval and a dictionary of integers that
starts the job at 00:00:
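Assuming a job definition whose other keys (`Label`, `ProgramArguments`) are already in place, the midnight schedule amounts to the fragment below. The helper name is hypothetical; it only prints the keys so they can be pasted into the property list in place of `StartInterval`.

```shell
#!/bin/sh
# print_daily_schedule
# Prints the launchd keys that start a job every day at 00:00.
print_daily_schedule() {
  cat <<'EOF'
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>0</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
EOF
}

print_daily_schedule
```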
Mac OS X launchd Sample — Build Hourly from 9:00 AM
Or combine these approaches to run your build at 9 AM and every hour thereafter:
Windows Task Scheduler
On Windows, use Accessories > System Tools > Task Scheduler to create a new
scheduled task with a trigger that begins the task on a schedule and an action that
starts a program (your build script):
Creating a scheduled task
Use a “sentinel” to monitor your source and generate output when files change.
Popular among web developers who maintain code in one format and deliver in another.
CodeKit or Marked make this easy. Many LaTeX tools provide this for PDFs.
More flexible than scheduled builds
Output reflects current source files
Good when little things change often in odd intervals
HTML builds quickly, always up-to-date & available for verification
Opportunity for DITA tool vendors — bundle a folder-watcher for live preview.
More difficult to set up, may require third-party tools
Can be performance intensive if many things change at once
(quiet periods and throttling options are essential)
OS X Folder Actions (triggered when files are added or removed, not when existing files change)
Hazel — All-purpose automation utility & folder watcher for Mac
Windows alternatives like Belvedere, etc.
Our XML-based friend launchd
Linux: incrond – inotify cron (incron) daemon
Dedicated folder watcher utilities such as entr, Guard, Watchr
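Where none of these tools is available, a polling loop in plain shell can stand in for a folder watcher. This is a minimal sketch under stated assumptions rather than a replacement for the utilities above: the function names and paths are hypothetical, and it compares file modification times against a stamp file with `find -newer`. The inner loop provides the quiet period by waiting until changes have stopped before building.

```shell
#!/bin/sh
# changed_since STAMP_FILE DIR
# Succeeds if any file under DIR is newer than STAMP_FILE,
# or if STAMP_FILE does not exist yet.
changed_since() {
  [ -e "$1" ] || return 0
  [ -n "$(find "$2" -type f -newer "$1" -print | head -n 1)" ]
}

# watch_and_build DIR STAMP_FILE QUIET_SECONDS COMMAND [ARGS...]
# Polls DIR once per second and runs COMMAND after changes have settled.
watch_and_build() {
  dir=$1
  stamp=$2
  quiet=$3
  shift 3
  while true; do
    if changed_since "$stamp" "$dir"; then
      # Quiet period: keep waiting while files are still changing
      while changed_since "$stamp" "$dir"; do
        touch "$stamp"
        sleep "$quiet"
      done
      "$@" || echo "Build failed" >&2
    fi
    sleep 1
  done
}

# Example call (placeholder paths); runs until interrupted:
# watch_and_build "$HOME/docs/src" /tmp/dita-watch.stamp 5 "$HOME/docs/build_html.sh"
```

With a quiet period of a few seconds, touching ten files or switching branches results in one build instead of ten.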
Easy to set up basic rules (Date Last Modified is after Date Last Matched)
May match often with basic rules (requires adjustment to throttle)
(Touching 10 files in a folder builds 10x; Git branch switching is deadly)
Sample Hazel rule
Mac OS X launchd Sample
A launch dæmon can also be used to watch folders for changes and build output.
Set a ThrottleInterval to limit build frequency if necessary:
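A watching job of this kind might be generated as follows. `WatchPaths` and `ThrottleInterval` are standard launchd keys; the function name, the label `com.example.dita-watch` and the paths are placeholders, and both paths are assumed to contain no XML special characters.

```shell
#!/bin/sh
# write_watch_agent PLIST_PATH WATCH_DIR BUILD_SCRIPT THROTTLE_SECONDS
# Writes a launchd job that runs BUILD_SCRIPT when WATCH_DIR changes,
# at most once every THROTTLE_SECONDS.
write_watch_agent() {
  plist_path=$1
  watch_dir=$2
  build_script=$3
  throttle=$4
  cat > "$plist_path" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.dita-watch</string>
  <key>ProgramArguments</key>
  <array>
    <string>$build_script</string>
  </array>
  <key>WatchPaths</key>
  <array>
    <string>$watch_dir</string>
  </array>
  <key>ThrottleInterval</key>
  <integer>$throttle</integer>
</dict>
</plist>
EOF
}

# Example call (placeholder paths):
# write_watch_agent "$HOME/Library/LaunchAgents/com.example.dita-watch.plist" "$HOME/docs/src" "$HOME/docs/build_html.sh" 60
```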
Version Control Hooks
Generate output each time you interact with the version control system, either
before you commit or after each checkin.
The terms and options differ between systems & client software, but most version
control systems offer a mechanism to “hook” into a stage of the workflow and
perform a pre-defined action like running a build script.
Hooks are enabled by modifying sample hook templates provided with the system.
Replace the template content with a command that runs your DITA build.
Pre-commit hooks serve to verify the input and prevent broken code from
being permanently recorded – which would preserve your shame for posterity!
Build output before committing and reject the changes if build fails.
Post-commit actions can be used to generate output for every valid change and
typically do not modify the contents of the repository.
Can be tricky to set up, policies may require system administrator assistance
Commits are slower, as version control system waits for build to finish
Strict regime can prevent checkins if something’s wrong
Some clients provide a user interface for client-side commit actions
The hooks subdirectory of a Subversion repository contains templates such
as pre-commit.tmpl and post-commit.tmpl.
Remove .tmpl to enable.
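On a UNIX-based server, enabling a hook comes down to two steps: create the file without the .tmpl extension and make it executable. The helper below is a hypothetical sketch that copies the template rather than renaming it, so the original stays available for reference; the copy still needs to be edited so that it runs your build.

```shell
#!/bin/sh
# enable_hook HOOKS_DIR HOOK_NAME
# Copies HOOK_NAME.tmpl to HOOK_NAME and makes the copy executable.
enable_hook() {
  hooks_dir=$1
  hook_name=$2
  cp "$hooks_dir/$hook_name.tmpl" "$hooks_dir/$hook_name" &&
    chmod +x "$hooks_dir/$hook_name"
}

# Example call (placeholder repository path):
# enable_hook /var/svn/docs/hooks post-commit
```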
On UNIX-based systems, sample Git hooks are typically found in
Modify a copy of pre-commit.sample and save the result to your local repository
as .git/hooks/pre-commit.
Sample pre-commit Hook
# Get the absolute path of the `.git/hooks` directory
export GIT_HOOKS=$(cd "$(dirname "$(readlink "$0" || echo "$0")")" && pwd)
# Set the absolute path of the build directory
# Set the absolute path of the DITA home directory
# Set the absolute path of the DITA home directory again (weird, but necessary)
# Execute the build script in the shell that is provided by the DITA start script
echo "$BUILD_FILES/build_html.sh" | "$DITA_HOME/startcmd.sh"
# Display OS X system notification via terminal-notifier
echo 'HTML build succeeded. Committing…' | /usr/local/bin/terminal-notifier -sound default
# Exit with status of last command
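The essential logic of such a hook is small: run the build, and exit with a non-zero status if it fails, which makes Git abort the commit. The following is a minimal sketch and not the hook shown above; the function name and the build script location are hypothetical.

```shell
#!/bin/sh
# run_pre_commit_build COMMAND [ARGS...]
# Runs the build and reports whether the commit may proceed.
run_pre_commit_build() {
  if "$@"; then
    echo "Build succeeded. Committing..."
    return 0
  fi
  echo "Build failed. Commit rejected." >&2
  return 1
}

# In an installed .git/hooks/pre-commit, the script could end like this
# (build_html.sh in the repository root is a placeholder):
# repo_root=$(git rev-parse --show-toplevel)
# run_pre_commit_build "$repo_root/build_html.sh" || exit 1
```

Checking the build status directly, rather than the status of a later notification command, ensures that a failed build is what blocks the commit.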
Cornerstone Commit Actions
The Cornerstone Subversion client for Mac includes user interface options for
Subversion commit hooks, allowing users to associate their own scripts with
commit actions, independent of the repository configuration:
Cornerstone Commit Actions
Bypassing Commit Hooks
Some clients and systems allow you to circumvent commit hooks if necessary:
$ git commit --no-verify
Atlassian’s free SourceTree client for Git & Mercurial provides an option to bypass
hooks on the commit sheet:
Bypassing hooks with SourceTree
Dedicated CI Solutions
If your developers use continuous integration tools to run automated tests and
build software binaries whenever they update their code, you may be able to use the
same solution to build your DITA deliverables too.
Leverage existing corporate infrastructure & developer expertise
Do more than just build output:
provide access to drafts on intranet for internal review & signoff
integrate documentation into the final software installers
publish to company website for immediate public access
Builds/tests typically trigger on push, so bad commits are public
Dedicated server not necessarily required, but often advisable
May require support from developers and/or IT staff
CruiseControl – the original solution from ThoughtWorks (now open source)
Jenkins (formerly known as Hudson) – cross-platform open source CI server
Travis – hosted CI service used to build and test projects hosted at GitHub
(including the DITA Open Toolkit)
Sample Jenkins Job View
Sample Jenkins job
Sample Jenkins Script
if [ "$1" = "ci" ]; then
  echo "Continuous integration check for branch $2"
  set -e # exit as soon as any command fails
  # Run documentation build script, which sets up the DITA-OT environment & runs Ant tasks to build output
  # Clean & re-create target output directory before moving the new generated output
  rm -rf "/var/www/$BRANCH/manual"
  mkdir -p "/var/www/$BRANCH/manual"
  # Copy all `output` subfolders to Web server "as-is"
  cp -r output/* "/var/www/$BRANCH/manual"
elif [ "$1" = "nightly" ]; then
  echo "Nightly check for branch $2"
  #make || RET=1
else
  echo "Unknown parameter: $1" && exit 1
fi
Even systems that are not intended for automated publishing (or which require the
purchase of an expensive add-on or server component) can often be automated via
GUI scripting solutions.
Sikuli – open-source project from the User Interface Design Group at MIT
Fake – web automation tool that “allows you to run (and re-run) fake
interactions with the web” — useful for web-based CMS & DITA editors, etc.
References & Resources
1. launchd – http://launchd.info
2. Lingon (launchd GUI) – http://www.peterborgapps.com/lingon
3. LaunchControl (debugger) – http://www.soma-zone.com/LaunchControl
4. Hazel (folder watcher for Mac) – http://www.noodlesoft.com/hazel.php
5. CruiseControl (original CI server) – http://cruisecontrol.sourceforge.net
6. Jenkins (most? popular CI server) – http://jenkins-ci.org
7. Travis (hosted CI for GitHub projects) – http://travis-ci.com
8. Sikuli (cross-platform GUI scripting) – http://www.sikuli.org
9. Fake (Web automation for Mac) – http://fakeapp.com
For updates, comments and the latest code samples, visit
Contact the author by e-mail or connect on GitHub, LinkedIn or Twitter