Expensive solutions are not necessary to automatically publish XML content to PDF or HTML-based formats. There are many ways to automate the process and a range of open source tools and scripting solutions can be used.
This presentation covers several approaches to automated XML publishing and provides simple examples of lightweight continuous integration, from scheduled builds, watched folders, and commit hooks to more complex hosted systems.
Presented April 28, 2014 in Seattle at DITA North America http://www.cm-strategies.com/2014/abstracts.htm#Sheen.
Automating DITA Builds
Lightweight Continuous Integration for Documentation Projects
Information Architecture & Content Strategy
Roger W. Fienhold Sheen
Part I — Background & Concepts
Motivation & Benefits
Continuous Integration Principles
Automation Approaches
Part II — Automation Examples
Q & A
Motivation & Benefits
Why Automate? — Why Use XML if You Don’t?
Automated publishing is a major advantage of XML-based toolchains.
If you’re pressing buttons when you need a PDF, there are easier ways…
Let computers do what they do best so humans have more time for their part
Latest drafts are available to all stakeholders for immediate verification
Quicker time-to-market (even final docs can be published instantly)
Why Use Continuous Integration?
Avoid embarrassment: Authors find their own mistakes before others do
Synchronize the documentation release cycle with the software lifecycle
(build the documentation whenever the software is built)
Improve quality (can't commit code that breaks the build)
Continuous Integration Principles
“…daily builds are a minimum. A fully automated process that allows you to
build several times a day is both achievable and well worth the effort.”
– Martin Fowler
Single repository with all documentation dependencies (check out & build)
Automated build process (producing deliverables w/o manual intervention)
Test before committing (rule out unintended side effects)
Commit often (small chunks integrate better than monolithic pieces)
Commit related changes (granular changes are easier to roll back)
Visible results of latest build provide accountability (who’s to blame)
Permanent access to the latest deliverables enables ongoing testing
Automated deployment ensures customer access to the latest version
Prerequisites – What You’ll Need
Version-control system (Subversion, Git, Mercurial, etc.)
DITA Open Toolkit
Command-line scripting (UNIX shell scripts, Windows batch files)
Build file that defines parameters for all target output formats
DITA-OT packages include sample Ant build scripts in /samples/ant_sample/.
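Whatever triggers a build (a scheduler, a folder watcher, or a hook), it usually ends up calling a small wrapper script. The following is a minimal sketch, assuming DITA-OT 1.x run from the toolkit directory with Ant on the PATH and the toolkit environment already set up (for example, in the shell that startcmd.sh provides). It runs one Ant build per transtype using the args.input, transtype, and output.dir parameters; the format list, the output layout, and the DRY_RUN switch are illustrative choices, not part of the toolkit.

```shell
#!/bin/bash
# Minimal sketch: build one DITA map into several output formats in turn.
# Assumes DITA-OT 1.x, run from the toolkit directory with Ant on the PATH.
# The transtype list and the output layout are illustrative choices.

FORMATS="xhtml pdf"   # DITA-OT transtypes to build, in order (split on spaces)

# Run (or, with DRY_RUN=1, only print) the Ant command for one transtype
build_format() {
  local map="$1" format="$2" outdir="$3"
  set -- ant -Dargs.input="$map" -Dtranstype="$format" -Doutput.dir="$outdir/$format"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build every format; stop at the first failure and report which one broke
build_all() {
  local map="$1" outdir="${2:-out}" format
  for format in $FORMATS; do
    build_format "$map" "$format" "$outdir" || {
      echo "Build failed for transtype: $format" >&2
      return 1
    }
  done
}
```

For example, `DRY_RUN=1 build_all maps/guide.ditamap build` (hypothetical paths) prints the two Ant commands without running them. Stopping at the first failure keeps the exit status meaningful for the callers described in Part II, which rely on it to decide whether a build succeeded.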
Automation Approaches — Overview
1. Scheduled Builds
(build at certain times)
2. Watched Folders
(build when something happens)
3. Version Control Scripts: “hooks”
(build on or before checkin)
4. Dedicated Continuous Integration Solutions
(CI servers & services)
Part II – Automation Examples
Scheduled Builds
Automatically build output at regular intervals (hourly, nightly, daily, etc.)
with a system service or launch dæmon.
Easy to set up using on-board utilities available with your operating system
Minimum solution, “gateway drug” to more granular automation
Makes sense when changes are infrequent, but regular
Less useful when changes are sporadic, but occasionally high in volume
Generated output may no longer reflect the actual state of source files
May need to wait for the next build in order to see results
Linux cron, Mac OS X launchd, Windows Task Scheduler
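On Linux, a single crontab line is enough for an hourly build. In this minimal sketch, the script and log paths are hypothetical: cron_entry prints a line that runs the build script at minute 0 of every hour and appends its output to a log, and install_cron_entry adds that line to the current user's crontab, replacing any older line that mentions the same script.

```shell
#!/bin/bash
# Minimal sketch: schedule an hourly DITA build with cron.
# BUILD_SCRIPT and LOG_FILE are hypothetical paths; adjust for your project.
BUILD_SCRIPT="$HOME/docs/build.sh"
LOG_FILE="$HOME/docs/build.log"

# Print a crontab line: minute 0 of every hour, output appended to a log file
cron_entry() {
  printf '0 * * * * %s >> %s 2>&1\n' "$1" "$2"
}

# Add the entry to the user's crontab, dropping earlier lines for the same script
install_cron_entry() {
  { crontab -l 2>/dev/null | grep -vF "$1"; cron_entry "$1" "$2"; } | crontab -
}

# Usage: install_cron_entry "$BUILD_SCRIPT" "$LOG_FILE"; then check with crontab -l
```

Cron does not handle unquoted paths with spaces well, so a build script location without spaces keeps the entry simple.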
Mac OS X launchd Sample — Hourly Builds
To run your build every hour, save a launchd property list file in ~/Library/LaunchAgents.
Adjust the path to your build script, use the StartInterval key,
and set its integer value to 3600 (seconds).
→ Get the gist at https://gist.github.com/infotexture/8506117.
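Because a launch agent is just a property list, it can also be generated from the shell. In this minimal sketch, the job label and script path are hypothetical: hourly_plist prints an agent whose ProgramArguments run the build script and whose StartInterval key is set to 3600 seconds.

```shell
#!/bin/bash
# Minimal sketch: print a launchd agent that runs a build script every hour.
# $1 is a job label, $2 the absolute path to the build script (both hypothetical).
# Paths containing XML special characters (&, <) would need escaping.
hourly_plist() {
  local label="$1" script="$2"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$label</string>
  <key>ProgramArguments</key>
  <array>
    <string>$script</string>
  </array>
  <key>StartInterval</key>
  <integer>3600</integer>
</dict>
</plist>
EOF
}

# Example (hypothetical label and path):
#   hourly_plist com.example.docs.build "$HOME/docs/build.sh" \
#     > ~/Library/LaunchAgents/com.example.docs.build.plist
#   launchctl load ~/Library/LaunchAgents/com.example.docs.build.plist
```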
Mac OS X launchd Sample — Daily Builds
For a daily build that runs at midnight, replace the StartInterval key and its integer
value with a StartCalendarInterval dictionary that sets the time to start the job.
→ Get the gist at https://gist.github.com/infotexture/8506547 for a complete example.
You can also combine these approaches to run your build once at a certain time
(such as 9 AM) and in regular (hourly) intervals thereafter.
→ See the example at https://gist.github.com/infotexture/8506763.
For an explanation of the available options, see the launchd man page or the
tutorial at launchd.info. If you prefer a more guided approach, utilities like Lingon
or LaunchControl provide a user interface and debugging tools for launch scripts.
Windows Task Scheduler
On Windows, use Accessories > System Tools > Task Scheduler to create a new
scheduled task with a trigger that begins the task on a schedule and an action that
starts a program (your build script):
Creating a scheduled task
Watched Folders
Use a “sentinel” to monitor your source and generate output when files change.
Popular among web developers who maintain code in one format and deliver it in another.
CodeKit or Marked make this easy. Many LaTeX tools provide this for PDFs.
More flexible than scheduled builds
Output reflects current source files
Good when little things change often at odd intervals
HTML builds quickly, always up-to-date & available for verification
Opportunity for DITA tool vendors — bundle a folder-watcher for live preview.
More difficult to set up, may require third-party tools
Can be performance intensive if many things change at once
(quiet periods and throttling options are essential)
OS X Folder Actions (work when files are added or removed, not when existing files change)
Hazel — All-purpose automation utility & folder watcher for Mac
Windows alternatives like Belvedere, etc.
Our XML-based friend launchd
Linux: incrond — inotify cron (incron) dæmon
Dedicated folder watcher utilities such as entr, Guard, Watchr
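Where none of these tools is available, a polling loop can stand in for a folder watcher, and it shows why quiet periods matter. This is a minimal sketch of a plain polling technique, not how any of the tools above work internally: it fingerprints the folder contents with find and cksum, and after detecting a change it waits until the folder has been stable for a quiet period before running the build once. The folder, intervals, and build script in the usage line are hypothetical.

```shell
#!/bin/bash
# Minimal sketch of a polling folder watcher with a quiet period.
# Portable but less efficient than entr, incron, or launchd WatchPaths.

# Print a checksum that changes when any file under $1 is added, removed, or edited
snapshot() {
  find "$1" -type f -exec cksum {} + 2>/dev/null | sort | cksum
}

# Poll every $2 seconds. After a change, wait until the folder has been stable
# for $3 seconds (the quiet period), then run the remaining arguments once.
watch_and_build() {
  local dir="$1" poll="$2" quiet="$3" last current
  shift 3
  last=$(snapshot "$dir")
  while sleep "$poll"; do
    current=$(snapshot "$dir")
    [ "$current" = "$last" ] && continue
    # Let a burst of edits (such as a Git branch switch) settle before building
    while sleep "$quiet"; do
      last=$current
      current=$(snapshot "$dir")
      [ "$current" = "$last" ] && break
    done
    "$@" || echo "Build failed; waiting for the next change" >&2
    last=$(snapshot "$dir")
  done
}

# Usage (hypothetical paths): watch_and_build ~/docs/src 5 30 ~/docs/build_html.sh
```

Because the fingerprint is based on file contents, touching ten files without changing them does not start ten builds, and a burst of real changes produces a single build once the folder goes quiet. Keeping the output directory outside the watched folder also keeps each snapshot small.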
Easy to set up basic rules (Date Last Modified is after Date Last Matched)
May match often with basic rules (requires adjustment to throttle)
(Touching 10 files in a folder builds 10x; Git branch switching is deadly)
Sample Hazel rule
Mac OS X launchd Sample
A launch dæmon can also be used to watch folders for changes and build output.
The syntax is similar to the examples shown for scheduled builds.
To watch a folder for changes, specify the path to a location in the file system in
the array under WatchPaths and set a ThrottleInterval to limit build frequency if necessary.
→ Get the gist at https://gist.github.com/infotexture/8635029 for a complete example.
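The same property-list approach works for watched folders. In this minimal sketch, the label, script, and folder are hypothetical, and the 60-second default throttle is an arbitrary choice: watch_plist lists the folder under WatchPaths and sets ThrottleInterval to limit how often launchd may start the build when many files change at once.

```shell
#!/bin/bash
# Minimal sketch: print a launchd agent that runs a build script whenever a
# watched folder changes. $1 = job label, $2 = build script, $3 = folder to
# watch, $4 = throttle interval in seconds (default 60; all hypothetical).
watch_plist() {
  local label="$1" script="$2" folder="$3" throttle="${4:-60}"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>$label</string>
  <key>ProgramArguments</key>
  <array>
    <string>$script</string>
  </array>
  <key>WatchPaths</key>
  <array>
    <string>$folder</string>
  </array>
  <key>ThrottleInterval</key>
  <integer>$throttle</integer>
</dict>
</plist>
EOF
}

# Example (hypothetical): watch_plist com.example.docs.watch "$HOME/docs/build.sh" "$HOME/docs/src" 120
```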
Version Control Hooks
Most version control systems offer a mechanism to “hook” into a stage of the
workflow and perform a predefined action like running a build script, either
before you commit or after each checkin.
This mechanism enables a more deliberate approach to automation, as output is
only generated when users interact with the version control system, rather than on
arbitrary intervals or ﬁle system events.
Pre-commit hooks serve to verify the input and reject changes if the build fails.
Post-commit actions can be used to generate output for every valid change and
typically do not modify the contents of the repository.
Hooks are typically enabled by modifying sample templates provided with the
system. For our purposes, this means replacing the template content with the
sequence of commands necessary to run a DITA build.
Can be tricky to set up; policies may require system administrator assistance
Commits are slower, as the version control system waits for the build to finish
Strict regime can prevent checkins if something's wrong
Some clients provide a user interface for client-side commit actions
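The slowdown matters most for pre-commit hooks, which must finish before the commit can proceed. A post-commit action that only publishes output can hand the build off to a background process instead. This is a minimal sketch, assuming a shell-script hook and hypothetical paths: build_in_background detaches the build with nohup and redirects all of its input and output, so the version control client is not left waiting on the hook's output streams.

```shell
#!/bin/bash
# Minimal sketch for a post-commit hook: start the documentation build in the
# background so the commit does not wait for it. Hooks may run with a minimal
# environment, so absolute paths are safest.

# Run the remaining arguments detached, appending all output to log file $1
build_in_background() {
  local log="$1"
  shift
  nohup "$@" >>"$log" 2>&1 </dev/null &
}

# Usage inside the hook (hypothetical paths):
#   build_in_background /tmp/docs-build.log /opt/docs/build_html.sh
```

The trade-off is that a failed background build can no longer block the commit, so the log (or a notification) becomes the only place where failures show up.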
The hooks subdirectory of a Subversion repository contains templates such as
pre-commit.tmpl and post-commit.tmpl.
Remove .tmpl to enable.
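Enabling a Subversion hook amounts to creating an executable file named after the hook in the repository's hooks directory. This is a minimal sketch, assuming shell access to the repository on a UNIX-like server. Instead of renaming the template, enable_svn_hook copies it so the original stays available for reference, then sets the executable bit that Subversion requires on such systems. The repository path in the usage line is hypothetical.

```shell
#!/bin/bash
# Minimal sketch: enable a Subversion hook from its template on a UNIX-like server.
# $1 is the repository path, $2 the hook name (e.g. post-commit).
enable_svn_hook() {
  local hooks_dir="$1/hooks" name="$2"
  if [ ! -f "$hooks_dir/$name.tmpl" ]; then
    echo "No template found: $hooks_dir/$name.tmpl" >&2
    return 1
  fi
  # Copy instead of renaming so the template stays available for reference
  cp "$hooks_dir/$name.tmpl" "$hooks_dir/$name"
  # Subversion only runs hooks that are executable
  chmod +x "$hooks_dir/$name"
}

# Usage (hypothetical repository path): enable_svn_hook /srv/svn/docs post-commit
```

After enabling the hook, replace the copied template content with the commands that run your DITA build.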
On UNIX-based systems, sample Git hooks are typically found in
Modify a copy of pre-commit.sample and save the result to your local repository as
Git “pre-commit” Hook Example
The sample hook below runs a build script before each commit:
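#!/bin/sh
# Get the absolute path of the `.git/hooks` directory
export GIT_HOOKS="$(cd "$(dirname "$(readlink "$0" || echo "$0")")" && pwd)"
# Set the absolute path of the build directory
export BUILD_FILES="${BUILD_FILES:?set BUILD_FILES to the absolute path of the build directory}"
# Set the absolute path of the DITA home directory
export DITA_HOME="${DITA_HOME:?set DITA_HOME to the absolute path of the DITA-OT directory}"
# Set the absolute path of the DITA home directory again (weird, but necessary)
# Execute the build script in the shell that is provided by the DITA start script;
# abort the commit (non-zero exit status) if the build fails
if ! echo "$BUILD_FILES/build_html.sh" | "$DITA_HOME/startcmd.sh"; then
  echo 'HTML build failed. Commit aborted.' >&2
  exit 1
fi
# Display OS X system notification via terminal-notifier
echo 'HTML build succeeded. Committing…' | /usr/local/bin/terminal-notifier -sound default
# Exit with status 0 so Git goes ahead with the commit
exit 0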
→ Get the gist at https://gist.github.com/infotexture/8635931.
Cornerstone Commit Actions
The Cornerstone Subversion client for Mac includes user interface options that
allow users to associate their own scripts with commit actions, independent of the
repository configuration (and without the assistance of a system administrator):
Cornerstone Commit Actions
Bypassing Commit Hooks
Commit hooks slow down the checkin process, since the system waits for a build
to finish before checking in the changes. Some clients and systems allow you to
circumvent commit hooks if necessary:
$ git commit --no-verify
Atlassian’s SourceTree provides an option to bypass hooks on the commit sheet:
Bypassing hooks with SourceTree
Dedicated CI Solutions
True continuous integration solutions combine the strengths of each of the options
outlined above and are intended for use in team environments.
CI servers communicate with a version control system, using commit hooks as a
foundation for additional process automation mechanisms.
If your developers use continuous integration tools to run automated tests and
build software binaries whenever they update their code, you may be able to use the
same solution to build your DITA deliverables too.
Leverage existing corporate infrastructure & developer expertise
Offload performance-intensive build tasks to a dedicated server
Automate other aspects of the publishing process, such as:
providing access to drafts on intranet for internal review & signoff
integrating documentation into the final software installers
publishing to company web site for immediate public access
CI systems run actions after revisions are shared, so bad commits can happen
Culprit behind unstable builds or failed tests is held publicly accountable
Dedicated server not necessarily required, but often advisable
May require support from developers and/or IT staff
While a broad range of commercial CI solutions are available, the open-source
offerings are among the most mature, actively maintained, and widely adopted:
CruiseControl – the original solution from ThoughtWorks (now open source)
Jenkins (formerly known as Hudson) – cross-platform open source CI server
Travis – hosted CI service used to build and test projects hosted at GitHub
(including the DITA Open Toolkit)
New solutions have been recently released by established players and hitherto
unknown startups as continuous deployment and DevOps gain momentum
The examples below are based on Jenkins, one of the most popular CI solutions.
Like watched folders, Jenkins jobs combine various settings, including:
access credentials and branches of the source code repository
conditions or events that trigger a build, and
actions to be performed when the conditions are fulfilled
(build script & post-build actions, e-mail notifications, file transfers, etc.).
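A post-commit hook is one way to raise such an event. This minimal sketch assumes a Jenkins job with the “Trigger builds remotely” build trigger enabled and an authentication token configured; the server URL, job name, and token are hypothetical placeholders, and depending on the server's security settings a user name and API token may also be required.

```shell
#!/bin/bash
# Minimal sketch: ask Jenkins to start a job, e.g. from a post-commit hook.
# Assumes the job's "Trigger builds remotely" option is enabled with a token.
# JENKINS_URL, the job name, and the token are hypothetical placeholders.
JENKINS_URL=${JENKINS_URL:-http://jenkins.example.com}

# Print the remote-trigger URL for job $1 with authentication token $2
# (job names with spaces or special characters would need URL encoding)
trigger_url() {
  printf '%s/job/%s/build?token=%s\n' "$JENKINS_URL" "$1" "$2"
}

# Send the request; --fail makes curl exit non-zero on HTTP errors
trigger_build() {
  curl --silent --show-error --fail -X POST "$(trigger_url "$1" "$2")"
}

# Usage in a hook (hypothetical job and token): trigger_build docs-html "$DOCS_BUILD_TOKEN"
```

With this approach, the hook returns as soon as Jenkins accepts the request, and the build itself runs on the CI server.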
The Jenkins dashboard includes an overview of jobs with information on the last
build for each job, and a “weather report” icon that represents the aggregated status
(stability) of recent builds:
Sample Jenkins dashboard
Jenkins Job View
The dashboard links to dedicated pages for each job, with additional information
on the build history, links to the workspace with the job output (build results), and
recent changes (the commit log from the version control system).
Sample Jenkins job view
Sample Jenkins Script
The script below generates output and copies the results to a web server:
#!/bin/sh
if [ "$1" = "ci" ]; then
  echo "Continuous integration check for branch $2"
  set -e # if one command fails then exit
  BRANCH="${2:?branch name required}"
  # Run documentation build script, which sets up the DITA-OT environment & runs Ant tasks to build output
  # Clean & re-create target output directory before moving the new generated output
  rm -rf "/var/www/$BRANCH/manual"
  mkdir -p "/var/www/$BRANCH/manual"
  # Copy all `output` subfolders to Web server "as-is"
  cp -r output/* "/var/www/$BRANCH/manual"
elif [ "$1" = "nightly" ]; then
  echo "Nightly check for branch $2"
  #make || RET=1
else
  echo "Unknown parameter: $1" >&2
  exit 1
fi
→ Get the gist at https://gist.github.com/infotexture/8742667.
Your operating system may already provide the tools you need for lightweight local
automation options such as daily builds or watched folders. If you’re just getting
started with process automation, these methods are usually a good way to begin.
If you need a way to verify your changes before committing revisions to a version
control system (or test the results afterward), commit hooks provide a smarter
alternative that augments the advantages of intentional user action with the
amenities of process automation.
As your appetite for automation increases and your team grows, you’ll soon be
ready to graduate to true continuous integration, allowing you to do more than just
build output—or perhaps even to continuous deployment solutions that provide
instant access to your latest documents for all your customers.
However you begin, and whatever process you may choose, I encourage you to
explore the possibilities for automation in your own publishing workflows.
References & Resources
1. launchd – http://launchd.info
2. Lingon (launchd GUI) – http://www.peterborgapps.com/lingon
3. LaunchControl (debugger) – http://www.soma-zone.com/LaunchControl
4. Hazel (folder watcher for Mac) – http://www.noodlesoft.com/hazel.php
5. CruiseControl (original CI server) – http://cruisecontrol.sourceforge.net
6. Jenkins (most popular CI server) – http://jenkins-ci.org
7. Travis (hosted CI for GitHub projects) – http://travis-ci.com
For updates, comments and the latest code samples, visit
A long-form version of this presentation is scheduled to appear as an article in the June
issue of the CIDM Newsletter.
For more information on the application of software development methodologies
to DITA authoring, be sure to visit Frank Shipley’s DITA Release Management
presentation tomorrow at 8:30.
Connect on GitHub or Twitter @infotexture.