Highlighting changes

Change bars are a traditional way of showing where a document has changed, so that reviewers can focus on the changes without having to read the entire document. A change bar feature typically prints a vertical bar in the margin next to any text that has changed.

The formatter may also highlight changed text in other ways. For example, deleted text may be displayed but overstruck, while new text may be underlined. Color can also be used, but keep in mind that not everyone can distinguish all colors, and many document printers are monochrome.

Change highlighting requires two steps:

  1. Add markup to the XML files to indicate where content has changed.

  2. Apply a special change highlighting stylesheet to format the changes.

Change markup

In order for a formatter to show changes, there must be some markup in the XML file to indicate which content has changed.

In DocBook, the revisionflag attribute can be added to any element, using any of the attribute's enumerated values added, deleted, changed, or off. You might wonder what the off value is for. It can be used to indicate that a change marked on a containing element does not apply to the current element.
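
As a brief illustration, the following fragment shows how the four values might be combined. The section content is invented for this example; only the revisionflag attribute and its values come from DocBook. The section is flagged as changed, and one paragraph inside it uses off to opt out of that flag:

    <section revisionflag="changed">
      <title>Installing the software</title>
      <!-- off: the change on the containing section does not apply here -->
      <para revisionflag="off">This paragraph was not touched.</para>
      <!-- added: new content in this revision -->
      <para revisionflag="added">Run the installer as an administrator.</para>
      <!-- deleted: content removed in this revision, kept so it can be displayed -->
      <para revisionflag="deleted">Reboot the machine when the installer finishes.</para>
    </section>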

Adding revisionflag attributes by hand is a tedious process. You not only have to find the differences, you have to make sure you add the attributes to the right elements. And then, once you have produced a highlighted version, you will probably have to go through the document and remove all the revisionflag attributes to start the next cycle of revision.

Fortunately, if you maintain copies of your revisions, you can use a program named DiffMk to compare two versions and generate a temporary version with revisionflag markup. A revision control system such as CVS or SVN can easily store and retrieve versions for comparison.

Using DiffMk

The DiffMk program, written by Norman Walsh, takes two versions of a document as input, compares them, and adds change markup that can be used by a formatter. It is written in Java and is available as a free download from the DiffMk SourceForge project. (An earlier version written in Perl is distributed as a perl-diffmk package by some Perl distribution centers. It is not covered here.)

Here is how you set up and use the Java version of DiffMk:

  1. Download and unpack the DiffMk distribution from the SourceForge project at http://sourceforge.net/projects/diffmk/.

  2. Locate the diffmk.xml file in the distribution (in the config subdirectory). It is required for configuring the program.

  3. Locate the DiffMk.properties file in the distribution (it originates in the config subdirectory). Copy or edit that file to make sure the config property value is a relative path to the diffmk.xml configuration file.

  4. You will also need the Java resolver.jar file that is used for XML catalogs. DiffMk is dependent on it, even if you do not use an XML catalog file. See the section “Using catalogs with Saxon” for information on getting and using resolver.jar.

  5. Extract two versions of a DocBook document from a revision control system for comparison.

  6. Set up a Java CLASSPATH that includes the following:

    • bin/diffmk.jar from the DiffMk distribution.

    • The directory containing the DiffMk.properties file.

    • resolver.jar

    • The directory containing your CatalogManager.properties file, which is used by resolver.jar.

  7. Process your two document versions with a Java command like the following (assuming the CLASSPATH is set):

    java \
            net.sf.diffmk.DiffMk \
            --output diffs.xml \
            --words \
            old-version.xml \
            new-version.xml
    

The output file diffs.xml will contain a version of the document with revisionflag attributes added.

The --words option compares at the word level to provide more detail. It will insert phrase elements as needed to hold the revisionflag attributes. By showing deleted text as well as added text, it sometimes makes the change document hard to read. If you just want reviewers to read changed parts, then you can leave out the --words option.

You can set options for DiffMk in the DiffMk.properties file, or on the command line. Also, the program's configuration file diffmk.xml specifies the attribute and its values to be used for change markup. It allows the program to be used with other grammars besides DocBook.

HTML change output

Once you have generated a version of a document with change markup using DiffMk, you can apply a stylesheet to format it to display the changes.

For HTML output, change bars displayed in the margin are not supported by the HTML standards or by browsers. Color can be used, as can underline and overstrike at the word level. These would typically be applied with a CSS stylesheet to HTML class attributes generated by the stylesheet.

The DocBook XSL distribution comes with a stylesheet to do just that. If you apply the html/changebars.xsl stylesheet to a change version generated by DiffMk, then the resulting HTML file will apply the following styles:

  • class="added" has text-decoration: underline; and a yellow background color (#FFFF99).

  • class="deleted" has text-decoration: line-through; and a pink background color (#FF7F7F).

  • class="changed" has a green background color (#99FF99).

The changebars.xsl stylesheet imports the stock docbook.xsl stylesheet and adds some customizations to insert the CSS style element and convert the revisionflag attributes into class attributes in the output. By including the CSS styles in the file, it is more portable because it does not require a separate CSS stylesheet file.

If you have a customization of the HTML stylesheet that you would rather use to format the results, then copy the changebars.xsl stylesheet and change the xsl:import statement to import your stylesheet instead of the stock docbook.xsl. It will work with chunking or nonchunking customizations.
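
For example, the edit to your copy of changebars.xsl might look like the following sketch. The filename mycustom-html.xsl is hypothetical and stands for your own customization layer, and the original href is assumed to be docbook.xsl in the same directory:

    <!-- Before: the copy still imports the stock stylesheet -->
    <!-- <xsl:import href="docbook.xsl"/> -->

    <!-- After: import your own customization instead (hypothetical filename) -->
    <xsl:import href="mycustom-html.xsl"/>

Because your customization itself imports the stock stylesheet, the change highlighting templates in the copy still take precedence over both.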

You can change the colors and styles of the change markup by customizing the template named system.head.content from the changebars.xsl file.
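
The following is a minimal sketch of such a customization, not the stock template. It assumes that system.head.content in changebars.xsl does nothing more than emit an HTML style element, so check the template in your copy before replacing it. The import path and the color values are placeholders chosen for this example:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <!-- Assumed path: adjust to where your DocBook XSL distribution lives -->
      <xsl:import href="docbook-xsl/html/changebars.xsl"/>

      <!-- Overrides the imported template of the same name -->
      <xsl:template name="system.head.content">
        <xsl:param name="node" select="."/>
        <style type="text/css">
          .added   { text-decoration: underline; background-color: #CCFFCC; }
          .deleted { text-decoration: line-through; background-color: #FFCCCC; }
          .changed { background-color: #FFFFCC; }
        </style>
      </xsl:template>

    </xsl:stylesheet>

The class names match the class attributes listed above, so only the CSS rules change. A template in the importing stylesheet replaces the imported one completely, so anything else the original template wrote to the HTML head would have to be repeated here.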

XSL-FO change output

For print output, actual change bars are possible, in addition to other highlighting. Change bars were not part of the XSL-FO 1.0 specification, but they are included in the 1.1 specification, which reached final W3C Recommendation stage in December 2006. So XSL-FO processor vendors are adding change bar support to adhere to the new standard.

XSL-FO 1.1 introduces the fo:change-bar-begin and fo:change-bar-end formatting objects. These are empty elements, and are used to bracket the start and end of a change area. This method allows the change area to cross element boundaries. The formatting properties let you control the color, offset, width, placement and style of the generated change bars.
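
As a rough sketch of what a stylesheet might generate, the fragment below brackets one sentence with the two formatting objects. The class name rev1 and the property values are arbitrary choices for illustration. The change-bar-class property pairs each end element with its begin element:

    <fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
      Unchanged text before the change.
      <!-- Start of the change area; the properties describe the bar itself -->
      <fo:change-bar-begin change-bar-class="rev1"
                           change-bar-style="solid"
                           change-bar-color="black"
                           change-bar-width="1pt"
                           change-bar-offset="6pt"
                           change-bar-placement="start"/>
      This sentence gets a bar in the margin.
      <!-- End of the change area; the class must match the begin element -->
      <fo:change-bar-end change-bar-class="rev1"/>
      Unchanged text after the change.
    </fo:block>

Because both elements are empty markers rather than containers, the begin element could sit in one block and the matching end element in a later one.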

Currently three XSL-FO processors are known to support change bars: Antenna House's XSL Formatter (version 4), PTC's Arbortext, and RenderX's XEP.

There is not yet a stylesheet in the DocBook XSL distribution for XSL-FO change output as there is for HTML output. However, a freely downloadable stylesheet named changebars.xsl is available from DeltaXML Ltd. at http://www.deltaxml.com/library/how-to-compare-docbook.html. The company sells the DeltaXML difference engine, and provides a free comparison service to try it out. The stylesheet works with their DocBook change output, as well as the output from DiffMk.

The DeltaXML changebars.xsl stylesheet imports the stock fo/docbook.xsl stylesheet. You may need to edit the changebars.xsl file to change the relative path in the xsl:import statement to find your local copy of the DocBook stylesheets.