Graphic file locations

When you are including graphic images in your documents, you need to manage the locations of the graphics files. It helps to know that the handling of graphics files is quite different for HTML and FO outputs.

Note

An XSLT processor cannot copy graphics files to an output location. Any file copying that needs to be done must be done outside of the stylesheet process, using a tool such as Make or Ant. To help identify the filenames to be copied, you can use a contributed utility stylesheet named xmldepend.xsl available from the DocBook SourceForge SVN repository. When you process a DocBook document with this stylesheet, it lists all the image pathnames in the file.
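As one illustration of copying done outside the stylesheet process, the following is a minimal sketch of an Ant build file. The project name, the property names, and the images and output/html directories are assumptions made up for this example; substitute your own layout.

```xml
<?xml version="1.0"?>
<project name="docbook-html" default="copy-images" basedir=".">

  <!-- Hypothetical locations: adjust to match your own source and output. -->
  <property name="src.images" location="images"/>
  <property name="out.dir"    location="output/html"/>

  <!-- Copy the image files next to the generated HTML files.
       The copy task creates the target directory if it is missing. -->
  <target name="copy-images">
    <copy todir="${out.dir}/images">
      <fileset dir="${src.images}">
        <include name="**/*.png"/>
        <include name="**/*.jpg"/>
        <include name="**/*.gif"/>
      </fileset>
    </copy>
  </target>

</project>
```

This sketch copies every image under the source directory rather than only the ones a document references. To narrow the set, you could replace the include patterns with the pathnames your document actually uses.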

HTML output directory

When a DocBook XML file with an imagedata or graphic element is processed with one of the HTML stylesheets, the graphics file is not opened and read; only the file pathname is passed through to the HTML IMG tag. The image file itself does not have to be present when the HTML is generated, so no error is reported during processing if the graphics file is missing. But it does need to be present at the address specified in the IMG tag when the HTML file is viewed.
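The pass-through can be pictured with a small fragment like the following. The pathname figures/diagram.png is hypothetical, and the HTML shown in the comment is only an approximation that omits the surrounding markup and any other attributes the stylesheet writes.

```xml
<!-- DocBook source: only the fileref string is used; the file is not read. -->
<mediaobject>
  <imageobject>
    <imagedata fileref="figures/diagram.png" format="PNG"/>
  </imageobject>
</mediaobject>

<!-- Approximate HTML result: -->
<!-- <img src="figures/diagram.png"> -->
```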

For this reason, managing graphics files for HTML output means managing their locations in the output, relative to the HTML files that are generated. When you generate HTML files and place them on a server or other accessible location, you also need to place the graphics files with them yourself. The XSLT processor will not copy image files to the output location.

Where you place a graphics file in the output area depends on the pathname used to reference it in the HTML IMG tag. That pathname comes from the imagedata or graphic element in the XML document. Those elements let you specify an image path in two ways: with a fileref attribute or an entityref attribute.

Using fileref

A fileref attribute value is interpreted as a literal pathname string. It can be modified in three ways before it is output as the src attribute.

  • If the fileref value does not have a filename extension that indicates the format, then one is appended to the filename. The graphic element must have a format attribute for this to work.

  • If the img.src.path parameter is set, its value is prepended to each fileref value that is not an absolute path. This parameter lets you specify the path to the image files when you build the HTML. If its value is images/ then a fileref value of caution.png is written to the HTML file as src="images/caution.png". Be sure to include the trailing slash. This parameter permits you to specify just the filename in your graphics elements, without specifying the details of the location. If you later move the output directory, you can just change the parameter value, and not have to edit every graphics instance in your document.

  • If your document uses XIncludes, then the path may be altered by xml:base attributes inserted by the XInclude processor. See the section “XIncludes and graphics files” for details.

When you build your HTML, you must place the image file in the location specified by the fileref, as modified by the above points. If the result is a relative pathname, then the graphics file must be placed relative to the final output location of the HTML files. If it is an absolute pathname, then the graphics file should be placed relative to the document root of the HTTP server for the HTML files. The fileref attribute value or the img.src.path parameter can also be an absolute URI, to the same or different website.
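One way to set img.src.path is in a stylesheet customization layer, sketched below. The import URL assumes the stock DocBook XSL distribution, normally mapped to a local copy through a catalog; replace it with the path to your own copy if needed. The images/ value is just the example value used above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock HTML stylesheet; adjust as needed. -->
  <xsl:import
      href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Prepended to every fileref that is not an absolute path.
       Note the trailing slash. -->
  <xsl:param name="img.src.path">images/</xsl:param>

</xsl:stylesheet>
```

With this customization, a fileref of caution.png is written as src="images/caution.png", so the file must be placed in an images directory below the HTML output directory.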

Using entityref

If you require more flexibility in handling a graphics file, then consider using an entityref attribute with an XML catalog instead. An entityref attribute has an XML attribute type of ENTITY in its declaration. This means the attribute value is not interpreted as a literal pathname string, but as an entity name. The entity name must correspond to a system entity declared in the current document's DTD.

Typically, such system entities are declared in the internal subset of the DTD within the DOCTYPE declaration of the document. The following is an example.

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
               "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY  screenshot3  SYSTEM  "/usr/local/graphics/tutorial3.png"  NDATA  PNG>
...
]>
<book>
...
<imagedata  entityref="screenshot3"/>

The HTML output from processing this example file will include:

<IMG  src="/usr/local/graphics/tutorial3.png">

An important difference from fileref is that an entityref is always resolved to an absolute URI. If you enter a relative path, then it is resolved relative to the absolute path of the document that declares the entity. That could be the current document or a DTD customization file. This behavior comes from the use of the unparsed-entity-uri() XSL function in the DocBook template, and the XSL standard says that function always returns an absolute URI.

Absolute paths in HTML src attributes are a problem if you put the HTML files on a webserver. It is likely the absolute path will not match the document root of the HTTP server, so such references will result in missing graphics when the HTML file is viewed. Relative paths are preferred, but there is no way to get relative paths when using entityref. The img.src.path parameter does not help either: it has no effect on entityref paths, because it is not prepended to absolute paths.

However, if you put your entity declarations in a separate file, and use an XML catalog to find the declarations file, then you can substitute different pathnames at runtime by using a different catalog. For example, if you move the above entity declaration to a file named mygraphics.ent, you can reference it as follows:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
               "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY  %  graphicset  SYSTEM  "graphics/mygraphics.ent">
%graphicset;
...
]>
<book>
...
<imagedata  entityref="screenshot3"/>

This arrangement uses a parameter entity to specify the location of the file containing the declarations, then it immediately uses a parameter entity reference %graphicset; to pull in the file's contents at that point in the DTD.

You can swap declarations files at runtime by using a catalog entry such as the following:

        <system
            systemId="graphics/mygraphics.ent"  
            uri="../graphics/myothergraphics.ent"/>

You just need to make sure that your alternate graphics declarations file declares the same set of entity names, and that they resolve to full pathnames that work for the HTML output.
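For context, a complete catalog file containing that entry might look like the following sketch. It uses the standard OASIS XML Catalogs namespace; the two .ent pathnames are the example names from above.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- When the document asks for graphics/mygraphics.ent,
       hand back the alternate declarations file instead. -->
  <system
      systemId="graphics/mygraphics.ent"
      uri="../graphics/myothergraphics.ent"/>

</catalog>
```

How the catalog file is selected depends on your processor. With xsltproc, for example, it is typically named in the XML_CATALOG_FILES environment variable.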

Since a system entity uses a SYSTEM identifier and an optional PUBLIC identifier to specify the pathname to the graphics file, you might think you could use a catalog entry for each graphics file. Unfortunately, this does not work for HTML output. A catalog resolver is triggered when a requested file is to be opened. During HTML processing, the graphics files themselves are never opened. Only their pathname is passed through to the HTML, so such catalog entries would not be used.

FO input directory

Generating PDF from a DocBook file is a two-step process. First the DocBook FO stylesheet is applied to the XML document to generate an intermediate XSL-FO file. Then the XSL-FO is converted to PDF by an XSL-FO processor such as FOP. In the first step, each imagedata and graphic element is handled in a manner similar to the HTML processing described above. That is, the pathname in a fileref or entityref attribute is passed through to an XSL-FO graphics element:

<fo:external-graphic src="url(graphics/tutorial.png)">

The path can be modified in three ways before output:

  • If the fileref value does not have a filename extension that indicates the format, then one is appended to the filename. The graphic element must have a format attribute for this to work.

  • If you set the stylesheet parameter img.src.path, then its value is prepended to any fileref that is not an absolute path. This allows you to store your images in a central location rather than with individual documents, for example.

  • If your document uses XIncludes, then the path may be altered by xml:base attributes inserted by the XInclude processor. See the section “XIncludes and graphics files” for details.

As with HTML processing, the graphics file itself is not opened during the stylesheet processing, so the graphics file does not actually need to be present. However, in the second phase, the XSL-FO processor must open such graphics references to incorporate the graphics data into the PDF file. So it is during the XSL-FO processing phase that the file must be readable at the graphics element's address, possibly modified by the above points.
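A customization layer for the FO stylesheet can point img.src.path at such a central location, as in the following sketch. The import URL assumes the stock DocBook XSL distribution, and the directory /usr/local/graphics/ is only an example; use whatever location your XSL-FO processor can read.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Assumed location of the stock FO stylesheet; adjust as needed. -->
  <xsl:import
      href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Example central image directory, written as a file URI.
       It is prepended to each fileref that is not an absolute path. -->
  <xsl:param name="img.src.path">file:///usr/local/graphics/</xsl:param>

</xsl:stylesheet>
```

With this setting, a fileref of tutorial.png should reach the XSL-FO file as a src value pointing into /usr/local/graphics/, which is where the XSL-FO processor will look for the file in the second step.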

Once the second stage is completed, the PDF file contains the graphics data, so access to the graphics files is no longer needed. The PDF file can be moved as needed without losing the graphics.