An Introduction to XML and DITA

XML (Extensible Markup Language)-based documentation is getting a lot of attention as a better way to develop and disseminate content than traditional technical writing methods. Bob Boiko, from the Society for Technical Communication, writes that XML-based development can “transform what you do from documentation to delivering information products that drive your organization forward” (Intercom, April 2007). XML offers many potential benefits, not only for the traditional end user (the customer) but also for support personnel, marketing staff, engineers, and more.

Developing content using XML is based on the assumption that information is no longer transmitted in a monolithic book with all the knowledge captured in a linear narrative from introduction to conclusion. It is now conceived as small chunks that answer specific questions such as “What is…?” and “How do I…?” This approach addresses the needs of today’s online help systems, as well as today’s users, who want quick, to-the-point answers.

In addition, content developed using XML is independent of the format and medium in which it will be presented, including web and print. Content can be developed once and reused throughout the document set. Content can be intelligently searched and organized. Users can easily give feedback to the developers. And content can be developed with open-source tools.

However, developing XML-based documentation from scratch can be time-consuming and expensive. The Darwin Information Typing Architecture (DITA) is a standardized architecture that is based on principles of modular reuse and extensibility. Using DITA can capture the benefits of XML-based publishing while making development faster and cheaper.

First, some important definitions about XML

XML is a markup language. A markup language is a set of start and end tags you can use to “mark up” text with additional information about your content.

A tag set (or element) identifies the text contained within the tags. For example, the “<xmp>…text…</xmp>” tag set tells processing software that the text is part of an example. The tag set information can be used to tell the display media how to display the text. For example, a summary might be in bold text while an example might be indented and italicized. The tag set can also identify the information for search and processing purposes. For example, a search for definitions might extract all information marked with a <term> tag.
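As a minimal sketch of what such markup looks like, the fragment below tags a term, a summary, and an example. The <entry> and <summary> element names are hypothetical and chosen only for illustration; <term> and <xmp> are the tags mentioned above.

```xml
<!-- Hypothetical markup: each tag set names the kind of text it contains -->
<entry>
  <term>Markup language</term>
  <summary>A set of start and end tags that add information about content.</summary>
  <xmp>The tags around this sentence mark it as an example.</xmp>
</entry>
```

A search tool could pull out every <term> element to build a glossary, while a stylesheet could decide how each <summary> and <xmp> is displayed.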

A document type definition (DTD) is a file that defines the markup rules. For example, a DTD may define <head> and <body> elements, and say that you can’t put <head> after <body>.
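The ordering rule described above can be sketched as a small DTD embedded in the document it governs. The <doc> root element and the sample text are assumptions made for this illustration; a real DTD is usually kept in a separate file and defines many more elements.

```xml
<!DOCTYPE doc [
  <!-- A doc must contain exactly one head followed by exactly one body -->
  <!ELEMENT doc (head, body)>
  <!ELEMENT head (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<doc>
  <head>Installing the widget</head>
  <body>Unpack the widget and connect the power cable.</body>
</doc>
```

A validating parser should reject a version of this document in which <body> comes before <head>, because the content model lists the two elements in a fixed order.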

A stylesheet (Cascading Style Sheets [CSS] or Extensible Stylesheet Language [XSL]) is a mapping of XML elements to display properties. For example, a stylesheet could say that <codeblock> content should be displayed using the Courier font.

An XSLT (XSL Transformations) transform is a mapping of one XML structure to another format. This allows you to transform your XML source into HTML for viewing on the web or into PDF for printing.
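A minimal sketch of such a transform is shown below. It assumes a simple source document that contains <title>, <p>, and <codeblock> elements, and the choice of HTML output elements is illustrative rather than taken from any particular toolkit.

```xml
<?xml version="1.0"?>
<!-- Sketch only: maps a few source elements to HTML equivalents -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Wrap the whole result in an HTML page -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- The source title becomes a top-level heading -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <!-- Paragraphs map directly to HTML paragraphs -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Code blocks are shown in Courier, preserving line breaks -->
  <xsl:template match="codeblock">
    <pre style="font-family: Courier, monospace"><xsl:apply-templates/></pre>
  </xsl:template>
</xsl:stylesheet>
```

Elements without a template of their own fall through to the XSLT built-in rules, which process their children and copy their text, so the same XML source can be given to a different transform to produce a different output format.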

Some problems with XML

You can use XML and custom DTDs, stylesheets, and XSLTs to develop high-quality documentation tailored to your content and customers. However, custom tools can take a long time to develop, and they can become obsolete if the nature or needs of the documentation change. Using the standardized framework of DITA, you can capture the benefits of XML-based publishing while making development faster and cheaper.

Some important definitions about DITA

DITA is a structured architecture that organizes XML into standardized but highly customizable components.

A topic (or chunk) is a file that treats a specific piece of subject matter. The subject of a topic should be small enough to be addressed within a few paragraphs, an amount that a user can reasonably read online without scrolling past a single screen.

An element is a discrete piece of information contained within a topic. Examples of elements include title, author, list item, and cross-reference. Each element is defined by a tag set and has a defined structure and semantics. Each DITA topic has required and optional elements. For example, every topic has a title element identified by the <title> tag and defined by its DITA specification.

A DITA topic stands alone because it does not depend on users reading a particular piece of information before or after it. Topics, then, can be freely combined as appropriate for various needs. For example, topics A, B, and C could be used in one task-oriented presentation while B, C, and D could be used in a different task-oriented presentation.

DITA defines three types of topics: task, concept, and reference.

A task topic contains steps describing how to do something. The instructions typically take the form of a numbered list with an imperative sentence for each list item. Task elements might include:

  • Rationale: why or when a user would want to perform this task
  • Prerequisites: what a user should do before performing this task
  • Responses: what the user should see as a result of performing this task
  • Examples: examples of what information to enter or what to do
  • Postrequisites: what to do next after this task is completed
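The items above correspond roughly to elements of the DITA task topic type, as the sketch below suggests. The id value and the sample text are invented for illustration, and the mapping of "rationale" to <context> is an approximate reading rather than an official equivalence.

```xml
<task id="replace-battery">
  <title>Replacing the battery</title>
  <taskbody>
    <!-- Prerequisites: what to do before starting -->
    <prereq>Power off the ABC Widget.</prereq>
    <!-- Rationale: why or when to perform the task -->
    <context>Replace the battery when the low-power light stays on.</context>
    <steps>
      <step>
        <cmd>Open the battery cover.</cmd>
      </step>
      <step>
        <cmd>Insert the new battery.</cmd>
        <!-- Example for this step -->
        <stepxmp>Align the plus sign on the battery with the plus sign in the compartment.</stepxmp>
        <!-- Response: what the user should see after this step -->
        <stepresult>The power light turns green.</stepresult>
      </step>
    </steps>
    <!-- Postrequisites: what to do after the task is completed -->
    <postreq>Close the battery cover and power on the widget.</postreq>
  </taskbody>
</task>
```

Each <cmd> holds one imperative sentence, which is what a stylesheet typically renders as an item in the numbered list.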

A concept topic defines a major abstraction such as a process or function. It might also include elements like examples or graphics.

A reference topic is factual in nature. It includes elements related to properties and syntax. The breakdown of reference information is often prescribed by conventions, such as those used in documenting a programming language.
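A reference topic might be sketched as follows. The command name, flag, and description are hypothetical; only the element names come from the DITA reference topic type.

```xml
<reference id="reset-command">
  <title>reset command</title>
  <refbody>
    <!-- Syntax of the item being documented -->
    <refsyn>reset [--force]</refsyn>
    <!-- Properties: one entry per option, with type, value, and description -->
    <properties>
      <property>
        <proptype>Flag</proptype>
        <propvalue>--force</propvalue>
        <propdesc>Skips the confirmation prompt.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```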

Chunking is the process of defining the topics and elements by applying XML tags. For example, the title of each topic is labeled with “<title>…text…</title>”. A prerequisite in a task is labeled with “<prereq>…text…</prereq>”.

A portion of a DITA topic might look like:

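<!-- The id value is illustrative; DITA requires an id on the topic and places block content inside <body> -->
<topic id="preface">
<title>Preface</title>
<body>
<p>This manual describes the use of the ABC Widget. It includes the following chapters:</p>
<ul>
<li>Overview</li>
<li>Components</li>
<li>Programming Instructions</li>
</ul>
</body>
</topic>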

A map is a document that collects and organizes topics. It might be the table of contents for a document that will be produced as a book or it might be the options returned from an online search.
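The reuse of topics A, B, C, and D described earlier can be sketched as two maps that point to overlapping sets of topic files. The file names and titles are hypothetical, and each map would normally live in its own file.

```xml
<!-- First map file: topics A, B, and C -->
<map>
  <title>ABC Widget Quick Start</title>
  <topicref href="topic-a.dita"/>
  <topicref href="topic-b.dita"/>
  <topicref href="topic-c.dita"/>
</map>

<!-- Second map file: topics B, C, and D -->
<map>
  <title>ABC Widget Maintenance Guide</title>
  <topicref href="topic-b.dita"/>
  <topicref href="topic-c.dita"/>
  <topicref href="topic-d.dita"/>
</map>
```

Topics B and C are written once and appear in both deliverables. Nesting one <topicref> inside another would express a parent and child relationship in the resulting table of contents.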

Specialization is the process of defining a new element, topic, or map. Because specializations can be based on existing elements, topics, or maps, you only need to define what is different in the new item.
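One way to see specialization at work is the class attribute that DITA attaches to every element, which records the element's ancestry. The sketch below writes those values out explicitly for a small task topic. Authors do not normally type them, because the DTD supplies them as defaults, and the values shown are believed to match the standard task specialization but should be checked against the DITA specification. The id and text are invented for illustration.

```xml
<!-- Class values are written out here only to show each element's ancestry -->
<task id="open-cover" class="- topic/topic task/task ">
  <title class="- topic/title ">Opening the battery cover</title>
  <taskbody class="- topic/body task/taskbody ">
    <!-- steps specializes the generic ordered list -->
    <steps class="- topic/ol task/steps ">
      <!-- step specializes the generic list item -->
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Slide the cover toward the base.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
```

Because <step> declares that it is a kind of list item, a processor that has no rule for task/step can fall back to its rule for topic/li, which is why a new specialization only needs to define what is different.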

How does DITA work?

DITA provides a standardized set of elements, topics, maps, stylesheets, and XSLTs to speed the process from content development to production. The DITA framework is broad enough to address most documentation needs, and where it falls short, DITA specialization allows customization that is efficient and cost-effective.

A wide range of tools supports DITA development, including proprietary tools like Adobe FrameMaker and open-source tools like the DITA Open Toolkit.

Using the DITA framework to develop XML-based documentation can speed development of dynamic information delivery, while reducing specialization costs for your organization.

Phoenix Technical Publications has provided complete technical writing and documentation services in Palo Alto and the San Francisco Bay Area for over 25 years.