Automatic generation of DAISY talking books from Word

Written By: Peter Abrahams
Content Copyright © 2007 Bloor. All Rights Reserved.

The latest versions of Microsoft Word create XML output called Open XML. This makes it much easier for programs to understand the structure of the document and use it as input to other processes. One important category of processes is conversion, such as converting a Word document into a DAISY talking book format.

Microsoft and the DAISY Consortium have announced a joint development project. This open technical collaboration will yield a free, downloadable plug-in for Microsoft Office Word that will enable users to translate Open XML-based documents into DAISY XML.

This will make it practical to provide a much larger range of documents to print-disabled users.

For those of my readers not familiar with DAISY, it is a format developed specifically to make books accessible to print-disabled users. This includes people who are blind or visually impaired, people with dyslexia or with cognitive and learning difficulties, people who face cultural barriers such as different languages or alphabets, and people who cannot physically handle a book. Most of these users can be helped by having the book read to them, and some will be helped by the provision of alternative formats such as Braille or large print.

A person who is able to read a normal printed page will use a variety of visual cues in order to navigate text, parse information, speed-read, skim over sections and locate the data of interest. The cues include the table of contents, headings, lists, highlighting, indentations, the index and the glossary to guide the eye as it dances across the pages.

The structure within DAISY publications makes it possible to navigate quickly by heading or page number and to use indexes and references, all with correctly ordered, synchronized audio and text. There are a variety of DAISY readers ranging from PC programs to stand-alone highly portable devices. The readers present the material in a form suitable for the user and also provide navigation techniques such as:

  • List headings.
  • Skim read (first few words of a paragraph).
  • Skip to next paragraph, or next bullet.
  • Search for word.
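Two of these techniques can be illustrated with a minimal sketch. It assumes the book has already been reduced to a list of (kind, text) pairs; that representation, and the names list_headings and skim, are hypothetical and are not taken from any actual DAISY reader.

```python
# Assumed representation: each block is a (kind, text) pair, where kind is
# "h1".."h6" for headings and "p" for paragraphs. This is illustrative only.
HEADING_KINDS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def list_headings(blocks):
    """Return the text of every heading, in document order."""
    return [text for kind, text in blocks if kind in HEADING_KINDS]


def skim(blocks, words=5):
    """Return the first few words of each paragraph."""
    return [" ".join(text.split()[:words]) for kind, text in blocks if kind == "p"]
```

Both functions depend entirely on the structure being present in the document: without the heading and paragraph markers there is nothing to navigate by.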

The challenge has been that creating DAISY-formatted documents from other sources is a manual process in which the relevant XML tags are added by hand. The process is therefore slow, expensive and error-prone, and this has limited the quantity of material available.

The cost can be justified for books that will be read by a large number of print-disabled users; it is much more difficult to justify when the total number of readers is small and the number of print-disabled users may be in single figures. Examples include academic papers, internal company processes, university course material and specialised magazines.

The majority of these documents are created using Microsoft Word. If the Word document could be converted to DAISY automatically with a click of a button (save as DAISY) then the cost would be minimal and any of these documents could be supplied in DAISY format either as a standard option or on-demand.

Documents that are not available in Word format may also be able to benefit from this technology. For example, a document could be scanned and converted into a Word file with an optical character recognition (OCR) program. Word can then be used to add the structure and semantics of the document before passing it to the DAISY converter.

The Open XML to DAISY plug-in will convert any Word document into a DAISY document. This process will be simple and accurate if the original Word document has been formatted using styles. Styles such as headings and bullets can be converted directly into the equivalent structures in DAISY.
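To illustrate why styled documents convert so directly, here is a minimal sketch of a style-to-element lookup. The mapping table, the function name convert_paragraphs and the (style, text) input are assumptions made for illustration; they are not the plug-in's actual code or its real mapping.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from Word style names to DAISY-like element names.
STYLE_TO_ELEMENT = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Heading 3": "h3",
    "List Bullet": "li",
    "Normal": "p",
}


def convert_paragraphs(paragraphs):
    """Turn (style_name, text) pairs into a list of XML fragment lines."""
    lines = []
    in_list = False
    for style, text in paragraphs:
        # Unknown styles fall back to a plain paragraph.
        element = STYLE_TO_ELEMENT.get(style, "p")
        # Open a list before the first bullet, close it after the last one.
        if element == "li" and not in_list:
            lines.append('<list type="ul">')
            in_list = True
        elif element != "li" and in_list:
            lines.append("</list>")
            in_list = False
        lines.append(f"<{element}>{escape(text)}</{element}>")
    if in_list:
        lines.append("</list>")
    return lines
```

Because each style name carries its meaning explicitly, the conversion is a simple lookup with no guesswork involved.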

If the document has not used styles, the plug-in will attempt to infer the document structure by looking for formatting such as ‘bold, centred, 16 point’ and assuming that it is a level one heading. This process will often work, but not always: real structures may be missed, and others may be inferred that do not exist.
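A minimal sketch of this kind of inference follows. Only the ‘bold, centred, 16 point’ rule comes from the description above; the Formatting class, the level two rule and the default body size of 11 points are assumptions added for illustration, not the plug-in's actual heuristics.

```python
from dataclasses import dataclass


@dataclass
class Formatting:
    bold: bool
    centred: bool
    point_size: float


def infer_heading_level(fmt, body_size=11.0):
    """Return a guessed heading level (1 or 2), or None for body text."""
    # Rule from the text: bold, centred, 16 point is taken as a level one heading.
    if fmt.bold and fmt.centred and fmt.point_size >= 16:
        return 1
    # Assumed extra rule: bold text larger than the body size is a level two heading.
    if fmt.bold and fmt.point_size > body_size:
        return 2
    return None
```

The weaknesses are easy to see: a heading set in large but non-bold type is missed, while a bold, oversized line of emphasis in the body text would be wrongly treated as a heading.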

I am a great believer in the use of styles as they simplify the process of producing well-formatted documents and also help all users navigate around the document. This new plug-in is just another reason why styles should be used. I would strongly recommend that anyone who intends to use this plug-in should also ensure that the authors and editors understand and use styles. A useful introduction to the subject (Writing Accessible Electronic Documents with Microsoft Word) is available as a free download.

In addition to clear benefits for the print-disabled community, the Open XML to DAISY XML translator also offers the potential for further innovation in the information-intensive markets of publishing, training and education, for example:

  • A foreign language student could have text read to them in the foreign language whilst seeing the text highlighted on the page; this could be of particular help when the alphabet is different (Cyrillic, Hebrew, Arabic, or Chinese).
  • A person who cannot hold a book, for example because of RSI, the effects of thalidomide or being confined to bed, could scan and read the book with ease.
  • Practical training could be enhanced by enabling a student to listen to instructions whilst carrying out a procedure.

This plug-in is going to be provided as open source via SourceForge. This means that, besides the direct benefit of the plug-in, it can also be used as an example for further plug-ins that convert to other formats.

The initial plug-in is planned to be available in March 2008 and the intent is to extend it to include other Office formats such as Excel and PowerPoint over time.

All in all, an excellent announcement.