Automated testing of website PDFs

Written By: Peter Abrahams
Published:
Content Copyright © 2006 Bloor. All Rights Reserved.

Discussion of website testing for usability, accessibility and conformance to standards has concentrated on HTML content. This is true of both user testing and automated testing. Real user testing will pick up problems in technologies other than HTML, such as JavaScript, GIF, PDF (documents and forms), Flash and AJAX, but the concentration of effort has been on HTML.

Automated testing has so far all but ignored non-HTML content, doing no more than warning that other technologies are in use and need to be checked. Modern web sites have a significant minority of their content in non-HTML formats, especially PDF documents and forms. SiteMorse have just announced the first steps into automated testing of PDF documents. The next release will perform twenty-eight checks on PDF documents to ensure users do not experience problems such as broken links or failing email addresses. It will also test whether or not PDFs are utilising the accessibility features provided by Adobe.
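
To make the link and email checks concrete, here is a minimal Python sketch of the kind of test involved. It is not SiteMorse's implementation, whose details are not public in this article. The function names `extract_link_targets` and `find_malformed_mailto` are hypothetical, and the sketch assumes the link actions are stored uncompressed in the file and contain no escaped parentheses; a real checker would need a proper PDF parser and would also have to request each URL to see whether it is broken.

```python
import re

# Matches uncompressed URI actions such as: /URI (http://example.com/page)
# Assumption: the target string contains no escaped parentheses.
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")

# Deliberately loose syntactic check; it cannot prove the mailbox exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def extract_link_targets(pdf_bytes: bytes) -> list[str]:
    """Return link targets found in uncompressed URI actions."""
    return [match.decode("latin-1") for match in URI_PATTERN.findall(pdf_bytes)]


def find_malformed_mailto(targets: list[str]) -> list[str]:
    """Return mailto: targets whose address is not syntactically plausible."""
    malformed = []
    for target in targets:
        if target.lower().startswith("mailto:"):
            # Drop any "?subject=..." suffix before checking the address.
            address = target[len("mailto:"):].split("?", 1)[0]
            if not EMAIL_PATTERN.match(address):
                malformed.append(target)
    return malformed
```

The remaining `http` targets would then be fetched one by one to find the links that no longer resolve.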

This is excellent news for people who are blind or partially sighted and who often find it difficult to read Adobe PDF documents. It will expose the problem on many sites and show its scale. To understand the requirements, review my article "Accessible PDF documents for the blind".

I believe that I can claim some responsibility for this new function, because I raised the issue with SiteMorse at a meeting early this year and they do respond to suggestions of this sort. It is a good first release, but I am afraid that I was hoping for rather more. Adobe Acrobat Professional has an accessibility checker built in, and it checks for a large number of things that SiteMorse does not look for in this release, for example:

  • Missing alternate text on figures
  • Inaccessible text
  • Missing language definition
  • And a large number more
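
Some of the checks listed above can be approximated by looking for well-known markers in the PDF file itself. The Python sketch below is a rough heuristic, not the Adobe checker's method: the function name `check_accessibility_markers` is hypothetical, and it assumes the document catalog and structure elements are stored uncompressed, which is often not the case in newer PDFs. It looks for the `/StructTreeRoot` and `/Marked true` entries that indicate a tagged PDF, a `/Lang` entry, and `/Alt` text on `/Figure` structure elements.

```python
import re

# A structure element tagged as a figure, e.g. "/S /Figure".
FIGURE_PATTERN = re.compile(rb"/S\s*/Figure\b")


def check_accessibility_markers(pdf_bytes: bytes) -> list[str]:
    """Return a list of suspected accessibility problems (heuristic only)."""
    problems = []

    # A tagged PDF has a structure tree and is marked as tagged.
    has_structure = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    if not (has_structure and is_marked):
        problems.append("document does not appear to be tagged")

    # The language is given as a string, e.g. "/Lang (en-GB)".
    if re.search(rb"/Lang\s*[(<]", pdf_bytes) is None:
        problems.append("missing language definition")

    # Assumption: each structure element is its own object ending in "endobj".
    for chunk in pdf_bytes.split(b"endobj"):
        if FIGURE_PATTERN.search(chunk) and b"/Alt" not in chunk:
            problems.append("figure without alternate text")

    return problems
```

A file that passes these marker checks is not necessarily accessible; the markers only show that the author has made use of the accessibility features at all.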

This of course will mean that a PDF file may check as clean in SiteMorse but have many errors picked up by Adobe. The problem with the Adobe check is that it is done file by file and not across a web site, so a complete SiteMorse function would be a really welcome addition.
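
The difference between a file-by-file check and a site-wide one is largely a matter of finding every PDF the site links to. As an illustration only, the following Python sketch collects the PDF links from one HTML page using the standard library; the names `PdfLinkCollector` and `collect_pdf_links` are hypothetical, and it assumes PDFs can be recognised by a `.pdf` extension in the link path, which will miss documents served from other URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_urls.append(absolute)


def collect_pdf_links(html: str, base_url: str) -> list[str]:
    """Return the PDF links found in one HTML page, in document order."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_urls
```

Running this over every page of a site, and then applying the per-file checks to each PDF found, gives the site-wide view that a single-document checker cannot.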

SiteMorse will include this new test in the league tables; I suspect that it is going to show that very few websites have tagged PDF documents.

So two cheers for SiteMorse for moving into this area, but a big request that they go much further in checking the detail of PDFs; and whilst I am making suggestions, can they start looking at other technologies, especially accessible Flash?