Document Formats for the Web: PDF, DWF, and CMS

Have you noticed how many documents you encounter online end in "PDF"? The increasing prevalence of this file format is a testimony to the success of Adobe’s "portable document format." Indeed, its ubiquity, together with the free Adobe Acrobat Reader for viewing these files, has led some Web projects to standardize on Acrobat, the product suite that produces PDF files.

Files in PDF format used to be the bane of accessibility professionals and a disaster for those who depend on screen readers to translate Web content into interpretable formats (audio or large-type visual representation). PDFs effectively posted a "disabled not allowed" sign on information that others could view freely. So why have PDF files grown so popular?

PDFs represent independence from the idiosyncrasies of software applications and printers. In effect, PDFs claim cross-platform compatibility, application independence, and content integrity for documents distributed via the Web. Following recent successes in its e-Paper division, which produces Acrobat, Adobe appears to have addressed some of the criticism in its newest release, Acrobat 6.0.

Making PDFs More Useful
As with most things involving structure and text that relate to the Web, XML (eXtensible Markup Language) comes to the rescue. The newest release of Acrobat’s PDF file format incorporates some features from the land of XML.

One of the primary reasons PDFs became popular in the first place was their fidelity to the printed page: they present on screen what was formerly available only on paper. You can take a journal article or any other printed document and make a high-quality digital representation of it, including the ability to reprint it exactly like the original. Acrobat is easily called by other applications, whether as a browser plug-in to view a file or as an alternative print driver to create a PDF document from another application, such as Microsoft Word. So far so good.

PDFs are much more readable than scanned TIFF images and offer added control over display. Until relatively recently, however, they were just a bag o’ bits with a simple presentation interface.

XML provides a mechanism for adding structure to the parts of a document. In particular, with XML, business logic can be embedded in the document so that it can be acted on in a workflow process. XML forms and their associated logic can use PDF as a familiar interaction interface. Perhaps most interesting is the ability to add digital signatures, metadata, and schemas to the PDF. This is particularly important because it lets the PDF document interact with search tools and cataloging systems for query and archiving, which is definitely a step in the right direction. It’s no surprise that Adobe has chosen this strategy to make the PDF a major player in complex document management environments, leveraging its strength as an interface that presents digital images as faithful representations of the printed page.

Alternatives or Complements to PDFs?
Although Adobe exceeded Wall Street expectations in its most recent quarterly report, other file formats are being promoted to address the PDF’s perceived weaknesses, and some critics advocate restricting the use of the PDF format altogether.

Autodesk, maker of AutoCAD design software, has an open file standard called Design Web Format (DWF), which it claims can create, display, and print multi-sheet computer-aided design drawings faster, at higher resolution, and in smaller file sizes than PDFs. DWF is currently limited to design documents, but if you need to work with them, you’re probably already working with AutoCAD files. Others can view them with the Autodesk Express Viewer or the Autodesk Volo View application.

Then again, why exactly are you putting these files up on your Web site or course management system? Jakob Nielsen of useit.com (Usable Information Technology) argues that PDFs are really good for one thing: printing. If you intend anything else, he says, you should consider another format. His suggestions hinge on presenting the best user experience possible. That is, give users enough information about the document to justify downloading it and launching a helper application. In addition, Nielsen urges you to keep search engine spiders from indexing the text of PDFs, because a link from a search result drops the user on the PDF’s first page, nowhere near the word that matched.
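One common way to act on that advice is a robots.txt exclusion file at the root of the Web site. This is a minimal sketch under stated assumptions, not Nielsen’s own prescription. It assumes the PDFs are kept in a hypothetical /pdfs/ directory on a hypothetical www.example.edu site. It then uses the RobotFileParser class from Python’s standard urllib.robotparser module to check which URLs a well-behaved crawler would skip. Keep in mind that robots.txt is advisory: it keeps compliant spiders away but does not hide the files from anyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a course Web site: all crawlers ("*")
# are asked to stay out of an assumed /pdfs/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.edu/pdfs/syllabus.pdf",  # hypothetical PDF
        "http://www.example.edu/syllabus.html",      # hypothetical HTML page
    ):
        # can_fetch() reports whether a compliant crawler may retrieve the URL
        print(url, "crawlable" if rules.can_fetch("*", url) else "blocked")
```

Keeping the PDFs in one directory means a single Disallow line covers all of them. Wildcard rules on the .pdf extension are honored by some crawlers, but they were not part of the original robots.txt convention, and Python’s parser does not interpret them.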

CMS and the Digital Paper Chase
One of the surprises faculty encounter when they switch to teaching with a CMS is how heavily we rely on paper for course materials. The life cycle of material selected for teaching a course is more complex than we tend to acknowledge. Some of the critical elements include:

· Conceptualizing what is required;
· Finding it;
· Digitizing it into an acceptable file format (if it is not originally electronic);
· Placing it on the CMS;
· Deciding what to do with it when the course ends.

These are steps we go through in preparing for our teaching in general. Now that we’re using CMSes to extend the interaction with course material online, the full life cycle of electronic documents becomes an issue for faculty, for IT departments running CMSes for their institutions, and for the CMS vendors. Is your CMS a ‘roach motel’ for your course materials, where documents check in but never check out? You need a personal as well as an institutional strategy for CMS electronic document management. We’ll approach this in future columns.

Phil Long, Ph.D. (longpd@mindspring.com) is senior strategist for the Academic Computing Enterprise at MIT. He is also a senior associate for the TLT Group of the AAHE.

Briefs

Office services king Kinko’s will offer course packs, compilations of course materials used to supplement traditional textbooks, to the higher education market.

The National Science Foundation will award $9 million to U.C. Irvine and $3.5 million to U.C. San Diego to develop information sharing tools and organizational strategies for first responders and emergency service providers.

Student design teams from the University of Minnesota, U.C. San Diego, and Cornell won first, second, and third place in a chip design contest sponsored by the Semiconductor Research Corp.

A federal jury awarded CollegeNET $1.2 million in damages for infringement of two patents covering its Universal Forms Engine.

For these and other news stories, or to subscribe to our eNewsletters, Syllabus News Update and Syllabus IT Trends, visit www.syllabus.com.
