Captioning Streaming Media: Anticipating 508 Compliance Needs
By Eric C. Todd - The Ohio State University
Creating transcripts is the bottleneck in providing closed captioning for video-based content, whether it is delivered on a DVD, via a streaming server, or as downloadable files. Introducing automation into the captioning process would be valuable. At Ohio State University, as at all other institutions that accept government funding, providing captioning for our video-based resources requires campus-wide planning and resources to meet emerging 508 captioning requirements.
Some campuses distribute video-based programming via a Web server as progressive downloads rather than as streaming media. We have found this to be a less effective method of video distribution because it limits the size of our viewing audience and imposes copyright considerations on the video we can deliver. Streaming servers require more resources than maintaining a Web server, but the viewership can be broader, with fewer restrictions due to bandwidth/processor speed or copyright limitations.
Helix Server, a RealNetworks product, is the only server we have found that effectively delivers multiple streaming formats. Windows Media Player and Apple QuickTime servers are exclusive to their own formats. We have learned that it is inappropriate to standardize on only one of these three families of streaming video. Codecs change continually and so do products. Currently, RealMedia and SureStream technology have made Real-formatted media the most accommodating to bandwidth issues, allowing its encoded media to reach the largest audience at the highest quality resolution. In my experience, Windows Media Player follows as the second most capable for streaming multiple bit rates. QuickTime rates a close third in terms of playback access and quality.
In spite of these broad generalizations, certain streaming solutions work better depending on the content streamed, such as when PowerPoint, screen captures, indexes, links and captions are integrated. In these cases, the format should be determined by viewer and developer requirements and by the limitations of the chosen development software or server restrictions.
OSU streaming services does not have any format restrictions. Our Helix Server has supported streaming media of all formats since 1998. All the players have certain drawbacks. Both Real and QuickTime feature ads in distributing their free players, and all of the players try to configure the user’s computer so that they are the default player for all media formats, even when they don’t support all the protocols. Some players are plagued with problems, such as QuickTime breaking dependencies when a new version is delivered. You are then required to purchase a new registration if the dependencies are not version specific.
These are the realities we must deal with until a truly universal player is developed. Today, there are no universal players that will support all formats, or any codec that accommodates all delivery and content needs. So the focus must be on targeting the broadest audience with the most appropriate technology, while maintaining a level of quality in the delivery of online media.
To help our content creators and video viewers, Ohio State has created a Web site with direct links to the free players and a troubleshooting link that provides us with client information. This client information allows us to diagnose and assist in the correction of client/player issues. We also feature a format comparison page that demonstrates examples of popular formats at various bandwidths so content creators can make choices to best serve their audiences.
We have been tracking client usage (which includes demographics, formats, players, systems, etc.) since 2001. In 2005, OIT servers held 16,993 RealNetworks files, 6,709 Windows Media files, 914 QuickTime files, and 5,355 MP3 files (which include podcasts). They were accessed 1,551,442 times by 74,130 unique visitors from 144 countries.
Most of these files are not currently captioned. There is no reason we can’t provide captioning for all accepted formats, but standardizing on a process would be a preferred way of establishing best practices and developing community expertise. Each format uses a form of text script or event list, which the caption software generates and merges into the clip upon export. Real uses .rt files, Windows Media Player uses a .txt script, QuickTime uses a .txt script, and DVDs use .cc or .scc. Once the transcript is timed, the captions for each format can be exported from the same software, if the proper software is used. The actual packaging (smi, asx, etc.) depends on the content of the media (format, inclusion of slides, etc.).
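The idea of one timed event list feeding several caption formats can be sketched in a few lines of Python. This is only an illustration, not what MAGpie or CPC actually do internally: the cue list, the function names, and the `ENUSCC` class name are assumptions, and the markup is deliberately stripped down. Real SAMI and RealText files normally carry additional header, styling, and duration attributes.

```python
from html import escape

# Hypothetical timed event list: (start time in seconds, caption text).
CUES = [
    (0.0, "Welcome to today's lecture."),
    (3.5, "We begin with streaming formats & players."),
]

def to_sami(cues):
    """Render cues as a minimal SAMI document (times in milliseconds)."""
    body = "\n".join(
        f"<SYNC Start={int(round(start * 1000))}>"
        f"<P Class=ENUSCC>{escape(text)}</P></SYNC>"
        for start, text in cues
    )
    return (
        "<SAMI>\n<HEAD>\n<STYLE TYPE=\"text/css\"><!--\n"
        ".ENUSCC { Name: English; lang: en-US; SAMIType: CC; }\n"
        "--></STYLE>\n</HEAD>\n<BODY>\n" + body + "\n</BODY>\n</SAMI>\n"
    )

def rt_time(seconds):
    """Format seconds as hh:mm:ss.t, rounded to tenths of a second."""
    total_tenths = int(round(seconds * 10))
    hours, rem = divmod(total_tenths, 36000)
    minutes, rem = divmod(rem, 600)
    secs, tenths = divmod(rem, 10)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{tenths}"

def to_realtext(cues):
    """Render cues as a minimal RealText (.rt) document."""
    lines = ['<window type="generic">']
    for start, text in cues:
        # <clear/> wipes the previous caption before the new one appears.
        lines.append(f'<time begin="{rt_time(start)}"/><clear/>{escape(text)}')
    lines.append("</window>")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(to_sami(CUES))
    print(to_realtext(CUES))
```

Because both writers consume the same cue list, the timing work is done once and each additional format costs only a small export function.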
MAGpie, developed by the CPB/WGBH National Center for Accessible Media, is a preferred captioning tool because it is quite intuitive and available as a free download (http://ncam.wgbh.org/webaccess/magpie/). There are other more automated tools to help relieve an otherwise tedious process. CPC (Computer Prompting and Captioning) is another easy-to-use, versatile software choice similar to MAGpie, but with more automation and more format export options.
Both MAGpie and CPC require several steps to synchronize the existing transcript with the media. First, the text is chunked into caption-length segments. Next, these chunks are loaded into the caption software, and finally the captions are synchronized using the relative time generated by the media player and exported into the desired format. CPC automatically chunks the text to your parameters and then sets the times directly to your imported clip. Once the captioning text is provided, we typically estimate two hours of production time to sync and export one hour of captioned video. Gallaudet University provides some useful comparative information on captioning options.
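As a rough illustration of the chunking step and the time estimate above, here is a minimal Python sketch. The function names and the defaults (32 characters per line, two lines per caption) are assumptions chosen for the example; they are not settings taken from MAGpie or CPC.

```python
import textwrap

def chunk_transcript(text, max_chars=32, max_lines=2):
    """Split a transcript into caption-sized chunks.

    max_chars and max_lines are illustrative defaults, not tool settings.
    """
    # Normalize whitespace, then wrap into lines of at most max_chars.
    lines = textwrap.wrap(
        " ".join(text.split()), width=max_chars, break_on_hyphens=False
    )
    # Group consecutive lines into captions of at most max_lines lines.
    return [
        "\n".join(lines[i:i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]

def estimated_sync_hours(video_hours, ratio=2.0):
    """Estimate sync-and-export time using the two-to-one rule of thumb."""
    return video_hours * ratio

if __name__ == "__main__":
    sample = (
        "Creating transcripts is the bottleneck in providing closed "
        "captioning for video-based content."
    )
    for number, chunk in enumerate(chunk_transcript(sample), start=1):
        print(f"--- caption {number} ---")
        print(chunk)
    print(estimated_sync_hours(1.5), "hours to sync 1.5 hours of video")
```

A length-based splitter like this ignores sentence and phrase boundaries, so a person would still want to review the breaks before synchronizing.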
Even as we become more experienced with the captioning process, methods for creating the transcripts remain very time-consuming and, in most cases, very expensive. Transcript services charge between $25 and $100 per hour, and their time commitments vary widely depending on the quality of the audio and the number of speakers in the recording. To address these time and cost concerns, we have begun experimenting with Dragon NaturallySpeaking as a method to generate automated transcripts to be used with MAGpie or CPC.
Here’s a summary of the workflow:
- Videography is arranged for a presentation. (High-quality audio and video are essential to high-quality Web-delivered video and to the Dragon-based transcription process.)
- The presenter, before or after the presentation, records, on a high-quality recorder, the Dragon training passages (or someone visits the presenter and does the training "live").
- Video is edited and output to the final form. A resolution of 320x240 is often used. Our experiences favor using the Sorenson codec for video compression.
- The Dragon-into-text method is used for producing a preliminary raw transcript, which is then edited by hand for errors in automated interpretation.
- The transcript is put into MS Word, PDF, or text format.
- The transcript is "chunked" (line breaks are inserted into the text file corresponding to each screen of captioning).
- MAGpie or CPC is used to synchronize the captions with the SMPTE time code on the video.
- MAGpie or CPC is used to export both SAMI and QT/Real timecode files.
- SMIL, ASX, and Flash-player files are created to be used by the various players (SMIL, Flash-player, ASX, and SAMI files are all XML or XML-like).
- The ASX (for WMP), SMIL, and Flash player XML files "glue together" the time code and video files.
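To show what "gluing together" can look like for the Real case, the following Python sketch builds a minimal SMIL 1.0 style document that plays a video and a RealText caption stream in parallel. The file names, region names, and caption height are hypothetical; the 320x240 video size follows the resolution mentioned in the workflow. A production file would likely add metadata and bandwidth options that are omitted here.

```python
import xml.etree.ElementTree as ET

def build_smil(video_src, captions_src, width=320,
               video_height=240, caption_height=60):
    """Build a minimal SMIL document pairing a video with a caption stream."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")

    # The overall window is tall enough for the video plus a caption strip.
    ET.SubElement(layout, "root-layout", width=str(width),
                  height=str(video_height + caption_height))
    ET.SubElement(layout, "region", id="video", top="0", left="0",
                  width=str(width), height=str(video_height))
    ET.SubElement(layout, "region", id="captions", top=str(video_height),
                  left="0", width=str(width), height=str(caption_height))

    # <par> plays its children in parallel, keeping captions in step
    # with the video.
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    ET.SubElement(par, "video", src=video_src, region="video")
    ET.SubElement(par, "textstream", src=captions_src, region="captions")

    return ET.tostring(smil, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_smil("lecture.rm", "lecture.rt"))
```

The caption timing itself stays in the caption file; the SMIL document only declares where each stream appears and that the two play together.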
Ohio State University currently has resources for videography, encoding of captions, and supporting the creation and delivery of the streaming media. There are no ideal resolutions or codecs. They vary based on viewer and developer needs or limitations. However, there is a manageable set of preferred settings, codecs, formats, and software packages for the creation of specific projects or project types. Our next step is to create a more standardized process for content creators to follow when incorporating captions into new or existing media.
We are at the point where we should resolve our bottleneck, the acquisition of near real-time quality transcripts, in a manner that requires the least amount of university resources and faculty time. Dragon NaturallySpeaking offers significant benefits, and I am looking forward to seeing a demonstration of ViaScribe from IBM. If ViaScribe performs as it claims, it certainly could prove to be a valuable tool.
Eric Todd is manager of Classroom Digital Media Distribution for the Office of Information Technology at The Ohio State University.