Digital Libraries

Google Book Search: The Good, the Bad, & the Ugly

1/1/2008

When a Text Isn't Text

THE PAGES OF TEXT SHOWN through Book Search are actually images, not text. Although as part of Google's digitizing process a conversion takes place to turn a scanned page into text, the publicly offered results are less stellar than those made possible by the better-known OCR applications such as Abbyy FineReader, which is used by compression software provider LuraTech as part of its PDF conversion solution.

Frequently, an out-of-copyright book in Google will include a "View plain text" function, but the user will be shown a page displaying only "No text" at the top, meaning that Google was unable to convert that particular page into plain text. And if a user's keyword search turns up such a page, Book Search still succeeds in locating and highlighting the search terms, even if it can't seem to display the page in plain-text form. It's almost as if two separate optical character recognition systems are in play: one for the search engine, and another for converting scanned pages into plain text. This inconsistency may not trouble most readers, but those who are print-disabled and need to use a screen reader or convert the text to speech say otherwise.

Susan Gerhart holds a doctorate in computer science and has worked in research and management in software engineering and technology transfer at Duke University (NC), NASA, the National Science Foundation, USC's Information Sciences Institute, and Embry-Riddle Aeronautical University (FL). Gerhart is also legally blind. As she points out in her blog, As Your World Changes, her experiments in using Book Search turned up this anomaly when images were turned off in her browser. "I got a snippet of page text, a big empty block of missing image, and various book metadata, including where to buy or borrow," she says. When she tried turning images on, "Ouch, was it bright," she recalls.

She writes: "There's nothing in, around, or any way out of the image into screen readable mode. The image might as well have been a lake, a building, or porn for all the information I could glean from it. I wondered why the omnipotent Google toolbar, gathering data about my searches, and offering me various extra search information, could not also be the reader." Gerhart is doubtless not alone in her frustration.

Linda Becker, the VP of sales and marketing for Kirtas, doesn't believe that Google has somehow created a faster digitization process. "I do know what they're doing, and I can't comment on it," she says. "But what I can say is this: They're not scanning faster, they're not digitizing faster, and they don't have the quality controls that the user deserves."

She may be right: In an ongoing online debate about whether Google is using robotic machinery or human beings to flip the pages, bloggers have poked fun at the search giant's quality control methods (or lack of them) by posting screenshots that reveal hands, fingers, and arms in Book Search results. Becker suggests that those screenshots may not be anomalies. "If you go into Google [Book Search] and look at any book, you'll be able to see by the number of body parts and fingerprints that [the pages] are being turned manually."