Getting in Deep: After Google, the Invisible Web
For most of us, Web searching continues
to take the form of searching Google. A quick visit to Google.com produces
relevant results. When Google was first released by two Stanford Ph.D. candidates,
Larry Page and Sergey Brin, in 1998, the search tool transformed Internet searching
almost immediately. Google incorporated linking structures into its algorithm (the
code that determines the ranking of pages within a retrieval list) and thereby
retrieved more relevant and more accurate information. Since then, Google has added
new features and expanded its index to more than two billion pages.
With all this and more, Google has become the search tool of choice for most
information specialists and novices alike. So why use any other search tool?
Google searching became synonymous with Web searching because it works. Brin
and Page believed there were no "bad" searches when using Google. Whatever the
search, Google could retrieve better information by analyzing how pages link
to one another: good sites tend to be linked to more often by other sites.
Google quickly became the best place to look for top-level Web sites
such as business, institutional, and personal homepages.
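The link-structure idea can be illustrated with a minimal sketch of the iterative scoring Brin and Page published as PageRank: a page scores highly when many pages link to it, and links from high-scoring pages count for more. This is a simplified illustration, not Google's production ranking code; the function name `link_rank`, the damping factor of 0.85, the iteration count, and the page names in the example are assumptions chosen for illustration.

```python
def link_rank(links: dict[str, list[str]], damping: float = 0.85,
              iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank-style scoring by repeated redistribution (power iteration).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: three pages link to "popular".
links = {
    "hub": ["popular"],
    "blog": ["popular", "hub"],
    "news": ["popular"],
    "popular": ["hub"],
}
scores = link_rank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy example, "popular" ends up ranked first because it is linked to most often, and "hub" ranks second because the highly ranked "popular" links to it; "blog" and "news", which nothing links to, keep only the baseline score.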
But there is more. Google places a query's results within subject categories when
applicable, provides links to language translations, maintains cached links
to original pages, and lists similar pages. Other features include access to
street maps when you type an address with a city or state, dictionary definitions
of search terms, and specialized databases, such as Google Uncle Sam for government
information and University Search for information from specific institutions.
With all these features and added accuracy, it becomes difficult to use anything
else but Google.
What is the Invisible Web?
- Internet content not directly indexed by conventional search tools
- Data primarily found in databases
- Requires a direct query to retrieve information
- Free, Fee, and Hybrid Databases
- Other terms: Deep Web, Opaque Web, Special or Searchable Databases
If It's Invisible, Why Bother?
- Generally better quality information
- More specific information
- Finding information is faster and more efficient
What You Don't See is What You Don't Get
Ah, but there's the rub. Neither Google nor any other search tool can index
all the information on the Internet. Conventional search tools such as Google,
Yahoo, AltaVista, All the Web, or meta-searchers like Ixquick, Vivísimo, and
SurfWax draw on more than a couple of billion pages through their databases. However,
a large portion of available information has been difficult or impossible to
search. Material that is not accessible using conventional search tools has
become known as the "Invisible Web." Other names for the Invisible Web include
the Deep Web, Opaque Web, and searchable databases.
Such information is not accessible to conventional search tools because it
is inside databases such as the U.S. Census, Amazon.com, or a library's online
catalog. The locations of these pages can be found through resources such as
Gary Price's Direct Search, Complete Planet: The Deep Web, InvisibleWeb.com,
and Invisible-web.net. Or the information can be located via subject directory
tools like Infomine, Librarians' Index to the Internet (LII), Best Information
on the Net, and AlphaSearch.
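Why database content stays out of reach of conventional search tools can be sketched with a toy example: a crawler that only follows links reaches a catalog's search form but never the record pages behind it, because those pages are produced only in response to a direct query. The site layout, the catalog contents, and the names `crawl` and `direct_query` are all hypothetical, chosen only to illustrate the idea.

```python
from collections import deque

# Hypothetical site: each static page lists the pages it links to.
STATIC_LINKS = {
    "/": ["/about", "/catalog-search"],
    "/about": ["/"],
    "/catalog-search": [],  # a search form; record pages are never linked
}

# Hypothetical database behind the search form: query term -> record pages.
CATALOG = {
    "invisible web": ["/record/101", "/record/102"],
    "google": ["/record/205"],
}


def crawl(start: str) -> set[str]:
    """Breadth-first crawl that, like a conventional spider, only follows links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in STATIC_LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def direct_query(term: str) -> list[str]:
    """Record pages exist only as the answer to a query typed into the form."""
    return CATALOG.get(term.strip().lower(), [])


indexed = crawl("/")
print(sorted(indexed))                # ['/', '/about', '/catalog-search']
print(direct_query("Invisible Web"))  # ['/record/101', '/record/102']
```

None of the record pages appear in the crawl results, even though a searcher who types a query into the form retrieves them immediately.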
The reality is that many information specialists, as well as the general public,
use the Invisible Web already. Most Web surfers have accessed an Invisible Web
site at one time or another. However, they access only a portion of the Invisible
Web, typically the portion found in three general forms:
First, the Fee Group, or paid databases, such as EBSCO, OVID, ProQuest, and
Medline. These databases have a cost associated with use.
Second, the Free Group: government and other free databases such as the Census, AskERIC, PubMed,
the Currency Converter, FindArticles, and library online catalogs. These databases
are free for anyone to access.
Third, the Hybrid Group: UnCoverWeb and online newspapers such as the New York Times
and the Wall Street Journal, which currently follow this model. These databases have
both free and fee-based portions.
The Not-So-Invisible Web
For the past two years, the Invisible Web has been the "next big thing" in Internet
searching. The truth is, it is still a big thing. With more than
500 billion Web pages located in searchable databases, how could it be anything
but big? But the Invisible Web is still unwieldy for most. Resources such as
Infomine, Librarians' Index to the Internet, and Direct Search have been underused
by both information specialists and novice searchers in part because they are
difficult to use. However, the increasing exposure of the Invisible Web is helping
to bring these resources to the surface. As more searchers use them, access
will improve, driven by the demand to find and use relevant information.
At the same time, the claim that much of the information contained on the Invisible
Web cannot be found via the major search tools is becoming less and less valid.
Non-HTML pages (PDFs, Microsoft Word, Excel, and PowerPoint files, and other
formatted information) are becoming available via major search tools, especially
Google. Other information, such as a book from my local library catalog, is
just a click or two deeper. True, I may not be able to find out how much $273
is worth in the currency of Jordan, but I can easily find a currency
converter. So, it is a matter of knowing what you are looking for and continuing
to be versatile and persistent. Using Google, the Invisible Web, or any other
search tool will not miraculously transform Internet searching into anything
other than what it is: an art.
Horizontal Searching: Unveiling the Invisible Web
Keyword searching Google, Yahoo, or library databases is only the beginning
of a search. If we think of the Web as part of the whole research process rather
than as a separate resource, we can connect library resources to information
found on the Internet.
Horizontal searching takes advantage of information available on both the surface
and the Invisible Web. For example, cut and paste an article title from a database
like PsycInfo or ERIC into a search on Google or another search tool. The results
often uncover a host of related and relevant materials.
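The cut-and-paste step can be sketched as building a search URL that wraps an article title in double quotes, which Google treats as an exact-phrase search. The `phrase_search_url` helper and the sample title are hypothetical, and the sketch assumes Google's `https://www.google.com/search?q=` query format.

```python
from urllib.parse import urlencode


def phrase_search_url(title: str, engine: str = "https://www.google.com/search") -> str:
    """Build a search URL that looks up an article title as an exact phrase."""
    # Double quotes ask the search tool to match the words together, in order.
    phrase = '"' + title.strip() + '"'
    # urlencode percent-escapes the quotes and turns spaces into '+'.
    return engine + "?" + urlencode({"q": phrase})


# Hypothetical title copied from a database record such as PsycInfo or ERIC.
print(phrase_search_url("Searching the Invisible Web"))
# https://www.google.com/search?q=%22Searching+the+Invisible+Web%22
```

Quoting the whole title keeps the results focused on pages that cite or host that particular article rather than on pages that merely share a few of its words.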
Bibliographies, full-text articles and documents, homepages of authors, and
e-mail addresses found on the Web lead back to the library catalog, then to a separate
dive into a library database, and then to another dive into the Internet. Horizontal
searching combines these resources into a comprehensive search that unveils
the Invisible Web and more. For an illustration of horizontal searching, go
Strategies for Finding Information on the Invisible Web
- Move beyond keyword searching on Google or other search tools
- Try alternative strategies using Invisible Web sites when appropriate:
- Think of the Invisible Web as part of the whole rather than separate from traditional research
- Horizontal Searching: incorporate the Web as integral to a search
- Search the Web for titles of articles found in research databases
- Look for bibliographies on the Web that can be incorporated into
searches for books, journal articles, or other documents
- Search for authors from books and journals
- Search for organizations and government reports
- Follow citations onto the Web
- Check e-mail addresses to contact authors for further information