Getting in Deep: After Google, the Invisible Web -- Campus Technology

By Dale Vidmar
03/25/03

For most of us, Web searching continues to take the form of searching Google. A simple click to Google.com produces relevant results. When Google was first released by two Stanford Ph.D. candidates, Larry Page and Sergey Brin, in 1998, the search tool transformed Internet searching almost immediately. Google incorporated linking structures into its algorithm—the code that determines the ranking of pages within a retrieval list—and, thereby, retrieved better and more accurate information. Since that time Google has added features to an index that now includes more than two billion pages.

With all this and more, Google has become the search tool of choice for most information specialists and novices alike. So why use any other search tool?

Google searching became synonymous with Web searching because it works. Brin and Page believed there were no "bad" searches when using Google. Whatever the search, Google could retrieve better information based on the relationship of link structures—in other words, good sites are linked to more often by other sites. Google quickly became the best place to look for top-level Web sites such as business, institutional, and personal homepages.
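The notion that good sites are linked to more often by other sites can be sketched in a few lines of code. What follows is a deliberately simplified illustration that only counts inbound links over a made-up link graph; it is not Google's actual algorithm, and the site names and the function `rank_by_inbound_links` are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}

def rank_by_inbound_links(graph):
    """Return (page, inbound_count) pairs, most linked-to page first."""
    counts = Counter()
    for source, targets in graph.items():
        counts[source] += 0  # make sure pages with no inbound links still appear
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore links a page makes to itself
                counts[target] += 1
    # Sort by inbound count (descending), then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_inbound_links(links)
for page, inbound in ranking:
    print(page, inbound)
```

In this toy graph the page that the most other pages point to rises to the top of the list, which is the intuition behind ranking by link structure.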

But there is more. Google organizes a query within subject categories when applicable, provides links to language translations, maintains cached links to original pages, and lists similar pages. Other features include access to street maps when typing an address with a city or state, dictionary definitions of search terms, and specialized databases, such as Google Uncle Sam for government information and University Search for information from specific institutions. With all these features and added accuracy, it becomes difficult to use anything else but Google.

What is the Invisible Web?

Internet content not directly indexed by conventional search tools
Data primarily found in databases
Requires a direct query to retrieve information
Free, Fee, and Hybrid Databases
Other terms: Deep Web, Opaque Web, Special or Searchable Databases

If It's Invisible, Why Bother?

Generally better quality information
More specific information
Finding information is faster and more efficient

What You Don't See is What You Don't Get
Ah, but there's the rub. Neither Google nor any other search tool can index all the information on the Internet. Conventional search tools such as Google, Yahoo, AltaVista, All the Web, or meta-searchers like Ixquick, Vivísimo, and SurfWax often access more than a couple billion pages in their databases. However, a large portion of available information has been difficult or impossible to search. Material that is not accessible using conventional search tools has become known as the "Invisible Web." Other names for the Invisible Web include the Deep Web, Opaque Web, and searchable databases.

Such information is not accessible to conventional search tools because it is inside databases such as the U.S. Census, Amazon.com, or a library's online catalog. The locations of these pages can be found through resources such as Gary Price's Direct Search, Complete Planet: The Deep Web, InvisibleWeb.com, and Invisible-web.net. Or the information can be located via subject directory tools like Infomine, Librarians' Index to the Internet (LII), Best Information on the Net, and AlphaSearch.

The reality is that many information specialists, as well as the general public, use the Invisible Web already. Most Web surfers have accessed an Invisible Web site at one time or another. However, they access only a portion of the Invisible Web, typically the portion found in three general forms:

First, the Fee Group, or paid databases, such as EBSCO, OVID, ProQuest, and Medline. These databases have a cost associated with use.

Second, the Free Group: government and other free databases such as the Census, AskERIC, PubMed, the Currency Converter, FindArticles, and library online catalogs. These databases are free for anyone to access.

Third, the Hybrid Group: UnCoverWeb and online newspapers like the New York Times and Wall Street Journal, which currently take this form. These databases have both free portions and fee portions.

The Not-So-Invisible Web
For the past two years, the Invisible Web has been the "next big thing" in Internet searching. The truth is, it is still a big thing. When it comes to more than 500 billion Web pages located in searchable databases, how can it be anything but big? But the Invisible Web is still unwieldy for most. Resources such as Infomine, Librarians' Index to the Internet, and Direct Search have been underused by both information specialists and novice searchers in part because they are difficult to use. However, the increasing exposure of the Invisible Web is helping to bring these resources to the surface. As more searchers use them, access will become better, driven by the demand to find and use relevant information.

At the same time, the claim that much of the information contained on the Invisible Web cannot be found via the major search tools is becoming less and less valid. Non-HTML material (PDFs; Microsoft Word, Excel, and PowerPoint files; and other formatted information) is becoming available via major search tools, especially Google. Other information, such as a book from my local library catalog, is just a click or two deeper. True, I may not be able to find out how much $273 is worth in the currency of Jordan, but I can easily find a currency converter. So, it is a matter of knowing what you are looking for and continuing to be versatile and persistent. Using Google, the Invisible Web, or any other search tool will not miraculously transform Internet searching into anything other than what it is: an art.

Horizontal Searching: Unveiling the Invisible Web
Keyword searching Google, Yahoo, or library databases is only the beginning of a search. If we think of the Web as part of the whole instead of separate, we can connect to information found on the Internet.

Horizontal searching takes advantage of information available on both the surface and the Invisible Web. For example, cut and paste an article title from a database like PsycInfo or ERIC into a search on Google or another search tool. The results often uncover a host of related and relevant materials.

Bibliographies, full-text articles and documents, homepages of authors, and e-mail addresses found on the Web lead back to the library catalog, then to a separate dive into a library database, and then to another dive into the Internet. Horizontal searching combines these resources into a comprehensive search that unveils the Invisible Web and more. For an illustration of horizontal searching, go to http://home.sou.edu/~vidmar/horizontal-searching.
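For searchers who repeat this step often, pasting an article title from a database into a Web search tool can be approximated with a short script. The sketch below is only an illustration: the function name `phrase_search_url` and the sample title are hypothetical, and the search address with its `q` parameter is an assumption about how the search tool accepts queries.

```python
from urllib.parse import urlencode

def phrase_search_url(title, base="https://www.google.com/search"):
    """Build a search URL that looks for the title as an exact phrase."""
    cleaned = " ".join(title.split())   # collapse stray whitespace from copy and paste
    cleaned = cleaned.replace('"', "")  # drop quotes so the phrase stays well formed
    # Wrapping the title in double quotes asks for an exact-phrase match.
    return base + "?" + urlencode({"q": f'"{cleaned}"'})

# Hypothetical article title, as it might be copied from a database record.
url = phrase_search_url("  Information literacy  and the\nInvisible Web ")
print(url)
```

Searching on the quoted title rather than loose keywords tends to surface pages that cite or reproduce the article, which is the starting point for following citations, authors, and bibliographies back and forth.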

Strategies for Finding Information on the Invisible Web

Move beyond keyword searching on Google or other search tools
Try alternative strategies using Invisible Web sites when appropriate:
- Searchable Databases:
  - Invisible-web.net: www.invisible-web.net/
  - The Invisible Web: www.invisibleweb.com/
  - CompletePlanet: www.completeplanet.com/
  - Searchability: www.searchability.com/

- Subject Directories:
  - Direct Search: www.freepint.com/gary/direct.htm
  - Infomine: http://infomine.ucr.edu/
  - Academic Info: www.academicinfo.net/index.html
  - Best Information on the Net: http://library.sau.edu/bestinfo/
  - Librarians' Index to the Internet: http://ipl.org/
  - Scout Report Archives: http://scout.wisc.edu/archives/

Think of the Invisible Web as part of the whole instead of separate from traditional research
Horizontal Searching—incorporate the Web as integral to a search strategy
Search the Web for titles of articles found in research databases
Look for bibliographies on the Web that can be incorporated into searches for books, journal articles, or other documents
Search for authors from books and journals
Search for organizations and government reports
Follow citations onto the Web
Check e-mail addresses to contact authors for further information
