Getting in Deep: After Google, the Invisible Web

For most of us, Web searching still means searching Google. A quick visit to Google.com produces relevant results. When two Stanford Ph.D. candidates, Larry Page and Sergey Brin, released Google in 1998, it transformed Internet searching almost immediately. Google incorporated linking structures into its algorithm (the code that determines how pages are ranked in a results list) and thereby retrieved better, more accurate information. Since then, Google has added many features and expanded its index to more than two billion pages.

With all this and more, Google has become the search tool of choice for most information specialists and novices alike. So why use any other search tool?

Google searching became synonymous with Web searching because it works. Brin and Page believed there were no "bad" searches when using Google. Whatever the search, Google could retrieve better information by analyzing link structure: good sites tend to be linked to more often by other sites. Google quickly became the best place to look for top-level Web sites such as business, institutional, and personal homepages.
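The link-structure idea above can be illustrated with a simplified version of PageRank, the link-analysis method Page and Brin described in 1998. This is a minimal sketch under stated assumptions, not Google's actual ranking code: the four-page link graph is hypothetical, and the damping factor (0.85) and iteration count are conventional illustrative choices. A page's score grows with the number and scores of the pages that link to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the list of pages it links to.
    Pages with no outgoing links spread their score evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score, split evenly, to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page Web: three pages link to "library".
example_links = {
    "library": ["census"],
    "census": ["library"],
    "blog": ["library", "census"],
    "news": ["library"],
}

if __name__ == "__main__":
    scores = pagerank(example_links)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page:8s} {score:.3f}")
```

Running it ranks "library" first because the other three pages all link to it, which is the intuition behind "good sites are linked to more often by other sites." Real search engines combine link analysis with many other signals, such as how well a page's text matches the query.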

But there is more. When applicable, Google places a query within subject categories; it also provides links to language translations, keeps cached copies of original pages, and lists similar pages. Other features include street maps when you type an address with a city or state, dictionary definitions of search terms, and specialized databases such as Google Uncle Sam for government information and University Search for information from specific institutions. With all these features and the added accuracy, it becomes difficult to use anything but Google.

What is the Invisible Web?

  • Internet content not directly indexed by conventional search tools
  • Data primarily found in databases
  • Requires a direct query to retrieve information
  • Free, Fee, and Hybrid Databases
  • Other terms: Deep Web, Opaque Web, Special or Searchable Databases

If It's Invisible, Why Bother?

  • Generally better quality information
  • More specific information
  • Finding information is faster and more efficient

What You Don't See is What You Don't Get
Ah, but there's the rub. Neither Google nor any other search tool can index all the information on the Internet. Conventional search tools such as Google, Yahoo, AltaVista, and All the Web, and meta-searchers such as Ixquick, Vivísimo, and SurfWax, often access more than a couple of billion pages in their databases. However, a large portion of the available information has been difficult or impossible to search. Material that conventional search tools cannot reach has become known as the "Invisible Web." Other names for it include the Deep Web, the Opaque Web, and searchable databases.

Such information is not accessible to conventional search tools because it is inside databases such as the U.S. Census, Amazon.com, or a library's online catalog. The locations of these pages can be found through resources such as Gary Price's Direct Search, Complete Planet: The Deep Web, InvisibleWeb.com, and Invisible-web.net. Or the information can be located via subject directory tools like Infomine, Librarians' Index to the Internet (LII), Best Information on the Net, and AlphaSearch.

The reality is that many information specialists, as well as the general public, use the Invisible Web already. Most Web surfers have accessed an Invisible Web site at one time or another. However, they access only a portion of the Invisible Web, typically the portion found in three general forms:

First, the Fee Group, or paid databases, such as EBSCO, OVID, ProQuest, and Medline. These databases have a cost associated with use.

Second, the Free Group: government databases such as the Census, AskERIC, and PubMed, as well as the Currency Converter, FindArticles, and library online catalogs. These databases are free for anyone to access.

Third, the Hybrid Group: UnCoverWeb and online newspapers such as the New York Times and the Wall Street Journal currently take this form. These databases have both free and fee-based portions.

The Not-So-Invisible Web
For the past two years, the Invisible Web has been the "next big thing" in Internet searching. The truth is, it is still a big thing. With more than 500 billion Web pages located in searchable databases, how could it be anything but big? Yet the Invisible Web is still unwieldy for most searchers. Resources such as Infomine, Librarians' Index to the Internet, and Direct Search have been underused by information specialists and novice searchers alike, in part because they are difficult to use. However, the Invisible Web's growing exposure is helping to bring these resources to the surface. As more searchers use them, the demand to find and use relevant information will drive better access.

At the same time, the claim that much of the information on the Invisible Web cannot be found through the major search tools is becoming less and less valid. Non-HTML files, such as PDFs, Microsoft Word, Excel, and PowerPoint documents, and other formatted information, are becoming available through the major search tools, especially Google. Other information, such as a book in my local library catalog, is just a click or two deeper. True, I may not be able to find out directly how much $273 is worth in Jordanian currency, but I can easily find a currency converter that will tell me. So it is a matter of knowing what you are looking for and remaining versatile and persistent. Using Google, the Invisible Web, or any other search tool will not miraculously transform Internet searching into anything other than what it is: an art.

Horizontal Searching: Unveiling the Invisible Web
Keyword searching in Google, Yahoo, or library databases is only the beginning of a search. If we think of the Web as part of the whole research process instead of as a separate resource, we can connect library resources to the information found on the Internet.

Horizontal searching takes advantage of information available on both the surface and the Invisible Web. For example, cut and paste an article title from a database like PsycInfo or ERIC into a search on Google or another search tool. The results often uncover a host of related and relevant materials.

Bibliographies, full-text articles and documents, authors' homepages, and e-mail addresses found on the Web lead back to the library catalog, then to a separate dive into a library database, and then to another dive into the Internet. Horizontal searching combines these resources into a comprehensive search that unveils the Invisible Web and more. For an illustration of horizontal searching, go to http://home.sou.edu/~vidmar/horizontal-searching.

Strategies for Finding Information on the Invisible Web

  • Think of the Invisible Web as part of the whole instead of separate from traditional research
  • Horizontal Searching—incorporate the Web as integral to a search strategy
  • Search the Web for titles of articles found in research databases
  • Look for bibliographies on the Web that can be incorporated into searches for books, journal articles, or other documents
  • Search for authors from books and journals
  • Search for organizations and government reports
  • Follow citations onto the Web
  • Check e-mail addresses to contact authors for further information
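The first few strategies above (searching the Web for an article's exact title and for its authors) can be sketched in Python. This is a hypothetical helper, not a feature of any database or search tool: the citation record and its field names are assumptions, and the base URL is simply Google's ordinary search address. Wrapping each string in double quotes asks the search tool to match it as an exact phrase.

```python
from urllib.parse import urlencode

# Assumption: Google's ordinary query address; other engines with a "q" parameter work similarly.
SEARCH_URL = "https://www.google.com/search"


def phrase(text):
    """Collapse whitespace, drop stray double quotes, and wrap text as an exact phrase."""
    cleaned = " ".join(text.replace('"', "").split())
    return f'"{cleaned}"'


def horizontal_queries(citation):
    """Build Web search URLs from one citation record.

    `citation` is a hypothetical dict with a "title" string and an optional
    "authors" list, e.g. copied by hand from ERIC or PsycInfo.
    """
    queries = [phrase(citation["title"])]
    queries += [phrase(name) for name in citation.get("authors", [])]
    # urlencode escapes quotes as %22 and spaces as "+".
    return [f"{SEARCH_URL}?{urlencode({'q': q})}" for q in queries]


# Hypothetical record; the title and author are placeholders.
record = {
    "title": "An  Example Article Title",
    "authors": ["Jane Doe"],
}

if __name__ == "__main__":
    for url in horizontal_queries(record):
        print(url)
```

Each resulting URL can be opened in a browser, and the same quoted strings can be pasted into another search tool or a library catalog as the search moves horizontally between resources.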
