Getting in Deep: After Google, the Invisible Web

For most of us, Web searching continues to take the form of searching Google. A simple click to Google.com produces relevant results. When Google was first released by two Stanford Ph.D. candidates, Larry Page and Sergey Brin, in 1998, the search tool transformed Internet searching almost immediately. Google incorporated linking structures into its algorithm—the code that determines the ranking of pages within a retrieval list—and, thereby, retrieved better and more accurate information. Since that time Google has added features to an index that now includes more than two billion pages.

With all this and more, Google has become the search tool of choice for most information specialists and novices alike. So why use any other search tool?

Google searching became synonymous with Web searching because it works. Brin and Page believed there were no "bad" searches when using Google. Whatever the search, Google could retrieve better information based on the relationship of link structures—in other words, good sites are linked to more often by other sites. Google quickly became the best place to look for top-level Web sites such as business, institutional, and personal homepages.
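The idea that good sites are linked to more often can be illustrated with a toy calculation. The sketch below is not Google's actual algorithm. It is a simplified, PageRank-style iteration over a made-up four-page link graph, and the page names, damping factor, and iteration count are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting links from well-ranked pages more."""
    pages = list(links)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links shares its score with every page.
                for target in pages:
                    new_scores[target] += damping * scores[page] / n
            else:
                # Otherwise the page splits its score evenly among the pages it links to.
                for target in outgoing:
                    new_scores[target] += damping * scores[page] / len(outgoing)
        scores = new_scores
    return scores

# Hypothetical link graph: each key links to the pages in its list.
toy_links = {
    "university_home": ["library", "department"],
    "library": ["university_home"],
    "department": ["university_home", "library"],
    "personal_page": ["university_home"],
}

scores = rank_pages(toy_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph the page that three others link to ends up first, and the page that nothing links to ends up last, which is the intuition behind link-based ranking.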

But there is more. Google organizes a query within subject categories when applicable, provides links to language translations, maintains cached links to original pages, and lists similar pages. Other features include access to street maps when typing an address with a city or state, dictionary definitions of search terms, and specialized databases, such as Google Uncle Sam for government information and University Search for information from specific institutions. With all these features and added accuracy, it becomes difficult to use anything else but Google.

What is the Invisible Web?

  • Internet content not directly indexed by conventional search tools
  • Data primarily found in databases
  • Requires a direct query to retrieve information
  • Includes free, fee, and hybrid databases
  • Other terms: Deep Web, Opaque Web, Special or Searchable Databases

If It's Invisible, Why Bother?

  • Generally better quality information
  • More specific information
  • Finding information is faster and more efficient

What You Don't See is What You Don't Get
Ah, but there's the rub. Neither Google nor any other search tool can index all the information on the Internet. Conventional search tools such as Google, Yahoo, AltaVista, All the Web, or meta-searchers like Ixquick, Vivísimo, and SurfWax often access more than a couple billion pages in their databases. However, a large portion of available information has been difficult or impossible to search. Material that is not accessible using conventional search tools has become known as the "Invisible Web." Other names for the Invisible Web include the Deep Web, Opaque Web, and searchable databases.

Such information is not accessible to conventional search tools because it is inside databases such as the U.S. Census, Amazon.com, or a library's online catalog. The locations of these pages can be found through resources such as Gary Price's Direct Search, Complete Planet: The Deep Web, InvisibleWeb.com, and Invisible-web.net. Or the information can be located via subject directory tools like Infomine, Librarians' Index to the Internet (LII), Best Information on the Net, and AlphaSearch.
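Why a link-following search tool misses database content can be sketched with a toy model. Everything below is hypothetical: the page names, the catalog records, and the `search_catalog` function stand in for a real site and its query form, and no actual search engine or library system is being modeled.

```python
# Hypothetical static pages: each page maps to the pages it links to.
STATIC_PAGES = {
    "home": ["about", "catalog_search_form"],
    "about": ["home"],
    "catalog_search_form": [],  # the form links to no individual records
}

# Hypothetical records that exist only inside the database.
CATALOG_RECORDS = {
    "invisible web": "Record 101: a book about the Invisible Web",
    "census data": "Record 202: a guide to census statistics",
}

def crawl(start):
    """Follow links from the start page and return every page reached."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop(0)
        if page in seen:
            continue
        seen.add(page)
        queue.extend(STATIC_PAGES.get(page, []))
    return seen

def search_catalog(query):
    """Return a record only when asked directly, as a query form would."""
    return CATALOG_RECORDS.get(query.strip().lower())

indexed = crawl("home")
print(sorted(indexed))                  # pages a crawler can reach by following links
print(search_catalog("Invisible Web"))  # record reachable only through a direct query
```

The crawler reaches the search form but none of the records behind it, because those records appear only in response to a query typed into the form.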

The reality is that many information specialists, as well as the general public, use the Invisible Web already. Most Web surfers have accessed an Invisible Web site at one time or another. However, they access only a portion of the Invisible Web, typically the portion found in three general forms:

First, the Fee Group, or paid databases, such as EBSCO, OVID, ProQuest, and Medline. These databases have a cost associated with use.

Second, the Free Group: government databases such as the Census, AskERIC, PubMed, the Currency Converter, FindArticles, and library online catalogs. These databases are free for anyone to access.

Third, the Hybrid Group, such as UnCoverWeb and online newspapers like the New York Times and Wall Street Journal. These databases have both free and fee portions.

The Not-So-Invisible Web
For the past two years, the Invisible Web has been the "next big thing" in Internet searching. The truth is, it is still a big thing. When it comes to more than 500 billion Web pages located in searchable databases, how can it be anything but big? But the Invisible Web is still unwieldy for most searchers. Resources such as Infomine, Librarians' Index to the Internet, and Direct Search have been underused by both information specialists and novice searchers, in part because they are difficult to use. However, the increasing exposure of the Invisible Web is helping to bring these resources to the surface. As more searchers use them, access will improve, driven by the demand to find and use relevant information.

At the same time, the claim that much of the information contained on the Invisible Web cannot be found via the major search tools is becoming less and less valid. Non-HTML pages (PDFs, Microsoft Word, Excel, and PowerPoint files, and other formatted information) are becoming available via major search tools, especially Google. Other information, such as a book from my local library catalog, is just a click or two deeper. True, I may not be able to find out how much $273 is worth in the currency of Jordan, but I can easily find a currency converter. So, it is a matter of knowing what you are looking for and continuing to be versatile and persistent. Using Google, the Invisible Web, or any other search tool will not miraculously transform Internet searching into anything other than what it is: an art.

Horizontal Searching: Unveiling the Invisible Web
Keyword searching Google, Yahoo, or library databases is only the beginning of a search. If we think of the Web as part of the whole research process instead of separate from it, we can connect library resources to the information found on the Internet.

Horizontal searching takes advantage of information available on both the surface and the Invisible Web. For example, cut and paste an article title from a database like PsycInfo or ERIC into a search on Google or another search tool. The results often uncover a host of related and relevant materials.
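The cut-and-paste step can also be scripted. The following is a minimal sketch, assuming the search tool accepts a query in a `q` URL parameter, as Google's public search page does. The function name and the article title are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

def phrase_search_url(title, base_url="https://www.google.com/search"):
    """Build a search URL that looks for the title as an exact phrase."""
    cleaned = " ".join(title.split())  # collapse stray whitespace left by copy and paste
    query = '"' + cleaned + '"'        # quotation marks request an exact-phrase match
    params = urlencode({"q": query})   # percent-encode the query for use in a URL
    return base_url + "?" + params

# Hypothetical article title, as it might be copied from a database record.
title = "Example   article title on information literacy"
print(phrase_search_url(title))
```

Passing a different `base_url` points the same title at another search tool, provided that tool also reads its query from a `q` parameter.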

Bibliographies, full-text articles and documents, homepages of authors, and e-mail addresses found on the Web lead back to the library catalog, then to a separate dive into a library database, and then to another dive into the Internet. Horizontal searching combines these resources into a comprehensive search that unveils the Invisible Web and more. For an illustration of horizontal searching, go to http://home.sou.edu/~vidmar/horizontal-searching.

Strategies for Finding Information on the Invisible Web

  • Think of the Invisible Web as part of the whole instead of separate from traditional research
  • Horizontal Searching—incorporate the Web as integral to a search strategy
  • Search the Web for titles of articles found in research databases
  • Look for bibliographies on the Web that can be incorporated into searches for books, journal articles, or other documents
  • Search for authors from books and journals
  • Search for organizations and government reports
  • Follow citations onto the Web
  • Check e-mail addresses to contact authors for further information
