5. The Deep Web
 

In this chapter:

Topography of the Web

Navigating the Deep Web

Searching the Deep Web

 

 

Topography of the Web

The Surface Web is the World Wide Web that most users are familiar with. It consists mostly of static webpages that are indexed by traditional search engines (Sullivan, 2000, Invisible).

The Shallow Web includes dynamically generated webpages (Sullivan, 2000, Invisible). Such pages may be static pages that are dynamically delivered, or they may not exist at all until the user views them. For example, when a user conducts a search, the pages containing the result set are dynamically generated shallow web pages: they did not exist until the user hit the search button. Search engines generally avoid indexing these pages, either because they exist for very short periods of time or because their dynamic generation methods could cause the crawler software to index the same page many times by accident (Sullivan, 2000, Invisible). Shallow web pages are not typically useful for investigators.
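The duplicate-indexing problem can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a site appends a per-visit session identifier to its result-page addresses; the parameter names and example.com addresses below are hypothetical and do not come from any particular site or crawler.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that change on every visit without
# changing what the page actually shows.
VOLATILE_PARAMS = {"sessionid", "sid", "timestamp"}


def canonical_url(url):
    """Drop volatile parameters and sort the rest so equivalent URLs match."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in VOLATILE_PARAMS
    ]
    kept.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def count_distinct_pages(urls):
    """Count how many different pages a list of URLs really points to."""
    return len({canonical_url(url) for url in urls})


if __name__ == "__main__":
    crawled = [
        "http://example.com/results?q=fraud&sessionid=abc",
        "http://example.com/results?sessionid=xyz&q=fraud",
        "http://example.com/results?q=arson&sessionid=abc",
    ]
    print("addresses seen:", len(set(crawled)))
    print("distinct pages:", count_distinct_pages(crawled))
```

A crawler that compares raw addresses would treat the first two entries as different pages even though they show the same result set, which is one way the same dynamic page can end up indexed more than once.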

The Deep Web, which may also be referred to as the Invisible Web, is defined as web content that cannot be indexed because it is stored in a database. This data cannot be searched with traditional search tools such as search engines (Sullivan, 2000, Invisible).
 


 

Navigating the Deep Web

Deep websites tend to be more narrowly focused on a specific topic and to offer higher-quality data than the surface web (Bergman, 2001). The Deep Web holds a tremendous amount of data and is estimated to be approximately 550 times larger than the surface web (Bergman, 2001).

Even though these vast information repositories are extremely valuable, they are not generally known or utilized by most users (Bergman, 2001).

Common third-party data providers like Lexis-Nexis and ChoicePoint are obvious deep web sites: they host information in a database that is accessible via the Internet but is not indexed by regular search engines. Many users may assume that deep web content is only available by subscription or some other form of paid access, but a recent study found that 95% of deep web content is publicly accessible (Bergman, 2001). Some of the more common free deep web sites include Amazon.com, eBay, the US Census Bureau, the National Oceanic and Atmospheric Administration, Federal Express, and Realtor.com. All of these sites house enormous databases that cannot be searched by traditional search engines. These sites are among the most valuable resources available to investigators. Additional deep websites that are recommended for investigators are discussed in Chapter 7.


 

Searching the Deep Web

Web surfers are accustomed to using traditional search tools to find information, so the Deep Web presents a challenge for many Internet users. The best way to search deep web content has been to go directly to the website hosting the content and use that site’s search function to mine the data. Various Deep Web directories have been developed to help researchers locate these valuable sites. These directories include CompletePlanet, Infomine, DirectSearch, and Invisible-Web.net.

New tools are being developed that allow users to search the Deep Web more effectively. These tools perform a meta-search: they query multiple deep web databases simultaneously and present a combined result set. Profusion is an example of a Deep Web meta-search tool.
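The meta-search idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Profusion or any other tool works: the three "databases" are hypothetical in-memory lists with invented records, and a real tool would submit the query to each site's own search form instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep web databases. All names and records
# here are invented for illustration.
SOURCES = {
    "property_records": ["Smith, John - 12 Elm St", "Doe, Jane - 4 Oak Ave"],
    "business_filings": ["Smith Hardware LLC", "Doe Consulting Inc"],
    "court_dockets": ["State v. Smith", "Jones v. Doe"],
}


def search_source(name, query):
    """Return (source, record) pairs from one source that contain the query."""
    needle = query.lower()
    return [(name, record) for record in SOURCES[name] if needle in record.lower()]


def meta_search(query):
    """Query every source at the same time and merge the results."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        combined = []
        for future in futures:
            combined.extend(future.result())
    # Sort so the combined result set is presented in a stable order.
    return sorted(combined)


if __name__ == "__main__":
    for source, record in meta_search("smith"):
        print(f"{source}: {record}")
```

Each result keeps the name of the source it came from, so the researcher can still go to the host site and verify the record there.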

Currently there are few, if any, Deep Web meta-search tools that scour the free public records available on the Web. The best solution is to consult a Deep Web directory for public records, such as SearchSystems, then visit the host site it lists and conduct a search using that site's own search function.


 


 

   
  © 2003-2004 James D. Ruotolo.  All rights reserved.

last updated November, 2003