|
In this chapter:
Topography of the Web
Navigating the Deep Web
Searching the Deep Web
The Surface Web is the part of the World Wide Web that
most users are familiar with. It consists
mostly of static webpages that are indexed by traditional search
engines (Sullivan, 2000, Invisible).
The Shallow Web includes dynamically
generated webpages (Sullivan, 2000, Invisible). Such pages may be static
pages that are dynamically delivered or may not actually exist until the user
views them. For example, if a user conducts a search, the pages
containing the result set are dynamically generated shallow web pages.
Those pages did not exist until the user hit the search button. Search
engines generally avoid indexing these types of pages because they exist
only briefly, or because the way they are generated could cause the
crawler software to index the same page many times by accident
(Sullivan, 2000, Invisible). Shallow web pages are not typically useful to
investigators.
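The duplicate-indexing problem described above can be illustrated with a minimal Python sketch. It assumes a hypothetical site at example.com whose result pages carry a session parameter; the URLs and the parameter names (sessionid, sid) are invented for illustration and are not taken from any real crawler or site. A naive crawler that treats every distinct address as a distinct page stores the same result page three times, while normalizing the addresses first collapses them into one.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that carry session state rather than content.
SESSION_PARAMS = {"sessionid", "sid"}


def canonicalize(url: str) -> str:
    """Drop session parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )


# Three addresses a crawler might collect for one and the same results page.
crawled = [
    "http://example.com/search?q=deeds&page=1&sessionid=A1",
    "http://example.com/search?page=1&q=deeds&sessionid=B2",
    "http://EXAMPLE.com/search?q=deeds&page=1",
]

naive_index = set(crawled)                              # one entry per distinct string
canonical_index = {canonicalize(url) for url in crawled}  # one entry per distinct page

print(len(naive_index))      # 3
print(len(canonical_index))  # 1
```

Real crawlers face many more variations than this sketch handles, which is one reason search engines often skip dynamically generated pages altogether.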
The Deep Web, which may also be referred to as the Invisible
Web, is defined as web content that is not indexable because it is
stored in a database. The data cannot be searched with traditional
search tools like search engines (Sullivan, 2000, Invisible).
Deep websites tend to be much more narrowly focused on a specific topic
and to offer higher-quality data than the Surface Web
(Bergman, 2001). The
Deep Web holds a tremendous amount of data and is estimated to be
approximately 550 times larger than the Surface Web (Bergman, 2001).
Even though these vast information repositories are extremely valuable,
they are not generally known to or used by most users (Bergman, 2001).
Common third-party data providers like
Lexis-Nexis and
ChoicePoint are
obvious deep web sites that host information in a database that is
accessible via the Internet but is not indexed by regular search
engines. Many users may assume that deep web content is available only
by subscription or some other form of paid access, but a recent study
found that 95% of deep web content is publicly accessible (Bergman,
2001). Some of the
more common free deep web sites include
Amazon.com,
eBay, the
US Census Bureau,
the National Oceanic and
Atmospheric Administration,
Federal Express, and
Realtor.com. All of
these sites house enormous databases that cannot be searched by
traditional search engines. These sites are among the most valuable
resources available to investigators. Additional deep websites that are
recommended for investigators are discussed in
Chapter 7.
Web surfers are accustomed to using traditional
search tools to find information, so the Deep Web presents a challenge for
many Internet users. The best way to search deep web content has been to go
directly to the website hosting the content and use that site’s search
function to mine the data. Various Deep Web directories
have been developed to help researchers locate these valuable sites.
These directories include
CompletePlanet,
Infomine,
DirectSearch, and
Invisible-Web.net.
New tools are being developed that allow users to
search the Deep Web more effectively. These tools perform a
meta-search: they query multiple deep web databases simultaneously and
present a combined result set.
Profusion is an
example of a Deep Web meta-search tool.
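The meta-search idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three "databases" are hypothetical in-memory lists with invented records, standing in for the search functions of separate deep web sites, and nothing here reflects how Profusion or any other real tool is built. The sketch sends the same term to every source at once and merges the hits into a single result set.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the search functions of three deep web sites.
# The source names and records are invented for illustration only.
SOURCES = {
    "property_records": ["Smith deed 1998", "Jones deed 2001"],
    "court_filings": ["Smith v. Jones", "Doe v. Roe"],
    "business_licenses": ["Smith Hardware LLC"],
}


def search_source(name: str, term: str) -> list[tuple[str, str]]:
    """Return (source, record) pairs whose text contains the term, ignoring case."""
    needle = term.lower()
    return [(name, record) for record in SOURCES[name] if needle in record.lower()]


def meta_search(term: str) -> list[tuple[str, str]]:
    """Query every source concurrently and merge the hits into one sorted list."""
    with ThreadPoolExecutor() as pool:
        per_source = list(pool.map(lambda name: search_source(name, term), SOURCES))
    combined = [hit for hits in per_source for hit in hits]
    return sorted(combined)


results = meta_search("smith")
for source, record in results:
    print(f"{source}: {record}")
```

Each hit is tagged with its source so the investigator can still go back to the originating site, which matters when the combined list mixes records of very different kinds.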
Currently, there are few, if any, Deep Web meta-search
tools that scour the free public records that are available on the Web.
The best solution is to visit a Deep Web directory for public records,
such as
SearchSystems, follow it to the host site, and conduct a search using
that site’s own search function.
Proceed to Chapter 6: News & Newsgroups

|