
 
 
   
 
4. Searching the Web
 

In this chapter:

Browsing

Search Tools

Search Methodology

Common Searching Mistakes & Strategies to Avoid

Advanced Search Techniques

 

 

Browsing

Merriam-Webster defines browsing as “looking over or through an aggregate of things casually especially in search of something of interest.” (2003).

Browsing is a dramatically underestimated research tool.  Hyperlinking encourages browsing by design; it is the essential functionality of the Web.  Investigators can browse various subjects, webrings, web communities, and so on to get better acquainted with an unfamiliar topic.  However, an investigator must take care to stay focused on the task at hand.  During a browsing session, a user can easily be distracted or spend several hours looking at useless or irrelevant information.  Carefully controlled browsing can be effective and should not be overlooked (Weinberger, n.d.).

The best investigative use for browsing is to gain familiarity with a particular web environment.  For example, after locating the website of a company or individual, it may be advantageous to browse through the entire site rather than search for a specific piece of information.  Sometimes this casual looking pays big dividends when an investigator stumbles onto a previously unknown nugget of valuable information.  Fortunately, users in search of more specific pieces of information can use the various tools available to search the Internet.


Search Tools

The Internet is a vast information resource.  The ability to search it effectively is an essential skill in the investigator's toolbox.  Several different tools are available for searching information on the Internet.  It is important for the investigator to understand these tools, how they work, and how to select the proper tool for the job.  After completing this chapter, refer to the Search Engine Showdown feature chart for a handy reference to the various search engine settings.

Search Engines.  These tools contain indexes of the full text of selected Web pages.  They offer searching by keyword, trying to match exactly the words in the pages.  Traditional search engines offer no browsing or subject categories, and their databases or “indexes” are compiled by automated software programs, called spiders or crawlers, with minimal human intervention (Barker, 2003, Types).  Search engines may be general or specialized according to one category of information.
Pros:  Targeted, full-text searching, breadth and depth of information. 
Cons:  No annotations or browsing, not helpful for introductory information. 
Examples: Google, AltaVista, WiseNut, Teoma, AllTheWeb.

Meta-search Engines.  These tools search multiple search engines simultaneously and return compiled results; they retrieve approximately 10% of the search results available in any of the search engines they visit.  Meta-search engines can be an effective tool for searching many engines at once but usually lack the depth of results provided by more traditional search engines (Barker, 2003, Types).
Pros:  Semi-targeted, search multiple sources at one time. 
Cons:  May not process search queries correctly, lacks depth of results. 
Examples: Dogpile, Search.com, IxQuick, SurfWax, ZapMeta.

Subject Directories.  These directories contain sites hand-selected by editors and organized into hierarchical subject categories, often annotated with descriptions.  Users may browse subject categories or search using broad general terms.  No full-text search of documents is available.  Users can only search the text of the subject categories and descriptions (Barker, 2003, Types).
Pros:  Annotations, well organized topics, excellent for introductory information. 
Cons:  No full-text searching, errors by human editors, maintenance of index is questionable. 
Examples: Yahoo!, LookSmart, Open Directory Project, About.com.

Subject Guides.  Guides are webpages containing collections of hypertext links on a subject, compiled by expert subject specialists, agencies, associations, and hobbyists.  Guides are useful for getting acquainted with an unfamiliar subject or topic.  They often provide links to the most popular or utilized webpages pertaining to a particular subject (Barker, 2003, Types).  Frequently, these guides are denoted by the titles "Links" or "Resources."
Pros:  Very detailed, may have links to otherwise unknown sources.
Cons:  Usually no searching, questionable maintenance, reliant on a single source of input. 
Examples: NHCAA Resources, IASIU Links, IALEIA Links.

Specialized Databases.  Some information available through the Internet is not searchable by the traditional search tools described above.  This information resides in databases made available by various data providers (Barker, 2003, Types).  These hosts provide their own search interface to this data.  For more information on this topic, see Chapter 5, The Deep Web.
Pros:  Excellent data quality, well maintained, very targeted information. 
Cons:  Interfaces and functionality vary, reliance on homegrown search function. 
Examples: LexisNexis, ChoicePoint, Accurint, ISO ClaimSearch, eBay, Amazon.

In the early days of web searching, these different tools were easy to identify.  However, some confusion among users now exists because the web search industry has undergone many changes and as a result, there are now many sites that offer combined tools.  Many Search Engines and Subject Directories in particular have consolidated into a one-stop search tool.  For more information about specific search tools, visit Search Engine Showdown or Search Engine Watch.


 

Search Methodology

Investigators are trained to gather vast amounts of evidence.  This mindset can cause researchers to conduct broad searches, providing millions of results.  This tactic feels safe because investigators want to be sure that they are not missing some essential piece of evidence when conducting an investigation.  However, with the overwhelming size of most search engine indexes, this tactic fails.  Humans are simply not capable of processing the amount of information required in order to locate a specific word or phrase from billions of others.  This is why search tools were designed in the first place.  When conducting research on the Internet, it is best to start with a targeted query.  If the desired results are not achieved, gradually expand or generalize the query.

The following five-step process, developed by author James Ruotolo, is designed to help researchers locate the information they are looking for when conducting a disability insurance claim fraud investigation.

Step 1: Identify what you know.  This step might seem obvious; however, many researchers fail to consider what information they already have and how it might help them with their search.  Remember the obvious items like name, telephone number, and address.  Also consider other items like spouses, children, hobbies, or employment.  If necessary, write the items down on a blank sheet of paper.

Step 2: Determine what you want to know. Identify why you are searching for information.  Your search tactics for verifying information you already have will be different than searching for new evidence.  Are you trying to validate a suspicion or hunch?  How specific do you need to be?  Are you searching for targeted data on an individual or more general material on a particular industry?

Step 3: Select the proper search tool.  Based on the previous steps, identify which search tool is best for your needs.  When searching for more specific targeted information on a person or business, use a specialized database or search engine.  When searching for more generic information about an industry or topic, choose a subject guide or subject directory.  Use the chart below to select the right tool for the job.

Step 4: Build a query.  Use the information gathered in the previous steps to build your search query.  Use the terms identified in Step 1 as the basis of  your query, adjusting it based on your purpose defined in Step 2.  Use the proper syntax and consider default features for the tool you selected in Step 3.  Be creative in generating your query and remember the GIGO principle - Garbage In, Garbage Out (Webopedia, 2003). The results are only as good as the query.  Make use of Boolean operators, consider synonyms, and include name variations or aliases.  See Advanced Search Techniques below for more assistance on building queries.

Step 5: Repeat.  Effective web searching is an art and it takes much practice to become proficient.  Repeat your search by adjusting your terms and adding or deleting items as necessary to tweak the results until you identify the information desired.
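
Step 4 can be sketched in a few lines of Python.  The sketch below is illustrative only: the function names quote and build_query, and the sample name, city, and excluded term, are hypothetical and are not part of any search engine's interface.  It assembles required terms, a group of name variations joined by OR, and excluded terms into one query string, using the implied Boolean syntax covered under Advanced Search Techniques.

```python
def quote(term):
    """Wrap multi-word terms in quotation marks so they are searched as a phrase."""
    return '"{}"'.format(term) if " " in term else term


def build_query(required=(), any_of=(), excluded=()):
    """Assemble an implied-Boolean query string (hypothetical helper)."""
    # Required terms get a leading plus sign.
    parts = ["+" + quote(term) for term in required]
    # Name variations or synonyms are grouped with OR inside parentheses.
    if any_of:
        parts.append("+(" + " OR ".join(quote(term) for term in any_of) + ")")
    # Excluded terms get a leading minus sign.
    parts += ["-" + quote(term) for term in excluded]
    return " ".join(parts)


# Hypothetical example: a subject's name variations, a known city,
# and a term to filter out of the results.
query = build_query(
    required=["hartford"],
    any_of=["john smith", "jon smith", "johnny smith"],
    excluded=["obituary"],
)
print(query)
```

Keeping the known facts from Step 1 in plain lists makes Step 5 easier: adding or removing one item and rebuilding the query is less error-prone than retyping a long string by hand.  Whether a given search tool accepts this exact syntax should be confirmed in its Help pages.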


 

Common Searching Mistakes & Strategies to Avoid

Browsing Searchable Subject Directories.  Browsing can be beneficial in many ways and is an excellent method of gaining general knowledge about unfamiliar topics.  However, it is not efficient to browse when looking for targeted information.  Browsing as a search technique has other drawbacks as well.  The taxonomy in each subject directory is different, so a topic may be classified differently from one directory to the next (Barker, 2002, Search).  This makes it difficult to develop a consistent browsing strategy.  When using a subject directory, it is best to use the search feature.

Simple Keyword Searching in Large Databases.  This refers to entering vague terms in the first search box you come across as opposed to using the advanced search feature.  This simple search usually utilizes the system’s defaults and doesn’t allow much flexibility.  Simple keyword searching can be used in subject directories to guide you to the right subject area but it should be avoided in large search engines.  Simple searching will result in irrelevant hits.  Learn and use the advanced searching features available on search engines.  Look for the "Advanced Search" link on any search engine homepage.

Focusing on Popular Links.  Everyone has different information needs and interests.  Simply because the site is “recommended” doesn’t mean it is the best source for you to use.  Recommended sites are often based on financial considerations (those sites that pay a fee to the search engine become “recommended”) or link popularity (Barker, 2002, Search).  Keep in mind that others might be visiting these sites for different reasons.  Click carefully and make your own evaluations.

Ignoring Stop Words.  “Stop Words” are words that search engines ignore because they are too common to be useful search criteria.  Common stop words are adverbs, conjunctions, prepositions, and all forms of the word be (Sherman, 2002, Part 1).  If you searched for:

to be or not to be 

all the words would be excluded except the word not.  The words to and be are stop words and would automatically be ignored.  The word or is a Boolean operator.  In this case, not is the only searchable word in the set.  Therefore, the search engine would search only for:

not

In this example, use of quotation marks would allow a search for the entire phrase, including the stop words.  The proper search query would be:

"to be or not to be"

It is important to note that different search engines treat stop words differently.  Refer to the Help or Advanced Search features on your search site for more information.
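
The stop-word behavior described above can be imitated with a short Python sketch.  It is a simplified model, not the logic of any real search engine: the STOP_WORDS and OPERATORS sets and the function name searchable_terms are assumptions made for illustration, and real engines maintain their own lists and rules.

```python
import re

# Hypothetical stop-word and operator lists; every real engine uses its own.
STOP_WORDS = {"a", "an", "the", "to", "be", "is", "are", "was", "were", "of", "in"}
OPERATORS = {"or"}


def searchable_terms(query):
    """Return the terms this simplified model would actually search for."""
    terms = []
    # A quoted phrase is captured whole; anything else is split on whitespace.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            # Phrases keep their stop words.
            terms.append(phrase)
        elif word.lower() not in STOP_WORDS and word.lower() not in OPERATORS:
            terms.append(word)
    return terms


print(searchable_terms("to be or not to be"))
print(searchable_terms('"to be or not to be"'))
```

In this model the unquoted query is reduced to the single word not, while the quoted version survives intact as one phrase, which matches the example above.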

Misusing Boolean Operators.  Use of Boolean logic can strengthen a search tremendously.  Unfortunately, Boolean operators can be confusing for some users.  To make matters worse, different search engines may interpret the same operators in different ways (Sherman, 2002, Part 1).  Be sure to learn how Boolean operators are used by your favorite search tools, and remember to check what the default Boolean settings are.  For more information on Boolean operators, see Advanced Search Techniques below.

Ignoring Case Sensitivity.  Some search engines are case sensitive.  Generally, it is best to search with all lowercase letters unless searching for a proper noun like a name or place (Sherman, 2002, Concluded).  Often, this method will result in hits of all cases (UPPERCASE, lowercase, and Titlecase).  Again, it is important to take note of a search engine's default settings with regard to case sensitivity.  To determine whether or not a search engine is case sensitive, see the search engine's Help feature or refer to the Search Engine Showdown's feature chart.

Poor Grammar.  Unlike humans, computers have a difficult time determining intent.  They are unable to hear inflections in tone or read body language.  The only information a search engine has is the information provided in the form of search terms.  Unfortunately, the overwhelming majority of novice and intermediate searchers fail to take into consideration these subtleties of the English language.  These idiosyncrasies or "Seven Deadly Nyms" (Sherman, 2002) can be the death of an otherwise good search.

Contronyms – words that have multiple meanings that contradict one another.  Examples:  Hysterical (overwhelmed with fear vs. outrageously funny).  Fast (moving quickly vs. firmly stuck in place).

Heteronyms – words that are spelled identically but have different meanings when pronounced differently.  Examples:  bow, desert, object, lead.

Polyonyms – different words that have the same meaning.  Example:  Devil, Beelzebub, Lucifer, Satan.

Homonyms – words that have the same sound but a completely different meaning (and sometimes spelling).  Example:  to, two, too.

Capitonyms – words that change pronunciation and/or meaning when capitalized.  Examples:  polish vs. Polish, amber vs. Amber.

Exonyms – place names that foreigners use instead of the names that natives use.  Examples:  Cologne:Köln, Morocco:Maroc.


 

Advanced Search Techniques

Once an Internet researcher moves beyond the novice stage, more flexibility in searching is often desired.  All the major search engines provide an advanced search page which offers more flexibility in search logic and allows for the change of default settings used in the simple search.  In addition to this functionality, a researcher can take advantage of the more advanced searching functions listed below.

 

Boolean Logic

Search terms are joined using logic, and the most common form is Boolean logic.  Boolean logic refers to the logical relationship among search terms and is named after mathematician George Boole (Cohen, 2003, Boolean).  Almost all Internet search engines use Boolean logic as a basis for their search capabilities.  Basic Boolean logic contains three operators:  AND, OR, and NOT.  Many people become confused when attempting to master Boolean searching because simple searches can become extremely complex and difficult to follow as additional terms are added.  When more terms are used with Boolean operators, the quality of search results goes up.  Unfortunately, so does the complexity of the search.  For a primer in Boolean logic, visit the University at Albany's Boolean Searching on the Internet tutorial.

Nearly all search engines on the internet allow the use of Implied Boolean Logic, also known as Search Engine Math (Sullivan, 2001).  Implied search logic uses arithmetic operators instead of traditional Boolean operators (Cohen, 2003, Boolean).  This helps simplify the search process for many people.

Using the OR operator.  There is no arithmetic equivalent for this operator, and some search engines use OR as their default Boolean setting.  To search for pages that include either cats OR dogs, type cats OR dogs.  Note that in some search engines OR must be capitalized to be treated as a Boolean operator.

Using the + operator.  This tells the search engine that this term is absolutely required.  If you were looking for pages that included both cats AND dogs, type +cats +dogs

Using the - operator.  This tells the search engine to exclude a term.  If you were looking for pages that include cats AND NOT dogs, type cats -dogs

Using quotation marks.  Quotation marks tell the search engine to search for a phrase.  Phrase searching requires all the terms to appear in the results in the same order that they were typed.  If you were looking for pages that include the phrase “raining cats and dogs,” type "raining cats and dogs"

Combining operators.  Combine operators to create more targeted search strings.  Parentheses may also be used to execute search operators in a particular order, much like a mathematical equation.  To search for (cats OR felines) AND NOT dogs AND “grooming services,” type +(cats OR felines) -dogs +"grooming services"

Using these Boolean operators can improve the results of your search.  To use them effectively, check how the search tool being used interprets these operators and whether or not it is case sensitive.  The techniques described in this section are based on the functionality of most major search engines.  However, each search engine functions differently, and readers are encouraged to consult the Help feature on each search engine website before employing these techniques.
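
To make the effect of the +, - and quotation mark operators concrete, the following Python sketch tests whether a piece of text would satisfy an implied Boolean query.  It is a toy model under stated assumptions, not a description of how any search engine is implemented: the function names parse and matches are hypothetical, parentheses and the OR keyword are not handled, and unsigned terms are treated as an OR default, which only some engines use.

```python
import re


def parse(query):
    """Split an implied-Boolean query into required, excluded, and optional terms."""
    required, excluded, optional = [], [], []
    # Each token is an optional + or - sign followed by a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(query, text):
    """Return True if the text satisfies the query in this simplified model."""
    required, excluded, optional = parse(query)
    text = text.lower()
    words = set(re.findall(r"\w+", text))

    def present(term):
        # Phrases are checked as substrings; single terms must be whole words.
        return term in text if " " in term else term in words

    if any(present(term) for term in excluded):
        return False
    if not all(present(term) for term in required):
        return False
    # With no required terms, at least one unsigned term must appear (assumed OR default).
    return bool(required) or any(present(term) for term in optional)


print(matches('+cats -dogs', "Cats are great pets"))
print(matches('+cats -dogs', "Cats and dogs living together"))
print(matches('+cats +"grooming services"', "We offer grooming services for cats"))
```

Running a query against a few sample sentences like this is a quick way to check that a combination of operators says what was intended before it is typed into a real search tool.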

 

Power Searching

The syntax shown for the power searching examples below is specific to Google.  Search engine syntax varies and users should consult the Help or Frequently Asked Questions resources available for the search tool being used.

Title.  Title search restricts a query to the text in the HTML title of a webpage.  This is the text that appears within the title tag of a webpage document.  For example, if this section of this document were its own web page, its title tag might look like this:  <title>Power Searching</title>.  To conduct a title search in most major search engines, type allintitle:terms.  To search for a web page with the term cats and the term dogs in the title, type allintitle: +cats +dogs

Site.  Site search allows you to search only the pages that have been indexed for a specific website.  For example, to search within Microsoft’s website for pages that include the words cats and dogs, type +cats +dogs site:microsoft.com

Domain.  Site search can also restrict a query to an entire top-level domain rather than a single website.  To search only the commercial domain for cats and dogs, type +cats +dogs site:.com.  Site searching also allows you to restrict searches to country domains.  Country domains are represented by a two-letter designation at the end of a website's domain name.  For example, .uk is the country domain for the United Kingdom and .ca is the country domain for Canada.  To search only sites in the UK domain for the terms cats and dogs, type +cats +dogs site:.uk

Uniform Resource Locator (URL).  A URL search is similar to a site search but instead searches the actual text of the URL.  Users can search using the allinurl: command.  To search all URLs for the term “fraud,” type allinurl:fraud

Link.  The link search feature allows you to search for all the pages linking to a particular page or domain that you specify.  For example, to find webpages that contain links to Microsoft.com,  you would type link:microsoft.com
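
When the same restrictions are applied to many searches, it can help to generate the query strings rather than retype them.  The Python sketch below simply reproduces the site:, allinurl: and link: strings shown above.  The function names are hypothetical, and the output is only text to paste into a search box; whether a given search tool honors each operator should be checked in its Help pages.

```python
def site_search(terms, site):
    """Require every term and restrict results to one site or domain suffix."""
    required = " ".join("+" + term for term in terms)
    return required + " site:" + site


def url_search(term):
    """Search the text of URLs for a term."""
    return "allinurl:" + term


def link_search(target):
    """Find pages that link to a particular page or domain."""
    return "link:" + target


print(site_search(["cats", "dogs"], "microsoft.com"))
print(site_search(["cats", "dogs"], ".uk"))
print(url_search("fraud"))
print(link_search("microsoft.com"))
```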

Wildcard.  Wildcard characters, represented as an asterisk (*), allow searching for plurals or variations of a word.  It is an excellent search technique to use if the correct spelling of a word is unknown or if there are multiple spellings used.  For example, to search for “theater” and “theatre,” type theat*
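
The effect of a trailing wildcard can be previewed with Python's standard fnmatch module, which uses the same asterisk convention.  This is only an analogy for how a search tool might expand the pattern; the word list and the function name wildcard_matches are made up for illustration.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, words):
    """Return the words that match a pattern containing * wildcards."""
    return [word for word in words if fnmatchcase(word, pattern)]


# Hypothetical vocabulary used only to show what the pattern picks up.
vocabulary = ["theater", "theatre", "theatrical", "theme", "heat"]
print(wildcard_matches("theat*", vocabulary))
```

Note that the pattern also picks up theatrical, not just the two spellings that were wanted.  A wildcard widens a search, so it is worth pairing with other required terms to keep the results targeted.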

Stemming.  Stemming eliminates a suffix from a search term and returns all variations of that term.  Some search engines provide stemming as a default.  For example, if a search is run for “singing,” results would be returned for “sing,” “singing,” “sings,” etc.  In these cases, use of a wildcard symbol is not necessary.
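
The idea behind stemming can be illustrated with a deliberately naive Python sketch.  It strips only two suffixes and is an assumption-laden toy, not the algorithm of any search engine; production stemmers use far more elaborate rules.  The names SUFFIXES, stem, and stem_matches are hypothetical.

```python
# Suffixes this toy stemmer removes, checked in order.
SUFFIXES = ("ing", "s")


def stem(word):
    """Strip one known suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_matches(query_word, words):
    """Return the words that share a stem with the query word."""
    target = stem(query_word)
    return [word for word in words if stem(word) == target]


print(stem_matches("singing", ["sing", "singing", "sings", "single"]))
```

Here a search for singing also finds sing and sings but leaves single alone, which is the behavior described above.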


 

 


 

   
  © 2003-2004 James D. Ruotolo.  All rights reserved.

last updated December, 2003