Tuesday, September 30, 2008

Conducting an Expansion Search

An expansion search helps you locate as much of the available information on a topic as possible by taking advantage of the way Google interprets a query. For example, Google treats the order of the search terms as significant. In addition, if you work in an acronym-laden field, searching on both the acronym and its expanded form is important for locating all sources of information on a given topic. Consider the following permutations of a search using the keywords Visual Basic serial port.

Visual Basic Serial Port: This combination returns 132,000 hits, with a first site of http://www.distiworld.com/cd-burner-to-download.htm/.

Serial Port Visual Basic: Simply swapping the two groups of words reduces the number of hits to 130,000, with a first site of http://www.lvr.com/spc.htm.

Serial Port VB: Using the VB acronym reduces the number of hits further to 58,200, with a first site of http://www.control.com/1026175817/index_html.

VB Serial Port: Because putting Visual Basic first returned more hits than putting Serial Port first, you'd expect this number to be higher than the one for the Serial Port VB search. However, the number of hits is only 57,300, with a first site of http://forums.basicmicro.net/ShowPost.aspx?PostID=7638.

Four sets of keywords (and you could easily try more) produce four completely different results, so it's not hard to understand why an expansion search could help you obtain the maximum benefit from Google. However, manual expansion searches quickly become cumbersome. Repetition is one of the main causes, but entry errors and the effort of interpreting the results also play a part. You have to provide enough keywords to make a search specific, but each added keyword multiplies the number of orderings and variants that the expansion search has to cover.
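To make the growth in combinations concrete, here is a minimal Python sketch that generates the query strings for an expansion search. It is an illustration only: the function name expand_queries and the idea of passing each keyword group as a list of interchangeable variants are assumptions made for this example, not part of Google Web Services.

```python
from itertools import permutations, product


def expand_queries(groups):
    """Build every query for an expansion search.

    `groups` is a list of lists. Each inner list holds the interchangeable
    variants of one keyword group, such as an acronym and its expanded form.
    (This input shape is an assumption made for the sketch.)
    """
    queries = []
    for choice in product(*groups):            # pick one variant per group
        for ordering in permutations(choice):  # try every order of the groups
            queries.append(" ".join(ordering))
    return queries


# The two keyword groups from the example above, with the VB acronym
# treated as a variant of Visual Basic.
groups = [["Visual Basic", "VB"], ["Serial Port"]]
queries = expand_queries(groups)
for query in queries:
    print(query)
```

With two groups and one acronym variant, the sketch produces the same four queries listed earlier. Adding a third group raises the number of orderings from 2 to 6 for every combination of variants, which is why the manual approach gets tedious so quickly.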

Google Web Services steps in by letting you perform an expansion search automatically using code. You supply the four keywords, and the code does the rest. By comparing the results of each expansion search, you can come up with an optimal group of sites. For example, you could verify that a site appears in the results of every expansion search, which tends to reduce false positives. You can also rate the sites based on the number of times they appear and their position in each list. Although it's possible to perform this kind of data manipulation by hand after a manual search, no one would want to do it.
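The comparison step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: it takes the result lists as plain lists of URLs that have already been retrieved, it does not call Google Web Services, and the scoring rule (one appearance counts once, with 1/position as the tie-breaker) is just one reasonable choice. The function name rate_sites and the example.com style URLs are hypothetical.

```python
from collections import defaultdict


def rate_sites(result_lists):
    """Rank URLs across several result lists.

    Returns (ranked, common): `ranked` orders every URL by how many lists
    it appears in, then by how high it sits in those lists; `common` keeps
    only the URLs that appear in every list.
    """
    appearances = defaultdict(int)
    score = defaultdict(float)
    for results in result_lists:
        seen = set()
        for position, url in enumerate(results, start=1):
            if url in seen:  # count a URL once per result list
                continue
            seen.add(url)
            appearances[url] += 1
            score[url] += 1.0 / position  # earlier positions score higher
    ranked = sorted(appearances, key=lambda u: (-appearances[u], -score[u], u))
    common = [u for u in ranked if appearances[u] == len(result_lists)]
    return ranked, common


# Hypothetical result lists, one per expansion query (placeholder URLs).
result_lists = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["b.example", "c.example", "a.example"],
]
ranked, common = rate_sites(result_lists)
print(ranked)
print(common)
```

Requiring a site to appear in every list is the strict filter described above. The ranked list is the softer alternative, since it still surfaces sites that show up in most, but not all, of the searches.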
