Tuesday, September 30, 2008

Limitations of Google Web Services Output

Many developers are used to working with a variety of data types when creating applications. Data types help define the kind of data you're using. For example, if a data element is a number, you might use an integer (a number without a decimal point) or a real number (one that has a decimal point and equates to a Single or Double for Visual Basic developers). A Web service has no concept of data type when it comes to the data itself: every data transfer is text. The XML used to transfer the data does include type information, but of the sort that's normally associated with database fields, which means you have to know the field names to interpret it. For example, you might receive data in a message like the one shown here.

12k
... some text highlight) more text ...
True
http://www.mwt.net/~jmueller
<b>DataCon Services</b>

You don't have to understand the XML portion of this message segment, but look at the data. Google Web Services, like all other Web services, sends all data as characters and defines the data using tags (the words between the angle brackets) and attributes (extra information within the tag). For example, the line that contains http://www.mwt.net/~jmueller is enclosed in a tag that identifies the value, and the tag's xsi:type="xsd:string" attribute marks the value as a string. The tag tells you what kind of information this is. By knowing the Google database layout, you also know the data type and other information about the entry. However, the information you receive from Google is still plain text. You can see other examples of XML responses in the \GoogleAPI\soap-samples folder of the kit. Simply open them using Internet Explorer or another browser that supports XML.
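To make the point about text-only data concrete, here is a minimal Python sketch of how a client might turn the character data back into native types by reading the xsi:type attribute. The element names (cachedSize, relatedInformationPresent, URL, resultCount), the wrapper element, and the namespace URIs are assumptions chosen for illustration, not a copy of an actual Google response.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xsi prefix; a real response may declare a different one.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical fragment: element names and values are illustrative only.
SAMPLE = """\
<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <cachedSize xsi:type="xsd:string">12k</cachedSize>
  <relatedInformationPresent xsi:type="xsd:boolean">true</relatedInformationPresent>
  <URL xsi:type="xsd:string">http://www.mwt.net/~jmueller</URL>
  <resultCount xsi:type="xsd:int">10</resultCount>
</item>
"""

# Map the local part of the declared type to a Python converter.
CONVERTERS = {
    "string": str,
    "boolean": lambda text: text.strip().lower() in ("true", "1"),
    "int": int,
    "double": float,
}

def typed_values(xml_text):
    """Return {tag: value}, converting each element's text per its xsi:type."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        declared = child.get("{%s}type" % XSI, "xsd:string")
        local = declared.split(":")[-1]          # "xsd:int" -> "int"
        convert = CONVERTERS.get(local, str)     # unknown types stay as text
        result[child.tag] = convert(child.text or "")
    return result

values = typed_values(SAMPLE)
```

Everything arrives as characters; the conversion to a Boolean or an integer happens only because the client chooses to act on the type attribute.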


Your browser is actually very handy for viewing XML data, even if it might not make sense right now. The "Viewing XML Data in Your Browser" section of Chapter 3 discusses in detail how you can use your browser. For right now, all you need to know is that you can look at the various kinds of XML responses by opening the files in your browser.

Figure 1.5 points to another potential problem with Web service output. All of the tags and other information supplied in a request and response consume space. The file is larger than a text file with the same data because of all the tag information required. In addition, it's far more efficient to store many data types in their native format, rather than as characters. Consequently, Web service data suffers from bloat. Because the data uses more bandwidth than an equivalent binary message, you could experience performance problems. To compensate, you need to create efficient queries for your application that maximize data throughput despite the limitations of the XML format. The "Making Sensible Queries" section of the chapter discusses this issue in detail.
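The size difference is easy to measure. The following Python sketch compares one integer encoded as a tagged XML element against the same integer packed as a 4-byte binary value. The tag name "count" and the choice of a 32-bit little-endian layout are assumptions made purely for the comparison.

```python
import struct

def xml_size(tag, xsd_type, value):
    """Bytes needed to send one value as a typed XML element."""
    element = '<%s xsi:type="xsd:%s">%s</%s>' % (tag, xsd_type, value, tag)
    return len(element.encode("utf-8"))

def binary_size(value):
    """Bytes needed to send the same value as a 32-bit integer."""
    return len(struct.pack("<i", value))

value = 123456
xml_bytes = xml_size("count", "int", value)   # hypothetical tag name
bin_bytes = binary_size(value)

print("XML element: %d bytes, binary: %d bytes" % (xml_bytes, bin_bytes))
```

The gap widens further once you count the message envelope that surrounds every request and response, which is why trimming the number of round trips matters more than trimming any single field.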

The results you obtain from Google are largely a matter of the input you provide in the form of a request. The "Conducting an Expansion Search" section of the chapter points out a serious flaw in making any assumptions about the return you receive from Google. The query can become quite complex because even the order of the words makes a difference in the results you receive. Google treats word order as significant because most people enter the words in the order they think about them, which is usually most important to least important. Consequently, if you always assume that your first query returns all possible results, you'll find Google Web Services disappointing.
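One way to avoid relying on a single query is to try several word orderings and merge what comes back. The Python sketch below is only an illustration of that idea: the search argument is a hypothetical callable that takes a query string and returns a list of URLs, and fake_search stands in for a real call with canned data.

```python
from itertools import permutations

def query_variants(query):
    """Return every distinct ordering of the words in the query."""
    variants = []
    for ordering in permutations(query.split()):
        variant = " ".join(ordering)
        if variant not in variants:      # repeated words yield duplicates
            variants.append(variant)
    return variants

def merged_results(query, search):
    """Run each variant through search() and merge URLs, first seen first."""
    merged = {}
    for variant in query_variants(query):
        for url in search(variant):
            merged.setdefault(url, variant)   # remember which variant found it
    return list(merged)

def fake_search(query):
    """Hypothetical stand-in for a real search call; returns canned URLs."""
    canned = {
        "web services": ["http://example.com/1", "http://example.com/2"],
        "services web": ["http://example.com/2", "http://example.com/3"],
    }
    return canned.get(query, [])

all_urls = merged_results("web services", fake_search)
```

The number of orderings grows factorially with the number of words, so in practice you would limit this to short queries or to a handful of hand-picked variants.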

The ranking of results you receive from Google Web Services is also unlikely to be the same as the ranking you need. Google sells keywords to make some sites turn up higher in the result list. In addition, Google often bases the site ranking on criteria that won't match your own, such as the number of times that a keyword appears. The bottom line is that the output you receive from Google is "raw" output—information that you haven't filtered or organized in any way. One of the reasons to use Google Web Services is to enable you to perform tasks such as site ranking so the results appear in the order that's best for your organization.
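As a rough illustration of reordering the raw output, the Python sketch below scores each result by how often your own keywords appear and by whether it comes from a host you prefer. The result fields (url, title, snippet), the scoring weights, and the sample data are all assumptions; substitute whatever criteria suit your organization.

```python
def keyword_score(result, keywords):
    """Count case-insensitive keyword occurrences in the title and snippet."""
    text = (result["title"] + " " + result["snippet"]).lower()
    return sum(text.count(keyword.lower()) for keyword in keywords)

def rerank(results, keywords, preferred_hosts=()):
    """Return results ordered by a local score, highest first."""
    def score(result):
        # Arbitrary bonus for hosts the organization trusts.
        bonus = 10 if any(host in result["url"] for host in preferred_hosts) else 0
        return keyword_score(result, keywords) + bonus
    # sorted() is stable, so ties keep the order the service returned.
    return sorted(results, key=score, reverse=True)

# Hypothetical results in the order the service returned them.
results = [
    {"url": "http://a.example/", "title": "Widgets", "snippet": "cheap widgets"},
    {"url": "http://b.example/", "title": "XML guide", "snippet": "XML and SOAP"},
]

by_keyword = rerank(results, ["xml"])
by_host = rerank(results, ["xml"], preferred_hosts=("a.example",))
```

Because the sort is stable, results with equal scores keep their original relative order, so the service's ranking still acts as the tiebreaker.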
