Emerald Group Publishing Limited
Copyright © 1999, MCB UP Limited
Keywords: WWW, Intelligent character recognition, Information retrieval
There has been some discussion in the media lately of the value of the information found on the Internet. Some writers claim that the Internet is a poor provider of information because they have been unable to find what they want. I maintain that the fault lies with the user and not with the Internet. It is certainly true that there is an untold amount of rubbish on the Internet, but I would say that the percentage of dross on the Internet is no higher than the percentage of dross put out worldwide in printed form. If you were to enter the search term yields on the Internet you would find many thousands of pages (227,510 on AltaVista, to be precise), most of which would be irrelevant to your needs.
However, it must be remembered that this is the equivalent of carrying out a search on all the text of all the printed material in the world. Such a search would throw up hundreds of Mills and Boon type books where the innocent maiden yields to the demands of the devilishly handsome young squire and a lot of links to agricultural pages where you could read about crop yields. Just as you would pick and choose which books to look in to find out about yields, so you need to pick and choose more carefully when searching the Internet.
The major search engines all offer advanced search options that enable you to narrow your search. However, even using the simple search option you can use syntax to limit your search. Even if you insist on just typing the word yields, some of the search engines will offer you options once the search has been carried out. For instance, AltaVista (www.altavista.com), as well as offering you the two hundred thousand odd Web pages, also offers you "related searches", where it proposes a number of search terms from money market yields through corporate bond yield to crop yield.
Searching by phrase
You will generally get the best results if you frame the search more precisely at the start. For example, searching for the phrase "property yields" returned 126 Web pages, a much more manageable list than the search results for yields alone. Note that in this case the search term is enclosed in quotation marks. This makes the search look for the exact phrase property yields; the words must appear adjacent to each other and in the order shown. Exact phrase searching is not limited to AltaVista but is common across all the major search engines. In addition to quotation marks, the simple search facility generally allows the use of the + and - signs. To make sure that a specific word is included in your search topic, place a plus (+) sign before the word in the search box. To make sure that a specific word is excluded from your search topic, place a minus (-) sign before the word in the search box. For example, a search for yields -crop would ensure that those pages specifically dealing with crop yields would not be returned in your search results.
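For readers who like to see the rules spelled out, the quotation mark, plus and minus conventions can be sketched in a few lines of Python. This is only an illustration of the matching logic described above, not how AltaVista or any other engine is actually implemented. The function names parse_query and matches and the sample pages are invented for the example, and the sketch ignores case and ranking altogether.

```python
import re


def parse_query(query):
    """Split a simple query into exact phrases, +required, -excluded and plain words."""
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]+"', " ", query)
    required, excluded, plain = [], [], []
    for token in remainder.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            plain.append(token)
    return phrases, required, excluded, plain


def matches(page_text, query):
    """Return True if the page satisfies the query (case is ignored in this sketch)."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    word_set = set(words)
    padded_text = " " + " ".join(words) + " "
    phrases, required, excluded, plain = parse_query(query.lower())

    # Exact phrase: the words must be adjacent and in the order given.
    for phrase in phrases:
        phrase_words = re.findall(r"[a-z0-9]+", phrase)
        if " " + " ".join(phrase_words) + " " not in padded_text:
            return False
    # +word must be present; -word must be absent.
    if any(word not in word_set for word in required):
        return False
    if any(word in word_set for word in excluded):
        return False
    # With plain words only, at least one of them has to appear.
    if plain and not (phrases or required):
        return any(word in word_set for word in plain)
    return True


# Hypothetical page titles, made up for the illustration.
pages = [
    "Property yields in London rose",
    "Yields on property rose",
    "Crop yields rose",
]
print([page for page in pages if matches(page, '"property yields"')])
print([page for page in pages if matches(page, "yields -crop")])
```

The first search keeps only the page in which property and yields sit side by side; the second keeps every page that mentions yields except the one that also mentions crop.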
It is usually best to use lower case only in your search term. When you use lower case text, the search service finds both upper and lower case results. When you use upper case text, the search service finds only exact case matches. So if you were to type london in the search box your results would include those with london, London and LONDON in them. If you typed London, then only those with London would be returned.
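The case rule can be expressed very compactly. The following Python sketch assumes the behaviour described above (an all lower case term matches any capitalisation, while a term containing capitals matches only exactly); the function name case_matches is invented for the example and individual search services may well differ in detail.

```python
def case_matches(search_term, word_on_page):
    """Apply the rule: lower case terms match any case, otherwise the match is exact."""
    if search_term.islower():
        return search_term == word_on_page.lower()
    return search_term == word_on_page


words_on_pages = ["london", "London", "LONDON"]
print([word for word in words_on_pages if case_matches("london", word)])
print([word for word in words_on_pages if case_matches("London", word)])
```

Typing london finds all three spellings, whereas typing London finds only the one spelt exactly that way.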
Limiting by date and location
Another frequent, and perhaps more justified, complaint about the Internet is that the information retrieved is often out of date. This problem can be avoided by limiting the dates searched. In fact, two problems can be solved in one by using Yahoo's UK site (www.yahoo.co.uk). By choosing this, rather than the US-based www.yahoo.com, you can limit your search to UK sites only. This avoids the predominance of material from the USA that generally appears in most search results. Once on the Yahoo site, choose the "advanced search" option, which appears at the end of the search box. You will be taken to http://search.yahoo.co.uk/search/ukie/options and you will be given the chance to "Find only new listings added during the past ..." with a selection of date limitations from one day to three years.
InfoSeek is another search engine that has a UK-based site (www.infoseek.co.uk). Its simple search page lets you specify a Web-wide search or a search of sites in any one of 20 or so countries. The disadvantage of the UK site is that it does not offer advanced options. The US-based site, however, does, and has an easy-to-use interface. Go to http://infoseek.go.com and then choose "search options" for a form to fill out. As well as allowing you to include or exclude specific words or phrases, the form allows you to specify the location, and so a search of UK-based sites can be carried out from here. The other advantage of the US-based site over the UK alternative is that when the results are returned each site has a link to "find similar pages". So, for instance, a search for forecasts limited to the UK returned over 400 sites, including weather forecasts and forecasts of satellite TV sales. However, there was a link to "Inflation Forecasts and Monetary Policy" and on choosing "find similar pages" I was presented with a list of very relevant and useful looking sites. These included links to World Bank working papers on monetary policy, a site on "The Role of Forecasting in Meeting Inflation Targets" and The Treasury site with a paper on "The Monetary Framework".
Before moving on to hierarchical search methods it is worth mentioning a new type of search engine that is gaining in popularity. These are known as multisearch engines. My favourite is Dogpile, which can be found at www.dogpile.com. This site allows you to enter your search term and then it goes off and searches a whole range of other engines and returns the results. As it says: "The Dogpile search interface takes a single line for a query and processes it so that you will get the maximum benefit from your search. Currently twenty-two search tools are supported". The Web search engines that are currently checked are LookSmart, Thunderstone, GoTo.com, Yahoo, Dogpile Open Directory, About.com, Lycos' Top 5 per cent, InfoSeek, Direct Hit, Lycos, and AltaVista. The results from each engine are displayed individually, which can be useful in determining which engine might be of most use to you in future.
Another multisearch engine worth a mention is Savvy Search, which is located at www.savvysearch.com. Savvy Search queries 19 search engines simultaneously as well as DejaNews, Shareware.com, 411 People Finder, the Yellow Pages and other directories. It reveals the results in order of relevance, indicating which search engines contained which listings. A useful feature of this service is the "integrate results" option, as it removes duplicates, and provides a succinct summary of each of the findings.
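The idea behind an "integrate results" option can be illustrated with a short Python sketch. It simply merges the lists returned by several engines, drops duplicate addresses and records which engines listed each page. This is an assumption about the general technique, not a description of how Savvy Search itself works; the function name integrate_results, the engine names and the example.com addresses are all hypothetical.

```python
def integrate_results(results_by_engine):
    """Merge per-engine result lists, removing duplicates and noting who listed what."""
    merged = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            # Crude normalisation (an assumption): ignore case and a trailing slash.
            key = url.lower().rstrip("/")
            engines = merged.setdefault(key, [])
            if engine not in engines:
                engines.append(engine)
    return merged


# Made-up results from two imaginary engines.
results = {
    "EngineA": ["http://example.com/a", "http://example.com/b/"],
    "EngineB": ["http://EXAMPLE.com/b", "http://example.com/c"],
}
for url, engines in integrate_results(results).items():
    print(url, "listed by", ", ".join(engines))
```

A real service would also have to rank the merged list by relevance and summarise each page, neither of which is attempted here.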
If you are totally unfamiliar with the Web or are trying to find out about a topic of which you know very little, it is sometimes difficult to form a sensible search term. In these cases the hierarchical search, or search by topic, can be useful. Most of the major search engines offer between 12 and 16 top-level topics ranging from Arts and Entertainment, through Reference, to Sports. So, if a user wanted to find something on, say, investing in Central America, they might start by choosing the Regional topic from the Lycos home page. From the next level they would choose Central America and then narrow the search to International Business and Trade. From here there are a number of links, including one to a document on "Doing Business in Latin America". This site claims that "through years of experience TCA knows the 'dos' and the 'don'ts' in these countries. TCA knows how to deal with bureaucratic systems, speaks the local languages fluently, is familiar with the market and paves the way for you to come into contact with your new partners and customers".
A completely different route would be followed if looking for details of the economic situation in Newfoundland. Starting from Yahoo's home page (www.yahoo.com), choose Business and Economy and from the next menu choose Statistics and Indicators. From the choices presented choose economic, which takes you to a list of sites including "Canadian Imperial Bank of Commerce Economics". Here we find an excellent regional overview of Newfoundland. The two-page document begins by telling me that "Newfoundland growth this year is projected to be about 3.4 per cent, down from its estimated pace of 4.0 per cent in 1998. Continued, but slower, expansion at a pace of about 2.0 per cent, is forecast for 2000." The document includes graphs showing Economic Growth and Net Interprovincial Migration from Newfoundland and includes details of employment and reviews of the industrial and service sectors.
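A search by topic is really a walk down a tree of menus, one choice per level. The Python sketch below models the two routes just described as a small nested structure. It is a hand-built fragment containing only the headings named in this article, not real directory data from Yahoo or Lycos, and the names DIRECTORY and browse are invented for the illustration.

```python
# A tiny, hypothetical fragment of two topic directories.
DIRECTORY = {
    "Lycos": {
        "Regional": {
            "Central America": {
                "International Business and Trade": [
                    "Doing Business in Latin America",
                ],
            },
        },
    },
    "Yahoo": {
        "Business and Economy": {
            "Statistics and Indicators": {
                "economic": [
                    "Canadian Imperial Bank of Commerce Economics",
                ],
            },
        },
    },
}


def browse(directory, path):
    """Follow one menu choice per level and return whatever lies at the end."""
    node = directory
    for topic in path:
        node = node[topic]  # raises KeyError if the topic is not on this menu
    return node


route = ["Yahoo", "Business and Economy", "Statistics and Indicators", "economic"]
print(browse(DIRECTORY, route))
```

Stopping part of the way down the route returns the next menu of choices rather than a list of sites, which is exactly what the user sees on screen at each step.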
In conclusion then, I hope that I have given a few pointers on more efficient search methodology. There is no denying that it does take time to learn to use the Internet effectively, just as it takes time to learn to use a library effectively. However, it is time well spent. Just taking the last search as an example: if you were asked to produce a report on the current economic situation in Newfoundland with projections for the offshore fishing industry in 2000, where on earth would you start? By using the Internet the above report would be in front of you in less than two minutes, and further details on the fishing industry are available from the Canadian Imperial Bank of Commerce Economics site. Looking further afield, the whole thing could be placed in a wider context by looking at A Post-Budget Review of Provincial Finances on the same site. All this for less than five minutes of search time. BEAT THAT with traditional sources!
Sites mentioned in this article
AltaVista: www.altavista.com
Dogpile: www.dogpile.com
InfoSeek UK site: www.infoseek.co.uk
InfoSeek US site: infoseek.go.com
Lycos: www.lycos.com
SavvySearch: www.savvysearch.com
Yahoo UK: www.yahoo.co.uk
Yahoo UK advanced search: search.yahoo.co.uk/search/ukie/options
Scarlett Palmer, The Department of Land Management and Development, The University of Reading, RG6 6AW. E-mail: email@example.com