September 1998 Search Engine Coverage Update

Steve Lawrence and C. Lee Giles, NEC Research Institute


Here is an update on the coverage of the search engines as of September 1998. For the original study, including estimation of the size of the Web, see web analysis.

Conclusions

Compared with our original study, the main findings of the current study are as follows.

Coverage is worse

The coverage of the engines is increasing more slowly than the size of the Web.

Dead links are more common

The percentage of dead links returned by the engines has increased.

Northern Light

Northern Light now has the second highest coverage for our queries.

Coverage

Here is the estimated coverage of each engine compared with the combined coverage of the 7 engines used in the study. These estimates are averaged over 1025 queries performed in September 1998.

Web coverage


As before, we did not simply compare the number of documents returned by each engine, because such a comparison would give inaccurate estimates of the coverage of the engines. Instead, we downloaded and analyzed every page that each engine listed, in order to enforce a consistent relevance measure across all engines (otherwise, engines that return documents containing only related terms, or documents that no longer exist, would make the results inaccurate). For an indication of the stability of the results versus the number of queries used, see the coverage estimates versus the number of queries (a random subset of the queries was chosen for each point on the graph).
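The coverage calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the input layout (one dictionary per query, mapping an engine name to the set of URLs that passed the relevance check) and the choice to pool counts across queries before dividing are all assumptions. The study only states that the estimates are relative to the combined coverage of the engines and averaged over the queries.

```python
import random


def relative_coverage(results_by_query):
    """Estimate each engine's coverage relative to the combined coverage.

    results_by_query: list with one dict per query, mapping an engine name
    to the set of valid, relevant URLs it returned (hypothetical layout).
    Counts are pooled over all queries before dividing (an assumption).
    """
    totals = {}
    combined_total = 0
    for per_engine in results_by_query:
        # Combined coverage for this query: union of all engines' valid URLs.
        combined = set().union(*per_engine.values())
        combined_total += len(combined)
        for engine, urls in per_engine.items():
            totals[engine] = totals.get(engine, 0) + len(urls)
    if combined_total == 0:
        return {engine: 0.0 for engine in totals}
    return {engine: count / combined_total for engine, count in totals.items()}


def coverage_vs_query_count(results_by_query, sizes, seed=0):
    """Recompute coverage on a random subset of queries for each size."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        subset = rng.sample(results_by_query, min(n, len(results_by_query)))
        curve[n] = relative_coverage(subset)
    return curve


# Toy data with two made-up engines and two queries.
example = [
    {"EngineA": {"u1", "u2"}, "EngineB": {"u2", "u3"}},
    {"EngineA": {"u4"}, "EngineB": {"u4", "u5", "u6"}},
]
coverage = relative_coverage(example)
stability = coverage_vs_query_count(example, sizes=[1, 2])
```

Pooling the counts weights queries with many results more heavily. Averaging the per-query ratios instead would weight every query equally, and the text does not say which was used.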

We note that the decreased relative coverage of HotBot appears to be due to its new practice of listing only one page per site. We are currently unaware of a way to turn this feature off (as we can for Infoseek). Also of note in comparison with our previous study: Northern Light has significantly increased its coverage relative to the other engines, and the gap between the largest and smallest coverage among the engines has narrowed.

It is important to note that the queries used in the study came from employees of the NEC Research Institute. Most of the employees are scientists, and scientists tend to search for less "popular" or harder-to-find information. The search engines, however, are typically biased towards indexing more "popular" information, so their coverage is typically better for more popular information than for the kind of queries used here.

Recency; freshness; invalid links

The following figure shows the percentage of invalid links for seven major Web search engines, averaged over 1025 queries performed in September 1998.

Invalid links
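As a rough illustration of the invalid-link measure, the percentage for each engine can be computed from the outcome of fetching every page the engine listed. This is a sketch under stated assumptions: the function name and input layout are hypothetical, and the exact criterion for an "invalid" link (for example a missing page or a timeout) is not given here, so the code takes precomputed success flags rather than performing any downloads.

```python
def invalid_link_percentage(link_checks):
    """Return the percentage of invalid links per engine.

    link_checks: dict mapping an engine name to a list of booleans, one per
    listed page, True when the page could be retrieved (hypothetical layout).
    """
    percentages = {}
    for engine, checks in link_checks.items():
        if not checks:
            # No listed pages: report 0.0 rather than dividing by zero.
            percentages[engine] = 0.0
            continue
        dead = sum(1 for ok in checks if not ok)
        percentages[engine] = 100.0 * dead / len(checks)
    return percentages


# Toy data with two made-up engines.
dead_links = invalid_link_percentage({
    "EngineA": [True, False, True, True],
    "EngineB": [False, False],
})
```

The flags from all queries are pooled per engine here. Averaging per-query percentages would be an equally plausible reading of "averaged over 1025 queries".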


Other issues and future studies

Many other issues are not discussed here, such as other ways to compare search engines (e.g. relevance, query interface) and suggestions for searching the Web depending on the kind of information desired. More information can be found on the original study page, in the related interviews and press articles, and in our tips for searching the Web. For full details of the original study, request a reprint of the article.

Future studies will update and extend the results shown here. Check back for updates.



Search Engine Watch: News, tips and more about search engines, by Danny Sullivan


Copyright © 1998 NEC Research Institute