September 1998 Search Engine Coverage Update
Here is an update on the coverage of the search engines as of
September 1998. For the original study, including an estimate of the
size of the Web, see Web analysis.
Conclusions
Compared with our original study, some highlights of the current study are:
Coverage is worse: The coverage of the engines is increasing more slowly than the size of the Web.
Dead links are more common: The percentage of dead links returned by the engines has increased.
Northern Light: Northern Light now has the second highest coverage for our queries.
Coverage
Here is the estimated coverage of each engine compared with the
combined coverage of the 7 engines used in the study. These estimates
are averaged over 1025 queries performed in September 1998.
As before, we did not simply compare the number of documents returned
by each engine; such a comparison would give inaccurate estimates of
the coverage of the engines. Instead, we downloaded and analyzed every
page that each engine listed, in order to enforce a consistent
relevance measure across all engines. Otherwise, the results would be
skewed by engines that return documents containing only related terms
or documents that no longer exist. For an indication of the stability
of the results versus the number of queries used, see the coverage
estimates versus the number of queries (a random subset of the
queries was chosen for each point on the graph).
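The calculation described above can be sketched roughly as follows. This is
only an illustration, not the code used in the study: the function names, the
toy engines "A" and "B", and the choice to average per-query ratios (rather
than pooling all documents) are assumptions. Each set is assumed to hold only
the URLs that were downloaded successfully and passed the relevance check.

```python
import random
from statistics import mean


def relative_coverage(results_by_engine):
    """Coverage of each engine relative to the combined (union) result set.

    results_by_engine maps an engine name to the set of URLs that passed
    the relevance check for ONE query. Returns None if no engine found
    any valid document for that query.
    """
    combined = set().union(*results_by_engine.values())
    if not combined:
        return None
    return {engine: len(urls) / len(combined)
            for engine, urls in results_by_engine.items()}


def average_coverage(per_query_results):
    """Average the per-query relative coverage over a list of queries.

    Assumes every query's dict contains the same set of engines.
    """
    ratios = [c for c in map(relative_coverage, per_query_results)
              if c is not None]
    if not ratios:
        raise ValueError("no query returned any valid document")
    return {engine: mean(r[engine] for r in ratios) for engine in ratios[0]}


def coverage_vs_query_count(per_query_results, sizes, seed=0):
    """Recompute the averages on a random subset of queries for each size,
    to see how stable the estimates are as the number of queries grows."""
    rng = random.Random(seed)
    return {n: average_coverage(rng.sample(per_query_results, n))
            for n in sizes}


# Toy data: two queries, two hypothetical engines.
toy_queries = [
    {"A": {"u1", "u2"}, "B": {"u2"}},
    {"A": {"u3"}, "B": {"u3", "u4", "u5", "u6"}},
]
print(average_coverage(toy_queries))  # {'A': 0.625, 'B': 0.75}
```

Note that these figures are relative to the combined coverage of the engines,
not to the Web as a whole, so they say nothing by themselves about documents
that no engine indexes.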
We note that the decreased relative coverage of HotBot appears to
be due to its new practice of listing only one page per site. We are
currently unaware of a way to turn this feature off (as we can for
Infoseek). Also of note in comparison to our previous study: Northern
Light has significantly increased its coverage relative to the other
engines, and the gap between the highest and lowest coverage among
the engines has narrowed.
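To illustrate why listing one page per site lowers a measured document count,
here is a minimal sketch. It assumes that a "site" is simply the host name in
the URL; how HotBot actually groups pages is not described here, and the
function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def one_page_per_site(urls):
    """Keep only the first URL seen for each host name.

    Assumption: "site" means the URL's host name. A real engine may
    group pages differently.
    """
    seen_hosts = set()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


matches = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://example.org/c.html",
]
print(len(matches), len(one_page_per_site(matches)))  # 3 2
```

An engine that collapses results this way reports fewer documents for a query
even when its index contains every matching page.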
It is important to note that the queries used in the study were
from the employees of the NEC Research Institute. Most of the
employees are scientists, and scientists tend to search for less
"popular", or harder-to-find, information. However, the search engines
are typically biased towards indexing more "popular" information.
Therefore, the coverage of the search engines is typically better
for more popular information.
Recency; freshness; invalid links
The following figure shows the percentage of invalid links for seven
major Web search engines, averaged over 1025 queries performed in
September 1998.
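As a rough sketch of how such a percentage can be computed for one engine,
assuming each returned link has already been checked and marked valid or
invalid: the function name and the per-query averaging are assumptions, and
the criteria for deciding that a link is invalid are not specified here.

```python
def invalid_link_percentage(invalid_flags_per_query):
    """Average percentage of invalid links over the queries for one engine.

    invalid_flags_per_query is a list with one entry per query; each entry
    is a list of booleans, True where the returned link was invalid.
    Queries for which the engine returned nothing are skipped.
    """
    rates = [100.0 * sum(flags) / len(flags)
             for flags in invalid_flags_per_query if flags]
    if not rates:
        raise ValueError("the engine returned no links for any query")
    return sum(rates) / len(rates)


# Toy data: 1 of 4 links invalid for the first query, 0 of 2 for the second.
print(invalid_link_percentage([[True, False, False, False],
                               [False, False]]))  # 12.5
```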
Other issues and future studies
There are many other issues not discussed here, for example, other ways
to compare search engines (e.g. relevance, query interface), and
suggestions for searching the Web depending on the kind of information
desired. More information can be found on the original study page, in
the related interviews and press articles, and in our tips for
searching the Web. For full details of the original study, request a
reprint of the article.
Future studies will update and extend the results shown here. Check
back for updates.
Search Engine Watch: News, tips and more about search engines, by Danny Sullivan
Home page for Steve Lawrence
Home page for C. Lee Giles
Copyright © 1998 NEC Research Institute