
Invisible Web
Def.: everything a search engine does not see is invisible
- non-linked web servers and pages
- "dangerous" pages/URLs (URLs containing "?", e.g. http://...?..., which may be crawler traps)
- database content (accessible by textboxes etc. only)
- non-linked servers and pages:
a) servers: detectable via the domain registry
b) pages: a significant share detectable by path-crawling
- "dangerous"/dynamic pages: robust crawler software
- databases (textbox only etc.):
a) in theory, by automated (dictionary-based) input
b) in practice, by metasearching the databases
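
The slide does not define path-crawling. As a minimal sketch, assume it means shortening the path of an already known URL one segment at a time and probing each parent directory for pages that nothing links to. The function name `parent_paths` and the example host are hypothetical; fetching the candidates and parsing any directory listings is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_paths(url):
    """Return the parent-directory URLs of `url`, deepest first.

    Assumption: "path-crawling" is read here as path truncation;
    the query string and fragment of the known URL are discarded.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Drop the last segment repeatedly: /a/b/c.html -> /a/b/ -> /a/ -> /
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


if __name__ == "__main__":
    for candidate in parent_paths("http://www.example.org/a/b/c.html"):
        print(candidate)
```

A crawler would request each candidate and keep only those that answer with a usable page or directory index; how many unlinked pages this uncovers depends on the server configuration.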
How big is the invisible web?
(C) W.Sander-Beuermann, University of Hannover, RRZN, SearchEngineLab