The Robot
- takes a list of start URLs as input
- issues HTTP requests
- obeys robots.txt
- follows redirects, moves, and refreshes
- detects file types
- extracts the text body
- collects links and follows them (generating a new URL list)
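
The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the ResearchPortal.net robot itself: the names `crawl`, `LinkAndTextParser`, `fetch`, and `max_pages` are hypothetical, and `fetch(url)` is assumed to perform the HTTP request, follow any redirects, and return the final URL, the content type, and the body. Meta-refresh handling is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.robotparser import RobotFileParser


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def crawl(start_urls, fetch, robots, max_pages=100):
    """Breadth-first crawl starting from start_urls.

    fetch  -- hypothetical callable: url -> (final_url, content_type, body)
    robots -- a RobotFileParser already loaded with the site's rules
    Returns a dict mapping each fetched HTML page's URL to its text body.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch("*", url):      # obey robots.txt
            continue
        final_url, content_type, body = fetch(url)
        if not content_type.startswith("text/html"):  # file-type detection
            continue
        parser = LinkAndTextParser()
        parser.feed(body)
        pages[final_url] = " ".join(parser.text)      # extracted text body
        for href in parser.links:                     # build the new URL list
            link = urldefrag(urljoin(final_url, href)).url
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real robot would also need timeouts, politeness delays, and per-host robots.txt handling.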
Details of ResearchPortal.net, Dirk Hennig, RRZN, University of Hannover, Germany, http://metager.de/cris2002/