readlinkdb is an alias for org.apache.nutch.crawl.LinkDbReader.
This reader class lets us obtain information from a linkdb. We can retrieve two types of information:
- A dump of the whole linkdb, written to a text file for easy viewing.
- Information relating to a specific URL.
:TODO: More could be added to the above, e.g. the nature and structure of the information we retrieve from a dump of the linkdb and for a specific URL.
Usage:
bin/nutch readlinkdb <linkdb> (-dump <out_dir> | -url <url>)
<linkdb>: The linkdb directory we wish to read and obtain information from.
-dump <out_dir>: Dumps the whole linkdb as text into the <out_dir> we specify.
-url <url>: Retrieves the information held for the given <url> and writes it to standard output (System.out).
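The usage line says the <linkdb> path must be followed by exactly one of -dump or -url. As a minimal sketch of that grammar, the Java class below validates arguments of the form <linkdb> (-dump <out_dir> | -url <url>). The class ReadLinkDbArgs and its parse method are hypothetical illustrations, not part of Nutch; the real LinkDbReader does its own argument handling, which may differ between Nutch versions.

```java
/**
 * Hypothetical sketch (NOT part of Nutch) of the argument grammar
 *   <linkdb> (-dump <out_dir> | -url <url>)
 * Exactly one of -dump or -url must follow the linkdb path.
 */
public class ReadLinkDbArgs {
    enum Mode { DUMP, URL }

    final String linkdb;  // path to the linkdb directory
    final Mode mode;      // which of the two operations was requested
    final String target;  // <out_dir> for DUMP, <url> for URL

    private ReadLinkDbArgs(String linkdb, Mode mode, String target) {
        this.linkdb = linkdb;
        this.mode = mode;
        this.target = target;
    }

    static ReadLinkDbArgs parse(String[] args) {
        // Expect exactly three tokens: <linkdb>, one option, and its value.
        if (args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: readlinkdb <linkdb> (-dump <out_dir> | -url <url>)");
        }
        switch (args[1]) {
            case "-dump":
                return new ReadLinkDbArgs(args[0], Mode.DUMP, args[2]);
            case "-url":
                return new ReadLinkDbArgs(args[0], Mode.URL, args[2]);
            default:
                throw new IllegalArgumentException("Unknown option: " + args[1]);
        }
    }

    public static void main(String[] args) {
        // Example: request information about a single URL.
        ReadLinkDbArgs parsed =
            parse(new String[] {"crawl/linkdb", "-url", "http://example.com/"});
        System.out.println(parsed.mode + " " + parsed.target + " in " + parsed.linkdb);
    }
}
```

Rejecting anything other than exactly three tokens mirrors the usage line above, where -dump and -url are alternatives that cannot be combined in one invocation.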