Parsechecker is an alias for org.apache.nutch.parse.ParserChecker.

This class takes a URL, fetches it (without storing it) and returns the URL, the parse_data and all of the parse_text for that URL. It is extremely useful for checking parser implementations from the command line.


bin/nutch parsechecker [-dumpText] [-forceAs mimeType] url

[-dumpText]: Enables us to dump the parse_text into a text file.

[-forceAs mimeType]: Forces the specified mimeType to be used for the given URL argument.

url: The URL you wish to check the parser on.

e.g. bin/nutch parsechecker -dumpText url > check.log
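
The option ordering in the usage line above can be sketched with a small shell helper. This is only an illustration: the function name parsechecker_cmd, its three positional arguments and the example.com URLs are assumptions made for this sketch and are not part of Nutch. The helper only prints the command line it would run; it does not invoke Nutch itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): assembles a parsechecker command
# line following the usage line above and prints it without running it.
#   $1  "yes" to add -dumpText, anything else to leave it out
#   $2  MIME type for -forceAs, or an empty string to leave it out
#   $3  the URL to check
parsechecker_cmd() {
    dump_text=$1
    mime_type=$2
    url=$3

    cmd="bin/nutch parsechecker"
    if [ "$dump_text" = "yes" ]; then
        cmd="$cmd -dumpText"
    fi
    if [ -n "$mime_type" ]; then
        cmd="$cmd -forceAs $mime_type"
    fi
    # The URL always comes last, as in the usage line.
    printf '%s\n' "$cmd $url"
}

# Placeholder URLs, chosen only for illustration.
parsechecker_cmd yes "" "http://example.com/"
parsechecker_cmd no "text/html" "http://example.com/page"
```

To actually run a printed command, execute it from the Nutch installation directory and redirect the output to a file if desired, as in the example above.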

