===*P*roblem (exception) / *S*olution pairs===

P: Not a known field name:DEFAULT

S: Add the query-basic plugin to the plugin.includes property in nutch-default.xml:
<value>query-basic|.....</value>
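A minimal sketch of the override, assuming a Nutch 0.9-era installation. The value below is modelled on the default plugin.includes of that period, so compare it with the nutch-default.xml shipped with your version before copying it. Putting it in nutch-site.xml (which overrides nutch-default.xml) keeps the default file untouched. This is also the property where scoring-opic and the urlnormalizer-* plugins mentioned further down are enabled.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Regular expression matched against plugin directory names.
         query-(basic|site|url) pulls in query-basic, which the entry above
         needs for the DEFAULT field.
         Assumed 0.9-era default: check it against your own nutch-default.xml. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```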

P: java.lang.NullPointerException at java.io.Reader.<init>(Reader.java:61) ... at org.apache.nutch.analysis.CommonGrams.init(CommonGrams.java:152) ...

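S: The NullPointerException in java.io.Reader means the file common-terms.utf8 could not be found. It needs to be in the right directory, i.e. one on the classpath (lib | classes?). It is the file named by the property with this description:
<description>The name of a file containing a list of common terms
that should be indexed in n-grams.</description>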
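For reference, a sketch of the property that description belongs to, as it appears in Nutch's nutch-default.xml to the best of our knowledge (verify the property name against your version). The value is a bare file name that Nutch looks up as a resource, which is why the file has to sit in a directory Nutch searches, normally its conf/ directory.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>analysis.common.terms.file</name>
    <!-- Bare file name, looked up as a resource: the file must be in a
         directory on the classpath (normally Nutch's conf/ directory). -->
    <value>common-terms.utf8</value>
    <description>The name of a file containing a list of common terms
    that should be indexed in n-grams.</description>
  </property>
</configuration>
```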

P: Bad mapred.job.tracker: local

S: If you want to run the crawl without HDFS, you can skip start-all.sh
and just run "nutch crawl urlsdir".
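To stay in local (non-HDFS) mode, both the job tracker and the file system should point at the local machine. A minimal sketch, assuming an early Hadoop release; hadoop-site.xml is the safe place for it, since the Hadoop daemons read that file. With mapred.job.tracker set to local, jobs run in-process, so there is no job tracker daemon to start.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- "local" runs MapReduce jobs in-process; no job tracker daemon is used. -->
    <value>local</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- Early Hadoop releases use the literal "local"; later ones deprecate it
         in favour of file:/// -->
    <value>local</value>
  </property>
</configuration>
```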

P: ... getLocalPath NullPointerException

S: Check mapred.local.dir and the other temporary directories in nutch-default.xml / hadoop-default.xml.
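A sketch of pointing the working directories at a location that exists and is writable, assuming a Hadoop release that has both properties (hadoop.tmp.dir only appears in later releases). The path /data/nutch/tmp is hypothetical; substitute your own, and put the overrides in hadoop-site.xml or nutch-site.xml rather than editing the *-default.xml files.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Hypothetical path: should be writable by the user running Nutch. -->
    <value>/data/nutch/tmp</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <!-- Local scratch space for MapReduce; may also be a comma-separated list. -->
    <value>/data/nutch/tmp/mapred/local</value>
  </property>
</configuration>
```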

P: extension point: org.apache.nutch.net.URLNormalizer does not exist

S: Check your plugins and the plugin.includes setting, and add urlnormalizer-regex or urlnormalizer-(pass|regex|basic).
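If the plugin names are already in plugin.includes, the other thing to check is that Nutch is looking in the right plugin directory. A sketch of the relevant property, with the value as in the stock nutch-default.xml as far as we know. A relative path is searched for on the classpath, so running Nutch with an unexpected classpath can hide the plugins.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.folders</name>
    <!-- Relative paths are looked up on the classpath; an absolute path
         (e.g. a hypothetical /opt/nutch/plugins) is used as is. -->
    <value>plugins</value>
  </property>
</configuration>
```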

P: java.net.UnknownHostException "hostname"

S: Add "hostname" to the /etc/hosts file, e.g. as a line mapping it to 127.0.0.1.

P: ...[null] [MalformedURLException]

S: Add common-terms.utf8 to the Nutch directory.

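P: java.lang.ClassCastException: org.apache.hadoop.io.Text

S: Wrong Hadoop version; make sure the Hadoop jar matches your Nutch release, or apply the patch at http://files.pannous.de/org.rar

P: java.lang.NoSuchMethodError: org.apache.hadoop.io.MapFile$Writer.<init>

S: Wrong Hadoop version; make sure the Hadoop jar matches your Nutch release, or apply the patch at http://files.pannous.de/org.rar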

P: NullPointerException when crawling

S: Set the HTTP User-Agent property in nutch-site.xml; its description reads:
<description>Our HTTP 'User-Agent' request header.</description>
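The User-Agent setting lives in the http.agent.name property. A sketch of the nutch-site.xml entry; the value MyNutchCrawler is a hypothetical placeholder, so replace it with a name that identifies your crawler.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- Hypothetical placeholder: must not be left empty. -->
    <value>MyNutchCrawler</value>
    <description>Our HTTP 'User-Agent' request header.</description>
  </property>
</configuration>
```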

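P: java.io.IOException: config()

S: Ignore it! (wink) It is only a debug-level trace that Hadoop logs when a configuration object is created, not a real failure.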

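P: nutch crawl ... Job Failed!

S: The causes are manifold. Raise the log level in log4j.properties and send the output to the console to see the underlying error:
log4j.rootLogger=ALL, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
# ConsoleAppender needs a layout, otherwise log4j reports an error and prints nothing
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout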

P: No scoring plugins - at least one scoring plugin is required!

S: Add "scoring-opic" to the value of <property> <name>plugin.includes</name>

P: ... java.net.SocketTimeoutException: Accept timed out

S: Possibly an RPC problem. Try running Nutch without HDFS (start the crawl without start-all.sh?), or check the ports configured in your Hadoop config file.

P: java.lang.NoClassDefFoundError xyz on Windows

S: Get rid of spaces in your classpath and path variables!
