Running Nutch with Resin
These are the tweaks I had to make to get Nutch running on Resin2.
I followed the tutorial and did everything it said. In the command that asks Nutch to crawl, I wanted to point it at my own searcher dir (see below), so I modified the command like so:
Resin does not use Xerces as its default XML parser, and Nutch is happier with Xerces, so Resin has to be told to use Xerces. Add the following lines to resin.conf in the element.
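As a rough sketch of what such lines can look like: Resin's `<system-property>` element sets a Java system property, and the standard JAXP factory properties can be pointed at the Xerces implementations. The exact set of properties shown here is an assumption, not necessarily what this page originally listed, and it assumes the Xerces jar is on Resin's classpath.

```xml
<!-- Sketch only: select Xerces as the JAXP parser through Resin system properties. -->
<!-- Assumes the Xerces jar is on Resin's classpath. -->
<system-property javax.xml.parsers.DocumentBuilderFactory="org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"/>
<system-property javax.xml.parsers.SAXParserFactory="org.apache.xerces.jaxp.SAXParserFactoryImpl"/>
```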
I was not using JDK 1.4 logging yet, so I also added the following system property to see all of Nutch's logging. I configured java1.4logging.conf as:
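A system property of this kind could be set as in the sketch below. `java.util.logging.config.file` is the standard JDK property naming the logging configuration file; the path is a placeholder, and using Resin's `<system-property>` element for it is an assumption.

```xml
<!-- Sketch only: point JDK 1.4 logging at a custom configuration file. -->
<!-- The path is a placeholder; use the real location of java1.4logging.conf. -->
<system-property java.util.logging.config.file="/path/to/java1.4logging.conf"/>
```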
Another problem that comes up when using Resin is that Nutch is not able to find searcher.dir.
Not finding searcher dir
My search page looked like this:
and in the logs it looked like this:
If everything is working, your logs should look similar to:
So I modified nutch-site.xml as follows:
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<description>My path to nutch's searcher dir.</description>
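For reference, a complete nutch-site.xml carrying this override might look like the sketch below. The `<nutch-conf>` root and the `<property>` / `<name>` / `<value>` layout are assumed from the format of nutch-default.xml, and the path is a placeholder rather than the value used on this installation.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<nutch-conf>
  <property>
    <name>searcher.dir</name>
    <!-- Placeholder: replace with the directory that holds your crawl output. -->
    <value>/path/to/your/crawl</value>
    <description>My path to nutch's searcher dir.</description>
  </property>
</nutch-conf>
```

An absolute path here avoids any dependence on the working directory Resin was started from.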
Note: The same property exists in nutch-default.xml, but you should not change it there. Use nutch-site.xml to change properties for your specific installation. This advice repeats the first comment in nutch-default.xml.
Resin3 should have similar issues, so one should be able to fix them in a similar manner. I have not tried it on Resin3 yet, but will soon.