Recently, and with a bit of effort, I got db1.spack up and running on nutch trunk. When I set up db2.spack, I kept track of what I did so I could contribute this tutorial.
Here are some minimal steps:
- get either the Desktop or Server version from http://www.ubuntu.com/download/
- boot and install
- familiarize yourself with sudo su - (it gives you a root shell)
Add Nutch User
Let's add a nutch user to do our nutch stuff
I tried to get Java from the normal apt sources without success, and I am guessing it was my Athlon that broke me. I broke down and got Java from Sun (http://java.sun.com/j2se/1.5.0/download.jsp), via the "Download JDK 5.0 Update 4" link. I tried 1.4.2 first and it didn't work, but 1.5.0 did.
You might also want to follow the instructions for Debian-izing the Sun JDK: http://plugindoc.mozdev.org/faqs/distronotes/ubuntu-x86.html#java-sun
Let's put JAVA_HOME in ~/.bash_profile for both root and nutch, and source ~/.bash_profile so it takes effect
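A minimal sketch of the lines to append to ~/.bash_profile, assuming the JDK 5.0 Update 4 bundle was unpacked to /opt/jdk1.5.0_04 (that directory name is an assumption; use whatever the Sun installer actually created):

```shell
# Lines to append to ~/.bash_profile for both root and nutch.
# ASSUMPTION: the Sun JDK lives in /opt/jdk1.5.0_04; adjust to your path.
export JAVA_HOME=/opt/jdk1.5.0_04
# Put this JDK's java/javac ahead of anything else on the PATH.
export PATH="$JAVA_HOME/bin:$PATH"
```

After editing, run source ~/.bash_profile in each open shell (or log in again) so ant and tomcat pick up the new JAVA_HOME.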
Add the multiverse repository to your /etc/apt/sources.list, or use the GUI:
System -> Administration -> Synaptic Package Manager
Settings -> Repositories
With the new apt sources, let's update
And get the packages we need.
Subversion is used to get nutch, ant to build it, and lynx to test it.
Build Nutch Code and Index
Let's change over to the nutch user
Check out the code AND the Gora code
Since this tutorial is for getting trunk to work, let's go there
We build with ant
And build a war for tomcat, to use later for searching
Follow the nutch tutorial (http://lucene.apache.org/nutch/tutorial.html) to build an index, or, for a simple index:
If you are using the latest "trunk", URL seeding has changed from a single file to a directory. Using trunk (after 0.7.2), put the URLs in a file (here called "nutch") inside a DIRECTORY called "urls":
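For trunk, a minimal sketch of that seed layout, run from the trunk directory. The seed URL is only an example (the Nutch site linked above); substitute your own, one per line:

```shell
# Trunk (after 0.7.2): seeds live in file(s) inside a "urls" directory.
mkdir -p urls
# Example seed only (assumption); list one URL per line.
echo 'http://lucene.apache.org/nutch/' > urls/nutch
```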
Using 0.7.2 or earlier, just put the URLs in a FILE called "urls":
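The 0.7.2-and-earlier equivalent is a single plain file; again the URL is just an example seed:

```shell
# 0.7.2 and earlier: the seed list is a plain file named "urls".
# Example seed only (assumption); list one URL per line.
echo 'http://lucene.apache.org/nutch/' > urls
```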
Then, either way, you pass the seeds in the same fashion ("urls" below refers to either a directory or a file, depending on the version you're using):
See, perl can be useful
Again, I tried apt without much luck, so I downloaded tomcat from Apache (http://jakarta.apache.org/site/downloads/downloads_tomcat-4.cgi).
As with Java above, I put Tomcat in /opt
Out with the old and in with the new
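Assuming this step is the usual swap of Tomcat's default ROOT webapp for the nutch war, here is a sketch as a small shell function. The name deploy_nutch_war is hypothetical, and the paths in the usage line below are assumptions:

```shell
# Hypothetical helper: replace Tomcat's default ROOT webapp with the nutch war,
# so the search page is served at the root context (http://localhost:8080/).
deploy_nutch_war() {
  local tomcat_home="$1" war="$2"   # Tomcat install dir, war built by ant
  if [ ! -f "$war" ]; then
    echo "deploy_nutch_war: no such war file: $war" >&2
    return 1
  fi
  # Out with the old: the unpacked ROOT directory and any ROOT.war.
  rm -rf "$tomcat_home/webapps/ROOT" "$tomcat_home/webapps/ROOT.war"
  # In with the new: Tomcat deploys ROOT.war as the root context on start.
  cp "$war" "$tomcat_home/webapps/ROOT.war"
}
```

For example, from the trunk directory: deploy_nutch_war /opt/jakarta-tomcat-4.1.31 build/nutch-*.war (the Tomcat version and the war location are assumptions; check where you unpacked Tomcat and what the war build actually produced).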
Let's move to where we put the index
And start tomcat from there
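Why the working directory matters, as far as I can tell: in nutch of this era the searcher.dir property defaults to ".", so the search webapp looks for the index relative to Tomcat's working directory (later releases changed that default, so check conf/nutch-default.xml in your checkout). A sketch of a pre-flight check; looks_like_crawl_dir is a hypothetical helper, and the directory names in the usage comment are assumptions:

```shell
# Hypothetical helper: succeeds if DIR holds a merged "index" directory or a
# "segments" directory, the layouts the search webapp looks for under searcher.dir.
looks_like_crawl_dir() {
  [ -d "$1/index" ] || [ -d "$1/segments" ]
}

# Usage sketch (paths are assumptions; use your crawl dir and Tomcat install):
#   cd ~/crawl.test
#   looks_like_crawl_dir . && /opt/jakarta-tomcat-4.1.31/bin/catalina.sh start
```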
Connect to tomcat and perform a search.
I searched for 'nutch' and all was well! (You can use <TAB> to get to the search input in lynx.)
Tutorial written by Earl Cahill, 2005