Problem:
To make a simple UIMA application work over Hadoop.
Assumption:
How TO
It will not work. There are several other things which have to be taken care of first.
Important Considerations (before creating/running the job JAR over Hadoop)
E.g.
ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
// Here str is the path to any of the resources, which can be obtained via
// ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
rMng.setExtensionClassPath(str, true);
rMng.setDataPath(str);
// Pass the configured ResourceManager so that UIMA resolves resources through it
aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);
This 4th point has to be considered because, when we read an XML file without using the class loader, it is read by default from the temporary task directory, i.e.
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
But all the resources get unjarred in the
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
directory
So, to tell the system to look in the correct directory, we have to use the ResourceManager. This is actually required to take
care of the resources which UIMA will try to load because of the imports present in its various descriptors.
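
The difference between the two lookups can be seen with plain Java, without UIMA or Hadoop. The following is only a minimal sketch: the class name DescriptorPathDemo, both method names and the descriptor name MyAnnotator.xml are hypothetical and chosen for illustration. It shows that a relative file name is resolved against the current working directory of the JVM (the task directory in the case described above), whereas ClassLoader.getSystemResource() looks the name up on the classpath and returns null when nothing is found, so the result should be checked before calling getPath() on it.

```java
import java.io.File;
import java.net.URL;

public class DescriptorPathDemo {

    // Resolves a relative name the way a plain file read would:
    // against the current working directory of the JVM (user.dir).
    public static String resolveAgainstWorkingDir(String relativeName) {
        return new File(relativeName).getAbsolutePath();
    }

    // Looks the same name up through the system class loader instead.
    // Returns the path part of the URL, or null when the resource
    // is not on the classpath.
    public static String resolveViaClassLoader(String resourceName) {
        URL url = ClassLoader.getSystemResource(resourceName);
        return (url == null) ? null : url.getPath();
    }

    public static void main(String[] args) {
        // "MyAnnotator.xml" is a hypothetical descriptor name.
        String name = args.length > 0 ? args[0] : "MyAnnotator.xml";

        System.out.println("working dir : " + resolveAgainstWorkingDir(name));

        String onClasspath = resolveViaClassLoader(name);
        System.out.println("class loader: "
                + (onClasspath == null ? "not found on classpath" : onClasspath));
    }
}
```

The value returned by resolveViaClassLoader() is the kind of string that would be passed as str to setExtensionClassPath() and setDataPath(), provided it is not null.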