The UIMA module was removed in Solr 7.5 (SOLR-11694).
The Solr UIMA contrib lets you enhance Solr documents using the Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata to the document via annotations.
The SolrUIMA UpdateRequestProcessor is a custom UpdateRequestProcessor that takes the document(s) being indexed, sends them to a UIMA pipeline and then returns the document(s) enriched with the specified metadata.
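As a rough illustration of how such a processor is typically wired in, the sketch below registers it in an update request processor chain in solrconfig.xml. The chain name "uima" is an arbitrary choice, and the factory class name is an assumption based on the contrib's package layout, not something stated on this page; check it against the jar you actually deploy.

```xml
<!-- Sketch only: the chain name is illustrative and the factory class name is assumed -->
<updateRequestProcessorChain name="uima">
  <!-- passes each incoming document through the UIMA pipeline -->
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"/>
  <!-- standard processors that log the update and then index the enriched document -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The UIMA processor is placed before RunUpdateProcessorFactory so that the added fields are already present when the document is indexed.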
mkdir solr/example/solr/collection1/lib
cp solr/dist/apache-solr-uima*.jar solr/example/solr/collection1/lib
cp solr/contrib/uima/lib/*.jar solr/example/solr/collection1/lib/
cp solr/build/contrib/solr-uima/lucene-libs/lucene-analyzers-uima-4.0-SNAPSHOT.jar solr/example/solr/collection1/lib/
All the SolrUIMA configuration is placed inside a <uimaConfig> element in solrconfig.xml.
<uimaConfig>
  <runtimeParameters>
    <!-- here go parameters defined in the AE which override parameters in the delegate AEs -->
    ...
  </runtimeParameters>
  <analysisEngine><!-- here goes the AE path in the classpath --></analysisEngine>
  <analyzeFields merge="true"><!-- comma separated list of fields of the original document to analyze --></analyzeFields>
  <fieldMapping>
    <!-- here goes the mapping between features of UIMA FeatureStructures to Solr fields -->
    <type name="org.apache.uima.something.Annotation">
      <map feature="oneFeature" field="destination_field"/>
    </type>
    ...
  </fieldMapping>
</uimaConfig>
The analysisEngine element holds the classpath location of the UIMA Analysis Engine descriptor, which describes the analysis to be executed. The referenced analysis engine can be primitive or aggregate.
The analyzeFields element lists the names of the fields (comma separated) to be analyzed by the UIMA pipeline. If the merge attribute is false, each listed field is analyzed separately; if merge is true, the contents of the listed fields are merged and analyzed only once.
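The two merge modes might look like this, assuming hypothetical fields named title and text (these names are placeholders, not part of the module). Only one of the two variants would appear in a real configuration.

```xml
<!-- merge="false": title and text are each sent through the pipeline separately -->
<analyzeFields merge="false">title,text</analyzeFields>

<!-- merge="true": the contents of title and text are merged and analyzed once -->
<analyzeFields merge="true">title,text</analyzeFields>
```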
see SOLR-2129
UIMA supports the use of existing analysis engines (see here and here) as well as the creation of custom components.
The current contrib/uima module uses a predefined set of components.
These components are arranged in a pipeline inside the OverridingParamsExtServicesAE Analysis Engine descriptor. The following fragment of that descriptor lists the pipeline nodes:
<node>AggregateSentenceAE</node>
<node>OpenCalaisAnnotator</node>
<node>TextKeywordExtractionAEDescriptor</node>
<node>TextLanguageDetectionAEDescriptor</node>
<node>TextCategorizationAEDescriptor</node>
<node>TextConceptTaggingAEDescriptor</node>
<node>TextRankedEntityExtractionAEDescriptor</node>
The first node represents an aggregate Analysis Engine which includes the Whitespace Tokenizer and HMM Tagger (recognizing sentences). The second node uses the Open Calais Annotator to extract named entities. The remaining nodes use different Alchemy API Annotator services to detect keywords, language, document category, discovered concepts and named entities.
To use different UIMA components inside the contrib/uima module you need to:
If you're using Ant, you only need to put the component jar inside the solr/contrib/uima/lib directory.
If you're using Maven, you need to declare the component you want to use inside the <dependencies> element in the generated pom.xml.
For example, if you want to use UIMA Dictionary Annotator 2.3.1-SNAPSHOT, you can either get it from the snapshot repo, put it in solr/contrib/uima/lib and run 'ant clean dist', or paste the following into the generated pom.xml (as a child of the <dependencies> tag) and run 'mvn clean package'.
<dependency>
  <groupId>org.apache.uima</groupId>
  <artifactId>DictionaryAnnotator</artifactId>
  <version>2.3.1-SNAPSHOT</version>
</dependency>
Change the descriptor to be used by this module inside config/uimaConfig/analysisEngine in the solrconfig.xml of your Solr instance.
You can use the default descriptor bundled with the component or create a new one.
For example, to use one of the default Dictionary Annotator Analysis Engine descriptors, use the following (which runs the Whitespace Tokenizer and then the Dictionary Annotator):
<config>
  ...
  <uimaConfig>
    ...
    <analysisEngine>/AggregateAE.xml</analysisEngine>
    ...
  </uimaConfig>
  ...
</config>
Sometimes Analysis Engines require custom parameters to be set inside their descriptor, or custom resources to be imported. The easiest way to do so is to get a copy of such a descriptor, modify the parameters/resources as needed, and put it inside a directory which gets included in the final jar (e.g. solr/contrib/uima/src/main/resources/org/apache/uima).
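As a sketch of what such an edit might look like, a copied descriptor could set a parameter value in its configurationParameterSettings section, which sits inside analysisEngineMetaData in a UIMA descriptor. The parameter name SomeParameter and its value are hypothetical placeholders; the real names depend on what the component declares in its configurationParameters section.

```xml
<!-- Fragment of a copied Analysis Engine descriptor; parameter name and value are hypothetical -->
<analysisEngineMetaData>
  ...
  <configurationParameterSettings>
    <nameValuePair>
      <!-- must match a parameter declared by the component -->
      <name>SomeParameter</name>
      <value>
        <string>some-value</string>
      </value>
    </nameValuePair>
  </configurationParameterSettings>
  ...
</analysisEngineMetaData>
```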
Inside solrconfig.xml, go to the config/uimaConfig/fieldMapping element and change the <type> element according to the annotations extracted by the component in use.
For example, if you're using the Dictionary Annotator and you want to put the dictionary entry annotations it finds into a 'lemmas' field, you should configure the fieldMapping element as follows:
<config>
  ...
  <uimaConfig>
    ...
    <fieldMapping>
      <type name="org.apache.uima.DictionaryEntry">
        <map feature="coveredText" field="lemmas"/>
      </type>
    </fieldMapping>
    ...
  </uimaConfig>
  ...
</config>
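The destination field also has to exist in the Solr schema. A minimal sketch of a matching schema.xml entry follows, assuming a field type named string is defined in your schema and that several annotations may be found per document (hence multiValued); adjust the type and attributes to your own needs.

```xml
<!-- Sketch: schema.xml field receiving the mapped annotation text; type and attributes are assumptions -->
<field name="lemmas" type="string" indexed="true" stored="true" multiValued="true"/>
```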
Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path.
Get the generated apache-solr-uima*.jar from the build directory along with the jars of the components in use, and put both inside one of the <lib> directories defined in solrconfig.xml.
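For reference, <lib> directives in solrconfig.xml generally take a directory and an optional file-name regex. The sketch below uses hypothetical relative paths (resolved against the core's instance directory); substitute the locations where you actually placed the jars.

```xml
<!-- Sketch: both directory paths are hypothetical -->
<!-- jars of the UIMA components in use -->
<lib dir="../../contrib/uima/lib" regex=".*\.jar"/>
<!-- the generated apache-solr-uima jar -->
<lib dir="../../dist/" regex="apache-solr-uima-.*\.jar"/>
```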
You can now restart the Solr-UIMA instance to test it.
This is a UIMA component; see SVN and documentation.
For a deeper dive into UIMA, please take a look at the documentation.