cTAKES 3.0 - LVG


Overview of LVG

This annotator wraps the National Library of Medicine (NLM) SPECIALIST Lexical Tools (LVG). It generates a canonical (normalized) form for each word, and this normalization step helps dictionary lookup.* It also generates a list of lemma entries with Penn Treebank tags. These tags could be useful to a part-of-speech (POS) tagger. However, the OpenNLP POS tagger uses a tag dictionary rather than lemma information.

Refer to the documentation for the POS tagger annotator.

*Note: LVG adds variants that the dictionary lookup uses to find terms whose form in the text is not present in the dictionary database, for example singular variants of plural forms and capitalization variants. While LVG often increases the number of dictionary terms found in the text, it also carries a risk of introducing false positives from the dictionary.
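To illustrate both the benefit and the false-positive risk described in the note, here is a toy sketch of variant-based dictionary lookup. It is NOT the LVG algorithm: the class `LookupVariants`, its naive plural-stripping rules, and the dictionary entries and concept labels are all hypothetical, chosen only to show how extra variants can produce both new matches and spurious ones.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy illustration of variant-based dictionary lookup.
 * Hypothetical: this is not how LVG generates variants.
 */
public class LookupVariants {

    /** Generates a few simple lookup variants for a token, in priority order. */
    public static Set<String> variants(String token) {
        Set<String> out = new LinkedHashSet<>();
        String lower = token.toLowerCase(Locale.ROOT);
        out.add(token); // original form
        out.add(lower); // capitalization variant
        // Naive singular variants of plural forms (assumed rules, far cruder than LVG).
        if (lower.endsWith("ies") && lower.length() > 3) {
            out.add(lower.substring(0, lower.length() - 3) + "y");
        } else if (lower.endsWith("es") && lower.length() > 2) {
            out.add(lower.substring(0, lower.length() - 2));
            out.add(lower.substring(0, lower.length() - 1));
        } else if (lower.endsWith("s") && lower.length() > 1) {
            out.add(lower.substring(0, lower.length() - 1));
        }
        return out;
    }

    /** Returns the entry for the first variant found in the dictionary, or null. */
    public static String lookup(String token, Map<String, String> dictionary) {
        for (String v : variants(token)) {
            String hit = dictionary.get(v);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: surface form -> concept label.
        Map<String, String> dict = Map.of(
                "tumor", "CONCEPT_TUMOR",
                "artery", "CONCEPT_ARTERY",
                "aid", "CONCEPT_AID");
        System.out.println(lookup("Tumors", dict));   // CONCEPT_TUMOR (capitalization + singular)
        System.out.println(lookup("arteries", dict)); // CONCEPT_ARTERY (-ies -> -y)
        System.out.println(lookup("AIDS", dict));     // CONCEPT_AID: a false positive
    }
}
```

The last call shows the trade-off from the note: the acronym "AIDS" is reduced to "aid" and matches an unrelated entry, which is exactly the kind of false positive that variant generation can introduce.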

Analysis engines (annotator) - LvgAnnotator.xml

Parameters

UseSegments
controls whether only certain sections will be annotated by this annotator

SegmentsToSkip
list of sections not to be processed by this annotator

UseCmdCache
controls whether to look up the canonical form in a cache before running norm

CmdCacheFileLocation
location of norm cache file

CmdCacheFrequencyCutoff
(cutoff value)

ExclusionSet
words for which canonicalForm is never set and Lemma entries are never posted

XeroxTreebankMap
mapping of the part-of-speech tags used by the lexical tools (LVG) to Penn Treebank tags

PostLemmas
controls whether any lemma entries are posted to the CAS

UseLemmaCache
controls whether to look up lemma information in a cache before using LVG

LemmaCacheFileLocation
location of the lemma cache file

LemmaCacheFileFrequencyCutoff
(cutoff value)
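The following sketch shows one plausible way several of these parameters could interact when computing a canonical form: words in the exclusion set are skipped, a cache is consulted first when caching is enabled, and only then is the underlying tool called. Everything here is an assumption for illustration: the class `LvgParamsSketch` is hypothetical, the `frequency|word|canonicalForm` cache-line format, the reading of the cutoff as a minimum frequency for loading cache entries, and the `lvgTag|treebankTag` map-entry format are guesses rather than the documented behavior of LvgAnnotator, and a plain `Function` stands in for norm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of exclusion set, command cache, and tag map handling. */
public class LvgParamsSketch {

    private final Set<String> exclusionSet;          // assumed to hold lowercase words
    private final boolean useCmdCache;
    private final Map<String, String> cmdCache = new HashMap<>();
    private final Map<String, String> tagMap = new HashMap<>();
    private final Function<String, String> norm;     // stand-in for the real norm tool

    public LvgParamsSketch(Set<String> exclusionSet, boolean useCmdCache,
                           List<String> cacheLines, int frequencyCutoff,
                           List<String> tagMapEntries, Function<String, String> norm) {
        this.exclusionSet = exclusionSet;
        this.useCmdCache = useCmdCache;
        this.norm = norm;
        // Assumed cache line format: frequency|word|canonicalForm.
        // Entries whose frequency is below the cutoff are not loaded.
        for (String line : cacheLines) {
            String[] f = line.split("\\|");
            if (f.length == 3 && Integer.parseInt(f[0]) >= frequencyCutoff) {
                cmdCache.put(f[1], f[2]);
            }
        }
        // Assumed tag map entry format: lvgTag|treebankTag.
        for (String entry : tagMapEntries) {
            String[] f = entry.split("\\|");
            if (f.length == 2) {
                tagMap.put(f[0], f[1]);
            }
        }
    }

    /** Returns the canonical form, or null if the word is in the exclusion set. */
    public String canonicalForm(String word) {
        if (exclusionSet.contains(word.toLowerCase(Locale.ROOT))) {
            return null;
        }
        if (useCmdCache) {
            String cached = cmdCache.get(word);
            if (cached != null) {
                return cached; // cache hit: the tool is not called
            }
        }
        return norm.apply(word); // cache disabled or cache miss
    }

    /** Maps an LVG category to a Penn Treebank tag, or null if unmapped. */
    public String treebankTag(String lvgTag) {
        return tagMap.get(lvgTag);
    }

    public static void main(String[] args) {
        LvgParamsSketch s = new LvgParamsSketch(
                Set.of("and", "the"),
                true,
                List.of("120|Tumors|tumor", "2|Rarely|rare"),
                5,
                List.of("noun|NN", "adj|JJ"),
                w -> "norm(" + w + ")");
        System.out.println(s.canonicalForm("Tumors")); // tumor (cache hit)
        System.out.println(s.canonicalForm("Rarely")); // norm(Rarely): entry was below the cutoff
        System.out.println(s.canonicalForm("The"));    // null: excluded
        System.out.println(s.treebankTag("noun"));     // NN
    }
}
```

Consulting the cache before the tool avoids repeated calls for frequent words, which is presumably why the cache parameters exist; the actual file formats should be checked against the cache files and LvgAnnotator.xml shipped with cTAKES.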

Resources

lvg.properties - The LVG configuration file, resources/lvg/data/config/lvg.properties, defines the location and attributes of the LVG database and the JDBC driver used.
LVG database - The database engine is HSQLDB. The included database file is only a sample; refer to the LVG section of the installation instructions for details on how to replace it.
