This document describes additional installation steps required to take advantage of the following YTEX features:
- Semantic Similarity & Word Sense Disambiguation
- Storing annotations in a relational database
- Exporting annotations to machine learning tools
Prerequisites
- We suggest that you install UMLS in your database. NLM provides scripts for installing UMLS in MySQL and Oracle. Refer to UMLS SQL Server Installation for instructions on how to install UMLS in MS SQL Server.
- YTEX is tested on Linux and Windows. We do not test YTEX on Mac; however, users have successfully installed it on Mac by following the Linux installation instructions.
Database Prerequisites
YTEX supports MS SQL Server 2008 and above, MySQL version 5.x, and Oracle versions 10gR2 and above. Create a database user (and schema) for use with YTEX. See the platform-specific notes below.
Oracle
Your database must use the UTF-8 character set.
Make sure you use a tablespace with enough room; e.g. create the ytex user and schema like this:
    create tablespace TBS_YTEX datafile 'C:/oracle/oradata/orcl/TBS_YTEX.dbf' size 1000M autoextend on online;
    create user ytex identified by ytex default tablespace TBS_YTEX;
    alter user ytex quota unlimited on TBS_YTEX;
    grant connect, resource to ytex;
    grant create materialized view to ytex;
    grant create view to ytex;
If you have installed the UMLS locally, you must also grant ytex select permissions on umls tables; e.g. assuming that umls tables are in the umls schema:
    grant select on umls.MRCONSO to ytex;
    grant select on umls.MRSTY to ytex;
    grant select on umls.MRREL to ytex;
MySQL
To create the mysql user and database, login to mysql as root and run the following commands (change as necessary):
    CREATE DATABASE ytex CHARACTER SET utf8;
    CREATE USER 'ytex'@'localhost' IDENTIFIED BY 'ytex';
    GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'localhost';
On Mac, use 127.0.0.1 instead of localhost. If ytex connects to the MySQL server from a different machine, replace localhost with the host name or IP address of the machine you will connect from, or use the wildcard ('%'):
    CREATE USER 'ytex'@'%' IDENTIFIED BY 'ytex';
    GRANT ALL PRIVILEGES ON ytex.* TO 'ytex'@'%';
If you have installed UMLS in your database, you must give the ytex user select permission on these tables:
GRANT SELECT on umls.* to 'ytex'@'%';
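When the connecting host differs between environments, it may be easier to generate the statements above from a small script than to edit them by hand. The following is a minimal sketch and is not part of YTEX: the function name ytex_mysql_sql and its arguments are hypothetical. It only prints the statements shown in this section, so they can be reviewed before being sent to the mysql client.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with YTEX): print the CREATE DATABASE,
# CREATE USER and GRANT statements from this section.
# Usage: ytex_mysql_sql USER PASSWORD HOST [UMLS_SCHEMA]
# Limitation: values are inserted as-is, so they must not contain single quotes.
ytex_mysql_sql() {
    user=$1
    password=$2
    host=$3
    umls_schema=${4:-}
    printf "CREATE DATABASE ytex CHARACTER SET utf8;\n"
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$password"
    printf "GRANT ALL PRIVILEGES ON ytex.* TO '%s'@'%s';\n" "$user" "$host"
    # Only emit the UMLS grant when a schema name was supplied.
    if [ -n "$umls_schema" ]; then
        printf "GRANT SELECT ON %s.* TO '%s'@'%s';\n" "$umls_schema" "$user" "$host"
    fi
}

# Example: review the output first, then pipe it to the client as root:
#   ytex_mysql_sql ytex ytex 127.0.0.1 umls | mysql -u root -p
```

Printing rather than executing keeps the script free of side effects; nothing changes in the database until the output is explicitly piped to mysql.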
The document table uses the text and blob datatypes for the doc_text column that holds the document text. If you are processing large documents, you may need to use the longtext datatype instead. You may also have to increase the maximum packet size (the MySQL max_allowed_packet setting).
SQL Server
You must have the permission to create database objects in the YTEX database and schema. If you don't have these permissions, ask your DBA to add you to the db_ddladmin & db_datawriter roles for the YTEX database.
If you want to install the UMLS in your SQL Server, you may want to use a different database/schema from the YTEX database. If that is the case, you need permissions on the UMLS database/schema as well.
Installation
1) Install ctakes 'as usual'
Go through the standard cTAKES installation for the distribution you just created (see https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1+User+Install+Guide). For the rest of this document, we assume cTAKES is installed in CTAKES_HOME.
1.5) Patch YTEX Distro (YTEX 3.2.0 only)
Not needed for YTEX 3.2.1. Some of the install scripts need to be patched (fixed in trunk). Download and unzip ytex-patch-3.2.0.zip 'over' your installation.
Linux users: set the shell scripts to executable:
    cd CTAKES_HOME/bin
    chmod ug+x ant ctakes.profile *.sh
2) Unzip YTEX Libraries
Download and unzip ctakes-ytex-lib-3.1.2-SNAPSHOT.zip 'over' your installation. This contains libraries whose licenses are not compatible with the Apache 2.0 license:
- Hibernate
- Weka
- MySQL JDBC Driver
- MS SQL Server JDBC Driver
If you are using Oracle, download the Oracle JDBC driver ojdbc7_g and place it in your CTAKES_HOME\lib directory.
3) Unzip YTEX Resources (Optional - UTS login required)
Download and unzip ctakes-ytex-resources-3.1.2-SNAPSHOT.zip 'over' your installation. This contains:
- Concept Graphs derived from the UMLS2013AA used to compute semantic similarity measures
- Dictionary Lookup table derived from UMLS2013AA for named entity recognition.
If you do not install these files, Word Sense Disambiguation will be disabled, and the default YTEX dictionary lookup will be limited to a small sample subset of the UMLS.
You can always create concept graphs for WSD from your UMLS installation. If you have the UMLS in your DB, YTEX will create a dictionary lookup table from the UMLS during the installation.
4) Edit environment batch/shell script
Fix the path references to match your environment.
- windows - no changes necessary; see CTAKES_HOME\bin\setenv.cmd
- linux:
  - move CTAKES_HOME/bin/ctakes.profile to ${HOME}/ctakes.profile
  - edit the CTAKES_HOME environment variable
  - make executable - chmod u+x ${HOME}/ctakes.profile
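The Linux steps above can be scripted. This is a minimal sketch under an assumption about the file's contents: it assumes ctakes.profile sets the variable on a line beginning with CTAKES_HOME= or export CTAKES_HOME=, so check your copy before relying on it. The function name set_ctakes_home and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the CTAKES_HOME assignment in a profile file
# and mark the file executable.
# Assumes the file has a line starting with "CTAKES_HOME=" or "export CTAKES_HOME=".
# Usage: set_ctakes_home PROFILE_FILE NEW_CTAKES_HOME
set_ctakes_home() {
    profile=$1
    new_home=$2
    tmp="${profile}.tmp.$$"
    # '|' is the sed delimiter, so the path may contain '/',
    # but it must not contain '|' or '&'.
    sed -e "s|^CTAKES_HOME=.*|CTAKES_HOME=${new_home}|" \
        -e "s|^export CTAKES_HOME=.*|export CTAKES_HOME=${new_home}|" \
        "$profile" > "$tmp" &&
        mv "$tmp" "$profile" &&
        chmod u+x "$profile"
}

# Example, mirroring the steps above (both paths are placeholders):
#   mv /path/to/ctakes/bin/ctakes.profile "$HOME/ctakes.profile"
#   set_ctakes_home "$HOME/ctakes.profile" /path/to/ctakes
```

Writing to a temporary file and then moving it into place avoids the differences between sed implementations in how in-place editing is invoked.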
5) Create CTAKES_HOME\resources\org\apache\ctakes\ytex\ytex.properties
In this file, you specify the database connection parameters. You will find templates in CTAKES_HOME\lib\ctakes-ytex-res-[version].jar, under org\apache\ctakes\ytex\ytex.properties.<db type>.example. If you have UMLS installed on your database, specify the umls.schema and umls.catalog properties (see the properties file for an explanation of what these are).
windows:
    cd %CTAKES_HOME%\resources
    mkdir org\apache\ctakes\ytex
    @REM extract the mysql example. change mysql to mssql (for MS SQL Server) or orcl (for Oracle)
    jar xf ..\lib\ctakes-ytex-res-*.jar org/apache/ctakes/ytex/ytex.properties.mysql.example
    copy org\apache\ctakes\ytex\ytex.properties.mysql.example org\apache\ctakes\ytex\ytex.properties
    @REM edit the properties file
    notepad org\apache\ctakes\ytex\ytex.properties
linux:
    cd $CTAKES_HOME/resources
    mkdir -p org/apache/ctakes/ytex
    # extract the mysql example. change mysql to mssql (for MS SQL Server) or orcl (for Oracle)
    jar xf ../lib/ctakes-ytex-res-*.jar org/apache/ctakes/ytex/ytex.properties.mysql.example
    cp org/apache/ctakes/ytex/ytex.properties.mysql.example org/apache/ctakes/ytex/ytex.properties
    # edit the properties file
    vi org/apache/ctakes/ytex/ytex.properties
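Before running the setup script, it can be worth confirming that the properties file is in place and that the UMLS properties are set as intended. The sketch below is not part of YTEX: get_prop is a hypothetical helper that handles only simple key=value lines (the Java properties format also allows line continuations and escapes, which are ignored here). Only umls.schema and umls.catalog are taken from this document; consult the example file for the other property names.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a simple key=value properties file.
# Skips comment lines (starting with # or !); does not handle continuations or escapes.
# Usage: get_prop FILE KEY
get_prop() {
    file=$1
    key=$2
    # Compare the key as a literal string so that dots in property names
    # are not treated as regular-expression wildcards.
    awk -v k="$key" '
        /^[ \t]*[#!]/ { next }
        {
            i = index($0, "=")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                value = substr($0, i + 1)
                gsub(/^[ \t]+|[ \t\r]+$/, "", value)
                print value
                exit
            }
        }
    ' "$file"
}

# Example, using the path from step 5:
#   get_prop "$CTAKES_HOME/resources/org/apache/ctakes/ytex/ytex.properties" umls.schema
```

An empty result means the property is either missing or set to an empty value, so inspect the file itself if the output is not what you expect.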
6) Install the UMLS in your database (Optional)
We strongly suggest that you install UMLS in your database.
- If you have not done so already, obtain a UMLS License and create a UMLS Technology Services (UTS) Account, available free of charge: https://uts.nlm.nih.gov/home.html
- UMLS's MetamorphoSys can create database load scripts for MySQL and Oracle. Follow these instructions: http://www.nlm.nih.gov/research/umls/implementation_resources/scripts/index.html
- We have provided load scripts for MS SQL. Refer to UMLS SQL Server Installation for instructions on how to install UMLS in MS SQL Server.
7) Execute the setup script
windows: Open a command prompt, change to the scripts directory under CTAKES_HOME, and execute the setup script:
    cd /d %CTAKES_HOME%\bin\ctakes-ytex\scripts
    ..\..\ant.bat -f build-setup.xml all > setup.out 2>&1
linux: From a shell, cd to the CTAKES_HOME directory, set the environment, make sure necessary scripts are executable, and execute the ant script:
    chmod u+x ${HOME}/ctakes.profile
    . ${HOME}/ctakes.profile
    cd ${CTAKES_HOME}/bin
    chmod u+x ant
    chmod u+x *.sh
    cd ctakes-ytex/scripts
    nohup ../../ant -f build-setup.xml all > setup.out 2>&1 &
    tail -f setup.out
Check setup.out to make sure the setup was successful.
This will call the ant script build-setup.xml, which does the following:
- Generates configuration files from templates
- Sets up YTEX Database Objects
The installation executes SQL scripts located in the CTAKES_HOME\bin\ctakes-ytex\scripts\data directory. All YTEX database objects will be dropped and recreated. If this is the initial installation, ignore the errors about objects not existing when they are being dropped.
If you have installed the UMLS in your database and configured YTEX to use it, YTEX will create a dictionary lookup table with all concepts from the UMLS. The setup speed depends on the latency between the machine you are installing on and the database server. Creating the dictionary lookup table from the UMLS can take several hours.
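Because drop errors are expected on a first install, searching setup.out for the word "error" can be misleading. As a rough sketch, the hypothetical function below instead relies on the summary line that Ant prints at the end of a run (BUILD SUCCESSFUL or BUILD FAILED). The function name and return codes are illustrative and not part of YTEX.

```shell
#!/bin/sh
# Hypothetical helper: report the outcome of the Ant run recorded in a log file.
# Returns 0 if Ant reported success, 1 if it reported failure, and
# 2 if neither summary line is present (for example, setup is still running).
# Usage: check_setup_log LOGFILE
check_setup_log() {
    log=$1
    if grep -q '^BUILD FAILED' "$log"; then
        echo "setup failed; see $log"
        return 1
    elif grep -q '^BUILD SUCCESSFUL' "$log"; then
        echo "setup succeeded"
        return 0
    else
        echo "no Ant summary line found in $log (setup may still be running)"
        return 2
    fi
}

# Example, run from the directory that contains setup.out:
#   check_setup_log setup.out
```

A successful summary line only shows that Ant completed; it is still worth reading the log for unexpected messages beyond the drop errors described above.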