Overview of Temporal Module
This module is dedicated to processing electronic medical records for meaningful events, temporal expressions, and their relations on a timeline.
Temporal relations are of prime importance in biomedicine because they are intrinsically linked to diseases, signs and symptoms, and treatments. Identifying temporal relations in medical text has drawn growing attention for two reasons. First, it can substantially improve the understanding of medical phenomena such as disease progression, longitudinal effects of medications, and a patient's clinical course. Second, it supports clinical applications such as question answering [1, 2], clinical outcomes prediction [3], and the recognition of temporal patterns and timelines [4, 5].
Key components
Event annotator
A Begin-Inside-Outside (BIO) style sequence annotator for clinically meaningful events, i.e. anything that would show up on a detailed timeline of the patient’s care or life.
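To make the BIO scheme concrete, the following is a minimal sketch of how a sequence of B/I/O tags can be turned back into event spans. It is illustrative only and is not the cTAKES implementation: the class name BioSpanDecoder, the bare tag strings "B", "I", "O", and the example sentence are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only (not cTAKES code): turns per-token B/I/O tags into span texts. */
final class BioSpanDecoder {

    /** Returns the text of every tagged span, joining its tokens with single spaces. */
    static List<String> decode(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> spans = new ArrayList<>();
        int begin = -1; // index of the first token of the open span, or -1 if none is open
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            boolean continuesSpan = tag.equals("I") && begin >= 0;
            if (!continuesSpan && begin >= 0) {
                // "B" or "O" ends whatever span was open
                spans.add(String.join(" ", tokens.subList(begin, i)));
                begin = -1;
            }
            if (tag.equals("B") || (tag.equals("I") && begin < 0)) {
                // "B" opens a span; a stray "I" with no open span is treated like "B"
                begin = i;
            }
        }
        if (begin >= 0) {
            // close a span that runs to the end of the sequence
            spans.add(String.join(" ", tokens.subList(begin, tokens.size())));
        }
        return spans;
    }
}
```

For the hypothetical tokens "Patient underwent colon resection today ." tagged O O B I O O, decode returns a single span, "colon resection".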
Temporal expression annotators
A series of BIO-style sequence annotators that employ forward and backward search algorithms and multiple learning methods (Support Vector Machine (SVM), Conditional Random Field (CRF)) to annotate temporal expressions, which provide concrete temporal references throughout the document or section, e.g. “today”, “24 hours ago”, “postoperative”. More details can be found in [6].
DocTimeRel annotator
An SVM-based annotator that, for every event, automatically infers the temporal relation between the target event and the document creation time (DCT). It provides a basic and stable temporal solution that places all events into coarse temporal bins, e.g. “before the DCT”, “after the DCT”, or “overlap the DCT”. This annotator has proved helpful in solving real, temporally sensitive clinical tasks at multiple institutions [5].
Temporal relation (TLINK) annotators
SVM-based annotators for detecting within-sentence Event-Time relations and Event-Event relations. For i2b2 datasets there are also cross-sentence Event-Time and Event-Event relation annotators. Multiple techniques have been implemented, including the narrative container annotation concept [7], tree kernels [8] for syntactic similarity measurement, multi-layered temporal modeling [9], event expansion [10], and deep neural network methods [11, 12].
Evaluation
The SVM-based temporal relation annotators achieve an F-score of 0.589, which outperforms the best system of Clinical TempEval 2016 [13], whose F-score was 0.573. State-of-the-art results for event-time relations have been achieved with our neural network approaches.
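As a reminder of what an F-score summarizes, here is a minimal sketch that computes the balanced F1 measure from raw counts. It assumes the reported F-score is the harmonic mean of precision and recall, and it ignores any closure-based adjustment an official scorer may apply. The class name FScore is hypothetical, and the counts used with it are made up rather than taken from the evaluation.

```java
/** Illustrative only: balanced F1 from true-positive, false-positive and false-negative counts. */
final class FScore {

    static double f1(int truePositives, int falsePositives, int falseNegatives) {
        if (truePositives == 0) {
            return 0.0; // avoids 0/0 when nothing was predicted correctly
        }
        double precision = truePositives / (double) (truePositives + falsePositives);
        double recall = truePositives / (double) (truePositives + falseNegatives);
        // harmonic mean of precision and recall
        return 2 * precision * recall / (precision + recall);
    }
}
```

With made-up counts of 50 true positives, 50 false positives and 50 false negatives, precision and recall are both 0.5, and so is the F1 value.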
All the above annotators were trained and tested on colon cancer notes from the THYME data set [14].
Evaluating within-sentence event-time temporal relations
The evaluation program is org.apache.ctakes.temporal.eval.EvaluationOfEventTimeRelations.java
org.apache.ctakes.temporal.ae.EventTimeSelfRelationAnnotator
is used by EvaluationOfEventTimeRelations
Required parameters:
--text <path to the folder containing the THYME raw notes>
--format Anafora
--xml <path to the folder containing the THYME gold annotation files in xml format>
--xmi <path to the target folder of xmi file output>
If the xmi folder is empty (i.e. the xmi files have not been produced yet), the code will run getXMIWritingPreprocessorAggregateBuilder(), which writes xmi files into the target folder.
--kernelParams "c 0.001953125 t 0.03125 d 3 g 1.0 S 1 C + L 0.5 T 0.1 N 3"
The parameters for the SVM model
--patients 1-218
The patient IDs in the colon cancer dataset
--useGoldAttributes
Optional parameters:
--closure
Apply closure calculation to the training instances.
--test
Train on the combination of the training and development splits, and test on the test split. Without this parameter, the system trains on the training split and tests on the development split.
--skipTrain
If the model is already trained, use this option to skip training and directly apply the model to predict CONTAINS relations on the test set.
The model files can be found at target/eval/thyme/train_and_test/event-time.
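The --closure option above refers to temporal closure, i.e. adding relations that are implied by the ones already annotated. A minimal sketch of the idea for CONTAINS relations follows, assuming CONTAINS is transitive (if A contains B and B contains C, then A contains C). The class ContainsClosure and the string identifiers are hypothetical, and the closure actually applied by cTAKES may cover more relation types and rules than this.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only (not cTAKES code): transitive closure over CONTAINS pairs. */
final class ContainsClosure {

    /**
     * Input maps each container to the items it directly contains.
     * The result additionally holds every pair implied by transitivity.
     * The input map is not modified.
     */
    static Map<String, Set<String>> close(Map<String, Set<String>> direct) {
        Map<String, Set<String>> closed = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : direct.entrySet()) {
            closed.put(entry.getKey(), new TreeSet<>(entry.getValue()));
        }
        boolean changed = true;
        while (changed) { // repeat until a full pass adds nothing new
            changed = false;
            for (Set<String> contained : closed.values()) {
                // iterate over a copy because the set may grow during the pass
                for (String middle : new ArrayList<>(contained)) {
                    Set<String> next = closed.get(middle);
                    if (next != null && next != contained && contained.addAll(next)) {
                        changed = true;
                    }
                }
            }
        }
        return closed;
    }
}
```

Given the made-up pairs "today" CONTAINS "surgery" and "surgery" CONTAINS "bleeding", the result also holds "today" CONTAINS "bleeding", which gives the classifier an extra positive training instance.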
Evaluating within-sentence event-event temporal relations
The evaluation program is org.apache.ctakes.temporal.eval.EvaluationOfEventEventThymeRelations.java
org.apache.ctakes.temporal.ae.EventEventRelationAnnotator
is used by EvaluationOfEventEventThymeRelations
The parameters are the same as the event-time parameters described above.
The model files can be found at target/eval/thyme/train_and_test/event-event.
Evaluating DocTimeRel and Contextual Modality
The evaluation program is org.apache.ctakes.temporal.eval.EvaluationOfEventProperties.java
EvaluationOfEventProperties
uses org.apache.ctakes.temporal.ae.DocTimeRelAnnotator
and org.apache.ctakes.temporal.ae.ContextualModalityAnnotator
The parameters are the same as the event-time parameters described above.
The model files for DocTimeRel can be found at target/eval/event-properties/train_and_test/docTimeRel
The model files for Contextual Modality can be found at target/eval/event-properties/train_and_test/contextualModality
Creating Temporal Output in Anafora XML Format
Within-sentence event-event and event-time temporal relations can be written in the XML format used by Anafora by running org.apache.ctakes.temporal.eval.EvaluationOfBothEEAndETRelations.java
That program uses the following two annotators:
org.apache.ctakes.temporal.ae.EventTimeSelfRelationAnnotator
org.apache.ctakes.temporal.ae.EventEventRelationAnnotator
Required parameters for EvaluationOfBothEEAndETRelations
--text <path to the folder containing the THYME raw notes>
--format Anafora
--xml <path to the folder containing the THYME gold annotation files in xml format>
--xmi <path to the target folder of xmi file output>
--kernelParams "c 0.001953125 t 0.03125 d 3 g 1.0 S 1 C + L 0.5 T 0.1 N 3"
--patients 1-218
--useGoldAttributes
--test
--skipTrain
For this program, --skipTrain is a required parameter. The system uses the pretrained models in the target/eval/thyme/train_and_test/ folder and predicts directly on the test split.
--anaforaOutput <path to the target folder of xml file output>
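The required parameters above can be collected into a single argument list. The sketch below only assembles that list and does not call any cTAKES code. The class AnaforaOutputArgs and the directory arguments are hypothetical, and it is an assumption that the evaluation program receives these values as ordinary command-line arguments, with the quoted --kernelParams value passed as one argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: assembles the argument list documented for EvaluationOfBothEEAndETRelations. */
final class AnaforaOutputArgs {

    static List<String> build(String textDir, String xmlDir, String xmiDir, String anaforaOutDir) {
        return new ArrayList<>(Arrays.asList(
                "--text", textDir,                 // folder with the THYME raw notes
                "--format", "Anafora",
                "--xml", xmlDir,                   // folder with the gold annotation xml files
                "--xmi", xmiDir,                   // target folder for xmi output
                "--kernelParams", "c 0.001953125 t 0.03125 d 3 g 1.0 S 1 C + L 0.5 T 0.1 N 3",
                "--patients", "1-218",
                "--useGoldAttributes",
                "--test",
                "--skipTrain",                     // required here: pretrained models are used
                "--anaforaOutput", anaforaOutDir   // target folder for the Anafora xml output
        ));
    }
}
```

Converting the result with toArray(new String[0]) gives an array in the shape a Java main method expects.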
Potentially Improving Temporal Pipeline Results
When running the temporal module, consider using the newer BIO sentence detector instead of the original sentence detector. On the THYME data the BIO sentence detector achieved better results, but it is not better in all cases.
Online Demo
A demo of the cTAKES Temporal Module can be found at http://54.68.117.30:8080/index.jsp.
Using the BIO Sentence Detector with the Temporal Pipeline
If you are comfortable with modifying Java code, in org.apache.ctakes.temporal.eval.Evaluation_ImplBase.java, modify the method getXMIWritingPreprocessorAggregateBuilder() as follows:
The default pipeline uses the original sentence detector, SentenceDetector. To switch to the BIO sentence detector, comment out:
// aggregateBuilder.add( AnalysisEngineFactory.createEngineDescription(
//     SentenceDetector.class,
//     SentenceDetector.SD_MODEL_FILE_PARAM,
//     "org/apache/ctakes/core/sentdetect/sd-med-model.zip" ) );
and add:
aggregateBuilder.add(SentenceDetectorAnnotatorBIO.getDescription(FileLocator.locateFile("org/apache/ctakes/core/sentdetect/model.jar").getPath()));
References
1. Das, A.K. and M.A. Musen. A comparison of the temporal expressiveness of three database query methods. in Annual Symposium on Computer Applications in Medical Care. 1995. IEEE Computer Society Press.
2. Kahn, M.G., L.M. Fagan, and S. Tu, Extensions to the time-oriented database model to support temporal reasoning in medical expert systems. Methods of information in medicine, 1990. 30(1): p. 4-14.
3. Schmidt, R., S. Ropele, C. Enzinger, et al., White matter lesion progression, brain atrophy, and cognitive decline: the Austrian stroke prevention study. Annals of neurology, 2005. 58(4): p. 610-616.
4. Zhou, L. and G. Hripcsak, Temporal reasoning with medical data—a review with emphasis on medical natural language processing. Journal of biomedical informatics, 2007. 40(2): p. 183-202.
5. Lin, C., E.W. Karlson, D. Dligach, et al., Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record. Journal of the American Medical Informatics Association, 2014: p. amiajnl-2014-002642.
6. Miller, T.A., S. Bethard, D. Dligach, et al., Extracting Time Expressions from Clinical Text, in Proceedings of BioNLP 15. 2015.
7. Miller, T.A., S. Bethard, D. Dligach, et al., Discovering narrative containers in clinical text, in ACL 2013. 2013: Sofia, Bulgaria. p. 18.
8. Lin, C., T. Miller, A. Kho, et al., Descending-Path Convolution Kernel for Syntactic Structures, in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL). 2014: Baltimore, Maryland, USA. p. 81-86.
9. Lin, C., D. Dligach, T.A. Miller, et al., Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc, 2016. 23(2): p. 387-95.
10. Lin, C., T. Miller, D. Dligach, et al., Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP 2016, 2016: p. 108.
11. Dligach, D., T. Miller, C. Lin, et al., Neural temporal relation extraction, in European Chapter of the Association for Computational Linguistics (EACL 2017). 2017: Valencia, Spain.
12. Hartzell, E. and C. Lin. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. in International Conference on Intelligent Biology and Medicine (ICIBM 2016). 2016. Houston, Texas, USA.
13. Bethard, S., G. Savova, W.-T. Chen, et al., Semeval-2016 task 12: Clinical tempeval. Proceedings of SemEval, 2016: p. 1052-1062.
14. Styler IV, W.F., S. Bethard, S. Finan, et al., Temporal annotation in the clinical domain. Transactions of the Association for Computational Linguistics, 2014. 2: p. 143-154.