Versions Compared

Comment: Updated macOS installation and fixed lists

...

Mac Installation Instructions

...

brew install tesseract tesseract-

...

lang

Issues with Installing via Brew

If you have trouble installing via Brew, some options to try:

...

, you can

...

try

...

installing Tesseract from source.

Tesseract won't work with TIFF files

If you are having trouble getting Tesseract to work with TIFF files, read this link. Summary:

  1. Uninstall Tesseract: brew uninstall tesseract
  2. Uninstall Leptonica: brew uninstall leptonica
  3. Install Leptonica with TIFF support: brew install leptonica --with-libtiff
  4. Install Tesseract: brew install tesseract tesseract-lang (replaces brew install tesseract --with-all-languages --with-serial-num-pack)
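
After reinstalling, one way to confirm that TIFF support was picked up is to look for libtiff in the version output. This is a minimal sketch, assuming your build of tesseract --version lists the image libraries Leptonica was linked against; the helper name has_libtiff is hypothetical.

```shell
# Hypothetical helper: succeeds if the version text on stdin mentions libtiff.
has_libtiff() {
  grep -qi 'libtiff'
}

# Only run the check when tesseract is actually on the PATH.
if command -v tesseract >/dev/null 2>&1; then
  # Some versions print version details to stderr, so merge both streams.
  if tesseract --version 2>&1 | has_libtiff; then
    echo "TIFF support: yes"
  else
    echo "TIFF support: not found"
  fi
fi
```

If the check reports "not found", Leptonica was probably built without libtiff and the uninstall/reinstall steps are worth repeating.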

Installing Tesseract on RHEL

  1. Add "epel" to your yum repositories if it isn't already installed
    1a. wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm (or appropriate version)
    1b. rpm -Uvh epel-release-latest-7.noarch.rpm
  2. yum install tesseract
  3. To add language packs, see what's available with yum search tesseract, then, e.g., yum install tesseract-langpack-ara

Installing Tesseract on Ubuntu

  1. sudo apt-get update
  2. sudo apt-get install tesseract-ocr
  3. To add language packs, see what's available, then, e.g., sudo apt-get install tesseract-ocr-fra
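
Once a language pack is installed, it can help to confirm that Tesseract actually sees it. The sketch below uses tesseract --list-langs and checks for the fra pack from the example above; the helper name has_lang is hypothetical.

```shell
# Hypothetical helper: succeeds if language code $1 appears as a whole line on stdin.
has_lang() {
  grep -qx -- "$1"
}

# Only run the check when tesseract is actually on the PATH.
if command -v tesseract >/dev/null 2>&1; then
  # Older versions print the list to stderr, so merge both streams.
  if tesseract --list-langs 2>&1 | has_lang fra; then
    echo "fra is installed"
  else
    echo "fra is missing; install the matching language pack"
  fi
fi
```

The same check should work on RHEL and macOS, since it only depends on the tesseract binary itself.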

...

  • Tesseract installation path = ""
  • Language dictionary = "eng"
  • Page Segmentation Mode = "1"
  • Minimum file size = 0
  • Maximum file size = 2147483647
  • Timeout = 120

To change these settings, you can either modify the existing TesseractOCRConfig.properties file in tika-parser/src/main/resources/org/apache/tika/parser/ocr, or override it by creating your own and placing it in the package org/apache/tika/parser/ocr on your classpath.
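
As a rough sketch of the override approach, the commands below create a properties file under a directory that you would then add to the classpath. The directory name tika-conf is hypothetical, and the key names are assumed to match those in the bundled TesseractOCRConfig.properties, so check them against the file shipped with your Tika version. The values shown are the defaults listed above.

```shell
# Hypothetical directory that you would add to the Java classpath.
conf_dir="${CONF_DIR:-./tika-conf}"
mkdir -p "$conf_dir/org/apache/tika/parser/ocr"

# Key names are assumed to match the bundled TesseractOCRConfig.properties;
# verify them against your Tika version before relying on this file.
cat > "$conf_dir/org/apache/tika/parser/ocr/TesseractOCRConfig.properties" <<'EOF'
tesseractPath=
language=eng
pageSegMode=1
minFileSizeToOcr=0
maxFileSizeToOcr=2147483647
timeout=120
EOF
```

Placing this directory on the classpath ahead of the Tika jars should let this copy be found before the bundled one.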

...