...
- https://github.com/Unstructured-IO/unstructured (as of this writing, uses Poppler, Tesseract, LibreOffice, and Pandoc). Also has commercial API and platform options.
- https://github.com/Sinaptik-AI/panda-etl (appears to be a wrapper for https://sinaptik.ai/)
Select Open Source Tools that haven't been updated in the last year (no particular order/not an endorsement)
- https://github.com/clovaai/donut
- https://github.com/facebookresearch/nougat (Meta's package for processing academic documents, built on top of Donut)
Commercial Tools (no particular order/not an endorsement)
...