Extracting Text from PowerPoint format
Here are several approaches to extracting text from ppt files. No guarantees; please update the list if you test any of them.
Using Apache Tika: http://tika.apache.org/
Using POI HSLF: see the Quick Guide (http://jakarta.apache.org/poi/hslf/quick-guide.html) for details on text extraction
From: poi-users: http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04809.html
From: slide-dev: http://www.mail-archive.com/slide-dev@jakarta.apache.org/msg10445.html
From: http://nagoya.apache.org/eyebrowse/ReadMsg?listName=poi-dev@jakarta.apache.org&msgNo=4326
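Whichever approach is used, the text ends up being read from records inside the "PowerPoint Document" stream of the file. As a rough illustration only, the sketch below walks such a record stream using nothing but the JDK and decodes both kinds of text record: TextBytesAtom (type 4008, one byte per character) and TextCharsAtom (type 4000, UTF-16LE, the "unicode" records). The class and method names are made up for this page and are not part of POI or Tika, and it assumes the raw stream bytes have already been pulled out of the OLE2 container by some other means (for example POIFS).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of POI or Tika. */
public class PptTextRecords {
    // Record type numbers used by the binary PowerPoint format.
    static final int TEXT_CHARS_ATOM = 4000; // text stored as UTF-16LE
    static final int TEXT_BYTES_ATOM = 4008; // text stored as one byte per character

    /**
     * Walks the records in the raw bytes of the "PowerPoint Document" stream
     * and returns the text of every text record, one record per line.
     */
    public static String extractText(byte[] stream) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        // Every record starts with an 8-byte header:
        // 2 bytes version/instance, 2 bytes type, 4 bytes length (little-endian).
        while (pos + 8 <= stream.length) {
            int verAndInstance = readUShort(stream, pos);
            int type = readUShort(stream, pos + 2);
            long length = readUInt(stream, pos + 4);
            pos += 8;

            // A version nibble of 0xF marks a container. Its children follow
            // directly, so only the header is skipped.
            if ((verAndInstance & 0x0F) == 0x0F) {
                continue;
            }
            // Stop on a truncated or corrupt record rather than reading past the end.
            if (length > stream.length - pos) {
                break;
            }
            int len = (int) length;
            if (type == TEXT_BYTES_ATOM) {
                out.append(new String(stream, pos, len, StandardCharsets.ISO_8859_1)).append('\n');
            } else if (type == TEXT_CHARS_ATOM) {
                out.append(new String(stream, pos, len, StandardCharsets.UTF_16LE)).append('\n');
            }
            pos += len; // skip the atom body, whether or not it was text
        }
        return out.toString();
    }

    private static int readUShort(byte[] b, int p) {
        return (b[p] & 0xFF) | ((b[p + 1] & 0xFF) << 8);
    }

    private static long readUInt(byte[] b, int p) {
        return (b[p] & 0xFFL)
                | ((b[p + 1] & 0xFFL) << 8)
                | ((b[p + 2] & 0xFFL) << 16)
                | ((b[p + 3] & 0xFFL) << 24);
    }
}
```

Calling PptTextRecords.extractText(bytes) on the stream contents returns the collected text. This is a sketch, not a replacement for HSLF or Tika: it does not tell slide text apart from notes or master text, and it leaves the carriage returns that PowerPoint uses as paragraph breaks untouched.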
Here is some sample code that works with some ppt formats. It's basically an implementation of a POIFSReaderListener. There are no guarantees on how well it works; it is known to ignore unicode text records, for starters. It requires the POI libraries.