This Java program is intended to convert a PDF file (for which no viewer currently exists on the Apple Newton) into a series of images (currently JPEG, PICT or BMP), together with an HTML file or a NewtonBook source file containing these images.
A PDF file does not store text as running text. Instead, the position (and direction) of each letter is given. So to extract text, one would need to take all the letters on a single page, sort them into lines, and try to work out which letters form a word (and where the spaces are). Because of the different letter sizes and typesetting possibilities, a text extraction program cannot always guess the text correctly (especially if the text is set in multiple columns).
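The sorting and space-guessing steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Glyph` class, the `extractLines` method, and the `lineTolerance` and `gapFactor` parameters are hypothetical names invented here, not part of JPedal or of this program. The sketch handles only horizontal, single-column text, and it uses Java 8 syntax, so it will not compile on the JDK versions this program targets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from JPedal or from this program.
class GlyphTextSketch {

    /** One positioned letter: left edge x, baseline y, advance width, and the character. */
    static final class Glyph {
        final double x, y, width;
        final char ch;

        Glyph(double x, double y, double width, char ch) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.ch = ch;
        }
    }

    /**
     * Groups glyphs into lines by baseline, orders each line left to right,
     * and inserts a space wherever the horizontal gap looks too large.
     */
    static List<String> extractLines(List<Glyph> glyphs, double lineTolerance, double gapFactor) {
        // PDF y coordinates grow upward, so the topmost line has the largest y.
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y).reversed());

        // A glyph joins the current line if its baseline is close to the line's first glyph.
        List<List<Glyph>> lines = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current == null || Math.abs(current.get(0).y - g.y) > lineTolerance) {
                current = new ArrayList<>();
                lines.add(current);
            }
            current.add(g);
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder text = new StringBuilder();
            Glyph previous = null;
            for (Glyph g : line) {
                // Guess a space when the gap exceeds a fraction of the previous letter's width.
                if (previous != null
                        && g.x - (previous.x + previous.width) > gapFactor * previous.width) {
                    text.append(' ');
                }
                text.append(g.ch);
                previous = g;
            }
            result.add(text.toString());
        }
        return result;
    }
}
```

The gap threshold is the weak point: with tight kerning or justified text, no single `gapFactor` separates word gaps from letter gaps, and multi-column pages would have their columns merged into one line.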
The library used for creating the images (JPedal) also supports text extraction. It is good at sorting the letters, but it does not reliably determine the spaces (in some test cases, none of them are detected).
Tested with JDK 1.4.1 and 1.3.1 under Windows NT. It should also work on Macintosh platforms, but I don't know whether the Sun libraries for writing JPEG are included on that platform. The GUI requires JDK 1.4 or later (but the command-line version also runs on JDK 1.3.1).
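On JDK 1.4 or later, one way to find out whether a given platform can write JPEG at all is to ask the standard `javax.imageio` API for a matching writer. This is a minimal sketch, and it assumes nothing about how this program writes its images: the class name `JpegSupportCheck` and the method `canWrite` are hypothetical, and the check says nothing about the older Sun-specific JPEG classes that JDK 1.3.1 would need.

```java
import javax.imageio.ImageIO;

// Hypothetical helper, not part of this program.
class JpegSupportCheck {

    /** Returns true if the running JVM has at least one ImageIO writer for the format. */
    static boolean canWrite(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        System.out.println("JPEG writer available: " + canWrite("jpeg"));
    }
}
```

Running `main` on the Macintosh JVM in question would answer the question for the ImageIO route only.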