Converting PDFs into Image Files (Citimortgage PDF Reprise)

I didn't receive a response from Citimortgage about their ghastly PDF files, but on my next visit I was again able to view my statements in Evince, the GNOME PDF viewer. (Although they were still obnoxiously large files for the amount of data represented.)

But then on my next visit after that, the following month's statement was again not viewable. Come on, Citimortgage, this shouldn't be that difficult.

However, I had already invested some time in learning a couple of things about free software programs for working with PDF files in GNU/Linux, hoping to either shrink the files down or convert them into another format with less storage overhead. I'm a small man with small ambitions, and it had become a mission to not waste so much space on these records.

(You'd think it would be worthwhile for Citimortgage to deal with the situation, as it wastes a lot more space and bandwidth for them spread out over all their customers, but I guess not. Maybe that's why they only keep the last three months' worth of statements around, as opposed to the years of history you can get at other places.)

ImageMagick

I first found ImageMagick, which is a GPL-compatible command line graphics program that is loaded with goodies. Easy install with:

sudo apt-get install imagemagick

And then it's easy to convert a pdf file to png or jpg:

convert some.pdf some.png    #(or some.jpg)

The image quality was poor, however. I discovered that ImageMagick is using Ghostscript (gs) for PDF conversions, and found a nice example command to run it and get a higher resolution. I don't know if gs was already on my system or if it was included in the ImageMagick install, but it was ready and willing, and I was able to come up with:

Ghostscript

gs -q -sDEVICE=pngmono -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 -r300 -sOutputFile=test.png test.pdf

pngmono is an option I found by using gs -h to list available devices.

Using a resolution of 300 dpi (-r300) produced a relatively small (< 100KB) png file that prints reasonably well. Nifty. And Ghostscript is licensed under the GNU GPL, which is of course the best free software license.
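The same 300 dpi idea can apparently be applied through ImageMagick too. This is only a sketch: as far as I know, convert renders PDFs at a low default resolution (72 dpi) unless told otherwise, and a -density setting placed before the input file raises that. The [0] suffix selects the first page. The helper name pdf_page_to_png and the CONVERT override are made up here for illustration and are not part of ImageMagick.

```shell
# Sketch: render the first page of a PDF at 300 dpi with ImageMagick.
# pdf_page_to_png is a hypothetical helper name, not an ImageMagick command.
# CONVERT can be overridden (e.g. CONVERT=echo) to preview the command.
pdf_page_to_png() {
    # -density must come before the input file to affect PDF rendering;
    # "[0]" picks the first page (pages are numbered from zero).
    "${CONVERT:-convert}" -density 300 "${1}[0]" "$2"
}

# Usage: pdf_page_to_png some.pdf some.png
```

Whether the result matches the Ghostscript output in size and print quality is something I'd check by eye rather than assume.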

Another benefit in this situation is that gs can pull out a single page. The second page of the statement is always the same thing, so I don't have to bother saving that more than once a year. One of the things ImageMagick does for you is split multipage PDFs into numbered image files. I'm not sure what options are available for doing that with Ghostscript on its own; my example has you running it once for each page, so some scripting might be needed to make things more convenient when running it directly. I didn't experiment much, having a very narrow objective.
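For the multipage case, Ghostscript's documentation describes a %d placeholder in the output file name that gets replaced with the page number, so one run can write one image per page. A minimal sketch under that assumption follows; the helper name pdf_to_pngs, the default prefix, and the GS override are hypothetical and exist only for this example.

```shell
# Sketch: render every page of a PDF to its own PNG in a single gs run.
# pdf_to_pngs is a hypothetical helper name, not part of Ghostscript.
# GS can be overridden (e.g. GS=echo) to preview the command.
pdf_to_pngs() {
    input=$1              # e.g. test.pdf
    prefix=${2:-page}     # output names look like page-001.png, page-002.png
    # Leaving out -dFirstPage/-dLastPage processes all pages;
    # %03d in the output name becomes a zero-padded page number.
    "${GS:-gs}" -q -sDEVICE=pngmono -dBATCH -dNOPAUSE -r300 \
        -sOutputFile="${prefix}-%03d.png" "$input"
}

# Usage: pdf_to_pngs test.pdf statement
```

The pages that never change can then simply be deleted after the run.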

Citimortgage Tomfoolery

I ran the older Citimortgage statements that were only 100KB through this command with no complaint, although it didn't save much on file size. Running the newer, larger files through Ghostscript results in a message like this:

   **** Warning: File has a corrupted %%EOF marker, or garbage after %%EOF.
   **** Warning: stream Length incorrect.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> Xenos D2eVision v2 <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

Clearly, something went wrong somewhere along the line in the Citimortgage PDF manufacturing department.
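To see whether a particular PDF triggers warnings like these without writing any image file, one possibility is to point Ghostscript at its nullpage device, which throws the rendered output away. This is a sketch rather than a tested recipe: check_pdf and the GS override are hypothetical names, and I'm assuming the warnings may land on either stdout or stderr depending on the version, hence the redirect.

```shell
# Sketch: run a PDF through Ghostscript only to surface its warnings.
# check_pdf is a hypothetical helper name, not part of Ghostscript.
# GS can be overridden (e.g. GS=echo) to preview the command.
check_pdf() {
    # nullpage discards the rendered pages; 2>&1 merges stderr into
    # stdout so the messages can be piped to grep or saved to a file.
    "${GS:-gs}" -q -dNOPAUSE -dBATCH -sDEVICE=nullpage "$1" 2>&1
}

# Usage: check_pdf statement.pdf | grep -i warning
```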

In Conclusion

I don't know if it was worth spending the effort on this quest, but I was happy to add two more tools to my free software toolbox. Of course, I suppose now you'll tell me that the latest version of Evince will save PDFs as image files.

Comments

  1. Another option is stone simple: view the document large enough so you can read all the text and do a screen capture. This makes a super small file and you could even resave the png as a jpg - tiny.

    It doesn't convert to bitmaps, but one of my favorite PDF utilities is a little command line gem called pdftk (PDFToolkit). Say you want to take all your individual bank statement PDFs and combine them into one PDF file (which I like to do at year end). Easy. Say you want to take page 3 from one PDF, rotate it 90 degrees, do the same with page 12, and save that to a two-page PDF. Etc, etc.

    Supposedly the next version of Inkscape 0.46 due out early '08 will also allow you to open a pdf file (single page) and export as an image file, amongst other things; possibly even re-export as pdf. As well as do all the drawing things etc Inkscape does so well.

    Then there is PDFEdit which does even more and in a graphical interface and re-saves to pdf. You can extract text, rotate pages, delete stuff, change the text of a pdf if you don't like the math! It's getting better all the time.

  2. Hi, ArtInvent. Thanks for visiting and sharing your thoughts on this.

  3. Just a thought: Have you tried using gv to look at the PDF files? gv is an old graphical front-end for ghostscript, and it does a reasonably good job on PDF. It's light-weight, to boot, and you can "print" the file directly to postscript.

    Whether or not the postscript file will be smaller than the original PDF is left as an exercise for the reader.

  4. I hadn't heard of gv but I'll give it a try the next time I'm monkeying around with PDF files. Thanks for the tip!

