I didn’t receive a response from Citimortgage about their ghastly PDF files, but on my next visit I was again able to view my statements in Evince, the GNOME PDF viewer. (Although they were still obnoxiously large files for the amount of data represented.)
But then on my next visit after that, the following month’s statement was again not viewable. Come on, Citimortgage, this shouldn’t be that difficult.
However, I had already invested some time in learning a couple of things about free software programs for working with PDF files in GNU/Linux, hoping to either shrink the files down or convert them into another format with less storage overhead. I’m a small man with small ambitions, and it had become a mission to not waste so much space on these records.
(You’d think it would be worthwhile for Citimortgage to deal with the situation, as it wastes a lot more space and bandwidth for them spread out over all their customers, but I guess not. Maybe that’s why they only keep the last three months’ worth of statements around, as opposed to the years of history you can get at other places.)
ImageMagick
I first found ImageMagick, which is a GPL-compatible command line graphics program that is loaded with goodies. Easy install with:
sudo apt-get install imagemagick
And then it’s easy to convert a pdf file to png or jpg:
convert some.pdf some.png #(or some.jpg)
The image quality was poor, however. I discovered that ImageMagick is using Ghostscript (gs) for PDF conversions, and found a nice example command to run it and get a higher resolution. I don’t know if gs was already on my system or if it was included in the ImageMagick install, but it was ready and willing, and I was able to come up with:
Ghostscript
gs -q -sDEVICE=pngmono -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 -r300 -sOutputFile=test.png test.pdf
pngmono is an option I found by using gs -h to list available devices.
Using resolution (r) = 300 produced a relatively small (< 100KB) png file that prints reasonably well. Nifty. And Ghostscript is licensed under the GNU GPL, which is of course the best free software license.
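To see how the -r value trades file size against print quality, a small loop can help. This is only a sketch: compare_resolutions is a made-up helper name, the list of resolutions is arbitrary, and the gs flags are simply the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helper: render page 1 of a PDF at several resolutions
# and print the size of each resulting PNG.
compare_resolutions() {
    pdf="$1"
    for res in 150 300 600; do            # arbitrary sample resolutions
        out="${pdf%.pdf}-r${res}.png"     # e.g. test-r300.png
        gs -q -sDEVICE=pngmono -dBATCH -dNOPAUSE \
           -dFirstPage=1 -dLastPage=1 "-r${res}" \
           "-sOutputFile=${out}" "$pdf" || return 1
        # wc -c reports the file size in bytes
        printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
    done
}
```

Calling compare_resolutions test.pdf should leave three PNG files next to the PDF and print one line per file, which makes it easy to pick the smallest resolution that still prints acceptably.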
Another benefit in this situation is that gs lets me pick out a single page. The second page of the statement is always the same thing, so I don’t have to bother saving that more than once a year. ImageMagick, for its part, splits multipage PDFs into numbered image files for you. I’m not sure what options are available for Ghostscript on its own; my example will have you running it once for each page, so you might need to do some scripting to make things more convenient when running it directly. I didn’t experiment much, having a very narrow objective.
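One possibility for the scripting mentioned above: Ghostscript accepts a printf-style page number template such as %03d in the output file name, and then writes one file per page in a single run. The sketch below assumes that behavior; the function name and the "-page" file naming are my own inventions, not anything Ghostscript requires.

```shell
#!/bin/sh
# Hypothetical helper: convert every page of a PDF to its own PNG
# in one Ghostscript run, instead of one run per page.
pdf_pages_to_png() {
    pdf="$1"
    prefix="${pdf%.pdf}"                  # statement.pdf -> statement
    # %03d in the output name is replaced by the page number (001, 002, ...)
    gs -q -sDEVICE=pngmono -dBATCH -dNOPAUSE -r300 \
       "-sOutputFile=${prefix}-page%03d.png" "$pdf"
}
```

Running pdf_pages_to_png statement.pdf should produce statement-page001.png, statement-page002.png, and so on, after which the boilerplate second page can simply be deleted.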
Citimortgage Tomfoolery
I ran the older Citimortgage statements that were only 100KB through this command with no complaint, although it didn’t save much on file size. Running the newer, larger files through Ghostscript results in a message like this:
**** Warning: File has a corrupted %%EOF marker, or garbage after %%EOF.
**** Warning: stream Length incorrect.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Xenos D2eVision v2 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
Clearly, something went wrong somewhere along the line in the Citimortgage PDF manufacturing department.
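If you have a pile of statements and want to know which ones make Ghostscript complain, something like the following might do. It is a rough sketch resting on a few assumptions: that the nullpage device (which throws away the rendered output) is available in your gs build, that the warnings always start with four asterisks as in the message above, and that they may arrive on either stdout or stderr. The function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if Ghostscript prints any
# "****" diagnostics while reading the given PDF.
pdf_has_gs_warnings() {
    # nullpage discards the page output; only the diagnostics matter here.
    # 2>&1 merges stderr into stdout so grep sees both streams.
    gs -q -sDEVICE=nullpage -dBATCH -dNOPAUSE "$1" 2>&1 \
        | grep -q '^ *\*\*\*\*'
}
```

A loop such as for f in *.pdf; do pdf_has_gs_warnings "$f" && echo "$f: Ghostscript complained"; done would then list the suspect files.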
In Conclusion
I don’t know if it was worth spending the effort on this quest, but I was happy to add two more tools to my free software toolbox. Of course, I suppose now you’ll tell me that the latest version of Evince will save PDFs as image files.


4 Comments
Another option is stone simple: view the document large enough so you can read all the text and do a screen capture. This makes a super small file and you could even resave the png as a jpg – tiny.
It doesn’t convert to bitmaps, but one of my favorite PDF utilities is a little command line gem called pdftk (PDFToolkit). Say you want to take all your individual bank statement pdf’s and combine them into one pdf file (which I like to do for the year end). Easy. Say you want to take page 3 from one pdf, rotate it 90 deg. do the same with page 12, and save that to a two-page pdf. Etc, etc.
Supposedly the next version of Inkscape, 0.46, due out in early ’08, will also allow you to open a pdf file (single page) and export it as an image file, amongst other things; possibly even re-export it as pdf. As well as do all the drawing things etc. that Inkscape does so well.
Then there is PDFEdit which does even more and in a graphical interface and re-saves to pdf. You can extract text, rotate pages, delete stuff, change the text of a pdf if you don’t like the math! It’s getting better all the time.
19 February 2008 at 11:52 am
Hi, ArtInvent. Thanks for visiting and sharing your thoughts on this.
19 February 2008 at 10:42 pm
Just a thought: Have you tried using gv to look at the PDF files? gv is an old graphical front-end for ghostscript, and it does a reasonably good job on PDF. It’s light-weight, to boot, and you can “print” the file directly to postscript.
Whether or not the postscript file will be smaller than the original PDF is left as an exercise for the reader.
21 February 2008 at 8:54 pm
I hadn’t heard of gv but I’ll give it a try the next time I’m monkeying around with PDF files. Thanks for the tip!
22 February 2008 at 7:23 am