September 23, 2004

The Perils of PDF: OpenOffice

In our last installment, I promised to relate whether or not the open source office suite OpenOffice would solve my PDF-production woes. Before I answer that, though, I'd like to thank everyone who offered to help in one way or another. I appreciate your offers, and may yet take one or more of you up on them--it's too early to tell, yet. In the meantime, I've printed out a copy of Through Darkest Zymurgia for my brother Charles, so that he'll no longer have an excuse not to have read it.

Now, OpenOffice. Not to prolong the suspense, the answer is "Yes and No, and (finally) No."

OpenOffice has a long and venerable history. I first encountered it when it was a Sun Microsystems product called StarOffice, and I didn't like it much. One of the things I disliked was that it wasn't just an office suite; it wanted to be full-screen with its own desktop. I didn't really need it at that point (almost everybody has standardized on MS Word where I work, and it just wasn't worth being incompatible), so I was just looking at it as a curiosity.

Later on, control was transferred to an open source consortium and the product was renamed OpenOffice. A couple of years ago I looked at an early version running on Unix under X11; it was better than StarOffice had been, but it had some unpleasant effects on my color maps, and I dumped it. (If you don't know what I'm talking about, be happy; if you do, you'll understand.)

But CafePress said it was a reasonable solution, so I went looking. It turns out that there is a freely downloadable distribution of OpenOffice for Mac OS X; the only quirk is that it's not yet a native Aqua app (Aqua is OS X's window manager). Instead, it runs under X11. This, by itself, doesn't bother me; I'm a Unix programmer, I've done a fair amount of GUI programming under X11, and I run X11 programs on my PowerBook all the time. It's not as pretty, but if it gets the job done I'm not one to quibble.

My initial experiences were encouraging. OpenOffice installed easily and without any trouble, and started right up. It's even pretty zippy for a Java application; I'm (very mildly) curious how they managed that. The "Writer" application (the equivalent of MS Word) has a straightforward, easy to understand interface, and inside of an hour I'd produced a short PDF file that printed very nicely indeed.

Still, there were some hurdles. As with Word, using Writer successfully is all about defining styles. Writer has a confusing array of predefined styles, in half-a-dozen different kinds: character styles, paragraph styles, page styles, chapter styles, and several others that I don't remember at the moment. The key to success was clearly going to involve understanding the different flavors and the predefined styles in each flavor, and redefining the predefined styles to suit my needs. In the software world, this is called "Research", and it was clearly going to be a lengthy task.

Still, so far, so good. If Writer's formatting model is complex, the on-line help in the current version is surprisingly good. I don't know how many times I've clicked on the help button in a dialog box, only to get a help page that simply restates the contents of the dialog box. You know--there's a check box that says "Enable Engine Coolant", and you go to the help page and it says "Enables the Engine Coolant". There's never any word on why the software has an Engine, what the Engine does, or why it might possibly need Cooling. But the OpenOffice on-line help actually is fairly helpful, and between that and the information at the OpenOffice web site I'm sure I could have figured things out.

So I started trying to use OpenOffice to write up some tutorial material I'd written about Snit, my Tcl object framework--and ran face first into The Font Problem.

What's The Font Problem? X11. X11 was one of the first widely available windowing systems, and one of the first to do high-quality screen fonts. But that's a long time gone, and the X11 font model is seriously showing its age.

We Windows and Mac users have become accustomed to high-quality fonts that look the same on the screen as they do when printed. It's not even an issue for most of us; we take it for granted. Oh, how gladly I have forgotten the halcyon days of my youth, when TrueType was but a dream and all the cool kids installed Adobe Type Manager! But OpenOffice has the font problem in spades. It tries to solve it by converting the Mac's fonts to a form it can use, but it doesn't do it very well, and this makes font selection a truly difficult task.

OpenOffice makes available to you all of the fonts that it knows about. Some of them are purely screen fonts; the printed output will look different. What You See Is Not What You Get. Some of them are purely printer fonts; the screen will look different. Same problem. Some few work the same on both the printer and the screen--but for many of these, OpenOffice picks up the different styles--bold, italic, and so forth--as being different fonts.

If I picked Palatino, for example, and tried to use bold or italic type, it all looked like normal type on my screen. The PDF output was fine; but it was impossible to tell, while looking at the screen, whether a given piece of text was italicized or not. Not good.

I found, if I recall correctly, two fonts that worked for both screen and printer, and had all their styles, and they weren't fonts I wanted to use.

In other words, I was going to have to spend a fair amount of time learning how to use OpenOffice's formatting system to get the book to look right, just so that I could print it in a font I don't like. No thank you. Add to that the file format issue, and the answer was clearly no, not unless I couldn't find a better alternative.

The file format issue? I'm a programmer. I like plain text files. I have text in plain text files that I wrote when I was in college twenty years ago. I can still read those files. About fifteen years ago, my wife and I put together a "family cookbook" in MS Word; I'm not sure now that it was even Word for Windows. Later, I converted those files to WordPerfect 5.1. Later, well...I've still got those WordPerfect 5.1 files, but I don't have a machine that runs WordPerfect 5.1. What I've got is a text editor called Emacs with which I can open those files, delete all of the special characters, and recover the actual text. It's a pain, but I can do it.
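For readers who would rather script that kind of cleanup than do it by hand in Emacs, here is a minimal sketch in Python. It is not a WordPerfect parser and knows nothing about the actual file format; it simply assumes that the readable text survives as runs of printable ASCII between the binary formatting codes, much as the Unix `strings` tool does. The function name `recover_text` and the `min_len` threshold are made up for illustration.

```python
import re


def recover_text(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII found in a binary blob.

    Assumption: the text we care about is stored as plain ASCII, and
    anything else (formatting codes, headers) can be thrown away.
    Runs shorter than min_len are treated as noise and dropped.
    """
    # Printable ASCII (space through tilde) plus tab, min_len or more in a row.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # A made-up blob standing in for an old word processor file.
    sample = b"\x00\x01Hello, world\xff\xfe\x02ab\x03Recipe: soup\x00"
    for run in recover_text(sample):
        print(run)
```

A sketch like this will keep some junk and lose anything stored in a non-ASCII encoding, so the output still needs a read-through. It is still a pain, just a slightly more automated one.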

I've seen file formats come, and file formats go, and text files go on and on and on. I've no particular desire to marry a project like this to yet another ephemeral file format, unless there's a compelling reason to do so.

So I turned away from OpenOffice, to try a different approach. An older approach. An approach that suits my skills and prejudices.

Don't miss the next installment of The Perils of PDF!

Posted by Will Duquette at September 23, 2004 09:22 AM

steve h said:

I knew that there was another person in the world who preferred plain-text files...

Seriously, I was shocked the first time I accidentally opened a Word 7.0 document in "WordPad" (Word 5.0, bundled with Win9x). I saw a string of gibberish, and then an opening paragraph that I had deleted after my first save.

I didn't migrate immediately to plain-text, but that began the trip. After arriving at a stage where I'd rather do plain-text than anything, I'm wondering if I want to go back and convert all my "formatted" documents to plain-text.

By the way, I'll stay tuned. I've not run into this particular set of issues with OpenOffice, but that's probably because I have a simpler setup. Anyway, I'd love to know some more pitfalls to watch out for.