19/05/2011

Professional plotting

I've written and read plenty of technical reports in my time, and one major feature that almost all technical reports have in common is the inclusion of figures. Pick up any ten technical documents and I reckon you'll find at least 5 different ways of including figures; and of these 5 only one will actually look any good. So as I start to produce figures for my PhD I've been starting to look into the best way of managing this initially simple sounding task.

When I say "figure" I'm generally thinking of some kind of plot, or set of plots, usually in 2D; however what I'm going to discuss should apply to most other "technical" "pictures", but probably won't extend to photo-type images (for reasons that will hopefully become obvious).

The Issues
I won't bother going into all the different ways a figure and a report might come together, except to say that at the very bottom of the scale would be a 'picture'/screenshot pasted into a word document - this just looks awful. What I'll work up to will hopefully be the top of the scale.

What I will list is what I see as some major stumbling blocks in figure/report preparation:
  1. Difficulty updating the figure.
  2. Inability to resize the figure to suit the report.
  3. Images (+ text) looking crappy when zoomed in.
  4. Disconnect between the figure labels and the report text.
  5. Difficulty regenerating the figure at a later date.
A lot of these are irrelevant when we're looking at a printed out finished article, so what we really need to understand are the details of how the information is stored and processed on the computer.

Background Details
One of the first important distinctions is the difference between types of image. Most images that one would typically come across on a computer are raster images, these are stored as a set of pixels - imagine a piece of squared paper with each square filled in a different colour. From a distance, or if all the squares are very small, then this looks great; however if we zoom in (e.g. make each square larger) then we start to see the joins between edges and everything starts looking "blocky". Most programs usually handle this by blurring the pixels together slightly, which can help up to a point, but often we just end up with a blurred mess - not what we want in a precise technical report.

The alternative is vector graphics. These are saved more as a description of what needs to be drawn, rather than point-for-point what is on the screen. This means that zooming is purely a mathematical operation, and all the lines will still appear as prefect lines. The same also works for text, which is stored within a vector graphic as actual text, rather than as a picture of it.

There are plenty of graphics explaining this along with a good description in the Wiki pages linked above. But if you're still not sure then try this simple experiment: type a word into a paint program (e.g. Microsoft Paint) and zoom in, and then do the same in a word processing program (e.g. Microsoft Word) - the difference should be pretty obvious.

In summary, unless what your are working with is an actual picture (in which case converting it to vector graphics would be impossible) then you will get best quality out of maintaining it in a vector format. There are plenty of these formats to choose from; however I find them to be surprisingly unsupported in a lot of applications. As my final target format is pdf (as mentioned elsewhere in this blog) I'm going to be working with eps and pdf formats. These both rely on postscript as an underlying process and are therefore fairly compatible.

My process (overview)
With all of the above as my aims I've worked out a basic process for generating figures. It seems to be working fairly well so far, so I'll outline it here:

1) Write a script to produce the figure and save is as an eps file. This means that I can always go back and see how each figure was produced (what the original data was, how it was scaled, etc, etc). If the data changes then I can simply rerun the script and a new figure will be produced. If I need the figure in a different ratio or with different colours (or a different label, etc, etc) then I can make some minor changes to the script and rerun it. I keep the script under version control, but not the eps file it produces (as I can always reproduce this if necessary). I use Matlab for this process as it is what I am most familiar with (although I often store the raw data in an excel or csv file and read this into Matlab). I suspect I could use Gnuplot or something similar instead.

2) Include the eps file in my LaTeX script. This means that when I regenerate the pdf output from my LaTeX it always includes the most recent version of the figure. As it remains as vector graphics throughout the process I get nice clean professional results.

This process solves all of the problems outlined above, except point 4. It is still possible to produce an eps figure from Matlab with "Courier" font and then include it in a Latex document using "Times" font. I find that this looks out of place. I get around this by using a function called matlabfrag, in combination with pstool package for LaTeX. This means that the final figure picks up the font used in the rest of the document. It also allows me to use full LaTeX expressions in my figures.

My process (in detail)
This may get more refined as time goes by, but currently this is the detailed version of how I would produce a figure:
1a) Write a Matlab script to plot the figure as normal. Using standard matlab command to plot and label axes, etc.
1b) Within the script include a call to 'saveFigure.m'. This is a function I have created which accepts a file name and optionally a figure size (otherwise some default values are used), resizes the figure and then calls matlabfrag to save it as an eps file (and an associated tex file including all the labels).
2a) In the LaTeX preamble include '\usepackage{pstool}'. This allows the use of the psfragfig command.
2b) Within my LaTeX include the figure in the normal way. However, instead of using the latex '\includegraphics' command, I replace it with the '\psfragfig' command.

Notes
I can make my 'saveFigure.m' function available to anyone interested, but it doesn't do much more than I have described above!
I have created a slightly revised process for including Simulink models in documents which is a little different that I can discuss if anyone is interested?
I spent a little time trying to get psfrag to play well with eps files produced from other packages - e.g. Google Sketchup, however I don't think I've quite got to the bottom of it yet.

No comments:

Post a Comment