Publication-quality plots with Gnuplot

After careful consideration of the alternatives on different platforms, I’ve concluded that the best software for generating publication-quality plots is Gnuplot. The R suite is much better suited for statistical analysis, but its plot generation capabilities aren’t as flexible and, to my eye, the results don’t look as good. (Full disclosure: I am mainly speaking of the base plotting library; I have not extensively used some of the other libraries, such as lattice or ggplot2.) One can generate publication-quality graphs with Excel, but it is a chore, and most importantly, one cannot easily set the exact sizes of diagrams or generate high-resolution images of graphs.

Now in some ways, the Gnuplot language syntax seems somewhat old-fashioned, but once you learn it (or better yet, learn to write programs that generate it), it does the job well. While the PNG driver in Gnuplot doesn’t generate terribly good-looking output, both the PostScript and SVG drivers work very well. I find them particularly useful for publication-quality scatter diagrams. Generally, you create a file called something.plt containing commands to be passed to ‘gnuplot’ (‘wgnuplot.exe’ on WinXP). To use the postscript terminal to generate an EPS file (best for images around 3″ square), use these commands:

  set terminal postscript eps enhanced size 3in,3in
  set output 'file.eps'

To generate a Postscript file (best for images around 6″-6.5″ square):

  set terminal postscript enhanced size 6in,6in
  set output 'file.ps'

To generate an SVG file (best for images around 4″-6″ square):

  set terminal svg enhanced size 500,500
  set output 'file.svg'

The svg terminal doesn’t seem to accept the ‘in’ argument to the size option, but in SVG-land 100=1in. I like svg the most, because the resulting graph may be further annotated with the open-source SVG editor Inkscape. The instructions that follow therefore assume the svg driver.
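The 100-units-per-inch rule of thumb can be wrapped in a small helper if you generate your Gnuplot input from a program. This is only a sketch in Python; the function name and the `units_per_inch` parameter are my own inventions for illustration, and the default of 100 is the rule of thumb stated above rather than anything taken from the Gnuplot documentation.

```python
def svg_terminal_line(width_in, height_in, units_per_inch=100):
    """Return a 'set terminal svg' line for a size given in inches.

    units_per_inch=100 is the rule of thumb from the text (assumption),
    not a value taken from the gnuplot documentation.
    """
    width = round(width_in * units_per_inch)
    height = round(height_in * units_per_inch)
    return f"set terminal svg enhanced size {width},{height}"


# A 5in x 5in plot under the 100 = 1in assumption.
print(svg_terminal_line(5, 5))
```

Keeping the conversion factor as a parameter makes it easy to change if your SVG viewer or editor assumes a different number of units per inch.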

The ‘enhanced’ option to the svg terminal is important. It allows more sophisticated titles to be set:

  set encoding iso_8859_1
  set title 'This is the title of the graph'
  set xlabel 'Molecular weight (kDa)'
  set ylabel 'Solvent-inaccessible surface area x 1000 ({\305}^2)'

‘^’ allows superscripting and ‘_’ subscripting as in LaTeX. Non-ASCII characters may be included as well, written as octal character codes in the encoding chosen with ‘set encoding’:

{\305}  Å (angstrom)

Controlling the legend (key):

  set nokey          # turn off the legend
  set key top left   # or place the legend in the top left corner

Controlling the axis ranges:

  set xrange [0:10]
  set xtics 0,2,10   # set an increment of 2
  set yrange [0:1000]

Adding a function to plot, such as a regression curve:

  f(x) = 0.039440 * x + 0.678467
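If you compute the regression yourself, the coefficients can be dropped straight into a function definition like the one above. What follows is a minimal sketch of an ordinary least-squares line fit in plain Python. The function names and the sample points are made up for illustration, and this is not necessarily how the coefficients shown above were obtained.

```python
def linear_fit(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def gnuplot_line_function(slope, intercept, name="f"):
    """Format the fitted line as a gnuplot function definition."""
    return f"{name}(x) = {slope:.6f} * x + {intercept:.6f}"


# Hypothetical data points, for illustration only.
sample = [(0, 1), (1, 3), (2, 5)]
slope, intercept = linear_fit(sample)
print(gnuplot_line_function(slope, intercept))
```

A negative intercept comes out as ‘+ -1.000000’, which Gnuplot parses fine. Gnuplot also has a built-in ‘fit’ command, if you would rather keep everything in one place.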

Now that we have all of the parameters set, we use the plot command to actually generate the graph:

  plot 'data1.dat' using 1:2 w p, \
       'data2.dat' using ($2/1000):($3/1000) w lp pt 13 ps 1.5 lt 1 lc -1 lw 1, \
       f(x)

‘w’ stands for ‘with’. The argument that follows it is the plotting style, which determines how the data are drawn. Some possible styles are:

points(p)       - unconnected points
linespoints(lp) - points connected by lines

Some of the other arguments are:

pointtype(pt)  - controls the appearance of points
pointsize(ps)  - controls the size of points
linetype(lt)   - controls the appearance of lines (solid, dashed, etc.)
linecolor(lc)  - controls the line (and point) color
linewidth(lw)  - controls the width of lines

Unfortunately, the arguments for these options are specified as arbitrary numbers, which are defined by the specific driver. For the SVG driver, these are the pointtypes:

0   dot
1   +
2   x
3   *
4   open square
5   closed square
6   open circle
7   closed circle
8   open upward-pointing triangle
9   closed upward-pointing triangle
10  open downward-pointing triangle
11  closed downward-pointing triangle
12  open diamond
13  closed diamond

The SVG linecolors:

-1  black
0   gray
1   red
2   green
3   blue
4   cyan
5   dark green
6   dark blue
7   orange
8   teal?
9   light green
10  purple
11  light orange
12  magenta
13  yet another green
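Because these numbers are hard to remember, a generating program can hide them behind names. The sketch below is one way to do that in Python. The dictionary keys and the function name are my own labels, the numbers are copied from the tables above, and I have left out the colors I could not name with confidence.

```python
# Point types and line colors as listed above for the SVG driver.
# The key names are hypothetical labels chosen for readability.
SVG_POINTTYPES = {
    "dot": 0, "plus": 1, "cross": 2, "star": 3,
    "open square": 4, "closed square": 5,
    "open circle": 6, "closed circle": 7,
    "open up triangle": 8, "closed up triangle": 9,
    "open down triangle": 10, "closed down triangle": 11,
    "open diamond": 12, "closed diamond": 13,
}
SVG_LINECOLORS = {
    "black": -1, "gray": 0, "red": 1, "green": 2, "blue": 3,
    "cyan": 4, "orange": 7, "purple": 10, "magenta": 12,
}
STYLES = {"points": "p", "linespoints": "lp"}


def with_clause(style, point=None, size=None, color=None, width=None):
    """Build the 'w ...' part of a plot item from readable names."""
    parts = ["w", STYLES[style]]
    if point is not None:
        parts += ["pt", str(SVG_POINTTYPES[point])]
    if size is not None:
        parts += ["ps", str(size)]
    if color is not None:
        parts += ["lc", str(SVG_LINECOLORS[color])]
    if width is not None:
        parts += ["lw", str(width)]
    return " ".join(parts)


print(with_clause("linespoints", point="closed diamond",
                  size=1.5, color="black", width=1))
```

Since the numbers are driver-specific, a table like this would need to be redone for the postscript terminal.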

The SVG driver does not appear to offer any linetype other than solid (though that can be corrected in Inkscape). If that is an issue, the postscript terminal does have many kinds of dashed lines. An easy way to see the capabilities of a terminal is to issue the ‘test’ command.
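Earlier I suggested writing programs that generate Gnuplot input. As a rough sketch of what that can look like, the Python below assembles the svg commands covered so far into a single script. The function and parameter names are hypothetical, and the file names and labels are just the examples used above.

```python
def build_plot_script(output, title, xlabel, ylabel, plot_items, definitions=()):
    """Assemble a gnuplot script for the svg terminal as one string.

    plot_items  -- strings such as "'data1.dat' using 1:2 w p"
    definitions -- optional lines such as "f(x) = 0.039440 * x + 0.678467"
    Labels are wrapped in single quotes, so they must not contain one.
    """
    lines = [
        "set terminal svg enhanced size 500,500",
        f"set output '{output}'",
        "set encoding iso_8859_1",
        f"set title '{title}'",
        f"set xlabel '{xlabel}'",
        f"set ylabel '{ylabel}'",
    ]
    lines.extend(definitions)
    # gnuplot continues a command onto the next line only when the line
    # ends with a backslash, so every plot item but the last gets one.
    lines.append("plot " + ", \\\n     ".join(plot_items))
    return "\n".join(lines) + "\n"


script = build_plot_script(
    output="file.svg",
    title="This is the title of the graph",
    xlabel="Molecular weight (kDa)",
    ylabel=r"Solvent-inaccessible surface area x 1000 ({\305}^2)",
    plot_items=[
        "'data1.dat' using 1:2 w p",
        "'data2.dat' using ($2/1000):($3/1000) w lp pt 13 ps 1.5 lt 1 lc -1 lw 1",
        "f(x)",
    ],
    definitions=["f(x) = 0.039440 * x + 0.678467"],
)
print(script)
```

The printed text can be saved as something.plt and passed to ‘gnuplot’ as described at the top. Letting the program add the trailing backslashes avoids the easy mistake of leaving one off a continued plot command.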

The images can then be exported with Inkscape at arbitrary resolution to a lossless PNG file. This can also be done at the command line:

  $ inkscape --export-area-drawing --export-png=file.png \
             --export-dpi=300 file.svg

Using the open-source program ‘convert’ from the ImageMagick suite of tools, this can be converted to TIFF or another format, optionally removing the transparency and adding a border:

  $ convert -background "#ffffff00" -flatten -bordercolor "#ffffff00" \
            -border 50x50 input.png output.tiff

‘convert’ may also be used to convert an EPS image to a raster format:

  $ convert -density 300 input.eps -geometry 900x900 output.tiff

It can also convert SVG to raster formats, but I find its output inferior to that of Inkscape.

Graduate school… of science!

I am a big fan of Penelope Trunk, a business writer, of all things, who writes a column called the “Brazen Careerist”. Yes, I know that I don’t work in business. (Yet.) But I think she always has interesting things to say about how to get ahead in one’s career. And I do have one of those. Okay, maybe I don’t have one of those, but I want one.

Anyway, she wrote a column a few weeks ago about how regular, focused blogging is good for your career. Some of her arguments are frankly a stretch, but she makes a good point overall. Alas, my own blog is neither regular, nor really focused on what I (plan to) do for a living, so I fail on both counts.

In fact, I’m not sure I’ve really written about what I do in my blog at all. I have a little biographical blurb on my main page, but that’s it. Long story short, I’m a graduate student in Biophysics, trying to finish up my Ph.D. on a topic relating to macromolecular crystallography. I have been in grad school for a long time. If you’ve ever read the comic Piled Higher and Deeper, I am essentially Mike Slackenerny, save that I am more suspicious of free food. Read some of the comics he’s in; that will give you a healthy dose of what my life is like.

I do honestly enjoy what I do. The work is interesting, and I spend the vast majority of my time doing it. It’s just that I often fail the “dinner party” test. I can explain what I do to laypeople, but not always quickly enough before they get bored and wander off in search of more hors d’oeuvres. Since I suspect that the vast majority of you out there in readerland aren’t biophysicists, I hesitate to go into the topic.

But I do want to find my focus, and so I’m going to try to talk a little more at a layperson level in this space about the kind of research that I do. Don’t worry; the rants aren’t going away, and I have a beer-related project I’d also like to talk about in upcoming posts. But I hope you’ll bear with me as I conduct a little experiment.