Publication-quality plots with Gnuplot

After careful consideration of the alternatives on different platforms, I’ve concluded that the best software for generating publication-quality plots is Gnuplot. The R suite is much better suited for statistical analysis, but its plot generation capabilities aren’t as flexible and to my eye the results don’t look as good. (Full disclosure: I am mainly speaking of the base plotting library; I have not extensively used other libraries such as lattice or ggplot2.) One can generate publication-quality graphs with Excel, but it is a chore and, most importantly, one cannot easily set the exact sizes of diagrams or generate high-resolution images of graphs.

Now in some ways, the Gnuplot language syntax seems somewhat old-fashioned, but once you learn it (or better yet, learn to write programs that generate it), it does the job well. While the PNG driver in Gnuplot doesn’t generate terribly good-looking output, both the Postscript and SVG drivers work very well. I find them particularly useful for publication-quality scatter diagrams. Generally, you create a file called something.plt containing commands to be passed to ‘gnuplot’ (‘wgnuplot.exe’ on WinXP). To use the postscript terminal to generate an EPS file (best for images around 3″ square), use these commands:

  set terminal postscript eps enhanced size 3in,3in
  set output 'file.eps'

To generate a Postscript file (best for images around 6″-6.5″ square):

  set terminal postscript enhanced size 6in,6in
  set output 'file.ps'

To generate an SVG file (best for images around 4″-6″ square):

  set terminal svg enhanced size 500,500
  set output 'file.svg'

The svg terminal doesn’t seem to accept the ‘in’ unit in its size option, but as a rule of thumb in SVG-land 100=1in. I like svg the most, because the resulting graph may be further annotated with the open-source SVG editor Inkscape. The rest of these instructions therefore assume the svg driver.
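Since I usually think in inches, the conversion can be wrapped in a small helper. This is only a minimal Python sketch: the function name is made up for illustration, and the default of 100 units per inch is the rule of thumb from the text, not something gnuplot itself defines.

```python
def svg_terminal_line(width_in, height_in, units_per_inch=100):
    """Build a 'set terminal svg' line from a size given in inches.

    units_per_inch=100 is the rule of thumb from the text (an assumption);
    adjust it if your SVG viewer or editor scales differently.
    """
    width = round(width_in * units_per_inch)
    height = round(height_in * units_per_inch)
    return f"set terminal svg enhanced size {width},{height}"


print(svg_terminal_line(5, 5))  # set terminal svg enhanced size 500,500
```

The returned string can be written as the first line of a generated .plt file.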

The ‘enhanced’ option to the svg terminal is important. It allows more sophisticated titles to be set:

  set encoding iso_8859_1
  set title 'This is the title of the graph'
  set xlabel 'Molecular weight (kDa)'
  set ylabel 'Solvent-inaccessible surface area x 1000 ({\305}^2)'

‘^’ allows superscripting and ‘_’ subscripting as in LaTeX. Non-ASCII characters may be included as well, using their octal codes in the encoding set above:

{\305}  Å (Angstrom sign)

Controlling the legend (key):

  set nokey          # turn off the legend
  set key top left

Controlling the axis ranges:

  set xrange [0:10]
  set xtics 0,2,10   # set an increment of 2
  set yrange [0:1000]

Adding a function to plot, such as a regression line:

  f(x) = 0.039440 * x + 0.678467
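The coefficients have to come from somewhere. If you are generating the script from a program anyway, one option is to compute them there. The following is a minimal Python sketch of an ordinary least-squares fit; the function names are hypothetical, and this is not necessarily how the numbers above were obtained.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gnuplot_line_definition(slope, intercept, name="f"):
    """Format a fitted line as a gnuplot function definition."""
    return f"{name}(x) = {slope:.6f} * x + {intercept:.6f}"


slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(gnuplot_line_definition(slope, intercept))  # f(x) = 2.000000 * x + 1.000000
```

A negative intercept comes out as ‘+ -1.000000’, which is ugly but still a valid gnuplot expression.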

Now that we have all of the parameters set, we use the plot command to actually generate the graph:

  plot 'data1.dat' using 1:2 w p, \
       'data2.dat' using ($2/1000):($3/1000) w lp pt 13 ps 1.5 lt 1 lc -1 lw 1, \
       f(x)

‘w’ stands for ‘with’. The argument that follows it is the style, which determines how the data are plotted. Some possible styles are:

points(p)       - unconnected points
linespoints(lp) - points connected by lines

Some of the other arguments are:

pointtype(pt)  - controls the appearance of points
pointsize(ps)  - controls the size of points
linetype(lt)   - controls the appearance of lines (solid, dashed, etc.)
linecolor(lc)  - controls the line (and point) color
linewidth(lw)  - controls the width of lines

Unfortunately, the arguments for these options are specified as arbitrary numbers, which are defined by the specific driver. For the SVG driver, these are the pointtypes:

0   dot
1   +
2   x
3   *
4   open square
5   closed square
6   open circle
7   closed circle
8   open upward-pointing triangle
9   closed upward-pointing triangle
10  open downward-pointing triangle
11  closed downward-pointing triangle
12  open diamond
13  closed diamond

The SVG linecolors:

-1  black
0   gray
1   red
2   green
3   blue
4   cyan
5   dark green
6   dark blue
7   orange
8   teal?
9   light green
10  purple
11  light orange
12  magenta
13  yet another green

There do not appear to be any linetypes other than solid for the SVG driver (though that can be corrected in Inkscape). If that is an issue, the postscript terminal does have many kinds of dashed lines. An easy way to see the capabilities of a terminal is to issue the ‘test’ command.
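Because the numeric codes are hard to remember, this is where a program that generates the script earns its keep. Here is a minimal Python sketch that pulls together the settings described above. The function names and the names used for the point types are my own inventions for illustration; only the numbers come from the SVG pointtype table.

```python
# Pointtype numbers for the SVG driver, copied from the table above.
# The dictionary keys are made-up names, not gnuplot keywords.
SVG_POINTTYPES = {
    "dot": 0, "plus": 1, "cross": 2, "star": 3,
    "open square": 4, "closed square": 5,
    "open circle": 6, "closed circle": 7,
    "open up triangle": 8, "closed up triangle": 9,
    "open down triangle": 10, "closed down triangle": 11,
    "open diamond": 12, "closed diamond": 13,
}


def plot_clause(datafile, using="1:2", style="p", point=None, size=None, color=None):
    """Build one comma-separated piece of a gnuplot 'plot' command."""
    parts = [f"'{datafile}'", "using", using, "w", style]
    if point is not None:
        parts += ["pt", str(SVG_POINTTYPES[point])]
    if size is not None:
        parts += ["ps", str(size)]
    if color is not None:
        parts += ["lc", str(color)]
    return " ".join(parts)


def build_script(output, title, xlabel, ylabel, clauses, size=(500, 500)):
    """Assemble a complete gnuplot script as a single string."""
    lines = [
        f"set terminal svg enhanced size {size[0]},{size[1]}",
        f"set output '{output}'",
        "set encoding iso_8859_1",
        f"set title '{title}'",
        f"set xlabel '{xlabel}'",
        f"set ylabel '{ylabel}'",
        # Each clause but the last ends in ", \" so gnuplot continues the line.
        "plot " + ", \\\n     ".join(clauses),
    ]
    return "\n".join(lines) + "\n"


script = build_script(
    "file.svg",
    "This is the title of the graph",
    "Molecular weight (kDa)",
    r"Solvent-inaccessible surface area x 1000 ({\305}^2)",
    [
        plot_clause("data1.dat"),
        plot_clause("data2.dat", using="($2/1000):($3/1000)", style="lp",
                    point="closed diamond", size=1.5, color=-1),
    ],
)
print(script)
```

The printed text can be saved as something.plt and passed to gnuplot. Note that the sketch does no escaping, so a title or file name containing a single quote would break the generated script.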

The images can then be exported with Inkscape at arbitrary resolution to a lossless PNG file. This can also be done at the command line:

  $ inkscape --export-area-drawing --export-png=file.png \
             --export-dpi=300 file.svg

Using the open-source program ‘convert’ from the ImageMagick suite of tools, this can be converted to TIFF or another format, optionally removing the transparency and adding a border:

  $ convert -background "#ffffff00" -flatten -bordercolor "#ffffff00" \
            -border 50x50 input.png output.tiff

‘convert’ may also be used to convert an EPS image to a raster format:

  $ convert -density 300 input.eps -geometry 900x900 output.tiff

It can also convert SVG to raster formats, but I find its output inferior to that of Inkscape.
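If you have many figures, the two export steps are easy to script. The sketch below, in Python, only builds the command lines shown above as argument lists; the function names are hypothetical, and the flags are simply the ones used in this post, which may differ between Inkscape and ImageMagick versions.

```python
import shlex


def inkscape_png_command(svg_path, png_path, dpi=300):
    """Return the Inkscape export command from the text as an argument list."""
    return [
        "inkscape",
        "--export-area-drawing",
        f"--export-png={png_path}",
        f"--export-dpi={dpi}",
        svg_path,
    ]


def convert_tiff_command(png_path, tiff_path, border=50, color="#ffffff00"):
    """Return the ImageMagick 'convert' command from the text as an argument list."""
    return [
        "convert",
        "-background", color, "-flatten",
        "-bordercolor", color,
        "-border", f"{border}x{border}",
        png_path, tiff_path,
    ]


# Print the commands in a form that can be pasted into a shell.
for command in (
    inkscape_png_command("file.svg", "file.png"),
    convert_tiff_command("file.png", "file.tiff"),
):
    print(shlex.join(command))
```

To actually run a command rather than print it, pass the list to subprocess.run(command, check=True).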

On generational navel-gazing

I am exhausted by fluffy, navel-gazing “news” stories that purport to explain and encapsulate an entire generation of Americans. These mostly focus on “Generation Y” or “The Millennials” these days, but comparisons also abound to “Baby Boomers” and “the Greatest Generation.” Almost always they either hyperbolically praise or criticize their subjects, and do so with minimal or no evidence. The critical pieces in particular are frequently written by people not of those generations, which adds a fun extra layer of “you damn kids need to be told what’s wrong with you.”

A perfect case in point is this piece from the Huffington Post: Why Generation Y Yuppies are Unhappy.

Let me paraphrase: a significant portion of Generation Y is arrogant, lazy, entitled and delusional, and thus is unhappy because it is arrogant, lazy, entitled and delusional, so stop it. Also, apparently all Generation Y-ers (or only the ones that count, I guess) are college-educated, middle or upper class, and (presumably) white.

Here’s the thing. People are complicated. Even just in the US, we live in widely varying social, economic and physical environments. We belong to different races and come from different cultures, often more than one of each. But even if you had two people who grew up in the same environment and came from the same cultural groups, that is no guarantee that they will behave identically, because people are complicated! To think one can take a group of diverse Americans who share only one attribute (they were born between two arbitrary dates) and somehow make meaningful inferences about that whole group, or a large portion of it, is ludicrous.

Let me put it another way. Many Generation Y-ers may be arrogant, or lazy, or entitled, or delusional. Or all four. But so are many Generation X-ers. And Baby Boomers. And Greatest Generation-ers. Furthermore, there are members of all generations, Generation Y included, who are humble, hard-working, realistic and grounded. I’ve met them!

So can we stop with these pointless, stupid journalistic exercises please? All you are doing is manufacturing stereotypes.

A little perspective

It has been a stressful week, and as I went down to the hospital cafeteria for lunch today, I was feeling sorry for myself and frustrated with how things in my life have been going lately. How there’s too much stuff to do in too little time. You know the feeling.

Then I walked past a woman pushing one of those wheeled stands full of electronic monitoring devices, bags of IV fluids, tubes with dripping liquids, etc. She was looking down at the base of the stand and there was a tiny little girl in pajamas, no more than 3 years old, wrapped around the pole at the base of the stand, sitting on a colorful pillow and looking up at her mom. I didn’t need to follow the tubes and wires from the monitors to realize she was the one connected to them, because she was completely, unnaturally bald.

And she wanted pancakes.

God, it’s so easy to get trapped in the echo chamber of our own lives, isn’t it? Sometimes it feels like we’re the only ones suffering, when in reality there are people in this world putting up with challenges that are literally unimaginable. How about a little bit of perspective?

Little girl, I don’t know you, but I hope and pray that you and your family find the strength to get through this, and that you live a long and happy life.

And even though it was lunchtime, I hope that you got all the pancakes your little tummy could hold. With whipped cream and strawberries!

Free speech and Koran burning

By now you’ve probably heard of Terry Jones, the preacher in Florida who put the Koran on trial, found it guilty, and “executed” it by burning a copy of the book. Subsequently, riots broke out in Afghanistan in which 16 people have been killed, including 7 U.N. employees. Now Americans, including Sen. Lindsey Graham, are starting to suggest that perhaps Mr. Jones should be prosecuted for his actions.

First things first: the preacher is an idiot. To protest the actions of radical, fundamentalist Muslims, he did something considered to be blasphemous to all Muslims. That’s like protesting the KKK by burning Bibles. He and his congregation are world-class twits for having done what they did.

But how can anybody be considering taking legal action against him? Seriously? How does this not fall squarely into the First Amendment? Freedom of speech extends to people who say things that we–perhaps vehemently–disagree with. Frankly that’s the very point of it.

Some have compared it to shouting “Fire!” in a crowded movie theater. I don’t see the comparison. The panic induced in a theater would result in harm even though the people involved would act (more or less) rationally: trying to get away from the perceived fire. In this case, 16 innocent people were killed because somebody burned a book thousands of miles away. I don’t care how blasphemous it is; that is not rational. To somehow suggest that Mr. Jones should be prosecuted because of what he said, no matter how stupid, is completely contrary to the principles this nation was founded on.

Dear pundits: please shut the &@#$ up.

I have never been more enraged by punditry than by all of the criticism that has been leveled at the President about our intervention in Libya. It feels like thousands of column inches and countless hours of television and radio coverage have been devoted to criticizing the President’s plan (in many cases by the people who pressured the White House to set up a no-fly zone in the first place!). And yet none of the criticisms I have read or watched have provided a single practical, workable alternative.

Guess what? Every possible plan can be criticized one way or another. We do nothing: an abdication of our moral responsibility to prevent the slaughter of innocent civilians. We do a full-scale invasion to depose Gaddafi: a slap in the face to our overtaxed armed forces already fighting two foreign wars. We wait for UN approval: we’re subservient to the whims of the United Nations. We act unilaterally: we’re an out-of-control superpower wanting to impose our colonial ambitions on the world.

So rather than having the courage to advocate and defend an unpleasant plan of action, the pundit-verse is cowardly sniping at the administration’s plan. So to all the pundits out there: do us a favor, would you? Tell us YOUR plan, or else shut the &@#$ up!!

That pesky Constitution, always causing trouble

One thing that has been bothering me about the debate over the new health care bill is that every discussion between lawmakers I’ve heard in the media goes something like this:

Opponent: “Forcing people to buy health insurance is unprecedented and violates the commerce clause of the Constitution.”

Supporter: “It will cover 30 million new people and make health care cheaper and all of my constituents want it.”

Am I missing something? I am really hopeful that the health care bill they’ve passed will result in better and cheaper access to health care, but shouldn’t somebody, y’know, make some sort of cogent argument that it is constitutional? And then tell me what it is?

Here’s the thing: 18 state attorneys general are filing a federal lawsuit challenging the bill. Somebody please tell me the defense in the case has a better argument than “Hooray Obamacare.”

ESWTOTD: “Comprise” vs. “constitute”

A whole comprises its parts:

  • “12 different enzymes comprise the system.” WRONG
  • “The system comprises 12 different enzymes.” RIGHT

while the parts constitute the whole:

  • “The system constitutes 12 different enzymes.” WRONG
  • “12 different enzymes constitute the system.” RIGHT

The word compose is often used in place of comprise, though as with constitute, the parts compose the whole, not vice versa. But by using the “is composed of” construction, the roles of the subject and object are reversed:

  • “The system is composed of 12 different enzymes.” RIGHT

In my experience, this idiom is more common, especially in spoken English, probably due to the confusion about the proper use of comprise.

[Edit: due to broken CSS, wrong sentences were not properly marked wrong. Fixed.]

ESWTOTD: Learning homophones (albeit not for the easily offended)

Here’s a little levity for a Monday morning: the authors of “Learn Your Damn Homophones” are a little angry about the misuse of homophones (words that are pronounced identically but spelled differently). Okay, incredibly angry. But the advice they give is good, and it’s a good and funny read, provided you don’t mind abundant cuss words and barely controlled rage:

http://www.learnyourdamnhomophones.com/

ESWTOTD: Abbreviations, acronyms and initialisms, oh my!

An abbreviation is any shortened form of a word or phrase: “HEPES,” “Ph.D.,” “etc.,” etc.

Initialisms constitute a subset of abbreviations in which the first letters of the words (more or less) of a phrase are combined. Some examples include “Food and Drug Administration” = “FDA,” “adenosine di-phosphate” = “ADP,” and “4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid” = “HEPES.”

In physical science manuscripts, nearly all initialisms should be written in all capital letters without spaces or periods. (There are some exceptions, such as “a.m.,” “r.m.s.d.,” etc.) This is certainly the case if you introduce new initialisms for brevity or clarity. When introducing an initialism, write it out first, followed by the abbreviation in parentheses: e.g. “structural genomics (SG).” Don’t underline or otherwise highlight the letters used in the initialism.* Don’t introduce new or uncommon initialisms unless you will be using the term several times.

Technically, not all initialisms are acronyms, even though in common spoken English the terms are largely used interchangeably. Acronyms constitute the subset of initialisms that are pronounced as a word rather than as a spelled-out list of letters. For example, “AIDS” and “laser” are acronyms, while “ADP” and “NIH” are initialisms.

* Yes, I know I did that in the prior paragraph. Hey! What’s that over there! <runs away>

ESWTOTD: “Bioinformatics”

The field of bioinformatics, namely the use of computing to collect and analyze biological and biochemical information, is exploding in popularity. The word itself, however, is new. The noun “bioinformatics” was coined by researchers Paulien Hogeweg and Ben Hesper in 1978. According to the Merriam-Webster dictionary, it is a special kind of collective noun that is plural in form but singular in construction. This means that although it refers to a group of computational techniques, the group itself is treated as a singular subject:

  • “Bioinformatics are the study of…” WRONG
  • “Bioinformatics is the study of…” RIGHT

There is some debate as to how “bioinformatics” should be used as an adjective. In English, some nouns may be used to modify other nouns in the same way as an adjective:

  • “I studied biology in school.” (noun)
  • “I used a biology textbook in my classes.” (noun modifier)

However, many nouns also have adjective forms, which are preferred in most contexts. For example, the adjective form of “biology” is “biological”:

  • “I did a biology analysis.” (noun modifier)
  • “I did a biological analysis.” (adjective)

While the former is grammatically correct, to my ear the latter sounds more natural and idiomatic. For the noun “bioinformatics,” the Merriam-Webster dictionary identifies “bioinformatic” as the adjective form:

  • “I did a bioinformatics analysis.” (noun modifier)
  • “I did a bioinformatic analysis.” (adjective)

As in the previous example, while I can’t definitively say that the former (noun modifier) form is grammatically incorrect, I prefer the adjective form “bioinformatic” when an adjective is called for.