Paper on org-mode and reproducible research

As I was talking recently about reproducible research, I have to post this.

There is a new paper by Eric Schulte, Dan Davison, Thomas Dye and Carsten Dominik. If you haven’t heard of them, you haven’t been on the org-mode mailing list. They could be called the main contributors to org-mode and to the part of org-mode called Babel, without taking credit away from the numerous other contributors.

The paper is called

A Multi-Language Computing Environment for Literate Programming and Reproducible Research

and you can find it at http://www.jstatsoft.org/v46/i03; it is open access.

Here is the abstract:

We present a new computing environment for authoring mixed natural and computer language documents. In this environment a single hierarchically-organized plain text source file may contain a variety of elements such as code in arbitrary programming languages, raw data, links to external resources, project management data, working notes, and text for publication. Code fragments may be executed in situ with graphical, numerical and textual output captured or linked in the file. Export to LaTeX, HTML, LaTeX beamer, DocBook and other formats permits working reports, presentations and manuscripts for publication to be generated from the file. In addition, functioning pure code files can be automatically extracted from the file. This environment is implemented as an extension to the Emacs text editor and provides a rich set of features for authoring both prose and code, as well as sophisticated project management capabilities.

Definitely worth reading: R plays only a small role in it, but the principles are important.

 

Cheers and enjoy life.

Debugging with

I just found these two gems about debugging in R on r-help today (here is the thread):

1) posted by Thomas Lumley:

traceback() gets you a stack trace at the last error

options(warn=2) makes warnings into errors

options(error=recover) starts the post-mortem debugger at any error,
allowing you to inspect the stack interactively.

2) added by William Dunlap:

options(warning.expression=quote(recover()))
will start that same debugger at each warning.

I think these are very useful ideas to remember – thanks.
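To see what options(warn=2) does in practice, here is a minimal sketch. The function safe_sqrt() is made up for this illustration and is not from the r-help thread. The sketch uses tryCatch() so that it also runs in a non-interactive session; recover() itself only makes sense at an interactive prompt, so it appears only in a comment.

```r
# Hypothetical helper (not from the thread): warns on negative input.
safe_sqrt <- function(x) {
  if (any(x < 0)) warning("negative input, returning NaN")
  suppressWarnings(sqrt(x))
}

# Promote warnings to errors and remember the previous setting.
old <- options(warn = 2)

# The warning raised inside safe_sqrt() now arrives as an error,
# so the error handler catches it and returns its message.
result <- tryCatch(
  safe_sqrt(-1),
  error = function(e) conditionMessage(e)
)

# Restore the previous warning level.
options(old)

print(result)

# In an interactive session one could instead set
#   options(error = recover)
# and inspect the call stack at the point of failure.
```

Restoring the old options afterwards keeps the stricter setting from leaking into the rest of the session.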

Cheers, and enjoy life.

Fluxbox and auto-mount

Fluxbox is my window manager of choice, even though I use Ubuntu (Oneiric at the moment – very nice, faster than Natty, and as I am not using Unity, I am happy).

But one thing bothered me after the upgrade: auto-mount of external drives was not working anymore. Under Natty I used nautilus -n to start nautilus in the background and enable auto-mount. But in Oneiric, auto-mounting has been moved from nautilus to the gnome-settings-daemon. So I asked on the fluxbox list and got the tip to try udisks together with udisks-glue. Both are in the Oneiric repo, so

sudo apt-get install udisks-glue

will do the job, and it worked out of the box.

Cheers and enjoy life.

Always put comments in your code!

I have a paper which I wrote some years ago, which has not been finished, and which should be accompanied by an R package. So far nothing special, but at that time I was only at the beginning of my affair with R, and so I made several mistakes (OK – I also did some things right, I hope). One thing which I did not think (or care) about was commenting my code. So now I am sitting in front of about 8 R files with strange names and no comments in them. Now:

What can I do with them?

One advantage: I have graphs, generated by R, in my draft paper, so I can work backwards from the names of the graphs to the scripts which created them, then to the data, and finally (hopefully) get an idea of how my script mess did what it was doing. And hopefully I will be able to do this before retirement (which is still several years away).

Now – what could I have done better at that time? Well, there are several things:

  1. I could have used org-mode. Org-mode enables one to combine documentation and code in a single file. It is literate programming at its best (more will likely follow later). In addition, it can easily be exported to, among others, PDF and HTML, including code and text.
  2. But I used only ESS. Nevertheless, I could have added more comments in the code.

There is always the # in R!!!
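Just to show how little it takes, here is a minimal sketch of a commented function. Everything in it (the function group_means(), the data frame and its column names) is made up for this illustration and has nothing to do with my old scripts.

```r
# Hypothetical example: mean of one column, computed separately per group.
#
# data  : a data frame
# value : name of the numeric column to average (character)
# group : name of the grouping column (character)
#
# Returns a named array with one mean per group.
group_means <- function(data, value, group) {
  # tapply() splits data[[value]] by data[[group]] and applies mean() to each part
  tapply(data[[value]], data[[group]], mean, na.rm = TRUE)
}

# Small made-up data set: two sites, three measurements
d <- data.frame(site = c("a", "a", "b"), height = c(1, 3, 5))

means <- group_means(d, "height", "site")
print(means)
```

The header comment says what goes in and what comes out, and the one inline comment explains the only line that is not obvious. Some years later, that is usually all one needs.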

I am not saying that org-mode would necessarily have saved me (even in org-mode you have to write the documentation and code yourself), but it would have pushed me towards documentation, as the body of the text is the documentation, and you put the code in source blocks. At first glance it sounds strange, but one usually starts with ideas about the code, a structure, notes for algorithms, charts, etc., and all these go into the document. Then, when one starts coding, each code block should already have some text which explains what it should be doing, and voilà, here is the basic documentation.

To execute the code blocks, one can either evaluate them in the document and insert the results, or “tangle” the document, which means extracting the source code into files. As it is possible to define which code block should be extracted into which file, one can create a complex system of resulting R files. These R files can then be sourced from R, running in ESS / Emacs.
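The last step, sourcing the tangled files from R, could look like the following minimal sketch. The directory, the file name and the function add_one() are assumptions made up for this illustration; the sketch writes a stand-in file itself so that it runs without any real tangled output.

```r
# Assumption: tangling wrote its R files into this directory (hypothetical path).
tangled_dir <- file.path(tempdir(), "tangled")
dir.create(tangled_dir, showWarnings = FALSE)

# Stand-in for a file that tangling would normally have produced.
writeLines(
  "add_one <- function(x) x + 1",
  file.path(tangled_dir, "01-helpers.R")
)

# Collect all R files and source them in alphabetical order,
# so that numbered prefixes control the load order.
r_files <- sort(list.files(tangled_dir, pattern = "\\.R$", full.names = TRUE))
for (f in r_files) {
  source(f)
}

print(add_one(1))
```

Numbering the file names is only one way to control the order. With real tangled files, the directory would simply be the one named in the org document.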

The next possible step would then be to put your script files into a package, which would ask for even more documentation. Roxygen will help there, but that is a story for another blog.

So there are many tools which make documenting your R code easier, but you don’t have to use them.

I want to close with a quote from Donald Knuth, “Literate Programming (1984)”, in Literate Programming, CSLI, 1992, p. 99:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

 Cheers and enjoy life.

An intro

Well – everybody has to start sometime, and I will now.

But first some questions which might be asked about this blog:

  • About what: I’ll see – but likely about
    • R - the programming language for stats, but also for simulations, as it can interface easily with C/C++, Fortran, Java, …
    • GIS in the widest sense – I use GRASS and QGIS and GDAL and will possibly have something to say about those
    • org-mode, ESS and Emacs – the best and, for me, by now the only way to write R code
    • Science – as I am a scientist, this is obvious. It will include spatial statistics, alien species, management, general scientific topics, …
    • Open source software – I use Linux (Ubuntu) and am of the opinion that whenever possible, open source software should be used, particularly in research
  • Will I blog regularly?
    • We’ll see, but I don’t think so. Blogging for the sake of blogging is useless. Also: I am lazy.
  • Who am I?
    • Well – let’s leave it at that: I am a scientist who is blogging about the things I mentioned above.
    • Is more info relevant? I will possibly reveal more in later blogs.

So – let’s get started with my first blog about something relevant.