Update all user installed R packages – again

And I had to do it again: I am using R installed from Homebrew, and after the upgrade from Mavericks to Yosemite I had to re-install all my packages – or was it a GCC upgrade? I don’t know – but I had to do it again.

I still had the link to Randy Zwitch’s solution, but I think it has some shortcomings. His solution is as follows:

## Get currently installed packages
package_df <- as.data.frame(installed.packages("/Library/Frameworks/R.framework/Versions/2.15/Resources/library"))
package_list <- as.character(package_df$Package)

## Re-install packages
install.packages(package_list)

The shortcomings were:

  • the hard-coded path to the library
  • I don’t like factors…
  • I want (need) to install from source

So I just revised the script slightly and came up with this solution:

    install.packages(
        lib  = lib <- .libPaths()[1],
        pkgs = as.data.frame(installed.packages(lib), stringsAsFactors = FALSE)$Package,
        type = 'source'
    )

Very similar, but, most importantly, the path is not hardcoded.
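One further refinement one might consider (purely a sketch): installed.packages() also lists base and recommended packages, which one usually does not want to re-install from source. The Priority column can be used to filter them out:

```r
## Sketch: re-install only user-installed packages, skipping base/recommended ones.
lib  <- .libPaths()[1]
pkgs <- as.data.frame(installed.packages(lib), stringsAsFactors = FALSE)
user_pkgs <- pkgs$Package[is.na(pkgs$Priority)]   # base/recommended packages have a Priority set
## install.packages(user_pkgs, lib = lib, type = "source")  # commented out – this takes a while!
```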

Hope this helps somebody.

Cheers and enjoy life,


P.S.: The “enjoy life” has become more important to me – a friend died in a helicopter crash and left his wife with two little children. Life is really too short and can end at any time – too precious not to be enjoyed. RIP.

Read GRASS raster directly into R?

There was always one issue which bugged me: why do I have to go through an intermediate format on disk when I want to import a GRASS raster layer into R? At the moment, when I use readRAST6(), the raster layer is exported from GRASS into an intermediate format (I don’t recall which one) on the HDD, this file is then imported into R, and the intermediate file is deleted. This works reliably and reasonably fast, but somehow I don’t like this intermediate file. So my idea is: why not use Rcpp to access the functions in GRASS which read the raster column-wise, and write a function in R which allows one to

  1. read the whole raster from the GRASS raster
  2. read single columns or column ranges from the GRASS raster
  3. read single cells from the GRASS raster
  4. read user-specified blocks from the GRASS raster

Vice versa, there is a C function in GRASS which writes columns to a raster – so it would be possible to

  1. write a whole R raster to GRASS raster
  2. write single columns or column ranges to a GRASS raster
  3. write single cells to a GRASS raster
  4. write user-specified blocks to the GRASS raster
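A sketch of what such an interface could look like from the R side – all function names here are hypothetical, nothing like this exists yet, and a mock matrix stands in for the actual GRASS calls that Rcpp would provide:

```r
## Hypothetical interface sketch – names invented for illustration.
## A real implementation would call into GRASS via Rcpp instead of using this mock.
mock_grass <- matrix(1:12, nrow = 3)        # stand-in for a GRASS raster layer

grassReadRaster <- function(layer) layer                                        # 1. whole raster
grassReadCols   <- function(layer, cols) layer[, cols, drop = FALSE]            # 2. column range
grassReadCell   <- function(layer, row, col) layer[row, col]                    # 3. single cell
grassReadBlock  <- function(layer, rows, cols) layer[rows, cols, drop = FALSE]  # 4. block

grassReadCell(mock_grass, 2, 3)   # -> 8
```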

An example module for grass to read a raster and write it into a new raster is at http://svn.osgeo.org/grass/grass/trunk/doc/raster/r.example/main.c.

And now comes the intriguing part: there is the raster package, which uses a similar mechanism to avoid having to load a whole raster into R memory. If raster were linked to GRASS using these functions, there would be a brilliant backend for working with rasters in R.

These are just ideas, but I am planning to follow them up. Some things need to be considered and thought through:

  1. To compile the modules for GRASS, it might be easiest to write the C code in GRASS so that it gets compiled with GRASS, possibly even becoming part of the binary distribution of GRASS. In this way, one would simply have to load the library in GRASS and call the function to read the raster, and it would make it possible for other programs to use it as well. (In my view, GRASS is missing a simple API for this kind of thing, but that is a different story.)
  2. One could put the C code into an R package and compile it from there, but this might be asking for trouble, as it would be tightly linked to GRASS and dependent on internal changes. So the option of writing a C library as part of GRASS which provides functions to read and write blocks of rasters, and whole rasters, might be the better solution.
  3. The wrapper around the C library would be relatively straightforward using Rcpp.
  4. The R part should be GRASS-version agnostic, i.e. the same code should work independently of the GRASS version. By specifying the path to the GRASS installation, a specific library would be loaded and used. This would make it possible to even switch between different GRASS versions.
  5. It might make sense to split this into two packages: a frontend which defines the functions to be used by the R user, and a backend which supplies the functionality to link these functions to the GRASS backend. This would be similar to the DBI package, which defines the database access functions, with backends which link these to different databases. It would enable a common interface to access spatial data in a GRASS database, a PostgreSQL database, a SpatiaLite database, a directory containing the raster layers in a specific format, …
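Such a DBI-style frontend/backend split could look roughly like this in R – all names below are invented for illustration, and a trivial in-memory backend stands in for a real GRASS backend that would dispatch to compiled code:

```r
## Hypothetical frontend: a generic function the R user would call.
readRasterBlock <- function(backend, layer, rows, cols) {
  UseMethod("readRasterBlock")
}

## Hypothetical backend: an in-memory implementation standing in for GRASS.
readRasterBlock.memory_backend <- function(backend, layer, rows, cols) {
  backend$layers[[layer]][rows, cols, drop = FALSE]
}

be <- structure(list(layers = list(elev = matrix(1:9, nrow = 3))),
                class = "memory_backend")
readRasterBlock(be, "elev", rows = 1:2, cols = 2:3)  # returns a 2 x 2 block
```

A GRASS backend, a PostGIS backend, etc. would then each only have to supply their own methods for the same generics.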

OK – so what are the next steps:

  1. Setting up a github repo where interested parties can contribute and comment: https://github.com/rkrug/grassRLink
  2. Getting input from the GRASS community and what they think about this
  3. Setting up a structure for the package(s), so that a framework is available in which one can do the coding to satisfy the requirements

I don’t think this is something which can (and should!) be done in a rush, as this framework could possibly form a crucial backbone for spatial processing.

And: if this is there, one can do the same for vectors, spatio-temporal data, …

My feeling is that the time is ripe to give R an interface to the spatial GRASS database which can easily be extended to other spatial storage systems, in the same way that DBI is doing this for databases.

So: please give feedback, let me know what you think, if you have suggestions, tell me if this is not going to work (if you think so).

Cheers and enjoy life.

useR 2013

Four days of sunshine, heat and R – isn’t that a dream? Well, I guess for some this would be a nightmare, but that depends on whether you like heat or not. And it was hot: at 21:00 it was still 35 degrees in the sun. So that aspect is covered, and we can move on to the non-controversial part, which is R.

We all know that R is great, and if you had forgotten, you were permanently reminded that it is. OK – several talks highlighted the shortcomings and problems of R (speed, parallelisation, inconsistent (or actually missing) naming conventions), but there was general agreement: R is great.

There were some unlucky ones who had used other statistics packages before (SAS comes to mind…), but fortunately I can count myself among the lucky ones.

So how was this year’s useR in Albacete? Great. I enjoyed it very much (from here as well, a thank you to the organisers and the sponsors), and the talks were overall really interesting and inspiring. Nevertheless, I had the feeling that the talks at the last useR I attended (2011 in Warwick) were a little broader, but it was definitely worth attending and I learned a lot. The tutorials were again brilliant, and the one about Rcpp by Hadley Wickham (and Romain Francois, one of the two authors of Rcpp, the other being Dirk Eddelbuettel) was outstanding. The second one I attended, on spatial analysis in R, given by Roger Bivand (one of the authors of the sp package, the core of nearly all spatial packages in R), was, although not as hands-on as the one on Rcpp, extremely informative – although I have been using sp and spgrass6 for several years already, I learned many new and useful things, and came away with some ideas about the R–GRASS interface and how to get data from GRASS into R (see my post Read GRASS raster directly into R?).

The invited talks were, for me as a non-statistician, a little bit too mathematical, as most of them dealt with quite technical aspects of statistical (mostly Bayesian) analysis. The exceptions were the talks by Duncan Murdoch, one of the R core team members and THE Windows R core team member, who presented news in R 3.0.x and the way forward, and by Hadley Wickham (one of the “R Rock Stars”).

So what are my take home messages from this useR in Albacete?

  1. The Beatles are fantastic, and now we know why
  2. there are other implementations of the R language apart from GNU R, but they are not yet ready for use. They promise to be faster and more memory-efficient than GNU R
  3. Bayes is everywhere, especially where you least expect him to be, and he is getting faster!
  4. brogramming is not a spelling error but a life style
  5. either use lowerCamelCase or underscore_separated_function_names (Hadley is watching you!) but Do.notMixandmatch
  6. I have to improve on my C++!!!!!!!!!!!!

And if there is only one you remember, remember this:

R is great!!!

Cheers and enjoy life (and R).

Paper on org-mode and reproducible research

As I was talking recently about reproducible research, I have to post this.

A new paper by Eric Schulte, Dan Davison, Thomas Dye and Carsten Dominik. If you haven’t heard of them, you haven’t been on the org-mode mailing list. They could be called the main contributors to org-mode and to the part of org-mode called Babel, without taking credit away from the numerous other contributors.

The paper is called

A Multi-Language Computing Environment for Literate Programming and Reproducible Research

and you can find it at http://www.jstatsoft.org/v46/i03 and it is open access.

Here is the abstract:

We present a new computing environment for authoring mixed natural and computer language documents. In this environment a single hierarchically-organized plain text source file may contain a variety of elements such as code in arbitrary programming languages, raw data, links to external resources, project management data, working notes, and text for publication. Code fragments may be executed in situ with graphical, numerical and textual output captured or linked in the file. Export to LaTeX, HTML, LaTeX beamer, DocBook and other formats permits working reports, presentations and manuscripts for publication to be generated from the file. In addition, functioning pure code files can be automatically extracted from the file. This environment is implemented as an extension to the Emacs text editor and provides a rich set of features for authoring both prose and code, as well as sophisticated project management capabilities.

Definitely worth reading; even though R only plays a small role in it, the principles are important.


Cheers and enjoy life.

Debugging with

I just found these two gems about debugging in R on r-help today (here is the thread):

1) posted by Thomas Lumley:

traceback() gets you a stack trace at the last error

options(warn=2) makes warnings into errors

options(error=recover) starts the post-mortem debugger at any error,
allowing you to inspect the stack interactively.

2) added by William Dunlap:

options(warn=2, error=recover)

will start that same debugger at each warning.
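These settings are easy to try out; here is a minimal, self-contained demonstration (the functions are just for illustration):

```r
f <- function(x) g(x)
g <- function(x) as.numeric("not a number")  # raises a warning: NAs introduced by coercion

## By default this only emits a warning:
f(1)

## With warn = 2 every warning becomes an error, so the post-mortem tools apply:
options(warn = 2)
res <- try(f(1), silent = TRUE)
inherits(res, "try-error")   # -> TRUE
## After an uncaught error, traceback() would show the call stack: g(x) called from f(1)
options(warn = 0)            # restore the default
```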

I think these are very useful ideas to remember – thanks.

Cheers, and enjoy life.

Always put comments in your code!

I have a paper which I wrote some years ago, which was never finished, and which should be accompanied by an R package. So far nothing special, but at that time I was only at the beginning of my affair with R, and so I made several mistakes (OK – I also did some things right – I hope). One thing which I did not think about (or care about) was commenting my code. So now I am sitting in front of about 8 R files with strange names and no comments in them. Now:

What can I do with them?

One advantage: I have graphs, generated by R, in my draft paper – so I can trace back from the names of the graphs to the scripts which created them, then to the data, and finally (hopefully) get an idea of how my script mess did what it was doing – and hopefully I will manage this before retirement (which is still several years away).

Now – what could I have done better at that time? Well, there are several things:

  1. I could have used org-mode. Org-mode enables one to combine documentation and code in a single file. It is literate programming at its best (more on this will likely follow later). In addition, it can easily be exported to, among others, PDF and HTML, including code and text.
  2. But I only used ESS. Nevertheless, I could have added more comments in the code.

There is always the # in R!!!

I am not saying that org-mode would necessarily have saved me (even in org-mode you have to write the documentation and code yourself), but it would have pushed me towards documentation, as the body of the text is the documentation, and you put the code in source blocks. At first glance this sounds strange, but one usually starts with ideas about the code, a structure, notes on algorithms, charts, etc., and all of these go into the document. And then one starts coding. And for each code block, there should already be some text which explains what it should be doing – and voilà, here is the basic documentation.

To execute the code blocks, one can either evaluate them in the document and insert the results, or “tangle” the document, which means extracting the source code into files. As it is possible to define into which file each code block should be extracted, one can create a complex system of resulting R files. And these R files can then be sourced from R, running in ESS / Emacs.
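A minimal example of what such an org-mode file could look like (the file and variable names are made up); the :tangle header tells org-babel which file the block should be extracted to:

```org
* Load and summarise the data
The block below reads the raw data; tangling writes it to =load-data.R=.

#+begin_src R :tangle load-data.R
  dat <- read.csv("field-data.csv")  # hypothetical input file
  summary(dat)
#+end_src
```

The surrounding prose becomes the documentation, and tangling or exporting gives you the plain R script or the formatted report, respectively.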

The next possible step would then be to put your script files into a package, which would ask for even more documentation. And then Roxygen will help – but that might be told in another blog post.
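As a taste of what that looks like, a Roxygen-style comment block sits directly above the function it documents (the function itself is just an illustration):

```r
#' Multiply a vector by a constant
#'
#' @param x a numeric vector
#' @param k a single numeric constant
#' @return the vector x scaled by k
#' @export
scaleBy <- function(x, k) {
  x * k
}
```

From these comments, Roxygen generates the help pages and the NAMESPACE entries of the package.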

So there are many tools which make documenting your R code easier, but you don’t have to use them.

I want to close with a quote from Donald Knuth, “Literate Programming (1984)”, in Literate Programming, CSLI, 1992, p. 99:

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: “Literate Programming.”

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

Cheers and enjoy life.