Thursday, 13 June 2013

Practicing static typing in R: Prime directive on trusting our functions with object oriented programming

John Chambers, the creator of the S language from which R is derived, wrote in his book Software for Data Analysis: Programming with R:
...This places an obligation on all creators of software to program in such a way that the computations can be understood and trusted. This obligation I label the Prime Directive.
He was referring to the Prime Directive from Star Trek. One practice in this direction is to put proper checks in place for the types of the arguments we use, so that we can trust our functions to fail gracefully if we pass them, for example, an argument of the wrong type. A type system is therefore quite important in mission-critical numerical computations. Because R is a dynamically typed language, similar to Perl, Python or Matlab/Octave, most R users rarely, if ever, put type checks in their functions. For example, take the following function, which takes a matrix, a vector of column indices and a function as arguments. It applies the given function to each of the matrix columns whose indices are listed in the vector. Assuming the given function returns a single number, our function returns a vector of numbers.

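myMatrixOperation <- function(A, v, fName) {
  # drop = FALSE keeps a one-column selection as a matrix, so apply() still works
  sliceA <- A[, v, drop = FALSE]
  # apply fName to each selected column (MARGIN = 2 means columns)
  apply(sliceA, 2, fName)
}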
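To see what failing ungracefully looks like, here is a small sketch (not part of the original post) that runs the unchecked function first on a deterministic integer matrix and then on a plain vector. It is written with `drop = FALSE` so that a single selected column stays a matrix; the test matrix `A` is illustrative data only.

```r
# Unchecked version; drop = FALSE keeps a single selected column as a matrix
myMatrixOperation <- function(A, v, fName) {
  sliceA <- A[, v, drop = FALSE]
  apply(sliceA, 2, fName)
}

A <- matrix(1:9, nrow = 3, ncol = 3)  # columns: 1:3, 4:6, 7:9
myMatrixOperation(A, c(1, 2), mean)   # column means: 2 5

# A plain vector instead of a matrix: the error is raised inside `[`,
# and its message does not say which of our arguments was wrong
res <- tryCatch(myMatrixOperation(1:9, c(1, 2), mean),
                error = function(e) conditionMessage(e))
res
```

The second call stops with a generic "incorrect number of dimensions" message from the subsetting operator, which is exactly the kind of failure the checks below are meant to replace.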

One obvious approach is to put an if statement for each argument in our function, so the function may look like this:

myMatrixOperation <- function(A, v, fName) {
  # check the type of each argument and stop with an informative message
  if (!is.matrix(A)) {
    stop("A is not a matrix")
  }
  if (!is.vector(v)) {
    stop("v is not a vector")
  }
  if (!is.function(fName)) {
    stop("fName is not a function")
  }
  sliceA <- A[, v, drop = FALSE]
  apply(sliceA, 2, fName)
}
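A quick sketch of how the checked version behaves, assuming the `is.function()` spelling of the check. The helper `errorMessage()` is hypothetical, written only for this demonstration: it returns the error message raised by an expression, or `NA` if the expression succeeds.

```r
myMatrixOperation <- function(A, v, fName) {
  if (!is.matrix(A)) stop("A is not a matrix")
  if (!is.vector(v)) stop("v is not a vector")
  if (!is.function(fName)) stop("fName is not a function")
  sliceA <- A[, v, drop = FALSE]
  apply(sliceA, 2, fName)
}

# Hypothetical helper: evaluate expr lazily inside tryCatch and
# return its error message, or NA if no error occurs
errorMessage <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

A <- matrix(1:9, nrow = 3)
errorMessage(myMatrixOperation(1:9, c(1, 2), mean))  # "A is not a matrix"
errorMessage(myMatrixOperation(A, c(1, 2), "mean"))  # "fName is not a function"
myMatrixOperation(A, 3, max)                         # 9
```

Note that this check rejects a function name such as `"mean"`, even though `apply()` on its own would accept one through `match.fun()`; whether that strictness is wanted is a design choice.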
The problem with this approach is that it is too verbose: if many functions share a repeating pattern of many arguments, we would copy and paste the same checks many times. This not only looks ugly but also wastes our time. Luckily, there is a mechanism to address this: the S4 class system. An S4 class declares the type of each of its slots, and new() checks the supplied values against those types when an object is created. Let's define an S4 class for our set of arguments, followed by an example instantiation.

setClass("mySlice", representation(A="matrix", v="vector", fName="function"))

myS <- new("mySlice", A = matrix(rnorm(9), 3, 3), v = c(1, 2), fName = mean)
str(myS)
Formal class 'mySlice' [package ".GlobalEnv"] with 3 slots
  ..@ A    : num [1:3, 1:3] 0.356 -0.34 -0.642 -0.466 2.915 ...
  ..@ v    : num [1:2] 1 2
  ..@ fName:function (x, ...)
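Because the slots are typed, `new()` already acts as a type check. A minimal sketch, using an illustrative integer matrix, of what happens when a character string is supplied for the function slot:

```r
library(methods)  # provides setClass() and new(); attached by default in R sessions

setClass("mySlice", representation(A = "matrix", v = "vector", fName = "function"))

# Correctly typed slots: the object is created without complaint
ok <- new("mySlice", A = matrix(1:9, 3, 3), v = c(1, 2), fName = mean)

# A character string in the function slot is rejected by new() itself,
# before any of our own functions ever sees the object
msg <- tryCatch(
  new("mySlice", A = matrix(1:9, 3, 3), v = c(1, 2), fName = "mean"),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

The exact wording of the error may differ between R versions, but it names the offending slot and the class it should have.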

Now we can rewrite our function so that it takes a single mySlice object and needs only one type check:
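is.mySlice <- function(obj) {
  # is() returns TRUE for mySlice objects (and any subclass) and FALSE otherwise,
  # including for NULL, where class(obj)[1] == "mySlice" would give logical(0)
  is(obj, "mySlice")
}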

myMatrixOperation <- function(mySliceObject) {
  # a single check replaces the three per-argument checks
  if (!is.mySlice(mySliceObject)) {
    stop("argument is not of class mySlice")
  }
  sliceA <- mySliceObject@A[, mySliceObject@v, drop = FALSE]
  apply(sliceA, 2, mySliceObject@fName)
}
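A self-contained usage sketch of the S4 version. It repeats the class definition and uses an illustrative integer matrix instead of `rnorm()` so the result is reproducible:

```r
library(methods)

setClass("mySlice", representation(A = "matrix", v = "vector", fName = "function"))

# is() is TRUE for mySlice objects (and subclasses), FALSE for anything else
is.mySlice <- function(obj) is(obj, "mySlice")

myMatrixOperation <- function(mySliceObject) {
  if (!is.mySlice(mySliceObject)) {
    stop("argument is not of class mySlice")
  }
  sliceA <- mySliceObject@A[, mySliceObject@v, drop = FALSE]
  apply(sliceA, 2, mySliceObject@fName)
}

myS <- new("mySlice", A = matrix(1:9, 3, 3), v = c(1, 2), fName = mean)
myMatrixOperation(myS)  # means of columns 1 and 2: 2 5

# Anything that is not a mySlice object stops with our own message
msg <- tryCatch(myMatrixOperation(matrix(1:9, 3, 3)),
                error = function(e) conditionMessage(e))
msg
```

The per-argument checks now live in the class definition, so every function that accepts a mySlice object inherits them for free.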
This simple example demonstrates how we can bring good organization to our R code in a way that obeys the Prime Directive. Furthermore, John Chambers introduced a more modern approach to object orientation called Reference Classes. If you practise this kind of approach in your R code, then I can only say: Live long and prosper.
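For comparison, a minimal sketch of the same idea with a Reference Class. The class name `MySliceRC` and its method `applySlice()` are hypothetical, and the `v` field is declared as `"numeric"` here rather than `"vector"`. The point is that fields declared with a class are checked on every assignment, not only when the object is created.

```r
library(methods)

# Hypothetical reference class mirroring mySlice; each field has a declared class
MySliceRC <- setRefClass("MySliceRC",
  fields = list(A = "matrix", v = "numeric", fName = "function"),
  methods = list(
    applySlice = function() {
      # fields A, v and fName are visible directly inside methods
      sliceA <- A[, v, drop = FALSE]
      apply(sliceA, 2, fName)
    }
  )
)

s <- MySliceRC$new(A = matrix(1:9, 3, 3), v = c(1, 2), fName = mean)
s$applySlice()  # 2 5

# Assigning a value of the wrong class to a typed field is an error
msg <- tryCatch({ s$A <- 1:9; "no error" },
                error = function(e) conditionMessage(e))
msg
```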

Monday, 10 June 2013

Ripley Facts

Normally, this blog contains only technical and scientific posts. But this time I would like to share with you a very interesting phenomenon I came across on the R mailing lists. I call it 'Ripley Facts' after the prolific statistician, educator, academic, author and R core developer Professor Brian Ripley. The facts are his replies to questions. I have listed some of my favourite quotes from those replies below. The year of each post is given at the end of each quote:
  • Once you appreciate that you have seriously misread the page, things will become a lot clearer. (2005)
  • You will need to do your homework a lot more carefully, as it seems you don't have enough knowledge to recognise the errors you are making. (2007)
  • Well, don't try to use a Makefile as you do not know what you are doing. (2013)
  • It is user lack-of-understanding: there is no error here. (2013)
So, be careful if you decide to post something that is not well formed on the R mailing lists. You will surely be grilled. Actually, that's not the point. The main take-home message here is to practise self-criticism and the ability to find answers independently before asking for technical help and possibly wasting other people's time. Although in some cultures replies of this sort may be considered offensive, those cultures may simply be unfamiliar with brilliant British humour. If you have a favourite quote from Prof. Ripley, please do let me know.
(c) Copyright 2008-2017 Mehmet Suzen (suzen at acm dot org)

This work is licensed under a Creative Commons Attribution 3.0 Unported License.