Become a Better R Programmer with the Awesome ‘lobstr’ Package
“Tools amplify your talent. The better your tools, and the better you know how to use them, the more productive you can be.” — Andrew Hunt, The Pragmatic Programmer
A programmer’s primary tool is his or her choice of programming language. When it comes to data science, R has always been my go-to language for building models.
R is one of the most popular programming languages (at least in data science) for a variety of reasons:
- easy-to-use syntax
- elegant visualization/plotting system
- rich ecosystem of packages, and
- rich community support
lobstr is an intuitive R package that has the potential to make you a better programmer. I happened to stumble upon lobstr.
About lobstr
lobstr was designed by the amazing Hadley Wickham in an attempt to help ordinary developers understand R in a better way. In his own words, lobstr provides you the tools to dig deeper into the details of R objects.
lobstr could also be considered an improved version of the str() base-R function.
Installing the package
At the time of writing, lobstr is yet to be published on CRAN, so it can only be installed from GitHub. Please make sure you have the devtools package installed before running the installation code below.
# install.packages("devtools")
devtools::install_github("r-lib/lobstr")
The Different Functions within lobstr
lobstr offers three simple functions:
- ref(): References
- ast(): Abstract Syntax Trees
- cst(): Call Stack Trees
These three functions serve three different purposes. We will look at ref() and ast() in detail and leave out cst() for now, as it is still in the early stages of development and testing.
References — ref()
Have you ever wondered what happens when you assign an existing R object to a new object name? Does R create a new object, doubling the memory used, or does it just create a reference?
ref() will help you understand this. To answer the above questions, let us create a simple numeric vector. We’ll call it simple_vector. Now, let’s create a new list that contains simple_vector twice, and we’ll call this one double_vector.
The reason we are using simple_vector twice in the list double_vector is to check whether R allocates two separate blocks of memory or simply refers back to the original simple_vector.
Finally, we will create another list, triple_vector, using double_vector and simple_vector. Please note that the objects double_vector and triple_vector are of type list (they are not atomic vectors, despite their names).
library(lobstr)

simple_vector <- c(2.0, 3.0, 4.0)
double_vector <- list(simple_vector, simple_vector)
triple_vector <- list(double_vector, simple_vector)

ref(simple_vector)
#> [1:0x7f9ba555aa58] <dbl>

ref(double_vector)
#> █ [1:0x7f9ba26be4c8] <list>
#> ├─[2:0x7f9ba555aa58] <dbl>
#> └─[2:0x7f9ba555aa58]

ref(triple_vector)
#> █ [1:0x7f9ba13fcf08] <list>
#> ├─█ [2:0x7f9ba26be4c8] <list>
#> │ ├─[3:0x7f9ba555aa58] <dbl>
#> │ └─[3:0x7f9ba555aa58]
#> └─[3:0x7f9ba555aa58]
Using the function ref(), we can find the memory address of each R object. As you can see above, 0x7f9ba555aa58 is the address of simple_vector. When you call ref(double_vector), you can see that the list holds two references to that same address. The same happens with ref(triple_vector): triple_vector holds two references, one to double_vector (which in turn refers to simple_vector twice) and one to simple_vector itself.
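The addresses that ref() prints can also be compared programmatically. The following is a minimal sketch, assuming a more recent lobstr release that exports obj_addr() and obj_addrs() (they are not among the three functions described in this post); the variable names vector_addr, element_addrs and shares_memory are made up for illustration.

```r
library(lobstr)

simple_vector <- c(2.0, 3.0, 4.0)
double_vector <- list(simple_vector, simple_vector)

# Address of the object that the name simple_vector points to
vector_addr <- obj_addr(simple_vector)

# Addresses of the objects stored in each element of the list
element_addrs <- obj_addrs(double_vector)

# TRUE when every list element is the original vector rather than a copy
shares_memory <- all(element_addrs == vector_addr)
print(shares_memory)
```

The actual address values will differ from session to session, so it is the comparison, not the printed hexadecimal string, that is worth checking.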
ref() does an excellent job of drawing a tree structure that helps us visualize memory references. With it, we can check whether our code allocates new memory or references existing objects, and so manage memory better while writing code.
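To connect this back to the opening question about plain assignment, here is a small sketch of what happens before and after a copy is modified, and of how little extra memory a list of shared references needs. It assumes a more recent lobstr release that provides obj_addr() and obj_size(); the object names (original, alias, big, pair) are illustrative only.

```r
library(lobstr)

original <- c(2.0, 3.0, 4.0)
alias <- original

# Right after assignment both names point at the same object
same_before <- obj_addr(original) == obj_addr(alias)

# Modifying one of them makes R copy the vector first
alias[[1]] <- 10
same_after <- obj_addr(original) == obj_addr(alias)

# A list holding the same large vector twice stores two references, not two copies
big <- numeric(1e6)
pair <- list(big, big)
size_ratio <- as.numeric(obj_size(pair)) / as.numeric(obj_size(big))

print(c(same_before = same_before, same_after = same_after))
print(size_ratio)
```

If the list held two independent copies, size_ratio would be close to 2; with shared references it should stay very close to 1.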
Abstract Syntax Trees — ast()
As mentioned on Wikipedia,
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language.
As in every programming language, any expression in R can be represented as a syntax tree. Visualizing expressions as ASTs is immensely helpful when developing and testing complex expressions.
Let us put ast() to the test and see what it can do for us. To start off, using ast(1 + 2) to visualize a simple addition reveals that + is the function at the root of the tree, with 1 and 2 passed to it as arguments. When we assign the result of the same expression to a new object, <- becomes the root node and the addition becomes one of its children.
library(lobstr)

# ast
# simple addition
ast(1 + 2)
#> █─`+`
#> ├─1
#> └─2

# simple addition with result assignment
ast(x <- 1 + 2)
#> █─`<-`
#> ├─x
#> └─█─`+`
#>   ├─1
#>   └─2
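The same tree can be walked by hand with base R, which may help when you want to inspect an expression in code rather than read a printout. This is only a sketch of one way to do it: quote() captures the expression without evaluating it, and [[ ]] indexes into the resulting call. The names expr, root, target, rhs and rhs_root are chosen here for illustration.

```r
# Capture the expression without evaluating it
expr <- quote(x <- 1 + 2)

# The first element of a call is the function at the root of the tree
root <- as.character(expr[[1]])

# The remaining elements are its arguments, in order
target <- as.character(expr[[2]])  # the name being assigned to
rhs <- expr[[3]]                   # the nested call 1 + 2

# The nested call has its own root
rhs_root <- as.character(rhs[[1]])

print(c(root = root, target = target, rhs_root = rhs_root))
print(deparse(rhs))
```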
While we can keep using ast() to understand complex expressions, it is also handy for a seemingly trivial (yet confusing) matter: operator precedence.
The expression y <- 2 + 3 * 5 / 9 ^ 2 is difficult to work out by hand in seconds, even though it contains only simple arithmetic operators, because it is not always easy to apply operator precedence in our heads. But here is ast() doing it for us:
# operator precedence
ast(y <- 2 + 3 * 5 / 9 ^ 2)
#> █─`<-`
#> ├─y
#> └─█─`+`
#>   ├─2
#>   └─█─`/`
#>     ├─█─`*`
#>     │ ├─3
#>     │ └─5
#>     └─█─`^`
#>       ├─9
#>       └─2
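One way to double-check a tree like this is to add the parentheses it implies and compare the results in base R. The sketch below does that for the expression above; the names implicit, explicit and naive are illustrative, and naive shows what a strictly left-to-right reading would give instead.

```r
# The expression as written, relying on R's operator precedence
implicit <- 2 + 3 * 5 / 9 ^ 2

# The same expression with the grouping shown by the tree made explicit
explicit <- 2 + ((3 * 5) / (9 ^ 2))

# A naive left-to-right reading that ignores precedence
naive <- (((2 + 3) * 5) / 9) ^ 2

print(identical(implicit, explicit))
print(c(implicit = implicit, naive = naive))
```

Reading the tree from the leaves upward gives the order of evaluation: the power and the multiplication come first, then the division, then the addition, and finally the assignment.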
Amazing, right?
Summary
Thus, with the lobstr functions ref() and ast(), we can become better at R programming: writing memory-efficient code and understanding expression evaluation better. The complete code you saw above is available here and the lobstr documentation can be accessed here.
What is your experience with this package? Let us know in the comments below!