Become a Better R Programmer with the Awesome ‘lobstr’ Package

“Tools amplify your talent. The better your tools, and the better you know how to use them, the more productive you can be.” — Andrew Hunt, The Pragmatic Programmer

A programmer’s primary tool is their choice of programming language. When it comes to data science, R has always been my go-to language for building models.

R is one of the most popular programming languages (at least in data science) for a variety of reasons:

  • easy-to-use syntax
  • elegant visualization/plotting system
  • rich ecosystem of packages, and
  • rich community support

lobstr is an intuitive R package that has the potential to make you a better programmer. I stumbled upon lobstr while scouring R Infrastructure’s GitHub page for new R packages, and it turned out to be a really useful one!

About lobstr

lobstr was designed by the amazing Hadley Wickham to help ordinary developers understand R better. In his own words, lobstr provides you the tools to dig deeper into the details of R objects.

lobstr can also be thought of as an improved version of the base R function str().
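For comparison, here is roughly what base R’s str() reports for a small nested list. This is a minimal sketch using only base R; the object name nested is made up for illustration. The summary shows types and values, but nothing about where the objects live in memory.

```r
# Base R only; `nested` is a hypothetical example object.
nested <- list(c(2, 3, 4), c(2, 3, 4))

# capture.output() collects what str() would print, one string per line.
str_lines <- capture.output(str(nested))
cat(str_lines, sep = "\n")
#> List of 2
#>  $ : num [1:3] 2 3 4
#>  $ : num [1:3] 2 3 4
```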

Installing the package

At the time of writing, lobstr is yet to be published on CRAN, so it can only be installed from GitHub. Please make sure you have the devtools package installed before running the installation code below.

# install.packages("devtools")
devtools::install_github("r-lib/lobstr")

The Different Functions within lobstr

lobstr offers three simple functions:

  • ref() — References
  • ast() — Abstract Syntax Trees
  • cst() — Call Stack Trees

These three functions serve three different purposes. We will look at ref() and ast() in detail and leave out cst() for now, as it is still in early development and testing.

References — ref()

Have you ever wondered what happens when you assign an existing R object to a new name? Does R create a new object, doubling the memory used, or does it just create a reference?

ref() will help you understand this. To answer the above questions, let us create a simple numeric vector. We’ll call it simple_vector. Now, let’s create a new list from the same simple_vector and we’ll call this one double_vector.

The reason we are using simple_vector twice in the list double_vector is to check if R allocates two different memory spaces, or if it simply refers back to the original simple_vector.

Finally, we will create another list, triple_vector, using double_vector and simple_vector. Please note that, despite their names, double_vector and triple_vector are lists, not atomic vectors.

library(lobstr)
simple_vector <- c(2.0, 3.0, 4.0)
double_vector <- list(simple_vector, simple_vector)
triple_vector <- list(double_vector, simple_vector)
ref(simple_vector)
#> [1:0x7f9ba555aa58] <dbl>
ref(double_vector)
#> █ [1:0x7f9ba26be4c8] <list>
#> ├─[2:0x7f9ba555aa58] <dbl>
#> └─[2:0x7f9ba555aa58]
ref(triple_vector)
#> █ [1:0x7f9ba13fcf08] <list>
#> ├─█ [2:0x7f9ba26be4c8] <list>
#> │ ├─[3:0x7f9ba555aa58] <dbl>
#> │ └─[3:0x7f9ba555aa58]
#> └─[3:0x7f9ba555aa58]

Using ref(), we can find the memory address of an R object. As you can see above, 0x7f9ba555aa58 is the address of simple_vector. Calling ref(double_vector) shows that the list refers back to that same address twice. The same happens with ref(triple_vector): triple_vector holds two references, one to double_vector (which in turn refers to simple_vector twice) and one to simple_vector itself.

ref() does an excellent job of drawing a tree structure that helps us visualize memory references. With it, we can check whether we are allocating new memory in R or referencing existing objects, and so manage memory better while writing code.
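To see when R actually allocates new memory, one can compare addresses before and after modifying a copy. The sketch below assumes a lobstr version that exports obj_addr(), a helper not covered in this article; alias_vector is a name made up for illustration. The addresses themselves differ from session to session, so only the comparisons are shown.

```r
library(lobstr)

simple_vector <- c(2.0, 3.0, 4.0)

# Assigning to a new name binds that name to the same object.
alias_vector <- simple_vector
same_before <- obj_addr(alias_vector) == obj_addr(simple_vector)

# Modifying through one of the names makes R copy the vector first,
# so the two names now point at different objects.
alias_vector[1] <- 10
same_after <- obj_addr(alias_vector) == obj_addr(simple_vector)

print(c(same_before = same_before, same_after = same_after))
```

If the assumption holds, same_before is TRUE and same_after is FALSE, while simple_vector keeps its original values.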

Abstract Syntax Trees — ast()

As mentioned on Wikipedia,

In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language.

As in any programming language, every expression in R can be represented as a syntax tree. Visualizing expressions as ASTs is immensely helpful when developing and testing complex expressions.

Let us put ast() to the test and see what it can do for us. To start off, using ast(1 + 2) to visualize a simple addition reveals that + is the function being called, with 1 and 2 passed to it as arguments. When we assign the result to a new object, <- becomes the root node.

library(lobstr)
# ast
# simple addition
ast(1 + 2)
#> █─`+`
#> ├─1
#> └─2
# simple addition with result assignment
ast(x <- 1 + 2)
#> █─`<-`
#> ├─x
#> └─█─`+`
#>   ├─1
#>   └─2
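The same tree can be inspected programmatically with base R, which may help connect the drawing to the underlying call object. This is a minimal sketch that uses quote() to capture the expression without evaluating it; the variable names are illustrative.

```r
# Capture the expression without evaluating it (no assignment happens).
expr <- quote(x <- 1 + 2)

root   <- as.character(expr[[1]])       # "<-", the function at the root
target <- as.character(expr[[2]])       # "x", the left-hand side
inner  <- as.character(expr[[3]][[1]])  # "+", the function of the nested call

print(c(root = root, target = target, inner = inner))
```

Each element of a call object is either a leaf (a name or a constant) or another call, which is the nesting that ast() draws.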

While we can keep using ast() to understand complex expressions, it is also handy for something seemingly trivial (yet confusing): operator precedence.

The expression y <- 2 + 3 * 5 / 9 ^ 2 is hard to evaluate by hand in seconds, even though it contains only simple arithmetic operators, because it is not always easy to apply operator precedence in our heads. But here is ast() laying out the order of evaluation for us:

# operator precedence
ast(y <- 2 + 3 * 5 / 9 ^ 2)
#> █─`<-`
#> ├─y
#> └─█─`+`
#>   ├─2
#>   └─█─`/`
#>     ├─█─`*`
#>     │ ├─3
#>     │ └─5
#>     └─█─`^`
#>       ├─9
#>       └─2
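One way to double-check the grouping shown in the tree is to rebuild it by hand and compare. The sketch below uses only base R; call() constructs each node explicitly, so no parentheses (which would show up as extra "(" nodes in the tree) are needed. The names implicit and explicit are illustrative.

```r
# The right-hand side as written, captured unevaluated.
implicit <- quote(2 + 3 * 5 / 9 ^ 2)

# The grouping read off the tree, built node by node:
# `+`(2, `/`(`*`(3, 5), `^`(9, 2)))
explicit <- call("+", 2, call("/", call("*", 3, 5), call("^", 9, 2)))

# TRUE when the parser grouped the operators exactly as the tree shows.
same_tree <- identical(implicit, explicit)

# Evaluating the expression gives 2 + 15 / 81.
value <- eval(implicit)
print(same_tree)
print(value)
```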

Amazing, right?

Summary

Thus, with the lobstr functions ref() and ast(), we can become better at R programming by writing memory-efficient code and better understanding how expressions are evaluated. The complete code you saw above is available here and the lobstr documentation can be accessed here.

What is your experience with this package? Let us know in the comments below!