Super Fast Crash Course in R (for developers)

As a developer, you can pick up R super fast.

If you are already a developer, you don’t need to know much about a new language to be able to read and understand code snippets and write your own small scripts and programs.

In this post you will discover the basic syntax, data structures and control structures that you need to know to start reading and writing R scripts.

Let’s get started.

R Crash Course For Developers
Photo by hackNY.org, some rights reserved.

R Syntax is Different, But The Same

The syntax in R can look confusing, but only at first.

It is an older, Lisp-influenced language derived from an even older language (S). The assignment syntax is probably the strangest thing you will see: idiomatic R assigns with the arrow (<-) rather than a single equals (=).

R has all of your familiar control flow structures, like if-then-else, for loops and while loops.

You can create your own functions and libraries of helper functions for your scripts.

If you have done any scripting before, like JavaScript, Python, Ruby, BASH or similar, then you will pick up R very quickly.

You Can Already Program, Just Learn the R Syntax

As a developer, you already know how to program.

You can take a problem and think up the type of procedure and data structures you need. The language you are using is just a detail. You only need to map your idea of the solution onto the specifics of the language you are using.

This is how you can get started using R very quickly.

To get started, you need to know the absolute basics, such as:

  • How do we assign data to variables?
  • How do we work with different data types?
  • How do we work with the data structures for handling data?
  • How do we use the standard flow control structures?
  • How do you work with functions and third-party packages?

You learn the answers to these questions by looking at code examples. You can then:

  • Map third party code you’re reading onto those examples to better understand them.
  • Model the code you write from scratch on the examples.

Let’s take a quick tour of the basic syntax of R.

R Crash Course For Developers (Start Here)

In this section we will take a quick look at the basic syntax used in R.

After reading (and ideally working through) the examples in this section, you will have enough background as a developer to start reading and understanding other people's R code.

You will also have the confidence to start writing your own small R scripts.

The examples in this section are split into the following sections:

  1. Assignment
  2. Data Structures
  3. Flow Control
  4. Functions
  5. Packages

Start the R interactive environment (type R on the command line) and let’s get started.

1. Assignment

The key to assignment in R is the arrow operator (<-).

Below are examples of assigning an integer, double, string and a boolean, and printing each out to the console in turn.

```r
# integer (the L suffix makes an integer; a plain 23 would be a double)
i <- 23L
i
#> [1] 23
# double
d <- 2.3
d
#> [1] 2.3
# string
s <- 'hello world'
s
#> [1] "hello world"
# boolean
b <- TRUE
b
#> [1] TRUE
```

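Remember to use the arrow (<-) for assignment. A single equals (=) also assigns at the top level, but by convention it is reserved for naming arguments in function calls, and confusing the two is a common mistake for new R programmers.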
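To make the convention concrete, here is a small sketch of how the two operators behave in practice. The variable names x, y, m1 and m2 are arbitrary choices for illustration.

```r
# the arrow creates (or overwrites) a variable in the current environment
x <- c(4, 8, 15)

# inside a function call, = names an argument; it does not create a variable
m1 <- mean(x = x)

# by contrast, <- inside a call assigns in the calling environment as a side effect
m2 <- mean(y <- c(1, 2, 3))

print(m1)
#> [1] 9
print(y)  # y now exists because of the <- inside the call above
#> [1] 1 2 3
```

The second call works, but hiding an assignment inside an argument list is easy to miss when reading code, which is one reason many style guides keep = for arguments and <- for assignment.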

2. Data Structures

There are four data structures that you will use the most in R:

  1. Vectors
  2. Lists
  3. Matrices
  4. Data Frames

Lists

Lists provide a group of named items, not unlike a map.

```r
# create a list of named items
a <- list(aa = 1, bb = 2, cc = 3)
a
a$aa
# add a named item to a list
a$dd <- 4
a
```

You can define a new list with the list() function. A list can be initialized with values or empty. Note that the named values in the list can be accessed using the dollar operator ($). Once referenced, they can be read or written. This is also how new items can be added to the list.
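As a further sketch of the points above, the snippet starts from an empty list, adds items with $, overwrites one, and reads an item whose name is held in a variable using double brackets ([[ ]]). The names config, name, epochs and key are hypothetical.

```r
# start with an empty list and add named items with $
config <- list()
config$name <- "model-a"
config$epochs <- 10

# read an existing item and write a new value back
config$epochs <- config$epochs + 5

# [[ ]] looks up an item by a name stored in a variable
key <- "name"
config[[key]]
#> [1] "model-a"

# list the item names and count the items
names(config)
length(config)
#> [1] 2
```

Note that the items in a list can have different types (here a string and a number).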

Vectors

Vectors are ordered sequences of values that all share the same type (if you mix types, R coerces them to a common type):

```r
# create a vector using the c() function
v <- c(98, 99, 100)
v
#> [1]  98  99 100
v[1:2]
#> [1] 98 99
# create a vector from a range of integers
r <- 1:10
r
#> [1]  1  2  3  4  5  6  7  8  9 10
r[5:10]
#> [1]  5  6  7  8  9 10
# add a new item to the end of a vector
v <- c(1, 2, 3)
v[4] <- 4
v
#> [1] 1 2 3 4
```

Notice that vectors are 1-indexed (indexes start at 1, not 0).
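Because 1-based indexing trips up developers coming from 0-based languages, here is a short sketch of how R indexing behaves at the edges. The vector v is just sample data.

```r
v <- c(10, 20, 30, 40)

v[1]                            # first element (indexing starts at 1)
#> [1] 10
v[length(v)]                    # last element
#> [1] 40
v[-1]                           # a negative index drops that element
#> [1] 20 30 40
v[c(TRUE, FALSE, TRUE, FALSE)]  # a logical vector selects matching positions
#> [1] 10 30
v[0]                            # index 0 selects nothing
#> numeric(0)
v[5]                            # past the end returns NA rather than an error
#> [1] NA
```

The last two cases are worth remembering: an off-by-one index in R usually yields an empty vector or NA silently instead of raising an error.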

You will use the c() function a lot to concatenate variables into a vector.
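A small sketch of c() in action: it flattens and concatenates vectors, and when given mixed types it coerces everything to the most general type so that the result is still a single-type vector. The names a, b, ab and mixed are illustrative.

```r
# c() flattens and concatenates vectors
a <- c(1, 2)
b <- c(3, 4)
ab <- c(a, b, 5)
ab
#> [1] 1 2 3 4 5

# mixing a number, a string and a boolean coerces all of them to character
mixed <- c(1, "two", TRUE)
class(mixed)
#> [1] "character"

# numbers and booleans combine as numbers (TRUE becomes 1, FALSE becomes 0)
c(1, TRUE, FALSE)
#> [1] 1 1 0
```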

Matrices

A matrix is a table of data. It has dimensions (rows and columns) and the columns can be named.

```r
# create a 2-row, 3-column matrix with named column headings
data <- c(1, 2, 3, 4, 5, 6)
headings <- list(NULL, c("a", "b", "c"))
m <- matrix(data, nrow = 2, ncol = 3, byrow = TRUE, dimnames = headings)
m
#>      a b c
#> [1,] 1 2 3
#> [2,] 4 5 6
m[1, ]
#> a b c
#> 1 2 3
m[, 1]
#> [1] 1 4
```

Many useful plotting functions and machine learning algorithms require the data to be provided as a matrix.

Note the syntax to index into rows [1,] and columns [,1] of a matrix.
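Building on that indexing syntax, the sketch below rebuilds the same matrix and shows a few more common operations: querying its dimensions, picking single elements and named columns, and doing element-wise and matrix arithmetic.

```r
# rebuild the matrix from the example above
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

dim(m)              # rows, then columns
#> [1] 2 3
m[2, 3]             # a single element: row 2, column 3
m[, "b"]            # a column selected by its name
#> [1] 2 5
m[1, c("a", "c")]   # part of a row
t(m)                # transpose: a 3-row, 2-column matrix
m * 2               # arithmetic is applied element by element
m %*% t(m)          # matrix multiplication: (2 x 3) times (3 x 2) gives 2 x 2
```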

Data Frame

Data frames are useful for actually representing tables of your data in R.

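```r
# create a new data frame
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)
# select the first column by position
df[, 1]
#> [1] 1980 1985 1990
# select the same column by name
df$years
#> [1] 1980 1985 1990
```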

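A matrix is a much simpler structure, intended for mathematical operations, and all of its elements must share a single type. A data frame is better suited to representing a table of data, because each column can have its own type, and it is the format expected by modern implementations of machine learning algorithms in R.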

Note that you can index into rows and columns of a data frame just like you can for a matrix. Also note that you can reference a column using its name (df$years).
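Here is a sketch of a few everyday data frame operations on the same data: counting rows, selecting a row, filtering rows with a condition, and adding a column. The passed column and the threshold of 40 are hypothetical, chosen only for illustration.

```r
years <- c(1980, 1985, 1990)
scores <- c(34, 44, 83)
df <- data.frame(years, scores)

nrow(df)                       # number of rows
#> [1] 3
df[2, ]                        # the second row, as a one-row data frame
df[df$scores > 40, ]           # only the rows where the condition is TRUE

# add a logical column computed from an existing one
df$passed <- df$scores > 40
str(df)                        # each column keeps its own type
```

The filter works because df$scores > 40 produces a logical vector with one entry per row, which is then used as a row index.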

Another data structure you could go on to learn about is the array, which extends a matrix to more than two dimensions.
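For a first look at arrays, here is a minimal sketch of a three-dimensional array; the name arr and the dimensions are arbitrary. Values are filled column by column, one 2 x 3 slice at a time.

```r
# a 2 x 3 x 2 array: like a stack of two 2 x 3 matrices
arr <- array(1:12, dim = c(2, 3, 2))
dim(arr)
#> [1] 2 3 2
arr[, , 1]     # the first 2 x 3 slice (values 1 to 6)
arr[2, 3, 2]   # row 2, column 3 of the second slice
#> [1] 12
```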

3. Flow Control

R supports all the same flow control structures that you are used to.

  1. If-Then-Else
  2. For Loop
  3. While Loop

As a developer, you will find these all self-explanatory.

If-Then-Else

```r
# if then else
a <- 66
if (a > 55) {
  print("a is more than 55")
} else {
  print("a is less than or equal to 55")
}
#> [1] "a is more than 55"
```
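Two related forms are worth knowing early. The sketch below chains conditions with else if, uses the fact that if is an expression whose value can be assigned, and shows the vectorized ifelse() function. The thresholds and labels are made up for illustration.

```r
# classify a value with an else-if chain
a <- 42
if (a > 55) {
  label <- "high"
} else if (a > 20) {
  label <- "medium"
} else {
  label <- "low"
}
print(label)
#> [1] "medium"

# if is an expression, so its result can be assigned directly
size <- if (a %% 2 == 0) "even" else "odd"
size
#> [1] "even"

# ifelse() tests every element of a vector at once
flags <- ifelse(c(10, 60, 30) > 55, "high", "not high")
flags
```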

For Loop

```r
# for loop
mylist <- c(55, 66, 77, 88, 99)
for (value in mylist) {
  print(value)
}
#> [1] 55
#> [1] 66
#> [1] 77
#> [1] 88
#> [1] 99
```
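When you need the position as well as the value, a common pattern is to loop over seq_along(), which yields the indexes of a vector. The sketch below also accumulates a running total; the variable names are illustrative.

```r
mylist <- c(55, 66, 77, 88, 99)

# seq_along() yields 1, 2, ..., length(mylist), and nothing at all for an
# empty vector (whereas 1:length(x) would give c(1, 0) for an empty x)
for (i in seq_along(mylist)) {
  cat("item", i, "is", mylist[i], "\n")
}

# accumulate a running total
total <- 0
for (value in mylist) {
  total <- total + value
}
total
#> [1] 385
```

In real code, sum(mylist) gives the same total without a loop; vectorized functions like this are usually preferred in R.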

While Loop

```r
# while loop
a <- 100
while (a < 500) {
  a <- a + 100
}
a
#> [1] 500
```
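R also has break, to leave a loop early, and next, to skip to the following iteration. Here is a small sketch that collects the even numbers up to 10; growing the evens vector inside the loop is done only for illustration.

```r
# next skips to the following iteration; break exits the loop
evens <- c()
i <- 0
while (TRUE) {
  i <- i + 1
  if (i > 10) break        # stop once i passes 10
  if (i %% 2 == 1) next    # skip odd numbers
  evens <- c(evens, i)
}
evens
#> [1]  2  4  6  8 10
```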

4. Functions

Functions let you group code and call that code repeatedly with arguments.

The three main concerns with functions are:

  1. Calling Functions
  2. Help For Functions
  3. Writing Custom Functions

Calling Functions

You have already used one function, the c() function for concatenating objects into a vector.

R has many built-in functions, and additional functions can be provided by installing and loading third-party packages.

Here is an example of using a statistical function to calculate the mean of a vector of numbers:

```r
# call a function to calculate the mean of a vector of numbers
numbers <- c(1, 2, 3, 4, 5, 6)
mean(numbers)
#> [1] 3.5
```
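Arguments can be passed by position or by name, and many functions take optional named arguments. The sketch below uses two real base R arguments, na.rm for mean() and digits for round(); the data is made up.

```r
numbers <- c(1, 2, 3, 4, 5, 6, NA)

mean(numbers)                 # a missing value propagates to the result
#> [1] NA
mean(numbers, na.rm = TRUE)   # named argument: drop NAs before averaging
#> [1] 3.5

# arguments can be matched by position or by name, in any order
round(3.14159, digits = 2)
#> [1] 3.14
round(digits = 2, x = 3.14159)
#> [1] 3.14
```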

Help for Functions

You can get help with a function in R by using the question mark operator (?) followed by the function name.

```r
# help with the mean() function
?mean
help(mean)
```

Alternatively, you can call the help() function and pass the function name you need help with as an argument (e.g. help(mean)).

You can get example usage of a function by calling the example() function and passing the name of the function as an argument.

```r
# example usage of the mean function
example(mean)
```

Custom Functions

You can define your own functions that may or may not take arguments or return a result.

Below is an example of a custom function to calculate and return the sum of three numbers:

```r
# define custom function
mysum <- function(a, b, c) {
  total <- a + b + c  # avoid naming this "sum", which would mask base::sum
  return(total)
}
# call custom function
mysum(1, 2, 3)
#> [1] 6
```
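Two features make R functions more flexible than the example above suggests: arguments can have default values, and a function returns the value of its last evaluated expression, so return() is optional. The function name mysum2 is hypothetical.

```r
# c has a default value, and the last expression is returned implicitly
mysum2 <- function(a, b, c = 0) {
  a + b + c
}
mysum2(1, 2)      # c falls back to its default of 0
#> [1] 3
mysum2(1, 2, 3)
#> [1] 6

# functions are values, so an anonymous function can be passed to sapply()
sapply(c(1, 2, 3), function(x) x^2)
#> [1] 1 4 9
```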

5. Packages

Packages are the way that third party R code is distributed. The Comprehensive R Archive Network (CRAN) provides hosting and listing of third party R packages that you can download.

Install a Package

You can install a package hosted on CRAN by calling the install.packages() function. If no CRAN mirror has been chosen yet, it may pop up a dialog asking which mirror you would like to download the package from.

For example, here is how you can install the caret package which is very useful in machine learning:

```r
# install the caret package
install.packages("caret")
# load the package
library(caret)
```
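Scripts are often shared with people who may or may not have a package installed already. One common pattern, sketched below as a hypothetical helper called ensure_package(), installs a package only when requireNamespace() reports that it is missing, and then loads it.

```r
# hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# usage (downloads caret from CRAN the first time only):
# ensure_package("caret")
```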

Help For Package

A package can provide a lot of new functions. You can read up on a package on its CRAN page, but you can also get help for the package within R using the library() function.

```r
# help for the caret package
library(help = "caret")
```

5 Things To Remember

Here are five quick tips to remember when getting started in R:

  • Assignment. Idiomatic R uses the arrow operator (<-) for assignment rather than a single equals (=).
  • Case Sensitive. The R language is case sensitive, meaning that C() and c() are two different function calls.
  • Help. You can get help on any operator or function using the help() function or the ? operator, and you can search the documentation of all installed packages using the double question mark operator (??).
  • How To Quit. You can exit the R interactive environment by calling the q() function.
  • Documentation. R installs with a lot of useful documentation. You can review it in the browser by typing: help.start()

Get a Reference Book

There are many great resources online for learning more about how to use R.

I recommend grabbing a good reference text and keeping it close by. I use and recommend R in a Nutshell.

Summary

In this post you took a crash course in basic R syntax.

As a developer, you now know enough to read other people's R scripts.

You also have the tools to start writing your own little scripts in the R interactive environment.

Next Step

Did you work through all of the examples?

  1. Start R.
  2. Work through the tutorial.
  3. Let me know how you went (le