[Intro to Computational Thinking] R Basics 1

May 12 2023

šŸ‘‹ Welcome

Welcome to to the course! This is the first of 2 sets of exercises introducing you to the R programming language1 that we will use throughout the course.

This and the next lesson are optional. Their intended for you if you are not that familiar with R, or you would like to practice your R basics before getting started.

šŸ“ Lesson Preview

šŸ Getting Started with R and RStudio

To complete the exercises in this course, you will need to access RStudio.

You can do this in the cloud on the courseā€™s Posit [RStudio] cloud site:

Or download and install R and RStudio on your computer:

  1. Install R: https://cran.r-project.org/

  2. Install RStudio (free version): https://posit.co/download/rstudio-desktop/

Once you open RStudio, take some time to explore the interface. Then weā€™ll dive into details.

Notice in particular the > character in the left-most window called the Console. This prompt is where you enter R code. To run R code that you have typed after the prompt, press the Return or Enter key.

It is also a good idea to become familiar with the command line terminal outside of R. We will touch on this, especially in Part II of the course.

šŸ’ Tip: in the R Console or the Terminal, you can cancel a process with Ctrl + c. You can clear your Console with Ctrl + l. Note, the content wasnā€™t actually deleted, just scroll up to see the previous content.

[Optional RStudio Alternative] Jupyter Lab

Jupyter Lab is a popular alternative to RStudio. It is less tailored to R, but is used in many especially industry settings. If you would like to try it out, see setup instructions here.

Markdown

Markdown is a markup language widely used in data science to format human readable documents, especially interspersed with code. It is used in RMarkdown (how these materials are created), Quarto (a more expansive RMarkdown), and Jupyter Lab.

Install packages

In base R, use the install.packages() function to install the add-on packages youā€™ll need to gather and manage data:

install.packages("tidyverse")

When you install a package, you will likely be given a list of ā€œmirrorsā€ from which you can download the package. Select the mirror closest to you.

Load packages

To use the packages, you need to theme with the library() function:

library("tidyverse")

Personally, I prefer (and will use for the rest of the course) xfun::pkg_attach2(). It installs packages if they arenā€™t present and loads them all in one call.

xfun::pkg_attach2("tidyverse")

Note you will need to install the xfun package (install.packages("xfun")) if you havenā€™t already.

Object Oriented Programming in the R Language

Objects: Rā€™s ā€œnounsā€

If youā€™ve read a description of the R language before, you will probably have seen it referred to as an ā€˜object-oriented languageā€™. What are objects? Objects are like the R languageā€™s nouns. They are things, like a vector of numbers, a data set, a word, a table of results from some analysis, and so on. Saying that R is object-oriented means that R is focused on doing actions to objects. We will talk about the actions, functions, later in this section. Now letā€™s create a few objects.

Numeric and string type objects

Objects can have a number of different types. Letā€™s make two simple objects. The first is a numeric-type object. The other is a character object.

We can choose almost any name we want for our objects as long as it begins with an alphabetic character and does not contain spaces. Just because there are relatively few hard restrictions on object names, doesnā€™t mean that you should name your object anything. Your code will be much easier to read if object names are short and meaningful. Give each object a unique name to avoid confusion and conflicts. For example, if you reuse an object name in an R session, you could easily accidentally overwrite it.

Good object names also help your code be self documenting, enhancing review, reproducibility, and reuse.

Letā€™s begin working with numeric objects by creating a new object called number with the number 10 in it. Use the assignment operator (<-) to put something into the object:

number <- 10

To see the contents of our object, type its name into the R console.

number
## [1] 10

Letā€™s briefly breakdown this output. 10 is clearly the contents of number. The double hash (##) is included here to tell you that this is output rather than R code. If you run functions in your R console, you will not get the double hash in your output. Finally, [1] gives the position in the object that the number 10 is on. Our object only has one position.

Creating an object with words and other characters, a character object, is very similar. The only difference is that you enclose the character string (letters in a word for example) inside of single or double quotation marks ('', or ""). Letā€™s create an object called words containing the character string Hello World:

# Create hello world character string
words <- "Hello World"

An objectā€™s type is important to keep in mind. It determines what we can do to the object. For example, you cannot take the mean of a character object like the words object:

mean(words)
## Warning in mean.default(words): argument is not numeric or logical: returning
## NA
## [1] NA

Trying to find the mean of our words object gives us a warning message and returns the value NA: not applicable. You can also think of NA as meaning ā€œmissingā€. To find out an objectā€™s type, use the class() function. For example:

class(words)
## [1] "character"

Vector and data frame type objects

So far, we have only looked at objects with a single number or character string. Clearly we often want to use objects that have many strings and numbers. In R these are usually data frame-type objects and are roughly equivalent to the data structures you would be familiar with from using a program such as Microsoft Excel. We will be using data frames extensively throughout the book. Before looking at data frames it is useful to first look at the simpler objects that make up data frames. These are called vectors. Vectors are Rā€™s ā€œworkhorseā€ (Matloff 2011).

Vectors

Vectors are the ā€œfundamental data typeā€ in R (Matloff 2011). They are an ordered group of numbers, character strings, and so on. It may be useful to think of most data in R as composed of vectors. For example, data frames are basically collections of vectors of the same length, i.e. they have the same number of rows, attached together to form columns.

Letā€™s create a simple numeric vector containing the numbers 2.8, 2, and 14.8. To do this, we will use the c() (combine) function and separate the numbers with commas (,):

# Create numeric vector
numeric_vector <- c(2.8, 2, 14.8)

# Show numeric_vector's contents
numeric_vector
## [1]  2.8  2.0 14.8

Vectors of character strings are created in a similar way. The only difference is that each character string is enclosed in quotation marks like this:

character_vector <- c("Albania", "Botswana", "Cambodia")

# Show character_vector's contents
character_vector
## [1] "Albania"  "Botswana" "Cambodia"
Matrices

To give you a preview of what we are going to do when we start working with real data sets, letā€™s combine the two vectors numeric_vector and character_vector into a new object with the cbind() function. This function binds the two vectors together side-by-side as columns.

string_num_matrix <- cbind(character_vector, numeric_vector)
string_num_matrix
##      character_vector numeric_vector
## [1,] "Albania"        "2.8"         
## [2,] "Botswana"       "2"           
## [3,] "Cambodia"       "14.8"

By binding these two objects together, weā€™ve created a new matrix object. You can see that the numbers in the numeric_vector column are between quotation marks. Matrices, like vectors, can only have one data type, so R has converted the numbers to strings.

Data frames

If we want to have an object with rows and columns and allow the columns to contain data with different types, we need to use data frames. Letā€™s use the data.frame function to combine the numeric_vector and character_vector objects.

string_num_df <- data.frame(character_vector, numeric_vector)
string_num_df
##   character_vector numeric_vector
## 1          Albania            2.8
## 2         Botswana            2.0
## 3         Cambodia           14.8

In this output, you can see the data frameā€™s names attribute. It is the column names. You can use the names() function to see any data frameā€™s names:

names(string_num_df)
## [1] "character_vector" "numeric_vector"

You will also notice that the first column of the data set has no name and is a series of numbers. This is the row.names attribute. Data frame rows can be given any name as long as each row name is unique. We can use the row.names() function to set the row names from a vector. For example,

# Reassign row.names
row.names(string_num_df) <- c("First", "Second", "Third")

# Display new row.names
row.names(string_num_df)
## [1] "First"  "Second" "Third"

You can see in this example how row.names() can also be used to print the row names. The row.names attribute does not behave like a regular data frame column. You cannot, for example, include it as a variable in a regression. You can use the row.names() function to assign the row.names values to a regular column.

You will notice in the output for string_num_df that the strings in the character_vector column are not in quotation marks. This does not mean that they are now numeric data. To prove this, try to find the mean of character_vector by running it through the mean() function:

mean(string_num_df$character_vector)
## Warning in mean.default(string_num_df$character_vector): argument is not
## numeric or logical: returning NA
## [1] NA

Lists

Lists are objects whose contents can be items of different values. For example you could have a list with a vector and a data frame. Use the list() function to create lists. For example:

vector_df_list <- list(numeric_vector, string_num_df)

vector_df_list
## [[1]]
## [1]  2.8  2.0 14.8
## 
## [[2]]
##        character_vector numeric_vector
## First           Albania            2.8
## Second         Botswana            2.0
## Third          Cambodia           14.8

Note that each element of the list has a number. For example, the former contents of numeric_vector are element [[1]].

Component selection

The last bit of code we just saw will probably be confusing. Why do we have a dollar sign ($) between the name of our data frame object name and the character_vector variable? The dollar sign is called the component selector. Itā€™s also sometimes called the element name operator. Either way, it extracts a part, component, of an object. In the previous example, it extracted the character_vector column from the string_num_df so that it could be fed to the mean() function.

We can use the component selector to create new objects with parts of other objects. Imagine that we have string_num_df and want an object with only the information in the numeric_vector column. Letā€™s use the following code:

# Extract a numeric vector from string_num_df
numeric_extract <- string_num_df$numeric_vector
numeric_extract
## [1]  2.8  2.0 14.8

Subscripts

Another way to select parts of an object is to use subscripts. You have already seen subscripts in the output from our examples so far. They are denoted with square braces ([]). We can use subscripts to select not only columns from data frames but also rows and individual values. As we began to see in some of the previous output, each part of a data frame has an address captured by its row and column number. We can tell R to find a part of an object by putting the row number/name, column number/name, or both in square braces. The first part denotes the rows and separated by a comma (,) are the columns.

To give you an idea of how this works, letā€™s use the cars data set again. Use head() to get a sense of what this data looks like.

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

We can see a data frame with information on various car speeds (speed) and stopping distances (dist). If we want to select only the third through seventh rows, we can use the following subscript function call:

cars[3:7, ]
##   speed dist
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
## 7    10   18

The colon (:) creates a sequence of whole numbers from 3 to 7. To select the fourth row of the dist column, we can type:

cars[4, 2]
## [1] 22

An equivalent way to do this is:

cars[4, "dist"]
## [1] 22

Finally, we can even include a vector of column names to select:

cars[4, c("speed", "dist")]
##   speed dist
## 4     7   22

To extract elements of a list use the double square brackets [[]]:

vector_df_list[[2]]
##        character_vector numeric_vector
## First           Albania            2.8
## Second         Botswana            2.0
## Third          Cambodia           14.8

You can stack up component selectors to extract deeper components of a list. For example:

vector_df_list[[2]]$numeric_vector
## [1]  2.8  2.0 14.8

šŸ„… Exercises

  • Create a list of three data frames with any numeric and string content you want.

    • Select the first column of the third data frame.

    • For a numeric vector in your list, find the variance.

  • RStudio Primers (objects through lists).

šŸ“š References

Gandrud, Christopher. 2020. Reproducible Research with r and RStudio. Boca Raton, FL: CRC Press.
Matloff, Norman. 2011. The Art of Programming in r: A Tour of Statistical Programming Design. San Francisco: No Starch Press.

  1. These lessons are based on Gandrud (2020), Chapter 3.ā†©ļøŽ