[Intro to Computational Thinking] R Basics 1
Welcome
Welcome to the course! This is the first of 2 sets of exercises introducing you to the R programming language, which we will use throughout the course.
This and the next lesson are optional. They are intended for you if you are not that familiar with R, or if you would like to practice your R basics before getting started.
Lesson Preview
The R console
Types (character, numeric, vector, data frame, list)
Workspace and history
Getting Started with R and RStudio
To complete the exercises in this course, you will need to access RStudio.
You can do this in the cloud on the course's Posit [RStudio] cloud site:
Or download and install R and RStudio on your computer:
Install R: https://cran.r-project.org/
Install RStudio (free version): https://posit.co/download/rstudio-desktop/
Once you open RStudio, take some time to explore the interface. Then we'll dive into details.
Notice in particular the > character in the left-most window called the Console. This prompt is where you enter R code. To run R code that you have typed after the prompt, press the Return or Enter key.
It is also a good idea to become familiar with the command line terminal outside of R. We will touch on this, especially in Part II of the course.
Tip: in the R Console or the Terminal, you can cancel a process with Ctrl + c. You can clear your Console with Ctrl + l. Note that the content isn't actually deleted; just scroll up to see the previous content.
[Optional RStudio Alternative] Jupyter Lab
Jupyter Lab is a popular alternative to RStudio. It is less tailored to R, but is used in many settings, especially in industry. If you would like to try it out, see setup instructions here.
Markdown
Markdown is a markup language widely used in data science to format human readable documents, especially interspersed with code. It is used in RMarkdown (how these materials are created), Quarto (a more expansive RMarkdown), and Jupyter Lab.
Install packages
In base R, use the install.packages() function to install the add-on packages you'll need to gather and manage data:
install.packages("tidyverse")
When you install a package, you will likely be given a list of "mirrors" from which you can download the package. Select the mirror closest to you.
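If you would rather not pick a mirror interactively, one option is to set the repository yourself before installing. The sketch below assumes the general-purpose CRAN mirror at https://cloud.r-project.org; any other CRAN mirror URL could be used in the same way.

```r
# Choose a CRAN mirror for this R session
# (assumption: the cloud mirror; swap in any mirror you prefer)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which mirror install.packages() will use from now on
getOption("repos")[["CRAN"]]

# The mirror can also be passed directly to install.packages().
# Commented out so that nothing is downloaded when you run this sketch:
# install.packages("tidyverse", repos = "https://cloud.r-project.org")
```

Setting the option only affects the current session unless you save it in a startup file.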
Load packages
To use the packages, you need to load them with the library() function:
library("tidyverse")
Personally, I prefer (and will use for the rest of the course) xfun::pkg_attach2(). It installs packages if they aren't present and loads them all in one call.
xfun::pkg_attach2("tidyverse")
Note you will need to install the xfun package (install.packages("xfun")) if you haven't already.
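To make the "install if missing, then load" idea concrete, here is a rough base R sketch of the same pattern. The helper name attach_or_install() is hypothetical and is not part of xfun; it only illustrates the logic and is not how xfun::pkg_attach2() is implemented.

```r
# Hypothetical helper (not part of xfun): install a package only
# if it is missing, then attach it with library()
attach_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" ships with R, so this call attaches it without downloading anything
attach_or_install("stats")

# Check that the package is now on the search path
"package:stats" %in% search()
```

In practice you would call it with an add-on package name such as "tidyverse".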
Object Oriented Programming in the R Language
Objects: R's "nouns"
If you've read a description of the R language before, you will probably have seen it referred to as an "object-oriented language". What are objects? Objects are like the R language's nouns. They are things, like a vector of numbers, a data set, a word, a table of results from some analysis, and so on. Saying that R is object-oriented means that R is focused on doing actions to objects. We will talk about the actions, called functions, later in this section. Now let's create a few objects.
Numeric and string type objects
Objects can have a number of different types. Let's make two simple objects. The first is a numeric-type object. The other is a character object.
We can choose almost any name we want for our objects as long as it begins with an alphabetic character and does not contain spaces. Just because there are relatively few hard restrictions on object names doesn't mean that you should name your objects anything you like. Your code will be much easier to read if object names are short and meaningful. Give each object a unique name to avoid confusion and conflicts. For example, if you reuse an object name in an R session, you could easily overwrite the original object by accident.
Good object names also help your code be self-documenting, enhancing review, reproducibility, and reuse.
Let's begin working with numeric objects by creating a new object called number with the number 10 in it. Use the assignment operator (<-) to put something into the object:
number <- 10
To see the contents of our object, type its name into the R console.
number
## [1] 10
Let's briefly break down this output. 10 is clearly the contents of number. The double hash (##) is included here to tell you that this is output rather than R code. If you run this code in your own R console, you will not get the double hash in your output. Finally, [1] gives the position in the object of the first value printed on that line, here the number 10. Our object only has one position.
Creating an object with words and other characters, a character object, is very similar. The only difference is that you enclose the character string (letters in a word for example) inside of single or double quotation marks ('' or ""). Let's create an object called words containing the character string Hello World:
# Create hello world character string
words <- "Hello World"
An object's type is important to keep in mind. It determines what we can do to the object. For example, you cannot take the mean of a character object like the words object:
mean(words)
## Warning in mean.default(words): argument is not numeric or logical: returning
## NA
## [1] NA
Trying to find the mean of our words object gives us a warning message and returns the value NA: not available. You can also think of NA as meaning "missing". To find out an object's type, use the class() function. For example:
class(words)
## [1] "character"
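As a small illustration of how type determines what you can do, the sketch below compares the two objects created so far. It uses is.na(), a base R function that tests for NA, and suppressWarnings() only to keep the example output quiet.

```r
number <- 10
words <- "Hello World"

# The two objects have different types
class(number)   # "numeric"
class(words)    # "character"

# mean() works on the numeric object
mean(number)    # 10

# On the character object it returns NA (the warning is silenced here)
result <- suppressWarnings(mean(words))

# is.na() tests whether a value is missing
is.na(result)   # TRUE
```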
Vector and data frame type objects
So far, we have only looked at objects with a single number or character string. Clearly we often want to use objects that have many strings and numbers. In R these are usually data frame-type objects and are roughly equivalent to the data structures you would be familiar with from using a program such as Microsoft Excel. We will be using data frames extensively throughout the course. Before looking at data frames it is useful to first look at the simpler objects that make up data frames. These are called vectors. Vectors are R's "workhorse" (Matloff 2011).
Vectors
Vectors are the "fundamental data type" in R (Matloff 2011). They are an ordered group of numbers, character strings, and so on. It may be useful to think of most data in R as composed of vectors. For example, data frames are basically collections of vectors of the same length, i.e. they have the same number of rows, attached together to form columns.
Let's create a simple numeric vector containing the numbers 2.8, 2, and 14.8. To do this, we will use the c() (combine) function and separate the numbers with commas (,):
# Create numeric vector
numeric_vector <- c(2.8, 2, 14.8)
# Show numeric_vector's contents
numeric_vector
## [1] 2.8 2.0 14.8
Vectors of character strings are created in a similar way. The only difference is that each character string is enclosed in quotation marks like this:
character_vector <- c("Albania", "Botswana", "Cambodia")
# Show character_vector's contents
character_vector
## [1] "Albania" "Botswana" "Cambodia"
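A few quick checks can help build intuition for vectors. The sketch below uses only base R functions (length(), class(), sum()); the object mixed_vector is a made-up example to show what happens when types are mixed.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")

# How many elements, and of what type?
length(numeric_vector)     # 3
class(numeric_vector)      # "numeric"
class(character_vector)    # "character"

# Functions and arithmetic operate on every element at once
sum(numeric_vector)        # 19.6
numeric_vector * 2         # 5.6 4.0 29.6

# A vector holds one type only: mixing numbers and strings
# converts everything to character
mixed_vector <- c(2.8, "Albania")
class(mixed_vector)        # "character"
```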
Matrices
To give you a preview of what we are going to do when we start working with real data sets, let's combine the two vectors numeric_vector and character_vector into a new object with the cbind() function. This function binds the two vectors together side-by-side as columns.
string_num_matrix <- cbind(character_vector, numeric_vector)
string_num_matrix
##      character_vector numeric_vector
## [1,] "Albania"        "2.8"
## [2,] "Botswana"       "2"
## [3,] "Cambodia"       "14.8"
By binding these two objects together, we've created a new matrix object. You can see that the numbers in the numeric_vector column are between quotation marks. Matrices, like vectors, can only have one data type, so R has converted the numbers to strings.
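You can confirm the conversion yourself. This sketch checks the matrix's underlying type with typeof() and converts the column back to numbers with as.numeric(). It borrows the square-bracket subscripts that are explained in the Subscripts section later in this lesson; the object name recovered is just for illustration.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_matrix <- cbind(character_vector, numeric_vector)

# Rows and columns of the matrix
dim(string_num_matrix)       # 3 2

# Every cell in the matrix is stored as a character string
typeof(string_num_matrix)    # "character"

# Pull out the numeric_vector column and convert it back to numbers
recovered <- as.numeric(string_num_matrix[, "numeric_vector"])
recovered                    # 2.8 2.0 14.8
class(recovered)             # "numeric"
```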
Data frames
If we want to have an object with rows and columns and allow the columns to contain data with different types, we need to use data frames. Let's use the data.frame() function to combine the numeric_vector and character_vector objects.
string_num_df <- data.frame(character_vector, numeric_vector)
string_num_df
##   character_vector numeric_vector
## 1          Albania            2.8
## 2         Botswana            2.0
## 3         Cambodia           14.8
In this output, you can see the data frame's names attribute, which holds the column names. You can use the names() function to see any data frame's names:
names(string_num_df)
## [1] "character_vector" "numeric_vector"
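The names() function can also be used on the left-hand side of an assignment to change the column names. The sketch below works on a copy (renamed_df, a hypothetical name) so that string_num_df keeps the names used in the rest of this lesson; the new column names are also just illustrative.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)

# Work on a copy so string_num_df keeps its original column names
renamed_df <- string_num_df

# Assign new (illustrative) column names
names(renamed_df) <- c("country", "value")

names(renamed_df)      # "country" "value"
names(string_num_df)   # "character_vector" "numeric_vector"
```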
You will also notice that the first column of the data set has no name and is a series of numbers. This is the row.names attribute. Data frame rows can be given any name as long as each row name is unique. We can use the row.names() function to set the row names from a vector. For example:
# Reassign row.names
row.names(string_num_df) <- c("First", "Second", "Third")
# Display new row.names
row.names(string_num_df)
## [1] "First" "Second" "Third"
You can see in this example how row.names() can also be used to print the row names. The row.names attribute does not behave like a regular data frame column. You cannot, for example, include it as a variable in a regression. You can use the row.names() function to assign the row.names values to a regular column.
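One way to copy the row names into a regular column is sketched below. It uses the dollar sign ($), which is covered in the Component selection section of this lesson, and works on a copy so string_num_df is left unchanged. The names df_with_position and position are hypothetical.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)
row.names(string_num_df) <- c("First", "Second", "Third")

# Work on a copy so string_num_df is not modified
df_with_position <- string_num_df

# Store the row names in a new, regular column
df_with_position$position <- row.names(df_with_position)

names(df_with_position)
# "character_vector" "numeric_vector" "position"

df_with_position$position
# "First" "Second" "Third"
```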
You will notice in the output for string_num_df that the strings in the character_vector column are not in quotation marks. This does not mean that they are now numeric data. To prove this, try to find the mean of character_vector by running it through the mean() function:
mean(string_num_df$character_vector)
## Warning in mean.default(string_num_df$character_vector): argument is not
## numeric or logical: returning NA
## [1] NA
Lists
Lists are objects whose contents can be items of different types. For example, you could have a list with a vector and a data frame. Use the list() function to create lists. For example:
vector_df_list <- list(numeric_vector, string_num_df)
vector_df_list
## [[1]]
## [1] 2.8 2.0 14.8
##
## [[2]]
##        character_vector numeric_vector
## First           Albania            2.8
## Second         Botswana            2.0
## Third          Cambodia           14.8
Note that each element of the list has a number. For example, the former contents of numeric_vector are element [[1]].
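To see that a list really does keep each element's own type, you can inspect it with length() and class(), as in this short sketch built from the same objects.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)
vector_df_list <- list(numeric_vector, string_num_df)

# The list has two elements
length(vector_df_list)        # 2
class(vector_df_list)         # "list"

# Each element keeps its own type
class(vector_df_list[[1]])    # "numeric"
class(vector_df_list[[2]])    # "data.frame"
```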
Component selection
The code we used earlier to take the mean of a data frame column, mean(string_num_df$character_vector), may have looked confusing. Why do we have a dollar sign ($) between the name of our data frame object and the character_vector variable? The dollar sign is called the component selector. It's also sometimes called the element name operator. Either way, it extracts a part, or component, of an object. In the earlier example, it extracted the character_vector column from string_num_df so that it could be fed to the mean() function.
We can use the component selector to create new objects with parts of other objects. Imagine that we have string_num_df and want an object with only the information in the numeric_vector column. Let's use the following code:
# Extract a numeric vector from string_num_df
numeric_extract <- string_num_df$numeric_vector
numeric_extract
## [1] 2.8 2.0 14.8
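The component selector can also feed a column straight into a function, and it works on lists whose elements have names. In the sketch below, named_list and its element names (numbers, countries) are made up for illustration.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)

# Feed a numeric column directly to a function
column_mean <- mean(string_num_df$numeric_vector)
column_mean                # about 6.53

# The component selector also works on lists with named elements
named_list <- list(numbers = numeric_vector, countries = character_vector)
named_list$countries       # "Albania" "Botswana" "Cambodia"
named_list$numbers         # 2.8 2.0 14.8
```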
Subscripts
Another way to select parts of an object is to use subscripts. You have already seen subscripts in the output from our examples so far. They are denoted with square brackets ([]). We can use subscripts to select not only columns from data frames but also rows and individual values. As we began to see in some of the previous output, each part of a data frame has an address captured by its row and column number. We can tell R to find a part of an object by putting the row number/name, column number/name, or both in square brackets. The part before the comma (,) denotes the rows, and the part after it denotes the columns.
To give you an idea of how this works, let's use the cars data set, which comes built into R. Use head() to get a sense of what this data looks like.
head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
We can see a data frame with information on various car speeds (speed) and stopping distances (dist). If we want to select only the third through seventh rows, we can use the following subscript function call:
cars[3:7, ]
##   speed dist
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10
## 7    10   18
The colon (:) creates a sequence of whole numbers from 3 to 7. To select the fourth row of the dist column, we can type:
cars[4, 2]
## [1] 22
An equivalent way to do this is:
cars[4, "dist"]
## [1] 22
Finally, we can even include a vector of column names to select:
cars[4, c("speed", "dist")]
##   speed dist
## 4     7   22
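The same subscript ideas apply to vectors (which need only one position), to whole columns, and to row names. The following sketch collects a few such cases; dist_column is just an illustrative name.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)
row.names(string_num_df) <- c("First", "Second", "Third")

# Vectors have a single dimension, so one subscript is enough
numeric_vector[2]          # 2
numeric_vector[c(1, 3)]    # 2.8 14.8

# Leaving the row part empty selects every row of a column
dist_column <- cars[, "dist"]
length(dist_column)        # 50
dist_column[1:3]           # 2 10 4

# Row names can be used in place of row numbers
string_num_df["Second", "numeric_vector"]   # 2
```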
To extract elements of a list, use double square brackets ([[]]):
vector_df_list[[2]]
##        character_vector numeric_vector
## First           Albania            2.8
## Second         Botswana            2.0
## Third          Cambodia           14.8
You can stack up component selectors to extract deeper components of a list. For example:
vector_df_list[[2]]$numeric_vector
## [1] 2.8 2.0 14.8
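It can help to compare single and double square brackets on a list: single brackets return a smaller list, while double brackets return the element itself. The sketch below also stacks subscripts in the same way the component selector was stacked above.

```r
numeric_vector <- c(2.8, 2, 14.8)
character_vector <- c("Albania", "Botswana", "Cambodia")
string_num_df <- data.frame(character_vector, numeric_vector)
vector_df_list <- list(numeric_vector, string_num_df)

# Single brackets return a list containing the selected element
class(vector_df_list[1])      # "list"

# Double brackets return the element itself
class(vector_df_list[[1]])    # "numeric"

# Subscripts can be stacked to reach deeper values
vector_df_list[[1]][3]                      # 14.8
vector_df_list[[2]][2, "numeric_vector"]    # 2
```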
Exercises
Create a list of three data frames with any numeric and string content you want.
Select the first column of the third data frame.
For a numeric vector in your list, find the variance.
RStudio Primers (objects through lists).
References