Lesson 3. Understand the vector data type in R and classes including strings, numbers and logicals - Data science for scientists 101

Learning objectives

At the end of this activity, you will be able to:

  • Understand the structure of and be able to create a vector object in R.

What you need

You need R and RStudio to complete this tutorial. Also we recommend that you have an earth-analytics directory setup on your computer with a /data directory within it.

Vectors and data types

A vector is the most common data structure in R. A vector is defined as a group of values, which most often are either numbers or characters. You can assign this list of values to an object or variable, just like you can for a single value. For example we can create a vector of animal weights:

weight_g <- c(50, 60, 65, 82)
## [1] 50 60 65 82

A vector can also contain characters:

animals <- c("mouse", "rat", "dog")
## [1] "mouse" "rat"   "dog"

There are many functions that allow you to inspect the content of a vector. length() tells you how many elements are in a particular vector:

## [1] 4
## [1] 3

Vector data types

An important feature of a vector is that all of the elements are the same data type. The function class() shows us the class (the data type) of an object:

## [1] "numeric"
## [1] "character"

The function str() shows us the structure of the object and the elements it contains. str() is a really useful function when working with large and complex objects:

##  num [1:4] 50 60 65 82
##  chr [1:3] "mouse" "rat" "dog"

You can add elements to your vector by using the c() function:

# add the number 90 to the end of the vector
weight_g <- c(weight_g, 90)

# add the number 30 to the beginning of the vector
weight_g <- c(30, weight_g)
## [1] 30 50 60 65 82 90

In the examples above, we saw 2 of the 6 atomic vector types that R uses:

  1. "character" and
  2. "numeric"

These are the basic data tpes that all R objects are built from. The other 4 are:

  • "logical" for TRUE and FALSE (the boolean data type)
  • "integer" for integer numbers (e.g. 2L, the L indicates to R that it’s an integer)
  • "complex" to represent complex numbers with real and imaginary parts (e.g., 1+4i) and that’s all we’re going to say about them
  • "raw" that we won’t discuss further

Data type vs. data structure

Vectors are one of the many data structures that R uses. Other important ones include: lists (list), matrices (matrix), data frames (data.frame) and factors (factor). We will look at data.frames when we open our boulder_precip data in the next lesson!

Optional challenge activity

  • Question: What happens when we create a vector that contains both numbers and character values? Give it a try and write down the answer.

  • Question: What will happen in each of these examples? (hint: use class() to check the data type of your objects):

num_char <- c(1, 2, 3, 'a')
num_logical <- c(1, 2, 3, '2.45')
char_logical <- c('a', 'b', 'c', frog)
tricky <- c(1, 2, 3, '4')
  • Question: Why do you think it happens?

  • Question: Can you draw a diagram that represents the hierarchy of the data types?

Subsetting vectors

If we want to extract one or several values from a vector, we must provide one or several indices in square brackets. For instance:

animals <- c("mouse", "rat", "dog", "cat")
## [1] "rat"
animals[c(3, 2)]
## [1] "dog" "rat"

Data tip: R indexes start at 1. Programming languages like Fortran, MATLAB, and R start counting at 1, because that’s what human beings typically do. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because that’s simpler for computers to do.

Subset vectors

We can subset vectors too. For instance, if you wanted to select only the values above 50:

weight_g > 50    # will return logicals with TRUE for the indices that meet the condition
## so we can use this to select only the values above 50
weight_g[weight_g > 50]
## [1] 60 65 82 90

You can combine multiple tests using & (both conditions are true, AND) or | (at least one of the conditions is true, OR):

weight_g[weight_g < 30 | weight_g > 50]
## [1] 60 65 82 90
weight_g[weight_g >= 30 & weight_g == 21]
## numeric(0)

When working with vectors of characters, if you are trying to combine many conditions it can become tedious to type. The function %in% allows you to test if a value is found in a vector:

animals <- c("mouse", "rat", "dog", "cat")
animals[animals == "cat" | animals == "rat"] # returns both rat and cat
## [1] "rat" "cat"
animals %in% c("rat", "cat", "dog", "duck")
animals[animals %in% c("rat", "cat", "dog", "duck")]
## [1] "rat" "dog" "cat"

Optional challenge

  • Can you figure out why "four" > "five" returns TRUE?

Leave a Comment