Working with Matrices, Lists, and Data Frames
1. Assign to the variable n_dims a single random integer between 3 and 10.
# runif(1, 3, 11) draws a real number between 3 and 11; floor() rounds it down,
# so the result is an integer from 3 to 10 (a draw of 10.999 becomes 10)
n_dims <- floor(runif(1, 3, 11))
n_dims
## [1] 3
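As a cross-check on the step above, the sketch below draws the integer directly with sample() and confirms that the floor(runif()) approach stays inside 3..10. The names n_dims_alt and draws are hypothetical, and the seed is an assumption added only so the sketch is repeatable.

```r
# Assumption: fixed seed, used only to make this sketch repeatable
set.seed(42)

# Hypothetical alternative: pick one value straight from the integers 3..10
n_dims_alt <- sample(3:10, 1)

# Sanity check on the floor(runif()) approach: many draws, all within 3..10
draws <- floor(runif(10000, 3, 11))
range(draws)
```

Both approaches give each integer from 3 to 10 the same probability; sample() simply states the intent more directly.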
- Create a vector of consecutive integers from 1 to n_dims^2
myvec <- seq(1,(n_dims)^2)
myvec
## [1] 1 2 3 4 5 6 7 8 9
- Use the sample function to randomly reshuffle these values.
myvec <- sample(myvec)
myvec
## [1] 9 1 7 3 5 8 2 6 4
- create a square matrix with these elements.
m <- matrix(myvec, nrow=sqrt(length(myvec)))
m
## [,1] [,2] [,3]
## [1,] 9 3 2
## [2,] 1 5 6
## [3,] 7 8 4
- find a function in R to transpose the matrix.
m <- t(m)
m
## [,1] [,2] [,3]
## [1,] 9 1 7
## [2,] 3 5 8
## [3,] 2 6 4
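The transposed matrix has the same layout you would get by filling the original vector row by row. A minimal sketch using the shuffled values shown above (m_col and m_row are hypothetical names):

```r
# The shuffled vector from the output above, typed in by hand
v <- c(9, 1, 7, 3, 5, 8, 2, 6, 4)

# matrix() fills column by column by default
m_col <- matrix(v, nrow = 3)

# byrow = TRUE fills row by row instead
m_row <- matrix(v, nrow = 3, byrow = TRUE)

# Transposing the column-filled matrix gives the row-filled one
all(t(m_col) == m_row)

# Transposing twice returns the original matrix
all(t(t(m_col)) == m_col)
```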
- calculate the sum and the mean of the elements in the first row and the last row.
sum(m[1,])
## [1] 17
# m[-1,] would drop row 1 and keep every other row; m[nrow(m),] selects the last row
sum(m[nrow(m),])
## [1] 12
mean(m[1,])
## [1] 5.666667
mean(m[nrow(m),])
## [1] 4
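Positive and negative row indices behave very differently, which matters when picking out the last row. The sketch below rebuilds the transposed 3 x 3 matrix shown above under the hypothetical name m_demo and compares the two forms; rowSums() and rowMeans() are an optional shortcut.

```r
# Rebuild the transposed matrix from the output above (filled column by column)
m_demo <- matrix(c(9, 3, 2, 1, 5, 6, 7, 8, 4), nrow = 3)

last <- nrow(m_demo)   # index of the last row, whatever the matrix size

m_demo[last, ]         # the last row only: 2 6 4
m_demo[-1, ]           # a negative index DROPS row 1, keeping rows 2 and 3

# Sums and means of the first and last rows in one call each
rowSums(m_demo)[c(1, last)]    # 17 12
rowMeans(m_demo)[c(1, last)]   # 5.666667 4
```

Using nrow() keeps the code correct when n_dims changes on a re-run.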
- read about the eigen() function and use it on your matrix
eigen_m <- eigen(m)
- look carefully at the elements of $values and $vectors. What kind of numbers are these?
$values holds the eigenvalues of the matrix: the scalars lambda for which m %*% v = lambda * v has a nonzero solution v.
$vectors holds the corresponding eigenvectors, one per column. An eigenvector is a vector whose direction is unchanged when the linear transformation m is applied to it; it is only scaled by its eigenvalue.
- dig in with the typeof() function to figure out their type.
typeof(eigen_m$values)
## [1] "double"
typeof(eigen_m$vectors)
## [1] "double"
# typeof() returns "double" for both here because every eigenvalue of this matrix is real;
# when a non-symmetric matrix has complex eigenvalues, eigen() returns them as type "complex"
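To make the eigenvalue relation concrete, here is a minimal sketch on two small matrices that are not part of the exercise. A and R are hypothetical names: A is symmetric, so its eigenvalues are real, and R is a 90-degree rotation, whose eigenvalues are complex.

```r
# Symmetric 2 x 2 matrix: eigenvalues are guaranteed to be real
A <- matrix(c(2, 1, 1, 2), nrow = 2)
e <- eigen(A)
e$values                      # 3 1
typeof(e$values)              # "double"

# Check the defining relation A v = lambda v for the first eigenpair
v1 <- e$vectors[, 1]
all.equal(as.vector(A %*% v1), e$values[1] * v1)

# 90-degree rotation matrix: no real direction is left unchanged
R <- matrix(c(0, 1, -1, 0), nrow = 2)
typeof(eigen(R)$values)       # "complex"
```

A randomly shuffled matrix is generally not symmetric, so re-running the exercise may return either type.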
- if you have set your code up properly, you should be able to re-run it and create a matrix of a different size because n_dims will change.
2. Create a list with the following named elements:
- mymatrix, which is a 4 x 4 matrix filled with random uniform values
- mylogical which is a 100-element vector of TRUE or FALSE values. Do this efficiently by setting up a vector of random values and then applying an inequality to it.
- my_letters, which is a 26-element vector of all the lower-case letters in random order.
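mylist <- list(mymatrix = matrix(data = runif(16), ncol = 4),
               # the inequality already returns a logical vector, so no c() is needed
               mylogical = runif(100) < 0.5,
               # sample() shuffles the letters into random order, as the exercise asks
               myletters = sample(letters))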
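The inequality trick works because comparisons in R are vectorised: comparing a numeric vector with a single number returns one TRUE or FALSE per element. A short sketch, with a seed assumed only for repeatability and with hypothetical names u, flags and shuffled:

```r
set.seed(1)               # assumption: fixed seed so the sketch is repeatable

u <- runif(100)           # 100 uniform values between 0 and 1
flags <- u < 0.5          # element-wise comparison gives a logical vector

typeof(flags)             # "logical"
length(flags)             # 100
sum(flags)                # number of TRUE values (TRUE counts as 1)
mean(flags)               # proportion of TRUE values, expected to be near 0.5

# sample() with no size argument returns a random permutation of its input
shuffled <- sample(letters)
setequal(shuffled, letters)   # same 26 letters, different order
```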
- create a new list, which has the element [2,2] from the matrix, the second element of the logical vector, and the second element of the letters vector.
newlist <- list(mylist$mymatrix[2,2],
mylist$mylogical[2],
mylist$myletters[2])
newlist
## [[1]]
## [1] 0.9025716
##
## [[2]]
## [1] TRUE
##
## [[3]]
## [1] "b"
- use the typeof() function to confirm the underlying data types of each component in this list
typeof(newlist[[1]])
## [1] "double"
typeof(newlist[[2]])
## [1] "logical"
typeof(newlist[[3]])
## [1] "character"
- combine the underlying elements from the new list into a single atomic vector with the c() function.
newvec <- c(unlist(newlist))
newvec
## [1] "0.902571638813242" "TRUE" "b"
- what is the data type of this vector?
typeof(newvec)
## [1] "character"
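The result is "character" because an atomic vector can hold only one type, so c() coerces every element to the most flexible type present. The sketch below walks up that hierarchy with made-up values:

```r
# Coercion order in c(): logical < integer < double < character
typeof(c(TRUE, 2L))             # "integer"
typeof(c(TRUE, 2L, 3.5))        # "double"
typeof(c(TRUE, 3.5, "b"))       # "character"

# TRUE becomes 1 when combined with numbers, and "TRUE" when combined with text
c(TRUE, 3.5)                    # 1.0 3.5
c(TRUE, "b")                    # "TRUE" "b"

# A list, by contrast, keeps each element's own type
sapply(list(0.9, TRUE, "b"), typeof)   # "double" "logical" "character"
```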
3. Create a data frame with two variables (= columns) and 26 cases (= rows).
- call the first variable my_unis and fill it with 26 random uniform values from 0 to 10
- call the second variable my_letters and fill it with 26 capital letters in random order.
myunits <- runif(26,0, 10)
my_letters <- sample(LETTERS[1:26])
mydata <- data.frame(myunits, my_letters)
mydata
## myunits my_letters
## 1 8.8610031 O
## 2 9.7938322 W
## 3 9.1802644 Y
## 4 3.9839185 N
## 5 4.2179669 K
## 6 5.4920706 A
## 7 7.1201427 S
## 8 1.2730415 C
## 9 3.6511823 Z
## 10 2.1633328 D
## 11 0.8091063 R
## 12 7.7666673 P
## 13 4.6999839 U
## 14 5.3893999 I
## 15 3.6704946 V
## 16 8.8476281 L
## 17 4.1933916 E
## 18 3.2968438 M
## 19 9.3185131 J
## 20 1.9981771 G
## 21 9.3375367 B
## 22 4.0843477 H
## 23 3.8818783 F
## 24 3.4493561 T
## 25 6.7252571 X
## 26 6.2594360 Q
- for the first variable, use a single line of code in R to select 4 random rows and replace the numerical values in those rows with NA.
mydata$myunits[sample(1:26, 4)] <- NA
- for the first variable, write a single line of R code to identify which rows have the missing values.
which(is.na(mydata$myunits))
## [1] 12 19 20 21
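The one-liner above chains two steps: is.na() produces a logical vector and which() turns its TRUE positions into row numbers. A small sketch on made-up data (x and df are hypothetical names):

```r
x <- c(4.2, NA, 7.1, NA, 0.8)

is.na(x)           # FALSE TRUE FALSE TRUE FALSE
which(is.na(x))    # 2 4
sum(is.na(x))      # 2 missing values in total

# For a whole data frame, complete.cases() flags rows with no NA in any column
df <- data.frame(x = x, y = letters[1:5])
incomplete <- df[!complete.cases(df), ]
incomplete
```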
- for the second variable, sort it in alphabetical order
mydata <- mydata[order(mydata$my_letters),]
mydata
## myunits my_letters
## 6 5.4920706 A
## 21 NA B
## 8 1.2730415 C
## 10 2.1633328 D
## 17 4.1933916 E
## 23 3.8818783 F
## 20 NA G
## 22 4.0843477 H
## 14 5.3893999 I
## 19 NA J
## 5 4.2179669 K
## 16 8.8476281 L
## 18 3.2968438 M
## 4 3.9839185 N
## 1 8.8610031 O
## 12 NA P
## 26 6.2594360 Q
## 11 0.8091063 R
## 7 7.1201427 S
## 24 3.4493561 T
## 13 4.6999839 U
## 15 3.6704946 V
## 2 9.7938322 W
## 25 6.7252571 X
## 3 9.1802644 Y
## 9 3.6511823 Z
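Sorting with order() rather than sort() is what keeps each number paired with its letter: sort() returns the sorted values, while order() returns the row positions that would sort them. A sketch on a made-up three-row data frame (df and df_sorted are hypothetical names):

```r
df <- data.frame(val = c(10, 20, 30), key = c("C", "A", "B"))

sort(df$key)     # "A" "B" "C"  : the sorted values only
order(df$key)    # 2 3 1        : the row positions in sorted order

# Indexing rows by order() rearranges the whole data frame together
df_sorted <- df[order(df$key), ]
df_sorted$val          # 20 30 10
rownames(df_sorted)    # "2" "3" "1" : the original row labels are retained
```

This is why the row labels in the sorted output above are no longer 1 to 26 in sequence.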
- calculate the column mean for the first variable.
summary(mydata$myunits)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.8091 3.6560 4.2057 5.0474 6.6088 9.7938 4
# or
mean(mydata$myunits, na.rm = TRUE)
## [1] 5.047449
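The na.rm = TRUE argument matters because a single NA otherwise makes the whole mean NA. A minimal sketch on made-up values (x and df are hypothetical names):

```r
x <- c(2, 4, NA, 6)

mean(x)                  # NA: missing values propagate by default
mean(x, na.rm = TRUE)    # 4: the NA is dropped before averaging

# colMeans() applies the same idea to every numeric column of a data frame
df <- data.frame(a = x, b = c(1, 2, 3, 4))
colMeans(df, na.rm = TRUE)   # a = 4, b = 2.5
```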