We recommend using “Projects” within RStudio. RStudio then creates a folder for each project in which the project file is stored (file extension .Rproj). If R scripts are loaded or created within the project, they are also stored in the project folder. You can find out more about RStudio Projects here.
There are several benefits to using Projects. You can:
specify the Working Directory without using an explicit path (setwd()). This is useful because the path can change (when collaborating with other users, or executing the script at a later date)
automatically cache open scripts and restore open scripts in the next session
set different project-specific options
use version control systems (e.g., git)
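The first benefit can be sketched as follows. This is a minimal illustration, assuming the data file sits in a sub-folder called Datasets inside the project folder; the folder layout and file name are assumptions, not something the project setup enforces.

```r
# Inside an RStudio project, the working directory is the project folder,
# so no call to setwd() is needed.
project_dir <- getwd()

# Relative path: stays valid when the project folder is moved or shared.
# "Datasets" and "weather.csv" are assumed names for this illustration.
rel_path <- file.path("Datasets", "weather.csv")

# Absolute path: differs between machines and users, so avoid it in scripts.
abs_path <- file.path(project_dir, rel_path)

print(rel_path)
print(abs_path)
```

Only the relative path should appear in a script that is shared with others.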
Task 1
Create a data.frame with the following data. Tip: Create a vector for each column first.
Which data types were assigned automatically in the previous task? Check this using str(), decide whether they make sense, and convert them where necessary.
Sample Solution
str(df)
## 'data.frame': 4 obs. of 5 variables:
## $ Species    : chr "Fox" "Bear" "Rabbit" "Moose"
## $ Number     : num 2 5 1 3
## $ Weight     : num 4.4 40.3 1.1 120
## $ Sex        : chr "m" "f" "m" "m"
## $ Description: chr "Reddish" "Brown, large" "Small, with long ears" "Long legs, shovel antlers"

typeof(df$Number)
## [1] "double"

# Number was interpreted as `double`, but it is actually an `integer`.
df$Number <- as.integer(df$Number)

# We know sex only has two options:
df$Sex <- factor(df$Sex, levels = c("m", "f"))
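The sample solution above assumes that df already exists. One possible way to build it, following the tip to create a vector for each column first, is sketched below. The values are taken from the str() output above; the lower-case vector names are chosen for this illustration only.

```r
# One vector per column; values taken from the str() output in the sample solution.
species     <- c("Fox", "Bear", "Rabbit", "Moose")
number      <- c(2, 5, 1, 3)
weight      <- c(4.4, 40.3, 1.1, 120)
sex         <- c("m", "f", "m", "m")
description <- c("Reddish", "Brown, large", "Small, with long ears",
                 "Long legs, shovel antlers")

# Combine the vectors into a data.frame; the argument names become column names.
df <- data.frame(
  Species     = species,
  Number      = number,
  Weight      = weight,
  Sex         = sex,
  Description = description,
  stringsAsFactors = FALSE  # keep text columns as character in older R versions too
)

str(df)
```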
Input: Libraries / packages
Libraries (aka packages) are “extensions” to the basic R functionality. R packages have become indispensable to using R. The vast majority of packages are hosted on CRAN and can be easily installed using install.packages("packagename"). This installation only needs to be done once. To use the library, you must load it into each new R session using library(packagename).
For example, to import data we recommend the readr package. Install it with install.packages("readr"), then load it into the current R session with library("readr").
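The "install once, load every session" pattern can be wrapped in a small helper. The function load_or_install() below is a hypothetical name introduced for this sketch, not part of R or of any package; it relies only on the base functions requireNamespace(), install.packages() and library().

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needed once per R installation
  }
  library(pkg, character.only = TRUE)  # needed once per R session
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an installation.
load_or_install("stats")

# For the package recommended above, the call would be:
# load_or_install("readr")
```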
Task 3
On Moodle, you will find a folder called Datasets. Download it and move it into your project folder. Import the weather.csv file. If you use the RStudio GUI for this, save the import command in your R script. Please use a relative path (i.e., not a path starting with C:\ or similar).
Have a look at your dataset in the console. Have the data been interpreted correctly?
Sample Solution
# The 'time' column was interpreted as 'integer'. However, it is
# obviously a time indication.
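The import step itself could look roughly like the sketch below. Because the real weather.csv is not available here, the code first writes a three-row stand-in file; its name (weather_demo.csv), the comma delimiter and the use of base R's read.csv() are assumptions made for this illustration. With readr loaded, read_csv() is called in the same way.

```r
# Write a small stand-in file into the working directory (hypothetical name).
demo_file <- "weather_demo.csv"
writeLines(
  c("stn,time,tre200h0",
    "ABO,2000010100,-2.6",
    "ABO,2000010101,-2.5",
    "ABO,2000010102,-3.1"),
  demo_file
)

# Import via a relative path: no drive letter and no setwd().
weather <- read.csv(demo_file)

# Inspect the column types: 'time' arrives as a number, not as a date/time.
str(weather)
class(weather$time)

# Remove the stand-in file again.
unlink(demo_file)
```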
Task 5
The time column is a date/time with a format of YYYYMMDDHH. In order for R to recognise the data in this column as date/time, it must be in the correct format (POSIXct). Therefore, we must tell R what the current format is. Use as.POSIXct() to read the column into R, remembering to specify both format and tz.
Tip
If no time zone is set, as.POSIXct() uses a default (based on Sys.timezone()). In our case, however, the values are in UTC (see metadata.csv).
as.POSIXct() requires a character input: if you receive the error message 'origin' must be supplied (or similar), you have probably passed a numeric value to the function.
Sample Solution
weather$time <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")
The new table should look like this:

stn  time                 tre200h0
ABO  2000-01-01 00:00:00  -2.6
ABO  2000-01-01 01:00:00  -2.5
ABO  2000-01-01 02:00:00  -3.1
ABO  2000-01-01 03:00:00  -2.4
ABO  2000-01-01 04:00:00  -2.5
ABO  2000-01-01 05:00:00  -3.0
ABO  2000-01-01 06:00:00  -3.7
ABO  2000-01-01 07:00:00  -4.4
ABO  2000-01-01 08:00:00  -4.1
ABO  2000-01-01 09:00:00  -4.1
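The conversion can be checked on a few raw values before applying it to the whole column. The sketch below uses three values in the YYYYMMDDHH format described in Task 5; the variable names raw and parsed are chosen for this illustration.

```r
# Raw values as they might appear after import (numbers, not date/times).
raw <- c(2000010100L, 2000010101L, 2000010102L)

# Convert to character first, then state the current format and the time zone.
parsed <- as.POSIXct(as.character(raw), format = "%Y%m%d%H", tz = "UTC")

class(parsed)          # the POSIXct class confirms the conversion
attr(parsed, "tzone")  # the time zone that was set explicitly
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Setting tz explicitly keeps the result independent of the time zone of the computer on which the script runs.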
Task 6
Create two new columns for day of week (Monday, Tuesday, etc.) and calendar week. Use the newly created POSIXct column and a suitable function from lubridate.
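One possible approach, sketched on a two-row stand-in for the weather table, uses wday() and isoweek() from lubridate. The column names weekday and calendar_week are assumptions made for this illustration, and so is the choice of ISO 8601 week numbering; lubridate's week() counts seven-day blocks from 1 January instead and may be what is intended.

```r
library(lubridate)

# Two-row stand-in for the weather table with a POSIXct time column.
weather <- data.frame(
  time = as.POSIXct(c("2000010100", "2000010101"), format = "%Y%m%d%H", tz = "UTC")
)

# Day of week as a labelled factor with full names (names depend on the locale).
weather$weekday <- wday(weather$time, label = TRUE, abbr = FALSE)

# Calendar week according to ISO 8601.
weather$calendar_week <- isoweek(weather$time)

weather
```

Note that under ISO 8601 the first days of January can belong to the last week of the previous year, so the result of isoweek() and week() may differ for the same date.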