library("dplyr")
library("readr")
library("lubridate")
Prepro 2: Exercise A
Task 1
Read the weather data from last week, weather.csv (source: MeteoSchweiz), into R. Make sure that the columns are formatted correctly (stn as a factor, time as POSIXct, tre200h0 as numeric).
Sample Solution
# Read the comma-separated weather data
weather <- read_delim("datasets/prepro/weather.csv", ",")
weather$stn <- as.factor(weather$stn)
# time is stored as digits in the form YYYYMMDDHH
weather$time <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")
weather
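The three conversions from Task 1 can be tried on a few rows before applying them to the full file. This is a minimal sketch on a hypothetical data frame (weather_toy and all of its values are made up, not taken from weather.csv). Like the sample solution, it assumes that time arrives as a number of the form YYYYMMDDHH. Here, tre200h0 is deliberately stored as text to demonstrate the numeric conversion; in the real file, readr may already parse it as numeric.

```r
# Hypothetical toy data, NOT read from weather.csv
weather_toy <- data.frame(
  stn = c("BER", "LUG", "BER"),
  time = c(2000010100, 2000010113, 2000010200),
  tre200h0 = c("-1.5", "0.3", "2.1"),
  stringsAsFactors = FALSE  # keep text as character on every R version
)

# Station abbreviations become a factor with one level per station
weather_toy$stn <- as.factor(weather_toy$stn)

# The number 2000010113 is turned into the string "2000010113",
# which as.POSIXct() parses with the YYYYMMDDHH format
weather_toy$time <- as.POSIXct(as.character(weather_toy$time),
                               format = "%Y%m%d%H", tz = "UTC")

# Temperatures stored as text are converted to numbers
weather_toy$tre200h0 <- as.numeric(weather_toy$tre200h0)

str(weather_toy)
```

Calling str() is a quick way to check that each column now has the intended class (factor, POSIXct, num).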
Task 2
Read in the metadata.csv dataset. Unlike weather.csv, its columns are separated by semicolons (;).
If umlauts and other special characters are not displayed correctly (e.g. the è in Genève), the character encoding is probably the cause. The file is encoded in UTF-8. If special characters are displayed incorrectly, R has not recognised this encoding, and you must specify it in the import function. How this is done depends on the import function used:
- readr package functions: locale = locale(encoding = "UTF-8")
- Base R functions: fileEncoding = "UTF-8"
Note: If you have a file and do not know how it is encoded, the following instructions for Windows, Mac and Linux will help.
Sample Solution
metadata <- read_delim("datasets/prepro/metadata.csv", ";", locale = locale(encoding = "UTF-8"))
metadata
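The two encoding options can be compared on a small file whose encoding is known. This is a minimal sketch, assuming a hypothetical semicolon-separated temporary file (not the course's metadata.csv) written as raw UTF-8 bytes and containing the è of Genève. readr::guess_encoding() is included because it estimates a file's encoding from its bytes, which can help when the encoding is unknown. Its result is only a guess and may be unreliable for very small files.

```r
library(readr)

# Hypothetical toy file, NOT the course's metadata.csv.
# Writing raw bytes guarantees that the file really is UTF-8.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw(enc2utf8("stn;Name\nGVE;Gen\u00e8ve\n")), tmp)

# If the encoding is unknown: let readr estimate it
guess_encoding(tmp)

# readr: the encoding goes into locale()
meta_readr <- read_delim(tmp, ";", locale = locale(encoding = "UTF-8"))

# Base R: the encoding goes into fileEncoding
meta_base <- read.csv(tmp, sep = ";", fileEncoding = "UTF-8")

meta_readr$Name
meta_base$Name
```

readr always returns text as UTF-8. Base R converts to the session's native encoding, so the base R result can only display è correctly in a locale that supports it.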
Task 3
Now we want to enrich the weather data set with information from metadata. However, we are only interested in the station abbreviation, the name, the x/y coordinates and the altitude above sea level. Select these columns.
Sample Solution
metadata <- metadata[, c("stn", "Name", "x", "y", "Meereshoehe")]
metadata
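The same column selection can also be written with dplyr::select(), which matches the dplyr style used in the later tasks. This is a minimal sketch on a hypothetical metadata_toy table. Its station codes and values are made up, and unused_col stands in for any column we do not need.

```r
library(dplyr)

# Hypothetical toy metadata, NOT the course data
metadata_toy <- data.frame(
  stn = c("AAA", "BBB"),
  Name = c("Station A", "Station B"),
  x = c(1, 2),
  y = c(3, 4),
  Meereshoehe = c(500, 300),
  unused_col = c("p", "q"),
  stringsAsFactors = FALSE
)

# select() keeps the named columns (unquoted), in the order given
metadata_small <- select(metadata_toy, stn, Name, x, y, Meereshoehe)
names(metadata_small)
```

Both approaches return the same columns. select() additionally accepts helpers such as starts_with(), which become useful when many columns follow a naming pattern.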
Task 4
Now the metadata can be joined to the weather data set. Which join should we use to do this? And on which attribute can we join the two data sets?
Use the join functions in dplyr (help via ?dplyr::join) to connect the weather data set and the metadata.
Sample Solution
weather <- left_join(weather, metadata, by = "stn")
weather
# Join type: Left-Join on 'weather', as we are only interested in the stations in the 'weather' dataset.
# Attribute: "stn"
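The choice of a left join can be checked on a tiny example. This is a minimal sketch with hypothetical tables (weather_toy, metadata_toy, and the station codes AAA, BBB, ZZZ are made up). It compares left_join() with inner_join(): the left join keeps every weather row and fills in NA where no metadata exists, while the inner join drops such rows.

```r
library(dplyr)

# Hypothetical toy tables, NOT the course data
weather_toy <- data.frame(
  stn = c("AAA", "AAA", "ZZZ"),
  tre200h0 = c(1.5, 2.0, -0.5),
  stringsAsFactors = FALSE
)
metadata_toy <- data.frame(
  stn = c("AAA", "BBB"),
  Name = c("Station A", "Station B"),
  stringsAsFactors = FALSE
)

# left_join(): all weather rows are kept; "ZZZ" has no metadata, so Name is NA.
# "BBB" exists only in the metadata and therefore does not appear.
joined_left <- left_join(weather_toy, metadata_toy, by = "stn")

# inner_join(): only rows whose stn appears in both tables are kept
joined_inner <- inner_join(weather_toy, metadata_toy, by = "stn")

joined_left
joined_inner
```

With a left join, no temperature measurements are lost even if a station is missing from the metadata. This is why the sample solution keeps weather on the left side.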
Task 5
Create a new month column (from time). To do this, use the lubridate::month() function.
Sample Solution
weather$month <- month(weather$time)
weather
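To see what lubridate::month() returns, it helps to apply it to a few timestamps. This is a minimal sketch on hypothetical timestamps (time_toy is made up). It shows that month() returns the month number (1 to 12) and that label = TRUE returns an ordered factor of month names, whose language depends on the system locale. A base R alternative using format() is included for comparison.

```r
library(lubridate)

# Hypothetical timestamps, NOT taken from weather.csv
time_toy <- as.POSIXct(c("2000-01-15 06:00", "2000-03-01 00:00", "2000-12-31 23:00"),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

# Month number 1 to 12, computed in the time zone of time_toy (UTC)
month(time_toy)

# Ordered factor with month abbreviations (locale-dependent)
month(time_toy, label = TRUE)

# Base R alternative: extract "%m" as text, then convert to integer
as.integer(format(time_toy, "%m"))
```

Because the month is computed in the time zone stored with the timestamps, setting tz = "UTC" in Task 1 also determines which month a measurement near midnight falls into.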
Task 6
Use the month column to calculate the average temperature per month.
Sample Solution
mean(weather$tre200h0[weather$month == 1])
## [1] -1.963239
mean(weather$tre200h0[weather$month == 2])
## [1] 0.3552632
mean(weather$tre200h0[weather$month == 3])
## [1] 2.965054
# etc. for all 12 months
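Instead of writing one mean() call per month, you can compute all monthly means at once by grouping. This is a minimal sketch on a hypothetical weather_toy table (values are made up). It shows dplyr's group_by() with summarise() and the base R equivalent tapply(). The argument na.rm = TRUE only matters if tre200h0 contains missing values. Without it, a single NA makes that month's mean NA.

```r
library(dplyr)

# Hypothetical toy data, NOT the course data
weather_toy <- data.frame(
  month = c(1, 1, 2, 2, 2),
  tre200h0 = c(-2, 0, 1, NA, 3)
)

# dplyr: one row per month with its mean temperature
monthly_mean <- weather_toy %>%
  group_by(month) %>%
  summarise(mean_temp = mean(tre200h0, na.rm = TRUE))
monthly_mean

# Base R alternative: a named vector with one element per month
by_month <- tapply(weather_toy$tre200h0, weather_toy$month, mean, na.rm = TRUE)
by_month
```

Applied to the real data, either version would produce all twelve monthly means in one step.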