Prepro 2: Exercise A

Published

February 27, 2024

library("dplyr")
library("readr")
library("lubridate")

Task 1

Read last week's weather data, weather.csv (source: MeteoSchweiz), into R. Make sure that the columns are formatted correctly (stn as a factor, time as POSIXct, tre200h0 as numeric).

Sample Solution
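weather <- read_delim("datasets/prepro/weather.csv", ",")
weather$stn <- as.factor(weather$stn)
# time is stored in the form YYYYMMDDHH, so it is converted via character
weather$time <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")
weather$tre200h0 <- as.numeric(weather$tre200h0)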
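As an alternative, the column types can be declared during the import itself. The following is a minimal sketch, assuming the file contains the three columns named in the task. The temporary file, the station code "AAA" and the two data rows are invented for illustration and are not taken from weather.csv.

```r
library("readr")

# Hypothetical two-row stand-in for weather.csv (values invented for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("stn,time,tre200h0",
             "AAA,2000010100,-2.6",
             "AAA,2000010101,-2.5"), tmp)

# Declare the column types up front instead of converting them afterwards
weather_demo <- read_delim(
  tmp,
  delim = ",",
  col_types = cols(
    stn      = col_factor(),
    time     = col_datetime(format = "%Y%m%d%H"),
    tre200h0 = col_double()
  )
)

# Inspect the resulting column classes
str(weather_demo)
```

Declaring the types at import keeps all formatting decisions in one place, whereas converting afterwards, as in the sample solution, makes each step easier to inspect.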

Task 2

Read in the metadata.csv dataset as a CSV file.

Tip

If umlauts and special characters are not displayed correctly (e.g. the è in Genève), the cause is probably the character encoding. The file is encoded in UTF-8. If special characters are displayed incorrectly, R has not recognised this encoding, and it must be specified in the import function. How this is done depends on the import function used:

  • readr functions: locale = locale(encoding = "UTF-8")
  • Base R functions: fileEncoding = "UTF-8"

Note: If you do not know how a file is encoded, instructions for determining the encoding are available for Windows, Mac and Linux.
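The encoding can also be checked from within R. The following is a minimal sketch using readr::guess_encoding(), which relies on the stringi package being installed. The temporary file and its contents are invented for illustration, and the result is a guess with a confidence score, not a guarantee.

```r
library("readr")

# Hypothetical stand-in for metadata.csv, written as UTF-8 bytes
tmp <- tempfile(fileext = ".csv")
writeLines(c("stn;Name", "GVE;Gen\u00e8ve"), tmp, useBytes = TRUE)

# Candidate encodings with a confidence score each
enc <- guess_encoding(tmp)
print(enc)

# Pass the encoding explicitly when importing
meta_demo <- read_delim(
  tmp,
  delim = ";",
  col_types = cols(.default = col_character()),
  locale = locale(encoding = "UTF-8")
)
meta_demo$Name
```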

Sample Solution
metadata <- read_delim("datasets/prepro/metadata.csv", ";", locale = locale(encoding = "UTF-8"))

Task 3

Now we want to enrich the weather data set with information from metadata. However, we are only interested in the station abbreviation, the name, the x/y coordinates and the elevation above sea level (Meereshoehe). Select these columns.

Sample Solution
metadata <- metadata[, c("stn", "Name", "x", "y", "Meereshoehe")]
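The same selection can be written with dplyr::select(). This is a minimal sketch on a made-up table: the extra column Laenge and all values are invented for illustration, and only the five column names come from the sample solution.

```r
library("dplyr")

# Hypothetical metadata table with one column we do not need (values invented)
metadata_demo <- tibble(
  stn = c("AAA", "BBB"),
  Name = c("Station A", "Station B"),
  Laenge = c(7.5, 8.1),
  x = c(600000, 650000),
  y = c(200000, 250000),
  Meereshoehe = c(450, 1320)
)

# Keep only the columns of interest, in the given order
metadata_demo <- select(metadata_demo, stn, Name, x, y, Meereshoehe)
names(metadata_demo)
```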

Task 4

Now the metadata can be connected to the weather data set. Which join should we use to do this? And which attribute can we join on?

Use the join options in dplyr (help via ?dplyr::join) to connect the weather data set and the metadata.

Sample Solution
weather <- left_join(weather, metadata, by = "stn")

# Join type: Left-Join on 'weather', as we are only interested in the stations in the 'weather' dataset.
# Attribute: "stn"
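To see why a left join fits here, the following minimal sketch uses two made-up tables in which the metadata contains a station ("CCC") that never appears in the weather data. All station codes and values are invented for illustration.

```r
library("dplyr")

# Hypothetical weather measurements for two stations
weather_demo <- tibble(
  stn = c("AAA", "AAA", "BBB"),
  tre200h0 = c(-2.6, -2.5, 1.0)
)

# Hypothetical metadata, including a station without measurements
metadata_demo <- tibble(
  stn = c("AAA", "BBB", "CCC"),
  Name = c("Station A", "Station B", "Station C")
)

# Left join: every weather row is kept and enriched with its metadata
joined <- left_join(weather_demo, metadata_demo, by = "stn")
nrow(joined) == nrow(weather_demo)

# Stations that exist only in the metadata and are therefore not in the result
unused <- anti_join(metadata_demo, weather_demo, by = "stn")
unused$stn
```

Because each station appears only once in the metadata, the join does not duplicate any weather rows.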

Task 5

Create a new month column (from time). To do this, use the lubridate::month() function.

Sample Solution
weather$month <- month(weather$time)
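As a small illustration of what lubridate::month() returns, the following sketch applies it to two made-up timestamps. By default the result is the month number. With label = TRUE it is an ordered factor of month names, whose spelling depends on the locale.

```r
library("lubridate")

# Two invented timestamps, one in January and one in March
time_demo <- as.POSIXct(c("2000-01-15 12:00", "2000-03-01 00:00"), tz = "UTC")

# Month as a number
month_num <- month(time_demo)
month_num

# Month as an ordered factor with abbreviated names
month_lab <- month(time_demo, label = TRUE)
month_lab
```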

Task 6

Use the month column to calculate the average temperature per month.

Sample Solution
mean(weather$tre200h0[weather$month == 1])
## [1] -1.963239
mean(weather$tre200h0[weather$month == 2])
## [1] 0.3552632
mean(weather$tre200h0[weather$month == 3])
## [1] 2.965054

# etc. for all 12 months
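Instead of repeating the call for each month, the means can be computed in one step with dplyr's group_by() and summarise(). This is a minimal sketch on made-up values, not the course data. The column name mean_temp is chosen here for illustration.

```r
library("dplyr")

# Hypothetical measurements for two months (values invented)
weather_demo <- tibble(
  month = c(1, 1, 2, 2),
  tre200h0 = c(-2, -4, 1, 3)
)

# One row per month with the mean temperature of that month
monthly_mean <- weather_demo %>%
  group_by(month) %>%
  summarise(mean_temp = mean(tre200h0, na.rm = TRUE))

monthly_mean
```

The argument na.rm = TRUE is an addition that skips missing measurements. Without it, a single NA in a month makes that month's mean NA.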