Read last week's weather data, weather.csv (source: MeteoSchweiz), into R. Make sure that the columns are formatted correctly (stn as a factor, time as POSIXct, tre200h0 as numeric).
Sample Solution
library("readr")
weather <- read_delim("datasets/prepro/weather.csv", ",")
weather$stn <- as.factor(weather$stn)
weather$time <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")
Read in the metadata.csv dataset as a CSV.
If umlauts and special characters are not displayed correctly (e.g. the è in Genève), the cause is probably the character encoding. The file is encoded in UTF-8. If special characters appear garbled, R has not recognised this encoding, and it must be specified in the import function. How this is done depends on the import function used:
readr (e.g. read_delim()): locale = locale(encoding = "UTF-8")
base R (e.g. read.csv()): fileEncoding = "UTF-8"
Note: If you do not know how a file is encoded, the following instructions for Windows, Mac and Linux will help.
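To illustrate the two import options, here is a minimal sketch. It does not use the course data: the temporary file, its single row and the object names (tmp, meta_readr, meta_base) are assumptions made up for this example.

```r
library("readr")

# Hypothetical one-row file, written as UTF-8 so the example needs no external data
tmp <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("stn;Name", "GVE;Gen\u00e8ve")), tmp, useBytes = TRUE)

# readr: the encoding is passed via the locale argument
meta_readr <- read_delim(tmp, delim = ";", locale = locale(encoding = "UTF-8"))

# base R: the encoding is passed via fileEncoding
meta_base <- read.csv(tmp, sep = ";", fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Both should show the accented character intact
print(meta_readr$Name)
print(meta_base$Name)
```

If the encoding is unknown, readr's guess_encoding() can suggest candidates for a file. Depending on the readr version, it may need the stringi package to be installed.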
metadata <- read_delim("datasets/prepro/metadata.csv", ";", locale = locale(encoding = "UTF-8"))
Now we want to enrich the weather dataset with information from metadata. However, we are only interested in the station abbreviation, the name, the x/y coordinates and the elevation above sea level. Select these columns.
metadata <- metadata[, c("stn", "Name", "x", "y", "Meereshoehe")]
Now the metadata can be joined to the weather dataset. Which join should we use to do this? And which attribute can we join on?
Use the join functions in dplyr (help via ?dplyr::join) to join the weather dataset and the metadata.
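To see how the join types differ, the following sketch compares three of them. The tables obs and meta and their values are hypothetical toy data, not the course datasets: one station has measurements but no metadata, and one has metadata but no measurements.

```r
library("dplyr")

# Hypothetical toy tables (assumption for illustration only)
obs <- data.frame(
  stn = c("ABO", "ABO", "XYZ"),
  tre200h0 = c(-1.5, 0.2, 3.1),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  stn = c("ABO", "DAV"),
  Name = c("Adelboden", "Davos"),
  stringsAsFactors = FALSE
)

# left_join: keeps every row of obs; unmatched stations get NA in Name
left <- left_join(obs, meta, by = "stn")

# inner_join: keeps only rows whose stn occurs in both tables
inner <- inner_join(obs, meta, by = "stn")

# full_join: keeps all rows of both tables
full <- full_join(obs, meta, by = "stn")

print(nrow(left))
print(nrow(inner))
print(nrow(full))
```

A left join with the measurements as the first argument never drops a measurement, which is usually what is wanted when attaching descriptive columns to observations.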
library("dplyr")
weather <- left_join(weather, metadata, by = "stn")
# Join type: Left-Join on 'weather', as we are only interested in the stations in the 'weather' dataset.
# Attribute: "stn"
Create a new month column (from time). To do this, use the lubridate::month() function.
library("lubridate")
weather$month <- month(weather$time)
Use the month column to calculate the average temperature per month.
mean(weather$tre200h0[weather$month == 1])
## [1] -1.963239
mean(weather$tre200h0[weather$month == 2])
## [1] 0.3552632
mean(weather$tre200h0[weather$month == 3])
## [1] 2.965054
# etc. for all 12 months
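Repeating the call for each month works, but a grouped summary can compute all monthly means in one step. The following is a minimal sketch using dplyr. The data frame toy and the column name mean_temp are assumptions for illustration; with the real data, weather would take the place of toy.

```r
library("dplyr")

# Hypothetical toy data standing in for the joined weather table
toy <- data.frame(
  month = c(1, 1, 2, 2, 3),
  tre200h0 = c(-2, 0, 1, 3, 5)
)

# One row per month, each holding the mean of that month's temperatures
monthly_means <- toy %>%
  group_by(month) %>%
  summarise(mean_temp = mean(tre200h0, na.rm = TRUE))

print(monthly_means)
```

Setting na.rm = TRUE is a design choice for this sketch: it ignores missing measurements instead of returning NA for any month that contains one.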