Read the weather data from last week, weather.csv (source: MeteoSchweiz), into R. Make sure that the columns are formatted correctly (stn as a factor, time as POSIXct, tre200h0 as numeric).
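library("readr")

weather <- read_delim("datasets/prepro/weather.csv", delim = ",")
weather$stn <- as.factor(weather$stn)
# time is stored as YYYYMMDDHH; convert it to character so that the format string is applied
weather$time <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")
# tre200h0 should already be parsed as numeric (double) by read_delim's column type guessing
weather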
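The conversions can be checked without the real file. The sketch below applies the same three steps to a small hypothetical data frame, weather_toy, which uses made-up station codes and values, and then inspects the column classes. It also shows why time goes through as.character() first: the format string is applied when parsing text, so the YYYYMMDDHH number is turned into a string before as.POSIXct() reads it.

```r
# Hypothetical stand-in for the raw import: station code as text,
# time as a YYYYMMDDHH number, temperature as a number
weather_toy <- data.frame(
  stn = c("AAA", "AAA", "BBB"),
  time = c(2000010100, 2000010101, 2000010100),
  tre200h0 = c(-1.2, -0.8, 0.4)
)

weather_toy$stn <- as.factor(weather_toy$stn)
# Parse the YYYYMMDDHH strings into date-times in UTC
weather_toy$time <- as.POSIXct(as.character(weather_toy$time),
                               format = "%Y%m%d%H", tz = "UTC")
weather_toy$tre200h0 <- as.numeric(weather_toy$tre200h0)

# First class of each column: "factor", "POSIXct", "numeric"
col_classes <- sapply(weather_toy, function(col) class(col)[1])
print(col_classes)

# Parsed time stamps, printed as readable text
time_text <- format(weather_toy$time, "%Y-%m-%d %H:%M")
print(time_text)
```

With the real data, str(weather) gives the same kind of overview of all column types at once.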
Read the metadata.csv dataset into R. Note that its columns are separated by semicolons rather than commas.
If umlauts and other special characters are not displayed correctly (e.g. the è in Genève), this is probably due to the character encoding. The file is encoded in UTF-8. If special characters are displayed incorrectly, R has not recognised this encoding, and it must be specified in the import function. How this is done depends on the import function used:
- readr: locale = locale(encoding = "UTF-8")
- base R (read.csv, read.table): fileEncoding = "UTF-8"
Note: If you do not know how a file is encoded, the following instructions for Windows, Mac and Linux will help.
metadata <- read_delim("datasets/prepro/metadata.csv", delim = ";", locale = locale(encoding = "UTF-8"))
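From within R, one rough way to check a file's encoding is to test whether its raw bytes form valid UTF-8. The sketch below uses base R's readBin() and validUTF8(). The helper looks_like_utf8() and the two temporary test files are hypothetical. A TRUE result is only a hint, because a file that contains nothing but ASCII characters is also valid UTF-8, whatever encoding it was saved with.

```r
# Hypothetical helper: do the raw bytes of a file form valid UTF-8?
looks_like_utf8 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# "Genève" encoded in UTF-8 (è = bytes C3 A8) and in Latin-1 (è = byte E8)
utf8_bytes   <- as.raw(c(0x47, 0x65, 0x6e, 0xc3, 0xa8, 0x76, 0x65))
latin1_bytes <- as.raw(c(0x47, 0x65, 0x6e, 0xe8, 0x76, 0x65))

utf8_file   <- tempfile(fileext = ".csv")
latin1_file <- tempfile(fileext = ".csv")
writeBin(utf8_bytes, utf8_file)
writeBin(latin1_bytes, latin1_file)

looks_like_utf8(utf8_file)    # TRUE
looks_like_utf8(latin1_file)  # FALSE: E8 is not followed by UTF-8 continuation bytes
```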
Now we want to enrich the weather data set with information from metadata. However, we are only interested in the station abbreviation, the name, the x/y coordinates and the altitude above sea level (Meereshoehe). Select these columns.
metadata <- metadata[, c("stn", "Name", "x", "y", "Meereshoehe")]
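Selecting by name keeps only the listed columns, in the listed order. A useful side effect is that a misspelled name stops with an error instead of silently dropping a column. The sketch below shows both behaviours on a hypothetical metadata_toy data frame with made-up values and one extra column, extra_info, that should be dropped. The dplyr equivalent would be select(metadata, stn, Name, x, y, Meereshoehe).

```r
# Hypothetical metadata with one column we do not need
metadata_toy <- data.frame(
  stn = c("AAA", "BBB"),
  Name = c("Station A", "Station B"),
  x = c(100, 200),
  y = c(300, 400),
  Meereshoehe = c(550, 420),
  extra_info = c("foo", "bar")
)

keep <- c("stn", "Name", "x", "y", "Meereshoehe")
metadata_small <- metadata_toy[, keep]
print(names(metadata_small))

# A misspelled column name raises an error; capture its message here
res <- tryCatch(metadata_toy[, c("stn", "Meereshohe")],
                error = function(e) conditionMessage(e))
print(res)
```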
Now the metadata can be joined to the weather data set. Which join should we use to do this? And on which attribute can we join?
Use the join functions in dplyr (see the help page via ?dplyr::join) to connect the weather data set and the metadata.
library("dplyr")

weather <- left_join(weather, metadata, by = "stn")

# Join type: left join with 'weather' as the left table, as we are only interested in the stations in the 'weather' dataset.
# Attribute: "stn"
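The following sketch shows why a left join fits here. It uses two hypothetical toy tables and base R's merge(), chosen only so the example runs without dplyr. With all.x = TRUE, merge() keeps every row of the first table, as left_join() does, although merge() sorts the result by the key. Station "CCC" has weather data but no metadata, and "DDD" has metadata but no weather data.

```r
weather_toy <- data.frame(
  stn = c("AAA", "AAA", "CCC"),
  tre200h0 = c(1.5, 2.0, -0.5)
)
metadata_toy <- data.frame(
  stn = c("AAA", "DDD"),
  Name = c("Station A", "Station D")
)

# Left join: every weather row is kept; CCC gets NA for Name, DDD is dropped
left <- merge(weather_toy, metadata_toy, by = "stn", all.x = TRUE)
print(left)

# Inner join for comparison: the weather row of CCC disappears
inner <- merge(weather_toy, metadata_toy, by = "stn")
print(nrow(inner))
```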
Create a new month column (from time). To do this, use the lubridate::month() function.
library("lubridate")

weather$month <- month(weather$time)
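lubridate::month() returns the month number (1 to 12) of each time stamp. The base R sketch below derives the same numbers with format(..., "%m") for a few hypothetical time stamps. The month is taken in the time zone stored in the POSIXct object, which is UTC here, as in the import step. So a reading at 23:00 UTC on 31 January counts as January, even though it is already 1 February in Swiss local time (CET, UTC+1).

```r
# Hypothetical time stamps, parsed in UTC as in the import step
times <- as.POSIXct(c("2000-01-31 23:00", "2000-02-01 00:00", "2000-12-15 12:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")

# Month number of each time stamp, evaluated in UTC
month_num <- as.integer(format(times, "%m"))
print(month_num)  # 1 2 12
```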
Use the month column to calculate the average temperature per month.
mean(weather$tre200h0[weather$month == 1])
## [1] -1.963239
mean(weather$tre200h0[weather$month == 2])
## [1] 0.3552632
mean(weather$tre200h0[weather$month == 3])
## [1] 2.965054
# etc. for all 12 months
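Instead of repeating the line for each of the 12 months, all monthly means can be computed in one call. A sketch using base R's tapply() on hypothetical toy values follows. With the real data, the call would be tapply(weather$tre200h0, weather$month, mean), with na.rm = TRUE added if tre200h0 contains missing values. In dplyr, group_by(month) followed by summarise(mean(tre200h0)) gives the same numbers as a data frame.

```r
# Hypothetical hourly temperatures and their months
temps  <- c(-2.0, 0.0, 1.0, 3.0, 5.0)
months <- c(1, 1, 2, 2, 2)

# One mean per month; the names of the result are the month numbers
monthly_mean <- tapply(temps, months, mean)
print(monthly_mean)
```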