Prepro 3: Exercise

Published

March 5, 2024

Task 1

You have a dataset, sensors_long.csv, with temperature values from three different sensors. Import it as a csv into R (as sensors_long).

Sample Solution
library("readr")

sensors_long <- read_delim("datasets/prepro/sensors_long.csv", ",")

Task 2

Group sensors_long by the column that holds the sensor information, using the function group_by, and calculate the average temperature for each sensor with summarise. Note: Both functions are part of the dplyr package.

The output will look like this:

Sample Solution
library("dplyr")

sensors_long |>
  group_by(name) |>
  summarise(temp_mean = mean(value, na.rm = TRUE))
## # A tibble: 3 × 2
##   name    temp_mean
##   <chr>       <dbl>
## 1 sensor1      14.7
## 2 sensor2      12.0
## 3 sensor3      14.4

Task 3

Create a new convenience variable, month, for sensors_long (Tip: use the month function from lubridate). Now group by month and sensor and calculate the mean temperature.

Sample Solution
library("lubridate")

sensors_long |>
  mutate(month = month(Datetime)) |>
  group_by(month, name) |>
  summarise(temp_mean = mean(value, na.rm = TRUE))
## # A tibble: 6 × 3
## # Groups:   month [2]
##   month name    temp_mean
##   <dbl> <chr>       <dbl>
## 1    10 sensor1     14.7 
## 2    10 sensor2     12.7 
## 3    10 sensor3     14.4 
## 4    11 sensor1    NaN   
## 5    11 sensor2      8.87
## 6    11 sensor3    NaN
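
The NaN entries for month 11 are worth a brief look. A plausible reading, assuming sensor1 and sensor3 have only missing values in November, is that na.rm = TRUE removes every value in those groups and mean() of an empty vector returns NaN. The following sketch reproduces this on a small made-up table (the values are hypothetical, not taken from sensors_long):

```r
library("dplyr")

# Hypothetical mini version of sensors_long:
# sensor1 has only a missing value in month 11
demo <- tibble(
  month = c(10, 10, 11, 11),
  name  = c("sensor1", "sensor2", "sensor1", "sensor2"),
  value = c(14.0, 12.0, NA, 9.0)
)

demo_summary <- demo |>
  group_by(month, name) |>
  summarise(temp_mean = mean(value, na.rm = TRUE), .groups = "drop")

demo_summary

# The same effect in isolation: after removing the NAs nothing is left,
# and the mean of an empty vector is NaN
mean(c(NA_real_, NA_real_), na.rm = TRUE)
```

The .groups = "drop" argument is optional; it only returns an ungrouped tibble and silences the regrouping message.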

Task 4

Now download the weather.csv dataset (source: MeteoSwiss) and import it as a CSV with the correct column types (time as POSIXct, tre200h0 as double).

Sample Solution
weather <- read_delim("datasets/prepro/weather.csv")

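# If time was read in as a number, convert it to character first;
# otherwise as.POSIXct ignores the format and treats it as seconds since 1970
weather$time2 <- as.POSIXct(as.character(weather$time), format = "%Y%m%d%H", tz = "UTC")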
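
As an alternative, the column types can be set during the import itself via the col_types argument of read_delim. The sketch below is a minimal illustration, not the course's solution. It assumes that time is stored as digits in the form YYYYMMDDHH (as implied by the format string above) and uses two made-up rows passed as literal text instead of the real file; weather_demo and csv_text are hypothetical names.

```r
library("readr")

# Two made-up rows in the assumed layout of weather.csv
csv_text <- "time,tre200h0\n2000010100,-2.1\n2000010101,-2.5\n"

weather_demo <- read_delim(
  I(csv_text), # I() marks the string as literal data (readr >= 2.0)
  delim = ",",
  col_types = cols(
    time = col_datetime(format = "%Y%m%d%H"), # parsed directly as POSIXct (UTC)
    tre200h0 = col_double()
  )
)

str(weather_demo)
```

With the real file, the path from the sample solution would take the place of I(csv_text).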

Task 5

Now create a convenience variable for the calendar week for each measurement (lubridate::isoweek). Then calculate the average temperature value for each calendar week.

Sample Solution
weather_summary <- weather |>
  mutate(week = isoweek(time2)) |>
  group_by(week) |>
  summarise(
    temp_mean = mean(tre200h0, na.rm = TRUE)
  )

Next, you can visualise the result using the following function:

Sample Solution
plot(weather_summary$week, weather_summary$temp_mean, type = "l")
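
Note that lubridate offers two week functions that do not always agree: isoweek follows ISO 8601, where the first days of January can still belong to the last week of the previous year, while week simply counts 7-day blocks starting on 1 January. A small sketch of the difference, using 1 January 2000 (a Saturday) as an example date:

```r
library("lubridate")

new_year_2000 <- as.Date("2000-01-01") # a Saturday

week(new_year_2000)    # 1: first 7-day block of the year
isoweek(new_year_2000) # 52: ISO 8601 assigns this day to the last week of 1999
isoyear(new_year_2000) # 1999: the ISO year that this week belongs to
```

This matters as soon as weeks are combined with years: isoweek pairs naturally with isoyear, and week with year.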

Task 6

In the previous task, we calculated the average temperature per calendar week across all years (2000 and 2001). To compare the years with each other, we have to create the year as an additional convenience variable and include it in the grouping. Try this with the weather data and then visualise the output.

Sample Solution
weather_summary2 <- weather |>
  mutate(
    # use time2, the POSIXct column created in Task 4 (time holds the raw values)
    week = week(time2),
    year = year(time2)
  ) |>
  group_by(year, week) |>
  summarise(
    temp_mean = mean(tre200h0, na.rm = TRUE)
  )
Sample Solution
plot(weather_summary2$week, weather_summary2$temp_mean, type = "l")
Figure 9.1: Base plot does not like long tables and makes a continuous line out of the two years
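
One possible way around this in base R is to draw a separate line for each year. The sketch below is only an illustration under assumptions: it uses a small made-up data frame (demo, with hypothetical values) in the same long layout as weather_summary2, sets up an empty plot, and then adds one line per year.

```r
# Made-up weekly means for two years; stands in for weather_summary2
demo <- data.frame(
  year = rep(c(2000, 2001), each = 4),
  week = rep(1:4, times = 2),
  temp_mean = c(1.2, 0.4, 2.1, 3.0, -0.5, 1.1, 1.8, 2.6)
)

# Empty plot that spans the range of both years
plot(demo$week, demo$temp_mean, type = "n", xlab = "week", ylab = "temp_mean")

# One line per year, so the years are not joined end to end
years <- unique(demo$year)
for (i in seq_along(years)) {
  one_year <- demo[demo$year == years[i], ]
  lines(one_year$week, one_year$temp_mean, col = i)
}

legend("topleft", legend = years, col = seq_along(years), lty = 1)
```

Reshaping the table so that each year gets its own column would be another route to the same plot.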