This demo’s source code can also be downloaded as an R Script (right click → Save Target As..)
There are two different numeric data types in R:
- double: floating-point number (e.g. 10.3, 7.3)
- integer: whole number (e.g. 10, 7)
A double / floating-point number is assigned to a variable as follows:
x <- 10.3
x
[1] 10.3
class(x)
[1] "numeric"
Either <- or = can be used for assignment. However, the latter is easily confused with ==.
y = 7.3
y
[1] 7.3
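As a small illustration of why = and == are easy to mix up, the sketch below assigns with = and then compares with ==. The values are only examples.

```r
# a single = assigns a value to y
y = 7.3

# a double == compares two values and returns TRUE or FALSE
same <- y == 7.3
different <- y == 8

same       # TRUE
different  # FALSE
```

Using <- for assignment keeps the two operations visually distinct.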
A number is only stored as an integer if it is explicitly defined as one (using as.integer() or the suffix L).
d <- 8L
class(d)
[1] "integer"
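The following sketch contrasts a whole number typed without the L suffix with the two explicit ways of creating an integer. The variable names are chosen for illustration only.

```r
# a whole number without the L suffix is still stored as a double
a <- 8
class(a)   # "numeric"
typeof(a)  # "double"

# with the L suffix or as.integer() the value is stored as an integer
b <- 8L
b2 <- as.integer(8)
typeof(b)          # "integer"
identical(b, b2)   # TRUE

# as.integer() drops the decimal part instead of rounding
truncated <- as.integer(10.9)
truncated  # 10
```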
sunny <- FALSE
dry <- TRUE
sunny & dry
[1] FALSE
e <- 3
f <- 6
e > f
[1] FALSE
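Beyond & and >, a few other base R operators return logical values. The sketch below reuses the variables from above to show "or", negation and further comparisons.

```r
sunny <- FALSE
dry <- TRUE

either <- sunny | dry   # TRUE: at least one of the two is TRUE
not_sunny <- !sunny     # TRUE: negation of FALSE

e <- 3
f <- 6

smaller <- e < f    # TRUE
equal <- e == f     # FALSE
unequal <- e != f   # TRUE

# the result of a comparison is itself of class "logical"
class(smaller)
```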
Character strings contain text.
fname <- "Andrea"
lname <- "Muster"
class(fname)
[1] "character"
Connecting / concatenating character strings
paste(fname, lname)
[1] "Andrea Muster"
paste(fname, lname, sep = ",")
[1] "Andrea,Muster"
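A few related base R functions are sketched below: paste0() concatenates without a separator, while nchar() and toupper() inspect and transform a string.

```r
fname <- "Andrea"
lname <- "Muster"

# paste0() is paste() with sep = ""
joined <- paste0(fname, lname)
joined  # "AndreaMuster"

# the separator can be any string
sorted_name <- paste(lname, fname, sep = ", ")
sorted_name  # "Muster, Andrea"

# number of characters and upper case
n_chars <- nchar(fname)
n_chars          # 6
toupper(lname)   # "MUSTER"
```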
In most parts of the world, we use the Gregorian Calendar to communicate a point in time. In this system, we track time as years, months, days, hours, minutes and seconds after a specific event (Anno Domini, “in the year of the Lord”).
R, like most computer systems, does not store date / time information as years, months, days, etc. Instead, R stores the number of seconds since a given date (January 1st, 1970, 00:00 UTC, also called the Unix epoch). This information is stored using the class POSIXct, which also helps us convert this number of seconds into more human-readable information. On 01.02.2024 at 13:45 (Central European Time), 1’706’791’500 seconds had passed since the Unix epoch, so to store this timestamp, R stores the number 1’706’791’500.
# We may have a timestamp saved as a character string
today_txt <- "2024-02-01 13:45:00"

# as.POSIXct converts the string to POSIXct:
today_posixct <- as.POSIXct(today_txt)

# When printing a posixct date to the console, it is human readable
today_posixct
[1] "2024-02-01 13:45:00 CET"
# To see the internally stored value (# of seconds), convert it to numeric:
as.numeric(today_posixct)
[1] 1706791500
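The round trip between a timestamp and its number of seconds can be sketched as follows. The time zone is set explicitly here (an assumption, "Europe/Zurich") so that the result does not depend on the settings of the computer running the code.

```r
# explicit time zone, so the result is the same on every computer
ts <- as.POSIXct("2024-02-01 13:45:00", tz = "Europe/Zurich")

# number of seconds since the Unix epoch
secs <- as.numeric(ts)
secs  # 1706791500

# convert the number of seconds back into a POSIXct
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Zurich")
back_txt <- format(back, "%Y-%m-%d %H:%M:%S")
back_txt  # "2024-02-01 13:45:00"

# the same instant displayed in UTC is one hour earlier
utc_txt <- format(ts, "%Y-%m-%d %H:%M:%S", tz = "UTC")
utc_txt  # "2024-02-01 12:45:00"

# the epoch itself corresponds to second 0
epoch_secs <- as.numeric(as.POSIXct("1970-01-01 00:00:00", tz = "UTC"))
epoch_secs  # 0
```

The stored number is the same regardless of the time zone used for display; only the printed representation changes.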
If the character string is delivered in the above format (year-month-day hour:minute:second), as.POSIXct knows how to calculate the number of seconds since the Unix epoch. However, if the format is different, we have to tell R how to read our timestamp. This requires a special syntax, which is described in ?strptime.
date_txt <- "01.10.2017 15:15"

# converts character to POSIXct:
as.POSIXct(date_txt)
Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format
date_posix <- as.POSIXct(date_txt, format = "%d.%m.%Y %H:%M")

date_posix
[1] "2017-10-01 15:15:00 CEST"
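To illustrate how the format codes map onto a string, the sketch below reads the same point in time from three differently formatted strings. The strings and the "UTC" time zone are assumptions made for this example.

```r
# each %-code in `format` matches one part of the string (see ?strptime)
a <- as.POSIXct("2017-10-01 15:15", format = "%Y-%m-%d %H:%M", tz = "UTC")
b <- as.POSIXct("01.10.2017 15:15", format = "%d.%m.%Y %H:%M", tz = "UTC")
c3 <- as.POSIXct("10/01/2017 15:15:00", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# all three describe the same instant
a == b   # TRUE
b == c3  # TRUE

# when a format is supplied but does not match, the result is NA (no error)
wrong <- as.POSIXct("01.10.2017 15:15", format = "%Y-%m-%d %H:%M", tz = "UTC")
is.na(wrong)  # TRUE
```

Because a mismatched format silently yields NA, it is worth checking the converted values before using them further.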
Theoretically, strftime can also be used to extract specific components from a date. However, the functions from lubridate are much simpler and we recommend you use these. Note how strftime always returns strings while lubridate returns more useful data types such as integers or factors.
strftime(date_posix, format = "%m")
strftime(date_posix, format = "%b")
strftime(date_posix, format = "%B")
## [1] "10"
## [1] "Oct"
## [1] "October"
library("lubridate")
month(date_posix)
month(date_posix, label = TRUE, abbr = TRUE)
month(date_posix, label = TRUE, abbr = FALSE)
## [1] 10
## [1] Oct
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
## [1] October
## 12 Levels: January < February < March < April < May < June < ... < December
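Other components can be extracted in the same way. The sketch below assumes the lubridate package is installed and uses an explicit "UTC" time zone for reproducibility; it also shows dmy_hm(), a lubridate parser that reads day-month-year strings without a format string.

```r
library(lubridate)  # assumes the lubridate package is installed

date_posix <- as.POSIXct("01.10.2017 15:15", format = "%d.%m.%Y %H:%M", tz = "UTC")

# each accessor returns a number rather than a string
year(date_posix)    # 2017
day(date_posix)     # 1
hour(date_posix)    # 15
minute(date_posix)  # 15

# dmy_hm() expects the order day, month, year, hour, minute
parsed <- dmy_hm("01.10.2017 15:15", tz = "UTC")
parsed == date_posix  # TRUE
```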
Handling date / time is tricky. We recommend the following practices to make life easier:
- [...] 15:45 as 15.75) in a numeric data type.
- Use lubridate rather than strftime()
Using c(), a set of values of the same data type can be assigned to a variable (as a vector).
vec <- c(10, 20, 33, 42, 54, 66, 77)
vec
[1] 10 20 33 42 54 66 77
# to extract the 5th element
vec[5]
[1] 54
# to extract elements 2 to 4
vec[2:4]
[1] 20 33 42
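A few further ways of indexing a vector are sketched below, together with what happens when values of different data types are combined with c(). The variable names are illustrative.

```r
vec <- c(10, 20, 33, 42, 54, 66, 77)

# a negative index drops the element at that position
without_first <- vec[-1]
without_first  # 20 33 42 54 66 77

# a vector of positions extracts several elements at once
ends <- vec[c(1, 7)]
ends  # 10 77

# a logical condition keeps the elements for which it is TRUE
large <- vec[vec > 50]
large  # 54 66 77

# since a vector holds a single data type, mixed input is converted
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
mixed         # "1" "a" "TRUE"
```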
A list is a collection of objects that do not need to be the same data type.
mylist <- list("q", TRUE, 3.14)
The individual elements in a list can also have assigned names.
mylist2 <- list(fav_letter = "q", fav_boolean = TRUE, fav_number = 3.14)

mylist2
$fav_letter
[1] "q"
$fav_boolean
[1] TRUE
$fav_number
[1] 3.14
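Individual elements of a named list can be accessed in several ways. The sketch below uses base R only and shows the difference between $, double brackets and single brackets.

```r
mylist2 <- list(fav_letter = "q", fav_boolean = TRUE, fav_number = 3.14)

# $ and [[ ]] return the element itself
num <- mylist2$fav_number
num  # 3.14
letter <- mylist2[["fav_letter"]]
letter  # "q"

# single brackets return a (shorter) list, not the element
sub <- mylist2["fav_letter"]
class(sub)  # "list"

# every element keeps its own data type
types <- sapply(mylist2, class)
types
```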
If each entry in a list has the same length, the list can also be represented as a table, which is called a data frame in R.
# note how the names become column names
as.data.frame(mylist2)
  fav_letter fav_boolean fav_number
1          q        TRUE       3.14
The data.frame function allows a table to be created without first having to create a list.
df <- data.frame(
  City = c("Zurich", "Geneva", "Basel", "Bern", "Lausanne"),
  Arrival = c(
    "1.1.2017 10:10", "5.1.2017 14:45",
    "8.1.2017 13:15", "17.1.2017 18:30", "22.1.2017 21:05"
  )
)
str(df)
'data.frame': 5 obs. of 2 variables:
$ City : chr "Zurich" "Geneva" "Basel" "Bern" ...
$ Arrival: chr "1.1.2017 10:10" "5.1.2017 14:45" "8.1.2017 13:15" "17.1.2017 18:30" ...
The $ symbol can be used to query data:
df$City
[1] "Zurich" "Geneva" "Basel" "Bern" "Lausanne"
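Besides $, base R offers bracket notation for querying a data frame. The sketch below uses a shortened version of the table (three rows, an assumption to keep the example compact).

```r
df <- data.frame(
  City = c("Zurich", "Geneva", "Basel"),
  Arrival = c("1.1.2017 10:10", "5.1.2017 14:45", "8.1.2017 13:15")
)

# [[ ]] returns a column as a vector, just like $
cities <- df[["City"]]
cities  # "Zurich" "Geneva" "Basel"

# [row, column] returns a single cell
second_city <- df[2, "City"]
second_city  # "Geneva"

# a logical condition in the row position filters rows
basel <- df[df$City == "Basel", ]
basel$Arrival  # "8.1.2017 13:15"

# dimensions of the table
nrow(df)  # 3
ncol(df)  # 2
```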
New columns can be added and existing ones can be changed:
df$Residents <- c(400000, 200000, 175000, 14000, 130000)
# A tibble: 5 × 3
  City     Arrival         Residents
  <chr>    <chr>               <dbl>
1 Zurich   1.1.2017 10:10     400000
2 Geneva   5.1.2017 14:45     200000
3 Basel    8.1.2017 13:15     175000
4 Bern     17.1.2017 18:30     14000
5 Lausanne 22.1.2017 21:05    130000
We need to convert the Arrival time to a time format (POSIXct).
# first, test the output of the "as.POSIXct"-function
as.POSIXct(df$Arrival, format = "%d.%m.%Y %H:%M")
[1] "2017-01-01 10:10:00 CET" "2017-01-05 14:45:00 CET"
[3] "2017-01-08 13:15:00 CET" "2017-01-17 18:30:00 CET"
[5] "2017-01-22 21:05:00 CET"
# if it works, we can save the output to a new column
df$Arrival_ct <- as.POSIXct(df$Arrival, format = "%d.%m.%Y %H:%M")

# We *could* overwrite the old column, but this is a destructive operation!
These columns can now help to create convenience variables. E.g., the day of the week of the arrival can be derived from the Arrival_ct column.
df$Arrival_day <- wday(df$Arrival_ct, label = TRUE, week_start = 1)

df$Arrival_day
[1] Sun Thu Sun Tue Sun
Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun
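Other convenience variables can be derived in the same manner. The sketch below assumes the lubridate package is installed, uses a shortened table with an explicit "UTC" time zone, and introduces the column names Arrival_hour, Weekend and Hours_since_first purely for illustration.

```r
library(lubridate)  # assumes the lubridate package is installed

df <- data.frame(
  City = c("Zurich", "Geneva", "Basel"),
  Arrival = c("1.1.2017 10:10", "5.1.2017 14:45", "8.1.2017 13:15")
)
df$Arrival_ct <- as.POSIXct(df$Arrival, format = "%d.%m.%Y %H:%M", tz = "UTC")

# hour of the day of each arrival
df$Arrival_hour <- hour(df$Arrival_ct)
df$Arrival_hour  # 10 14 13

# with week_start = 1, Monday is 1 and Sunday is 7, so 6 and 7 are the weekend
df$Weekend <- wday(df$Arrival_ct, week_start = 1) >= 6
df$Weekend  # TRUE FALSE TRUE

# hours elapsed since the first arrival
df$Hours_since_first <- as.numeric(
  difftime(df$Arrival_ct, df$Arrival_ct[1], units = "hours")
)
```

Each new column is computed from Arrival_ct, so the original Arrival strings stay untouched.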