Analysing sub-daily time series data

Earo Wang

Oct 12, 2017

Slides on http://bit.ly/subdaily-vis

1 / 35

Pedestrian counts 🚶‍♀️

- sensors

library(sugrrants)
library(tidyverse)
library(ggmap)
sensor_loc <- rwalkr::pull_sensor()
qmplot(x = Longitude, y = Latitude, data = sensor_loc,
  colour = I("#d95f02"), size = I(4))

2 / 35

Pedestrian counts 🚶‍♀️

- sensors

sensors <- c("State Library", "Flagstaff Station",
  "Flinders Street Station Underpass")
sensor_loc %>%
  mutate(Selected = Sensor %in% sensors) %>%
  qmplot(
    x = Longitude, y = Latitude, data = .,
    colour = Selected, shape = Selected, size = I(4)
  ) +
  scale_colour_brewer(palette = "Dark2")

3 / 35

Pedestrian counts 🚶‍♀️

- sensors

- the data

pedestrian <- as_tibble(rwalkr::run_melb(year = 2016))
pedestrian
#> # A tibble: 377,712 x 5
#> Sensor Date_Time Date
#> <chr> <dttm> <date>
#> 1 Chinatown-Lt Bourke St (South) 2016-01-01 2016-01-01
#> 2 Waterfront City 2016-01-01 2016-01-01
#> 3 Lygon St (East) 2016-01-01 2016-01-01
#> 4 Town Hall (West) 2016-01-01 2016-01-01
#> 5 Monash Rd-Swanston St (West) 2016-01-01 2016-01-01
#> 6 Collins Place (South) 2016-01-01 2016-01-01
#> 7 Spencer St-Collins St (North) 2016-01-01 2016-01-01
#> 8 Flinders Street Station Underpass 2016-01-01 2016-01-01
#> 9 Birrarung Marr 2016-01-01 2016-01-01
#> 10 QV Market-Elizabeth St (West) 2016-01-01 2016-01-01
#> # ... with 377,702 more rows, and 2 more variables:
#> # Time <int>, Count <int>
subdat <- pedestrian %>%
  filter(Sensor %in% sensors) %>%
  mutate(Day = wday2(Date, label = TRUE))
4 / 35

Conventional displays

- time series plot

5 / 35

Conventional displays

- time series plot

- faceted display

6 / 35

sugrrants

calendar-based visualisation

7 / 35
8 / 35
9 / 35

Calendar-based vis

- rearrange

The frame_calendar() function

flinders <- subdat %>%
  filter(Sensor == "Flinders Street Station Underpass") %>%
  mutate(Holiday = Date %in% au_holiday(2016)$date)
flinders_cal <- flinders %>%
  frame_calendar(x = Time, y = Count, date = Date)
flinders_cal
#> # A tibble: 8,784 x 9
#> Sensor Date_Time
#> * <chr> <dttm>
#> 1 Flinders Street Station Underpass 2016-01-01 00:00:00
#> 2 Flinders Street Station Underpass 2016-01-01 01:00:00
#> 3 Flinders Street Station Underpass 2016-01-01 02:00:00
#> 4 Flinders Street Station Underpass 2016-01-01 03:00:00
#> 5 Flinders Street Station Underpass 2016-01-01 04:00:00
#> 6 Flinders Street Station Underpass 2016-01-01 05:00:00
#> 7 Flinders Street Station Underpass 2016-01-01 06:00:00
#> 8 Flinders Street Station Underpass 2016-01-01 07:00:00
#> 9 Flinders Street Station Underpass 2016-01-01 08:00:00
#> 10 Flinders Street Station Underpass 2016-01-01 09:00:00
#> # ... with 8,774 more rows, and 7 more variables:
#> # Time <int>, Count <int>, Day <ord>, Holiday <lgl>,
#> # Date <date>, .Time <dbl>, .Count <dbl>
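
A rough idea of what the rearrangement does, as a base R sketch. This is an illustration only and not the sugrrants implementation: toy_frame_calendar() and rescale01() are hypothetical helpers, the sketch handles a single month, weeks are assumed to start on Monday, and both variables share one global scale.

```r
# Illustrative sketch only: NOT how sugrrants implements frame_calendar().
# Rescale a numeric vector to [0, 1]; constant vectors map to 0.5.
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0.5, length(v)))
  (v - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical helper: place each observation inside its day cell of a
# single-month calendar grid (columns = weekdays, rows = weeks).
toy_frame_calendar <- function(time, count, date, width = 0.95, height = 0.95) {
  mday <- as.integer(format(date, "%d"))
  first <- as.Date(format(date, "%Y-%m-01"))
  wday <- as.integer(format(date, "%u"))        # 1 = Monday, ..., 7 = Sunday
  first_wday <- as.integer(format(first, "%u"))
  week_row <- (mday + first_wday - 2) %/% 7 + 1 # 1 = first calendar row
  data.frame(
    .Time  = (wday - 1) + rescale01(time) * width,
    .Count = -week_row + rescale01(count) * height
  )
}

d <- rep(as.Date(c("2016-01-01", "2016-01-04")), each = 2)
toy <- toy_frame_calendar(
  time = c(0, 23, 0, 23), count = c(0, 10, 5, 10), date = d
)
toy
```

Each day becomes a small cell on a shared canvas, so an ordinary ggplot2 line layer grouped by Date draws one glyph per day.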
10 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

p_flinders <- flinders_cal %>%
  ggplot(aes(
    x = .Time, y = .Count, group = Date, colour = Holiday
  )) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2")
p_flinders

11 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

prettify(p_flinders)

12 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

The args of frame_calendar()

frame_calendar(
  data, x, y, date, calendar = "monthly", dir = "h",
  sunday = FALSE, nrow = NULL, ncol = NULL, polar = FALSE,
  scale = "fixed", width = 0.95, height = 0.95
)
  • x, y: unquoted (or bare) variables mapped to the x and y axes.
  • date: a Date variable mapped to dates in the calendar.
  • calendar: type of calendar, one of "monthly", "weekly", or "daily".
  • sunday: FALSE starts each week on Monday; TRUE starts it on Sunday.
  • nrow, ncol: number of rows and columns for the "monthly" calendar layout.
  • scale: one of "fixed", "free", "free_wday", or "free_mday".
13 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Weekly calendar

flinders_weekly <- flinders %>%
  frame_calendar(
    x = Time, y = Count, date = Date, calendar = "weekly"
  )
p_flinders_weekly <- flinders_weekly %>%
  ggplot(aes(
    x = .Time, y = .Count, group = Date, colour = Holiday
  )) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2")
prettify(p_flinders_weekly)
14 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Daily calendar

flinders_daily <- flinders %>%
  frame_calendar(
    x = Time, y = Count, date = Date, calendar = "daily"
  )
p_flinders_daily <- flinders_daily %>%
  ggplot(aes(
    x = .Time, y = .Count, group = Date, colour = Holiday
  )) +
  geom_line() +
  scale_colour_brewer(palette = "Dark2")
prettify(p_flinders_daily)
16 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Local scale when scale = "free"

# calendar plot for flinders street station using local scale
flinders_cal_free <- flinders %>%
  frame_calendar(
    x = Time, y = Count, date = Date, scale = "free"
  )
p_flinders_free <- flinders_cal_free %>%
  ggplot(aes(x = .Time, y = .Count, group = Date)) +
  geom_line()
prettify(p_flinders_free)
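
To see why a local scale changes the picture, here is a small base R sketch contrasting a global rescale with a per-day rescale. It assumes that "fixed" means one range for the whole series and "free" means each day uses its own range; rescale01() is a hypothetical helper and the numbers are made up.

```r
# Sketch of global vs. local scaling (assumed meaning of "fixed" vs. "free").
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

count <- c(10, 20, 30, 100, 200, 300)   # made-up counts: a quiet day, then a busy day
date  <- rep(c("2016-01-01", "2016-01-02"), each = 3)

fixed <- rescale01(count)                    # one range for the whole series
free  <- ave(count, date, FUN = rescale01)   # each day rescaled on its own

round(cbind(fixed, free), 2)
```

Under the global scale the quiet day is squashed near zero; under the local scale both days fill their cells, which shows daily shape but hides differences in magnitude between days.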
18 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Lagged scatterplot

flinders_cal_day <- flinders %>%
  mutate(Lagged_Count = lag(Count)) %>%
  frame_calendar(
    x = Lagged_Count, y = Count, date = Date,
    width = 0.95, height = 0.8
  )
p_flinders_day <- flinders_cal_day %>%
  ggplot(aes(x = .Lagged_Count, y = .Count, group = Date)) +
  geom_point(size = 0.7, alpha = 0.6)
prettify(p_flinders_day)
20 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Work with group_by()

facet_cal <- subdat %>%
  group_by(Sensor) %>%
  frame_calendar(
    x = Time, y = Count, date = Date, nrow = 2
  )
p_facet <- facet_cal %>%
  ggplot(aes(x = .Time, y = .Count, group = Date)) +
  geom_line(aes(colour = Sensor)) +
  facet_grid(
    Sensor ~ .,
    labeller = labeller(Sensor = label_wrap_gen(20))
  ) +
  scale_colour_brewer(
    palette = "Dark2",
    guide = guide_legend(title = "Sensor")
  )
prettify(p_facet, label = NULL)
22 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

Boxplots

# boxplots of hourly counts across all the sensors in December 2016
pedestrian_dec <- pedestrian %>%
  filter(Date >= as.Date("2016-12-01")) %>%
  frame_calendar(
    x = Time, y = Count, date = Date,
    width = 0.97, height = 0.97
  )
p_boxplot <- pedestrian_dec %>%
  ggplot() +
  geom_boxplot(
    aes(x = .Time, y = .Count, group = Date_Time),
    outlier.size = 0.8, width = 0.005,
    position = "identity", colour = "grey30"
  ) +
  geom_smooth(
    aes(.Time, .Count, group = Date),
    se = FALSE, method = "loess"
  )
prettify(p_boxplot, label = c("label", "text", "text2"))
24 / 35

Calendar-based vis

- rearrange

- ggplot2 vis

- variations

- misc

  • Support for other languages: see ?frame_calendar
  • More examples: vignette("frame-calendar")
26 / 35

tsibble

Chinglish for time series tibble

27 / 35

tsibble

- overview

The tsibble package provides the tbl_ts data class for managing temporal data frames in a tidy and modern way. A tsibble consists of a time index, keys, and other measured variables in a data-centric format, and is built on top of the tibble.

Installation

# install.packages("devtools")
devtools::install_github("earowang/tsibble")
28 / 35

tsibble

- overview

- coercion

Start with a tibble/data.frame

pedestrian
#> # A tibble: 377,712 x 5
#> Sensor Date_Time Date
#> <chr> <dttm> <date>
#> 1 Chinatown-Lt Bourke St (South) 2016-01-01 2016-01-01
#> 2 Waterfront City 2016-01-01 2016-01-01
#> 3 Lygon St (East) 2016-01-01 2016-01-01
#> 4 Town Hall (West) 2016-01-01 2016-01-01
#> 5 Monash Rd-Swanston St (West) 2016-01-01 2016-01-01
#> 6 Collins Place (South) 2016-01-01 2016-01-01
#> 7 Spencer St-Collins St (North) 2016-01-01 2016-01-01
#> 8 Flinders Street Station Underpass 2016-01-01 2016-01-01
#> 9 Birrarung Marr 2016-01-01 2016-01-01
#> 10 QV Market-Elizabeth St (West) 2016-01-01 2016-01-01
#> # ... with 377,702 more rows, and 2 more variables:
#> # Time <int>, Count <int>
29 / 35

tsibble

- overview

- coercion

What makes a valid tsibble?

## S3 method for class 'tbl_df', 'data.frame'
as_tsibble(x, ..., index, validate = TRUE, regular = TRUE)
  • x: other objects to be coerced to a tsibble (tbl_ts).
  • ...: unquoted (or bare) variable(s) giving the key.
  • index: an unquoted (or bare) variable specifying the time index.
  • validate: if TRUE, verify that the key together with the index uniquely identifies each observation (i.e. a valid tsibble).
  • regular: whether the time interval is regular (TRUE) or irregular (FALSE).

* The key is not constrained to a single variable; it can also express nested and crossed data structures.
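
The uniqueness condition can be checked by hand before coercion. A minimal base R sketch, assuming a single key column; is_valid_key_index() is a hypothetical helper, not part of tsibble, and the toy data are made up.

```r
# Sketch: does key + index identify every observation uniquely?
# (Hypothetical helper; tsibble does its own validation when validate = TRUE.)
is_valid_key_index <- function(data, key, index) {
  anyDuplicated(data[, c(key, index), drop = FALSE]) == 0
}

toy <- data.frame(
  Sensor = c("A", "A", "B", "B"),
  Date_Time = as.POSIXct(
    c("2016-01-01 00:00", "2016-01-01 01:00",
      "2016-01-01 00:00", "2016-01-01 01:00"),
    tz = "UTC"
  ),
  Count = c(5L, 3L, 8L, 2L)
)

ok  <- is_valid_key_index(toy, "Sensor", "Date_Time")
# Repeating a row duplicates a (Sensor, Date_Time) pair.
bad <- is_valid_key_index(rbind(toy, toy[1, ]), "Sensor", "Date_Time")
c(ok = ok, bad = bad)
```

The same timestamp may appear many times across sensors; it is only the combination of key and index that must not repeat.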

30 / 35

tsibble

- overview

- coercion

Coerce to a tsibble with as_tsibble()

library(tsibble)
pedestrian %>%
  as_tsibble(Sensor, index = Date_Time)
#> # A tsibble: 377,712 x 5 [1HOUR]
#> # Keys: Sensor
#> * Sensor Date_Time Date
#> <chr> <dttm> <date>
#> 1 Chinatown-Lt Bourke St (South) 2016-01-01 2016-01-01
#> 2 Waterfront City 2016-01-01 2016-01-01
#> 3 Lygon St (East) 2016-01-01 2016-01-01
#> 4 Town Hall (West) 2016-01-01 2016-01-01
#> 5 Monash Rd-Swanston St (West) 2016-01-01 2016-01-01
#> 6 Collins Place (South) 2016-01-01 2016-01-01
#> 7 Spencer St-Collins St (North) 2016-01-01 2016-01-01
#> 8 Flinders Street Station Underpass 2016-01-01 2016-01-01
#> 9 Birrarung Marr 2016-01-01 2016-01-01
#> 10 QV Market-Elizabeth St (West) 2016-01-01 2016-01-01
#> # ... with 377,702 more rows, and 2 more variables:
#> # Time <int>, Count <int>
31 / 35

tsibble

- overview

- coercion

- verbs

  • column-wise verbs *:
    • mutate(): add new variables
    • select(): select variables by name
    • summarise(): reduce multiple values down to a single value (ToDo)
  • row-wise verbs:
    • filter(): filter observations with matching conditions
    • slice(): select observations by row
    • arrange(): arrange observations by variables
  • other verbs:
    • rename(): rename variables by name
    • group_by(): group by one or more variables
  • tsibble verbs:
    • tsummarise(): aggregate over calendar periods

* These verbs take an additional argument, drop = FALSE. If drop = TRUE, a tibble is returned.

32 / 35

tsibble

- overview

- coercion

- verbs

The tsummarise() function

ped_ts <- as_tsibble(pedestrian, Sensor, index = Date_Time)
ped_ts %>%
  group_by(Sensor) %>%
  tsummarise(
    YrMon = yearmth(Date_Time),
    MinC = min(Count, na.rm = TRUE),
    MaxC = max(Count, na.rm = TRUE)
  )
#> # A tsibble: 516 x 4 [1MONTH]
#> # Keys: Sensor
#> # Groups: Sensor
#> Sensor YrMon MinC MaxC
#> * <chr> <mth> <dbl> <dbl>
#> 1 Alfred Place 2016 Jan 0 1067
#> 2 Alfred Place 2016 Feb 0 1099
#> 3 Alfred Place 2016 Mar 1 1161
#> 4 Alfred Place 2016 Apr 0 1107
#> 5 Alfred Place 2016 May 0 1099
#> 6 Alfred Place 2016 Jun 0 1101
#> 7 Alfred Place 2016 Jul 0 1174
#> 8 Alfred Place 2016 Aug 0 1075
#> 9 Alfred Place 2016 Sep 0 1071
#> 10 Alfred Place 2016 Oct 0 1057
#> # ... with 506 more rows
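
For comparison, the same kind of aggregation over calendar periods can be written in base R. This is only a sketch of the idea on made-up data, not how tsummarise() works internally; the year-month label is built with format() rather than yearmth().

```r
# Sketch: monthly min/max per sensor in base R (toy data, not the pedestrian set).
stamps <- c("2016-01-01 00:00", "2016-01-15 12:00",
            "2016-02-01 00:00", "2016-02-20 08:00")
toy <- data.frame(
  Sensor = rep(c("A", "B"), each = 4),
  Date_Time = as.POSIXct(rep(stamps, times = 2), tz = "UTC"),
  Count = c(0, 10, 3, 7, 5, 50, 2, 20)
)

# Collapse the hourly index to a year-month label, then aggregate per group.
toy$YrMon <- format(toy$Date_Time, "%Y-%m")
monthly <- aggregate(
  Count ~ Sensor + YrMon, data = toy,
  FUN = function(v) c(MinC = min(v, na.rm = TRUE), MaxC = max(v, na.rm = TRUE))
)
# aggregate() returns a matrix column; flatten it into ordinary columns.
monthly <- do.call(data.frame, monthly)
monthly
```

The result has one row per sensor and month, which mirrors the [1MONTH] interval reported in the tsibble output above.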
33 / 35

tsibble

- overview

- coercion

- verbs

- plans

More on the way

  • as_tsibble: tsibble for forecast (tbl_forecast)
  • fill_na: make implicit missing cases explicit
  • slide: rolling window calculation
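
The two planned ideas can be sketched in base R. This illustrates the concepts only and says nothing about the eventual tsibble interface; the data are made up.

```r
# Sketch of the concepts behind fill_na and slide (NOT the tsibble functions).
obs <- data.frame(hour = c(0, 1, 3, 4), count = c(5, 7, 4, 6))  # hour 2 is absent

# Implicit -> explicit missing: join against the complete sequence of hours.
full <- merge(data.frame(hour = 0:4), obs, all.x = TRUE)

# Rolling window: trailing mean over the current and previous hour.
roll2 <- as.numeric(stats::filter(full$count, rep(1 / 2, 2), sides = 1))

cbind(full, roll2)
```

Once the gap is explicit, the rolling mean returns NA for every window that touches it instead of silently averaging across non-adjacent hours.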

Graphical support for tsibble in sugrrants

Forecast methods for tsibble in forecast, hts and fasster

34 / 35

ta!

35 / 35
