
Conversations in time:

interactive visualization to explore structured temporal data

Earo Wang

Time series data is rich

1. often intrinsically hierarchical

2. different temporal frequencies

  • Nowadays we often work with a collection of time series rather than a single univariate series.
  • These series can often be organised hierarchically, e.g. US > states > counties. Here: Australia > states > tourism regions.
  • Data are collected at the most disaggregated level and summed up to the higher levels.
  • The deeper the hierarchy, the more data are available.
  • On the other hand, the temporal context itself generates a lot of information.
  • Time series data can be broken down into different temporal frequencies.
  • This chart shows trips by Australians to Sydney for holiday purposes: high in January.
  • But what happened in Sep-Oct 2000? Yes, the 2000 Sydney Olympics.
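
As a rough sketch of the two kinds of aggregation described in these notes, the snippet below sums a toy tsibble up the hierarchy (regions to states) and down in temporal frequency (months to quarters). The data values are made up for illustration and are not the tourism figures shown on the slides.

```r
library(tsibble)
library(dplyr)

# Toy data (made up): 2 states, 3 regions, 6 months each
toy <- tibble(
  Month  = rep(yearmonth("2000 Jan") + 0:5, times = 3),
  State  = rep(c("NSW", "NSW", "VIC"), each = 6),
  Region = rep(c("Sydney", "Blue Mountains", "Melbourne"), each = 6),
  Trips  = c(10, 12, 11, 9, 8, 7,  # Sydney
             2, 3, 2, 1, 2, 3,     # Blue Mountains
             6, 5, 7, 8, 6, 5)     # Melbourne
) %>%
  as_tsibble(index = Month, key = c(State, Region))

# Structural aggregation: sum regions up to their state
by_state <- toy %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

# Temporal aggregation: monthly series to quarterly series
by_quarter <- toy %>%
  group_by_key() %>%
  index_by(Quarter = yearquarter(Month)) %>%
  summarise(Trips = sum(Trips))
```

Both results are still tsibbles, so the same plotting and feature tools apply at every level.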

  • Now we can look forward to the 2032 Brisbane Olympics and work on forecasting the number of visits to Brisbane in 2032 using Sydney's data.
  • Information is embedded in both the temporal and the structural context.
  • Given such rich data, how should we start exploring time series?

Domestic trips in Australia 🇦🇺

library(tsibbletalk)
tourism_monthly
#> # A tsibble: 80,696 x 5 [1M]
#> # Key: State, Region, Purpose [308]
#> Month State Region Purpose Trips
#> <mth> <chr> <chr> <chr> <dbl>
#> 1 1998 Mar ACT Canberra Business 111.
#> 2 1998 Apr ACT Canberra Business 93.1
#> 3 1998 May ACT Canberra Business 78.1
#> 4 1998 Jun ACT Canberra Business 44.3
#> 5 1998 Jul ACT Canberra Business 129.
#> 6 1998 Aug ACT Canberra Business 71.3
#> # … with 80,690 more rows
  • First and foremost, we need to arrange the time series into a tsibble object.
  • Here we unlock domestic trips for an Australia in lockdown.
  • What is a tsibble? A tibble that represents time series data.
  • The header reports the dimensions, the interval ([1M], monthly) and the key variables with the number of series (308).
  • The idea of key variables is powerful: we know what each time series represents.
  • The key provides a central hub for identifying and linking each series across different tables.
  • You may wonder, why tsibble? Because there is a growing ecosystem built on the tsibble data structure.
  • Here tidyverts does not mean "advocates for the tidyverse"; it means the tidyverse for time series.
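
A minimal sketch of how an ordinary data frame becomes a tsibble by declaring an index and a key. The data frame `raw` and its values are made up for illustration.

```r
library(tsibble)

# Made-up long-format data: one row per region per month
raw <- data.frame(
  Month  = rep(c("2020 Jan", "2020 Feb", "2020 Mar"), times = 2),
  Region = rep(c("Sydney", "Melbourne"), each = 3),
  Trips  = c(5, 4, 6, 3, 2, 4)
)
raw$Month <- yearmonth(raw$Month)  # give the index a monthly time class

# Declare which column is time (index) and which identifies a series (key)
trips <- as_tsibble(raw, index = Month, key = Region)

key_vars(trips)   # names of the key variables
n_keys(trips)     # number of distinct series
index_var(trips)  # name of the index variable
```

Printing `trips` shows the same kind of header as on the slide: dimensions, interval and key.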

Sad charts 🙁

  • A tsibble is a data frame behind the scenes, so we can ggplot it.
  • Overplotting: not particularly useful.
  • Still, we can see upward trends for business trips and strong seasonal effects for holidays.

Viz 📈 📉 on a feature space

library(feasts)
tourism_monthly %>% features(Trips, feat_stl)
#> # A tibble: 308 × 12
#> State Region Purpose trend_strength seasonal_streng…
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 ACT Canbe… Busine… 0.316 0.332
#> 2 ACT Canbe… Holiday 0.220 0.542
#> 3 ACT Canbe… Other … 0.212 0.215
#> 4 ACT Canbe… Visiti… 0.262 0.429
#> 5 New South Wales Blue … Busine… 0.185 0.247
#> 6 New South Wales Blue … Holiday 0.426 0.471
#> # … with 302 more rows, and 7 more variables:
#> # seasonal_peak_year <dbl>, seasonal_trough_year <dbl>,
#> # spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> # stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
  • features() computes seasonal and trend features for all time series at once.
  • Each time series collapses to a set of descriptive statistics.
  • The key variables are retained, so we can link back to the original tsibble.
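
To illustrate linking back through the key, here is a sketch that picks one row of the feature table and retrieves the matching series from the original data. It uses the quarterly `tourism` dataset shipped with {tsibble} as a stand-in for `tourism_monthly`, and the score `trend_strength - seasonal_strength_year` is just one assumed way to express "strong trend, weak seasonality".

```r
library(tsibble)
library(feasts)
library(dplyr)

# One row of STL features per series, identified by the key variables
feats <- tourism %>%
  features(Trips, feat_stl)

# Pick the series with strong trend and weak seasonality (assumed score)
top <- feats %>%
  slice_max(trend_strength - seasonal_strength_year,
            n = 1, with_ties = FALSE)

# Use the shared key variables to pull the full series back out
series <- tourism %>%
  semi_join(top, by = key_vars(tourism))
```

Doing this by hand for every question is the amount of code that the linked, interactive view is meant to avoid.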

Viz 📈 📉 on a feature space

  • Each point represents a time series.

Crosstalk between lines and points

  • a tsibble -> lines
  • the feature table -> scatterplot
  • What they share are the key identifiers.
  • For example, we may want to highlight the Sydney series and see how they behave in the feature space, or vice versa.
  • Or find which series has the highest trend strength but the lowest seasonality.
  • Static graphics get in the way because this takes a lot of code, but it is a natural fit for interactive graphics.
  • When we think about interactive visualisation in R, we probably think of building a Shiny app, but Shiny requires a UI and server-side logic.
  • What about doing it without Shiny? {crosstalk} links different HTML widgets.
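
A small sketch of the {crosstalk} idea on its own, without tsibbletalk: two plotly widgets built from one `SharedData` object, so a selection in one view highlights the same rows in the other. The table `feats` and its column names are made up for illustration.

```r
library(crosstalk)
library(plotly)

# Made-up table: one row per series, identified by `id`
feats <- data.frame(
  id        = c("a", "b", "c"),
  trend     = c(0.2, 0.8, 0.5),
  seasonal  = c(0.7, 0.1, 0.4),
  avg_trips = c(10, 30, 20)
)

# Both plots read from the same shared object, keyed by `id`
shared <- SharedData$new(feats, key = ~id)

p1 <- plot_ly(shared, x = ~trend, y = ~seasonal,
              type = "scatter", mode = "markers")
p2 <- plot_ly(shared, x = ~trend, y = ~avg_trips,
              type = "scatter", mode = "markers")

# Clicking a point in either panel highlights it in both
linked <- subplot(p1, p2) %>%
  highlight(on = "plotly_click")
```

The result is a plain HTML widget, so it works in a static page or slide with no server behind it.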

Let tsibble talk

Syntax sugar: nesting and crossing

tourism_shared <- tourism_monthly %>%
  as_shared_tsibble(spec = (State / Region) * Purpose)
tourism_shared
#> # A tsibble: 80,696 x 5 [1M]
#> # Key: State, Region, Purpose [308]
#> Month State Region Purpose Trips
#> <mth> <chr> <chr> <chr> <dbl>
#> 1 1998 Mar ACT Canberra Business 111.
#> 2 1998 Apr ACT Canberra Business 93.1
#> 3 1998 May ACT Canberra Business 78.1
#> 4 1998 Jun ACT Canberra Business 44.3
#> 5 1998 Jul ACT Canberra Business 129.
#> 6 1998 Aug ACT Canberra Business 71.3
#> # … with 80,690 more rows
  • Turn a normal data frame into a mutable data object for easy interactions.
  • spec specifies the key structure: / for nesting, * for crossing.
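
To make the two terms concrete, the sketch below checks on a made-up key table what `(State / Region) * Purpose` asserts: each Region sits under exactly one State (nesting), and every Region occurs with every Purpose (crossing). This is only an illustration of the meaning in base R, not how tsibbletalk processes `spec` internally.

```r
# Made-up key table: 3 regions crossed with 2 purposes
keys <- expand.grid(
  Region  = c("Sydney", "Blue Mountains", "Melbourne"),
  Purpose = c("Business", "Holiday"),
  stringsAsFactors = FALSE
)
keys$State <- ifelse(keys$Region == "Melbourne", "VIC", "NSW")

# Nesting (State / Region): every Region maps to exactly one State
states_per_region <- tapply(keys$State, keys$Region,
                            function(s) length(unique(s)))
is_nested <- all(states_per_region == 1)

# Crossing (... * Purpose): every Region appears with every Purpose
combos <- table(keys$Region, keys$Purpose)
is_crossed <- all(combos > 0)
```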

hierarchy -> tree




p_l <- plotly_key_tree(tourism_shared,
  height = 1100, width = 800)
p_l

Shared tsibble %>% {ggplot2} & {plotly}

library(ggplot2)
library(plotly)
# Line plot of every series, one panel per travel purpose
p_tr <- tourism_shared %>%
  ggplot(aes(x = Month, y = Trips)) +
  geom_line(aes(group = Region)) +
  facet_wrap(~ Purpose, scales = "free_y")
# Scatterplot of STL features, one point per series
p_br <- tourism_shared %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year)) +
  geom_point(aes(group = Region))
# Key tree on the left, the two linked plots stacked on the right
subplot(p_l,
  subplot(
    ggplotly(p_tr, tooltip = "Region", width = 700, height = 800),
    ggplotly(p_br, tooltip = "Region", width = 700, height = 800),
    nrows = 2),
  widths = c(.4, .6)) %>%
  highlight(dynamic = TRUE)

Interactive w/o shiny, enabling quick exploration

Slicing and dicing

Wrapping

- overview

- time of day

- day of week
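
One way to think about these three views, sketched in base R with made-up hourly counts: the same series is re-indexed by its position within a day or within a week, so that repeated periods overlay each other. This is an assumed reading of "wrapping" for illustration, not tsibbletalk's implementation.

```r
# Made-up hourly counts over two weeks, starting on a Monday
time  <- seq(as.POSIXct("2020-03-02 00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 14)
count <- round(100 + 50 * sin(2 * pi * (seq_along(time) - 1) / 24))

# Overview: the series against its original time index (time, count)

# Time of day: position within each day, 0..23
hour_of_day <- as.integer(format(time, "%H"))
by_day <- split(count, as.Date(time))  # one vector of 24 values per day

# Day of week: position within each week, 0..167 (Monday 00:00 = 0)
day_of_week  <- as.integer(format(time, "%u"))  # 1 = Monday, 7 = Sunday
hour_of_week <- (day_of_week - 1) * 24 + hour_of_day
```

Plotting `count` against `hour_of_day` or `hour_of_week`, grouped by day or by week, gives the wrapped views.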

A shiny module

library(shiny)
library(ggplot2)
library(plotly)
library(tsibbletalk)
# Static line plot of pedestrian counts, one panel per sensor
p_line <- pedestrian20 %>%
  ggplot(aes(x = Date_Time, y = Count, colour = Lockdown)) +
  geom_line(size = .3) +
  facet_wrap(~ Sensor, scales = "free_y") +
  labs(x = "Date Time") +
  scale_colour_brewer(palette = "Dark2") +
  theme(legend.position = "none")
# The module UI and server share the id "dice"
ui <- fluidPage(
  tsibbleWrapUI("dice")
)
server <- function(input, output, session) {
  # Wrap the plotly version of the plot in steps of one day
  tsibbleWrapServer("dice", ggplotly(p_line, height = 700), period = "1 day")
}
shinyApp(ui, server)

Acknowledgements

  • {crosstalk}
  • {plotly}
  • {shiny}
