class: center, middle, inverse, title-slide

# .orange[
] Conversations in time:
## interactive visualization to explore structured temporal data
### Earo Wang

---

## Time series data is rich

.pull-left[
.brown[1.] often intrinsically hierarchical

.center[<img src = "img/tree.png", width = 55%></img>]
]

.pull-right[
.brown[2.] different temporal frequencies

<img src="figure/temp-freq-1.png" width="100%" style="display: block; margin: auto;" />
]

???

* Nowadays, we often work with a collection of time series rather than a single univariate series.
* These time series can often be organised hierarchically, e.g. US > states > counties. Here: Australia > states > tourism regions.
* You collect data at the most disaggregated level and sum it up to the higher levels.
* The deeper the hierarchy, the more data is available.
* On the other hand, the temporal context itself generates loads of information.
* Time series data can be broken down into different temporal frequencies.
* This chart shows Australians travelling to Sydney for holidays: visits are high in January.
* But what happened in Sep-Oct 2000? Yes, the 2000 Sydney Olympics.

---

## Time series data is rich

.pull-left[
.brown[1.] often intrinsically hierarchical

.center[<img src = "img/tree.png", width = 55%></img>]
]

.pull-right[
.brown[2.] different temporal frequencies

<img src="figure/temp-freq-1.png" width="100%" style="display: block; margin: auto;" />

.center[<img src = "https://upload.wikimedia.org/wikipedia/en/thumb/8/81/2000_Summer_Olympics_logo.svg/800px-2000_Summer_Olympics_logo.svg.png", width = 30% style = "position:absolute; top: 25.5%; left: 62.5%"></img>]
]

???

* Now we should look forward to the 2032 Brisbane Olympics and work on forecasting the number of visits to Brisbane in 2032 using Sydney's data.
* Information is embedded in both the temporal and the structural context.
* Given such rich data, how should we start exploring time series?

---

## Domestic trips in Australia 🇦🇺

```r
*library(tsibbletalk)
tourism_monthly
```

```
*#> # A tsibble: 80,696 x 5 [1M]
*#> # Key:       State, Region, Purpose [308]
#>      Month State Region   Purpose  Trips
#>      <mth> <chr> <chr>    <chr>    <dbl>
#> 1 1998 Mar ACT   Canberra Business 111.
#> 2 1998 Apr ACT   Canberra Business  93.1
#> 3 1998 May ACT   Canberra Business  78.1
#> 4 1998 Jun ACT   Canberra Business  44.3
#> 5 1998 Jul ACT   Canberra Business 129.
#> 6 1998 Aug ACT   Canberra Business  71.3
#> # … with 80,690 more rows
```

???

* First and foremost, we need to arrange the time series into a tsibble object.
* Here we unlock domestic trips for Australia in lockdown.
* What is a tsibble? A tibble that represents time series data.
* The header
* The idea of key variables is powerful: we know what each time series represents.
* The key provides a central hub for identifying and linking each series across different tables.

---

class: center

## [tidyverts.org](http://tidyverts.org)

<iframe src="http://tidyverts.org" frameborder="0" height="500" width="100%">
</iframe>

???

* You may wonder: why tsibble? Because there is a growing ecosystem built on the tsibble data structure.
* Here, tidyverts doesn't mean advocates for the tidyverse; it means the tidyverse for time series.

---

## Sad charts

<img src="figure/ggplot-1.png" width="100%" style="display: block; margin: auto;" />

???

* A tsibble is a data frame behind the scenes, so we can ggplot it.
* Overplotting makes this chart not particularly useful.
* Upward trends for business, and strong seasonal effects for holidays.

---

## Viz on a feature space

```r
library(feasts)
tourism_monthly %>%
  features(Trips, feat_stl)
```

```
#> # A tibble: 308 × 12
#>   State           Region Purpose trend_strength seasonal_streng…
#>   <chr>           <chr>  <chr>            <dbl>            <dbl>
#> 1 ACT             Canbe… Busine…          0.316            0.332
#> 2 ACT             Canbe… Holiday          0.220            0.542
#> 3 ACT             Canbe… Other …          0.212            0.215
#> 4 ACT             Canbe… Visiti…          0.262            0.429
#> 5 New South Wales Blue … Busine…          0.185            0.247
#> 6 New South Wales Blue … Holiday          0.426            0.471
#> # … with 302 more rows, and 7 more variables:
#> #   seasonal_peak_year <dbl>, seasonal_trough_year <dbl>,
#> #   spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> #   stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
```

???

* `features()` computes seasonal and trend features for all time series at once.
* The time series data collapses to a set of descriptive statistics.
* The key variables are retained, so we can link back to the original tsibble.

---

## Viz on a feature space

<img src="figure/viz-feature-1.png" width="100%" style="display: block; margin: auto;" />

???

* Each point represents a time series.

---

## .orange[<i class='far fa-comments'></i>] Crosstalk between lines and points

.pull-left[
<img src="figure/p1-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="figure/p2-1.png" width="100%" style="display: block; margin: auto;" />
]

???

* a tsibble -> lines
* the feature table -> scatterplot
* What they share in common are the key identifiers.
* For example, we may want to highlight Sydney and see how its series behave in the feature space, or vice versa.
* Or find which series is highest in trend but lowest in seasonality.
* Static vis stops us from doing this because it takes a lot of code, yet it is a natural fit for interactive vis.
* When we think about interactive viz in R, we probably think of building a Shiny app.
But Shiny requires a UI and server-side logic.
* What if we go without Shiny? {crosstalk} links different HTML widgets.

---

class: inverse middle

## Let tsibble talk

---

## Syntax sugar: nesting and crossing

```r
tourism_shared <- tourism_monthly %>%
*  as_shared_tsibble(spec = (State / Region) * Purpose)
tourism_shared
```

```
#> # A tsibble: 80,696 x 5 [1M]
#> # Key:       State, Region, Purpose [308]
#>      Month State Region   Purpose  Trips
#>      <mth> <chr> <chr>    <chr>    <dbl>
#> 1 1998 Mar ACT   Canberra Business 111.
#> 2 1998 Apr ACT   Canberra Business  93.1
#> 3 1998 May ACT   Canberra Business  78.1
#> 4 1998 Jun ACT   Canberra Business  44.3
#> 5 1998 Jul ACT   Canberra Business 129.
#> 6 1998 Aug ACT   Canberra Business  71.3
#> # … with 80,690 more rows
```

???

* Turn a normal data frame into a mutable data object for easy interactions.
* `spec` specifies the structure: `/` for nesting and `*` for crossing.

---

.pull-left[
## hierarchy -> tree

<br>
<br>
<br>

```r
p_l <- plotly_key_tree(tourism_shared, height = 1100, width = 800)
p_l
```
]

.pull-right[
.center[<img src = "img/tree.png", width = 65%, style = "box-shadow: 3px 5px 3px 1px #00000080;"></img>]
]

---

## Shared tsibble `%>%` {ggplot2} & {plotly}

```r
library(ggplot2)
# one line per region, faceted by purpose
p_tr <- tourism_shared %>%
  ggplot(aes(x = Month, y = Trips)) +
  geom_line(aes(group = Region)) +
  facet_wrap(~ Purpose, scales = "free_y")
```

```r
# one point per series in the STL feature space
p_br <- tourism_shared %>%
  features(Trips, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year)) +
  geom_point(aes(group = Region))
```

```r
library(plotly)
# tree on the left, linked lines and points stacked on the right
subplot(p_l,
  subplot(
    ggplotly(p_tr, tooltip = "Region", width = 700, height = 800),
    ggplotly(p_br, tooltip = "Region", width = 700, height = 800),
    nrows = 2),
  widths = c(.4, .6)) %>%
  highlight(dynamic = TRUE)
```

???

Interactive w/o shiny, enabling quick exploration

---

.center[<img src = "img/tourism-linking.png", width = 55%, style = "box-shadow: 3px 5px 3px 1px #00000080;"></img>]

---

class: inverse middle

## Slicing and dicing

---

count: false

.left-column[
## Wrapping
### - overview
]
.right-column[
<img src = "img/wrap-0.png", width = 55%, style = "position:absolute; top: 2.5%; left: 25.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
]

---

count: false

.left-column[
## Wrapping
### - overview
### - time of day
]
.right-column[
<img src = "img/wrap-0.png", width = 55%, style = "position:absolute; top: 2.5%; left: 25.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
<img src = "img/wrap-1.png", width = 55%, style = "position:absolute; top: 20.5%; left: 42.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
]

---

count: false

.left-column[
## Wrapping
### - overview
### - time of day
### - day of week
]
.right-column[
<img src = "img/wrap-0.png", width = 55%, style = "position:absolute; top: 2.5%; left: 25.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
<img src = "img/wrap-1.png", width = 55%, style = "position:absolute; top: 20.5%; left: 42.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
<img src = "img/wrap-7.png", width = 55%, style = "position:absolute; top: 50.5%; left: 30.5%; box-shadow: 3px 5px 3px 1px #00000080;"></img>
]

---

## A shiny module

```r
library(shiny)
p_line <- pedestrian20 %>%
  ggplot(aes(x = Date_Time, y = Count, colour = Lockdown)) +
  geom_line(size = .3) +
  facet_wrap(~ Sensor, scales = "free_y") +
  labs(x = "Date Time") +
  scale_colour_brewer(palette = "Dark2") +
  theme(legend.position = "none")

ui <- fluidPage(
*  tsibbleWrapUI("dice")
)
server <- function(input, output, session) {
*  tsibbleWrapServer("dice", ggplotly(p_line, height = 700), period = "1 day")
}
shinyApp(ui, server)
```

---

.center[<img src = "img/shiny-wrap.gif", width = 85%, style = "box-shadow: 3px 5px 3px 1px #00000080;"></img>]

---

## Acknowledgements

* {crosstalk}
* {plotly}
* {shiny}