class: center, middle, inverse, title-slide

# The 15th time series standard
### Earo Wang
@earowang
### 13 July 2018
slides at
http://slides.earo.me/useR18
---

## Time series standards in the R ecosystem.red[<sup>1</sup>]

.pull-left[
.timeline.timeline-left.purple-flirt.timeline-with-arrows[
.timeline-block[
.arrow-right[
.timeline-content[
**ts** represents regularly spaced time series using numeric time stamps.
<br>
`ts(data, start = 1, frequency = 1)`
.timeline-date[
day 0
]]]]
.timeline-block[
.arrow-right[
.timeline-content[
**zoo** provides infrastructure for regularly and irregularly spaced time series using arbitrary classes for the time stamps.
<br>
`zoo(x, order.by = index(x))`
.timeline-date[
2004/02
]]]]
]
]
.pull-right[
.timeline.timeline-left.purple-flirt.timeline-with-arrows[
.timeline-block[
.arrow-right[
.timeline-content[
**xts** extends the `zoo` class and provides a mechanism to customize the object's meta-data.
<br>
`xts(x, order.by = index(x))`
.timeline-date[
2008/01
]]]]
.timeline-block[
.arrow-right[
.timeline-content[
**irts**, **fts**, **timeSeries**, **tis**, etc.
.timeline-date[
...
]]]]
]
.footnote[
.red[1.] [CRAN Task View: Time Series Analysis](https://cran.r-project.org/web/views/TimeSeries.html)
]
]

---
background-image: url(https://imgs.xkcd.com/comics/standards.png)
background-size: 70%

.footnote[
.red[reference:] [XKCD on "standards"](https://xkcd.com/927/)
]

---

## Do we have too many restrictions on data? 🤔

The data structure that underlies these time series objects:

`\begin{equation} \begin{bmatrix} X_{11} & X_{21} & \cdots & X_{p1} \\ X_{12} & X_{22} & \cdots & X_{p2} \\ \vdots & \vdots & \ddots & \vdots \\ X_{1T} & X_{2T} & \cdots & X_{pT} \end{bmatrix} \end{equation}`

where `\(X_{jt}\)` is the observation of series `\(j\)` at time `\(t\)`, for `\(j = 1, \dots, p\)` and `\(1 \leq t \leq T\)`, in the form of a `\(T \times p\)` matrix.

--

This matrix structure assumes

* homogeneity: every series shares one data type
* time indices implicitly inferred as attributes/meta-information

It is **model-centric** rather than **data-centric**.

???
Too many defaults, as if we lived in an ideal data world.

---
background-image: url(img/tree-bg.png)
background-size: 80%
class: center

## Brisbane City Council Contact Centre enquiries.red[<sup>2</sup>]

.footnote[
.red[2.] data source: [Brisbane City Council](https://www.data.brisbane.qld.gov.au/data/dataset/contact-centre-customer-enquiries)
]

---

## Brisbane City Council Contact Centre enquiries

```r
enquiry
```

```
#> # A tibble: 110,397 x 5
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-07-14 Email   Built Structure Control Advert…      1
#> 4 2014-07-21 Email   Built Structure Control Advert…      1
#> 5 2014-07-22 Email   Built Structure Control Advert…      1
#> 6 2014-07-30 Email   Built Structure Control Advert…      3
#> # ... with 1.104e+05 more rows
```

--

* heterogeneous data types
* implicit missing values
* nesting & crossing factors

---

## Brisbane City Council Contact Centre enquiries
---
class: middle

## Wish list

.pull-left[
.checked[
* arbitrary time index class
* easy access to the index as an explicit column, not an implicit attribute
* heterogeneous data types
* nested and crossed structures
* a unified and well-defined interface
* human readable pipeline
* ...
]
]
.pull-right[
![](img/checklist-min.jpg)
]

---
class: inverse middle center

.scale-up[<img src="img/tsibble.png" height=220px>]

## The 15th time series standard

### .orange[time series + tibble = tsibble]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
]
.right-column[
```r
library(tsibble)
enquiry_tsbl <- enquiry %>%
* as_tsibble(
*   key = id(service | category, channel), index = date
* )
enquiry_tsbl
```

```
#> # A tsibble: 110,397 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-07-14 Email   Built Structure Control Advert…      1
#> 4 2014-07-21 Email   Built Structure Control Advert…      1
#> 5 2014-07-22 Email   Built Structure Control Advert…      1
#> 6 2014-07-30 Email   Built Structure Control Advert…      3
#> # ... with 1.104e+05 more rows
```

* **index**: an explicitly declared variable containing time indices.
* **key**: uniquely identifies each unit on which measurements are taken over time.
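
The index and key definitions above imply that each (key, index) pair can appear only once. A minimal base R sketch of that check on made-up data; `toy` and `is_valid_key_index()` are hypothetical names used for illustration and are not part of tsibble:

```r
# Hypothetical toy data, not the Brisbane enquiries themselves.
toy <- data.frame(
  date    = as.Date(c("2014-06-16", "2014-06-17", "2014-06-16", "2014-06-17")),
  channel = c("Email", "Email", "Phone", "Phone"),
  volume  = c(1L, 1L, 5L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when no (key, index) combination is repeated.
# anyDuplicated() returns 0 when every row of the selected columns is unique.
is_valid_key_index <- function(data, key, index) {
  !anyDuplicated(data[, c(key, index), drop = FALSE])
}

# With channel as the key, every (channel, date) pair is unique.
is_valid_key_index(toy, key = "channel", index = "date")

# Without a key, the same dates occur twice, so the index alone
# cannot identify an observation.
is_valid_key_index(toy, key = character(0), index = "date")
```

The first call should return `TRUE` and the second `FALSE`, which suggests why a key has to be declared alongside the index whenever several units are observed at the same time points.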
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
]
.right-column[
## A valid tsibble

```
*#> # A tsibble: 110,397 x 5 [1DAY]
*#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-07-14 Email   Built Structure Control Advert…      1
#> 4 2014-07-21 Email   Built Structure Control Advert…      1
#> 5 2014-07-22 Email   Built Structure Control Advert…      1
#> 6 2014-07-30 Email   Built Structure Control Advert…      3
#> # ... with 1.104e+05 more rows
```

* Given the nature of temporal ordering, a tsibble object is **sorted by its key and index from past to future**.
* If the data are recorded at regular time intervals, all units share **a common time interval**.
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
### - `fill_na()`
]
.right-column[
## Turn implicit missing values into explicit missing values

```r
enquiry_tsbl %>%
* fill_na()
```

```
#> # A tsibble: 237,821 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-06-18 Email   Built Structure Control Advert…     NA
#> 4 2014-06-19 Email   Built Structure Control Advert…     NA
#> 5 2014-06-20 Email   Built Structure Control Advert…     NA
#> 6 2014-06-21 Email   Built Structure Control Advert…     NA
#> # ... with 2.378e+05 more rows
```
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
### - `fill_na()`
]
.right-column[
## Turn implicit missing values into explicit missing values

```r
enquiry_full <- enquiry_tsbl %>%
* fill_na(volume = 0L)
enquiry_full
```

```
#> # A tsibble: 237,821 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-06-18 Email   Built Structure Control Advert…      0
#> 4 2014-06-19 Email   Built Structure Control Advert…      0
#> 5 2014-06-20 Email   Built Structure Control Advert…      0
#> 6 2014-06-21 Email   Built Structure Control Advert…      0
#> # ... with 2.378e+05 more rows
```
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
### - `fill_na()`
### - `index_by()`
]
.right-column[
## Group time index

```r
library(lubridate)
enquiry_full %>%
  group_by(channel, category) %>%
* index_by(year = year(date))
```

```
#> # A tsibble: 237,821 x 6 [1DAY]
#> # Key:       service | category, channel [204]
#> # Groups:    channel, category @ year [239]
#>   date       channel category     service      volume  year
#>   <date>     <fct>   <fct>        <fct>         <int> <dbl>
#> 1 2014-06-16 Email   Built Struc… Advertising…      1  2014
#> 2 2014-06-17 Email   Built Struc… Advertising…      1  2014
#> 3 2014-06-18 Email   Built Struc… Advertising…      0  2014
#> 4 2014-06-19 Email   Built Struc… Advertising…      0  2014
#> 5 2014-06-20 Email   Built Struc… Advertising…      0  2014
#> 6 2014-06-21 Email   Built Struc… Advertising…      0  2014
#> # ... with 2.378e+05 more rows
```
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
### - `fill_na()`
### - `index_by()`
### - `index_by()` + `summarise()`
]
.right-column[
## Aggregate over calendar periods

```r
enquiry_year <- enquiry_full %>%
  group_by(channel, category) %>%
* index_by(year = year(date)) %>%
* summarise(annual_volume = sum(volume))
enquiry_year
```

```
#> # A tsibble: 239 x 4 [1YEAR]
#> # Key:       category, channel [48]
#> # Groups:    channel [4]
#>   channel category                 year annual_volume
#>   <fct>   <fct>                   <dbl>         <int>
#> 1 Email   Animal Control           2014           962
#> 2 Email   Animal Control           2015          2849
#> 3 Email   Animal Control           2016          3159
#> 4 Email   Animal Control           2017          3416
#> 5 Email   Animal Control           2018           860
#> 6 Email   Built Structure Control  2014           336
#> # ... with 233 more rows
```
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - `as_tsibble()`
### - `tbl_ts`
### - `fill_na()`
### - `index_by()`
### - `index_by()` + `summarise()`
### - viz
]
.right-column[
## Temporal change in % channel use

<img src="figure/col-fill-1.svg" style="display: block; margin: auto;" />
]

---

## Seamlessly work with tidyverse

.pull-left[
* **dplyr:**
  - `arrange()`, `filter()`, `slice()`
  - `mutate()`, `transmute()`, `select()`, `rename()`, `summarise()`/`summarize()`
  - `*_join()`
  - `group_by()`, `ungroup()`
* **tidyr:**
  - `gather()`, `spread()`
  - `nest()`, `unnest()`
]

--

.pull-right[
```r
enquiry_sum <- enquiry_full %>%
  summarise(ttl_volume = sum(volume))
enquiry_sum
```

```
#> # A tsibble: 1,551 x 2 [1DAY]
#>   date       ttl_volume
#>   <date>          <int>
#> 1 2014-01-01        636
#> 2 2014-01-02       2171
#> 3 2014-01-03       1968
#> 4 2014-01-04        559
#> 5 2014-01-05        489
#> 6 2014-01-06       3320
#> # ... with 1,545 more rows
```
]

---
class: inverse middle center

## A family of window functions

<hr>

## A purrr-fect workflow

---

## A family of window functions

.pull-left[
* `slide()`/`slide2()`/`pslide()`: sliding window with overlapping observations
]
.pull-right[
![](img/slide.gif)
]

---

## A family of window functions

.pull-left[
* `slide()`/`slide2()`/`pslide()`: sliding window with overlapping observations
* `tile()`/`tile2()`/`ptile()`: tiling window without overlapping observations
]
.pull-right[
![](img/slide.gif)
![](img/tile.gif)
]

---

## A family of window functions

.pull-left[
* `slide()`/`slide2()`/`pslide()`: sliding window with overlapping observations
* `tile()`/`tile2()`/`ptile()`: tiling window without overlapping observations
* `stretch()`/`stretch2()`/`pstretch()`: fixing an initial window and expanding to include more observations
]
.pull-right[
![](img/slide.gif)
![](img/tile.gif)
![](img/stretch.gif)
]

<hr>

Type-stable: `slide()`/`tile()`/`stretch()` return a list; the variants `*_dbl()`, `*_int()`, `*_lgl()`, `*_chr()` return atomic vectors of the named type.

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - fixed
]
.right-column[
## Fixed window size

```r
enquiry_sum %>%
  mutate(ma = slide_dbl(ttl_volume, mean, .size = 7))
```

<img src="figure/slide-hide-1.svg" style="display: block; margin: auto;" />
]

---

.left-column[
<img src="img/tsibble.png" height=120px>
### - fixed
### - flexible
]
.right-column[
## Flexible calendar periods: row-oriented workflow

```r
enquiry_sum %>%
  mutate(yrmth = yearmonth(date)) %>%
  nest(-yrmth) %>%
  mutate(ma = slide_dbl(
    data, ~ mean(bind_rows(.)$ttl_volume), .size = 2
  )) %>%
  unnest(data)
```

<img src="figure/slide-month-1.svg" style="display: block; margin: auto;" />
]

---
class: inverse middle center

## One more thing ...

---
background-image: url(img/tidyverse.png)
background-size: 80%

## Data science workflow

---
background-image: url(img/tidyverts.png)
background-size: 65%

## tidyverts.org

---
class: inverse middle center

### Joint work with
[Di Cook](http://dicook.org) & [Rob J Hyndman](http://robjhyndman.com)

### More on tsibble
<http://pkg.earo.me/tsibble>

### Slides created via xaringan ⚔️
<http://slides.earo.me/useR18>

### Open source
[earowang/useR18](https://github.com/earowang/useR18)

### This work is licensed under
[BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).