
The 15th time series standard

Earo Wang
@earowang

13 July 2018
slides at http://slides.earo.me/useR18

1 / 27

Time series standards in the R ecosystem

ts (day 0) represents regularly spaced time series using numeric time stamps.
ts(data, start = 1, frequency = 1)

zoo (2004/02) provides infrastructure for regularly and irregularly spaced time series using arbitrary classes for the time stamps.
zoo(x, order.by = index(x))

xts (2008/01) extends the zoo class and provides a mechanism to customize the object's meta-data.
xts(x, order.by = index(x))

irts, fts, timeSeries, tis, etc.

...

2 / 27
3 / 27

Do we have too many restrictions on data? 🤔

The data structure that underlies these time series objects:

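\[
\begin{bmatrix}
X_{11} & X_{21} & \cdots & X_{p1} \\
X_{12} & X_{22} & \cdots & X_{p2} \\
\vdots & \vdots & \ddots & \vdots \\
X_{1T} & X_{2T} & \cdots & X_{pT}
\end{bmatrix}
\]

where $X_{jt}$ represents series $j$ at time $t$, for $j = 1, \dots, p$ and $1 \le t \le T$, in the form of a $T \times p$ matrix.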

This matrix structure assumes

  • homogeneity
  • time indices implicitly inferred as attributes/meta-information

It is model-centric rather than data-centric.
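
Both assumptions can be seen directly in base R. The sketch below uses only ts() from the stats package; the series names and values are made up for illustration.

```r
# Two made-up monthly series stored as a T x p matrix (T = 4, p = 2).
x <- ts(
  matrix(c(10, 12, 11, 13, 5, 6, 7, 8), ncol = 2,
         dimnames = list(NULL, c("a", "b"))),
  start = c(2018, 1), frequency = 12
)

# The time index is not a column: it lives in the "tsp" attribute
# (start, end, frequency) and is rebuilt on demand by time().
tsp(x)
time(x)

# Homogeneity: a matrix holds a single type, so combining numbers and
# text coerces every cell to character.
m <- cbind(volume = c(1, 3), channel = c("Email", "Phone"))
typeof(m)
```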

4 / 27

Too many defaults as if we live in an ideal data world

Brisbane City Councils Contact Centre enquiries [2]

[2] Data source: Brisbane City Councils

5 / 27

Brisbane City Councils Contact Centre enquiries

enquiry
#> # A tibble: 110,397 x 5
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-07-14 Email   Built Structure Control Advert…      1
#> 4 2014-07-21 Email   Built Structure Control Advert…      1
#> 5 2014-07-22 Email   Built Structure Control Advert…      1
#> 6 2014-07-30 Email   Built Structure Control Advert…      3
#> # ... with 1.104e+05 more rows
  • heterogeneous data types
  • implicit missing values
  • nesting & crossing factors
6 / 27

Brisbane City Councils Contact Centre enquiries

[Figure: Brisbane City Councils enquiry categories: Animal Control, Built Structure Control, Call Centre Services, Parking Control, Plumbing Control Services, Property Information Services, Rates Assessment, Rates Payment Services, Road Network Management, Tree Management, Waste Collection, Other]
7 / 27

Wish list

  • arbitrary time index class
  • easy access to the index as an explicit column, not an implicit attribute
  • heterogeneous data types
  • nested and crossed structures
  • a unified and well-defined interface
  • human-readable pipeline
  • ...

8 / 27

The 15th time series standard

time series + tibble = tsibble

9 / 27

- as_tsibble()

library(tsibble)
enquiry_tsbl <- enquiry %>%
  as_tsibble(
    key = id(service | category, channel), index = date
  )
enquiry_tsbl
#> # A tsibble: 110,397 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-07-14 Email   Built Structure Control Advert…      1
#> 4 2014-07-21 Email   Built Structure Control Advert…      1
#> 5 2014-07-22 Email   Built Structure Control Advert…      1
#> 6 2014-07-30 Email   Built Structure Control Advert…      3
#> # ... with 1.104e+05 more rows
  • index: an explicitly declared variable containing time indices.
  • key: uniquely identifies each unit that measurements take place on over time.
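
A minimal base-R sketch of what "key plus index" implies, assuming the validity rule is that no two rows share the same key and index values. The data frame and its values are made up; this is not tsibble's own check.

```r
# Made-up long-format data: one row per (channel, date) observation.
df <- data.frame(
  date    = as.Date(c("2014-06-16", "2014-06-17", "2014-06-16")),
  channel = c("Email", "Email", "Phone"),
  volume  = c(1L, 1L, 4L)
)

# Assumed rule: the key (channel) and index (date) together identify a row.
is_unique <- anyDuplicated(df[c("channel", "date")]) == 0
is_unique

# Appending a second Email row for 2014-06-16 breaks the rule.
bad <- rbind(df, df[1, ])
has_duplicates <- anyDuplicated(bad[c("channel", "date")]) > 0
has_duplicates
```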
10 / 27

- tbl_ts

A valid tsibble

  • Given the nature of temporal ordering, a tsibble object is sorted by its key and index from past to future.
  • If the data have a regular time interval, all units share a common interval.
11 / 27

- fill_na()

Turn implicit missing values into explicit missing values

enquiry_tsbl %>%
  fill_na()
#> # A tsibble: 237,821 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-06-18 Email   Built Structure Control Advert…     NA
#> 4 2014-06-19 Email   Built Structure Control Advert…     NA
#> 5 2014-06-20 Email   Built Structure Control Advert…     NA
#> 6 2014-06-21 Email   Built Structure Control Advert…     NA
#> # ... with 2.378e+05 more rows
12 / 27

- fill_na()

Turn implicit missing values into explicit missing values

enquiry_full <- enquiry_tsbl %>%
  fill_na(volume = 0L)
enquiry_full
#> # A tsibble: 237,821 x 5 [1DAY]
#> # Key:       service | category, channel [204]
#>   date       channel category                service volume
#>   <date>     <fct>   <fct>                   <fct>    <int>
#> 1 2014-06-16 Email   Built Structure Control Advert…      1
#> 2 2014-06-17 Email   Built Structure Control Advert…      1
#> 3 2014-06-18 Email   Built Structure Control Advert…      0
#> 4 2014-06-19 Email   Built Structure Control Advert…      0
#> 5 2014-06-20 Email   Built Structure Control Advert…      0
#> 6 2014-06-21 Email   Built Structure Control Advert…      0
#> # ... with 2.378e+05 more rows
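
The idea of making gaps explicit can be sketched in base R for a single series: build the complete daily index, then join the observations onto it. The data below are made up, and the sketch handles one series only, whereas the output above is filled per key.

```r
# Made-up daily series with a gap between 17 and 21 June.
obs <- data.frame(
  date   = as.Date(c("2014-06-16", "2014-06-17", "2014-06-21")),
  volume = c(1L, 1L, 2L)
)

# Complete daily index from the first to the last observed date.
full <- data.frame(date = seq(min(obs$date), max(obs$date), by = "day"))

# Left join: dates without an observation become explicit NA rows.
filled <- merge(full, obs, by = "date", all.x = TRUE)

# Replace NA with 0 only if a missing record really means "no enquiries"
# (an assumption about how the data were collected).
filled$volume[is.na(filled$volume)] <- 0L
filled
```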
13 / 27

- index_by()

Group time index

library(lubridate)
enquiry_full %>%
  group_by(channel, category) %>%
  index_by(year = year(date))
#> # A tsibble: 237,821 x 6 [1DAY]
#> # Key:       service | category, channel [204]
#> # Groups:    channel, category @ year [239]
#>   date       channel category     service      volume  year
#>   <date>     <fct>   <fct>        <fct>         <int> <dbl>
#> 1 2014-06-16 Email   Built Struc… Advertising…      1  2014
#> 2 2014-06-17 Email   Built Struc… Advertising…      1  2014
#> 3 2014-06-18 Email   Built Struc… Advertising…      0  2014
#> 4 2014-06-19 Email   Built Struc… Advertising…      0  2014
#> 5 2014-06-20 Email   Built Struc… Advertising…      0  2014
#> 6 2014-06-21 Email   Built Struc… Advertising…      0  2014
#> # ... with 2.378e+05 more rows
14 / 27

- index_by() + summarise()

Aggregate over calendar periods

enquiry_year <- enquiry_full %>%
  group_by(channel, category) %>%
  index_by(year = year(date)) %>%
  summarise(annual_volume = sum(volume))
enquiry_year
#> # A tsibble: 239 x 4 [1YEAR]
#> # Key:       category, channel [48]
#> # Groups:    channel [4]
#>   channel category                 year annual_volume
#>   <fct>   <fct>                   <dbl>         <int>
#> 1 Email   Animal Control           2014           962
#> 2 Email   Animal Control           2015          2849
#> 3 Email   Animal Control           2016          3159
#> 4 Email   Animal Control           2017          3416
#> 5 Email   Animal Control           2018           860
#> 6 Email   Built Structure Control  2014           336
#> # ... with 233 more rows
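
For comparison, the same kind of calendar aggregation can be sketched in base R: derive the coarser period from the date, then sum within each group and period. The data are made up and the column names are only illustrative.

```r
# Made-up daily volumes that straddle a year boundary.
daily <- data.frame(
  date    = as.Date(c("2014-12-30", "2014-12-31", "2015-01-01", "2015-01-02")),
  channel = c("Email", "Email", "Email", "Email"),
  volume  = c(2L, 3L, 1L, 5L)
)

# Derive the calendar period (year) from the daily index.
daily$year <- as.integer(format(daily$date, "%Y"))

# Sum the volume within each channel and year.
annual <- aggregate(volume ~ channel + year, data = daily, FUN = sum)
annual
```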
15 / 27

- viz

Temporal change in % channel use

16 / 27

Seamlessly work with tidyverse

  • dplyr:
    • arrange(), filter(), slice()
    • mutate(), transmute(), select(), rename(), summarise()/summarize()
    • *_join()
    • group_by(), ungroup()
  • tidyr:
    • gather(), spread()
    • nest(), unnest()
enquiry_sum <- enquiry_full %>%
  summarise(ttl_volume = sum(volume))
enquiry_sum
#> # A tsibble: 1,551 x 2 [1DAY]
#>   date       ttl_volume
#>   <date>          <int>
#> 1 2014-01-01        636
#> 2 2014-01-02       2171
#> 3 2014-01-03       1968
#> 4 2014-01-04        559
#> 5 2014-01-05        489
#> 6 2014-01-06       3320
#> # ... with 1,545 more rows
17 / 27

A family of window functions


A purrr-fect workflow

  • slide()/slide2()/pslide(): sliding window with overlapping observations
  • tile()/tile2()/ptile(): tiling window without overlapping observations
  • stretch()/stretch2()/pstretch(): fixing an initial window and expanding to include more observations


Type-stable: slide()/tile()/stretch() return a list; the variants *_dbl(), *_int(), *_lgl() and *_chr() return atomic vectors of the corresponding type.
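
The three window shapes can be sketched in base R. This is an illustration of the window definitions only, on made-up input; it does not reproduce how tsibble handles incomplete windows at the edges.

```r
x <- 1:6
size <- 3

# Sliding: windows of `size` that advance one step at a time (overlapping).
slide_win <- lapply(seq_len(length(x) - size + 1),
                    function(i) x[i:(i + size - 1)])

# Tiling: consecutive blocks of `size` with no overlap.
tile_win <- split(x, ceiling(seq_along(x) / size))

# Stretching: start from the first `size` values and grow by one each step.
stretch_win <- lapply(size:length(x), function(i) x[1:i])

# Type stability in the same spirit as vapply() versus lapply():
# the return type (one double per window) is fixed up front.
slide_mean   <- vapply(slide_win, mean, numeric(1))
tile_mean    <- vapply(tile_win, mean, numeric(1))
stretch_mean <- vapply(stretch_win, mean, numeric(1))
```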

21 / 27

- fixed

Fixed window size

enquiry_sum %>%
  mutate(ma = slide_dbl(ttl_volume, mean, .size = 7))

22 / 27

- flexible

Flexible calendar periods: row-oriented workflow

enquiry_sum %>%
  mutate(yrmth = yearmonth(date)) %>%
  nest(-yrmth) %>%
  mutate(ma = slide_dbl(
    data, ~ mean(bind_rows(.)$ttl_volume), .size = 2
  )) %>%
  unnest(data)
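
A base-R sketch of the same idea, assuming the goal is a mean over every two consecutive calendar months regardless of how many days each month contains. The data are made up, and the first month is simply dropped rather than padded.

```r
# Made-up daily totals: two days in January, two in February, one in March.
daily <- data.frame(
  date       = as.Date(c("2014-01-30", "2014-01-31", "2014-02-01",
                         "2014-02-02", "2014-03-01")),
  ttl_volume = c(10, 20, 30, 40, 50)
)

# One list element per calendar month, so each "row" is a whole month.
by_month <- split(daily$ttl_volume, format(daily$date, "%Y-%m"))

# Slide a window of two months over the list and average all days in it.
ma <- vapply(
  2:length(by_month),
  function(i) mean(unlist(by_month[(i - 1):i])),
  numeric(1)
)
names(ma) <- names(by_month)[-1]
ma
```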

23 / 27

One more thing ...

24 / 27

Data science workflow

25 / 27

tidyverts.org

26 / 27

Joint work with Di Cook & Rob J Hyndman

More on tsibble http://pkg.earo.me/tsibble

Slides created via xaringan ⚔️ http://slides.earo.me/useR18

Open source earowang/useR18

This work is licensed under CC BY-NC 4.0.

27 / 27
