Let me introduce myself briefly. I'm currently doing my PhD at Monash, working with Di and Rob. My research is about visualising temporal data.
This is what I'm going to cover for the next hour and a half. First, I'll talk about why the tidyverse is great. The tidyverse is a collection of R packages that centre around the "tidy data" concept. We'll learn some verbs/functions that do the data wrangling to get tidy temporal data. As data analysts ... Second, we'll learn about ggplot2, which handles the plotting. You may wonder why it's called ggplot2 and what gg means: it stands for the grammar of graphics. I'll talk about how the ggplot2 functions, powered by the grammar, help us visualise time series data. In the final part, I'll show you some demos of interactive graphics using plotly and shiny.
tidyverse? Tidy data
gg in ggplot2 stand for? Grammar of graphics
ggplotly(), plot_ly, animation
wanderer4melb
tidyverse: a collection of R packages surrounding "tidy data"
stringr: handle string manipulation
forcats: handle categorical variables
lubridate: lubricate the date-time process
plotly: create web-based visualisations
shiny: build interactive web applications
knitr: provide tools for dynamic report generation
devtools: help with R package development

Here's a list of R packages that we're going to use for this part. As I said before, the tidyverse is a set of packages including ggplot2, dplyr, readr, etc.; stringr is for strings and forcats for categorical variables; lubridate makes dealing with dates easier; plotly and shiny are for interactive graphics on the web.
Okay. Now, let's start with tidy data.
What is tidy data? What makes a dataset tidy?
The data structure is a rectangular cases-by-variables layout that underlies the tidyverse.
I'll use three datasets to explain what tidy data actually means, and how important tidy data is for further data analysis and visualisation.
ped_loc <- read_csv("data/sensor_locations.csv")
ped_loc %>%
  select(`Sensor ID`, `Sensor Description`, Longitude, Latitude)
#> # A tibble: 43 x 4
#> `Sensor ID` `Sensor Description` Longitude
#> <int> <chr> <dbl>
#> 1 22 Flinders St-Elizabeth St (East) 144.9651
#> 2 34 Flinders St-Spark La 144.9742
#> 3 11 Waterfront City 144.9396
#> 4 8 Webb Bridge 144.9472
#> 5 7 Birrarung Marr 144.9714
#> 6 13 Flagstaff Station 144.9566
#> 7 15 State Library 144.9645
#> 8 27 QV Market-Peel St 144.9566
#> 9 12 New Quay 144.9429
#> 10 24 Spencer St-Collins St (North) 144.9545
#> # ... with 33 more rows, and 1 more variables:
#> # Latitude <dbl>
* source: the city of Melbourne
The first dataset we're going to look at is ...
The dataset is sourced from the Melbourne Open Data Portal; you can click here to check out the web page that hosts it. Since 2009, the City of Melbourne has installed sensors that capture foot traffic every hour. This data can be used for urban planning or business management: for example, a cafe owner could look at the hourly traffic to decide on trading hours. To date, 43 sensors have been installed across the city.
Here, I read the sensor locations data into R using read_csv from the readr package.
Rob has talked about the pipe operator. I pass the data to the select function and select four columns, that is ...
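To make the pipe concrete, here is a much-simplified sketch of the idea in base R. The real `%>%` comes from magrittr (and is re-exported by dplyr) and also handles calls with extra arguments; `add_ten` is a made-up function for illustration only.

```r
# `x %>% f` means "take x, then apply f": the left-hand side becomes the
# argument of the right-hand side. This toy version only supports a bare
# one-argument function on the right; magrittr's pipe does far more.
`%>%` <- function(lhs, rhs) rhs(lhs)

add_ten <- function(x) x + 10

res <- 5 %>% add_ten  # same as add_ten(5), i.e. 15
```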
Since I know each sensor's longitude and latitude, I can plot these sensors on a map of Melbourne, which clearly shows their locations.
ped_2017 <- read_csv("data/pedestrian_03_2017.csv")
ped_2017
#> # A tibble: 744 x 45
#> Date Hour `State Library` `Collins Place (South)`
#> <chr> <int> <int> <int>
#> 1 01/03/2017 0 140 36
#> 2 01/03/2017 1 64 17
#> 3 01/03/2017 2 29 11
#> 4 01/03/2017 3 13 9
#> 5 01/03/2017 4 13 10
#> 6 01/03/2017 5 31 84
#> 7 01/03/2017 6 92 252
#> 8 01/03/2017 7 327 767
#> 9 01/03/2017 8 908 1997
#> 10 01/03/2017 9 775 1319
#> # ... with 734 more rows, and 41 more variables: `Collins
#> # Place (North)` <int>, `Flagstaff Station` <int>,
#> # `Melbourne Central` <int>, `Town Hall (West)` <int>,
#> # `Bourke Street Mall (North)` <int>, `Bourke Street Mall
#> # (South)` <int>, `Australia on Collins` <int>, `Southern
#> # Cross Station` <int>, `Victoria Point` <int>, `New
#> # Quay` <int>, `Waterfront City` <int>, `Webb
#> # Bridge` <int>, `Princes Bridge` <int>, `Flinders St
#> # Station Underpass` <int>, `Sandridge Bridge` <int>,
#> # `Birrarung Marr` <int>, `QV Market-Elizabeth
#> # (West)` <int>, `Flinders St-Elizabeth St (East)` <int>,
#> # `Spencer St-Collins St (North)` <int>, `Spencer
#> # St-Collins St (South)` <int>, `Bourke St-Russell St
#> # (West)` <int>, `Convention/Exhibition Centre` <int>,
#> # `Chinatown-Swanston St (North)` <int>, `Chinatown-Lt
#> # Bourke St (South)` <int>, `QV Market-Peel St` <int>,
#> # `Vic Arts Centre` <int>, `Lonsdale St (South)` <int>,
#> # `Lygon St (West)` <int>, `Flinders St-Spring St
#> # (West)` <int>, `Flinders St-Spark Lane` <int>, `Alfred
#> # Place` <int>, `Queen Street (West)` <int>, `Lygon
#> # Street (East)` <int>, `Flinders St-Swanston St
#> # (West)` <int>, `Spring St-Lonsdale St (South)` <int>,
#> # `City Square` <int>, `St. Kilda-Alexandra
#> # Gardens` <int>, `Grattan St-Swanston St (West)` <int>,
#> # `Monash Rd-Swanston St (West)` <int>, `Tin
#> # Alley-Swanston St (West)` <int>, Southbank <int>
Besides the locations, we're more interested in the hourly pedestrian counts at each sensor. I read a second csv file that contains the pedestrian counts for March.
This data has 744 observations and 45 columns. ...
Date is read in as a character ...
Let me refer to this kind of format as the wide format, in contrast to the long format; I'll explain what I mean by wide and long later.
The remedy is to convert the wide format to long data.
The top data table is what we have: wide.
gather collects the headers into one key variable and the counts into the value variable.
Instead of more than 40 columns, we now have four variables to work with. Looking at this long form, it is clearer what the variables are compared to the wide format: each variable forms a column.
gather and spread from tidyr organise the same data in two different ways. To be consistent, the tidy long form is used.
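To see what gather does to the shape of the data, here is a tiny made-up wide table and the long table it becomes, sketched in base R (the sensor names and counts below are invented for illustration):

```r
# Wide: one row per date, one column per sensor.
wide <- data.frame(
  Date = c("01/03/2017", "02/03/2017"),
  `State Library` = c(140, 150),
  Southbank = c(80, 90),
  check.names = FALSE
)

# Long: the column headers become a key variable (Sensor_Name) and the
# cell values become a value variable (Counts) -- the shape that
# gather(key = Sensor_Name, value = Counts, ...) produces.
long <- data.frame(
  Date = rep(wide$Date, times = 2),
  Sensor_Name = rep(c("State Library", "Southbank"), each = 2),
  Counts = c(wide$`State Library`, wide$Southbank)
)
```

Two columns times two rows in the wide table become four rows in the long table: one row per date-sensor combination.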
ped_long <- ped_2017 %>%
  gather(
    key = Sensor_Name, value = Counts,
    `State Library`:Southbank
  ) %>%
  mutate(
    Date_Time = dmy_hms(paste(Date, Hour, "00:00")),
    Date = dmy(Date)
  )
ped_long
#> # A tibble: 31,992 x 5
#> Date Hour Sensor_Name Counts
#> <date> <int> <chr> <int>
#> 1 2017-03-01 0 State Library 140
#> 2 2017-03-01 1 State Library 64
#> 3 2017-03-01 2 State Library 29
#> 4 2017-03-01 3 State Library 13
#> 5 2017-03-01 4 State Library 13
#> 6 2017-03-01 5 State Library 31
#> 7 2017-03-01 6 State Library 92
#> 8 2017-03-01 7 State Library 327
#> 9 2017-03-01 8 State Library 908
#> 10 2017-03-01 9 State Library 775
#> # ... with 31,982 more rows, and 1 more variables:
#> # Date_Time <dttm>
Any questions so far?
otway_weather <- read_csv("data/weather_2016.csv")
head(otway_weather)
#> # A tibble: 6 x 35
#> ID YEAR MONTH ELEMENT VALUE1 VALUE2 VALUE3
#> <chr> <int> <chr> <chr> <int> <int> <int>
#> 1 ASN00090015 2016 01 TMAX 209 195 193
#> 2 ASN00090015 2016 01 TMIN 175 145 162
#> 3 ASN00090015 2016 01 PRCP 0 0 0
#> 4 ASN00090015 2016 01 TAVG 166 174 175
#> 5 ASN00090015 2016 02 TMAX 217 239 185
#> 6 ASN00090015 2016 02 TMIN 120 146 149
#> # ... with 28 more variables: VALUE4 <int>, VALUE5 <int>,
#> # VALUE6 <int>, VALUE7 <int>, VALUE8 <int>, VALUE9 <int>,
#> # VALUE10 <int>, VALUE11 <int>, VALUE12 <int>,
#> # VALUE13 <int>, VALUE14 <int>, VALUE15 <int>,
#> # VALUE16 <int>, VALUE17 <int>, VALUE18 <int>,
#> # VALUE19 <int>, VALUE20 <int>, VALUE21 <int>,
#> # VALUE22 <int>, VALUE23 <int>, VALUE24 <int>,
#> # VALUE25 <int>, VALUE26 <int>, VALUE27 <int>,
#> # VALUE28 <int>, VALUE29 <int>, VALUE30 <int>,
#> # VALUE31 <int>
* source: Global Historical Climatology Network
otway_weather %>% gather(DAY, VALUE, VALUE1:VALUE31)
#> # A tibble: 1,488 x 6
#> ID YEAR MONTH ELEMENT DAY VALUE
#> <chr> <int> <chr> <chr> <chr> <int>
#> 1 ASN00090015 2016 01 TMAX VALUE1 209
#> 2 ASN00090015 2016 01 TMIN VALUE1 175
#> 3 ASN00090015 2016 01 PRCP VALUE1 0
#> 4 ASN00090015 2016 01 TAVG VALUE1 166
#> 5 ASN00090015 2016 02 TMAX VALUE1 217
#> 6 ASN00090015 2016 02 TMIN VALUE1 120
#> 7 ASN00090015 2016 02 PRCP VALUE1 2
#> 8 ASN00090015 2016 02 TAVG VALUE1 187
#> 9 ASN00090015 2016 03 TMAX VALUE1 243
#> 10 ASN00090015 2016 03 TMIN VALUE1 172
#> # ... with 1,478 more rows
otway_weather %>%
  gather(DAY, VALUE, VALUE1:VALUE31) %>%
  mutate(
    DAY = str_sub(DAY, start = 6),
    DATE = ymd(paste(YEAR, MONTH, DAY, sep = "-"))
  ) %>%
  arrange(DATE) %>%
  select(ID, DATE, ELEMENT, VALUE) %>%
  filter(!(is.na(DATE)))
#> # A tibble: 1,464 x 4
#> ID DATE ELEMENT VALUE
#> <chr> <date> <chr> <int>
#> 1 ASN00090015 2016-01-01 TMAX 209
#> 2 ASN00090015 2016-01-01 TMIN 175
#> 3 ASN00090015 2016-01-01 PRCP 0
#> 4 ASN00090015 2016-01-01 TAVG 166
#> 5 ASN00090015 2016-01-02 TMAX 195
#> 6 ASN00090015 2016-01-02 TMIN 145
#> 7 ASN00090015 2016-01-02 PRCP 0
#> 8 ASN00090015 2016-01-02 TAVG 174
#> 9 ASN00090015 2016-01-03 TMAX 193
#> 10 ASN00090015 2016-01-03 TMIN 162
#> # ... with 1,454 more rows
otway_weather %>%
  gather(DAY, VALUE, VALUE1:VALUE31) %>%
  mutate(
    DAY = str_sub(DAY, start = 6),
    DATE = ymd(paste(YEAR, MONTH, DAY, sep = "-"))
  ) %>%
  arrange(DATE) %>%
  select(ID, DATE, ELEMENT, VALUE) %>%
  filter(!(is.na(DATE))) %>%
  mutate(
    VALUE = if_else(VALUE < -999, NA_integer_, VALUE),
    VALUE = VALUE / 10
  ) %>%
  spread(ELEMENT, VALUE)
#> # A tibble: 366 x 6
#> ID DATE PRCP TAVG TMAX TMIN
#> * <chr> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 ASN00090015 2016-01-01 0 16.6 20.9 17.5
#> 2 ASN00090015 2016-01-02 0 17.4 19.5 14.5
#> 3 ASN00090015 2016-01-03 0 17.5 19.3 16.2
#> 4 ASN00090015 2016-01-04 0 17.7 20.2 16.7
#> 5 ASN00090015 2016-01-05 0 17.8 20.6 16.1
#> 6 ASN00090015 2016-01-06 0 17.1 20.3 16.5
#> 7 ASN00090015 2016-01-07 0 15.8 19.7 14.8
#> 8 ASN00090015 2016-01-08 0 15.6 18.8 14.2
#> 9 ASN00090015 2016-01-09 0 15.7 19.0 11.3
#> 10 ASN00090015 2016-01-10 0 18.6 25.4 11.9
#> # ... with 356 more rows
otway_tidy <- otway_weather %>%
  gather(DAY, VALUE, VALUE1:VALUE31) %>%
  mutate(
    DAY = str_sub(DAY, start = 6),
    DATE = ymd(paste(YEAR, MONTH, DAY, sep = "-"))
  ) %>%
  arrange(DATE) %>%
  select(ID, DATE, ELEMENT, VALUE) %>%
  filter(!(is.na(DATE))) %>%
  mutate(
    VALUE = if_else(VALUE < -999, NA_integer_, VALUE),
    VALUE = VALUE / 10
  ) %>%
  spread(ELEMENT, VALUE) %>%
  mutate(NAVG = (TMAX + TMIN) / 2)
head(otway_tidy)
#> # A tibble: 6 x 7
#> ID DATE PRCP TAVG TMAX TMIN NAVG
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ASN00090015 2016-01-01 0 16.6 20.9 17.5 19.20
#> 2 ASN00090015 2016-01-02 0 17.4 19.5 14.5 17.00
#> 3 ASN00090015 2016-01-03 0 17.5 19.3 16.2 17.75
#> 4 ASN00090015 2016-01-04 0 17.7 20.2 16.7 18.45
#> 5 ASN00090015 2016-01-05 0 17.8 20.6 16.1 18.35
#> 6 ASN00090015 2016-01-06 0 17.1 20.3 16.5 18.40
TAVG: UTC time zone rather than local time
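To see why the UTC-based TAVG can disagree with a local-day average, note that the same instant can fall on different days in different time zones. A base-R illustration (lubridate's with_tz does the same conversion):

```r
# 8pm UTC on 1 Jan 2016 is already the next morning in Melbourne
# (UTC+11 during daylight saving), so a daily average computed over a
# UTC day mixes hours from two local days.
utc_time <- as.POSIXct("2016-01-01 20:00:00", tz = "UTC")
format(utc_time, tz = "Australia/Melbourne", usetz = TRUE)
```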
billboard.csv
records the date a song first entered the Billboard Top 100 in
2000 and its rank over 76 weeks.
#> # A tibble: 6 x 81
#> year artist track
#> <int> <chr> <chr>
#> 1 2000 Backstreet Boys, The Shape Of My Heart
#> 2 2000 Backstreet Boys, The Show Me The Meaning ...
#> 3 2000 Backstreet Boys, The The One
#> 4 2000 N'Sync Bye Bye Bye
#> 5 2000 N'Sync It's Gonna Be Me
#> 6 2000 N'Sync This I Promise You
#> # ... with 78 more variables: time <time>,
#> # date.entered <date>, `1` <int>, `2` <int>, `3` <int>,
#> # `4` <int>, `5` <int>, `6` <int>, `7` <int>, `8` <int>,
#> # `9` <int>, `10` <int>, `11` <int>, `12` <int>,
#> # `13` <int>, `14` <int>, `15` <int>, `16` <int>,
#> # `17` <int>, `18` <int>, `19` <int>, `20` <int>,
#> # `21` <int>, `22` <int>, `23` <int>, `24` <int>,
#> # `25` <int>, `26` <int>, `27` <chr>, `28` <chr>,
#> # `29` <chr>, `30` <chr>, `31` <chr>, `32` <chr>,
#> # `33` <chr>, `34` <chr>, `35` <chr>, `36` <chr>,
#> # `37` <chr>, `38` <chr>, `39` <chr>, `40` <chr>,
#> # `41` <chr>, `42` <chr>, `43` <chr>, `44` <chr>,
#> # `45` <chr>, `46` <chr>, `47` <chr>, `48` <chr>,
#> # `49` <chr>, `50` <chr>, `51` <chr>, `52` <chr>,
#> # `53` <chr>, `54` <chr>, `55` <chr>, `56` <chr>,
#> # `57` <chr>, `58` <chr>, `59` <chr>, `60` <chr>,
#> # `61` <chr>, `62` <chr>, `63` <chr>, `64` <chr>,
#> # `65` <chr>, `66` <chr>, `67` <chr>, `68` <chr>,
#> # `69` <chr>, `70` <chr>, `71` <chr>, `72` <chr>,
#> # `73` <chr>, `74` <chr>, `75` <chr>, `76` <chr>
song
#> # A tibble: 6 x 4
#> id artist track
#> <int> <chr> <chr>
#> 1 1 Backstreet Boys, The Shape Of My Heart
#> 2 2 Backstreet Boys, The Show Me The Meaning ...
#> 3 3 Backstreet Boys, The The One
#> 4 4 N'Sync Bye Bye Bye
#> 5 5 N'Sync It's Gonna Be Me
#> 6 6 N'Sync This I Promise You
#> # ... with 1 more variables: time <time>
rank
#> # A tibble: 456 x 4
#> id date.entered week rank
#> <int> <date> <chr> <chr>
#> 1 1 2000-10-14 1 39
#> 2 2 2000-01-01 1 74
#> 3 3 2000-05-27 1 58
#> 4 4 2000-01-29 1 42
#> 5 5 2000-05-06 1 82
#> 6 6 2000-09-30 1 68
#> 7 1 2000-10-14 2 25
#> 8 2 2000-01-01 2 62
#> 9 3 2000-05-27 2 50
#> 10 4 2000-01-29 2 20
#> # ... with 446 more rows
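The split of billboard.csv into the song and rank tables can be sketched in base R with a tiny made-up example (the artists, weeks, and ranks below are invented, not the real chart data):

```r
# One denormalised table: song details repeat for every charted week.
billboard <- data.frame(
  artist = c("A", "A", "B"),
  track  = c("t1", "t1", "t2"),
  week   = c(1, 2, 1),
  rank   = c(39, 25, 74)
)

# `song`: one row per song, with a surrogate id.
song <- unique(billboard[, c("artist", "track")])
song$id <- seq_len(nrow(song))

# `rank`: one row per song-week, linked back to `song` via the id.
rank <- merge(billboard, song)[, c("id", "week", "rank")]
```

Each fact now lives in exactly one place: song attributes in song, weekly positions in rank.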
ggplot2
by Hadley Wickham

For $n$ independent and identically distributed RVs $X_1, \ldots, X_n$, the mean and the standard deviation are defined as
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}.$$
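These definitions can be checked numerically against base R's mean() and sd():

```r
set.seed(1)
x <- rnorm(100)

x_bar <- sum(x) / length(x)                          # sample mean
s <- sqrt(sum((x - x_bar)^2) / (length(x) - 1))      # sample standard deviation

all.equal(x_bar, mean(x))  # TRUE
all.equal(s, sd(x))        # TRUE
```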
We're going to use a short but comprehensive vocabulary to describe different sorts of graphs.
ped_long
#> # A tibble: 31,992 x 5
#> Date Hour Sensor_Name Counts
#> <date> <int> <chr> <int>
#> 1 2017-03-01 0 State Library 140
#> 2 2017-03-01 1 State Library 64
#> 3 2017-03-01 2 State Library 29
#> 4 2017-03-01 3 State Library 13
#> 5 2017-03-01 4 State Library 13
#> 6 2017-03-01 5 State Library 31
#> 7 2017-03-01 6 State Library 92
#> 8 2017-03-01 7 State Library 327
#> 9 2017-03-01 8 State Library 908
#> 10 2017-03-01 9 State Library 775
#> # ... with 31,982 more rows, and 1 more variables:
#> # Date_Time <dttm>
ggplot2
data: ped_long
layer:
  mapping: x = Date_Time, y = Counts
  geom: line, point
facet: Sensor_Name
ggplot(ped_long, aes(x = Date_Time, y = Counts)) +
  geom_line() +
  geom_point() +
  facet_grid(Sensor_Name ~ ., scale = "free_y")
autoplot: against time index

ggplot2
data: ped_long
layer:
  mapping: x = Date_Time, y = Counts, colour = Sensor_Name
  geom: line, point
facet: Sensor_Name
ggplot(ped_long, aes(x = Date_Time, y = Counts)) +
  geom_line(aes(colour = Sensor_Name)) +
  geom_point(aes(colour = Sensor_Name)) +
  facet_grid(Sensor_Name ~ ., scale = "free_y")
ggplot2
wday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
sx <- ped_long %>%
  filter(Sensor_Name == "Southern Cross Station") %>%
  mutate(
    Wday = wday(Date, label = TRUE, abbr = FALSE),
    Wday = if_else(Wday %in% wday, "Weekday", "Weekend"),
    Wday = ordered(Wday)
  )
sx
#> # A tibble: 744 x 6
#> Date Hour Sensor_Name Counts
#> <date> <int> <chr> <int>
#> 1 2017-03-01 0 Southern Cross Station 16
#> 2 2017-03-01 1 Southern Cross Station 8
#> 3 2017-03-01 2 Southern Cross Station 3
#> 4 2017-03-01 3 Southern Cross Station 4
#> 5 2017-03-01 4 Southern Cross Station 1
#> 6 2017-03-01 5 Southern Cross Station 96
#> 7 2017-03-01 6 Southern Cross Station 581
#> 8 2017-03-01 7 Southern Cross Station 1847
#> 9 2017-03-01 8 Southern Cross Station 3863
#> 10 2017-03-01 9 Southern Cross Station 2063
#> # ... with 734 more rows, and 2 more variables:
#> # Date_Time <dttm>, Wday <ord>
ggplot2
data: southern-cross
layer:
  mapping: x = Hour, y = Counts, colour = Wday
  geom: line
ggplot(sx, aes(Hour, Counts, group = Date)) +
  geom_line(aes(colour = Wday))
ggplot2
data: southern-cross
layer:
  mapping: x = Hour, y = Counts
  geom: line
facet: Wday
ggplot(sx, aes(Hour, Counts, group = Date)) +
  geom_line() +
  facet_wrap(~ Wday, ncol = 2)
ggplot2
Expertise and Google in action
ggplot2
labour <- "Labour Day" # 2017-03-13
adele <- "Adele Day" # 2017-03-18 to 19
# Justin Bieber's gig 2017-03-10
sx_more <- sx %>%
  mutate(
    Wday = fct_expand(Wday, labour, adele),
    Wday = if_else(
      Date == ymd("2017-03-13"),
      ordered(labour, levels(Wday)),
      Wday
    ),
    Wday = if_else(
      Date %in% ymd(c("2017-03-18", "2017-03-19")),
      ordered(adele, levels(Wday)),
      Wday
    )
  )
head(sx_more)
#> # A tibble: 6 x 6
#> Date Hour Sensor_Name Counts
#> <date> <int> <chr> <int>
#> 1 2017-03-01 0 Southern Cross Station 16
#> 2 2017-03-01 1 Southern Cross Station 8
#> 3 2017-03-01 2 Southern Cross Station 3
#> 4 2017-03-01 3 Southern Cross Station 4
#> 5 2017-03-01 4 Southern Cross Station 1
#> 6 2017-03-01 5 Southern Cross Station 96
#> # ... with 2 more variables: Date_Time <dttm>, Wday <ord>
transform the data
ggplot2
data: southern-cross-more
layer:
  mapping: x = Hour, y = Counts
  geom: line
facet: Wday
ggplot(sx_more, aes(Hour, Counts, group = Date)) +
  geom_line() +
  facet_wrap(~ Wday, ncol = 2)
The grammar remains the same, but uses the transformed data.
I've just shown you how to use graphics to explore the data: notice something unexpected, then use your expertise or other resources to explain it and produce another graph.
Has anyone noticed there's another weird day that I haven't explained/exploited?
ggplot2
data: southern-cross-more
layer:
  mapping: x = Hour, y = Counts
  geom: line
facet: Wday
coord: polar
ggplot(sx_more, aes(Hour, Counts, group = Date)) +
  geom_line() +
  facet_wrap(~ Wday, ncol = 2) +
  coord_polar()
known as a rose plot
This slide also shows the advantage of using the grammar. Instead of referring to these as a line plot and a rose plot, the grammar tells you that the difference between the two plots is simply linear versus polar coordinates.
polar: periodic behaviour
When you need to decide which plot to pick for your presentation, this can be done under the statistical hypothesis testing framework known as visual inference.
Be aware that it provides scientific tools to help decide which display is more powerful.
otway_more <- otway_tidy %>%
  mutate(
    MONTH = month(DATE, label = TRUE),
    DAY = mday(DATE)
  )
otway_more
#> # A tibble: 366 x 9
#> ID DATE PRCP TAVG TMAX TMIN NAVG
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ASN00090015 2016-01-01 0 16.6 20.9 17.5 19.20
#> 2 ASN00090015 2016-01-02 0 17.4 19.5 14.5 17.00
#> 3 ASN00090015 2016-01-03 0 17.5 19.3 16.2 17.75
#> 4 ASN00090015 2016-01-04 0 17.7 20.2 16.7 18.45
#> 5 ASN00090015 2016-01-05 0 17.8 20.6 16.1 18.35
#> 6 ASN00090015 2016-01-06 0 17.1 20.3 16.5 18.40
#> 7 ASN00090015 2016-01-07 0 15.8 19.7 14.8 17.25
#> 8 ASN00090015 2016-01-08 0 15.6 18.8 14.2 16.50
#> 9 ASN00090015 2016-01-09 0 15.7 19.0 11.3 15.15
#> 10 ASN00090015 2016-01-10 0 18.6 25.4 11.9 18.65
#> # ... with 356 more rows, and 2 more variables:
#> # MONTH <ord>, DAY <int>
Discuss with your neighbour: what is the graph about, and what grammar is used?
data: otway_more
layer:
  1. yintercept: year_average
     geom: hline
  2. mapping: xmin = DAY-, xmax = DAY+, ymin = TMIN, ymax = TMAX
     geom: rect
  3. mapping: x = DAY, y = NAVG
     geom: line
facet: MONTH
Let them do the lab exercise first without explaining the grammar.
70 mins get done
plotly
ggplotly only needs one.
ggplotly
plot_ly
p1 <- sx %>%
  filter(Wday == "Weekday") %>%
  group_by(Date) %>%
  plot_ly(x = ~ Hour, y = ~ Counts) %>%
  add_lines()
p2 <- sx %>%
  filter(Wday == "Weekend") %>%
  group_by(Date) %>%
  plot_ly(x = ~ Hour, y = ~ Counts) %>%
  add_lines()
layout(subplot(p1, p2, shareY = TRUE), showlegend = FALSE)
ggplotly: shortcut
plot_ly: replicate the plot
a10_df <- broom::tidy(zoo::as.zoo(fpp2::a10)) %>%
  mutate(
    year = year(index),
    month = month(index)
  )
p3 <- a10_df %>%
  ggplot(aes(month, value)) +
  geom_line(aes(group = year), alpha = 0.2) +
  geom_line(aes(frame = year, colour = as.factor(year)))
animation_opts(
  ggplotly(p3),
  frame = 1000, easing = "elastic"
)
wanderer4melb
(click me) is a shiny app for
visualising Melbourne pedestrian and weather data in 2016.
# install.packages("devtools")
devtools::install_github("earowang/wanderer4melb")
library(wanderer4melb)
launch_app()
sugrrants
🐜 is an R package (under development) that supports
graphics for analysing time series data.
devtools::install_github("earowang/sugrrants")
library(sugrrants)
frame_calendar
🗓 is made available for this. If you find a bug or want to suggest a new feature, please report/propose it on the GitHub page. Thanks.
frame_calendar
rearranges the data into a calendar format using linear algebra tools.
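A minimal base-R sketch of that idea (the function and argument names here are illustrative, not sugrrants' internals): each day's series is rescaled to the unit square, then translated to that day's cell in the calendar grid.

```r
# Rescale a numeric vector into [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Place one day's (hour, counts) series into cell `day` of a 7-column grid.
calendar_xy <- function(hour, counts, day, ncol = 7) {
  col <- (day - 1) %% ncol    # day of week -> column
  row <- (day - 1) %/% ncol   # week of month -> row
  list(
    .x = col + rescale01(hour),    # hour scaled into [0, 1], shifted to the cell
    .y = -row + rescale01(counts)  # counts scaled into [0, 1]; rows run downwards
  )
}

set.seed(42)
xy <- calendar_xy(hour = 0:23, counts = rpois(24, lambda = 500), day = 9)
```

Day 9 of the month lands in column 1 of row 1, so its coordinates fall inside that cell; plotting .x against .y then lays every day out as a calendar.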
sx_cal <- sx %>%
  frame_calendar(
    x = Hour, y = Counts, date = Date,
    nrow = 1, ncol = 1
  )
sx_cal
#> # A tibble: 744 x 9
#> Date Hour Sensor_Name Counts
#> <date> <int> <chr> <int>
#> 1 2017-03-01 0 Southern Cross Station 16
#> 2 2017-03-01 1 Southern Cross Station 8
#> 3 2017-03-01 2 Southern Cross Station 3
#> 4 2017-03-01 3 Southern Cross Station 4
#> 5 2017-03-01 4 Southern Cross Station 1
#> 6 2017-03-01 5 Southern Cross Station 96
#> 7 2017-03-01 6 Southern Cross Station 581
#> 8 2017-03-01 7 Southern Cross Station 1847
#> 9 2017-03-01 8 Southern Cross Station 3863
#> 10 2017-03-01 9 Southern Cross Station 2063
#> # ... with 734 more rows, and 5 more variables:
#> # Date_Time <dttm>, Wday <ord>, .group_id <dbl>,
#> # .x <dbl>, .y <dbl>
ggplot2
takes care of plotting a data.frame
or tibble
as usual.
p_sx <- sx_cal %>%
  ggplot(aes(.x, .y, group = .group_id, colour = Wday)) +
  geom_line()
p_sx
prettify
takes a ggplot
object and then makes the calendar plot more readable.
prettify(p_sx)
ggmap, plotly, stringr, forcats, forecast, tidyverse, lubridate, broom, zoo, shiny, emo