visualising big data (in r) - meetupfiles.meetup.com/1406240/bigvis.pdfhadley wickham @hadleywickham...

62
Hadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29, 13

Upload: doanque

Post on 11-Apr-2018

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 2: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Motivation

Monday, April 29, 13

Page 3: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Data

• Every US commercial domestic flight 2001-2011: ~76 million flights

• >100 variables. I’ll focus on 3: delay, distance, flight time and speed.

• (Total database: ~11 Gb)

Monday, April 29, 13

Page 4: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

library(ggplot2)library(bigvis)

# Can't use data frames :(dist <- readRDS("dist.rds")delay <- readRDS("delay.rds")time <- readRDS("time.rds")speed <- dist / time * 60

# There's always bad datatime[time < 0] <- NAspeed[speed < 0] <- NAspeed[speed > 761.2] <- NA

Monday, April 29, 13

Page 5: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

qplot(dist, speed, colour = delay) + scale_colour_gradient2()Monday, April 29, 13

Page 6: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

qplot(dist, speed, colour = delay) + scale_colour_gradient2()

One hour later...

Monday, April 29, 13

Page 7: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

x <- runif(2e5)y <- runif(2e5)system.time(plot(x, y))Monday, April 29, 13

Page 8: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Monday, April 29, 13

Page 9: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Motivating principles

• Support exploratory analysis (e.g. in R)

• Efficient

• 1d: 3,000; 2d: 3,000,000

• Fast on commodity hardware

• 100,000,000 in <5s

• 108 obs = 0.8 Gb, ~20 vars in 16 Gb

Monday, April 29, 13

Page 10: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Process

• Condense (bin & summarise)

• Smooth

• Visualise

Monday, April 29, 13

Page 11: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Related work• W. Härdle and D. Scott. Smoothing in low and

high dimensions by weighted averaging using rounded points. Computational Statistics, 7:97–128, 1992.

• J. Fan and J. S. Marron. Fast implementations of nonparametric curve estimators. Journal of Computational and Graphical Statistics, 3 (1):35–56, 1994.

• M. Wand. Fast computation of multivariate kernel estimators. Journal of Computational and Graphical Statistics, 3 (4):433–445, 1994.

Monday, April 29, 13

Page 12: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Condense

Monday, April 29, 13

Page 13: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

�x� origin

width

⌫Bin

Monday, April 29, 13

Page 14: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Fixed bins

• V. fast to compute.

• Not obviously worse for density estimation

• No automatic bin width estimation: err on the side of too many & fix up later

Monday, April 29, 13

Page 15: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Count

Mean

Std. dev.

Quantiles

Histogram, KDE

Regression, Loess

Boxplots, Quantile regression smoothing

Summarise

Monday, April 29, 13

Page 16: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

0

500000

1000000

1500000

0 1000 2000 3000 4000 5000dist

.count

dist_s <- condense(bin(dist, 10))autoplot(dist_s)Monday, April 29, 13

Page 17: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

NA

0

500000

1000000

1500000

0 1000 2000 3000time

.count

time_s <- condense(bin(time, 1))autoplot(time_s)Monday, April 29, 13

Page 18: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

0

250000

500000

750000

0 250 500 750 1000time

.count

autoplot(time_s, na.rm = TRUE)Monday, April 29, 13

Page 19: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

0

250000

500000

750000

0 100 200 300 400 500time

.count

autoplot(time_s[time_s < 500, ])Monday, April 29, 13

Page 20: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

0

500000

1000000

1500000

0 20 40 60time

.count

autoplot(time_s %% 60)Monday, April 29, 13

Page 21: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Algebraic 1 value

Distributive m values

Holistic f(n) values

Advantage of algebraic & distributive is that they can be re-aggregated which makes them trivially parallelisable & re-binnable

Monday, April 29, 13

Page 22: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Algebraic sum, count, min, max

Distributive mean, sd

Holistic quantile, cardinality

Advantage of algebraic & distributive is that they can be re-aggregated which makes them trivially parallelisable & re-binnable

Monday, April 29, 13

Page 23: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

●●

●●

●●●

●●●

●●●●●

●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●

●●●●

●●●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●●●●

●●●●●●●

●●●●●●●

●●●● ●●

●●●

●●

●●

●●

200

400

600

0 1000 2000 3000 4000 5000dist

speed

1e+00

1e+02

1e+04

1e+06.count

Monday, April 29, 13

Page 24: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

●●

●●

●●●

●●●

●●●●●

●●●●●●

●●

●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●

●●●●

●●●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●●●●

●●●●●●●

●●●●●●●

●●●● ●●

●●●

●●

●●

●●

200

400

600

0 1000 2000 3000 4000 5000dist

speed

1e+00

1e+02

1e+04

1e+06.count

sd1 <- condense(bin(dist, 10), z = speed)autoplot(sd1) + ylab("speed")Monday, April 29, 13

Page 25: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

0

200

400

600

800

0 1000 2000 3000 4000 5000dist

speed

0e+001e+052e+053e+054e+055e+056e+05

.count

Monday, April 29, 13

Page 26: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

0

200

400

600

800

0 1000 2000 3000 4000 5000dist

speed

0e+001e+052e+053e+054e+055e+056e+05

.count

sd2 <- condense(bin(dist, 20), bin(speed, 20))autoplot(sd2)Monday, April 29, 13

Page 27: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Smooth

Monday, April 29, 13

Page 28: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Why smooth?

• Fix over binning

• Dampen effect of outliers

• Focus on main trends

Monday, April 29, 13

Page 29: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

0

500000

1000000

1500000

0 1000 2000 3000 4000 5000dist

.count

dist_s <- condense(bin(dist, 10))autoplot(dist_s)Monday, April 29, 13

Page 30: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Demoshiny::runApp("smooth/", 8000)

Monday, April 29, 13

Page 31: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

● ●

● ●

● ●

−1.0

−0.5

0.0

0.5

1.0

0 1 2 3

BinnedMonday, April 29, 13

Page 32: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

● ●

● ●

● ●

−1.0

−0.5

0.0

0.5

1.0

0 1 2 3

RunningMonday, April 29, 13

Page 33: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

● ●

● ●

● ●

−1.0

−0.5

0.0

0.5

1.0

0 1 2 3

Kernel meanK(x) =

�1� |x|3

�2I|x|<1

Monday, April 29, 13

Page 34: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

● ●

● ●

● ●

−1.0

−0.5

0.0

0.5

1.0

0 1 2 3

Kernel regressionMonday, April 29, 13

Page 35: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

● ●

● ●

● ●

−1.0

−0.5

0.0

0.5

1.0

0 1 2 3

Kernel robust regressionMonday, April 29, 13

Page 36: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Locally constant(Nadaraya-Watson,

kernel mean)

Convolution (=fast)

Locally linear(Kernel regression/

smooth)

Better boundary behaviour

Locally linear (robust)

(loess)

Better resistance to outliers

Monday, April 29, 13

Page 37: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Locally constant(Nadaraya-Watson,

kernel mean)

Convolution (=fast)

Locally linear(Kernel regression/

smooth)

Better boundary behaviour

Locally linear (robust)

(loess)

Better resistance to outliers

Monday, April 29, 13

Page 38: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Locally constant(Nadaraya-Watson,

kernel mean)

Convolution (=fast)

Locally linear(Kernel regression/

smooth)

Better boundary behaviour

Locally linear (robust)

(loess)

Better resistance to outliers

Monday, April 29, 13

Page 39: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

“Best” bandwidth?

• Estimate using leave-one out cross-validation of rmse

• Not “optimal” for visualisation, but a good place to start.

• Possible to compute in one pass for locally constant smooths. (May be possible for others with enough thought)

Monday, April 29, 13

Page 40: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Visualise

Monday, April 29, 13

Page 41: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Challenges

• Prepare for outliers

• Always display count

• Always display missing values

Monday, April 29, 13

Page 42: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Demoshiny::runApp("mt/", 8002)

Monday, April 29, 13

Page 43: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

RcppDirk Eddelbuettel, Romain Francois,

& JJ Allaire

Monday, April 29, 13

Page 44: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

library(Rcpp)cppFunction('int one() { return 1;}')one()

Monday, April 29, 13

Page 45: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

#include <Rcpp.h>using namespace Rcpp;

// [[Rcpp::export]]int one() { return 1;}

Generate C++ file

Monday, April 29, 13

Page 46: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

#include <Rcpp.h>

RcppExport SEXP sourceCpp_86581_one() {BEGIN_RCPP Rcpp::RNGScope __rngScope; int __result = one(); return Rcpp::wrap(__result);END_RCPP}

Expose C++ to C

Converts C++ object to R object

Monday, April 29, 13

Page 47: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB -o 'sourceCpp_36763.so' 'file5907496612f3.cpp' clang++ -I/Library/Frameworks/R.framework/Resources/include -I/Library/Frameworks/R.framework/Resources/include/x86_64 -DNDEBUG -I/usr/local/include -I"/Users/hadley/R/Rcpp/include" -fPIC -g -O2 -c file5907496612f3.cpp -o file5907496612f3.og++ -arch x86_64 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/lib -o sourceCpp_36763.so file5907496612f3.o /Users/hadley/R/Rcpp/lib/x86_64/libRcpp.a -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation

Compile C & C++ code to a DLL

Monday, April 29, 13

Page 48: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

`.sourceCpp_86581_DLLInfo` <- dyn.load('/tmp/Rtmpt0ZCNp/sourcecpp_2cf047b27139/sourceCpp_91998.so')

Dynamically link DLL

Monday, April 29, 13

Page 49: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

one <- Rcpp:::sourceCppFunction(function() {}, FALSE, `.sourceCpp_86581_DLLInfo`, 'sourceCpp_86581_one')

rm(`.sourceCpp_86581_DLLInfo`)

Connect to C function in DLL to R

Monday, April 29, 13

Page 50: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

cppFunction("int one() { return 1;}")cppFunction("int one() { return 1;}")# Doesn't need to recompile!one()

Monday, April 29, 13

Page 51: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Why C++?

• Modern, high-performance language

• Precise control over memory allocation and copying.

• Excellent built-in libraries (e.g. STL)

• With Rcpp, much easier than C/Fortran

• Not too hard to learn

Monday, April 29, 13

Page 52: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Google for: “Rcpp”

“Rcpp gallery”“Rcpp hadley”

Monday, April 29, 13

Page 53: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

ShinyJoe Chen & Winston Chang

Monday, April 29, 13

Page 54: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

library(shiny)runApp("smooth/")

Monday, April 29, 13

Page 55: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

shinyUI(pageWithSidebar( headerPanel("Smoothing"), sidebarPanel(sliderInput(inputId = "h", label = "Bandwidth (in multiples of binwidth):", min = 1, max = 20, value = 1, step = 0.1)), mainPanel(plotOutput( outputId = "plot", height = "300px"))))

ui.r

Monday, April 29, 13

Page 56: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

library(bigvis)library(ggplot2)library(plyr)

dist <- readRDS("dist.rds")dist_s <- condense(bin(dist, 10))

shinyServer(function(input, output) { output$plot <- renderPlot({ n <- as.numeric(input$h) if (n <= 1) { print(autoplot(dist_s)) } else { print(autoplot(smooth(dist_s, n * 10))) } })})

server.r

Monday, April 29, 13

Page 57: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Why shiny?

• Create web apps easily with knowing html, js, css, ...

• You describe connections between UI and data & shiny takes care of managing the updates

• (Eventually) easily deploy locally or in the cloud

Monday, April 29, 13

Page 58: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Google for: “shiny”

“shiny mailing list”“shiny tutorial”

Monday, April 29, 13

Page 59: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Conclusions

Monday, April 29, 13

Page 60: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Performance

• “Interactive” exploration of 100,000,000 observations is possible in R

• Key is use of C++ and extreme care with memory allocation/copying

• RCpp makes this v. easy

Monday, April 29, 13

Page 61: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Studio

Future work

• Multi-core + out-of-memory

• In database, where possible

• More summary statistics

Monday, April 29, 13

Page 62: Visualising big data (in R) - Meetupfiles.meetup.com/1406240/bigvis.pdfHadley Wickham @hadleywickham Chief Scientist, RStudio Visualising big data (in R) April 2013 Monday, April 29,

Google for: “bigvis”

Monday, April 29, 13