Betterment Engineering - Bootstrapping Data Intelligence - Agile R Dashboards

Light-weight Dashboarding and Reporting workflows with R and the GREAT stack... Yuriy Goldman, Jon Mauney

Upload: yuriy-goldman

Post on 22-May-2015

Category: Technology


DESCRIPTION

Key points from the presentation:
- Bootstrap. Don’t introduce complexity into your environment until you really need it.
- Leverage the skill set of your organization. If your analysts are great with R, productionize an R workflow.
- Automate. Pragmatic engineering can empower your analysts while supporting your process.
- Freemium Cloud, first. IaaS providers like Amazon have a free tier to help you get started. Try it before you buy it.
- Use Hosted Tools and Services. There are powerful hosted tools and services out there, like Travis-CI, to help you automate your workflow. Add them to your toolkit.

For more content from Betterment's engineers, please visit: https://www.betterment.com/blog/topics/engineering/
Code samples: https://github.com/ygoldman/rwizflowy

TRANSCRIPT

Page 1: Betterment Engineering - Bootstrapping Data Intelligence - Agile R Dashboards

Light-weight Dashboarding and Reporting workflows with R

And the GREAT stack...

Yuriy Goldman, Jon Mauney

Page 2

http://www.meetup.com/BigOnData

http://www.meetup.com/NYC-Open-Data

http://www.meetup.com/FinTech

Page 3

@betterment #bootstrapbi

Page 4

Team Polaris @ Betterment

Yuriy, Jon, Avi, Nick, Andrew

https://www.betterment.com/blog/2014/03/07/bootstrap-data-team/

Page 5

Bootstrapping Business Intelligence

Get Here

Page 6

Walk before you run...

Page 7

Leverage existing skillset

Page 8

Minimally Viable Product

Page 9

Lean and Efficient

Page 10
Page 11

Of

Page 12

Agenda

• GREAT Stack in Layers
• Exercise a workflow for Development, Staging, Deployment
• Teamwork or Mingle - we will build an “almost realtime” Dashboard in R

Page 14

Engineer Tested, Analyst Approved

Page 15

Workflow Overview

AUTHORING

STAGING

DEPLOYING

Page 16

Local Environment

project
YAML, MySQL
R-scripts, system scripts, deployment-fu
network file storage
s3::rwizflowy-bucket
/mnt/rwizflowy-bucket

(A) Set up Git
(B) Open project in RStudio
(C) Mount S3 Bucket and Symlink
(D) Test DB Connection

WiFi: BettermentGuest: guest, welcome to betterment

Page 17

Complete environment for R development (But you knew that already)

Page 18

Collaborative Source Code Management
Continuous Integration hooks
Post Deployment processing

Page 19

AWS S3

Like Dropbox, but gives you dependable, static URLs to the files you save there (images, HTML pages).

Page 20

ExpanDrive

Access AWS S3 Bucket as a local drive
Symlink /Volume/rwizflowy-bucket to /mnt/rwizflowy-bucket

Page 21

Sample data will come from MySQL.

Page 22

YAML is the syntax of the config file our R scripts load at runtime. The file can tell us how to connect to MySQL or where to output our plots. Any setting that can change between your Local environment and your Server environment should be defined here.
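As a rough illustration of the idea on this slide, the sketch below loads per-environment settings from YAML in R. It assumes the `yaml` package is installed. The keys (`db`, `output_dir`), the host names, and the `R_ENV` environment variable are hypothetical and are not taken from the talk's code samples.

```r
# Minimal sketch: load environment-specific settings from YAML at runtime.
# Assumes the 'yaml' package is installed; all keys and values are hypothetical.
library(yaml)

config_text <- "
local:
  db:
    host: 127.0.0.1
    name: sample_db
  output_dir: /mnt/rwizflowy-bucket
server:
  db:
    host: db.internal.example
    name: sample_db
  output_dir: /mnt/rwizflowy-bucket
"

# In a real project this would likely be yaml.load_file("config.yml") instead.
all_settings <- yaml.load(config_text)

# Pick the block for the current environment; R_ENV is a hypothetical variable.
env_name <- Sys.getenv("R_ENV", unset = "local")
settings <- all_settings[[env_name]]
if (is.null(settings)) stop("No settings for environment: ", env_name)

cat("DB host:", settings$db$host, "\n")
cat("Plots go to:", settings$output_dir, "\n")
```

Keeping both environments in one file means the same script can run unchanged on a laptop and on the server, with only the selected block differing.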

Page 23

Google Sites

Assembly of Dashboards or Reports happens within Google Sites, but any wiki will do. Use whatever you like, as long as IMG and IFrame elements are supported.

Page 24

Local Environment

project
YAML, MySQL
R-scripts, system scripts, deployment-fu
network file storage
s3::rwizflowy-bucket
/mnt/rwizflowy-bucket

(A) Set up Git
(B) Open project in RStudio
(C) Mount S3 Bucket and Symlink
(D) Test DB Connection

Page 25

Team Exercise #1: Authoring
Set up Local Environment
Exercise a sample script
Output to S3

1. Get Code
2. Team Name
3. Connect to S3 and MySQL
4. Run Code, output to S3
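The final step of the exercise, running code that outputs to S3, might look roughly like the sketch below. Because the bucket is mounted as a local directory, writing a file is all it takes. Here `tempdir()` stands in for the mounted bucket path so the example runs anywhere; the team name, file name, and sample data are hypothetical, and the data frame stands in for a MySQL query result.

```r
# Minimal sketch: render a plot to a PNG inside a team folder.
# tempdir() stands in for the mounted bucket (/mnt/rwizflowy-bucket in the slides);
# the team name, file name, and data are hypothetical.
bucket_dir <- tempdir()
team_name  <- "team-example"

out_dir <- file.path(bucket_dir, team_name)
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Stand-in for a MySQL query result.
signups <- data.frame(day = 1:7, count = c(12, 15, 9, 20, 18, 25, 22))

# Open a PNG graphics device, draw the plot, then close the device to flush the file.
out_file <- file.path(out_dir, "signups.png")
png(out_file, width = 800, height = 500)
plot(signups$day, signups$count, type = "b",
     xlab = "Day", ylab = "Signups", main = "Daily signups (sample data)")
dev.off()

cat("Wrote", out_file, "\n")
```

Pointing `bucket_dir` at the real mount, ideally read from the YAML config, would place the image in the team's folder in the bucket.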

Page 26

Find a team captain for the Authoring Challenge!
● Reconvene in 20 minutes
● Take one of the samples and come up with an original graphic
● Team with the best ‘custom’ content that is web accessible (in the s3 bucket) gets t-shirts!

https://s3.amazonaws.com/rwizflowy-bucket/${team-name}

Page 27

Staging
Add R output to a Wiki

https://s3.amazonaws.com/rwizflowy-bucket/${TEAMNAME}/${FILENAME.png}
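The URL pattern above can be assembled in R and wrapped in an IMG tag for pasting into a wiki page. This is only a sketch: the helper names `s3_url` and `img_tag` are hypothetical, and it assumes the object in the bucket is publicly readable.

```r
# Minimal sketch: build the public URL for a file written to the bucket
# and an IMG tag to paste into a wiki page. Helper names are hypothetical.
s3_url <- function(team_name, file_name,
                   base = "https://s3.amazonaws.com/rwizflowy-bucket") {
  # Percent-encode each path segment so spaces or odd characters stay valid.
  paste(base,
        utils::URLencode(team_name, reserved = TRUE),
        utils::URLencode(file_name, reserved = TRUE),
        sep = "/")
}

img_tag <- function(url, alt = "") {
  sprintf('<img src="%s" alt="%s">', url, alt)
}

url <- s3_url("team-example", "signups.png")
cat(url, "\n")
cat(img_tag(url, alt = "Daily signups"), "\n")
```

Because the URL stays the same each time the script overwrites the file, the wiki page picks up the fresh plot without being edited again.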

Page 28

Local Environment

project
YAML, MySQL
R-scripts, system scripts, deployment-fu
network file storage
s3::rwizflowy-bucket
/mnt/rwizflowy-bucket

Page 29

Server Environment

project
YAML, MySQL
R-scripts, system scripts, deployment-fu
network file storage
s3::rwizflowy-bucket
/mnt/rwizflowy-bucket via S3Fuse
cron scheduler

Page 30

Integration Environment

[Diagram labels: Local, git push, hook, build and deploy, pull, Server]

Page 31

Travis-CI

Continuous Integration and Deployment tool. It connects to your GitHub account and listens for changes to your branches. We tell it what to do via the .travis.yml file in our project. Travis can execute unit and integration tests; if all is A-OK, it can push to EC2. Awesomeness!

Page 32

Deploying
Commit to Master
Sit back and enjoy the show...

Page 33

Server Environment

YAML, MySQL
network file storage
s3::rwizflowy-bucket
/mnt/rwizflowy-bucket via S3Fuse
cron scheduler

Page 34
Page 35

Wrap Up, Q&A

AUTHORING

STAGING

DEPLOYING

Page 36

https://github.com/ygoldman/rwizflowy

http://www.betterment.com/jobs

https://www.betterment.com/blog

Get a Betterment account: https://www.betterment.com/fintech