Betterment Engineering - Bootstrapping Data Intelligence - Agile R Dashboards

Post on 22-May-2015

Category: Technology

DESCRIPTION

Key points from the presentation:
- Bootstrap. Don’t introduce complexity into your environment until you really need it.
- Leverage the skill set of your organization. If your analysts are great with R, productionize an R workflow.
- Automate. Pragmatic engineering can empower your analysts while supporting your process.
- Freemium Cloud, first. IaaS providers like Amazon have a free tier to help you get started. Try it before you buy it.
- Use Hosted Tools and Services. There are powerful hosted tools and services out there, like Travis-CI, to help you automate your workflow. Add them to your toolkit.

For more content from Betterment's engineers, please visit: https://www.betterment.com/blog/topics/engineering/

Code samples: https://github.com/ygoldman/rwizflowy

TRANSCRIPT

Light-weight Dashboarding and Reporting Workflows with R

And the GREAT stack...

Yuriy Goldman, Jon Mauney

http://www.meetup.com/BigOnData

http://www.meetup.com/NYC-Open-Data

http://www.meetup.com/FinTech

@betterment #bootstrapbi

Team Polaris @ Betterment

Yuriy, Jon, Avi, Nick, Andrew

https://www.betterment.com/blog/2014/03/07/bootstrap-data-team/

Bootstrapping Business Intelligence

Get Here

Walk before you run...

Leverage existing skillset

Minimally Viable Product

Lean and Efficient

Agenda

• GREAT Stack in Layers
• Exercise a workflow for Development, Staging, Deployment
• Teamwork or Mingle - we will build an “almost realtime” Dashboard in R

Engineer Tested, Analyst Approved

Workflow Overview

AUTHORING

STAGING

DEPLOYING

Local Environment: project; YAML; MySQL; R-scripts, system scripts, deployment-fu; network file storage s3::rwizflowy-bucket at /mnt/rwizflowy-bucket

(A) Set up Git
(B) Open project in R Studio
(C) Mount S3 Bucket and Symlink
(D) Test DB Connection

WiFi: BettermentGuest: guest, welcome to betterment

Complete environment for R development (but you knew that already)

Collaborative Source Code Management, Continuous Integration hooks, Post Deployment processing

Like Dropbox, but gives you dependable, static URLs to files you save there (images, html pages)

AWS S3

Access AWS S3 Bucket as a local drive. Symlink /Volume/rwizflowy-bucket to /mnt/rwizflowy-bucket

ExpanDrive
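The symlink step above gives scripts a single output path that can stay the same on a laptop and on the server. A minimal sketch of that step follows, written in Python purely for illustration (the talk's own scripts are in R). The helper name `link_bucket` and the temporary directories standing in for /Volume/rwizflowy-bucket and /mnt/rwizflowy-bucket are assumptions, not part of the rwizflowy project.

```python
import os
import tempfile


def link_bucket(mounted_dir: str, link_path: str) -> str:
    """Make link_path a symlink to mounted_dir (hypothetical helper).

    An existing symlink is reused if it already points at mounted_dir and
    replaced if it points elsewhere. A real file or directory at link_path
    is never overwritten.
    """
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(mounted_dir):
            return link_path  # already linked correctly
        os.remove(link_path)  # stale link, recreate it below
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(mounted_dir, link_path)
    return link_path


if __name__ == "__main__":
    # Temporary stand-ins for the mounted volume and the /mnt path.
    with tempfile.TemporaryDirectory() as tmp:
        mounted = os.path.join(tmp, "volume-rwizflowy-bucket")
        os.mkdir(mounted)
        link = link_bucket(mounted, os.path.join(tmp, "mnt-rwizflowy-bucket"))
        # Writing through the link lands in the mounted directory.
        with open(os.path.join(link, "plot.png"), "wb") as fh:
            fh.write(b"")
        print(os.listdir(mounted))
```

Refusing to overwrite a real directory is a deliberate choice in this sketch: if /mnt/rwizflowy-bucket already holds files, silently replacing it could lose output.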

Sample data will come from MySQL.

YAML is the syntax of the config file our R scripts load at runtime. The file can tell the scripts how to connect to MySQL or where to write plots. Any setting that can change between your Local environment and Server environment should be defined here.
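One way to picture the per-environment config is as shared defaults plus a small set of overrides for each environment. The sketch below is in Python for illustration only, and it shows the config as the nested mapping a YAML parser would return rather than reading a file. Every key and value (`default`, `local`, `server`, `mysql_host`, `mysql_db`, `plot_dir`, the host names) is hypothetical and not taken from the rwizflowy project.

```python
# Hypothetical config, written as the nested mapping a YAML parser
# would produce from a file with top-level keys default/local/server.
CONFIG = {
    "default": {
        "mysql_db": "sample",
        # Same path in every environment, thanks to the mount/symlink.
        "plot_dir": "/mnt/rwizflowy-bucket",
    },
    "local": {"mysql_host": "127.0.0.1"},
    "server": {"mysql_host": "db.internal.example"},
}


def settings_for(config: dict, env: str) -> dict:
    """Return the defaults overlaid with the overrides for one environment."""
    if env == "default" or env not in config:
        raise KeyError(f"unknown environment: {env!r}")
    merged = dict(config.get("default", {}))  # copy, so CONFIG is not mutated
    merged.update(config[env])
    return merged


if __name__ == "__main__":
    for env in ("local", "server"):
        print(env, settings_for(CONFIG, env))
```

With this layout a script only needs to know which environment it is running in; connection details and output paths come from the config rather than from the code.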

Assembly of Dashboards or Reports happens within Google Sites, but any wiki will do. Use whatever you like, as long as IMG and IFrames are supported.

Google Sites

Team Exercise #1: Authoring
Set up Local Environment
Exercise a sample script
Output to S3

1. Get Code
2. Team Name
3. Connect to S3 and MySQL
4. Run Code, output to S3

Find a team captain for the Authoring Challenge!
● Reconvene in 20 minutes
● Take one of the samples and come up with an original graphic
● Team with the best ‘custom’ content that is web accessible (in the s3 bucket) gets t-shirts!

https://s3.amazonaws.com/rwizflowy-bucket/${team-name}

Staging
Add R output to a Wiki

https://s3.amazonaws.com/rwizflowy-bucket/${TEAMNAME}/${FILENAME.png}
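The URL pattern above is what makes staging simple: a file saved under a team's folder in the bucket gets a predictable address that a wiki page can embed. A small sketch of building that address and an IMG tag follows, in Python for illustration only. The function names `public_url` and `img_tag` and the sample team and file names are hypothetical, and the sketch assumes the object is publicly readable.

```python
import html
from urllib.parse import quote

# Base address taken from the URL pattern on the slide.
BASE_URL = "https://s3.amazonaws.com/rwizflowy-bucket"


def public_url(team_name: str, filename: str) -> str:
    """Build the static URL for a file in a team's folder (hypothetical helper)."""
    # Percent-encode each path segment so spaces and slashes cannot break the URL.
    return f"{BASE_URL}/{quote(team_name, safe='')}/{quote(filename, safe='')}"


def img_tag(team_name: str, filename: str, alt: str = "") -> str:
    """Return an IMG tag that a wiki page could embed."""
    url = public_url(team_name, filename)
    return f'<img src="{url}" alt="{html.escape(alt, quote=True)}">'


if __name__ == "__main__":
    print(public_url("team-a", "signups.png"))
    print(img_tag("team-a", "daily plot.png", alt="Daily plot"))
```

Because the scheduled script overwrites the same file name on each run, the wiki page keeps pointing at one URL while the image behind it refreshes.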

Local Environment: project; YAML; MySQL; R-scripts, system scripts, deployment-fu; network file storage s3::rwizflowy-bucket at /mnt/rwizflowy-bucket

Server Environment: project; YAML; MySQL; R-scripts, system scripts, deployment-fu; network file storage s3::rwizflowy-bucket at /mnt/rwizflowy-bucket via S3Fuse; cron scheduler

Integration Environment

[Diagram labels: Local, Server, git push, hook, pull, build and deploy]

Continuous Integration and Deployment tool. It connects to your GitHub account and listens for changes to your branches. We tell it what to do via the .travis.yml file in our project. Travis can execute unit/integration tests. If all is A-OK, it can push to EC2. Awesomeness!

Travis-CI

Deploying
Commit to Master
Sit back and enjoy the show...

Server Environment: YAML; MySQL; network file storage s3::rwizflowy-bucket at /mnt/rwizflowy-bucket via S3Fuse; cron scheduler

Wrap Up, Q&A

AUTHORING

STAGING

DEPLOYING

https://github.com/ygoldman/rwizflowy

http://www.betterment.com/jobs

https://www.betterment.com/blog

Get a Betterment account: https://www.betterment.com/fintech
