Effective Applications of the R Language

Post on 22-Jan-2018

Bob Rudis • Managing Principal & Senior Data Scientist

bob@rudis.net

From Data to Decision Makers

A Behind the Scenes Look at Building The Most Respected Report In Cybersecurity

ABOUT ME

(Briefly)

• DBIR team manager/author (more on this in a bit)

• Former cyber risk director for a Fortune 100 insurance company

• Serial #rstats Tweeter (@hrbrmstr), blogger (rud.is/b & @ddsecblog) & regular helper on StackOverflow

• Author of and contributor to 14 CRAN packages

• Co-author of Data-Driven Security (@ddsecbook)

• Co-host of the Data-Driven Security Podcast (@ddsecpodcast)

• Die-hard ggplot2 advocate, widgeteer, heavily addicted cartographer & shameless user of the forward assignment operator ←4EVA→

WHAT IS THE DBIR?

The Verizon Data Breach Investigations Report (DBIR)

“The Verizon Data Breach Investigations Report (DBIR) is an annual publication that provides analysis of information security incidents, with a specific focus on data breaches.”

http://searchsecurity.techtarget.com/definition/Verizon-Data-Breach-Investigations-Report-DBIR

verizonenterprise.com/DBIR

WHO IS THE DBIR?

Wade Baker, Dave Hylender, Marc Spitler, Jay Jacobs, Kevin Thompson, Suzanne Widup, Bhaskar Karambelkar, Gabriel Bassett

The DBIR

• Started in 2008

• Cited by virtually every other cybersecurity report by the 3❡

• Read by individual contributors up through senior leadership at virtually every global enterprise

• A lot of fun to work on

#RSAC

#DBIR

[Figure: values by year, 2008 to 2015. Data labels as extracted: 1, 1, 2, 3, 618, 50, 70]

WHAT DOES THIS HAVE TO DO WITH R?

200,000

VERIS: Vocabulary for Event Recording and Incident Sharing

veriscommunity.net • vcdb.org

verisr

github.com/vz-risk/verisr

library(verisr)

# jsondir: path to a directory of VERIS JSON incident records
vcdb <- json2veris(jsondir)

summary(vcdb) # too big to show

getenum(vcdb, "actor")
##       enum   x
## 1 external 955
## 2 internal 535
## 3  partner 100
## 4  unknown  85

getenum(vcdb, "actor", add.n=TRUE, add.freq=TRUE)
##       enum   x    n  freq
## 1 external 955 1643 0.581
## 2 internal 535 1643 0.326
## 3  partner 100 1643 0.061
## 4  unknown  85 1643 0.052
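
The freq column in the output above appears to be each count divided by n, the number of incidents. A minimal base R sketch that reproduces the printed figures without verisr follows. The data frame is typed in by hand from the slide output, not read from VCDB, and the variable names are illustrative only.

```r
# Counts copied by hand from the getenum() output on the slide.
actors <- data.frame(
  enum = c("external", "internal", "partner", "unknown"),
  x    = c(955, 535, 100, 85),
  stringsAsFactors = FALSE
)

n <- 1643  # incident total, as printed in the n column

# Share of incidents per actor type, rounded to three places as on the slide.
actors$n    <- n
actors$freq <- round(actors$x / n, 3)
print(actors)

# The counts add up to more than n, which is what you would expect
# if a single incident can list more than one actor type.
sum(actors$x)  # 1675
```

Because the counts can overlap, the freq values need not sum to 1, so they are better read as "share of incidents involving this actor type" than as a partition.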

vz-risk.github.io/dbir/2015/19/

• 200m successful vulnerability exploits across 20,000 enterprises

• 170m malware events across over 10,000 enterprises

• 6 months of malware traffic data from 30+m mobile devices

• Live botnet traffic from compromised organizations

• Millions of Indicators of Compromise

• Details of all Denial of Service activity for 2014

PUTTING IT ALL TOGETHER

Getting the data

PUTTING IT ALL TOGETHER

Creating, organizing and sharing analyses

.R .Rmd .json .Rdata

1. Assign areas to each researcher

2. For “standard VERIS” analyses, generate reports from core Rmd

3. Have “Findings Review” collaborative meetings where we peer-review the work

4. (Repeat step 3 after refinement of findings)

5. Decide on final sections for the report and assign authors

6. Add rough draft visualizations to the findings

7. Lock in content

8. Refine visualizations

9. Finalize text content

10. Work with Marketing & Graphics
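
Step 2, generating reports from a core Rmd, could be sketched with a parameterized R Markdown template as below. This is an assumption about the approach, not the team's actual setup: the template contents, the `area` parameter, the area names and the output file names are all hypothetical. It needs the rmarkdown package and pandoc.

```r
library(rmarkdown)

# Hypothetical "core" template; the real DBIR template is not shown in the talk.
core_rmd <- file.path(tempdir(), "core.Rmd")
writeLines(c(
  "---",
  "title: \"Standard analysis\"",
  "output: html_document",
  "params:",
  "  area: \"unassigned\"",
  "---",
  "",
  "Research area: `r params$area`"
), core_rmd)

# Hypothetical research areas, one report each.
areas <- c("malware", "mobile", "dos")

# Render the same template once per area, overriding the `area` parameter.
reports <- vapply(areas, function(area) {
  render(
    core_rmd,
    output_file = paste0("report-", area, ".html"),
    output_dir  = tempdir(),
    params      = list(area = area),
    quiet       = TRUE
  )
}, character(1))

reports  # paths of the rendered files
```

Keeping one template and varying only the parameters means a fix to the core analysis reaches every researcher's report on the next render.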

FIGURATIVELY SPEAKING

• Create one “Master Rmd” for all visualization figures using canned data from outputs of analyses, having one master (giant) HTML document version and multiple individual PDF versions to give to the creative staff to work with

Why PDF? Complex ggplot2 SVGs crash Illustrator and the fonts are horrible (they get converted to polygons).
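
A minimal sketch of writing a single figure to its own PDF with ggplot2 follows. The data are the actor counts shown earlier in the talk, typed in by hand. The plot styling, axis label, file name and dimensions are illustrative assumptions, not the DBIR's actual figure code.

```r
library(ggplot2)

# Toy data: actor counts copied by hand from the earlier getenum() output.
actors <- data.frame(
  enum = c("external", "internal", "partner", "unknown"),
  x    = c(955, 535, 100, 85)
)

# Horizontal bar chart, largest count at the top.
p <- ggplot(actors, aes(x = reorder(enum, x), y = x)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Count") +
  theme_minimal()

# ggsave() picks the PDF device from the file extension.
out <- file.path(tempdir(), "figure-actors.pdf")
ggsave(out, plot = p, width = 6, height = 3)
```

Saving each figure to a separate file with fixed dimensions gives the design staff one self-contained artifact per figure.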

• When you decide you want to use a figure from the analysis, spend the time to make it look as amazing (and final) as possible to save $$, save time down the road and to avoid seeing your creations on @wtfviz

LESSONS LEARNED

R Markdown (Rmd) makes it super amazingly awesomely easy to document, iterate, modify & share analyses.

Spinning is cool too.
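
Assuming "spinning" here refers to `knitr::spin()`, which turns a specially commented R script into an R Markdown document, a minimal sketch looks like this. The script contents are made up for illustration.

```r
library(knitr)

# A plain R script: lines starting with #' become prose,
# everything else becomes a code chunk.
script <- c(
  "#' # Actor summary",
  "#' Counts typed in by hand for illustration.",
  "x <- c(external = 955, internal = 535)",
  "round(x / 1643, 3)"
)

# knit = FALSE returns the generated R Markdown source instead of running it.
rmd <- spin(text = script, knit = FALSE)
cat(rmd, sep = "\n")
```

This lets an analysis stay an ordinary, runnable .R file while still producing a narrated report when needed.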

ggplot2 makes it super amazingly awesomely straightforward to make “camera ready” visualizations (PDF vs SVG)

Do not upgrade your analysis stack or experiment with RStudio during the core analysis phase

Packages (even for analyses) > loosely connected documents and scripts

Source code control & data version control are extremely important
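
One lightweight way to version data that is too large for source control is to commit a checksum manifest next to the code. The base R sketch below illustrates the idea under stated assumptions: the directory layout, the file names and the sample record are hypothetical, and the talk does not say which mechanism the team used.

```r
# Hypothetical data directory with one small file, created here so the sketch runs.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
writeLines('{"actor": "external"}', file.path(data_dir, "incident-0001.json"))

# Build a manifest of MD5 checksums, one row per data file.
files <- list.files(data_dir, full.names = TRUE)
manifest <- data.frame(
  file = basename(files),
  md5  = unname(tools::md5sum(files)),
  stringsAsFactors = FALSE
)

# Commit this small text file alongside the analysis code.
manifest_path <- file.path(tempdir(), "data-manifest.csv")
write.csv(manifest, manifest_path, row.names = FALSE)

# Later, or on a colleague's machine: confirm the data still matches.
recorded <- read.csv(manifest_path, colClasses = "character")
current  <- unname(tools::md5sum(file.path(data_dir, recorded$file)))
all(current == recorded$md5)
```

A mismatch in the final check signals that the data changed after the manifest was written.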

A fellow researcher must be able to reproduce your analyses with the same data & Rmd and understand your reasoning in the annotation

Freezing or at least recording versions of packages you use may be vitally important to your ability to reproduce at a later date (store them in version control with analyses or perhaps embed in a container like Docker)
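
As a minimal sketch of the "at least recording" option, the base R snippet below writes the version of every loaded package to a CSV that can be committed with the analysis. The file name is a hypothetical choice. This only records versions; restoring them later would need a tool such as packrat or renv, or a container image.

```r
# Packages whose namespaces are loaded in the current session.
pkgs <- sort(loadedNamespaces())

# One row per package with its installed version.
versions <- data.frame(
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(packageVersion(p)), character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)

# Hypothetical file name; commit this next to the analysis.
lock_path <- file.path(tempdir(), "package-versions.csv")
write.csv(versions, lock_path, row.names = FALSE)

# Record the R version as well.
writeLines(R.version.string)
```

Run this at the end of the analysis script, after every `library()` call, so the list reflects what was actually used.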

ABOUT THE COVER

• @vzdbir

• dbir@verizon.com

• verizonenterprise.com/dbir

• veriscommunity.net

• vcdb.org

• github.com/vz-risk

• @wadebaker

• @davehylender

• @marc_spitler

• @bfist

• @jayjacobs

• @SuzanneWidup

• @bhaskar_vk

• @gdbassett

• @hrbrmstr
