big data conference

Post on 26-Jan-2015

Big Data Conference 2013: Analytics and Applications for Federal Big Data

Data Tactics Corp: A Blended Approach to Big Data Analytics

Richard Heimann, Data Scientist at Data Tactics Corporation

Data Tactics Analytics Practice

The Team: Nathan D., Shrayes R., David P., Adam VE., Geoffrey B., Rich H.

Graduates from top universities...

Advanced degrees include: mathematics, computer science, astrophysics, electrical engineering, mechanical engineering, statistics, social sciences.

Base competencies (horizontals): clustering, association rules, regression, naive Bayesian classifier, decision trees, time-series, text analysis.

Going beyond the base (verticals)...

Horizontals & Verticals

Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis

econometrics || spatial econometrics || graph theory algorithms || astrophysical time-series analysis || path planning algorithms || bayesian statistics || constrained optimizations || numerical integration techniques || PCA || bagging/boosting || hierarchical models || IRT || DLISA || latent class analysis || structural equation modeling || mixture models || SVM || maxent || CART || autoregressive models || ICA || factor analysis || Random Forest || dimensional reduction || topic models || sentiment analysis

Hierarchy of Data Scientists

Data Tactics Analytics Practice

Why Analytics [Business]???

Why are analytics important? (Business, Analytics, Practical)

"We need to stop reinventing the cloud and start using it!" (Dave Boyd)

Why are analytics important? (Business, Analytics, Practical)

No Free Lunch (NFL): no algorithm performs better than any other when performance is averaged uniformly over all possible problems of a particular type. Algorithms must therefore be designed for a particular domain or style of problem; there is no such thing as a general-purpose algorithm.
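The NFL statement above can be made concrete with a toy experiment. The following is a minimal sketch in base R, not a proof and not part of the original deck: the 8-point input space, the train/test split, and both learners (`learner_majority`, `learner_minority`) are hypothetical choices made only for illustration.

```r
# All 2^8 possible labelings of an 8-point input space (one row per target function).
targets <- as.matrix(expand.grid(rep(list(0:1), 8)))

train_idx <- 1:5   # points the learner is allowed to see
test_idx  <- 6:8   # off-training-set points used for scoring

# Learner A (hypothetical): predict the majority training label everywhere.
learner_majority <- function(y_train) as.integer(mean(y_train) >= 0.5)

# Learner B (hypothetical): deliberately predict the opposite of learner A.
learner_minority <- function(y_train) 1L - learner_majority(y_train)

# Off-training-set accuracy of a learner, averaged uniformly over all targets.
avg_accuracy <- function(learner) {
  acc <- apply(targets, 1, function(f) {
    pred <- learner(f[train_idx])
    mean(f[test_idx] == pred)
  })
  mean(acc)
}

acc_majority <- avg_accuracy(learner_majority)
acc_minority <- avg_accuracy(learner_minority)
print(c(majority = acc_majority, minority = acc_minority))
```

Both averages come out to 0.5: once every possible labeling counts equally, the training labels carry no information about the unseen points, so a sensible learner and a deliberately perverse one tie. Restricting the set of targets to a particular domain is what lets one learner pull ahead.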

Why Analytics [Analytics]???

If this guy doesn’t scale - none of us do.

Web Scales

Academic Publications Scale

IC Scales

[Figure axes: N, t]

Why Analytics [Practical]???

algo to users > algo to data

Development

Deployment

Machine User

Parallel Distributed Objective Subjective

Valid

Nontrivial

Accurate

Useful

Novel

Comprehensible

M/R

MPP

HDFS

GPU

SOA

Shiny

Open sourced by RStudio in November 2012
Not the first to wrap R in the browser, but perhaps the easiest for R developers
Don’t need to know HTML, CSS and JavaScript to get started
Reactive programming model
Web sockets for communication

server.R

library(shiny)

# Define server logic required to generate and plot a random
# distribution
shinyServer(function(input, output) {

  # Expression that generates a plot of the distribution.
  # renderPlot:
  #
  #  1: Is "reactive" and will therefore be automatically
  #     re-executed when inputs change.
  #  2: Its output type is a plot.

  output$distPlot <- renderPlot({

    # generate an rnorm distribution and plot it
    dist <- rnorm(input$obs)
    hist(dist)
  })
})

ui.R

library(shiny)

# Define UI for application that plots random distributions
shinyUI(pageWithSidebar(

  # Application title:
  headerPanel("My Shiny App!"),

  # Sidebar with a slider input for number of observations
  # (min = 1 so that hist() always has at least one value to plot):
  sidebarPanel(
    sliderInput("obs",
                "Number of observations:",
                min = 1,
                max = 1000,
                value = 500)
  ),

  # Show a plot of the generated distribution:
  mainPanel(
    plotOutput("distPlot")
  )
))

ui.R

headerPanel()

sidebarPanel() mainPanel()

server.R + ui.R = microscope

adjustable parameters (knobs): 0 < knobs < small k

knobs = lighting, varying objectives, focusing (fine and coarse)

knobs: fine and coarse filtering:

geography
time
variable of interest
observations of interest

promote significant (objective) patterns
change model parameters
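The filtering knobs listed above can be sketched as plain function arguments. This is a minimal base R illustration under stated assumptions: the `obs` data frame, its columns, and the `apply_knobs()` helper are all hypothetical and do not come from the original deck. In a Shiny app, each argument would typically be driven by an input widget.

```r
# Hypothetical observations; every value here is made up for illustration.
obs <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  hour   = c(8, 17, 8, 12, 17),
  speed  = c(31, 22, 45, 50, 28),
  volume = c(120, 180, 90, 60, 150)
)

# Each argument is one knob: geography, time, variable of interest, and
# observations of interest (a lower bound on the chosen variable).
apply_knobs <- function(data, region, hours, variable, min_value = -Inf) {
  keep <- data$region %in% region &
    data$hour >= hours[1] & data$hour <= hours[2] &
    data[[variable]] >= min_value
  data[keep, c("region", "hour", variable), drop = FALSE]
}

# Turn the knobs: southern region, morning to noon, speeds of at least 40.
view <- apply_knobs(obs, region = "south", hours = c(8, 12),
                    variable = "speed", min_value = 40)
print(view)
```

Keeping the number of arguments small mirrors the "0 < knobs < small k" constraint: a handful of filters is enough to refocus the view without overwhelming the user.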

BDE + Shiny

Latent Spatial Traffic Patterns


Overlapping Solutions

Multiple models allow more nuanced learning from data.
Convergent results serve as cross-validation.
Points of divergence provide additional insights and allow models to be calibrated further.
Different models can provide answers to different questions, or answers to the same question for different analysts.
A multi-method approach excels for diverse teams with mutable missions.
smooth + rough = data
New paradigm where the question, “Are there multiple, overlapping ways to solve this problem?” dominates.
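The points above about convergence and divergence can be sketched with two models fitted to the same data. This is a minimal base R illustration, not the team's actual workflow: the synthetic data, the choice of `lm()` and `loess()` as the two analytics, and the 0.5 divergence threshold are all assumptions.

```r
set.seed(42)  # synthetic data, for illustration only
x <- seq(0, 10, length.out = 200)
y <- 2 + 0.5 * x + sin(x) + rnorm(length(x), sd = 0.3)

# Analytic A: a global linear trend. Analytic B: a local smoother.
fit_a <- lm(y ~ x)
fit_b <- loess(y ~ x, span = 0.3)

pred_a <- fitted(fit_a)
pred_b <- fitted(fit_b)

# "smooth + rough = data": fitted values plus residuals reconstruct y.
rough_a <- residuals(fit_a)
reconstructed <- unname(pred_a + rough_a)

# Where the analytics agree they corroborate each other; where they
# diverge, the gap points at structure one of them is missing.
gap <- abs(pred_a - pred_b)
divergent_x <- x[gap > 0.5]   # 0.5 is an arbitrary, assumed threshold
print(range(divergent_x))
```

Here the linear model captures the overall trend while the smoother picks up the oscillation, so the regions in `divergent_x` are where an analyst would look for additional insight or recalibrate the simpler model.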

Overlapping Solutions

Analytic A

Analytic B

Analytic C

A + B

A + C

B + C

A + B + C

Are there multiple, overlapping ways to solve this problem?

Summary:

# our blended approach
dt.philosophy <- lm(analytics ~ bigdata + smalldata + objective +
                    subjective:overlapping.solutions, data = data)
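For readers less familiar with R's formula syntax, the summary line reads as three main effects plus one interaction term. A minimal sketch follows, assuming a made-up data frame named `toy` with random numeric columns; the variable names are taken from the slide, everything else is hypothetical.

```r
# The slide's formula, unchanged.
f <- analytics ~ bigdata + smalldata + objective +
  subjective:overlapping.solutions

# List the terms R will fit: "a:b" is the interaction of a and b only,
# without separate main effects for either.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Fit on random data purely to show which coefficients are estimated.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(50 * 6), ncol = 6))
names(toy) <- c("analytics", "bigdata", "smalldata", "objective",
                "subjective", "overlapping.solutions")
fit <- lm(f, data = toy)
print(names(coef(fit)))
```

Read this way, the formula says that the subjective component contributes only in combination with overlapping solutions, which matches the deck's closing question.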

Overlapping Solutions

About (DS4G):

1: Improve on definitions of analytics.
2: Outline optimal interactions with Data Scientists.
3: Provide a life-cycle for Data Science.
4: Most importantly, share a taxonomy to identify analytical questions one could ask of data (Causal Effects, Classification, Outlier Detection, Big Data and Analytics, Measurement Models, & Text Analysis)

Presented by Data Tactics Analytics Team
Location: TBD
Time: 1Q 2014
Duration: ~ 5 hrs.
Cost: FREE
Audience: Government managers and Data Tactics partners with their customers.

Data Science for Government (DS4G)

http://www.meetup.com/Data-Science-DC/events/146953142/

LUBAP goes wild! 421 attending!

Thank you...

Questions?

Homepage: http://www.data-tactics.com
Blog: http://datatactics.blogspot.com
Twitter: @DataTactics

Or, me (Rich Heimann): rheimann@data-tactics-corp.com
Slideshare: http://www.slideshare.net/DataTactics/presentations
