Apache Spark™ Applications the Easy Way - Pierre Borckmans

Writing Spark applications, the easy way. Pierre Borckmans. Data Science Meetup - Spark & Machine Learning - October 27th, 2016 - Brussels

Upload: sparktc

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Writing Spark applications, the easy way

Pierre Borckmans

Data Science Meetup - Spark & Machine Learning - October 27th, 2016 - Brussels

Page 2: Apache Spark™ Applications the Easy Way - Pierre Borckmans
Page 3: Apache Spark™ Applications the Easy Way - Pierre Borckmans
Page 4: Apache Spark™ Applications the Easy Way - Pierre Borckmans
Page 5: Apache Spark™ Applications the Easy Way - Pierre Borckmans

The pivot...

Page 6: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Platform overview

Page 7: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Data pipeline overview

Page 8: Apache Spark™ Applications the Easy Way - Pierre Borckmans

The journey...

3 paradigms for Spark application development

Page 9: Apache Spark™ Applications the Easy Way - Pierre Borckmans

From hardcoded dataflows...

    val subscribers = cdrs.map( x => ( x.A.toLong, x ) ).groupByKey

    subscribers.mapValues(
      _.map( cdr => {
        for ( ( category, dimensions ) <- allDimensions ) yield (
          category,
          for ( dim <- dimensions ) yield {
            val fields = dim._1
            val values = dim._2
            if ( cdr.check( fields, values ) ) f( category )( cdr ) else f0( category )
          }
        )
      } ).reduce( ( m1, m2 ) => {
        for ( ( category, l1 ) <- m1 ) yield {
          val l2 = m2( category )
          val d = l1.zip( l2 ).map( l => { g( category )( l._1, l._2 ) } )
          ( category, d )
        }
      } )
    )
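The slide shows the dataflow without the definitions it relies on. As a rough sketch only, the same per-subscriber aggregation can be run on plain Scala collections instead of Spark. The shape of `Cdr`, the contents of `allDimensions`, and the bodies of `f`, `f0` and `g` are all assumptions made for illustration (the talk does not show them), and the for-comprehensions are written as `map` calls.

```scala
// Hypothetical stand-in for a call detail record; the real type is not shown in the talk.
final case class Cdr(A: String, attrs: Map[String, String], duration: Long) {
  // True when the record's value for `field` is one of the accepted `values`.
  def check(field: String, values: Set[String]): Boolean =
    attrs.get(field).exists(v => values.contains(v))
}

object HardcodedDataflowSketch {
  // Assumed layout: category -> list of (field, accepted values) dimensions.
  val allDimensions: Map[String, Seq[(String, Set[String])]] = Map(
    "calls"    -> Seq(("type", Set("voice")), ("type", Set("sms"))),
    "duration" -> Seq(("type", Set("voice")))
  )

  // Assumed f: value contributed by a record that matches a dimension.
  def f(category: String)(cdr: Cdr): Long = category match {
    case "duration" => cdr.duration
    case _          => 1L
  }

  // Assumed f0: neutral value for a record that does not match.
  def f0(category: String): Long = 0L

  // Assumed g: how two values of the same dimension are combined.
  def g(category: String)(a: Long, b: Long): Long = a + b

  // Groups records by subscriber, scores each record against every dimension,
  // then merges the per-record results position by position.
  def aggregate(cdrs: Seq[Cdr]): Map[Long, Map[String, Seq[Long]]] =
    cdrs.groupBy(cdr => cdr.A.toLong).map { case (subscriber, records) =>
      val perRecord = records.map { cdr =>
        allDimensions.map { case (category, dimensions) =>
          category -> dimensions.map { case (field, values) =>
            if (cdr.check(field, values)) f(category)(cdr) else f0(category)
          }
        }
      }
      val merged = perRecord.reduce { (m1, m2) =>
        m1.map { case (category, l1) =>
          category -> l1.zip(m2(category)).map { case (a, b) => g(category)(a, b) }
        }
      }
      subscriber -> merged
    }
}
```

Calling `HardcodedDataflowSketch.aggregate` on a small `Seq[Cdr]` returns, per subscriber, one list of combined values per category. Even at this size the business rules (dimensions, `f`, `g`) are tangled with the traversal logic, which is the drawback of the hardcoded paradigm.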

Page 10: Apache Spark™ Applications the Easy Way - Pierre Borckmans

...to fully interactive ones...

Page 11: Apache Spark™ Applications the Easy Way - Pierre Borckmans

and back to code...

Page 12: Apache Spark™ Applications the Easy Way - Pierre Borckmans

...with benefits!

Page 13: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Hardcoded dataflows

Page 14: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Dataflow Editor

Page 15: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Dataflow Editor

Show time!

Page 16: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Video

Page 17: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Video

Page 18: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Datamodules

• self-contained units of the pipeline

• expressing dependencies on sources and other datamodules

• recycling the dataflow engine

• DSL to declare dataflows

• unit test DSL to test flow and individual transformations

• sbt plugin to handle all devops related tasks

• automatic orchestration through Airflow
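To make the link between declared dependencies and orchestration concrete, here is a minimal sketch of how a run order could be derived once each datamodule lists what it depends on. Everything in it is hypothetical: `DataModule`, `dependsOn` and `runOrder` are illustrative names, not the platform's DSL, and the ordering logic is a plain topological sort rather than anything Airflow-specific.

```scala
// Hypothetical model of a datamodule: a name plus the names it depends on.
final case class DataModule(name: String, dependsOn: Set[String] = Set.empty)

object OrchestrationSketch {
  // Returns module names so that every module comes after the modules it
  // depends on. Dependencies that are not declared modules are treated as
  // external sources and ignored. Fails if the dependencies form a cycle.
  def runOrder(modules: Seq[DataModule]): List[String] = {
    val declared = modules.map(m => m.name).toSet

    @scala.annotation.tailrec
    def loop(remaining: List[DataModule], done: List[String]): List[String] =
      if (remaining.isEmpty) done.reverse
      else {
        // A module is ready once all of its declared dependencies have run.
        val (ready, blocked) = remaining.partition { m =>
          m.dependsOn.filter(d => declared.contains(d)).forall(d => done.contains(d))
        }
        require(
          ready.nonEmpty,
          "dependency cycle among: " + blocked.map(m => m.name).mkString(", ")
        )
        loop(blocked, ready.map(m => m.name).reverse ::: done)
      }

    loop(modules.toList, Nil)
  }
}
```

With modules `report` (depending on `sessions` and `subscribers`), `sessions` (depending on the external source `raw_cdrs`) and `subscribers` (depending on `sessions`), `runOrder` yields `sessions`, `subscribers`, `report`. A scheduler could then turn that order into tasks, which is presumably the kind of information handed to Airflow.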

Page 19: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Dataflow DSL

Page 20: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Dataflow Test DSL

Page 21: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Automated Data Modules Orchestration

Page 22: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Data Module Explorer

Show time!

Page 23: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Video

Page 24: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Video

Page 25: Apache Spark™ Applications the Easy Way - Pierre Borckmans

Video