scala: lingua franca of fast data - meetupfiles.meetup.com/7770922/20160524 ibm fast data...
Scala: Lingua Franca of Fast Data
Jamie Allen, Sr. Director of Global Solutions Architects
Agenda
• Why Scala?
• Who is doing this?
• What is Fast Data?
• Architecting for Fast Data
Tradeoffs
• Cloud portability versus native control
• Application correctness versus speed of development
• Modularity versus global namespace
• Concise syntax versus boilerplate
• Multi-threaded simplicity via abstractions versus low-level control
Scala is the local optimum
• REPL
• Type safety
• Modularity
• Concise syntax
• Multi-threaded simplicity
• Data-centric semantics
• Managed runtime for cloud portability
• Ecosystem
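Several of these points (concise syntax, type safety, data-centric collection semantics) show up even in a very small program. The following is a minimal sketch using only the Scala standard library; the object and method names are made up for illustration and are not from the talk.

```scala
object WordCount {
  // Count word occurrences with standard-library collections only.
  // Every intermediate value is statically typed, yet no type
  // annotations are needed inside the pipeline.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // split each line into lowercase tokens
      .filter(_.nonEmpty)                   // drop empty tokens from leading whitespace
      .groupBy(identity)                    // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

For example, `WordCount.count(Seq("fast data", "Fast Data with Scala"))` yields a map with `fast -> 2`, `data -> 2`, `with -> 1`, and `scala -> 1`. The same expression can be pasted into the REPL unchanged.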
The JVM is a primary reason for Scala’s success
Why not Java?
• No REPL or Notebook
• Not a data-centric language, particularly collections semantics
Why not Python?
• Data-centric language, has all of the wonderful collections semantics we want
• No type safety
• No modularity
Why not Go or C++?
• Weak type safety
• Collections are too elemental
• Native execution is a non-starter, so Go is the only option
• Garbage collection is not generational
Scala is NOT the end of the road
• Scala just so happened to fit well in this space
• Performance
• Correctness
• Conciseness
• Scala will evolve
• Other languages will come in time
Who is doing this?
One Caveat: Apache Beam and TensorFlow
Why Scala?
At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc)… However, I also wanted to be on the JVM in order to easily interact with the Hadoop filesystem and data formats for that. Scala was the only somewhat popular JVM language that offered this kind of functional syntax and was also statically typed (letting us have some control over performance), so we chose that. Today there might be an argument to make the first version of the API in Java with Java 8, but we also benefitted from other aspects of Scala in Spark, like type inference, pattern matching, actor libraries, etc.
Matei Zaharia, creator of Spark
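A few of the Scala features named in the quote (functions written inline, type inference, pattern matching) can be illustrated with a minimal sketch. The event types and function names below are hypothetical, invented for this example; nothing here is taken from Spark itself.

```scala
object PatternMatchingSketch {
  // Hypothetical event types, invented for this illustration.
  sealed trait Event
  final case class Click(userId: String, url: String) extends Event
  final case class Purchase(userId: String, amountCents: Long) extends Event
  case object Heartbeat extends Event

  // Pattern matching destructures each case. Because the trait is
  // sealed, the compiler can warn when a case is left unhandled.
  def describe(event: Event): String = event match {
    case Click(user, url)                        => s"$user clicked $url"
    case Purchase(user, cents) if cents >= 10000 => s"$user made a large purchase"
    case Purchase(user, _)                       => s"$user made a purchase"
    case Heartbeat                               => "heartbeat"
  }

  // An inline partial function selects and unpacks purchases;
  // the element type of the intermediate sequence is inferred as Long.
  def totalRevenueCents(events: Seq[Event]): Long =
    events.collect { case Purchase(_, cents) => cents }.sum
}
```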
What is Fast Data?
A bit of history: Hadoop
[Diagram: Hadoop cluster. Labels: Master (ResourceManager, NameNode); Worker / Slave Node (NodeMgr, DataNode, Disks); YARN; HDFS; MR job #1, MR job #2; Flume, Sqoop; DBs.]
Hadoop strengths
• Lowest capital expenditure for big data
• Excellent for ingesting and integrating diverse datasets
• Flexible
  • Classic analytics (aggregations and data warehousing)
  • Machine learning
Hadoop weaknesses
• Complex administration
• YARN requires dedicated cluster
• MapReduce foibles
  • Poor performance
  • Imperative programming model
  • No stream processing support
Fast Data with Spark
Spark
• Up to 100x faster as a replacement for Hadoop MapReduce
• Uses far fewer machines and resources
• Incredible support from the community and enterprise
Spark use cases
• Primarily anomaly detection
  • Risk management
  • Fraud detection
  • Odds recalculation
• Spam filters
• Update search engine results quickly
Type safety
• Spark had it with RDDs
• They removed it with the DataFrames API
• Brought it back with Datasets, but not as comprehensively as RDDs
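The difference can be sketched without Spark at all. The following is an analogy in plain Scala, not Spark code: a sequence of case classes stands in for a typed collection such as an RDD or Dataset of `Trade`, and a `Map[String, Any]` stands in for a row addressed by column name, as in the DataFrames API. All names here are hypothetical.

```scala
object TypeSafetySketch {
  final case class Trade(symbol: String, priceCents: Long)

  // Typed style: field access is checked by the compiler.
  // Writing trades.map(_.priceCent) would not compile.
  def maxPriceTyped(trades: Seq[Trade]): Option[Long] =
    trades.map(_.priceCents).reduceOption(_ max _)

  // Untyped row style: columns are looked up by name at runtime,
  // so a misspelled column or a wrong value type is only found
  // when the code actually runs.
  type Row = Map[String, Any]

  def maxPriceUntyped(rows: Seq[Row], column: String): Either[String, Long] = {
    val prices = rows.map(_.get(column)).collect { case Some(p: Long) => p }
    if (rows.nonEmpty && prices.size == rows.size) Right(prices.max)
    else Left(s"no rows, or column '$column' is missing or not a Long in some row")
  }
}
```

In the typed version a typo in the field name is a compile error; in the row version the same typo compiles and surfaces only as a `Left` (or, in less careful code, an exception) at runtime.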
Why not Flink?
• Flink has much better stream handling for low-latency systems that Spark currently lacks
  • Event timing
  • Watermarks
  • Triggers
• Exactly-once semantics
• Pipeline portability via Apache Beam integration
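To make event time, watermarks, and triggers concrete, here is a minimal plain-Scala sketch of tumbling event-time windows. It does not use the Flink API; the types, the function names, and the rule "watermark = largest event time seen minus a fixed out-of-orderness bound" are assumptions chosen for illustration. A window's sum is emitted (triggered) once the watermark passes the end of the window, and readings that arrive for an already closed window are counted as late and dropped.

```scala
object EventTimeSketch {
  final case class Reading(eventTimeMs: Long, value: Int)
  final case class WindowResult(windowStartMs: Long, sum: Int)

  final case class State(
    open: Map[Long, Int] = Map.empty,            // window start -> running sum
    watermarkMs: Long = Long.MinValue,           // no event time earlier than this is expected
    fired: Vector[WindowResult] = Vector.empty,  // results emitted so far
    droppedLate: Int = 0                         // readings that arrived after their window closed
  )

  // Process one reading; timestamps are assumed to be non-negative.
  def step(windowMs: Long, maxOutOfOrderMs: Long)(s: State, r: Reading): State = {
    val start = r.eventTimeMs - (r.eventTimeMs % windowMs)
    if (start + windowMs <= s.watermarkMs) {
      // The window this reading belongs to has already fired.
      s.copy(droppedLate = s.droppedLate + 1)
    } else {
      val updated = s.open.updated(start, s.open.getOrElse(start, 0) + r.value)
      // The watermark only moves forward and trails the newest event time.
      val watermark = math.max(s.watermarkMs, r.eventTimeMs - maxOutOfOrderMs)
      // Trigger: fire every window whose end is at or before the watermark.
      val (ready, stillOpen) =
        updated.partition { case (ws, _) => ws + windowMs <= watermark }
      val results =
        ready.toVector.sortBy(_._1).map { case (ws, sum) => WindowResult(ws, sum) }
      s.copy(open = stillOpen, watermarkMs = watermark, fired = s.fired ++ results)
    }
  }

  def run(readings: Seq[Reading], windowMs: Long, maxOutOfOrderMs: Long): State =
    readings.foldLeft(State())((s, r) => step(windowMs, maxOutOfOrderMs)(s, r))
}
```

Note that results depend on the timestamps carried by the events, not on the order in which they happen to arrive, as long as they arrive within the out-of-orderness bound.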
Architecting for Fast Data
This isn’t enough
Old and busted
Traditional application architectures and platforms are obsolete.
Gartner
How do we avoid messing this up?
We want isolation
• At the API
• In our source
• For our data
Image: Wikipedia, Creative Commons, created by DFoerster
We want realistic data management
• Use CQRS and Event Sourcing, not CRUD
• Transactions, especially distributed, will not work
• Consistency is an anti-pattern at scale
• Distributed locks and shared data will limit you
• Data fabrics break all of these conventions
Think in terms of compensation, not prevention.
Kevin Webber, Lightbend
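As a rough illustration of the first bullet, the sketch below contrasts with CRUD by never updating state in place: commands are validated into events, events are appended to a log, and current state is derived by replaying that log. The bank-account domain and all names are hypothetical, and a real system would add persistence and a separate read model; this is plain Scala, not Akka Persistence.

```scala
object EventSourcingSketch {
  // Commands express intent and may be rejected.
  sealed trait Command
  final case class Deposit(amount: Long) extends Command
  final case class Withdraw(amount: Long) extends Command

  // Events record facts that have already happened and are never changed.
  sealed trait Event
  final case class Deposited(amount: Long) extends Event
  final case class Withdrawn(amount: Long) extends Event

  final case class Account(balance: Long = 0L)

  // Command side: validate against current state and emit events.
  def decide(state: Account, cmd: Command): Either[String, List[Event]] = cmd match {
    case Deposit(a) if a > 0                        => Right(List(Deposited(a)))
    case Withdraw(a) if a > 0 && a <= state.balance => Right(List(Withdrawn(a)))
    case other                                      => Left(s"rejected: $other")
  }

  // Applying a single event to state is a pure function.
  def evolve(state: Account, event: Event): Account = event match {
    case Deposited(a) => state.copy(balance = state.balance + a)
    case Withdrawn(a) => state.copy(balance = state.balance - a)
  }

  // Query side: state is a fold over the append-only event log.
  def replay(log: Seq[Event]): Account = log.foldLeft(Account())(evolve)
}
```

Compensation fits naturally here: correcting a mistake means appending a new event (for example a `Deposited` that offsets an erroneous `Withdrawn`) rather than editing or deleting history.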
We want ACID v2
• Associativity, not Atomicity
• Commutativity, not Consistency
• Idempotent, not Isolation
• Distributed, not Durable
Image: Wikipedia, Creative Commons, created by Weston.pace
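One way to see why these properties matter is a grow-only counter whose replicas are merged by taking the per-node maximum. This is a standard example of a data type whose merge is associative, commutative, and idempotent, so replicas on different machines can exchange state in any order, any number of times, and still converge without locks or transactions. The sketch is plain Scala with hypothetical node ids; it is an illustration, not something shown in the talk.

```scala
object AcidV2Sketch {
  // Grow-only counter: each node id maps to that node's own increment count.
  type GCounter = Map[String, Long]

  // A node only ever increments its own entry.
  def increment(c: GCounter, node: String, by: Long = 1L): GCounter =
    c.updated(node, c.getOrElse(node, 0L) + by)

  // Merge keeps the larger count per node. Because max is associative,
  // commutative, and idempotent, so is this merge.
  def merge(a: GCounter, b: GCounter): GCounter =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.max(a.getOrElse(k, 0L), b.getOrElse(k, 0L))
    }.toMap

  // The counter's value is the sum over all nodes.
  def value(c: GCounter): Long = c.values.sum
}
```

With replicas `a`, `b`, and `c`, the properties read as `merge(merge(a, b), c) == merge(a, merge(b, c))`, `merge(a, b) == merge(b, a)`, and `merge(a, a) == a`.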
New hotness
Fast Data Architecture
[Diagram labels: Mesos, YARN on Bare Metal, Cloud; HDFS, S3, CFSv2 SQL/NoSQL; Core, Streaming, SQL, MLlib, GraphX; HTTP/REST, Internet; Reactive Services; Web Services; Logs and Other Files; Actors, Cluster, …, Persist; Akka Streams.]
Learning Spark
• Go to http://bigdatauniversity.com, built by IBM