Next-Generation Scala Architectures
Ryan Knight
Enterprise Solution Engineer
Twitter: @knight_cloud
My Experience
• Sun Microsystems
• Oracle
• Family Search / LDS Church
• Riot Games
• Adobe / T-Mobile
• Deloitte / State of Louisiana
• Typesafe
• Tomax / Demandware
• DataStax - Enterprise Technical Sales
Which is Faster?
• Fighter Jet
• Mantis Shrimp
• Bullet
• Mushroom Spores
Sphagnum Moss!
• Launches spores at 89 MPH in less than a thousandth of a second
• Spores travel over 80 times the height of the launching capsule
"You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete."
Agenda
• Architecting the Application Tier
• Architecting the Data Tier
• Fundamental Architectural Principles
Architecting the Application Tier
with Scala
Evaluation Criteria
Professor Zapinsky proved that the squid is more intelligent than the house cat when challenged under similar conditions.
Flaw of Performance Benchmarking
• Unrealistic Load Scenario
• Unrealistic Application Scenario
• Performance is only one criterion
• Framework optimized for benchmark
Four Traits of Reactive Architectures
Why Scala?
• Type Inference
• Uniform Access Principle - members can be defined as methods or fields
• Traits
• Value Classes
• Package-level methods & fields
• Default and Named Parameters
• Higher-Kinded Types
• Functions as First Class Citizens
• Currying / Methods with multiple parameter lists
• Qualified Imports
• Scoped access modifiers
• Case Classes
• Singleton Objects
• Default Methods - apply / unapply / set
• Implicit Conversion and Views
• Macros
• Parser Combinators
• Multi-Line Strings
• String Interpolation
• Default Public Access
• Type Classes
• Extractor Patterns
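A few of the features listed above can be seen together in a small sketch. The names `Person`, `Greeter`, `DefaultGreeter` and `describe` are hypothetical and exist only for illustration; the snippet uses nothing beyond the standard library.

```scala
object ScalaFeaturesExample {
  // Case class: immutable fields plus generated equals/hashCode/toString/apply/unapply.
  // The default parameter value lets callers omit `city`.
  case class Person(name: String, age: Int, city: String = "Unknown")

  // Trait with a concrete method; string interpolation builds the message.
  trait Greeter {
    def greet(p: Person): String = s"Hello, ${p.name} from ${p.city}!"
  }

  // Singleton object mixing in the trait: one instance, no `static` needed.
  object DefaultGreeter extends Greeter

  // Pattern matching relies on the extractor generated for the case class.
  def describe(p: Person): String = p match {
    case Person(name, age, _) if age < 18 => s"$name is a minor"
    case Person(name, _, "Unknown")       => s"$name lives somewhere unknown"
    case Person(name, _, city)            => s"$name lives in $city"
  }
}
```

Named parameters make call sites self-describing, for example `Person(name = "Ada", age = 36)`, and `copy(city = "London")` produces a modified copy without mutating the original.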
Functional
XKCD
Why Functional Rocks!
• Immutability
• Higher-Level of Abstraction
• Define the What not the How
• Eliminating side effects
• Inherent Parallelism
Functional in Reactive Programming
• Easy to create callbacks
• Easy to handle Events and Async Results
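As a rough illustration of why functions make callbacks cheap, the sketch below registers a callback on a standard-library `Future` and, alternatively, converts a failure into a default value. `lookupPrice`, `logResult` and `priceOrZero` are hypothetical names, and the pricing rule is made up for the example.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object CallbackExample {
  // Hypothetical async lookup; negative ids fail so the error path is visible.
  def lookupPrice(id: Int): Future[Double] = Future {
    if (id < 0) throw new IllegalArgumentException(s"bad id: $id")
    id * 1.5
  }

  // The callback is just a partial function; the caller does not block while waiting.
  def logResult(id: Int): Unit =
    lookupPrice(id).onComplete {
      case Success(price) => println(s"price for $id is $price")
      case Failure(err)   => println(s"lookup for $id failed: ${err.getMessage}")
    }

  // Async error handling as a value: a failed lookup becomes a default price.
  def priceOrZero(id: Int): Future[Double] =
    lookupPrice(id).recover { case _: IllegalArgumentException => 0.0 }
}
```

Both the success and failure cases are handled where the result is consumed, without try/catch around a blocking call.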
Statements vs. Expressions
def errMsg(errorCode: Int): String = {
  // Statement style: mutate a local variable, then return it
  var result: String = ""
  errorCode match {
    case 1 => result = "Network Failure"
    case 2 => result = "I/O Failure"
    case _ => result = "Unknown Error"
  }
  return result
}
Statements vs. Expressions
def errMsg(errorCode: Int): String = errorCode match {
  case 1 => "Network Failure"
  case 2 => "I/O Failure"
  case _ => "Unknown Error"
}
No Imperative Code!
• Imperative programming - Describes computation in terms of statements that change a program state.
import scala.collection.mutable

def findPeopleIn(city: String, people: Seq[People]): Set[People] = {
  val found = new mutable.HashSet[People]
  for (person <- people) {
    for (address <- person.addresses) {
      if (address.city == city) found += person
    }
  }
  return found.toSet
}
No Imperative Code!
def findPeopleIn(city: String, people: Seq[People]): Set[People] =
  for {
    person <- people.toSet[People]
    address <- person.addresses
    if address.city == city
  } yield person
Down with Null Pointers!
def authenticateSession(
    session: HttpSession,
    username: Option[String],
    password: Option[Array[Char]]) =
  for {
    u <- username
    p <- password
    if canAuthenticate(u, p)
    privileges <- privilegesFor.get(u)
  } injectPrivs(session, privileges)
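A self-contained variant of the same idea, runnable without a servlet container, might look like the sketch below. The `passwords` and `privilegesFor` maps are hypothetical stand-ins for real user and privilege stores, and passwords are plain strings only to keep the example short.

```scala
object OptionExample {
  // Hypothetical in-memory stores, for illustration only.
  val passwords: Map[String, String] =
    Map("alice" -> "s3cret", "bob" -> "hunter2")
  val privilegesFor: Map[String, Set[String]] =
    Map("alice" -> Set("read", "write"))

  def canAuthenticate(user: String, password: String): Boolean =
    passwords.get(user).contains(password)

  // Each generator unwraps an Option. The first None (or a failed guard)
  // short-circuits the whole expression, so no null checks are needed.
  def privileges(username: Option[String], password: Option[String]): Option[Set[String]] =
    for {
      u     <- username
      p     <- password
      if canAuthenticate(u, p)
      privs <- privilegesFor.get(u)
    } yield privs
}
```

A missing username, a wrong password, or a user without privileges all produce `None`; only the fully successful path produces `Some(...)`.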
NO BLOCKING!
Scala Futures
Future API
import scala.concurrent._
import ExecutionContext.Implicits.global
def calcInt(x: Int) = {
Future(x * 5)
}
calcInt(10).map { rslt => println(rslt) } // prints 50
Traditional Request/Response
Client --blocking--> Server --blocking--> Service
Problems?
Reactive Request/Response
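def getTweets = Action.async {
  // WS.url(...).get returns a Future of the response; map it to a Result without blocking
  WS.url("http://twitter.com/").get.map(response => Ok(response.body))
}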
Client --non-blocking--> Server --non-blocking--> Service
Reactive Composition
Async & Non-Blocking
def foo = Action.async {
val futureTS = WS.url("http://www.typesafe.com").get
val futureTwitter = WS.url("http://www.twitter.com").get
for {
ts <- futureTS
twitter <- futureTwitter
} yield Ok(ts.body + twitter.body)
}
• Futures Treated as Collections
• For Expression used to represent a “callback”
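The same composition can be tried with only the standard library. In this sketch `fetch` is a hypothetical stand-in for a non-blocking HTTP call; it simply wraps the site name in angle brackets.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object CompositionExample {
  // Hypothetical stand-in for a non-blocking HTTP call.
  def fetch(site: String): Future[String] = Future(s"<$site>")

  // Both futures are created before the for expression, so they run concurrently.
  // The for expression only describes what happens once both have completed.
  def combined: Future[String] = {
    val first  = fetch("typesafe")
    val second = fetch("twitter")
    for {
      a <- first
      b <- second
    } yield a + b
  }

  // Collection-style operations: a list of futures becomes a future of a list,
  // which is then mapped like any other collection.
  def totalLength(sites: List[String]): Future[Int] =
    Future.sequence(sites.map(fetch)).map(_.map(_.length).sum)
}
```

If the two `fetch` calls were written inside the for expression instead, the second would not start until the first had completed.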
Akka
• Actor Based Toolkit
• Simple Concurrency & Distribution
• Error Handling and Self-Healing
• Elastic and Decentralized
• Adaptive Load Balancing
What is an Actor?
• Isolated lightweight processes
• Message Based / Event Driven
• Non-Request Based Lifecycle
• Share nothing
• Isolated Failure Handling
• Same Semantics for Local and Remote
Akka Clustering
• Peer-to-peer based cluster membership service
• No single point of failure or single point of bottleneck.
• Automatic node failure detector
• Cluster Events / Cluster-Aware Routers
• Cluster Routing
• Cluster Sharding
Programming Actors
import akka.actor.{Actor, ActorLogging, ActorSystem, Props}

// Messages are plain immutable case classes
case class Greeting(who: String)
case class Departure(who: String)

class GreetingActor extends Actor with ActorLogging {
  def receive = {
    case Greeting(who)  => log.info(s"Hello ${who}")
    case Departure(who) => log.info(s"Goodbye ${who}")
  }
}

val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], name = "greeter")
greeter ! Greeting("Charlie Parker") // fire-and-forget, non-blocking send
Location Transparency!
Akka Supervisor Hierarchies
• Parents send work to children
• Router to balance work
• Parents supervise child actors
• Children delegate failure to parent
• Error-prone tasks delegated to children - “Error Kernel Pattern”
[Diagram: supervisor hierarchy of actors A through G]
Failure Recovery
• Supervisor hierarchies with “let-it-crash” semantics
• Lifecycle Monitoring
• Parent can resume, restart or terminate Child
• Error-prone tasks are delegated to child Actors - “Error Kernel Pattern”
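The resume / restart / terminate choice could be expressed roughly as follows with Akka's classic actor API. This is a sketch that assumes `akka-actor` is on the classpath; the `Worker` and `Supervisor` classes and the mapping from exception type to directive are hypothetical choices made for illustration.

```scala
import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Resume, Stop}
import scala.concurrent.duration._

// Hypothetical child that does the risky work; a bad message makes it throw.
class Worker extends Actor {
  def receive: Receive = {
    case n: Int    => sender() ! (100 / n) // ArithmeticException when n == 0
    case s: String => throw new IllegalStateException(s"cannot handle '$s'")
  }
}

// The parent holds no risky logic itself (the "error kernel"); it only decides
// what happens to the child when the child fails.
class Supervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: ArithmeticException   => Resume  // keep the child and its state
      case _: IllegalStateException => Restart // replace the child with a fresh instance
      case _: Exception             => Stop    // terminate the child
    }

  private val worker: ActorRef = context.actorOf(Props[Worker], "worker")

  // Forward keeps the original sender, so replies go straight back to the caller.
  def receive: Receive = {
    case msg => worker forward msg
  }
}
```

The worker contains no try/catch at all: it simply crashes, and the failure-handling policy lives in one place in the parent.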
Reference Architecture
[Diagram: Web Tier and Work Tier. Components shown: two Reactive Servers, each hosting several UserActors; Akka Router; Data Service; Tweet Service; Geo Location]
Architecting the Data Tier
It’s all Trade-offs
Intelligent Data
• Not just about Big Data or NoSQL
• Batch processing à la Hadoop is dead!
• Real-time data processing!
• Fluent API
• Integrated Batch, Iterative and Streaming Analysis!
The Event Log
• Append-Only Logging
• Database of Facts
• Disks are Cheap
• Why Delete Data any more?
• Replay Events
Akka Persistence Webinar
Domain Events
• Things that have completed, facts
• Immutable
• Verbs in past tense
• CustomerRelocated
• CargoShipped
• InvoiceSent
• State Transitions
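A minimal sketch of domain events driving state transitions, and of rebuilding state by replaying an append-only log, is shown below. Only `CargoShipped` comes from the slide; `CargoBooked`, `CargoArrived`, the `Cargo` state and the function names are hypothetical.

```scala
object EventLogExample {
  // Domain events: immutable facts named with past-tense verbs.
  sealed trait CargoEvent
  final case class CargoBooked(origin: String, destination: String) extends CargoEvent
  final case class CargoShipped(from: String) extends CargoEvent
  final case class CargoArrived(at: String) extends CargoEvent

  // Current state is derived from the events; the log is the source of truth.
  final case class Cargo(location: String, destination: String, inTransit: Boolean)

  // A state transition: previous state + one event => next state.
  def applyEvent(state: Option[Cargo], event: CargoEvent): Option[Cargo] =
    (state, event) match {
      case (None, CargoBooked(origin, dest)) => Some(Cargo(origin, dest, inTransit = false))
      case (Some(c), CargoShipped(_))        => Some(c.copy(inTransit = true))
      case (Some(c), CargoArrived(at))       => Some(c.copy(location = at, inTransit = false))
      case (other, _)                        => other // ignore events that do not apply
    }

  // Replaying the log from the beginning rebuilds the state.
  def replay(log: Seq[CargoEvent]): Option[Cargo] =
    log.foldLeft(Option.empty[Cargo])(applyEvent)
}
```

Because events are never updated or deleted, replaying a prefix of the log yields the state as it was at that point in time.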
“In general, application developers simply do not implement large scalable applications assuming distributed transactions.”
- Pat Helland, Life beyond Distributed Transactions: an Apostate’s Opinion
What is Cassandra?
Distributed Database
✓ Individual DBs (nodes)
✓ Working in a cluster
✓ Nothing is shared
Why Cassandra?
It’s Hugely Scalable (High Throughput)
Spark
• Clustered In-Memory Data Analytics
• Fault Tolerant Distributed Datasets
• Batch, iterative and streaming analysis
• In Memory Storage and Disk
• 2-5× less code
• 10x faster on disk, 100x faster in memory than Hadoop MR
Spark Cassandra Connector
• Loads data from Cassandra to Spark
• Writes data from Spark to Cassandra
• Implicit Type Conversions and Object Mapping
• Implemented in Scala (offers a Java API)
• Open Source
• Exposes Cassandra Tables as Spark RDDs + Spark DStreams (Soon)
Spark Cassandra Connector
• Data locality-aware (speed)
• Server-Side filters (where clauses)
• Cross-table operations (JOIN, UNION, etc.)
• Data transformation, aggregation, etc.
• Natural Time Series Integration
Intelligent Data Architecture
Initialization
val conf = new SparkConf(loadDefaults = true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .setMaster("spark://127.0.0.1:7077")
val sc = new SparkContext(conf)
val table: CassandraRDD[CassandraRow] = sc.cassandraTable("keyspace", "tweets")
val ssc = new StreamingContext(sc, Seconds(30))
val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafka.kafkaParams, Map(topic -> 1), StorageLevel.MEMORY_ONLY)
stream.map(_._2).countByValue().saveToCassandra("demo", "wordcount")
ssc.start()
ssc.awaitTermination()
val sc = new SparkContext("local", "Inverted Index")
sc.textFile("data/crawl")
  .map { line =>
    val array = line.split("\t", 2)
    (array(0), array(1))
  }
  .flatMap { case (path, text) =>
    text.split("""\W+""") map { word => (word, path) }
  }
  .map { case (w, p) => ((w, p), 1) }
  .reduceByKey { (n1, n2) => n1 + n2 }
  .map { case ((w, p), n) => (w, (p, n)) } // re-key by word so the groupBy pattern matches
  .groupBy { case (w, (p, n)) => w }
  .map { case (w, seq) =>
    // (rest of the example is cut off in the source)
Architectural Principles
How to Fail
Shared Mutable State +
Locks / Thread Libraries
AVOID AT ALL COSTS!
Traditional Request/Response
Client --blocking--> Server --blocking--> Service
Problems?
Failure Recovery in Java/C/C# etc.
• SINGLE thread of control
• If the thread blows up - you are screwed!
• Explicit error handling WITHIN this single thread
• Errors do not propagate between threads, so there is NO WAY OF EVEN FINDING OUT that something has failed
Go Async
Never block
• ...unless you really have to
• Blocking kills scalability (and performance)
• Never sit on resources you don’t use
• Use non-blocking IO
Use Bulkheads
• Isolate the failure
• Compartmentalize
• Manage failure locally
• Avoid cascading failures
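One common way to apply the bulkhead idea on the JVM is to give each dependency its own bounded thread pool, so a slow dependency can exhaust only its own threads. The sketch below uses only the standard library; the pool sizes and the `slowReport` / `checkout` calls are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, ExecutionContextExecutorService, Future}

object BulkheadExample {
  // One small, fixed pool per dependency: each pool is a bulkhead.
  val reportingPool: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))
  val checkoutPool: ExecutionContextExecutorService =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

  // Hypothetical slow call, confined to the reporting pool.
  def slowReport(): Future[String] = Future {
    Thread.sleep(500)
    "report"
  }(reportingPool)

  // Hypothetical fast call on its own pool; a backlog of reports cannot delay it.
  def checkout(order: Int): Future[String] =
    Future(s"order $order confirmed")(checkoutPool)

  def shutdown(): Unit = {
    reportingPool.shutdown()
    checkoutPool.shutdown()
  }
}
```

If both calls shared a single pool, a burst of slow reports would queue ahead of checkouts and the failure would cascade; with separate pools the damage stays compartmentalized.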
Backpressure
• http://ferd.ca/queues-don-t-fix-overload.html
Handling Backpressure
• Fail Fast
• Circuit Breaker with default responses
• Load Shedding - Bounded Mailboxes
• Worker Pull Pattern vs. Push to Overload
• Throttling
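Fail fast, load shedding and worker pull can be sketched together with a bounded queue from the standard library. `BoundedMailbox`, `Accepted` and `Rejected` are hypothetical names; this is not Akka's bounded mailbox implementation, only an illustration of the principle.

```scala
import java.util.concurrent.ArrayBlockingQueue

object LoadSheddingExample {
  sealed trait Submission
  case object Accepted extends Submission
  case object Rejected extends Submission

  // A bounded mailbox: capacity is a hard limit rather than an ever-growing queue.
  final class BoundedMailbox[A](capacity: Int) {
    private val queue = new ArrayBlockingQueue[A](capacity)

    // offer() never blocks: it returns false when the queue is full, so the
    // producer learns about overload immediately (fail fast) and can back off.
    def submit(message: A): Submission =
      if (queue.offer(message)) Accepted else Rejected

    // Worker pull: the consumer asks for work only when it is ready for more.
    def pull(): Option[A] = Option(queue.poll())

    def size: Int = queue.size
  }
}
```

Rejecting work at the edge keeps latency bounded for the requests that are accepted, whereas an unbounded queue only postpones the overload and hides it from the producer.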
Questions?
©DataStax 2015 – All Rights Reserved