Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
Jeff Smith @jeffksmithjr
x.ai is a personal assistant who schedules meetings for you
Agents
Agents: Autonomous, Goal-oriented, Capable of learning, Reactive
Tech Stacks
You
Scala & Python, Spark, MongoDB, Machine Learning
nom nom, the data dog
Scala & Python, Spark & Akka, Couchbase, Machine Learning
Reactive + Machine Learning
Machine Learning Systems
Traits of Reactive Systems
Responsive
Resilient
Elastic
Message-Driven
Reactive Strategies
Reactive Machine Learning
Generating Features
Machine Learning Systems
Feature Generation
Feature Generation Pipeline: Raw Data -> Features
Microblogging Data
Pipeline Failure
Feature Generation Pipeline: Raw Data -> Features
Supervising Feature Generation
Feature Generation Pipeline: Raw Data -> Features
Supervision
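One way to picture supervision of a feature generation stage is a small wrapper that catches a failure, restarts the stage a bounded number of times, and degrades to a default value instead of taking the whole pipeline down. The sketch below is plain Python rather than Akka or Spark, and every name in it (`supervise`, `flaky_squawk_length`, `max_restarts`) is hypothetical, chosen only for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def supervise(task: Callable[[], T], fallback: T, max_restarts: int = 3) -> T:
    """Run task, restarting it on failure up to max_restarts times.

    Returns fallback if every attempt raises.
    """
    for _ in range(max_restarts + 1):
        try:
            return task()
        except Exception:
            # A real supervisor would log the failure and might back off here.
            continue
    return fallback


# Hypothetical feature generator that fails on its first two calls.
calls = {"count": 0}


def flaky_squawk_length() -> int:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("upstream data unavailable")
    return len("a squawk of text")


def always_failing() -> int:
    raise RuntimeError("permanent failure")


recovered = supervise(flaky_squawk_length, fallback=0)
degraded = supervise(always_failing, fallback=0, max_restarts=2)
```

The first call recovers after two restarts; the second exhausts its restarts and returns the fallback, which is the behaviour the fallback feature collections later in the deck rely on.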
Original Features
object SquawkLength extends FeatureType[Int]
object Super extends LabelType[Boolean]
val originalFeatures: Set[FeatureType[_]] = Set(SquawkLength)
val label = Super
Basic Features
object PastSquawks extends FeatureType[Int]
val basicFeatures = originalFeatures + PastSquawks
More Features
object MobileSquawker extends FeatureType[Boolean]
val moreFeatures = basicFeatures + MobileSquawker
Feature Collections
case class FeatureCollection(id: Int,
                             createdAt: DateTime,
                             features: Set[_ <: FeatureType[_]],
                             label: LabelType[_])
Feature Collections
val earlierCollection = FeatureCollection(101, earlier, basicFeatures, label)
val latestCollection = FeatureCollection(202, now, moreFeatures, label)
val featureCollections = sc.parallelize(
  Seq(earlierCollection, latestCollection))
Fallback Collections
val FallbackCollection = FeatureCollection(404, beginningOfTime, originalFeatures, label)
Fallback Collections
def validCollection(collections: RDD[FeatureCollection],
                    invalidFeatures: Set[FeatureType[_]]) = {
  val validCollections = collections.filter(
    fc => !fc.features
      .exists(invalidFeatures.contains))
    .sortBy(collection => collection.id)
  if (validCollections.count() > 0) {
    validCollections.first()
  } else
    FallbackCollection
}
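The selection logic of `validCollection` can be traced without a Spark cluster. The following is a minimal Python sketch under stated assumptions: features are represented as plain strings, the `createdAt` and `label` fields are omitted, and a list stands in for the RDD. It mirrors the slide's behaviour of sorting by `id` and taking the first valid collection.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class FeatureCollection:
    id: int
    features: FrozenSet[str]


ORIGINAL = frozenset({"SquawkLength"})
BASIC = ORIGINAL | {"PastSquawks"}
MORE = BASIC | {"MobileSquawker"}

FALLBACK = FeatureCollection(404, ORIGINAL)


def valid_collection(collections: Iterable[FeatureCollection],
                     invalid_features: FrozenSet[str]) -> FeatureCollection:
    """Return the lowest-id collection with no invalid feature, else FALLBACK."""
    valid = sorted(
        (fc for fc in collections if not (fc.features & invalid_features)),
        key=lambda fc: fc.id,
    )
    return valid[0] if valid else FALLBACK


collections = [FeatureCollection(202, MORE), FeatureCollection(101, BASIC)]

# Only the latest collection uses MobileSquawker, so the earlier one survives.
chosen_when_mobile_bad = valid_collection(collections, frozenset({"MobileSquawker"}))
# Both collections use PastSquawks, so the fallback is returned.
chosen_when_past_bad = valid_collection(collections, frozenset({"PastSquawks"}))
```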
Learning Models
Machine Learning Systems
Learning Models
Model Learning Pipeline: Features -> Model
Models of Love
Data Preparation
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(instances)
val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .fit(instances)
val Array(trainingData, testingData) = instances.randomSplit(
  Array(0.8, 0.2))
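The 80/20 split above assigns rows to the two sets by random sampling, so the resulting sizes are approximate rather than exact. A rough Python sketch of that idea follows; `random_split` is a hypothetical helper written for illustration, not Spark's implementation, and the integer list is a stand-in for labelled rows.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], weights: Tuple[float, float],
                 seed: int) -> Tuple[List[T], List[T]]:
    """Assign each row to the first or second split by an independent draw."""
    cutoff = weights[0] / sum(weights)  # normalise so weights need not sum to 1
    rng = random.Random(seed)
    first: List[T] = []
    second: List[T] = []
    for row in rows:
        (first if rng.random() < cutoff else second).append(row)
    return first, second


instances = list(range(1000))  # stand-in for labelled rows
training_data, testing_data = random_split(instances, (0.8, 0.2), seed=42)
```

Fixing the seed makes the split repeatable, which helps when comparing models trained at different times.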
Learning a Model
val decisionTree = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)
val pipeline = new Pipeline()
  .setStages(Array(labelIndexer, featureIndexer, decisionTree, labelConverter))
Evolving Modeling Strategies
val randomForest = new RandomForestClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
val revisedPipeline = new Pipeline()
  .setStages(Array(labelIndexer, featureIndexer, randomForest, labelConverter))
Deep Models of Artistic Style
Refactoring Command Line Tools
> python neural-art-tf.py -m vgg -mp ./vgg -c ./images/bear.jpg -s ./images/style.jpg -w 800
def produce_art(content_image_path, style_image_path, model_path, model_type, width, alpha, beta, num_iters):
Exposing a Service
import Pyro4

class NeuralServer(object):
    def generate(self, content_image_path, style_image_path, model_path, model_type, width, alpha, beta, num_iters):
        produce_art(content_image_path, style_image_path, model_path, model_type, width, alpha, beta, num_iters)
        return True

# Register the server with a running Pyro4 name server and serve requests.
daemon = Pyro4.Daemon()
ns = Pyro4.locateNS()
uri = daemon.register(NeuralServer)
ns.register("neuralserver", uri)
daemon.requestLoop()
Encoding Model Types
object ModelType extends Enumeration {
  type ModelType = Value
  val VGG = Value("VGG")
  val I2V = Value("I2V")
}
Encoding Valid Configuration
case class JobConfiguration(contentPath: String,
                            stylePath: String,
                            modelPath: String,
                            modelType: ModelType,
                            width: Integer = 800,
                            alpha: java.lang.Double = 1.0,
                            beta: java.lang.Double = 200.0,
                            iterations: Integer = 5000)
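The same idea, an enumeration of allowed model types plus a configuration record with defaults, can be sketched on the Python side as well. The field names below are snake_case renderings of the Scala ones, the paths are taken from the command line shown earlier, and the checks in `__post_init__` are assumptions for illustration; the slides do not list any validation rules.

```python
from dataclasses import dataclass
from enum import Enum


class ModelType(Enum):
    VGG = "VGG"
    I2V = "I2V"


@dataclass(frozen=True)
class JobConfiguration:
    content_path: str
    style_path: str
    model_path: str
    model_type: ModelType
    width: int = 800
    alpha: float = 1.0
    beta: float = 200.0
    iterations: int = 5000

    def __post_init__(self) -> None:
        # Hypothetical checks: reject values the art generator could not use.
        if not isinstance(self.model_type, ModelType):
            raise ValueError(f"unknown model type: {self.model_type!r}")
        if self.width <= 0 or self.iterations <= 0:
            raise ValueError("width and iterations must be positive")


config = JobConfiguration("./images/bear.jpg", "./images/style.jpg",
                          "./vgg", ModelType.VGG)
```

Because the record is frozen and validated at construction, an invalid job is rejected before it ever reaches the remote service.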
Finding the Service
val ns = NameServerProxy.locateNS(null)
val remoteServer = new PyroProxy(ns.lookup("neuralserver"))
Calling the Service
def callServer(remoteServer: PyroProxy, jobConfiguration: JobConfiguration) = {
  Future.firstCompletedOf(
    List(
      timedOut,
      Future {
        remoteServer.call("generate",
          jobConfiguration.contentPath,
          jobConfiguration.stylePath,
          jobConfiguration.modelPath,
          jobConfiguration.modelType.toString,
          jobConfiguration.width,
          jobConfiguration.alpha,
          jobConfiguration.beta,
          jobConfiguration.iterations).asInstanceOf[Boolean]
      }))
}
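The point of racing the remote call against `timedOut` is that the caller gets an answer within a bounded time even when the model server hangs. A comparable effect in Python is sketched below using `future.result(timeout=...)` rather than a race between two futures. Reporting a timeout as `False` is an assumption, since the slides do not show how `timedOut` is defined, and `fast_generate` and `slow_generate` are hypothetical stand-ins for the remote call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def call_with_timeout(remote_call: Callable[[], bool], timeout_seconds: float) -> bool:
    """Return the call's result, or False if it does not finish in time."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(remote_call)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return False
    finally:
        # Do not block on a slow call; its worker thread finishes in the background.
        executor.shutdown(wait=False)


def fast_generate() -> bool:
    return True


def slow_generate() -> bool:
    time.sleep(0.5)
    return True


fast_result = call_with_timeout(fast_generate, timeout_seconds=2.0)
slow_result = call_with_timeout(slow_generate, timeout_seconds=0.05)
```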
Profiles with Style
Hybrid Model Learning
Model Learning Pipeline: Features -> Model
Publishing Models
Machine Learning Systems
Publishing Models
Publishing Process: Model -> Predictive Service
Detecting Fraud
False Negative
False Positive
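False negatives (missed fraud) and false positives (legitimate activity flagged as fraud) can be counted directly from labelled predictions, and precision and recall follow from those counts. The labels below are made-up illustrative data, not results from the talk.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary fraud classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for is_fraud, flagged in zip(actual, predicted):
        if flagged and is_fraud:
            counts["tp"] += 1
        elif flagged and not is_fraud:
            counts["fp"] += 1  # false positive: honest user flagged
        elif not flagged and is_fraud:
            counts["fn"] += 1  # false negative: fraud slipped through
        else:
            counts["tn"] += 1
    return counts


# Hypothetical labels: True means the transaction really was fraud.
actual = [True, True, False, False, False, True, False, False]
predicted = [True, False, True, False, False, True, False, False]

counts = confusion_counts(actual, predicted)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
```

Which error type matters more depends on the application, which is why a single accuracy number is rarely enough to decide whether a model should be published.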
Model Metrics
val trainingSummary = model.summary
val binarySummary = trainingSummary
  .asInstanceOf[BinaryLogisticRegressionSummary]
val roc = binarySummary.roc
roc.show()
binarySummary.areaUnderROC
Model Metrics
val predictions = model.transform(testingData)
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderPR")
val areaUnderPR = evaluator.evaluate(predictions)
Building Lineages
val rawData: RawData
val featureSet: Set[FeatureType[_]]
val model: ClassificationModel[_, _]
val modelMetrics: BinaryLogisticRegressionSummary
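A lineage ties a published model back to the data, features, and metrics that produced it. One possible shape for such a record is sketched below in Python. All field names, the example values, and the `is_publishable` threshold check are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ModelLineage:
    raw_data_id: str              # identifies the raw data snapshot
    feature_set: FrozenSet[str]   # names of the features used in training
    model_id: str                 # identifies the learned model
    area_under_roc: float         # metric recorded at training time

    def is_publishable(self, minimum_auc: float) -> bool:
        """Hypothetical gate: publish only if the recorded metric is good enough."""
        return self.area_under_roc >= minimum_auc


# Example values are invented for the sketch.
lineage = ModelLineage(
    raw_data_id="squawks-snapshot-1",
    feature_set=frozenset({"SquawkLength", "PastSquawks"}),
    model_id="model-1",
    area_under_roc=0.91,
)

publish = lineage.is_publishable(minimum_auc=0.85)
```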
Summary
Agents: Autonomous, Goal-oriented, Capable of learning, Reactive
Machine Learning Systems
Traits of Reactive Systems
Reactive Strategies
Reactive Machine Learning
For Later
@jeffksmithjr
manning.com
reactivemachinelearning.com
medium.com/data-engineering
Manning
Jeff Smith
Use the code opensanmu for 40% off!
x.ai @xdotai New York, New York
We’re hiring!
Thank You!