WattGo: real-time analysis of time series with Spark and Solr (in French)

Upload: planet-cassandra

Post on 30-Jul-2015

Smart Energy as a Service

RealTime Analytics Spark SolR Cassandra

@WattgoHQ www.wattgo.com

Founded in 2011 by experts in data analytics, utilities business and big data

A panel of French households equipped with meter sensors

A team of 18 people, with a core R&D team working on load curve disaggregation algorithms

[Architecture diagram. Components: UI (Meteor, microservices); Spark ("put your logic here"); ProtoBuf RPC; Cassandra with triggers; Solr with a DSE field transformer; users stored as ProtoBuf blobs; sensor time series; Kafka on both sides of Spark; CQL Solr query.]

Real-time analytics using DSE Search (Solr), Apache Spark and Cassandra triggers

Real-time aggregation on arbitrary groups based on customer metadata

Demo use case: real-time monitoring of energy consumption

CREATE TABLE cassandradays.queries (
  name text PRIMARY KEY,
  query text
) WITH ...;

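CassandraReadOnlyTrigger.scala

import java.nio.ByteBuffer
import java.util
import org.apache.cassandra.db.{ColumnFamily, Mutation}
import org.apache.cassandra.triggers.ITrigger
// Execution context for the Future below; the slide does not show which one is used
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

abstract class CassandraReadOnlyTrigger extends ITrigger {

  override def augment(key: ByteBuffer, mut: ColumnFamily): util.Collection[Mutation] = {
    handleTrigger(key, mut) // Non-blocking call
    null                    // Let C* proceed
  }

  def handleTrigger(key: ByteBuffer, mut: ColumnFamily): Future[Unit] = Future {
    // Dispatch to delete or read depending on the kind of mutation
    def handler: (MutationAccessor => Unit) = if (mut.isMarkedForDelete) delete else read
    handler(new MutationAccessor(key, mut))
  }

  def read(mut: MutationAccessor): Unit

  def delete(mut: MutationAccessor): Unit

}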
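
The trigger above starts its handler in a Future and returns null right away, so the Cassandra write path is never blocked. The same hand-off can be sketched in plain Java with no Cassandra dependency. This is only an illustration: ReadOnlyHook, onWrite and handle are hypothetical names, not part of the Cassandra trigger API.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a read-only trigger: it observes writes but never
// adds mutations of its own.
class ReadOnlyHook {
    final List<String> seen = new CopyOnWriteArrayList<>();

    // Mirrors augment(): start the handler asynchronously, then return null
    // ("no extra mutations") without waiting for it.
    Collection<String> onWrite(String key, boolean isDelete) {
        handle(key, isDelete);
        return null;
    }

    // Mirrors handleTrigger(): dispatch to the delete or read branch on another thread.
    CompletableFuture<Void> handle(String key, boolean isDelete) {
        return CompletableFuture.runAsync(() ->
            seen.add((isDelete ? "delete:" : "read:") + key));
    }
}
```

Because onWrite does not wait for the future, a failure in the handler never reaches the caller. In a real trigger such failures would have to be logged or retried separately.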

AggregatorTrigger.scala

class AggregatorTrigger extends CassandraReadOnlyTrigger {

  // Netty boilerplate [...]

  val aggregatorService = AggregatorServiceRPC.Aggregatron.newStub(channel)

  override def read(mut: MutationAccessor): Unit = { // triggered on upserts
    val request = Aggregate.newBuilder()

    // Name of our aggregation
    request.setName(mut.getValue[String]("name"))

    // SolR query itself
    request.setQuery(mut.getValue[String]("query"))

    aggregatorService.registerNew(controller, request.build(), callback)
  }

  override def delete(mut: MutationAccessor): Unit = { // triggered on deletes
    val request = Aggregate.newBuilder()
    request.setName(mut.getValue[String]("name"))

    aggregatorService.delete(controller, request.build(), callback)
  }
}

MutationAccessor.scala

class MutationAccessor(partitionKey: ByteBuffer, update: ColumnFamily) {

  trait ValueMapper[T] { def getValue: T }

  object ValueMapper {

    implicit def stringMapper(name: String): ValueMapper[String] =
      makeMapper(name, UTF8Type.instance.compose)

    implicit def intMapper(name: String): ValueMapper[Int] =
      makeMapper(name, Int32Type.instance.compose)

    […]

    def makeMapper[T](name: String, f: ByteBuffer => T): ValueMapper[T] = {
      new ValueMapper[T] { def getValue = f(getBuffer(name)) }
    }

  }

  def getValue[T](implicit vm: ValueMapper[T]): T = vm.getValue
}
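
MutationAccessor picks a decoder from the requested type, so that getValue[String]("name") turns the raw column bytes into a String. A rough Java equivalent of that idea uses a map of decoders keyed by class. It is a sketch only: ColumnReader and its methods are hypothetical, and it decodes with the JDK rather than with Cassandra's UTF8Type and Int32Type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical column reader: looks up a decoder by the requested type,
// playing the role of the implicit ValueMapper instances.
class ColumnReader {
    private final Map<String, ByteBuffer> columns = new HashMap<>();
    private final Map<Class<?>, Function<ByteBuffer, ?>> decoders = new HashMap<>();

    ColumnReader() {
        // duplicate() so that decoding never moves the stored buffer's position
        decoders.put(String.class, b -> StandardCharsets.UTF_8.decode(b.duplicate()).toString());
        decoders.put(Integer.class, b -> b.duplicate().getInt());
    }

    void put(String name, ByteBuffer raw) {
        columns.put(name, raw);
    }

    // Decodes the named column with the decoder registered for the given type.
    <T> T getValue(String name, Class<T> type) {
        return type.cast(decoders.get(type).apply(columns.get(name)));
    }
}
```

Usage would look like reader.getValue("name", String.class). Supporting another column type means registering one more decoder, which is what the elided mappers in the Scala class presumably do.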

Load your trigger on each node:

$ nodetool reloadtriggers

No need to restart Cassandra:

INFO 13:37:00 Loading new jar /path/to/your/trigger/directory/AggregatorTrigger.jar

Bind it to your Cassandra table:

cqlsh:> CREATE TRIGGER aggregatorTrigger ON cassandradays.queries USING 'AggregatorTrigger';

Enjoy!

AggregatorRPCService.scala

class AggregatorRPCService(val aggregationsHandler: ActorRef) extends Aggregatron {
  import AggregationsHandlerMessages._

  override def registerNew(controller: RpcController, request: Aggregate,
                           done: RpcCallback[RegistrationResponse]): Unit = {
    aggregationsHandler ! UpdateEntry(request.getName, request.getQuery)
  }

  override def delete(controller: RpcController, request: Aggregate,
                      done: RpcCallback[DeletionResponse]): Unit = {
    aggregationsHandler ! DeleteEntry(request.getName)
  }
}

AggregationsHandler.scala

import akka.actor.Actor
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import scala.collection.JavaConversions._ // lets us call map on the Java result iterator
import scala.collection.mutable

class AggregationsHandler(conf: SparkConf) extends Actor {
  import AggregationsHandlerMessages._

  val cassandra = CassandraConnector(conf)
  val prepared = cassandra.withSessionDo(session => {
    session.prepare("SELECT * FROM cassandradays.users WHERE solr_query = ?")
  })

  AggregatorServiceEndPoint.start(new AggregatorRPCService(self), 7777)

  // Aggregation name -> ids (wid) of the users matched by its SolR query
  val aggregations = mutable.HashMap[String, Seq[String]]()

  def receive = {
    case GetAggregations =>
      sender ! aggregations.toMap
    case UpdateEntry(name, query) =>
      val WIDs = cassandra.withSessionDo(session => {
        val bound = prepared.bind
        bound.setString("solr_query", query)
        val i = session.execute(bound).iterator()
        i.map(_.getString("wid")).toSeq
      })
      aggregations += name -> WIDs
    case DeleteEntry(name) =>
      aggregations.remove(name)
  }
}
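
Stripped of Akka and Cassandra, the handler above is a registry that maps each aggregation name to the user ids its query currently matches. A minimal Java sketch of that registry follows, with the query lookup injected as a function. AggregationRegistry and the resolver parameter are hypothetical names; in the demo the lookup is the CQL solr_query statement.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry: aggregation name -> ids of the users matching its query.
class AggregationRegistry {
    private final Map<String, List<String>> aggregations = new ConcurrentHashMap<>();
    private final Function<String, List<String>> resolver; // query -> matching user ids

    AggregationRegistry(Function<String, List<String>> resolver) {
        this.resolver = resolver;
    }

    // Counterpart of UpdateEntry: resolve the query once and store the result.
    void update(String name, String query) {
        aggregations.put(name, resolver.apply(query));
    }

    // Counterpart of DeleteEntry.
    void delete(String name) {
        aggregations.remove(name);
    }

    // Counterpart of GetAggregations: a read-only copy of the current state.
    Map<String, List<String>> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(aggregations));
    }
}
```

As in the Scala actor, the query is resolved when the entry is registered, so users added later only join a group when the entry is updated again.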

DemoCassandraDays.scala

def getAggregations(ip: String, port: Int): Map[String, Seq[String]] = {
  val fw = (aggregatesHandlerActor ? GetAggregations)(Timeout(5.seconds))
  Await.result(fw, 5.seconds).asInstanceOf[Map[String, Seq[String]]]
}

[...]
kafkaStream.flatMap(msg => {
  val dp = msg._2.parseJson.convertTo[RawDataPoint]

  // Emit one (aggregation name, partial result) pair per group containing this sensor
  getAggregations(ip, port).flatMap(agg => {
    if (agg._2.contains(dp.key)) {
      Some(agg._1 -> OutputData(agg._1, dp.value, 1))
    } else {
      None
    }
  })
}).reduceByKeyAndWindow((a: OutputData, b: OutputData) => {
  OutputData(a.name, a.sum + b.sum, a.count + b.count)
}, Seconds(60), Seconds(3)).foreachRDD(rdd => {
  rdd.collect().foreach { x =>
    val message = new ProducerRecord[String, String](outputTopic, null, x._2.toJson.toString())
    producer.send(message)
  }
})
[...]

Input data point:

{ "key" : "519888bdeabc888934000000", "ts" : 1434458546000, "value" : 147.3 }

Output aggregate:

{ "name" : "13", "sum" : 88760.0, "count" : 126 }
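
The streaming job above sums values and counts per aggregation over a 60-second window that slides every 3 seconds. What a single window evaluation computes can be sketched without Spark. The Java below is an illustration under assumptions: WindowedAggregator and Point are hypothetical names, and the window is evaluated once over a list instead of incrementally over a stream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WindowedAggregator {
    // One raw reading: sensor key, timestamp in milliseconds, measured value.
    static final class Point {
        final String key;
        final long ts;
        final double value;

        Point(String key, long ts, double value) {
            this.key = key;
            this.ts = ts;
            this.value = value;
        }
    }

    // Returns aggregation name -> {sum, count} over points with ts in (now - windowMs, now].
    static Map<String, double[]> aggregate(List<Point> points,
                                           Map<String, List<String>> groups,
                                           long now, long windowMs) {
        Map<String, double[]> out = new HashMap<>();
        for (Point p : points) {
            if (p.ts <= now - windowMs || p.ts > now) continue; // outside the window
            for (Map.Entry<String, List<String>> g : groups.entrySet()) {
                if (!g.getValue().contains(p.key)) continue;    // sensor not in this group
                double[] acc = out.computeIfAbsent(g.getKey(), k -> new double[2]);
                acc[0] += p.value; // running sum
                acc[1] += 1;       // running count
            }
        }
        return out;
    }
}
```

A sensor that belongs to several groups contributes to each of them, just as the flatMap in the Scala job emits one pair per matching aggregation.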

Deploy with:

$ dse spark-submit target/scala-2.10/DemoCassandraDays.jar

Same syntax as the original spark-submit.

[Architecture diagram. Components: UI (Meteor, microservices); Spark ("put your logic here"); ProtoBuf RPC; Cassandra; Solr with a DSE field transformer; users stored as ProtoBuf blobs; Kafka on both sides of Spark; CQL Solr query.]

CREATE TABLE cassandradays.users (
  wid text PRIMARY KEY,
  "protobuf:com.wattgo.users.User" blob
) WITH ...;

User.proto:

package com.wattgo.users;

option java_package = "com.wattgo.users";
option java_outer_classname = "WattGoUser";

import "Details.proto";

message User {
  required string wid = 1;
  optional Details details = 2;
}

[...]

Generate the WattGoUser Java class:

$ protoc --java_out=. User.proto Details.proto [...]

Generate a Protobuf DescriptorSet file for later use with the Protobuf Reflection API:

$ protoc --include_imports --descriptor_set_out=User.desc User.proto

WattgoUser.class

WattGoUserDetails.Details details = WattGoUserDetails.Details.newBuilder()
    .setEmail("[email protected]")
    .setFirstname("Denis")
    .setLastname("Ritchie")
    .setAddress(address)
    .build();

WattGoUser.User user = WattGoUser.User.newBuilder()
    .setWid("49b96edde3a1d5444f5cd145b7117144")
    .setDetails(details)
    .build();

byte[] blob = user.toByteArray();

cqlsh:> SELECT * FROM users WHERE wid = '49b96edde3a1d5444f5cd145b7117144';

 wid                      | protobuf:com.wattgo.users.User
 51548164eabc884b2d00014f | 0x0a18353135343831363465616263...

"The FieldInputTransformer and FieldOutputTransformer classes must be extended to define a custom column-to-document field mapping [...]. FieldInputTransformer takes an inserted Cassandra column and modifies it prior to Solr indexing, while FieldOutputTransformer parses a Cassandra row just before returning the result of a Solr query."

Edward Ribeiro, DataStax
http://www.datastax.com/dev/blog/dse-field-transformers

FieldInputTransformer:
import com.datastax.bdp.search.solr.FieldInputTransformer;

FieldOutputTransformer:
import com.datastax.bdp.search.solr.FieldOutputTransformer;

SolR schema.xml

<schema name="users" version="1.1">
  ...
  <fields>
    <field name="wid" type="string" indexed="true" stored="true"/>
    <field name="protobuf:com.wattgo.users.User" type="binary" indexed="true" stored="true"/>
    ...
    <field name="details.address.zipcode" type="string" indexed="true" stored="false"/>
    <field name="details.address.city" type="string" indexed="true" stored="false"/>
    <field name="details.address.country" type="string" indexed="true" stored="false"/>
    ...
  </fields>
  ...
  <uniqueKey>wid</uniqueKey>
</schema>

SolR solrconfig.xml

<config>
  ...
  <fieldInputTransformer name="dse"
      class="com.wattgo.search.transformers.protobuf.InputTransformer">
  </fieldInputTransformer>

  <fieldOutputTransformer name="dse"
      class="com.wattgo.search.transformers.protobuf.OutputTransformer">
  </fieldOutputTransformer>
  ...
</config>

Start indexing your data:

Using dsetool:

$ dsetool create_core cassandradays.users schema=schema.xml solrconfig=solrconfig.xml

Using curl:

$ curl "http://localhost:8983/solr/resource/cassandradays.users/solrconfig.xml" \
    --data-binary @solrconfig.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl "http://localhost:8983/solr/resource/cassandradays.users/schema.xml" \
    --data-binary @schema.xml -H 'Content-type:text/xml; charset=utf-8'

$ curl -XPOST "http://localhost:8983/solr/admin/cores?action=CREATE&name=cassandradays.users" \
    -H 'Content-type:text; charset=utf-8'

Now our 'users' CQL table looks like this:

CREATE TABLE cassandradays.users (
  wid text PRIMARY KEY,
  "protobuf:com.wattgo.users.User" blob,
  solr_query text
) WITH ...;

CREATE CUSTOM INDEX cassandradays_users_protobufcomwattgousersuser_index
  ON cassandradays.users ("protobuf:com.wattgo.users.User")
  USING 'com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex';

CREATE CUSTOM INDEX cassandradays_users_solr_query_index
  ON cassandradays.users (solr_query)
  USING 'com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex';

public class InputTransformer extends FieldInputTransformer {

  public static final String prefix = "protobuf:";

  @Override
  public boolean evaluate(String field) {
    return field.startsWith(prefix);
  }

  @Override
  public void addFieldToDocument(SolrCore core, IndexSchema schema, String key,
      Document doc, SchemaField fieldInfo, String fieldValue, float boost,
      DocumentHelper helper) throws IOException {

    String className = fieldInfo.getName().substring(prefix.length());

    Descriptor descriptor = ProtobufFileDescriptorSetParser.getDescriptor(className);

    byte[] data = Hex.decodeHex(fieldValue.toCharArray());
    DynamicMessage message = DynamicMessage.parseFrom(descriptor, data);

    Map<String, ProtobufField> fields = ProtobufMessageParser.flattenFields(message, "");

    for (Map.Entry<String, ProtobufField> field: fields.entrySet()) {

      ProtobufField entry = field.getValue();
      Set<Object> values = entry.getValues();
      String fieldName = field.getKey();

      SchemaField fieldSchema = core.getLatestSchema().getFieldOrNull(fieldName);

      for (Object value: values) {
        helper.addFieldToDocument(core, core.getLatestSchema(), key, doc,
            fieldSchema, value.toString(), boost);
      }
    }
  }
}
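
The transformer relies on ProtobufMessageParser.flattenFields, which the slides do not show. Judging from schema field names such as details.address.zipcode, it presumably walks the nested message and produces one entry per leaf, keyed by its dotted path. The following Java sketch shows that idea on nested maps instead of Protobuf messages. Flattener is a hypothetical name and, unlike the original, this version keeps a single value per field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Flattener {
    // Walks a nested map and emits one entry per leaf, keyed by its dotted path.
    static Map<String, Object> flatten(Map<String, Object> message, String prefix) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : message.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Nested message: recurse with the extended path as prefix
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) e.getValue();
                out.putAll(flatten(child, path));
            } else {
                out.put(path, e.getValue());
            }
        }
        return out;
    }
}
```

Each flattened key can then be looked up in the Solr schema, as addFieldToDocument does with getFieldOrNull.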

public class OutputTransformer extends FieldOutputTransformer {

  @Override
  public void binaryField(FieldInfo fieldInfo, byte[] value,
      StoredFieldVisitor visitor,
      FieldOutputTransformer.DocumentHelper helper) throws IOException {

    String prefix = "protobuf:";

    // Fields that are not Protobuf blobs are passed through untouched
    if (!fieldInfo.name.startsWith(prefix)) {
      visitor.binaryField(fieldInfo, value);
      return;
    }

    String className = fieldInfo.name.substring(prefix.length());

    Descriptor descriptor = ProtobufFileDescriptorSetParser.getDescriptor(className);
    DynamicMessage message = DynamicMessage.parseFrom(descriptor, value);

    Map<String, ProtobufField> fields = ProtobufMessageParser.flattenFields(message, "");

    for (Map.Entry<String, ProtobufField> field: fields.entrySet()) {

      FieldInfo info = helper.getFieldInfo(field.getKey());
      ProtobufField current = field.getValue();
      FieldDescriptor fieldDescriptor = current.getFieldDescriptor();
      Set<Object> fieldValues = current.getValues();
      FieldDescriptor.JavaType type = fieldDescriptor.getJavaType();

      for (Object fieldValue: fieldValues) {
        if (type == FieldDescriptor.JavaType.STRING)
          visitor.stringField(info, (String) fieldValue);
        else if (type == FieldDescriptor.JavaType.BYTE_STRING)
          visitor.binaryField(info, (byte[]) fieldValue);
        else if [...]
      }
    }
  }
}

cqlsh:> SELECT * FROM users WHERE solr_query = 'details.address.zipcode:13*';

 wid                      | protobuf:com.wattgo.users.User    | solr_query
 51548164eabc884b2d00014f | 0x0a18353135343831363465616263... | null
 51264fb2eabc88610c00001f | 0x0a18353132363466623265616263... | null
 5199e9d0eabc88172c000001 | 0x0a18353139396539643065616263... | null
 5127a249eabc88641500001c | 0x0a18353132376132343965616263... | null
 51548164eabc884b2d000143 | 0x0a18353135343831363465616263... | null
 [...]

Real time monitoring of energy consumption in France

What's next:

Meteor over Cassandra (waiting for oplog)
Cassandra pub/sub?

Play, fork and contribute:

https://github.com/wattgo/cassandradays-*