Page 1: Big and fast a quest for relevant and real-time analytics

Big & Fast: A quest for relevant and real-time analytics

Natalino Busa @natalinobusa

Page 2

Parallelism, Mathematics, Programming Languages, Machine Learning, Statistics, Big Data, Algorithms, Cloud Computing

Natalino Busa @natalinobusa

www.natalinobusa.com

Page 3

Big and Fast: methodology, architecture, roles and organization

Page 4

Conversion is the ultimate form of permission marketing

Permission marketing is about the honour of being heard.

How do you earn it? Provide the right suggestions at the right time. This is what makes data analysis valuable.

Page 5

When do you really know your customer?

When you know their last unique:

5 songs?

100 songs?

10,000 songs?
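Knowing "the last N unique songs" of a user can be made concrete with a bounded, recency-ordered set per user. A minimal Python sketch, assuming play events arrive as song IDs; the class name `RecentUniqueHistory` and its `capacity` parameter are hypothetical, not from the talk.

```python
from collections import OrderedDict


class RecentUniqueHistory:
    """Keeps the last `capacity` unique items, most recent last (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = OrderedDict()  # song id -> None, ordered by last play

    def add(self, item: str) -> None:
        # Replaying a song moves it to the most-recent end instead of duplicating it.
        if item in self._items:
            self._items.move_to_end(item)
        else:
            self._items[item] = None
        # Evict the least recently played song once capacity is exceeded.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def items(self) -> list:
        return list(self._items)


history = RecentUniqueHistory(capacity=3)
for song in ["a", "b", "a", "c", "d"]:
    history.add(song)
print(history.items())  # ['a', 'c', 'd']
```

Raising `capacity` from 5 to 10,000 changes only memory use per user; the question on the slide is how much history is enough to be relevant.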

Page 6

Old & New stuff.

We evolve slowly: our personality, our habits.

But events and trends can affect us at short notice.

How do you combine old with new?
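One simple way to combine old with new, assuming both scores are on comparable scales, is to blend a slow-changing long-term profile score with a recent-activity score in which each event's weight halves every `half_life_s` seconds. The function names, the half-life and the blend weight `alpha` below are hypothetical choices for illustration, not the method from the talk.

```python
def decayed_score(event_ages_s, half_life_s):
    """Recent-activity score: each event's weight halves every `half_life_s` seconds of age."""
    return sum(0.5 ** (age / half_life_s) for age in event_ages_s)


def blended_score(long_term, recent_event_ages_s, half_life_s=3600.0, alpha=0.7):
    """Blend a slow-changing profile score with a fast-changing recent score.

    Hypothetical weighting: `alpha` for the long-term part, `1 - alpha` for the recent part.
    Assumes both parts are already on comparable scales.
    """
    recent = decayed_score(recent_event_ages_s, half_life_s)
    return alpha * long_term + (1 - alpha) * recent


# A user with a moderate long-term affinity who played the item just now and one hour ago.
print(round(blended_score(0.5, [0.0, 3600.0]), 3))  # 0.8
```

A short half-life lets trends move the score quickly; a large `alpha` keeps the slow-changing profile dominant.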

Page 7

The customer’s context

Complex on many dimensions:

Personal history: all transactions ever made

Long-term interaction: how a user’s actions correlate with those of others

Real-time events: trends and recent events

Page 8

The customer’s context

Context is related to time:

Slow-changing: the defining characteristics of a person

Fast-changing: events and trends that influence our lives

These require very different technology solutions!

Page 9

Challenges

millions of billions of

Not much time to react: the window of opportunity is sometimes just a few seconds.

Lots of information to process: you want to understand the user’s history well.

Page 10

Slow and fast

Slow:

ranking and preference analysis

segmentation and clustering

Tens of terabytes of data. This can take hours.

Fast:

short-term trending topics

rule-based recommendations

Hundreds of events per second. This must be fast.
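Short-term trending topics are a typical fast-path computation. A minimal sketch, assuming events arrive as (timestamp, topic) pairs in non-decreasing time order, keeps topic counts over a sliding time window; the class and parameter names are hypothetical.

```python
from collections import Counter, deque


class SlidingWindowTrends:
    """Topic counts over the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events = deque()  # (timestamp, topic) pairs, oldest first
        self._counts = Counter()

    def add(self, timestamp: float, topic: str) -> None:
        self._events.append((timestamp, topic))
        self._counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Evict events at or before `now - window_s`; each event is evicted at most once.
        while self._events and self._events[0][0] <= now - self.window_s:
            _, old_topic = self._events.popleft()
            self._counts[old_topic] -= 1
            if self._counts[old_topic] == 0:
                del self._counts[old_topic]

    def top(self, n: int):
        # Reflects the window as of the most recent event.
        return self._counts.most_common(n)


trends = SlidingWindowTrends(window_s=60)
for ts, topic in [(0, "football"), (10, "election"), (30, "election"), (70, "election")]:
    trends.add(ts, topic)
print(trends.top(2))  # [('election', 2)]
```

Each event is appended and evicted once, so the cost per event stays constant on average, which is what hundreds of events per second call for.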

Page 11

Hadoop: Distributed Data OS

Reliable: distributed, replicated file system

Low cost: ↓ cost vs ↑ performance/storage

Computing powerhouse

All the cluster’s CPUs work in parallel to run queries
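Hadoop runs queries in parallel through the MapReduce model: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The single-process Python sketch below only illustrates that model on a hypothetical "user,song" play log; it is not Hadoop's API, and on a real cluster each phase runs on many machines at once.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Emit one (song, 1) pair per play in a "user,song" log line.
    _user, song = record.split(",")
    yield song, 1


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate one group: total plays per song.
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


log = ["u1,songA", "u2,songB", "u3,songA"]
print(run_job(log))  # {'songA': 2, 'songB': 1}
```

Because map calls are independent and each reduce sees only one key, both phases can be spread across all the cluster's CPUs.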

Page 12

Scala / Akka / Spray: a reactive framework for web APIs

[Diagram: Actor A, Actor B and Actor C exchanging messages msg 1 to msg 4]

● It scales horizontally (can run in cluster mode)

● Maximum use of the available cores/memory:

1. processing is non-blocking, threads are re-used

2. computing can be parallelized across many actors

Very fast: thousands of messages/sec

Very reliable: auto recovery
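The actor model behind Akka can be illustrated with a toy actor: private state, a mailbox, and a loop that handles one message at a time, while senders never block. The sketch below uses Python's asyncio as a stand-in; it is not Akka's API, it runs on a single event loop rather than a thread pool or cluster, and the `CounterActor` class and the `"stop"` message are hypothetical.

```python
import asyncio


class CounterActor:
    """A toy actor: private state, a mailbox, one message processed at a time."""

    def __init__(self) -> None:
        self.mailbox = asyncio.Queue()
        self.count = 0  # state touched only by this actor's own loop

    def tell(self, msg) -> None:
        # Fire-and-forget send: the sender never waits for processing.
        self.mailbox.put_nowait(msg)

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()  # suspends without blocking a thread
            if msg == "stop":
                break
            self.count += msg


async def main() -> int:
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    for n in range(1, 5):  # msg 1 .. msg 4
        actor.tell(n)
    actor.tell("stop")
    await worker
    return actor.count


print(asyncio.run(main()))  # 10
```

Since only the actor's own loop touches `count`, no locks are needed; that property is what lets a runtime like Akka spread many actors across cores.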

Page 13

Distributed computing: lambda architecture

[Diagram: Lambda Architecture (batch + streaming). Components: data warehouses, messaging buses, batch computing, streaming computing, in-memory distributed databases, low-latency HTTP RESTful web API services.]
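In the lambda architecture, the batch layer periodically recomputes views over all historical data, the streaming (speed) layer incrementally updates a view of the events seen since the last batch run, and the serving layer answers queries by merging the two. A minimal in-memory sketch, with play counts standing in for a real view; the function names are hypothetical, and in a real deployment the views would live in stores such as in-memory distributed databases.

```python
from collections import Counter


def batch_view(all_events):
    """Batch layer: recompute the full view from the complete history (slow, periodic)."""
    return Counter(all_events)


def update_speed_view(speed_view, event):
    """Streaming layer: apply one new event incrementally (fast, per event)."""
    speed_view[event] += 1


def query(batch, speed_view, key):
    """Serving layer: combine the precomputed batch result with recent increments."""
    return batch[key] + speed_view[key]  # missing keys count as 0


history = ["songA", "songB", "songA"]  # already processed by the last batch run
batch = batch_view(history)

speed = Counter()
for event in ["songB", "songC"]:  # arrived after the last batch run
    update_speed_view(speed, event)

print({k: query(batch, speed, k) for k in ["songA", "songB", "songC"]})
# {'songA': 2, 'songB': 2, 'songC': 1}
```

When the next batch run absorbs the recent events, the speed view for that period can be discarded, so the fast path only ever holds a small amount of state.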

Page 14

Distributed computing: some technologies

Hadoop

Cassandra

millions of billions of

λ (lambda) = conversions

Page 15

All Things Distributed

Distributed computing and storage

more machines = more storage/computing

Open-source software solutions

mature enough for pragmatic adopters

Near-real-time + big data technologies

Hadoop, Scala, Akka, Spray, Cassandra

Page 16

Science & Engineering

Statistics, Data Science

Python, R, Visualization

IT Infra, Big Data

Java, Scala, SQL

Hadoop: Big Data Infrastructure, Data Science on large datasets

Big Data and Fast Data require different profiles to achieve the best results.

Page 17

Thanks! Any questions?

