dna - einstein - data science ja bigdata

Bigdata -> Data Science -> AI, and some $$$ in between DNA’s journey in data science & big data

Upload: rolf-koski

Post on 06-Apr-2017

Page 1: DNA - Einstein - Data science ja bigdata

Bigdata -> Data Science -> AI, and some $$$ in between
DNA’s journey in data science & big data

Page 2: DNA - Einstein - Data science ja bigdata

prologue of prologue

you have to have an idea

Page 3: DNA - Einstein - Data science ja bigdata
Page 4: DNA - Einstein - Data science ja bigdata

THE IDEA

ALL OF THE DATA WE HAVE
ONE SOURCE OF TRUTH + CUSTOMER FIRST + AUTOMATE ALL THE THINGS
PROFIT

[Diagram labels: "some data" (repeated), "some report" (repeated), "WTF?", "PROFIT?", "activities? who cares", "webdata? who cares"]

Page 5: DNA - Einstein - Data science ja bigdata
Page 6: DNA - Einstein - Data science ja bigdata

Agenda

Prologue: The big thing(s)

The four things of analytics ~ the roadmap on how to do those things

Achievements

What’s inside: AWS good stuff & hype & love

Culture stuff

Upcoming

Page 7: DNA - Einstein - Data science ja bigdata

prologue

Page 8: DNA - Einstein - Data science ja bigdata

The BIG THING(s)

1. Business: it was the omnichannel customer
   a. the ever-more-demanding, influential and independent customer
   b. rise of need for analytical insight & data
   c. demanding inf. management and analytics to be operational, not finance-driven
   d. stop sub-optimizing the system (customer)

2. Tech: it was cloud, open-source, and data science
   a. suddenly - endless scale & processing power
   b. reduced time-to-environment from weeks to minutes
   c. reduced cost
   d. ability to create intelligent data products that reduce time-to-insight and time-to-action

Page 9: DNA - Einstein - Data science ja bigdata

                   hard for machines                 easy for machines
hard for humans    data science, machine learning    data engineering, data pipelines
easy for humans    AI / NLP                          reporting, basic calculus

Page 10: DNA - Einstein - Data science ja bigdata

System requirements
- Infinite scale
- Process 10’000++ messages per sec
- Automated deploy & tests
- Version control
- Pay-for-use, not for-licence
- Real-time pipeline, disaster recovery, exactly-once guarantees
- Real-time analytics, sub-second latency for everything
- Infinite processing power for data science stuff & large analytical deployments
- Array of libraries to make the data scientist’s life easier
- Modular, I can change any part of it, be that software or hardware
- Secure, EU referendums and Safe Harbour etc.
- Pipeline and persistent storage & data platform can be done from scratch to production in 6 months
- Can’t cost really anything, since had to scrape a small budget. 3 developers max.
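A note on the "exactly-once" requirement: one common way to get exactly-once effects is at-least-once delivery plus idempotent processing. The sketch below is only an illustration of that idea, not the pipeline DNA built; the message shape (`id`, `value`) and the class name are assumptions made up for the example.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a unique message id (hypothetical schema)."""

    def __init__(self):
        self.seen_ids = set()  # a real system would keep this in durable storage
        self.total = 0         # example state: running sum of event values

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self.seen_ids:
            return False
        self.total += message["value"]
        self.seen_ids.add(message["id"])
        return True


consumer = IdempotentConsumer()
# The broker redelivers message "a" (at-least-once delivery).
stream = [
    {"id": "a", "value": 5},
    {"id": "b", "value": 7},
    {"id": "a", "value": 5},
]
applied = [consumer.handle(m) for m in stream]
print(applied, consumer.total)
```

The duplicate is dropped, so the state reflects each message once even though one was delivered twice.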

OKAY! SOUNDS FAIR.

Page 11: DNA - Einstein - Data science ja bigdata

Business requirements
- Understand the omnichannel customer
- Reduce churn
- Increase cross-sales
- Increase product usage & increase retention
- Increase marketing ROI
- Insight should be real-time
- Actions should be near-real-time and everyone can do them
- Know where to put infrastructure better than before
- Make sense of unstructured data & text & speech & so forth
- Automate 80% of insight / data that was previously done by hand
- Your system shall not cost anything
- But it should deliver competitive advantage

OKAY! SOUNDS FAIR.

Page 12: DNA - Einstein - Data science ja bigdata

WHAT WOULD MACGYVER DO?

Page 13: DNA - Einstein - Data science ja bigdata

WHAT WOULD MACGYVER DO?

WOULD HE:
a) go and buy a licence and servers and then wait around
b) build the damn thing from what he happens to find with zero cost

Page 14: DNA - Einstein - Data science ja bigdata

WHAT WOULD MACGYVER DO?

YES!
b) build the damn thing from what he happens to find with zero cost

Page 15: DNA - Einstein - Data science ja bigdata

Achievements & upcoming

Done (within a year):
- Assisted investments & business (1) operations: xx-xxx mil. / year
- Directly optimized / machine learning (2) -handled operations: x-xx mil. / year
- Machine learning* & Data Science introduced
- Marketing efforts from weeks to minutes
- Automation from 10% to 80%
- Conversion on direct channels up from 50 to 300 percent
- Amount of automated & personalized channels from 1 to 5 (all)
- One source of truth & self-made -> we know how it works
- Ability to handle all types of data

Upcoming 2017:
- Artificial intelligence (AI)*
- Chatbots (AI)
- “Acquisition” of display advertising
- Understanding speech (AI)
- Moving from CPU to GPU
- DNA.FI fully personalized (w/ new concept)

* Data Science -> Machine Learning -> Artificial Intelligence

Page 16: DNA - Einstein - Data science ja bigdata

what’s inside

Page 17: DNA - Einstein - Data science ja bigdata

code! (surprise)

clojure, python, c++, tensorflow, syntaxnet, spark, scala, sql, postgres, redshift, ec2, R, random forest, s3, jenkins, ansible, cnn / rnn / lstm

jupyter, aerospike, kafka, snowplow, scikit learn, matplotlib, als, k means, mllib, numpy, pandas, scipy… etc

Page 18: DNA - Einstein - Data science ja bigdata
Page 19: DNA - Einstein - Data science ja bigdata

COLLECT: real-time, batch, omnichannel

COMBINE: digital to brick n mortar, digital to everything, context to everything, customer to everything

COMPUTE: recommendations, analysis, reports, segments, predictions, descriptions, next best actions, customer journey

EXECUTE: churn prevention, cross-sales, targeted marketing, customer service efficiency, customer experience improvement, omnichannel optimization, react in real time, product development

CONTROL: continuous deployment, infrastructure as code

Page 20: DNA - Einstein - Data science ja bigdata

Customer interface layer

Channel layer

Delivery layer

Data / Machine learning layer

Collecting layer

Page 21: DNA - Einstein - Data science ja bigdata

realtime 1.3T batch ~ 100gb

-> to redshift, we load 5’511’649’731 rows

Page 22: DNA - Einstein - Data science ja bigdata

Why redshift? reporting on top of raw data;17’072’941 rows joined to 110’773’366 rows joined to 24’945’364 rows joined to 2’297’076 rows joined to 1’841’262 rows + some dimensions and result returned in < 10 sec -> no db-admins, no indexes, no “tuning”
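To make "reporting on top of raw data" concrete, here is a small sketch of the same shape of query: a fact table joined straight to dimension tables and aggregated, with no indexes defined. It uses Python's built-in `sqlite3` as a stand-in (not Redshift), and every table and column name is a made-up assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw tables; no indexes are created on purpose.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, segment TEXT);
    CREATE TABLE products (product_id INTEGER, category TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         product_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "consumer"), (2, "business")])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(100, "mobile"), (101, "tv")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, 1, 100, 20.0), (2, 1, 101, 30.0),
                  (3, 2, 100, 100.0), (4, 2, 100, 50.0)])

# Report revenue per customer segment and product category directly from raw rows.
query = """
    SELECT c.segment, p.category, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products p ON p.product_id = o.product_id
    GROUP BY c.segment, p.category
    ORDER BY c.segment, p.category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The point of the slide is that a columnar warehouse can run this kind of join over hundreds of millions of rows without hand tuning; the sketch only shows the query pattern, not the performance.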

Page 23: DNA - Einstein - Data science ja bigdata

Class: TV, Liiga
Rank: 0.87, 0.90

What happens in social media? What is talked about?

Page 24: DNA - Einstein - Data science ja bigdata

What’s wrong?

from reporting sales to reporting potential (and the ways of going from potential to sales)

Page 25: DNA - Einstein - Data science ja bigdata

R is still goooood. And jupyter.

Page 26: DNA - Einstein - Data science ja bigdata

ALS recommendations /w 1.3 T data = good

1 0 1 1 0 1 0 0 1 1 1 0 1

Page 27: DNA - Einstein - Data science ja bigdata

ALS recommendations /w 1.3 T data = good

1 0 1 1 0 1 0 0 1 1 1 0 1
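For readers who have not met ALS (alternating least squares): it factorizes a user x item matrix into two small factor matrices by solving a regularized least-squares problem for one side while the other is held fixed, then swapping. The sketch below is a minimal pure-Python version on a toy 0/1 matrix. It treats every cell as observed, which is a simplification; at the scale on the slide a distributed implementation such as the one in Spark MLlib would be the realistic choice. The matrix values and all function names are assumptions for illustration.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, ratings, lam):
    """Regularized least-squares update of one side's factors while `fixed` is held constant."""
    k = len(fixed[0])
    gram = [[sum(f[p] * f[q] for f in fixed) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    return [solve(gram, [sum(x * f[p] for x, f in zip(row, fixed)) for p in range(k)])
            for row in ratings]


def loss(r, u, v, lam):
    """Squared reconstruction error plus L2 penalty on both factor matrices."""
    err = sum((r[i][j] - sum(a * b for a, b in zip(u[i], v[j]))) ** 2
              for i in range(len(r)) for j in range(len(r[0])))
    reg = sum(x * x for row in u + v for x in row)
    return err + lam * reg


def als(r, k=2, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    rt = [list(col) for col in zip(*r)]  # items x users
    v = [[rng.random() for _ in range(k)] for _ in r[0]]
    history = []
    for _ in range(iters):
        u = update(v, r, lam)   # solve user factors with items fixed
        v = update(u, rt, lam)  # solve item factors with users fixed
        history.append(loss(r, u, v, lam))
    return u, v, history


# Toy interaction matrix (rows: users, columns: items), illustrative values only.
R = [[1, 0, 1, 1],
     [0, 1, 0, 0],
     [1, 1, 1, 0]]
user_f, item_f, history = als(R)
print([round(h, 4) for h in history])
```

Because each half-step exactly minimizes the objective for one side, the recorded loss never increases from one iteration to the next. A recommendation score for user i and item j is the dot product of `user_f[i]` and `item_f[j]`.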

Page 28: DNA - Einstein - Data science ja bigdata
Page 29: DNA - Einstein - Data science ja bigdata

culture stuff

more important than you’d think

Page 30: DNA - Einstein - Data science ja bigdata

http://www.slideshare.net/reed2001/culture-1798664/

Page 31: DNA - Einstein - Data science ja bigdata

http://www.slideshare.net/reed2001/culture-1798664/

Page 32: DNA - Einstein - Data science ja bigdata

MacGyver (remember? what would MacGyver do) = The thinker-doer

- Usually development methods split thinkers (project managers, scrum managers, product owners and the lot) from doers (developers, analysts)
- This is (mostly) shit
- You’d need people leading who also know their stuff
- Saves money, time and nerves
- People communicate better
- Thinker-doers can communicate with business and translate to development actions, even develop the things themselves

Page 33: DNA - Einstein - Data science ja bigdata

Demos & openness = The secret sauce to success (and freedom to do more stuff)

- We sit on the “business floor”, right in between basically everyone
- And we almost always have something displayed on a screen
- We make it easy to come and talk to us
- We make demos available to everyone
- We connect
- This makes all the difference

Page 34: DNA - Einstein - Data science ja bigdata

business - IT alignment
(columns: nothing changes (or we close our eyes that it does) | everything changes all-the-time)

- always connected: kindergarten - no output but loads of fun | if done right, ultimate success
- forced connection (procedures!): basic IT waterfall project | basic IT “agile” project
- never connected: cave-people? | chaos

Page 35: DNA - Einstein - Data science ja bigdata

Bigdata/AI

Business

Page 36: DNA - Einstein - Data science ja bigdata

Directors* are doing their own marketing automation activities without any help

*ping Solita, how many directors code...

Page 37: DNA - Einstein - Data science ja bigdata

And now, we have business even writing their own code! (no, really)

Page 38: DNA - Einstein - Data science ja bigdata

upcoming

Page 39: DNA - Einstein - Data science ja bigdata
Page 40: DNA - Einstein - Data science ja bigdata

1st try: word2vec + naive bayes
2nd try: convolutional neural net
3rd try: LSTM/RNN
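As a rough idea of what a first attempt of the naive Bayes kind looks like, here is a minimal multinomial naive Bayes text classifier with Laplace smoothing. Note the swap: it uses plain word counts as features, not word2vec embeddings as on the slide, and the training sentences and labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (text, label). Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under multinomial naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data.
model = train([
    ("my tv box shows no picture", "tv"),
    ("tv channel package price", "tv"),
    ("phone has no network signal", "mobile"),
    ("new phone subscription price", "mobile"),
])
print(predict(model, "tv picture is frozen"))
print(predict(model, "phone signal"))
```

A bag-of-words model like this ignores word order and word forms, which hints at why later tries move to sequence models and parsers, especially for a heavily inflected language.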

Page 41: DNA - Einstein - Data science ja bigdata

4th try: syntaxnet
5th “try”: -> include speech recognition
6th try: spaCy

Page 42: DNA - Einstein - Data science ja bigdata

7th try, part I: latent dirichlet allocation
8th try: ?

Page 43: DNA - Einstein - Data science ja bigdata

Nth try: ?

Page 44: DNA - Einstein - Data science ja bigdata

Now?

in a good place. can’t fully disclose what we’re running though. :)

basically we can understand both speech and written natural language so that the language can “flow” and it can be in a chat context or in longer formats;

ex:
- hi do you happen to have iPhones in stock?
- yea!
- cool. what’s the price? <- have to link to previous parts of conversation
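The linking problem in that exchange can be shown with a toy: the second question names no product, so the system has to carry the product over from the earlier turn. The sketch below does this with simple keyword matching and one remembered slot. It is in no way the undisclosed system described here; the catalog, the price and the class name are all invented.

```python
CATALOG = {"iphone": {"in_stock": True, "price_eur": 799}}  # hypothetical data


class Conversation:
    """Keeps the last-mentioned product so follow-up questions can refer back to it."""

    def __init__(self):
        self.current_product = None  # remembered across turns

    def reply(self, utterance):
        text = utterance.lower()
        for name in CATALOG:
            if name in text:
                self.current_product = name  # update context when a product is named
        if self.current_product is None:
            return "which product do you mean?"
        item = CATALOG[self.current_product]
        if "price" in text:
            return f"{self.current_product}: {item['price_eur']} EUR"
        if "stock" in text:
            return "yea!" if item["in_stock"] else "sorry, not right now"
        return "how can I help?"


chat = Conversation()
first = chat.reply("hi do you happen to have iPhones in stock?")
second = chat.reply("cool. what's the price?")
print(first)
print(second)
```

Without the remembered slot, the second question would have nothing to attach to, which is what a fresh `Conversation()` asked only about the price demonstrates.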

NB! this is quite simple in English but tear-your-eyes-off-to-scratch-your-brain* -hard with Finnish. we might be the first ones actually there.

*modified from: Friends, 1995, The One with the Baby on the Bus

Page 45: DNA - Einstein - Data science ja bigdata

Lessons learned

Understand the BIG THINGS (cloud, open source, omnichannel customer, data science, time-to-x)

Sit where business sits. And sit together. DO STUFF TOGETHER.

Don’t use project managers who can’t code (or who are not really good at the subject domain).

Apply advanced analytics to automate 80% of small decisions made all the time.

Continuous communication beats meetings. Don’t meet.

At least start with AI. Don’t just tweet about that shit.

Page 46: DNA - Einstein - Data science ja bigdata