The Big Data Journey at Connexity - Big Data Day LA 2015

The Big Data Journey at Connexity Will Gage [email protected] @gapjump

Upload: will-gage

Post on 14-Aug-2015

Category: Software


TRANSCRIPT

The Big Data Journey at Connexity

Will Gage [email protected] @gapjump

Connexity

Shopping powers our marketing platforms

•  Paid Search & Marketplace: Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost
•  Bizrate Insights: A reporting and ratings platform that captures the power of the consumer voice
•  Display Media: An audience activation platform that integrates retail data and programmatic buying

Connexity History

Don’t worry - there is no test later!


Connexity Technology

The Pre-Big Data Era

Connexity Technology

The Big Data Explosion

Lessons Learned

“There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes

Keep It Edgy

It is better to be closer to the bleeding edge than behind the curve

Case Study: Riak in SEM Keyword Service

o  Online access to metadata for keywords marketed through SEM channels
o  Used in-line with handling end-user traffic from search engines – revenue impacting
o  Handled 1.2 billion keywords at the time of this project
o  Projected 2x growth in 12 months
o  Needed to create system that could run in external cloud data center
o  Existing system scaled via proprietary memory grid cache

Keep It Edgy

Case Study: Riak in SEM Keyword Service

o  Prototyped several solutions: Redis, MongoDB, MySQL
o  Chose Riak for scalability, stability, unfussiness
o  Hardware: 6 nodes @ 16GB RAM, 4 cores, Ubuntu VMs on KVM, RAID 5 array shared across chassis
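
As a rough sizing sketch for the cluster described in this case study (this calculation is not from the talk): the slides give 1.2 billion keywords, a projected 2x growth, and 6 nodes with 16GB of RAM each. The replication factor of 3 is an assumption (it is Riak's default `n_val`; the deck does not say what was configured), as is reading 16GB as 16 GiB. The function name is hypothetical.

```python
GIB = 1024 ** 3  # assumption: "16GB" on the slide is read as 16 GiB


def ram_budget_per_replica(keys, growth_factor, nodes, ram_per_node_gib, replicas):
    """Return bytes of total cluster RAM available per stored replica of a key."""
    total_ram_bytes = nodes * ram_per_node_gib * GIB
    stored_replicas = keys * growth_factor * replicas
    return total_ram_bytes / stored_replicas


KEYS = 1_200_000_000   # from the slide: keywords at the time of the project
NODES = 6              # from the slide
RAM_PER_NODE_GIB = 16  # from the slide
REPLICAS = 3           # assumption: Riak's default n_val, not stated in the deck

# RAM budget per replica at the time of the project and after the projected 2x growth.
today = ram_budget_per_replica(KEYS, 1, NODES, RAM_PER_NODE_GIB, REPLICAS)
projected = ram_budget_per_replica(KEYS, 2, NODES, RAM_PER_NODE_GIB, REPLICAS)

print(f"today: {today:.1f} bytes/replica, after 2x growth: {projected:.1f} bytes/replica")
```

Under these assumptions the budget works out to roughly 29 bytes of RAM per stored replica at the time of the project and roughly 14 bytes after the projected growth. That suggests the keyword metadata could not have been held entirely in memory on this hardware, which would fit with moving from a memory grid cache to a disk-backed store.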

R & D

10% time: Give all engineers the opportunity to experiment

A few examples that graduated to production:

o  Use of Cassandra within Inventory systems
o  SitePerf: in-house availability monitoring tool
o  Several different customer-facing advertising products
o  Hadoop implementations of core bidding platform
o  Mock Service: Like Wiremock with persistence to MySQL
o  Numerous internal tools for managing our systems

Quality Assurance

Any new technology choice should improve or maintain test automation coverage

Case Study: Hadoop + Solr + BDD

Existing Technologies

Reasons to stay with an older technology

1.  It works well
2.  Your business depends on it
3.  Your team is very knowledgeable in its operation
4.  It fits your budget

New Technologies

Reasons to use a new technology

1. It makes new things possible or very difficult things easier

•  Hadoop / MapReduce
•  Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
•  Distributed stream-processing systems (Storm)

New Technologies

Reasons to use a new technology

2. It will save your company money

•  Hardware
•  Software Licensing
•  Bandwidth
•  Power Consumption

New Technologies

Reasons to use a new technology: saving money

New Technologies

Reasons to use a new technology

3. It will save you time

•  Time to market
•  Time spent on operational complexity
•  Time fighting fires
•  Compute time

New Technologies

Reasons to use a new technology: saving time

Example: FastTrack

New Technologies

Reasons to use a new technology

4. It brings you in line with industry standards

•  Moving from home-grown frameworks to Hadoop, Solr
•  Where possible, running on JVM-based systems

Future Trends

o  Like yours, the data we work with is only growing
o  We are consolidating the number and variety of NoSQL solutions that we use
o  We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o  We have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o  Still looking for a bulletproof way of importing data from various sources into Hadoop: LinkedIn’s Gobblin shows some promise there
o  Big data technologies are becoming more distributed across our organization

In Closing

You should:

o  Stay within walking distance of the bleeding edge
o  Empower your engineers to experiment
o  Always move in the direction of better automated testing
o  Keep using the old technologies that are awesome
o  Make new things possible
o  Save your company money
o  Save your company time
o  Stay in line with industry standards
o  Call your family once in a while

… and you can do all of these things on your own big data journeys!