The Big Data Journey at Connexity - Big Data Day LA 2015
The Big Data Journey at Connexity
Will [email protected]
@gapjump
Connexity
Shopping powers our marketing platforms
• Paid Search & Marketplace: Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost
• Bizrate Insights: A reporting and ratings platform that captures the power of the consumer voice
• Display Media: An audience activation platform that integrates retail data and programmatic buying
Lessons Learned
“There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes
Keep It Edgy
It is better to be closer to the bleeding edge than behind the curve
Case Study: Riak in SEM Keyword Service
o Online access to metadata for keywords marketed through SEM channels
o Used in-line with handling end-user traffic from search engines – revenue impacting
o Handled 1.2 billion keywords at the time of this project
o Projected 2x growth in 12 months
o Needed to create system that could run in external cloud data center
o Existing system scaled via proprietary memory grid cache
o Prototyped several solutions: Redis, MongoDB, MySQL
o Chose Riak for scalability, stability, unfussiness
o Hardware: 6 nodes @ 16GB RAM, 4 cores, Ubuntu VMs on KVM, RAID 5 array shared across chassis
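A rough back-of-the-envelope sizing for the cluster described above, offered only as a sketch. The keyword count, growth factor, node count, and RAM per node come from the slides. The replication factor of 3 is Riak's default n_val and is an assumption here, since the slides do not say what this deployment used; the function name `sizing` is made up for the example.

```python
# Figures taken from the slides.
KEYWORDS_NOW = 1_200_000_000        # keywords handled at the time of the project
GROWTH_FACTOR = 2                   # projected growth over 12 months
NODES = 6
RAM_PER_NODE_BYTES = 16 * 1024**3   # 16 GB per node

# Assumption: Riak's default replication factor (n_val = 3).
# The slides do not state the value used in this deployment.
REPLICAS = 3


def sizing(keywords, nodes=NODES, replicas=REPLICAS, ram=RAM_PER_NODE_BYTES):
    """Return (stored objects per node, RAM bytes available per stored object)."""
    objects_per_node = keywords * replicas // nodes
    ram_per_object = ram / objects_per_node
    return objects_per_node, ram_per_object


for label, keywords in [("today", KEYWORDS_NOW),
                        ("in 12 months", KEYWORDS_NOW * GROWTH_FACTOR)]:
    per_node, ram_each = sizing(keywords)
    print(f"{label}: {per_node:,} objects per node, "
          f"about {ram_each:.1f} bytes of RAM per object")
```

Under these assumptions each node stores 600 million objects today with fewer than 30 bytes of RAM apiece, so most of the data would have to be served from disk rather than from memory.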
R & D
10% time: Give all engineers the opportunity to experiment
A few examples that graduated to production:
o Use of Cassandra within Inventory systems
o SitePerf: in-house availability monitoring tool
o Several different customer-facing advertising products
o Hadoop implementations of core bidding platform
o Mock Service: Like WireMock with persistence to MySQL
o Numerous internal tools for managing our systems
Quality Assurance
Any new technology choice should improve or maintain test automation coverage
Case Study: Hadoop + Solr + BDD
Existing Technologies
Reasons to stay with an older technology
1. It works well
2. Your business depends on it
3. Your team is very knowledgeable in its operation
4. It fits your budget
New Technologies
Reasons to use a new technology
1. It makes new things possible or very difficult things easier
   • Hadoop / MapReduce
   • Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
   • Distributed stream-processing systems (Storm)
2. It will save your company money
   • Hardware
   • Software Licensing
   • Bandwidth
   • Power Consumption
3. It will save you time
   • Time to market
   • Time spent on operational complexity
   • Time fighting fires
   • Compute time
4. It brings you in line with industry standards
   • Moving from home-grown frameworks to Hadoop, Solr
   • Where possible, running on JVM-based systems
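Auto-sharding key-value stores, named under reason 1, commonly spread keys across nodes with some form of consistent hashing; Riak and Cassandra both describe their partitioning this way. The sketch below is a toy illustration of that idea, not any product's actual implementation. The class name `HashRing`, the node names, the sample keys, and the 64 points per node are all invented for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next node point clockwise."""

    def __init__(self, nodes, points_per_node=64):
        self._ring = []  # (ring position, node name) pairs, sorted by position
        for node in nodes:
            for i in range(points_per_node):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # SHA-1 digest truncated to 64 bits, used as a position on the ring.
        return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Find the first point past the key's position, wrapping at the end.
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"keyword-{i}" for i in range(10_000)]
before = HashRing([f"node{n}" for n in range(1, 7)])   # six nodes
after = HashRing([f"node{n}" for n in range(1, 8)])    # a seventh node added
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved after adding a seventh node")
```

Only the keys claimed by the new node change owner, which is what lets such a cluster grow without reshuffling everything.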
Future Trends
o Like yours, the data we work with is only growing
o We are consolidating the number and variety of NoSQL solutions that we use
o We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o Have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o Still looking for a bulletproof way of importing data from various sources into Hadoop: LinkedIn’s Gobblin shows some promise there
o Big data technologies are becoming more distributed across our organization
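For readers who have not written raw MapReduce, the model that libraries such as Crunch and Cascading wrap can be sketched in a few lines. This is plain single-process Python, not Hadoop's API; the function names and the sample search queries are invented for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def keyword_mapper(line):
    # Emit (word, 1) for every word in a query string.
    for word in line.lower().split():
        yield word, 1


def count_reducer(_key, values):
    return sum(values)


queries = ["red running shoes", "running shoes sale", "red dress"]
counts = reduce_phase(shuffle(map_phase(queries, keyword_mapper)), count_reducer)
print(counts)
```

Even this small job needs three separate stages wired together by hand, which is the kind of boilerplate a higher-level pipeline abstraction aims to hide.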
In Closing
You should:
o Stay within walking distance of the bleeding edge
o Empower your engineers to experiment
o Always move in the direction of better automated testing
o Keep using the old technologies that are awesome
o Make new things possible
o Save your company money
o Save your company time
o Stay in line with industry standards
o Call your family once in a while
… and you can do all of these things on your own big data journeys