hive@king threshing data

Post on 23-Mar-2016

Mattias Andersson, BI Developer, matte@king.com

“Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.”

Agenda

• A short history of King
• Why do we use hive at King?
• I will discuss hive from an analytics and data warehouse user perspective
• Keep it short

This is

Bragging warning!

Level 1

Thomas Hartwig (CTO), Patrik Stymne (Architect), Sebastian Knutsson (Chief Product Officer), Riccardo Zacconi (CEO), Lars Markgren (GM Sweden)

Founded in 2003 by a bunch of ex-Spray guys

A European developer with its heart in Sthlm

+ in London, Malmö, Bucharest, San Fran, Malta & Barcelona.

”Silicontull”

We create & publish casual games

2003-2010

200+ casual games

The foundation for our crusade on Facebook and mobile

2003-2010

Fucked by Facebook (FBF Index)

[Chart, 2004 to 2010: Facebook unique visitors vs. Yahoo Games US unique visitors; axis mark at 500m]

Fall of 2010

Facebook, Fall of 2012. Industry experts: “King missed the train, it’s too late now”, “Zynga and Wooga own the market”

King’s response?

It is never too late to disrupt an industry

April 2011: Bubble Saga on Facebook

2011: The Saga format

Bubble Saga was a hit… No. 7 on Facebook after 4 months

Daily Active Users (DAU)

2.4 million DAU!

April 2011

Bubble Witch Saga…

Daily Active Uniques (DAU)

Explosive growth: from 0 to 6 million daily players in 4 months

Oct 2011-2012

1 year growth: from 220,000 DAU to 8,500,000!

Mobile: July 2012

Mobile July 2012 - now

Also #1 top grossing app in Sweden since February

How we succeeded technically speaking…

Our platform

Tech choices:
• Application – 96 servers (Java)
• MySQL – 59 servers
• Memcache – 24 servers
• Hadoop cluster – 20 servers

How it all works from a BI perspective:
• MySQL shards hold the user state; they are off limits for BI
• The game logs events whenever something interesting has happened
• Logs are rolled hourly to a central log server, where we fetch the data

Big data, bigger metadata

Metadata…

We are on our way…

Are we Big Data?

The most important success factor for hive

Hive connectivity

• Hue: web interface to hive. Easy to use, so it is a great first encounter
• ODBC: enables us to pull data from hive into QlikView/R/Excel
• Command line interface: the default/advanced interface

Scumbag hive: Different interfaces use different escape sequences/variable substitution…

This is what sold it to me

Hive programmability

Hive custom transform

    from (
      from dual
      map a using 'seq 1 5' as sequence int
      sort by sequence
    ) map_out
    reduce sequence
    using 'awk "{sum+=$0\; print sum}"' as cumulative int;

Output:

    1
    3
    6
    10
    15
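To make the data flow of the transform above easier to follow, here is a minimal Python sketch of the same three stages run outside hive. The function names `map_stage` and `reduce_stage` are made up for this illustration; they stand in for the `seq 1 5` mapper and the awk reducer. The sketch assumes all rows end up in a single reducer, since `sort by` only orders rows within each reducer.

```python
from itertools import accumulate


def map_stage(n=5):
    """Hypothetical stand-in for the 'seq 1 5' mapper: emit 1..n as text rows."""
    return [str(i) for i in range(1, n + 1)]


def reduce_stage(rows):
    """Hypothetical stand-in for the awk reducer: running total after each row."""
    # 'sort by sequence' orders the rows before the reducer sees them.
    values = sorted(int(row) for row in rows)
    return list(accumulate(values))


cumulative = reduce_stage(map_stage())
for value in cumulative:
    print(value)
```

The printed values match the output listed on the slide. With more than one reducer, each reducer would compute its own running total over its share of the rows.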

Scumbag hive: Really easy to make something horribly unmaintainable. Perl/XSLT/wget in one HQL statement…

Map as a double entendre

Hive complexity

Map datatype

    create table if not exists test2 (test map<string,map<string,int>>)
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE;

    select test["test"]["x"] from test2;

Scumbag hive: For hive in textfile format there is no syntax to declare map/array separators beyond the first level; \004, \005, \006 and \007 are hardcoded.
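As a rough illustration of what those hardcoded separators mean for a text file, the Python sketch below parses one raw value of the `test` column from the table above. The delimiter assignment is an assumption, not something the slides state: it takes the outer map to use \002 between entries and \003 between key and value (the usual hive defaults), and the inner map to use \004 and \005. The helper `parse_nested_map` is hypothetical.

```python
# Assumed delimiters for a map<string,map<string,int>> column in a hive text file.
OUTER_ENTRY = "\x02"  # between entries of the outer map (assumed default)
OUTER_KV = "\x03"     # between an outer key and its value (assumed default)
INNER_ENTRY = "\x04"  # between entries of the inner map (hardcoded per the slide)
INNER_KV = "\x05"     # between an inner key and its value (hardcoded per the slide)


def parse_nested_map(field):
    """Hypothetical helper: parse one raw column value into a dict of dicts."""
    result = {}
    if not field:
        return result
    for entry in field.split(OUTER_ENTRY):
        key, _, inner = entry.partition(OUTER_KV)
        inner_map = {}
        for pair in (inner.split(INNER_ENTRY) if inner else []):
            inner_key, _, inner_value = pair.partition(INNER_KV)
            inner_map[inner_key] = int(inner_value)
        result[key] = inner_map
    return result


# Raw text for the value {"test": {"x": 1, "y": 2}} under the assumptions above.
raw = "test" + OUTER_KV + "x" + INNER_KV + "1" + INNER_ENTRY + "y" + INNER_KV + "2"
parsed = parse_nested_map(raw)
print(parsed["test"]["x"])
```

The last line mirrors the `select test["test"]["x"]` query: any tool that writes such files for hive has to emit exactly these control characters, because the table definition cannot override them.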

It's complicated…

So why did we choose to use hive?

Pros
• SQL is easy to learn
• Supports custom mapreduce jobs
• ODBC connection for QlikView
• Hue for lightweight access
• Development is moving fast
• Open source

Cons
• High latency
• Lots of moving parts
• Not free from bugs

The end.
