Web analytics with Hadoop and Hive

Website Analytics System. Technology and system: Hive – Hadoop

Upload: le-kien-truc

Post on 02-Jul-2015


Category:

Software



DESCRIPTION

Web analytics with Hadoop and Hive

TRANSCRIPT

Page 1: Web analytics with Hadoop and Hive

Website Analytics System

Technology and system: Hive – Hadoop

Page 2: Web analytics with Hadoop and Hive

Hadoop overview

Page 3: Web analytics with Hadoop and Hive

Hadoop overview

Page 4: Web analytics with Hadoop and Hive

How does Hadoop work?

Page 5: Web analytics with Hadoop and Hive

How does Hadoop work?

Page 6: Web analytics with Hadoop and Hive

MapReduce

Page 7: Web analytics with Hadoop and Hive

MapReduce

void map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        EmitIntermediate(w, "1");

void reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    int sum = 0;
    for each pc in partialCounts:
        sum += ParseInt(pc);
    Emit(word, AsString(sum));
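The word-count pseudocode on this slide can be tried out locally. The following is a minimal single-process sketch in Python, not Hadoop code: the names `map_fn`, `reduce_fn`, and `run_word_count` are made up for illustration, and the "shuffle" step that Hadoop performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(name: str, document: str) -> Iterator[Tuple[str, str]]:
    """Emit an intermediate (word, "1") pair for every word in the document."""
    for word in document.split():
        yield word, "1"


def reduce_fn(word: str, partial_counts: Iterable[str]) -> Tuple[str, str]:
    """Sum all partial counts that were emitted for one word."""
    total = sum(int(pc) for pc in partial_counts)
    return word, str(total)


def run_word_count(documents: Dict[str, str]) -> Dict[str, str]:
    # Map phase plus a simulated shuffle: group intermediate values by key.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, document in documents.items():
        for word, count in map_fn(name, document):
            grouped[word].append(count)
    # Reduce phase: one reduce call per distinct word.
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


# Hypothetical sample input: document name -> document contents.
counts = run_word_count({"a.txt": "hive hadoop hive", "b.txt": "hadoop hdfs"})
print(counts)
```

In a real cluster the map and reduce calls run on different nodes, and the grouping by key is done by the framework rather than by user code.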

Page 8: Web analytics with Hadoop and Hive

Hive

Hive is a data warehouse system for Hadoop that:

− facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop
− provides a way to query the data using a SQL-like language called HiveQL

Page 9: Web analytics with Hadoop and Hive

Hive example

Simple select

SELECT COUNT(*) FROM u_data;

Join

FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar)
INSERT OVERWRITE TABLE events
SELECT t1.bar, t1.foo, t2.foo;
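The join example on this slide uses Hive's FROM-first form of INSERT. To see what it computes without a Hive installation, here is a rough sketch that uses Python's built-in `sqlite3` module as a stand-in engine. This is an assumption for illustration only: SQLite is not Hive, it has no INSERT OVERWRITE (emulated below with DELETE followed by INSERT), and the column layouts and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schemas; the slide does not define these tables.
conn.executescript("""
CREATE TABLE pokes   (foo INTEGER, bar TEXT);
CREATE TABLE invites (foo INTEGER, bar TEXT);
CREATE TABLE events  (bar TEXT, poke_foo INTEGER, invite_foo INTEGER);
""")
conn.executemany("INSERT INTO pokes VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO invites VALUES (?, ?)", [(10, "a"), (30, "c")])

# Emulate INSERT OVERWRITE: clear the target table, then insert the join result.
conn.execute("DELETE FROM events")
conn.execute("""
INSERT INTO events
SELECT t1.bar, t1.foo, t2.foo
FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar)
""")

events = conn.execute("SELECT * FROM events").fetchall()
event_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(events, event_count)
```

Only rows whose `bar` value appears in both tables reach `events`, which is the behavior of an inner join in HiveQL as well.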

Page 10: Web analytics with Hadoop and Hive

Hive Data Units

− Databases
− Tables
− Partitions
− Buckets
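These units nest: a database holds tables, a table can be split into partitions by the value of a partition column, and a partition can be further split into buckets by hashing another column. The sketch below illustrates that nesting in Python. It is illustrative only: the function `locate_row`, the path layout, and the use of CRC32 as the hash are assumptions, and Hive's actual hash function and file naming differ.

```python
import zlib
from typing import Tuple


def locate_row(database: str, table: str, partition_col: str,
               partition_value: str, bucket_key: str,
               num_buckets: int) -> Tuple[str, int]:
    """Return (partition directory, bucket number) for one row.

    CRC32 is a stand-in hash chosen because it is deterministic across
    runs; it is NOT the hash function Hive uses.
    """
    bucket = zlib.crc32(bucket_key.encode("utf-8")) % num_buckets
    directory = f"{database}/{table}/{partition_col}={partition_value}"
    return directory, bucket


# Hypothetical row: a page view on 2008-03-01 by visitor "abc123".
directory, bucket = locate_row("web", "page_views", "dt", "2008-03-01",
                               "abc123", num_buckets=4)
print(directory, bucket)
```

Rows with the same partition value land in the same directory, and rows with the same bucket key always land in the same bucket, which is what makes partition filtering and bucket-based sampling cheap.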

Page 11: Web analytics with Hadoop and Hive

Hive Type System

Primitive Types (Integers, String, Float ...)

Complex Types:

− Structs
− Maps
− Arrays

Page 12: Web analytics with Hadoop and Hive

Partition query

INSERT OVERWRITE TABLE xyz_com_page_views
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01'
  AND page_views.date <= '2008-03-31'
  AND page_views.referrer_url like '%xyz.com';
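Assuming `page_views` is partitioned by `date`, as the slide title suggests, the date range in the WHERE clause lets the engine skip whole partitions before it looks at any row. The following Python sketch mimics that idea with an in-memory dictionary; the function name `query_march_xyz` and all sample rows are made up, and this is not how Hive is implemented internally.

```python
from typing import Dict, List, Tuple

# One row is (page_url, referrer_url); partitions are keyed by date string.
PageView = Tuple[str, str]


def query_march_xyz(
    partitions: Dict[str, List[PageView]]
) -> Tuple[List[PageView], List[str]]:
    """Return the matching rows and the list of partitions that were scanned."""
    # Partition pruning: ISO dates compare correctly as plain strings.
    scanned = [d for d in sorted(partitions)
               if "2008-03-01" <= d <= "2008-03-31"]
    # Row filter: LIKE '%xyz.com' matches values that END with 'xyz.com'.
    rows = [pv for d in scanned for pv in partitions[d]
            if pv[1].endswith("xyz.com")]
    return rows, scanned


# Hypothetical sample data.
partitions = {
    "2008-02-28": [("/home", "http://xyz.com")],
    "2008-03-01": [("/home", "http://www.xyz.com"),
                   ("/about", "http://example.org")],
    "2008-03-15": [("/buy", "http://xyz.com/page")],
    "2008-04-01": [("/home", "http://xyz.com")],
}
rows, scanned = query_march_xyz(partitions)
print(rows, scanned)
```

Note that the pattern `'%xyz.com'` has no trailing wildcard, so a referrer such as `http://xyz.com/page` does not match; a pattern like `'%xyz.com%'` would be needed to catch it.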

Page 13: Web analytics with Hadoop and Hive

User Tracking System

JavaScript: collects data on the tracked site

Log server: receives and stores data on Hadoop

Report dashboard: queries, analyzes, and displays useful information
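As an illustration of the log-server step, the sketch below turns one tracking request into a tab-separated line that could be appended to a file destined for Hadoop. Everything specific here is an assumption: the slides do not show the request format, so the URL, the query parameter names (borrowed from the column names in the database design), and the function `to_log_line` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical field order for one log line.
FIELDS = ["idsite", "idvisitor", "action_url", "referer_url"]


def to_log_line(request_url: str, location_ip: str) -> str:
    """Convert a tracking request URL into one tab-separated log line."""
    params = parse_qs(urlsplit(request_url).query)
    # Missing or blank parameters become empty strings.
    values = [params.get(field, [""])[0] for field in FIELDS]
    values.append(location_ip)
    # Tabs and newlines inside a value would break the line format.
    cleaned = [v.replace("\t", " ").replace("\n", " ") for v in values]
    return "\t".join(cleaned)


line = to_log_line(
    "http://log.example.com/track?idsite=3&idvisitor=abc123"
    "&action_url=http%3A%2F%2Fshop.example.com%2Fcart&referer_url=",
    "203.0.113.7",
)
print(line)
```

A delimited text format like this is easy to expose to Hive later as a table over the stored files, which is one reason it is a common choice for raw logs.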

Page 14: Web analytics with Hadoop and Hive

Database design

Visit Table

Column              Description
idvisit             Visit ID, auto-incremented
idsite              Site ID
idvisitor           Visitor ID (stored in cookies)
fptid               FPT ID
visitor_localtime   Time of the visit (taken from the user's computer)
visitor_returning   Whether the visitor is new or returning
referer_type        Referrer type
referer_name        Referrer name
referer_url         Referrer URL
referer_keyword     Referrer keyword (search)
config_id           Hash of the parameters below
location_ip         IP address

Page 15: Web analytics with Hadoop and Hive

Database design

Action Table

Column              Description
idaction            Action ID, auto-incremented
idsite              Site ID
idvisitor           Visitor ID (stored in cookies)
server_time         Time of the action on the server
idvisit             ID of the visit
action_url          Action URL
action_name         Action name
action_type         Action type
action_url_ref      Reference of the action
action_name_ref     Reference name of the action
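The two tables are linked through `idvisit`, so a dashboard report such as "visits and actions per site" is a join plus a GROUP BY. The sketch below shows one possible query, again using Python's `sqlite3` as a stand-in for Hive. The table names `visit_log` and `action_log`, the column types, the subset of columns, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table names and types; only a subset of the columns is used.
conn.executescript("""
CREATE TABLE visit_log (
    idvisit           INTEGER PRIMARY KEY AUTOINCREMENT,
    idsite            INTEGER,
    idvisitor         TEXT,
    visitor_returning INTEGER
);
CREATE TABLE action_log (
    idaction   INTEGER PRIMARY KEY AUTOINCREMENT,
    idsite     INTEGER,
    idvisitor  TEXT,
    idvisit    INTEGER,
    action_url TEXT
);
""")
conn.executemany(
    "INSERT INTO visit_log (idsite, idvisitor, visitor_returning) VALUES (?, ?, ?)",
    [(1, "v-a", 0), (1, "v-b", 1), (2, "v-c", 0)],
)
conn.executemany(
    "INSERT INTO action_log (idsite, idvisitor, idvisit, action_url) VALUES (?, ?, ?, ?)",
    [(1, "v-a", 1, "/home"), (1, "v-a", 1, "/cart"),
     (1, "v-b", 2, "/home"), (2, "v-c", 3, "/")],
)

# Visits and actions per site; LEFT JOIN keeps visits that have no actions.
report = conn.execute("""
SELECT v.idsite,
       COUNT(DISTINCT v.idvisit) AS visits,
       COUNT(a.idaction)         AS actions
FROM visit_log v
LEFT JOIN action_log a ON a.idvisit = v.idvisit
GROUP BY v.idsite
ORDER BY v.idsite
""").fetchall()
print(report)
```

The SELECT uses only standard constructs (LEFT JOIN, COUNT(DISTINCT ...), GROUP BY) that HiveQL also supports, though the table definitions would need Hive's own DDL.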

Page 16: Web analytics with Hadoop and Hive