
Better Performance for Big Data Shuya Zhang; Shyam Sundar Somasundaram [10/03/13] 1 [1] Bhasker Allene, Marco Righini, “Better Performance for Big Data” Intel White Paper, 2013 Intel Corporation. Reference



Better Performance for Big Data

Shuya Zhang; Shyam Sundar Somasundaram

[10/03/13]

Reference

[1] Bhasker Allene, Marco Righini, “Better Performance for Big Data,” Intel White Paper, 2013 Intel Corporation.


What is Hadoop

Apache Hadoop

● An open-source software framework

● Supports data-intensive distributed applications

● Enables the running of applications on large clusters of commodity hardware

● Derived from Google's MapReduce and Google File System (GFS) papers

Hadoop is one of the poster children of Big Data, and arguably the most beloved one!


The Intel Distribution for Hadoop framework

● Oozie is a workflow scheduler

● Hive enables SQL queries on Hadoop

● HBase is a non-relational, distributed database


Oozie Workflow

Step 1: Convert EBCDIC to ASCII

Step 2: Scan for New Columns

Step 3: Move Columns List to Metadata

Step 4: Optimize Data for Hive

Step 5: Move Data to Hive Warehouse

Step 6: Drop and Create Hive Table Structure
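The first two steps above can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation: the EBCDIC code page (`cp037`), the `;` delimiter, and the helper names `ebcdic_to_ascii` and `find_new_columns` are all assumptions made for illustration, since the slides do not specify them.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> str:
    """Step 1 sketch: decode EBCDIC bytes to text.

    cp037 is an assumed code page; the real data may use another variant.
    """
    return raw.decode(codepage)


def find_new_columns(known: list[str], incoming: list[str]) -> list[str]:
    """Step 2 sketch: columns in `incoming` not yet in `known`, order preserved."""
    seen = set(known)
    return [c for c in incoming if c not in seen]


# Hypothetical header record, encoded as EBCDIC for the demo.
record = "NDG;TP_NDG;COD_UO".encode("cp037")
header = ebcdic_to_ascii(record).split(";")  # ";" delimiter is an assumption
new_cols = find_new_columns(["NDG", "TP_NDG"], header)
print(header, new_cols)
```

In a real Oozie workflow each step would be a separate action in the workflow definition; the functions here only show the kind of work the first two actions would do.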


Data Flow & Data Optimization

● Benefits:

- Data is stored in a normalized way

- Hive queries are quite similar to RDBMS queries

- The learning curve is minimized, with fewer computations and less disk space

- Data is easily consumable by data analysis tools


Comparing SQL and Hive Queries

Query 1

• RDBMS: select distinct tp_ndg as N010 from scontrpf50m0130331

• Hive: SELECT DISTINCT N010 FROM O0A011 LIMIT 10;
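To try the Hive form of Query 1 without a cluster, the sketch below runs the same statement against an in-memory SQLite database. SQLite is only a stand-in for Hive here, and the table layout and sample rows are invented for illustration; only the table name `O0A011` and the column `N010` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: the real O0A011 table has more columns.
conn.execute("CREATE TABLE O0A011 (NDG TEXT, N010 TEXT)")
conn.executemany(
    "INSERT INTO O0A011 VALUES (?, ?)",
    [("1", "DIN"), ("2", "IOC"), ("3", "DIN"), ("4", "SPF")],  # made-up rows
)

# Same statement as the Hive version of Query 1.
rows = conn.execute("SELECT DISTINCT N010 FROM O0A011 LIMIT 10").fetchall()
distinct_types = sorted(r[0] for r in rows)
print(distinct_types)
```

Note that the Hive query needs no `AS N010` alias: in the Hive table the column already carries that name, whereas the RDBMS query renames `tp_ndg` on the fly.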

Query 2

• RDBMS: SELECT DISTINCT B.COD_UO, b.Tp_conto, a.Tp_ndg as N010 FROM SCONTRPF50M0130331 A JOIN CUBOM0100M0130331 B ON A.NDG = B.NDG WHERE POSIZ_SOFF_INCAGLT = '1' and a.TP_NDG in ('DIN', 'IOC', 'SPF') and b.dt_accs_rapprt >= '2013-03-01' and b.dt_accs_rapprt <= '2013-03-15' ORDER BY B.COD_UO, b.Tp_conto, a.TP_NDG

• Hive: SELECT DISTINCT B.DOOR, B.TP_INCOME, A.N010 FROM O0A011 A JOIN CUBOM0100M0130331 B ON A.NDG = B.NDG WHERE N011 = '1' AND A.N010 in ('DIN', 'IOC', 'SPF') AND B.R021 > '130100' AND B.R021 < '130316' ORDER BY DOOR, TP_INCOME, N010;

Query 3

• RDBMS: select b.cod_uo, b.forma_tec, TP_NDG as N010, substring(a.sae_rae, 1, 3) as N003, a.u_segmgest_2004 as N088, a.u_modserv_gest as N089, sum(b.qc_rata_scd) as D505 from scontrpf50m0130331 a join cubom0100m0130331 b on a.ndg = b.ndg where a.TP_NDG in ('DIN', 'IOC', 'SPF') and b.forma_tec in ('MW500', 'MW100', 'MW200') group by b.cod_uo, b.forma_tec, a.TP_NDG, substring(a.sae_rae, 1, 3), a.u_segmgest_2004, a.u_modserv_gest order by cod_uo, forma_tec, TP_NDG, substring(a.sae_rae, 1, 3), a.u_segmgest_2004, a.u_modserv_gest
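The join-and-aggregate pattern of Query 3 can be checked on a small scale with the sketch below. It is a simplified version run on in-memory SQLite, not the original query: the two `u_*` columns are omitted, SQLite's `substr` replaces `substring`, and the schemas and rows are invented for illustration. Only the table and column names are taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, heavily reduced schemas with made-up sample rows.
conn.executescript("""
CREATE TABLE scontrpf50m0130331 (ndg TEXT, tp_ndg TEXT, sae_rae TEXT);
CREATE TABLE cubom0100m0130331 (ndg TEXT, cod_uo TEXT, forma_tec TEXT, qc_rata_scd REAL);
INSERT INTO scontrpf50m0130331 VALUES ('1', 'DIN', '600123'), ('2', 'XXX', '430999');
INSERT INTO cubom0100m0130331 VALUES
  ('1', 'U01', 'MW500', 100.0), ('1', 'U01', 'MW500', 50.0),
  ('1', 'U01', 'MW999', 7.0),   ('2', 'U01', 'MW100', 9.0);
""")

# Simplified Query 3: join on ndg, filter both sides, sum per group.
query = """
SELECT b.cod_uo, b.forma_tec, a.tp_ndg AS N010,
       substr(a.sae_rae, 1, 3) AS N003, SUM(b.qc_rata_scd) AS D505
FROM scontrpf50m0130331 a JOIN cubom0100m0130331 b ON a.ndg = b.ndg
WHERE a.tp_ndg IN ('DIN', 'IOC', 'SPF')
  AND b.forma_tec IN ('MW500', 'MW100', 'MW200')
GROUP BY b.cod_uo, b.forma_tec, a.tp_ndg, substr(a.sae_rae, 1, 3)
ORDER BY b.cod_uo, b.forma_tec, a.tp_ndg
"""
result = conn.execute(query).fetchall()
print(result)
```

With these sample rows, the `MW999` row and the customer of type `XXX` are filtered out, so the two remaining `MW500` installments collapse into a single group.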


How this relates to our course

Hadoop vs. SQL: http://www.youtube.com/watch?v=3Wmdy80QOvw#t=16

New Trend: Big Data, Cloud Computing

Database Management Systems

● RDBMS (Relational DBMS): tables, structured data

● OLAP: cubes, structured data

● NoSQL: collections, structured/unstructured data



Questions?
