Hadoop Presentation

Hadoop Presentation 2012 Presenter : Pham Thai Hoa Email : [email protected] Web : http://mobion.com/hoa 08/29/2022 Pham Thai Hoa

Upload: pham-hoa

Post on 08-May-2015

TRANSCRIPT

Page 1: Hadoop Presentation

04/11/2023 Pham Thai Hoa

Hadoop Presentation 2012

Presenter: Pham Thai Hoa
Email: [email protected]
Web: http://mobion.com/hoa

Page 2: Hadoop Presentation

Topics
Introduction to Hadoop
Introduction to Hive
Introduction to Logger
Using Hadoop at Mobion
Warehouse at Mobion
Q&A

Page 3: Hadoop Presentation

What is Hadoop
It is a framework for distributed processing
Inspired by Google's architecture: MapReduce and GFS
A top-level Apache project
Hadoop is open source
Hadoop has two important elements:
+ MapReduce core
+ Hadoop Distributed File System (HDFS)
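
To make the MapReduce element concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is only an illustration of the programming model and not Hadoop's API: the function names are hypothetical, and a real job would run these steps in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three steps on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))
```

Calling `word_count(["hadoop stores data", "hadoop processes data"])` counts `hadoop` and `data` twice each and the other words once.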

Page 4: Hadoop Presentation

Why use Hadoop
Fault-tolerant hardware is expensive
Hadoop is designed to run on cheap commodity hardware
It automatically handles data replication and node failure
It does the hard work, so you can focus on processing data
It supports three modes: Local, Pseudo-Distributed, and Fully-Distributed
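
The replication and node-failure point can be illustrated with a small simulation. This is a simplified sketch and not HDFS code: it assumes round-robin placement (real HDFS placement is rack-aware), the function and node names are hypothetical, and the only value taken from HDFS is the default replication factor of 3.

```python
from typing import Dict, List, Set

REPLICATION = 3  # HDFS default replication factor


def place_blocks(blocks: List[str], nodes: List[str],
                 replication: int = REPLICATION) -> Dict[str, Set[str]]:
    """Assign each block to `replication` nodes, round-robin.

    Assumes len(nodes) >= replication so the chosen nodes are distinct.
    """
    placement: Dict[str, Set[str]] = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)]
                            for r in range(replication)}
    return placement


def handle_node_failure(placement: Dict[str, Set[str]], failed: str,
                        live_nodes: List[str],
                        replication: int = REPLICATION) -> None:
    """Drop the failed node and re-replicate blocks that lost a copy."""
    candidates = [node for node in live_nodes if node != failed]
    for holders in placement.values():
        holders.discard(failed)
        for node in candidates:
            if len(holders) >= replication:
                break
            holders.add(node)  # no effect if the node already holds a copy
```

After `handle_node_failure` runs, every block is back at three copies as long as at least three live nodes remain, which is the behavior the slide describes as automatic.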

Page 5: Hadoop Presentation

Data Flow into Hadoop

Page 6: Hadoop Presentation

Who uses Hadoop
Amazon: builds product search indices using the streaming API and pre-existing C++, Perl, and Python tools
Yahoo: more than 100,000 CPUs in more than 40,000 computers running Hadoop
Facebook: uses Hadoop to store copies of internal log and dimension data sources, and uses it as a source for reporting/analytics and machine learning
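
The streaming API mentioned above lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer, which is why existing C++, Perl, and Python tools can be reused. The sketch below shows that contract in Python. It is a generic word-count illustration, not code from any of the companies listed, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per key; the input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(records, key=lambda record: record[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run_locally(lines: Iterable[str]) -> list:
    """Simulate a job: map, sort by key (done by the framework), reduce."""
    return list(reducer(sorted(mapper(lines))))
```

On a cluster the mapper and reducer would be separate scripts reading `sys.stdin`, and the framework would perform the sort between them; `run_locally` only stands in for that step when testing on one machine.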

Page 7: Hadoop Presentation

What is Hive
Hive is a data warehouse system for Hadoop
Uses MapReduce for execution
Uses HDFS for storage
Metadata in an RDBMS
Scalability and performance
Interoperability
Uses a SQL-like language called HiveQL

Page 8: Hadoop Presentation

Data Flow into Hive

Page 9: Hadoop Presentation

Hive Data Model
Tables
+ Typed columns (int, float, string, …)
+ Also array/map/struct for JSON-like data
Partitions
+ e.g., to range-partition tables by date
Buckets
+ Hash partitions within ranges (useful for sampling, join optimization)
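
As a rough sketch of how partitions and buckets locate a row: a partition corresponds to a directory named after the partition column and value, and a bucket is chosen by hashing the bucketing column modulo the number of buckets. The table name `logs`, the columns `dt` and `user_id`, and the helper names below are hypothetical. The warehouse root shown is Hive's default and is configurable, and the integer hash is assumed to be the value itself.

```python
from typing import Dict

WAREHOUSE_ROOT = "/user/hive/warehouse"  # Hive's default location; configurable


def partition_dir(table: str, partition: Dict[str, str]) -> str:
    """Build the directory for one partition, e.g. .../logs/dt=2012-01-01."""
    parts = "/".join(f"{column}={value}" for column, value in partition.items())
    return f"{WAREHOUSE_ROOT}/{table}/{parts}"


def bucket_for(user_id: int, num_buckets: int) -> int:
    """Pick a bucket by hashing the bucketing column modulo the bucket count.

    Assumption: for an integer column the hash is the value itself,
    masked to a non-negative 31-bit number before taking the modulo.
    """
    return (user_id & 0x7FFFFFFF) % num_buckets
```

A query that filters on `dt` only needs to read the matching directory, and sampling or a join on `user_id` can work on one bucket at a time. In HiveQL these layouts are declared with the `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS` clauses.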

Page 10: Hadoop Presentation

Hive Metastore
Database: namespace containing a set of tables
Holds table/partition definitions (column types, mappings to HDFS directories)
Statistics
Implemented with DataNucleus ORM. Runs on Derby, MySQL, and many other relational databases

Page 11: Hadoop Presentation

Introduction to Logger
A logging system has three broad components:
+ Client Code Interface
+ Distribution System
+ Do Something Usefullizer
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.

Page 12: Hadoop Presentation

Why use Scribe
Scalability and performance
Event Notification library
Thrift framework
Hadoop is optional
Client using
Distributed Scribe system
Over 1 million messages per second for logging
Hierarchy stores
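
The robustness to network and node failures can be pictured as store-and-forward: a per-host server passes messages to a central aggregator and holds them locally while the aggregator is unreachable. The sketch below is an in-memory illustration of that idea only. The class and method names are hypothetical and are not Scribe's Thrift API; the one detail taken from Scribe is that each entry is a (category, message) pair. A real deployment would buffer to local disk rather than memory.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

LogEntry = Tuple[str, str]  # (category, message)


class CentralStore:
    """Stand-in for a central aggregator: groups messages by category."""

    def __init__(self) -> None:
        self.available = True
        self.by_category: DefaultDict[str, List[str]] = defaultdict(list)

    def log(self, entries: List[LogEntry]) -> bool:
        if not self.available:
            return False  # simulate a network or node failure
        for category, message in entries:
            self.by_category[category].append(message)
        return True


class BufferingForwarder:
    """Stand-in for a per-host server: forwards upstream, buffers on failure."""

    def __init__(self, upstream: CentralStore) -> None:
        self.upstream = upstream
        self.buffer: List[LogEntry] = []

    def log(self, category: str, message: str) -> None:
        self.buffer.append((category, message))
        self.flush()

    def flush(self) -> None:
        # Send everything pending, in order; keep it if the upstream is down.
        if self.buffer and self.upstream.log(self.buffer):
            self.buffer = []
```

While `CentralStore.available` is `False`, messages accumulate in `BufferingForwarder.buffer`; the next successful `log` or `flush` call delivers them in their original order, so no entries are lost during the outage.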

Page 13: Hadoop Presentation

Warehouse at Mobion
Log Collector
Log/Data Transformer
Data Analyzer
Web Reporter
Log definition
Log integration (into applications)
Log/data analysis
Report development (API, Mobion, Music …)

Page 14: Hadoop Presentation

Warehouse at Mobion
Data mining
Music Recommendation
Spam Detection
Application performance
Export data and import into MySQL for web report
Analytic system

Page 15: Hadoop Presentation

Q&A
Why use Hadoop?
Why use Hive?
Why do we need a logging system?
What is the warehouse system architecture?
Do we use these systems for voting, chat, message, and feed?
How can we use them for recommendation and suggestion?

Page 17: Hadoop Presentation

THANK YOU