![Page 1: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/1.jpg)
04/11/2023 Pham Thai Hoa
Hadoop Presentation 2012
Presenter: Pham Thai Hoa
Email: [email protected]
http://mobion.com/hoa
![Page 2: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/2.jpg)
Topics
- Introduction to Hadoop
- Introduction to Hive
- Introduction to Logger
- Using Hadoop at Mobion
- Warehouse at Mobion
- Q&A
![Page 3: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/3.jpg)
What is Hadoop
- A framework for distributed processing
- Inspired by Google's architecture: MapReduce and GFS
- A top-level Apache project
- Hadoop is open source
- Hadoop has two important elements:
  + MapReduce core
  + Hadoop Distributed File System (HDFS)
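
To make the MapReduce element concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps for a word count. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions for illustration, and everything runs in one Python process rather than across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run on many nodes in parallel and the framework performs the shuffle over the network; only the two user-written functions resemble what is shown here.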
![Page 4: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/4.jpg)
Why use Hadoop
- Fault-tolerant hardware is expensive
- Hadoop is designed to run on cheap commodity hardware
- It automatically handles data replication and node failure
- It does the hard work, so you can focus on processing data
- It supports three modes: Local, Pseudo-Distributed, and Fully-Distributed
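
The idea behind "handles data replication and node failure" can be sketched as a toy simulation. This is an assumption-laden illustration, not HDFS code: the helper names (`place_blocks`, `handle_node_failure`), the node and block labels, and the round-robin placement are all hypothetical. The only value taken from HDFS is the default replication factor of 3; real HDFS placement is rack-aware and considerably more involved.

```python
REPLICATION = 3  # HDFS's default replication factor; the placement below is simplified


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = {nodes[(i + r) % len(nodes)] for r in range(replication)}
    return placement


def handle_node_failure(placement, failed, live_nodes, replication=REPLICATION):
    """Drop the failed node's replicas and re-create them on other live nodes."""
    for holders in placement.values():
        holders.discard(failed)
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))
    return placement


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(["b1", "b2", "b3"], nodes)
    handle_node_failure(placement, "n2", ["n1", "n3", "n4"])
    for block, holders in sorted(placement.items()):
        print(block, sorted(holders))
```

The point of the sketch is that no block is lost when one node disappears, because every block already has copies elsewhere and the missing copies are re-created on the surviving nodes.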
![Page 5: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/5.jpg)
Data Flow into Hadoop
![Page 6: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/6.jpg)
Who uses Hadoop
- Amazon: builds product search indices using the streaming API and pre-existing C++, Perl, and Python tools
- Yahoo: more than 100,000 CPUs in more than 40,000 computers running Hadoop
- Facebook: uses Hadoop to store copies of internal log and dimension data sources, and uses it as a source for reporting/analytics and machine learning
![Page 7: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/7.jpg)
What is Hive
- Hive is a data warehouse system for Hadoop
- Uses MapReduce for execution
- Uses HDFS for storage
- Keeps metadata in an RDBMS
- Scalability and performance
- Interoperability
- Uses a SQL-like language called HiveQL
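
As a rough illustration of how a HiveQL query can be executed as MapReduce, the sketch below evaluates one GROUP BY query by hand. The table name `access_log`, its columns, and the helper names are hypothetical, and the query string is only carried along for reference; this is not how Hive's planner is implemented, only the shape of the work it hands to the cluster.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical HiveQL query that the functions below evaluate by hand.
QUERY = (
    "SELECT page, COUNT(*) AS hits "
    "FROM access_log WHERE status = 200 GROUP BY page"
)


def map_row(row):
    """Map side: apply the WHERE filter and emit (group key, 1)."""
    if row["status"] == 200:
        yield row["page"], 1


def run_group_by(rows):
    """Emit pairs, sort them by key (the shuffle), then sum per key (the reduce)."""
    emitted = [pair for row in rows for pair in map_row(row)]
    emitted.sort(key=itemgetter(0))
    return {
        page: sum(count for _, count in group)
        for page, group in groupby(emitted, key=itemgetter(0))
    }


if __name__ == "__main__":
    rows = [
        {"page": "/home", "status": 200},
        {"page": "/music", "status": 200},
        {"page": "/home", "status": 404},
        {"page": "/home", "status": 200},
    ]
    print(QUERY)
    print(run_group_by(rows))
```

The WHERE clause becomes a filter in the map step, the GROUP BY column becomes the shuffle key, and the aggregate becomes the reduce step.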
![Page 8: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/8.jpg)
Data Flow into Hive
![Page 9: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/9.jpg)
Hive Data Model
- Tables
  + Typed columns (int, float, string, …)
  + Also array/map/struct for JSON-like data
- Partitions
  + e.g., to range-partition tables by date
- Buckets
  + Hash partitions within ranges (useful for sampling, join optimization)
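
The partition and bucket levels above can be pictured as two steps in choosing where a row is stored. The sketch below is an assumption for illustration: the table root, the column names `dt` and `user_id`, the `bucket-NNNNN` file naming, and the use of CRC32 as the hash are all hypothetical stand-ins. Only the `key=value` naming of partition directories follows Hive's actual convention.

```python
import zlib


def partition_dir(table_root: str, dt: str) -> str:
    """Partition: one directory per date value, named key=value."""
    return f"{table_root}/dt={dt}"


def bucket_for(user_id: str, num_buckets: int) -> int:
    """Bucket: hash the bucketing column, then take it modulo the bucket count.

    CRC32 is a stand-in here; it is not the hash function Hive uses.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets


def storage_location(table_root: str, row: dict, num_buckets: int) -> str:
    """Combine both levels: partition directory first, then bucket file."""
    bucket = bucket_for(row["user_id"], num_buckets)
    return f"{partition_dir(table_root, row['dt'])}/bucket-{bucket:05d}"


if __name__ == "__main__":
    row = {"dt": "2012-01-15", "user_id": "u42", "action": "play"}
    print(storage_location("/warehouse/events", row, num_buckets=8))
```

Because a query that filters on the date only needs to read one directory, and all rows for a given `user_id` land in the same bucket, partitions help with pruning and buckets help with sampling and joins.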
![Page 10: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/10.jpg)
Hive Metastore
- Database: a namespace containing a set of tables
- Holds table/partition definitions (column types, mappings to HDFS directories)
- Statistics
- Implemented with the DataNucleus ORM; runs on Derby, MySQL, and many other relational databases
![Page 11: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/11.jpg)
Introduction to Logger
- A logging system has three broad components:
  + Client Code Interface
  + Distribution System
  + Do Something Usefullizer
- Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and to be robust to network and node failures.
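
The three components listed above can be sketched as a toy pipeline. This is not Scribe's API or implementation: the class names `CentralStore` and `LocalForwarder` are hypothetical, and the in-memory buffer merely stands in for the kind of local buffering that lets a log forwarder survive a central outage.

```python
from collections import deque


class CentralStore:
    """The 'do something useful' end: here it just keeps messages per category."""

    def __init__(self):
        self.available = True
        self.messages = {}

    def write(self, category, message):
        if not self.available:
            raise ConnectionError("central store unreachable")
        self.messages.setdefault(category, []).append(message)


class LocalForwarder:
    """Distribution system: forwards to the central store, buffers on failure."""

    def __init__(self, central):
        self.central = central
        self.buffer = deque()

    def log(self, category, message):
        """Client code interface: one call per (category, message) entry."""
        self.buffer.append((category, message))
        self.flush()

    def flush(self):
        """Send buffered entries in order; stop at the first failure."""
        while self.buffer:
            category, message = self.buffer[0]
            try:
                self.central.write(category, message)
            except ConnectionError:
                return  # keep the entry buffered and retry on a later flush
            self.buffer.popleft()


if __name__ == "__main__":
    central = CentralStore()
    forwarder = LocalForwarder(central)
    forwarder.log("music", "play:1")
    central.available = False          # simulate a network or node failure
    forwarder.log("music", "play:2")   # stays in the local buffer
    central.available = True
    forwarder.flush()                  # buffered entry is delivered in order
    print(central.messages)
```

The application only ever calls `log`; whether the entry is delivered immediately or after an outage is the distribution layer's concern.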
![Page 12: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/12.jpg)
Why use Scribe
- Scalability and performance
- Event notification library
- Thrift framework
- Hadoop is optional
- Client using
- Distributed Scribe system
- Over 1 million messages per second for logging
- Hierarchical stores
![Page 13: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/13.jpg)
Warehouse at Mobion
- Log collector
- Log/data transformer
- Data analyzer
- Web reporter
- Log definition
- Log integration (into applications)
- Log/data analysis
- Report development (API, Mobion, Music …)
![Page 14: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/14.jpg)
Warehouse at Mobion
- Data mining
- Music recommendation
- Spam detection
- Application performance
- Export data and import it into MySQL for web reports
- Analytic system
![Page 15: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/15.jpg)
Q&A
- Why use Hadoop?
- Why use Hive?
- Why do we need a logging system?
- What is the warehouse system architecture?
- Do we use these systems for voting, chat, messages, and feeds?
- How can we use them for recommendations and suggestions?
![Page 16: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/16.jpg)
Links
- http://facebook.com
- http://highscalability.com/product-scribe-facebooks-scalable-logging-system
- http://hadoop.apache.org/
- http://hive.apache.org/
- http://wiki.apache.org/hadoop/PoweredBy
- http://www.apache.org/foundation/thanks.html
![Page 17: Hadoop Presentation](https://reader036.vdocument.in/reader036/viewer/2022082920/554caa99b4c905335b8b46f5/html5/thumbnails/17.jpg)
THANK YOU