bigdata : big picture
Post on 19-Feb-2017
TRANSCRIPT
ZEKERIYA BEŞIROĞLU
BILGINC IT ACADEMY
ORACLE CLOUD DAY
19-11-2015
TROUG - TURKISH ORACLE USER GROUP
BIG DATA : BIG PICTURE
ZEKERIYA BEŞIROĞLU
▸ 18+ years in IT
▸ 15+ years with Oracle DB & DWH
▸ 3+ years in Big Data
▸ Leader of TROUG
▸ Instructor&Consultant
▸ http://zekeriyabesiroglu.com
▸ @zbesiroglu
TROUG BIG DATA BIG PICTURE
TROUG @ZBESIROGLU BILGINC IT ACADEMY BIG DATA BIG PICTURE
BIG DATA
▸ Social networks
▸ Banking and financial services
▸ E-commerce services
▸ Web-centric services
▸ Internet search indexes
▸ Scientific and document searches
▸ Medical records
▸ Web logs
BIG DATA
▸ VOLUME
▸ VELOCITY
▸ VARIETY
COMPANIES MUST ANALYZE THE DNA OF THEIR CUSTOMERS.
Zekeriya Beşiroğlu
TROUG
WHAT IS THE GOAL IN BIG DATA? HOW SHOULD IT BE DONE?
▸ How can I add value to the business by using big data technologies? Can I reduce some costs?
▸ How will I integrate Big Data with the traditional database? Combining structured, semi-structured and unstructured data
▸ Reaching results with analytics tools: Oracle Advanced Analytics, BI and DW technologies
DATA
▸ We are doing Schema on Write
▸ Let's do Schema on Read
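The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under assumptions: the record fields (`user`, `amount`, `city`) and the helper names `write_with_schema` and `read_with_schema` are hypothetical and do not come from the talk.

```python
import json

# Raw events are kept exactly as they arrive; nothing is validated on write.
RAW_LINES = [
    '{"user": "a", "amount": "10"}',
    '{"user": "b", "amount": "5", "city": "Ankara"}',
]


def write_with_schema(record, columns=("user", "amount")):
    """Schema on write: reject any record that does not match the fixed columns."""
    if set(record) != set(columns):
        raise ValueError(f"unexpected columns: {sorted(record)}")
    return tuple(record[c] for c in columns)


def read_with_schema(line, schema):
    """Schema on read: interpret the raw line only at query time."""
    raw = json.loads(line)
    return {name: cast(raw[name]) for name, cast in schema.items() if name in raw}


# The same raw data can be read with whatever schema the query needs.
amounts = [read_with_schema(line, {"amount": int})["amount"] for line in RAW_LINES]
total = sum(amounts)
```

With schema on write, the second record would be rejected because of its extra `city` field; with schema on read, both records are stored and the query simply picks the fields it needs.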
BIG DATA PROJECT PHASES
▸ Data Acquisition and Storage
▸ Data Access and Processing
▸ Data Unification and Analysis
DATA ACQUISITION AND STORAGE
HADOOP DISTRIBUTED FILE SYSTEM-HDFS
▸ Petabyte-scale distributed file system
▸ Linearly scalable on commodity hardware
▸ Schema on Read
▸ Cheaper
▸ Low security
▸ Write once, read many
DATA ACQUISITION AND STORAGE
HADOOP DISTRIBUTED FILE SYSTEM-HDFS
▸ Basic file system operations
▸ I can load a JSON log file into HDFS (hadoop fs -put)
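As a rough sketch, the `hadoop fs -put` step can be wrapped in a small Python helper. The file name `access_log.json`, the target directory `/user/demo/logs` and the helper names are assumptions made for illustration; the command only runs if a `hadoop` client is installed.

```python
import shutil
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Return the argument list for `hadoop fs -put <local_path> <hdfs_dir>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def put_file(local_path, hdfs_dir):
    """Upload the file if a `hadoop` client is on PATH.

    Returns True or False for the outcome, or None when no client is found.
    """
    if shutil.which("hadoop") is None:
        return None  # no Hadoop client installed; nothing is executed
    result = subprocess.run(build_put_command(local_path, hdfs_dir), check=False)
    return result.returncode == 0


# Hypothetical paths, shown only to make the command shape concrete.
command = build_put_command("access_log.json", "/user/demo/logs")
```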
DATA ACQUISITION AND STORAGE
WHAT IS FLUME?
▸ Avro Source
▸ Memory Channel
▸ HDFS Sink
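The source, channel and sink roles can be illustrated with a toy Python pipeline. This is not the Flume API (a real Flume agent is defined in a properties file); the function names, the event strings and the batch size are all assumptions, and the list of batches merely stands in for files written to HDFS.

```python
import queue


def source(events, channel):
    """Stand-in for a source: push each incoming event into the channel."""
    for event in events:
        channel.put(event)


def sink(channel, batch_size):
    """Stand-in for a sink: drain the channel and group events into batches."""
    batches, batch = [], []
    while not channel.empty():
        batch.append(channel.get())
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:  # flush the last, partially filled batch
        batches.append(batch)
    return batches


channel = queue.Queue(maxsize=100)  # bounded in-memory buffer between the two ends
source(["e1", "e2", "e3"], channel)
batches = sink(channel, batch_size=2)
```

The point of the channel is decoupling: the source and the sink never call each other directly, so either side can be swapped without touching the other.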
DATA ACQUISITION AND STORAGE
ORACLE NOSQL DATABASE
▸ Key-value database
▸ Access by Java API
▸ Stores unstructured or semi-structured data as byte arrays
▸ Highly reliable
▸ Scalable throughput and predictable latency
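The idea of storing semi-structured data as byte arrays under a key can be sketched as follows. `ToyKeyValueStore` is a hypothetical in-memory stand-in, not the Oracle NoSQL Database API, and the key `user/42/profile` is an invented example.

```python
import json


class ToyKeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


store = ToyKeyValueStore()
profile = {"name": "demo", "tags": ["a", "b"]}

# The store never inspects the value; the application serializes and parses it.
store.put("user/42/profile", json.dumps(profile).encode("utf-8"))
restored = json.loads(store.get("user/42/profile").decode("utf-8"))
```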
DATA ACQUISITION AND STORAGE
RDBMS & NOSQL
DATA ACQUISITION AND STORAGE
HDFS & NOSQL
DATA ACQUISITION AND STORAGE
APPLICATION DATABASE TECHNOLOGY
▸ High volume with low value?
▸ Dynamic application schema?
▸ If the answer is yes: NoSQL
DATA ACQUISITION AND STORAGE
NOSQL EXAMPLE
DATA ACCESS AND PROCESSING
MAP REDUCE
▸ Write applications that process vast amounts of data in parallel on large clusters of commodity hardware, in a reliable and fault-tolerant manner.
▸ Storing data in HDFS is low cost, fault tolerant and scalable.
▸ Integrates with HDFS to provide parallel data processing
▸ Batch-oriented
DATA ACCESS AND PROCESSING
MAP REDUCE EXAMPLE
map(String input_key, String input_value):
    foreach word w in input_value:
        emit(w, 1)

reduce(String output_key, Iterator<int> intermediate_vals):
    set count = 0
    foreach v in intermediate_vals:
        count += v
    emit(output_key, count)

Input records:
(1000, 'Galatasaray sampiyon olur')
(2000, 'beşiktas sampiyon olur')
(2200, 'Galatasaray Türkiyedir')
(3000, 'fenerbahce sampiyon olur')
DATA ACCESS AND PROCESSING
MAP REDUCE EXAMPLE
Mapper output:
('Galatasaray', 1), ('sampiyon', 1), ('olur', 1),
('beşiktas', 1), ('sampiyon', 1), ('olur', 1),
('Galatasaray', 1), ('Türkiyedir', 1),
('fenerbahce', 1), ('sampiyon', 1), ('olur', 1)

Intermediate data sent to the Reducer:
('Galatasaray', [1, 1])
('sampiyon', [1, 1, 1])
('olur', [1, 1, 1])
('beşiktas', [1])
('fenerbahce', [1])
('Türkiyedir', [1])

Final output of the Reducer:
('sampiyon', 3)
('olur', 3)
('Galatasaray', 2)
('fenerbahce', 1)
('beşiktas', 1)
('Türkiyedir', 1)
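The word-count example above can be reproduced on a single machine with plain Python. This is a minimal sketch of the map, shuffle and reduce steps, not Hadoop code; the function names `map_phase`, `shuffle` and `reduce_phase` are chosen here for illustration.

```python
from collections import defaultdict

# The four input records from the example: (key, line of text).
RECORDS = [
    (1000, "Galatasaray sampiyon olur"),
    (2000, "beşiktas sampiyon olur"),
    (2200, "Galatasaray Türkiyedir"),
    (3000, "fenerbahce sampiyon olur"),
]


def map_phase(key, value):
    """Emit (word, 1) for every word in the input line; the key is unused."""
    for word in value.split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by word, as the framework does between phases."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_phase(word, counts):
    """Sum the list of ones collected for a single word."""
    return word, sum(counts)


mapped = [pair for key, value in RECORDS for pair in map_phase(key, value)]
grouped = shuffle(mapped)
result = dict(reduce_phase(word, counts) for word, counts in grouped.items())
```

On a real cluster the map calls run in parallel on different nodes and the shuffle moves data across the network; the logic of each step stays the same.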
DATA ACCESS AND PROCESSING
HIVE
▸ Query data in HDFS with HiveQL, a SQL-like language
▸ Hive transforms HiveQL queries into standard MapReduce jobs
▸ Schema on Read via InputFormat and SerDe
▸ Not ideal for ad hoc queries (slow)
▸ Immature optimizer
DATA ACCESS AND PROCESSING
HIVE
▸ Log processing
▸ Text mining
▸ Document indexing
▸ Business analytics
▸ Predictive modeling
▸ Not ideal for ad hoc queries
DATA ACCESS AND PROCESSING
PIG
▸ Open source data flow system
▸ Simple language for queries and data manipulation, which is compiled into MapReduce jobs that are run on Hadoop
▸ Provides common operations like join, group, sort
▸ Works on files in HDFS
▸ Ad hoc queries across large data sets
▸ Log analysis
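A data flow of the load, filter, group and sort kind listed above can be mimicked in plain Python to show the shape of such a script. This is not Pig Latin; the log lines, the field names and the choice of counting 404 responses per client are assumptions for illustration.

```python
from collections import Counter

# Hypothetical web log lines: client address, method, path, status code.
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 GET /about.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 GET /missing 404",
]

FIELDS = ("ip", "method", "path", "status")

# LOAD: split each line into named fields.
records = [dict(zip(FIELDS, line.split())) for line in LOG_LINES]
# FILTER: keep only failed requests.
errors = [r for r in records if r["status"] == "404"]
# GROUP and COUNT: number of errors per client address.
per_ip = Counter(r["ip"] for r in errors)
# ORDER: highest count first, ties broken by address.
ranked = sorted(per_ip.items(), key=lambda kv: (-kv[1], kv[0]))
```

Each step takes the previous relation as input and produces a new one, which is the style a data flow language encourages.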
DATA ACCESS AND PROCESSING
CLOUDERA IMPALA
▸ Database-like SQL layer on top of Hadoop
▸ Distributed, massively parallel processing database engine
▸ SQL is the primary development language
▸ Open source; Impala processes data in the Hadoop cluster WITHOUT using MapReduce
▸ Interactive analysis on data stored in HDFS and HBase
DATA ACCESS AND PROCESSING
ORACLE XQUERY FOR HADOOP
▸ A transformation engine for semi-structured data stored in Apache Hadoop
▸ Runs transformations expressed in the XQuery language by translating them into a series of MapReduce jobs
▸ Loads data efficiently into Oracle Database by using Oracle Loader for Hadoop
▸ Provides read and write support for Oracle NoSQL Database
DATA ACCESS AND PROCESSING
ORACLE XQUERY FOR HADOOP
DATA ACCESS AND PROCESSING
APACHE SPARK
▸ Open source parallel data processing
▸ Fast development
▸ Online streaming
▸ Interactive analytics
▸ Machine learning
▸ Speed
DATA ACCESS AND PROCESSING
APACHE SPARK EXAMPLE
DATA UNIFICATION AND ANALYSIS
APACHE SQOOP
▸ Batch loading
▸ Transfers bulk data between structured data stores and Apache Hadoop
▸ Data import and export between external data stores and Hadoop
▸ Parallelizes data transfer for fast performance
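One way to picture the parallel transfer is to split a table's key range into slices, one per parallel task, so that each task copies only its own rows. The sketch below is an assumption about the general idea, not Sqoop's actual splitting code; Sqoop exposes it through options such as `--split-by` and `--num-mappers`, and the helper name `split_ranges` is hypothetical.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for i in range(num_mappers):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more tasks than keys: skip empty slices
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Ten rows with ids 1..10 spread over four parallel tasks.
ranges = split_ranges(1, 10, 4)
```

Each slice would then translate into its own bounded query (for example `WHERE id BETWEEN 1 AND 3`), which is what lets the tasks run independently.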
DATA UNIFICATION AND ANALYSIS
ORACLE LOADER FOR HADOOP
▸ Batch loading
▸ High-performance loader for fast movement of data from Hadoop into a table in Oracle Database
▸ Loading using online and offline modes
▸ Offloads expensive data processing from the database server to Hadoop
DATA UNIFICATION AND ANALYSIS
COPY TO BDA
▸ Batch loading
DATA UNIFICATION AND ANALYSIS
ORACLE SQL CONNECTOR FOR HADOOP
▸ Generate external table in database pointing to HDFS data
▸ Load into database or query data in place on HDFS
▸ Fine-grained control over type mapping
▸ Parallel load with automatic load balancing
DATA UNIFICATION AND ANALYSIS
ORACLE TECHNOLOGIES
DATA UNIFICATION AND ANALYSIS
ORACLE ADVANCED ANALYTICS
▸ OAA = Oracle Data Mining + Oracle R Enterprise
▸ Performance
▸ Predictive analytics
▸ Easy
ORACLE BDA BENEFITS
▸ Ships with leading Hadoop distribution(Cloudera)
▸ HDFS, HBase, Hive, Flume, Kafka, Spark …
▸ Cloudera manager
▸ Ships with great connectivity to Oracle Db
▸ Big Data SQL
▸ Big Data Connectors & ODI
THANK YOU
ZEKERIYA BEŞIROĞLU
BILGINC IT ACADEMY