2015 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Big Data – Impala
Why do I need Impala? First steps of an Oracle expert
Author: Jan Ott – Trivadis AG
Big Data – Impala
Our company.
© Trivadis – The Company
21.11.15
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on … and … technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
OPERATION
Trivadis Services takes over the interacting operation of your IT systems.
COPENHAGEN, MUNICH, LAUSANNE, BERN, ZURICH, BRUGG, GENEVA, HAMBURG, DÜSSELDORF, FRANKFURT, STUTTGART, FREIBURG, BASEL, VIENNA
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Jan Ott – Who am I?
§ 25+ years in IT
§ 25+ years using Oracle
§ 15+ years for Trivadis AG
§ BI – DWH
§ Tuning
§ Speaker/Trainer
§ http://janottblog.wordpress.com/
Agenda
1. Introduction
2. First Steps in the Impala World
3. Project – Hadoop as a file store for the DWH
4. Project – Hadoop for archiving an Oracle DB
5. Project – Twitter
6. Summary
Introduction
§ A few words about Big Data
- Big Data
- Hadoop
- Impala – why?
§ Impala – my first steps
- Get some data into Hadoop
- Tables in Impala
- Use SQL
- Miscellaneous
§ Project 1 – Data in Hadoop for a DWH
§ Project 2 – Hadoop for archiving
§ Project 3 – Twitter
Big Data: Introduction
§ Big Data – the V's: 3, 4 or 5
- Volume – scale of data
- Velocity – analysis of streaming data
- Variety – different forms of data
- Veracity – uncertainty of data (IBM)
- Value – business value (Microsoft)
§ Hadoop and its zoo
- HDFS – MapReduce
- Impala, HBase, Hive, …
- ZooKeeper
§ NoSQL databases
§ Architecture
- Lambda
Turning Data into Insights
What is Hadoop?
§ A distributed file system (HDFS) plus the MapReduce processing framework
- Based on papers from Google
- Apache open source project
§ Goal
- Fast
- Handles huge amounts of data
- Handles unstructured to fully structured data
- Horizontally scalable
- Reliable
What is Impala?
§ A SQL query engine on top of HDFS, like Hive
- Not an Apache open source project
- Open source – Cloudera, Oracle, Amazon
§ Hive & Impala
- Impala uses the Hive metastore
§ Goal
- Easy to use – SQL
- Fast
- Handles huge amounts of data
- Handles unstructured to fully structured data
- Horizontally scalable
- Reliable
First Steps
§ Keep it simple
§ Get some data into Hadoop
§ Get some data into Impala
§ Java – keep it to a minimum
§ Get an environment that is already set up
- Oracle VM – Big Data Lite
§ Pick one way to get the data into Impala
- Impala shell interface
§ See SQL on an HDFS system
Prerequisite – Environment
§ Oracle Big Data Lite
- VM
- Version 4.1
- http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
§ Contains
- Oracle Database 12c (12.1.0.2)
- Cloudera's Distribution including Apache Hadoop (CDH 5.1.2)
- Hadoop 2.3.0
- Hive 0.12.0
- Impala 2.1
- Oracle Big Data Connectors 4.0
- Oracle SQL Developer 4.0.3
§ Oracle VirtualBox
Information about the VM
§ Login
- oracle/welcome1
§ Start here
- file:///home/oracle/GettingStarted/StartHere.html
§ Start
- Oracle DB
- Hive
- Impala
- HDFS
§ You're done preparing
§ Oracle has a movie example
The Steps – simple – focus
[Figure: Impala Table, SQL Query]
Impala
§ Sources can be
- A delimited text file in the host file system, which is copied into the Impala store (HDFS)
- A delimited text file in HDFS
§ Limits
- Read only or append
- No UPDATE
- No DELETE
- No COMMIT/ROLLBACK
- No indexes
- Always a full table/file scan, or a partition scan
§ Uses the Hive metastore
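To make the limits above concrete, here is a small Python sketch that mimics an append-only table on the local file system. It is not how Impala works internally; it only assumes two things also named on these slides: every INSERT adds a new data file (no file is ever rewritten), and a read goes through all files or, with a partition filter, only through the files of one Hive-style `year=…` subdirectory. The function names and the directory layout are hypothetical.

```python
import os
import tempfile
import uuid


def insert(table_dir, partition, rows):
    """Append rows as a NEW comma-delimited file; existing files are never changed."""
    part_dir = os.path.join(table_dir, partition)  # Hive-style, e.g. "year=2015"
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, f"data_{uuid.uuid4().hex}.txt")
    with open(path, "x", encoding="utf-8") as f:  # "x" fails instead of overwriting
        for row in rows:
            f.write(",".join(str(value) for value in row) + "\n")
    return path


# Deliberately no update() or delete(): the store only supports create and read.


def scan(table_dir, partition=None):
    """Read every file of the table (full scan) or of one partition (partition scan)."""
    base = os.path.join(table_dir, partition) if partition else table_dir
    rows = []
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            with open(os.path.join(root, name), encoding="utf-8") as f:
                rows.extend(line.rstrip("\n").split(",") for line in f)
    return rows


if __name__ == "__main__":
    table = tempfile.mkdtemp()
    insert(table, "year=2014", [(1, "Hans"), (2, "Stefan")])
    insert(table, "year=2015", [(3, "Susanne")])
    insert(table, "year=2015", [(4, "Paul")])  # second INSERT, second file
    print(len(scan(table)))               # full scan: 4 rows
    print(len(scan(table, "year=2015")))  # partition scan: 2 rows
```

The partition scan is the only way to read less than everything, which is why the choice of partition columns matters so much more than in an indexed Oracle table.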
HDFS – Command Shell
§ cat
§ chmod
§ cp
§ ls
§ put
§ …
§ https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
§ Example
$ hadoop fs -ls
Found 6 items
drwx------   - oracle supergroup          0 2014-03-19 11:11 .Trash
drwx------   - oracle supergroup          0 2014-01-24 20:15 .staging
drwxr-x---   - oracle supergroup          0 2014-01-13 00:15 moviedemo
drwx------   - oracle supergroup          0 2014-01-24 16:32 moviework
drwxr-xr-x   - oracle supergroup          0 2013-12-27 16:36 olhcache
drwxr-xr-x   - oracle supergroup          0 2013-12-27 16:36 temp_out_session
Step 1 – The Data
§ /home/oracle/Desktop/impala_test/t10.txt
§ Comma delimited
§ Flat file
§ Format the dates so they fit Impala's timestamp format
- YYYY-MM-DD HH24:MI:SS.XXXX (the fractional seconds are optional)
1,Hans,Meier,3000,1968-02-02 00:00:00,2000-01-01 00:00:00,1
2,Stefan,Müller,5000,1970-10-15 00:00:00,2001-07-01 00:00:00,1
3,Susanne,Kieser,3500,1972-03-14 00:00:00,2005-05-01 00:00:00,2
4,Paul,Steiner,4000,1960-07-28 00:00:00,2000-01-01 00:00:00,2
5,Monika,Hausmann,7000,1975-03-29 00:00:00,2000-01-01 00:00:00,3
6,Manuela,Ziegler,3700,1980-11-05 00:00:00,2010-01-01 00:00:00,4
7,Anna,Bosshard,4100,1984-11-08 00:00:00,2012-04-01 00:00:00,5
8,Armin,Studer,4900,1988-12-17 00:00:00,2013-05-22 00:00:00,3
9,Thomas,Bergmann,6000,1976-07-24 00:00:00,2012-08-15 00:00:00,5
10,Heiko,Zimmermann,4800,1955-04-21 00:00:00,2012-10-01 00:00:00,4
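Preparing such a file is usually a small script. The Python sketch below assumes, hypothetically, that the source system exports dates as `DD.MM.YYYY`; it rewrites the date columns into the `YYYY-MM-DD HH:MM:SS` form shown above and writes one comma-delimited line per record, without a header. The export format and the column positions are assumptions; adjust them to the real export.

```python
import csv
import io
from datetime import datetime

# Assumption: the export delivers dates as DD.MM.YYYY; change SOURCE_FORMAT if not.
SOURCE_FORMAT = "%d.%m.%Y"
# Impala TIMESTAMP string without fractional seconds, e.g. 1968-02-02 00:00:00
IMPALA_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_impala_ts(value):
    return datetime.strptime(value, SOURCE_FORMAT).strftime(IMPALA_FORMAT)


def convert(rows, date_columns):
    """Yield each row with the date columns (by position) reformatted for Impala."""
    for row in rows:
        yield [to_impala_ts(v) if i in date_columns else v for i, v in enumerate(row)]


def write_flat_file(rows, out):
    """Comma delimited, no header, no quoting: one line per record."""
    writer = csv.writer(out, delimiter=",", lineterminator="\n",
                        quoting=csv.QUOTE_NONE)
    writer.writerows(rows)


if __name__ == "__main__":
    exported = [["1", "Hans", "Meier", "3000", "02.02.1968", "01.01.2000", "1"]]
    buf = io.StringIO()
    write_flat_file(convert(exported, date_columns={4, 5}), buf)
    print(buf.getvalue(), end="")
```

Writing straight to `t10.txt` instead of `io.StringIO` only needs `open(path, "w", encoding="utf-8", newline="")`. `QUOTE_NONE` makes the writer raise an error if a value contains a comma, which is the behaviour you want for a plain delimited file.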
Step 2 – Get the data into HDFS
§ File copy – reference it in CREATE TABLE
- LOCATION 'hdfs://bigdatalite.localdomain:8020/user/hive/warehouse/impala_test.db/t_10'
§ CREATE TABLE – copy the file into the right directory
§ LOAD DATA
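The three ways listed above differ mainly in the order of the HDFS copy and the DDL. The Python sketch below only assembles the commands as text, so they can be reviewed or pasted into a shell and into `impala-shell`; it runs nothing. The column names and types for `t10.txt`, the staging directory and the helper names are hypothetical; the paths for `t_10` follow the slide.

```python
import posixpath

# Hypothetical column list for t10.txt (7 fields per line); names and types are assumptions.
COLUMNS = [
    ("id", "INT"), ("first_name", "STRING"), ("last_name", "STRING"),
    ("salary", "INT"), ("birth_date", "TIMESTAMP"), ("hire_date", "TIMESTAMP"),
    ("dept_id", "INT"),
]


def create_table_sql(table, location=None):
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in COLUMNS)
    sql = (f"CREATE TABLE {table} (\n  {cols}\n)\n"
           "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
           "STORED AS TEXTFILE")
    if location:
        sql += f"\nLOCATION '{location}'"
    return sql + ";"


def load_plan(option, table, local_file, table_dir, staging_dir):
    """Return (tool, text) pairs for one of the three ways on the slide."""
    file_name = posixpath.basename(local_file)
    if option == "location":  # 1) copy the file, then reference its directory
        return [("shell", f"hadoop fs -mkdir -p {table_dir}"),
                ("shell", f"hadoop fs -put {local_file} {table_dir}/"),
                ("impala", create_table_sql(table, location=table_dir))]
    if option == "copy":      # 2) create the table, then copy into its directory
        return [("impala", create_table_sql(table)),
                ("shell", f"hadoop fs -put {local_file} {table_dir}/"),
                ("impala", f"REFRESH {table};")]  # make Impala see the new file
    if option == "load":      # 3) stage the file in HDFS, LOAD DATA moves it
        return [("shell", f"hadoop fs -mkdir -p {staging_dir}"),
                ("shell", f"hadoop fs -put {local_file} {staging_dir}/"),
                ("impala", create_table_sql(table)),
                ("impala", f"LOAD DATA INPATH '{staging_dir}/{file_name}' "
                           f"INTO TABLE {table};")]
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    plan = load_plan("location", "impala_test.t_10",
                     "/home/oracle/Desktop/impala_test/t10.txt",
                     "/user/hive/warehouse/impala_test.db/t_10",
                     "/user/oracle/staging")  # hypothetical staging directory
    for tool, text in plan:
        print(f"[{tool}]\n{text}\n")
```

A table created with plain `CREATE TABLE` is managed, so `DROP TABLE` also deletes its files; `CREATE EXTERNAL TABLE ... LOCATION` leaves them in HDFS, which suits files that are delivered and kept outside Impala. After the `copy` variant, `REFRESH` makes Impala notice the new file, and `LOAD DATA INPATH` moves the staged file into the table directory rather than copying it.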
Impala – SQL – General
§ impala-shell
§ SQL
- No DUAL (a SELECT without FROM works instead)
- ANSI SQL-92, sort of
- No DELETE/UPDATE
- 1 file per INSERT
§ Different data types
§ Data dictionary
- SHOW TABLES
- DESCRIBE
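A minimal way to try these statements from a script is to call `impala-shell` with `-i` (the impalad to connect to) and `-q` (a single query). The Python sketch below only prints the command lines unless `dry_run=False` is passed. The host name and the default impala-shell port 21000 are assumptions for the Big Data Lite VM, and `impala_test.t_10` is the table from the first steps. `SELECT 1 + 1` illustrates the missing `DUAL`: Impala accepts a `SELECT` without `FROM`.

```python
import shlex
import subprocess

# Assumptions: impala-shell is on the PATH and the impalad of the Big Data Lite VM
# listens on the default impala-shell port 21000.
IMPALAD = "bigdatalite.localdomain:21000"

QUERIES = [
    "SHOW TABLES IN impala_test",             # data dictionary
    "DESCRIBE impala_test.t_10",
    "SELECT 1 + 1",                           # no DUAL: SELECT without FROM
    "SELECT COUNT(*) FROM impala_test.t_10",
]


def impala_shell(query, impalad=IMPALAD):
    """impala-shell call that runs a single statement (-q) against one impalad (-i)."""
    return ["impala-shell", "-i", impalad, "-q", query]


def run(queries, dry_run=True):
    for query in queries:
        cmd = impala_shell(query)
        print(shlex.join(cmd))  # copy-paste ready command line
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(QUERIES)  # prints only; pass dry_run=False on the VM to execute
```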
Hive – Metastore
§ Show tables
hive> SHOW tables LIKE 'dept';
OK
dept
Time taken: 0.03 seconds, Fetched: 1 row(s)
hive>
§ Describe
hive> DESCRIBE dept;
OK
deptno    int       None
dname     string    None
loc       string    None
Time taken: 0.069 seconds, Fetched: 3 row(s)
hive>
Impala – Performance
§ Statistics
- COMPUTE STATS
- SHOW TABLE STATS
- SHOW COLUMN STATS
§ Explain plan
- EXPLAIN
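The usual order is to gather statistics first and look at the plan afterwards, because the planner uses the row counts and column statistics that `COMPUTE STATS` collects. The Python sketch writes the statements into one script file and builds an `impala-shell -f` call for it. The host name, the file name and the columns `dept_id` and `salary` of `impala_test.t_10` are assumptions.

```python
import os
import tempfile

# Table from the first steps; dept_id and salary are assumed column names.
TABLE = "impala_test.t_10"

STATEMENTS = [
    f"COMPUTE STATS {TABLE};",      # table and column statistics for the planner
    f"SHOW TABLE STATS {TABLE};",   # rows, files and size (per partition, if any)
    f"SHOW COLUMN STATS {TABLE};",  # distinct values, nulls, max/avg size per column
    f"EXPLAIN SELECT dept_id, AVG(salary) FROM {TABLE} GROUP BY dept_id;",
]


def write_script(statements, directory):
    """Write the statements to one .sql file for impala-shell -f."""
    path = os.path.join(directory, "stats_and_plan.sql")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(statements) + "\n")
    return path


def shell_command(script_path, impalad="bigdatalite.localdomain:21000"):
    # Assumed host and default port; -f runs the statements of the file in order.
    return ["impala-shell", "-i", impalad, "-f", script_path]


if __name__ == "__main__":
    script = write_script(STATEMENTS, tempfile.mkdtemp())
    print(" ".join(shell_command(script)))
```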
Impala – Miscellaneous
§ Data types
- BOOLEAN
- Not VARCHAR2 but VARCHAR
§ Oracle connectors
- Use Hive
- Partition aware
§ Parquet
§ No indexes, not even Hive indexes
§ Schema possible
§ UDF – user-defined functions
- C++
- Self-written
- Impala is written in C++
§ ODBC/JDBC
- BusinessObjects
- Cognos
- Other tools
- Everyone with SQL knowledge
§ BIG
- 1 GB default file size (Parquet)
Project – Hadoop as a file store for the DWH
§ Move to Hadoop for delivered files
- Start collecting
- Files get copied into HDFS one to one
- No decisions had to be taken on schema (schema-less), table design (none), …
§ Immutable data store
- Create and read
- No update/no delete
§ Add external tables with Oracle SQL Connector for HDFS
- Data usable
§ Build Hadoop infrastructure – use Impala
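The landing step of this project can be pictured as a plain one-to-one copy into a dated directory that is never overwritten. In the Python sketch a local directory stands in for HDFS (on the cluster the copy would be a `hadoop fs -put`); the `load_date=YYYY-MM-DD` subdirectory follows the Hive partition naming convention, so the files could later be exposed as a partitioned table. The directory layout, the file name and the function name are hypothetical.

```python
import os
import shutil
import tempfile
from datetime import date


def land_file(src, landing_root, source_system, delivery_date):
    """Copy a delivered file one to one into <root>/<source>/load_date=YYYY-MM-DD/.

    Immutable store: the file is not parsed or changed, and an already
    landed file is never overwritten.
    """
    target_dir = os.path.join(landing_root, source_system,
                              f"load_date={delivery_date.isoformat()}")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(src))
    if os.path.exists(target):
        raise FileExistsError(f"already landed, not overwriting: {target}")
    shutil.copy2(src, target)  # byte-for-byte copy, no schema decision needed
    return target


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    delivered = os.path.join(work, "customers_20151121.csv")  # hypothetical delivery
    with open(delivered, "w", encoding="utf-8") as f:
        f.write("1,Hans,Meier\n")
    print(land_file(delivered, os.path.join(work, "landing"), "crm", date(2015, 11, 21)))
```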
Project – Hadoop for archiving an Oracle DB
§ Move to Hadoop for archiving
- Possible to use the data
§ Immutable data store
- Create and read
- No update/no delete
§ Add external tables with Oracle SQL Connector for HDFS
- Data usable in Oracle too
§ Build Hadoop infrastructure – use Impala
§ Next
- Analyze Oracle Big Data Appliance
- Exadata and Big Data Appliance combined => "use the same block structure"
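Before archived data can be read through external tables, the old rows have to leave the database as files. The Python sketch shows only that split: rows whose date lies before a cutoff are written, comma-delimited, into one `year=YYYY` directory per year, and the remaining rows are returned as the data that stays online. The date column position, the cutoff and the file naming are assumptions, and removing the rows from Oracle is not shown.

```python
import os
import tempfile
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # same timestamp format as the Impala flat files


def archive(rows, date_index, cutoff, archive_root):
    """Write rows older than cutoff into <archive_root>/year=YYYY/; return the rest."""
    by_year = defaultdict(list)
    keep = []
    for row in rows:
        ts = datetime.strptime(row[date_index], TS_FORMAT)
        if ts < cutoff:
            by_year[ts.year].append(row)
        else:
            keep.append(row)
    for year, year_rows in by_year.items():
        part_dir = os.path.join(archive_root, f"year={year}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, f"archive_{cutoff:%Y%m%d}.txt")
        with open(path, "x", encoding="utf-8") as f:  # "x": never overwrite an archive file
            f.writelines(",".join(r) + "\n" for r in year_rows)
    return keep


if __name__ == "__main__":
    rows = [  # records 1, 2 and 10 from t10.txt; the sixth field is the assumed date column
        ["1", "Hans", "Meier", "3000", "1968-02-02 00:00:00", "2000-01-01 00:00:00", "1"],
        ["2", "Stefan", "Müller", "5000", "1970-10-15 00:00:00", "2001-07-01 00:00:00", "1"],
        ["10", "Heiko", "Zimmermann", "4800", "1955-04-21 00:00:00", "2012-10-01 00:00:00", "4"],
    ]
    root = tempfile.mkdtemp()
    online = archive(rows, date_index=5, cutoff=datetime(2010, 1, 1), archive_root=root)
    print([r[0] for r in online], sorted(os.listdir(root)))
```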
Project – Twitter
§ 400–500 million tweets per day
§ 1 tweet contains
- Around 50 metadata items
- Geo-location
- Re-tweets
- Followers
§ That is about 2 A4 pages
§ Twitter sample stream
- 1%
- 4–5 million tweets per day
- 50 tweets per second
§ 20 other streams with defined keywords
§ HDFS
- 1 TB every 2 months including replication
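The numbers on this slide can be cross-checked with a little arithmetic. The Python sketch derives the sample-stream rate from the full daily volume and converts the HDFS figure back to raw data per day. The HDFS replication factor of 3 (the Hadoop default), the 61 days used for "2 months" and 1 TB taken as 10^12 bytes are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
REPLICATION = 3                 # assumption: HDFS default dfs.replication
DAYS_IN_TWO_MONTHS = 61         # assumption: roughly two calendar months


def tweets_per_second(tweets_per_day):
    return tweets_per_day / SECONDS_PER_DAY


def sample_per_day(all_tweets_per_day, share=0.01):
    """The sample stream delivers about 1% of all tweets."""
    return all_tweets_per_day * share


def raw_bytes_per_day(stored_bytes, days, replication=REPLICATION):
    """Remove the HDFS replication and spread the volume evenly over the period."""
    return stored_bytes / replication / days


if __name__ == "__main__":
    for per_day in (400e6, 500e6):
        sample = sample_per_day(per_day)
        print(f"{per_day / 1e6:.0f} million/day -> sample {sample / 1e6:.0f} million/day,"
              f" {tweets_per_second(sample):.0f} tweets/s")
    raw = raw_bytes_per_day(1e12, DAYS_IN_TWO_MONTHS)  # 1 TB taken as 10**12 bytes
    print(f"1 TB per 2 months incl. replication -> {raw / 1e9:.1f} GB raw per day")
```

With these inputs the 1% sample is 4 to 5 million tweets per day, about 46 to 58 tweets per second, which matches the 50 per second on the slide; 1 TB in two months with threefold replication is roughly 5.5 GB of raw data per day.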
The Lambda Architecture – adapted
[Figure: the adapted Lambda architecture. Batch layer: all data (HDFS), batch (re)compute (MapReduce), pre-computed batch views QFD1 … QFDn (QFD = query-focused data). Speed layer: process stream, realtime increment, incremented realtime views QFD1 … QFDn. Serving layer: query & merge, REST. Consumer layer: client web app. Sources and messaging: Twitter API, Java app, Kafka. Components shown: Hadoop, Storm, Impala, Cassandra.]
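The core idea of the Lambda architecture is that a query merges a view recomputed in batch from all data with a small view that the speed layer keeps incrementing for data the last batch run has not seen yet. The Python sketch uses hashtag counts as a stand-in for one query-focused data (QFD) view and plain in-memory `Counter` objects in place of Hadoop, Storm, Impala and Cassandra; the tweet structure (a dict with a `hashtags` list) is an assumption.

```python
from collections import Counter


def batch_view(all_tweets):
    """Batch layer: recompute hashtag counts from the complete master dataset."""
    return Counter(tag for tweet in all_tweets for tag in tweet["hashtags"])


class SpeedLayer:
    """Speed layer: realtime increments for tweets the last batch run has not seen."""

    def __init__(self):
        self.realtime = Counter()

    def process(self, tweet):
        self.realtime.update(tweet["hashtags"])

    def reset(self):
        # Called once a new batch view includes these tweets.
        self.realtime.clear()


def query(tag, batch, speed):
    """Serving layer: merge the pre-computed view with the realtime increments."""
    return batch[tag] + speed.realtime[tag]


if __name__ == "__main__":
    master = [{"hashtags": ["impala", "hadoop"]}, {"hashtags": ["impala"]}]
    batch = batch_view(master)
    speed = SpeedLayer()

    new_tweet = {"hashtags": ["impala"]}
    master.append(new_tweet)   # all data is appended to the master dataset (HDFS)
    speed.process(new_tweet)   # and processed by the speed layer at once
    print(query("impala", batch, speed))  # 2 from batch + 1 realtime

    batch = batch_view(master)  # next batch (re)compute absorbs the new tweet
    speed.reset()
    print(query("impala", batch, speed))  # still 3, now entirely from the batch view
```

Because the batch view is always recomputed from the immutable master data, an error in the speed layer only lives until the next batch run, which is the main reason for keeping both paths.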
Summary
§ Big Data <> Hadoop, Impala
§ Impala
- SQL engine for Hadoop
- Block size – 1 GB
- Nothing for small files
- No optimization with indexes
§ A new world
- Impala, Hive, Hadoop and its zoo
§ Lots can be done with an RDBMS
§ Start collecting now
Why Impala?
§ SQL
- ANSI SQL-92
§ No programming needed
§ Speed!
- Impala: ad hoc
- Hive: batch
§ It is IN MEMORY – a limit
- Not like pinning an object in memory in Oracle
- Data is loaded during execution
Questions?
THANK YOU.
Trivadis AG
Jan Ott
Europa-Strasse 5
CH-8152 Glattbrugg-Zurich
Tel. +41-44-808 7020 (reception)
Fax +41-44-808 7021
Trivadis @ DOAG 2015
3rd floor – next to the escalator
We look forward to your visit.
Because with Trivadis you always win :-)
Sources
§ Impala
- http://impala.io
- http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_impala_shell.html
§ Connecting Oracle to Hadoop
- https://blogs.oracle.com/bigdataconnectors/entry/how_to_load_oracle_tables
§ Books
- Big Data (MEAP) by Nathan Marz
- Getting Started with Impala by John Russell
- Learning Cloudera Impala by Avkash Chauhan
§ Pictures
- Oracle.com
- Twitter.com
- Apache.org
- Cloudera.com