Replacing Telco DB/DW to Hadoop and Hive
Description: How to migrate an Oracle DW to Hive.
JunHo Cho
Data Analysis Platform Team
Friday, July 1, 2011
• Cloud Computing Platform - Xen
• Cloud Storage Platform - Hadoop
• Massive Email Archiving Solution - Hadoop, Lucene
• Hive: social network analysis using email
• Log Archiving Solution - Hadoop
• Data Analysis - data mining, machine learning, statistics
• Data Platform - Hadoop, Lucene, Hive
• Cloud Architecture - KT Cloud

Telco Data

Telco DW & ETL
• Diagram components: Data Sources, Collect Server, Data Converting, Batch ETL, RDBMS Server, Raw Data, Summary Table, Dimension Table, Near-RT Search, OLAP
• Problems marked on the diagram: Bottleneck (four places), Availability, Scalability, Expensive

OpenSource
• Storage & Computing
• Collection
• Search
• Analysis
• Coordination

NexR Data Platform
• Platforms: Collection Platform, Search Platform, Analysis Platform
• Other diagram labels: Data Sources, HDFS, Raw Data, Index, Summary Table, Dimension Table, Real-Time & Batch Indexing, Near RT Search & Monitoring, Batch ETL, OLAP, Advanced Analytics

Hive Internal

Hive Architecture
• Components: UI, Driver, Compiler, MetaStore, Execution Engine, Hadoop
• Diagram labels: HQL, Works, Result, ORM, DDL
• Example query: select col1 from tab1 where ...
• Example result: a 123344, b 121211, c 342434

Hive Internal
• Web UI, Hive CLI, JDBC
• Hive QL: Browse, Query, DDL
• MetaStore, Thrift API
• Parser, Plan, Optimizer, Task
• Operators: TSOperator, FSOperator, SELOperator, ...
• UDF/UDAF: substr, sum, average
• SerDe, Input/OutputFormat, RCFile, User Script
• ExecMapper/ExecReducer, Map Reduce
• StorageHandler: HDFS, HBase, DB

Parser
Select col1,col2 From tab1 Where col3 > 5
AST:
TOK_QUERY
  TOK_FROM
    TOK_TABNAME
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        TOK_TMP_FILE
    TOK_SELECT
      TOK_SELEXPR
        TOK_TABLE_OR_COL
      TOK_SELEXPR
        TOK_TABLE_OR_COL
    TOK_WHERE
      >
        TOK_TABLE_OR_COL
        5
QB: tab1, insclause-0, col1, col2

Plan
Select col1,col2 From tab1 Where col3 > 5
The QB clauses map to operators:
• TOK_FROM → TableScanOperator
• TOK_WHERE → FilterOperator
• TOK_SELECT → SelectOperator
• TOK_DESTINATION → FileSinkOperator

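The operator chain for the example query can be pictured as a row pipeline. The following is a minimal Java sketch under stated assumptions: `PlanSketch`, `row` and `run` are hypothetical names, rows are modelled as maps of integer columns, and none of this is Hive's real operator code. It only illustrates which step each operator contributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the plan for:
//   Select col1,col2 From tab1 Where col3 > 5
// This is not Hive's operator implementation.
final class PlanSketch {

    // Builds one row of tab1 with three integer columns (illustrative schema).
    static Map<String, Integer> row(int col1, int col2, int col3) {
        Map<String, Integer> r = new LinkedHashMap<>();
        r.put("col1", col1);
        r.put("col2", col2);
        r.put("col3", col3);
        return r;
    }

    static List<List<Integer>> run(List<Map<String, Integer>> tab1) {
        List<List<Integer>> sink = new ArrayList<>();          // FileSinkOperator: collects the output
        for (Map<String, Integer> r : tab1) {                  // TableScanOperator: reads each row
            if (r.get("col3") > 5) {                           // FilterOperator: Where col3 > 5
                sink.add(Arrays.asList(r.get("col1"), r.get("col2"))); // SelectOperator: col1, col2
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> tab1 =
                Arrays.asList(row(1, 2, 3), row(4, 5, 6), row(7, 8, 9));
        System.out.println(run(tab1));
    }
}
```

Only the filter and the projection change the data; the scan and the sink are the two ends of the pipeline, which is why they are the operators the task rules key on.
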
Optimizer
Select col1,col2 From tab1 Where col3 > 5
• Operator tree: TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator
• tab1 {col1, col2, col3, col4, col5, col6, col7}
• ColumnPruner rules: FIL, SEL, TS, sharing a Context
• Needed columns collected in the Context: col1, col2 (SelectOperator), then col1, col2, col3 (FilterOperator)

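The column-pruning walk can be sketched in a few lines of Java. This is an illustration under assumptions, not Hive's `ColumnPruner`: the class and method names are hypothetical, and the result is taken to be the set of columns the table scan still has to read once the select and filter requirements are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of column pruning for:
//   Select col1,col2 From tab1 Where col3 > 5
final class ColumnPrunerSketch {

    // Returns the table columns that are still needed, in table order.
    static List<String> prune(List<String> tableColumns,
                              List<String> selectColumns,
                              List<String> filterColumns) {
        Set<String> needed = new LinkedHashSet<>(selectColumns); // SEL needs col1, col2
        needed.addAll(filterColumns);                            // FIL adds col3
        List<String> scan = new ArrayList<>();
        for (String c : tableColumns) {                          // TS keeps only needed columns
            if (needed.contains(c)) {
                scan.add(c);
            }
        }
        return scan;
    }

    public static void main(String[] args) {
        List<String> tab1 = Arrays.asList("col1", "col2", "col3", "col4", "col5", "col6", "col7");
        System.out.println(prune(tab1, Arrays.asList("col1", "col2"), Arrays.asList("col3")));
    }
}
```

With a columnar format such as RCFile, reading three columns instead of seven is where the saving would come from.
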
Task
Select col1,col2 From tab1 Where col3 > 5
• TaskFactory builds tasks from the QB operator tree (TableScanOperator, FilterOperator, SelectOperator, FileSinkOperator)
• Rules: TS - GenMRTableScan1, FS - GenMRFileSink1
• Resulting tasks: FetchTask, MapRedTask

Hive Internal
• Operators: TSOperator, FILOperator, FILOperator, SELOperator, FSOperator
• UDF

Oracle Migration to Hive

Oracle to Hive
• DDL → DDL
• SQL → HQL (ANSI-SQL)
• Statistic Function → Built-In/UDF/UDAF
• Analytic Function → HQL + UDF, Pig, MapReduce
• Hive: No Update, No Insert, No Low Latency

Understand Oracle SQL
• More than 3,000 ETL SQL statements
• Understand the data flow
• Group similar SQL patterns
• Investigate which Oracle functions are used

Oracle SQL

Data Model Convert (Oracle → Hive)
• Table → Table
• Partition → Partition
• Sampling → Bucket

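Mapping Oracle sampling to Hive buckets works because a bucket is a fixed slice of the rows chosen by hashing a column. The sketch below is illustrative only: `BucketSketch` and its methods are hypothetical, and Java's `hashCode` stands in for whatever hash function Hive applies to the bucketing column. Bucket numbers here are 0-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of hash bucketing: bucket = hash(key) mod numBuckets.
// Java's hashCode is an assumption standing in for Hive's own hash.
final class BucketSketch {

    // Masks the sign bit so that negative hash codes still map to a valid bucket.
    static int bucketFor(Object key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Sampling one bucket: keep only the keys that fall into the given bucket.
    static List<Integer> sample(List<Integer> keys, int bucket, int numBuckets) {
        List<Integer> out = new ArrayList<>();
        for (Integer k : keys) {
            if (bucketFor(k, numBuckets) == bucket) {
                out.add(k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(sample(ids, 0, 4));
    }
}
```

Because the bucket of a row is decided when the data is written, a sample can be served by reading one bucket instead of scanning the whole table.
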
DataType Convert (Oracle → Hive)
• NUMBER(n) → TINYINT/INT/BIGINT
• NUMBER(n,m) → FLOAT/DOUBLE
• VARCHAR2 → STRING
• DATE → STRING in “yyyy-MM-dd HH:mm:ss” format

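A DDL converter could encode this type mapping as a small function. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, the precision cut-offs (2, 9 and 18 digits) are chosen only so that every value of that precision fits the target integer type, and DOUBLE is picked arbitrarily from the FLOAT/DOUBLE pair. The slides do not state the actual thresholds used.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical Oracle-to-Hive type mapping helper.
final class TypeMappingSketch {

    // The string format the slides give for Oracle DATE values stored as Hive STRING.
    static final DateTimeFormatter HIVE_DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Maps NUMBER(precision, scale) to a Hive type name.
    // The cut-offs are assumptions based on the ranges of the integer types.
    static String numberToHive(int precision, int scale) {
        if (scale > 0) {
            return "DOUBLE";            // NUMBER(n,m): FLOAT/DOUBLE, DOUBLE chosen here
        }
        if (precision <= 2) {
            return "TINYINT";           // up to 99 fits in -128..127
        }
        if (precision <= 9) {
            return "INT";               // up to 999,999,999 fits in 32 bits
        }
        if (precision <= 18) {
            return "BIGINT";            // up to 18 digits fits in 64 bits
        }
        throw new IllegalArgumentException(
                "NUMBER(" + precision + ") may not fit BIGINT; decide case by case");
    }

    static String varchar2ToHive() {
        return "STRING";
    }

    // Renders a DATE value as the STRING Hive will store.
    static String dateToHiveString(LocalDateTime value) {
        return HIVE_DATE.format(value);
    }

    public static void main(String[] args) {
        System.out.println(numberToHive(5, 0));
        System.out.println(dateToHiveString(LocalDateTime.of(2011, 7, 1, 9, 5, 0)));
    }
}
```
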
HIVE DML
• Hive supports a subset of ANSI SQL
• Subqueries are supported only in the FROM clause
• Join queries: equi-join/inner join, outer join, self join

IN Clause
IN SubQuery (Oracle):
SELECT * FROM Employee e WHERE e.DeptNo
IN (SELECT d.DeptNo FROM Dept d)
Hive:
SELECT * FROM Employee e
LEFT SEMI JOIN Dept d ON (e.DeptNo=d.DeptNo)

NOT IN Clause
NOT IN SubQuery (Oracle):
SELECT * FROM Employee e WHERE e.DeptNo
NOT IN (SELECT d.DeptNo FROM Dept d)
Hive:
SELECT e.* FROM Employee e
LEFT OUTER JOIN Dept d ON (e.DeptNo=d.DeptNo)
WHERE d.DeptNo IS NULL

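The two rewrites can be checked on small in-memory data. The sketch below is a hypothetical Java model (each employee row is reduced to its DeptNo; class and method names are illustrative) of what LEFT SEMI JOIN and LEFT OUTER JOIN with an IS NULL filter keep. One caveat worth testing during migration: SQL NOT IN returns no rows when the subquery yields a NULL, and never returns a row whose own DeptNo is NULL for a non-empty subquery, whereas the outer-join form keeps every unmatched row, so the rewrite is only equivalent when the columns contain no NULLs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the IN / NOT IN rewrites; rows are reduced to their DeptNo.
final class InAsJoinSketch {

    // LEFT SEMI JOIN: keep an employee row if at least one Dept row matches.
    // Duplicate Dept rows do not duplicate the employee row.
    static List<Integer> leftSemiJoin(List<Integer> employeeDeptNos, List<Integer> deptNos) {
        Set<Integer> depts = new HashSet<>(deptNos);
        List<Integer> out = new ArrayList<>();
        for (Integer d : employeeDeptNos) {
            if (d != null && depts.contains(d)) {   // NULL keys never match in an equi-join
                out.add(d);
            }
        }
        return out;
    }

    // LEFT OUTER JOIN ... WHERE d.DeptNo IS NULL: keep employee rows with no match.
    static List<Integer> antiJoin(List<Integer> employeeDeptNos, List<Integer> deptNos) {
        Set<Integer> depts = new HashSet<>(deptNos);
        List<Integer> out = new ArrayList<>();
        for (Integer d : employeeDeptNos) {
            if (d == null || !depts.contains(d)) {  // unmatched rows, including NULL DeptNo
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> employees = Arrays.asList(10, 20, 30, null);
        List<Integer> depts = Arrays.asList(10, 10, 30);
        System.out.println(leftSemiJoin(employees, depts));
        System.out.println(antiJoin(employees, depts));
    }
}
```
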
JOIN Operator
JOIN (Oracle):
SELECT *
FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id
Hive:
SELECT *
FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)

Oracle Function

Functions (Oracle → Hive)
• Math Function: round, ceil, mod, power, sqrt, sin/cos → round, ceil, pmod, power, sqrt, sin/cos
• Character Function: substr, trim, lpad/rpad, ltrim/rtrim, replace → substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace
• NULL Function: coalesce, nvl, nvl2 → coalesce (No NVL, NVL2)

Custom UDF Function
• Condition Function: DECODE, GREATEST
• Null Comparison Function: NVL / NVL2
• Type Conversion: TO_NUMBER, TO_CHAR, TO_DATE
• INSTR4
• DATE_FORMAT
• LAST_DAY

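The evaluation logic behind a few of these custom functions is small. The sketch below shows plain Java methods that a UDF class could wrap; it deliberately leaves out the Hive UDF base class and registration so that it stays self-contained, and the class and method names are hypothetical. The behaviour follows the Oracle definitions of NVL, NVL2, DECODE (where NULL matches NULL) and LAST_DAY, not the authors' actual implementation, which the slides do not show.

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Objects;

// Hypothetical core logic for some Oracle functions missing from Hive.
final class OracleFunctionSketch {

    // NVL(value, default): the default replaces a NULL value.
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }

    // NVL2(value, ifNotNull, ifNull): picks a result by NULL-ness of value.
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    // DECODE(expr, search1, result1, ..., [default]).
    // Objects.equals treats two NULLs as equal, as Oracle DECODE does.
    static Object decode(Object expr, Object... args) {
        int i = 0;
        for (; i + 1 < args.length; i += 2) {
            if (Objects.equals(expr, args[i])) {
                return args[i + 1];
            }
        }
        return i < args.length ? args[i] : null;   // trailing default, else NULL
    }

    // LAST_DAY(date): last day of the month containing the date.
    static LocalDate lastDay(LocalDate date) {
        return date.with(TemporalAdjusters.lastDayOfMonth());
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "unknown"));
        System.out.println(decode(2, 1, "one", 2, "two", "other"));
        System.out.println(lastDay(LocalDate.of(2011, 7, 1)));
    }
}
```
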
Oracle Analytic Function

Analytic Function
RANK (Oracle):
SELECT name,dept,salary,RANK() OVER (PARTITION BY dept
ORDER BY salary DESC) FROM emp
Hive:
SELECT e.name,e.dept,e.salary,RANK(e.dept,e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e
RANK(arg1,arg2) - Custom UDF

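A custom RANK(arg1,arg2) of this kind has to be stateful: it relies on rows arriving grouped by the first argument and sorted by the second, which is what the distribute/sort subquery provides. The sketch below is a guess at the core logic in plain Java, without the Hive UDF base class; `RankSketch` and `evaluate` are hypothetical names, and the slides do not show the real implementation.

```java
// Hypothetical stateful rank logic; input must arrive sorted by key, then by value.
final class RankSketch {

    private String lastKey;        // partition key of the previous row (dept)
    private long lastValue;        // ordering value of the previous row (salary)
    private int rowInPartition;    // 1-based position inside the current partition
    private int rank;              // rank assigned to the previous row

    // Returns the rank of the current row within its partition.
    // Ties share a rank and leave a gap afterwards, like Oracle RANK().
    int evaluate(String key, long value) {
        if (!key.equals(lastKey)) {
            // First row of a new partition.
            lastKey = key;
            rowInPartition = 1;
            rank = 1;
        } else {
            rowInPartition++;
            if (value != lastValue) {
                rank = rowInPartition;
            }
        }
        lastValue = value;
        return rank;
    }

    public static void main(String[] args) {
        RankSketch r = new RankSketch();
        System.out.println(r.evaluate("sales", 300));
        System.out.println(r.evaluate("sales", 300));
        System.out.println(r.evaluate("sales", 200));
        System.out.println(r.evaluate("dev", 500));
    }
}
```

The state lives in one object per task, so the result is only correct if all rows of a department reach the same reducer in sorted order.
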
Analytic Aggregation Function
MIN (Oracle):
SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp
Hive (Aggregation + JOIN):
SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON emp.dept = tmp.dept

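The aggregation-plus-join rewrite computes the per-department minimum once and then attaches it to every row. A minimal Java sketch of those two steps follows; the class name, the method name and the "dept=min" output format are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of MIN(salary) OVER (PARTITION BY dept) done as GROUP BY plus JOIN.
final class MinOverPartitionSketch {

    // depts.get(i) and salaries.get(i) together form row i of emp.
    static List<String> minPerRow(List<String> depts, List<Integer> salaries) {
        // Step 1, the subquery: SELECT dept, MIN(salary) m FROM emp GROUP BY dept
        Map<String, Integer> minByDept = new HashMap<>();
        for (int i = 0; i < depts.size(); i++) {
            minByDept.merge(depts.get(i), salaries.get(i), Integer::min);
        }
        // Step 2, the join: every emp row picks up the minimum of its dept
        List<String> out = new ArrayList<>();
        for (String dept : depts) {
            out.add(dept + "=" + minByDept.get(dept));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(minPerRow(Arrays.asList("A", "A", "B"), Arrays.asList(300, 200, 500)));
    }
}
```

The output has one entry per input row, as the analytic form does, rather than one per department.
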
Hive Internal

Merge Join Tree Bug
• select * from a join b on a.v1 = b.v1 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join e on a.v2 = e.v2
  MapReduce #3
• select * from a join e on a.v2 = e.v2 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join b on a.v1 = b.v1
  MapReduce #2

Merge Join Tree Bug Fix
• SemanticAnalyzer
  private void mergeJoinTree(QB qb) {
    QBJoinTree root = qb.getQbJoinTree();
    QBJoinTree parent = null;
    while (root != null) {
      boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
      if (parent == null) {
        if (merged) {
          root = qb.getQbJoinTree();
        } else {
          parent = root;
          root = root.getJoinSrc();
        }
      } else {
        parent = parent.getJoinSrc();
        root = parent.getJoinSrc();
      }
    }
  }
• Fix: the final else branch also checks merged
      } else {
        if (merged) {
          root = qb.getQbJoinTree();
        } else {
          parent = parent.getJoinSrc();
          root = parent.getJoinSrc();
        }
      }

New HQL Syntax
INSERT INTO
INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ...
• INSERT [OVERWRITE] destination
• grammar
• modify FileSinkPlan
• New Feature - HIVE-306
• INSERT INTO destination

Tuning
• Hadoop Tuning
  • mapred.job.reuse.jvm.num.tasks
  • mapred.child.java.opts
  • mapred.min.split.size / mapred.max.split.size
  • dfs.block.size
• Hive Tuning
  • hive.input.format = CombineHiveInputFormat
  • query tuning - reduce the number of MapReduce jobs using the HQL plan

Wrap-Up
Oracle 2 Hive
• Gain insight into the data flow & model
• Modify Oracle SQL to Hive query syntax
• Use built-in functions
• Develop custom UDF/UDAF/UDTF
• Support analytic functions
  - distribute by + sort by + udf
  - join + udf (aggregation)
• Modify Hive internals
• Hadoop + Hive tuning

Question ?