Session: A-1 Company: EMC Taiwan Speaker: 李百飛, Business Development Director · sql server...
TRANSCRIPT
Big Data Sources and Data Types
Information in the enterprise will grow 50X in the next 10 years
• Structured data (database form)
• Unstructured data (file form)
Emerging IT Technologies for Big Data
• Hadoop
• NoSQL DB
• MapReduce
• Visual Discovery
• Predictive Analytics
• Streaming Data Technology
• In-Memory Computing
• EDW
• Grid & Clustering
• Scale-out
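Another Thorny Problem in Big Data Applications
Data is scattered everywhere (data silos)
• Data resides both inside and outside the enterprise
• Each data system operates independently
• Existing IT architectures are difficult to integrate
• Sharing efficiency is extremely poor
• There is no holistic view of the data
• System upgrades are expensive
Today's programs are siloed and too expensive for the enterprise to maintain. They are also often confusing for the constituent to access and understand!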
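Key Success Factors for Big Data Applications
Big Data Value (Business Impact) = Velocity * Cost-
Velocity (the faster the better) =
  Real-time data processing (OLTP: Fast Data)
  + Timely data delivery (ETL: Time Overhead)
  + Real-time data insight (OLAP: Fast Analytics)
Cost- (the lower the better) =
  Ongoing operating cost (HA, Downtime, Backup, DR, Support)
  + Excess cost burden (ETL, Server, Network, Storage, License)
  + Data management risk (Risk: Data Governance)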
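Structured data (database form) and unstructured data (file form): direct cross access
• Hadoop • MapReduce • NoSQL DB • Index • OLAP • EDW
• x86 or VM: general-purpose hosts & cloud architecture
• Grid & Cluster: high performance & high availability
• Scale-out: grow from small to very large
• Standard ANSI SQL support
• JDBC / ODBC: integration with custom applications
• Integration with a wide range of BI tools
• Integration of advanced statistical analysis algorithms: open source
• MPP: massively parallel analytic processing architecture
• Collaboration: support for collaborative work
• MPP: massively parallel data loading architecture (ETL)
• Combines real-time processing with historical data analysis
Application layer (Open Source)
Infrastructure (Open Arch.: Cloud, HA, Backup, DR total solution)
Advantages and benefits
• Cost savings, support for business operations, easy to expand, easy to manage, EMC vendor support
• Open platform, standardized, easy to adopt, low learning curve, low cost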
A Complete Enterprise Big Data Application Architecture
EMC, VMware & Pivotal joint strategy map: the integrated ITaaS architecture required for Cloud & Big Data
[Diagram labels]
• Themes: Scale-out, Virtualization, Standardization, Automation; Automation, Efficiency; Open Source, Biz Operation
• PaaS (Big/Fast Data): FAST Data, FAST Analytics; SQLfire, GEMfire, vFabric; GPDB, GPHD
• IaaS (SD-Datacenter): SD-Server (VM, x86, vSphere); SD-Network (Nicira); SD-Storage (ViPR)
• Storage Infrastructure: VMAX, VNX, Isilon, Atmos; Xtrem SW/SF; Flash; VAAI, VASA, REST, HDFS
• Services: HA, QoS, Backup, DR, DA, CDP/CRR, Mgt, Security
• Other Cloud & Big Data Solution
Big Data Analytics Case Study:
EMC Customer Quality department disk reliability improvement program: large performance gain and much lower complexity

                 Oracle    EMC Greenplum
Lines of code:   2000+     210
Run time*:       5 days    < 5 mins

• Performance gains are mainly due to the MPP (massively parallel processing) architecture of Greenplum.
• But they are also due to improved SQL code. The previous code had:
  – No window functions
  – No nested queries to reshape data
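The point about window functions and nested queries can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Greenplum, and the table `drive_errors` and its columns are hypothetical, not taken from the case study. It assumes SQLite 3.25 or later, which is needed for window-function support.

```python
import sqlite3

# Hypothetical data: cumulative error counts per disk drive per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drive_errors (drive_id TEXT, day INTEGER, errors INTEGER)")
conn.executemany(
    "INSERT INTO drive_errors VALUES (?, ?, ?)",
    [("d1", 1, 0), ("d1", 2, 3), ("d1", 3, 9),
     ("d2", 1, 1), ("d2", 2, 1), ("d2", 3, 2)],
)

# The inner (nested) query reshapes the data into day-over-day deltas with
# the LAG window function; the outer query aggregates those deltas per drive.
query = """
SELECT drive_id, MAX(delta) AS worst_daily_increase
FROM (
    SELECT drive_id,
           errors - LAG(errors) OVER (PARTITION BY drive_id ORDER BY day) AS delta
    FROM drive_errors
) AS per_day
WHERE delta IS NOT NULL
GROUP BY drive_id
ORDER BY drive_id
"""
worst_increase = conn.execute(query).fetchall()
print(worst_increase)
```

Without a window function, the same delta would typically need a self-join or procedural row-by-row code, which is one plausible way a query set grows to thousands of lines.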
EMC IT mission-critical application virtualization case: SAP ERP system with 1,000,000 SAPS capacity; Production: 56 TB
SAP Architecture
Virtualized on Vblock (VCE: VMware + Cisco + EMC)
• SAP Apps: Supplier Relationship Management (SRM), Business Warehouse (BW), Supply Chain Management (SCM), Business Planning and Consolidation (BPC), ERP Central Component (ECC), Financial Supply Chain Management (FSCM)
• Database (SAP + AIC): Oracle Non-RAC, SQL Server
• Application Integration Cloud (AIC): Hyperic, Actional, Gemfire Data Fabric, Spring Integration, Sonic MQ, Contivo, Spring Batch, iWay Adapters, Spring-based WS
• Legacy Apps: 50+ legacy systems
Best-of-Breed Technology Components
• Over 470 virtual hosts built to support 9 SAP landscapes
  – 370 SAP
  – 100+ AIC
• New AIC architecture reviewed and validated by VMware
Performance:
– Simulated end-to-end transaction testing at 2.5X peak volumes
– Successfully processed 3,000 orders in < 10 hours (10X throughput vs. today)
– 448,000 SAPS in production
– 1 million SAPS capacity installed across 2 Vblocks
High Availability / DR Testing:
– Architecture validated for HA at single points of failure
– RPO: zero data loss
– RTO < 4 hours
TECHNOLOGY SHOWCASE
EMC
• VMAX / FAST (SAN disk storage system)
• VNX (SAN + NAS disk storage system)
• PowerPath (I/O path HA and optimization)
• SRDF (DR data replication, ~1,000 km)
• NetWorker (backup software)
• Data Domain (virtual tape library)
• Avamar (data backup)
• RSA Access Manager (identity authentication)
• RSA enVision (log and security management)
• RSA Archer (security policy and dashboards)
VCE (VMware, Cisco, EMC)
• Vblock Series 700
VMware
• ESXi, vSphere, vFabric
• Site Recovery Manager
Pivotal
• tc Server
• SpringSource (Java framework)
• Gemfire (in-memory NoSQL DB architecture)
• Greenplum (Big Data analytics)
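Online Ticket Booking: Remaining-Ticket and Order Query System Solution
Original IT system structure
• Central database on a minicomputer
• Database minicomputers, N > 5
• Database minicomputers, M > 50
• Branch offices, N > 15
• Real-time data replication between the database tiers
Data offload: cloud application system design structure
• Web & App Servers, N > 100: Web server cluster and application server cluster
• Database (x86): SQL statement extraction
• RabbitMQ (x86) cluster: data synchronization
• Gemfire server (x86) cluster, > 5 nodes: data partitioning, distributed parallel computing
• Data aggregation
• Real-time data flow

Before and after adopting the "data offload" cloud application virtualization approach
Original technical design:
• A single query took about 15 seconds, with results up to 10 minutes out of date
• High-volume concurrent queries could not be supported except by splitting the database
• Under extreme concurrent load the system could not cope
• Ran on Unix hosts
With data offload and cloud application virtualization:
• The slowest single remaining-ticket query takes 150-200 ms
• The fastest single query takes 1-2 ms
• Real-time data changes are synchronized within seconds
• Supports tens of thousands of concurrent queries per second, with elastic on-demand scaling
• Runs on a Linux x86 server cluster
Online ticket booking: actual operating data of the remaining-ticket query system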
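The data offload idea (changes flow from the database through a message queue into an in-memory store, and queries are answered from memory) can be sketched as follows. This is a minimal single-process illustration and not the RabbitMQ or Gemfire API; the class `InMemoryTicketStore`, the function `drain`, and the train numbers are all hypothetical.

```python
import queue


class InMemoryTicketStore:
    """Hypothetical stand-in for the in-memory data grid."""

    def __init__(self):
        self._remaining = {}

    def apply(self, change):
        # Each change carries the latest remaining-seat count for one train.
        train, seats = change
        self._remaining[train] = seats

    def query(self, train):
        # Queries never touch the database; they read from memory only.
        return self._remaining.get(train, 0)


def drain(change_queue, store):
    """Apply every queued change to the store; return how many were applied."""
    applied = 0
    while True:
        try:
            change = change_queue.get_nowait()
        except queue.Empty:
            return applied
        store.apply(change)
        applied += 1


# Stand-in for the message queue that carries changes out of the database.
changes = queue.Queue()
for change in [("G101", 120), ("G102", 80), ("G101", 119)]:
    changes.put(change)

store = InMemoryTicketStore()
applied = drain(changes, store)
remaining_g101 = store.query("G101")
print(applied, remaining_g101)
```

Because reads are served from memory and only changes travel through the queue, query latency no longer depends on the load on the central database.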
EMC/Pivotal big data analytics platform provides efficient massively parallel processing: Parallel Data Load
[Diagram: Client; Master; Segment, Segment, Segment, Segment, …; ETL Host, ETL Host, …, each running gpfdist]
Sample rows in the data files on the ETL hosts:
tom Jerry 123
joe blow 456
larr white 789
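The parallel load shown in the diagram can be approximated with a small simulation: every segment works at the same time and keeps only the rows that hash to it. This is a simplified sketch and not the gpfdist protocol; the hash-modulo placement, the choice of the last field as distribution key, and the helper names `segment_for` and `load_segment` are assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Sample rows taken from the slide; the last field is treated as the key.
ROWS = ["tom Jerry 123", "joe blow 456", "larr white 789"]
NUM_SEGMENTS = 4


def segment_for(row, num_segments):
    """Pick a segment from a stable hash of the (assumed) distribution key."""
    key = row.split()[-1]
    return zlib.crc32(key.encode("utf-8")) % num_segments


def load_segment(segment_id, rows, num_segments):
    """Simulate one segment pulling only the rows that belong to it."""
    return [row for row in rows if segment_for(row, num_segments) == segment_id]


# All segments load concurrently instead of funnelling rows through the master.
with ThreadPoolExecutor(max_workers=NUM_SEGMENTS) as pool:
    futures = [
        pool.submit(load_segment, seg, ROWS, NUM_SEGMENTS)
        for seg in range(NUM_SEGMENTS)
    ]
    loaded = [future.result() for future in futures]

total_loaded = sum(len(part) for part in loaded)
print(loaded, total_loaded)
```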
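EMC/Pivotal big data analytics platform provides efficient massively parallel processing: Query
[Diagram: a Query arrives at the Master Segment Processor and the sql is sent through the Interconnect Switch to every segment]
• Master Segment Processor
• Interconnect (Interconnect Switch)
• Independent Segment Processors: seg1, seg2, seg3, seg4 (x86 hosts)
• Independent Memory
• Independent Direct Storage Connection
• Storage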
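The query flow in this diagram follows a scatter-gather pattern: the master sends the same work to every segment, each segment computes a partial result on its own data, and the master combines the partials. The sketch below assumes a simple SUM/COUNT/AVG query; the segment contents and the names `segment_partial` and `master_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data already distributed across four independent segments.
SEGMENTS = {
    "seg1": [120, 80],
    "seg2": [45],
    "seg3": [300, 10, 5],
    "seg4": [],
}


def segment_partial(values):
    """Each segment aggregates only its local data: (sum, count)."""
    return (sum(values), len(values))


def master_query(segments):
    """Scatter the work to all segments in parallel, then gather and combine."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_partial, segments.values()))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    average = total / count if count else None
    return total, count, average


total, count, average = master_query(SEGMENTS)
print(total, count, average)
```

The average is derived from the combined sum and count rather than by averaging per-segment averages, which would be wrong when segments hold different numbers of rows.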
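EMC/Pivotal big data analytics platform provides scale-out and dynamic online expansion
Step 1: New nodes are initialized and join the MPP cluster
Step 2: Data is redistributed across all nodes
• Data is automatically redistributed across all nodes at the time scheduled by the system administrator
• Capacity and performance grow linearly after expansion
[Diagram: Master; interconnect; seg1, seg2, seg3, seg4, seg5, seg6 (x86 hosts)]
EMC Analytics Lab (1,000 x86 hosts, Grid & Clustering)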
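Step 2 (redistribution after new nodes join) can be sketched under the assumption that rows are placed by hashing a key modulo the number of segments. That placement rule is an assumption for illustration, not a statement of how the product redistributes data, and the names `segment_for` and `distribute` are hypothetical.

```python
import zlib


def segment_for(key, num_segments):
    """Stable hash-modulo placement (an assumed rule, for illustration only)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(keys, num_segments):
    """Return a mapping of segment number to the keys stored on it."""
    placement = {seg: [] for seg in range(num_segments)}
    for key in keys:
        placement[segment_for(key, num_segments)].append(key)
    return placement


keys = list(range(1000))
before = distribute(keys, 4)  # original cluster: 4 segments
after = distribute(keys, 6)   # after expansion: 6 segments

# Rows whose segment changes must be moved during redistribution.
moved = sum(1 for key in keys if segment_for(key, 4) != segment_for(key, 6))

print({seg: len(rows) for seg, rows in before.items()})
print({seg: len(rows) for seg, rows in after.items()})
print("rows moved:", moved)
```

With plain hash-modulo placement a large share of rows move when the segment count changes, which is one reason to schedule the redistribution for a quiet period.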
Pivotal HAWQ: SQL Benchmarks

Chart 1
Pivotal HD   Baseline   Improve
4.2          198        47X
8.7          161        19X
2.0          415        208X
2.7          1,285      476X
2.8          1,815      648X

Chart 2
Pivotal HD   Baseline   Improve
4.2          37         9X
8.7          596        69X
2.0          50         25X
2.7          55         20X
2.8          59         21X
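The "improve" figures are consistent with dividing the larger number in each pair by the smaller one and rounding. The check below assumes the smaller value in each pair is the Pivotal HD (HAWQ) result and the larger one is the system it was compared against; the transcript does not name that system or the unit.

```python
# Values as they appear on the slide; the column roles are an assumption.
hawq = [4.2, 8.7, 2.0, 2.7, 2.8]
chart_1_baseline = [198, 161, 415, 1285, 1815]
chart_2_baseline = [37, 596, 50, 55, 59]


def speedups(baseline, fast):
    """Speedup factor = baseline value / faster value, rounded to an integer."""
    # round() uses round-half-to-even, so 207.5 rounds to 208.
    return [round(slow / quick) for slow, quick in zip(baseline, fast)]


chart_1 = speedups(chart_1_baseline, hawq)
chart_2 = speedups(chart_2_baseline, hawq)
print(chart_1)
print(chart_2)
```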
The EMC XtremSF Family
• XtremSF 2200: 2.2 TB
• XtremSF 1400: 1.4 TB
• XtremSF 700: 700 GB
• XtremSF 550: 550 GB
All cards are HHHL (half-height, half-length): highest density in the industry
Up to 1,130,000 IOPS
Protect in-memory DB
EMC Isilon cloud storage solution: advantages and cases
• Supports native CIFS/NFS/HDFS
• Supports 85.8B files in one filesystem
• Capacity record: 20 PB in one filesystem
• Performance record: 1.6M IOPS over CIFS
• Auto ILM for inactive data
• Online balancing of capacity and performance
• Addresses the hardware EOSL challenge without data migration
• Simple and easy to use
Isilon scale-out big data storage platform with HDFS + CIFS + NFS innovation
1. Eliminates the data load process; 2. Improves HA; 3. Shortens time to analytics; 4. Lowers cost
[Diagram: Standard vs. Advanced architecture]
• Standard: data sources (generation-1, generation-2, … generation-x) on the LAN write over CIFS/NFS to NAS 1, NAS 2, NAS 3, NAS 4, … NAS n; data is then loaded into HDFS on x86 servers; 4 copies; SPOF
• Advanced: data sources (generation-1, generation-2, … generation-x) on the LAN access Isilon Storage over CIFS, NFS and HDFS; 1 copy; 1/3 the number of x86 servers
• Capacity labels in the diagram: 1PB, 3PB, 1PB
EMC big data business operations platform: fully integrated
[Diagram labels]
• OLTP LAN: VPLEX; DB, Web, AP (two sets)
• OA LAN: users, OA systems, files
• Scale-out NAS (Isilon): CIFS/NFS, HDFS; Log Collector
• Scale-out Storage (VMAX): SAN, block, log
• Big data analytics platform (x86 grid): structured data on GPDB (x86 × 3); unstructured data on Hadoop (x86 × 3)
• OLAP LAN: DB, ETL, BI, KM, RPT
• De-dupe Backup
[Diagram: the platform deployed across two data centers, site 1 and site 2, Active-Active]
• At each site: VPLEX; OLTP LAN with DB, Web, AP; OA LAN with users, OA systems and files; Scale-out NAS (Isilon) with CIFS/NFS, HDFS and Log Collector; Scale-out Storage (VMAX) with SAN, block and log; big data analytics platform (x86 grid) with structured data on GPDB and unstructured data on Hadoop; OLAP LAN with DB, ETL, BI, KM, RPT; De-dupe Backup
• Between the sites: DB clustering; IP; FC, DWDM; SIEM sync; replication
• At each site: Data Domain; GPHD/GPMR; RSA enVision
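Big data is coming: EMC boosts your performance and efficiency
• Pivotal + EMC integration delivers FAST data with no loss of in-memory data
• One storage investment
  – For both Production and Analytics
• One backup investment
  – For both Production and Analytics
• One DR investment (storage & bandwidth)
  – For both Production and Analytics
• One management investment
• Saves ETL time and IT investment
• x86 Grid & Clustering
• Flexibility and choice
• Vendor support
[Further inquiries] [email protected]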
Isilon supports native NFS/CIFS/HDFS: Accelerating Enterprise Hadoop Adoption
1. Scale-Out Storage Platform
   – Multiple applications & workflows
2. No Single Point of Failure
   – Distributed Namenode
3. End-to-End Data Protection
   – SnapshotIQ, SyncIQ, NDMP Backup
4. Industry-Leading Storage Efficiency
   – >80% Storage Utilization
5. Independent Scalability
   – Add compute & storage separately
6. Multi-Protocol
   – Industry standard protocols
   – NFS, CIFS, FTP, HTTP, HDFS
[Diagram: data sources; Isilon with multiple namenodes; computing nodes]