modern data architecture
Post on 17-Jul-2015
a (not so) long time ago in a galaxy far, far away…
The most complex system you had to handle was an old AS/400
Publishing data weekly was adequate for most users
Analysts didn’t really exist
Most queries took hours to run – but that was ok
© the DataShed Limited 2015
data explosion
Source(s): CSC: http://www.csc.com/insights/flxwd/78931-big_data_universe_beginning_to_explode Gartner: http://www.gartner.com/technology/research/it-spending-forecast/
Growth between 2010 and 2020:
Data: 500%
Budget: 16%
[Chart: Global Data Volume (Zettabytes) and Global IT Budget ($ Trillion), 2010 to 2020, with exponential trend lines for Data Growth and IT Budget Growth]
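To make the gap between the two growth figures concrete, the sketch below converts each ten-year total into an implied compound annual rate. It assumes "500%" means the volume grows by 500% of its 2010 value (a 6x final volume) and that growth compounds evenly each year; neither assumption is stated on the slide, and `annual_rate` is a hypothetical helper name.

```python
def annual_rate(total_growth: float, years: int) -> float:
    """Compound annual growth rate implied by a total fractional growth.

    total_growth is the increase as a fraction of the starting value,
    so 5.00 means +500% (an assumed reading of the slide's figure).
    """
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


YEARS = 10  # 2010 to 2020, as on the slide

data_rate = annual_rate(5.00, YEARS)    # Data: 500%
budget_rate = annual_rate(0.16, YEARS)  # Budget: 16%

print(f"data:   {data_rate:.1%} per year")
print(f"budget: {budget_rate:.1%} per year")
```

Under these assumptions data grows by roughly a fifth every year while the budget grows by well under 2% a year, which is the squeeze the chart illustrates.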
Hadoop & Big Data tools
First incarnation in 2005
Highly-scalable data processing, based on a distributed file system (HDFS)
Ability to handle PB size workloads
Becoming more mature – including:
ANSI-SQL compliant data warehousing tools (Hive & Stinger.next)
Batch processing (MapReduce/Tez, Pig)
Operations management (Ambari)
Security (Knox)
Governance (HCatalog)
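As an illustration of the batch-processing model named above, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is plain Python and does not use any Hadoop API; the function names are hypothetical, and on a real cluster each phase would run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data tools", "big data principles"])
print(counts)
```

Because the map and reduce steps only ever see one line or one key at a time, the same logic can be spread over many machines, which is what makes the model scale.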
too small?
Small cluster: 5 – 50 nodes
Assuming a single node:
24GB RAM
Single socket quad core
4-6 SATA drives of 2 TB each
Total storage ≈ 10 TB
How many of us need to process 10TB of data?
Source: http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.2/bk_cluster-planning-guide/content/conclusion.html
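The per-node figure above can be scaled up to a whole cluster with some simple arithmetic. The sketch below assumes 5 drives per node (the middle of the 4-6 range) and a replication factor of 3, which is the HDFS default but is not mentioned on the slide; it ignores space reserved for the operating system and temporary data. `cluster_capacity_tb` is a hypothetical helper.

```python
def cluster_capacity_tb(nodes, drives_per_node=5, drive_tb=2.0, replication=3):
    """Return (raw, usable) storage in TB for a cluster.

    replication=3 is an assumption (the HDFS default); usable capacity
    is raw capacity divided by the number of copies kept of each block.
    """
    raw = nodes * drives_per_node * drive_tb
    usable = raw / replication
    return raw, usable


for nodes in (1, 5, 50):
    raw, usable = cluster_capacity_tb(nodes)
    print(f"{nodes:>2} nodes: {raw:6.1f} TB raw, {usable:6.1f} TB usable")
```

Even the smallest cluster in the quoted range holds tens of terabytes, which puts the question on the slide in perspective.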
if not Hadoop, then what?
Big Data has driven innovation in both technology and tools.
If you can’t adopt the tools, you can still adopt some of the principles:
Design for scale-out, rather than up.
ELT vs ETL
Lambda data architecture
… amongst other things!
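One of the principles listed above, the Lambda data architecture, can be sketched in a few lines: a batch layer periodically recomputes a view from the full event log, a speed layer keeps incremental results for events that arrived since the last batch run, and queries merge the two. The class and method names below are hypothetical, and page-view counting is just an assumed example workload.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda architecture that counts events per key."""

    def __init__(self):
        self.master = []             # append-only log of every event
        self.batch_view = Counter()  # rebuilt from the master log by run_batch
        self.speed_view = Counter()  # incremental counts since the last batch

    def ingest(self, key):
        """New events go to the master log and to the speed layer."""
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Batch layer: recompute the view from scratch, then reset the speed layer."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view with the recent increments."""
        return self.batch_view[key] + self.speed_view[key]


views = LambdaCounter()
for page in ("home", "home", "about"):
    views.ingest(page)
views.run_batch()
views.ingest("home")  # arrives after the batch run
print(views.query("home"))
```

The design choice is that the slow, simple batch job is the source of truth, while the speed layer only has to be correct for the short window between batch runs.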
prepare to scale out
[Diagram 1: scaled-out layers. The Data Integration and Data Marts & Cubes layers appear three times in parallel.]
Data Storage: CRM, ERP, Transactional System
Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator)
Data Marts & Cubes: Specific, small data marts
Business Intelligence Apps
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
[Diagram 2: layers built on a single EDW]
Data Storage: CRM, ERP, Transactional System
Data Integration: ETL Tools (SSIS, Informatica, Scripting, Oracle Data Integrator)
Data Marts & Cubes: Data marts constructed on top of an EDW. Cubes present views of this data to business users.
Business Intelligence Apps
Executives: Dashboards
Managers & Stakeholders: Reports
Business/Data Analysts: Cubes & Direct Access
ELT vs ETL
Key considerations:
Metadata & data lineage
How real-time is real-time?
How long does it take you to get data to analysts?
How powerful is your presentation server?
Can you use both?
Schema on read? Or schema on write?
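The schema question above can be made concrete with a small sketch. It loads the same records twice into an in-memory SQLite database: once parsed and typed before loading (schema on write, the ETL style) and once as raw text that is interpreted only at query time (schema on read, the ELT style). The sample records, table names and `total_on_read` helper are invented for illustration.

```python
import json
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
raw_events = [
    '{"customer": "acme", "amount": "12.50", "channel": "web"}',
    '{"customer": "globex", "amount": "7.25"}',
]

db = sqlite3.connect(":memory:")

# Schema on write (ETL): transform first, then load into a typed table.
db.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)")
for line in raw_events:
    record = json.loads(line)
    db.execute(
        "INSERT INTO sales VALUES (?, ?)",
        (record["customer"], float(record["amount"])),
    )

# Schema on read (ELT): load the raw payload untouched, transform later.
db.execute("CREATE TABLE raw_sales (payload TEXT)")
db.executemany("INSERT INTO raw_sales VALUES (?)", [(line,) for line in raw_events])


def total_on_read(conn):
    """Apply the schema at query time by parsing each stored payload."""
    rows = conn.execute("SELECT payload FROM raw_sales ORDER BY rowid")
    return sum(float(json.loads(payload)["amount"]) for (payload,) in rows)


total_on_write = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total_on_write, total_on_read(db))
```

The typed table is cheaper to query but has already discarded the `channel` field; the raw table keeps everything and can be used the moment the data lands, at the cost of parsing on every read.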
You don’t need big data to use Big Data tools
Example: Prediction.io (http://prediction.io/)
Open-source machine learning server that uses Hadoop, HBase, Spark and Elasticsearch