Teradata
3 > 09/2009 Copyright Teradata © 2007-2009 – All rights Reserved
Teradata Company Highlights
• Founded 1979 – West LA
• First product to market – 1984
• First Terabyte system – 1987
• Acquired by AT&T and merged with acquired NCR – 1992
• Tri-vested as part of NCR - 1997
• Teradata Corporation – (re)Launched October 1, 2007
> Global Leader in Enterprise Data Warehousing
– EDW/ADW Database Technology
– Analytic Solutions
– Consulting Services
> Positioned in Gartner’s Leaders Quadrant in data warehousing since 1999
• Top 10 U.S. publicly-traded software company
> S&P 500 Member
> Listed NYSE: “TDC”
> NYSE Arca Tech 100
> 2007 - $1.7B revenue
• Global presence and world-class customer list
> More than 850 customers
> More than 2,000 installations
• 5,500+ associates
Continuous (R)evolution
Hardware
+ Database
+ Consulting
+ Data models and reports
+ Analytic applications
Continuous (R)evolution
Sell the HW, give everything else away
Sell the SW with some HW to run on
Sell solving business problems – and technology to solve them
Sell applications with consulting, SW and HW inside
Continuous (R)evolution
80286: 90% R&D, 10% integration
i486: 70% R&D, 30% integration
Pentium: 20% R&D, 80% integration
Xeon Quad Core: 10% R&D, 90% integration
[Figure: timeline of company logos with year labels from 1901 to 1997; legible text includes "An AT&T Company" and "Global Information Solutions"]
Scale
• Every dimension of the technology must scale to meet today’s requirements
> Data, data model complexity, users, performance, queries, data loading, …
• What is a big Data Warehouse?
• Total spinning disk?
> 2.5 Petabytes
• Big table?
> 150 billion rows
• Number of tables?
> 300,000
• Insert/Update per day?
> 5 billion records
• Identified users?
> 100,000
• Queries per day?
> 5 million
• Data turnover rate?
> 1TB per 5 seconds
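To put the daily totals above on a per-second footing, the arithmetic can be sketched as follows. This assumes the load is spread evenly over 24 hours (real workloads peak and dip) and takes 1 TB as 1000 GB; the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(daily_total: float) -> float:
    """Average per-second rate implied by a daily total (uniform load assumed)."""
    return daily_total / SECONDS_PER_DAY


queries_per_sec = per_second(5_000_000)       # 5 million queries per day
changes_per_sec = per_second(5_000_000_000)   # 5 billion inserts/updates per day
turnover_gb_per_sec = 1000 / 5                # 1 TB per 5 seconds, with 1 TB = 1000 GB

print(f"{queries_per_sec:.0f} queries per second")
print(f"{changes_per_sec:.0f} row changes per second")
print(f"{turnover_gb_per_sec:.0f} GB per second")
```

On these assumptions the slide's figures work out to roughly 58 queries and about 57,900 row changes every second, with 200 GB of data turning over per second.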
The Problem
Accts. Payable
Accts. Receivable
Invoicing
Sales/Orders
Finance G/L
Customer Support
HR
Payroll
Purchasing
Order Fulfillment
Manufacturing
Inventory …
Marketing
Supply Chain
Finance
Risk Management
Maintenance
Sales
Operations
Inventory
Call Center …
Proliferation of Data Marts has resulted in fragmented data, higher costs, poor decisions
Operational Systems Decision Makers
The EDW Solution
Accts. Payable
Accts. Receivable
Invoicing
Sales/Orders
Finance G/L
Customer Support
HR
Payroll
Purchasing
Order Fulfillment
Manufacturing
Inventory …
Enterprise Data Warehouse (EDW)
Integrated data provides consistency of data, lower costs, better decisions
Marketing
Supply Chain
Finance
Risk Management
Maintenance
Sales
Operations
Inventory
Call Center …
Operational Systems Decision Makers
Active Enterprise Intelligence™
An Obvious Trend: More Speed, More Users
[Figure: response time axis running from Days to Seconds, spanning Strategic Intelligence and Operational Intelligence. Labels, in order: Enterprise Data Warehouse; BI tools & reports; Analysis & visualization; Predictive analytics; EDW enterprise integration; Mixed workload management; SOA, BPMS, IDEs; Portals/composite applications]
Active Enterprise Intelligence™ enabled by an Active Data Warehouse™
[Figure: the Teradata Warehouse between Strategic Intelligence and Operational Intelligence, connected to Business Intelligence Tools and Applications and to Workflow & Applications. Capability labels: Active Events, Active Access, Active Enterprise Integration, Active Availability, Active Workload Management, Active Load. User groups: Suppliers, Customers, Call Center, Logistics, Marketing, Finance, Product/Services, Executive]
Active Enterprise Intelligence™ in Retail: Detecting Retail Fraud
Situation
Thieves make copies of cash register receipts, walk into the store, pick up merchandise, and return items for cash.
Problem
Associates in returns department did not have historical POS receipt retrieval access to verify against previously “returned” receipts or to do returns without receipts.
Solution
Associates query Teradata to quickly check if a return has already occurred on that receipt number. Also used by analysts to understand and prevent excessive returns.
Impact (for 500-store chain)
• 100% ROI in 5 months
• Stopped a crime ring on the first day of rollout
• “Cost savings have been huge”
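The receipt check described in the Solution can be pictured as a single indexed lookup. The sketch below is illustrative only: it uses Python's built-in sqlite3 as a stand-in for the warehouse, and the table name, column names and sample receipt numbers are all hypothetical, not taken from the retailer's system.

```python
import sqlite3

# In-memory stand-in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE returns (receipt_no TEXT, store_id INTEGER, returned_on TEXT)"
)
conn.execute("CREATE INDEX idx_returns_receipt ON returns (receipt_no)")
conn.execute("INSERT INTO returns VALUES ('R-1001', 17, '2009-03-02')")


def already_returned(connection, receipt_no):
    """Return True if any return has been recorded against this receipt number."""
    row = connection.execute(
        "SELECT 1 FROM returns WHERE receipt_no = ? LIMIT 1", (receipt_no,)
    ).fetchone()
    return row is not None


print(already_returned(conn, "R-1001"))  # receipt already used for a return
print(already_returned(conn, "R-2002"))  # receipt not seen before
```

The point of the design is that the associate's question is a short tactical query keyed on one value, so it can be answered while the customer is standing at the returns desk.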
Active Enterprise Intelligence™ in Retail: Single View of the Customer Across All Channels
Situation
Needed to add Web channel for selling shoes.
Problem
Too much time and cost to keep multiple customer systems synchronized. Realized they needed just one customer database, not one more for the Web, in addition to Call Center, and POS/Store databases.
Solution
Adopted an ADW strategy, moved all customer data to one Teradata system, revised data models to cover all channels, added web channel for commerce, used web services, added TASM to handle multiple workload types
Impact
• 1M tactical hits to the EDW per day from the POS, Call Center, and Web with 0.11 sec response time
• Runs simultaneously with back-office BI, reports, and ETL workloads
• Eliminated all other customer data systems
What is the Measure of a Great Architecture?
Handle huge changes of underlying technologies and dependent components while continuing to deliver the key value proposition.
Processor Roadmap: CPU power radically increasing
[Figure: processor timeline from 2003 to 2011 showing 90nm, 65nm, 45nm, 32nm and 22nm process generations, with Hyper-Threading, Dual Core and Multi Core. A second chart plots SPECint2000 performance from 2000 to 2008+, comparing single-core performance with dual/multi-core performance; labels include "5X" and "2004". Source: Intel Corporation]
What Does Shared Nothing Mean?
• 1985 – Every hardware part, every line of software – “pure” shared nothing
• 1995 – Multiple units of parallelism sharing CPU, memory
• 2004 – Multiple units of parallelism sharing multiple cores, memory
• 2009 – Multiple units of parallelism sharing same physical spindles – but still not sharing data
• Future – Multiple units of parallelism in virtual machines/cloud not even knowing what physical machine it is on or sharing
Teradata MPP Server Architecture
• Nodes
> Incrementally scalable to 1024 nodes
• Operating System
> Linux, Windows, Unix
• Storage
> Independent I/O
> Scales per node
• BYNET Interconnect
> Fully scalable bandwidth
• Connectivity
> Fully scalable
> Channel – ESCON/FICON
> LAN, WAN
• Server Management
> One console to view the entire system
[Figure: four SMP nodes (Node1 to Node4), each with CPU1, CPU2, memory and an operating system, joined by dual BYNET interconnects, with server management spanning the system]
Shared Nothing - Dividing the Work
• “Virtual processors” (vprocs) do the work
• Two types
> AMP: owns and operates on the data
> PE: handles SQL and external interaction
• Configure multiple vprocs per hardware node
> Take full advantage of SMP CPU and memory
• Each vproc has many threads of execution
> Many operations executing concurrently
> Each thread can do work for any user, transaction
• Software is equivalent regardless of configuration
> No user changes as system grows from small SMP to huge MPP
Shared Nothing - Dividing the Work
• Basis of Teradata scalability
> Each AMP owns an equal slice of the disk
> Only that AMP reads that slice
• No single point of control for any operation
> I/O, Buffers, Locking, Logging, Dictionary
> Nothing centralized
> Exponential communication costs avoided
[Figure: AMPs, each with its own logs, locks, buffers and I/O; chart of coordination cost against number of nodes, with a curve labeled Teradata]
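The division of work between a PE and the AMPs can be pictured as scatter and gather over private data slices. The following is a toy sketch under stated assumptions, not Teradata's implementation: the `Amp` class and `parallel_sum` function are hypothetical names, threads stand in for vprocs, and rows are placed round-robin purely to keep the example short (the deck describes hash-based placement).

```python
from concurrent.futures import ThreadPoolExecutor


class Amp:
    """Toy AMP: owns a private slice of rows that no other AMP reads."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self._rows = []  # private slice; only this AMP touches it

    def store(self, row):
        self._rows.append(row)

    def local_sum(self, column):
        # Each AMP works only on its own slice, with no shared state.
        return sum(row[column] for row in self._rows)


def parallel_sum(amps, column):
    """Toy PE step: send the same request to every AMP, then combine partials."""
    with ThreadPoolExecutor(max_workers=len(amps)) as pool:
        partials = list(pool.map(lambda amp: amp.local_sum(column), amps))
    return sum(partials)


amps = [Amp(i) for i in range(4)]
for n in range(100):
    amps[n % len(amps)].store({"amount": n})  # round-robin, for the toy only

total = parallel_sum(amps, "amount")
print(total)
```

Because every AMP answers from its own slice, adding AMPs adds workers without adding a shared lock manager, buffer pool or log that all of them would have to coordinate on.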
Teradata Data Distribution
• Rows automatically distributed evenly by hash partitioning
> Even distribution results in scalable performance
> Done in real time as data are loaded, appended, or changed
> Hash map defined and maintained by the system
– 2**32 hash codes, 64K buckets distributed to AMPs
> Primary Index (PI) column(s) are hashed
> Hash is always the same for the same values
> No reorgs, repartitioning, space management
[Figure: Tables A, B and C pass their Primary Index through the Teradata Parallel Hash Function and are spread across AMP1 to AMPn; each stored row consists of RowHash (Hash Bucket) and Data Fields]
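The path from Primary Index value to AMP can be sketched in a few lines. This is a minimal illustration, not Teradata's hash function: CRC32 is used as a stand-in 32-bit hash, taking the top 16 bits as the bucket number is an assumption, and the modulo bucket-to-AMP map is just one simple way to spread 64K buckets over the AMPs. All function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 65536  # the "64K buckets" on the slide


def row_hash(pi_value: str) -> int:
    """32-bit hash of the Primary Index value (CRC32 as a stand-in)."""
    return zlib.crc32(pi_value.encode("utf-8")) & 0xFFFFFFFF


def bucket_of(hash_code: int) -> int:
    """Map a 32-bit hash code to one of 64K buckets (top 16 bits, assumed)."""
    return hash_code >> 16


def build_hash_map(num_amps: int) -> list:
    """System-maintained map from bucket number to owning AMP (modulo, assumed)."""
    return [bucket % num_amps for bucket in range(NUM_BUCKETS)]


def amp_for(pi_value: str, hash_map: list) -> int:
    """The same PI value always hashes to the same bucket, hence the same AMP."""
    return hash_map[bucket_of(row_hash(pi_value))]


hash_map = build_hash_map(4)
placement = Counter(amp_for(f"cust-{i}", hash_map) for i in range(100_000))
print(sorted(placement.items()))  # per-AMP row counts, to inspect the spread
```

Growing the system in this model means reassigning some buckets to new AMPs in the map; the hash of a row never changes, which is why the slide can claim there are no reorgs or repartitioning for the user to manage.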
Disk Capacity Exploding with Little Increase in Performance
[Chart: disk drive bandwidth (MB/sec) and performance per capacity (MB/sec/GB) by disk drive capacity]
• 36 GB drive: 5.5 MB/sec bandwidth, .155 MB/sec per GB
• 73 GB drive: 6.0 MB/sec bandwidth, .080 MB/sec per GB
• 146 GB drive: 6.4 MB/sec bandwidth, .044 MB/sec per GB
Random I/O; 48K block; 80% read
Platform Change
• Focus used to be
> Optimization of expensive CPU cycles
> Micro-management of precious disk space
• Now
> Manage I/O
> Balance CPU power to the I/O capacity
> Find new ways to optimize I/O, trading for CPU use as necessary
> Pulling 2.5GB/sec per node continuous
• Discontinuity coming
> SSDs become price competitive and reliable
File System
• Teradata wrote a new rule book
> Old one written by IBM 35 years ago, used by all mainstream DBMSs today - except Teradata
• File system built of raw slices
• Rows stored in blocks
> Variable length
> Grow and shrink on demand
> Rows located dynamically
– May be moved to reclaim space, defrag
> Maximum block size is configurable
– System default or per table
– 8K to 128K
– Change dynamically
• Indexes are just rows in tables
• Has evolved from direct management of single spindles to completely virtualized storage, not even knowing spindle location
Workload Management Evolution
• 1984 – pure timeshare
• 1987 – 4 priorities, defined by user
• 1995 – multiple priorities in multiple partitions
• 2000 – weighted workload groups
• 2004 – queuing, reserved resources, focus on tactical work
• 2009 – Visualization and detailed workgroup management
• Future – Set service level goals, our job to deliver
Active Workload Management
• Manage workloads
> Reduce server congestion
• Dynamically adjust in-flight task priority
> Turn the dial – change priorities
• Fast active access queries
> Performance, performance, performance
• Get maximum throughput
[Figure: Active Data Warehouse workloads (Active Events, Active Access, Query and Reporting, Active Load), each with a speed dial; the dial labels read Speed10, Speed60, Speed75 and Speed25]
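One simple way to picture weighted workload groups and "turning the dial" is proportional sharing: each active group receives capacity in proportion to its weight, and changing a weight shifts the split immediately. The sketch below is an illustration of that general idea only, not a description of Teradata's workload manager; the group names and weights are hypothetical.

```python
def allocate_shares(weights, active_groups):
    """Split capacity among the active groups in proportion to their weights."""
    active = [g for g in active_groups if weights.get(g, 0) > 0]
    total = sum(weights[g] for g in active)
    if total == 0:
        return {}
    return {g: weights[g] / total for g in active}


# Hypothetical workload groups and weights.
weights = {"tactical": 60, "reporting": 25, "load": 10, "ad_hoc": 5}

busy = allocate_shares(weights, weights)                 # every group has work
quiet = allocate_shares(weights, ["tactical", "load"])   # idle groups give up their share

weights["tactical"] = 120                                # turn the dial for tactical work
boosted = allocate_shares(weights, weights)

print(busy)
print(quiet)
print(boosted)
```

With these numbers the tactical group gets 60% of capacity when everything is busy and 75% after its weight is doubled, while idle groups consume nothing, so capacity is never left stranded.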
Availability Requirements
[Figure: number of users (log scale, 10 to 1,000,000) by user group: IT, Finance, Planners, Power Users, Data Miners; Executives, Middle Managers, Marketing; Category Managers, Line Managers, Service Managers; Operational Employees; Consumers, Suppliers, B2B. The groups span Strategic Intelligence to Operational Intelligence, with availability levels labeled Business Critical, Mission Critical and Dual Active]
“Always ON” – An Elusive Challenge
• Unplanned downtime
> Hardware faults
> Software faults
> Hangs
• Planned downtime
> Software upgrade
> Hardware upgrade
> Data center maintenance
• “Disasters”
> Multi-component failures
> Building disasters
> Area disasters
• And optimize resource value to the business
• And avoid hidden costs and surprises
> E.g. major performance variations
• Major opportunity for research – but must be holistic
> Reaches far beyond core database
Real Time Operational Actions
1. Customer makes multi-segment travel reservation.
2. Flight rerouted, causing missed connections.
3. What is each customer's flying history?
4. How profitable is each customer?
5. Which customers experienced delays or other problems in the last 6 months?
6. Customer re-booked and notified.
7. Airport operations adjusted.
[Figure: the steps flow between Strategic Intelligence and Operational Intelligence through the “Active” Enterprise Data Warehouse, with messaging labeled WebSphere MQ, Oracle AQ, Microsoft MSMQ]
Real Time Customer Management
1. Customer inserts Total Rewards Card at slot machine.
2. What is the customer’s past spending history in all our casinos?
3. What is a significant loss for this person based on market segment, past and predicted behavior?
4. Is this customer approaching the predicted loss rate for their segment?
5. What offers are available for this customer?
6. Message sent to floor Luck Ambassador with customer offer to prevent additional losses.
[Figure: the steps flow between Strategic Intelligence and Operational Intelligence through the “Active” Enterprise Data Warehouse, with messaging labeled TIBCO]
That’s a Wrap!
• Business requires a new level of decision making
> Many more decisions by many more people much faster
> Current representation of the state of the enterprise
• Data Warehouse must evolve to support the requirements of Active Enterprise Intelligence
• Technology must evolve to deal with the new requirements
> Rich area for research and innovation
> Change view of what data warehouse/BI means
• Teradata driving an aggressive roadmap to meet real business requirements