TRANSCRIPT
April 10-12, Chicago, IL
PDW Architecture Gets Real: Customer Implementations
Brian Walker | Microsoft Corporation, PDW Center of Excellence
Murshed Zaman | Microsoft Corporation, SQL Customer Advisory Team
Agenda
• PDW xVelocity; Reporting on Structured/Unstructured Data (Demos)
• Introduction to PDW and How It Works (Detail)
• Highlight Current Customer Use Cases (Future)
How does it work?
Introducing Parallel Data Warehouse
Pre-Built Hardware + Software Appliance
• Co-engineered with HP and Dell
• Pre-built Hardware
• Pre-installed Software
• Appliance installed in 1-2 days
• Support: Microsoft provides first-call support
• Hardware partner provides onsite break/fix support
Appliance Experience
Plug and Play
Built-in Best Practices
Save Time
The Power of PDW
Massively Parallel Processing (MPP)
Uses many separate CPUs running in parallel to execute a single query
Each CPU has its own memory
Dedicated InfiniBand network for communication between servers
Symmetric Multi-Processing (SMP)
Multiple CPUs used to complete individual processes simultaneously
All CPUs share the same memory and disks
Network controllers share bandwidth
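The MPP model above can be illustrated with a minimal scatter-gather sketch: each "compute node" aggregates only the rows it owns, and a "control node" merges the partial results into the final answer. This is an illustration under stated assumptions, not PDW's query engine; the function names are hypothetical, and threads stand in for separate servers (in CPython they show the shape of the computation, not true CPU parallelism).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def node_partial_sum(rows):
    """Hypothetical compute node: aggregate only the (key, amount) rows it owns."""
    partial = Counter()
    for key, amount in rows:
        partial[key] += amount
    return partial


def mpp_sum_by_key(node_slices):
    """Hypothetical control node: scatter work to every node, then gather and merge partials."""
    workers = max(1, len(node_slices))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(node_partial_sum, node_slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Example: three nodes, each holding its own slice of (store, dollars) rows.
slices = [
    [("A", 10), ("B", 5)],
    [("A", 3)],
    [("B", 2), ("C", 7)],
]
print(mpp_sum_by_key(slices))  # {'A': 13, 'B': 7, 'C': 7}
```

Because no node shares memory with another, the only cross-node traffic is the small set of partial results, which is the property that lets MPP scale by adding nodes.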
The Basic Full Rack
1 RACK
InfiniBand & Ethernet
• 128 cores on 8 compute nodes
• 2 TB of RAM on compute nodes
• Up to 168 TB of tempdb
• Up to 1 PB of user data
• Reduce hardware footprint by virtualizing the entire control server rack down to a few nodes
• 1.5x lower price/TB, one of the lowest price/TB in the industry
• Save up to 70% of storage with up to ~15x compression via the xVelocity columnstore
• Resilient, scalable, and high-performance storage features in Windows Server 2012 replace SAN with high-density, low-cost SAS JBODs
• 70% more disk I/O bandwidth over SQL Server PDW 2008 R2
SQL Server PDW 2012
Dimensional Model
Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Sls Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
PDW Data Layout
[Figure: compute nodes, each holding a full copy of the dimension tables (D, S, I, P) and one distribution of the sales fact table (F1 to F5)]
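The data layout above (small dimension tables copied to every compute node, the large fact table hash-distributed so each node holds one slice) lets a fact-to-dimension join run locally on each node without moving rows between servers. The sketch below simulates that in one process; the node count, table contents, and the use of CRC32 as the hash are assumptions for illustration, since the slide does not say which hash function PDW uses.

```python
import zlib

NUM_NODES = 4  # hypothetical number of compute nodes for the example

# Small dimension table: a full copy is kept on every compute node (replicated).
store_dim = {1: "Downtown", 2: "Airport", 3: "Suburb"}

# Large fact table rows: (date_id, store_id, prod_id, qty_sold)
sales_fact = [
    (20130410, 1, 100, 5),
    (20130410, 2, 101, 3),
    (20130411, 3, 100, 7),
    (20130411, 1, 102, 2),
    (20130412, 2, 103, 4),
]


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node with a deterministic hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_index, num_nodes=NUM_NODES):
    """Split fact rows into one distribution per node, keyed on a single column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key_index], num_nodes)].append(row)
    return nodes


def local_join(fact_slice, store_replica):
    """Runs on a single node: its own fact slice joined to its own dimension copy."""
    return [(store_replica[store_id], qty) for _, store_id, _, qty in fact_slice]


# Distribute on prod_id, give every node its own replica, join locally, then combine.
nodes = distribute(sales_fact, key_index=2)
replicas = [dict(store_dim) for _ in nodes]
joined = [pair for node_slice, replica in zip(nodes, replicas)
          for pair in local_join(node_slice, replica)]
print(sorted(joined))
# [('Airport', 3), ('Airport', 4), ('Downtown', 2), ('Downtown', 5), ('Suburb', 7)]
```

Replicating the dimensions trades some storage for join locality; that trade is cheap when dimensions are small relative to the fact table.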
Seamlessly Add Capacity
Smallest (53 TB) to Largest (6 PB)
• Start small with a warehouse of a few terabytes
• Add capacity up to 6 petabytes
Start Small and Grow
Start Small, Scale Out Linearly
Any Size: Next-Gen Performance
Columnstore Provides Dramatic Performance
• Updateable and clustered xVelocity columnstore
• Stores data in columnar format
• Memory-optimized for next-generation performance
• Updateable to support bulk and/or trickle loading
Up to 50x Faster
Up to 15x Compression
Save Time and Costs
Batch Processing
xVelocity - Fast Data Query Processing
[Figure: columnstore layout, with data stored by column: Customer, Sales, Country, Supplier, Products]
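Storing data by column groups similar values together, which is why columnstores compress well and can aggregate in batches. The sketch below shows two generic columnar techniques, dictionary encoding and run-length encoding, plus an aggregate computed directly on the compressed runs. It is an assumption-labeled illustration: the slides do not describe xVelocity's actual on-disk format, and the sample column is hypothetical.

```python
from collections import Counter
from itertools import groupby


def dictionary_encode(values):
    """Replace each distinct value with a small integer id (first-seen order)."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids


def run_length_encode(ids):
    """Collapse consecutive repeated ids into (id, run_length) pairs."""
    return [(value_id, len(list(group))) for value_id, group in groupby(ids)]


def decode(dictionary, runs):
    """Expand runs back into the original column values."""
    by_id = {value_id: value for value, value_id in dictionary.items()}
    return [by_id[value_id] for value_id, length in runs for _ in range(length)]


def count_by_value(dictionary, runs):
    """Aggregate directly on the compressed runs instead of decoding row by row."""
    by_id = {value_id: value for value, value_id in dictionary.items()}
    counts = Counter()
    for value_id, length in runs:
        counts[by_id[value_id]] += length
    return dict(counts)


country = ["US"] * 4 + ["CA"] * 3 + ["US"] * 2 + ["MX"]
dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)
print(runs)                              # [(0, 4), (1, 3), (0, 2), (2, 1)]
print(count_by_value(dictionary, runs))  # {'US': 6, 'CA': 3, 'MX': 1}
```

Ten values shrink to four runs here; the longer the runs of repeated values in a column, the larger the saving, and `count_by_value` touches one entry per run rather than one per row.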
Demo: xVelocity
The Power of Updatable Columnstore Indexing on PDW 2012
Any Data: Hadoop Integration
• External Tables and full SQL query access to data stored in HDFS
• HDFS bridge for direct & fully parallelized access of data in HDFS
• Joining ‘on-the-fly’ PDW data with data from HDFS
• Parallel import of data from HDFS into PDW tables for persistent storage
• Parallel export of PDW data into HDFS including ‘round-tripping’ of data
Polybase Details
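[Figure: a regular T-SQL query is sent to the enhanced PDW query engine in PDW 2012, which reads structured data from PDW and unstructured data from HDFS data nodes through an external table and the HDFS bridge, then returns the results]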
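The external-table idea above amounts to "schema on read": files stay as delimited text in HDFS, a column schema is applied as they are scanned, and the typed rows can be joined on the fly with warehouse tables. The single-process sketch below only illustrates that concept; the schema, the `|` delimiter, the sample data, and the function names are all hypothetical, and the HDFS file is represented by an in-memory string. PolyBase itself declares external tables in T-SQL and reads HDFS in parallel through the HDFS bridge, which this sketch does not attempt to reproduce.

```python
import csv
import io

# Hypothetical external-table schema, applied when the text is read (schema on read).
EXTERNAL_SCHEMA = [("customer_id", int), ("page", str), ("clicks", int)]


def read_external(text, schema=EXTERNAL_SCHEMA, delimiter="|"):
    """Parse delimited text, as it might be stored in HDFS, into typed rows on the fly."""
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def join_on_the_fly(warehouse_rows, external_rows, key):
    """Hash join: build a lookup on the warehouse table, stream the external rows past it."""
    lookup = {row[key]: row for row in warehouse_rows}
    for ext in external_rows:
        match = lookup.get(ext[key])
        if match is not None:
            yield {**match, **ext}


hdfs_text = "1|home|3\n2|cart|5\n9|home|1\n"  # customer 9 has no warehouse row
customers = [
    {"customer_id": 1, "name": "Contoso"},
    {"customer_id": 2, "name": "Fabrikam"},
]
for row in join_on_the_fly(customers, read_external(hdfs_text), "customer_id"):
    print(row)
```

Nothing is copied into a relational table first; the external rows are typed and joined as they stream past, which is the property the "joining on-the-fly" bullet describes.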
[Figure: Hadoop data and structured data]
Existing Excel Skillset with Big Data
Familiar Tools to Analyze Big Data
• Native Microsoft BI Integration to PDW
• Structured and unstructured data in same spreadsheet
• Widely adopted and familiar user tools
No IT Intervention
Analyze All Data Types
High Adoption of Excel
Familiar Tools To Analyze Structured/Unstructured Data
Demo: PDW 2012 Polybase
Simultaneous Reporting from Structured and Unstructured Data
Current Customer Use Cases
A large US grocery store chain needed an MPP data warehouse to improve performance and scale, and to provide timely data to its executives and analysts
PDW will scale to meet future growth and support more functional areas at Hy-Vee
PDW offered a 100x query performance gain over conventional SQL Server, faster data loads, and the scale to hold 7 years of purchasing data instead of 2
Benefits
“…basic queries that previously took 20 minutes only took seconds using the SQL Server 2008 R2 Parallel Data Warehouse.” -Tom Settle, Assistant VP, Data Warehousing, Hy-Vee
Upgrading to PDW Gains 100x Improvement
Business Objectives
• Provide Broader Range of Critical Customer Purchasing Data: current system only supported 2 years of data; business required 7 years (Critical)
• Enable Self-Service Reporting: SSAS/SSRS/SharePoint/Excel (Save Time)
• Enable User Ad Hoc Reporting: leveraging Excel/SharePoint (Query)
• Improve Performance of Complex Transformations: faster delivery of data within specified SLAs (Load Speed)
• Reduce IT Costs: create self-sufficient end users, freeing IT to focus on delivering new data (Save Costs)
• Provide a Solution that Scales to Meet Future Data Needs: expansion of history, point-of-sale detail, and expansion into social media (Scale)
Shift from ETL to ELT
• Moved complex transformations and calculations from the ETL server to SQL Server Parallel Data Warehouse
• PDW has allowed Hy-Vee to create an enterprise data warehouse centralizing data from many sources
• Archiving point-of-sale source files for later data extraction
Using the Power of MPP
Complex Transformations
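The ETL-to-ELT shift above means raw data is loaded into the warehouse first, and the transformation then runs inside the database as set-based SQL rather than row by row on a separate ETL server. The sketch below uses Python's built-in `sqlite3` purely as a stand-in database (it is neither MPP nor PDW); the table names, columns, and sample point-of-sale rows are hypothetical.

```python
import sqlite3

# Hypothetical raw point-of-sale extract, loaded as-is with every field as text.
raw_pos = [
    ("2013-04-10", "S1", "P100", "5", "2.50"),
    ("2013-04-10", "S1", "P100", "3", "2.50"),
    ("2013-04-11", "S2", "P101", "2", "4.00"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw rows in a staging table without transforming them.
conn.execute(
    "CREATE TABLE stage_pos (sale_date TEXT, store TEXT, prod TEXT, qty TEXT, price TEXT)"
)
conn.executemany("INSERT INTO stage_pos VALUES (?, ?, ?, ?, ?)", raw_pos)

# Transform: type conversion and aggregation run inside the database engine.
conn.execute("""
    CREATE TABLE sales_fact AS
    SELECT sale_date, store, prod,
           SUM(CAST(qty AS INTEGER)) AS qty_sold,
           SUM(CAST(qty AS INTEGER) * CAST(price AS REAL)) AS dollars_sold
    FROM stage_pos
    GROUP BY sale_date, store, prod
""")

rows = conn.execute("SELECT * FROM sales_fact ORDER BY sale_date, store, prod").fetchall()
print(rows)  # [('2013-04-10', 'S1', 'P100', 8, 20.0), ('2013-04-11', 'S2', 'P101', 2, 8.0)]
conn.close()
```

On an MPP warehouse the same set-based statement would be executed in parallel across the compute nodes, which is what makes moving the "T" into the warehouse attractive.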
Upgrade to PDW 2012
• Improves their opportunity to further analyze social media data
• Query data without having to move it into a relational database
• Provides an alternative archive solution for point of sale data
Future Option
Data Archive Challenge: Financial Customer
[Figure: Reporting Services, Centralized EDW, Archive Servers]
• The business only actively analyzes a rolling 12 months of data
• Regulations require that data remain online and accessible for an extended period
• Data older than 12 months is pushed to a farm of SQL Server instances to meet regulatory requirements
Current Solution
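The rolling 12-month rule above can be sketched as a simple split of rows into an active window and an archive. The assumptions here are that "12 months" is measured in whole calendar months (day of month ignored) and that each row carries a `txn_date`; both the field name and the sample rows are hypothetical.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months from earlier to later, ignoring the day of month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def split_rolling_window(rows, today, window_months=12):
    """Split rows into the active window and the archive, by transaction month."""
    active, archive = [], []
    for row in rows:
        if months_between(row["txn_date"], today) < window_months:
            active.append(row)
        else:
            archive.append(row)
    return active, archive


rows = [
    {"id": 1, "txn_date": date(2013, 3, 15)},
    {"id": 2, "txn_date": date(2012, 5, 1)},
    {"id": 3, "txn_date": date(2012, 4, 30)},
    {"id": 4, "txn_date": date(2010, 1, 1)},
]
active, archive = split_rolling_window(rows, today=date(2013, 4, 12))
print([r["id"] for r in active], [r["id"] for r in archive])  # [1, 2] [3, 4]
```

Running this split each month moves the oldest month out of the active window; in the current solution those rows go to the archive farm.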
Data Archive Challenge: Financial Customer
[Figure: Reporting Services, Centralized EDW, Archive Servers, and HDFS data nodes holding unstructured data, connected through the HDFS bridge]
• Replace archive farm with Hadoop cluster
• PDW provides single point of access
• Allows analysts to leverage existing SQL skills
• Much lower maintenance and administration
• Meets regulatory requirements
Future Solution
AMD is also processing more reporting queries than it previously could—between 10,000 and 13,000 a day—with an average runtime of a few seconds and virtually no performance issues.
Because of the user complaints about the previous system, the data warehouse team had one employee devoted full time to addressing performance-related support tickets. With Parallel Data Warehouse, AMD has reduced support work to just a few hours a week.
AMD runs an average of 1,500 loads per day, and data loads to a given table range from four-minute to four-hour intervals. AMD averages about 500,000 file loads a day.
Benefits
“We used to worry about backlogs, but no more,”
- Rajarao Chitturi, Database and Applications Manager at AMD
AMD Boosts Performance with PDW
AMD Business Challenges
• Only supported 6-month data retention
• Issues loading concurrently with high query volume
Obstacles With SMP Oracle
• Data loading always lagged behind by days
• Analysts couldn’t access recent data
• Continuous data loads throughout the day while users were querying the system
Load Demand
• Custom reporting tools hosted on Linux use JDBC and ODBC drivers
Linux Based Reporting
Project Overview
• Wafer Quality Assurance Data: 42 TB on PDW
• Space-Saving PDW "Index Lite" Approach: Oracle required excessive non-clustered indexes to achieve usable performance
• Improved Loading Speed: 660 GB/hr throughput
• 10,000 to 13,000 Analytic Queries per Day: most are scan-intensive
• Faster Backups: complete in 1 to 2 hours per database, compared to a week on Oracle
• Reduced Support Costs by 90%: no more chopping up queries to fit the data warehouse
Critical
Save Time
Query
Save Space
Load Speed
Save Costs
Parallel Data Warehouse 2012
Other PDW Sessions
Online Advertising: Hybrid Approach to Large-Scale Data Analysis (DAV-303-M)
Data Analytics and Visualization
Breakout Session (60 minutes)
Fri April 12, 2013, 2:45 PM - 3:45 PM in Sheraton 3
Anna Skobodzinski Christian Bonilla Dmitri Tchikatilov Trevor Attridge
Win a Microsoft Surface Pro!
Complete an online SESSION EVALUATION to be entered into the draw.
Draw closes April 12, 11:59pm CT.
Winners will be announced on the PASS BA Conference website and on Twitter.
Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue.
Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
Thank you!