PDW Architecture Gets Real: Customer Implementations
Brian Walker | Microsoft Corporation, PDW Center of Excellence
Murshed Zaman | Microsoft Corporation, SQL Customer Advisory Team
April 10-12, Chicago, IL
Please silence cell phones

Agenda
- Introduction to PDW and How it Works
- Detail
- PDW xVelocity - Reporting Structured/Unstructured Data
- Demos
- Highlight Current Customer Use Cases
- Future
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

How does it work?
Introducing Parallel Data Warehouse
Pre-Built Hardware + Software Appliance
- Co-engineered with HP and Dell
- Pre-built Hardware
- Pre-installed Software
- Appliance installed in 1-2 days
Support
- Microsoft provides first call support
- Hardware partner provides onsite break/fix support
Appliance Experience
- Plug and Play
- Built-in Best Practices
- Save Time

The Power of PDW
Massively Parallel Processing (MPP)
- Uses many separate CPUs running in parallel to execute a single query
- Each CPU has its own memory
- Dedicated InfiniBand network communications between servers
Symmetric Multi-Processing (SMP)
- Multiple CPUs used to complete individual processes simultaneously
- All CPUs share the same memory and disks
- Network controllers share bandwidth

The Basic Full Rack
- 1 rack
- InfiniBand & Ethernet
- 128 cores on 8 compute nodes
- 2 TB of RAM on compute
- Up to 168 TB of temp DB
- Up to 1 PB of user data

SQL Server PDW 2012
- Reduce hardware footprint by virtualizing the entire control server rack down to a few nodes
- 1.5x lower price/TB, providing one of the lowest price/TB in the industry
- Save up to 70% of storage with up to ~15x compression via the xVelocity columnstore
- Resilient, scalable, and high-performance storage features in Windows Server 2012 replace SAN with high-density, low-cost SAS JBODs
- 70% more disk I/O bandwidth over SQL Server PDW 2008 R2

Dimensional Model
- Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
- Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- Sls Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
[Figure: PDW Data Layout on Compute Nodes; node labels show D, S, I, P alongside F1 through F5]

Table Distribution example:
- Utilizes both kinds of tables for co-location purposes
- Dimension tables are typically replicated
- Appliance maintains data integrity across all nodes
- Fact tables distributed on a single column
- Distributed columns should be based on data model and utilization
- Choose a column with high cardinality and low variance

Seamlessly Add Capacity
Smallest (53 TB) to Largest (6 PB)
- Start small with a warehouse of a few terabytes
- Add capacity up to 6 petabytes
Start Small and Grow
Start Small, Linearly Scale Out
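
To make the table distribution rules above concrete, here is a minimal sketch in Python. It is not PDW code: the node count of 8 comes from the full-rack slide, but the SHA-256 hash, the function names, and the sample rows are all assumptions made for illustration. It shows a fact table placed by hashing one column, a dimension table copied to every node, and why a high-cardinality distribution column spreads rows more evenly than a low-cardinality one.

```python
import hashlib

NUM_NODES = 8  # the full rack described earlier has 8 compute nodes


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node with a stable hash.

    SHA-256 is an assumption for this sketch; the deck does not say
    which hash function the appliance uses.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


def distribute(rows, column, num_nodes=NUM_NODES):
    """Place each fact row on exactly one node, chosen by hashing `column`."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[column], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Give every node a full copy of a (small) dimension table."""
    return [list(rows) for _ in range(num_nodes)]


def skew(nodes):
    """Ratio of the fullest node to the average node; 1.0 is perfectly even."""
    sizes = [len(n) for n in nodes]
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical sales facts: 10,000 rows over 2,000 products and 4 stores.
facts = [{"prod_id": i % 2000, "store_id": i % 4, "qty": 1} for i in range(10_000)]
stores = [{"store_id": s, "name": f"Store {s}"} for s in range(4)]

by_product = distribute(facts, "prod_id")  # high-cardinality column
by_store = distribute(facts, "store_id")   # low-cardinality column
store_copies = replicate(stores)           # dimension is local to every node

print("skew when distributing on prod_id: ", round(skew(by_product), 2))
print("skew when distributing on store_id:", round(skew(by_store), 2))
```

With only 4 distinct store IDs, at most 4 of the 8 nodes can receive any rows, so at least half the rack sits idle during a scan. Distributing on the product ID avoids that, and because every node holds its own copy of the store dimension, a join from facts to stores needs no data movement between nodes.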

Any Size: Next-Gen Performance
Columnstore Provides Dramatic Performance
- Updateable and clustered xVelocity columnstore
- Stores data in columnar format
- Memory-optimized for next-generation performance
- Updateable to support bulk and/or trickle loading
Up to 50x Faster
Up to 15x Compression
Save Time and Costs
Batch Processing
[Figure: xVelocity - Fast Data Query Processing, with columns Customer, Sales, Country, Supplier, Products]

Demo: xVelocity
The Power of Updatable ColumnStore Indexing on PDW 2012
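
As a rough illustration of why a columnar layout compresses well, the sketch below pivots rows into columns and run-length encodes one of them. This is a generic technique chosen for the example; the deck does not describe how xVelocity actually encodes data, and the function names and sample rows are hypothetical.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sales rows, already sorted by country.
countries = ["US"] * 6 + ["CA"] * 3 + ["MX"] * 3
rows = [{"country": c, "qty": 1} for c in countries]

columns = to_columns(rows)
encoded = rle_encode(columns["country"])

# A query that only needs `qty` touches one column, not whole rows.
total_qty = sum(columns["qty"])

print(len(columns["country"]), "values stored as", len(encoded), "runs")
```

Values within one column tend to repeat, so storing them next to each other gives an encoder long runs to collapse. The same values interleaved with other fields in a row layout would not form runs at all.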

Any Data: Hadoop Integration
- External Tables and full SQL query access to data stored in HDFS
- HDFS bridge for direct & fully parallelized access of data in HDFS
- Joining on-the-fly PDW data with data from HDFS
- Parallel import of data from HDFS in PDW tables for persistent storage
- Parallel export of PDW data into HDFS including round-tripping of data

Polybase Details
[Figure labels: Regular T-SQL, Results, PDW 2012, Enhanced PDW Query Engine, External Table, HDFS Bridge, Structured data, Unstructured data, HDFS Data Nodes]
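
The external-table idea above can be sketched without Hadoop or PDW. In the Python below, a delimited string stands in for a file in HDFS and a dict stands in for a relational table; both, along with every function name, are assumptions for illustration only, and no PDW or Polybase syntax is shown. The point is that the file is parsed at query time and joined with relational rows, and can optionally be materialised for persistent storage.

```python
import csv
import io

# Stand-in for a delimited file sitting in HDFS (hypothetical click log).
HDFS_FILE = "cust_id|url\n1|/home\n2|/cart\n1|/checkout\n3|/home\n"

# Stand-in for a relational customer table stored in the warehouse.
CUSTOMERS = {1: "Alice", 2: "Bob"}


def external_rows(text, delimiter="|"):
    """Parse the file lazily at query time, applying a schema on read."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for record in reader:
        yield {"cust_id": int(record["cust_id"]), "url": record["url"]}


def join_clicks_to_customers(text, customers):
    """Inner join of external click rows with the relational customer table."""
    return [
        (customers[row["cust_id"]], row["url"])
        for row in external_rows(text)
        if row["cust_id"] in customers
    ]


def import_clicks(text):
    """Materialise the external rows, analogous to an import for persistence."""
    return list(external_rows(text))


result = join_clicks_to_customers(HDFS_FILE, CUSTOMERS)
imported = import_clicks(HDFS_FILE)

for name, url in result:
    print(name, url)
```

The join never copies the file anywhere first, which mirrors the "joining on-the-fly" bullet. `import_clicks` corresponds to the separate choice of loading the data into warehouse tables when it will be queried repeatedly.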

Existing Excel Skillset With Big Data
Familiar Tools to Analyze Structured/Unstructured Data
Familiar Tools Analyze Big Data
- Native Microsoft BI Integration to PDW
- Structured and unstructured data in same spreadsheet
- Widely adopted and familiar user tools
No IT Intervention
Analyze All Data Types
High Adoption of Excel
[Figure labels: Hadoop Data, Structured Data]

Demo: PDW 2012 Polybase
Simultaneous Reporting from Structured and Unstructured Data

Current Customer Use Cases

Upgrading to PDW Gains 100x Improvement
- A large US grocery store chain needed an MPP data warehouse to improve performance and scale, and to provide timely data to its executives and analysts
- PDW will scale to meet future growth and support more functional areas at Hy-Vee
- PDW offered a 100x query performance gain over conventional SQL Server, faster data loads, and more scale, with 7 instead of 2 years of purchasing data

Benefits
"...basic queries that previously took 20 minutes only took seconds using the SQL Server 2008 R2 Parallel Data Warehouse."
- Tom Settle, Assistant VP, Data Warehousing, Hy-Vee

Business Objectives
- Provide Broader Range of Critical Customer Purchasing Data: current system only supported 2 years of data; business required 7 years [Critical]
- Enable Self-Service Reporting: SSAS/SSRS/SharePoint/Excel [Save Time]
- Enable User Ad hoc Reporting: leveraging Excel/SharePoint [Query]
- Improve Performance of Complex Transformations: faster delivery of data within specified SLAs [Load Speed]
- Reduced IT Costs: creating self-sufficient end users frees IT to focus on delivering new data [Save Costs]
- Provide Solution that Scales to Meet Future Data Needs: expansion of history, point of sale detail, and expansion into social media [Scale]

Shift from ETL to ELT
Using the Power of MPP for Complex Transformations
- Move complex transformations and calculations from the ETL server to SQL Server Parallel Data Warehouse
- PDW has allowed Hy-Vee to create an enterprise data warehouse centralizing data from many sources
- Archiving point of sale source files for later data extraction

Future Option: Upgrade to PDW 2012
- Improves their opportunity to further analyze social media data
- Query data without having to move it into a relational database
- Provides an alternative archive solution for point of sale data
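
The shift from ETL to ELT described for Hy-Vee can be sketched as "load the raw rows first, then transform them with SQL inside the warehouse". The Python below uses the standard-library `sqlite3` module purely as a single-node stand-in; the table names, columns, and sample rows are hypothetical, and on PDW the same set-based statement would be executed in parallel across the compute nodes rather than on one machine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging table: raw point-of-sale rows are landed as text, untransformed.
conn.execute("CREATE TABLE stage_sales (store_id TEXT, sold_on TEXT, amount TEXT)")
# Target table: typed and aggregated.
conn.execute("CREATE TABLE daily_sales (store_id INTEGER, sold_on TEXT, total REAL)")

# "E" and "L": extract and load with no transformation on an ETL server.
raw_rows = [
    ("1", "2013-04-08", "10.50"),
    ("1", "2013-04-08", "4.50"),
    ("2", "2013-04-08", "7.00"),
]
conn.executemany("INSERT INTO stage_sales VALUES (?, ?, ?)", raw_rows)

# "T": the transformation runs inside the database as one set-based statement.
conn.execute(
    """
    INSERT INTO daily_sales (store_id, sold_on, total)
    SELECT CAST(store_id AS INTEGER), sold_on, SUM(CAST(amount AS REAL))
    FROM stage_sales
    GROUP BY CAST(store_id AS INTEGER), sold_on
    """
)

totals = conn.execute(
    "SELECT store_id, sold_on, total FROM daily_sales ORDER BY store_id"
).fetchall()
print(totals)
```

Keeping the raw rows in a staging table also fits the archiving bullet above: the untransformed source data stays available for later extraction.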

Data Archive Challenge: Financial Customer
Current Solution
- Business only actively analyzes a rolling 12 months of data
- Regulations require that data be online and accessible for an extended period
- Data older than 12 months is pushed to a farm of SQL servers to meet regulatory requirements
[Figure labels: Centralized EDW, Reporting Services, Archive Servers]

Data Archive Challenge: Financial Customer
Future Solution
- Replace archive farm with Hadoop cluster
- PDW provides single point of access
- Allows analysts to leverage existing SQL skills
- Much lower maintenance and administration
- Meets regulatory requirements
[Figure labels: Centralized EDW, Reporting Services, Archive Servers, HDFS Data Nodes, Unstructured data, HDFS bridge]

AMD Boosts Performance with PDW
AMD is also processing more reporting queries than it previously could, between 10,000 and 13,000 a day, with an average runtime of a few seconds and virtually no performance issues.
Because of the user complaints about the previous system, the data warehouse team had one employee devoted full time to addressing performance-related support tickets. With Parallel Data Warehouse, AMD has reduced support work to just a few hours a week.
AMD runs an average of 1,500 loads per day, and data loads to a given table range from four-minute to four-hour intervals. AMD averages about 500,000 file loads a day.

Benefits
"We used to worry about backlogs, but no more."
- Rajarao Chitturi, Database and Applications Manager at AMD

AMD Business Challenges
Obstacles With SMP Oracle
- Only supported 6-month data retention
- Issues loading concurrently with high query volume
Load Demand
- Loading data always lagged behind by days
- Analysts couldn't access recent data
- Continuous data loads throughout the day while users were querying the system
Linux Based Reporting
- Custom reporting tools hosted on Linux use JDBC and ODBC drivers

Project Overview
- Wafer Quality Assurance Data: 42 TB on PDW
- Space Saving PDW "Index Lite" Approach: Oracle required excessive non-clustered indexes to get any performance
- Improved Loading Speed: 660 GB/hr throughput
- 10,000 to 13,000 Analytic Queries per Day: most are scan intensive
- Faster Backups: complete in 1 to 2 hours per database, compared to a week on Oracle
- Reduced Support Costs by 90%: no more chopping up queries to fit the data warehouse
Slide labels: Critical, Save Time, Query, Save Space, Load Speed, Save Costs
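
To put the AMD figures above in proportion, here is a small back-of-the-envelope calculation in Python. The inputs (42 TB, 660 GB/hr, 10,000 to 13,000 queries a day) come from the slides; treating 1 TB as 1,000 GB, and treating 660 GB/hr as a rate that could be sustained around the clock, are assumptions made only for this estimate.

```python
LOAD_RATE_GB_PER_HR = 660                    # from the Project Overview slide
DATA_TB = 42                                 # wafer quality assurance data on PDW
QUERIES_LOW, QUERIES_HIGH = 10_000, 13_000   # analytic queries per day

GB_PER_TB = 1000        # assumption: decimal terabytes
MINUTES_PER_DAY = 24 * 60

# Time to load the whole 42 TB once at the quoted throughput.
hours_to_load_all = DATA_TB * GB_PER_TB / LOAD_RATE_GB_PER_HR

# Volume that could be loaded in a day if the rate were sustained (assumption).
tb_per_day = LOAD_RATE_GB_PER_HR * 24 / GB_PER_TB

# Average query arrival rate implied by the daily totals.
queries_per_minute = (
    QUERIES_LOW / MINUTES_PER_DAY,
    QUERIES_HIGH / MINUTES_PER_DAY,
)

print(f"full reload: about {hours_to_load_all:.1f} hours")
print(f"sustained load capacity: about {tb_per_day:.2f} TB/day")
print(f"queries per minute: {queries_per_minute[0]:.2f} to {queries_per_minute[1]:.2f}")
```

These are averages only. The slides note that loads run continuously while users query the system, so actual peaks for both loading and querying would be higher than the figures this estimate produces.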

Parallel Data Warehouse 2012
- Simplified loading via direct select/query from the HDFS file system (removing Map/Reduce jobs)
- Simplified analytic user access by allowing both Hadoop and relational data to be queried together from PDW
- Increased the number of users who can leverage the data stored on Hadoop by removing the need to learn new languages and access processes (Map/Reduce) and replacing them with familiar SQL queries

Other PDW Sessions
Online Advertising: Hybrid Approach to Large-Scale Data Analysis (DAV-303-M)
Data Analytics and Visualization, Breakout Session (60 minutes)
Fri April 12, 2013, 2:45 PM - 3:45 PM in Sheraton 3
Anna Skobodzinski, Christian Bonilla, Dmitri Tchikatilov, Trevor Attridge

Thank you!