
PDW Architecture Gets Real: Customer Implementations

Brian Walker | Microsoft Corporation | PDW Center of Excellence

Murshed Zaman | Microsoft Corporation | SQL Customer Advisory Team

April 10-12, Chicago, IL

Please silence cell phones

Agenda
- Introduction to PDW and How It Works
- Demos: PDW xVelocity; Reporting on Structured/Unstructured Data
- Detail: Highlight Current Customer Use Cases
- Future

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 4/8/2013

How does it work?

Introducing Parallel Data Warehouse

Pre-Built Hardware + Software Appliance

Co-engineered with HP and Dell

Pre-built Hardware

Pre-installed Software

Appliance installed in 1-2 days

Support: Microsoft provides first-call support; the hardware partner provides onsite break/fix support

Appliance Experience

Plug and Play

Built-in Best Practices

Save Time

The Power of PDW

Massively Parallel Processing (MPP): uses many separate CPUs running in parallel to execute a single query

Each CPU has its own memory

Dedicated InfiniBand network communications between servers

Symmetric Multi-Processing (SMP): multiple CPUs used to complete individual processes simultaneously

All CPUs share the same memory and disks

Network controllers share bandwidth
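To make the MPP idea concrete (each CPU works only on the data in its own memory, and the partial results are then combined), here is a minimal, hypothetical Python sketch. It is not PDW code: the function names, the round-robin split, and the eight-node default (echoing the 8 compute nodes of the basic full rack) are illustrative assumptions, and threads stand in for separate servers, so it models the data flow rather than real parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Split rows round-robin so each hypothetical node owns a disjoint slice."""
    slices = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        slices[i % num_nodes].append(row)
    return slices


def partial_sum_and_count(slice_rows):
    """Runs 'on one node': aggregates only the rows that node owns."""
    total = sum(amount for _, amount in slice_rows)
    return total, len(slice_rows)


def mpp_average(rows, num_nodes=8):
    """AVG(amount) computed as per-node (SUM, COUNT) partials, then combined."""
    slices = partition(rows, num_nodes)
    # Threads stand in for separate servers; each works on its own slice only.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(partial_sum_and_count, slices))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical (store_id, amount) rows.
    sales = [(i % 5, float(i)) for i in range(1, 101)]
    print(mpp_average(sales))  # 50.5
```

Each node returns a (SUM, COUNT) pair rather than a local average, because slices can differ in size and an average of averages would be wrong; the coordinator only has to combine a handful of small partial results.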

The Basic Full Rack

1 rack

InfiniBand & Ethernet

128 cores on 8 compute nodes

2 TB of RAM on compute

Up to 168 TB of tempdb

Up to 1 PB of user data

Reduce hardware footprint by virtualizing the entire control server rack down to a few nodes

1.5x lower price/TB, providing one of the lowest prices per TB in the industry

Save up to 70% of storage with up to ~15x compression via the xVelocity columnstore

Resilient, scalable, and high-performance storage features in Windows Server 2012 replace the SAN with high-density, low-cost SAS JBODs

70% more disk I/O bandwidth than SQL Server PDW 2008 R2

SQL Server PDW 2012

Dimensional Model

Date Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day

Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size

Item Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc

Sls Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold

Promo Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End

[Figure: PDW data layout across compute nodes. Each node holds a copy of the dimension tables (D, S, I, P) and one slice of the fact table (F1 to F5).]
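The layout in this figure (dimension tables copied to every compute node, fact table spread across nodes) can be modeled with a small hypothetical sketch. It assumes a four-node appliance, a blake2b-based hash, and Sls Fact distributed on a store ID column; none of these reflect PDW internals. It shows why the layout avoids data movement: each node joins its own fact rows to its local copy of the dimension, and only per-node totals are combined.

```python
import hashlib

NUM_NODES = 4  # hypothetical appliance size for the example


def node_for(value, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node with a stable hash.

    blake2b is an illustrative choice, not PDW's actual hash function.
    """
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_nodes


def build_layout(fact_rows, dist_col, dimension, num_nodes=NUM_NODES):
    """Replicate the dimension to every node; hash-distribute the fact rows."""
    nodes = [{"fact": [], "dim": dict(dimension)} for _ in range(num_nodes)]
    for row in fact_rows:
        nodes[node_for(row[dist_col], num_nodes)]["fact"].append(row)
    return nodes


def local_join_totals(node):
    """Join one node's fact slice to its local dimension copy, then aggregate."""
    totals = {}
    for row in node["fact"]:
        category = node["dim"][row["prod_id"]]
        totals[category] = totals.get(category, 0) + row["dollars"]
    return totals


def dollars_by_category(nodes):
    """Combine per-node partial totals; no fact rows leave their node."""
    combined = {}
    for node in nodes:
        for category, dollars in local_join_totals(node).items():
            combined[category] = combined.get(category, 0) + dollars
    return combined


if __name__ == "__main__":
    item_dim = {1: "Produce", 2: "Bakery", 3: "Dairy"}  # prod_id -> category
    sls_fact = [
        {"store_id": 10, "prod_id": 1, "dollars": 5},
        {"store_id": 11, "prod_id": 2, "dollars": 3},
        {"store_id": 12, "prod_id": 1, "dollars": 7},
        {"store_id": 13, "prod_id": 3, "dollars": 4},
    ]
    layout = build_layout(sls_fact, "store_id", item_dim)
    print(dollars_by_category(layout))
```

Whatever nodes the rows land on, the combined totals come out as Produce 12, Bakery 3, Dairy 4. The trade-off is storage: every node keeps a full copy of each replicated table, which is why replication suits small dimension tables and distribution suits large fact tables.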

Table distribution example:
- Uses both kinds of tables, replicated and distributed, for co-location purposes
- Dimension tables are typically replicated
- The appliance maintains data integrity across all nodes
- Fact tables are distributed on a single column
- The distribution column should be chosen based on the data model and how the data is used
- Choose a column with high cardinality and low variance, so rows spread evenly across the nodes

Seamlessly Add Capacity

Smallest (53 TB) to largest (6 PB)

Start small with a warehouse of a few terabytes

Add capacity up to 6 petabytes

Start Small and Grow

Start small; scale out linearly
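Adding capacity only pays off if a distributed table's rows spread evenly, which is why the table distribution guidance asks for a column with high cardinality and low variance. The hypothetical sketch below compares two candidate columns by hashing their values into a fixed number of buckets and reporting skew as the largest bucket divided by the average bucket. The bucket count, the blake2b hash, and the example columns are assumptions for illustration, not PDW settings.

```python
import hashlib
from collections import Counter


def bucket_of(value, num_buckets):
    """Stable hash of a value into one of num_buckets (illustrative hash)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def skew_ratio(values, num_buckets):
    """Largest bucket size / mean bucket size; 1.0 means perfectly even."""
    values = list(values)
    if not values:
        return 0.0
    counts = Counter(bucket_of(v, num_buckets) for v in values)
    mean = len(values) / num_buckets
    return max(counts.values()) / mean


if __name__ == "__main__":
    rows = 10_000
    customer_ids = range(rows)          # high cardinality
    genders = ["F", "M"] * (rows // 2)  # only two distinct values
    for name, column in [("customer_id", customer_ids), ("gender", genders)]:
        print(f"{name}: skew {skew_ratio(column, 16):.2f}")
```

With only two distinct values, at most two of the 16 buckets receive any rows, so the skew is at least 8 no matter how good the hash is; the high-cardinality column comes out close to 1.0, and adding more buckets (nodes) keeps helping it.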

Any Size: Next-Gen Performance

Columnstore Provides Dramatic Performance

Updatable and clustered xVelocity columnstore

Stores data in columnar format

Memory-optimized for next-generation performance

Updatable to support bulk and/or trickle loading

Up to 50x faster

Up to 15x compression

Save time and costs

Batch Processing
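The columnar storage and compression points above rest on a simple observation: values within one column are often repetitive, so a column compresses far better than mixed rows do. The sketch below illustrates two encodings commonly used by columnstores, dictionary encoding and run-length encoding, in plain Python. It is an illustration under that assumption, not the xVelocity format, and it says nothing about the specific 15x figure; the function names and sample rows are hypothetical.

```python
def to_columns(rows, column_names):
    """Pivot row tuples into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def dictionary_encode(values):
    """Replace each value with a small integer id into a dictionary."""
    dictionary, ids = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return list(dictionary), ids


def run_length_encode(ids):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def decode(dictionary, runs):
    """Expand runs back to the original column values."""
    return [dictionary[v] for v, n in runs for _ in range(n)]


if __name__ == "__main__":
    rows = [("Contoso", "US")] * 6 + [("Fabrikam", "CA")] * 3 + [("Contoso", "US")] * 2
    columns = to_columns(rows, ["customer", "country"])
    dictionary, ids = dictionary_encode(columns["country"])
    runs = run_length_encode(ids)
    print(dictionary, runs)  # ['US', 'CA'] [[0, 6], [1, 3], [0, 2]]
    assert decode(dictionary, runs) == columns["country"]
```

Eleven country values become three runs over a two-entry dictionary. Runs get longer when similar rows are stored together, and a query that touches only the country column never has to read the customer column at all.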

xVelocity: Fast Data Query Processing

[Figure: xVelocity query processing over Customer, Sales, Country, Supplier, and Products data]

Demo: xVelocity

The Power of Updatable Columnstore Indexing on PDW 2012

Any Data: Hadoop Integration

External tables and full SQL query access to data stored in HDFS

HDFS bridge for direct and fully parallelized access to data in HDFS

On-the-fly joins of PDW data with data from HDFS

Parallel import of data from HDFS into PDW tables for persistent storage

Parallel export of PDW data into HDFS, including round-tripping of data

Polybase Details

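[Figure: Polybase in PDW 2012. Regular T-SQL is submitted to the enhanced PDW query engine, which combines structured data in PDW with unstructured data on HDFS data nodes, read through an external table and the HDFS bridge, and returns the results.]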
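The external-table idea behind Polybase (a schema applied to files in HDFS at query time, then joined with PDW tables) can be illustrated with a toy Python sketch. It does not use Polybase, HDFS, or any Hadoop API: the delimited lines are an in-memory stand-in for HDFS files, and the names `scan_external` and `join_on_the_fly`, the pipe delimiter, and the sample data are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List

ExternalSchema = Dict[str, Callable[[str], object]]


def scan_external(lines: List[str], schema: ExternalSchema, delimiter: str = "|") -> Iterator[dict]:
    """Parse delimited lines into typed rows at read time (schema-on-read).

    Note: zip() silently ignores extra fields or drops missing ones.
    """
    names = list(schema)
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    for fields in reader:
        yield {name: schema[name](raw) for name, raw in zip(names, fields)}


def join_on_the_fly(relational_rows, external_rows, key):
    """Hash join: build on the relational side, probe with external rows."""
    build = {}
    for row in relational_rows:
        build.setdefault(row[key], []).append(row)
    for ext in external_rows:
        for rel in build.get(ext[key], []):
            yield {**rel, **ext}


if __name__ == "__main__":
    customers = [  # stand-in for a PDW table
        {"customer_id": 1, "name": "Contoso"},
        {"customer_id": 2, "name": "Fabrikam"},
    ]
    clicks = ["1|/home|3", "2|/cart|1", "1|/checkout|2"]  # stand-in for an HDFS file
    schema = {"customer_id": int, "page": str, "seconds": int}
    for row in join_on_the_fly(customers, scan_external(clicks, schema), "customer_id"):
        print(row)
```

The schema is applied only as lines are read, so the underlying files never have to be loaded into a relational table before they can be joined, which is the property these slides highlight.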

Existing Excel Skillset with Big Data

[Figure: Excel reporting over Hadoop data and structured data]

Familiar tools analyze big data

Native Microsoft BI integration with PDW

Structured and unstructured data in the same spreadsheet

Widely adopted and familiar user tools

No IT intervention

Analyze all data types

High adoption of Excel

Familiar Tools to Analyze Structured/Unstructured Data

Demo: PDW 2012 Polybase

Simultaneous Reporting from Structured and Unstructured Data

Current Customer Use Cases

A large US grocery store chain needed an MPP data warehouse to improve performance and scale, and to provide timely data to its executives and analysts.

PDW will scale to meet future growth and support more functional areas at Hy-Vee.

PDW offered a 100x query performance gain over conventional SQL Server, faster data loads, and more scale, holding 7 years of purchasing data instead of 2.

Benefits

"Basic queries that previously took 20 minutes only took seconds using the SQL Server 2008 R2 Parallel Data Warehouse." - Tom Settle, Assistant VP, Data Warehousing, Hy-Vee

Upgrading to PDW Gains 100x Improvement

Business Objectives
- Provide a broader range of critical customer purchasing data: the current system only supported 2 years of data, but the business required 7 years (Critical)
- Enable self-service reporting with SSAS/SSRS/SharePoint/Excel (Save Time)
- Enable user ad hoc reporting, leveraging Excel/SharePoint query
- Improve performance of complex transformations for faster delivery of data within specified SLAs (Load Speed)
- Reduce IT costs: creating self-sufficient end users frees IT to focus on delivering new data (Save Costs)
- Provide a solution that scales to meet future data needs: expansion of history, point-of-sale detail, and expansion into social media (Scale)

Shift from ETL to ELT

Moved their complex transformations and calculations from the ETL server to SQL Server Parallel Data Warehouse

PDW has allowed Hy-Vee to create an enterprise data warehouse centralizing data from many sources

Archiving point-of-sale source files for later data extraction

Using the Power of MPP: Complex Transformations

Upgrade to PDW 2012

Improves their opportunity to further analyze social media data

Query data without having to move it into a relational database

Provides an alternative archive solution for point-of-sale data

Future Option


Data Archive Challenge: Financial Customer

[Figure: Reporting Services, centralized EDW, and archive servers]

The business only actively analyzes a rolling 12 months of data

Regulations require that data remain online and accessible for an extended period

Data older than 12 months is pushed to a farm of SQL Server machines to meet regulatory requirements

Current Solution
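The current solution's rule (keep a rolling 12 months in the EDW and push older rows to the archive servers) reduces to a date cutoff. A minimal sketch, assuming each row carries a hypothetical `txn_date` field and that the window is measured in calendar months:

```python
import calendar
from datetime import date


def months_back(today: date, months: int) -> date:
    """Same day-of-month `months` earlier, clamped to that month's last day."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def split_active_archive(rows, today: date, window_months: int = 12):
    """Rows on or after the cutoff stay active; older rows go to the archive."""
    cutoff = months_back(today, window_months)
    active = [r for r in rows if r["txn_date"] >= cutoff]
    archive = [r for r in rows if r["txn_date"] < cutoff]
    return active, archive


if __name__ == "__main__":
    rows = [
        {"id": 1, "txn_date": date(2013, 1, 15)},
        {"id": 2, "txn_date": date(2012, 4, 8)},
        {"id": 3, "txn_date": date(2012, 4, 7)},
    ]
    active, archive = split_active_archive(rows, today=date(2013, 4, 8))
    print([r["id"] for r in active], [r["id"] for r in archive])  # [1, 2] [3]
```

Clamping to the month's last day keeps dates such as 29 February from failing when the window crosses into a shorter month. In the future solution the archive side of this split moves to the Hadoop cluster, but the cutoff rule itself is unchanged.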


Data Archive Challenge: Financial Customer

[Figure: Reporting Services, centralized EDW, archive servers, and HDFS data nodes holding unstructured data, connected through the HDFS bridge]

Replace archive farm with Hadoop cluster

PDW provides single point of access

Allows analysts to leverage existing SQL skills

Much lower maintenance and administration

Meets regulatory requirements

Future Solution

AMD is also processing more reporting queries than it previously could, between 10,000 and 13,000 a day, with an average runtime of a few seconds and virtually no performance issues.

Because of user complaints about the previous system, the data warehouse team had one employee devoted full time to addressing performance-related support tickets. With Parallel Data Warehouse, AMD has reduced support work to just a few hours a week.

AMD runs an average of 1,500 loads per day, and data loads to a given table occur at intervals ranging from four minutes to four hours. AMD averages about 500,000 file loads a day.

Benefits

"We used to worry about backlogs, but no more."

- Rajarao Chitturi, Database and Applications Manager at AMD

AMD Boosts Performance with PDW

AMD Business Challenges

Only supported 6-month data retention

Issues loading concurrently with high query volume

Obstacles with SMP Oracle

Loading data always lagged behind by days

Analysts couldn't access recent data

Continuous data loads throughout the day while users were querying the system (Load Demand)

Custom reporting tools hosted on Linux use JDBC and ODBC drivers (Linux-Based Reporting)

Project Overview
- Wafer quality assurance data: 42 TB on PDW (Critical)
- Space-saving PDW "index lite" approach: Oracle required excessive non-clustered indexes to get any performance (Save Space)
- Improved loading speed: 660 GB/hr throughput (Load Speed)
- 10,000 to 13,000 analytic queries per day, most of them scan-intensive (Query)
- Faster backups: 1 to 2 hours per database, compared to a week on Oracle (Save Time)
- Support costs reduced by 90%: no more chopping up queries to fit the data warehouse (Save Costs)

Parallel Data Warehouse 2012

Simplified loading via direct select/query from the HDFS file system (removing MapReduce jobs)

Simplified analytic user access by allowing both Hadoop and relational data to be queried together from PDW

Increased the number of users who can leverage the data stored on Hadoop by removing the need to learn new languages and access processes (MapReduce), replacing them with familiar SQL queries

Other PDW Sessions

Online Advertising: Hybrid Approach to Large-Scale Data Analysis (DAV-303-M)

Data Analytics and Visualization

Breakout Session (60 minutes)

Fri April 12, 2013, 2:45 PM - 3:45 PM in Sheraton 3

Anna Skobodzinski, Christian Bonilla, Dmitri Tchikatilov, Trevor Attridge

Win a Microsoft Surface Pro! Complete an online session evaluation to be entered into the draw.

Draw closes April 12, 11:59pm CT

Winners will be announced on the PASS BA Conference website and on Twitter.

Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue.

Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.

Thank you!
