SQL Server 2012: Beyond Relational Performance and Scale
Description: Pragmatic Works SQL Server 2012 webinar presentation
Transcript
Beyond Relational Performance and Scale in SQL Server 2012
Michael Rys, Principal Program Manager (@SQLServerMike)
My favorite Beyond Relational Application
Structured and unstructured Search
Related/“Semantic” Search
Beyond Relational Data
Building and Maintaining Applications with relational and non-relational data is hard
Complex integration
Duplicated functionality
Compensation for unavailable services
Pain Points
Goals
Reduce the cost of managing all data
Simplify the development of applications over all data
Provide management and programming services for all data
What is the Beyond Relational Mission?
Efficient storage for all data
Tables, XML, Spatial, Documents, Digital Media, Scientific Records, Factoids…
Rich Data Processing Capabilities for all applications
Data formats and content natively understood for a rich application and user experience
Consistent Application Model and Data Constructs to ease application development, migration and long-term retention
Rich Capabilities and Services over all data
Provide rich services, e.g.,
Query and reason over data and extracted semantics
Search across the structural impedance of different data formats
Integrated backup/restore for all data
Beyond Relational Story
Structured Data
Query
T-SQL
B-trees
Manageability
Availability
Files
Programmability
Beyond Relational Story
Structured Data
Query
T-SQL
B-trees
Manageability
Availability
Files
Programmability
Unstructured Data
Search
Beyond Relational Story
Structured Data
Query and Type Operations
T-SQL/Data Types
B-trees
Manageability
Availability
Files
Programmability
Unstructured Data
Search
Filestream
Win32
Semi-structured Data/XML
XML, FTS, Spatial Indices
XQuery, Spatial ops
Spatial, XML, HierarchyID
Beyond Relational Story
Structured Data
Query and Type Operations
T-SQL/Data Types
B-trees
Manageability & Availability
Programmability
Unstructured Data
Search
Win32
Semi-structured Data/XML
Semantic
Platform
Efficient Storage for BR Data
Rich Query and Search Services over all Data
Rich Data Programming Capabilities
Files
Filestream
XML, FTS, Spatial Indices
XQuery, Spatial ops
Spatial, XML, HierarchyID
Beyond Relational in SQL Server 2012
Address important customer requests for capabilities and rich services for Rich Unstructured Data (RUDS)
Scale up for storage and search
Easy use of and access to unstructured data from all applications
Rich insight into unstructured data to make better decisions
We deliver what you asked for to build Spatial-aware Applications
Advanced 2D Spatial
Make Spatial pervasive across the platform
Improve performance and scale
Service Broker Message Broadcast
Rich Unstructured Data Performance and Scale
Scale up storage and search to 100M to 500M documents
Multiple containers for FILESTREAM scale-up
Improved scale-up for search
Rich Unstructured Data & Services Ecosystem
[Diagram: Rich Unstructured Data & Services Ecosystem]
Rich Services: Fulltext Search, Semantic Similarity Search
Scale-up Solutions: Multiple Containers (one database spread over Disk1, Disk2, Disk3)
Database Applications: Transactional Access to BLOBs stored as DB FileStreams; Integrated Backup/Replication/AlwaysOn; Integrated Administration
Windows Apps: SMB Share Files/Folders; FileStream API; Streaming Win32 Access
Customer Application: SQL RBS API with Azure lib, Centera lib and SQL FILESTREAM lib (Remote BLOB Storage on Azure, Centera, SQL DB)
FileStreams and FileTable serving SQL Apps and Windows Apps
FILESTREAM Storage
Attribute on VARBINARY(MAX)
Works with integrated FTS
Unstructured data stored directly in the file system (requires NTFS)
Dual Programming Model
T-SQL (same as SQL BLOB)
Win32 streaming APIs with T-SQL transactional semantics
Data Consistency
Integrated Manageability
Backup/Restore
Administration
Size limit is the file system volume size
SQL Server Security Stack
Store BLOBs in DB + File System
[Diagram: Application, BLOB, DB]
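The dual programming model can be sketched in T-SQL. This is a minimal sketch, assuming FILESTREAM is enabled on the instance and the current database already has a FILESTREAM filegroup; the table dbo.BlobStore and its columns are hypothetical. The same FILESTREAM column is written with ordinary T-SQL, while PathName() and GET_FILESTREAM_TRANSACTION_CONTEXT() return what a Win32 streaming client needs to open the file under the same transaction.

```sql
-- Assumes: FILESTREAM enabled via sp_configure 'filestream access level'
-- and a FILESTREAM filegroup in the current database.
-- dbo.BlobStore is a hypothetical table name.
CREATE TABLE dbo.BlobStore (
    -- FILESTREAM tables need a NOT NULL, UNIQUE ROWGUIDCOL column
    DocId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
    Name    NVARCHAR(260) NOT NULL,
    -- FILESTREAM storage attribute on VARBINARY(MAX)
    Content VARBINARY(MAX) FILESTREAM NULL
);

-- T-SQL access: same syntax as any other BLOB column
INSERT INTO dbo.BlobStore (Name, Content)
VALUES (N'hello.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));

-- Win32 streaming access: return the logical path and the transaction
-- context; a client (for example .NET SqlFileStream) opens the file
-- with both values before this transaction commits.
BEGIN TRANSACTION;
SELECT Content.PathName()                   AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM dbo.BlobStore
WHERE Name = N'hello.txt';
COMMIT TRANSACTION;
```

In a real application the COMMIT happens only after the client has finished streaming, so Win32 reads and writes share the T-SQL transaction.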
FILETABLE Overview
FileTable: A Table of Files/Directories
User-created table with a fixed schema
Contains FILESTREAM data and file attributes
Each row represents a File or a Directory
System defined constraints maintain the tree integrity
File/Directory hierarchy view through a Windows Share
Supports Win32 APIs for File/Directory Management
DB Storage is Transparent to Win32 applications
SMB level of application compatibility
Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FileTable Folder Hierarchy. The instance FILESTREAM share (MSSQLSERVER) contains database directories, e.g. Private Docs (Database1) and Office Docs (Database2); database directories contain FileTable directories such as LogFiles, Documents and Media; each FileTable directory holds a user-defined directory structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]
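A minimal sketch of creating a FileTable like the Documents directory in the hierarchy, assuming FILESTREAM is enabled for Win32 streaming access on the instance and that Database2 (the name is taken from the diagram) already has a FILESTREAM filegroup. DIRECTORY_NAME sets the database-level directory under the instance share and FILETABLE_DIRECTORY names the FileTable's own directory; the table name dbo.Documents is an assumption.

```sql
-- Assumes FILESTREAM Win32 streaming access is enabled on the instance
-- and Database2 (name from the diagram) has a FILESTREAM filegroup.
-- Database-level directory under \\<machine>\<instance share>
ALTER DATABASE Database2
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'Office Docs');
GO
USE Database2;
GO
-- A FileTable has a fixed, system-defined schema: no column list
CREATE TABLE dbo.Documents AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'Documents',
      FILETABLE_COLLATE_FILENAME = database_default);
GO
-- Each row is a file or a directory; the UNC path comes from file_stream
SELECT name, is_directory, cached_file_size,
       file_stream.GetFileNamespacePath(1) AS full_path
FROM dbo.Documents;
```

Files copied into \\my_machine\MSSQLSERVER\Office Docs\Documents with ordinary Win32 tools then show up as rows in dbo.Documents.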
Some FILESTREAM/FileTable performance tips
Reading bigger buffers gives better performance
Volumes hosting FILESTREAM/FILETABLE data should have 8.3 name generation and LastAccessTime disabled
FILESTREAM/FILETABLE containers should reside on dedicated volumes
Have one volume per FILESTREAM/FILETABLE container; this enables space management at the volume level
“Magic” SMB buffer size = ~60 KB
Another “good” value is 480 KB
ROWGUID unique index for aligned partitioning for FILESTREAM
AntiVirus programs should be configured not to delete infected files but to quarantine them
If using compressed volumes, use cluster size 4 KB
[Chart: FILESTREAM Read Performance (Remote). Throughput (Mbps) by buffer size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for Filestream Win32 (file system) access, Filestream T-SQL and Varbinary, plus the file system Win32 access gain (%).]
Measured with SQL Server 2008
[Chart: FILESTREAM Write Performance (Remote). Insert throughput (Mbps) by buffer size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB) for Filestream Win32 (file system) access, Filestream T-SQL and Varbinary, plus the file system Win32 access gain (%).]
Measured with SQL Server 2008
Unstructured Data Scale-up: Multiple Containers for FILESTREAM Data
SQL Server 2008 R2: only one storage container per FILESTREAM filegroup
Limits storage capacity scaling and I/O scaling
SQL Server 2012: support for multiple storage containers per filegroup
DDL changes to CREATE/ALTER DATABASE statements
Ability to set MAXSIZE for the containers
DBCC SHRINKFILE EMPTYFILE support
Scaling Flexibility
Storage scaling by adding additional storage drives
I/O scaling with multiple spindles
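A minimal sketch of the SQL Server 2012 DDL for several containers in one FILESTREAM filegroup. The database name, logical file names, drive letters and sizes are all hypothetical; each container sits on its own volume, in line with the dedicated-volume tip above, and MAXSIZE caps each container.

```sql
-- Hypothetical paths: the parent folders must exist, and the last folder
-- of each FILESTREAM FILENAME must not exist yet (SQL Server creates it).
CREATE DATABASE MediaDB
ON PRIMARY
    (NAME = MediaDB_data, FILENAME = N'C:\SQLData\MediaDB.mdf'),
FILEGROUP MediaFS CONTAINS FILESTREAM
    (NAME = MediaFS_1, FILENAME = N'D:\FSData\MediaFS_1', MAXSIZE = 500GB),
    (NAME = MediaFS_2, FILENAME = N'E:\FSData\MediaFS_2', MAXSIZE = 500GB)
LOG ON
    (NAME = MediaDB_log, FILENAME = N'C:\SQLLogs\MediaDB_log.ldf');
GO

-- Storage scaling: add another container on a new drive later
ALTER DATABASE MediaDB
ADD FILE (NAME = MediaFS_3, FILENAME = N'F:\FSData\MediaFS_3', MAXSIZE = UNLIMITED)
TO FILEGROUP MediaFS;
GO

-- Move FILESTREAM data off one container (e.g. before retiring its drive)
USE MediaDB;
DBCC SHRINKFILE (N'MediaFS_1', EMPTYFILE);
```

Spreading containers across spindles is what gives the I/O scaling; MAXSIZE keeps any one volume from filling up.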
Unstructured Data: Multiple Containers
Use of multiple spindles for achieving better I/O Scalability
RUDS Scale-up: FILESTREAM Performance/Scale
Improved performance of T-SQL and file I/O access
Various enhancements to improve read/write throughput
5-fold increase in read throughput
Linear scaling with a large number of concurrent threads
Full-Text Search Improvements in SQL Server 2012
Improved Performance and Scale:
Scale-up to 350M documents
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for the corpus
On par with or better than the main database search competitors
New Functionality:
Property Search
Customizable NEAR
New word breakers: existing word breakers updated; Czech and Greek added
Innovation in Search: Semantic Similarity Search
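A minimal sketch of the two new query features, customizable NEAR and property search. The table dbo.Docs, its columns and the catalog name are hypothetical; the GUID and integer ID identify the standard Windows System.Title property. Property search needs documents stored as VARBINARY(MAX) with a TYPE COLUMN so the right filter extracts their properties.

```sql
-- Hypothetical table and catalog for illustration
CREATE FULLTEXT CATALOG DocsCatalog;
CREATE TABLE dbo.Docs (
    DocId   INT            NOT NULL CONSTRAINT PK_Docs PRIMARY KEY,
    FileExt NVARCHAR(8)    NOT NULL,  -- e.g. N'.docx'; selects the document filter
    Body    VARBINARY(MAX) NOT NULL
);
CREATE FULLTEXT INDEX ON dbo.Docs (Body TYPE COLUMN FileExt LANGUAGE 1033)
    KEY INDEX PK_Docs ON DocsCatalog;

-- Customizable NEAR: both terms within 5 tokens, in the given order (TRUE)
SELECT DocId
FROM dbo.Docs
WHERE CONTAINS(Body, N'NEAR((replication, latency), 5, TRUE)');

-- Property search: register the document property and attach the list
CREATE SEARCH PROPERTY LIST DocProps;
ALTER SEARCH PROPERTY LIST DocProps
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 2,
          PROPERTY_DESCRIPTION = 'System.Title - title of the document');
-- Attaching the list triggers a repopulation of the full-text index
ALTER FULLTEXT INDEX ON dbo.Docs SET SEARCH PROPERTY LIST DocProps;

-- Match only inside the Title property of the stored documents
SELECT DocId
FROM dbo.Docs
WHERE CONTAINS(PROPERTY(Body, 'Title'), N'performance');
```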
Full-Text Search Performance & Scale Improvements
Architectural Improvements
Improved internal implementation
Queries no longer block Index updates
Improved Query Plans: Better Plans for common queries
Fulltext predicate folding
Parallel Plan execution
Index and query tested at scale up to 350 million documents with < ~2 sec response time
~3x better throughput without DML and ~9x better throughput with DML
Scale easily with increasing number of connections
Scale-up: Full-Text Search
Queries over a 350M-document database with random DML running in the background. SQL Server 2012 beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput.
[Chart: SQL Server 2012 vs. SQL Server 2005/2008]
Scale-up: Full-Text Search
Average query execution time (ms) for 50 to 2,000 concurrent connections (users) in a customer playback benchmark
[Chart: SQL Server 2012 vs. SQL Server 2005/2008]
Performance and Scale for Spatial Applications
Support for persisted computed spatial columns
New geodetic SRID for faster calculations
Improved implementation of operations
Faster spatial index creation for point data (4 to 5 times faster)
Faster point data queries
Optimized STBuffer, lower memory footprint
Faster “secondary” filter step
Improved default spatial indexing scheme and new hints
AutoGrid
Query window grid density hint
Spatial Index Compression
Improved index-aware query plans
Nearest Neighbor
Optimized spatial query plan for STDistance and STIntersects-like queries
Support Persisted Computed Columns
Convert 2 columns (latitude, longitude) to geography:
ALTER TABLE MyTable
ADD geo AS (geography::Point(lat, lon, 4326)) PERSISTED;
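A self-contained sketch of the same conversion on a hypothetical dbo.GeoPoints table. It assumes, as the slide list suggests, that SQL Server 2012 can build a spatial index on a persisted computed spatial column; the clustered primary key is there because spatial indexes require one.

```sql
CREATE TABLE dbo.GeoPoints (
    id  INT   NOT NULL PRIMARY KEY,  -- spatial indexes need a clustered PK
    lat FLOAT NOT NULL,
    lon FLOAT NOT NULL,
    -- geography::Point takes (latitude, longitude, SRID)
    geo AS (geography::Point(lat, lon, 4326)) PERSISTED
);

-- Assumes SQL Server 2012 support for indexing persisted computed columns
CREATE SPATIAL INDEX six_GeoPoints_geo
    ON dbo.GeoPoints(geo)
    USING GEOGRAPHY_AUTO_GRID;

INSERT INTO dbo.GeoPoints (id, lat, lon) VALUES (1, 47.6062, -122.3321);

-- The computed value carries the original coordinates and SRID
SELECT id, geo.Lat AS lat, geo.Long AS lon, geo.STSrid AS srid
FROM dbo.GeoPoints;
```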
Spatial Reference ID (SRID)
Each spatial object has an associated SRID
SRID is the “locale” for spatial objects
Determines:
Coordinate system
Measurements
Projection semantics
Geoid dimensions
Only objects with the same SRID can be combined in operations
SRID for GEOMETRY (default: 0)
User-defined, no impact on operational semantics
SRID for GEOGRAPHY (default: 4326, WGS 84)
Impacts operational semantics
390 predefined SRIDs based on the European Petroleum Survey Group (EPSG) list:
SELECT * FROM sys.spatial_reference_systems;
SQL Server 2012: we added the Microsoft-specified Unit Sphere SRID 104001 for a spherical globe!
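A small sketch of why the SRID matters: distances are computed in the unit of the SRID, and mixing SRIDs yields NULL rather than an error. The coordinates (Seattle and New York) are only illustrative.

```sql
-- Two locations in WGS 84 (SRID 4326); distances come back in meters
DECLARE @seattle geography = geography::Point(47.6062, -122.3321, 4326);
DECLARE @nyc     geography = geography::Point(40.7128,  -74.0060, 4326);
-- Same location on the SQL Server 2012 unit sphere (SRID 104001)
DECLARE @nycUnit geography = geography::Point(40.7128,  -74.0060, 104001);

SELECT @seattle.STDistance(@nyc)     AS wgs84_distance_m,  -- meters
       @seattle.STDistance(@nycUnit) AS mixed_srid;        -- NULL: SRIDs differ

-- Look up the two SRIDs used above
SELECT spatial_reference_id, unit_of_measure, well_known_text
FROM sys.spatial_reference_systems
WHERE spatial_reference_id IN (4326, 104001);
```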
Spatial Indexing Basics
In general, split predicates in two:
Primary filter finds all candidates, possibly with false positives (but never false negatives)
Secondary filter removes false positives
The index provides our primary filter
The original predicate is our secondary filter
Some tweaks to this scheme:
Sometimes possible to skip secondary filter
[Diagram: candidate objects A to E passing through the Primary Filter (index lookup) and then the Secondary Filter (original predicate)]
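The two phases can be observed directly. The geometry Filter() method evaluates only the index-based primary filter when a spatial index is used, so it may return false positives, while STIntersects() also applies the secondary filter. A sketch on a hypothetical dbo.Shapes table with three points:

```sql
CREATE TABLE dbo.Shapes (
    id    INT      NOT NULL PRIMARY KEY,
    shape GEOMETRY NOT NULL
);
INSERT INTO dbo.Shapes (id, shape) VALUES
    (1, geometry::STGeomFromText('POINT(10 10)', 0)),
    (2, geometry::STGeomFromText('POINT(11 11)', 0)),
    (3, geometry::STGeomFromText('POINT(90 90)', 0));

-- GEOMETRY indexes need an explicit bounding box
CREATE SPATIAL INDEX six_Shapes ON dbo.Shapes(shape)
    USING GEOMETRY_AUTO_GRID
    WITH (BOUNDING_BOX = (0, 0, 100, 100));

DECLARE @window geometry = geometry::STGeomFromText(
    'POLYGON((9 9, 12 9, 12 12, 9 12, 9 9))', 0);

-- Primary filter only: index lookup, may include false positives
SELECT COUNT(*) AS primary_candidates
FROM dbo.Shapes WITH (INDEX(six_Shapes))
WHERE shape.Filter(@window) = 1;

-- Primary + secondary filter: exact answer (points 1 and 2)
SELECT COUNT(*) AS exact_matches
FROM dbo.Shapes WITH (INDEX(six_Shapes))
WHERE shape.STIntersects(@window) = 1;
```

Because the primary filter never produces false negatives, primary_candidates is always at least exact_matches.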
Spatial index tessellation
Better and more continuous coverage
[Figure: tessellation of an object with 64, 128 and 256 cells, showing fully contained cells and partially contained cells]
Auto Grid Spatial Index
New spatial index Tessellations:
GEOMETRY_AUTO_GRID
GEOGRAPHY_AUTO_GRID
Uses 8 grid levels instead of the previous 4
No GRIDS parameter needed (or available)
Fixed at HLLLLLLL
Default number of cells per object:
8 for geometry
12 for geography
More stable performance
for windows of different size
for data with different spatial density
For default values:
Up to 2x faster for longer queries (> 500 ms)
More efficient primary filter
Fewer rows returned
10ms slower for very fast queries < 50 ms
Increased tessellation time which is constant
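A minimal sketch of creating an auto-grid index on a hypothetical dbo.Venues table. No GRIDS option is given because the 8-level tessellation is fixed; CELLS_PER_OBJECT = 12 simply restates the geography default, and the catalog view confirms what was created.

```sql
-- Hypothetical table; spatial indexes require a clustered primary key
CREATE TABLE dbo.Venues (
    id  INT       NOT NULL PRIMARY KEY,
    loc GEOGRAPHY NOT NULL
);

-- Auto grid: no GRIDS option; 12 cells per object is the geography default
CREATE SPATIAL INDEX six_Venues_loc
    ON dbo.Venues(loc)
    USING GEOGRAPHY_AUTO_GRID
    WITH (CELLS_PER_OBJECT = 12);

-- Confirm the tessellation scheme and cells-per-object setting
SELECT t.tessellation_scheme, t.cells_per_object
FROM sys.spatial_index_tessellations AS t
WHERE t.object_id = OBJECT_ID(N'dbo.Venues');
```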
Spatial Index Performance
The new grid gives much more stable performance for query windows of different sizes
Better grid coverage gives fewer high peaks
Demo: Indexing and Performance
Query Window Number of Cells
[Chart: typical spatial query performance vs. number of query window cells]
The optimal value (theoretical) is somewhere between two extremes:
Time needed to process false positives
Default values:
512 for the geometry AUTO grid
768 for the geography AUTO grid
1024 for MANUAL grids
SELECT * FROM MyTable AS t WITH (SPATIAL_WINDOW_MAX_CELLS = 256) WHERE t.geom.STIntersects(@window) = 1;
Query Window Hinting (SQL Server 2012)
• SELECT * FROM MyTable AS t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024) WHERE t.geom.STIntersects(@window) = 1;
• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
• A higher value makes the primary filter phase longer but reduces work in the secondary filter phase
• Set higher for dense spatial data
• Set lower for sparse spatial data
Demo: Query Hinting
Spatial Index Compression
CREATE SPATIAL INDEX idxGeog ON MyTable(geo) USING GEOGRAPHY_GRID WITH (DATA_COMPRESSION = PAGE); -- or DATA_COMPRESSION = ROW
Based on internal tests, with compression:
- Indexes are 40%-50% smaller
- Queries range from 20% faster to 15% slower
- Per-partition compression settings are not supported
Additional Query Processing Support
• Index intersection
• Enables efficient mixing of spatial and non-spatial predicates
• Matching
• New in SQL Server 2012: Nearest Neighbor query
• Distance queries: converted to STIntersects
• Commutativity: a.STIntersects(b) = b.STIntersects(a)
• Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
• Various bounding boxes, granularities
• Outer references as window objects
• Enables spatial join to use one index
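A sketch of two of these matching rules on hypothetical dbo.Stores and dbo.Zones tables: a distance predicate that the index can answer, and a spatial join where the outer row's area serves as the index window. The INDEX hints are there only to make the index use visible in the plan.

```sql
CREATE TABLE dbo.Stores (id INT NOT NULL PRIMARY KEY, loc GEOGRAPHY NOT NULL);
CREATE TABLE dbo.Zones (zone_id INT NOT NULL PRIMARY KEY, area GEOGRAPHY NOT NULL);

INSERT INTO dbo.Stores (id, loc) VALUES
    (1, geography::Point(47.61, -122.33, 4326)),
    (2, geography::Point(47.62, -122.35, 4326)),
    (3, geography::Point(40.71,  -74.01, 4326));
-- One zone: a 5 km buffer (meters, SRID 4326) around a central point
INSERT INTO dbo.Zones (zone_id, area) VALUES
    (10, geography::Point(47.615, -122.34, 4326).STBuffer(5000));

CREATE SPATIAL INDEX six_Stores_loc ON dbo.Stores(loc) USING GEOGRAPHY_AUTO_GRID;

DECLARE @me geography = geography::Point(47.615, -122.34, 4326);

-- Distance query: matched to the spatial index
SELECT id
FROM dbo.Stores WITH (INDEX(six_Stores_loc))
WHERE loc.STDistance(@me) <= 5000;

-- Spatial join: z.area (outer reference) is the window for the index on Stores
SELECT z.zone_id, s.id
FROM dbo.Zones AS z
JOIN dbo.Stores AS s WITH (INDEX(six_Stores_loc))
    ON s.loc.STIntersects(z.area) = 1;
```

Both queries return stores 1 and 2; store 3 is in New York.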
Spatial Nearest Neighbor
Main scenario: give me the closest 5 Italian restaurants
Execution plan:
SQL Server 2008/2008 R2: table scan
SQL Server 2012: uses spatial index
Specific query pattern required:
SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
  AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me);
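A runnable version of that pattern, with a hypothetical dbo.Restaurants table whose columns match the query above and three sample rows. The pieces the optimizer looks for are TOP, STDistance() IS NOT NULL in the WHERE clause, and an ascending ORDER BY on the same STDistance() expression.

```sql
CREATE TABLE dbo.Restaurants (
    id   INT           NOT NULL PRIMARY KEY,
    name NVARCHAR(100) NOT NULL,
    type NVARCHAR(50)  NOT NULL,
    pos  GEOGRAPHY     NOT NULL
);
CREATE SPATIAL INDEX six_Restaurants_pos
    ON dbo.Restaurants(pos) USING GEOGRAPHY_AUTO_GRID;

INSERT INTO dbo.Restaurants (id, name, type, pos) VALUES
    (1, N'Near Trattoria', N'Italian',  geography::Point(47.6101, -122.3421, 4326)),
    (2, N'Far Osteria',    N'Italian',  geography::Point(47.6500, -122.3500, 4326)),
    (3, N'Sushi Bar',      N'Japanese', geography::Point(47.6100, -122.3420, 4326));

DECLARE @me geography = geography::Point(47.6097, -122.3422, 4326);

-- Nearest-neighbor pattern: TOP + IS NOT NULL filter + ORDER BY STDistance
SELECT TOP (5) id, name, pos.STDistance(@me) AS meters
FROM dbo.Restaurants
WHERE type = N'Italian'
  AND pos.STDistance(@me) IS NOT NULL
ORDER BY pos.STDistance(@me);
```

The closest Italian restaurant is id 1, roughly 45 m away; the Japanese restaurant is excluded by the type predicate even though it is nearby.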
Demo: Nearest Neighbor Performance in SQL Server 2012
Nearest Neighbor Performance
NN query vs. the best current workaround (sorting all points within a 10 km radius)
*Average time for NN query is ~236ms
Find the closest 50 business points to a specific location (out of 22 million in total)
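A sketch of the comparison being measured, on a small hypothetical dbo.Businesses table instead of the 22-million-row data set. The workaround needs a guessed radius (10 km here) and sorts everything inside it; the 2012 nearest-neighbor pattern needs no radius.

```sql
CREATE TABLE dbo.Businesses (
    id  INT       NOT NULL PRIMARY KEY,
    pos GEOGRAPHY NOT NULL
);
CREATE SPATIAL INDEX six_Businesses_pos
    ON dbo.Businesses(pos) USING GEOGRAPHY_AUTO_GRID;

INSERT INTO dbo.Businesses (id, pos) VALUES
    (1, geography::Point(47.6101, -122.3422, 4326)),  -- ~45 m away
    (2, geography::Point(47.6500, -122.3422, 4326)),  -- ~4.5 km away
    (3, geography::Point(47.7900, -122.3422, 4326));  -- ~20 km away

DECLARE @me geography = geography::Point(47.6097, -122.3422, 4326);

-- Workaround: index-friendly 10 km radius filter, then sort the survivors
SELECT TOP (50) id, pos.STDistance(@me) AS meters
FROM dbo.Businesses
WHERE pos.STDistance(@me) <= 10000
ORDER BY meters;

-- SQL Server 2012 nearest-neighbor pattern: no radius guess needed
SELECT TOP (50) id, pos.STDistance(@me) AS meters
FROM dbo.Businesses
WHERE pos.STDistance(@me) IS NOT NULL
ORDER BY pos.STDistance(@me);
```

The radius query silently misses business 3, and in dense areas it sorts far more rows than needed. That is the trade-off the NN plan avoids.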
Spatial Tips on Index Settings
Some best practice recommendations (YMMV):
• Start out with the new default tessellation
• Point data: always use HIGH for all 4 levels. CELLS_PER_OBJECT is not relevant in this case.
• Simple, relatively consistent polygons: set all levels to LOW or MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
• High number of CELLS_PER_OBJECT (often 8192 is best)
• Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings which have highly variable sizes: experimentation is needed.
• Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
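A sketch of two of these recommendations as manual-grid indexes. The tables dbo.Sensors and dbo.Roads and the bounding box are hypothetical; as the tips say, treat the settings as starting points to measure, not fixed rules.

```sql
-- Point data: HIGH on all four levels (CELLS_PER_OBJECT left at default)
CREATE TABLE dbo.Sensors (id INT NOT NULL PRIMARY KEY, pt GEOMETRY NOT NULL);
CREATE SPATIAL INDEX six_Sensors_pt ON dbo.Sensors(pt)
    USING GEOMETRY_GRID
    WITH (BOUNDING_BOX = (0, 0, 1000, 1000),
          GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH));

-- Very complex line strings: all levels HIGH and many cells per object
CREATE TABLE dbo.Roads (id INT NOT NULL PRIMARY KEY, route GEOGRAPHY NOT NULL);
CREATE SPATIAL INDEX six_Roads_route ON dbo.Roads(route)
    USING GEOGRAPHY_GRID
    WITH (GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
          CELLS_PER_OBJECT = 8192);  -- 8192 is the maximum allowed value
```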
What to do if my spatial query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or 2012
• Check the query plan for use of the index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-spatial predicate
• Run the index support procedure (sp_help_spatial_geometry_index or sp_help_spatial_geography_index):
• Assess effectiveness of the primary filter (Primary_Filter_Efficiency)
• Assess effectiveness of the internal filter (Internal_Filter_Efficiency)
• Redefine or define a new index with better characteristics
• More appropriate bounding box for GEOMETRY
• Better grid densities
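A minimal sketch of running the geometry index support procedure against a representative query window. The table dbo.DiagShapes, its index and the sample polygon are hypothetical; the procedure's output includes the Primary_Filter_Efficiency and Internal_Filter_Efficiency values named above.

```sql
CREATE TABLE dbo.DiagShapes (id INT NOT NULL PRIMARY KEY, shape GEOMETRY NOT NULL);
INSERT INTO dbo.DiagShapes (id, shape) VALUES
    (1, geometry::STGeomFromText('POINT(10 10)', 0)),
    (2, geometry::STGeomFromText('POINT(50 50)', 0));
CREATE SPATIAL INDEX six_DiagShapes ON dbo.DiagShapes(shape)
    USING GEOMETRY_AUTO_GRID
    WITH (BOUNDING_BOX = (0, 0, 100, 100));

-- Use a window that is typical of the slow query being investigated
DECLARE @sample geometry = geometry::STGeomFromText(
    'POLYGON((0 0, 20 0, 20 20, 0 20, 0 0))', 0);

EXEC sp_help_spatial_geometry_index
     @tabname       = N'dbo.DiagShapes',
     @indexname     = N'six_DiagShapes',
     @verboseoutput = 1,
     @query_sample  = @sample;
```

A low primary filter efficiency suggests a better bounding box or denser grids; compare the output before and after redefining the index.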
Related Content
Some Rich Unstructured Data presentations (with further links):
http://www.slideshare.net/MichaelRys/sql-bits-brruds
http://www.slideshare.net/MichaelRys/filetable-and-semantic-search-in-sql-server-2012
http://www.sqlserverlaunch.com/WW/theater?sid=634
Some Spatial presentations (with further links):
http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial
http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial-indexing
Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
Find Us Later At…
On Twitter: @SQLServerMike, @Spatial_Ed
Blogs: http://sqlblog.com/blogs/michael_rys, http://blogs.msdn.com/b/edkatibah/