SQL Server - Full-Text Search

SQL Server 2008 for Developers UTS Short Course

Uploaded by: peter-gfader

Posted on 29-Nov-2014


Page 1: SQL Server - Full text search

SQL Server 2008 for Developers, UTS Short Course

Page 2: SQL Server - Full text search

Specializes in:
• C# and .NET (no longer Java)
• Testing: automated tests
• Agile, Scrum: Certified Scrum Trainer
• Technology aficionado: Silverlight, ASP.NET, Windows Forms

Peter Gfader

Page 3: SQL Server - Full text search

Course Timetable & Materials: http://www.ssw.com.au/ssw/Events/2010UTSSQL/

Resources: http://sharepoint.ssw.com.au/Training/UTSSQL/

Course Website

Page 4: SQL Server - Full text search

Course Overview

Session | Date | Time | Topic
1 | Tuesday 03-08-2010 | 18:00 - 21:00 | SQL Server 2008 Management Studio
2 | Tuesday 10-08-2010 | 18:00 - 21:00 | T-SQL Enhancements
3 | Tuesday 17-08-2010 | 18:00 - 21:00 | High Availability
4 | Tuesday 24-08-2010 | 18:00 - 21:00 | CLR Integration
5 | Tuesday 31-08-2010 | 18:00 - 21:00 | Full-Text Search

Page 5: SQL Server - Full text search

(Diagram: .NET, .NET FX, CLR)

What we did last week: CLR Integration

Page 6: SQL Server - Full text search

Stored procedures
Functions
Triggers

Bottom line:
• Use T-SQL for all data operations
• Use CLR assemblies for any complex calculations and transformations

What we did last week: CLR Integration

Page 7: SQL Server - Full text search

• Find all products that have a product number starting with BK
• Find all products with "Road" in the name that are silver
• Find a list of products that have no review
• Find the list price ([ListPrice]) of all products in our shop
• What is the sum of the list prices of all our products?
• Find the products with the maximum and minimum list price
• Find a list of products with their sale discount (hint: see Sales.SalesOrderDetail)
• Find the sum of the prices of the products in each subcategory

Homework?
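A possible starting point for some of the homework questions, written as a sketch against the AdventureWorks sample database. The table and column names (Production.Product, Production.ProductReview, ProductNumber, Color, ListPrice, ProductSubcategoryID) come from that sample schema, not from the slides, so check them against your copy.

```sql
USE AdventureWorks;
GO

-- Products whose product number starts with BK
SELECT ProductID, Name, ProductNumber
FROM Production.Product
WHERE ProductNumber LIKE N'BK%';

-- Silver products with "Road" in the name
SELECT ProductID, Name, Color
FROM Production.Product
WHERE Name LIKE N'%Road%'
  AND Color = N'Silver';

-- Products that have no review
SELECT p.ProductID, p.Name
FROM Production.Product AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM Production.ProductReview AS r
    WHERE r.ProductID = p.ProductID
);

-- Sum, maximum and minimum of the list prices
SELECT SUM(ListPrice) AS TotalListPrice,
       MAX(ListPrice) AS MaxListPrice,
       MIN(ListPrice) AS MinListPrice
FROM Production.Product;

-- Sum of list prices per subcategory (NULL = products without a subcategory)
SELECT ProductSubcategoryID, SUM(ListPrice) AS TotalListPrice
FROM Production.Product
GROUP BY ProductSubcategoryID;
GO
```

The remaining questions (which product has the maximum price, discounts from Sales.SalesOrderDetail) build on the same patterns with a join or a subquery.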

Page 8: SQL Server - Full text search

Session 5: SQL Server Full-Text Search

Using Full-Text Search in SQL Server 2008

Page 9: SQL Server - Full text search

• What is full-text search?
• The old way (SQL 2005)
• The new way (SQL 2008)
• How to
• Querying

Agenda

Page 10: SQL Server - Full text search

SELECT *
FROM [Northwind].[dbo].[Employees]
WHERE Notes LIKE '%grad%';

What is full-text search?

Page 11: SQL Server - Full text search

• Allows searching for text/words in columns
  • Similar words
  • Plurals of words
• Based on a special index
  • Full-text index (stored in a full-text catalog)

SELECT *
FROM [Northwind].[dbo].[Employees]
WHERE FREETEXT(*, 'grad');

What is REAL full-text search?
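The FREETEXT query above only works if the Full-Text Search component is installed and the table actually has a full-text index (the Northwind sample does not ship with one). A quick check, sketched with built-in metadata functions:

```sql
USE Northwind;
GO
-- 1 = the Full-Text Search component is installed on this instance
SELECT SERVERPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;

-- 1 = dbo.Employees has an active full-text index, so FREETEXT(*, ...) can be used on it
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Employees'), 'TableHasActiveFulltextIndex')
       AS EmployeesHasFullTextIndex;
```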

Page 12: SQL Server - Full text search

Theory

Page 13: SQL Server - Full text search

Full-text index: information about words and their locations in columns; used in full-text queries

Full-text catalog: a group of full-text indexes (a container)

Word breaker: tokenizes text based on the rules of a language

Full-Text Search Terminology 1/3
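SQL Server 2008 exposes the word breaker through the dynamic management function sys.dm_fts_parser, which makes the terminology above easy to observe. A sketch using the English word breaker (LCID 1033); note that this function requires membership in the sysadmin server role, and the sample sentence is arbitrary.

```sql
-- Show how the English (1033) word breaker tokenizes a phrase.
-- Arguments: query string, language ID, stoplist ID (0 = system stoplist),
-- accent sensitivity (0 = accent-insensitive).
SELECT occurrence,    -- position of the token in the input
       display_term,  -- the token as produced by the word breaker
       special_term   -- e.g. 'Exact Match' or 'Noise Word' (stopword)
FROM sys.dm_fts_parser(N'"The index stores words and their locations"', 1033, 0, 0);
```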

Page 14: SQL Server - Full text search

Token: a word identified by the word breaker

Stemmer: generates the inflectional forms of a word (language specific)

Filter: extracts text from files stored in a varbinary(max) or image column

Population (or crawl): the process of creating and maintaining a full-text index

Full-Text Search Terminology 2/3
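The stemmer and the filters can also be inspected from T-SQL. A sketch, again assuming sysadmin rights for sys.dm_fts_parser; the word "ride" is just an example.

```sql
-- Stemmer: expand "ride" into its inflectional forms
-- (English, system stoplist, accent-insensitive).
-- expansion_type: 0 = the word itself, 2 = inflectional expansion
SELECT display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, ride)', 1033, 0, 0);

-- Filters: document types (file extensions) for which a filter
-- is registered on this instance
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;
```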

Page 15: SQL Server - Full text search

Stopwords/stoplists: words that are not relevant to a search, e.g. 'and', 'a', 'is' and 'the' in English

Accent insensitivity: café = cafe

Full-Text Search Terminology 3/3
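Stoplists are database objects in SQL Server 2008, so a custom list can be derived from the system one. A sketch: the stoplist name ProductStoplist and the extra stopword 'product' are hypothetical, and the stoplist only takes effect once a full-text index is created or altered WITH STOPLIST = ProductStoplist. The last two queries let you compare the parser output with accent sensitivity off and on.

```sql
USE AdventureWorks_FulllText;
GO
-- Hypothetical stoplist: copy the system stoplist, then add a domain-specific stopword
CREATE FULLTEXT STOPLIST ProductStoplist FROM SYSTEM STOPLIST;
ALTER FULLTEXT STOPLIST ProductStoplist ADD N'product' LANGUAGE 1033;
GO

-- English stopwords in the new stoplist
SELECT sw.stopword
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'ProductStoplist'
  AND sw.language_id = 1033;

-- Accent sensitivity: last argument 0 = insensitive, 1 = sensitive
SELECT display_term FROM sys.dm_fts_parser(N'café', 1033, 0, 0);
SELECT display_term FROM sys.dm_fts_parser(N'café', 1033, 0, 1);
```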

Page 16: SQL Server - Full text search

Full-text search: under the hood

Page 17: SQL Server - Full text search

The old way! SQL 2005

Page 18: SQL Server - Full text search
Page 19: SQL Server - Full text search

The new way! SQL 2008

Page 20: SQL Server - Full text search
Page 21: SQL Server - Full text search

How to: Administration

Page 22: SQL Server - Full text search

Administering Full-Text Search

Full-text administration can be separated into three main tasks:
• Creating, altering and dropping full-text catalogs
• Creating, altering and dropping full-text indexes
• Scheduling and maintaining index population

Page 23: SQL Server - Full text search

Administering Full-Text Search

System stored procedures:
• sp_fulltext_catalog
• sp_fulltext_column
• sp_fulltext_database
• sp_fulltext_service
• sp_fulltext_table
• sp_help_fulltext_catalogs
• sp_help_fulltext_catalogs_cursor
• sp_help_fulltext_columns
• sp_help_fulltext_columns_cursor
• sp_help_fulltext_tables
• sp_help_fulltext_tables_cursor
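From SQL Server 2008 onward most of these procedures are deprecated (sp_fulltext_service being the main exception) in favour of the CREATE/ALTER/DROP FULLTEXT statements and the full-text catalog views. A rough sketch of catalog-view queries that cover what the sp_help_fulltext_* procedures report:

```sql
-- Catalogs (roughly what sp_help_fulltext_catalogs reports)
SELECT fulltext_catalog_id, name, is_default, is_accent_sensitivity_on
FROM sys.fulltext_catalogs;

-- Full-text indexed tables (roughly sp_help_fulltext_tables)
SELECT OBJECT_SCHEMA_NAME(object_id) AS SchemaName,
       OBJECT_NAME(object_id)        AS TableName,
       fulltext_catalog_id,
       is_enabled,
       change_tracking_state_desc
FROM sys.fulltext_indexes;

-- Full-text indexed columns (roughly sp_help_fulltext_columns)
SELECT OBJECT_NAME(ic.object_id)            AS TableName,
       COL_NAME(ic.object_id, ic.column_id) AS ColumnName,
       ic.language_id
FROM sys.fulltext_index_columns AS ic;
```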

Page 24: SQL Server - Full text search

Index vs. Full-text index

Full-text indexes | Regular SQL Server indexes
Stored in the file system, but administered through the database (SQL 2005; in SQL 2008 they are stored inside the database) | Stored under the control of the database in which they are defined
Only 1 full-text index allowed per table | Several regular indexes allowed per table
Addition of data to full-text indexes, called population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data | Updated automatically when the data upon which they are based is inserted, updated, or deleted

Page 25: SQL Server - Full text search

• Automatic update of the index
  • Slows down database performance
• Manually repopulating the full-text index
  • Time consuming
  • Asynchronous process in the background
  • Best run in periods of low activity; the index is not up to date in between

Administering Full-Text Search
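Because population is asynchronous, it helps to be able to see whether an index is still being populated. A sketch, assuming the catalog AdventureWorks_FullTextCatalog from the following slides and a full-text index on Production.ProductDescription:

```sql
USE AdventureWorks_FulllText;
GO
-- Catalog level: 0 = idle, 1 = full population in progress, 6 = incremental
-- population in progress, 9 = change tracking (other values: paused, throttled, ...)
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'PopulateStatus')
       AS CatalogPopulateStatus;

-- Table level: 0 = idle, 1 = full, 2 = incremental,
-- 3 = propagation of tracked changes, 4 = background update
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'Production.ProductDescription'),
                        'TableFulltextPopulateStatus') AS TablePopulateStatus;

-- When did the last crawl of each full-text index start and finish?
SELECT OBJECT_NAME(object_id) AS TableName,
       crawl_type_desc, crawl_start_date, crawl_end_date
FROM sys.fulltext_indexes;
```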

Page 26: SQL Server - Full text search

How to: Creating a Full-Text Catalog

SQL 2005 Only

SQL 2008 is smart

Page 27: SQL Server - Full text search


SQL 2005

Page 28: SQL Server - Full text search

Creating a Full-Text Catalog (SQL 2005)

Syntax

CREATE FULLTEXT CATALOG catalog_name
    [ ON FILEGROUP filegroup ]
    [ IN PATH 'rootpath' ]
    [ WITH <catalog_option> ]
    [ AS DEFAULT ]
    [ AUTHORIZATION owner_name ]

<catalog_option> ::=
    ACCENT_SENSITIVITY = { ON | OFF }

Example

USE AdventureWorks_FulllText;
GO
CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
    ON FILEGROUP FullTextCatalog_FG
    WITH ACCENT_SENSITIVITY = ON
    AS DEFAULT
    AUTHORIZATION dbo;
GO

Page 29: SQL Server - Full text search

Creating a Full-Text Catalog: Step by Step

1. Create a directory on the operating system named C:\test

2. Launch SSMS, connect to your instance, and open a new query window

3. Add a new filegroup to the AdventureWorks_FulllText database, and a data file to that filegroup:

USE master;
GO
ALTER DATABASE AdventureWorks_FulllText ADD FILEGROUP [FTFG1];
GO
ALTER DATABASE AdventureWorks_FulllText ADD FILE (
    NAME = N'AdventureWorks_FulllText_data',
    FILENAME = N'C:\TEST\AdventureWorks_FulllText_data.ndf',
    SIZE = 2048KB,
    FILEGROWTH = 1024KB
) TO FILEGROUP [FTFG1];
GO

4. Create a full-text catalog on the FTFG1 filegroup by executing the following command:

USE AdventureWorks_FulllText;
GO
CREATE FULLTEXT CATALOG AWCatalog ON FILEGROUP FTFG1 IN PATH 'C:\TEST' AS DEFAULT;
GO

Page 30: SQL Server - Full text search


SQL 2008

Page 31: SQL Server - Full text search


SQL 2008
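In SQL Server 2008 a full-text catalog is only a logical container: the ON FILEGROUP and IN PATH clauses are still accepted but have no effect (the filegroup is chosen per full-text index instead). A sketch of the SQL 2008 form of the earlier example, to run on a database where that catalog does not exist yet:

```sql
USE AdventureWorks_FulllText;
GO
-- SQL 2008: no filegroup or path; the catalog just groups full-text indexes
CREATE FULLTEXT CATALOG AdventureWorks_FullTextCatalog
    WITH ACCENT_SENSITIVITY = ON
    AS DEFAULT
    AUTHORIZATION dbo;
GO

-- Confirm the catalog exists and is the default for new full-text indexes
SELECT name, is_default, is_accent_sensitivity_on
FROM sys.fulltext_catalogs
WHERE name = N'AdventureWorks_FullTextCatalog';
```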

Page 32: SQL Server - Full text search

How to: Creating Full-Text Indexes
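The wizard screens that follow generate a CREATE FULLTEXT INDEX statement. A hand-written sketch for the Production.ProductDescription table used in the later queries; the key index name PK_ProductDescription_ProductDescriptionID is assumed from the AdventureWorks sample (check it with EXEC sp_helpindex N'Production.ProductDescription'). KEY INDEX must name a unique, non-nullable, single-column index.

```sql
USE AdventureWorks_FulllText;
GO
-- Only one full-text index is allowed per table.
-- LANGUAGE 1033 selects the English word breaker and stemmer for this column.
-- CHANGE_TRACKING AUTO keeps the index up to date and starts an initial
-- full population right after creation.
CREATE FULLTEXT INDEX ON Production.ProductDescription
    (Description LANGUAGE 1033)
    KEY INDEX PK_ProductDescription_ProductDescriptionID
    ON AdventureWorks_FullTextCatalog
    WITH CHANGE_TRACKING AUTO;
GO
```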

Page 33: SQL Server - Full text search
Page 34: SQL Server - Full text search
Page 35: SQL Server - Full text search
Page 36: SQL Server - Full text search
Page 37: SQL Server - Full text search
Page 38: SQL Server - Full text search
Page 39: SQL Server - Full text search
Page 40: SQL Server - Full text search
Page 41: SQL Server - Full text search
Page 42: SQL Server - Full text search

Property of column

Page 43: SQL Server - Full text search

Full-text Index property window

Page 44: SQL Server - Full text search
Page 45: SQL Server - Full text search

How to: Index and Catalog Population

Page 46: SQL Server - Full text search

Because full-text indexes are stored in a separate, external structure, changes to the underlying data columns are not immediately reflected in the full-text index. Instead, a background process uses the word breakers, filters and noise-word (stopword) filters to build the tokens for each column, which are then merged back into the main index either automatically or manually. This update process is called population or a crawl. To keep your full-text indexes up to date, you must populate them periodically (or let change tracking propagate the changes automatically).

Populating a Full-Text Index

Page 47: SQL Server - Full text search

You can choose from three modes for full-text population:
• Full
• Incremental
• Update

Populating a Full-Text Index

Page 48: SQL Server - Full text search

Full
• Reads and processes all rows
• Very resource-intensive

Incremental
• Populates the index only with rows that were modified since the last population
• Requires a timestamp column

Update
• Uses change tracking from SQL Server (inserts, updates, and deletes)
• You specify how the changes are propagated to the index:
  • AUTO: automatic processing
  • MANUAL: you implement a manual method for processing the changes

Populating a Full-Text Index
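The slide that follows shows a full population; the other two modes look like this. A sketch for Production.ProductDescription, which in the AdventureWorks sample has no timestamp column, so the incremental variant is left commented out (without a timestamp column SQL Server falls back to a full population). Start one population at a time and let it finish before starting another.

```sql
USE AdventureWorks_FulllText;
GO
-- Update population with manual change tracking: inserts, updates and deletes
-- are tracked as they happen, but only applied to the index on request.
ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION;
GO

-- Incremental population: only rows changed since the last population,
-- detected through a timestamp (rowversion) column on the table.
-- ALTER FULLTEXT INDEX ON Production.ProductDescription START INCREMENTAL POPULATION;

-- Return to automatic propagation of tracked changes later with:
-- ALTER FULLTEXT INDEX ON Production.ProductDescription SET CHANGE_TRACKING AUTO;
```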

Page 49: SQL Server - Full text search

Example

ALTER FULLTEXT INDEX ON Production.ProductDescription START FULL POPULATION;

ALTER FULLTEXT INDEX ON Production.Document START FULL POPULATION;

Populating a Full-Text Index

Page 50: SQL Server - Full text search

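Syntax

ALTER FULLTEXT CATALOG catalog_name
{ REBUILD [ WITH ACCENT_SENSITIVITY = { ON | OFF } ]
| REORGANIZE
| AS DEFAULT
}

• REBUILD: deletes the catalog and builds a new one in its place
  • Required to change ACCENT_SENSITIVITY
• REORGANIZE: merges all changes (index fragments) into one index
  • Improves performance; frees up disk space and memory
• AS DEFAULT: makes this the default catalog

Populating a Full-Text Catalog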

Page 51: SQL Server - Full text search

Example

USE AdventureWorks_FulllText;

ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog REBUILD WITH ACCENT_SENSITIVITY=OFF;

-- Check accent sensitivity (1 = accent-sensitive, 0 = accent-insensitive)

SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'accentsensitivity');

Populating a Full-Text Catalog
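The REORGANIZE option works the same way. A sketch against the same catalog, followed by a status check using two documented FULLTEXTCATALOGPROPERTY properties:

```sql
USE AdventureWorks_FulllText;
GO
-- Master merge: combine the smaller index fragments created by earlier
-- populations into one index
ALTER FULLTEXT CATALOG AdventureWorks_FullTextCatalog REORGANIZE;
GO

-- MergeStatus: 1 = a master merge is in progress, 0 = not in progress
-- ItemCount: number of full-text indexed items in the catalog
SELECT FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'MergeStatus') AS MergeStatus,
       FULLTEXTCATALOGPROPERTY('AdventureWorks_FullTextCatalog', 'ItemCount')   AS ItemCount;
```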

Page 52: SQL Server - Full text search
Page 53: SQL Server - Full text search
Page 54: SQL Server - Full text search

Managing Population Schedules

In SQL 2000, full text catalogs could only be populated on specified schedules

SQL 2005/2008 can track database changes and keep the catalog up to date, with a minor performance hit
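When change tracking is set to MANUAL, the changes can still be applied on a schedule, for example during periods of low activity, with a SQL Server Agent job. A sketch using the msdb job procedures; the job and schedule names are hypothetical, and it assumes the Production.ProductDescription index uses CHANGE_TRACKING MANUAL.

```sql
USE msdb;
GO
-- Hypothetical job: apply tracked changes every night at 02:00
EXEC dbo.sp_add_job
    @job_name = N'FTS - ProductDescription update population';

EXEC dbo.sp_add_jobstep
    @job_name      = N'FTS - ProductDescription update population',
    @step_name     = N'Start update population',
    @subsystem     = N'TSQL',
    @database_name = N'AdventureWorks_FulllText',
    @command       = N'ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION;';

EXEC dbo.sp_add_schedule
    @schedule_name     = N'Nightly 02:00',
    @freq_type         = 4,       -- daily
    @freq_interval     = 1,       -- every day
    @active_start_time = 020000;  -- HHMMSS

EXEC dbo.sp_attach_schedule
    @job_name      = N'FTS - ProductDescription update population',
    @schedule_name = N'Nightly 02:00';

-- Register the job with the local server so SQL Server Agent runs it
EXEC dbo.sp_add_jobserver
    @job_name = N'FTS - ProductDescription update population';
GO
```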

Page 55: SQL Server - Full text search

Full-Text query keywords

FREETEXT

FREETEXTTABLE

CONTAINS

CONTAINSTABLE

How to: Querying SQL Server Using Full-Text Search

Page 56: SQL Server - Full text search


Page 57: SQL Server - Full text search

FREETEXT

• Fuzzy search (less precise)
  • Inflectional forms (stemming)
  • Related words (thesaurus)

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE [Description] LIKE N'%bike%';

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE FREETEXT(Description, N'bike');

Page 58: SQL Server - Full text search

FREETEXTTABLE

• Returns a table with KEY and RANK columns; RANK is a relative value between 0 and 1000 indicating how well the row matches the search criteria

SELECT
    PD.ProductDescriptionID,
    PD.Description,
    KEYTBL.[KEY],
    KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN FREETEXTTABLE(Production.ProductDescription, Description, N'bike') AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;
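CONTAINSTABLE is the ranked counterpart of CONTAINS: it returns the same KEY and RANK columns, and both functions accept an optional top_n_by_rank argument that limits the result to the best matches. A sketch that returns only the ten highest-ranked descriptions for a proximity search:

```sql
USE AdventureWorks_FulllText;
GO
-- Top 10 descriptions, ranked by how well they match "mountain" near "bike"
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
                         N'mountain NEAR bike', 10) AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;
```

RANK values are only meaningful relative to the other rows of the same query; they should not be compared across different queries.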

Page 59: SQL Server - Full text search

CONTAINS

• Lets you specify precisely which fuzzy-matching method to use

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'bike');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"bike*"');

INFLECTIONAL: considers word stems in the search, e.g. "ride" also matches "riding", "ridden", ...

THESAURUS: returns synonyms, e.g. "metal" also matches "gold", "aluminium", "steel", ...

Page 60: SQL Server - Full text search

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'FORMSOF(INFLECTIONAL, ride)');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'FORMSOF(THESAURUS, ride)');

Word proximity: NEAR (or ~) matches words that are close to each other in the text/document

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'mountain NEAR bike');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'mountain ~ bike');

SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'ISABOUT(mountain weight(.8), bikes weight(.2))');

Page 61: SQL Server - Full text search

Querying SQL Server Using Full-Text Search

Page 62: SQL Server - Full text search

Full-text search is much more powerful than LIKE
• More specific, relevant results
• Better performance
  • LIKE is fine for small amounts of text
  • Full-text search scales to huge documents
• Provides ranking of results

Common uses
• Searching the content of a text-intensive, database-driven website, e.g. a knowledge base
• Searching the contents of documents stored in BLOB fields
• Performing advanced searches
  • e.g. exact phrases: "to be or not to be" (needs care, however!)
  • e.g. Boolean operators: AND, OR, NOT, NEAR
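The advanced searches listed above combine in one CONTAINS condition. A sketch against the same Production.ProductDescription table; the search words are only examples. Phrases go in double quotes, and NOT is written as AND NOT.

```sql
USE AdventureWorks_FulllText;
GO
-- Exact phrase
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"mountain bike"');

-- Boolean operators: both terms (the second as a prefix), but not a third one
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"mountain" AND "bike*" AND NOT "women*"');

-- Either term
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'"road" OR "touring"');
```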

Page 63: SQL Server - Full text search

The power of FTS is in the expression that is passed to the CONTAINS predicate or the CONTAINSTABLE function

Several different types of terms:
• Simple terms
• Prefix terms
• Generation terms
• Proximity terms
• Weighted terms

Writing FTS terms

Page 64: SQL Server - Full text search

• Either words or phrases
• Quotes are optional for single words, but recommended; phrases must be enclosed in double quotes
• Matches columns that contain the exact words or phrases specified
• Case insensitive
• Punctuation is ignored

e.g.
CONTAINS(Column, 'SQL')
CONTAINS(Column, ' "SQL" ')
CONTAINS(Column, ' "Microsoft SQL Server" ')
(CONTAINS(Column, 'Microsoft SQL Server') without the inner quotes is a syntax error)

Simple terms

Page 65: SQL Server - Full text search

Matches words beginning with the specified text

e.g.
CONTAINS(Column, ' "local*" ')
• matches local, locally, locality
CONTAINS(Column, ' "local wine*" ')
• matches "local winery", "locally wined"

Prefix terms

Page 66: SQL Server - Full text search

Inflectional
• FORMSOF(INFLECTIONAL, "expression")
• "drive" matches "drove", "driven", ... (they share the same stem)
• When vague words such as "best" are used, it doesn't match the exact word, only "good"

Thesaurus
• FORMSOF(THESAURUS, "expression")
• "metal" matches "gold", "aluminium", "steel", ...

Both return variants of the specified word, but the variants are determined differently

Generation terms

Page 67: SQL Server - Full text search

Supposed to match synonyms of search terms – but the thesaurus seems to be very limited

Does not match plurals

Not particularly useful

http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506231

Thesaurus
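Much of the "very limited" behaviour comes from the thesaurus files themselves: in SQL Server 2008 each language has an XML thesaurus file in the instance's FTData folder (tsENU.xml for US English), and the shipped files contain only commented-out samples. After editing a file, it can be reloaded without restarting the service. A sketch for US English (LCID 1033); both statements need sysadmin rights.

```sql
-- Reload the US English thesaurus file after editing it
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
GO

-- Check which expansions the thesaurus now produces for "metal"
-- (expansion_type 4 = thesaurus expansion, 0 = the word itself)
SELECT display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, metal)', 1033, 0, 0);
```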

Page 68: SQL Server - Full text search

Syntax
CONTAINS(Column, 'local NEAR winery')
CONTAINS(Column, ' "local" NEAR "winery" ')

• Important for ranking
• Both words must be in the column, like AND
• Terms on either side of NEAR must be simple or prefix terms

Proximity terms

Page 69: SQL Server - Full text search

• Each word can be given a weight (from 0.0 to 1.0), which affects ranking
• Can be combined with simple, prefix, generation and proximity terms

e.g.
CONTAINS(Column, 'ISABOUT(performance weight(.8), comfortable weight(.4))')
CONTAINS(Column, 'ISABOUT(FORMSOF(INFLECTIONAL, "performance") weight(.8), FORMSOF(INFLECTIONAL, "comfortable") weight(.4))')

Weighted terms
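Weights do not change which rows CONTAINS returns; they only influence the RANK value that CONTAINSTABLE produces. A sketch that makes the effect visible, reusing the weighted terms from the slide against Production.ProductDescription:

```sql
USE AdventureWorks_FulllText;
GO
-- Rows mentioning "performance" should tend to rank higher than rows
-- that only mention "comfortable"
SELECT PD.ProductDescriptionID,
       PD.Description,
       KEYTBL.RANK
FROM Production.ProductDescription AS PD
INNER JOIN CONTAINSTABLE(Production.ProductDescription, Description,
        N'ISABOUT(performance weight(.8), comfortable weight(.4))') AS KEYTBL
    ON PD.ProductDescriptionID = KEYTBL.[KEY]
ORDER BY KEYTBL.RANK DESC;
```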

Page 70: SQL Server - Full text search

Pros? Cons?

Page 71: SQL Server - Full text search

Full-text catalogs
• Disk space
• Keeping them up to date: continuous updating is a performance hit

Queries
• Complicated to generate
• Generated as a string
• Generated on the client

Disadvantages
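Since the search condition is just a string, it does not have to be concatenated into the SQL text on the client: CONTAINS accepts a variable, so the condition can be passed as a parameter. A sketch using sp_executesql; the parameter name @search and the search string are illustrative.

```sql
USE AdventureWorks_FulllText;
GO
DECLARE @search nvarchar(4000);
SET @search = N'"mountain" AND "bike*"';

-- The search condition is a parameter, so user input is never
-- spliced into the statement text itself
EXEC sys.sp_executesql
    N'SELECT ProductDescriptionID, Description
      FROM Production.ProductDescription
      WHERE CONTAINS(Description, @search);',
    N'@search nvarchar(4000)',
    @search = @search;
```

This removes the injection risk, but the client still has to turn user input into a valid full-text condition (quoting phrases, choosing operators), which is the "complicated to generate" part.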

Page 72: SQL Server - Full text search

Backing up full-text catalogs

SQL 2005
• Included in SQL backups by default
• Retained on detach and re-attach
• Option in the detach dialog to keep the full-text catalog

In SQL 2008 you don't have to worry about this

Advantages

Page 73: SQL Server - Full text search

• Much more powerful than LIKE
  • Specific
  • Ranking
  • Performance
• Pre-computed ranking (FREETEXTTABLE)
• Configurable population schedule
  • Continuously track changes, or index when the CPU is idle

Advantages

Page 74: SQL Server - Full text search

Pluralcast - SQL Server Under the Covers

http://shrinkster.com/1ff4

Dotnetrocks - Search for SQL Server

http://www.dotnetrocks.com/archives.aspx

RunAsRadio - Search for SQL Server

http://www.runasradio.com/archives.aspx

Quick tips - Podcasts

Page 75: SQL Server - Full text search

Full text search

Download from Course Materials Site (to copy/paste scripts) or type manually:

http://sharepoint.ssw.com.au/Training/UTSSQL/

Session 5 Lab

Page 76: SQL Server - Full text search

3 things…


http://blog.gfader.com/

twitter.com/peitor

Page 77: SQL Server - Full text search

Thank You!

Gateway Court Suite 10 81 - 91 Military Road Neutral Bay, Sydney NSW 2089 AUSTRALIA

ABN: 21 069 371 900

Phone: + 61 2 9953 3000 Fax: + 61 2 9953 3105
