pattern matching and full text search in sql · 2012-03-25 · pattern matching and full text...

126
Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics, University of Zagreb Pavlinska 2, 42000 Varaˇ zdin [email protected] 17.02.2012. Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 1 / 35

Upload: others

Post on 21-Jan-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Pattern Matching and Full Text Search in SQL

Markus Schatten, PhD

Assistant ProfessorFaculty of Organization and Informatics,

University of ZagrebPavlinska 2, 42000 [email protected]

17.02.2012.

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 1 / 35

Page 2: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 2 / 35

Page 3: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Tools

PostgreSQL 9.1

LibreOffice.org BaseConnect with host=localhost dbname=wikileaks port=5432username ubuntupassword wikileaks

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 3 / 35

Page 4: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Tools

PostgreSQL 9.1LibreOffice.org Base

Connect with host=localhost dbname=wikileaks port=5432username ubuntupassword wikileaks

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 3 / 35

Page 5: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Why pattern matching in databases?

“Good” databases often do not need pattern matching

A curious case:field name “data”type varchar( 200 )structure: first 10 characters internal user ID (numeric), then user’s surnameended by a ’$’ sign, then user’s name again ended by a ’$’ sign, onecharacter indicator of user type ...

Databases with lots of textual information (web applications, documentmanagement systems, ...)...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 4 / 35

Page 6: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Why pattern matching in databases?

“Good” databases often do not need pattern matchingA curious case:

field name “data”type varchar( 200 )structure: first 10 characters internal user ID (numeric), then user’s surnameended by a ’$’ sign, then user’s name again ended by a ’$’ sign, onecharacter indicator of user type ...

Databases with lots of textual information (web applications, documentmanagement systems, ...)...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 4 / 35

Page 7: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Why pattern matching in databases?

“Good” databases often do not need pattern matchingA curious case:

field name “data”type varchar( 200 )structure: first 10 characters internal user ID (numeric), then user’s surnameended by a ’$’ sign, then user’s name again ended by a ’$’ sign, onecharacter indicator of user type ...

Databases with lots of textual information (web applications, documentmanagement systems, ...)

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 4 / 35

Page 8: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Why pattern matching in databases?

“Good” databases often do not need pattern matchingA curious case:

field name “data”type varchar( 200 )structure: first 10 characters internal user ID (numeric), then user’s surnameended by a ’$’ sign, then user’s name again ended by a ’$’ sign, onecharacter indicator of user type ...

Databases with lots of textual information (web applications, documentmanagement systems, ...)...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 4 / 35

Page 9: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 5 / 35

Page 10: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The LIKE expression

Standard SQL pattern matching expression

Uses simple wildcard characters to match strings% matches any 0 or more characters

matches exactly one characterUsefull for keyword search and (very) simple patterns

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 6 / 35

Page 11: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The LIKE expression

Standard SQL pattern matching expressionUses simple wildcard characters to match strings

% matches any 0 or more charactersmatches exactly one character

Usefull for keyword search and (very) simple patterns

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 6 / 35

Page 12: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The LIKE expression

Standard SQL pattern matching expressionUses simple wildcard characters to match strings

% matches any 0 or more charactersmatches exactly one character

Usefull for keyword search and (very) simple patterns

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 6 / 35

Page 13: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’e-Discovery’ LIKE ’%e%very%’

SELECT ’e-Discovery’ LIKE ’%Disco%’

SELECT ’e-Discovery’ LIKE ’___is______’

SELECT ’e-Discovery’ NOT LIKE ’_e%’

SELECT ’e-Discovery’ NOT LIKE ’%asy%’

SELECT ’e-Discovery’ LIKE ’%over_’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 6 / 35

Page 14: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT *FROM cableWHERE origin LIKE ’%ZAGREB%’

SELECT *FROM cableWHERE content LIKE ’%TELEFONICA%’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 6 / 35

Page 15: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The ILIKE expression

PostgreSQL specific pattern matching expression (not standard!)

Same functionality as LIKE but case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 7 / 35

Page 16: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The ILIKE expression

PostgreSQL specific pattern matching expression (not standard!)Same functionality as LIKE but case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 7 / 35

Page 17: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT *FROM cableWHERE origin ILIKE ’%zagreb%’

SELECT *FROM cableWHERE content ILIKE ’%telefonica%’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 7 / 35

Page 18: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

LIKE and ILIKE

Shortcuts:~~ = LIKE!~~ = NOT LIKE~~* = ILIKE!~~* = NOT ILIKE

Advantage - very fastDissadvantage - very limited

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 8 / 35

Page 19: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

LIKE and ILIKE

Shortcuts:~~ = LIKE!~~ = NOT LIKE~~* = ILIKE!~~* = NOT ILIKE

Advantage - very fast

Dissadvantage - very limited

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 8 / 35

Page 20: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

LIKE and ILIKE

Shortcuts:~~ = LIKE!~~ = NOT LIKE~~* = ILIKE!~~* = NOT ILIKE

Advantage - very fastDissadvantage - very limited

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 8 / 35

Page 21: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The SIMILAR TO expression

“a curious cross between LIKE notation and common regular expressionnotation”

Uses SQL standard definition of regular expressionsLike the LIKE expression always matches the whole string

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 9 / 35

Page 22: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The SIMILAR TO expression

“a curious cross between LIKE notation and common regular expressionnotation”Uses SQL standard definition of regular expressions

Like the LIKE expression always matches the whole string

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 9 / 35

Page 23: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The SIMILAR TO expression

“a curious cross between LIKE notation and common regular expressionnotation”Uses SQL standard definition of regular expressionsLike the LIKE expression always matches the whole string

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 9 / 35

Page 24: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more characters

matches exactly one character| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 25: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 26: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 27: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times

+ denotes repetition of the previous item one or more times() parentheses can be used to group items into a single logical

item[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 28: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 29: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 30: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

SIMILAR TO metacharacters

% matches any 0 or more charactersmatches exactly one character

| denotes alternation (either of two alternatives)

* denotes repetition of the previous item zero or more times+ denotes repetition of the previous item one or more times

() parentheses can be used to group items into a single logicalitem

[...] a bracket expression specifies a character class

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 31: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’never’ SIMILAR TO ’never’

SELECT ’gonna’ NOT SIMILAR TO ’gon’

SELECT ’give’ SIMILAR TO ’%(i|w)%’

SELECT ’you up’ NOT SIMILAR TO ’(you_)|(up)’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 32: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’email: [email protected]’SIMILAR TO ’email: *%@foi.hr’

SELECT ’email: [email protected]’SIMILAR TO ’email: *%@foi.hr’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 33: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’foooooo’ SIMILAR TO ’f(o)*’

SELECT ’foooooo’ SIMILAR TO ’f(o)+’

SELECT ’f’ SIMILAR TO ’f(o)*’

SELECT ’f’ NOT SIMILAR TO ’f(o)+’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 10 / 35

Page 34: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)

[a-z] matches any lowercase character[A-Z] matches any uppercase character[.?#] matches either the ’.’, the ’?’ or the ’#’ character

[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the’,’ character

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 35: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)[a-z] matches any lowercase character

[A-Z] matches any uppercase character[.?#] matches either the ’.’, the ’?’ or the ’#’ character

[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the’,’ character

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 36: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)[a-z] matches any lowercase character[A-Z] matches any uppercase character

[.?#] matches either the ’.’, the ’?’ or the ’#’ character[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the

’,’ character...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 37: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)[a-z] matches any lowercase character[A-Z] matches any uppercase character[.?#] matches either the ’.’, the ’?’ or the ’#’ character

[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the’,’ character

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 38: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)[a-z] matches any lowercase character[A-Z] matches any uppercase character[.?#] matches either the ’.’, the ’?’ or the ’#’ character

[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the’,’ character

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 39: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Bracket expressions

[0-9] matches any digit (0-9)[a-z] matches any lowercase character[A-Z] matches any uppercase character[.?#] matches either the ’.’, the ’?’ or the ’#’ character

[a-z3-6.,] matches a lowercase character, a digit from 3 to 6, the ’.’ or the’,’ character

...

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 11 / 35

Page 40: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’27 October 2001’SIMILAR TO’[0-9]+ [A-Z][a-z]+ [1-2][0-9][0-9][0-9]’

SELECT *FROM cableWHERE contentSIMILAR TO’% [0-9]+ [A-Z][a-z]+ [1-2][0-9][0-9][0-9]%’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 12 / 35

Page 41: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 12 / 35

Page 42: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The syntax of POSIX regular expressions is defined with atoms,quantifiers, and constraints

As opposed to the previous cases, regex match any part of a string(constraints can be used to match the whole string)

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 13 / 35

Page 43: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

The syntax of POSIX regular expressions is defined with atoms,quantifiers, and constraintsAs opposed to the previous cases, regex match any part of a string(constraints can be used to match the whole string)

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 13 / 35

Page 44: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms

(re) (where re is any regular expression) matches a match for re,with the match noted for possible reporting

(?:re) as above, but the match is not noted for reporting (a”non-capturing” set of parentheses)

. matches any single character (similar to in LIKE)[chars] a bracket expression, matching any one of the chars

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 14 / 35

Page 45: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms

(re) (where re is any regular expression) matches a match for re,with the match noted for possible reporting

(?:re) as above, but the match is not noted for reporting (a”non-capturing” set of parentheses)

. matches any single character (similar to in LIKE)[chars] a bracket expression, matching any one of the chars

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 14 / 35

Page 46: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms

(re) (where re is any regular expression) matches a match for re,with the match noted for possible reporting

(?:re) as above, but the match is not noted for reporting (a”non-capturing” set of parentheses)

. matches any single character (similar to in LIKE)

[chars] a bracket expression, matching any one of the chars

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 14 / 35

Page 47: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms

(re) (where re is any regular expression) matches a match for re,with the match noted for possible reporting

(?:re) as above, but the match is not noted for reporting (a”non-capturing” set of parentheses)

. matches any single character (similar to in LIKE)[chars] a bracket expression, matching any one of the chars

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 14 / 35

Page 48: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms - cont.

\k (where k is a non-alphanumeric character) matches thatcharacter taken as an ordinary character, e.g., \\ matches abackslash character

\c where c is alphanumeric (possibly followed by other characters)is an escape

{ when followed by a character other than a digit, matches theleft-brace character {; when followed by a digit, it is thebeginning of a bound (explained later)

x where x is a single character with no other significance,matches that character

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 15 / 35

Page 49: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms - cont.

\k (where k is a non-alphanumeric character) matches thatcharacter taken as an ordinary character, e.g., \\ matches abackslash character

\c where c is alphanumeric (possibly followed by other characters)is an escape

{ when followed by a character other than a digit, matches theleft-brace character {; when followed by a digit, it is thebeginning of a bound (explained later)

x where x is a single character with no other significance,matches that character

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 15 / 35

Page 50: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms - cont.

\k (where k is a non-alphanumeric character) matches thatcharacter taken as an ordinary character, e.g., \\ matches abackslash character

\c where c is alphanumeric (possibly followed by other characters)is an escape

{ when followed by a character other than a digit, matches theleft-brace character {; when followed by a digit, it is thebeginning of a bound (explained later)

x where x is a single character with no other significance,matches that character

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 15 / 35

Page 51: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Atoms - cont.

\k (where k is a non-alphanumeric character) matches thatcharacter taken as an ordinary character, e.g., \\ matches abackslash character

\c where c is alphanumeric (possibly followed by other characters)is an escape

{ when followed by a character other than a digit, matches theleft-brace character {; when followed by a digit, it is thebeginning of a bound (explained later)

x where x is a single character with no other significance,matches that character

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 15 / 35

Page 52: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom

+ a sequence of 1 or more matches of the atom? a sequence of 0 or 1 matches of the atom

{m} a sequence of exactly m matches of the atom{m,} a sequence of m or more matches of the atom

{m,n} a sequence of m through n (inclusive) matches of the atom; mcannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 53: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom+ a sequence of 1 or more matches of the atom

? a sequence of 0 or 1 matches of the atom{m} a sequence of exactly m matches of the atom

{m,} a sequence of m or more matches of the atom{m,n} a sequence of m through n (inclusive) matches of the atom; m

cannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 54: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom+ a sequence of 1 or more matches of the atom? a sequence of 0 or 1 matches of the atom

{m} a sequence of exactly m matches of the atom{m,} a sequence of m or more matches of the atom

{m,n} a sequence of m through n (inclusive) matches of the atom; mcannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 55: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom+ a sequence of 1 or more matches of the atom? a sequence of 0 or 1 matches of the atom

{m} a sequence of exactly m matches of the atom

{m,} a sequence of m or more matches of the atom{m,n} a sequence of m through n (inclusive) matches of the atom; m

cannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 56: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom+ a sequence of 1 or more matches of the atom? a sequence of 0 or 1 matches of the atom

{m} a sequence of exactly m matches of the atom{m,} a sequence of m or more matches of the atom

{m,n} a sequence of m through n (inclusive) matches of the atom; mcannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 57: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Quantifiers

* a sequence of 0 or more matches of the atom+ a sequence of 1 or more matches of the atom? a sequence of 0 or 1 matches of the atom

{m} a sequence of exactly m matches of the atom{m,} a sequence of m or more matches of the atom

{m,n} a sequence of m through n (inclusive) matches of the atom; mcannot exceed n

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 16 / 35

Page 58: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Constraints

ˆ matches at the beginning of the string$ matches at the end of the string

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 17 / 35

Page 59: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

POSIX regular expressions in PostgreSQL

~ matches regular expression, case sensitive

~* matches regular expression, case insensitive!~ does not match regular expression, case sensitive!~* does not match regular expression, case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 18 / 35

Page 60: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

POSIX regular expressions in PostgreSQL

~ matches regular expression, case sensitive~* matches regular expression, case insensitive

!~ does not match regular expression, case sensitive!~* does not match regular expression, case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 18 / 35

Page 61: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

POSIX regular expressions in PostgreSQL

~ matches regular expression, case sensitive~* matches regular expression, case insensitive!~ does not match regular expression, case sensitive

!~* does not match regular expression, case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 18 / 35

Page 62: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

POSIX regular expressions in PostgreSQL

~ matches regular expression, case sensitive~* matches regular expression, case insensitive!~ does not match regular expression, case sensitive!~* does not match regular expression, case insensitive

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 18 / 35

Page 63: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT’Never gonna let you down’ ˜* ’ˆnever’

SELECT’Never gonna run around and desert you’ !˜ ’ˆgonna’

SELECT’Never gonna make you cry’ ˜ ’cry$’

SELECT’Never gonna say goodbye’ !˜* ’SAY$’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 18 / 35

Page 64: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT ’27-9-2001’ ˜ ’[0-9]{1,2}\-[0-9]{1,2}\-[1-2][0-9]{3}’

SELECT *FROM cableWHEREcontent ˜ ’[0-9]{1,2}\-[0-9]{1,2}\-[1-2][0-9]{3}’� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 65: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Auxilliary functions

substring(string from pattern) where pattern is a regularexpression extracts the matching substring

regexp replace(source, pattern, replacement) replaces thematching substring with replacement

regexp matches(string, pattern) returns all matching substringsregexp split to table(string, pattern) uses regular expression

match as delimeter and returns splitted tableregexp split to array(string, pattern) same as above, but

returns array

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 66: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Auxilliary functions

substring(string from pattern) where pattern is a regularexpression extracts the matching substring

regexp replace(source, pattern, replacement) replaces thematching substring with replacement

regexp matches(string, pattern) returns all matching substringsregexp split to table(string, pattern) uses regular expression

match as delimeter and returns splitted tableregexp split to array(string, pattern) same as above, but

returns array

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 67: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Auxilliary functions

substring(string from pattern) where pattern is a regularexpression extracts the matching substring

regexp replace(source, pattern, replacement) replaces thematching substring with replacement

regexp matches(string, pattern) returns all matching substrings

regexp split to table(string, pattern) uses regular expressionmatch as delimeter and returns splitted table

regexp split to array(string, pattern) same as above, butreturns array

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 68: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Auxilliary functions

substring(string from pattern) where pattern is a regularexpression extracts the matching substring

regexp replace(source, pattern, replacement) replaces thematching substring with replacement

regexp matches(string, pattern) returns all matching substringsregexp split to table(string, pattern) uses regular expression

match as delimeter and returns splitted table

regexp split to array(string, pattern) same as above, butreturns array

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 69: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Auxilliary functions

substring(string from pattern) where pattern is a regularexpression extracts the matching substring

regexp replace(source, pattern, replacement) replaces thematching substring with replacement

regexp matches(string, pattern) returns all matching substringsregexp split to table(string, pattern) uses regular expression

match as delimeter and returns splitted tableregexp split to array(string, pattern) same as above, but

returns array

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 19 / 35

Page 70: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT substring(’Never gonna tell a lie’ from ’gonna(.*)lie’)

SELECT regexp_replace( ’... and hurt you’, ’hurt’, ’rickroll’ )

SELECT regexp_matches( ’27 Sep 2001’, ’([0-9]{1,2})([A-Z][a-z]{2}) ([1-2][0-9]{3})’ )

SELECT regexp_split_to_table( ’[email protected]; [email protected];[email protected]’, ’\; ’ )� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 20 / 35

Page 71: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 20 / 35

Page 72: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search

Problems with pattern matching:

No linguistic support (for example verb forms: satisfies, satisfy ...)No ordering/ranking of search resultsRelatively slow due to lack of indexing support

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 21 / 35

Page 73: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search

Problems with pattern matching:No linguistic support (for example verb forms: satisfies, satisfy ...)

No ordering/ranking of search resultsRelatively slow due to lack of indexing support

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 21 / 35

Page 74: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search

Problems with pattern matching:No linguistic support (for example verb forms: satisfies, satisfy ...)No ordering/ranking of search results

Relatively slow due to lack of indexing support

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 21 / 35

Page 75: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search

Problems with pattern matching:No linguistic support (for example verb forms: satisfies, satisfy ...)No ordering/ranking of search resultsRelatively slow due to lack of indexing support

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 21 / 35

Page 76: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full text indexing

Full text indexing allows documents to be preprocessed and an indexsaved for later rapid searching:

Parsing documents into tokens - e.g., numbers, words, complex words,email addresses, so that they can be processed differentlyConverting tokens into lexemes - normalized strings so that differentforms of the same word are made alikeStoring preprocessed documents optimized for searching - eachdocument can be represented as a sorted array of normalized lexemespossibly with positional information to use for proximity ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 22 / 35

Page 77: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full text indexing

Full text indexing allows documents to be preprocessed and an indexsaved for later rapid searching:

Parsing documents into tokens - e.g., numbers, words, complex words,email addresses, so that they can be processed differently

Converting tokens into lexemes - normalized strings so that differentforms of the same word are made alikeStoring preprocessed documents optimized for searching - eachdocument can be represented as a sorted array of normalized lexemespossibly with positional information to use for proximity ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 22 / 35

Page 78: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full text indexing

Full text indexing allows documents to be preprocessed and an indexsaved for later rapid searching:

Parsing documents into tokens - e.g., numbers, words, complex words,email addresses, so that they can be processed differentlyConverting tokens into lexemes - normalized strings so that differentforms of the same word are made alike

Storing preprocessed documents optimized for searching - eachdocument can be represented as a sorted array of normalized lexemespossibly with positional information to use for proximity ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 22 / 35

Page 79: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full text indexing

Full text indexing allows documents to be preprocessed and an indexsaved for later rapid searching:

Parsing documents into tokens - e.g., numbers, words, complex words,email addresses, so that they can be processed differentlyConverting tokens into lexemes - normalized strings so that differentforms of the same word are made alikeStoring preprocessed documents optimized for searching - eachdocument can be represented as a sorted array of normalized lexemespossibly with positional information to use for proximity ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 22 / 35

Page 80: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search in PostgreSQL

PostgreSQL uses the @@ (match) operator and two additional datatypes:

tsvector (preprocessed) text documenttsquery (preprocessed) query

The @@ operator returns true if a tsvector matches a tsquery

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 23 / 35

Page 81: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search in PostgreSQL

PostgreSQL uses the @@ (match) operator and two additional datatypes:tsvector (preprocessed) text document

tsquery (preprocessed) queryThe @@ operator returns true if a tsvector matches a tsquery

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 23 / 35

Page 82: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search in PostgreSQL

PostgreSQL uses the @@ (match) operator and two additional datatypes:tsvector (preprocessed) text documenttsquery (preprocessed) query

The @@ operator returns true if a tsvector matches a tsquery

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 23 / 35

Page 83: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Full Text Search in PostgreSQL

PostgreSQL uses the @@ (match) operator and two additional datatypes:tsvector (preprocessed) text documenttsquery (preprocessed) query

The @@ operator returns true if a tsvector matches a tsquery

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 23 / 35

Page 84: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT to_tsvector(’a fat cat sat on a mat and ate a fat rat’)

@@ to_tsquery(’cat & rat’);?column?

----------t

SELECT to_tsquery(’fat & cow’) @@ to_tsvector(’a fat cat saton a mat and ate a fat rat’);

?column?----------f� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 23 / 35

Page 85: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT to_tsvector( ’Today is another nice rainy day in

Manchester!’ );to_tsvector

-------------------------------------------------------------’anoth’:3 ’day’:6 ’manchest’:8 ’nice’:4 ’raini’:5 ’today’:1

(1 row)� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 86: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

tsquery Syntax

The tsquery datatype uses the following syntax:

& - denotes conjunction (logical AND)| - denotes disjunction (logical OR)! - denotes negation (logical NOT)

() - bracket are used to group expressions together

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 87: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

tsquery Syntax

The tsquery datatype uses the following syntax:& - denotes conjunction (logical AND)

| - denotes disjunction (logical OR)! - denotes negation (logical NOT)

() - bracket are used to group expressions together

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 88: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

tsquery Syntax

The tsquery datatype uses the following syntax:& - denotes conjunction (logical AND)| - denotes disjunction (logical OR)

! - denotes negation (logical NOT)() - bracket are used to group expressions together

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 89: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

tsquery Syntax

The tsquery datatype uses the following syntax:& - denotes conjunction (logical AND)| - denotes disjunction (logical OR)! - denotes negation (logical NOT)

() - bracket are used to group expressions together

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 90: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

tsquery Syntax

The tsquery datatype uses the following syntax:& - denotes conjunction (logical AND)| - denotes disjunction (logical OR)! - denotes negation (logical NOT)

() - bracket are used to group expressions together

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 24 / 35

Page 91: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECTto_tsvector( ’E-discovery discovers new knowledge in huge

haystacks of data’ )@@to_tsquery( ’new & ( ! huge | data )’ );� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 25 / 35

Page 92: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 25 / 35

Page 93: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECTto_tsvector( ’E-discovery discovers new knowledge in huge

haystacks of data’ )@@to_tsquery( ’new & ! ( huge | data )’ )� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 26 / 35

Page 94: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 26 / 35

Page 95: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT *FROMcable c,to_tsquery( ’siemens & ( iran | iraq ) & ! kuwait’ ) q

WHERE q @@ to_tsvector( content )� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 27 / 35

Page 96: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Ranking Search Results

PostgreSQL has currently implemented two ranking functions:

ts rank( vector, query ) - based on frequency of matching lexemsts rank cd( vector, query ) - based on cover density ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 27 / 35

Page 97: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Ranking Search Results

PostgreSQL has currently implemented two ranking functions:ts rank( vector, query ) - based on frequency of matching lexems

ts rank cd( vector, query ) - based on cover density ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 27 / 35

Page 98: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Ranking Search Results

PostgreSQL has currently implemented two ranking functions:ts rank( vector, query ) - based on frequency of matching lexemsts rank cd( vector, query ) - based on cover density ranking

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 27 / 35

Page 99: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT c.id, ts_rank_cd( p.to_tsvector, q ) AS rankFROMcable c,preprocessed p,to_tsquery( ’siemens & ( iran | iraq )’ ) q

WHEREc.id = p.idAND q @@ p.to_tsvector

ORDER BY rank DESCLIMIT 5

id | rank--------+-----------230290 | 0.160565184343 | 0.0525963241208 | 0.0509475232601 | 0.0353333201137 | 0.0302077

(5 rows)� �Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 28 / 35

Page 100: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 28 / 35

Page 101: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 29 / 35

Page 102: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attribute

sum finds the sum of a numerical attributemin find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attributevariance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 103: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attribute

min find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attributevariance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 104: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attributemin find the smalest value of an attribute

max find the highest value of an attributecount find the number of values of some attribute

variance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 105: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attributemin find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attributevariance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 106: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attributemin find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attribute

variance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 107: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attributemin find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attributevariance find the variance of some numerical attribute

stddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 108: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Aggregate functions

avg finds the average of a numerical attributesum finds the sum of a numerical attributemin find the smalest value of an attributemax find the highest value of an attribute

count find the number of values of some attributevariance find the variance of some numerical attributestddev find the standard deviation of some numerical attribute

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 30 / 35

Page 109: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT count(*)FROM cable

SELECT min(date), max(date)FROM cable� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 31 / 35

Page 110: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

GROUP BY and HAVING

Aggregate functions are usually used with a GROUP BY clause to groupaggregate values by some given attribute.

HAVING is used for constraints on aggragate expressions.

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 31 / 35

Page 111: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

GROUP BY and HAVING

Aggregate functions are usually used with a GROUP BY clause to groupaggregate values by some given attribute.

HAVING is used for constraints on aggragate expressions.

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 31 / 35

Page 112: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT date, count(*)FROM cableGROUP BY dateORDER BY 2 DESC� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 31 / 35

Page 113: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT origin, count(*)FROM cableGROUP BY 1HAVING count(*) > 100ORDER BY 2� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 114: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Gathering Text Search Statistics

PostgreSQL provides you also with the ts stat( sql query )function which allows you to gather useful statistics

Note that ts stat returns a set of records of the form ( word, ndoc,nentry ):

word - the lexem (word)ndoc - the number of documents the lexem occurs in

nentry - the total number of occurences of the lexem

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 115: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Gathering Text Search Statistics

PostgreSQL provides you also with the ts stat( sql query )function which allows you to gather useful statisticsNote that ts stat returns a set of records of the form ( word, ndoc,nentry ):

word - the lexem (word)ndoc - the number of documents the lexem occurs in

nentry - the total number of occurences of the lexem

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 116: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Gathering Text Search Statistics

PostgreSQL provides you also with the ts stat( sql query )function which allows you to gather useful statisticsNote that ts stat returns a set of records of the form ( word, ndoc,nentry ):

word - the lexem (word)

ndoc - the number of documents the lexem occurs innentry - the total number of occurences of the lexem

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 117: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Gathering Text Search Statistics

PostgreSQL provides you also with the ts stat( sql query )function which allows you to gather useful statisticsNote that ts stat returns a set of records of the form ( word, ndoc,nentry ):

word - the lexem (word)ndoc - the number of documents the lexem occurs in

nentry - the total number of occurences of the lexem

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 118: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Gathering Text Search Statistics

PostgreSQL provides you also with the ts stat( sql query )function which allows you to gather useful statisticsNote that ts stat returns a set of records of the form ( word, ndoc,nentry ):

word - the lexem (word)ndoc - the number of documents the lexem occurs in

nentry - the total number of occurences of the lexem

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 32 / 35

Page 119: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

� �SELECT *FROM ts_stat( ’SELECT to_tsvector FROM preprocessed’ )ORDER BYnentry DESC,ndoc DESC,word

LIMIT 10;� �

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 33 / 35

Page 120: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Outline

1 Introduction

2 Pattern Matching in PostgreSQL

3 POSIX Regular Expressions

4 Full Text Search

5 Statistics and aggregation

6 Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 33 / 35

Page 121: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Mini assignment

Keyword/pattern definition

Define a set of keywords and data possibly found in cables which for you asan investigator would be of importance. What kind of (structured) data isimportant to an investigative process? Which kind of statistics about thesedata will indicate if they are interesting or not?

e.g. keywords: risk, plan, transaction, signature, intelligence etc.e.g. data: money amounts, telephone numbers, email addresses,snailmail addresses, domains, dates, account numbers, etc.

Try to create patterns and queries that will allow you to search for these data!

Database search application

Create a small application that will allow you to search the database andgather statistics. Think about your user friendlyness!

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 34 / 35

Page 122: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Mini assignment

Keyword/pattern definition

Define a set of keywords and data possibly found in cables which for you asan investigator would be of importance. What kind of (structured) data isimportant to an investigative process? Which kind of statistics about thesedata will indicate if they are interesting or not?

e.g. keywords: risk, plan, transaction, signature, intelligence etc.

e.g. data: money amounts, telephone numbers, email addresses,snailmail addresses, domains, dates, account numbers, etc.

Try to create patterns and queries that will allow you to search for these data!

Database search application

Create a small application that will allow you to search the database andgather statistics. Think about your user friendlyness!

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 34 / 35

Page 123: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Mini assignment

Keyword/pattern definition

Define a set of keywords and data possibly found in cables which for you asan investigator would be of importance. What kind of (structured) data isimportant to an investigative process? Which kind of statistics about thesedata will indicate if they are interesting or not?

e.g. keywords: risk, plan, transaction, signature, intelligence etc.e.g. data: money amounts, telephone numbers, email addresses,snailmail addresses, domains, dates, account numbers, etc.

Try to create patterns and queries that will allow you to search for these data!

Database search application

Create a small application that will allow you to search the database andgather statistics. Think about your user friendlyness!

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 34 / 35

Page 124: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Mini assignment

Keyword/pattern definition

Define a set of keywords and data possibly found in cables which for you asan investigator would be of importance. What kind of (structured) data isimportant to an investigative process? Which kind of statistics about thesedata will indicate if they are interesting or not?

e.g. keywords: risk, plan, transaction, signature, intelligence etc.e.g. data: money amounts, telephone numbers, email addresses,snailmail addresses, domains, dates, account numbers, etc.

Try to create patterns and queries that will allow you to search for these data!

Database search application

Create a small application that will allow you to search the database andgather statistics. Think about your user friendlyness!

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 34 / 35

Page 125: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Mini assignment

Keyword/pattern definition

Define a set of keywords and data possibly found in cables which for you asan investigator would be of importance. What kind of (structured) data isimportant to an investigative process? Which kind of statistics about thesedata will indicate if they are interesting or not?

e.g. keywords: risk, plan, transaction, signature, intelligence etc.e.g. data: money amounts, telephone numbers, email addresses,snailmail addresses, domains, dates, account numbers, etc.

Try to create patterns and queries that will allow you to search for these data!

Database search application

Create a small application that will allow you to search the database andgather statistics. Think about your user friendlyness!

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 34 / 35

Page 126: Pattern Matching and Full Text Search in SQL · 2012-03-25 · Pattern Matching and Full Text Search in SQL Markus Schatten, PhD Assistant Professor Faculty of Organization and Informatics,

Introduction Pattern Matching in PostgreSQL POSIX Regular Expressions Full Text Search Statistics and aggregation Mini assignment

Markus Schatten, PhD (FOI) Pattern Matching and Full Text Search in SQL 17.02.2012. 35 / 35