CQL – a Common Query Language

Mike Taylor <mike@indexdata.com>

1. What CQL is
2. Motivation
3. Examples and explanation
4. Applications
5. Implementation

Chapter 1: What CQL is

● CQL is a query language:
  – For humans to type
  – For query forms to generate
  – For translating other languages into

● The only query language of SRW/SRU

● Also applicable in other contexts:
  – Z39.50 (instead of the Type-1 Query)
  – Vendor-neutral format for Metasearch

Specifications and implementations

● CQL is a specification for expressing queries abstractly.
  – You don't need to know the database schema.

● It has to be parsed by a CQL parser.
  – The parser produces a form that is easy to program with.

● It has to be executed by some specific database engine.
  – Implementations will vary in what they support.

Chapter 2: Motivation

Most query languages fall into one of two camps:

● Complex and powerful, but cryptic and hard to learn
  – SQL, Prefix Query Format (PQF), XML Query

● Easy to learn and use, but lacking in power
  – Google, AltaVista, CCL

CQL aims to “make simple queries easy, and complex queries possible” (to paraphrase Larry Wall, of Perl).

Learning curves for query languages

[Figure: learning curves for SQL, Google and CQL, plotting effort in learning the query language against power of query that can be expressed.]

Chapter 3: Examples and explanation

Core concepts

● Simple terms
● Quoting
● Booleans
● Parentheses
● Pattern matching
● Indexes
● Prefixes
● Context sets
● Relations

Esoteric concepts (next session!)

● Word anchoring
● Proximity
● More on relations
● Relation modifiers
● Boolean modifiers
● Profiles
● Prefix mapping

CQL features: simple terms

Here are some perfectly good CQL queries:

● fish

● Churchill

● dinosaur

● comp.sources.misc

CQL features: quoting

Double-quote marks remove the special meanings of special characters like space (which otherwise separates tokens) and of keywords such as “and” and “or”.

● "dinosaur"
● "the complete dinosaur"
● "ext->u.generic"
● "and"

(Backslash removes the special meaning of following double-quote characters.)

● "the \"nuxi\" problem"
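The quoting rules can be sketched as a tiny tokenizer. This is an illustrative Python sketch only, not the code of any real CQL parser: the function name `tokenize` and the `(kind, text)` token shape are assumptions made for this example, and it covers just whitespace, parentheses, double quotes and backslash escapes.

```python
def tokenize(query):
    """Split a query into (kind, text) pairs; kind is 'word', 'quoted' or 'paren'.

    Hypothetical helper for illustration; not part of any CQL library.
    """
    tokens = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch.isspace():
            i += 1                      # unquoted whitespace separates tokens
        elif ch in "()":
            tokens.append(("paren", ch))
            i += 1
        elif ch == '"':
            i += 1                      # skip the opening quote
            buf = []
            while i < n and query[i] != '"':
                if query[i] == "\\" and i + 1 < n:
                    if query[i + 1] == '"':
                        buf.append('"')             # \" is a literal quote
                    else:
                        buf.append(query[i:i + 2])  # keep other escapes as-is
                    i += 2
                else:
                    buf.append(query[i])
                    i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            i += 1                      # skip the closing quote
            tokens.append(("quoted", "".join(buf)))
        else:
            start = i
            while i < n and not query[i].isspace() and query[i] not in '()"':
                i += 1
            tokens.append(("word", query[start:i]))
    return tokens
```

Because quoted tokens carry their own kind, a later parsing stage can tell the term "and" apart from the boolean keyword and.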

CQL features: booleans

The keywords “and” and “or” are boolean operators. The keyword “not” is an and-not binary operator. There is no unary negation operator. Case is not significant, so “AND” and “aNd” also work.

● dinosaur or bird
● dinosaur not reptile
● dinosaur and bird and reptile
● dinosaur and bird or dinobird
● dinosaur not theropod not ornithischian

CQL features: boolean precedence

The “and”, “or” and “not” booleans all have equal precedence and are evaluated left-to-right.

● dinosaur and bird or dinobird
  MEANS
  (dinosaur and bird) or dinobird

● dinosaur or bird and dinobird
  MEANS
  (dinosaur or bird) and dinobird
  NOT
  dinosaur or (bird and dinobird)
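Equal precedence with left-to-right evaluation amounts to folding the operators into a tree one at a time. The following Python sketch illustrates that idea under simplifying assumptions: terms are single unquoted words, there are no parentheses, and the function name `parse_flat` and the tuple-based tree are invented for this example.

```python
BOOLEANS = {"and", "or", "not"}

def parse_flat(query):
    """Fold 'term op term op term ...' into a left-nested tree of tuples.

    Hypothetical helper: handles only whitespace-separated words.
    """
    tokens = query.split()
    if not tokens:
        raise ValueError("empty query")
    tree = tokens[0]
    i = 1
    while i < len(tokens):
        op = tokens[i].lower()          # booleans are case-insensitive
        if op not in BOOLEANS:
            raise ValueError(f"expected a boolean, got {tokens[i]!r}")
        if i + 1 >= len(tokens):
            raise ValueError("boolean without a right-hand term")
        # Everything parsed so far becomes the left operand.
        tree = (op, tree, tokens[i + 1])
        i += 2
    return tree

print(parse_flat("dinosaur or bird and dinobird"))
# ('and', ('or', 'dinosaur', 'bird'), 'dinobird')
```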

CQL features: parentheses

Parentheses may be used to override the default left-to-right parsing of boolean operators.

● dinosaur and (bird or dinobird)
● dinosaur or (bird and dinobird)
● (bird or dinosaur) and (feathers or scales)
● "feathered dinosaur" and (yixian or jehol)
● (((a and b) or (c not d) not (e or f and g)) and h not i) or j
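One way to see how parentheses override the default grouping is a small recursive-descent parser in which a parenthesised group is simply another operand. This is a Python sketch under stated assumptions: the name `parse` and the tuple tree are hypothetical, and quoted terms such as "feathered dinosaur" are deliberately not handled.

```python
import re

def parse(query):
    """Parse unquoted terms, booleans and parentheses into nested tuples.

    Hypothetical sketch; not the algorithm of any particular CQL parser.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_expr()         # a group is a complete sub-query
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    def parse_expr():
        nonlocal pos
        node = parse_operand()
        # Equal precedence: fold each boolean in as soon as it is seen.
        while pos < len(tokens) and tokens[pos].lower() in ("and", "or", "not"):
            op = tokens[pos].lower()
            pos += 1
            node = (op, node, parse_operand())
        return node

    tree = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this sketch, `parse("dinosaur and (bird or dinobird)")` nests the "or" inside the "and", whereas the same query without parentheses nests the "and" inside the "or".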

CQL features: pattern matching

There are two pattern-matching characters:

* matches any number of characters
? matches any single character

● dinosaur* – matches “dinosaurs”, “dinosauria”
● *sauria – matches “dinosauria”, “carnosauria”
● man?raptor – matches “maniraptor”, “manuraptor”
● man?raptor* – matches the plurals of these
● "comp* *saur" – matches “complete dinosaur”

A preceding backslash removes their special meaning.

● char\* – matches literal “char*”
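A server that stores its terms as plain strings could implement these two characters by translating a pattern into a regular expression. The Python sketch below shows one such translation; the function name `cql_pattern_to_regex` is hypothetical, and whether matching should be case-sensitive or stop at word boundaries is left open here, as those details are assumptions outside the slides.

```python
import re

def cql_pattern_to_regex(pattern):
    """Translate * and ? (with backslash escapes) into a compiled regex.

    Hypothetical helper for illustration only.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
        elif ch == "*":
            parts.append(".*")                       # any number of characters
            i += 1
        elif ch == "?":
            parts.append(".")                        # exactly one character
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))

# fullmatch() is used so the pattern must cover the whole candidate string.
print(bool(cql_pattern_to_regex("man?raptor").fullmatch("maniraptor")))  # True
print(bool(cql_pattern_to_regex(r"char\*").fullmatch("chars")))          # False
```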

CQL features: indexes

A term of the form name=value is a query for the specified value occurring within the named index.

● title=Churchill – finds biographies of Churchill
● author=Churchill – finds books written by him
● title=dinosaur and author=farlow
● title=(dinosaur and bird)
● subject=(dinosaur* or pterosaur*)

Index names are case-insensitive, so “title” is the same index as “TITLE”, “Title” or “tiTLe”.

CQL features: prefixes

The meaning of an index can be specified more fully by a prefix indicating what context set it is from. The meaning of “title” is different in cross-domain searching (Dublin Core), bibliographic searching (Bath Profile) and heraldry.

● dc.title="the complete dinosaur"
● property.title=freehold
● heraldry.title=(viscount or duke)
● cql.serverChoice=fruit
● cql.resultSetId=YXJjaGJpc2hvcAp

Prefixes are case-insensitive.
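In code, a prefixed index name is easy to take apart and normalise. The Python sketch below is only an illustration: the helper name `split_index` is hypothetical, and returning `None` for a missing prefix (leaving the choice of default context set to the caller) is an assumption of this example.

```python
def split_index(name):
    """Split 'prefix.index' into a lower-cased (prefix, index) pair.

    Hypothetical helper; the prefix is None when the name has no dot.
    """
    prefix, dot, base = name.partition(".")
    if not dot:
        return None, name.lower()
    # Both prefixes and index names are case-insensitive, so normalise both.
    return prefix.lower(), base.lower()

print(split_index("DC.Title"))  # ('dc', 'title')
print(split_index("title"))     # (None, 'title')
```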

CQL features: context sets

A context set is a set of indexes that are related to a particular area (plus some other more esoteric stuff that you can ignore).

For example, the Dublin Core context set contains indexes for searching against the fifteen DC elements:

title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.

The context set prose must define their semantics.

CQL features: some context sets

A few core sets created by the SRW editorial board:

● CQL – for core indexes such as resultSetId
● DC – for metadata searching with Dublin Core
● Rec – metadata about the record, not the resource
● Net – network concepts such as host-name and port

Also, many application-specific sets:

● Bath, Zthes, CCG, Music
● Rel – deep voodoo for relevance matching
● GILS and GEO are in development

A digression on the CQL context set

The CQL context set is special. It contains some “magic” indexes:

● cql.anywhere – searches in all the indexes available

● cql.serverChoice – allows the server to choose whatever index or indexes are suitable

● cql.resultSetId – finds the records obtained in a previous search, e.g. for refinement by combining with other query terms.

CQL features: relations

Usually “=” is the relation that connects an index with its term, but all the other obvious numeric relations are supported:

● Height = 13
● numberOfWheels <= 3
● numberOfPlates = 18
● lengthOfFemur > 2.4
● BioMass >= 100
● NumberOfToes <> 3 (inequality)
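On the database side, each relation symbol can be mapped to an ordinary comparison. The Python sketch below shows that mapping against an in-memory record; the names `RELATIONS` and `satisfies`, and the convention that record keys are stored lower-cased, are assumptions made for this example rather than anything CQL prescribes.

```python
import operator

# Relation symbols from the slide, plus "<", mapped to Python comparisons.
RELATIONS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def satisfies(record, index, relation, value):
    """Return True if record[index] stands in the given relation to value.

    Hypothetical helper: record keys are assumed to be lower-cased, since
    index names are case-insensitive.
    """
    field = record.get(index.lower())
    if field is None:
        return False                    # the record has no such index
    return RELATIONS[relation](field, value)

stegosaurus = {"numberofplates": 18, "lengthoffemur": 2.6}
print(satisfies(stegosaurus, "numberOfPlates", "=", 18))   # True
print(satisfies(stegosaurus, "lengthOfFemur", ">", 2.4))   # True
```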

CQL features: special relations

The keywords “any” and “all” can be used as relations, indicating that any one of, or all of, the words specified in the term must be found in the index:

● author all "kernighan ritchie" – shorthand for
  author=kernighan and author=ritchie

● author any "kernighan ritchie thompson" – shorthand for
  author=kernighan or author=ritchie or author=thompson
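The shorthand can be expanded mechanically. The Python sketch below rewrites an “any” or “all” clause into the equivalent chain of “=” clauses; the function name `expand` is hypothetical, and the sketch assumes the term is a plain space-separated list of words with no quoting or wildcards inside it.

```python
def expand(index, relation, term):
    """Rewrite 'index any/all "w1 w2 ..."' as a chain of index=word clauses.

    Hypothetical helper for illustration only.
    """
    joiners = {"all": " and ", "any": " or "}
    joiner = joiners[relation.lower()]  # raises KeyError for other relations
    return joiner.join(f"{index}={word}" for word in term.split())

print(expand("author", "all", "kernighan ritchie"))
# author=kernighan and author=ritchie
```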

CQL features: whole-field searching

The keyword “exact” can be used as a relation, indicating a search for the value of a whole field rather than words within it:

● title=jaws
  – finds Jaws and The Jaws of Fate.

● title exact jaws
  – finds Jaws but NOT The Jaws of Fate.

● title exact "The Jaws of Fate"
  – finds The Jaws of Fate but NOT Jaws.
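The difference between the two relations can be shown with two small predicates over a field value. This Python sketch is an illustration under assumptions: the function names are hypothetical, matching is made case-insensitive because the examples find “Jaws” from the term jaws, and `matches_word` handles a single-word term only.

```python
def matches_word(field, term):
    """'=' style match: the term is one of the words in the field."""
    return term.lower() in field.lower().split()

def matches_exact(field, term):
    """'exact' style match: the term is the whole field value."""
    return field.lower() == term.lower()

titles = ["Jaws", "The Jaws of Fate"]
print([t for t in titles if matches_word(t, "jaws")])
# ['Jaws', 'The Jaws of Fate']
print([t for t in titles if matches_exact(t, "jaws")])
# ['Jaws']
```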

Chapter 4: Applications

CQL has been deployed in many kinds of application:

● Google-like structureless searching
● Simple metadata searching with the Dublin Core
● Bath Profile for bibliographic data
● Zthes profile for hierarchical thesaurus navigation
● CCG for collectable card games
● Music – musicalKey, arranger, duration, etc.
● GILS (Global Information Locator Service)
● ... your application goes here!

Chapter 5: Implementations

There are good-quality free CQL implementations in several important languages:

● Java (Mike Taylor's CQL-Java package)
● C/C++ (Adam Dickmeiss in Index Data's YAZ)
● Python (Rob Sanderson in Cheshire)
● Perl (Ed Summers' CQL::Parser module)
● Visual Basic is in development (Thomas Habing)
● ... your language goes here!

Conclusion: What to take home

● CQL makes easy queries easy and hard ones possible
● You can use it well without learning the hard bits
● It is used in SRW/SRU but also applicable elsewhere
● It is extensible through context sets
● Existing context sets support lots of applications
● There are free implementations in several languages
● Tutorial on-line at:
  http://zing.z3950.org/cql/intro.html